Detecting software clones in binaries
Zaharije Radivojević, Saša Stojanović, Miloš Cvetanović
School of Electrical Engineering, Belgrade University
14th Workshop “Software Engineering Education and Reverse Engineering”
Sinaia, Romania, 24-30 August 2014
14th Workshop SEE and RE 2/16
Agenda
• Clone detection
• Binary code clones
• Metrics approach
• Conclusions
Motivation (1)
• A motivating scenario is to find the reuse of a software library in source code without appropriate permission from the owner of the library.
Code clones
• Type-1: Identical code (ignoring formatting)
• Type-2: Syntactically identical fragments (ignoring naming and formatting)
• Type-3: Copied fragments with further modifications (ignoring some statements, naming and formatting)
• Type-4: Two or more code fragments that perform the same computation
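The difference between Type-1 and Type-2 clones can be illustrated with a minimal sketch. It is not taken from the slides: the keyword set, the function names `normalize_type1` and `normalize_type2`, and the two sample fragments are all hypothetical. The sketch also maps every identifier to the same placeholder, which is cruder than what real clone detectors do.

```python
import re

# Hypothetical keyword set for a tiny C-like language (an assumption, not from the slides).
KEYWORDS = {"int", "return", "if", "else", "for", "while"}

def normalize_type1(code: str) -> str:
    """Collapse all whitespace so that formatting differences disappear."""
    return " ".join(code.split())

def normalize_type2(code: str) -> str:
    """Additionally replace every non-keyword identifier with the placeholder ID."""
    def rename(match: re.Match) -> str:
        word = match.group(0)
        return word if word in KEYWORDS else "ID"
    return re.sub(r"[A-Za-z_]\w*", rename, normalize_type1(code))

# Two fragments that differ in formatting and in naming only.
fragment_a = "int sum(int a, int b) { return a + b; }"
fragment_b = "int  add(int x,\n        int y) {\n    return x + y;\n}"

is_type1_clone = normalize_type1(fragment_a) == normalize_type1(fragment_b)
is_type2_clone = normalize_type2(fragment_a) == normalize_type2(fragment_b)
print(is_type1_clone, is_type2_clone)
```

Under these assumptions the fragments are not Type-1 clones, because the names differ, but they match once naming is ignored as well.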
Existing tools
Tool | Supported languages | Language in experiment | Comparison level | Clone detection technique | Types of detected clones
SimCad | C, C#, Java, Py | C | block, procedure | text based | 1, 2, and 3
CCFinder | C/C++, C#, Cobol, Java, VB, Text | C | file | token based | 1, 2, and 3
Deckard | C, Java, PHP | C | file | AST based | 1, 2, and 3
ACD | C/C++ | C | file | text based (ASM generated from C) | 1, 2, and 3
Moss | C/C++, C#, Cobol, Java, VB, MIPS, Text… | ASM | file | text based | 1, 2, and 3
Source code required: not available for a commercial product.
Motivation (2)
• A motivating scenario is to find the reuse of a software library in a commercial product binary without an appropriate permission from the owner of the library.
Source code transformed by compiler (what compiler?)
ARM architecture
Approach
Metrics
Filters/Formulas
Filters:
- No filtering
- Adaptive filtering (based on previous knowledge)
- Interval filtering
Formulas:
- Arithmetic mean
- Geometric mean
- Harmonic mean
- Weighted functions (based on previous knowledge)
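The formulas listed above can be sketched as ways of folding several per-metric similarity scores into one value. This is an illustration only: the slides do not say how the scores are defined or how the weights are chosen, so the function names, the sample scores, and the weights below are assumptions.

```python
import math

def arithmetic_mean(scores):
    return sum(scores) / len(scores)

def geometric_mean(scores):
    # Computed through logarithms; assumes every score is strictly positive.
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

def harmonic_mean(scores):
    # Also assumes strictly positive scores; dominated by the smallest values.
    return len(scores) / sum(1.0 / s for s in scores)

def weighted_mean(scores, weights):
    # One possible "weighted function"; the weights are hypothetical inputs here.
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Hypothetical similarity scores of one procedure pair under three metrics.
scores = [0.9, 0.6, 0.3]

print(arithmetic_mean(scores))
print(geometric_mean(scores))
print(harmonic_mean(scores))
print(weighted_mean(scores, [3, 2, 1]))
```

For positive scores that are not all equal, the harmonic mean is the smallest of the three plain means and the arithmetic mean the largest, so the choice of formula controls how strongly a single poorly matching metric pulls the combined score down.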
Results (STAMP + Busy Box)
Support vector machines and k-nearest neighbors achieved much lower results.
• Configurations with the newly introduced metrics achieve up to 1.44 times better recall than configurations that use only metrics from high-level languages.
• Comparison of the proposed approach with some clone detection tools shows that it achieves a higher recall for an acceptable level of precision.
• Observing only the first position, for the real-world example (Busy Box), the proposed approach achieves a recall of 43% and a precision of 43%.
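One possible reading of "observing only the first position" is that each query procedure yields a ranked list of candidates and only the top candidate counts as reported. The sketch below computes precision and recall under that assumption. The function name, the data layout, and the sample data are hypothetical and are not the evaluation used in the slides.

```python
def precision_recall_at_first(ranked, truth):
    """ranked: query -> candidates, best first; truth: query -> the correct candidate."""
    # Only the first candidate of each non-empty ranking counts as reported.
    reported = {query: cands[0] for query, cands in ranked.items() if cands}
    correct = sum(1 for query, top in reported.items() if truth.get(query) == top)
    precision = correct / len(reported) if reported else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical rankings for three query procedures.
ranked = {
    "f1": ["g1", "g2"],  # correct match in first position
    "f2": ["g3", "g2"],  # correct match only in second position
    "f3": ["g3"],        # correct match in first position
}
truth = {"f1": "g1", "f2": "g2", "f3": "g3"}

precision, recall = precision_recall_at_first(ranked, truth)
print(precision, recall)
```

When every query reports exactly one candidate and has exactly one true match, the two denominators coincide, so precision and recall are equal in this setup.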
Conclusion
Motivation (3) - final
• A motivating scenario is to find the use of a patent in a commercial product binary without appropriate permission from the owner of the patent.
Thank you!
Radivojevic Zaharije