String Analysis for Binaries
Mihai Christodorescu, Nicholas Kidd, Wen-Han Goh
University of Wisconsin, Madison
6 September 2005 Nicholas Kidd - String Analysis for Binaries - PASTE 2005 2

What Is String Analysis?
• Recovery of the values a string variable might take at a given program point.

    void main( void ) {
        char * msg = "no msg";
        printf( "This food has %s.\n", msg );
    }

Output: This food has no msg.

Why Do We Need String Analysis?
• We could just use the strings program:

    $ strings no_msg
    /lib/ld-linux.so.2
    libc.so.6
    ...
    ...
    no msg
    This food has %s.
    $
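
To make the baseline concrete, here is a minimal sketch of what a strings-style scan does: it reports every run of printable ASCII bytes of at least a minimum length. The function name `printable_runs`, the byte range, and the sample `blob` are assumptions for illustration, not the actual implementation of the strings program.

```python
import re

def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of at least min_len printable ASCII bytes."""
    # 0x20..0x7e is the printable ASCII range used by this sketch.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Hypothetical fragment of a binary image: two literals separated by NUL bytes.
blob = b"\x00\x01no msg\x00This food has %s.\n\x00\x7f"
print(printable_runs(blob))
```

A scan like this finds literals but knows nothing about how the program combines them, and it skips pieces shorter than the minimum length entirely.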

Why Perform String Analysis?
• Computer forensics: given an unknown program, we want to know the files it might access, the registry keys it might get and set, and the commands it might execute.
• Program verification: SQL queries, embedded scripting, …

A Complicated Example

    void main( void ) {
        char buf[257];
        strcpy( buf, "/" );
        strcat( buf, "b" );
        strcat( buf, "i" );
        strcat( buf, "n" );
        ...
        system( buf );
    }

Running strings: /abcd...
Running a string analysis: /bin/ifconfig -a | /bin/mail ...@...

Our Contributions
• Developed a string analysis for binaries.
• Implemented x86sa, a string analyzer for Intel IA-32 binaries.
• Evaluated on both benign and malicious binaries.

Outline
• String analysis for Java.
• String analysis for x86.
• Evaluation.
• Applications & future work.

String Analysis for Java
1. Create string flowgraph.

    void main( void ) {
        String x = "/";
        x = x + "b";
        x = x + "i";
        x = x + "n";
        ...
        System.exec( x );
    }

Christensen, Møller, Schwartzbach, "Precise Analysis of String Expressions" (SAS'03)

[Figure: string flowgraph. "/" and "b" feed a concat node; its result and "i" feed the next concat node, and so on.]

String Analysis for Java [2]
2. Create context-free grammar.
3. Approximate with finite automaton.

[Figure: the string flowgraph from the previous slide ("/" and "b" into concat, then "i" into the next concat, ...).]

    A1 → "/" "b"
    A2 → A1 "i"
    A3 → A2 "n"
    A4 → A3 "/"
    …

Result: "/bin/ifconfig ..."

From Java to x86 executables
Java .class → Flowgraph → CFG → FSA

From Java to x86 executables
Rest of this talk: bridge the syntactic and semantic gaps between Java and assembly language.
x86 executable → Flowgraph → CFG → FSA

Outline
• String analysis for Java.
• String analysis for x86.
• Evaluation.
• Applications & future work.

Four Problems with Assembly
1. No types.
2. No high-level constructs.
3. No argument passing convention.
4. No Java string semantics.

Problem 1: No Types
• Solution: infer types from C library functions.
Assumption #1: Strings are manipulated only using string library functions.

    char * strcat( char * dest, char * src )

• After: "eax" points to a string.
• Before: "dest" and "src" point to a string.

Problem 1: No Types [cont.]
• Perform a backwards analysis to find the strings:
  – Destination registers "kill" string type information.
  – Libc string functions "gen" string type information.
  – Strings at entry to CFG are constant strings or function parameters.

Problem 1: No Types [example]
String variables:
  – after the call: { eax }
  – before the call: { ebx, ecx }

    eax = _strcat( ebx, ecx );
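
The backward "gen/kill" step for this call can be sketched as follows. This is a minimal illustration over straight-line code, not the x86sa implementation: the tuple encoding of instructions and the names `transfer_backward` and `strings_before_each` are hypothetical, and the rule for `mov` (a register copy passes the string type back to its source) is an assumption added to make the example more than one instruction long.

```python
def transfer_backward(instr, strings_after):
    """Map the set of string registers after instr to the set before it."""
    kind = instr[0]
    if kind == "call_str":
        # ("call_str", dest, args): a modeled libc string call.
        # The destination register is killed; the arguments are generated.
        _, dest, args = instr
        return (strings_after - {dest}) | set(args)
    if kind == "mov":
        # ("mov", dest, src): assumed rule, a copy moves the type to the source.
        _, dest, src = instr
        if dest in strings_after:
            return (strings_after - {dest}) | {src}
        return set(strings_after)
    return set(strings_after)

def strings_before_each(block):
    """Walk a straight-line block backwards; return the set before each instruction."""
    current = set()
    result = []
    for instr in reversed(block):
        current = transfer_backward(instr, current)
        result.append(current)
    result.reverse()
    return result

# The call from the slide, preceded by a hypothetical register copy.
block = [("mov", "ebx", "esi"), ("call_str", "eax", ("ebx", "ecx"))]
print(strings_before_each(block))
```

For the call alone, starting from { eax } after the call yields { ebx, ecx } before it, matching the sets on the slide.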

Problem 2: Function Parameters
• Function parameters are not explicit in x86 machine code.

    mov  ecx, [ebp+var1]
    push ecx
    mov  ebx, [ebp+var2]
    push ebx
    call _strcat
    add  esp, 8

Problem 2: Function Parameters [example]

    mov  ecx, [ebp+var1]
    push ecx
    mov  ebx, [ebp+var2]
    push ebx
    call _strcat
    add  esp, 8

[Figure: stack diagram showing the stack pointer and the slots written by "push ecx" and "push ebx".]

Problem 2: Function Parameters

    mov  ecx, [ebp+var1]
    push ecx
    mov  ebx, [ebp+var2]
    push ebx
    call _strcat
    add  esp, 8

• Solution: Perform a forwards analysis modeling the effects of x86 instructions on the stack.
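
A minimal sketch of such a forward stack model, assuming straight-line code, 4-byte stack slots, and cdecl argument order (the last value pushed is the first argument). The `ARITY` table, the tuple encoding, and the name `recover_call_args` are hypothetical; the return address pushed by `call` is not modeled because the callee removes it before control returns.

```python
# Hypothetical table of modeled functions and their argument counts.
ARITY = {"_strcat": 2}

def recover_call_args(instrs):
    """Forward pass: return ([(target, args in declaration order)], final stack)."""
    stack = []   # abstract stack of 4-byte slots, top of stack at the end
    calls = []
    for op, *operands in instrs:
        if op == "push":
            stack.append(operands[0])
        elif op == "call":
            target = operands[0]
            n = ARITY.get(target, 0)
            # cdecl pushes right to left, so the top slot is the first argument.
            args = stack[-n:][::-1] if n else []
            calls.append((target, args))
        elif op == "add" and operands[0] == "esp":
            slots = operands[1] // 4
            if slots:
                del stack[-slots:]   # caller cleanup pops the arguments
        # Other instructions (such as mov) leave the modeled stack unchanged.
    return calls, stack

CODE = [
    ("mov", "ecx", "[ebp+var1]"),
    ("push", "ecx"),
    ("mov", "ebx", "[ebp+var2]"),
    ("push", "ebx"),
    ("call", "_strcat"),
    ("add", "esp", 8),
]
print(recover_call_args(CODE))
```

On the slide's instruction sequence this recovers the call as _strcat( ebx, ecx ) and leaves the modeled stack empty after the cleanup.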

Problem 3: Unmodeled Functions
• String type information and stack model may be incorrect!
Assumption #2: "_cdecl" calling convention and well-behaved functions.
• Treat all function arguments and return values as strings.

Problem 4: Java vs. x86 Semantics
• Java strings are immutable, x86 strings are not.

Java:

    String y, x = "x";
    y = x;
    y = y + "123";
    System.out.println(x);

    => "x"

C:

    char *y;
    char x[10] = "x";
    y = x;
    y = strcat(y, "123");
    printf(x);

    => "x123"
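
The same contrast can be reproduced in a few lines. This sketch uses Python as a stand-in for both languages (an assumption made purely for illustration): an immutable `str` plays the role of the Java string, and a mutable `bytearray` plays the role of the C buffer. The function names are hypothetical.

```python
def java_like():
    """Immutable strings: concatenation builds a new object."""
    x = "x"
    y = x
    y = y + "123"       # y now names a new string; x is untouched
    return x

def c_like():
    """Mutable buffer: y aliases x, so an in-place append changes both."""
    x = bytearray(b"x")
    y = x               # y refers to the same buffer as x
    y.extend(b"123")    # in-place append, playing the role of strcat(y, "123")
    return x.decode("ascii")

print(java_like(), c_like())
```

Because the second case changes what x holds through y, an analysis of binaries has to track which registers may or must refer to the same buffer.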

Problem 4: Java vs. x86 Semantics
• Solution: May-Must alias analysis.

    0x200: _strcat( ebx, ecx )

[Figure: "May alias" relations over eax, ebx, esi, and location 200 for the call at 0x200.]

Problem 4: Java vs. x86 Semantics
• Solution: May-Must alias analysis.

    0x200: _strcat( ebx, ecx )

[Figure: "Must alias" relations over eax, ebx, esi, and location 200 for the call at 0x200.]

x86sa Architecture
[Figure: architecture diagram. Components: EXE, IDA Pro, Connector, WPDS++, JSA. Three sets of transformers (stack height, string type, alias analysis) are shown, one each for libc char* functions, UTF-8 functions, and Unicode functions.]

Intraprocedural Analysis Summary
1. Recover callsite arguments (stack-operation modeling).
2. Infer string types (backward type analysis).
3. Discover aliases (may-, must-alias forward analysis).
Generate the String Flow Graph for the Control Flow Graph.

Outline
• String analysis for Java.
• String analysis for x86.
• Evaluation.
• Applications & future work.

Example 1: simple

    char * s1 = "Argc has ";
    char * s2;
    char * s3 = " arguments";
    char * s4;
    switch( argc ) {
        case 1: s2 = "1"; break;
        case 2: s2 = "2"; break;
        default: s2 = "> 2"; break;
    }
    s4 = malloc( strlen(s1)+strlen(s2)+strlen(s3)+1 );
    s4[0] = 0;
    strcat( strcat( strcat( s4, s1 ), s2), s3 );
    printf( "%s\n", s4 );

Example 1: String Flow Graph
[Figure: string flow graph with a malloc node, an assign node, three concat nodes, and the literals "Argc has", "1", "2", "> 2", and " arguments".]
Our result:
    "Argc has 1 arguments"
    "Argc has 2 arguments"
    "Argc has > 2 arguments"
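
For an acyclic flow graph like this one, the set of possible strings can be enumerated directly. The sketch below is only an illustration: the tuple encoding, the name `strings_of`, and the modeling of the malloc'd buffer as the empty string (justified here by `s4[0] = 0`) are assumptions. The analysis described in the talk goes through a grammar and a finite automaton rather than enumerating strings.

```python
from itertools import product

def strings_of(node):
    """Enumerate the strings an acyclic flow-graph node can produce."""
    kind = node[0]
    if kind == "lit":        # ("lit", text): a constant string
        return {node[1]}
    if kind == "assign":     # ("assign", n1, n2, ...): any one of the inputs
        return set().union(*(strings_of(n) for n in node[1:]))
    if kind == "concat":     # ("concat", left, right): left followed by right
        lefts, rights = strings_of(node[1]), strings_of(node[2])
        return {a + b for a, b in product(lefts, rights)}
    raise ValueError(f"unknown node kind: {kind}")

# Example 1, with the freshly allocated buffer modeled as "" (an assumption).
s2 = ("assign", ("lit", "1"), ("lit", "2"), ("lit", "> 2"))
EXAMPLE_1 = ("concat",
             ("concat",
              ("concat", ("lit", ""), ("lit", "Argc has ")),
              s2),
             ("lit", " arguments"))

for s in sorted(strings_of(EXAMPLE_1)):
    print(s)
```

Enumeration stops working as soon as the graph has a cycle, which is where the grammar and automaton approximation becomes necessary.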

Example 2: cstar

    char * c = "c";
    char * s4 = malloc(101);
    s4[0] = 0;  /* start from the empty string, as in Example 1 */
    for( int i=0; i < 100 ; i++ ) {
        strcat( s4, c );
    }
    printf( "%s\n", s4 );

Example 2: String Flow Graph
[Figure: string flow graph with a malloc node, an assign node, a concat node, and the literal "c".]
Our result: c*
Correct answer: c^100 (the character c repeated 100 times)
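
The gap between the two answers can be checked with ordinary regular expressions. This is a small sketch using Python's `re` module; the names `approx`, `exact`, and `accepts` are chosen for illustration. It shows that c* is a safe over-approximation: it accepts the one string the program prints, along with strings the program never produces.

```python
import re

approx = re.compile(r"c*")      # the analysis result reported on the slide
exact = re.compile(r"c{100}")   # the precise answer: exactly 100 copies of c

def accepts(pattern, s):
    """True if the whole string s is in the language of pattern."""
    return pattern.fullmatch(s) is not None

actual_output = "c" * 100
print(accepts(approx, actual_output), accepts(exact, actual_output))
print(accepts(approx, "ccc"), accepts(exact, "ccc"))
```

The loss of precision comes from the loop: a finite automaton built without the loop bound cannot count the 100 iterations.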

Example 3: Lion Worm
• Code and String Flow Graph omitted.
• x86sa analysis results:
    "/sbin/ifconfig -a|/bin/mail [email protected]"

Future Work: Interprocedural Analysis
1. Inline everything and apply intra-procedural analysis.
2. “Hook” intraprocedural String Flow Graphs into a “Super String Flow Graph”.
3. Polyvariant analysis over function summaries for String Flow Graphs.

Future Work: Relax Assumptions
• Relax the assumptions:
  – Strings can be manipulated in many ways.
  – Calling conventions can vary in a program.
• Value Set Analysis looks promising:
  – Identifies "variables" based on usage patterns.

Future Work: More Applications
• Malicious code analysis
• Analysis of dynamic code generators:
  – Packed programs
  – Shell code generators

What to Take Away
• If you have type information, AND arguments to calls, AND no aliasing, then string analysis is straightforward.
• String analysis on binaries performs well and provides useful results.

String Analysis for Binaries
Mihai Christodorescu, Nicholas Kidd, Wen-Han Goh
University of Wisconsin, Madison
{ mihai, kidd, wen-han }@cs.wisc.edu

From Java to x86 Binaries
• Input: x86 binary file
• Java string analyzer input: Flowgraph
• Output: finite automaton
[Figure: a "?" marks the step from the x86 binary to the flowgraph.]

Problem 4: Java vs. x86 Semantics
• Solution: Alias analysis.

    0x100: mov eax, ecx
    λS. let A = aliases(ecx)
        (S − {(eax,*)}) ∪ ({eax} × A)

    0x200: _strcat( ebx, ecx )
    λS. let A = alias(ebx)
        let N = vars(A)
        ((S ∪ (N × {200})) − {(ebx,*), (eax,*)}) ∪ {(ebx,200), (eax,200)}
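
One reading of these two transformers can be sketched over a set S of (register, location) pairs. The set operators between the terms did not survive on the slide, so the union and cross-product placement below is an assumption, as are the helper names `aliases`, `mov`, and `strcat_call` and the interpretation of vars(A) as "registers that may point to a location in A".

```python
def aliases(S, reg):
    """Locations that reg may point to under S."""
    return {loc for (r, loc) in S if r == reg}

def mov(S, dst, src):
    """mov dst, src: drop dst's old pairs, then let dst point where src points."""
    A = aliases(S, src)
    kept = {(r, loc) for (r, loc) in S if r != dst}
    return kept | {(dst, loc) for loc in A}

def strcat_call(S, site, dest_reg, ret_reg="eax"):
    """Call to strcat at address `site` with destination buffer in dest_reg."""
    A = aliases(S, dest_reg)
    N = {r for (r, loc) in S if loc in A}        # assumed meaning of vars(A)
    widened = S | {(r, site) for r in N}         # those registers may now see the new string
    kept = {(r, loc) for (r, loc) in widened if r not in (dest_reg, ret_reg)}
    return kept | {(dest_reg, site), (ret_reg, site)}

# Hypothetical starting state: ebx and esi point to buffer "x", ecx to a literal.
S0 = {("ebx", "x"), ("esi", "x"), ("ecx", "lit")}
print(sorted(strcat_call(S0, 200, "ebx"), key=str))
print(sorted(mov(S0, "eax", "ecx"), key=str))
```

In this reading, esi ends up possibly pointing to either the old buffer or the new string at 200, which is the kind of uncertainty the separate may and must relations are meant to narrow.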

Example 1: x86sa Analysis Results
1. "Argc has 1 arguments"
2. "Argc has 2 arguments"
3. "Argc has > 2 arguments"

cstar's x86sa analysis results
• Infinitely many strings with common prefix: "c"

Transformers – String Inference
• 0x200: _strcat( ebx, ecx )
• String inference:

    λd. (d − {eax}) ∪ {ebx, ecx}

Transformers – Alias Analysis
• 0x200: _strcat( ebx, ecx )
• ebx must:

    let T = alias_must(ebx)
    let V = alias_may(ebx)
    λd. ( must(d) − {(ebx,*), (eax,*)} ∪ {(ebx,200), (eax,200)} − ⋃_{a∈T} {(a,*)} ∪ ⋃_{a∈T} {(a,200)},
          may(d) − {(ebx,*), (eax,*)} ∪ ⋃_{a∈V} {(a,200)} )

Transformers – Alias Analysis
• 0x200: _strcat( ebx, ecx )
• ebx may:

    let T = alias_must(ebx)
    let V = alias_may(ebx)
    λd. ( must(d) − {(eax,*)} − ⋃_{a∈T} {(a,*)} ∪ {(ebx,200), (eax,200)},
          may(d) − {(ebx,*), (eax,*)} ∪ ⋃_{a∈T} {(a,200), (a,old(a))} ∪ ⋃_{a∈V} {(a,200)} )