Upload
vandieu
View
215
Download
2
Embed Size (px)
Citation preview
2
whoami
o Security Research at ERNW Research GmbH from Heidelberg, Germany
o Organizer of the Wizards of Dos CTF team from Darmstadt, Germany
o Reach me via:o Twitter: @0x464D
o Email: [email protected]
3
Who we are
o Germany-based ERNW GmbHo Independent
o Deep technical knowledge
o Structured (assessment) approach
o Business reasonable recommendations
o We understand corporate
o Blog: www.insinuator.net
o Conference: www.troopers.de
44
Agenda
I. General Intro to Program Analysis
II. Concepts and Analysis Techniques
III. Tools and Frameworks
IV. Applications and Examples
V. Takeaway/Recap
5
What is Automated Binary Analysis?
6
Binary Analysis performed by algorithms
What is Automated Binary Analysis?
7
Wait, isn’t that impossible?
o It’s impossible to generally tell if a program halts for a given input (Halting Problem)
o Also Rice’s Theorem
o Also, what exactly are we even looking for?
o Crashes?
o Memory Corruptions?
o Logic Errors?
8
Bit of History
o The ideas themselves are 40 years oldo Robert S. Boyer and Bernard Elspas and Karl N.
Levitt, SELECT--a formal system for testing and debugging programs by symbolic execution, 1975
o Analysis is resource intensiveo Cray-1 supercomputer from 1975 had 80MFLOPS
(8MB of RAM)
o iPhone 5s from 2013 produces about 76.8 GFLOPS (1GB of RAM)
o Advances in SMT solving around 2005 made it feasible
9
DARPA CGC
o Task: Develop a “Cyber Reasoning System”
o Big push in moving the ideas from academia to practicability
o Qualification Prize: $750,000
o Final Prizes:
1. $2,000,000
2. $1,000,000
3. $750,000
10
DARPA CGC
o CRS needs to:o Find vulnerabilitieso Patch themo Partially exploit them
o Ran on 64 Nodes, each:o 2x Intel Xeon Processor E5-2670 v2 (25M cache,
2.50GHz) (20 physical cores per machine)o 256GB Memory
o 7 finalists, winner competed at DEFCON CTFo It was better than some of the human teams
some of the time
1111
Agenda
I. General Intro to Program Analysis
II. Concepts and Analysis Techniques
III. Tools and Frameworks
IV. Applications and Examples
V. Takeaway/Recap
12
Intermediate Representations
o Every architecture is differento What do we actually analyze?
o Abstract/Common Representation
o Typical case of too may standardso VEX IR (used by Valgrind and angr)o Binary Ninja IRo LLVM IRo So many more
o Different IRs for different purposeso Easy to read (as a human)o Easy to analyzeo Easy to optimize
13
CFG Recovery
o Recursively build a graph with jumps as edges and basic blocks as nodes
o Easy with calls and direct jumps
o But what about “jmp eax”?
o Jump table
o Callbacks/Higher Order Functions
o Functions of Objects in OOP
o “Graph-based vulnerability discovery”
14
Data-Flow Analysis/Taint Analysis
o Track where data ends up
o Data dependencies
o Discover functions that handle user input
o Discover possible data exfiltration
15
Slicing
o Reducing the program statements to those dealing/changing with a specific variable from a specific point
o Backward: All statements before the point
o Forward: All statements after
o Static or dynamic
o Static looks at all statements
o Dynamic looks at statements in the execution trace
16
Slicing Exampleint i;
int sum = 0;
int product = 1;
for(i = 1; i < N; ++i) {
sum = sum + i;
product = product * i;
}
write(sum); //Slicing criterion
write(product);
int i;
int sum = 0;
for(i = 1; i < N; ++i) {
sum = sum + i;
}
write(sum); //May or may not
be included
17
Symbolic Execution
o Symbolic instead of concrete variables
o Following example is from a presentation about angr
18
19
20
21
22
23
SMT/Constraint Solving
o Abstract your problem, model it, solve it
o Can be as simple as in previous slide
o Can be significantly more complicated
o Dedicated tools available
o Complicated theory involved
2424
Agenda
I. General Intro to Program Analysis
II. Concepts and Analysis Techniques
III. Tools and Frameworks
IV. Applications and Examples
V. Takeaway/Recap
25
Z3 Theorem Prover
o Developed by Microsoft Researcho Effort by Microsoft to formally verify some parts of
their products with ito Windows Kernelo Hyper-V
o MIT Licenseo Provides a SMT Solver and theories for
o Bitvectorso Stringso Arrayso etc
o Seems to be the standard for program analysis
26
angr
o Most beginner friendly of all tools
o Written in Python
o Good Documentation
o Plenty of available research
o Used by Mechaphish ( 3rd at Darpa’s CGC)
o Developed at University of California, Santa Barbara
27
Triton
o x86 and x86_64 only
o Designed as a library (LibTriton.so)
o Should be easier to integrate into C Projects
o Has python bindings
o Not focused on automating but assisting
o Sponsored by Quarkslab
28
Others
o Bitblaze
o bap: Binary Analysis Platformo Carnegie Mellon University/ForAllSecure
o Written in OCaml
o Used by Mayhem (1st Place at Darpa’s CGC)
o Miasm
o S2E(2)
o Microsoft Sage
o Manticore by Trail of Bits
2929
Agenda
I. General Intro to Program Analysis
II. Concepts and Analysis Techniques
III. Tools and Frameworks
IV. Applications and Examples
V. Takeaway/Recap
30
What to do with all this?
o These techniques don’t scale
o State Explosion
o Constraint solving is generally NP-Complete
o Combine with something smart but slow: a human
o Combine with something dumb but fast: a fuzzer
Source:
https://twitter.com/matalaz/status/580600098
092105728
31
Augmented Fuzzing
o Usual fuzzing to detect likely code paths
o Taint Analysis to discover what branch depends on what input to fuzz that
o Symbolic Execution with constraint solver to build input to take a specific new branch
32
Installing Angr
o Libraries modified by angr don’t mix well with their originals
o stripped down z3
o forked libvex
o etc.
o Run it isolated and officially supported
o Python2 virtualenv (pip install angr)
o Official Docker Container (docker pull angr/angr)
33
void xmalloc(unsigned_int sz)
if(sz > PAGE_SIZE) {
s = ((sz+PAGE_SIZE+1) & ~(PAGE_SIZE-1));
}
else {
s = sz;
}
return malloc(s)
o sz is the parameter supplied to xmalloc()
o s is the parameter for malloc()
o PAGE_SIZE is 0x1000 or 4096
o Assumption: Allocating a buffer of certain size via xmalloc() and getting a smaller buffer from malloc() is bad
SMT Solving Example
34
void xmalloc(unsigned_int sz)
if(sz > PAGE_SIZE) {
s = ((sz+PAGE_SIZE+1) & ~(PAGE_SIZE-1));
}
else {
s = sz;
}
return malloc(s)
1. Get the function into Python
2. Formulate the problem
3. Let the solver do its magic
How to solve this?
35
o Foreign Function Interface
o See next Slide
o Handwrite it in python
o If it’s just some simple logic it’s often copy-paste able from source or decompiler
o Just remember that your variables are bit vectors
Get the function into Python
36
o Automagically import binary functions
o angr detects calling convention
o Maps python types to binary representation
o Call them from python
o With concrete values
o With symbolic value
Foreign Function Interface
>>> import angr
>>>b=angr.Project(‘/path/binary’)
>>>f = b.factory.callable(address)
>>>f?
Type: Callable
[…]
Callable is a representation of a function in the binary that can be
interacted with like a native python function.
[…]
37
DEMO: Finding Bugs with Math
38
DEMO: Finding Bugs with Math
def f(x):
return (x+PAGE_SIZE+1) & ~(PAGE_SIZE-1)
solver = claripy.Solver()
sz = claripy.BVS('sz', WORD_SIZE)
constraints = [ sz > PAGE_SIZE,
f(sz) < sz,
f(sz) != 0 ]
solver.add(constraints)
return solver.eval(sz, 3)
39
o Get an input so that a function returns a certain value
o Function can be from Python or from binary(see FFI)
o f(x, y) -> (x*3 >> 1) * y
o f(?,0x42) - > 0x76E24
Inversing Functions
40
o Get an input so that a function returns a certain value
o Function can be from Python or from binary(see FFI)
o f(x, y) -> (x*3 >> 1) * y
o f(?,0x42) - > 0x76E24
Inversing Functions
DEMO
41
o Get an input so that a function returns a certain value
o Function can be from Python or from binary(see FFI)
o f(x, y) -> (x*3 >> 1) * y
o f(?,0x42) - > 0x76E24
o We got two possible solutions
o 0x1337 (intended)
o 0x555555555555688c which returns 0x76e24
-> Integer Overflow
Inversing Functions
42
o Binary that takes some user input
o stdin
o argv
o Some file
o Checks it against constraints
o Determines if it’s valid
Automagical Solving of Crackmes
43
o Binary that takes some user input
o stdin
o argv
o Some file
o Checks it against constraints
o Determines if it’s valid
Automagical Solving of Crackmes
DEMO
44
o Binary that takes some user input
o stdin
o argv
o Some file
o Checks it against constraints
o Determines if it’s valid
o We just declare that input as symbolic
o Choose a starting point and explore the possible paths from there
o Solve for an input that brings us down the wanted path that’s the solution
Automagical Solving of Crackmes
45
o Breakpoints with callbacks before or after:o Instructions or address
o Memory read/write
o Register read/write
o Many others
o Hookso Optimized libc functions
o Own Python Code
>>> import angr, simuvex
>>>b=angr.Project(‘/path/binary’)
>>> s = b.factory.entry_state()
>>> def debug_func(state):
... print 'Read', state.inspect.mem_read_expr, 'from', state.inspect.mem_read_address
>>> s.inspect.b('mem_read', when=simuvex.BP_AFTER, action=debug_func)
Debugging capabilities
46
Anti-Anti-Debugging
o angr is not a debugger
o Some anti debug tricks wont work
o Others accidentally break angr in other ways
o Simuvex or Unicorn can be used as an emulator
o Breakpoints without the program noticing
o Invisible Hooks
o Overall it needs a different approach
4747
Agenda
I. General Intro to Program Analysis
II. Concepts and Analysis Techniques
III. Tools and Frameworks
IV. Applications and Examples
V. Takeaway/Recap
48
Takeaway
o Reverse Engineering is already regarded as some arcane art
o Adding tons of complicated math seems to make it true black magic
o BUT it actually makes many things easier if you know which parts to treat as magic and which to understand
o You don’t need to understand how Z3 works
49
Interested in the theory and want
to learn more?
o Extensive List of Materials http://www.msreverseengineering.com/program-analysis-reading-list/ for self learning
o If possible check your uni for lectures on the topics in the above list
o Some aspects are part of the “Computer Science” curriculum
o Other aspects are typical math topics
50
Want to contribute without getting
too deep into the theory?
o Plenty of work that would be cool to have but no one is doing yeto Proper UIs to visualize the concepts (angr-
management needs more work)
o https://github.com/angr/angr-doc/blob/master/HELPWANTED.md
o Documentation could be bettero Triton lacks integrated python doc like angr
o Building tools can be annoyingo Package it for your favorite distro
51
www.ernw.de
www.insinuator.net
Thank you for your attention
Any Questions?
Florian Magin @0x464D
52
References & Literature
o “SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis”
o https://docs.angr.io/
53
Side Note: LLVM Compiler
Infrastructure
o Own IR (LLVM IR)
o Own symbolic execution engine (KLEE)
o Own constraint solver (Kleaver)
54
Value-Set Analysis
o Approximate program states
o Values in memory or registers
o Reconstruct buffers
o Can be enough to detect buffer overflows
o Helps with complex CFG recovery