Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References
Methods of binary analysis and Valgrind tool
MA-INF 3318 - Seminar Verification of Complex Systems
Raghunandan Palakodety
Supervised by: Dr. Michael Gerz
Universität Bonn
12-01-2015
Outline
1 Program analysis
2 Sample (Bad) Code
3 Drawbacks
4 Advanced Valgrind usage
   Other Tools in Valgrind
   Fine Tuning and Client Requests
5 How does Valgrind do that?
   The Core - All Valgrind Tools have in common
   Memcheck - Behind the Scenes
6 Q&A
7 Thanks
8 References
Introduction
What is Program analysis?
Program analysis is the process of automatically deriving properties about the behavior of computer programs.
Dynamic program analysis: Analysis is performed by executing the program on chosen inputs. Traces of the actual executions are collected and processed. Properties of the program's behavior are deduced from the analysis of these concrete executions. [2]
Static program analysis: Analysis is performed without actually executing the program. An abstract model of the program is built and symbolically executed. Properties of the program's behavior are deduced from the analysis of these symbolic executions. [2]
Sample (Bad) Code
1 We will use the following code for demonstration purposes:

    #include <stdio.h>
    #include <stdlib.h>
    #define SIZE 100
    int main() {
        int i, sum = 0;
        int *a = malloc(SIZE);                    /* SIZE bytes, not SIZE ints */
        for (i = 0; i < SIZE; ++i) sum += a[i];
        a[101] = 1;
        a = NULL;                                 /* the block is never freed */
        if (sum > 0) printf("Hi!\n");
        return 0;
    }

2 Contains many bugs. Compiles without warnings or errors.
Bugs
1 Memory was allocated but not initialized, as in line 6 of the code.
2 Use of uninitialized values, as shown in line 7 of the code.
3 Accesses past the end of the allocated array: malloc(SIZE) reserves only 100 bytes, so the loop in line 7 reads beyond the block and line 8 writes beyond it.
4 Memory not de-allocated.
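For comparison, here is a minimal sketch of how the four bugs could be avoided. The helper name sum_fixed and its size parameter are assumptions made for illustration; they do not appear in the slides.

```c
#include <assert.h>
#include <stdlib.h>

#define SIZE 100

/* Hypothetical helper (not from the slides): allocates n ints, sets the
 * last one to 1, sums all of them and frees the block.
 * Returns the sum, or -1 if the allocation fails. */
int sum_fixed(size_t n) {
    assert(n > 0);
    int sum = 0;

    /* calloc reserves n ints (not n bytes) and zero-fills them, so every
     * element is both addressable and defined. */
    int *a = calloc(n, sizeof *a);
    if (a == NULL) return -1;

    a[n - 1] = 1;                          /* last valid index is n - 1 */
    for (size_t i = 0; i < n; ++i) sum += a[i];

    free(a);                               /* release the block before losing the pointer */
    return sum;
}
```

Calling sum_fixed(SIZE) touches only memory inside the block, reads only initialized values and leaves nothing allocated, so Memcheck is expected to stay silent for it.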
Usage of Valgrind tool
1 Run the above program with Valgrind

    valgrind --leak-check=full ./sample

2 Valgrind uses the default tool Memcheck. The error summary looks as shown below.
Example

    ==31152== Invalid read of size 4
    ==31152==    at 0x804844A: main (bad_code.c:7)
    ==31152==  Address 0x41ff08c is 0 bytes after a block of size 100 alloc'd
    ==31152==    at 0x402C5A9: malloc (vg_replace_malloc.c:296)
    ==31152==    by 0x8048430: main (bad_code.c:6)
Invalid Reads
Example

    ==31152== Invalid read of size 4
    ==31152==    at 0x804844A: main (bad_code.c:7)
    ==31152==  Address 0x41ff08c is 0 bytes after a block of size 100 alloc'd
    ==31152==    at 0x402C5A9: malloc (vg_replace_malloc.c:296)
    ==31152==    by 0x8048430: main (bad_code.c:6)

We read past the end of the allocated array.
This is an attempt to read from an area that we are not allowed to access.
It could result in a SEGFAULT.
Valgrind provides enough details to find the problem.
Invalid Writes
Example

    ==31152== Invalid write of size 4
    ==31152==    at 0x8048463: main (bad_code.c:8)
    ==31152==  Address 0x41ff090 is 4 bytes after a block of size 100 alloc'd
    ==31152==    at 0x402C5A9: malloc (vg_replace_malloc.c:296)
    ==31152==    by 0x8048430: main (bad_code.c:6)

Similar to invalid read
Details provided by Valgrind:
– Location of fault (addresses, line number if debug information is present).
– Stack trace to the fault (you can get more using --num-callers=30).
– Relevant block details and allocation/de-allocation stack trace.
Memory Leaks
At the end of the run, Valgrind performs a garbage-collector-style scan for heap blocks that are no longer referenced.
Unreferenced memory in C/C++ ⇒ memory leak.
Example

    ==31152== 100 bytes in 1 blocks are definitely lost in loss record 1 of 1
    ==31152==    at 0x402C5A9: malloc (vg_replace_malloc.c:296)
    ==31152==    by 0x8048430: main (bad_code.c:6)

Valgrind provides a stack trace for the allocation point.
3 kinds:
– Definitely lost (no pointers to allocation).
– Possibly lost (pointers only to the middle of the allocated block).
– Still reachable (block hasn't been free'd before exit, but pointers to it still exist).
Use of Uninitialized Value
Example

    ==31152== Conditional jump or move depends on uninitialised value(s)
    ==31152==    at 0x8048476: main (bad_code.c:10)

Valgrind checks that the program's control flow does not depend on undefined data.
The use of uninitialized values in conditions is reported.
So is passing them as parameters to syscalls.
Valgrind reports the location at which the uninitialized data was used.
To trace the data back to its source, add "--track-origins=yes" to the command line.
Reliability of Valgrind
1 Memcheck was used on an application that used the OpenSSL library.
2 The tool complained about the library using uninitialized memory in two locations in the file crypto/rand/md_rand.c.
3 Most of the time Valgrind's errors describe latent flaws in the program.
4 However, sometimes Valgrind is wrong.
5 Cryptography-related code requires special attention.
6 A Debian developer commented out code that Valgrind didn't like.
7 This resulted in a latent bug.
8 With massive security implications.
9 All because Valgrind claimed that a value in use was uninitialized.
10 It was intentionally used that way, to collect more entropy.
Problems Valgrind cannot address I
1 Buffer overflows that land in other valid (addressable) memory.
2 Accesses to stack and global variables are not bounds-checked.
3 Business logic/algorithmic problems are not detected.
Problems Valgrind cannot address II
Example

    char *p = malloc(1024);   /* block 1 */
    char *q = malloc(1024);   /* block 2 */
    p += 1200;                /* "p" now points into block 2 */
    *p = 'a';                 /* invalid write - but goes undetected */

    #include <string.h>
    #define BUFSIZE 5
    int main(int argc, char **argv) {
        char buf[BUFSIZE];
        strcpy(buf, argv[1]); // no bounds checking,
                              // although a secure version strcpy_s() exists.
        return 0;
    }

4 Checks only code that is executed.
5 Does not work well with statically-linked executables.
Valgrind on statically linked executable
Example

    ==22660== Conditional jump or move depends on uninitialised value(s)
    ==22660==    at 0x804B413: _IO_cleanup (in /home/raghu/Desktop/WS14/Seminar/Presentation/Code/driver)
    ==22660==    by 0x80497D8: __run_exit_handlers (in /home/raghu/Desktop/WS14/Seminar/Presentation/Code/driver)
    ==22660==    by 0x804981E: exit (in /home/raghu/Desktop/WS14/Seminar/Presentation/Code/driver)
    ==22660==    by 0x804911D: (below main) (in /home/raghu/Desktop/WS14/Seminar/Presentation/Code/driver)

1 Valgrind was run on a statically linked executable.
2 It produces too many error reports, such as the one shown above.
Why? Does Valgrind just not work with -static?
1 It does.
2 The problem is not in Valgrind, it is in glibc, which is not Valgrind-clean.
3 When you link statically, these errors come from the executable (which includes code from libc.a).
4 But when linked dynamically, these errors come from the shared object libc.so.6.
5 By default Valgrind suppresses the errors that come from libc.so.6.
6 In the case of a statically linked executable, as a workaround, we can write suppression files.
Other Tools in Valgrind
We talked about Memcheck – the default tool in Valgrind.
But Valgrind can do much more than that.
Contains many tools, some stable, some experimental.
You can even write your own tools in a few days of work.
Valgrind ships with the following tools:
Memory Error Checkers
– SGCheck
– Memcheck
Profilers
– Cachegrind
– Callgrind
– Massif
Thread Error Detectors
– Helgrind
– DRD
Sample Tools and others
– Lackey
– None
– BBV
Experimental tool SGCheck
Activate by adding --tool=exp-sgcheck.
Similar in goals to Memcheck.
Still experimental.
Uses a very different approach from Memcheck.
Can detect failures Memcheck doesn't detect (like the code we saw).
Has false positives (different from the ones Memcheck gets).
Slower than Memcheck.
Doesn't check for memory leaks or for the validity of accesses the way Memcheck does.
Use alongside Memcheck for better coverage.
SGCheck illustration

    int main() {
        int i, a[10];
        for (i = 0; i <= 10; i++)
            a[i] = 42;
    }

1 Principle: The key observation is that if a memory referencing instruction accesses inside a stack or global array once, then it is highly likely to always access that same array.
2 At run time we will know the precise address of a[] on the stack, and so we can observe that the first store resulting from a[i] = 42 writes a[], and we will (correctly) assume that that instruction is intended always to access a[].
3 Then, on the 11th iteration, it accesses somewhere else, possibly a different local, possibly an un-accounted for area of the stack (e.g. a spill slot), so SGCheck reports an error.
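The principle can be modelled in a few lines of C. This is only a toy sketch of the heuristic described above, not SGCheck's implementation: the types ArrayRegion and InstrState, the function names and the address 0x1000 are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an array known from debug information. */
typedef struct {
    uintptr_t start;   /* address of the first byte */
    size_t    size;    /* size in bytes */
} ArrayRegion;

/* Hypothetical per-instruction record: the array this instruction
 * touched on its first execution (NULL until then). */
typedef struct {
    const ArrayRegion *bound;
} InstrState;

static int inside(const ArrayRegion *r, uintptr_t addr, size_t len) {
    return addr >= r->start && addr + len <= r->start + r->size;
}

/* Returns 1 if the access is consistent with the earlier accesses made by
 * the same instruction, 0 if it should be reported. */
int check_access(InstrState *ins, const ArrayRegion *arrays, size_t n_arrays,
                 uintptr_t addr, size_t len) {
    assert(ins != NULL && len > 0);
    if (ins->bound == NULL) {
        /* First execution: remember which array (if any) was hit. */
        for (size_t k = 0; k < n_arrays; ++k) {
            if (inside(&arrays[k], addr, len)) {
                ins->bound = &arrays[k];
                break;
            }
        }
        return 1;
    }
    /* Later executions must stay inside the array seen the first time. */
    return inside(ins->bound, addr, len);
}

/* Replays the stores of "for (i = 0; i <= 10; i++) a[i] = 42;" against the
 * model and returns the first index that gets reported, or -1 if none. */
int first_reported_index(void) {
    const uintptr_t base = 0x1000;                 /* made-up address of a[] */
    const ArrayRegion arrays[] = { { base, 10 * sizeof(int) } };
    InstrState store = { NULL };
    for (int i = 0; i <= 10; ++i) {
        uintptr_t addr = base + (uintptr_t)i * sizeof(int);
        if (!check_access(&store, arrays, 1, addr, sizeof(int)))
            return i;
    }
    return -1;
}
```

In this model the store instruction is bound to a[] on its first execution, and the access with i == 10 (the 11th iteration) is the first one to fall outside the bound array.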
Profiling Tools
Cachegrind
– Traces the code's memory accesses and jump patterns.
– Simulates a 2-level cache and branch predictor.
– Provides details about cache misses and their source.
– Can help optimize performance-critical code.
Callgrind
– Extends Cachegrind.
– Propagates the costs along the call tree.
– Has a KDE front-end: KCacheGrind.
Massif
– Memory allocation profiler.
– Keeps a stack trace of every memory allocation/deallocation.
– Prints the memory usage at peak times and at specific intervals.
Thread Error Detectors
Functionality similar to Intel’s ThreadChecker.
Detects a variety of threading-related problems:
– Threading API misuse.
– Lock order problems (potential deadlocks).
– Data races.
Two similar implementations in Valgrind
Helgrind
– Discussions on most forums claim that this tool produces many false positives.
– Supposedly catches more errors too.
DRD
Client requests
Valgrind provides a communication channel for tested programs.
Good for unit-test harnesses.
Also good if you are doing weird stuff in your code
A special memory allocator, such as an object pool
– Use VALGRIND_CREATE_MEMPOOL to mark a memory area as an object pool
– Use VALGRIND_MEMPOOL_ALLOC to mark an object as allocated
– Use VALGRIND_MEMPOOL_FREE to mark an object as free
Self-modifying code, e.g. a JIT compiler
– Use VALGRIND_DISCARD_TRANSLATIONS to report areas in which code has been changed
Useful also for hunting bugs
– For example, check for memory leaks using VALGRIND_DO_LEAK_CHECK
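A minimal sketch of a client request in use, assuming the Valgrind development headers may or may not be installed. The helper name leak_then_check is hypothetical; the no-op fallback macros are only there so that the sketch builds without the headers.

```c
#include <assert.h>
#include <stdlib.h>

/* Use the real client-request macros when the Valgrind headers are
 * available; otherwise fall back to no-ops (an assumption of this sketch). */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_VALGRIND_HEADERS 1
#  endif
#endif
#ifndef HAVE_VALGRIND_HEADERS
#  define RUNNING_ON_VALGRIND 0
#  define VALGRIND_DO_LEAK_CHECK ((void)0)
#endif

/* Hypothetical helper: leaks n bytes on purpose, then asks Memcheck for a
 * leak report right away instead of waiting for program exit.
 * Returns 1 when running under Valgrind, 0 otherwise, -1 on malloc failure. */
int leak_then_check(size_t n) {
    assert(n > 0);
    void *p = malloc(n);
    if (p == NULL) return -1;
    p = NULL;                      /* drop the only pointer to the block */
    (void)p;
    VALGRIND_DO_LEAK_CHECK;        /* a no-op outside Valgrind */
    return RUNNING_ON_VALGRIND ? 1 : 0;
}
```

Outside Valgrind the request costs almost nothing, which is why such calls can stay in a unit-test harness permanently.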
Suppression Files
Valgrind's error summary can contain too many details.
Most of the time it indicates bugs that should be fixed.
But not always the ones we want to fix right now.
Sometimes it is correct code, which Valgrind failed to understand (false positives).
– Mostly in sophisticated/extremely optimized library code.
– Also possible with unusual interactions with kernel code.
Valgrind includes a mechanism to silence a specific error.
– Works with all tools that report errors.
– Simple file format, see documentation for details.
– Valgrind includes suppressions for many common libs.
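As an illustration of the file format, a suppression for the report seen earlier on the statically linked executable might look roughly like the entry below. The entry name and the file name static-glibc.supp are made up for this sketch; the function names are taken from that stack trace.

```text
{
   static_glibc_IO_cleanup
   Memcheck:Cond
   fun:_IO_cleanup
   fun:__run_exit_handlers
   fun:exit
}
```

The file is passed with `valgrind --suppressions=static-glibc.supp ./driver`. Running with `--gen-suppressions=all` makes Valgrind print a ready-made entry for each error it reports, which is usually easier than writing entries by hand.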
VEX - Binary Translation
We want to inspect all memory accesses in the code.
The straightforward solution is CPU emulation.
But this is really slow.
Instead, treat the program binary as “source code” and recompile it.
Allow the tools to modify the code being compiled.
VEX's front-end translates x86 opcodes into Intermediate Representation (IR) code.
VEX's back-end translates IR code back to x86 code.
Tools can manipulate the IR code in the middle.
IR code
VEX translates each guest opcode into a block of IR code.
IR code is similar to assembly for a RISC-style machine.
IR code assumes a machine with an unlimited number of temporaries.
The “guest state” is stored in a special memory area.
The following example is taken from libvex_ir.h.
IR translation of “addl %eax, %ebx”

    ------ IMark(0x24F275, 7) ------
    t3 = GET:I32(0)      # get %eax, a 32-bit integer
    t2 = GET:I32(12)     # get %ebx, a 32-bit integer
    t1 = Add32(t3,t2)    # addl
    PUT(0) = t1          # put %eax
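To make the role of the guest state concrete, the four IR statements can be executed by hand in C. This is a toy sketch and not VEX code: the GuestState type, its 64-byte size and the helper names are assumptions; only the offsets 0 and 12 come from the listing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy guest state: a flat byte array in which the guest registers live. */
typedef struct { uint8_t bytes[64]; } GuestState;

/* GET:I32(off): read a 32-bit value from the guest state. */
static uint32_t get_i32(const GuestState *gs, size_t off) {
    uint32_t v;
    assert(off + sizeof v <= sizeof gs->bytes);
    memcpy(&v, gs->bytes + off, sizeof v);
    return v;
}

/* PUT(off) = v: write a 32-bit value into the guest state. */
static void put_i32(GuestState *gs, size_t off, uint32_t v) {
    assert(off + sizeof v <= sizeof gs->bytes);
    memcpy(gs->bytes + off, &v, sizeof v);
}

/* Executes the IR block statement by statement.
 * Every temporary is assigned exactly once. */
void run_add_block(GuestState *gs) {
    const uint32_t t3 = get_i32(gs, 0);    /* t3 = GET:I32(0)    */
    const uint32_t t2 = get_i32(gs, 12);   /* t2 = GET:I32(12)   */
    const uint32_t t1 = t3 + t2;           /* t1 = Add32(t3,t2)  */
    put_i32(gs, 0, t1);                    /* PUT(0) = t1        */
}
```

A tool that instruments the block would insert its own statements between these four, for example a check before each GET or PUT.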
IR code - Properties
IR code is fully typed:
– RISC-like assembly language with an arbitrary number of temporary registers. [1]
– VEX performs sanity checks on these types all of the time. [1]
IR code is in Static Single Assignment (SSA) form.
– Each variable is assigned only once.
– Simplifies the instrumentation of the code.
– Also simplifies the optimization of the code when running it.
IR code is presented to the tools in semi-parsed form.
– Easy to manipulate lists of instructions.
– Instructions are presented in a convenient data structure.
– Useful functions for manipulating IR code (add/remove instructions, etc.)
Instrumentation technique: Disassemble and Resynthesize
Example
Figure 1: Object code → Code cache
1 IR blocks.
2 Each of the IR blocks is generated by the VEX system.
3 Finally object code is re-generated and stored in the code cache.
JIT, VEX and IR
The x86 → IR → x86 translation is done in a Just-In-Time manner.
Every basic block is translated upon its first execution.
Definition
Basic block: a linear sequence of code with one entry point and one exit point.
The translation is done so that Valgrind's dispatcher regains control after each basic block. (There are exceptions.)
Caching of translated code blocks improves execution speed.
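The translate-once, reuse-afterwards behaviour can be sketched as a small direct-mapped cache keyed by the guest address of a basic block. This is an illustrative model only: the names, the cache size and the stand-in translate_block function are assumptions, not Valgrind's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 256   /* assumed size; a power of two so masking works */

typedef struct {
    uintptr_t guest_addr;    /* start address of the guest basic block */
    int       valid;
    int       translation;   /* stand-in for a pointer to generated code */
} CacheEntry;

typedef struct {
    CacheEntry slots[CACHE_SLOTS];
    unsigned   translations_done;   /* how often the slow path ran */
} CodeCache;

/* Stand-in for the expensive x86 -> IR -> x86 step. */
static int translate_block(uintptr_t guest_addr) {
    return (int)(guest_addr & 0x7fffffff);
}

/* Dispatcher lookup: reuse a cached translation, or translate on a miss. */
int lookup_or_translate(CodeCache *cc, uintptr_t guest_addr) {
    assert(cc != NULL);
    CacheEntry *e = &cc->slots[guest_addr & (CACHE_SLOTS - 1)];
    if (!e->valid || e->guest_addr != guest_addr) {
        e->guest_addr  = guest_addr;
        e->translation = translate_block(guest_addr);
        e->valid       = 1;
        cc->translations_done++;
    }
    return e->translation;
}
```

Executing the same block twice goes through translate_block only once; the second lookup is a cache hit, which is where the speed-up comes from.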
Bootstrap Code
Small, shared launcher - launches the relevant tool
Each tool’s binary contains a complete copy of the core code
The interesting stuff in bootstrap:
– Read the debug information for the target program or client.
– Initialize VEX, Valgrind's binary-translation mechanism.
– Call tool-specific initialization code.
– Load the target program.
– Set up the environment for the target program run.
– Initialize Valgrind's thread scheduler.
   The scheduler makes sure that only one thread runs at a time.
   The scheduler also handles signals.
– Kick-start the “client” application's main thread, using VEX and Valgrind's dispatcher.
High-level view of Valgrind’s activity
Example
Figure 2: Valgrind’s scheduler and dispatcher
1 Auxiliary mapped cache updated for every access of a translation block.
2 Figure shows Valgrind’s scheduler and dispatcher.
Function Call Redirection
Valgrind implements a redirection mechanism.
This mechanism can “hijack” or capture various function calls.
Done on the binary translation level.
Some examples where this ability is useful:
– Memory allocation functions.
– Various syscalls (write/read files, etc.)
– Loading dynamic-link libraries.
– Replacing some optimized functions with debug versions.
The guest code can request that as well.
– Redirect requests are indicated by a specially mangled function name.
– Special “magic sequence” to call the original function.
– Nice C macros make it developer-friendly.
– Please see the documentation for details.
Memcheck Basics, A-bits
Memcheck tracks (shadows) what memory the client application may access.
Two levels of allowed access:
– Write to / read from the memory.
– Use the value in memory for anything serious.
The first one is tracked with “A” bits (Access/Addressability).
– One bit per byte of memory.
– Set to 0 if the client is not allowed to access it (e.g. a free'd block).
– Set to 1 upon memory allocation.
– Report an error if client code touches memory with A == 0.
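The A-bit bookkeeping can be sketched as a bitmap over a toy address space. This is a simplified model, not Memcheck's actual shadow-memory layout; the 1024-byte arena and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_BYTES 1024   /* assumed size of the toy address space */

/* One A bit per arena byte, packed eight to a shadow byte. */
typedef struct { uint8_t abits[ARENA_BYTES / 8]; } ShadowA;

static void set_range(ShadowA *s, size_t off, size_t len, int accessible) {
    assert(off <= ARENA_BYTES && len <= ARENA_BYTES - off);
    for (size_t i = off; i < off + len; ++i) {
        if (accessible) s->abits[i / 8] |= (uint8_t)(1u << (i % 8));
        else            s->abits[i / 8] &= (uint8_t)~(1u << (i % 8));
    }
}

/* A = 1 for a freshly allocated block, A = 0 again once it is freed. */
void on_alloc(ShadowA *s, size_t off, size_t len) { set_range(s, off, len, 1); }
void on_free(ShadowA *s, size_t off, size_t len)  { set_range(s, off, len, 0); }

/* Returns 1 if every byte of the access has A == 1, else 0 (an error). */
int access_ok(const ShadowA *s, size_t off, size_t len) {
    if (off > ARENA_BYTES || len > ARENA_BYTES - off) return 0;
    for (size_t i = off; i < off + len; ++i)
        if (!(s->abits[i / 8] & (1u << (i % 8)))) return 0;
    return 1;
}
```

With a 100-byte block, a 4-byte read of its last word passes, while a 4-byte read starting right after the block fails, which corresponds to the “0 bytes after a block of size 100” report.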
V-bits
The second access level is tracked with “V” bits (Validity).
– One V bit per bit of memory, i.e. one shadow byte per byte of memory.
– Set to 1 if the original memory bit hasn't been defined yet.
– Set to 0 once the memory bit is set.
– The state is transitive: it propagates through computations.
Example
If c is not defined, evaluating a = b + c makes a undefined too.
Reports an error when conditional jumps and syscalls are given undefined values.
Instrumentation technique
Valgrind uses the disassemble-and-resynthesize approach.
It has eight phases, which are a little too complex to demonstrate in this session.
Discussion and Clarification
Q & A
Thank you
Thank you all for listening.
References
[1] A. H. Ashouri, The VEX System.
[2] A. V. Emmanuel Fleury, Gerald Point, Binary program analysis: Theory and practice (what you code is not what you execute). 2013.