Visualizing Software Behavior Wu Yongzheng
14/Sep/2011 NUS SoC CSTalks 1
Problems
• Software is complex
  – Large codebase
  – Interaction between components
  – Components from different vendors
  – Closed source, closed API
• Why understand software?
  – As developer => fewer bugs
  – As administrator => diagnosis
  – Curiosity?
• An execution trace contains software behavior information, but it is huge.
Software Traces
• Types of traces
  – Instruction trace: records machine instructions
  – Call trace: records function calls
  – System call trace: records system calls
  – Software logs: important events
• System trace
  – System call trace from all processes
  – Mainly resource usage, system & process interaction
WinResMon
• WinResMon: our trace recorder
• Works on Windows
• Types of events:
  – File: open, read, write, close, rename, …
  – Registry: open, get value, set value, delete, …
  – Network: connect, listen, send, receive, …
  – Process/thread: create, terminate
Information (fields) in an Event
• PID/TID: process/thread ID
• Program name: path of the program’s EXE
• User name/group: the process’ owner
• Start/end time: event timing in CPU ticks
• Operation type: e.g. file open
• Parameter: type dependent, e.g.
  – file path, system call flags, registry path
  – IP address
• Call stack trace: call stack in the user process
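The fields listed on this slide can be pictured as one record per event. The sketch below is only an illustration: the class name `TraceEvent`, the attribute names, and the sample values are assumptions, not WinResMon's actual record layout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TraceEvent:
    """Hypothetical in-memory form of one trace event (names are assumed)."""
    pid: int
    tid: int
    program: str                      # path of the program's EXE
    user: str                         # process owner
    start_ticks: int                  # event start, in CPU ticks
    end_ticks: int                    # event end, in CPU ticks
    operation: str                    # e.g. "file_open"
    parameter: str                    # type dependent: file path, registry path, IP address
    call_stack: Tuple[str, ...] = ()  # user-mode call stack, outermost frame first

    @property
    def duration(self) -> int:
        """Time the event took, in CPU ticks."""
        return self.end_ticks - self.start_ticks


# Made-up sample event, for illustration only.
ev = TraceEvent(
    pid=1234, tid=1,
    program=r"c:\windows\system32\notepad.exe",
    user="alice",
    start_ticks=1000, end_ticks=1450,
    operation="file_open",
    parameter=r"c:\temp\a.txt",
)
print(ev.operation, ev.duration)
```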
Why visualize System Traces
• Software is complex
  – Interaction between modules and with other software
• Software can be closed source, but its interaction is open
• Humans are good at detecting
  – Repeated patterns
  – Anomalies
What is DotPlot?
[Figure: DotPlot of two traces, Trace X and Trace Y. Horizontal axis: E A C B E E E D C. Vertical axis: A C B C D E B C E.]
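A DotPlot places one sequence on each axis and marks every cell whose two elements match. The sketch below applies plain equality to the two example sequences from this slide. The function names are illustrative, and which sequence is "Trace X" is an assumption (the horizontal one is used as X here).

```python
from typing import List, Sequence, Set, Tuple


def dotplot(xs: Sequence[str], ys: Sequence[str]) -> Set[Tuple[int, int]]:
    """Return every (i, j) such that xs[i] matches ys[j] (plain equality)."""
    return {(i, j) for i, x in enumerate(xs) for j, y in enumerate(ys) if x == y}


def render(xs: Sequence[str], ys: Sequence[str]) -> List[str]:
    """One text row per element of ys; '*' marks a match, '.' a non-match."""
    dots = dotplot(xs, ys)
    return [
        "".join("*" if (i, j) in dots else "." for i in range(len(xs)))
        for j in range(len(ys))
    ]


trace_x = "E A C B E E E D C".split()   # horizontal sequence on the slide
trace_y = "A C B C D E B C E".split()   # vertical sequence on the slide

rows = render(trace_x, trace_y)
for label, row in zip(trace_y, rows):
    print(label, row)
```

Comparing a trace with itself always fills the main diagonal, which is why self-comparison plots show a diagonal line with repeated patterns appearing as off-diagonal structure.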
An Example
Visualization comparing: MS PowerPoint, MS Word, OO Word, and OO PowerPoint.
Elements of VDP
[Figure: annotated VDP screenshot. 1: Extended DotPlot; 2, 3: Axis Histogram; 4, 5: Barcode]
Extended DotPlot
• Matching Rule
  – Defines whether two events match
  – By fields: e.g. “if PIDs and resource paths are the same”, “if program names are the same”
• DP Coloring Rule
  – Defines the color for matched events
  – Traditional DP uses black only
  – Use the RGB model on a black background, CMY on a white background
  – Use regular expressions to specify events
  – E.g. “.*file_open.*” → blue, “.*reg_.*” → cyan
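A matching rule and a coloring rule could be expressed roughly as follows. This is a sketch under assumptions: events are plain dictionaries, the regular expressions are applied to a made-up one-line text form of the event (`event_text`), and operation names such as `reg_get_value` are hypothetical. Only the two patterns and their colors come from the slide.

```python
import re
from typing import Callable, Dict, List, Pattern, Tuple

Event = Dict[str, str]


def match_on(*fields: str) -> Callable[[Event, Event], bool]:
    """Build a matching rule: two events match if all named fields are equal."""
    def rule(a: Event, b: Event) -> bool:
        return all(a[f] == b[f] for f in fields)
    return rule


def event_text(e: Event) -> str:
    """Assumed text form of an event that the coloring patterns are tested against."""
    return f"{e['program']} {e['operation']} {e['path']}"


# The two example coloring rules from the slide; the first matching rule wins.
COLOR_RULES: List[Tuple[Pattern[str], str]] = [
    (re.compile(r".*file_open.*"), "blue"),
    (re.compile(r".*reg_.*"), "cyan"),
]


def color_of(e: Event, default: str = "black") -> str:
    """Color for a matched event, or the default if no pattern applies."""
    text = event_text(e)
    for pattern, color in COLOR_RULES:
        if pattern.fullmatch(text):
            return color
    return default


# Made-up events for illustration.
a = {"pid": "7", "program": "xcopy.exe", "operation": "file_open", "path": r"c:\src\a.bin"}
b = {"pid": "7", "program": "xcopy.exe", "operation": "reg_get_value", "path": r"HKLM\Software\X"}

same_program = match_on("program")
same_pid_and_path = match_on("pid", "path")
print(same_program(a, b), same_pid_and_path(a, b), color_of(a), color_of(b))
```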
Event-ordered and Time-ordered
• Each event takes a different amount of time
• The meaning/unit of each axis depends on the ordering
[Figure: two panels, Event-ordered and Time-ordered]
Axis Histogram
– Ticks mark unit time (e.g. 1 second)
– Histogram
  • Event density (time-ordered)
  • Time spent (event-ordered)
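The two histogram meanings can be sketched as two small binning functions. The representation of an event as a `(start, end)` pair, the function names, and the sample numbers are assumptions for illustration. On a time-ordered axis each bin covers a fixed time span and counts events, while on an event-ordered axis each bin covers a fixed number of events and sums their durations.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) of one event, e.g. in CPU ticks


def event_density(events: Sequence[Span], bin_width: int) -> List[int]:
    """Time-ordered axis: number of events starting in each time bin."""
    t0 = min(start for start, _ in events)
    t1 = max(start for start, _ in events)
    counts = [0] * ((t1 - t0) // bin_width + 1)
    for start, _ in events:
        counts[(start - t0) // bin_width] += 1
    return counts


def time_spent(events: Sequence[Span], events_per_bin: int) -> List[int]:
    """Event-ordered axis: total duration of each run of consecutive events."""
    return [
        sum(end - start for start, end in events[k:k + events_per_bin])
        for k in range(0, len(events), events_per_bin)
    ]


# Made-up sample: three short events followed by one long gap and a final event.
events = [(0, 1), (1, 2), (2, 10), (10, 11)]
print(event_density(events, bin_width=5))
print(time_spent(events, events_per_bin=2))
```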
Barcode
• One dimensional
• Highlights user-chosen events
  – E.g. file_open → red
• One or more barcodes (e.g. three below)
• Barcode coloring rules
Example 1: File Copying
Self-comparison, event-ordered
xcopy copying 8 files: 1MB, 10KB, 10MB, 100KB, 1MB, 10KB, 10MB and 100KB
DP match: operation + parameter (pathname)
DP color: magenta → source; cyan → destination; black → other
[Figure labels: File Operation; Source/Dst File Operation; Registry Operation]
File Size
File size is visible.
The two 1MB and two 10MB files are shown.
The two 10KB and two 100KB files are visible only when zoomed in.
Zooming in
DP color : magenta → source; cyan → destination; black → other
A Surprise: Registry Operations
So many registry operations for a console application
[Figure label: Registry Operation]
Another Surprise: DLLs
File operations on files that are neither source nor destination.
More time is spent on DLLs than on a 1MB file.
[Figure labels: File Operation; Source/Dst File Operation; DLLs]
Example 2: Software Build
X: succeeded; Y: failed due to a missing .c file
DP match: program + operation + value (pathname)
DP color: black → any
Bar1 color: black → nmake.exe
Bar2 color: cyan → cl.exe; magenta → link.exe
Bar3 color: cyan → reading .c files; magenta → reading .h files
Number of Executions
X: 4 compiles (cl.exe), 1 link (link.exe)
Y: 3 compiles, 0 links
Y: the third compile doesn’t read any .c or .h file
Bar2 color: cyan → cl.exe; magenta → link.exe
Bar3 color: cyan → reading .c files; magenta → reading .h files
Similarity & Difference
The two traces are similar.
The Y (failed) trace terminates earlier, right before reading a .c file.
Different Matching Rule
[Figure: two panels, matching by Operation Type and matching by Program Name]
Example 3: Two Idle Windows Machines
• Time-ordered
• 1 hour each
• Different times
• About 750K events each
Anomaly & Repeated Pattern
• Periodic pattern
• Most events in R1
• Most time in R2 alike
• Easily spot anomalies & regular patterns
[Figure: DotPlot with regions R1 and R2 marked]
Zoom In
[Figure: zoomed-in view of regions R1 and R2]
R1: Windows Update
• Similar events (darker area) are from the Windows Auto Updater
• More file operations, fewer registry operations
magenta → wuauclt.exe (Windows Update)
[Figure labels: File Operation; Registry Operation]
Visualizing Module Dependencies
• The problem
  – There’s a vulnerability in X. Which software uses X?
  – Why does my software use X? I never call it.
  – Is it safe to uninstall X?
• Software module
  – Windows DLLs
  – UNIX .so
  – Java classes, packages
Examples of dependencies (1)
• Binaries used by notepad
  – c:\windows\apppatch\acgenral.dll
  – c:\windows\system32\avgrsstx.dll
  – c:\windows\system32\imm32.dll
  – c:\windows\system32\lpk.dll
  – c:\windows\system32\msacm32.dll
  – c:\windows\system32\msctf.dll
  – c:\windows\system32\msctfime.ime
  – c:\windows\system32\shimeng.dll
  – c:\windows\system32\usp10.dll
  – c:\windows\system32\uxtheme.dll
  – c:\windows\system32\winmm.dll
  – c:\windows\system32\winspool.drv
  – c:\windows\winsxs\x86_microsoft.windows.common-controls_6595b64144ccf1df_6.0.2600.5512_x-ww_35d4ce83\comctl32.dll
Examples of dependencies (2)
• Simple boot (only Windows installed)
  – DLLs: 154
  – EXEs: 10
  – Drivers: 1
  – IME: 1
• Typical boot (Windows + applications)
  – DLLs: 274
  – EXEs: 15
  – Telephony/Modem: 6
  – Drivers: 3
  – ActiveX: 2
  – IME: 1
Visualization (1)
• Basic dependency graph
• Graph is too dense
Binary Dependency Visualization
• Two types of nodes: EXE and DLL (etc.)
• Three types of directed edges
  1. EXE X launches another EXE Y
  2. EXE X loads a DLL Y
  3. A function in binary X calls a function in binary Y
• How are binaries shared among programs?
  – EXE Dependency Graph
  – Only Type 1 and 2 edges
  – Group DLLs by loader
• How do binaries interact?
  – DLL Dependency Graph
  – Only Type 2 and 3 edges
  – Group DLLs manually by functionality or software vendor
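The two graph views differ only in which edge types they keep. The sketch below assumes edges are stored as `(source, target, type)` tuples and reads "group DLLs by loader" as "DLLs loaded by exactly the same set of EXEs fall into one group". That reading, the function names, and the sample edges are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

LAUNCH, LOAD, CALL = 1, 2, 3          # the three edge types from the slide
Edge = Tuple[str, str, int]           # (source binary, target binary, edge type)


def keep_types(edges: Iterable[Edge], kinds: Set[int]) -> List[Edge]:
    """Keep only edges whose type is in `kinds`."""
    return [e for e in edges if e[2] in kinds]


def exe_dependency_graph(edges: Iterable[Edge]) -> List[Edge]:
    """EXE Dependency Graph: only Type 1 (launch) and Type 2 (load) edges."""
    return keep_types(edges, {LAUNCH, LOAD})


def dll_dependency_graph(edges: Iterable[Edge]) -> List[Edge]:
    """DLL Dependency Graph: only Type 2 (load) and Type 3 (call) edges."""
    return keep_types(edges, {LOAD, CALL})


def group_dlls_by_loader(edges: Iterable[Edge]) -> Dict[FrozenSet[str], Set[str]]:
    """Assumed grouping: DLLs loaded by the same set of EXEs share a group."""
    loaders: Dict[str, Set[str]] = defaultdict(set)
    for src, dst, kind in edges:
        if kind == LOAD:
            loaders[dst].add(src)
    groups: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for dll, exes in loaders.items():
        groups[frozenset(exes)].add(dll)
    return dict(groups)


# Made-up edges, for illustration only.
edges = [
    ("shell.exe", "editor.exe", LAUNCH),
    ("editor.exe", "theme.dll", LOAD),
    ("editor.exe", "audio.dll", LOAD),
    ("shell.exe", "theme.dll", LOAD),
    ("theme.dll", "text.dll", CALL),
]
groups = group_dlls_by_loader(edges)
print(len(exe_dependency_graph(edges)), len(dll_dependency_graph(edges)), groups)
```

Grouping by loader collapses many DLL nodes into a few group nodes, which is one way to address the density of the basic dependency graph.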
A more usable Visualization: EXE Dependency Graph
• Grouped dependency graph
[Figure: grouped EXE Dependency Graph, with labels 1 and 2]
Comparing Microsoft Word and Open Office Writer
DLL Dependency Graph: actual binary usage
• Some definitions:
  – An EXE-DLL dependency in a DLL Dependency Graph exists when there is a control transfer from code in executable x to code in DLL y. We say that x has an EXE-DLL dependency on y.
  – A DLL-DLL dependency in a DLL Dependency Graph exists when there is a control transfer from code in DLL x to code in DLL y. We say that x has a DLL-DLL dependency on y.
wget: DLL dependency without grouping
wget: DLL dependency grouped by functionality
Examples of grouping: by functionality (GIMP)
Examples of grouping: by software vendor (GIMP)
Two Operations
• Diff
  – Compare two graphs
    • E.g. from the same program but different environment/input
    • E.g. from two related programs
  – Diff graphs G1 and G2 to get G3
• Projection
  – Focus on a particular module X
  – Only show modules that call X or are called by X (recursive definition)
  – Project graph G1 on module M to get G2
  – Not a simple subgraph problem
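One plausible reading of the Diff operation is an edge-wise comparison in which G3 keeps every edge of G1 and G2 and records where it came from. The slides do not spell out what G3 contains, so the labels, the function name, and the sample graphs below are assumptions.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (caller module, callee module)


def diff_graphs(g1: Set[Edge], g2: Set[Edge]) -> Dict[Edge, str]:
    """Label every edge of G1 or G2 as 'both', 'only_g1' or 'only_g2'."""
    g3: Dict[Edge, str] = {}
    for edge in g1 | g2:
        if edge in g1 and edge in g2:
            g3[edge] = "both"
        elif edge in g1:
            g3[edge] = "only_g1"
        else:
            g3[edge] = "only_g2"
    return g3


# Made-up graphs: the same program run with and without a plugin.
with_plugin = {("app.exe", "core.dll"), ("core.dll", "plugin.dll")}
without_plugin = {("app.exe", "core.dll")}

g3 = diff_graphs(with_plugin, without_plugin)
print(g3)
```

A renderer could then draw "both" edges in a neutral color and the two "only" classes in contrasting colors, so the extra dependencies stand out.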
Diff of DLL dependency graph of Internet Explorer with Flash and without
Projection of the DLL dependency graph of Internet Explorer on Flash
Firefox using tortoisesvn
Questions?
Visualizing binaries executed
• Call graph is large.
• Group functions into images => DLL dependency graph.
• DLL dependency graph is still large.
• Group DLLs by properties:
  – By functionality: graphics, audio, network…
  – By vendor: Microsoft, Adobe…
  – By path: C:\windows\system32\*.dll, D:\vmware\*.dll…
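Grouping by path can be sketched with shell-style wildcard matching. The two patterns below are the ones shown on the slide. The group names, the `other` fallback, and the choice to drop edges inside a single group are assumptions. Note that `*` in `fnmatchcase` also matches backslashes, so DLLs in subdirectories fall into the same group.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Set, Tuple

# (group name, lower-case wildcard pattern); the group names are made up.
GROUPS: List[Tuple[str, str]] = [
    ("system32", r"c:\windows\system32\*.dll"),
    ("vmware", r"d:\vmware\*.dll"),
]


def group_of(path: str, default: str = "other") -> str:
    """Name of the first group whose pattern matches the path, ignoring case."""
    lowered = path.lower()
    for name, pattern in GROUPS:
        if fnmatchcase(lowered, pattern):
            return name
    return default


def grouped_edges(edges: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Collapse a DLL-level graph into a group-level graph."""
    result: Set[Tuple[str, str]] = set()
    for src, dst in edges:
        gs, gd = group_of(src), group_of(dst)
        if gs != gd:               # edges inside one group disappear
            result.add((gs, gd))
    return result


# Illustrative edges between binaries from the notepad list; the edges themselves are made up.
dll_edges = [
    (r"c:\windows\apppatch\acgenral.dll", r"c:\windows\system32\winmm.dll"),
    (r"c:\windows\system32\winmm.dll", r"c:\windows\system32\msacm32.dll"),
]
print(grouped_edges(dll_edges))
```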
Visualizing binaries executed (1)
• Generate call tree, call graph, DLL dependency graph
• PIN tool to collect execution trace
  – Trace includes call, return, thread, context, system call events
  – Call and return events record stack pointer, PC and target address
• Not trivial to maintain call stack by tracking call and return
  – Non-returning function (long jump)
  – Thread, fiber
  – Context
  – Kernel callback
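One common way to survive non-returning functions is to use the recorded stack pointer: any frame whose call-time stack pointer is not above the current one must already be dead. The sketch below shows that idea for a single thread, assuming a downward-growing stack and that each call event carries the stack pointer just before the return address is pushed. It is not claimed to be the method used by the tool described here, and the class and method names are invented.

```python
from typing import List, Tuple

Frame = Tuple[int, str]  # (stack pointer just before the call, callee name)


class ShadowStack:
    """Call stack rebuilt from call/return events of one thread (a sketch)."""

    def __init__(self) -> None:
        self.frames: List[Frame] = []

    def _unwind(self, sp: int) -> None:
        # Inside a live frame the stack pointer is strictly below the value
        # saved at its call, so frames saved at or below `sp` have been left.
        while self.frames and self.frames[-1][0] <= sp:
            self.frames.pop()

    def on_call(self, sp: int, callee: str) -> None:
        self._unwind(sp)           # drop frames abandoned by e.g. a long jump
        self.frames.append((sp, callee))

    def on_return(self, sp_after: int) -> None:
        self._unwind(sp_after)     # normally pops exactly one frame

    def names(self) -> List[str]:
        return [name for _, name in self.frames]


stack = ShadowStack()
stack.on_call(100, "A")
stack.on_call(90, "B")
stack.on_call(80, "C")
stack.on_return(80)                # C returns normally
after_return = stack.names()
stack.on_call(100, "D")            # B long-jumped out; the next call happens at sp=100
after_longjmp = stack.names()
print(after_return, after_longjmp)
```

Threads and fibers would each need their own `ShadowStack`, since every thread has a separate stack.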
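Projection

void A(void);
void B(int i);
void C(void);
void D(void);

int main(void) { A(); B(1); return 0; }
void A(void) { B(0); }                    /* through A, B only ever calls C */
void B(int i) { if (i) D(); else C(); }   /* D is reached only from main's B(1) */
void C(void) {}
void D(void) {}

[Figure: Full Graph with nodes main, A, B, C, D; Project on A with nodes main, A, B, C]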
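The example shows why projection is not a simple subgraph problem: D is reachable from A in the full graph (A → B → D), yet it does not appear in the projection on A, because B calls D only when invoked directly from main. One interpretation that reproduces the slide's result is to work on recorded call stacks rather than on the graph: keep only the stacks that contain the target, then rebuild the edges. The sketch below follows that interpretation. The function names and the stack encoding are assumptions.

```python
from typing import Iterable, Sequence, Set, Tuple

Edge = Tuple[str, str]


def edges_of(stacks: Iterable[Sequence[str]]) -> Set[Edge]:
    """Caller -> callee edges seen in a collection of call stacks."""
    return {(s[k], s[k + 1]) for s in stacks for k in range(len(s) - 1)}


def project(stacks: Iterable[Sequence[str]], target: str) -> Set[Edge]:
    """Edges from call stacks in which `target` is active."""
    return edges_of(s for s in stacks if target in s)


def nodes_of(edges: Iterable[Edge]) -> Set[str]:
    return {node for edge in edges for node in edge}


# Call stacks observed when running the C program above, outermost frame first.
STACKS = [
    ("main",),
    ("main", "A"),
    ("main", "A", "B"),
    ("main", "A", "B", "C"),
    ("main", "B"),
    ("main", "B", "D"),
]

full_graph = edges_of(STACKS)
on_a = project(STACKS, "A")
print(sorted(nodes_of(full_graph)), sorted(nodes_of(on_a)))
```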