code-blue
Anti exploitation and Control Flow Integrity with Processor Trace
Brought to you by
Shlomi Oberman, independent security researcher
Ron Shina, independent security researcher
Tracing – what executed and when?
Code optimization and profiling
◦ Sampling
◦ Instrumentation
Intel Processor Trace (PT)
Intel PT
Processor feature enabling instruction tracing with low overhead – documentation says about 5%
◦ Tens of times faster than the previous option
Available on Intel Broadwell and Skylake processors
A similar feature, Real Time Instruction Trace, exists on certain Intel Atom processors
Intel PT Packets
Processor writes trace to memory as packets
Packet Types
◦ Taken / Not Taken packets for conditional branches
◦ IP packets for indirect branches
◦ Timestamp packets
◦ …
Binary is needed to recreate the instruction trace
Figure: Decoded Trace Packets (for example, “call to foo”, “branch taken / not taken”)
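To illustrate why the binary is needed, here is a minimal sketch that replays a simplified packet stream against a toy control flow graph. The block names, the `CFG` layout, the `("TNT", bit)` / `("TIP", target)` tuples and the `replay` helper are all hypothetical stand-ins, not the real packet encoding or the libipt API.

```python
# Hypothetical toy CFG standing in for the disassembled binary.
# Each block maps to (kind, *static_targets):
#   "cond"     -> (taken target, not-taken target); needs one Taken / Not Taken bit
#   "indirect" -> no static target; needs one IP packet
#   "direct"   -> one static target; needs no packet at all
#   "end"      -> stop replaying
CFG = {
    "entry": ("cond", "loop", "exit"),
    "loop": ("indirect",),
    "foo": ("direct", "entry"),
    "exit": ("end",),
}


def replay(cfg, start, packets):
    """Rebuild the executed block sequence from packets plus the CFG."""
    stream = iter(packets)
    path, block = [], start
    while True:
        path.append(block)
        kind, *targets = cfg[block]
        if kind == "end":
            return path
        if kind == "direct":
            # The target is known from the binary, so no packet is consumed.
            block = targets[0]
            continue
        tag, value = next(stream)
        if kind == "cond" and tag == "TNT":
            block = targets[0] if value else targets[1]
        elif kind == "indirect" and tag == "TIP":
            block = value
        else:
            raise ValueError(f"packet {tag!r} does not fit a {kind!r} branch at {block!r}")


packets = [("TNT", True), ("TIP", "foo"), ("TNT", False)]
print(replay(CFG, "entry", packets))
# prints ['entry', 'loop', 'foo', 'entry', 'exit']
```

Three packets are enough to recover five blocks here, because the direct branch out of `foo` is resolved from the graph alone.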
User and/or kernel tracing
Filter by process
Starting or stopping the trace based on address ranges (only in later processors)
Configuration options
Atom processors supporting RTIT – tracing guests possible, but not the hypervisor
Broadwell – no support at all
Skylake – full support
Tracing VM guests and hypervisors
Figure: Intel PT output + traced program’s binary → instruction trace
Linux kernel 4.1 comes with integrated PT support
Linux kernel 4.3 supports tracing using perf user tools
An open source PT decoding library – libipt
GDB 7.10 supports using PT for tracing
simple-pt – an open source implementation of PT on Linux (used to create the trace pictures on the previous slide)
* processor supporting PT included separately ;)
Want to use Processor Trace right now? *
Exploitation and the NX Bit
Figure: a pdf displaying “Hi!” with shellcode embedded in it
When the pdf is opened, the shellcode will be in memory that isn’t executable – NX bit
How do attackers run the code to make their shellcode executable?
◦ Use code that is already executable (the program’s code)
This exploitation technique comes in many forms, most notably, ROP – Return Oriented Programming
Using executable memory already in the program usually involves moving around the process rather strangely, for example:
◦ Not returning to a function’s caller
◦ Calling addresses in the middle of functions, instead of at the beginning
◦ …
“Jump Around, Jump around…” / House of Pain
Establish rules for how the code flows in the process
◦ Functions return to their callers
◦ Calls are made to the beginning of functions
◦ …
How can those rules be enforced?
◦ Add rule checking to the program’s binary
◦ Trace the program while running and go over the log (this work)
◦ Use other CPU features to detect “surprising” branches
“Control Flow Integrity Principles, Implementations, and Applications”, Abadi, Budiu, Erlingsson, Ligatti, 2005
Control Flow Integrity (CFI)
“Security Breaches as PMU Deviation”, Yuan, Xing, Chen, Zang, 2011
“kBouncer: Efficient and Transparent ROP Mitigation”, Pappas, winner of the Microsoft BlueHat competition 2012 – uses previous CPU branch tracing capabilities
“CFIMon: Detecting Violation of Control Flow Integrity using Performance Counters”, Xia, Liu, Chen, Zang, 2012
“Taming ROP on Sandy Bridge”, Wicherski of Crowdstrike, 2013
“Transparent ROP Detection using CPU Performance Counters”, Li, Crouse, THREADS 2014
and more…
Prior Work
Anti exploitation system to scan files based on CFI (think pdf on Adobe Reader)
Detects whether “illegal” returns were made, like in ROP
◦ Easy to add other CFI mitigations, such as checking the targets of calls (no calls to the middle of functions, …)
(Soon to be) Open Source
Developed in 2015
Our Implementation
Verifying CFI via Processor Trace
Was the flow OK? Just follow the arrows and calls using the PT generated packets
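The "functions return to their callers" rule can be sketched as a shadow stack replayed over call and return events. This is only an illustration: it assumes the trace has already been decoded into hypothetical `("call", return_address)` and `("ret", target)` tuples, and it ignores the legitimate cases where a function does not return to its caller.

```python
def find_illegal_returns(events):
    """Return (event index, expected address, actual address) for each mismatched return."""
    shadow_stack = []
    violations = []
    for i, (kind, addr) in enumerate(events):
        if kind == "call":
            # addr is the address of the instruction following the call
            shadow_stack.append(addr)
        elif kind == "ret":
            expected = shadow_stack.pop() if shadow_stack else None
            if addr != expected:
                violations.append((i, expected, addr))
    return violations


benign = [("call", 0x1005), ("call", 0x2005), ("ret", 0x2005), ("ret", 0x1005)]
rop_like = [("call", 0x1005), ("ret", 0x4141), ("ret", 0x4242)]

print(find_illegal_returns(benign))    # prints []
print(find_illegal_returns(rop_like))  # prints [(1, 4101, 16705), (2, None, 16962)]
```

A chain of returns with no matching calls, as in `rop_like`, shows up as a run of violations with an empty or mismatched shadow stack.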
What information is needed to follow the execution and verify it?
Control Flow Graph (CFG)
◦ Location of functions
◦ Location of basic blocks
◦ …
Need this for all the libraries loaded by the process – Adobe Reader dlls, Windows dlls
◦ If not – false positives
All we have is debugging symbols, pdb files, for the Windows binaries
We used IDA to recover the CFG
IDA didn’t do a good enough job
◦ Some of the functions and basic blocks in the Adobe Reader / Windows binaries weren’t detected
Static Analysis
When supporting a new version of Adobe Reader, IDA is used to get the initial CFG (static analysis)
Afterwards, many pdf files are traced with PT
◦ When a new basic block or function is discovered while following the trace – the CFG is updated
Repeat
◦ Run IDA on the new CFG
◦ Run the pdf files on IDA’s output
◦ If the CFG was updated in the last iteration, repeat
Dynamic Analysis
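The static / dynamic loop is a fixed-point iteration, sketched below under heavy simplification. `STATIC_EDGES` and `static_analysis` are hypothetical stand-ins for IDA, and each trace is assumed to already be a list of executed basic block addresses.

```python
# Hypothetical direct-branch edges the "static analysis" can follow on its own.
STATIC_EDGES = {0x10: {0x20}, 0x30: {0x40}, 0x40: {0x50}}


def static_analysis(known_blocks):
    """Stand-in for IDA: expand the known blocks along direct edges until stable."""
    known = set(known_blocks)
    frontier = list(known)
    while frontier:
        block = frontier.pop()
        for succ in STATIC_EDGES.get(block, ()):
            if succ not in known:
                known.add(succ)
                frontier.append(succ)
    return known


def refine_cfg(seed_blocks, traces):
    """Alternate static analysis and trace replay until no new block is discovered."""
    cfg = static_analysis(seed_blocks)
    while True:
        discovered = {block for trace in traces for block in trace} - cfg
        if not discovered:
            return cfg
        # Feed the blocks seen only at run time back into the static pass.
        cfg = static_analysis(cfg | discovered)


blocks = refine_cfg({0x10}, [[0x10, 0x20, 0x30]])
print(sorted(hex(b) for b in blocks))
# prints ['0x10', '0x20', '0x30', '0x40', '0x50']
```

Block `0x30` is only reachable through the trace, and once it is known the static pass finds `0x40` and `0x50` without further tracing.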
Most of the edges in the CFG are:
◦ Calls relative to the current IP (no packet for those)
◦ Conditional branches
When traversing the CFG during trace verification, fetching the next node in these cases has to be (very) fast
Since the CFG is fixed and built in preprocessing, this isn’t a problem
Optimization
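One way to make the "next node" lookup cheap, given that the CFG is fixed after preprocessing, is to freeze it into flat index tables. This is a sketch of that idea only, not the layout the implementation uses; `freeze`, `walk` and the addresses are hypothetical, and every branch target is assumed to be a known block.

```python
def freeze(cfg):
    """Turn {addr: (taken_addr, not_taken_addr)} into index-based lookup tables."""
    addrs = sorted(cfg)
    index = {addr: i for i, addr in enumerate(addrs)}
    taken = [index[cfg[addr][0]] for addr in addrs]
    not_taken = [index[cfg[addr][1]] for addr in addrs]
    return addrs, index, taken, not_taken


def walk(start, tnt_bits, frozen):
    """Follow Taken / Not Taken bits from start using one list lookup per bit."""
    addrs, index, taken, not_taken = frozen
    node = index[start]
    for bit in tnt_bits:
        node = taken[node] if bit else not_taken[node]
    return addrs[node]


cfg = {
    0x100: (0x200, 0x300),
    0x200: (0x100, 0x300),
    0x300: (0x300, 0x300),
}
frozen = freeze(cfg)
print(hex(walk(0x100, [1, 1, 0], frozen)))  # prints 0x300
```

All dictionary work happens once in `freeze`, so the verification loop itself touches only lists.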
Ideally, no disassembly and CFG modification (slow) would be done during verification
However, some of the code analyzed is created dynamically – as long as it doesn’t change, this can be dealt with in preprocessing
In cases where it changes every time “Adobe Reader” is run to open a file, preprocessing isn’t enough
◦ Code is disassembled and the CFG is updated
Optimization
Following the execution trace is done on a per thread basis
How to know which thread was executing at each part of the trace?
◦ PT packets give timing information, but only identify the current process, not the thread
Thread information
Event Tracing for Windows (ETW)
◦ It should be possible to get the thread context switching times, as TSC values, from the CSwitch events provided by ETW
◦ These timestamps could then be synchronized with the TSC packets from PT to determine which thread was running in different parts of the trace
Thread Information
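Matching a TSC packet to a thread is a lookup into the sorted list of context switch times. The sketch below assumes the CSwitch events have already been reduced to hypothetical `(tsc, thread_id)` pairs on the same clock as the PT timestamps; the values are made up.

```python
from bisect import bisect_right

# Hypothetical context switches: (TSC at switch-in, thread id switched in), sorted by TSC.
CSWITCHES = [(100, 7), (250, 9), (400, 7)]
SWITCH_TIMES = [tsc for tsc, _ in CSWITCHES]


def thread_at(tsc):
    """Return the thread running at the given TSC, or None before the first switch."""
    i = bisect_right(SWITCH_TIMES, tsc) - 1
    return CSWITCHES[i][1] if i >= 0 else None


for packet_tsc in (50, 120, 250, 399, 1000):
    print(packet_tsc, thread_at(packet_tsc))
# prints:
# 50 None
# 120 7
# 250 9
# 399 9
# 1000 7
```

Each part of the trace between two TSC packets can then be handed to the verifier state of the thread returned here.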
What about getting a callback every time a thread in the traced process is switched in?
◦ AFAWK, no direct way
◦ We hooked the Windows context switch function - don’t do that
◦ Endgame presented a way to achieve this via Asynchronous Procedure Calls (Black Hat 2016)
Thread Information
Need to know the executable memory ranges at all points in the trace – what modules are loaded
Knowing when the PT trace reached ntdll!LdrLoadDll and ntdll!LdrUnloadDll isn’t enough
◦ The module name is needed to update the current memory map
ETW was used to retrieve the name and time (TSC) of each module load / unload, and this is then synchronized with the times of the load / unload functions in the trace
Module load / unload
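Keeping the memory map in step with the trace can be sketched as replaying load / unload events up to a given timestamp. The event tuples, module names, addresses and helper names below are hypothetical; real events would come from ETW.

```python
# Hypothetical events: (tsc, action, module name, base address, size)
EVENTS = [
    (10, "load", "a.dll", 0x1000, 0x1000),
    (20, "load", "b.dll", 0x5000, 0x1000),
    (30, "unload", "a.dll", 0x1000, 0x1000),
]


def memory_map_at(tsc, events):
    """Return {module name: (start, end)} for the modules loaded at the given TSC."""
    loaded = {}
    for when, action, name, base, size in sorted(events):
        if when > tsc:
            break
        if action == "load":
            loaded[name] = (base, base + size)
        else:
            loaded.pop(name, None)
    return loaded


def module_for(addr, tsc, events):
    """Return the module containing addr at the given TSC, or None."""
    for name, (start, end) in memory_map_at(tsc, events).items():
        if start <= addr < end:
            return name
    return None


print(module_for(0x1800, 15, EVENTS))  # prints a.dll
print(module_for(0x1800, 35, EVENTS))  # prints None
```

The second lookup fails because `a.dll` was unloaded at TSC 30, so an address in its old range no longer counts as executable code from that module.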
For example:
◦ Exception dispatching code
◦ User mode callbacks
◦ …
While going over the trace, whenever a suspected mismatch occurs, the above special cases are checked via binary signatures
This mostly needs to be done per operating system, not per-application
Still not done – functions don’t always return to their callers
(almost entirely) Not dealt with by our implementation
For PT tracing, the code being executed is needed
One obvious problem is pages that get written to and executed from simultaneously
(maybe) One could remove the write permission every time a page becomes writable and executable and handle the access violation when it gets written to, in order to obtain the code’s new version
Dynamically generated code
A case of dynamically generated code that was dealt with:
Applications that hook themselves… with identical hooks, at the same locations and same time
To the trace verifier, the code is essentially static
Dynamically generated code
Benign, non-malicious files
◦ Run on 10000 pdf, 3000 ppt/x, 3000 doc/x without false positives
Malicious files containing a ROP chain
◦ Run on 5 such files, detecting the exploit and displaying the CFI violation
Scanning Results
You’d still need
◦ Module load / unload information
◦ Thread context switch times
but could somewhat do without
◦ The CFG – a partial CFG can be built from the trace (it doesn’t need to be built in advance)
Forget CFI and anti-exploitation… What if I just want to trace a process quickly with Processor Trace?
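Building a partial CFG from the trace amounts to recording every block-to-block transition that was observed. A minimal sketch, assuming a decoder has already turned each trace into a list of executed basic block addresses (the `partial_cfg` name and the small integer addresses are hypothetical):

```python
from collections import defaultdict


def partial_cfg(block_traces):
    """Return {block: set of successor blocks} for every transition seen in the traces."""
    edges = defaultdict(set)
    for trace in block_traces:
        for src, dst in zip(trace, trace[1:]):
            edges[src].add(dst)
    return dict(edges)


print(partial_cfg([[1, 2, 3], [1, 4, 3]]))
# prints {1: {2, 4}, 2: {3}, 4: {3}}
```

The graph only contains paths that actually ran, which is why it is partial: an edge missing from it may still be legal.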
Control-flow Enforcement Technology (CET), announced by Intel in June 2016. Release date?
Processors will directly support:
◦ Shadow (call) stack tracking – a return that doesn’t match the shadow stack raises a control protection exception
◦ Indirect branch tracking – an indirect branch to a target containing an instruction other than ENDBRANCH raises a control protection fault
Coming soon to a motherboard near you
ARM has a feature similar to Processor Trace called CoreSight
Tracing on Linux has been integrated with perf
Open source decoding library exists – OpenCSD
http://www.linaro.org/blog/core-dump/coresight-perf-and-the-opencsd-library/
What about tracing quickly on ARM?
“Control Jujutsu” – Evans, Long, Otgonbaatar, Shrobe, Rinard, Okhravi, Sidiroglou-Douskos, CCS 2015
Uses indirect call sites with controllable targets and arguments (via vulnerability) to achieve arbitrary code execution (e.g., call exec or system)
Bypasses CFI because the target functions are legal in the CFG
Bypassing CFI
“Write Once, Pwn Anywhere”, Yu, Black Hat USA 2014
◦ Sometimes applications have security critical information in one variable
◦ Pseudo-code from Internet Explorer’s JavaScript engine:
if ((safemode & 0xB) == 0) { turn_on_god_mode(); }
Bypassing CFI with “data attacks”
“Control Flow Bending”, Carlini, Barresi, Payer, Wagner, Gross, USENIX 2015
◦ printf-oriented-programming – if you control the arguments, printf can do arbitrary computation
Bypassing CFI with “data attacks”
“Data oriented programming” – Hu, Shinde, Sendroiu, Chua, Saxena, Liang, S&P 2016
goal: perform arbitrary computation while adhering to the CFG
Similar to ROP in spirit – use parts of the original program as “instructions” of a “VM” controlled by the attacker
“data gadgets” are used to perform computation on data
Bypassing CFI with “data attacks”
gadgets are executed one after the other by using constructs already in the vulnerable program – such as loops
the vulnerability being exploited is used to determine which data gadget gets run and on what data
“data oriented programming” (cont)
any questions?