FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation
G. Venkataramani, I. Doudalis, Y. Solihin, M. Prvulovic, HPCA ’08
Reading Group Presentation, 02/14/2008

Motivation
Tainting schemes are extremely useful for security and debugging purposes
◦ E.g. TaintCheck, PointerCheck
Implemented in software
◦ Usually some kind of dynamic binary instrumentation (DBI)
◦ Extremely versatile
◦ Really slow
◦ Problems with multithreaded apps, JIT compilation, and self-modifying code
So, make hardware for it
◦ Multiple examples: Raksha, Minos, etc.
◦ Fast
◦ Can deal with the unusual code that troubles S/W
◦ Extensive modifications required in the OoO core, caches, buses, and memories
◦ Limits the state that can be manipulated, usually to a few bits, easily managed by H/W
◦ So, who is going to implement it?
Solution: FlexiTaint
◦ Use H/W to accelerate what the S/W is doing: common-case propagation and metadata manipulation
RISC ISA
Taint state: 1..16 bits per word
1-level table in the application address space
◦ Protected from the application
◦ No need to widen buses, caches, etc.
◦ L1-T cache for taint bits: 4 kB for 2-bit states (no changes to L1-D, no port contention)
◦ Taint state shares the L2
2 registers for that
◦ MTBR (Memory Taint Base Register): start of the table
◦ FTCR (FlexiTaint Configuration Register): bits/word
◦ Both must be saved on a context switch by the O/S
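The table lookup implied by MTBR and FTCR can be sketched as simple address arithmetic. This is only an illustration: the 4-byte word size, the dense bit packing, and the function names are assumptions, not details given on the slides.

```python
WORD_BYTES = 4  # assumption: 32-bit words; the slides only say "bits per word"


def taint_location(data_addr, mtbr, bits_per_word):
    """Return (byte address, bit offset) of the taint state for data_addr.

    mtbr is the table base (MTBR) and bits_per_word the state size (FTCR).
    Assumes states are packed densely, one after another, in the table.
    """
    if not 1 <= bits_per_word <= 16:
        raise ValueError("FTCR allows 1..16 taint bits per word")
    word_index = data_addr // WORD_BYTES
    bit_index = word_index * bits_per_word
    return mtbr + bit_index // 8, bit_index % 8


def data_covered_by_l1t(l1t_bytes, bits_per_word):
    """Bytes of application data whose taint state fits in the L1-T."""
    return (l1t_bytes * 8 // bits_per_word) * WORD_BYTES
```

Under these assumptions, a 4 kB L1-T with 2-bit states holds the taint for 64 kB of data, which is one way to see why such a small cache can be enough.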
All loads/stores prefetch taint state into the L1-T
State 0..0 is assumed to be the safe one
State can be manipulated directly by special instructions
◦ Taint must be introduced somehow after special events: reading a file, malloc, input purging, etc.
Taint propagation takes place after the OoO core
◦ Can be turned off and completely bypassed if unnecessary
The normal Commit becomes Pre-CoMmiT
A software handler receives 4 arguments:
◦ OpCode, Reg1 State, Reg2 State, Mem State
And returns the output state and whether an exception should be raised
Handler address stored in TPCHR
◦ Restricted access register
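As a rough illustration of the handler interface (4 inputs, 2 outputs), here is a minimal sketch of a 1-bit, TaintCheck-like policy. The opcode names and the exact rules are assumptions made for the example; the slides only fix the argument list and the return values.

```python
# Hypothetical opcode names; the real encoding depends on the ISA.
ADD, LOAD, STORE, JUMP_REG = "add", "load", "store", "jr"


def taintcheck_handler(opcode, reg1_state, reg2_state, mem_state):
    """Return (output_state, raise_exception) for 1-bit taint states."""
    if opcode == JUMP_REG:
        # Assumed rule: jumping through a tainted register raises the alarm.
        return 0, reg1_state != 0
    if opcode == LOAD:
        # Destination register inherits the taint of the loaded word.
        return mem_state, False
    if opcode == STORE:
        # Memory word inherits the taint of the stored register.
        return reg1_state, False
    # Other (ALU) instructions: result is tainted if any source is tainted.
    return reg1_state | reg2_state, False
```

Because the result depends only on the four inputs, the same call always gives the same answer.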
The answer of the S/W handler for the same inputs will always be the same
◦ Cache it
128-entry direct-mapped response cache
Indexed by opcode, Reg1 state, Reg2 state, Mem state (folded into 7 bits)
Stores the output state and exception bit
Cleared every time the TPCHR (software handler address register) is changed
◦ Usually on a context switch
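The response cache can be modelled in a few lines. This is a sketch under stated assumptions: the XOR fold used for the 7-bit index, the full-tag comparison, and the class and method names are all illustrative, since the slides do not say how the inputs are folded.

```python
TPC_ENTRIES = 128  # from the slides: 128-entry, direct-mapped


def fold_index(opcode, r1, r2, mem):
    """Assumed XOR fold of the four inputs into a 7-bit index."""
    key = (opcode << 48) | (r1 << 32) | (r2 << 16) | mem
    index = 0
    while key:
        index ^= key & 0x7F
        key >>= 7
    return index


class TPCache:
    """Direct-mapped cache of software handler responses."""

    def __init__(self, handler):
        self.handler = handler
        self.entries = [None] * TPC_ENTRIES
        self.misses = 0

    def set_handler(self, handler):
        # Models a TPCHR write: cached answers of the old handler are dropped.
        self.handler = handler
        self.entries = [None] * TPC_ENTRIES

    def lookup(self, opcode, r1, r2, mem):
        tag = (opcode, r1, r2, mem)
        index = fold_index(*tag)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]  # hit: (output_state, exception)
        self.misses += 1  # miss: fall back to the software handler
        result = self.handler(*tag)
        self.entries[index] = (tag, result)
        return result
```

In this model only the first occurrence of each input combination pays for a handler call, until the handler is replaced.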
After the OoO core has ended. Size of the Architectural Register File, NOT the physical one
State of Reg0 hardwired to 0
Reserved for instructions that touch memory
Example: For instructions that do not touch memory◦ Remember RISC ISA
ALARM!
128-entry, direct-mapped; cleared when TPCHR changes
Suppresses silent stores
Example: Stores
Still, TPCache lookups take 1 cycle
If dependent instructions are retired in the same cycle, the in-order taint propagation will stall
◦ Puts pressure on the physical register file and ROB
Well, usually 0..00 is good, and when zeroes are combined, the result is 0..00
Also, if only one input is non-zero, then usually you have unary propagation
Create a table to store that
Stores a 2-bit value for each opcode (256)
◦ 512 bits total, must be saved on a context switch
Really fast lookups, allows for same-cycle propagation
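The filter table can be sketched as follows. The encoding is an assumption inferred from the later examples on the slides (01 for the zero rule only, 10 for the zero rule plus unary propagation); treating 00 as "always ask the TPCache" is also assumed.

```python
# Assumed 2-bit encoding per opcode.
USE_TPC, ZERO_ONLY, ZERO_AND_UNARY = 0b00, 0b01, 0b10


def filter_lookup(filter_tpt, opcode, source_states):
    """Return the output taint if the filter can decide it, else None.

    None means the instruction must go to the TPCache (and, on a miss,
    to the software handler).
    """
    mode = filter_tpt[opcode]
    nonzero = [s for s in source_states if s != 0]
    if mode in (ZERO_ONLY, ZERO_AND_UNARY) and not nonzero:
        return 0  # all-zero sources produce a zero result
    if mode == ZERO_AND_UNARY and len(nonzero) == 1:
        return nonzero[0]  # unary propagation: copy the only non-zero state
    return None


def filter_tpt_bytes(num_opcodes=256, bits_per_entry=2):
    """Size of the table that the O/S saves on a context switch."""
    return num_opcodes * bits_per_entry // 8
```

With 256 opcodes and 2 bits each, the table is 512 bits, i.e. 64 bytes.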
4-stage in-order pipeline
◦ Receives non-speculative instructions
First 2 stages: look up
◦ Filter TPT
◦ L1-T
3rd stage: taint propagation
◦ TPC lookup
◦ Or trivial propagation through the Filter TPT
4th stage: commit
Summary of what the O/S needs to save on context switches
◦ TPCHR (handler address)
◦ FTCR (state size)
◦ MTBR (shadow state address)
◦ Filter TPT content (64 bytes)
The TPCache can simply be discarded
All state is in the address space of the application
◦ So swapping, virtualization, etc. work normally
Data and metadata are accessed in 2 different cycles
◦ Potential consistency issues
Solution for loads:
◦ Prefetch state when the data address is resolved
◦ If the state does not hit in the L1-T a few cycles later, replay the load
Solution for stores:
◦ Prefetch state (same as for loads)
◦ Write only when data and metadata both hit in the L1
Usually the L1-T access is a hit due to the prefetch
1st: TaintCheck, 1-bit state per word
◦ Allows for maximum optimization: 10 in the Filter TPT (unary propagation and zero optimization)
◦ TPCache and S/W will consider XOR R1,R1,R1 cases
2nd: 1-bit PointerCheck
◦ Stores which words are valid heap pointers
◦ Good for leak detection
◦ And something that Raksha cannot handle
◦ Filter TPT: 01 (non-pointers produce non-pointers)
3rd: A combination with 2-bit states
◦ Filter TPT: 01 (untainted non-pointers produce untainted non-pointers)
TaintCheck Rules 1-bit Heap PointerCheck
SESC simulator
8-core system
4-issue OoO superscalar cores @ 2.93 GHz
L1-D: 32 KB, 8-way set associative, dual-ported, 64-byte blocks
L2: 4 MB, 16-way set associative, single-ported, 64-byte blocks
◦ Small for an 8-core system
L1-T: 4 KB, 4-way set associative, dual-ported, 64-byte blocks
Bus: 64 bits wide @ 1333 MHz
~1% overhead for SPEC 2K and 4% for Splash2
◦ Splash2 is worse due to false sharing of metadata
Smaller cache line → less false sharing of metadata
L1-T size, SPEC 2K:
◦ 4 KB: ~1% overhead
◦ 8 KB: minimal gains
◦ 2 KB: 2.8% overhead
Conclusion: 4 KB is fine for 1- and 2-bit states
Use FlexiTaint to simulate previously proposed hardware
And implement the lifeguard that they couldn’t handle (1-bit Heap PointerCheck)
Obviously FlexiTaint proves better
Versatile scheme to handle most lifeguards with low overhead
Nice idea to cache the answer of the software handler
In general, a good idea
◦ With its limitations though (LockSet)
Questions?