G. Venkataramani, I. Doudalis, Y. Solihin, M. Prvulovic HPCA ’08 Reading Group Presentation 02/14/2008

FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

Page 1: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

G. Venkataramani, I. Doudalis, Y. Solihin, M. Prvulovic HPCA ’08

Reading Group Presentation 02/14/2008

Page 2: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

Tainting schemes are extremely useful for security and debugging purposes
  ◦ E.g., TaintCheck, PointerCheck

Implemented in software
  ◦ Usually some kind of DBI
  ◦ Extremely versatile
  ◦ Really slow
  ◦ Problems with multithreaded apps, JIT compilation, and self-modifying code

Page 3: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

So, make hardware for it
  ◦ Multiple examples: Raksha, Minos, etc.
  ◦ Fast
  ◦ Can deal with the strange code that troubles S/W
  ◦ Extensive modifications required in the OoO core, caches, buses, and memories
  ◦ Limits the state which can be manipulated, usually to a few bits, easily managed by H/W
  ◦ So, who is going to implement it?

Solution: FlexiTaint
  ◦ Use H/W to accelerate what the S/W is doing: common-case propagation and metadata manipulation

Page 4: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

RISC ISA

Page 5: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

Taint state: 1 to 16 bits per word

1-level table in the application address space
  ◦ Protected from the application
  ◦ No need to widen buses, caches, etc.
  ◦ L1-T cache for taint bits: 4 KB for 2-bit states (no changes to L1-D, no port contention)
  ◦ Taint state shares L2

Page 6: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

2 registers for that
  ◦ MTBR (Memory Taint Base Register): start of the table
  ◦ FTCR (FlexiTaint Configuration Register): bits per word
  ◦ Both must be saved on a context switch by the O/S

All loads/stores prefetch taint state to L1-T

State 0..0 is assumed to be a safe one

State can be manipulated directly by special instructions
  ◦ Must be added somehow after special events: reading a file, malloc, input purging, etc.
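As a rough illustration of how a 1-level taint table could be indexed from MTBR and FTCR, here is a minimal sketch. The 4-byte word size, the densely packed layout, and the function name `taint_location` are assumptions for illustration; the slides do not spell out the exact address arithmetic.

```python
WORD_BYTES = 4  # assumption: 32-bit data words


def taint_location(data_addr, mtbr, bits_per_word):
    """Return (byte address, bit offset) of the taint state for data_addr.

    mtbr models the Memory Taint Base Register (start of the table);
    bits_per_word models the FTCR setting (1 to 16 bits per word).
    """
    if not 1 <= bits_per_word <= 16:
        raise ValueError("taint state must be 1 to 16 bits per word")
    word_index = data_addr // WORD_BYTES      # which data word is accessed
    bit_index = word_index * bits_per_word    # position in the packed table
    return mtbr + bit_index // 8, bit_index % 8
```

With 2-bit states, two neighbouring words map into the same taint byte, which is why a small L1-T can cover a much larger data footprint.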

Page 7: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

Takes place after the OoO core
  ◦ Can be turned off and completely bypassed if unnecessary

The normal Commit becomes Pre-CoMmiT

A software handler receives 4 arguments:
  ◦ OpCode, Reg1 state, Reg2 state, Mem state

And returns the output state and whether an exception should be raised

Handler address stored in TPCHR
  ◦ Restricted-access register
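To make the handler interface concrete, here is a minimal sketch of a handler with the four inputs and two outputs listed above. The policy is a simplified TaintCheck-like one chosen for illustration, and the opcode numbers, the `Verdict` type, and the name `taint_handler` are all hypothetical; they are not taken from the paper.

```python
from collections import namedtuple

# Output of the handler: the new taint state and an exception flag.
Verdict = namedtuple("Verdict", ["out_state", "raise_exception"])

# Hypothetical opcode numbers, for illustration only.
OP_ADD, OP_LOAD, OP_JUMP_REG = 0x10, 0x20, 0x30


def taint_handler(opcode, reg1_state, reg2_state, mem_state):
    """Simplified, assumed policy: OR the source taints, flag tainted jumps."""
    if opcode == OP_JUMP_REG:
        # An indirect jump through a tainted register raises an exception.
        return Verdict(0, reg1_state != 0)
    if opcode == OP_LOAD:
        # A load copies the taint of the memory word into the destination.
        return Verdict(mem_state, False)
    # Default: the result is tainted if either source register is tainted.
    return Verdict(reg1_state | reg2_state, False)
```

Because the result depends only on the four arguments, the same inputs always yield the same answer.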

Page 8: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

The answer of the S/W handler for the same inputs will be the same
  ◦ Cache it

128-entry direct-mapped response cache

Indexed by opcode, Reg1 state, Reg2 state, Mem state (folded into 7 bits)

Stores the output state and exception bit

Cleared every time the TPCHR (software handler address register) is changed
  ◦ Usually on context switch

Page 9: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

After the OoO core has ended. Size of the Architectural Register File, NOT the physical one

State of Reg0 hardwired to 0

Reserved for instructions that touch memory

Example: For instructions that do not touch memory
  ◦ Remember: RISC ISA

ALARM!

128-entry, direct mapped

Cleared when TPCHR changes

Page 10: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

Suppresses silent stores

Example: Stores

Page 11: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

Still, TPCache lookups take 1 cycle

If dependent instructions are retired in the same cycle, the in-order taint propagation will stall
  ◦ Puts pressure on the physical register file and ROB

Well, usually 0..00 is good, and when zeroes are combined, the result is 0..00

Also, if only one input is non-zero, then usually you have unary propagation

Create a table to store that

Page 12: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

Stores a 2-bit value for each of the 256 opcodes
  ◦ 512 bits total, must be saved on context switch

Really fast lookups, allows for same-cycle propagation
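The filter can be sketched as a 256-entry table of 2-bit modes. The encoding below is read off the examples given later in the slides (10 for TaintCheck: zero optimization plus unary propagation; 01 for PointerCheck: zero optimization only); treating any other value as "no filtering" and the names used here are assumptions.

```python
NO_FILTER, ZERO_RULE, ZERO_AND_UNARY = 0b00, 0b01, 0b10


class FilterTPT:
    """Assumed model of the per-opcode filter table."""

    def __init__(self, default=NO_FILTER):
        # 256 opcodes x 2 bits = 512 bits = 64 bytes of state.
        self.entries = [default] * 256

    def try_propagate(self, opcode, source_states):
        """Return the output state, or None if the TPCache must be consulted."""
        mode = self.entries[opcode]
        nonzero = [s for s in source_states if s != 0]
        if mode in (ZERO_RULE, ZERO_AND_UNARY) and not nonzero:
            return 0            # all-zero inputs produce the all-zero state
        if mode == ZERO_AND_UNARY and len(nonzero) == 1:
            return nonzero[0]   # unary propagation of the single non-zero input
        return None             # not trivial: fall back to the TPCache
```

An instruction with two non-zero sources, such as the XOR R1,R1,R1 case, is not filtered and still goes to the TPCache and the software handler.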

Page 13: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

4-stage in-order pipeline
  ◦ Receives non-speculative instructions

First 2 stages: lookup
  ◦ Filter TPT
  ◦ L1-T

3rd stage: taint propagation
  ◦ TPCache lookup
  ◦ Or trivial propagation through the Filter TPT

4th stage: commit

Page 14: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

Summary of what the O/S needs to save on context switches
  ◦ TPCHR (handler address)
  ◦ FTCR (state size)
  ◦ MTBR (shadow state address)
  ◦ Filter TPT content (64 bytes)

The TPCache can simply be discarded

All state is in the address space of the application
  ◦ So swapping, virtualization, etc. work normally

Page 15: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

Data and metadata are accessed in 2 different cycles
  ◦ Potential consistency issues

Solution for loads:
  ◦ Prefetch state when the data address is resolved
  ◦ If the state does not hit in the L1-T a few cycles later, replay the load

Solution for stores:
  ◦ Prefetch state (same as for loads)
  ◦ Write only when data and metadata both hit in the L1

The L1-T access is almost always a hit thanks to the prefetch

Page 16: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

1st: TaintCheck, 1-bit state per word
  ◦ Allows for maximum optimization: 10 in the Filter TPT (unary propagation and zero optimization)
  ◦ TPCache and S/W will consider XOR R1,R1,R1 cases

2nd: 1-bit PointerCheck
  ◦ Stores which words are valid heap pointers
  ◦ Good for leak detection
  ◦ And something that Raksha cannot handle
  ◦ Filter TPT: 01 (non-pointers produce non-pointers)

3rd: A combination with 2-bit states
  ◦ Filter TPT: 01 (untainted non-pointers produce untainted non-pointers)

Page 17: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

TaintCheck Rules 1-bit Heap PointerCheck

Page 18: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

SESC simulator

8-core system

4-issue OoO superscalar cores @ 2.93 GHz

L1-D: 32 KB, 8-way set associative, dual ported, 64-byte blocks

L2: 4 MB, 16-way set associative, single ported, 64-byte blocks
  ◦ Small for an 8-core system

L1-T: 4 KB, 4-way set associative, dual ported, 64-byte blocks

Bus: 64 bits wide @ 1333 MHz

Page 19: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

~1% overhead for SPEC 2K and 4% for Splash2

Splash2 is worse due to false sharing of metadata
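A back-of-the-envelope calculation suggests why metadata is prone to false sharing: one taint cache line summarizes many data lines. The sketch below assumes 4-byte data words and a densely packed taint table, neither of which is stated in the slides; the function name is hypothetical.

```python
def data_bytes_per_taint_line(line_bytes, bits_per_word, word_bytes=4):
    """Bytes of application data whose taint state fits in one taint line."""
    words_covered = line_bytes * 8 // bits_per_word
    return words_covered * word_bytes


# 64-byte taint line with 1-bit states: 512 words, i.e. 2048 bytes of data.
coverage = data_bytes_per_taint_line(64, 1)
# Number of 64-byte data lines that share this single taint line.
data_lines_sharing = coverage // 64
```

Under these assumptions, threads writing to 32 different data lines can all contend for the same taint line, and halving the taint line size halves that number.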

Page 20: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation
Page 21: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation
Page 22: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

Smaller Cache line → Less false sharing of Metadata

Page 23: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

4 KB: ~1% overhead for SPEC 2K

8 KB: minimal gains

2 KB: 2.8% overhead

Conclusion: 4 KB is fine for 1- and 2-bit states

Page 24: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

Use FlexiTaint to simulate previously proposed hardware

And implement the lifeguard that they couldn’t handle (1-bit Heap PointerCheck)

Obviously FlexiTaint proves better

Page 25: FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

Versatile scheme to handle most lifeguards with low overhead

Nice idea to cache the answer of the software handler

In general, a good idea
  ◦ With its limitations, though (LockSet)

Questions?