Shimin Chen LBA Reading Group

Complete Information Flow Tracking from the Gates Up
Tiwari, Wassel, Mazloom, Mysore, Chong, Sherwood, UCSB, ASPLOS 2009


Introduction

• In a traditional microprocessor, information is leaked practically everywhere and by everything
• This can be a serious problem for exceptionally sensitive financial, military, and personal data
  • Cryptography, authentication
• Developers in these domains are willing to go to remarkable lengths to minimize the amount of leaked information:
  • flushing the cache before and after executing a piece of critical code (Osvik et al. 2006)
  • attempting to scrub the branch predictor state (Aciicmez et al. 2007)
  • normalizing the execution time of loops by hand (Kocher 1996)
  • randomizing or prioritizing the placement of data into the cache (Lee et al. 2005)
• Previous work on DIFT (dynamic information flow tracking) is not adequate

GLIFT: Gate-Level Information-Flow Tracking

• This paper presents a processor architecture and implementation that can track all information flows
• A novel logic discipline: GLIFT logic
  • Augment arbitrary logic blocks with tracking logic
  • Make compositions of augmented blocks
• Synthesizable processor implementation with a restricted ISA
  • Provably sound information-flow tracking
  • Allows tasks such as public-key cryptography and message authentication

Theoretical Understanding

• In a Turing-complete machine, the general problem of determining whether information flows in a program from variable x to variable y is undecidable: “any procedure purported to decide it could be applied to the statement if f(x) halts then y := 0 and thus provide a solution to the halting problem for arbitrary recursive function” (Denning and Denning 1977).
• The paper builds a machine that:
  • by construction, will not allow unbounded execution
  • makes all hidden flows of information explicit

Outline

• Introduction
• Gate Level Information Flow Tracking
• Architecture
• Evaluation
• Conclusions

Idea

• Understand how information flows through primitive logic gates
• Compose these gates together into more complex structures
• Treat the whole processor as a logical function
  • Operates on a set of inputs
  • Results in a set of outputs
  • The trust of outputs should be determined based on the trust of inputs
• Assumption: binary state, trusted (0) or untrusted (1)

GLIFT for an AND gate

[Figure panels: AND gate; AND gate truth table; shadow logic for the AND gate; partial truth table for the shadow logic]
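The figure panels are not reproduced here, so the following is a minimal Python sketch of what shadow logic for an AND gate can look like. It assumes the usual reading of "untrusted": an output is untrusted exactly when some change to the untrusted inputs could change it. The function names and the closed-form expression are illustrative and are not copied from the slides; the loop checks the closed form against the definition for every input combination.

```python
from itertools import product

def and_shadow(a, a_t, b, b_t):
    """Closed-form shadow logic for o = a AND b (taint bit 1 = untrusted)."""
    return (a & b_t) | (b & a_t) | (a_t & b_t)

def and_shadow_by_definition(a, a_t, b, b_t):
    """Output is untrusted iff varying the untrusted inputs can change it."""
    a_choices = (0, 1) if a_t else (a,)
    b_choices = (0, 1) if b_t else (b,)
    outputs = {x & y for x, y in product(a_choices, b_choices)}
    return 1 if len(outputs) > 1 else 0

# Compare the closed form with the definition on all 16 input combinations.
for a, a_t, b, b_t in product((0, 1), repeat=4):
    assert and_shadow(a, a_t, b, b_t) == and_shadow_by_definition(a, a_t, b, b_t)
```

Note the case that makes this more precise than simply OR-ing the taint bits: a trusted 0 on one input forces the output to 0, so the output stays trusted even if the other input is untrusted.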

Composing Larger Functions

• Use MUX as a simple example

• The shadow logic can be composed from shadow logics of gates

• The composed shadow logic is not minimal but is always sound; for example, it does not exploit the fact that the two inputs to the OR gate cannot both be 1

• If S is trusted and the selected input is trusted, o is trusted

• If S is untrusted, o is untrusted unless both a and b are trusted and are equal
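As a rough illustration of composition, the sketch below builds MUX shadow logic from per-gate shadow logic and compares it with a brute-force "precise" answer. The gate formulas, the function names, and the convention that S = 1 selects a are assumptions made for this example, not details taken from the slides.

```python
from itertools import product

def and_glift(a, a_t, b, b_t):
    # A trusted 0 on either input forces a trusted 0 output.
    return a & b, (a & b_t) | (b & a_t) | (a_t & b_t)

def or_glift(a, a_t, b, b_t):
    # Dual of AND: a trusted 1 on either input forces a trusted 1 output.
    return a | b, ((1 - a) & b_t) | ((1 - b) & a_t) | (a_t & b_t)

def not_glift(a, a_t):
    return 1 - a, a_t

def mux_composed(s, s_t, a, a_t, b, b_t):
    """o = (s AND a) OR (NOT s AND b), tracked gate by gate."""
    ns, ns_t = not_glift(s, s_t)
    x, x_t = and_glift(s, s_t, a, a_t)
    y, y_t = and_glift(ns, ns_t, b, b_t)
    return or_glift(x, x_t, y, y_t)

def mux_precise_taint(s, s_t, a, a_t, b, b_t):
    """Untrusted iff varying the untrusted inputs can change the MUX output."""
    choices = [(0, 1) if t else (v,) for v, t in ((s, s_t), (a, a_t), (b, b_t))]
    outs = {(sv & av) | ((1 - sv) & bv) for sv, av, bv in product(*choices)}
    return 1 if len(outs) > 1 else 0

imprecise = []
for case in product((0, 1), repeat=6):
    _, composed_t = mux_composed(*case)
    precise_t = mux_precise_taint(*case)
    assert composed_t >= precise_t      # sound: never misses a real flow
    if composed_t > precise_t:
        imprecise.append(case)          # conservative: flags a non-flow
```

In this sketch the composed logic is conservative when S is untrusted and a = b = 1 are both trusted: it marks o untrusted even though the output cannot change, which is the kind of non-minimality the bullets describe.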

Step 1: Handling Conditionals

• Problem with a conventional architecture: if the branch condition X is untrusted, then
  • the PC becomes untrusted
  • the selected instruction becomes untrusted
  • the bits that select the target register are untrusted
  • all of the registers may be marked as untrusted
• Must keep the PC trusted

Solution: Predication

• All the instructions are executed
• If the predicate is 0, the instruction has no effect: the target register is not overwritten
• The PC is trusted
• Predicates can become untrusted

Example

• Suppose P0 is untrusted
• The line selecting R2 is untrusted; the other control lines are trusted
• R2 will be marked untrusted whether P0 is 0 or 1
• End result: whether the untrusted predicate is true or not, the destination is marked as untrusted.
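The behavior in this example can be sketched as a small Python model of a predicated register write. This is a simplified, register-level approximation rather than the gate-level logic in the paper; `predicated_write`, `demo`, and the register contents are hypothetical names and values chosen for illustration.

```python
def predicated_write(regs, taint, dest, value, value_t, pred, pred_t):
    """Model 'if pred: regs[dest] = value' as a predicated move."""
    old, old_t = regs[dest], taint[dest]
    # The instruction always executes; the predicate only selects the result.
    regs[dest] = pred * value + (1 - pred) * old
    # Conservative rule: an untrusted predicate always taints the destination.
    taint[dest] = pred_t | (value_t if pred else old_t)

def demo(p0):
    regs = {"R1": 5, "R2": 9, "R3": 0}
    taint = {"R1": 0, "R2": 0, "R3": 0}
    # Predicated move R2 <- R1 under an untrusted predicate P0.
    predicated_write(regs, taint, "R2", regs["R1"], taint["R1"], pred=p0, pred_t=1)
    return regs, taint

for p0 in (0, 1):
    regs, taint = demo(p0)
    print(p0, regs["R2"], taint)
```

For both values of P0 only R2 ends up untrusted; R1 and R3 keep their trusted status because the destination selection itself never depended on untrusted data.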

Step 2: Handling Loops

• Loops are hard: for (i=0; i<=X; i++) A[i]=1;
• Information flows from X to A[X+1]
  • A[X+1]==0 tells us about X
• Information flows from X to A[X+n] for all n
• Implicit timing channel

Solution: Statically Specify Number of Iterations

• The countjump instruction specifies:
  • the number of loop iterations
  • the jump target address
• Example (my understanding from the description):
  Loop start address:
      …………
      countjump # iterations, loop start address
• The first time countjump is encountered, the # iterations is loaded into an internal loop counter register
• The loop counter register is decremented every time countjump is encountered, and PC ← loop start address
• When the register becomes 0, PC ← PC + 1
• countjump cannot be predicated
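One possible reading of these semantics is sketched below as a tiny Python interpreter. The instruction encoding, the single loop counter, and the exact order of "load, decrement, test" are assumptions made for the sketch; the paper may define the details differently.

```python
def run(program):
    """Execute a list of ('op', name) and ('countjump', n, target) tuples."""
    pc, counter, trace = 0, None, []
    while pc < len(program):
        instr = program[pc]
        if instr[0] == "countjump":
            _, iterations, target = instr
            if counter is None:          # first encounter: load the count
                counter = iterations
            counter -= 1                 # decremented on every encounter
            if counter > 0:
                pc = target              # PC <- loop start address
            else:
                counter = None           # loop finished: fall through
                pc += 1                  # PC <- PC + 1
        else:
            trace.append(instr[1])
            pc += 1
    return trace

program = [
    ("op", "init"),
    ("op", "body"),          # loop start address = 1
    ("countjump", 3, 1),
    ("op", "done"),
]
print(run(program))
```

Under this reading the body runs exactly three times, and the trip count is fixed by an immediate in the instruction, so no data value (trusted or not) can influence how long the loop runs.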

Early Termination

• In “C”, we have the “break” statement, which can terminate a loop early
• Here, the paper proposes:
  • Predicate all the instructions in the loop with the termination condition
  • When the termination condition becomes true, the loop body has no effect
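A hedged Python sketch of this idea: a linear search that would normally `break` on the first match instead runs for a fixed number of iterations, with the body predicated on "not yet terminated". The function name and the arithmetic form of the predicated write are illustrative choices, not the paper's code.

```python
def find_first(values, key):
    """Index of the first match (or -1), always scanning every element."""
    found = 0
    index = -1
    iterations = 0
    for i in range(len(values)):      # trip count does not depend on key
        iterations += 1
        hit = int(values[i] == key)
        p = hit & (1 - found)         # predicate: match and not yet terminated
        index = p * i + (1 - p) * index   # predicated write
        found = found | hit           # termination condition, once set stays set
    return index, iterations

print(find_first([5, 7, 7, 9], 7))
```

The loop still visits all four elements after the match at index 1; later iterations simply have no effect, which is what removes the timing channel at the cost of extra work.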

Step 3: Constraining Loads and Stores

• Indirect loads and stores are bad
  • e.g., M[reg] ← value
  • If reg is untrusted, then essentially all the memory locations become untrusted
  • “Intuitively, the problem is that accessing one untrusted address causes every other address to become implicitly untrusted by virtue of them not being accessed or modified.”
• Limit the ISA to only allow:
  • Direct load/store: addresses are immediate constants
  • Loop-relative addressing: load-looprel, store-looprel
    • e.g., load-looprel R0, 0x100, C0 loads M[0x100 + C0]
    • C0..C7 are counters: explicitly initialized by init-counter, and incremented by a fixed value with increment-counter
    • counter operations cannot be predicated
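The loop-relative addressing scheme can be sketched with a small Python model. The class name `LoopRelMachine`, the method names, and the memory contents are hypothetical; the point is only that every address is an immediate base plus a counter that advances by a constant, so addresses never depend on data.

```python
class LoopRelMachine:
    def __init__(self, size):
        self.mem = [0] * size
        self.mem_t = [0] * size      # shadow RAM: 1 = untrusted
        self.counters = [0] * 8      # C0..C7

    def init_counter(self, c, value):
        self.counters[c] = value     # value is an immediate constant

    def increment_counter(self, c, step):
        self.counters[c] += step     # step is an immediate constant

    def load_looprel(self, base, c):
        addr = base + self.counters[c]
        return self.mem[addr], self.mem_t[addr]

    def store_looprel(self, base, c, value, value_t):
        addr = base + self.counters[c]
        # Only the addressed word (and its shadow bit) is updated.
        self.mem[addr], self.mem_t[addr] = value, value_t

m = LoopRelMachine(size=0x200)
m.mem[0x100:0x104] = [10, 11, 12, 13]
m.mem_t[0x102] = 1                   # one untrusted word
m.init_counter(0, 0)
loaded = []
for _ in range(4):                   # fixed trip count
    loaded.append(m.load_looprel(0x100, 0))
    m.increment_counter(0, 1)
print(loaded)
```

Because the address stream is fully determined by constants, an untrusted value in one word taints only the data loaded from that word; the rest of memory stays trusted.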

Proof-of-Concept Implementation

• Written in Verilog
• Uses Altera’s Quartus II software to synthesize it onto a Stratix II FPGA
• 32-bit machine
• 64KB instruction memory, 64KB data memory
• Registers:
  • A program counter
  • 8 general purpose registers
  • 2 predicate registers
  • 8 registers to store loop counters (that count down the number of iterations)
  • 8 other registers to store explicit array indices (used as offsets for load-looprel and store-looprel instructions)
• No pipelining

Augment the Processor with GLIFT Logic

• Each bit of processor state is explicitly shadowed:
  • every register gets a shadow register
  • every memory has a shadow RAM
• The logic and signals are shadowed by generating the proper trust propagation logic

A code snippet from the SubBytes function in the AES encryption algorithm

Basically this is the following in “C”:

for (i=0; i<16; i++) { state[i] = SBox[state[i]]; }

Hardware Impact

Altera’s Nios is a commercial product: RISC instruction set, reasonably optimized

Nios econ: unpipelined 6 stage core, without caches, branch predictors, etc.

Nios std: pipelined, 4KB instruction cache

GLIFT base: unpipelined, no tracking

GLIFT full: GLIFT base + tracking

Hardware Impact

70% area increase compared to GLIFT base

Small frequency degradation: adding GLIFT tracking does not have a big impact on latency

Application Kernels

Dynamic instruction counts vary substantially

• FSM and AES have a lot of table look-ups, which become full table iterations
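To see why table look-ups inflate the dynamic instruction count, here is a rough Python sketch of a data-dependent lookup rewritten as a full table iteration with a predicated move. The function names are illustrative, and `toy_table` is a stand-in permutation, not the real AES S-box.

```python
def lookup_full_iteration(table, index):
    """Read table[index] by visiting every entry at a constant address."""
    result = 0
    reads = 0
    for i in range(len(table)):       # fixed trip count, independent of index
        reads += 1
        p = int(i == index)           # predicate (untrusted if index is)
        result = p * table[i] + (1 - p) * result   # predicated move
    return result, reads

def sub_bytes(state, table):
    """state[i] = table[state[i]] for each byte, with no indirect loads."""
    total_reads = 0
    out = []
    for byte in state:
        value, reads = lookup_full_iteration(table, byte)
        out.append(value)
        total_reads += reads
    return out, total_reads

toy_table = [(7 * x + 3) % 256 for x in range(256)]   # stand-in, NOT the AES S-box
state = list(range(16))
new_state, total_reads = sub_bytes(state, toy_table)
print(total_reads)
```

With a 256-entry table and a 16-byte state, each single-load lookup becomes 256 reads, so one pass over the state costs 16 × 256 = 4096 table reads instead of 16.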

Conclusions

• Bigger, slower, harder to program, and computationally less powerful
• For the first time, provides the ability to account for all information flows through the chip.

My learning:

• A deeper understanding of information leaks
• The effort required to prevent leaks is very significant
• Sacrifices programmability: restrictions on loops and load/store
• Proof-of-concept does not even talk about issues such as
