Korea University, G. Lee, 2009

CRE652 Processor Architecture: Making it Trustworthy
Trustworthy Computing and Branch Prediction


Trusting Buggy Software

Development and installation of computer systems in general, and software in particular:

• No guarantee at any stage
• No design proven correct!
• No implementation proven bug-free!

Trustworthiness in Computing
Example - SSH Communications SSH Server

    void do_authentication(char *user, ...) {
        int auth = 0;
        ...
        while (!auth) {
            /* Get a packet from the client */
            type = packet_read();
            switch (type) {
            ...
            case SSH_CMSG_AUTH_PASSWORD:
                if (auth_password(user, password))
                    auth = 1;
            case ...
            }
            if (auth) break;
        }
        /* Perform session preparation. */
        do_authenticated(…);
    }

auth = 1: session starts with invalid authentication

My View

Difference in Program Behavior Space

[Figure: three behavior sets, expected, described and executed, with a "?" between expected and described and a "?" between described and executed]

• Description not proven correct w.r.t. Expected
• Execution with no cross-checking w.r.t. Description

First Issue: Protection Measure Precision

[Figure: the set of reachable states in program execution compared with the set of secure (or certified) states captured in digest. Labels: Secure (w/ false negatives), Precise, Broad (insecure) (w/ false positives)]

Neither set is clearly defined or known!

Second Issue: Semantic Gap

    …
    next:   read(a);
    …..
    assign: X := a
            if not_in(X, set) then goto next else goto print;
    …..
    print:  print(whatever);
    …..
            return

Program semantics or behavior specified in program control and data flow exists only in the user's/programmer's mind:
• Isolated instruction instance
• Blinded instruction sequencing

Program Behavior Validation

1. Empirical capture of {Executed Behavior} for {Expected Behavior}
2. SW-transparent micro-architecture-level validation

[Figure: expected, described and executed behavior sets with "?" between them; Validation applied to the executed behavior]

Representing Program Behavior from the processor's perspective

For control flow:

    …
    pc: jr $r6
    …

Validate at each indirect branch
• Legitimacy of in-flow
• Legitimacy of out-flow
at micro-architecture

Where it comes from AND where it goes AND where it is?

Unique ID for each dynamic instance of instruction

Build up legitimate {Unique ID} empirically over time

Program Attribute Triplet (PAT)

• PC + target + Branch History (EP)
• dynamic behavior signature

    …
    jr $r3
    …
    if (b0) then { … b1=1 .. } else …
    …
    if (b1) then .. else …
    …
    pc: jr $r6
    …

PAT = (pc)||($r6)||(b1b0): Unique ID
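The concatenation that forms a PAT can be sketched in C. This is only an illustration: the field widths (24 bits each for the branch PC and the target, 8 bits of EP history) and the names `ep_update` and `pat_make` are assumptions, not taken from the slides.

```c
#include <stdint.h>

/* Hypothetical field widths: 24 bits of branch PC, 24 bits of target PC,
   8 bits of execution-path (EP) history. The slides do not fix these. */
#define PAT_PC_BITS 24
#define PAT_EP_BITS 8

/* Shift one conditional-branch outcome (1 = taken) into the EP history.
   The newest outcome lands in the least significant bit. */
static uint8_t ep_update(uint8_t ep, int taken)
{
    return (uint8_t)((ep << 1) | (taken ? 1u : 0u));
}

/* PAT = (pc) || (target) || (EP): plain bit concatenation. */
static uint64_t pat_make(uint32_t pc, uint32_t target, uint8_t ep)
{
    uint64_t mask = (1ull << PAT_PC_BITS) - 1;
    return ((uint64_t)(pc & mask) << (PAT_PC_BITS + PAT_EP_BITS))
         | ((uint64_t)(target & mask) << PAT_EP_BITS)
         | (uint64_t)ep;
}
```

The same indirect branch reached along a different path yields a different EP and therefore a different PAT, which is what makes the triplet a signature of dynamic behavior rather than of the static instruction.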

Protection via PAT Validation: Example - SSH Communications SSH Server

    void do_authentication(char *user, ...) {
        int auth = 0;
        ...
        while (!auth) {
            /* Get a packet from the client */
            type = packet_read();    /* integer overflow */
            switch (type) {
            ...
            case SSH_CMSG_AUTH_PASSWORD:
                if (auth_password(user, password))
                    auth = 1;
            case ...
            }
            if (auth) break;
        }
        /* Perform session preparation. */
        do_authenticated(…);
    }

auth = 1: PAT = (pc|TPC|EP) is invalid!

Execution Path based Validation

• PAT - How many? Training: Convergence in PATs

[Figure: bar chart for apache, ftpd, sshd and telnetd, vertical axis 0 to 16000; series: IBP=BPC|TPC, PAT=IBP+4-bit EP, PAT=IBP+6-bit EP, PAT=IBP+8-bit EP]

{PAT} = Expected Behavior Space

Validation Flow in Micro-Architecture

Micro-Architecture with Branch Prediction

[Figure: pc = branch instruction address and its TPC feed branch prediction (BTB, global BHSR); next instructions are fetched with the predicted target; branch verification follows; on misprediction, next instructions are fetched with the verified target]

Attack: modified TPC, or reached PC out of sequence

Validation Flow in Micro-Architecture

With Enhanced Branch Predictor for Validation

[Figure: pc = branch instruction address, its TPC and the preceding EP feed branch prediction (BTB extended with an EP buffer); next instructions are fetched with the predicted target; branch verification follows; on misprediction or EP miss, the Bloom filter for PAT (hash functions indexing a bit vector, output 0 or 1) is consulted; the outcomes are "fetch next instructions with validated target" or an "invalid" exception]

Ref. Yixin Shi and Gyungho Lee, “Augmenting Branch Predictor for Secure Program Execution”, Proc. 37th IEEE Dependable Systems and Networks (DSN 2007), pp. 10-19, June 2007
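The Bloom filter lookup can be sketched in software as follows. This is a minimal sketch under assumptions: four hash functions, a stand-in integer mixer instead of hardware hashing, and the names `pat_bloom`, `bloom_train` and `bloom_valid` are all made up for illustration. Only the 256 Kbit vector size comes from the slides.

```c
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS (256u * 1024u)  /* 256 Kbit vector, as in the slides */
#define BLOOM_HASHES 4             /* assumed number of hash functions */

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} pat_bloom;

/* Stand-in 64-bit mixer (splitmix64-style finalizer), varied per hash index.
   It only illustrates the idea and is not the hardware hash. */
static uint32_t bloom_index(uint64_t pat, unsigned k)
{
    uint64_t x = pat + 0x9E3779B97F4A7C15ull * (k + 1);
    x ^= x >> 30; x *= 0xBF58476D1CE4E5B9ull;
    x ^= x >> 27; x *= 0x94D049BB133111EBull;
    x ^= x >> 31;
    return (uint32_t)(x % BLOOM_BITS);
}

static void bloom_init(pat_bloom *b)
{
    memset(b->bits, 0, sizeof b->bits);
}

/* Training: record a PAT observed during a trusted run. */
static void bloom_train(pat_bloom *b, uint64_t pat)
{
    for (unsigned k = 0; k < BLOOM_HASHES; k++) {
        uint32_t i = bloom_index(pat, k);
        b->bits[i / 8] |= (uint8_t)(1u << (i % 8));
    }
}

/* Validation: returns 1 if every indexed bit is set (PAT may be legitimate),
   0 if any bit is clear (PAT was never trained). */
static int bloom_valid(const pat_bloom *b, uint64_t pat)
{
    for (unsigned k = 0; k < BLOOM_HASHES; k++) {
        uint32_t i = bloom_index(pat, k);
        if (!(b->bits[i / 8] & (1u << (i % 8))))
            return 0;
    }
    return 1;
}
```

A Bloom filter never rejects a trained PAT, but it can accept an untrained one with small probability, so the filter size and hash count trade storage against missed detections.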

Validation Unit (outside the Branch Predictor)

time-mux'd with H3 hash functions

Hashing (H3) logic   256K-bit vector   Output buffer   Total delay
1.48 ns              1.062 ns          0.99 ns         3.532 ns

[Figure: BPC||TPC||EP feeds four H3 hash units (matrix rows Q00 to Q31), which index the 256 K bit array; the output buffer answers "Found?"; total delay 3.532 ns x n]

• estimated by a Verilog HDL implementation and a synthesis with TSMC's 0.09 um library
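An H3 hash XORs together the rows of a fixed random binary matrix Q that are selected by the 1 bits of the input, which is why it maps well to hardware. The C sketch below assumes a 64-bit input and an 18-bit output (2^18 = 256K bit positions). The matrix contents, the seed and the names `h3_fn`, `h3_init` and `h3_hash` are illustrative only.

```c
#include <stdint.h>

#define H3_IN_BITS 64   /* assumed width of BPC||TPC||EP */
#define H3_OUT_BITS 18  /* 2^18 = 256K bit positions */

/* One H3 function is a fixed random binary matrix Q, one row per input bit. */
typedef struct {
    uint32_t row[H3_IN_BITS];
} h3_fn;

/* Fill Q from a small xorshift generator; the seed is arbitrary. */
static void h3_init(h3_fn *h, uint32_t seed)
{
    uint32_t s = seed ? seed : 1u;
    for (int i = 0; i < H3_IN_BITS; i++) {
        s ^= s << 13; s ^= s >> 17; s ^= s << 5;
        h->row[i] = s & ((1u << H3_OUT_BITS) - 1);
    }
}

/* h(x) = XOR of the rows of Q selected by the 1 bits of x. */
static uint32_t h3_hash(const h3_fn *h, uint64_t x)
{
    uint32_t out = 0;
    for (int i = 0; i < H3_IN_BITS; i++)
        if ((x >> i) & 1u)
            out ^= h->row[i];
    return out;
}
```

In hardware the loop disappears: each output bit is one XOR tree over a fixed subset of input bits, so several such functions can share the bit array by time-multiplexing.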

Performance Simulation

4-issue processor

Parameter                    Value
BTB                          512 sets, 4-way set-associative
RAS                          8 entries
Branch miss penalty          7 cycles
Pipeline stages              9
Branch predictor             g-share, 12 bits history, 2048 entries
Fetch/dispatch/issue width   4
RUU size                     64 entries
Load/Store Queue             32 entries
I-cache                      64K, 2-way set-associative, 2-cycle hit time, LRU
D-cache                      64K, 4-way set-associative, 2-cycle hit time, LRU
L2 cache                     Unified, 512KB, 4-way set-associative
L2 access time               10 cycles
Function units               4 Int ALUs, 1 Int MUL/DIV, 4 FP Adders, 1 FP MUL/DIV
Memory                       100 cycles access time, 2 memory ports

Performance Impact

[Figure: IPC (0.00 to 2.50) for bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, perl, twolf, vortex, vpr and AVG; series: No validation, 20-cycle validation delay, 25-cycle validation delay]

<2% performance overhead on average

(~3 GHz) 4-issue processor with EP length = 8 and EP buffer = 4; Bloom filter = 256 Kbit, twice to have 8 hashes: 7 ~ 8 ns
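The relation between the 7 ~ 8 ns Bloom filter delay and the simulated 20- and 25-cycle validation delays is simple arithmetic, sketched below. The helper name `delay_cycles` is hypothetical. Reading "twice" as two passes of the 3.532 ns unit from the previous slide is one plausible interpretation, not something the slides state.

```c
/* Convert a delay in nanoseconds to clock cycles, rounded up.
   clock_ghz is the clock frequency in GHz, i.e. cycles per nanosecond. */
static int delay_cycles(double delay_ns, double clock_ghz)
{
    double cycles = delay_ns * clock_ghz;
    int whole = (int)cycles;
    return (cycles > (double)whole) ? whole + 1 : whole;
}
```

At 3 GHz, 7 ns is 21 cycles and 8 ns is 24 cycles, so the 20- and 25-cycle settings roughly bracket that range.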

Performance Overhead

Order of magnitude less than other approaches:

CFI in-line instrumentation (applicable for static linking only)
    crafty    45%
    gcc       10%
    average   21%

Program Shepherding with trace cache (w. monitoring overhead)
    crafty    4% (209%)
    gcc       625% (760%)
    average   12% (32%)
    includes some fp benchmarks

PAT validation
              in HW    in SW (w. interrupt overhead)
    crafty    2.4%     17% (120%)
    gcc       0.3%     6% (24%)
    avg       0.9%     14% (29%)

Ref. M. Abadi, et al., “Control Flow Integrity: Principles, Implementations, and Applications”, ACM CCS'05, 2005

Ref. V. Kiriansky, et al., “Secure Execution via Program Shepherding”, Proc. Usenix Security Symposium, 2002

Summary

Issues:
• Generating {PATs}
    • at various stages of program development/use: testing, compile-time flow analysis, training, etc.
• Managing {PATs}
    • How to incorporate into program code
        • a part of object code; similar to PLT
    • What to do at invalid exception
        • criteria for new legitimate flow or attack; control flow integrity policy and supporting tool
    • How to secure {PATs}
        • attack focus moves to {PATs}; encryption and read-only

Behavior Monitoring-Analysis tool

System software changes

Security Policy: access control on control flow
{PAT} behavior proof; Public Key based DRM with support from TPM

Trusting Program Behavior

{PATs}: Fine Grain Program Behavior Signature
• Server Applications
• Industrial Control System
    • SCADA
    • Embedded SW
• Other Key Software
    • OS Kernel

Empirical Build-Up of Trust over time

Signatures for Dynamic Data Flow

Program Counter (PC) Encoding

Encode PC-bound data at definition and decode them before de-reference at PC loading

• Tight security
    • no gap between object to be protected and protection
• Little performance penalty
    • Just one machine instruction (XOR)
    • Checking only PC-bound data
• No compatibility issue
    • Nothing, code and memory layout, has changed
• No new HW or architecture change
• Encoding/decoding key
    • stack (or frame) pointer or from TSC

PC-encoding

• Encode PC-bound variable at its definition
• Decode prior to uploading PC-bound variable to PC
• PC-bound variables:
    • Return Address
    • Old Frame (Base) Pointer
    • Function Pointer
    • Function Pointer passed as parameter
    • Longjmp buffer pointer
    • Longjmp buffer pointer passed as parameter
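Since the encoding is a single XOR, encode and decode are the same operation. A minimal sketch follows. The names `pc_encode` and `pc_decode` are hypothetical, and where the key comes from is left to the caller.

```c
#include <stdint.h>

/* Encode a PC-bound value (return address, function pointer, ...) when it
   is defined. The key source is left open here; the slides suggest the
   stack/frame pointer or the TSC. */
static uintptr_t pc_encode(uintptr_t value, uintptr_t key)
{
    return value ^ key;
}

/* Decode just before the value is loaded into the PC.
   XOR is its own inverse, so the same key restores the value. */
static uintptr_t pc_decode(uintptr_t stored, uintptr_t key)
{
    return stored ^ key;
}
```

With any non-zero key, the value held in memory differs from the real address, and only a decode with the same key restores it.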

PC-encoding: function pointer attack example

Code under attack: VULPROG.c

    …
    static int (*funcptr)(..);
    static char buf[BUFSIZE];
    funcptr = goodfunc;           /* Encoding */
    /* Overflow funcptr */
    strncpy(buf, argv[1], …);
    …
    (void)(*funcptr)(..);         /* Decoding */
    …

Attack:
1. Guess the address of “system()”.
2. Add the address to the end of buf[BUFSIZE].
3. execl(VULPROG, VULPROG, buf, …)

The program the attacker specified is NOT executed, due to decoding failure.
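Why the overwritten pointer fails to reach the attacker's target can be shown with a simulated memory image. This sketch avoids a real out-of-bounds write by copying the payload over a struct that stands in for the adjacent `buf` and `funcptr` storage. The struct layout, the addresses and the function name `attacked_pc` are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUFSIZE 16

/* Simulated memory image: buf sits directly below the function-pointer slot. */
struct image {
    char buf[BUFSIZE];
    uintptr_t funcptr_slot;   /* holds the ENCODED pointer */
};

/* Returns the address the PC would receive after the simulated overflow. */
static uintptr_t attacked_pc(uintptr_t good, uintptr_t target, uintptr_t key)
{
    struct image m;
    unsigned char payload[sizeof m];

    memset(&m, 0, sizeof m);
    m.funcptr_slot = good ^ key;      /* encoding at definition */

    /* Attacker: filler bytes followed by the guessed raw address. */
    memset(payload, 'A', sizeof payload);
    memcpy(payload + offsetof(struct image, funcptr_slot),
           &target, sizeof target);
    memcpy(&m, payload, sizeof m);    /* simulated overflow of buf */

    return m.funcptr_slot ^ key;      /* decoding before the jump */
}
```

The attacker wrote a raw address, so decoding turns it into `target ^ key`, which is neither the intended function nor the attacker's target. An attacker who knew the key could pre-encode the address, which is why the choice of key matters.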

PC-Encoding at Linking

Explicit PC-bound variables
• RET address in stack
    • PC-encoding at compiler
• longjmp() buffer pointer
    • PC-encoding of return address at setjmp()
• (static) function pointer

Identifying when to encode PC-bound data beyond explicit ones, e.g.
• Dynamic function calls
    • trap vector table, dynamically linked library, etc.

PC-Encoding at (Dynamic) Linking

[Figure: dynamic linking through the PLT and GOT, steps 1 to 5. text segment: "… call lib_f …"; "PLTf: jump *GOT[f]", "push offset into stack", "jump PLT0". data segment: "GOT[f]". Linker. Shared library f: "lib_f: …"]

PC-Encoding at Linking

[Figure: the same PLT/GOT diagram, annotated with where encoding and decoding take place]

PC-Encoding at Linking

function pointers and function label

    int (*funcptr)(..);
    int (*funcptrcp)(..);
    funcptr = goodfunc;
    funcptrcp = funcptr;
    (void)(*funcptr)(..);
    (void)(*funcptrcp)(..);

• Encoding: function label at linking
• Decoding: at de-referencing of function pointers at run-time
• No need for pointer variable tracking

PC-Encoding Issues

• Replay attack vulnerable
    • Guessing encoding key, i.e. $sp (or $fp)
• Recompilation needed
    • Applicable to open source only
• Unusual function pointer: arithmetic expression, e.g.

    static int (*funcptr)(..);
    static int (*anotherfuncptr)(..);
    …
    unsigned int tmp;
    …
    funcptr = goodfunc;
    …
    anotherfuncptr = funcptr + tmp + 4;
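The arithmetic case is a problem because XOR encoding does not commute with addition. The sketch below uses plain integers in place of pointers. The function names and the sample values are assumptions for illustration.

```c
#include <stdint.h>

/* Arithmetic applied directly to an encoded pointer, then decoded. */
static uintptr_t add_while_encoded(uintptr_t ptr, uintptr_t off, uintptr_t key)
{
    uintptr_t stored = ptr ^ key;   /* encoded value */
    stored += off;                  /* arithmetic on the encoded value */
    return stored ^ key;            /* decode before use */
}

/* Decode first, do the arithmetic, then re-encode: the value that a
   compiler aware of the encoding would have to store instead. */
static uintptr_t add_after_decode(uintptr_t stored, uintptr_t off, uintptr_t key)
{
    return ((stored ^ key) + off) ^ key;
}
```

With ptr = 0x1000, off = 4 and key = 0xFFFF, adding while encoded decodes to 0x0FFC instead of 0x1004, so pointer arithmetic on encoded values needs explicit decode and re-encode steps around it.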

PC-Encoding Key

Desirable:
• Random: no lucky guess
• No repeated sequence: no replay
• Simple: no overhead

NOTE:
• Crypto key: too much overhead
• Physical/natural random

PC-Encoding Key

Time Stamp Counter
• increases every cycle
• non-sequential reads, i.e. no guarantee on the sequence of reads before and after machine instructions

Chi-Square test:

          Entropy   Reduce %   Chi-Square value, %   Arith. mean   Monte Carlo Pi error %   Serial correlation coefficient
TSC       7.99963   0          245.08, 52.75%        127.506224    0.8526                   -0.106322
C rand    7.99961   0          267.51, 25%           127.4948      0.22                     -0.000333
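For reference, a chi-square statistic over byte values can be computed as below. This is a generic sketch, not the tool that produced the table. The function name `chi_square_bytes` and the choice of 256 equally likely byte values as the null hypothesis are assumptions.

```c
#include <stddef.h>

/* Chi-square statistic of a byte sample against the uniform distribution
   over 256 values: sum over v of (observed[v] - expected)^2 / expected. */
static double chi_square_bytes(const unsigned char *data, size_t n)
{
    size_t count[256] = {0};
    double expected = (double)n / 256.0;
    double chi2 = 0.0;

    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        count[data[i]]++;
    for (int v = 0; v < 256; v++) {
        double d = (double)count[v] - expected;
        chi2 += d * d / expected;
    }
    return chi2;
}
```

With 256 categories the statistic has 255 degrees of freedom, so a uniform source gives values around 255 on average; both values in the table are of that order.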

PC-Encoding Key Storage

• Stack/frame pointer: register
• Procedure specific: object header
• Separate protected area
    • TPM and its extended memory area

PC-Encoding Efficacy

Protection from control-flow-altering attacks, including buffer overflow and format string errors.

e.g. Buffer overflow; 20 different attack cases

Tool                                   Attacks prevented   Attacks missed   Error
StackGuard                             4 (20%)             16 (80%)         0
Stack Shield (Global & Range check)    6 (30%)             14 (70%)         0
Libsafe                                4 (20%)             16 (80%)         0
ProPolice                              10 (50%)            9 (45%)          1 (memory fault)
PC-encoding                            20 (100%)           0                0

Ref. J. Wilander and M. Kamkar, “A Comparison of Publicly Available Tools for Dynamic Buffer Overflow Prevention”, Proc. Network and Distributed System Security Symp., 2003

PC-Encoding Efficacy

• No protection from
    • Impossible path (mimicry) attacks due to data corruption or unchecked trap
    • Trojan horse
• Encoding: weak in crypto
• Key: vulnerable
• Trade-off between complexity and efficacy

PC-encoding provides tamper resistance to most control-flow-altering attempts, but no protection from control flow change by untrusted software or from impossible paths induced by compromised data.

Note: $fp is user readable, so anybody can read it, which makes it a weak key. But two facts: it keeps changing, and its value pertains to the current routine. So to read it, the attacker needs to gain control first: a sort of chicken-and-egg problem.
PC-Encoding: Performance effects

Program Counter Encoding with gcc and recompiled Linux

Apache Web Server Performance with and without PC-Encoding
(Without / With = without / with PC-encoding; overhead in %)

# of      Connection Rate (con/sec)     Avg. Latency (sec)            Avg. Throughput (Mbit/sec)
clients   Without   With     Overhead   Without   With     Overhead   Without   With     Overhead
4         165.15    160.27   2.95       0.024     0.023    -4.17      23.14     22.48    2.85
8         168.18    159.18   5.35       0.046     0.049    6.52       24.06     21.66    9.98
12        184.00    173.87   5.51       0.064     0.067    4.69       25.82     23.91    7.4
16        184.33    184.4    -0.04      0.08      0.084    5.00       26.94     27.29    -1.3
20        192.62    191.53   0.57       0.10      0.091    -9.00      27.28     27.0     1.03
24        187.77    183.77   2.13       0.120     0.120    0          27.79     26.74    3.78
28        193.2     192.53   0.35       0.129     0.135    4.65       28.1      26.65    5.16
32        199.2     204.27   -2.55      0.147     0.142    -3.4       27.78     26.73    3.78

Overhead (%) = 100 * nic/IC, where nic is the instruction count with the extra instructions added for PC-encoding, and IC is the instruction count without PC-encoding (all instruction counts are dynamic).

The overhead columns in the table are relative differences between the two measured values, e.g. 100 * (165.15 - 160.27) / 165.15 = 2.95 for the connection rate with 4 clients; a negative value means the run with PC-encoding measured better.
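The overhead columns of the Apache table can be reproduced from the measured pairs with a relative-difference formula. This is inferred from the numbers rather than stated on the slide, and the function names are hypothetical.

```c
/* Relative overhead in percent for "higher is better" metrics
   (connection rate, throughput). */
static double overhead_rate(double without, double with)
{
    return 100.0 * (without - with) / without;
}

/* Relative overhead in percent for "lower is better" metrics (latency). */
static double overhead_latency(double without, double with)
{
    return 100.0 * (with - without) / without;
}
```

For example, `overhead_rate(165.15, 160.27)` is about 2.95 and `overhead_latency(0.046, 0.049)` is about 6.52, matching the 4-client connection rate and the 8-client latency entries.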

Architecture Support for PC-Encoding

Instruction extension:
• Incorporate encoding/decoding into store/load
• Incorporate decoding into indirect branch

e.g.
• key-register $key
• pc-store $n, $m(c);    Mem[($m) + c] := ($n) xor ($key);
• pc-load $n, $m(c);     $n := (Mem[($m) + c]) xor ($key);
• decode-&-jmp $n;       pc := ($n) xor ($key)

    int (*funcptr)(..);
    int (*funcptrcp)(..);
    funcptr = goodfunc;
    funcptrcp = funcptr;
    …
    (void)(*funcptr)(..);
    …
    (void)(*funcptrcp)(..);
    …

    mov #goodfunc, funcptr
    mov funcptr, funcptrcp
    …
    mov funcptr, $r1
    decode-&-jmp $r1
    …
    mov funcptrcp, $r2
    decode-&-jmp $r2
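The semantics of the three proposed instructions can be modeled with a toy machine in C. Everything about the model is an assumption made for illustration: word-addressed memory, eight registers, 32-bit values, and the names `cpu`, `pc_store`, `pc_load` and `decode_and_jmp`.

```c
#include <stdint.h>

#define MEM_WORDS 64

/* Toy machine state; register and memory sizes are arbitrary. */
typedef struct {
    uint32_t reg[8];
    uint32_t mem[MEM_WORDS];   /* word-addressed for simplicity */
    uint32_t key;              /* $key */
    uint32_t pc;
} cpu;

/* pc-store $n, $m(c):  Mem[($m) + c] := ($n) xor ($key) */
static void pc_store(cpu *c, int n, int m, uint32_t off)
{
    c->mem[(c->reg[m] + off) % MEM_WORDS] = c->reg[n] ^ c->key;
}

/* pc-load $n, $m(c):  $n := Mem[($m) + c] xor ($key) */
static void pc_load(cpu *c, int n, int m, uint32_t off)
{
    c->reg[n] = c->mem[(c->reg[m] + off) % MEM_WORDS] ^ c->key;
}

/* decode-&-jmp $n:  pc := ($n) xor ($key) */
static void decode_and_jmp(cpu *c, int n)
{
    c->pc = c->reg[n] ^ c->key;
}
```

In this model a value written with `pc_store` is held encoded in memory. It can be recovered either with `pc_load`, or with an ordinary load followed by `decode_and_jmp`, which mirrors the `mov` plus decode-&-jmp sequence on the slide.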