Accelerating Memory Decryption and Authentication With Frequent Value Prediction

Weidong Shi (Motorola Labs), Hsien-Hsin Sean Lee (Georgia Tech)

Shi and Lee, Accelerating Memory Decryption and Authentication (CF’07)

Security Frontier

[Figure: the security frontier across hardware levels (transistor, leaf cell, register/unit, processor, SoC). Protection goals shown: embedded secrets, counterfeit detection, authentication/secure token, isolation, content confidentiality. Techniques shown: circuit camouflage/obfuscation/private circuit (Eurocrypt 02/06); secure MMU/buses/memory (CASES-04, ASPLOS-04, PACT-06); secure processor (e.g., IBM 06, MICRO-36/37/39, ASPLOS 02/04, ISCA 32/33); secure SoC. Attacks shown: chip de-lidding, die analysis, probing, PCB, side-channel, clocking/timing, backdoor.]


Secure Processor Architecture

[MICRO-36, 37, 39, ASPLOS-02, 04, ISCA-32, 33, IBM SecureBlue]

[Figure: a trusted secure processor, containing the processor core, the L2, and a memory encryption/decryption and integrity verification engine, attached to encrypted memory.]


Agenda

• Counter Mode Cipher
• “Direct Memory” Block Ciphers
• Frequent Value Speculation
• Performance Analysis
• Conclusion


Counter Mode Encryption

• Use Counter to generate a secret keystream that encrypts a memory block with a simple XOR
• Turn a block cipher into a stream cipher

[Figure: for block i = 0..N, the block cipher (AES), keyed with the secret key, encrypts the nonce/IV combined with Counter+i to produce a one-time pad; Plaintxt i XOR the pad gives Ciphertxt i.]
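The keystream construction on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' hardware design: `one_time_pad` uses HMAC-SHA256 as a stand-in for AES (counter mode only needs the cipher's forward direction), and the names `one_time_pad`, `ctr_xor`, the 8-byte counter encoding, and the sample key and nonce are all assumptions made for the example.

```python
import hashlib
import hmac

BLOCK_BYTES = 16  # 128-bit cipher block, as with AES


def one_time_pad(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Stand-in for AES_key(nonce || counter); this is NOT real AES.
    msg = nonce + counter.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:BLOCK_BYTES]


def ctr_xor(key: bytes, nonce: bytes, counter: int, data: bytes) -> bytes:
    # Block i is XORed with the pad derived from counter + i.
    # The same function encrypts and decrypts.
    out = bytearray()
    for i in range(0, len(data), BLOCK_BYTES):
        pad = one_time_pad(key, nonce, counter + i // BLOCK_BYTES)
        out.extend(b ^ p for b, p in zip(data[i:i + BLOCK_BYTES], pad))
    return bytes(out)


key, nonce = b"k" * 16, b"n" * 8           # hypothetical key and nonce
cache_line = bytes(range(64))              # one 64-byte cache line
ciphertext = ctr_xor(key, nonce, 7, cache_line)
recovered = ctr_xor(key, nonce, 7, ciphertext)
print(recovered == cache_line)
```

Because the pad depends only on the key, nonce, and counter, it can be computed before the ciphertext arrives, which is what the next slide exploits.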


Parallelization for Counter Mode Secure Arch

• OTP generation and data fetch are done in parallel
• How to obtain counter values
– Counter cache [MICRO36]
– Prediction & precomputation [ISCA32]

[Figure: inside the secure processor, the block cipher (AES) turns the nonce and counter into a one-time pad while ciphertext cache line X is fetched from memory; XORing the two yields plaintext cache line X. The source of the counter is marked “?”.]
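A back-of-the-envelope latency model shows why the overlap matters. It is only a sketch: the 65 ns AES latency is taken from the experimental setup slide, while the memory latency, the XOR cost, and the function names are hypothetical. It also assumes the counter is already available on chip when the miss occurs (a counter cache hit or a correct prediction).

```python
AES_NS = 65.0    # AES decryption latency from the setup slide
MEM_NS = 100.0   # hypothetical memory fetch latency
XOR_NS = 1.0     # hypothetical cost of the final XOR


def direct_mode_latency(mem_ns: float, cipher_ns: float) -> float:
    # Direct encryption: decryption can only start once the data arrives.
    return mem_ns + cipher_ns


def counter_mode_latency(mem_ns: float, cipher_ns: float, xor_ns: float) -> float:
    # Counter mode: pad generation overlaps the fetch, then one XOR.
    return max(mem_ns, cipher_ns) + xor_ns


direct = direct_mode_latency(MEM_NS, AES_NS)
overlapped = counter_mode_latency(MEM_NS, AES_NS, XOR_NS)
print(direct, overlapped)
```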


Block Cipher (ECB)

• “Direct” Memory Encryption
• Electronic Code Book

[Figure: each block, Plaintxt0 through PlaintxtN, is encrypted independently by the block cipher (AES) under the same secret key, giving Ciphertxt0 through CiphertxtN.]
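The property of ECB that matters for the rest of the talk is that equal plaintext blocks encrypt to equal ciphertext blocks under the same key. The sketch below demonstrates this with a toy 64-bit Feistel cipher built from HMAC-SHA256; the cipher, its round count, and all function names are assumptions for illustration only, standing in for AES or Triple DES, and it is not secure.

```python
import hashlib
import hmac

HALF = 4    # bytes per Feistel half, so blocks are 64 bits
ROUNDS = 4


def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in range(ROUNDS):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(ROUNDS)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def ecb(key: bytes, data: bytes, block_fn) -> bytes:
    # Every 8-byte block is processed independently of its neighbors.
    return b"".join(block_fn(key, data[i:i + 8]) for i in range(0, len(data), 8))


key = b"toy-key"                                   # hypothetical key
plain = bytes(8) + b"\x01\x02\x03\x04\x05\x06\x07\x08" + bytes(8)
cipher = ecb(key, plain, encrypt_block)
print(cipher[0:8] == cipher[16:24])                # equal plaintext blocks
print(ecb(key, cipher, decrypt_block) == plain)
```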


Block Cipher (CBC)

• Cipher-Block Chaining
• Decrypting a target block depends on the neighboring ciphertext block

[Figure: Plaintxt0 is XORed with the init. vector and encrypted by the block cipher (AES) under the secret key to give Ciphertxt0; Plaintxt1 is XORed with Ciphertxt0 before encryption to give Ciphertxt1; Plaintxt2 is XORed with Ciphertxt1 to give Ciphertxt2.]
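The chaining dependency can be written out with the standard CBC relations. The symbols are introduced here for illustration: E_K and D_K denote block encryption and decryption under the secret key K, P_i and C_i the i-th plaintext and ciphertext blocks, and IV the initialization vector.

```latex
\begin{aligned}
C_0 &= E_K(P_0 \oplus IV), &\qquad C_i &= E_K(P_i \oplus C_{i-1}) \quad (i \ge 1),\\
P_0 &= D_K(C_0) \oplus IV, &\qquad P_i &= D_K(C_i) \oplus C_{i-1} \quad (i \ge 1).
\end{aligned}
```

Recovering P_i therefore needs both C_i and the neighboring C_{i-1}.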


Authenticated Encryption

• The same cipher protects
– Confidentiality (tamper-resistance)
– Message integrity (tamper-evidence)
• Offset Codebook (OCB)
– One of the authenticated encryption methods
– Non-malleable under chosen-ciphertext attack, to which counter mode is vulnerable
– 802.11i currently specifies AES-OCB as an alternative to CCM for confidentiality and integrity


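Authenticated Encryption: OCB Encryption

[Figure: the block cipher (AES), keyed with the secret key, encrypts “Nonce || mem addr” XORed with the pseudo-random number L to produce R. PlaintxtN is XORed with the offset aL+R, encrypted by the block cipher (AES) under the secret key, and XORed with aL+R again to give CiphertxtN.]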


Authenticated Encryption: OCB Authentication

[Figure: Plaintxt0, Plaintxt1, Plaintxt2, and Plaintxt3 are combined into a hash, XORed with 5L+R, and encrypted by the block cipher (AES) under the secret key to produce the message authentication code (MAC).]


OCB: Decryption and Integrity Verification

• Decryption can start after encrypted memory blocks are fetched.
• Decrypted blocks cannot be issued until their integrity is verified.
• MAC verification can take longer than decryption.

[Figure: timeline. The memory fetch returns E(B0), E(B1), E(B2), E(B3), and the MAC; decryption yields B0, B1, B2, B3; the four blocks are issued only after MAC verification completes.]


Speculations in Secure Processor

Examples of Prediction        | Applicable Cipher Scenario | What Can Be Predicted | Why Predictable?
Counter Prediction [ISCA-32]  | Counter mode encryption    | Counter values        | Coherence of counter values
Value Prediction [CF-07]      | “Direct” encryption mode   | Encrypted value       | Existence of frequent values

• Improve performance by taking advantage of
– The nature of the data, or
– Statistical properties of the data
• Speculation does not compromise security because it is performed only within the secure boundary.


Analysis of Frequent Values

[Figure: two bar charts, “Frequent value - 256K L2” (y-axis 0 to 90) and “Frequent value - 1M L2” (y-axis 0 to 80), with bars for 8, 16, and 32 frequent values per benchmark: applu, apsi, art, bzip2, crafty, facerec, galgel, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise, average.]

• 40 to 60% of encrypted memory data are frequent values

• 8 to 32 frequent values account for over 40% of encrypted data
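The kind of measurement behind these percentages can be sketched as a top-k coverage count over a trace of fetched values. The function name `top_k_coverage` and the synthetic trace are assumptions for illustration; the slides do not describe the authors' profiling tool.

```python
from collections import Counter


def top_k_coverage(values, k):
    # Fraction of all accesses covered by the k most frequent values.
    if not values:
        return 0.0
    counts = Counter(values)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(values)


# Synthetic trace: three hot values plus twenty values seen once each.
trace = [0] * 50 + [1] * 20 + [0xFFFFFFFF] * 10 + list(range(100, 120))
for k in (1, 3, 8):
    print(k, top_k_coverage(trace, k))
```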


Speculation Using Idle Pipelined Crypto Engine

• Generate “encrypted” frequent values using otherwise idle crypto engines
• Integrity verification can also be speculated.
• Generate MAC for speculated frequent values

[Figure: time line. While the memory pipeline retrieves the encrypted cache line Ek(X), the encryption pipeline produces Ek(A), Ek(B), Ek(C), Ek(D), Ek(E), Ek(F), Ek(G) for frequent values at times T1 to T7; each is compared (=?) with Ek(X), and Ek(E) matches.]
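The compare-instead-of-decrypt idea can be sketched as follows. This is a sequential software illustration under stated assumptions: in the slide the candidate encryptions overlap the memory fetch in hardware, whereas here they simply run in a loop. The toy Feistel cipher and the names `encrypt_block`, `decrypt_block`, and `speculative_decrypt` are hypothetical stand-ins for the real cipher and engine.

```python
import hashlib
import hmac

HALF = 4  # bytes per Feistel half, so blocks are 64 bits


def _f(key, rnd, half):
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:HALF]


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key, block):
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def decrypt_block(key, block):
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def speculative_decrypt(key, fetched, frequent_values):
    # Encrypt each frequent value and compare with the fetched ciphertext.
    for value in frequent_values:
        if encrypt_block(key, value) == fetched:
            return value, True          # predicted: no decryption needed
    return decrypt_block(key, fetched), False   # fall back to decryption


key = b"toy-key"                                   # hypothetical key
frequent = [bytes(8), b"\xff" * 8, (1).to_bytes(8, "big")]
hit = speculative_decrypt(key, encrypt_block(key, b"\xff" * 8), frequent)
rare = b"\x12\x34\x56\x78\x9a\xbc\xde\xf0"
miss = speculative_decrypt(key, encrypt_block(key, rare), frequent)
print(hit, miss)
```

A match means the plaintext is known without running the decryption pipeline; a mismatch costs nothing beyond the normal decryption path.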


Value Prediction Based Decryption

[Figure: block diagram of the secure processor: WB buffer, two pipelined encryption engines, a pipelined decryption engine, a scheduler, the cache, a frequent value table holding X, Y, Z, W, and a CAM holding E(X), E(Y), E(Z), E(W) that is matched against the returned encrypted data.]


Handle Large Block Size

[Figure: a row of 64-bit blocks, each marked as a frequent value or a non-frequent value, grouped in pairs under 128-bit ciphers; one group is labeled “Four 64-bit frequent value blocks”.]

• Under a 128-bit cipher, a cipher block made of two 64-bit frequent value blocks is predictable; one that contains a non-frequent value is not.

[Figure: bar chart, “Predictable Blocks of Freq Value Blocks (%), L2=256KB”, y-axis 0 to 60, per benchmark: applu, apsi, art, bzip2, crafty, facerec, galgel, gap, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise, average.]
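The metric plotted here can be sketched as follows: among the 64-bit blocks that hold a frequent value, count those whose 128-bit cipher block consists entirely of frequent values. The function name `predictable_fraction` and the sample cache line are assumptions for illustration.

```python
def predictable_fraction(blocks, frequent, group=2):
    # blocks: 64-bit values in memory order; group=2 models a 128-bit cipher.
    fv_total = 0
    fv_predictable = 0
    for i in range(0, len(blocks), group):
        cipher_block = blocks[i:i + group]
        n_fv = sum(b in frequent for b in cipher_block)
        fv_total += n_fv
        # Predictable only if every 64-bit half is a frequent value.
        if len(cipher_block) == group and n_fv == group:
            fv_predictable += n_fv
    return fv_predictable / fv_total if fv_total else 0.0


line = [0, 0, 0, 7, 9, 0, 0, 0]   # eight 64-bit blocks of one cache line
frac = predictable_fraction(line, {0})
print(frac)                        # 4 of the 6 frequent blocks are paired
```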


Block Re-ordering

[Figure: the 64-bit blocks of a cache line, marked as frequent value or non-frequent value, are re-ordered so that frequent value blocks sit next to each other and form predictable frequent value pairs.]

[Figure: bar chart, “Predictable Blocks of Frequent Values Blocks (%) L2=256KB”, y-axis 0 to 100, series “without reorder” and “with_reorder”, per benchmark: applu, apsi, art, bzip2, crafty, facerec, galgel, gap, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise, average.]
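One way to picture re-ordering is a stable partition that moves frequent value blocks to the front of the cache line and remembers the permutation so the original order can be restored. This is a sketch under assumptions: the slides do not say how the ordering is recorded, and `reorder`, `restore`, and `predictable_pairs` are hypothetical names.

```python
def reorder(blocks, frequent):
    # Stable sort: frequent value blocks first, relative order preserved.
    order = sorted(range(len(blocks)), key=lambda i: blocks[i] not in frequent)
    return [blocks[i] for i in order], order


def restore(reordered, order):
    # Undo the permutation recorded by reorder().
    original = [None] * len(order)
    for pos, src in enumerate(order):
        original[src] = reordered[pos]
    return original


def predictable_pairs(blocks, frequent):
    # Count 128-bit pairs whose two 64-bit halves are both frequent values.
    return sum(
        blocks[i] in frequent and blocks[i + 1] in frequent
        for i in range(0, len(blocks) - 1, 2)
    )


line = [0, 7, 0, 9, 0, 0, 7, 0]
packed, order = reorder(line, {0})
print(predictable_pairs(line, {0}), predictable_pairs(packed, {0}))
print(restore(packed, order) == line)
```

With an odd number of frequent value blocks, one of them still ends up paired with a non-frequent block.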


Frequent Value Map

• Speculation targeted only for frequent value blocks
• Overhead
– 1 frequent value map bit per encrypted block (128 bits)
– 8 bits per cache line (64B cache line size)
– 512 bits per page
– Total 64K bits for 128-entry TLB
• Can be shared for many other purposes
– frequent value based cache compression
– power saving cache

[Figure: the cache line FV bit map (8 bits per line) is collected per page, and the per-page maps of the pages in the TLB form the frequent value map for all TLB pages.]
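The overhead figures can be checked with a few lines of arithmetic. The bits per line, line size, and TLB size come from the slide; the 4 KB page size is an assumption, chosen because it reproduces the slide's 512 bits per page.

```python
BITS_PER_LINE = 8     # from the slide
LINE_BYTES = 64       # from the slide
TLB_ENTRIES = 128     # from the slide
PAGE_BYTES = 4096     # assumption: 4 KB pages

lines_per_page = PAGE_BYTES // LINE_BYTES         # cache lines in one page
bits_per_page = lines_per_page * BITS_PER_LINE    # map bits for one page
total_bits = bits_per_page * TLB_ENTRIES          # map bits for all TLB pages
total_bytes = total_bits // 8

print(lines_per_page, bits_per_page, total_bits, total_bytes)
```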


MAC Speculation

[Figure: during the memory fetch, several speculated encrypted blocks are produced, each with its own MAC speculation; after the fetch, each speculated block and MAC goes through a comparison.]

• Compute MAC for speculated frequent value blocks
• Compare
– fetched encrypted block with speculated encrypted block
– fetched MAC with speculated MAC
• If both match, issue the fetched instruction/data
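The double comparison can be sketched as below. The keyed functions `enc_stub` and `mac_stub` are hypothetical stand-ins built on HMAC-SHA256 (only the forward direction is needed for comparison, so no real cipher or OCB MAC is implemented), and `try_early_issue` is an illustrative name, not part of the authors' design.

```python
import hashlib
import hmac


def enc_stub(key, addr, plaintext):
    # Stand-in for encrypting one 16-byte block at a given address.
    msg = b"enc" + addr.to_bytes(8, "big") + plaintext
    return hmac.new(key, msg, hashlib.sha256).digest()[:16]


def mac_stub(key, addr, plaintext):
    # Stand-in for the MAC of the same block.
    msg = b"mac" + addr.to_bytes(8, "big") + plaintext
    return hmac.new(key, msg, hashlib.sha256).digest()[:8]


def try_early_issue(key, addr, fetched_block, fetched_mac, frequent_values):
    # Issue early only if BOTH the block and its MAC match a speculation.
    for value in frequent_values:
        block_ok = hmac.compare_digest(fetched_block, enc_stub(key, addr, value))
        mac_ok = hmac.compare_digest(fetched_mac, mac_stub(key, addr, value))
        if block_ok and mac_ok:
            return value
    return None   # fall back to normal decryption and MAC verification


key, addr = b"toy-key", 0x1000                    # hypothetical key and address
zero = bytes(16)
frequent = [zero, b"\xff" * 16]
stored_block, stored_mac = enc_stub(key, addr, zero), mac_stub(key, addr, zero)
good = try_early_issue(key, addr, stored_block, stored_mac, frequent)
bad_mac = bytes([stored_mac[0] ^ 1]) + stored_mac[1:]   # tampered MAC
bad = try_early_issue(key, addr, stored_block, bad_mac, frequent)
print(good == zero, bad)
```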


Experimental Setup

Parameter                  Value
L1 I/D Cache               DM, 16KB
L2 Cache                   4-way, unified, 256KB and 1MB
Memory Bus                 8B wide; 1:4, 1:5, 1:6 ratio
CPU Clock                  1GHz
L1 Latency                 1 cycle
L2 Latency                 8 cycles (1MB), 4 cycles (256KB)
TDES Decryption Latency    96ns
AES Decryption Latency     65ns
Block Size                 64-bit (Triple DES), 128-bit (AES)


Results: Value Prediction

[Figure: two bar charts, “IPC Speedup L2=256KB” and “IPC Speedup L2=1MB”, y-axis 1 to 1.35, per benchmark: applu, apsi, art, bzip2, crafty, facerec, galgel, gap, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise, average.]


Performance: Number of Frequent Values

• 64-bit block size

[Figure: bar chart, “IPC Speedup L2=256KB”, y-axis 1 to 1.45, series 8_freq_values, 16_freq_values, 32_freq_values, per benchmark: applu, apsi, art, bzip2, crafty, facerec, galgel, gap, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise, average.]


Sensitivity to Memory Speed

[Figure: two bar charts, “IPC Speedup, L2=256KB, Ratio=1:4” and “IPC Speedup, L2=256KB, Ratio=1:6”, y-axis 1 to 1.4, per benchmark: applu, apsi, art, bzip2, crafty, facerec, galgel, gap, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise, average.]


Conclusion

• For direct memory block ciphers, frequent value speculation can hide both
– Decryption latency
– Integrity verification latency

• Encrypted values demonstrate predictability.

• We propose block re-ordering to consolidate the predictability.

• Memory-bound benchmark programs show 10% to 30% performance improvement.

Thank You!

Georgia Tech ECE MARS Labs
http://arch.ece.gatech.edu
