37
The Ascend Secure Processor Christopher Fletcher MIT 1

The Ascend Secure Processor - Rutgers University

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

The Ascend Secure Processor

Christopher Fletcher

MIT

1

Joint work with

• Srini Devadas, Marten van Dijk

• Ling Ren, Albert Kwon, Xiangyao Yu

• Elaine Shi & Emil Stefanov

• David Wentzlaff & Princeton Team (Mike, Tri, Jonathan, Alexey, Yaosheng)

• Omer Khan

2

Last talk: Intel SGX

Integrity

Data

AddressMEE

3

Timing

Ascend Processor

Memory controller

DataIntegrity

AddressTiming

4

This talk

Outline

•Motivation + Oblivious RAM (ORAM) primer

•ORAM in Hardware

•Demo

5

• SGX broken through page faults [Xu et al.’15]

• Shared library usage [Zhuang et al.’04]

• Search queries [Islam et al.’12]

Op Address TimeR 0 1W 1 10R 5 15R 6 16R 7 17

If (secret variable) {… scan memory …

}

Binary search

6

ORAM Controller Shuffled

Cache miss

Chip pins

Provably removes all access pattern leakage 7

On-chip

Oblivious RAM (ORAM) [Goldreich-Ostrovsky’96]

ORAM security definition

• Access is 3 tuple: (op = Read/write, address, data)

• Consider access sequences A and A’

A = [ (op1, address1, data1), (op2, address2, data2), … ]

A’ = [ (op1’, address1’, data1’), (op2’, address2’, data2’), … ]

• If |A| == |A’|

then ORAM(A) ≈ ORAM(A’)

8

Path ORAM [CCS’13]

ORAM Controller (on-chip)

“The ORAM”Chip pins

Read/writes

9

PosMap

Block assigned to random path.Block lives on that path.

Off-chip

Path 1 2 3 4

A, 2

A, 2

B, 3

B, 3

10

Off-chip

Path 1 2 3 4

A, 2A, 2

B, 3

B, 3

EncryptedNot Encrypted

Empty space = dummy encryptions

dummy

dummy

dummydummy dummy

Chip pins

11

PosMap

Path ORAM Access:Read+write the path the block is assigned to.

12

Off-chip

Path 1 2 3 4

A, 2B, 3

A, 2

Path ORAM Access:Read+write the path the block is assigned to.

4

B, 3A, 4

B, 3

dummy

dummy

dummydummy dummy

Stash

dummy

dummy

13

PosMap

Off-chip

Path 1 2 3 4

Typically, 4 slots per bucket “Z=4”

B, 3

dummy

dummy

dummydummy dummy

14

Z=1…for simplicity

Off-chip

A, 4

Too big!

A, 4B, 3

B, 3

15

Map recursion [Shi et al., 11]

Map’

PosMap ORAM PosMap ORAM 2

On-chip MapMap’

Smaller Small enough

On-chip

16

Block

A, 4B, 3

Map recursion [Shi et al., 11]

17

Data ORAMPosMapORAM 1

On-chip PosMap

PosMapORAM 2

Path ORAM summary

Blocks assigned to paths.

Access block: Read+write path.

Adversary sees: random paths.

18

B

1 2 3 4

ORAM in Hardware

19

Ascend in silicon

•Collaboration with David Wentzlaff’s group @ Princeton

20

First ORAM in silicon

MIT Team

21

First silicon fully functional@ 500 MHz & .9 V

Design (Verilog)Open Source

PosMap

Off-chip

Path 1 2 3 4

A, 4B, 2

A, 4

Stash

B, 2

Blocks must liveon assigned path or in stash.

Can overflow

22

PosMap

Off-chip

Path 1 2 3 4

A, 4B, 2

A, 4

Stash

B, 2

Bottleneck in prior work [Maas et al. ‘13]Causes 3 X avg. slowdown on SPEC.

Blocks must liveon assigned path or in stash.

23

Bit blasted stash eviction [FCCM’15]

def evict(Pathblock, Pathevict, occ):

t1 = Pathblock⊕ Pathevict

t2 = bit_reverse( ( (t1 Ʌ – t1) – 1) Ʌ occ )

ret bit_reverse( t2 Ʌ – t2 )

evict()circuit

Pathblock, Pathevict, occ

occ’≈ greatest common prefix

~

Can be pipelined

Bit vectors

24

Simple design, no performance bottleneck.

25

Integrity protection for ORAM

OR

AM

ShuffledCache miss

Overlay Hash tree26

One SHA-3 unit

27

ORAM logic SHA-3Test harness

FPGA Prototype

Cheap Integrity Scheme [ASPLOS’15]

•Per-block MAC

•Good: Hash 1 block, NOT path

•Bad: Need to store counters on-chip

• Replace entries in map with counters!28

{Block data, Hash(Block data, Block addr, counter)}

P’P

P’’ Block (A, D)Block A’’

Block A’’’

A’

PosMapORAM 2

PosMapORAM 1

Data ORAM

On-chip PosMap(root of trust)

Counter

Want: Path P = PosMap[A]

Algorithm:Given A: derive A’, A’’, A’’’P’ = PRF(A’ || PosMap[A’] = Counter)P’’ = PRF(A’’ || ORAMAccess(A’’, P’)) P = PRF(A’’’ || ORAMAccess(A’’’, P’’))Data = ORAMAccess(A, P)

Block A’’

Counter for A’’’ + 1

Counter for A’’’A’’’A’’’+1

MACK(Counter|| A’’ || data)

Integrity checks

Problem: |C| > |P|

More schemes to get |C| < |P| …

Cheap Integrity Scheme [ASPLOS’15]

Result:

Hashing decreased by 68 X, simple design

30

ORAM randomizes data layout.

Computer architecture assumes data locality.

ISCA’13

Subtree size = row size

31

# Row misses:

tree height

subtree height 32

~

~

tree height

Row misses:

60% overhead 13% overhead

33

Encryption Stash Recursion, PLB, Integrity

2 mm.5

mmTile 0 Tile 1 Tile 2 Tile 3 Tile 4

Tile 5 Tile 6 Tile 7 Tile 8 Tile 9

Tile 10

Tile 11

Tile 12

Tile 13

Tile 14

Tile 15

Tile 16

Tile 17

Tile 18

Tile 19

Tile 20

Tile 21

Tile 22

Tile 23

Tile 24

ORAM PLL

6 mm

6 m

m

34

460M transistors

AES rounds Stash evict()

Hash unit

1

11

tpcc ycsb astar bzip2 gcc gob h264 libq mcf omnet perl sjeng avg

Slo

wd

ow

n (

X)

2 DRAM channels, In-order core,2-level cache hierarchy, 1 MByte last-level cacheORAM = 1208 cycles / tree lookup

35

Slowdown vs. insecure

Demo

36

Backup

37