
How to Implement (ORAM in) MPC

Marcel Keller Peter Scholl Nigel Smart

University of Bristol

21 February 2014

Overview

1. How to Implement MPC

2. ORAM in MPC

Part I

How to Implement MPC

SPDZ: MPC with Preprocessing

[Diagram: Offline Phase → Correlated Randomness → Online Phase; Secret Inputs → Online Phase → Outputs]

- Offline Phase
  - Independent of secret inputs
  - Homomorphic encryption with distributed decryption
  - Highly parallelizable

- Online Phase
  - No encryption
  - Information-theoretic security in random oracle model

How to Share a Secret with Authentication

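Shares | MAC shares    | MAC key
$a_1$  | $\gamma(a)_1$ | $\alpha_1$
$a_2$  | $\gamma(a)_2$ | $\alpha_2$
$a_3$  | $\gamma(a)_3$ | $\alpha_3$

$\sum_i a_i = a$

$\sum_i \gamma(a)_i = \alpha \cdot a$

$\sum_i \alpha_i = \alpha$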

How to Share a Secret with Authentication

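Shares      | MAC shares                   | MAC key
$a_1 + b_1$ | $\gamma(a)_1 + \gamma(b)_1$ | $\alpha_1$
$a_2 + b_2$ | $\gamma(a)_2 + \gamma(b)_2$ | $\alpha_2$
$a_3 + b_3$ | $\gamma(a)_3 + \gamma(b)_3$ | $\alpha_3$

$\sum_i (a_i + b_i) = a + b$

$\sum_i (\gamma(a)_i + \gamma(b)_i) = \alpha \cdot (a + b)$

$\sum_i \alpha_i = \alpha$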
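The sharing and the local addition can be tried out in the clear. The following is a minimal sketch, not the SPDZ implementation: the modulus `P`, the helper names, and the fact that the MAC key is reconstructed for checking are all assumptions made for illustration.

```python
import secrets

P = 2**61 - 1  # illustrative prime modulus (assumption, not from the slides)

def additive_shares(value, n):
    """Split value into n random shares that sum to value mod P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def share_with_mac(value, alpha, n):
    """Per-party (share, MAC share) pairs; the MAC shares sum to alpha * value."""
    return list(zip(additive_shares(value, n),
                    additive_shares(alpha * value % P, n)))

def add_shared(xs, ys):
    """Add two shared values locally: each party adds its own pairs."""
    return [((a + b) % P, (ga + gb) % P) for (a, ga), (b, gb) in zip(xs, ys)]

def reconstruct(shared):
    """Sum the shares and the MAC shares (in the clear, for checking only)."""
    value = sum(s for s, _ in shared) % P
    mac = sum(m for _, m in shared) % P
    return value, mac

n = 3
alpha_shares = additive_shares(secrets.randbelow(P), n)  # alpha_i per party
alpha = sum(alpha_shares) % P  # a real protocol never reconstructs the key

a = share_with_mac(20, alpha, n)
b = share_with_mac(22, alpha, n)
value, mac = reconstruct(add_shared(a, b))
mac_ok = mac == alpha * value % P
```

Addition needs no communication because both the shares and the MAC shares are linear in the secret.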

Multiplication with Random Triple (Beaver Randomization)

$$x \cdot y = (x + a - a) \cdot (y + b - b) = (x + a) \cdot (y + b) - (y + b) \cdot a - (x + a) \cdot b + a \cdot b$$

Masked and opened: $x + a$ and $y + b$

Random secret triple: $a$, $b$, $a \cdot b$
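The identity can be checked on additive shares. This is a sketch under stated assumptions: the modulus `P`, three parties, and the helper names are illustrative, and the MAC shares are left out to keep it short.

```python
import secrets

P = 2**61 - 1  # illustrative prime modulus (assumption)

def share(value, n=3):
    """Additive sharing of value among n parties."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def open_shares(shares):
    """Opening: all parties publish their shares and sum them."""
    return sum(shares) % P

def beaver_multiply(xs, ys, triple):
    """Shares of x * y from shares of x, y and a triple (a, b, a * b)."""
    a_sh, b_sh, ab_sh = triple
    # Mask and open x + a and y + b (the only communication)
    e = open_shares([(x + a) % P for x, a in zip(xs, a_sh)])
    d = open_shares([(y + b) % P for y, b in zip(ys, b_sh)])
    # x*y = (x+a)(y+b) - (y+b)*a - (x+a)*b + a*b, computed share by share
    zs = [(-d * a - e * b + ab) % P for a, b, ab in zip(a_sh, b_sh, ab_sh)]
    zs[0] = (zs[0] + e * d) % P  # the public term is added by one party only
    return zs

a, b = secrets.randbelow(P), secrets.randbelow(P)
triple = (share(a), share(b), share(a * b % P))
product = open_shares(beaver_multiply(share(6), share(7), triple))
```

Since `a` and `b` are uniformly random and used once, the opened values `x + a` and `y + b` reveal nothing about `x` and `y`.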


Toolchain Overview

Python-like high-level code → Compiler → Bytecode → Virtual machine

- Compiler
  - Python
  - Easier development
  - Circuit optimization
  - Speed not an issue
  - Memory overhead

- Virtual machine
  - Online phase
  - C++
  - Fast
  - ∼ 150 instructions

Core Technique: I/O Parallelization

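Dependent multiplications (two rounds of opening):

z = x · y
u = z · w

1. Mask and open x and y
2. Compute z
3. Mask and open z and w
4. Compute u

Independent multiplications (one round of opening):

z = x · y
u = v · w

1. Mask and open x, y, v, and w
2. Compute z and u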


Goal: Automate I/O Parallelization

- Manual parallelization is tedious:

x10 = x2 · x3

x11 = x8 + x4

x12 = x10 · x1

x13 = x7 + x9

x14 = x7 · x1

x15 = x9 + x12

x16 = x13 · x14

x17 = x0 + x11

x18 = x11 · x15

x19 = x13 · x7

x20 = x4 + x6

x21 = x16 + x2

x22 = x0 + x12

x23 = x22 + x14

x24 = x11 + x19

x25 = x4 · x19

x26 = x23 · x9

x27 = x7 · x5

x28 = x13 + x21

x29 = x14 + x27

x30 = x19 · x1

x31 = x16 + x26

x32 = x0 · x10

x33 = x26 + x32

x34 = x7 + x3

x35 = x9 · x29

x36 = x33 + x22

x37 = x29 · x24

x38 = x16 + x23

x39 = x15 + x37

x40 = x12 · x39

x41 = x34 + x7

x42 = x32 + x5

x43 = x12 + x26

x44 = x43 · x38

x45 = x38 + x14

x46 = x44 · x27

x47 = x22 + x24

x48 = x39 · x38

x49 = x21 · x3

x50 = x28 + x16

x51 = x15 + x38

x52 = x50 · x46

x53 = x19 + x2

x54 = x20 · x13

x55 = x21 + x22

x56 = x19 · x6

x57 = x46 + x1

x58 = x38 · x55

x59 = x47 + x29

- SIMD not suitable for every application

Circuit as Directed Acyclic Graph

[Figure: instruction DAG for a multiplication, with nodes for the inputs x and y, the triple, the arithmetic instructions (+, −, ∗), and send/recv]

- Nodes: Instructions
- Edges: Output of instruction is input to another
- Two instructions for open: sending and receiving
- Edge weight:
  - One between sending and receiving
  - Zero otherwise
- Longest path with respect to weights determines communication round
- Merge all communication per round ⇒ Optimal number of rounds
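The round assignment can be sketched as a longest-path computation over instructions in program order. The instruction format `(name, kind, inputs)` and the function names are hypothetical and chosen for illustration; they are not the compiler's actual representation.

```python
from collections import defaultdict

def assign_rounds(instructions):
    """Map each instruction name to its communication round.

    instructions: list of (name, kind, inputs) in program order, which is
    already a topological order of the DAG.
    """
    kind_of, rounds = {}, {}
    for name, kind, inputs in instructions:
        kind_of[name] = kind
        depth = 0
        for src in inputs:
            # Weight one on the edge from a send to its recv, zero otherwise
            weight = 1 if kind_of[src] == "send" and kind == "recv" else 0
            depth = max(depth, rounds[src] + weight)
        rounds[name] = depth
    return rounds

def merge_sends(instructions, rounds):
    """Group all send instructions that fall into the same round."""
    merged = defaultdict(list)
    for name, kind, _ in instructions:
        if kind == "send":
            merged[rounds[name]].append(name)
    return dict(merged)

inputs = [(v, "input", []) for v in ("x", "y", "v", "w")]

# z = x * y; u = z * w  (second open depends on the first)
dependent = inputs + [
    ("send_xy", "send", ["x", "y"]), ("recv_xy", "recv", ["send_xy"]),
    ("z", "local", ["recv_xy"]),
    ("send_zw", "send", ["z", "w"]), ("recv_zw", "recv", ["send_zw"]),
    ("u", "local", ["recv_zw"]),
]

# z = x * y; u = v * w  (both opens are independent)
independent = inputs + [
    ("send_xy", "send", ["x", "y"]), ("recv_xy", "recv", ["send_xy"]),
    ("z", "local", ["recv_xy"]),
    ("send_vw", "send", ["v", "w"]), ("recv_vw", "recv", ["send_vw"]),
    ("u", "local", ["recv_vw"]),
]

rounds_dep = assign_rounds(dependent)
rounds_ind = assign_rounds(independent)
merged_ind = merge_sends(independent, rounds_ind)
```

In the independent program both sends land in round 0 and can be merged into one message; in the dependent program the second send is only possible in round 1.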

Merge by Communication Round / Longest Path

[Figure: instruction nodes A to I, grouped as CEF, BH, AD, GI]

- Need to re-compute topological order (no edges pointing backwards) ⇒ Standard algorithm linear in number of edges
- Heuristic to shorten lifetime of variables ⇒ Reduced memory usage

Part II

Oblivious RAM in MPC

Goal: Oblivious Data Structures

- Generally
  - Secret pointers
  - Secret type of access if needed

- Oblivious array / dictionary
  - Secret index / key
  - Secret whether reading or writing

- Oblivious priority queue
  - Secret priority and value
  - Secret whether decreasing priority or inserting

Oblivious RAM

[Figure: a Client (CPU) accesses a Server (Encrypted RAM); first with the visible addresses x1, x2, x0, x0, x0, then with the addresses x$, x$, x$, x$, x$]

Oblivious RAM in MPC

[Figure: the Client is an MPC circuit and the Server is secret sharing; the accessed addresses x$, x$, x$, x$, x$ are revealed]

Simple Oblivious Array (Trivial ORAM)

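Inner product with index vector

$$([0], \dots, [0], [1], [0], \dots, [0]) \cdot ([a_0], \dots, [a_{i-1}], [a_i], [a_{i+1}], \dots, [a_{n-1}]) = [a_i]$$

Index vector computation

$$[i] \mapsto ([i] \stackrel{?}{=} 0, \dots, [i] \stackrel{?}{=} i, \dots, [i] \stackrel{?}{=} n - 1)$$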
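The access pattern can be illustrated on plain integers. This is a sketch in the clear, assuming hypothetical helper names; in MPC every value would be secret-shared and each comparison and product would be a protocol step. The write function is an assumed counterpart to the read and is not shown on the slide.

```python
def index_vector(i, n):
    """Unit vector of length n; entry j is the result of the test i == j."""
    return [int(i == j) for j in range(n)]

def oblivious_read(array, i):
    """Read array[i] as an inner product that touches every entry."""
    bits = index_vector(i, len(array))
    return sum(bit * entry for bit, entry in zip(bits, array))

def oblivious_write(array, i, value):
    """Assumed counterpart: rewrite every entry, changing only position i."""
    bits = index_vector(i, len(array))
    return [entry + bit * (value - entry) for bit, entry in zip(bits, array)]

data = [10, 20, 30, 40]
read_result = oblivious_read(data, 2)
written = oblivious_write(data, 2, 99)
```

Every access costs work linear in the array size, which is why this construction only pays off for small arrays.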

Simple Oblivious Array

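Index vector computation without equality test

- Bit decomposition: $[x] \mapsto ([x_0], \dots, [x_{n-1}])$ such that $x = \sum_i x_i 2^i$
- Demux: $([x_0], \dots, [x_{n-1}]) \mapsto ([\delta_0], \dots, [\delta_{2^n-1}])$ such that $\delta_i = (i \stackrel{?}{=} x)$

Example: n = 4, i = 2

[Figure: $[2]$ is decomposed into the bits $[0], [1]$, passes through the intermediate values $[1], [0], [0], [1]$, and ends as the index vector $[0], [0], [1], [0]$]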
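One way to realize the demux is to double the vector once per bit. The sketch below works on plain integers with the bits ordered least significant first; the function names and this particular doubling scheme are assumptions for illustration, not necessarily the circuit used by the authors.

```python
def bit_decompose(x, n_bits):
    """Bits of x, least significant first."""
    return [(x >> k) & 1 for k in range(n_bits)]

def demux(bits):
    """Expand bits into a unit vector of length 2**len(bits)."""
    vector = [1]
    for bit in bits:
        # One multiplication per entry; the other half follows by subtraction
        high = [v * bit for v in vector]
        low = [v - h for v, h in zip(vector, high)]
        vector = low + high
    return vector

index_vec = demux(bit_decompose(2, 2))
```

For the slide's example of an array of size 4 and index 2, the bits are 0 and 1 and the result has its single one at position 2.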

Tree-based ORAM

- Entries stored in tree of trivial ORAMs of fixed size (buckets)
- Access only buckets on path to specific leaf per ORAM access
- Store path in smaller ORAM ⇒ recursion
- Data-independent eviction to distribute entries over tree
- Original idea by Shi et al. (Asiacrypt 2011)
- Path ORAM: improved eviction by Stefanov et al. (CCS 2013)
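To see which buckets one access touches, the tree can be numbered like a binary heap. The numbering (root = 1, children of node k at 2k and 2k + 1) and the function name are assumptions for illustration; the slides do not specify a layout, and eviction and recursion are not modelled here.

```python
def path_to_leaf(leaf, depth):
    """Bucket indices from the root to a leaf in a tree of the given depth.

    leaf is in range(2**depth); buckets are numbered heap-style from 1.
    """
    node = 2**depth + leaf  # heap index of the leaf bucket
    path = []
    while node >= 1:
        path.append(node)
        node //= 2          # move to the parent bucket
    return path[::-1]

buckets = path_to_leaf(5, 3)
```

Only `depth + 1` buckets are read per access, each of them a trivial ORAM of fixed size, so the cost grows with the logarithm of the number of leaves.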

Oblivious Array Access Timings (Two Parties, Online Phase)

[Plot: access time (ms) against size on log-log axes, size from 10^0 to 10^6; series: Original tree-based ORAM, Path ORAM, Simple oblivious array]

Oblivious Priority Queue

[Figure: binary heap of value : priority pairs with root 1 : 10, children 2 : 18 and 3 : 25, and 0 : 20 and 4 : 23 below 2 : 18; index of heap positions by value: [00, λ, 0, 1, 01]]

- Store value-priority pairs
- Values unique
- Operations
  - Remove value with minimal priority
  - Insert new pair
  - Update value to lower priority
- Two oblivious arrays
  - Binary heap by priority
  - Index to find entries in heap by value
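The two-array layout can be sketched with ordinary Python lists standing in for the oblivious arrays. The class and method names are hypothetical, and the code branches on data, which an MPC version must not do: there, every operation would run a fixed number of steps with secret conditions.

```python
class IndexedHeap:
    """Binary min-heap of (priority, value) with an index value -> position."""

    def __init__(self, capacity):
        self.heap = []                     # stand-in for an oblivious array
        self.position = [None] * capacity  # stand-in for an oblivious array

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.position[self.heap[i][1]] = i
        self.position[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self.heap[i][0] < self.heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if (child < len(self.heap)
                        and self.heap[child][0] < self.heap[smallest][0]):
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def insert_or_decrease(self, value, priority):
        """Insert a new pair, or lower the priority of an existing value."""
        pos = self.position[value]
        if pos is None:
            self.heap.append((priority, value))
            pos = len(self.heap) - 1
            self.position[value] = pos
        elif priority < self.heap[pos][0]:
            self.heap[pos] = (priority, value)
        self._sift_up(pos)

    def pop_min(self):
        """Remove and return the (value, priority) pair of minimal priority."""
        priority, value = self.heap[0]
        last = self.heap.pop()
        self.position[value] = None
        if self.heap:
            self.heap[0] = last
            self.position[last[1]] = 0
            self._sift_down(0)
        return value, priority

queue = IndexedHeap(5)
for value, priority in [(0, 20), (1, 10), (2, 18), (3, 25), (4, 23)]:
    queue.insert_or_decrease(value, priority)
queue.insert_or_decrease(3, 5)   # update value 3 to a lower priority
first = queue.pop_min()
second = queue.pop_min()
```

The index is what makes the update possible without scanning the heap: it gives the position of a value directly, and every swap keeps it in sync.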

Dijkstra’s Shortest Path Algorithm

[Figure: step-by-step run of Dijkstra's algorithm from s on a graph with vertices s, a, b, c and edge weights 1, 3, 2, 1. Queue contents per step: (a : 1, c : 1), then (c : 1, b : 3), then (b : 2), then empty. Final distances: a : 1, c : 1, b : 2]

Dijkstra’s Algorithm in MPC

for each vertex do
    outer loop body
    for each neighbor do
        inner loop body

- Number of vertices and edges public
- Graph structure in two oblivious arrays (vertices and edges)
- Use oblivious priority queue
- Dijkstra's algorithm uses two nested loops
  - One over vertices, one over the neighbors thereof
  - MPC would reveal the number of neighbors for every vertex
  - Replace by loop over all edges in same order
  - Flag set when starting with a new vertex
- Polylog overhead over classical algorithm
- Previous work: polynomial overhead

Dijkstra’s Algorithm in MPC

for each edge do
    outer loop body (dummy if same vertex)
    inner loop body
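The flattened loop can be sketched in the clear as follows. The array layout (`first_edge`, edges carrying an `is_last` flag), the function name, and the use of `heapq` with lazy deletion in place of the oblivious priority queue are assumptions for illustration. The sketch also assumes a connected graph in which every vertex has at least one edge, and it branches on data, where an MPC version would execute a dummy outer loop body instead.

```python
import heapq

def dijkstra_single_loop(first_edge, edges, source):
    """Shortest distances from source using one loop over all edges.

    first_edge[v]: index in edges of the first outgoing edge of v.
    edges[k]: (target, weight, is_last); is_last marks the final edge of
    its source vertex.
    """
    n = len(first_edge)
    dist = [float("inf")] * n
    dist[source] = 0
    done = [False] * n
    queue = [(0, source)]
    new_vertex = True   # flag: this iteration starts with a new vertex
    u = k = None
    for _ in range(len(edges)):       # public number of iterations
        if new_vertex:
            # Outer loop body: fetch the closest unfinished vertex
            _, u = heapq.heappop(queue)
            while done[u]:            # skip outdated queue entries
                _, u = heapq.heappop(queue)
            done[u] = True
            k = first_edge[u]
        # Inner loop body: relax edge k of vertex u
        v, w, is_last = edges[k]
        if dist[u] + w < dist[v]:
            dist[v] = dist[u] + w
            heapq.heappush(queue, (dist[v], v))
        new_vertex = is_last
        k += 1
    return dist

# Cycle graph 0 - 1 - 2 - 3 - 0 with unit weights, two directed edges per vertex
first_edge = [0, 2, 4, 6]
edges = [(1, 1, False), (3, 1, True),
         (0, 1, False), (2, 1, True),
         (1, 1, False), (3, 1, True),
         (2, 1, False), (0, 1, True)]
distances = dijkstra_single_loop(first_edge, edges, 0)
```

The loop count depends only on the public number of edges, so the number of neighbors of each individual vertex stays hidden.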


Dijkstra’s Algorithm in MPC: Timings for Cycle Graphs

[Plot: total time (s) against size on log-log axes, size from 10^0 to 10^5, time from 10^-2 to 10^8; series: No ORAM, Simple array, Tree-based array, each both measured and estimated]

Conclusion

- Practical MPC requires a dedicated compiler. (ACM CCS 2013 / ePrint 2013:143)
- Oblivious data structures in MPC are feasible and useful. (ePrint 2014:137)
