Secure Cloud Database using Multiparty Computation


Introduction

• Security in cloud environment
  – The service providers are typically third parties
  – Goal: protect sensitive data

• Related work on secure databases
  – NetDB2, IBM (outsourced database)
  – Relational Cloud, CryptDB (MIT, CIDR 2011)
  – TrustedDB using secure hardware (VLDB 2011 demo, Radu Sion)

NetDB2

Value-level encryption

DB:
  Tuple 1: xxx, yyy
  Tuple 2: aaa, bbb

Encrypted DB:
  Tuple 1: !a4, a3g
  Tuple 2: L%j, m*K

Query rewriting:
  SELECT * WHERE value = 'xxx'
  becomes
  SELECT * WHERE value = '!a4'

Encrypted DB + partition information

Partition: P1: < 'm'; otherwise P2
  Tuple 1: P2, P2
  Tuple 2: P1, P1

Query rewriting:
  SELECT * WHERE value < 'xxx'
  becomes
  SELECT * WHERE value in [P1, P2]

Simple deterministic encryption

CryptDB

• Onion encryption: multiple layers of encryption applied to a single data value

Original data: 10

  E1(10) = A*65h
    OPES: numeric comparisons can be done
  E2(A*65h) = BB647
    Deterministic encryption: equality can be done
  E3(BB647) = %j@9G
    Non-deterministic encryption: no computation is feasible

If the user wants more computation power, decrypt to the desired level (one way!)

Summary

• Existing approaches rely mainly on encryption techniques
  – They provide limited computation capabilities
• Also note that security strength depends on the encryption function
  – For example, deterministic encryption may allow a frequency analysis attack
    • 'Male', 'Female' => '%k9)2', 'Ah475'
    • 'Ah475' x 21; '%k9)2' x 5 in DB group
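The frequency analysis attack mentioned above can be illustrated with a minimal sketch. It assumes the attacker sees only the encrypted column and has some rough prior about how common each plaintext value is; the prior, the placeholder labels `majority_value` and `minority_value`, and the function name `guess_mapping` are hypothetical and not part of the slides.

```python
from collections import Counter

# Ciphertext column as in the slide: deterministic encryption maps equal
# plaintexts to equal ciphertexts, so value counts survive encryption.
encrypted_column = ["Ah475"] * 21 + ["%k9)2"] * 5

# Assumed attacker background knowledge (hypothetical): the rough share of
# each plaintext value in the population, using placeholder labels.
known_distribution = {"majority_value": 0.8, "minority_value": 0.2}


def guess_mapping(ciphertexts, distribution):
    """Pair ciphertexts with plaintexts by descending frequency rank."""
    by_count = [c for c, _ in Counter(ciphertexts).most_common()]
    by_prior = sorted(distribution, key=distribution.get, reverse=True)
    return dict(zip(by_count, by_prior))


mapping = guess_mapping(encrypted_column, known_distribution)
print(mapping)
```

The guess is only as good as the attacker's prior, but no key is needed: the counts alone leak the information.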

Secure multiparty computation: Background

Secret sharing (around 1980)

Secret: 10

Shares: 4 and 6, held by Alice and Bob

6 + 4 = 10

Given only one share, what is the secret value?

Alice's share could be 5? 20? -3?

The secret is recovered only when the two parties exchange their shares

Secret sharing

• General case

Secret s is split into shares s1, s2, …, sn

The secret can be divided among n parties, for any n

s = g(s1, s2, …, sn)

Example choices of g:
  – Sum of all shares (modular)
  – Bitwise XOR of all shares
  – Product, string concatenation, etc.

Security requirement: given k < n shares, it is hard to recover s
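As an illustration of the general case, here is a minimal sketch of two of the reconstruction functions g listed above: modular sum and bitwise XOR. The modulus, the bit width, and all function names are assumptions chosen for the example; the slides do not fix them.

```python
import secrets
from functools import reduce

MODULUS = 2**32  # assumed share domain; secrets must lie in [0, MODULUS)


def share_additive(secret, n, modulus=MODULUS):
    """Split `secret` into n shares whose sum mod `modulus` is the secret."""
    shares = [secrets.randbelow(modulus) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % modulus)
    return shares


def reconstruct_additive(shares, modulus=MODULUS):
    """g = modular sum of all shares."""
    return sum(shares) % modulus


def share_xor(secret, n, bits=32):
    """Split `secret` into n shares whose bitwise XOR is the secret."""
    shares = [secrets.randbits(bits) for _ in range(n - 1)]
    # The last share cancels the random ones and leaves the secret.
    shares.append(reduce(lambda a, b: a ^ b, shares, secret))
    return shares


def reconstruct_xor(shares):
    """g = bitwise XOR of all shares."""
    return reduce(lambda a, b: a ^ b, shares, 0)


print(reconstruct_additive(share_additive(10, 3)))
print(reconstruct_xor(share_xor(10, 5)))
```

In both schemes the first n - 1 shares are uniformly random, so any proper subset of the shares is consistent with every possible secret in the domain.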

Secure multiparty computation

Party 1 holds x1, Party 2 holds x2, …, Party n holds xn

Objective: every party obtains f(x1, x2, …, xn) but cannot observe any other information apart from its own data

r = f(x1, x2, …, xn)

Every party receives r
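To make the objective concrete, here is a single-process simulation of one simple case, where f is the sum of all inputs. It is a sketch assuming honest-but-curious parties and an assumed modulus; it is not a generic SMC protocol, and the name `secure_sum` is hypothetical.

```python
import secrets

MODULUS = 2**32  # assumed arithmetic domain


def secure_sum(inputs, modulus=MODULUS):
    """Simulate n parties computing r = x1 + ... + xn without revealing any xi."""
    n = len(inputs)

    # Step 1: party i splits x_i into n random additive shares.
    outgoing = []
    for x in inputs:
        shares = [secrets.randbelow(modulus) for _ in range(n - 1)]
        shares.append((x - sum(shares)) % modulus)
        outgoing.append(shares)

    # Step 2: party j receives the j-th share from every party and adds them.
    partial = [sum(outgoing[i][j] for i in range(n)) % modulus for j in range(n)]

    # Step 3: the partial sums are broadcast; every party adds them to get r.
    r = sum(partial) % modulus
    return [r] * n


print(secure_sum([3, 5, 9]))
```

Each party sees only random-looking shares and partial sums, yet every party ends up with the same r.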

Secure multiparty computation

• Any function f that can be expressed as a circuit can be computed securely in SMC
  – Limitation of the generic solution
    • Not efficient

Building a secure database system

• To hide the data
  – Secret sharing

• To provide query processing functionality
  – Secure multiparty computation (SMC)

• Done?

Secure Cloud Database = Secret Sharing + SMC?

DB is split into shares A, B, C, held by Service Provider 1, Service Provider 2, Service Provider 3

DB = A + B + C

Queries => SMC among the service providers => Result R

Every service provider obtains R

Difference

• Security requirement
  – SMC allows all parties to obtain the result vs. SDB allows only the user to obtain the result
• Computational model
  – SMC: a single function computation vs. SDB: follow-up queries

An adaptation of SMC + secret sharing

• Example: SHAREMIND
  – Outsourced privacy-preserving data mining

DB is split into shares A, B, C, held by Service Provider 1, Service Provider 2, Service Provider 3

DB = A + B + C

An adaptation of SMC + secret sharing

• Example: SHAREMIND
  – Key: the computational result is also shared among the parties

Query => Service Provider 1, Service Provider 2, Service Provider 3 (holding DB shares A, B, C)

Each service provider returns one share of the result: A, B, C

A + B + C = Result
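The idea that the result itself stays shared can be sketched for a linear query such as a column sum. This is an illustration only, assuming additive sharing over an assumed modulus; it is not SHAREMIND's actual protocol or API, and the column values and names such as `provider_sum` are made up for the example.

```python
import secrets

MODULUS = 2**32  # assumed arithmetic domain


def split(value, n=3, modulus=MODULUS):
    """Additively share one value among n service providers."""
    shares = [secrets.randbelow(modulus) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


# Hypothetical plaintext column held by the data owner.
column = [4000, 7000, 6500]

# provider_tables[p] is the column of shares stored at service provider p.
provider_tables = list(zip(*(split(v) for v in column)))


def provider_sum(table, modulus=MODULUS):
    """Runs at one provider: sums its own shares, learning nothing else."""
    return sum(table) % modulus


# Each provider returns only a share of the answer.
result_shares = [provider_sum(t) for t in provider_tables]

# Only the user, who collects all shares, can reconstruct the result.
result = sum(result_shares) % MODULUS
print(result)
```

No provider ever holds the plaintext result, which is the difference from plain SMC where every party learns r.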

SHAREMIND Toolkit

• Provides several basic operations to build mining applications
  – Arithmetic (add, multiply, divide), bitwise operations (XOR), equality

SHAREMIND – Recursive processing

SMC operations

Workspaces in different parties

Result in shares

Intermediate results become part of the data for future processing

Example:
  SELECT *
  WHERE A > AVERAGE(B)

Query execution:
  SMC1. Compute average(B)
  SMC2. Filter with result from SMC1

Research problem

Secure DB Model

Owner/User

DB is split into shares A, B, C, held by Service Provider 1, Service Provider 2, Service Provider 3

DB = A + B + C

Before we proceed… Clarifying the security

• Negative result
  – Ideal security:
    • Querying workflow: user issues query => service providers compute result and return it to the user
    • Knowledge gained by service providers: NONE. Not even anything about the query and result!
  – A solution achieving ideal security is not more efficient than a non-outsourcing solution (not using the cloud)

Knowledge gained by service provider

• Output space of a simple selection query: varies from no tuple to the entire database
  – Even larger space if we consider joins
• Example knowledge gain
  – If the output size is small, the service provider knows that the query does not select the entire table
    • To hide this information, each returned query result should be at least as large as the entire table

Security in secure database

• Each service provider can observe
  – Query content
    • The tables that are related to the query
    • Number of conditions, types of conditions, attributes that are related
    • But not other info about the query
  – Query answer
    • The set of shares of tuples in some query answer
    • But not other content

Example query

• Original query:
    SELECT Name
    FROM Employer
    WHERE Salary > 6000

• The transformed query may look like this to one service provider:
    SELECT ATTRIBUTE_7
    FROM TABLE_A
    WHERE ATTRIBUTE_3 > X
    WITH SHARE_X = 1000

  The other two parties may get SHARE_X = 2000 and SHARE_X = 3000

Answer (reconstructed by the user): Tom, Kitty

Answer (share 1): T, Ki
Answer (share 2): o, t
Answer (share 3): m, ty
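The query transformation above can be sketched as follows. This is a rough illustration, assuming the constant in the WHERE clause is shared additively and that table and attribute names are replaced through a fixed lookup; `NAME_MAP`, `share_constant`, and `transform` are hypothetical names, and the generated text only mimics the form shown on the slide.

```python
import secrets

MODULUS = 2**32  # assumed arithmetic domain

# Obfuscated names as they appear in the slide's transformed query.
NAME_MAP = {"Employer": "TABLE_A", "Name": "ATTRIBUTE_7", "Salary": "ATTRIBUTE_3"}


def share_constant(value, n, modulus=MODULUS):
    """Additively share the query constant among n service providers."""
    shares = [secrets.randbelow(modulus) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def transform(select_attr, table, where_attr, op, constant, n_providers=3):
    """Return one rewritten query per provider, each carrying one share of the constant."""
    template = "SELECT {a} FROM {t} WHERE {w} {op} X WITH SHARE_X = {s}"
    return [
        template.format(
            a=NAME_MAP[select_attr],
            t=NAME_MAP[table],
            w=NAME_MAP[where_attr],
            op=op,
            s=share,
        )
        for share in share_constant(constant, n_providers)
    ]


queries = transform("Name", "Employer", "Salary", ">", 6000)
for q in queries:
    print(q)
```

Each provider sees the query shape (one table, one comparison) but only a random-looking share of the constant 6000.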

Building a secure database

• Baseline solution
  – Use the existing SHAREMIND Toolkit
    • Each value is divided into shares
    • Selection using the equality operation or greater-than (detailed protocol not found)

One efficiency problem

• SMC is distributed computing
  – Number of rounds should be as small as possible!
  – Handshaking is expensive

• Naïve compiling of query
  – May result in a series of SMC protocols
  – Example
    • SELECT A+B+C+D
    • 3 sum operations separately? 3X latency
    • Sum in 1 round!
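The A+B+C+D example can be made concrete with a toy round counter. The sketch assumes a simplified cost model in which every SMC sum operator costs one communication round regardless of how many operands it takes; the classes `Col` and `Add` and the functions `rounds` and `flatten` are hypothetical and only illustrate the rewriting.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Col:
    """A column reference such as A or B."""
    name: str


@dataclass(frozen=True)
class Add:
    """One SMC sum operator over any number of operands."""
    operands: Tuple["Expr", ...]


Expr = Union[Col, Add]


def rounds(expr):
    """Sequential rounds needed, assuming each Add operator costs one round."""
    if isinstance(expr, Col):
        return 0
    return 1 + max(rounds(e) for e in expr.operands)


def flatten(expr):
    """Merge nested sums into a single n-ary sum operator."""
    if isinstance(expr, Col):
        return expr
    flat = []
    for e in expr.operands:
        e = flatten(e)
        if isinstance(e, Add):
            flat.extend(e.operands)
        else:
            flat.append(e)
    return Add(tuple(flat))


# Naive compilation of A+B+C+D: ((A + B) + C) + D, three chained operators.
naive = Add((Add((Add((Col("A"), Col("B"))), Col("C"))), Col("D")))
print(rounds(naive), rounds(flatten(naive)))
```

A real optimizer would need a cost model covering more than round counts, but the same tree-rewriting shape applies.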

Better solution?

• 1. Query execution plan optimization
  – There are different possible ways to translate a query into SMC primitives. How can we optimize the number of rounds of communication? Even better would be a cost model that considers everything

• 2. Shortcut operator
  – Example: (X+Y) mod 5 is originally two individual SMC operators, but a single SMC operator can replace this combination

• 3. Index
  – How to implement an index efficiently and securely?

Solutions?

• No solid ideas yet…