Secure Database System

Introduction

• Demand for secure database systems
  – Cloud computing
    • Database-as-a-Service

• Current cloud database systems
  – Amazon RDS
  – Microsoft SQL Azure

• Advantages of cloud database systems
  – Economies of scale
  – Focus on own business

Security challenge

• Security concern
  – Data is handed to third-party service providers
  – The servers may be compromised

• To enforce security, encryption is necessary
• Challenge
  – How to compute queries on encrypted data

Single method approach

• A standalone encryption system is developed to address a particular query pattern

• Examples:
  – Order-preserving encryption scheme (OPES) supports comparison ($E(x) > E(y)$ iff $x > y$)
  – RSA supports multiplication ($E(x)E(y) = E(xy)$)
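The multiplicative property of RSA quoted above can be checked with a small sketch. The parameters below (primes 61 and 53, exponent 17) are toy values assumed for illustration only; they are not from the slides and are far too small to be secure.

```python
# Textbook RSA with toy parameters (assumed values, illustration only).
P, Q = 61, 53          # small primes chosen for the example
N = P * Q              # public modulus, 3233
E = 17                 # public exponent, coprime to (P - 1) * (Q - 1)

def rsa_encrypt(x: int) -> int:
    """Textbook RSA encryption: x^E mod N (no padding)."""
    return pow(x, E, N)

x, y = 4, 5
product_of_ciphertexts = (rsa_encrypt(x) * rsa_encrypt(y)) % N
ciphertext_of_product = rsa_encrypt(x * y)

# E(x)E(y) = E(xy) holds modulo N.
print(product_of_ciphertexts == ciphertext_of_product)
```

The same ciphertexts give no way to compare x and y, which is why a comparison needs a different scheme such as OPES.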

Difficulty in building a generic query system

• Each method (e.g. OPES, RSA) has its own encryption mechanism. Values encrypted by different methods are not interoperable
  – There is no trivial method to translate a value encrypted by OPES into the corresponding value encrypted by RSA
  – The following query cannot be supported:
    • SELECT * WHERE price * quantity > 1000
    • price * quantity: supported by RSA
    • > 1000: supported by OPES
    • The whole predicate cannot be evaluated by OPES, RSA, or a composition of OPES and RSA

Building database system based on single method approach

• Example systems: NetDB2 (with encryption), CryptDB

• Limitations
  – Limited support for complex queries
    • A new encryption method must be developed to support each query pattern
  – Lowered security guarantee in order to support more query patterns at the same time with one method

Our approach

• How to develop a query system that supports generic querying?

• Relational algebra
  – A few primitives are enough to build any query

• Observation
  – Data interchangeability: the result of one primitive operator can be used as input by other primitive operators

To enforce data interchangeability

• There is only one encrypted data format
• All operations operate on this format

• A similar secure mechanism with data interchangeability
  – ShareMind
  – Uses secure multiparty computation (SMC) with secret sharing
    • Each data item is split into shares that are distributed to multiple parties. A distributed algorithm is executed among all parties and gives the result in shared form.

Illustration of SMC + secret sharing

Secret shares held by each party (plain values: x = 10, y = 5):

Party    x    y
Party 1  3    8
Party 2  2    4
Party 3  5    -7

Note: 10 = 3 + 2 + 5 and 5 = 8 + 4 + (-7)

After some communication (SMC algorithms running on the secret shares), each party holds a share of z:

Party    z
Party 1  13
Party 2  6
Party 3  6

Plain values: $z = (x - y)^2 = 25$, and 25 = 13 + 6 + 6
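The share arithmetic in the illustration can be sketched as follows. This is a minimal sketch of additive sharing over plain integers, as drawn on the slide; the helper names and the random range are assumptions. Only the local subtraction step is shown, because multiplying shared values needs an interactive protocol that the slides do not detail.

```python
import random

def split_into_shares(value, parties, bound=100):
    """Split value into `parties` integer shares that sum to value."""
    shares = [random.randint(-bound, bound) for _ in range(parties - 1)]
    shares.append(value - sum(shares))
    return shares

def reconstruct(shares):
    """Recover the plain value by adding all shares."""
    return sum(shares)

# Shares from the slide: x = 10, y = 5, z = (x - y)^2 = 25.
x_shares = [3, 2, 5]
y_shares = [8, 4, -7]
z_shares = [13, 6, 6]

# Subtraction is local: each party subtracts its own shares.
diff_shares = [xs - ys for xs, ys in zip(x_shares, y_shares)]

print(reconstruct(x_shares), reconstruct(y_shares))      # 10 5
print(reconstruct(diff_shares), reconstruct(z_shares))   # 5 25
```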

Generic operations in SMC

• Basic:
  – Addition
  – Multiplication

• Any operation that can be expressed as a circuit can be computed
  – Addition on binary data can be regarded as an XOR gate
  – Multiplication on binary data can be regarded as an AND gate
  – Together the two gates form a universal gate set which can express any circuit

Using the idea of SMC + secret sharing on encrypted database?

• Multiple parties vs client-server

• Same storage size (= original database size) for all parties
  – Secure share generation reduces the storage cost at user

[Figure: Data Owner / User and Cloud server]

Development of new operators

• Why?

• Our goal:
  – To develop (i) a secure generator with (ii) its corresponding operators

SMC | Secure database system
Operations are done between multiple parties | Operations are done between user and service provider (SP)
No privileged party | User is privileged: it can observe any plain data and should always have a low cost in any computation
Shares in secret sharing are materialized in each party | Shares at user are not materialized but can be generated

Attack model

• Security is defined w.r.t. an attack model
• The attack model in our case: chosen-plaintext attack (CPA)
  – Formally: an attacker can observe the ciphertext of any chosen plaintext, but it is still computationally hard to recover the key

• Some remarks on CPA
  – CPA is also the attack model used for RSA
  – OPES cannot guard against CPA

System Scope

• First address integer-type data
• Focus on operations between columns in the same table
  – SELECT (PRICE * QUANTITY)

• Also support aggregate operations and limited join operations

DESCRIPTION OF OUR SOLUTION

Encryption procedure

• Secret sharing
  – Multiplicative secret sharing
  – Given a plain value $v$, the share at user is $v_k$ and the share at SP is $v'$
    • $v = v_k v' \bmod n$ ($n$ is a parameter of the share-generating function)

• The share at user is called the item key of the value $v$
  – The item key of each cell in the table is different
  – Each item key can be identified by the row ID and column ID

Encryption illustration

n = 35

Plain data:
Row  A   B
1    2   3
2    4   1

Item keys at user:
Row  A   B
1    8   9
2    16  11

Encrypted values at SP:
Row  A   B
1    9   12
2    9   16

Number of item keys = number of values in the table
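The numbers in the illustration can be reproduced with a short sketch. It assumes the SP share is obtained as $v' = v_k^{-1} v \bmod n$, which follows from $v = v_k v' \bmod n$; the function names are hypothetical, and `pow(k, -1, n)` needs Python 3.8 or later.

```python
N = 35  # toy modulus from the slide

def encrypt(value, item_key, n=N):
    """SP share: v' = v_k^{-1} * v mod n."""
    return (pow(item_key, -1, n) * value) % n

def decrypt(encrypted, item_key, n=N):
    """Plain value: v = v_k * v' mod n."""
    return (item_key * encrypted) % n

plain = {"A": [2, 4], "B": [3, 1]}
item_keys = {"A": [8, 16], "B": [9, 11]}

encrypted = {
    col: [encrypt(v, k) for v, k in zip(plain[col], item_keys[col])]
    for col in plain
}
print(encrypted)  # {'A': [9, 9], 'B': [12, 16]}
```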

Secure item key generator

• We extend RSA as our generator
• Each column has a column key <m, x> (private values)
• Each row has a row ID r (public value)
• Item key: $m x^r \bmod n$
  – $n$: the system parameter generated as in RSA; $n$ is a composite number with two big prime factors
    • $n$ is public
  – $m$, $x$, $r$ are non-zero random values $< n$
    • Note: $n$ is at least a 1024-bit value

Column keys: A<4, 2>, B<1, 9>

Item keys at user:
Row  A   B
1    8   9
2    16  11
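A minimal sketch of the generator, using the toy modulus n = 35 from the running example (the function name `item_key` is an assumption):

```python
def item_key(column_key, row_id, n):
    """Item key m * x^r mod n for column key <m, x> and row ID r."""
    m, x = column_key
    return (m * pow(x, row_id, n)) % n

n = 35
key_a, key_b = (4, 2), (1, 9)

for r in (1, 2):
    print(r, item_key(key_a, r, n), item_key(key_b, r, n))
# 1 8 9
# 2 16 11
```

Because the item key is a function of the column key and the public row ID, the user only has to store one pair <m, x> per column.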

Actual storage

Plain data:
Row  A   B
1    2   3
2    4   1

Table schema and column keys at user: A<4, 2>, B<1, 9>

Encrypted values at SP (n = 35):
Row  A   B
1    9   12
2    9   16

Conceptual item keys (not stored):
Row  A   B
1    8   9
2    16  11

Note: User does not need to keep row IDs

Recovering values

• Example query: SELECT A

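Table schema and column keys at user: A<4, 2>, B<1, 9>

Encrypted values at SP (n = 35):
Row  A   B
1    9   12
2    9   16

SP returns the encrypted column A (row IDs are passed to user too). The user generates the item keys from A<4, 2> and recovers the plain data:

Row  Encrypted A  Item key  Plain A
1    9            8         2
2    9            16        4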

Security of our item key generator

• Our generating function extends the RSA function
  – Ours: $m x^r \bmod n$ ($r$, $n$ are public; $m$, $x$ are private)
  – RSA: $x^e \bmod n$ ($e$, $n$ are public; $x$ is private)

• With $m = 1$, the two functions are equivalent

PRIMITIVE OPERATORS

Overview

• Operations between columns
  – Multiplication (SELECT A * B)
  – Addition (SELECT A + B)
  – We will show that these two are enough to support generic function evaluation (for functions that can be expressed as a circuit and whose inputs are values in the same row)

• Note: the above operations assume both inputs are encrypted
  – We are also interested in encrypt-plain column-column operations (SELECT A * B; A is encrypted but B is not)
  – Special case: one of the operands is a constant
    • Column-constant operation (SELECT 10 * A)

Basic primitive operations

• Column-column multiplication
• Column-column addition
• Column-constant multiplication
• Column-constant addition
• Encrypt-plain column multiplication
• Encrypt-plain column addition
• Necessary operations
  – Power
  – Key regeneration

General Procedure

Table schema and column keys at user: A<4, 2>, B<1, 9>

Encrypted values at SP:
Row  A   B
1    9   12
2    9   16

Each operation is an algorithm which may involve some communication between user and SP.

The result is always a new column: the user obtains a new column key C<m, x>, and SP obtains the encrypted values of C (y and z for rows 1 and 2).

Security remark: The underlying item key generator is secure. To show that the entire system is secure, it suffices to show that the messages (if any) in the algorithm do not breach security w.r.t. CPA.

Column-column multiplication

• C = AB (SELECT A*B AS C)
• In some row r, the values of A, B are a, b
  – $a = a_k a'$ ($a_k$: item key at user, $a'$: encrypted value of a)
  – $b = b_k b'$ ($b_k$: item key at user, $b'$: encrypted value of b)
• $c = ab = (a_k b_k)(a' b') \bmod n$
  – $a' b'$ can be computed by SP
  – $a_k b_k$: item keys are not materialized at user. User operates at the column key level

Plain data:
Row  A   B
1    2   3
2    4   1

Table schema and column keys at user: A<4, 2>, B<1, 9>

Encrypted values at SP (n = 35):
Row  A   B   C
1    9   12  3
2    9   16  4

Column-column multiplication

Table schema and column keys at user: A<4, 2>, B<1, 9>

Item key table:
Row  A                         B                         C
…    …                         …                         …
r    $4 \cdot 2^r \bmod 35$    $1 \cdot 9^r \bmod 35$    $(4 \cdot 1)(2 \cdot 9)^r \bmod 35$
…    …                         …                         …

Resulting column key: C<4, 18>

Column-column multiplication - Result

Plain data:
Row  A   B
1    2   3
2    4   1

Table schema and column keys at user: A<4, 2>, B<1, 9>, C<4, 18>

Encrypted values at SP (n = 35):
Row  A   B   C
1    9   12  3
2    9   16  4

Item keys of C, generated at user from C<4, 18>:
Row  C
1    2
2    1

Answer (C = AB, item key times encrypted value mod 35):
Row  C
1    6
2    4

Security: No information about the item keys of A and B is sent to SP
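The split of work between user and SP can be sketched as below. This is a minimal sketch on the toy data; the function names are hypothetical and nothing here is the authors' implementation.

```python
N = 35  # toy modulus from the slides

def item_key(column_key, row_id, n=N):
    m, x = column_key
    return (m * pow(x, row_id, n)) % n

def multiply_column_keys(key_a, key_b, n=N):
    """User side: <ma, xa> * <mb, xb> = <ma*mb, xa*xb> (mod n)."""
    return ((key_a[0] * key_b[0]) % n, (key_a[1] * key_b[1]) % n)

def sp_multiply(enc_a, enc_b, n=N):
    """SP side: multiply the encrypted values row by row."""
    return [(a * b) % n for a, b in zip(enc_a, enc_b)]

key_c = multiply_column_keys((4, 2), (1, 9))
enc_c = sp_multiply([9, 9], [12, 16])
answers = [(item_key(key_c, r) * c) % N for r, c in zip([1, 2], enc_c)]

print(key_c, enc_c, answers)  # (4, 18) [3, 4] [6, 4]
```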

Column-constant multiplication

• C = kA (e.g., SELECT 5*A AS C)
  – k is a constant
• In some row r, the value of A is a
  – $a = a_k a'$ ($a_k$: item key at user, $a'$: encrypted value of a)
• $c = 5a = (5 a_k)(a') \bmod n$
  – No action at SP: the encrypted values of C are those of A
  – At user: $(4 \cdot 2^r \bmod 35) \cdot 5 = 20 \cdot 2^r \bmod 35$, so the column key is C<20, 2>

Plain data:
Row  A   B
1    2   3
2    4   1

Table schema and column keys at user: A<4, 2>, B<1, 9>

Encrypted values at SP (n = 35):
Row  A   B   C
1    9   12  9
2    9   16  9

Column-constant multiplication - Result

Plain data:
Row  A   B
1    2   3
2    4   1

Table schema and column keys at user: A<4, 2>, B<1, 9>, C<20, 2>

Encrypted values at SP (n = 35):
Row  A   B   C
1    9   12  9
2    9   16  9

Item keys of C, generated at user from C<20, 2>:
Row  C
1    5
2    10

Answer (C = 5A, item key times encrypted value mod 35):
Row  C
1    10
2    20

Security: No information about the item keys of A is sent to SP
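A minimal sketch of the user-side key update for a constant factor, checked against the toy numbers (function names are hypothetical):

```python
N = 35  # toy modulus from the slides

def item_key(column_key, row_id, n=N):
    m, x = column_key
    return (m * pow(x, row_id, n)) % n

def scale_column_key(column_key, k, n=N):
    """User side: k * <m, x> = <k*m, x> (mod n). SP does nothing."""
    m, x = column_key
    return ((k * m) % n, x)

key_c = scale_column_key((4, 2), 5)
enc_c = [9, 9]  # unchanged encrypted values of A at SP
answers = [(item_key(key_c, r) * c) % N for r, c in zip([1, 2], enc_c)]

print(key_c, answers)  # (20, 2) [10, 20]
```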

Power

• $C = A^k$ (e.g., SELECT A^2 AS C)
  – k is a constant
• In some row r, the value of A is a
  – $a = a_k a'$ ($a_k$: item key at user, $a'$: encrypted value of a)
• $c = a^2 = (a_k)^2 (a')^2 \bmod n$
  – SP computes $(a')^2$ on the encrypted values of A
  – At user: $(4 \cdot 2^r \bmod 35)^2 = 16 \cdot 4^r \bmod 35$, so the column key is C<16, 4>

Plain data:
Row  A   B
1    2   3
2    4   1

Table schema and column keys at user: A<4, 2>, B<1, 9>

Encrypted values at SP (n = 35):
Row  A   B   C
1    9   12  11
2    9   16  11

Power - Result

Plain data:
Row  A   B
1    2   3
2    4   1

Table schema and column keys at user: A<4, 2>, B<1, 9>, C<16, 4>

Encrypted values at SP (n = 35):
Row  A   B   C
1    9   12  11
2    9   16  11

Item keys of C, generated at user from C<16, 4>:
Row  C
1    29
2    11

Answer ($C = A^2$, item key times encrypted value mod 35):
Row  C
1    4
2    16

Security: No information about the item keys of A is sent to SP
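The power operation follows the same pattern: both the column key and the encrypted values are raised to the constant. A minimal sketch with hypothetical function names:

```python
N = 35  # toy modulus from the slides

def item_key(column_key, row_id, n=N):
    m, x = column_key
    return (m * pow(x, row_id, n)) % n

def power_column_key(column_key, k, n=N):
    """User side: <m, x>^k = <m^k, x^k> (mod n)."""
    m, x = column_key
    return (pow(m, k, n), pow(x, k, n))

def sp_power(enc_values, k, n=N):
    """SP side: raise every encrypted value to the power k."""
    return [pow(v, k, n) for v in enc_values]

key_c = power_column_key((4, 2), 2)
enc_c = sp_power([9, 9], 2)
answers = [(item_key(key_c, r) * c) % N for r, c in zip([1, 2], enc_c)]

print(key_c, enc_c, answers)  # (16, 4) [11, 11] [4, 16]
```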

Key regeneration

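• Objective: Set C = A, but with a column key for C that is different from A's
  – C's key appears random to SP

Plain data:
Row  A   B
1    2   3
2    4   1

Table schema and column keys at user: A<4, 2>, B<1, 9>

Encrypted values at SP (n = 35):
Row  A   B
1    9   12
2    9   16

Target: C has the same plain values as A (2, 4), with a new column key C<??, ??> at user and new encrypted values (??, ??) at SP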

Adding a constant column

Plain data:
Row  A   B   K
1    2   3   4
2    4   1   4
…    …   …   4
…    …   …   4

• An artificial column K is added
• The value of K is the same for all rows. The value is randomly determined by the user at the beginning (CREATE TABLE). In the example, it is 4.

Table schema and column keys at user: A<4, 2>, B<1, 9>, K<3, 3>, α = 4

Encrypted values at SP:
Row  A   B   K
1    9   12  16
2    9   16  17

• K is encrypted like other columns

Key regeneration

• Set $C = (\alpha^{-1})^p A K^p$
  – $\alpha^{-1}$ is the modular multiplicative inverse of $\alpha$ w.r.t. $n$
  – The multiplicative inverse of 4 is 9 w.r.t. n = 35
  – p is randomly determined each time
  – The value of each row of C = the value at A

Procedure

1. $C_1 = (\alpha^{-1})^p A$ (column-constant multiplication)
2. $C_2 = K^p$ (power)
3. $C = C_1 C_2$ (column-column multiplication)

Note: SP has no action in step 1

Key regeneration

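• $C = A = (\alpha^{-1})^p A K^p$
  – $\alpha = 4$, $p = 2$, $\alpha^{-1} = 9$
  – => $C = 9^2 A K^2 = 11 A K^2$ (mod 35)

Plain data:
Row  A   B   K
1    2   3   4
2    4   1   4

Table schema and column keys at user: A<4, 2>, B<1, 9>, K<3, 3>, α = 4

Encrypted values at SP:
Row  A   B   K
1    9   12  16
2    9   16  17

At user (column keys):
• $C_1 = 11A$: <9, 2>
• $C_2 = K^2$: <9, 9>
• $C = C_1 C_2$: C<11, 18>

At SP, $A K^2$ on the encrypted values gives the encrypted values of C:
Row  C
1    29
2    11

Check at user:
Row  Item key of C  Encrypted C  Plain C
1    23             29           2
2    29             11           4

Security: The only parameter sent to SP is p

Even if C's key is sent to SP, SP cannot get K's key. In the form $x^e$, e is known to SP but x is not, and x is hard to compute (as in RSA). Thus, it is hard to get A's key.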
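The three-step procedure can be traced on the toy data with a short sketch. The function names are hypothetical; `pow(alpha, -1, n)` needs Python 3.8 or later.

```python
N = 35  # toy modulus from the slides

def item_key(column_key, row_id, n=N):
    m, x = column_key
    return (m * pow(x, row_id, n)) % n

def regenerate_key_user(key_a, key_k, alpha, p, n=N):
    """User side: column key of C = (alpha^-1)^p * A * K^p."""
    scale = pow(pow(alpha, -1, n), p, n)               # (alpha^-1)^p
    c1 = ((scale * key_a[0]) % n, key_a[1])            # step 1: constant * A
    c2 = (pow(key_k[0], p, n), pow(key_k[1], p, n))    # step 2: K^p
    return ((c1[0] * c2[0]) % n, (c1[1] * c2[1]) % n)  # step 3: C1 * C2

def regenerate_key_sp(enc_a, enc_k, p, n=N):
    """SP side: only p is received; compute a' * (k')^p per row."""
    return [(a * pow(k, p, n)) % n for a, k in zip(enc_a, enc_k)]

key_c = regenerate_key_user((4, 2), (3, 3), alpha=4, p=2)
enc_c = regenerate_key_sp([9, 9], [16, 17], p=2)
plain_c = [(item_key(key_c, r) * c) % N for r, c in zip([1, 2], enc_c)]

print(key_c, enc_c, plain_c)  # (11, 18) [29, 11] [2, 4]
```

The plain values of C equal those of A, while both the column key and the encrypted values have changed.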

Column-column addition

• C = A + B (SELECT A+B AS C)
• In some row r, the values of A, B are a, b
  – $a = a_k a'$ ($a_k$: item key at user, $a'$: encrypted value of a)
  – $b = b_k b'$ ($b_k$: item key at user, $b'$: encrypted value of b)
• $c = a + b = (a_k a') + (b_k b') \bmod n$

Plain data:
Row  A   B
1    2   3
2    4   1

Table schema and column keys at user: A<4, 2>, B<1, 9>

Encrypted values at SP:
Row  A   B
1    9   12
2    9   16

• We must combine $a_k$ and $a'$ to compute the addition, but $a_k$ is not materialized (it is generated from A's key)
• Solution: send A's key to SP in a protected way

Column-column addition

• C = A + B (SELECT A+B AS C)
• In some row r, the values of A, B are a, b
  – $a = a_k a'$ ($a_k$: item key at user, $a'$: encrypted value of a)
  – $b = b_k b'$ ($b_k$: item key at user, $b'$: encrypted value of b)
• $c = a + b = (a_k a') + (b_k b') \bmod n$
• In the end, c should also be encrypted like other values, i.e., $c = c_k c' \bmod n$
• $c_k c' = (a_k a') + (b_k b') \bmod n$
• $c' = (c_k^{-1} a_k) a' + (c_k^{-1} b_k) b' \bmod n$
  – $c_k$ is determined by C's column key. User generates C's key randomly
  – The remaining problem is to help SP compute $c'$
  – User prepares the two parts $c_k^{-1} a_k$ and $c_k^{-1} b_k$. Item keys are not materialized, but the parts can be expressed at the column key level

With C<$m_c$, $x_c$> and A<$m_a$, $x_a$>, at row r:

$c_k = m_c x_c^r \bmod n$

$c_k^{-1} = m_c^{-1} (x_c^{-1})^r \bmod n$

$a_k = m_a x_a^r \bmod n$

$c_k^{-1} a_k = m_c^{-1} m_a (x_c^{-1} x_a)^r \bmod n$

=> the hint for A is the column key <$m_c^{-1} m_a$, $x_c^{-1} x_a$>

Example

Plain data:
Row  A   B
1    2   3
2    4   1

Table schema and column keys at user: A<4, 2>, B<1, 9>

Encrypted values at SP (n = 35):
Row  A   B
1    9   12
2    9   16

1. First, generate C's key: C<3, 27>
2. Compute C's inverse: $C^{-1}$ <12, 13>
3. Compute the hints for A and B: Hint A <13, 26>, Hint B <12, 12>
4. SP materializes the hints for every row:

Row  Hint for A  Hint for B
1    23          4
2    3           13

5. SP obtains the encrypted values of C:

Row  C
1    10
2    25

We obtain the correct answers if we look at the plain values:

Row  Item key of C  Encrypted C  Plain C
1    11             10           5
2    17             25           5
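The five steps can be replayed with a minimal sketch (hypothetical function names). It sends the raw hints to SP and therefore leaves out the key regeneration of the hints, so it only checks the arithmetic, not the security.

```python
N = 35  # toy modulus from the slides

def item_key(column_key, row_id, n=N):
    m, x = column_key
    return (m * pow(x, row_id, n)) % n

def invert_key(column_key, n=N):
    """<m, x>^-1 = <m^-1, x^-1> (mod n)."""
    return (pow(column_key[0], -1, n), pow(column_key[1], -1, n))

def hint_key(inv_c, column_key, n=N):
    """Hint for a column: C^-1 combined with that column's key."""
    return ((inv_c[0] * column_key[0]) % n, (inv_c[1] * column_key[1]) % n)

def sp_add(hint_a, hint_b, enc_a, enc_b, row_ids, n=N):
    """SP side: c' = hintA(r) * a' + hintB(r) * b' mod n for each row r."""
    return [
        (item_key(hint_a, r, n) * a + item_key(hint_b, r, n) * b) % n
        for r, a, b in zip(row_ids, enc_a, enc_b)
    ]

key_c = (3, 27)                              # step 1: chosen by the user
inv_c = invert_key(key_c)                    # step 2: (12, 13)
hint_a = hint_key(inv_c, (4, 2))             # step 3: (13, 26)
hint_b = hint_key(inv_c, (1, 9))             # step 3: (12, 12)
enc_c = sp_add(hint_a, hint_b, [9, 9], [12, 16], [1, 2])  # steps 4-5
plain_c = [(item_key(key_c, r) * c) % N for r, c in zip([1, 2], enc_c)]

print(enc_c, plain_c)  # [10, 25] [5, 5]
```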

Security

Plain data:
Row  A   B
1    2   3
2    4   1

Table schema and column keys at user: A<4, 2>, B<1, 9>, C<3, 27>

Encrypted values at SP:
Row  A   B   C
1    9   12  10
2    9   16  25

Hints sent to SP: Hint A <13, 26>, Hint B <12, 12>. These 4 values are what SP can observe.

Hints materialized at SP:
Row  Hint for A  Hint for B
1    23          4
2    3           13

Security

Hints sent to SP: Hint A <13, 26>, Hint B <12, 12>. These 4 values are what SP can observe.

• Hint A combines A<$m_a$, $x_a$> with $C^{-1}$<p, q>; Hint B combines B<$m_b$, $x_b$> with $C^{-1}$<p, q>
• 4 equations: $p\,m_a \bmod 35 = 13$, $q\,x_a \bmod 35 = 26$, …
• $C^{-1}$'s key is different in each addition, but the keys of A and B are not
• In the long run, an attacker can gather enough information to breach security
• Each hint can be imagined as a column in the table. Before sending the key of this column, we do a key regeneration

Security

• Recap: Even if the newly regenerated key is revealed to SP, SP cannot associate it with the old key.
  – Because there is an exponentiation in the formula

Hints for A and B: Hint A <13, 26>, Hint B <12, 12>

After key regeneration: Hint A <3, 3>, Hint B <3, 12>

SP's view of the original hints: Hint A <?, ?>, Hint B <?, ?>, and so SP cannot know A's, B's, or C's key

Column-constant addition

• Trivial as we have a constant column, e.g., the constant operand can be derived from K by a column-constant multiplication and then added by a column-column addition

Encrypt-plain operations

• C = AB (SELECT A*B AS C), but now B is not encrypted
• Encryption always incurs some overhead, e.g., decryption is needed
• Encryption is done only when the data is sensitive

Plain data:
Row  A   B
1    2   3
2    4   1

Table schema and column keys at user: A<4, 2>, B<1, 1>

Values at SP (n = 35):
Row  A   B
1    9   3
2    9   1

• B being unencrypted is equivalent to B having a key of <1, 1>. All operations are the same
• Encrypted columns and unencrypted columns are interoperable

Generic column-column operations

• With addition and multiplication, we can compute any function that can be expressed as a circuit

• All data is in binary form
• It is sufficient to show that we can build a universal gate (e.g., NAND gate) on top of binary data

Building NAND gate

• NAND(X, Y) = $1 - XY$ (multiplication and addition)

X  Y  Result
0  0  1
0  1  1
1  0  1
1  1  0

• Any circuit can be expressed
• Note: Since we are using multiplicative sharing, we have poor protection on 0 values
  – Example: RSA has poor protection on 0 and 1 values

Switching to other values

• $X + Y - XY + 1$ (addition and multiplication on non-zero booleans; in the table, 2 acts as true and 1 as false)

X  Y  Result
2  2  1
2  1  2
1  2  2
1  1  2
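Both gate formulas can be checked against their truth tables with a few lines. The function names are hypothetical, and the reading of 2 as true and 1 as false is inferred from the table rather than stated on the slide.

```python
def nand_01(x, y):
    """NAND on {0, 1}: 1 - XY."""
    return 1 - x * y

def nand_12(x, y):
    """NAND on the non-zero encoding (2 = true, 1 = false): X + Y - XY + 1."""
    return x + y - x * y + 1

table_01 = [(x, y, nand_01(x, y)) for x in (0, 1) for y in (0, 1)]
table_12 = [(x, y, nand_12(x, y)) for x in (2, 1) for y in (2, 1)]

print(table_01)  # [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(table_12)  # [(2, 2, 1), (2, 1, 2), (1, 2, 2), (1, 1, 2)]
```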

Side note: Addition revisit

• Note: we are protecting the column keys, but an attacker may observe some information in our operations
• $c' = (c_k^{-1} a_k) a' + (c_k^{-1} b_k) b' \bmod n$
• $c' = c_k^{-1} a + c_k^{-1} b \bmod n$ (both terms share the same factor $c_k^{-1}$)
• Dangerous in the binary case, as $c_k^{-1} a = c_k^{-1} b$ iff $a = b$
  – An attacker can identify whether the bits are the same

A more secure method

• $X + Y - XY + 1 = (X - p)(q - Y) + (1 - q)X + (1 - p)Y + (1 + pq)$
• RHS $= qX + pY - XY - pq + (1 - q)X + (1 - p)Y + (1 + pq) = X + Y - XY + 1$
• p, q are random numbers
• All parts are of different values
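The identity can be sanity-checked numerically. This sketch only confirms that the masked form equals the direct form modulo the toy n = 35 for every p and q; it says nothing about how the parts are computed on encrypted columns. Function names are hypothetical.

```python
N = 35  # toy modulus from the slides

def direct(x, y, n=N):
    """X + Y - XY + 1 mod n."""
    return (x + y - x * y + 1) % n

def masked(x, y, p, q, n=N):
    """(X - p)(q - Y) + (1 - q)X + (1 - p)Y + (1 + pq) mod n."""
    return ((x - p) * (q - y) + (1 - q) * x + (1 - p) * y + (1 + p * q)) % n

all_equal = all(
    masked(x, y, p, q) == direct(x, y)
    for x in (1, 2) for y in (1, 2)
    for p in range(1, N) for q in range(1, N)
)
print(all_equal)  # True
```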

Note on circuit construction

• Not efficient if we use the generic gate construction for everything
  – Shortcut operations should be developed for common jobs (part of future work, e.g., on string data)

• There is still no comparison operation (branch)
  – We will discuss comparison in later slides

• The above generic gate construction is of theoretical interest only

Summary of our operations so far

• With addition and multiplication
  – Compute any arithmetic function (using addition, multiplication, power) on integer columns relatively efficiently (significantly smaller overhead at user than the baseline)
    • Baseline: the user downloads the encrypted database, decrypts it, and computes the query on its own

EXTENSION OPERATORS

Comparison

• Note: the objective is to let SP filter tuples
  – The result of the comparison should be revealed to SP
  – Thus, data interchangeability cannot be achieved by comparison
• Side note: If the comparison result must be hidden from SP as well, the overhead at user increases significantly
  – Such a requirement will have a cost at user not less than the baseline

Comparison

• Only one operation is required
  – X > 0
    • Every other comparison can be transformed to the above format with 1 addition

• Equivalent operation
  – Check the sign bit of the data

Domain partitioning

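• Modular arithmetic
  – $-3 \equiv 32 \pmod{35}$
  – $-10 \equiv 25 \pmod{35}$

• Domain

[Figure: the domain from 0 to a 1024-bit value is split into two ranges of about 1023 bits each. A value is positive if it falls in the lower range and negative if it falls in the upper range.]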

Comparison

• We will let SP observe the comparison result, to achieve efficient selections

• Goal
  – If the real value is +ve, map it into the +ve region
  – If the real value is –ve, map it into the –ve region

[Figure: the domain from 0 to a 1024-bit value, split into a positive range and a negative range]

Controlling the parameter

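• $a = a_k a' \Rightarrow a' = a_k^{-1} a$
  – Regenerate the key to make $a_k^{-1}$ a small constant

Before key regeneration, A<4, 2>:

Row  Real value  Item key at user  Value at SP
1    2           8                 9
2    4           16                9

After key regeneration, A<12, 1>, so $A^{-1}$ is <3, 1>:

Row  Real value  Item key at user  Value at SP
1    2           12                6
2    4           12                12

n = 35, n/2 = 17

As long as there is no overflow, the result is correct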
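The sign test at SP can be sketched on the toy modulus. The sketch assumes that values up to n/2 are read as positive and larger values as negative, which matches the modular examples given for domain partitioning; the function names are hypothetical.

```python
N = 35  # toy modulus from the slides; n // 2 = 17

def value_at_sp(plain, inverse_item_key, n=N):
    """a' = a_k^{-1} * a mod n, with a small constant a_k^{-1}."""
    return (inverse_item_key * plain) % n

def is_positive(sp_value, n=N):
    """SP side: positive iff the value lies in the lower half of the domain."""
    return 0 < sp_value <= n // 2

INVERSE_ITEM_KEY = 3  # from A^{-1} <3, 1> in the example

for plain in (2, 4, -3):
    v = value_at_sp(plain, INVERSE_ITEM_KEY)
    print(plain, v, is_positive(v))
# 2 6 True
# 4 12 True
# -3 26 False
```

With this modulus the test is only correct while 3 times the absolute value stays at or below 17; a larger plain value overflows into the other region.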

Overflow?

• Each region is around $2^{1023}$
  – Should be more than enough for usual domains, e.g., a 4-byte int => $2^{32}$

• Security issues
  – Factoring attack
    • Each value has the same factor (e.g., 3 in the last example)
  – Order-preserving
    • A larger value will give a larger value at SP

Random column

• $X > 0 \iff f(R)X > 0$ for $f(R) > 0$

• Example of f(R)
  – $(R - p + 1)^2$: 160-bit value
    • p is random in every query

Real values (R is random in $2^{80}$, in the +ve domain, > 0):

ID  A   B   R
1   2   3   2
2   4   1   99

Aggregate query

• Since they are usually the last operations, data interchangeability is not important

• COUNT
  – Same as selection: after SP filters the tuples, just count the qualifying ones
• SUM
  – Next slide

SUM

• SELECT SUM(A)
  – Now the addition operation is between rows
  – Uses the same logic as column-column addition

Plain data:
Row  A   B
1    2   3
2    4   1

Table schema and column keys at user: A<4, 2>, B<1, 9>

Encrypted values at SP:
Row  A   B
1    9   12
2    9   16
r    ?   ?

Generate the result item key $a_s$ for a new row r (only the row ID is needed):

$s = a_{k1} a'_1 + a_{k2} a'_2$

$a_s s' = a_{k1} a'_1 + a_{k2} a'_2$

$s' = a_s^{-1} a_{k1} a'_1 + a_s^{-1} a_{k2} a'_2$

SUM

• SELECT SUM(A)
• $s' = a_s^{-1} a_{k1} a'_1 + a_s^{-1} a_{k2} a'_2$

With A<m, x>:

$a_{k1} = m x^{r_1} \bmod n$

$a_s = m x^r \bmod n$

$a_s^{-1} = m^{-1} (x^{-1})^r \bmod n$

$a_s^{-1} a_{k1} = x^{r_1} (x^{-1})^r \bmod n$

• SP needs $x$ and $(x^{-1})^r$ to compute the above part
• $x$ is part of the column key and cannot be sent to SP directly
• Perform a key regeneration (not exactly)
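The arithmetic of the SUM derivation can be checked on the toy table. The sketch below hands x to SP directly, which the slide says must not be done in practice, so it verifies the formula only. The result row ID r = 3 and the function name are assumptions made for the example.

```python
N = 35  # toy modulus from the slides

def sp_sum(x, inv_x_pow_r, enc_values, row_ids, n=N):
    """SP side: s' = sum over rows of x^{r_i} * (x^{-1})^r * a'_i mod n."""
    total = 0
    for r_i, enc in zip(row_ids, enc_values):
        total += pow(x, r_i, n) * inv_x_pow_r * enc
    return total % n

m, x = 4, 2      # column key A<4, 2>
r = 3            # assumed row ID for the result item key
inv_x_pow_r = pow(pow(x, -1, N), r, N)            # (x^{-1})^r

s_enc = sp_sum(x, inv_x_pow_r, [9, 9], [1, 2])    # encrypted sum s'
a_s = (m * pow(x, r, N)) % N                      # result item key m * x^r
plain_sum = (a_s * s_enc) % N

print(s_enc, a_s, plain_sum)  # 33 32 6
```

The recovered value 6 equals 2 + 4, the plain sum of column A.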

Key regeneration

• Keep C = pA for a random p (not C = A)
  – Note that an attacker may know A, but cannot know C, so there is no CPA attack on C

A<m, x> is regenerated as C = pA with key <m', x'>

• As we discussed, key regeneration does not let the attacker trace x from knowing m' and x'
• Revealing this x' is safe
• The calculated sum is multiplied by p. The user just multiplies it by $p^{-1}$ to get the actual sum

Indexing

• Processing each tuple by linear scan is feasible but slow

• Indexing is needed
• Note: an index itself is a compromise of security
  – If certain tuples are filtered without any processing, the attacker can obtain certain information about the data, e.g., a range for the data

An index option

• Make the data uncertain (domain partitioning)

Data at user:
Row  A   B
1    2   3
2    4   1

Index on uncertain data at SP:
Row  A    B
1    1-2  3-4
2    3-4  1-2

Index processing

• First process the index and filter out all disqualified tuples
• Then, use cryptographic operations to compute the actual answer

Integration with existing DBMS

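[Figure: system architecture. User side: Applications and the SDB Client Layer (with Secure Operators). SP side: the SDB Server Layer (with Secure Operators) on top of the DBMS. Labels on the connections: Query, Query Execution Plan, SQL, Memory, Result.]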

Example

• SELECT C WHERE A * B + D > 20

Table schema and column keys at user: A<…>, B<…>, C<…>, D<…>; n = 35

Encrypted values at SP:
Row ID  A   B   C   D
105     …   …   …   …
278     …   …   …   …

Rewritten predicate: A*B + D – 20 > 0

1. Column-column multiplication: E = AB, with column key E<…>
2. Column-column addition: F = E + D – 20, with column key F<…>
3. Comparison on F

Query execution plan done (with corresponding parameters)

Note: E, F can be thrown away, since they are not needed in the result

Example

• SELECT C WHERE A * B + D > 20

Table schema and column keys at user: A<…>, B<…>, C<…>, D<…>; n = 35

Encrypted values at SP:
Row ID  A   B   C   D
105     …   …   …   …
278     …   …   …   …

SP receives the query plan, executes it, and finds the answers:

Row ID  Answers?
105     No
278     Yes
337     No
129     No
…       …

Projection on C only:

Row ID  C
278     3
776     12
…       …

The encrypted answer is sent back to the user. Row IDs must be included

Example

• SELECT C WHERE A * B + D > 20

Table schema and column keys at user: A<…>, B<…>, C<…>, D<…>; n = 35

Encrypted answers received from SP:

Row ID  C
278     3
776     12
…       …

User computes its own item keys:

Row ID  C
278     9
776     9
…       …

Decrypt (item key times encrypted value mod 35):

Row ID  C
278     27
776     3
