Page 1: How to compile searching software so that it is impossible to reverse-engineer

http://www.cs.ucla.edu/~rafail/

How to compile searching software so that it is impossible to reverse-engineer.

(Private Keyword Search on Streaming Data)

Rafail Ostrovsky, William Skeith, UCLA

(patent pending)

Page 2

MOTIVATION: Problem 1.

Each hour, we wish to find whether any of hundreds of passenger lists contains a name from the “Possible Terrorists” list and, if so, that person’s itinerary.

The “Possible Terrorists” list is classified and should not be revealed to airports.

Tantalizing question: can the airports help (and do all the search work) if they are not allowed to see the “Possible Terrorists” list?

PROBLEM 1: Is it possible to design mobile software that can be transmitted to all airports (including potentially revealing this software to the adversary due to leaks) so that this software collects ONLY the information needed, without revealing what it is collecting at each node?

Non-triviality requirement: must send back only the needed information, not everything!

[Figure: Airport 1, Airport 2 and Airport 3 passenger lists; mobile code (with state).]

Page 3

MOTIVATION: Problem 2.

Looking for malicious insiders and/or terrorist communication:
(I) First, we must identify some “signature” criteria (rules) for suspicious behavior; typically, this is done by analysts.
(II) Second, we must detect which nodes/stations transmit these signatures.

Here, we want to tackle part (II).

PROBLEM 2: Is it possible to design software that can capture all messages (and network locations) that match a secret/classified set of “rules”? Key challenge: the software must not reveal the secret “rules”.

Non-triviality requirement: the software must send back only the locations and messages that match the given “rules”, not everything it sees.

[Figure: public networks]

Page 4

What we want: search software that has a set of “rules” to choose which documents and/or packets to keep and which to toss.

Small storage (that collects selected documents and/or packets)

[Figure: various data streams, consisting of flows of documents/packets.]

Our “compiler” outputs straight-line executable code (with program state) and a decryption key “D”.

STRAIGHT-LINE EXECUTABLE CODE THAT DOES NOT REVEAL THE SEARCH “RULES”

Small fixed-size program state (encrypted in a special way that our code modifies for each document processed)

Decrypt using D to obtain the documents/packets that match the secret “rules”.

Punch line: we can send the executable code publicly. (It won’t reveal its secrets!)

Page 5

Current Practice

Continuously transfer all data to a secure environment.

After the data is transferred, filter it in the classified environment, keeping only a small fraction of the documents.

Page 6

[Figure: documents D(1,1) through D(3,3) from three data streams are all transferred into the classified environment, where they pass through a filter into storage.]

Filter rules are written by an analyst and are classified!

Current practice: the amount of data that must be transferred to a classified environment is enormous!

Page 7

Current Practice

Drawbacks: communication, processing, cost and timeliness

Page 8

How to improve performance?

Distribute the work to many locations on a network, where you decide “on the fly” which data is useful.

Seemingly ideal solution, but… Major problem:

It is not clear how to maintain security, which is the focus of this technology.

Page 9

[Figure: filters with their own storage sit on three document streams D(i,j) in the open network; they output the encrypted documents E(D(1,2)), E(D(1,3)) and E(D(2,2)), which the classified environment decrypts and stores as D(1,2), D(1,3) and D(2,2).]

Page 10

Example filters: look for all documents that contain special classified keywords (or strings or data items, and/or that do not contain some other data) selected by an analyst.

Privacy: we must hide what rules are used to create the filter. The output must be encrypted.

Page 11

More generally:

We define the notion of Public Key Program Obfuscation: an encrypted version of a program that performs the same functionality as the un-obfuscated program, but:
produces encrypted output;
is impossible to reverse engineer.

A little more formally:

Page 12

Public Key Program Obfuscation

We can compile any code into an “obfuscated code with small storage”.

Think of the compiler as a mapping: source code → “Smart Public-Key Encryption” with initial encrypted storage + decryption key.

Non-triviality: the sizes of the compiled program, the encrypted storage and the encrypted output are not much bigger than those of the uncompiled code.

Nothing about the program is revealed, given the compiled code + storage.

Yet someone who has the decryption key can recover the “original” output.

Page 13

Privacy

Page 14

Related Notions

PIR (Private Information Retrieval) [CGKS], [KO], [CMS]…

Keyword PIR [KO], [CGN], [FIPR]

Cryptographic counters [KMO]

Program Obfuscation [BGIRSVY]…

Here the output is identical to that of the un-obfuscated program, but in our case it is encrypted.

Public Key Program Obfuscation: a more general notion than PIR, with lots of applications.

Page 15

What do we want?

[Figure: a document stream … D(1,3), D(1,2), D(1,1) passes through a filter with storage, which outputs E(D(1,2)) and E(D(1,3)).]

Conundrum: the compiled filter code is not allowed to have ANY branches (i.e., no “if-then-else” statements). Only straight-line code is allowed!

Two requirements:

Correctness: only matching documents are saved, nothing else.

Efficiency: the decoding time is proportional to the length of the buffer, not the size of the entire stream.
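To see how a filter can avoid "if-then-else" altogether, here is a plaintext analogue: a minimal sketch assuming a hypothetical keyword table `KEYWORD_TABLE` (1 for a secret keyword, 0 otherwise) and a buffer of (count, count·document) pairs. The match is expressed as arithmetic, so matching and non-matching documents run exactly the same operations. This version hides nothing; in the compiled code the table entries and the buffer would be encrypted and additions replaced by ciphertext multiplications.

```python
# Plaintext analogue of straight-line filtering (illustration only: no privacy here).
# Hypothetical keyword table: 1 marks a secret keyword, 0 any other dictionary word.
KEYWORD_TABLE = {"alpha": 1, "beta": 0, "gamma": 1, "delta": 0}


def match_count(words):
    """Number of distinct secret keywords in the document, computed as a sum (no if on secrets)."""
    return sum(KEYWORD_TABLE.get(w, 0) for w in set(words))


def straight_line_update(buffer, words, doc_value, slot):
    """Identical operations for every document: a non-matching one simply adds zeros."""
    c = match_count(words)
    buffer[slot][0] += c              # running count c
    buffer[slot][1] += c * doc_value  # running c * document


buffer = [[0, 0] for _ in range(4)]
straight_line_update(buffer, ["alpha", "beta"], 7, slot=1)  # matches "alpha"
straight_line_update(buffer, ["beta", "delta"], 9, slot=2)  # no match: adds (0, 0)
```

Keeping the count c next to c·document is what later lets the key holder divide the document back out.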

Page 16

Simplifying Assumptions for this Talk

All keywords come from some poly-size dictionary.

Truncate documents beyond a certain length.

Page 17

Sneak peek: the compiled code

Suppose we are looking for all documents that contain some secret word from the Webster dictionary.

Here is how it looks to the adversary: for each document, execute the same code as follows:

Page 18

[Figure: a dictionary of words w_1, …, w_n, each paired with an encryption E(*); a document D; the value g; a small output buffer of encrypted triples (*,*,*).]

Look up the encryptions of all words appearing in the document and multiply them together. Take this value and apply a fixed formula to it to get the value g.
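The lookup-and-multiply step can be sketched with a toy Paillier scheme. Assumptions, all hypothetical: tiny primes 293 and 433 (not secure), helper names `encrypt`/`decrypt`, a five-word `DICTIONARY`, a `SECRET` keyword set known only to the compiler, and each distinct word counted once. The "fixed formula" mentioned above is not modeled; the product itself stands in for g.

```python
import math
import secrets

# Toy Paillier with tiny primes: illustration only, NOT secure.
P, Q = 293, 433
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                               # valid for the generator 1 + N


def encrypt(m):
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2


def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


# Hypothetical public dictionary and secret keywords (the latter known only to the compiler).
DICTIONARY = ["apple", "banana", "cherry", "grape", "lemon"]
SECRET = {"cherry", "lemon"}

# The shipped table: E(1) next to secret words, E(0) next to all others.
ENC_TABLE = {w: encrypt(1 if w in SECRET else 0) for w in DICTIONARY}


def lookup_and_multiply(document_words):
    """Multiply the table entries of the document's distinct words."""
    g = 1  # 1 = (1 + N)^0 * 1^N mod N^2: an encryption of 0, neutral for multiplication
    for w in set(document_words):
        g = g * ENC_TABLE.get(w, 1) % N2  # words outside the dictionary contribute 1
    return g
```

Only the key holder can tell that the product for a document containing "cherry" and "lemon" decrypts to 2; to everyone else all table entries look alike.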

Page 19

How should a solution look?

Page 20

[Figure: a stream in which matching documents #1, #2 and #3 are interleaved with non-matching documents.]

Page 21

How do we accomplish this?

Page 22

Reminder: PKE

$$\text{Key-generation}(1^k) \rightarrow (PK, SK), \qquad E(PK, m, r) \rightarrow c, \qquad D(c, SK) \rightarrow m$$

We will use PKE with additional properties.

Page 23

Several solutions based on homomorphic public-key encryption

For this talk: Paillier encryption

Properties: E(x) is probabilistic; in particular, it can encrypt a single bit in many different ways, such that any instance of E(0) and any instance of E(1) cannot be distinguished.

Homomorphic, i.e.:

$$E(x) \cdot E(y) = E(x+y)$$
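Both properties can be checked on a toy Paillier instance. This is a minimal sketch assuming tiny primes (293 and 433, chosen only for readability and not secure), the common generator choice 1 + N, and hypothetical helper names `encrypt`/`decrypt`.

```python
import math
import secrets

# Toy Paillier parameters: tiny primes for illustration only, NOT secure.
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lambda = lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # mu = lambda^-1 mod N, valid for the generator 1 + N


def encrypt(m):
    """E(m) = (1 + N)^m * r^N mod N^2 with fresh random r coprime to N."""
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2


def decrypt(c):
    """m = L(c^lambda mod N^2) * mu mod N, where L(x) = (x - 1) // N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


# Homomorphic: multiplying ciphertexts adds the plaintexts (mod N).
total = decrypt(encrypt(3) * encrypt(4) % N2)

# Probabilistic: two encryptions of 1 differ with overwhelming probability,
# yet both decrypt to 1.
e1, e2 = encrypt(1), encrypt(1)
```

The fresh random r in every call is what makes each E(0) and E(1) look like an unrelated number modulo N².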

Page 24

Using Paillier Encryption

$$E(x) \cdot E(y) = E(x+y)$$

Important to note:

$$E(0)^c = \underbrace{E(0) \cdots E(0)}_{c \text{ times}} = E(0 + 0 + \cdots + 0) = E(0)$$

$$E(1)^c = \underbrace{E(1) \cdots E(1)}_{c \text{ times}} = E(1 + 1 + \cdots + 1) = E(c)$$

Assume we can somehow compute an encrypted value $v$, where we don’t know what $v$ stands for, but $v = E(0)$ for “un-interesting” documents and $v = E(1)$ for “interesting” documents.

What is $v^c$? It is either $E(0)$ or $E(c)$, and we don’t know which one.
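The exponentiation trick is a single modular power. A minimal sketch, reusing the same toy Paillier setup (tiny primes, not secure) and a hypothetical helper `raise_to`; the number 42 stands in for an arbitrary value c.

```python
import math
import secrets

# Toy Paillier with tiny primes: illustration only, NOT secure.
P, Q = 293, 433
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                               # valid for the generator 1 + N


def encrypt(m):
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2


def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def raise_to(v, c):
    """v^c mod N^2: maps E(0) to E(0) and E(1) to E(c), without revealing which case occurred."""
    return pow(v, c, N2)


uninteresting = raise_to(encrypt(0), 42)  # still an encryption of 0
interesting = raise_to(encrypt(1), 42)    # now an encryption of 42
```

Whoever runs `raise_to` performs the same computation in both cases and sees only two random-looking numbers.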

Page 25

[Figure: the dictionary now shows E(1) next to w_2, w_5 and w_{n-2} and E(0) next to w_1, w_3, w_4, w_{n-1} and w_n; a document D; the pair (g, g^D); an output buffer with E(0) in every entry.]

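$$g = E(0) \cdot E(1) \cdot E(0) \cdots \quad \text{(one factor per word of the document)}$$

$$g = \begin{cases} E(0) & \text{if there are no matching words} \\ E(c) & \text{if there are } c \text{ matching words} \end{cases} \qquad g^D = \begin{cases} E(0) & \text{if there are no matching words} \\ E(c \cdot D) & \text{if there are } c \text{ matching words} \end{cases}$$

Thus, if we keep $g = E(c)$ and $g^D = E(c \cdot D)$, whoever holds the decryption key can calculate D exactly, as $(c \cdot D)/c$.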
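Recovering D from the pair (g, g^D) can be sketched as follows, again with the toy Paillier setup (tiny primes, not secure). Assumptions: the document is encoded as an integer smaller than N, c is invertible modulo N, and the helper names `make_pair`/`recover` are hypothetical. The `if` in `recover` runs on the key holder's side, not inside the compiled search code.

```python
import math
import secrets

# Toy Paillier with tiny primes: illustration only, NOT secure.
P, Q = 293, 433
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                               # valid for the generator 1 + N


def encrypt(m):
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2


def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def make_pair(g, doc_value):
    """(g, g^D): E(c) becomes (E(c), E(c*D)); E(0) becomes (E(0), E(0))."""
    return g, pow(g, doc_value, N2)


def recover(pair):
    """Key holder's side: D = (c*D) / c computed mod N; None when c == 0."""
    c, cd = decrypt(pair[0]), decrypt(pair[1])
    if c == 0:
        return None  # non-matching document: nothing was stored
    return cd * pow(c, -1, N) % N
```

For example, `recover(make_pair(encrypt(3), 1234))` yields 1234, while a pair built from E(0) yields `None`.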

Page 26

[Figure: buffer contents holding matching documents #1, #2 and #3 and another matching document.]

Collisions cause two problems:

1. Good documents are destroyed

2. Non-existent documents could be fabricated
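A small worked example with hypothetical numbers shows both problems at once. Suppose two matching documents, each containing one matching word ($c_1 = c_2 = 1$) and encoded as $D_1 = 2$ and $D_2 = 4$, are multiplied into the same buffer location as (g, g^D) pairs:

$$E(1) \cdot E(1) = E(2), \qquad E(1)^{2} \cdot E(1)^{4} = E(2) \cdot E(4) = E(6), \qquad 6/2 = 3 \notin \{2, 4\}$$

The location decodes to 3, so both genuine documents are lost (problem 1) and a document that never appeared in the stream is output (problem 2).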

Page 27

We’ll make use of two combinatorial lemmas…

Page 28

Page 29

Combinatorial Lemma 1

Claim: the color survival game succeeds with probability > 1 - neg()

Page 30

How to detect collisions?

Idea: append a highly structured (yet random) short combinatorial object to the message, with the property that if two or more of them “collide”, the combinatorial property is destroyed.

Thus we can (with high probability) detect collisions!
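The idea can be illustrated with tags shaped like the bit strings on the next slide. This is a simplified sketch, not necessarily the construction behind the lemmas: assume each tag consists of k blocks of width 3 with exactly one 1 per block, and that a collision adds tags coordinate-wise. A location is accepted only if every block has exactly one nonzero entry, so a single tag scaled by c still passes. In this toy version two independent tags evade the check only if they agree in every block, which happens with probability 3^(-k); that is a different bound from Lemma 2. All function names are hypothetical.

```python
import random

BLOCK = 3  # block width, matching the 100|010|001 blocks shown on the slides


def random_tag(k, rng):
    """k blocks of width 3, each containing exactly one 1 at a random position."""
    tag = []
    for _ in range(k):
        block = [0] * BLOCK
        block[rng.randrange(BLOCK)] = 1
        tag.extend(block)
    return tag


def parse(s):
    """Turn a slide-style string such as '100|001|010' into a list of 0/1 entries."""
    return [int(ch) for ch in s if ch in "01"]


def combine(*tags):
    """Coordinate-wise sum: what one buffer location holds after a collision."""
    return [sum(column) for column in zip(*tags)]


def looks_valid(tag):
    """Accept iff every block has exactly one nonzero entry (so c * tag still passes)."""
    return all(
        sum(1 for x in tag[i:i + BLOCK] if x != 0) == 1
        for i in range(0, len(tag), BLOCK)
    )


if __name__ == "__main__":
    rng = random.Random(1)
    k, trials = 8, 10_000
    missed = sum(
        looks_valid(combine(random_tag(k, rng), random_tag(k, rng))) for _ in range(trials)
    )
    print(f"undetected collisions: {missed}/{trials} (expected about {trials / 3**k:.1f})")
```

For instance, summing the first two strings from the next slide already fails in the first block (100 + 010 = 110).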

Page 31

[Figure: example strings made of 3-bit blocks, each block containing exactly one 1:
100|001|100|010|010|100|001|010|010
010|001|010|001|100|001|100|001|010
010|100|100|100|010|001|010|001|010
When such strings collide, this block structure is destroyed.]

Page 32

Combinatorial Lemma 2

Claim: collisions are detected with probability > 1 - exp(-k/3)

Page 33

We do the same for all documents!

Page 34

[Figure: a dictionary of words w_1, …, w_n with encryptions E(*); a document D; the triple (g, g^D, f(g)); a small output buffer of encrypted triples (*,*,*).]

For every document in the stream, do the same: look up the encryptions of all words appearing in the document and multiply them together (= g). Compute $g^D$ and f(g), then multiply $(g, g^D, f(g))$ into randomly chosen locations of the buffer.
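The per-document buffer update can be sketched as below, with the toy Paillier setup (tiny primes, not secure). Assumptions, all hypothetical: buffer size 8, each document multiplied into 2 locations, documents encoded as small integers, and the collision-detection component f(g) omitted, so decoding simply trusts every non-empty location. The locations are passed in explicitly so the example is reproducible; the demo draws them at random.

```python
import math
import random
import secrets

# Toy Paillier with tiny primes: illustration only, NOT secure.
P, Q = 293, 433
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                               # valid for the generator 1 + N


def encrypt(m):
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2


def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


BUFFER_SIZE = 8  # hypothetical
COPIES = 2       # hypothetical: random locations per document


def new_buffer(size):
    """Every location starts as the pair (E(0), E(0))."""
    return [[encrypt(0), encrypt(0)] for _ in range(size)]


def absorb(buffer, g, doc_value, slots):
    """Same operations for every document: multiply (g, g^D) into the given locations."""
    gd = pow(g, doc_value, N2)
    for s in slots:
        buffer[s][0] = buffer[s][0] * g % N2
        buffer[s][1] = buffer[s][1] * gd % N2


def decode(buffer):
    """Key holder: D = (c*D) / c for every non-empty location (collisions yield garbage)."""
    found = set()
    for c_ct, cd_ct in buffer:
        c, cd = decrypt(c_ct), decrypt(cd_ct)
        if c != 0:
            found.add(cd * pow(c, -1, N) % N)
    return found


if __name__ == "__main__":
    rng = random.Random(7)
    buf = new_buffer(BUFFER_SIZE)
    for count, doc in [(1, 111), (0, 222), (2, 333)]:  # (matching words c, document D)
        absorb(buf, encrypt(count), doc, rng.sample(range(BUFFER_SIZE), COPIES))
    print(decode(buf))  # may include garbage if two matching documents collided
```

A non-matching document contributes (E(0), E(0)) wherever it lands, so it never disturbs a matching document stored in the same location; only collisions between matching documents cause the problems listed on Page 26.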

Page 35

Overflow: how to always collect at least m items

(with arbitrary overflow of matching documents)

Idea: create a logarithmic (in stream size) number of original buffers.
The first buffer is processed for every stream item.
The second buffer takes every item with probability ½.
The third buffer takes every item with (independent) probability ¼.
Counting from 0, buffer i takes every item with probability 1/2^i.

Key point: if the number of matching documents exceeds m, at least one buffer will get O(m) matching documents!
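The routing across buffers can be simulated in a few lines. A minimal sketch, assuming buffers numbered from 0 with buffer i taking each item independently with probability 1/2^i and hypothetical helper names; it models only which buffer sees which item, not the encryption. The coin flips do not depend on the item's content. `best_level` picks the deepest buffer whose expected share of the matching items is still at least m, which puts that expected share in [m, 2m).

```python
import math
import random


def num_buffers(stream_size):
    """Buffers 0, 1, ..., ceil(log2(stream_size)): logarithmic in the stream size."""
    return math.ceil(math.log2(max(stream_size, 2))) + 1


def route(stream_size, rng):
    """Buffer i processes each item independently with probability 1 / 2**i."""
    levels = num_buffers(stream_size)
    buffers = [[] for _ in range(levels)]
    for item in range(stream_size):
        for i in range(levels):
            if rng.random() < 0.5 ** i:
                buffers[i].append(item)
    return buffers


def best_level(matching, m):
    """Deepest buffer whose expected share of `matching` items is still >= m.

    When matching > m, the expected share matching / 2**i lies in [m, 2m).
    """
    if matching <= m:
        return 0
    return int(math.log2(matching / m))


if __name__ == "__main__":
    loads = [len(b) for b in route(1024, random.Random(0))]
    print(loads)  # roughly halves from one buffer to the next (random)
```

For example, with 100 matching documents and m = 10, buffer 3 expects 100/8 = 12.5 of them.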

Page 36

Comparison of our work to [Bethencourt, Song, Waters 06]

[OS-05]: buffer size to store m items: O(m log m). Efficiency: decoding time is proportional to the buffer size.

[BSW-06]: buffer size to store m items: O(m). Efficiency: decoding time is proportional to the length of the entire stream.

Page 37

More from the paper that we don’t have time to discuss…

Reducing program size below dictionary size (using Φ-hiding from [CMS])

Queries containing AND (using [BGN] machinery)

Eliminating negligible error (using perfect hashing)

Scheme based on arbitrary homomorphic encryption

Extending to words not from the dictionary (with small error probability)

Page 38

Conclusions

We introduced private searching on streaming data.

More generally: public key program obfuscation, more general than PIR or cryptographic counters.

Practical, efficient protocols.

Eat your cake and have it too: ensure that only “useful” documents are collected.

Many possible extensions and lots of open problems.

THANK YOU!