Efficient Private Approximation Protocols
Piotr Indyk, David Woodruff
Work in progress
Outline
1. Private approximation of L2 distance
2. Private near neighbor
3. Private approximate near neighbor
1. Private approximation of L2 distance
Secure communication
Alice has a ∈ {0,1}^n; Bob has b ∈ {0,1}^n
• Want to compute some function F(a,b)
• Security: the protocol does not reveal anything except for the value F(a,b)
  – Semi-honest: both parties follow the protocol
  – Malicious: parties are adversarial
• Efficiency: want to exchange few bits
Secure Function Evaluation (SFE)
• [Yao, GMW]: If F is computed by a circuit C, then F can be computed securely with O~(|C|) bits of communication
• [GMW] + … + [NN]: can assume parties are semi-honest
  – A semi-honest protocol can be compiled to give security against malicious parties
• Problem: circuit size is at least linear in n
* O~() hides factors poly(k, log n)
Secure and Efficient Function Evaluation
• Can we achieve sublinear communication?
• Ideally: secure computation with communication comparable to insecure case
• With sublinear communication, many interesting problems can be solved only approximately.
• What does it mean to have a private approximation?
Private Approximation
• [FIMNSW’01]: A protocol computing an approximation G(a,b) of F(a,b) is private, if each party can simulate its view of the protocol given the exact value F(a,b)
• Note: not sufficient to simulate non-private G(a,b) using SFE
• Example:
  – Define G(a,b):
    • bin(G(a,b))_i = bin(Δ(a,b))_i if i > 0
    • bin(G(a,b))_0 = a_0
  – G(a,b) is a ±1-approximation of the Hamming distance Δ(a,b), but not private: its lowest bit reveals a_0
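The example can be checked concretely. The following is a minimal Python sketch; the function names `hamming` and `leaky_approx` are illustrative and not from the slides. It builds G by overwriting the lowest bit of the exact distance with Alice's bit a_0, so G is always within 1 of the true distance while its lowest bit equals a_0.

```python
def hamming(a, b):
    """Exact Hamming distance between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b))


def leaky_approx(a, b):
    """G(a,b): the exact distance with its lowest bit replaced by a[0]."""
    d = hamming(a, b)
    return (d & ~1) | a[0]


a = [1, 0, 1, 1, 0, 0, 1, 0]
b = [0, 0, 1, 0, 0, 1, 1, 0]
g = leaky_approx(a, b)

# Flipping only a[0] changes the lowest bit of G, whatever the distance is.
a_flipped = [0] + a[1:]
g_flipped = leaky_approx(a_flipped, b)
print(hamming(a, b), g, g_flipped)
```

Anyone who sees G learns a_0, which does not follow from the exact distance alone; this is why accuracy by itself does not give privacy.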
Concrete Pitfall: Dimension Reduction
• A basic problem: Hamming distance Δ(a,b)
• Approximate decision version: with prob. 1-δ,
  – If Δ(a,b) ≤ r, answer NO
  – If Δ(a,b) ≥ r(1+ε), answer YES
• [Kushilevitz-Ostrovsky-Rabani’98]:
  – Create an m×n binary matrix D, where Pr[D_ij = 1] = 1/(2r), for m = O~(log(1/δ)/ε^2)
  – Exchange Da, Db (mod 2)
  – Answer YES if wt[D(a-b)] > r’, where r’ is a function of r, ε
NOTE: This protocol was not designed to be private
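A minimal Python sketch of the (non-private) KOR sketching step, assuming NumPy. The sizes `n`, `r`, `m` and the helper names are hypothetical choices for illustration. It checks that the weight of the XOR of the two exchanged sketches equals wt[D(a-b)] over GF(2), and gives the per-row probability that a sketch bit is 1, which is what makes a threshold r’ possible.

```python
import numpy as np


def kor_sketch(D, v):
    """Sketch D v over GF(2)."""
    return (D @ v) % 2


def odd_probability(w, r):
    """Pr[<d_i, x> = 1 mod 2] when row entries are Bernoulli(1/(2r)) and wt(x) = w."""
    p = 1.0 / (2 * r)
    return (1 - (1 - 2 * p) ** w) / 2


rng = np.random.default_rng(0)
n, r, m = 256, 8, 400                      # hypothetical sizes
D = (rng.random((m, n)) < 1.0 / (2 * r)).astype(np.int64)

a = rng.integers(0, 2, size=n)
b = a.copy()
b[:r] ^= 1                                 # Hamming distance exactly r

sa, sb = kor_sketch(D, a), kor_sketch(D, b)
wt = int(np.sum(sa ^ sb))                  # weight of D(a-b) over GF(2)
print(wt / m, odd_probability(r, r))
```

Because `odd_probability` increases with the weight of x, distance r and distance r(1+ε) give different expected sketch weights, and r’ can be placed between them.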
Non-Privacy of KOR
• Let x = a - b. If wt(x) = r and r log n ≈ m, then x can be recovered from D, Dx in O(mn) time!
• Algorithm: for j = 1…n, estimate
  Pr[<d_i, x> = 1 | d_ij = 1] = Pr[<d_i, x> = 1 ∧ d_ij = 1] / Pr[d_ij = 1]
  (d_i is the i-th row of D)
  – If x_j = 1 then Pr[<d_i, x> = 1 | d_ij = 1] is high
  – If x_j = 0 then Pr[<d_i, x> = 1 | d_ij = 1] is low
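The recovery algorithm above can be simulated in a few lines of Python, assuming NumPy. This is a sketch under stated assumptions: the sizes are hypothetical, m is chosen generously so the empirical frequencies are stable, and the decision threshold is the midpoint of the two conditional probabilities computed for Bernoulli(1/(2r)) entries.

```python
import numpy as np


def recover_difference(D, y, r):
    """Guess x from D and y = Dx mod 2 using per-column conditional frequencies."""
    p = 1.0 / (2 * r)
    hits = D.T @ y                          # rows with d_ij = 1 and <d_i, x> = 1
    counts = np.maximum(D.sum(axis=0), 1)   # rows with d_ij = 1 (guard against 0)
    cond = hits / counts                    # estimate of Pr[<d_i,x> = 1 | d_ij = 1]
    high = (1 + (1 - 2 * p) ** (r - 1)) / 2  # value when x_j = 1
    low = (1 - (1 - 2 * p) ** r) / 2         # value when x_j = 0
    return (cond > (high + low) / 2).astype(np.int64)


rng = np.random.default_rng(1)
n, r, m = 200, 10, 4000                     # hypothetical sizes
D = (rng.random((m, n)) < 1.0 / (2 * r)).astype(np.int64)
x = np.zeros(n, dtype=np.int64)
x[rng.choice(n, size=r, replace=False)] = 1
y = (D @ x) % 2

x_hat = recover_difference(D, y, r)
accuracy = float(np.mean(x_hat == x))
print(accuracy)
```

The work is dominated by the product `D.T @ y`, matching the O(mn) running time stated above.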
Approximating Hamming Distance
• [FIMNSW’01]: A private protocol with complexity O~(n^{1/2}/ε)
  – wt(x) small: compute wt(x) using O~(wt(x)) bits
  – wt(x) high: sample O~(n/wt(x)) coordinates x_i, estimate wt(x)
• Our result:
  – Complexity: O~(1/ε^2) bits
  – Works even for the L2 norm, i.e., estimates ||x||_2 for a,b ∈ {1…M}^n
* O~() hides factors poly(k, log n, log M, log 1/δ)
Crypto Tools
• SFE of circuits [Yao’86]: O~(|circuit|) communication
• Efficient SPIR or 1-out-of-n OT (OT^n_1):
  – Alice has A[1] … A[n] ∈ {0,1}^m, Bob has i ∈ [n]
  – Goal: Bob privately learns A[i] and that’s it
  – Can be done using O~(m) communication [CMS99, NP99]
• Circuits with ROM [Naor, Nissim’01]:
  – Standard AND/OR/NOT gates
  – Lookup gates:
    • In: i
    • Out: M_gate[i]
  – Takes care of the security of computation:
    • begin secure … end secure
  – Can just focus on privacy of the output
  – Communication at most O~(m|C|)
High-dimensional tools
• Random projection:
  – Take a random orthonormal n×n matrix D, that is, ||Dx|| = ||x|| for all x.
  – There exists c > 0 s.t. for any x ∈ R^n, i = 1…n:
    Pr[(Dx)_i^2 > k·||Dx||^2/n] < e^{-ck}
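A minimal Python sketch of the random projection step, assuming NumPy. Drawing D from the QR factorisation of a Gaussian matrix is one standard way to obtain a random orthonormal matrix; the slides do not say how D is generated, so this choice and the name `random_orthonormal` are assumptions.

```python
import numpy as np


def random_orthonormal(n, rng):
    """Random n x n orthonormal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))        # column sign fix; keeps orthonormality


rng = np.random.default_rng(2)
n = 128
D = random_orthonormal(n, rng)
x = rng.integers(-50, 51, size=n).astype(float)
y = D @ x

norm_ratio = np.linalg.norm(y) / np.linalg.norm(x)   # should be 1
spread = np.max(y ** 2) / (np.dot(y, y) / n)         # largest (Dx)_i^2 relative to the average
print(norm_ratio, spread)
```

The value `spread` is the smallest k for which no coordinate violates (Dx)_i^2 ≤ k·||Dx||^2/n; the tail bound says large values are exponentially unlikely.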
Approximating ||a-b||2
• Recall:
  – Alice has a ∈ [M]^n, Bob has b ∈ [M]^n
  – Goal: estimate ||x||^2, where x = a - b
Algorithm
• Alice and Bob create a random orthonormal matrix D such that, for each i = 1…n, (Dx)_i^2 < k·||x||^2/n
• T = M^2 n + 1
• Repeat
  – {Assertion: ||x||^2 ≤ T}
  – Invoke PRIVATESAMPLE to get L = O~(1/ε^2) independent bits z_i such that Pr[z_i = 1] = ||Dx||^2/(Tk)
  – T = T/2
• Until Σ_i z_i ≥ L/(4k)
• Output E = (Σ_i z_i / L) · 2Tk as an estimate of ||x||^2
Correctness:
  – Unbiased estimator
  – High probability from the Chernoff bound
SECURE!
PRIVATESAMPLE
Goal: generate independent bits z_i with E[z_i] = ||Dx||^2/(Tk)
• P = Tk/n
• Pick random t ∈ [n]
• Retrieve (Da)_t, (Db)_t
• Compute (Dx)_t = (Da)_t - (Db)_t
• Define v = [(Dx)_t]^2
• If v ≤ P then generate z s.t. Pr[z = 1] = v/P, else output fail
• Output z
Correct as long as (Dx)_i^2 < Tk/n for each i = 1…n
SECURE!
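The stated expectation follows directly from the steps of PRIVATESAMPLE. As a short worked derivation (written in LaTeX, assuming the no-fail condition holds for every coordinate):

```latex
\Pr[z = 1]
  = \sum_{t=1}^{n} \frac{1}{n} \cdot \frac{(Dx)_t^2}{P}
  = \frac{\|Dx\|^2}{nP}
  = \frac{\|Dx\|^2}{Tk},
\qquad \text{using } P = \frac{Tk}{n} \text{ and } (Dx)_t^2 \le P \text{ for all } t .
```

Since ||Dx|| = ||x|| for orthonormal D, the distribution of each bit depends on the inputs only through ||x||^2.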
Algorithm, again
• Alice and Bob create a random* orthonormal** matrix D such that, for each i = 1…n, (Dx)_i^2 < k·||x||^2/n
• T = M^2 n + 1
• Repeat
  – {Assertion: ||x||^2 ≤ T}
  – Invoke PRIVATESAMPLE to get L = O~(1/ε^2) independent bits z_i such that Pr[z_i = 1] = ||Dx||^2/(Tk)
    {Works as long as (Dx)_i^2 < Tk/n for each i = 1…n}
  – T = T/2
• Until Σ_i z_i ≥ L/(4k)
• Output E = (Σ_i z_i / L) · 2Tk as an estimate of ||x||^2
If the Assertion is not true, then Pr[z_i = 1] > 1/(2k) ⇒ E[Σ_i z_i] > L/(2k) >> L/(4k)
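To see the estimator at work, here is a plaintext Python simulation of the loop, assuming NumPy. It is only a sketch: there is no cryptography, `private_sample_plain` is a hypothetical stand-in that reads (Dx)_t in the clear, D comes from a QR factorisation (an assumption), and the values of M, n, k, L are arbitrary. It assumes x ≠ 0.

```python
import numpy as np


def private_sample_plain(Dx, T, k, L, rng):
    """Plaintext stand-in for L independent calls to PRIVATESAMPLE."""
    n = len(Dx)
    P = T * k / n
    t = rng.integers(0, n, size=L)        # one random coordinate per call
    v = Dx[t] ** 2
    if np.any(v > P):
        raise RuntimeError("fail: some (Dx)_t^2 exceeds Tk/n")
    return rng.random(L) < v / P          # each bit is 1 with probability v/P


def estimate_sq_norm(x, M, k, L, rng):
    """Estimate ||x||^2 by halving T until enough sampled bits are 1."""
    n = len(x)
    while True:                           # redraw D until coordinates are spread out
        q, _ = np.linalg.qr(rng.standard_normal((n, n)))
        Dx = q @ x
        if np.max(Dx ** 2) < k * np.dot(x, x) / n:
            break
    T = float(M * M * n + 1)
    while True:
        z = private_sample_plain(Dx, T, k, L, rng)
        T = T / 2
        if z.sum() >= L / (4 * k):
            break
    return z.sum() / L * 2 * T * k        # 2T is the value of T used for sampling


rng = np.random.default_rng(3)
M, n, k, L = 100, 32, 20, 100_000         # hypothetical parameters
a = rng.integers(1, M + 1, size=n)
b = rng.integers(1, M + 1, size=n)
x = (a - b).astype(float)
estimate = estimate_sq_norm(x, M, k, L, rng)
exact = float(np.dot(x, x))
print(estimate, exact)
```

The factor 2Tk in the output undoes the final halving: the last batch of bits was sampled with threshold 2T, so the estimator is unbiased for ||Dx||^2 = ||x||^2.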
Simulation
SIMULATION
• Repeat
  – Choose L independent bits z_i such that Pr[z_i = 1] = ||x||^2/(Tk)
  – T = T/2
• Until Σ_i z_i ≥ (L/k)
• Output E = (Σ_i z_i / L) · 2Tk as an estimate of ||x||^2

ALGORITHM
• Repeat
  – {Assertion: ||x||^2 ≤ T}
  – Invoke PRIVATESAMPLE to get L independent bits z_i such that Pr[z_i = 1] = ||Dx||^2/(Tk)
  – T = T/2
• Until Σ_i z_i ≥ (L/k)
• Output E = (Σ_i z_i / L) · 2Tk as an estimate of ||x||^2

Recall:
• ||Dx|| = ||x||
Communication: O~(1/ε^2)
2. Private near neighbor
Private Near Neighbor
Bob has q ∈ [U]^d; Alice has P = p_1, p_2, …, p_n ∈ {1, 2, …, U}^d = [U]^d
Distance function: f(x,y)
Correctness: Bob learns min_i f(q, p_i)
Privacy: Alice learns nothing, Bob learns nothing else
Goal: Minimize communication
Private Near Neighbor
                 f(a,b) = Σ_i f_i(a_i, b_i)   L2         Generalized Hamming   Set Difference
Previous [DA]    O~(ndU)                      O~(nd)     O~(ndU)               O~(ndU)
Our Results      O~(dU+n)                     O~(n+d)    O~(d^2 + n)           O~(n+d)
[DA] needs 3rd party, we don’t
Approach: homomorphic encryption + secure function evaluation (SFE)
n points, dimension d, universe [U]
“Coordinate-wise” distance functions
Bob has q ∈ [U]^d; Alice has P = p_1, p_2, …, p_n ∈ [U]^d
“Coordinate-wise” distance functions: f(a,b) = Σ_i f_i(a_i, b_i)
Homomorphic encryption:
  E(x), E(y) -> E(x + y)
  E(x), c -> E(cx)
Bob:
1. For each coordinate j, create a degree-(U-1) polynomial g_j(x) = Σ_i a_{i,j} x^i such that g_j(u) = f_j(q_j, u) for all u ∈ [U]
2. Generate (SK, PK) for the Paillier encryption scheme. Send PK and E_PK(a_{i,j}) for all i, j
Alice:
1. For all i, compute E(Σ_j g_j(p_{i,j})) = E(f(q, p_i))
SFE:
Inputs: Alice: E(f(q, p_i)); Bob: SK
1. Bob gets min_i D_SK(E(f(q, p_i)))
Generic distance functions
Security:
1. Replace SFE with an oracle
2. Alice’s view is indistinguishable from PK, E(0), E(0), …, E(0), since E is semantically secure
3. Bob’s view is just the output
Efficiency:
1. Sending the polynomials: O~(dU)
2. SFE: O~(n) (simple circuit)
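The homomorphic step can be illustrated with a toy Paillier implementation in Python (standard library only, Python 3.8+). This is a sketch under loud assumptions: the primes are tiny and insecure, the distance is squared L2 so that g_j(u) = (q_j - u)^2 has the explicit coefficients q_j^2, -2q_j, 1 instead of a degree-(U-1) interpolation, and the final SFE of the minimum is replaced by plain decryption, which would not be private in a real protocol. All function names are illustrative.

```python
import math
import secrets

# Toy key: two Mersenne primes. Far too small for real security.
P_PRIME, Q_PRIME = 2**31 - 1, 2**61 - 1
N = P_PRIME * Q_PRIME
N2 = N * N
LAM = (P_PRIME - 1) * (Q_PRIME - 1) // math.gcd(P_PRIME - 1, Q_PRIME - 1)
MU = pow(LAM, -1, N)


def enc(m):
    """Paillier encryption with generator g = N + 1."""
    r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) % N2 * pow(r, N, N2) % N2


def dec(c):
    """Paillier decryption using the secret values LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add(c1, c2):
    """E(x), E(y) -> E(x + y)."""
    return c1 * c2 % N2


def scale(c, k):
    """E(x), c -> E(cx)."""
    return pow(c, k % N, N2)


def bob_coefficients(q):
    """Bob: encrypt the coefficients of g_j(u) = q_j^2 - 2 q_j u + u^2."""
    return [(enc(qj * qj), enc(-2 * qj), enc(1)) for qj in q]


def alice_distance(enc_coeffs, p):
    """Alice: compute E(sum_j g_j(p_j)) = E(f(q, p)) without the secret key."""
    total = enc(0)
    for (c0, c1, c2), u in zip(enc_coeffs, p):
        total = add(total, add(c0, add(scale(c1, u), scale(c2, u * u))))
    return total


q = [3, 7, 1, 9]
points = [[2, 7, 4, 9], [8, 1, 1, 0], [3, 6, 1, 8]]
coeffs = bob_coefficients(q)
cts = [alice_distance(coeffs, p) for p in points]
nearest = min(dec(c) for c in cts)     # stands in for the SFE step
print(nearest)
```

Alice only ever handles ciphertexts, which is why her view reduces to PK and encryptions that look like E(0).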
Private Near Neighbor
                 “Pointwise” distance   L2         Generalized Hamming   Set Difference
Previous [DA]    O~(ndU)                O~(nd)     O~(ndU)               O~(ndU)
Our Results      O~(dU+n)               O~(n+d)    O~(d^2 + n)           O~(n+d)
n points, dimension d, universe [U]
(homomorphic tricks)
• Alice has x_1, …, x_n ∈ {0,1}^d, Bob has y_1, …, y_n ∈ {0,1}^d, threshold t
• Bob gets all x_i s.t. Δ(x_i, y_j) < t for some j
• Communication: O~(n^2 + nd^2). Resolves an open question of [FNP04]
• [FNP04] achieve O~((d choose t)nt), which may be superpolynomial in n
3. Private Approximate Near Neighbor
Private Near Neighbor
• Drawback: Protocols depend linearly on # points n
• Necessary? Not if algebraically homomorphic E exists
• Our approach: solve the approximate problem
Private c-Approximate Near Neighbor
Alice has P = {p_1, …, p_n} ⊆ {0,1}^d, Bob has q ∈ {0,1}^d
Notation: P_r = P ∩ B(q, r), and likewise P_cr = P ∩ B(q, cr)
Correctness: P_r nonempty ⇒ Bob learns some element of P_cr
Privacy: Bob’s view is simulatable given q and P_cr
Private Approximate Near Neighbor
Definition Remarks:
Privacy: Don’t care what Bob gets as long as it follows from P_cr. The simulator gets P_cr.
Correctness: Don’t specify anything if P_r is empty, but the view is still simulatable
Our results:
- O~(n^{1/2} + d)
- If Bob just wants some coordinate of an element of P_cr, then improve to O~(n^{1/2} + polylog(d))
Private Approximate Near Neighbor
Two approaches:
1. Dimensionality Reduction in Hamming Cube [KOR98]
2. Locality Sensitive Hashing [IM98]
This talk: protocol using #1
Dimensionality Reduction
• [KOR]: Let A be a random m×d binary matrix, m = O(log d / ε^2)
• Then there is a separator r’ s.t. with probability 1 - 1/n^2, for any p, q ∈ {0,1}^d:
  1. Δ(p,q) > cr ⇒ Δ(Ap, Aq) > r’
  2. Δ(p,q) ≤ r ⇒ Δ(Ap, Aq) < r’
Idea: Alice
  1. Applies A to P, so the dimension becomes small
  2. Enumerates all w ∈ {0,1}^m, forms the array B[w] = {p ∈ P s.t. Δ(Ap, w) < r’}
  3. Uses Oblivious ROM
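Alice's bucket array can be sketched in Python, assuming NumPy. The sizes and the threshold `r_prime` are hypothetical, the entry distribution Bernoulli(1/(2r)) is borrowed from the KOR slide earlier in the talk, and the lookup is done in the clear rather than through an oblivious ROM.

```python
import itertools
import numpy as np


def hamming(u, v):
    """Hamming distance between two equal-length 0/1 arrays."""
    return int(np.sum(u != v))


def build_buckets(A, points, r_prime):
    """B[w] = indices of points p with Hamming(Ap mod 2, w) < r_prime, for all w in {0,1}^m."""
    m = A.shape[0]
    sketches = [(A @ p) % 2 for p in points]
    return {
        w: [i for i, s in enumerate(sketches) if hamming(s, np.array(w)) < r_prime]
        for w in itertools.product((0, 1), repeat=m)
    }


rng = np.random.default_rng(4)
d, m, r, r_prime = 64, 8, 4, 3            # hypothetical sizes and threshold
A = (rng.random((m, d)) < 1.0 / (2 * r)).astype(np.int64)
points = [rng.integers(0, 2, size=d) for _ in range(20)]
B = build_buckets(A, points, r_prime)

q = points[0].copy()
q[0] ^= 1                                 # query at distance 1 from points[0]
q_sketch = (A @ q) % 2
candidates = B[tuple(int(bit) for bit in q_sketch)]
print(len(B), candidates)
```

The table has 2^m entries, which is why the reduced dimension m must be small for Alice to enumerate every w.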
Dimensionality reduction protocol
Protocol:
1. Randomly sample O~(n^{1/2}) points P_1
   – If |P_cr| > n^{1/2}, then P_1 ∩ P_cr ≠ ∅, w.h.p.
2. Agree on k matrices A_1, …, A_k
3. Create array B_i based on A_i
4. B_i[p] contains any n^{1/2} points p’ ∈ P s.t. Δ(A_i p’, p) < r’
5. Alice sets the ROM to be the B_i’s
6. If P_1 ∩ P_cr ≠ ∅, SFE outputs a random element of P_1 ∩ P_cr. Otherwise, SFE uses ∪_i B_i[A_i q] to output a random element of P_r
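The w.h.p. claim in the sampling step is a short calculation: if more than n^{1/2} of the n points lie in P_cr, each uniform sample misses P_cr with probability below 1 - n^{-1/2}. The Python sketch below evaluates that bound; sampling with replacement and the constant 2 in the sample size are assumptions made for illustration.

```python
import math


def miss_probability(n, num_samples):
    """Upper bound on Pr[no sample lands in P_cr] when |P_cr| > sqrt(n)."""
    return (1.0 - 1.0 / math.sqrt(n)) ** num_samples


n = 10**6
num_samples = math.ceil(2 * math.sqrt(n) * math.log(n))   # O~(n^{1/2}) samples
bound = miss_probability(n, num_samples)
print(num_samples, bound)
```

With this many samples the miss probability is at most exp(-2 ln n) = n^{-2}, so the sample hits P_cr with high probability while communication stays near n^{1/2}.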
Dimensionality Reduction Analysis
Properties:
1. If |P_cr| > n^{1/2}, we output a random element of P_cr, w.h.p.
2. If |P_cr| < n^{1/2}, by properties of A,
   Pr_A[∀ p ∈ P_r, Δ(Ap, Aq) < r’ and ∀ p ∉ P_cr, Δ(Ap, Aq) > r’] > 1 - 1/n
3. Since the bucket size is n^{1/2} and |P_cr| < n^{1/2}, every p ∈ P_r lies in B_i[A_i q], so P_r ⊆ ∪_i B_i[A_i q]
Correctness:
If |P_cr| > n^{1/2}, output an element from P_cr
Else output an element from P_r
Dimensionality Reduction Analysis
• Communication:
  1. Sampling O~(n^{1/2}) elements to ensure |P_cr| < n^{1/2}
  2. OT on O~(1) buckets of size n^{1/2}
  Thus, balancing steps 1 & 2 ⇒ O~(d n^{1/2}) total communication
• Simulatability: Output either a random element of P_cr, or a random element of P_r
Dimensionality Reduction Analysis
• Dependence on d:
  1. Homomorphic encryption: O~(d + n^{1/2})
     a. Bob sends E(q_1), …, E(q_d)
     b. Alice computes E(Δ(p_i, q)) and uses these for sampling and bucketing
  2. Reduce to O~(polylog(d) + n^{1/2}) if Bob just wants a coordinate of a point in P_cr: use approximations
Conclusions
• Extensions: Can achieve O(n^{1/3} + d) communication if you allow the protocol to “leak” k bits of information
• Open problems:
1. Polylogarithmic Private Approximation of other distances
2. More efficient protocols for exact near neighbor. Tricks for PIR may be useful
3. Polylogarithmic c-approx NN protocol