10
Hadamard Response: Estimating Distributions Privately, Eciently, and with Little Communication Jayadev Acharya Ziteng Sun Huanyu Zhang Cornell University Cornell University Cornell University Abstract We study the problem of estimating k- ary distributions under Á-local dierential privacy. n samples are distributed across users who send privatized versions of their sample to a central server. All previously known sample optimal algorithms require linear (in k) communication from each user in the high privacy regime (Á = O(1)), and run in time that grows as n · k, which is prohibitive for a large domain size k. We propose Hadamard Response (HR),a local privatization scheme that requires no shared randomness and is symmetric with respect to the users. HR has order optimal sample complexity for all Á, a communica- tion of at most log k + 2 bits per user, and nearly linear running time of ˜ O(n + k). HR is based on Hadamard matrices, and is simple to implement. The statistical performance relies on the coding theo- retic aspects of Hadamard matrices, ie, the large Hamming distance between the rows. Computational eciency is achieved by us- ing the Fast Walsh-Hadamard transform. We compare our approach with Ran- domized Response (RR), RAPPOR, and subset-selection mechanisms (SS), both theoretically, and experimentally. For k = 10000, our algorithm runs about 100x faster than SS, and RAPPOR. 1 Introduction Estimating the underlying probability distribution from data samples is a quintessential statistical Proceedings of the 22 nd International Conference on Ar- tificial Intelligence and Statistics (AISTATS) 2019, Naha, Okinawa, Japan. PMLR: Volume 89. Copyright 2019 by the author(s). The authors are listed in alphabetical order. problem. Given samples from an unknown distri- bution p, the goal is to obtain an estimate ˆ p of p. The problem has a rich, and vast literature (see e.g. [6, 39, 17, 18], and many others), with the pri- mary goal of statistical eciency, namely minimizing the sample complexity for estimation, which is the first resource we consider. 1. Utility. What is the sample complexity of estimation? In many applications, data contains sensitive infor- mation, and preserving the privacy of individuals is paramount. Without proper precautions, sen- sitive information can be inferred as evidenced by well publicized data leaks over the past decade, in- cluding de-anonymization of public health records in Massachusetts [41], de-anonymization of Netflix users [37] and de-anonymization of individuals par- ticipating in the genome wide association study [29]. Private data release and computation on data has been studied in several fields, including statistics, machine learning, database theory, algorithm de- sign, and cryptography (See e.g., [45, 14, 22, 46, 23, 42, 13]). Dierential Privacy (DP) [24] has emerged as one of the most popular notions of pri- vacy (see [24, 46, 26, 9, 36, 32], references therein, and the recent book [25]). DP has been adopted by several companies including Google, and Ap- ple [21, 27]. A popular privacy definition is local dierential pri- vacy (LDP) [45, 23], where users do not trust the data collector, and privatize their data before releas- ing. We study distribution estimation under LDP. Distribution estimation with privacy is an impor- tant problem. For example, understanding the drug usage habits of the entire population (the distribu- tion) is crucial for policy design. Understanding the internet trac distribution is important for ad- placement. In both these applications, preserving individual privacy is essential.

Hadamard Response: Estimating Distributions Privately ...proceedings.mlr.press/v89/acharya19a/acharya19a.pdf · Hadamard Response: Estimating Distributions Privately, Eciently, and

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Hadamard Response: Estimating Distributions Privately ...proceedings.mlr.press/v89/acharya19a/acharya19a.pdf · Hadamard Response: Estimating Distributions Privately, Eciently, and

Hadamard Response: Estimating Distributions Privately,E�ciently, and with Little Communication

Jayadev Acharya Ziteng Sun Huanyu ZhangCornell University Cornell University Cornell University

AbstractWe study the problem of estimating k-ary distributions under Á-local di�erentialprivacy. n samples are distributed acrossusers who send privatized versions of theirsample to a central server. All previouslyknown sample optimal algorithms requirelinear (in k) communication from each userin the high privacy regime (Á = O(1)), andrun in time that grows as n · k, which isprohibitive for a large domain size k.We propose Hadamard Response (HR), alocal privatization scheme that requires noshared randomness and is symmetric withrespect to the users. HR has order optimalsample complexity for all Á, a communica-tion of at most log k + 2 bits per user, andnearly linear running time of O(n + k).HR is based on Hadamard matrices, andis simple to implement. The statisticalperformance relies on the coding theo-retic aspects of Hadamard matrices, ie, thelarge Hamming distance between the rows.Computational e�ciency is achieved by us-ing the Fast Walsh-Hadamard transform.We compare our approach with Ran-domized Response (RR), RAPPOR, andsubset-selection mechanisms (SS), boththeoretically, and experimentally. For k =10000, our algorithm runs about 100xfaster than SS, and RAPPOR.

1 IntroductionEstimating the underlying probability distributionfrom data samples is a quintessential statisticalProceedings of the 22

ndInternational Conference on Ar-

tificial Intelligence and Statistics (AISTATS) 2019, Naha,

Okinawa, Japan. PMLR: Volume 89. Copyright 2019 by

the author(s).

The authors are listed in alphabetical order.

problem. Given samples from an unknown distri-bution p, the goal is to obtain an estimate p of p.The problem has a rich, and vast literature (seee.g. [6, 39, 17, 18], and many others), with the pri-mary goal of statistical e�ciency, namely minimizingthe sample complexity for estimation, which is thefirst resource we consider.

1. Utility. What is the sample complexity ofestimation?

In many applications, data contains sensitive infor-mation, and preserving the privacy of individualsis paramount. Without proper precautions, sen-sitive information can be inferred as evidenced bywell publicized data leaks over the past decade, in-cluding de-anonymization of public health recordsin Massachusetts [41], de-anonymization of Netflixusers [37] and de-anonymization of individuals par-ticipating in the genome wide association study [29].

Private data release and computation on data hasbeen studied in several fields, including statistics,machine learning, database theory, algorithm de-sign, and cryptography (See e.g., [45, 14, 22, 46,23, 42, 13]). Di�erential Privacy (DP) [24] hasemerged as one of the most popular notions of pri-vacy (see [24, 46, 26, 9, 36, 32], references therein,and the recent book [25]). DP has been adoptedby several companies including Google, and Ap-ple [21, 27].

A popular privacy definition is local di�erential pri-vacy (LDP) [45, 23], where users do not trust thedata collector, and privatize their data before releas-ing. We study distribution estimation under LDP.Distribution estimation with privacy is an impor-tant problem. For example, understanding the drugusage habits of the entire population (the distribu-tion) is crucial for policy design. Understandingthe internet tra�c distribution is important for ad-placement. In both these applications, preservingindividual privacy is essential.

Page 2: Hadamard Response: Estimating Distributions Privately ...proceedings.mlr.press/v89/acharya19a/acharya19a.pdf · Hadamard Response: Estimating Distributions Privately, Eciently, and

Hadamard Response: Estimating Distributions Privately, E�ciently, and with Little Communication

2. Privacy. How much information about a user isleaked by the scheme?

There are inherent trade-o�s between utility and pri-vacy. Sample privacy trade-o�s have been recentlystudied for various problems, including distributionestimation [23, 31, 47, 43, 20, 35].

However, two crucial resources have not been con-sidered in private distribution estimation, computa-tion, and communication. In applications where theunderlying dimensionality is high, or the number ofsamples is large, it is imperative to have computa-tionally e�cient algorithms. Internet companies col-lect information about user’s browsing history overa large number of users and websites, and large de-partmental stores collect purchase statistics over alarge number of users and products. In these prob-lems, algorithms with high computational overheadare prohibitive, even if they have optimal samplecomplexity. There has been recent interest in com-putationally e�cient distribution estimation in thenon-private setting (see e.g., [15, 1, 33, 12, 16, 40, 3]).

3. Computational Complexity. What is therunning time of the algorithm?

In distributed applications, communication (bothwith and without privacy) is critical. For example,a large fraction of internet tra�c is on hand-held de-vices with limited uplink capacity due to limited bat-tery power, limited uplink bandwidth, or expensivedata rates. Similarly, in large scale distributed ma-chine learning problems, communication from pro-cessors to the server is the bottleneck since localcomputations are fast. Communication limited dis-tributed distribution estimation has been studied inthe non-private setting(e.g., [48, 4, 19, 2, 28]).

In the context of private estimation tasks, the prob-lem of finding the heavy hitters, and learning prop-erties under local di�erential privacy under the as-sumption of public randomness, where the server cansend communication to the clients to reduce commu-nication from user end has received much attentionrecently [8, 7, 5, 30, 44, 11]. However, these algo-rithms require shared randomness, as well as asym-metric schemes, where each user can use a di�erentprivatization mechanism. [7] uses a Hadamard trans-form, but they use it to form orthogonal basis andreduce storage, which is di�erent from us.

4. Communication Complexity. How many bitsare communicated?

In this work, we consider discrete distribution es-timation under the aforementioned four resources.

We provide the first algorithm that is simultaneouslysample order optimal for any privacy value, has loga-rithmic communication per symbol, and runs in lin-ear time in the input and output size.1.1 Organization.In Section 2 we describe the problem set-up. In Sec-tion 2.1, 2.2 and 2.3, we describe prior privatizationschemes, and our results. In Section 3, we providea family of Á-LDP privatization schemes. Based onthese, in Section 4, we specialize and design schemesthat are optimal in the most interesting regime ofhigh privacy. Finally in Section 5, we will brieflydescribe how to extend these schemes to general Á.The full detail will be provided in Section A in thesupplementary file.

2 PreliminariesLocal Di�erential Privacy (LDP). Suppose x isa private information that takes values in a set Xwith k elements (wlog let X = [k]:={0, 1, . . . , k≠1}).A privatization mechanism is a randomized mappingQ from [k] to an output set Z (which can be arbi-trary), that maps x œ X to z œ Z with probabilityQ(z|x). The output z of this mapping, called theprivatized sample, is then released. Q is Á-locallydi�erentially private (Á-LDP) [23] if for all x, xÕ œ X ,

supzœZ

Q(z|x)Q(z|xÕ) Æ eÁ. (1)

Small values of Á(Á = O(1)) are more stringent andare the high privacy regime, and large values of Áare the low privacy regime. When X and Z areboth discrete, the mechanism Q is described by astochastic matrix of size |X | ◊ |Z| whose (x, z)thentry is Q(z|x). Q is Á-LDP if the ratio of any twoentries in a column of this matrix is at most eÁ.

Randomness and Symmetry. A scheme that re-quires shared/public randomness requires the gen-eration of shared randomness at the server, whichneeds to be communicated to the users. Symmet-ric schemes are those where each user uses the sameprivatization scheme [38]. In this paper, we considerschemes that are symmetric and require no sharedrandomness. Other such schemes include RAPPOR,Randomized Response, and subset selection meth-ods, described later. We note that the literatureon heavy hitter estimation has mostly consideredschemes with shared randomness [8, 7, 11], and itwill be interesting to see if our methods can provideimproved algorithms for the heavy hitter problem.

We note that [7] also uses Hadamard matrix dur-ing the encoding phase. We emphasize that we use

Page 3: Hadamard Response: Estimating Distributions Privately ...proceedings.mlr.press/v89/acharya19a/acharya19a.pdf · Hadamard Response: Estimating Distributions Privately, Eciently, and

Jayadev Acharya, Ziteng Sun, Huanyu Zhang

completely di�erent encoding methods based on dif-ferent properties of Hadamard matrices. They useHadamard matrix to sample one bit in the frequencydomain while we sample an entire column index(represented by log k bits) from a row vector of theHadamard matrix. They use rows of Hadamard ma-trix to provide orthogonal channels for the users.While any orthogonal channels will work for theirproblem, they use Hadamard matrix to reduce stor-age at the users since it is easy to compute withoutstoring it. However, we use Hadamard matrix to de-fine subsets of columns with large symmetric di�er-ence, which provides better statistical performance,and use Fast Hadamard Transform in the decodingfor improving computational performance. This willbecome clearer after we describe our scheme fully.

Another advantage of this work is that our methodcan be generalized to general regimes of Á whileschemes for heavy hitter detection [8, 7] only workfor Á < 1.

LDP distribution estimation. Let �k =Óp(0), . . . , p(k ≠ 1) : p(x) Ø 0,

qk≠1x=0 p(x) = 1

Ôbe

the set of all distributions over [k]. Let X1, . . . , Xn

be independent samples drawn from an unknownp œ �k, where Xi is the private (sensitive) datawith the ith user. Each user maps Xi through anÁ-LDP Q, to obtain Zi. The task at the server,upon observing the privatized samples Z1, . . . , Zn,is to output p : Zn æ �k, an estimate of p. Letd : �k ◊ �k æ R+ be a distance measure betweendistributions in �k. Private distribution estimationtask is the following:

Given – > 0, Á > 0, d : �k ◊ �k æ R, designan Á-LDP Q, and a corresponding estimation p,such that ’p œ �k, with probability at least 0.9,d(p, p) < –.

The sample complexity is the least n for which suchan Á-LDP scheme Q, and a corresponding p ex-ists. The communication complexity is the num-ber of bits to send Zi to the server. The compu-tational complexity is the total time to estimate pfrom Z1, . . . , Zn at the server and to privatize Xi

using Q at each user.

We will use ¸1, and ¸2 distance in this paper.For r Ø 0, the ¸r distance between p, q œ �k is¸r(p, q):=(

qx |p(x) ≠ q(x)|r)1/r. In non-private set-

ting, the sample complexity of distribution estima-tion under these distances is known even includingprecise constants [10, 34].

Á k-RR RAPPOR k -SS Á-HR

(0, 1) k3

Á2–2k2

Á2–2k2

Á2–2k2

Á2–2

(1, log k) k3

e2Á–2k2

eÁ/2–2k2

eÁ–2k2

eÁ–2

(log k, 2 log k) k–2

k2

eÁ/2–2k

–2k

–2

(2 log k, +Œ) k–2

k–2

k–2

k–2

Table 1: Sample complexity, up to constant factors,under ¸1 distance for the di�erent methods. Thesample complexity under ¸2 distance is exactly a fac-tor k smaller in each cell above.

2.1 The privatization mechanismsWe will now briefly describe RR, RAPPOR, themost popular Á-LDP schemes using no interactionand public randomness. We will also mention SS,and our proposed HR. For a detailed description ofRAPPOR and SS, please refer to Section D in thesupplementary file.

k-Randomized Response (RR). The k-RRmechanism [45, 31] is an Á-LDP QRR with Z = X =[k], such that

QRR(z|x) :=I

eÁ+k≠1 if z = x,1

eÁ+k≠1 otherwise.(2)

k-RAPPOR. The randomized aggregatableprivacy-preserving ordinal response (RAPPOR) isan Á-LDP mechanism which was proposed in [23, 27].Its simplest implementation k-RAPPOR [31] mapsX = [k] to Z = {0, 1}k. It first does a one hotencoding to the input x œ [k] to obtain y œ {0, 1}k,such that yj = 1 for j = x, and yj = 0 for j ”= x.The privatized output of k-RAPPOR is a k-bitvector obtained by independently flipping each bitof y with probability 1

eÁ/2+1 .

Subset Selection techniques. [43, 47] propose anÁ-LDP scheme that maps x œ [k] to subsets of [k] ofsize Ák/(eÁ + 1)Ë. The scheme is described in detailin Section D.

Hadamard Response. We propose Hadamard Re-sponse (HR), an Á-LDP scheme with Z = [K], forsome k Æ K Æ 4k. The algorithm is described inSection 4 for high privacy, and in Section 5 and Afor general privacy.

2.2 Previous ResultsTo estimate distributions in �k to ¸1 distance – un-der Á-LDP, the sample, communication and time re-

Page 4: Hadamard Response: Estimating Distributions Privately ...proceedings.mlr.press/v89/acharya19a/acharya19a.pdf · Hadamard Response: Estimating Distributions Privately, Eciently, and

Hadamard Response: Estimating Distributions Privately, E�ciently, and with Little Communication

Á k-RR RAPPOR k -SS Á-HR

(0, 1) log k k k log k

(1, log k) log k keÁ/2

keÁ log k

(log k, 2 log k) log k keÁ/2 log k log k

(2 log k, +Œ) log k log k log k log k

Table 2: Communication requirements for distribu-tion estimation techniques.

quirements of the various schemes are given in Ta-ble 1, 2 and 3 respectively..

The sample complexity is given in Table 1. Theentries in green boxes are sample-order optimal,namely there is a matching lower bound [47]. Notethat RR is sample-optimal in the low privacy regime(last two rows), and is highly sub-optimal in the highprivacy regime (Á = O(1)). RAPPOR is optimal forhigh-privacy, but sub-optimal for medium privacy.SS, and our proposed HR are sample-order-optimalfor all Á. The sample complexity arguments for RR,RAPPOR, and SS can be found in [31, 47].

Table 2 describes the communication requirementsof various schemes. However, it is not clear how tomeasure the communication requirements, since fora given privatization scheme, there might be com-munication protocols requiring fewer bits than oth-ers. For example, RAPPOR is described as giving kbits as its output, but perhaps these k bits can becompressed further requiring much smaller commu-nication. We get around such concerns by observingthat, once the input distribution p and the privati-zation mechanism Q is fixed, the output distributionof the privatized sample Z is fixed. By Shannon’ssource coding theorem, to faithfully send Z to theserver requires at least H(Z) bits of communication.The entries in the table are derived by consideringthe input distribution to be near uniform, and eval-uating the entropy of the output of the mechanisms.For RR, log k bits of communication follows fromZ = [k]. Note that in this paper all logarithms are inbase 2. The communication requirements for RAP-POR, and SS are derived in Section D (Theorems 9,and Theorem 10 respectively).

Table 3 describes the total running time lowerbounds for faithfully implementing the knownschemes. The argument is that at the server, thecomputation complexity is at least the number ofbits that need to be read, which is the amount of

k-RR k-RAPPOR Subset selection Á-HR

n + k n + k + nkeÁ/2 n + k + nk

eÁ n + k

Table 3: Time bounds for distribution estimation.The running times are described in Section D. Theseare upper bounds up to logarithmic factors.

communication from the users. If there are n users,then n · H(Z) serves as our time complexity bound,and these form the entries in the table.

2.3 Motivation and Our Results

Our work is motivated by the first three columnsof the tables, which captures the apparent sample-communication-computation trade-o�s present inthe existing schemes. We elaborate this point in themost interesting regime of high privacy. For sim-plicity, fix Á = 1, and – = 0.1 (chosen arbitrarily!),and treat them as fixed constants in this paragraph.In this setting, from Table 1, note that the optimalsample complexity is �(k2), achieved by RAPPOR,and SS, while RR has a sub-optimal sample com-plexity of �(k3). Now consider the communicationrequirements. Z = [k] for RR, requiring only log kbits. A straight-forward computation shows thatany input distribution to the RAPPOR mechanisminduces an output distribution over {0, 1}k with en-tropy at least �(k), thus requiring �(k) bits to faith-fully send the privatized samples to the server. SSalso requires �(k) bits in this regime. These are for-mally shown in Theorem 9 and Theorem 10 in Sec-tion D. As for the running time at the server end, abound of �(k3) for all these three methods followsfrom the total communication to the server (#sam-ples ◊ #bits per sample), which is a factor k largerthan the �(k2) optimal sample complexity bound.

Our main result is the following, which is formallystated in Theorem 2, and Theorem 5.

Theorem 1. We propose a simple algorithm for Á-LDP distribution estimation that for all parameterregimes, is sample optimal, runs in near-linear timein the number of samples, and has only a logarithmiccommunication complexity in the domain size, forboth the ¸1, and ¸2 distance.

Going back to the high privacy regime, consideredbefore, this shows that our scheme has a runningtime of O(k2), which is nearly linear in the optimalsample complexity under ¸1 distance.

Page 5: Hadamard Response: Estimating Distributions Privately ...proceedings.mlr.press/v89/acharya19a/acharya19a.pdf · Hadamard Response: Estimating Distributions Privately, Eciently, and

Jayadev Acharya, Ziteng Sun, Huanyu Zhang

3 A family of Á-LDP schemesWe first propose a general family of LDP schemes,and then carefully choose schemes from this familythat are sample-optimal, communication and com-putationally e�cient for distribution estimation.

The scheme involves the following steps:

1. Choose an integer K, and let the output alpha-bet be Z = [K].

2. Choose a positive integer s Æ K.3. For each x œ X = [k], pick Cx ™ [K] with

|Cx| = s.4. The privatization scheme from [k] to [K] is then

given by:

Q(z|x) :=I

seÁ+K≠s if z œ Cx,1

seÁ+K≠s if z œ Z \ Cx.(3)

This scheme satisfies (1), and is Á-LDP. This pri-vatization scheme chooses a set Cx for each x andassigns the elements in Cx a higher probability thanthose not in Cx. We also note that RR is a spe-cial case of this construction when K = k, s = 1,and Cx = {x}. We know from the last section thatRR is sub-optimal in the high privacy regime. Ourgeneral inspiration comes from coding theory, andwe select s, and Cx carefully in order to send moreinformation across Q than RR.

In Section 4 we give an optimal scheme in the highprivacy regime, and extend it to the general case inSection 5 and A.

4 Optimal scheme for high privacyPrivatization scheme. If for x ”= xÕ, Cx = CxÕ ,then we cannot tell them apart. Therefore, the hopeis that the farther apart Cx and CxÕ are, the easierit is to tell them apart. With this in mind, we spec-ify a particular choice of parameters for our scheme,which turns out to be sample-optimal in the high pri-vacy regime. In particular, our privatization schemewill satisfy the following:

An optimal privatization for high privacyChoose K, and Cx’s such that (We will show inSection 4.1 how to satisfy these conditions.):C1. K is between k and 2k, and s = K/2, namely

for all x œ [k], |Cx| = K2 .

C2. For any x, xÕ œ [k], and x ”= xÕ, |�(Cx, CxÕ)| =|(Cx \ CxÕ) fi (CxÕ \ Cx)| = K

2 .Use (3) for privatization.

Performance. We will show that for Á = O(1), this

privatization is sample-order-optimal, namely thereis a corresponding estimator p : [K]n æ �k thatis sample-optimal. Before describing the estimationprocedure, we provide the statistical guarantees.Theorem 2. For any privatization scheme satis-fying C1, C2, there is a corresponding estimationscheme p : [K]n æ �k, such that

E#¸2

2(p, p)$

Æ 4k(eÁ + 1)2

n(eÁ ≠ 1)2 , and (4)

E [¸1(p, p)] Æ

Û4k2(eÁ + 1)2

n(eÁ ≠ 1)2 . (5)

The sample optimality, and small communication forhigh privacy is an immediate corollary.Corollary 3. When Á = O(1), the sample com-plexity of this scheme for estimation to ¸1 distance– is O(k2/Á2–2) samples, and for ¸2

2 distance isO(k/Á2–2). Further, the communication from eachuser is at most log(k) + 1 bits. This is sample-optimal for both ¸1 (Table 1) and ¸2

2 (see [47]).

Proof. Applying Markov’s inequality in Theorem 2,and substituting eÁ + 1 = �(1), and eÁ ≠ 1 =�(Á) when Á = O(1) gives the sample complex-ity bounds. The communication bounds are fromlog K Æ log(k) + 1.

Estimation. Suppose QK,Á is an Á-LDP schemesatisfying C1, and C2. For an input distribution pover [k], let p(Cx) be the probability that the priva-tized sample Z œ Cx. Using |Cx| = K/2, and C2, itfollows that |Cx \CxÕ | = K/4, and |Cx flCxÕ | = K/4.Therefore,

p(Cx) = p(x)A

ÿ

zœCx

QK,Á(z|x)B

+ÿ

xÕ ”=x

p(xÕ)·

Q

aÿ

zœCx\CxÕ

QK,Á(z|xÕ) +ÿ

zœCxflCxÕ

QK,Á(z|xÕ)

R

b

= p(x) · |Cx| · eÁ

(seÁ + K ≠ s)+

ÿ

xÕ ”=x

p(xÕ)3

|Cx \ CxÕ | · 1seÁ + K ≠ s

+ |Cx fl CxÕ | · eÁ

seÁ + K ≠ s

4

(6)

= 12 + eÁ ≠ 1

2(eÁ + 1)p(x), (7)

where (6) follows from (3), and (7) by plugging s =K/2, and from C2. We can rewrite this as

p(x) = 2(eÁ + 1)eÁ ≠ 1

3p(Cx) ≠ 1

2

4. (8)

Page 6: Hadamard Response: Estimating Distributions Privately ...proceedings.mlr.press/v89/acharya19a/acharya19a.pdf · Hadamard Response: Estimating Distributions Privately, Eciently, and

Hadamard Response: Estimating Distributions Privately, E�ciently, and with Little Communication

This forms the basis of our estimation. From the pri-vatized samples, we estimate p(Cx), and from thatwe estimate p. The entire scheme is given below.

An optimal distribution estimation schemefor high privacy

Input: k, Á, privatized samples Z1, . . . , Zn

1. For each x œ [k], estimate p(Cx) with its em-pirical probability:

\p(Cx) :=nÿ

j=1

I {Zj œ Cx}n

. (9)

2. Estimate p as:

p(x) := 2(eÁ + 1)eÁ ≠ 1

3\p(Cx) ≠ 1

2

4. (10)

Next we will prove Theorem 2 by showing the per-formance of this estimation scheme.

Proof of Theorem 2.1 Let p(C), [p(C), be the vec-tor of probabilities of p(Cx)’s and \p(Cx)’s respec-tively. From (8) and (10),

E#¸2

2(p, p)$

= 4(eÁ + 1)2

(eÁ ≠ 1)2 E˸2

2([p(C), p(C))È

.

From (9), EË\p(Cx)

È= E [I {Zj œ Cx}] = p(Cx).

Therefore,

E˸2

2([p(C), p(C))È

= E

S

Uÿ

xœ[k](\p(Cx) ≠ p(Cx))2

T

V

=ÿ

xœ[k]E

Ë(\p(Cx) ≠ p(Cx))2

È=

ÿ

xœ[k]Var(\p(Cx)).

By the independence of Zi’s, \p(Cx) is the average ofn independent Bernoulli random variables each withexpectation p(Cx). Hence,

ÿ

xœ[k]Var(\p(Cx)) =

ÿ

xœ[k]

1n

· p(Cx)(1 ≠ p(Cx))

Æ 1n

ÿ

xœ[k]p(Cx) Æ k

n.

Plugging this bound in the previous expression givesthe bound on ¸2

2 distance of the theorem.

E#¸2

2(p, p)$

Æ 4k(eÁ + 1)2

n(eÁ ≠ 1)2 . (11)

1A technicality here is that p(x)’s can be negative, but

we can project p onto the simplex with the same order

performance. We therefore only analyze the performance

of p described in (10).

Using k · ¸22(p, p) Ø ¸1(p, p)2 with (11) gives the de-

sired bound on E [¸1(p, p)].

4.1 Computational complexity andHadamard matrices.

We showed the sample, and communication com-plexity guarantees. However, two questions are stillunanswered:

• How to choose K, and design Cx’s that satisfyC1, C2?

• What is the time complexity of privatizationand estimation?

We now address these questions. We start with thecomputational requirements of the proposed scheme,assuming C1,C2.

Computation at users. Given Cx’s, each userneeds to implement (3). This requires uniform sam-pling from Cx’s, as well as from [K] \ Cx. We willdesign schemes to do this in time O(log K).

Computation at the server. The server needsto implement (9) and (10). Note that (10) can beimplemented in time O(k) after implementing (9).However, a straightforward implementation of (9)requires n · k time, since for each x we iterate overall the samples, giving running time of O(n · k). Inparticular, in the high privacy regime (say with Á =1, and – = 0.1) the sample complexity is O(k2) andthe time requirement will be O(k3). We now showhow to design a privatization to satisfy C1, C2,and for which we can implement (9) in time onlyO(n + k).

Hadamard Response (HR) for high privacy.Suppose K is a power of two, and HK œ {±1}K◊K

is the Hadamard matrix of size K ◊ K designed bySylvester’s construction as follows. Let H1 = [1],and for m = 2j , for j Ø 1,

Hm :=

S

WUHm/2 Hm/2

Hm/2 ≠Hm/2

T

XV .

Some standard properties of Hadamard matricesthat we use are the following:

(i) The number of +1’s in each row except the firstis K/2,

(ii) Any two rows agree (and disagree) on exactlyK/2 locations,

(iii) Vector multiplication with HK is possible intime O(K log K) with Fast Walsh Hadamardtransform,

Page 7: Hadamard Response: Estimating Distributions Privately ...proceedings.mlr.press/v89/acharya19a/acharya19a.pdf · Hadamard Response: Estimating Distributions Privately, Eciently, and

Jayadev Acharya, Ziteng Sun, Huanyu Zhang

(iv) We can uniformly sample from the +1’s (andthe ≠1’s) in any row in time O(log K).

We now describe the parameters for the privacymechanism:

1. Choice of K: Let K = 2Álog2(k+1)Ë Ø k + 1, thesmallest power of 2 larger than k. To satisfyC1, we will choose s = K/2.

2. Choice of Cx’s: Map the symbols [k] ={0, . . . , k ≠ 1} to rows of HK as follows: map0 to the second row, 1 to the third row, and soon. In other words, x is mapped to row x + 1.Given any x, we choose Cx µ [K] to be the col-umn indices with a ‘+1’ in the (x + 1)th row ofHK .

By Property (i) and (ii) of HK , both C1, and C2 aresatisfied. This implies a privatization scheme withoptimal sample and communication complexity inthe high privacy regime.

Fast computation with HR. By Property (iv), wecan e�ciently implement the privatization schemeat the users. We will now provide an e�cient im-plementation of (9). Let q = (q(0), . . . , q(K ≠ 1)) bethe vector of the empirical distribution of Z1, . . . , Zn

over [K] = {0, . . . , K ≠ 1}, namely

q(z) =nÿ

i=1

I {Zi = z}n

.

We can compute q in linear time with a single passover Z1, . . . , Zn. Consider the matrix vector productc = HK · q. For x œ [k], the (x + 1)th entry ofHK · q is

qK≠1z=0 HK(x + 1, z) · q(z). Now note that

the +1’s in the (x + 1)th column correspond to Cx

by construction, therefore

K≠1ÿ

z=0HK(x + 1, z) · q(z) =

ÿ

zœCx

q(z) ≠ÿ

zœ[K]\Cx

q(z)

=2\p(Cx) ≠ 1 =3

eÁ ≠ 1eÁ + 1

4p(x),

where the last line follows from observing thatqzœCx

q(z) = \p(Cx) from (9), and from (10). There-fore the estimator p is simply entries of a Hadamardvector product, appropriately normalized. By prop-erty (iii), this can be done in time O(K log K) =O(k log k). This computational advantage is cap-tured in the following theorem:

Theorem 4. HR is an Á-LDP mechanism satisfyingTheorem 2 that has a running time O(n + k).

5 General privacy regimes.For general values of Á, we will still use the generalstructure of schemes given by (3). However, we willchoose the values of s to be dependent on Á (whichwill be close to k/eÁ for general Á). After fixing thiss, we design the sets Cx’s with size s by formingblock-structured matrices with Hadamard matricesat the diagonals. The general construction, alongwith the encoding, estimation, and analysis is pro-vided in Section A in the supplementary file. Wesimply state the main result for general Á here.

The two parameters we need to describe the the-orem are B, and b, where b can be thought of asa proxy for s in our scheme. Their precise val-ues are given in Section A, but we only need thatB = �(min{eÁ, 2k}), and b = �(k/B + 1) to statethe result below.Theorem 5. There is an Á-LDP estimate p suchthat

E#¸2

2(p, p)$

= O

3(k + (eÁ ≠ 1)b)(B + eÁ)

n(eÁ ≠ 1)2

4,

E [¸1(p, p)] = O

AÛk

n

(k + (eÁ ≠ 1)b)(B + eÁ)2(eÁ ≠ 1)2

B.

The running time of the algorithm is O(n + k), andcommunication is at most 2 + log k bits.

Plugging the values of b, and B, and apply Markov’sinequality, we can obtain all the sample complexitybounds for HR, namely the last column of the samplecomplexity in Table 1.

6 Experiments.We experimentally compare our algorithm withRR, RAPPOR and SS. Our code is availableat https://github.com/zitengsun/hadamard_response. We set k œ {100, 1000, 5000, 10000},n œ {50000, 100000, 150000, ..., 1000000}, andÁ œ {0.1, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. We con-sider geometric distributions Geo(⁄), wherep(i) à (1 ≠ ⁄)i⁄, Zipf distributions Zipf(k, t) wherep(i) à (i + 1)≠t, two-step distributions, and uniformdistributions. For every setting of (k, p, n, Á), andfor each scheme, we simulate 30 runs, and computethe averaged ¸1 error, and averaged decoding timeat the server.

In a nutshell, we observe that in each regime, thestatistical performance of HR is comparable to thebest possible. Moreover, the decoding time of HR issimilar to that of RR. In comparison to RAPPORand SS, our running times can be orders of magni-tude smaller, particularly for large k, and small Á.

Page 8: Hadamard Response: Estimating Distributions Privately ...proceedings.mlr.press/v89/acharya19a/acharya19a.pdf · Hadamard Response: Estimating Distributions Privately, Eciently, and

Hadamard Response: Estimating Distributions Privately, E�ciently, and with Little Communication

(a) Á = 0.5, p ≥ Geo(0.8) (b) Á = 2, p ≥ Geo(0.8) (c) Á = 5, p ≥ Geo(0.8) (d) Á = 7, p ≥ Geo(0.8)

(e) Á = 0.5, p ≥ U [k] (f) Á = 2, p ≥ U [k] (g) Á = 5, p ≥ U [k] (h) Á = 7, p ≥ U [k]

Figure 1: ¸1-error comparison between four algorithms k = 1000, p ≥ Geo(0.8) and p ≥ U [k]

(a) Á = 1 (b) Á = 2 (c) Á = 5 (d) Á = 10

Figure 2: ¸1-error comparison between four algorithms k = 10000 and p ≥ Geo(0.8)

(a) k = 100 (b) k = 1000 (c) k = 5000 (d) k = 10000

Figure 3: Decoding time comparison between four algorithms for Á = 1 and p ≥ Geo(0.8). Note that the decoding

times are in logarithmic scale.

We remark that we implement RAPPOR, and SSsuch that their running time is almost linear in thetime needed to read the already compressed commu-nication from the users.

We describe some of our experimental results here.Figure 1 plots the ¸1 error for distribution estimationunder geometric distribution and uniform distribu-tion for k = 1000. Note that for Á = 0.5, and Á = 7,our performance matches with the best schemes. Inall the plots SS has the best statistical performance,however that can come at the cost of higher com-munication, and computation. For larger k such ask = 10000, the performance is shown in figure 2. InFigure 1 (d), (h) and Figure 2 (d), you can only seetwo curves because when Á is high, HR, RAPPOR

and SS perform almost the same.

The running time of our algorithm is theoreticallya factor k/ log k smaller than RAPPOR and subsetselection. This is evident from figure 3, which showsthat for large k the running times of RAPPOR andSS are orders of magnitude more than HR, and RR.For example, for k = 10000, our algorithm runs 100xfaster than SS, and RAPPOR.

Acknowledgements

The authors thank Peter Kairouz, Ananda TheerthaSuresh, and Aaron Wagner for many helpful discus-sions, and technical support during this research.

Page 9: Hadamard Response: Estimating Distributions Privately ...proceedings.mlr.press/v89/acharya19a/acharya19a.pdf · Hadamard Response: Estimating Distributions Privately, Eciently, and

Jayadev Acharya, Ziteng Sun, Huanyu Zhang

References

[1] F. Abramovich, Y. Benjamini, D. L. Donoho,and I. M. Johnstone. Adapting to unknownsparsity by controlling the false discovery rate.The Annals of Statistics, 34(2):584–653, 2006.

[2] J. Acharya, C. L. Canonne, and H. Tyagi. Dis-tributed simulation and distributed inference.CoRR, abs/1804.06952, 2018.

[3] J. Acharya, I. Diakonikolas, J. Li, andL. Schmidt. Sample-optimal density estima-tion in nearly-linear time. In Proceedings of the28th Annual ACM-SIAM Symposium on Dis-crete Algorithms, SODA ’17, pages 1278–1289,Philadelphia, PA, USA, 2017. SIAM.

[4] M. F. Balcan, A. Blum, S. Fine, and Y. Man-sour. Distributed learning, communicationcomplexity and privacy. In Conference onLearning Theory, pages 26–1, 2012.

[5] S. Banerjee, N. Hegde, and L. Massoulie. Theprice of privacy in untrusted recommendationengines. In Communication, control, and com-puting (Allerton), 2012 50th annual Allertonconference on, pages 920–927. IEEE, 2012.

[6] R. Barlow, D. Bartholomew, J. Bremner, andH. Brunk. Statistical Inference under Order Re-strictions. Wiley, New York, 1972.

[7] R. Bassily, K. Nissim, U. Stemmer, and A. G.Thakurta. Practical locally private heavy hit-ters. In Advances in Neural Information Pro-cessing Systems, pages 2285–2293, 2017.

[8] R. Bassily and A. Smith. Local, private, ef-ficient protocols for succinct histograms. InSTOC, pages 127–135. ACM, 2015.

[9] A. Blum, K. Ligett, and A. Roth. A learningtheory approach to noninteractive database pri-vacy. Journal of the ACM (JACM), 60(2):12,2013.

[10] D. Braess and T. Sauer. Bernstein polynomialsand learning theory. Journal of ApproximationTheory, 128(2):187–206, 2004.

[11] M. Bun, J. Nelson, and U. Stemmer. Heavy hit-ters and the structure of local privacy. In Pro-ceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of DatabaseSystems, pages 435–447. ACM, 2018.

[12] S. O. Chan, I. Diakonikolas, R. A. Servedio,and X. Sun. E�cient density estimation viapiecewise polynomial approximation. In STOC,2014.

[13] K. Chaudhuri, C. Monteleoni, and A. D. Sar-wate. Di�erentially private empirical risk min-imization. Journal of Machine Learning Re-search, 12:1069–1109, 2011.

[14] T. Dalenius. Towards a methodology for sta-tistical disclosure control. Statistisk Tidskrift,15:429–444, 1977.

[15] S. Dasgupta. Learning mixtures of gaussians.In Annual Symposium on Foundations of Com-puter Science (FOCS), 1999.

[16] C. Daskalakis and G. Kamath. Faster and sam-ple near-optimal algorithms for proper learningmixtures of gaussians. In COLT, 2014.

[17] L. Devroye and L. Gyorfi. Nonparametric Den-sity Estimation: The L1 View. John Wiley &Sons, 1985.

[18] L. Devroye and G. Lugosi. Combinatorial meth-ods in density estimation. Springer, 2001.

[19] I. Diakonikolas, E. Grigorescu, J. Li, A. Natara-jan, K. Onak, and L. Schmidt. Communication-e�cient distributed learning of discrete distri-butions. In NIPS, pages 6394–6404. Curran As-sociates, Inc., 2017.

[20] I. Diakonikolas, M. Hardt, and L. Schmidt. Dif-ferentially private learning of structured dis-crete distributions. In NIPS, pages 2566–2574,2015.

[21] Di�erential Privacy Team, Apple. Learningwith privacy at scale, December 2017.

[22] I. Dinur and K. Nissim. Revealing informationwhile preserving privacy. In PODS, pages 202–210, New York, NY, USA, 2003. ACM.

[23] J. C. Duchi, M. I. Jordan, and M. J. Wain-wright. Local privacy and statistical minimaxrates. In FOCS, pages 429–438. IEEE, 2013.

[24] C. Dwork, F. McSherry, K. Nissim, andA. Smith. Calibrating noise to sensitivity inprivate data analysis. In TCC, pages 265–284.Springer, 2006.

[25] C. Dwork and A. Roth. The algorithmic foun-dations of di�erential privacy. Foundationsand Trends R• in Theoretical Computer Science,9(3–4):211–407, 2014.

Page 10: Hadamard Response: Estimating Distributions Privately ...proceedings.mlr.press/v89/acharya19a/acharya19a.pdf · Hadamard Response: Estimating Distributions Privately, Eciently, and

Hadamard Response: Estimating Distributions Privately, E�ciently, and with Little Communication

[26] C. Dwork, G. N. Rothblum, and S. Vadhan.Boosting and di�erential privacy. In FOCS,pages 51–60, 2010.

[27] U. Erlingsson, V. Pihur, and A. Korolova.Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings ofthe 2014 ACM SIGSAC conference on computerand communications security, pages 1054–1067.ACM, 2014.

[28] Y. Han, P. Mukherjee, A. Ozgur, and T. Weiss-man. Distributed statistical estimation of high-dimensional and nonparametric distributionswith communication constraints. In ISIT, 2018.

[29] N. Homer, S. Szelinger, M. Redman, D. Dug-gan, W. Tembe, J. Muehling, J. V. Pear-son, D. A. Stephan, S. F. Nelson, and D. W.Craig. Resolving individuals contributing traceamounts of dna to highly complex mixtures us-ing high-density snp genotyping microarrays.PLoS Genetics, 4(8):1–9, 2008.

[30] J. Hsu, S. Khanna, and A. Roth. Distributedprivate heavy hitters. In International Collo-quium on Automata, Languages, and Program-ming, pages 461–472. Springer, 2012.

[31] P. Kairouz, K. Bonawitz, and D. Ramage. Dis-crete distribution estimation under local pri-vacy. arXiv preprint arXiv:1602.07387, 2016.

[32] P. Kairouz, S. Oh, and P. Viswanath. Thecomposition theorem for di�erential privacy.IEEE Transactions on Information Theory,63(6):4037–4049, 2017.

[33] A. T. Kalai, A. Moitra, and G. Valiant. E�-ciently learning mixtures of two gaussians. InSTOC, 2010.

[34] S. Kamath, A. Orlitsky, V. Pichapathi, andA. T. Suresh. On learning distributions fromtheir samples. In preparation, 2015.

[35] J. Lei. Di�erentially private m-estimators.In Advances in Neural Information ProcessingSystems, pages 361–369, 2011.

[36] F. McSherry and K. Talwar. Mechanism designvia di�erential privacy. In FOCS, pages 94–103.IEEE, 2007.

[37] A. Narayanan and V. Shmatikov. Robust de-anonymization of large sparse datasets. In Se-curity and Privacy.IEEE Symposium on, pages111–125, 2008.

[38] O. She�et. Locally private hypothesis testing.CoRR, abs/1802.03441, 2018.

[39] B. W. Silverman. Density Estimation. Chap-man and Hall, London, 1986.

[40] A. T. Suresh, A. Orlitsky, J. Acharya, andA. Jafarpour. Near-optimal-sample estimatorsfor spherical gaussian mixtures. In NIPS, pages1395–1403. Curran Associates, Inc., 2014.

[41] L. Sweeney. k-anonymity: A model for pro-tecting privacy. International Journal of Un-certainty, Fuzziness and Knowledge-Based Sys-tems, 10(05):557–570, 2002.

[42] M. J. Wainwright, M. I. Jordan, and J. C.Duchi. Privacy aware learning. In Advances inNeural Information Processing Systems, pages1430–1438, 2012.

[43] S. Wang, L. Huang, P. Wang, Y. Nie, H. Xu,W. Yang, X. Li, and C. Qiao. Mutual informa-tion optimally local private discrete distributionestimation. CoRR, abs/1607.08025, 2016.

[44] T. Wang and J. Blocki. Locally di�erentiallyprivate protocols for frequency estimation. InProceedings of the 26th USENIX Security Sym-posium, 2017.

[45] S. L. Warner. Randomized response: A surveytechnique for eliminating evasive answer bias.Journal of the American Statistical Association,60(309):63–69, 1965.

[46] L. Wasserman and S. Zhou. A statistical frame-work for di�erential privacy. Journal of theAmerican Statistical Association, 105(489):375–389, 2010.

[47] M. Ye and A. Barg. Optimal schemes for dis-crete distribution estimation under locally dif-ferential privacy. CoRR, abs/1702.00610, 2017.

[48] Y. Zhang, M. J. Wainwright, and J. C. Duchi.Communication-e�cient algorithms for statis-tical optimization. In NIPS, pages 1502–1510.2012.