12
1545-5971 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2016.2635128, IEEE Transactions on Dependable and Secure Computing 1 VPSearch: Achieving Verifiability for Privacy-Preserving Multi-Keyword Search over Encrypted Cloud Data Zhiguo Wan and Robert H. Deng Abstract—Although cloud computing offers elastic computa- tion and storage resources, it poses challenges on verifiability of computations and data privacy. In this work we investigate verifiability for privacy-preserving multi-keyword search over outsourced documents. As the cloud server may return incorrect results due to system faults or incentive to reduce computation cost, it is critical to offer verifiability of search results and privacy protection for outsourced data at the same time. To fulfill these requirements, we design a V erifiable P rivacy-preserving keyword S earch scheme, called VPSearch, by integrating an adapted homomorphic MAC technique with a privacy-preserving multi-keyword search scheme. The proposed scheme enables the client to verify search results efficiently without storing a local copy of the outsourced data. We also propose a random challenge technique with ordering for verifying top-k search results, which can detect incorrect top-k results with probability close to 1. We provide detailed analysis on security, verifiability, privacy, and efficiency of the proposed scheme. Finally, we implement VPSearch using Matlab and evaluate its performance over three UCI bag-of-words data sets. Experiment results show that authentication tag generation incurs about 3% overhead only and a search query over 300,000 documents takes about 0.98 seconds on a laptop. To verify 300,000 similarity scores for one query, VPSearch costs only 0.29 seconds. I. I NTRODUCTION Cloud computing revolutionizes the computation model by offering elastic computation and storage resources. A client can outsource his data to the cloud server and later access the data with other devices from anywhere. The client can further ask the cloud server to perform some computation over his data on his behalf. These advantages lead to great success of cloud computing, but they also raise security and privacy concerns with respect to the client’s data. The concerns come from the fact that the service providers can not be fully trusted. The first concern is privacy of outsourced data. To protect data privacy, data should be encrypted before being outsourced to the cloud. However, this is nontrivial as the encryption scheme need to support computations over encrypted data. Although fully homomorphic encryption provides a seemingly perfect solution for supporting arbitrary computations over encrypted data, its prohibitive computation cost makes it impractical currently. One of the fundamental and most frequent data operations is keyword search. How to perform efficient and versatile search Z. Wan ([email protected]) is with School of Computer Science and Technology, Shandong University, Jinan, China. R. H. Deng ([email protected]) is with the School of Information Systems, Singapore Management University, Singapore. operations on encrypted cloud data has been an important research direction since the seminal work of Song et al. [1]. More recent works have achieved keyword privacy for keyword search over encrypted data, i.e., the keywords in queries are protected from the cloud server. Solutions for both single-keyword search [1], [2], [3] and multi-keyword search [4], [5], [6], [7], [8], [9], [10], [11], [12], [13] have been proposed in the literature. Unfortunately, all these schemes assume the cloud server will follow the designated protocols honestly, i.e. the honest-but-curious model, while whether the computation result is authentic is not considered in those proposals. However, as the cloud may return false results to the client either due to system faults or incentive to reduce computation cost, the keyword search results cannot be fully trusted, especially for very critical computation results. Thus, it is important for the client to be able to verify the keyword search results over encrypted cloud data, but this has not been studied sufficiently. Verifiability of delegated computation over outsourced da- ta is an important research problem for cloud computing. This problem has attracted considerable research interests, and many solutions have been proposed, including verifiable computation [14], [15], [16], [17], [18] [19], [20], [21], homomorphic MACs/signatures [22], [23], [24], [25] as well as other proposals. However, most of these proposals suffer from various limitations. Verifiable computation schemes aim to minimize computation effort of the client, but not storage or communication cost. They require the client and the server to interactively authenticate the computation result. Moreover, the client needs to have a copy of the outsourced data, and the data over which computation is verified cannot be changed in the future. Homomorphic message authenticator schemes [22], [23], which were proposed recently, avoid a local copy of the out- sourced data on the client side. Homomorphic message authen- ticator schemes allow a client to authenticate a collection of data m 1 , ..., m n with his secret key sk, and later to authenticate the computation result m = P (m 1 , ..., m n ) of a running program P over the data. The authentication tag σ that certifies m can be produced without knowing sk. Consequently, anyone who knows sk can certify the computation result m of the outsourced data with a short authentication tag σ. A recent homomorphic MAC scheme proposed in [24] improves the verification efficiency, but the same as all aforementioned pro- posals, it does not consider the problem over encrypted data. To achieve verifiable computation over encrypted cloud data,

VPSearch: Achieving Verifiability for Privacy …of achieving verifiability for privacy-preserving multi-keyword search over encrypted cloud data. Different from the honest-but-curious

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: VPSearch: Achieving Verifiability for Privacy …of achieving verifiability for privacy-preserving multi-keyword search over encrypted cloud data. Different from the honest-but-curious

1545-5971 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2016.2635128, IEEETransactions on Dependable and Secure Computing

1

VPSearch: Achieving Verifiability forPrivacy-Preserving Multi-Keyword Search over

Encrypted Cloud DataZhiguo Wan and Robert H. Deng

Abstract—Although cloud computing offers elastic computa-tion and storage resources, it poses challenges on verifiabilityof computations and data privacy. In this work we investigateverifiability for privacy-preserving multi-keyword search overoutsourced documents. As the cloud server may return incorrectresults due to system faults or incentive to reduce computationcost, it is critical to offer verifiability of search results andprivacy protection for outsourced data at the same time. To fulfillthese requirements, we design a Verifiable Privacy-preservingkeyword Search scheme, called VPSearch, by integrating anadapted homomorphic MAC technique with a privacy-preservingmulti-keyword search scheme. The proposed scheme enablesthe client to verify search results efficiently without storing alocal copy of the outsourced data. We also propose a randomchallenge technique with ordering for verifying top-k searchresults, which can detect incorrect top-k results with probabilityclose to 1. We provide detailed analysis on security, verifiability,privacy, and efficiency of the proposed scheme. Finally, weimplement VPSearch using Matlab and evaluate its performanceover three UCI bag-of-words data sets. Experiment results showthat authentication tag generation incurs about 3% overheadonly and a search query over 300,000 documents takes about0.98 seconds on a laptop. To verify 300,000 similarity scores forone query, VPSearch costs only 0.29 seconds.

I. INTRODUCTION

Cloud computing revolutionizes the computation model byoffering elastic computation and storage resources. A clientcan outsource his data to the cloud server and later access thedata with other devices from anywhere. The client can furtherask the cloud server to perform some computation over hisdata on his behalf. These advantages lead to great successof cloud computing, but they also raise security and privacyconcerns with respect to the client’s data. The concerns comefrom the fact that the service providers can not be fully trusted.The first concern is privacy of outsourced data. To protect dataprivacy, data should be encrypted before being outsourced tothe cloud. However, this is nontrivial as the encryption schemeneed to support computations over encrypted data. Althoughfully homomorphic encryption provides a seemingly perfectsolution for supporting arbitrary computations over encrypteddata, its prohibitive computation cost makes it impracticalcurrently.

One of the fundamental and most frequent data operations iskeyword search. How to perform efficient and versatile search

Z. Wan ([email protected]) is with School of Computer Science andTechnology, Shandong University, Jinan, China.

R. H. Deng ([email protected]) is with the School of InformationSystems, Singapore Management University, Singapore.

operations on encrypted cloud data has been an importantresearch direction since the seminal work of Song et al.[1]. More recent works have achieved keyword privacy forkeyword search over encrypted data, i.e., the keywords inqueries are protected from the cloud server. Solutions for bothsingle-keyword search [1], [2], [3] and multi-keyword search[4], [5], [6], [7], [8], [9], [10], [11], [12], [13] have beenproposed in the literature. Unfortunately, all these schemesassume the cloud server will follow the designated protocolshonestly, i.e. the honest-but-curious model, while whether thecomputation result is authentic is not considered in thoseproposals. However, as the cloud may return false results tothe client either due to system faults or incentive to reducecomputation cost, the keyword search results cannot be fullytrusted, especially for very critical computation results. Thus,it is important for the client to be able to verify the keywordsearch results over encrypted cloud data, but this has not beenstudied sufficiently.

Verifiability of delegated computation over outsourced da-ta is an important research problem for cloud computing.This problem has attracted considerable research interests,and many solutions have been proposed, including verifiablecomputation [14], [15], [16], [17], [18] [19], [20], [21],homomorphic MACs/signatures [22], [23], [24], [25] as wellas other proposals. However, most of these proposals sufferfrom various limitations. Verifiable computation schemes aimto minimize computation effort of the client, but not storageor communication cost. They require the client and the serverto interactively authenticate the computation result. Moreover,the client needs to have a copy of the outsourced data, and thedata over which computation is verified cannot be changed inthe future.

Homomorphic message authenticator schemes [22], [23],which were proposed recently, avoid a local copy of the out-sourced data on the client side. Homomorphic message authen-ticator schemes allow a client to authenticate a collection ofdata m1, ...,mn with his secret key sk, and later to authenticatethe computation result m = P(m1, ...,mn) of a runningprogram P over the data. The authentication tag σ that certifiesm can be produced without knowing sk. Consequently, anyonewho knows sk can certify the computation result m of theoutsourced data with a short authentication tag σ. A recenthomomorphic MAC scheme proposed in [24] improves theverification efficiency, but the same as all aforementioned pro-posals, it does not consider the problem over encrypted data.To achieve verifiable computation over encrypted cloud data,

Page 2: VPSearch: Achieving Verifiability for Privacy …of achieving verifiability for privacy-preserving multi-keyword search over encrypted cloud data. Different from the honest-but-curious

1545-5971 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2016.2635128, IEEETransactions on Dependable and Secure Computing

2

Fiore et al. [26] recently proposed a general scheme based onhomomorphic MACs. Nevertheless, their homomorphic MACscheme consider integer computation only and do not supportcomputations over real numbers.Our Contribution. In this paper, we investigate the problemof achieving verifiability for privacy-preserving multi-keywordsearch over encrypted cloud data. Different from the honest-but-curious model used in existing privacy-preserving keywordsearch schemes, we assume a partially honest model, in whichthe cloud server may return wrong results due to system faultsor incentive to reduce computation cost.

To this end, we adapt the efficient homomorphic MACscheme from [23] to support privacy-preserving multi-keywordsearch. The original homomorphic MAC is designed for com-putations over plaintexts in finite fields only, while privacy-preserving multi-keyword search schemes deal with encryptedreal numbers. Moreover, because the homomorphic MACscheme in [23] uses modulo operations, one cannot compareor sort the computation results. As a result, it is necessary toadapt the homomorphic MAC to support privacy-preservingmulti-keyword search schemes.

We choose to work on a privacy-preserving multi-keywordranked searchable encryption (MRSE) scheme in [10], whichuses matrix multiplication to encrypt data index and innerproduct to compute similarity scores. By applying our adaptedhomomorphic MAC, our proposed scheme is able to achieveverifiable privacy-preserving multi-keyword search over en-crypted cloud data. To our best knowledge, our solution is thefirst to use the homomorphic MAC to achieve both verifiabilityand privacy for multi-keyword search. VPSearch is highlyefficient as it uses only a one-way function (pseudorandomfunction) for security, making it scalable to large databases.Since top-k retrieval for search results is a commonly usedoperation, we further propose a random challenge techniqueto verify top-k multi-keyword search results. This techniqueenables the client to detect wrong top-k results with probabilityclose to 1.

We implement VPSearch using Matlab and evaluate itsperformance over three UCI bag-of-words data sets [27].Compared with the privacy-preserving keyword search schemewithout verifiability, VPSearch incurs only about 3% over-head for authentication tag generation. For keyword search,VPSearch can finish a query over 300,000 documents in 0.98seconds. Parallel execution of keyword search over multipleservers can be used to further speed up this operation.

The contributions of our work can be summarized asfollows:• We design an efficient, verifiable and privacy-preserving

multi-keyword ranked searchable encryption (MRSE)scheme for outsourced cloud data under the partiallyhonest cloud server model. It is realized by integratingan adapted homomorphic MAC technique with a privacy-preserving multi-keyword search scheme. The proposedscheme is very efficient as it relies on only one-wayfunction for security.

• We also provide the random challenge technique to verifytop-k search results for a given query. With this solution,the client can be sure that the top-k results are authentic

for probability close to 1.• We provide detailed analysis on security, privacy, ver-

ifiability and efficiency of VPSearch. Specifically, theunderlying homomorphic MAC scheme used in VPSearchcan be proved to be secure in the same way as [23].

• We implement VPSearch using Matlab and evaluate itsperformance over three UCI data sets. Experiment resultson a laptop show that VPSearch is very efficient onauthentication tag generation and keyword search opera-tions.

More importantly, although we only describe how to im-plement verifiability for a specific MRSE scheme [10], oursolution can be extended to other MRSE schemes adoptingthe similar models, e.g. [28], [12], [11].Paper Organization. We review related work in the nextsection, then some preliminaries on homomorphic MAC areprovided in Section III. System model, threat model and designgoals of our proposed scheme are presented in Section IV.Detailed description of our scheme is given in Section V,followed by discussion and security analysis. Then we presentperformance evaluation of our scheme and conclude the paperafter that.

II. RELATED WORK

A. Verifiable Computation

Our work is closely related to verifiable computation, whichhas been an active area in the past few years, and manyschemes with different design goals have been proposed in theliterature. Verifiable computation enables a client to delegatea computationally heavy task to a server with strong computa-tion capability, and then verify correctness of the computationresult.

Existing works can be categorized into generic protocols[14], [15], [16], [17], [18] which allow arbitrary computationsand ad hoc protocols which deal with specific classes ofcomputation [19], [20], [21]. Generic protocols for verifiablecomputation are computationally more expensive than ad hocones which aim at efficient computation for specific classes offunctions, e.g., high-degree functions [21].

Most verifiable computation schemes aim at reducing com-putation cost of the delegator, while storage cost of thedelegator is not considered, i.e. the delegator needs a copyof the data for verification. Our work enjoys the advantageover verifiable computation in that the client does not need tomaintain a local copy of the outsourced data. This advantageis becoming increasingly important for big data applicationswhich compute over huge volume of data. Another advantageof our solution over verifiable computation is that we deal withencrypted data while most existing schemes consider plaintextdata only.

B. Homomorphic Signatures and MACs

Homomorphic message authentication codes and signatureswith restricted homomorphism have been investigated forsecuring network coding since Johnson et al.’s work [29]. Mostworks since then focus on linear functions until Boneh and

Page 3: VPSearch: Achieving Verifiability for Privacy …of achieving verifiability for privacy-preserving multi-keyword search over encrypted cloud data. Different from the honest-but-curious

1545-5971 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2016.2635128, IEEETransactions on Dependable and Secure Computing

3

Freeman [25] proposed a homomorphic signature scheme forbounded constant degree polynomials. After that, homomor-phic signatures and MACs are used to verify authenticity ofcomputation results.

Gennaro and Wichs [22] constructed the first fully homo-morphic message authenticator based on fully homomorphicencryption. Though it has great theoretic significance, thescheme suffers from significant computation overhead due tofully homomorphic encryption. Following this work, Cata-lano and Fiore [23] proposed a more computation-efficienthomomorphic MAC for arithmetic circuits. Backs et al. [24]utilized the amortized closed-form efficiency technique [21] toreduce verification overhead. However, none of these solutionsconsider verification of computation over encrypted data, soprivacy of the outsourced data is not preserved.

Although a straightforward enhancement is to apply homo-morphic encryption on client data before outsourcing [26], ithas an obvious drawback on computation overhead. Goldwass-er et al. [30] gave a generic protocol for private and verifiableoutsourced computation based on functional encryption. Thisscheme only works for boolean functions and is very ineffi-cient. Libert et al.’s scheme [31] deals with evaluation of linearcombination over encrypted data, which is not applicableunder our setting. Homomorphic authenticated encryption hasbeen used by [32] [33] to verify computations over encrypteddata. Different from existing schemes on verifiable compu-tation over encrypted data, our work achieves efficient andverifiable keyword search by using the more efficient homo-morphic MAC with the privacy-preserving keyword searchscheme from [10]. The reader is referred to [24] for a morecomplete discussion.

C. Privacy-preserving keyword search over encrypted data

Song et al. [1] initiated the research on keyword searchover encrypted data, and proposed a scheme enabling singlekeyword search over an encrypted document. The basic idea isto build an encrypted searchable index while hiding the con-tent of the corresponding document. Noticing the importanceof keyword privacy, researchers proposed enhanced schemescapable of protecting keywords [2] [3]. Unfortunately, theseschemes can only process single keyword search, so theyobviously lack flexibility and expressiveness.

Multi-keyword search over encrypted data has been realizedin the public key setting with conjunctive keyword search[4] [5] [6]. Some works like [7] employ predicate encryp-tion technique to support conjunctive and disjunctive search.These schemes have several limitations due to the underlyingcryptographic techniques. On one hand, they incur significantcomputation overhead due to expensive computations like bi-linear mapping; on the other hand, conjunctive and disjunctivesearch fall short of search expressiveness.

Cao et al. [9], [10] designed a privacy-preserving multi-keyword ranked search based on “Coordinate matching”. Morerecently, Xia et al. [28] further improve keyword searchefficiency using a tree-based index. This tree-based index iscombined with the vector model and the TF×IDF model. Fuet al. [11] considered the user search experience problem, and

proposed a personalized multi-keyword searchable encryptionscheme using a user’s search history to build an interestmodel. This scheme can improve user search experience whilepreserving user privacy. Fu et al. [12] studied multi-keywordfuzzy ranked searchable encryption problem, and proposedan efficient solution using keyword transformation and wordstemming. This scheme can deal with several types of keywordspelling errors. With appropriate modification, our solutioncan also be applied on the schemes in [11], [12], [28], [34]similarly.

All the above privacy-preserving keyword search schemesassume an honest-but-curious model, in which the cloud serverstrictly follows the designated protocol while being curiousabout privacy of the clients. These schemes did not considerverifiability of keyword search results, but it is possible thereturned results may be incorrect due to system faults orincentive to reduce computation cost.

Sun et al. [35] employs multi-dimensional-B (MDB) tree toimplement multi-keyword search and search result verifiability.Although this approach achieves efficiency and verifiability,it does not support dynamic operations for file addition anddeletion. Whenever a new file is outsourced to the cloud, thedata owner needs to generate a new signature for verification.Cheng et al.’s work [36] is based on indistinguishability obfus-cation technique. Indistinguishability obfuscation may incurhigh computation overhead. In contrast, our work achievesboth keyword privacy and verifiability for searchable encryp-tion at modest cost.

Our work intends to achieve verifiability as well as privacyprotection at the same time.

III. PRELIMINARIES

We review the definition of the labeled program in [22] andpresent our homomorphic message authenticator scheme forreal numbers adapted from [23].

1) Labeled Programs: The labeled program is composed ofa functiona f and n input variables with each input assigneda label Li(i = 1, 2, ..., n). Li is a unique string to “label” thevariable of the function f . For example, suppose a companyoutsources its employee payroll database D to the cloud, andthen asks the cloud server to compute f as the average salaryof all its employees. Then Li can be constructed as (“the i-th employee’s salary”) where i is the index of an employee.Then the salary amount mi can be authenticated with respectto the label Li, which essentially binds the data m with thecorresponding label.

2) Our Homomorphic MAC for Real Numbers: As thehomomorphic MAC in [23] only supports computations overfinite fields, we propose an adapted homomorphic MAC forreal numbers, which is to be used in our proposed scheme.

The underlying idea for the homomorphic MAC is simplebut effective. Each message mi is encoded as a degree-1 polynomial y(x) such that y(0) = mi and y(α) = riwhere α, ri are secrets known only to the verifier. That is,y(x) = mi+(ri−mi) ·x/α. As elementary arithmetic opera-tions over polynomials are homomorphic, the cloud server can

aMore accurately, an arithmetic circuit that defines a polynomial function.

Page 4: VPSearch: Achieving Verifiability for Privacy …of achieving verifiability for privacy-preserving multi-keyword search over encrypted cloud data. Different from the honest-but-curious

1545-5971 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2016.2635128, IEEETransactions on Dependable and Secure Computing

4

compute a function f , which involves only elementary arith-metic operations, over these degree-1 polynomials to obtain apolynomial g(x). Then one can verify f(m1, ...,mn) = g(0)if f(r1, ..., rn) = g(α).

Our modification to the homomorphic MAC scheme is totreat all messages, e.g. mi and ri, as real numbers encoded bya format like the double-precision floating point format definedin IEEE 754 standard. Then the new MAC scheme can dealwith real numbers, while the homomorphic property of theMAC scheme is preserved. The details of our homomorphicMAC scheme RealHomMAC for real numbers is as follows:

• KeyGen(1λ): This algorithm generates a secret key skfrom the security parameter λ. Specifically, it chooses aseed K of a pseudo-random function FK : {0, 1}∗ → Rλ,where Rλ is the set of real numbers that is encode with λbits. The algorithm also selects a random value α ∈ Rλ,and the secret key is sk = (K,α).

• Auth(sk, L,m): This algorithm takes as inputs the secretkey sk, a label L and a message m ∈ Rλ, generates theauthentication tag σ. Specifically, the algorithm computesrL = FK(L), sets y0 = m and y1 = (rL − m)/α, andoutputs σ = (y0, y1) as the authentication tag for m.

• Eval(f, ~σ): This algorithm performs evaluation of f on avector of tags ~σ = (σ1, ..., σn), and outputs the resultingtag σ. f can be viewed as a circuit composed of additionand multiplication gates.The algorithm processes each gate of the circuit f asfollows. At each gate fg , given two tags σ(1), σ(2), it runsthe algorithm σ ← GateEval(fg, σ

(1), σ(2)) as describedbelow to derive a new tag σ. The new tag is in turn passedon as input to the next gate in the circuit f . The tag σobtained at the last gate of the circuit f is the final output.The subroutine GateEval is defined as follows:– GateEval(fg, σ

(1), σ(2)). Let σ(i) = ~y(i) =

(y(i)0 , ..., y

(i)di

) for i = 1, 2 and di ≥ 1.1) fg = +: The algorithm computes the coefficients

(y0, ..., yd) of the polynomial y(x) = y(1)(x) +y(2)(x) where d = max(d1, d2). This can be doneby simply adding the two vectors of coefficients,~y = ~y(1) + ~y(2).

2) fg = ×: The algorithm computes the coefficients(y0, ..., yd) of the polynomial y(x) = y(1)(x) ∗y(2)(x) using the convolution operator ∗, whered = d1 · d2. That is, yk =

∑ki=0 y

(1)i · y(2)k−i

∀k = 0, ..., d.Finally the algorithm returns σ = (y0, ..., yd), thevector of coefficients of a d-degree polynomial.

• Ver(sk,P,m, σ): This algorithm verifies that m is thecorrect computation result of a labeled program P =(f, L1, ..., Ln) with the secret key sk and a tag σ =(y0, ..., yd).Specifically, this algorithm computes f0 = f(rL1

, ..., rLn)

where rLi= FK(Li). Finally, it verifies the result by

checking the following two equations:

m = y0 (1)

f0 =d∑

k=0

ykαk (2)

It accepts the result if both equations hold, and rejectsotherwise.

IV. PROBLEM FORMULATION

A. System Model

Our scheme involves two different players: the client C whooutsources her encrypted documents to the cloud, and a cloudserver S that provides data storage and keyword search serviceto the clients.

To enable keyword search over encrypted documents, Cconstructs an encrypted searchable index for the encrypteddocuments, generates authentication tags for the index. Thenshe outsources the authenticated index and the encrypted docu-ments to the cloud. When C wants to search the documents forsome keywords, she prepares an authenticated search trapdoorfor the cloud. On receiving the search trapdoor from C, thecloud server performs search using the trapdoor and returnthe result to C.

The cloud server should be able to order the search resultsaccording to a given ranking criteria, so as to enable the clientC to get the most relevant documents with respect to the query.To reduce unnecessary communication, C can ask the cloudserver to returns the k most relevant documents only, whichis called top-k queries.

B. Threat Model

In contrast to the honest-but-curious cloud server assump-tion in most privacy-preserving keyword search schemes, weassume a partially honest model, in which the cloud servermay return wrong results due to system faults or in orderto reduce computation cost. More specifically, the cloud mayfinish only a part of the delegated computation task to reducecost, and return the wrong result to the client [35]. However, togain trust from the clients, the cloud server will not misbehavewhen the misbehavior detection probability is non-negligible.

The client C, who takes both roles of data owner and datauser, is assumed to be always honest in the entire process. Thatis, the client honestly encrypts outsourced documents, buildsencrypted the searchable index, generates authentication tags,and composes search trapdoors.

As analyzed in existing privacy-preserving keyword searchschemes, there are two types of knowledge held by thecloud server: 1) known ciphertext: the cloud server onlyknows the encrypted documents and the searchable index;2) known background knowledge: the cloud server also hassome background knowledge, e.g. statistics of the outsourceddocuments or relationship between different trapdoors. In thiswork, we consider the case that the cloud server can combineits knowledge on ciphertext and background knowledge todeduce privacy about documents or keywords in the queries.

Page 5: VPSearch: Achieving Verifiability for Privacy …of achieving verifiability for privacy-preserving multi-keyword search over encrypted cloud data. Different from the honest-but-curious

1545-5971 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2016.2635128, IEEETransactions on Dependable and Secure Computing

5

C. Design Goals

Our scheme is designed to achieve the following goals:• Verifiability. The cloud server is able to produce a proof

certifying correctness of the delegated multi-keywordsearch results, and the client is able to verify the resultsusing the proof returned by the server without storing theoutsourced data locally.

• Efficiency. The client is able to produce authenticationtags for outsourced data and verify the search resultsefficiently. On the other hand, the server should be ableto accomplish keyword search and produce authenticationtags for the search results efficiently.

• Keyword Privacy. The cloud server should not be ableto learn keywords in trapdoors or authentication tagsgenerated by the client. Moreover, even if the servergains some background knowledge about the outsourceddocuments, it should not be able to infer any informationabout the queried keywords with such knowledge.

• Data Privacy. The server should not be able to deduceany information about the outsourced data, including thedocuments and the index.

• Unbounded Storage. The client should be able to contin-uously outsource any amount of data to the cloud.

V. VERIFIABLE PRIVACY-PRESERVING MULTI-KEYWORDSEARCH OVER ENCRYPTED DATA

In this section we present VPSearch, the Verifiable Privacy-preserving multi-keyword Search scheme over encrypted clouddata. We first offer a high-level description of VPSearch beforewe present its details. After that, we describe a technique forVPSearch to support verifiability for top-k search results.

A. Overview of Our Scheme

Our scheme builds on the privacy-preserving multi-keywordsearch scheme in [10], which is integrated with our homo-morphic MAC RealHomMAC to achieve both verifiability andprivacy. The keyword search scheme is chosen for two reasons:(i) it supports flexible and expressive multi-keyword searches;(ii) its keyword search operation is actually inner product,which is fully supported by our homomorphic MAC.

The Client Side

Auth. Tags

3. UploadCloud

Plaintext Index

Encrypted Index

+

Encrypted Index

Result w/ Proof

Fig. 1: Overview of Our Verifiable Privacy-preserving Multi-keyword Search Scheme.

An overview of our schemes is illustrated in Fig. 1. Theclient first encrypts the plaintext index, then the encryptedindex is authenticated with our homomorphic MAC technique,which produces authentication tags for the encrypted index.Next, the index and authentication tags are uploaded to thecloud. Then the client can generate a search trapdoor, anduses our homomorphic MAC technique to authenticate thetrapdoor. With the authenticated trapdoor, the cloud servercan homomorphically execute the search function over theauthentication tags to derive the result with a proof, whichcan certify the search result.

Suppose the document set D to be outsourced containsN documents. For the i-th document, its subindex Di isconstructed as an n-dimensional bit vector. Di[j] is set to1 if this document contains the j-th keyword in a givendictionary; otherwise, it is set as 0. Similarly, the query Q isalso an n-dimensional bit vector, with the bits correspondingto the keywords interested by the client set to 1. Then thesubindex Di and the query Q are encrypted using matrixmultiplication, i.e., Di = MTDi and Q = M−1Q whereM is a real secret matrix. The similarity score can still beobtained via inner product of the encrypted index and query,i.e. (MTDi)

TM−1Q = DTi Q. To mask the real similarity

scores for stronger privacy, the document index and the queryvector can be extended with random numbers as in [10].Specifically, a document index is extended to ~Di = (Di, εi, 1),while the query vector is extended to ~Q = (rQ, r, t), whereεi, r, t are random numbers and t is relatively small. At last,the final similarity score computed by the cloud server will ber(Di·Q+εi)+t. Although the similarity scores are randomizedwith the random numbers, the cloud server is still able to rankthe results without knowing the real scores.

We apply our homomorphic MAC technique on the encrypt-ed index Di and query Q (trapdoor) to generate authenticationtags for them, then the cloud server homomorphically executesthe multi-keyword search function (inner product) over theauthentication tags to obtain the results. Since our homomor-phic MAC technique is adapted to support operations overreal numbers, it supports keyword search operations perfectly.Using a one-way function, our homomorphic MAC techniqueis highly efficient in computation.

For each value in the encrypted document subindex andquery, authentication tags are produced as follows. VPSearchauthenticates each item (a real number) in the encryptedindex Di and query Q with two coefficients of a degree-1polynomial. Verifiability of keyword search results is achievedby homomorphically evaluating the search function over theauthentication tags. The cloud server performs the multi-keyword search function as showed in Fig. 2, i.e. calculatingDTi Q.

The labeling approach in our homomorphic MAC can ef-fectively reduce storage space compared to arbitrarily labelinglarge amount of data. Therefore, the client needs to computethe value f(FK(L1), ...,FK(Ln)) for verification for each file.As the function f is lightweight, the verification can beperformed efficiently.

Page 6: VPSearch: Achieving Verifiability for Privacy …of achieving verifiability for privacy-preserving multi-keyword search over encrypted cloud data. Different from the honest-but-curious

1545-5971 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2016.2635128, IEEETransactions on Dependable and Secure Computing

6

0

1

1

0

.

.

.

1

0

1

0

1

.

.

.

1

0

0.3

1

0.9

.

.

.

1.5

0

0.7

1

-0.9

.

.

.

-0.5

Split

&

Encrypt

8.2

3.1

1.2

-4.9

.

.

.

7.8

.

.

.

-5.5

-6.7

8.2

3.1

1.2

-4.9

.

.

.

7.8

.

.

.

-5.5

-6.7

5.7

9.3

-3.8

2.9

.

.

.

0.3

.

.

.

9.5

4.7

&

9.1

8.7

-3.6

7.9

.

.

.

2.2

.

.

.

-5.2

6.8

3.9

3.1

4.1

-2.3

.

.

.

6.4

.

.

.

-7.4

-6.2

& =

Auth Search

Index

S

M1 M2

MT1~D′i

MT2~D′′i

~D′i

~D′′i

~Di

× σi y1

y2

y0

Di Di Di,1 Q Q1

Fig. 3: Illustration of the VPSearch Scheme: an extended document index ~Di is split into two vectors ~D′i and ~D′′i according tothe secret S; then they are encrypted by multiplying with secret matrices M1,M2; the resulting vector is authenticated usingour homomorphic MAC, yielding Di and Di,1; the authentication tags of the document index and the query are multiplied toobtain the final result σi = (y0, y1, y2), which is used by the client to verify the search result.

+

~y = (y0, y1, y2)

×(y

(1)0 , y

(1)1 , y

(1)2 )

Di[1]

(y(1)0 , y

(1)1 )

Q[1]

(y(1)0,Q, y

(1)1,Q)

×

Di[2]

(y(2)0 , y

(2)1 )

Q[2]

(y(2)0,Q, y

(2)1,Q)

. . . . . .

. . . . . .

×(y

(n)0 , y

(n)1 , y

(n)2 )

Di[n]

(y(n)0 , y

(n)1 )

Q[n]

(y(n)0,Q, y

(n)1,Q)

(y(2)0 , y

(2)1 , y

(2)2 )

Fig. 2: The circuit structure of the keyword search (innerproduct) function f and the homomorphic MACs for thedocument index Di and the query Q. Di[j] or Q[j] isauthenticated with an authentication tag σDj = (y

(j)0 , y

(j)1 ) or

σQj = (y(j)0,Q, y

(j)1,Q), the coefficients of a degree-1 polynomial.

The keyword search function f outputs a new authenticationtag σ = ~y = (y0, y1, y2). n is the length of the documentindex.

B. VPSearch: Verifiable Privacy-preserving Multi-keywordSearch Based on Homomorphic MAC

We present the verifiable privacy-preserving multi-keywordsearch scheme based on one-way functions, which is referredto as VPSearch. The homomorphic MAC technique used inVPSearch can authenticate computations over real numbers,and its security relies on existence of one-way functions.Without any public key operation, this scheme enjoys greatcomputational efficiency.

VPSearch consists the following six algorithms: Setup,Index, Trapdoor, Auth, KeywordSearch and Verify.Setup(1λ, n). This algorithm takes as inputs a security

parameter λ and the dictionary size n. n is the numberof keywords in a given dictionary (in the next algorithm),and it also represents the length of the document index.Given the security parameter λ, C invokes KeyGen(1λ) ofRealHomMAC to obtain two secrets (K,α) as the secretkey. Then C selects two (n+2)×(n+2) invertible matrixM1 and M2 in R(n+2)×(n+2)

λ . C also chooses an (n+2)-

bit random vector S. The bit vector S is used to randomlysplit the document sub-indices for privacy as [10]. Thesecret key held by C is sk = (K,α, S,M1,M2).

Index(D, sk, dict). This algorithm takes as inputs the doc-ument set containing N plaintext documents, the secretkey sk, and a dictionary dict of size n, and outputs theencrypted document index for outsourcing to the cloud.Its goal is to protect the document index from the cloudserver. It processes the document index as follows:For the i-th document in D, its subindex Di is first con-structed as a bit vector. Di[j] is set as 1 if this documentcontains the j-th keyword wj in dict; otherwise, it is setas 0. Then Di is extended to be ~Di = (Di, εi, 1) where εiis a random number conforming to normal distribution.Next, ~Di is split into two vectors ~D′i and ~D′′i according tothe following rule (Split in Fig. 3): if S[j] = 0, ~D′i[j] =~D′′i [j] = ~Di[j]; otherwise, ~D′i[j] + ~D′′i [j] = ~Di[j] (i.e.random split of ~Di[j]).At last, C encrypts the subindex ( ~D′i, ~D′′i ) using thesecret matrices M1,M2, yielding Di = (MT

1~D′i,M

T2~D′′i )

(Encrypt in Fig. 3). The encrypted document index,containing all sub-indices Di, along with the encrypteddocument will be uploaded to the cloud along with someauthentication tags generated by the Auth algorithm.

Trapdoor(Q, sk). This algorithm takes as inputs a searchquery in plaintext and the secret key sk, and outputs theencrypted query as a trapdoor. Its goal is to protect thekeyword information in the query from the cloud server.It processes the keyword query as follows:The search query is constructed as a bit vector, with Q[j]set to 1 if the client is interested in keyword wj in thedictionary or 0 otherwise. Then C generates two randomnumbers r, t ∈ Rλ, and extends Q to ~Q = (rQ, r, t).Next, C split ~Q according to the following rule: ifS[j] = 0, ~Q′[j] + ~Q′′[j] = ~Q[j] (i.e. random split of~Q[j]); otherwise, ~Q′[j] = ~Q′′[j] = ~Q[j].Finally, C generates the keyword search trapdoor as Q =(M−11

~Q′,M−12~Q′′).

Auth(sk, L, Di or Q). This algorithm produces authentica-

Page 7: VPSearch: Achieving Verifiability for Privacy …of achieving verifiability for privacy-preserving multi-keyword search over encrypted cloud data. Different from the honest-but-curious

1545-5971 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2016.2635128, IEEETransactions on Dependable and Secure Computing

7

tion tags for each item in the encrypted subindex Di

or the search trapdoor Q. Each item in the encryptedsubindex Di and Q is labeled by a string L. For instance,the j-th item in Di is labeled LDi,j

= “The j-th item inthe index for the i-th document”. The j-th item in thetrapdoor is labeled as LQ,j = “The j-th item in trapdoorQ”.To authenticate a message m ∈ Rλ with label L ∈{0, 1}∗, C invokes RealHomMAC.Auth(sk, L,m) to out-put a tag σ. Specifically, the algorithm computes rL =FK(L), sets y0 = m, y1 = (rL −m)/α, and outputs anauthentication tag (y0, y1). Note y0 is just the originalmessage m.Therefore, for the encrypted subindex Di, the authenti-cation tag is σDi

= (Di, Di,1); while for the trapdoorQ it will be σQ = (Q, Q1). The first item in theauthentication tag corresponds to y0 = m, the messageto be authenticated, and the second item corresponds toy1.Since authentication tags are generated over the extendedindex, including not only the original index, but alsoadditional random numbers (i.e., ε, r and t) and dummykeywords, they can be used to verify search results witherrors due to random numbers and dummy keywords.

KeywordSearch(f, ~σD, σQ). This algorithm takes as inputsthe keyword search function f , the authentication tagsof the document index ~σD and the search trapdoor σQ.~σD is parsed as {σD1

, σD2, ..., σDN

}. Then for eachsubindex Di it performs keyword search by invokingRealHomMAC.Eval(f, ~σ) where ~σ = {σDi , σQ}. Essen-tially, it homomorphically evaluates the search functionf over σDi

and σQ, and outputs coefficients (also seenas an authentication tag) σi = (y0, y1, y2).Among the coefficients returned by this algorithm, y0is the actual similarity score. According to the keywordsearch function showed in Fig. 2, the three coefficientsare calculated as follows:

y0 = Di · Q (3)= (MT

1~D′i,M

T2~D′′i ) · (M−11

~Q′,M−12~Q′′) (4)

= r(Di ·Q+ εi) + t (5)

y1 = Di · Q1 + Di,1 · Q (6)

y2 = Di,1 · Q1 (7)

Accordingly, the polynomial corresponding to σi evalu-ates to f(rL1

, ..., rL2n) at the secret point α, i.e. y0 +

y1α + y2α2. Note here keyword search f has 2n inputs

as illustrated in Fig. 2. L1, L2, ..., Ln are labels for thedocument subindex, and Ln+1, Ln+2, ..., L2n are for thesearch trapdoor.

Verify(sk, f,~L,m, σ). C parses ~L as (L1, ..., L2n) and σas (y0, y1, y2). Then it sets P = (f, L1, ..., L2n),and invokes RealHomMAC.Ver(sk,P,m, σ) to obtainthe result. Specifically, the algorithm computes f0 =f(rL1 , ..., rL2n) where rLi = FK(Li). Finally, C verifies

the result by checking the following equation:

m = y0 (8)

f0 =2∑

k=0

ykαk (9)

According to the definition of our homomorphic MACgiven in Section III, f0 = f(rL1

, ..., rL2n) is actually

the search function f computed over the pseudorandomnumbers generated from labels, so it should be equal tothe result evaluated at point α for the polynomial definedby (y0, y1, y2). If both equations are satisfied, C acceptsy0 as the search result; otherwise C rejects the result.

Fig. 3 illustrates the proposed scheme with a simple ex-ample. Besides Setup is omitted in the figure, Trapdoor isnot showed as it is similar to Auth. In this example, anextended document index ~Di is split into two vectors ~D′i and~D′′i according to the secret S. Then they are encrypted bymultiplying with secret matrices M1,M2, and the resultingvector is authenticated using our homomorphic MAC, yieldingDi and Di,1. Here Di and Di,1 can be viewed as coefficientsof a degree-1 polynomial, and the polynomial evaluates to Di

at point 0 and Di,1 at point α.The authentication tags of the document index and the query

are multiplied to obtain the final result σi = (y0, y1, y2). TheKeywordSearch is basically a multiplication of two polynomi-als defined by the authenticated tags of Di and Q.

C. Support for Top-k Verification

Usually the client is only interested in receiving k mostrelevant results for his search query so as to save communi-cation cost. However, the scheme described above achievesverifiability for every single similarity score, not for top-ksearch results. The cloud server may return incorrect top-kresults either because it wants to save energy or the cloudcomputing system goes faulty. For instance, the cloud serveronly searches a part of the document set and returns the top-ksearch results.

It is not easy for the client to discover that since theclient does not know all similarity scores computed by thecloud server. To solve this problem, we propose the randomchallenge technique to achieve verifiability for VPSearch.Random Challenge. A natural way to achieve verifiabilityfor top-k results is for the client to verify similarity scores ofsome documents randomly selected by the client. Only if allthese documents have smaller similarity scores than those ofthe top-k results returned by the cloud server, can the clientbe sure that the returned k results are truly top-k for someprobability.

Suppose the client C has outsourced N documents to thecloud, and the client expects to receive top-k results with thehighest scores with respect to a query Q for some k � N .C randomly chooses m documents and requests for theirsimilarity scores from the cloud server S. Thus the probabilityof C detecting a fraction p of top-k results are not returned byS can be computed as follows:

Pdetect = 1− (1− pk

N − k )m. (10)

Page 8: VPSearch: Achieving Verifiability for Privacy …of achieving verifiability for privacy-preserving multi-keyword search over encrypted cloud data. Different from the honest-but-curious

1545-5971 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2016.2635128, IEEETransactions on Dependable and Secure Computing

8

When pk � N − k, the detection probability is close topkmN . Given k,N and p, the client can control trade-off

between the detection probability Pdetect and communicationcost O(m). With Fig. 4(a) we show the relationship betweenthe probability of detection Pdetect and the number of randomlyselected documents m for k = 100 and p = 0.1. Forinstance, given N = 10, 000, k = 100 and p = 0.1, one canchoose m = 100 resulting in Pdetect = 9.6%; or one choosesm = 1000 so that Pdetect = 63.6%.

(a) Pdetect for p = 0.1 and k = 100 (b) Pdetect for m = 1000 and N =100, 000

Fig. 4: Pdetect as a function of the number of randomly selecteddocuments m, N , p and k.

Fig. 4(b) shows the relationship between the probability ofdetection Pdetect and k of top-k retrieval for N = 100, 000and m = 1000. It can be seen that the detection probabilityincreases as the number of returned results k increases. Itis highly possible that misbehavior of the cloud server isundetected for small k.Random Challenge with Ordering. As the above detectionprobability of incorrect top-k results is too low for large N(though it can detect incorrect top-k results returned by a ma-licious cloud server), we propose the following enhancementcalled random challenge with ordering.

First, after the cloud server has searched all documentsfor the client, the cloud server sorts the resulted similarityscores in descending order. The sorting operation will incuran overhead of complexity O(N logN). Then the sorted list,along with the top-k results, is returned to the client.

Next, the client challenges the cloud server with m ran-domly selected documents, and the cloud server returns thesimilarity scores with authentication tags to the client.

Finally, the client verifies: 1) the similarity scores of them documents are correct; 2) all the m similarity scores areless than that of the top-k results; 3) the ordering of the msimilarity scores is correct. If all verifications are successful,the client accepts the top-k results.

Suppose the cloud server calculates similarity scores foronly a fraction q of the N outsourced documents. For theremaining (1− q)N documents, the cloud server decides notto compute their similarity scores. Then the cloud server needsto compose a sorted list of documents in descending order withonly qN similarity scores. Without knowing other (1 − q)Nsimilarity scores, the cloud server can only return a randompermutation of the (1 − q)N documents. When the clientchallenges the cloud server for similarity scores of m randomdocuments, the client would detect that the top-k results are

incorrect with the following probability:

Pdetect = 1− 1

P(1−q)m(1−q)m · Cmqm

= 1− (qm)!

m!,

where Pmn denotes permutation and Cmn denotes combination.Note the above probability is independent of N and k. When

m = 20 and q = 0.9, meaning that the cloud server hashonestly searched 90% documents, the detection probabilityis Pdetect = 99.7%. So it is a significant improvement overthe random challenge approach without ordering. If a partiallyhonest cloud server has calculated all similarity scores, it willreturns the correct top-k results to the client as this is in itsbest interest.

VI. ANALYSIS AND DISCUSSION

A. Security of RealHomMAC

VPSearch employs the homomorphic MAC techniqueRealHomMAC adapted from HomMAC in [23], thus it hassimilar features and security properties, namely authenticationcorrectness, evaluation correctness, succinctness and authenti-cator security.

Authentication correctness means that the authentication tagoutput by Auth correctly authenticates the data m under thecorresponding label. Evaluation correctness means that if ~σauthenticates data m1, ...,mn, then σ authenticates the theoutput of f(m1, ...,mn). Succinctness means the size of tag σis bounded by a polynomial in the security parameter λ, and isindependent from the number of inputs or the size of circuit f .Authenticator security means the homomorphic MAC schemeis secure against forgery.

First of all, RealHomMAC satisfies authentication correct-ness and evaluation correctness like HomMAC. This canbe readily proved in the same way as in [23]. Secondly,RealHomMAC also satisfies succinctness as the size of theresulting authentication tag is independent of the number ofinputs of the evaluated function.

Security of RealHomMAC can be argued intuitively. Asdescribed in Section III, the cloud server knows only coef-ficients of a set of polynomials, i.e. (mi, (ri −mi)/α) whereri and α are unknown to the cloud server. Hence, there aren+ 1 unknowns while the cloud server can only establish anequation system of n linear equations. So it is impossible forthe cloud server to solve the linear equation system if ri isgenerated from a pseudorandom function and α is a randomsecret.

The main difference between our homomorphic MAC tech-nique and HomMAC is that RealHomMAC deals with realnumbers while HomMAC deals with integers in finite fields.Therefore, our MAC scheme is expected to have similarsecurity property as HomMAC.

However, RealHomMAC has overflow problems which canhave impact on its security. Suppose we adopt the double pre-cision floating number format defined in IEEE 754 standard.A double-precision floating number of 64-bit size ranges from−(2− 2−52) · 21023 to (2− 2−52) · 21023, with 11 bits for theexponent and 52 bits for the fraction. To prevent overflowproblems in VPSearch, we can restrict the largest values in

Page 9: VPSearch: Achieving Verifiability for Privacy …of achieving verifiability for privacy-preserving multi-keyword search over encrypted cloud data. Different from the honest-but-curious

1545-5971 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2016.2635128, IEEETransactions on Dependable and Secure Computing

9

the matrices and vectors. According to VPSearch, the largestpossible value comes from y1 in Equation (6). For dictionarysize n = 100, 000 and the largest double-precision floatingnumber allowed M , y1 ≤ 4n2M4 = 4 · 1010 · (M)4 ≤ 21023.Solving this inequation, we get M ≤ 2246. So the exponentpart of the floating number should be less than 246, i.e. only8 bits (1 as the exponent sign) instead of 11 bits can be freelyused for the exponent. That is, for l-bit floating numbers, theactual security of VPSearch is λ = l − 3 bits.

Formally, security of RealHomMAC can be obtained fromthe following theorem:

Theorem 6.1: RealHomMAC is a secure homomorphicMAC scheme that is secure against forgeries if FK() is apseudorandom function.

The above theorem can be rigorously proved in the sameway as in [23], and it is omitted to avoid repetition.

B. VerifiabilityVerifiability achieved by VPSearch is the key to motivate

a partially honest cloud server to act honestly. For a partiallyhonest cloud service provider, it tries its best to reduce energycost while maximizing revenue. If there is no verificationmechanism in the keyword search algorithm, then the cloudserver might simply return a wrong result without doing anycomputation.

Verifiability of VPSearch directly follows from the e-valuation correctness and authenticator security property ofVPSearch. With evaluation correctness, VPSearch guaranteesthat the client can correctly verify the authentication tagproduced by the cloud server. With the authenticator securityproperty, VPSearch ensures that the client would not acceptany forged authentication tag produced by a malicious cloudserver.

Top-k Verifiability. With the random challenge techniquewith ordering, the detection probability can reach close to1 for just tens of challenges, independent of the number ofdocuments and k. However, we emphasize that a maliciouscloud server, i.e. not partially honest, can still manage to cheatin answering top-k queries.

C. EfficiencyIn the following analysis, n denotes the dictionary size and

N is the number of documents. We analyze complexity ofeach algorithm in VPSearch as follows:

Setup. The main computation cost in this algorithm is inver-sion of two matrices, whose complexity is O(n3). However,this step is one-time and can be pre-computed, so it will nothave impact on runtime performance of VPSearch.

Index. The complexity of constructing a plaintext index fora document set depends on the underlying data set, so wecannot give an analytic complexity for it. For index encryption,its complexity is O(n2N). This operation is also a one-time computation, and it will not have impact on runtimeperformance of VPSearch.

Trapdoor. Trapdoor is generated by multiplying with M−11

and M−12 , so the complexity of constructing a trapdoor forkeyword search is O(n2) for each query.

Auth. The cost of generating authentication tags for adocument index is linear to the dictionary size and the numberof documents, i.e. O(nN). This computation is one-time, andwill not have impact on performance of VPSearch duringkeyword search.

To generate authentication tags for trapdoors, one needs tocompute coefficients of a degree-1 polynomial for each itemin each trapdoor. This computation cost is O(n), independentof the dictionary size.

KeywordSearch. This algorithm involves inner product of asearch trapdoor and each document subindex, so its complexityis O(nN) for each query.

Verify. This algorithm is run by the client, and its com-plexity is O(nN) to verify search results of one query. Forverification of top-k results, the random challenge techniquewith ordering requires the cloud server to order all searchresults, so its complexity is O(N logN).

With regard to storage cost, the encrypted document indextakes up 8nN bytes if we adopt 64-bit double-precision float-ing number format. The authentication tags takes up another8nN bytes. That is, the storage required by VPSearch is twiceas that of Cao et al.’s scheme.

D. Data/Keyword PrivacyVPSearch inherits the privacy-preserving feature of Cao et

la.’s scheme in [10], which has shown that this scheme canachieve data privacy, index privacy, keyword privacy and trap-door unlinkability under the known ciphertext model. Underthe known background knowledge model, simple modificationcan be applied to make it secure.

More specifically, it uses secret matrix multiplication toencrypt the index and the queries. Without the secret matrixM1 and M2, the cloud server has no way to figure out thekeywords queried by the client or the index associated with afile. The similarity score is masked with secret random numberr and t, which ensures privacy of the search results. As thisis not the focus of our work, we refer the readers to [10] formore detailed discussion.

E. Unbounded StorageBecause each document subindex is independent of others,

there is no restriction on the number of documents outsourcedto the cloud. As a result, the client can continuously uploaddocuments onto the cloud, and performs search over alldocuments later.

However, because the index is encrypted by matrix multi-plication, the encryption matrices should also be extended ifa new keyword is added. So if a new keyword is added, thewhole index should be rebuilt. As a result, VPSearch as wellas the underlying MRSE scheme does not support keyworddynamics. For the same reason, multi-keyword ranked search-able encryption schemes proposed in [11], [12], [28] do notsupport keyword dynamics either.

VII. PERFORMANCE EVALUATION

To demonstrate performance of the proposed schemeVPSearch, we have implemented a prototype of VPSearch

Page 10: VPSearch: Achieving Verifiability for Privacy …of achieving verifiability for privacy-preserving multi-keyword search over encrypted cloud data. Different from the honest-but-curious

1545-5971 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2016.2635128, IEEETransactions on Dependable and Secure Computing

10

103 104 105

# documents to be indexed

100

101

102

103

Tim

e (s

)

NIPS Index EncryptionEnron Index EncryptionNYTimes Index EncryptionNIPS Index GenerationEnron Index GenerationNYTimes Index Generation

(a) Plaintext Index Generation and IndexEncryption

2000 4000 6000 8000 10000 12000# keywords in dict

0

2

4

6

8

10

12

14

16

Tim

e (s

)

20% documents40% documents60% documents80% documents100% documents

(b) Index Encryption for NIPS

2000 4000 6000 8000 10000 12000# keywords in dict

0

50

100

150

200

250

300

350

Tim

e (s

)

20% documents40% documents60% documents80% documents100% documents

(c) Index Encryption for Enron

2000 4000 6000 8000 10000 12000# keywords in dict

0

500

1000

1500

2000

2500

3000

Tim

e (s

)

20% documents40% documents60% documents80% documents100% documents

(d) Index Encryption for NYTimes

Fig. 5: Time Cost for Index Construction

using Matlab. Then a set of experiments are conducted toevaluate its performance. All experiments are conducted ona laptop equipped with an Intel Core i7-6560U CPU (2-core2.2GHz) and 16GB RAM running 64-bit Ubuntu 16.04. Weuse 64-bit Matlab R2015b for Linux to implement VPSearchand python to process the document indices.

We evaluate VPSearch’s performance on three UCI bag-of-words data sets:

TABLE I: UCI bag-of-words data setsData set Name Short Name # Documents # Keywords

NIPS full papers NIPS 1,500 12,419Enron Emails Enron 39,861 28,102

NYTimes news articles NYTimes 300,000 102,660

We pad the Enron data set to 40,000 documents and removemeaningless keywords of the NYTimes data set. These threedata sets are chosen as they have different document numbersand the performance over them can provide insights of ourproposed scheme. We conduct comprehensive experimentsover dictionaries with different sizes and different number ofdocuments from these data sets.

As VPSearch is designed by integrating the homomorphicMAC technique with Cao et al.’s scheme in [10], we comparethe its performance with that of VPSearch.

A. Index Construction

The index construction is the most time-consuming step inVPSearch, but note that this step is executed only once.

Fig. 5 shows the time cost for index construction. Fig. 5(a)illustrates the time for index construction when the dictionarysize is 4000. The solid lines in Fig. 5(a) represent the time costof plaintext index generation. The plaintext index generation isdone by scanning the bag-of-words format files with a pythonprogram. To save storage space and speed up processing time,we use bit array to construct the indices. The dashed linesin Fig. 5(a) represent the time for index encryption. For eachdata set, we obtain the computation time for 20%, 40%, 60%,80% and 100% of the documents in the data set.

It can be seen that the time cost for either plaintext indexgeneration or index encryption is linear to the number of doc-uments when the dictionary size is fixed. When the dictionarysize is 4000, the time to construct plaintext document index is

around 34 seconds for 1500 documents (NIPS), 181 secondsfor 40000 documents (Enron), and 3693 seconds for 300,000documents (NYTimes); the time to encrypt document index isaround 1.2 seconds for 1,500 documents (NIPS), 40 secondsfor 40,000 documents (Enron), and 300 seconds for 300,000documents (NYTimes).

Fig. 5(b), 5(c), and 5(d) show the time cost for indexencryption for all three data sets respectively. The time costis approximately linear to the dictionary size, i.e. the numberof keywords in the dictionaryb. Although it takes about 2500seconds to encrypt the entire document index for NYTimes,this time cost is acceptable for a one-time operation. Note thatthe index generation/encryption of VPSearch is essentially thesame as that of Cao et al.’s scheme [10].

B. Trapdoor Generation and Authentication

Fig. 6 illustrates the time cost of trapdoor generation andauthentication respectively.

2000 4000 6000 8000 10000 12000# keywords in dict

0

1

2

3

4

5

6

7

8

9

Tim

e (s

)

# Queries = 10# Queries = 100# Queries = 1000

(a) Trapdoor Generation

2000 4000 6000 8000 10000 12000# keywords in dict

0.00

0.05

0.10

0.15

0.20

0.25

Tim

e (s

)

# Queries = 10# Queries = 100# Queries = 1000

(b) Trapdoor Authentication

Fig. 6: Time Cost for Trapdoor Generation and Authentication

According to Fig. 6(a), the time cost of trapdoor generationexhibits quadratic growth with respect to dictionary size.It costs only 8.24 seconds to generate 1000 trapdoors fordictionary size 12,000. Fig. 6(b) shows that the trapdoorauthentication time is almost strictly linear to dictionary sizeand the number of queries. It costs only 0.252 seconds toauthenticate 1000 trapdoors for the dictionary with 12,000

bIn our experiments, dictionary size ranges from 2,000 to 12,000. For largerdictionary sizes, the computation cost is too much for our laptop.

Page 11: VPSearch: Achieving Verifiability for Privacy …of achieving verifiability for privacy-preserving multi-keyword search over encrypted cloud data. Different from the honest-but-curious

1545-5971 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2016.2635128, IEEETransactions on Dependable and Secure Computing

11

keywords, which is about 30 times faster than trapdoor genera-tion. VPSearch and Cao et al.’s scheme have the same trapdoorgeneration operation, so the above results mean that trapdoorauthentication incurs about 3% overhead on top of Cao et al.’sscheme.

C. Authenticating Data set Index

Fig. 7 shows the time cost for generating authentication tagsfor the document indices of Enron and NYTimes.

2000 4000 6000 8000 10000 12000# keywords in dict

0

2

4

6

8

10

Tim

e (s

)

20% documents40% documents60% documents80% documents100% documents

(a) Authenticating the Index of Enron

2000 4000 6000 8000 10000 12000# keywords in dict

0

10

20

30

40

50

60

70

80

Tim

e (s

)

20% documents40% documents60% documents80% documents100% documents

(b) Authenticating the Index of NY-Times

Fig. 7: Time Cost for Authenticating Document IndicesFrom Fig. 7(a) and Fig. 7(b) it can be seen that the

time cost is roughly linear to the size of dictionary and thenumber of documents to be authenticated. VPSearch takes9.80 seconds and 73.81 seconds to generate authenticationtags for all documents in Enron (40,000 documents) andNYTimes (300,000 documents), respectively. The time costof authenticating index is less than 3% of that of indexencryption operation (cf. Fig. 5(c), 5(d)), which means indexauthentication of VPSearch adds only 3% overhead comparedto Cao et al.’s scheme.

D. Keyword Search

Fig. 8 shows the time cost for keyword search for 100randomly generated queries over Enron and NYTimes. Inaccordance with our analysis, the cost of keyword search islinear to the dictionary size and the number of documents tobe searched. As showed in Fig. 8(a) and 8(b), it costs 13.76seconds to finish 100 searches over all 40,000 documents inEnron, and 98 seconds over all documents in NYTimes.

The keyword search complexity of VPSearch is 4 times asthat of Cao et al.’s scheme [10], since VPSearch operationson two coefficients of degree-1 polynomials while Cao et al.’sscheme works directly on real numbers. Note that the keywordsearch operation can be executed on multiple servers in parallelto reduce time.

E. Verification

In Fig. 9 we show the verification cost for Enron andNYTimes for 100 randomly generated queries. In accordancewith our analysis, the verification cost is roughly linear tothe size of dictionary and the number of results (searcheddocuments). As showed in Fig. 9(a) and 9(b), it costs 3.27seconds to verify results of 100 query for all 40,000 documents

2000 4000 6000 8000 10000 12000# keywords in dict

0

2

4

6

8

10

12

14

Tim

e (s

)

20% documents40% documents60% documents80% documents100% documents

(a) Keyword Search over Enron

2000 4000 6000 8000 10000 12000# keywords in dict

0

20

40

60

80

100

Tim

e (s

)

20% documents40% documents60% documents80% documents100% documents

(b) Keyword Search over NYTimes

Fig. 8: Time Cost for 100 Keyword Search Queries)

in Enron, and 28.75 seconds to verify results of 100 query forall documents in NYTimes. Note that for each query, there areN similarity scores to be verified where N is the number ofdocuments.

Normally, the client only requests top-k results, so theverification cost will be significantly less than our results.

2000 4000 6000 8000 10000 12000# keywords in dict

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

Tim

e (s

)

20% documents40% documents60% documents80% documents100% documents

(a) Search Verification for Enron

2000 4000 6000 8000 10000 12000# keywords in dict

0

5

10

15

20

25

30

Tim

e (s

)

20% documents40% documents60% documents80% documents100% documents

(b) Search Verification for NYTimes

Fig. 9: Time Cost for Verification of 100 Queries

F. Runtime Performance

We compare the runtime performance of VPSearch withthat of Cao et al.’s scheme in [10] for 100 queries over theentire Enron data set in Fig. 10, which includes time costs oftrapdoor generation, trapdoor authentication, keyword searchand verification. Time costs of index construction and indexauthentication are excluded as these two operations can bepre-computed.

Fig. 10 shows that the performance of VPSearch at runtimeis roughly 3 times slower than that of Cao et al.’s scheme. Themain cost comes from keyword search, which can be executedon multiple servers in parallel to reduce time. Nevertheless,VPSearch is still efficient as the total time cost is less than 18seconds for 100 queries over the entire Enron.

VIII. CONCLUSION AND FUTURE WORK

In this paper we have proposed the first verifiable privacy-preserving multi-keyword search scheme for cloud comput-ing. Our scheme is implemented by applying an efficienthomomorphic MAC scheme on a privacy-preserving multi-keyword scheme, and we have made necessary modifications

Page 12: VPSearch: Achieving Verifiability for Privacy …of achieving verifiability for privacy-preserving multi-keyword search over encrypted cloud data. Different from the honest-but-curious

1545-5971 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2016.2635128, IEEETransactions on Dependable and Secure Computing

12

2000 4000 6000 8000 10000 12000# keywords in dict

0

10

Tim

e (s

)Trapdoor GenerationTrapdoor AuthenticationKeyword SearchVerification

Fig. 10: Runtime Performance of VPSearch and Comparisonwith Cao et al.’s scheme [10] for 100 queries over the entireEnron. The left bars are for Cao et al.’s scheme, and the rightbars are for VPSearch.

to the homomorphic MAC scheme so that it supports privacy-preserving multi-keyword search. We also provide a verifica-tion technique called random challenge with ordering for top-k search results. We have analyzed security of our schemeand showed that it fulfills the requirement of verifiability,efficiency, data/keyword privacy, and unbounded storage.

However, VPSearch is not efficient for large-scale databasessince the cloud needs to search through the whole database,which is very inefficient. Our future work in this line willbe enhancements for efficient verification for large-scale out-sourced data.

ACKNOWLEDGEMENT

This work was supported by the National Natural ScienceFoundation of China under Grant 61370027.

REFERENCES

[1] D. Song, D. Wagner, and A. Perrig, “Practical techniques for searcheson encrypted data,” in Proc. of IEEE S&P, 2000, pp. 44–55.

[2] Y.-C. Chang and M. Mitzenmacher, “Privacy preserving keyword search-es on remote encrypted data,” in Proc. of ACNS, 2005, pp. 391–421.

[3] C. Wang, N. Cao, K. Ren, and W. Lou, “Enabling secure and efficientranked keyword search over outsourced cloud data,” IEEE Trans. Par-allel Distrib. Syst., vol. 23, no. 8, pp. 1467–1479, 2012.

[4] P. Golle, J. Staddon, and B. R. Waters, “Secure conjunctive keywordsearch over encrypted data,” in Proc. of ACNS, 2004, pp. 31–45.

[5] D. Boneh and B. Waters, “Conjunctive, subset, and range queries onencrypted data,” in Proc. of TCC, 2007, pp. 535–554.

[6] Y. Hwang and P. Lee, “Public key encryption with conjunctive keywordsearch and its extension to a multi-user system,” in Proc. of Pairing,2007, pp. 2–22.

[7] E. Shen, E. Shi, and B. Waters, “Predicate privacy in encryptionsystems,” in Proc. of TCC, 2009, pp. 457–473.

[8] W. K. Wong, D. W. Cheung, B. Kao, and N. Mamoulis, “Secure knncomputation on encrypted databases,” in Proc. of ACM SIGMOD, 2009,pp. 139–152.

[9] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, “Privacy-preservingmulti-keyword ranked search over encrypted cloud data,” in Proc. ofINFOCOM, 2011, pp. 829–837.

[10] ——, “Privacy-preserving multi-keyword ranked search over encryptedcloud data,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 1, pp. 222–233, 2014.

[11] Z. Fu, K. Ren, J. Shu, X. Sun, and F. Huang, “Enabling personalizedsearch over encrypted outsourced data with efficiency improvement,”IEEE Trans. Parallel Distrib. Syst., vol. 27, no. 9, pp. 2546–2559, 2016.

[12] Z. Fu, X. Wu, C. Guan, X. Sun, and K. Ren, “Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracyimprovement,” IEEE Trans. Inf. Forensics Security, vol. 11, no. 12, pp.2706–2716, 2016.

[13] Z. Fu, X. Sun, S. Ji, and G. Xie, “Towards efficient content-aware searchover encrypted outsourced data in cloud,” in Proc. of INFOCOM, 2016,pp. 1–9.

[14] B. Applebaum, Y. Ishai, and E. Kushilevitz, “From secrecy to soundness:Efficient verification via secure computation,” in Proc. of ICALP, 2010.

[15] K.-M. Chung, Y. Kalai, and S. P. Vadhan, “Improved delegation ofcomputation using fully homomorphic encryption,” in Proc. of CRYPTO,2010.

[16] R. Gennaro, C. Gentry, and B. Parno, “Non-interactive verifiable com-puting: Outsourcing computation to untrusted workers,” in Proc. ofCRYPTO, 2010.

[17] B. Parno, J. Howell, C. Gentry, and M. Raykova, “Pinocchio: Nearlypractical verifiable computation,” in Proc. of IEEE Symposium onSecurity and Privacy, 2013.

[18] B. Parno, M. Raykova, and V. Vaikuntanathan, “How to delegate andverify in public: Verifiable computation from attribute-based encryption,”in Proc. of TCC, 2012.

[19] D. Fiore and R. Gennaro, “Publicly verifiable delegation of largepolynomials and matrix computations with applications,” in Proc. ofCCS, 2012.

[20] S. Setty, R. McPherson, A. Blumberg, and M. Walfish, “Making ar-gument systems for outsourced computation practical (sometimes),” inProc. of NDSS, 2012.

[21] S. Benabbas, R. Gennaro, and Y. Vahlis, “Verifiable delegation ofcomputation over large datasets,” in Proc. of CRYPTO, 2011.

[22] R. Gennaro and D. Wichs, “Fully homomorphic message authenticators,”in Proc. of ASIACRYPT, 2013.

[23] D. Catalano and D. Fiore, “Practical homomorphic macs for arithmeticcircuits,” in Proc. of EUROCRYPT, LNCS 7881, 2013, pp. 336–352.

[24] M. Backes, D. Fiore, and R. M. Reischuk, “Verifiable delegation ofcomputation on outsourced data,” in Proc. of ACM CCS, Nov. 2013, pp.863–874.

[25] D. Boneh and D. M. Freeman, “Homomorphic signatures for polynomialfunctions,” in Proc. of EUROCRYPT, 2011.

[26] D. Fiore, R. Gennaro, and V. Pastro, “Efficiently verifiable computationon encrypted data,” in Proc. of ACM CCS, New York, NY, USA, 2014,pp. 844–855.

[27] UCI Machine Learning Repository, “Bag of words data set.” [Online].Available: https://archive.ics.uci.edu/ml/datasets/Bag+of+Words

[28] Z. Xia, X. Wang, X. Sun, and Q. Wang, “A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data,” IEEE Trans.Parallel Distrib. Syst., vol. 27, no. 2, pp. 340–352, 2016.

[29] R. Johnson, D. Molnar, D. X. Song, and D. Wagner, “Homomorphicsignature schemes,” in Proc. of CT-RSA, 2002.

[30] S. Goldwasser, Y. T. Kalai, R. A. Popa, V. Vaikuntanathan, and N. Zel-dovich, “Reusable garbled circuits and succinct functional encryption,”in Proc. of STOC, 2013, pp. 555–564.

[31] B. Libert, T. Peters, M. Joye, and M. Yung, “Linearly homomorphicstructure-preserving signatures and their applications,” in Proc. of CRYP-TO, 2013, pp. 289–307.

[32] D. Catalano, A. Marcedone, and O. Puglisi, “Authenticating computationon groups: New homomorphic primitives and applications,” CryptologyePrint Archive, Report 2013/801, 2013.

[33] C. Joo and A. Yun, “Homomorphic authenticated encryption secureagainst chosen-ciphertext attack,” Cryptology ePrint Archive, Report2013/726, 2013.

[34] Z. Fu, X. Sun, Q. Liu, L. Zhou, and J. Shu, “Achieving efficient cloudsearch services: Multi-keyword ranked search over encrypted cloud datasupporting parallel computing,” IEICE Transactions, vol. 98-B, no. 1,pp. 190–200, 2015.

[35] W. Sun, B. Wang, N. Cao, M. Li, W. Lou, Y. T. Hou, and H. Li,“Verifiable privacy-preserving multi-keyword text search in the cloudsupporting similarity-based ranking,” IEEE Trans. Parallel Distrib. Syst.,vol. 25, no. 1, 2014.

[36] R. Cheng, J. Yan, C. Guan, F. Zhang, and K. Ren, “Verifiable searchablesymmetric encryption from indistinguishability obfuscation,” in Proc. ofASIACCS. New York, NY, USA: ACM, 2015, pp. 621–626.