

IEEE SYSTEMS JOURNAL 1

Secure Computation for Biometric Data Security—Application to Speaker Verification

Bon K. Sy, Member, IEEE

Abstract—The goal of this research is to develop provably secure computation techniques for two biometric security tasks in complex distributed systems involving multiple parties, namely biometric data retrieval and authentication. We first present models for privacy and security that delineate the conditions under which biometric data disclosure is allowed. We then discuss the secure computation techniques for retrieval and authentication that satisfy the conditions for privacy and security. For proof of concept, we show a practical implementation of a privacy preserving speaker verification system and discuss the performance tradeoff.

Index Terms—Biometrics, privacy, secure computation.

I. INTRODUCTION

BIOMETRIC security is based on “something one is” rather than “something one knows/has.” We argue that privacy must be properly addressed when biometric data are retrieved for a security application. When biometric data are inadvertently disclosed or leaked to unauthorized parties, the consequence is “lose it once, it’s gone forever.”

In this research, we are interested in biometric security in complex distributed systems involving multiple parties. Specifically, our focus is on secure computation techniques for biometric data retrieval and authentication that are provably private and secure. The scope of this research can be illustrated by the following hypothetical scenario of a privacy preserving biometric security application:

There is a database of biometric data (e.g., voiceprints) about individuals. Due to privacy concerns, the biometric data are scattered across different depository systems. The data in each depository system are withheld by one independent data escrow agency within the law enforcement unit. Let’s assume a phone was wiretapped and a conversation about a crime was recorded. We need to retrieve the biometric data on voiceprint features from the database and perform a match between the voiceprint features extracted from the database and those from the phone system for speaker verification/identification purposes.

Manuscript received December 07, 2009; revised October 24, 2009. This work was supported in part by the National Science Foundation (NSF) under DUE 0837535 and by a PSC-CUNY Research Award. A preliminary version of this work appeared in EuroISI 2008.

The author is with Queens College and University Graduate Center, Computer Science Department, City University of New York, Flushing, NY 11367 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSYST.2009.2035979

For security and privacy reasons, we only allow biometric data retrieval in the presence of an “electronic warrant” from an authority, e.g., a judge. When an “electronic warrant” is issued, all agencies will collaborate to participate in a secure multiparty computation to reconstruct the biometric data. This particular scenario raises an emerging question: what is a reasonable expectation of privacy if we have to balance security and privacy [1], [20]? In other words, can we technically facilitate the right to control information gathering and access, particularly for information derived from one’s body and personal space?

To make this concrete, consider a law enforcement agency that needs biometric data for verification, identification, or surveillance purposes; how could this be achieved in a provably private and secure manner? The specific technical questions addressed in this research using the above example are:

1) What provable security and privacy properties should be introduced for biometric data retrieval, and for subsequent applications such as biometric verification or identification?

2) What secure multiparty computation scheme is appropriate for the data escrow agencies and other parties to collaborate in computing the biometric data?

The objective of this research is threefold. First, we examine models that encapsulate security and privacy properties in terms of their reasonableness and appropriateness for biometric data retrieval. Second, we develop secure multiparty computation techniques for recovering biometric data and for authentication that are provably secure and private according to our models. Third, we demonstrate a practical implementation of a privacy preserving speaker verification system for proof of concept.

The contribution of this research is a novel and practical scheme for privacy preserving biometric data retrieval and authentication, and a proof-of-concept application to state-of-the-art speaker verification techniques and system implementation. The main idea of our scheme is to first delineate the conditions for biometric data retrieval, as well as the “capabilities” of the participating entities, in the form of privacy and security models. Biometric data retrieval is then formulated as a secure multiparty computation problem for solving a system of algebraic equations, where the solution of the algebraic system is a feature vector of the biometric data. Biometric authentication is likewise formulated as a secure computation problem for computing the Kullback-Leibler (KL) distance between the Gaussian model of a credential presented for verification and the Gaussian model of the corresponding biometric reference template. The decision on acceptance/rejection is then based on a comparison of the KL

1932-8184/$26.00 © 2009 IEEE


distance to a preset threshold, where the credential could be composed of an “electronic badge/ID” and some biometric data. The integrity and confidentiality of the data exchange during the handshake among the participating entities are protected by applying asymmetric encryption.

II. REVIEW ON PRIOR RESEARCH

There are two main avenues to privacy preserving data retrieval, namely lossy and lossless retrieval. Lossy retrieval protects private content typically by means of perturbation, randomization, or masking of the original data to the extent that it could still be useful for the end users [2], [3]. In a lossy retrieval, the original content is not preserved for the end user. Lossless retrieval, on the other hand, protects computational privacy while preserving content [4]. In other words, the end user can retrieve the original content but is limited to what is allowed by the computational mechanism of the retrieval process. Lossy retrieval is sufficient in some applications such as video surveillance [5], which actually relies on the “lossy” nature to achieve privacy protection. However, in this research we concentrate on lossless retrieval that guarantees private computation and reconstruction of the original biometric data.

In biometrics, we argue that data must retain the features of the intrinsic physical or behavioral traits of a human being if such features are to be useful for practical authentication applications such as verification or identification of an individual. A slight variation in biometric data may alter the features to an extent that prevents a direct application for the intended purposes. For example, fingerprint recognition performance is very sensitive to the quality of elderly fingerprint images [6]. Elderly fingerprints have a large number of minutiae points. Poor image quality skews the frequency distribution of the minutiae points and affects the recognition performance.

Our focus in this research is on privacy preserving lossless retrieval, and specifically on secure multiparty computation [7], [8] that is provably secure and private. Secure multiparty computation (SMC) deals with the problem in which multiple parties with private inputs would like to jointly compute some function of their inputs, but no party wishes to reveal its private input to the other participants. For example, each data custodian with a partial biometric template and a law enforcement agency with sample biometric data participate in SMC to jointly compute the output of a matching function for biometric identification. The multiparty computation problem was first introduced by Yao [7] and extended by Goldreich et al. [8], and by many others.
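The SMC idea can be illustrated, outside the protocol of this paper, with textbook additive secret sharing: each custodian splits its private value into random shares, and only share sums are ever revealed. The party count, inputs, and modulus below are hypothetical:

```python
import random

Q = 2**61 - 1  # hypothetical prime modulus for the share arithmetic

def share(secret, n_parties):
    """Split a secret into n additive shares that sum to it mod Q."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

# Three data custodians, each holding a private partial value.
private_inputs = [12, 30, 7]

# Each party distributes one share of its input to every participant.
all_shares = [share(v, 3) for v in private_inputs]

# Each party sums the shares it received (one column each) ...
partial_sums = [sum(col) % Q for col in zip(*all_shares)]

# ... and publishing only the partial sums reveals the total, not the inputs.
total = sum(partial_sums) % Q
print(total)  # 49
```

No single column of shares carries information about any individual input; only the final sum is learned.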

Goldreich [9] pointed out that solutions to specific problems should be developed and customized for efficiency reasons. Du and Atallah [10], [11] presented a series of specific solutions to specific problems, e.g., privacy-preserving cooperative scientific computations, privacy-preserving database query, and privacy-preserving geometric computation. In their privacy-preserving cooperative scientific computations research [10], they proposed a protocol between two parties to solve the problem

$(A_1 + A_2)x = b_1 + b_2$, where matrix $A_1$ and vector $b_1$ belong to party P1, and matrix $A_2$ and vector $b_2$ belong to party P2. At the end of the protocol, both parties know the solution x while neither knows the other party’s private inputs. Each party’s private data are protected by the 1-out-of-N oblivious transfer protocol [12], [13] and by splitting $A_1$, $A_2$, $b_1$, and $b_2$ into a set of random matrices. However, a 1-out-of-N oblivious transfer in certain applications could be computationally expensive [14].

In this research we tackle privacy preserving biometric data retrieval in a way similar to privacy-preserving cooperative scientific computation (PPCSC). We first solve y in $Q_1(A_1+A_2)Q_2\,y = Q_1(b_1+b_2)$, and then reconstruct the solution for the original problem through $x = Q_2 y$. The parties P1 and P2, as in PPCSC, generate invertible random matrices $Q_1$ and $Q_2$, respectively. However, instead of applying the 1-out-of-N oblivious transfer protocol, we employ homomorphic encryption and singular value decomposition (SVD) on $Q_1$ and $Q_2$ to achieve privacy protection. Our approach is to take each private matrix and break it down into matrices through SVD, which gives a partial view of the information needed for computing the biometric data to be retrieved. We then use SMC and homomorphic encryption to share the partial information between the participants in such a way that the original biometric data can be reconstructed in the PPCSC without revealing any private information not intended for sharing.
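A numerical sketch of this masked-system idea follows, with small random matrices standing in for the private data and the invertible masking matrices written $Q_1$, $Q_2$; this illustrates only the algebra, not the secure exchange itself:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Private inputs: A1, b1 held by party P1; A2, b2 held by party P2.
A1, A2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
b1, b2 = rng.standard_normal(n), rng.standard_normal(n)

# Invertible random masking matrices (P1 generates Q1, P2 generates Q2).
Q1, Q2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))

# Masked system: Q1 (A1 + A2) Q2 y = Q1 (b1 + b2).
y = np.linalg.solve(Q1 @ (A1 + A2) @ Q2, Q1 @ (b1 + b2))

# Recover the original solution x = Q2 y; it satisfies the unmasked system.
x = Q2 @ y
assert np.allclose((A1 + A2) @ x, b1 + b2)
```

Neither the masked matrix nor the masked right-hand side reveals a party’s private matrix on its own, which is what the homomorphic exchange exploits.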

As noted in previous research by others, the efficiency of SVD is inversely proportional to its complexity O(mnr) [15], where m and n are the numbers of rows and columns of an m×n matrix, and r is the rank of the matrix. On the other hand, the complexity of the 1-out-of-N oblivious transfer protocol grows with the size d of the secure evaluation circuit [16]. Recent developments suggest that the efficiency of oblivious transfer for PPCSC can be improved to depend on a security parameter; as suggested in [12], a typical value of this security parameter for providing reasonable security is 256. Yet the rank r of the SVD, which is related to the number of dimensions chosen for representing biometric features, is typically less than that. For example, it has been reported elsewhere that Eigenface recognition [4] can achieve reasonably good results with an Eigenface vector of size 20, i.e., $r = 20$. In our speaker verification system, r is approximately 50. On the other hand, a state-of-the-art speaker verification system using Support Vector Machines may extend the dimension of the feature vector, and thus the value of r, to about 300 for improved performance (reaching above 95%).

III. MODELS

The main characteristic of our proposed privacy model is separation of duty among four entities: namely, the (judge) authority, the (FBI) biometric data inquirer, the biometric data custodians, and some central authority CA (a role similar to, for example, VeriSign). In other words, no single party is allowed to, or could, retrieve the biometric data. The only exception is the owner of the biometric data, who has an “electronic badge” for retrieving his/her own biometric data.

Biometric data retrieval by law enforcement agencies can be achieved only when all relevant parties collaborate in a secure multiparty computation. Furthermore, to prevent collusion, an explicit approval from the authority in the form of an “electronic warrant” is also required for the retrieval process. The following


scenario is the basis of our privacy model, which comprises the seven conditions (Py1–Py7) that follow.

Assume that an individual claims to be Chuck and has biometric data Y (say, a voiceprint). This voiceprint may not be associated with the identity “Chuck” if the individual is an impostor. Let XB be the authentic biometric data (say, the voiceprint of the true Chuck) that is associated with the identity Chuck. Let EI be the information associated with Chuck that is known by Alice (FBI agent); e.g., EI could be a hash value of an SSN. Bob (judge) has information JD (e.g., an RSA private key), and the “true” Chuck has information EB (electronic badge).

Py1: Alice (FBI agent) should not know Y, JD, and XB (which also implies Alice cannot compute XB from EI).
Py2: The individual claiming to be Chuck should not know XB unless Y is similar to XB. If Y is similar to XB, then the individual is Chuck, i.e., not an impostor. This implies the individual is capable of obtaining biometric data, but does not know whether the data is sufficiently close to that of a specific individual such as Chuck.
Py3: The individual claiming to be Chuck should not know JD.
Py4: Bob (the judge) should not know Y, XB, or EB.
Py5: If Alice presents EI to Bob and Bob agrees, Bob can use JD and EI to generate EW (electronic warrant) that can be used to compute XB.
Py6: If (the true) Chuck has EB, then EB and Y together can compute the similarity between Y and XB.
Py7: Every entity has an electronic identity EI issued by a Central Authority (CA); the EIs are publicly known.

IV. SECURE COMPUTATION

Biometric data for an individual Pi is conceived as a feature vector $x_{P_i}$. The end goal of biometric data retrieval relevant to Pi is to obtain $x_{P_i}$. To realize the goal of privacy with regard to the right of control over biometric information access, this research imposes two conditions: 1) biometric data retrieval by a third party must be approved by an independent authority, and 2) the identity of a requester must be verified prior to receiving biometric data.

Ideally, biometric data retrieval should be both computationally secure and information-theoretically secure. Biometric data retrieval is information-theoretically secure if the risk of information leak does not depend on the amount of computational power available to an adversary. A main focus of this research is to develop secure computation techniques that are provably secure, with respect to the models in the previous sections, for both biometric data retrieval and authentication.

To realize the goal of privacy mentioned at the beginning of this section, a sequence of secure computations that is information-theoretically secure is introduced for biometric data retrieval. The key concept is to solve a two-party secure computation problem leading to the value of x satisfying $(A_1+A_2)x = b_1+b_2$, and then to reconstruct the feature vector accordingly. In this two-party secure computation problem, let P1 be the biometric data inquirer (i.e., the FBI agent Alice) mentioned earlier, and let P2 be the biometric data custodian. We first present an overview of the critical parts, and then the details in the next sections.

Part 1: P1 and P2 each generate their private matrices $A_1$ and $A_2$, respectively, but obtain a common EW vector from the judge.
Part 2: P1 and P2 enroll with a Central Authority CA, and jointly compute $k^{A_1+A_2}$ for CA.
Part 3: CA computes the hash value of $k^{A_1+A_2}$ for P1; i.e., $b_1 = h(k^{A_1+A_2}, t)$, where t is some information about P1 known by CA (and P2).
Part 4: P1 and P2 engage in a secure computation to find x satisfying $(A_1+A_2)x = b_1+b_2$, where $b_2 = x_{P_i} + EW$.
Part 5: Upon deriving x, P2 can verify the identity of P1 by first sending CA the value of x, then engaging CA in determining whether $(A_1+A_2)x - b_1 - b_2$ is zero; i.e.,
(i) P2 provides $k^{b_2}$, and
(ii) CA provides $k^{(A_1+A_2)x}$ and $k^{b_1}$
to jointly compute $k^{(A_1+A_2)x - b_1 - b_2}$.
Part 6: If the secure computation in part 5 yields zero (in the exponent), P2 sends P1 the value of $A_2 x$.
Part 7: P1 computes $x_{P_i} = A_1 x + A_2 x - b_1 - EW$. (P2 at this point discards $b_2$, as it becomes re-identifiable after step 6.)
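Under the reading that P2’s constant vector masks the feature vector with the warrant ($b_2 = x_{P_i} + EW$), the algebra of the parts above can be simulated numerically. This is a sketch only: the hash token is replaced by a random vector and every “secure” computation is done in the clear:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

x_pi = rng.standard_normal(n)   # biometric feature vector held by custodian P2
EW   = rng.standard_normal(n)   # electronic warrant vector from the judge

# Part 1: private matrices; part 3: b1 stands in for CA's hash-derived token.
A1, A2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
b1 = rng.standard_normal(n)

# P2's constant vector masks the biometric data with the warrant.
b2 = x_pi + EW

# Part 4: jointly solve (A1 + A2) x = b1 + b2 (done via SMC in the paper).
x = np.linalg.solve(A1 + A2, b1 + b2)

# Part 5: the verification quantity vanishes exactly when the token was genuine.
assert np.allclose((A1 + A2) @ x - b1 - b2, 0)

# Part 6: P2 releases A2 x; part 7: P1 unmasks using A1, x, b1, and EW.
recovered = A1 @ x + A2 @ x - b1 - EW
assert np.allclose(recovered, x_pi)
```

The last step makes the gating visible: without EW (and the correct x and $b_1$), the released quantity does not unmask to the feature vector.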

With reference to parts 1 through 7, secure computation between P1 and P2 can succeed only if both have EW. $b_1$ in part 3 serves as a one-time trustworthy token issued by CA to assure the identity of P1, who requests the biometric data $x_{P_i}$. Note that CA may know x through the computation in part 5, but could not know $x_{P_i}$ unless CA knows EW. In other words, CA plays the role of a verifier with zero knowledge about the biometric data $x_{P_i}$. With reference to part 6, any passive sniffing cannot obtain $x_{P_i}$ without knowing $A_1$, $b_1$, EW, and x. Finally, P1 can derive $x_{P_i}$ in part 7 only if P1 has the correct value of x and EW.

In part 4, P1 provides $A_1$ and $b_1$ while P2 provides $A_2$ and $b_2$ to jointly compute x, such that during the intermediate steps P1 cannot learn $A_2$ and $b_2$, and P2 cannot learn $A_1$ and $b_1$. The basic idea behind this is to reformulate $(A_1+A_2)x = b_1+b_2$ as $Q_1(A_1+A_2)Q_2\,y = Q_1(b_1+b_2)$, and then to reconstruct $x = Q_2 y$ after finding y. In the next section, we will show a Singular Value Decomposition (SVD) approach to achieve the private computation of x.

In part 5, note that P2 has x and $b_2$, while CA has $b_1 = h(k^{A_1+A_2}, t)$; whereas x satisfies $(A_1+A_2)x = b_1+b_2$ with P1 providing $A_1$ and $b_1$, and P2 providing $A_2$ and $b_2$. If P1 is an impostor, P1 will not know $b_1$, and will have to make up a $b_1'$ different from $b_1$. On the other hand, if $b_1'$ equals $b_1$, then P1 is not expected to be an impostor.1

Finally, it is noteworthy that $A_1$ and $A_2$ play two important roles in the privacy preserving biometric data retrieval process. First, the PPCSC algebraic formulation provides a convenient way to incorporate an error detection mechanism. For example, augmenting $A_1$ and $A_2$ with one additional row of all 1s provides a checksum mechanism for detecting communication errors.

1In the highly unlikely event of degeneration, there could be some $b_1' \neq b_1$ such that $(A_1+A_2)x = b_1' + b_2$.
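The checksum role can be sketched as follows, under the assumption that the all-ones row constrains the component sum of the solution vector (illustrative matrices only):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))   # stands in for A1 + A2
x = rng.standard_normal(n)
b = A @ x

# Augment with an all-ones row; the matching entry of b is the component sum of x.
A_aug = np.vstack([A, np.ones(n)])
b_aug = np.append(b, x.sum())

# An intact transmission satisfies the checksum equation ...
assert np.isclose(A_aug[-1] @ x, b_aug[-1])

# ... while a corrupted solution vector is detected by the extra row.
x_bad = x.copy()
x_bad[0] += 0.5
assert not np.isclose(A_aug[-1] @ x_bad, b_aug[-1])
```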


Second, $A_1$ and $A_2$ enable the data exchange between P1 and P2 to be information-theoretically secure. Without $A_1$ and $A_2$, P1 can still obtain the biometric data from P2: a simple mechanism could be a straightforward AES symmetric encryption or RSA asymmetric encryption, i.e., P2 sends P1 $E(Pk_{P1}, x_{P_i})$. In AES symmetric encryption, $Pk_{P1}$ is a common secret key known to both P1 and P2. In RSA asymmetric encryption, it is the public key of P1 that P2 knows. However, neither mechanism can be information-theoretically secure. If an adversary captures $E(Pk_{P1}, x_{P_i})$ through sniffing, the risk of exposing $x_{P_i}$ is then dependent on the computational power available. In AES symmetric encryption, available computational power amounts to the chance of discovering the secret key through brute force or other approaches. In RSA asymmetric encryption, available computational power amounts to the chance of successfully factoring the product of two large prime numbers, a key step known in cryptography for reverse engineering $Dk_{P1}$ (the private key of P1), and thus $x_{P_i}$.

In contrast, note that $x_{P_i}$ is never exposed in the above PPCSC algebraic formulation for reconstructing the biometric data. The only time $x_{P_i}$ is ever exposed during the message exchange protocol over the network is through $b_2$ in the form of $x_{P_i} + EW$. Even if $b_2$ is not encrypted in the network communication and is captured by the adversary in plain text, the adversary still cannot reverse engineer $x_{P_i}$ without EW, no matter how much computational power the adversary has; thus, the scheme assures information-theoretic security.
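The information-theoretic argument mirrors a one-time pad: if EW is uniformly random, the masked vector transmitted over the network is itself uniformly distributed, so a sniffer learns nothing about the features. A modular toy version (the field size and feature values are hypothetical):

```python
import random

Q = 101  # small hypothetical modulus

x_pi = [13, 42, 7]                          # biometric feature vector
EW   = [random.randrange(Q) for _ in x_pi]  # warrant acts as a one-time pad

# What travels over the network: the masked vector b2 = x_pi + EW (mod Q).
b2 = [(f + w) % Q for f, w in zip(x_pi, EW)]

# With EW, recovery is exact; without it, every x_pi is equally likely.
recovered = [(c - w) % Q for c, w in zip(b2, EW)]
print(recovered)  # [13, 42, 7]
```

The security here does not depend on any hardness assumption, only on EW being random and single-use.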

V. DETAILS OF SECURE COMPUTATION AND COMMUNICATION

For completeness, the details of the secure computation for biometric data retrieval discussed elsewhere [21] are repeated below. The following notations are defined to facilitate the description of the biometric data retrieval process.

Assume the (FBI) party P1 has $(A_1, b_1)$, the biometric data custodian P2 has $(A_2, b_2)$, and the (judicial authority) party P3 has JD, where the $A_i$ and $b_i$ are some private matrices and vectors. Let’s also assume each party Pi keeps a secret RSA private key $Dk_{P_i}$, and shares with each other the corresponding public key $Pk_{P_i}$.

Step 0: Enrollment process for all entities
Every party enrolls with the Central Authority (CA) to obtain an electronic identity $EI_i$. In addition, CA relies on a secure computation function that computes $k^{A_1+A_2}$, and maintains the record $(EI_1, EI_2, k^{A_1+A_2})$ in its database with a retrieval function $f(EI_1, EI_2) = k^{A_1+A_2}$. Furthermore, CA has a function h that computes a unique vector $b_1$ by hashing the matrix information related to the pair of entities comprised of the FBI and the biometric data custodian on a given t.

Secure 3-Party Computation Protocol for Step 0: The function $f(EI_1, EI_2) = k^{A_1+A_2}$ is privately computed by a third party through exponentiation, such that neither CA nor the third party could know $A_1$ and $A_2$.

Step 0.1 Content: (k, m), where k is an encryption key, and m is a random message.
Sender: Central Authority (CA).
Receiver1: FBI (P1) with private matrix $A_1$.
Receiver2: Biometric data custodian (party P2) with private matrix $A_2$.
Step 0.2a Content: $k^{A_1} \cdot m$ (element-wise).
Sender: P1 (FBI).
Receiver: Third party.
Step 0.2b Content: $k^{A_2} \cdot m$ (element-wise).
Sender: P2 (Biometric data custodian).
Receiver: Third party.
Step 0.3 Content: $(k^{A_1} \cdot m)(k^{A_2} \cdot m) = k^{A_1+A_2} \cdot m^2$.
Sender: Third party, multiplying the messages from P1 at step 0.2a and that from P2 at step 0.2b.
Receiver: CA.

CA computes $k^{A_1+A_2} = (k^{A_1+A_2} \cdot m^2)/m^2$ upon completion of step 0.3.
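A scalar toy of this step-0 computation, under our reading that the third party multiplies blinded exponentials so that CA ends up with the combined exponential without anyone revealing the private exponents (toy parameters; the matrix case applies this element-wise):

```python
# Work in the multiplicative group mod a prime p (toy parameters only).
p = 2**127 - 1          # Mersenne prime used as a hypothetical modulus
k, m = 7, 1234567       # k: encryption key, m: random blinding message from CA

a1, a2 = 15, 27         # private scalars standing in for entries of A1, A2

# Steps 0.2a / 0.2b: each party sends its blinded exponential to the third party.
msg1 = pow(k, a1, p) * m % p
msg2 = pow(k, a2, p) * m % p

# Step 0.3: the third party multiplies the messages (it knows neither k nor m).
combined = msg1 * msg2 % p

# CA, who knows m, removes the blinding and obtains k^(a1+a2) mod p.
result = combined * pow(m * m, -1, p) % p
assert result == pow(k, a1 + a2, p)
```

The third party never sees k or m, and CA sees only the exponential of the sum, not the individual exponents.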

Step 1: Request for an electronic warrant
The (FBI) party P1 generates a request R(Pi) for an electronic warrant on an entity Pi. P1 uses its private key $Dk_{P1}$ to sign the request R(Pi). P1 then encrypts the signed request, as well as the unsigned version of the request, using the public key $Pk_{P3}$ of the (judicial authority) party P3. P1 sends the encrypted message $E(Pk_{P3}, [R(P_i), E(Dk_{P1}, R(P_i))])$ to P3.
Step 2: Issuance of electronic warrant
P3 decrypts $E(Pk_{P3}, [R(P_i), E(Dk_{P1}, R(P_i))])$ using its private key, and uses P1’s public key to verify the source of the sender by first un-signing the signed request using the public key of P1, and then comparing it against the unsigned request. If the comparison yields a consistent result, then P3 issues an electronic warrant in the form of a vector EW such that a value referencing Pi can be computed by hashing the value of EW. P3 then signs EW using its private key $Dk_{P3}$, and sends the encryption of the signed EW to both P1 and P2 securely using their corresponding public keys.

Secure Communication for Steps 1 and 2: The communication protocol for step 1 is summarized in Fig. 1 (and similarly for step 2 as well).

In Fig. 1, Alice keeps the private key $Dk_A$ and shares her public key $Pk_A$ with Bob. Similarly, Bob keeps the private key $Dk_B$ and shares his public key $Pk_B$ with Alice. In asymmetric encryption, where X is A or B, $D(Dk_X, E(Pk_X, m)) = m$. In Fig. 1, the signing process for the message m is to encrypt it with Alice’s private key $Dk_A$, which achieves the same purpose as creating a digest or a signature of m using hashing, except that both parties do not have to agree on a common hashing function. The signed message consists of two parts, m and $E(Dk_A, m)$, and is encrypted using Bob’s public key $Pk_B$. Since only Bob has the private key $Dk_B$, only he can decrypt the signed message: $D(Dk_B, E(Pk_B, [m, E(Dk_A, m)])) = [m, E(Dk_A, m)]$. In addition, since the public key of Alice is known by Bob, Bob can verify the integrity of m by comparing the message m in $[m, E(Dk_A, m)]$ with the decryption of $E(Dk_A, m)$; i.e., $D(Pk_A, E(Dk_A, m)) = m$.

Step 3: Secure 2-party computation for data retrieval
Define the 2-party secure computation function $f_2((A_1, b_1), (A_2, b_2)) = x$ for solving $(A_1+A_2)x = b_1+b_2$, where $A_i$ and $b_i$ (i = 1, 2) are some matrices and constant vectors, respectively. The (FBI) party P1 assigns $A_1$ as input for $f_2$, and obtains $b_1$ from CA; whereas P1 provides $EI_1$ and $EI_2$


to CA so that CA can retrieve $k^{A_1+A_2}$ and compute $b_1 = h(k^{A_1+A_2}, t)$. The biometric data custodian P2 assigns $A_2$ as input for $f_2$, and $b_2 = x_{P_i} + EW$, where $x_{P_i}$ is the feature vector for entity Pi. (Note: the hash value of EW can determine the identity of the individual whose biometric data are to be retrieved.)

Compute $f_2((A_1, b_1), (A_2, b_2))$, which arrives at a solution x satisfying $(A_1+A_2)x = b_1+b_2$; i.e., $x = (A_1+A_2)^{-1}(b_1+b_2)$.
Secure Computation Protocol for Step 3: The key challenge in step 3 is the 2-party secure computation for solving the algebraic system $(A_1+A_2)x = b_1+b_2$. We introduce a secure computation based on singular value decomposition (SVD) for solving the algebraic system.

Instead of solving $(A_1+A_2)x = b_1+b_2$ directly, we solve $Q_1(A_1+A_2)Q_2\,y = Q_1(b_1+b_2)$, and recover x from $x = Q_2 y$. By applying SVD to $Q_1$ and $Q_2$, we obtain $Q_1 = U_1 S_1 V_1^T$ and $Q_2 = U_2 S_2 V_2^T$, where the $S_i$ are diagonal matrices, for i = 1, 2. The two-party secure computation for solving $Q_1(A_1+A_2)Q_2\,y = Q_1(b_1+b_2)$ is realized as follows.
P1: (Party 1) FBI. P2: (Party 2) Biometric data custodian.

Step 3.1 Content:
Sender: P1.
Receiver: P2.
Step 3.2 Content: $E_{rh}(k_2, U_2 S_2)$, $E_{rh}(k_2, A_2 U_2 S_2)$
Sender: P2.
Receiver: P1.
Step 3.3 Content: $E_{rh}(k_2, Q_1(A_1+A_2)U_2 S_2)$
Sender: P1.
Receiver: P2.
Remark: P2 can construct $Q_1(A_1+A_2)Q_2$ by decrypting $E_{rh}(k_2, Q_1(A_1+A_2)U_2 S_2)$ and multiplying the decrypted outcome with $V_2^T$.
Step 3.4 Content:
Sender: P1.
Receiver: P2.
Remark: Introducing a random blinding term here is optional.
Step 3.5 Content:
Sender: P2.
Receiver: P1.
Step 3.6 Content: $Q_1(b_1+b_2)$
Sender: P1.
Receiver: P2.
Remark: From steps 3.3 and 3.6, P2 solves for y in $Q_1(A_1+A_2)Q_2\,y = Q_1(b_1+b_2)$, then $x = Q_2 y$.

In the above steps, $E_{lh}(k, M)$ is defined as a left-homomorphic encryption function with two parameters: k is an encryption secret and M is a matrix. A left-homomorphic encryption $E_{lh}(k, M)$ has two properties similar to the scalar version of homomorphic encryption; i.e., $E_{lh}(k, A)\,M = E_{lh}(k, A M)$ and $E_{lh}(k, A) + E_{lh}(k, B) = E_{lh}(k, A + B)$, where A is an m×n matrix and M is an n×k matrix, and the multiplication results in an m×k matrix. Likewise, $E_{rh}(k, M)$ is the right-homomorphic encryption function bearing the properties $M\,E_{rh}(k, A) = E_{rh}(k, M A)$ and $E_{rh}(k, A) + E_{rh}(k, B) = E_{rh}(k, A + B)$.

Due to the space constraint, we detail the following in a supplementary document on our web site [22]: an illustration of the realization of the left and right homomorphic encryptions, and the proof of correctness of the claim about information-theoretic security.
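As a rough illustration of the left-homomorphic matrix property, the following sketch uses a toy Paillier cryptosystem (tiny primes, illustration only, not secure, and not necessarily the scheme the paper realizes): given an entrywise-encrypted matrix E(A) and a plaintext matrix M, anyone can form E(A·M) without learning A, because E(A·M)[i][j] = prod_k E(A[i][k])^M[k][j] mod n²:

```python
import math, random

# Toy Paillier (tiny primes for readability; real keys are thousands of bits).
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

A = [[3, 1], [4, 2]]   # private matrix, encrypted entrywise
M = [[2, 0], [1, 5]]   # public plaintext matrix
EA = [[enc(a) for a in row] for row in A]

# Homomorphic left-multiplication: E(A)*M -> E(A*M), entry by entry.
EAM = [[1, 1], [1, 1]]
for i in range(2):
    for j in range(2):
        c = 1
        for k in range(2):
            c = (c * pow(EA[i][k], M[k][j], n2)) % n2
        EAM[i][j] = c

AM = [[sum(A[i][k] * M[k][j] for k in range(2)) % n for j in range(2)]
      for i in range(2)]
assert [[dec(c) for c in row] for row in EAM] == AM
```

The right-homomorphic case is symmetric: the plaintext matrix multiplies from the left and the ciphertext exponents come from its rows.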

Step 4a: Feature reconstruction
Define the computation function: . Party P1 (FBI) provides input . Party P2 provides input . The result of g(w, v) is the biometric data for P1.

Step 4b: Identity verification
Define the function: . Biometric data custodian P2 provides the identity information EI_i of P1, that of itself , (optional) t, and x as obtained in step 3.6 to CA. CA computes , and then . P2 then provides to jointly compute with CA the value of , which is either 0, indicating the authenticity of P1, or nonzero, indicating otherwise.

Computation Protocol for Step 4: Recall , and . Since P1 knows EW and , the biometric data can be derived by computing as described in step 4a.

P1: (Party 1) FBI. P2: (Party 2) Biometric data custodian.
Step 4a Content: . Sender: P2 with ( , EW, , and ). Receiver: P1.

Upon completion of step 4a, which is essentially an asymmetric encryption on using P1's public key, P1 can derive via . Furthermore, we can observe from step 4a that P1 can extract the biometric data only if P1 has EW.
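The gating idea, asymmetric encryption under P1's public key combined with addressing the record by the hash of EW, can be sketched with textbook RSA. The toy primes and the record-keying scheme below are our own illustrative assumptions, not the paper's concrete construction:

```python
import hashlib

# Textbook RSA with tiny primes (toy only; never use unpadded RSA in practice).
p, q = 61, 53
n = p * q                            # 3233
e = 17                               # P1's public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # P1's private exponent

encrypt = lambda m: pow(m, e, n)     # anyone can encrypt for P1
decrypt = lambda c: pow(c, d, n)     # only P1 can decrypt

# P2 sends a feature value asymmetrically encrypted for P1.
feature = 1234
c = encrypt(feature)
assert decrypt(c) == feature

# Hypothetical EW gating: the hash of the electronic warrant indexes the
# record, so a party without EW cannot even address the stored ciphertext.
ew = b"electronic-warrant-0001"
record_key = hashlib.sha256(ew).hexdigest()[:16]
db = {record_key: c}
assert decrypt(db[hashlib.sha256(ew).hexdigest()[:16]]) == feature
```

Both conditions of the protocol appear here: decryption requires P1's private key, and lookup requires possession of EW itself.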

VI. BIOMETRIC AUTHENTICATION

In this section, we describe a method for achieving privacy preserving biometric authentication based on a mixed Gaussian model that summarizes the statistical behavior of the biometric data. For the Kullback-Leibler divergence measure, the secure computation concerns only the privacy preserving retrieval of the mean vector and covariance matrix of the corresponding multivariate normal distributions N(u0, S0) and N(u1, S1), as the Kullback-Leibler divergence measure is defined as follows:

KL(N(u0, S0) || N(u1, S1)) = (1/2)[tr(S1^{-1} S0) + (u1 - u0)^T S1^{-1} (u1 - u0) - d + ln(det S1 / det S0)]

where d is the dimension of the feature vector.
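The closed-form divergence between two multivariate Gaussians can be computed directly; a minimal sketch (the symbol names mu0/cov0, etc., are ours):

```python
import numpy as np

def kl_gauss(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) in closed form."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0)
                  + diff @ inv1 @ diff
                  - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

# Identical models have zero divergence; shifting the mean increases it.
mu, cov = np.zeros(3), np.eye(3)
assert abs(kl_gauss(mu, cov, mu, cov)) < 1e-12
assert kl_gauss(mu, cov, mu + 1.0, cov) > 0.0
```

Only the mean vector and covariance matrix enter the formula, which is why the secure retrieval protocol needs to reconstruct exactly those two model parameters.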

Secure Computation for the Kullback-Leibler Measure: Referring to the Kullback-Leibler measure, and are the two main terms that require secure and private computation. The steps that realize the secure computation of these two terms are shown below.


Fig. 1. Asymmetric encryption and signing.

(i) Secure computation for , where ( ) or ( ).
Step 1) Initialize by .
Step 2) (Sender) .

For each

End for

Step 3) (Sender)

(ii) Secure computation for , where has , has , has .
Step 1a:
Step 1b:
Step 2:
Step 3: ,

Remark: (a) and (d) are necessary only under the assumption that is oblivious to the information received in step 1a) and step 2), respectively.

Step 4:

Let , , , and ; the secure computation just shown privately computes . Alternatively, when , , and , it computes the distance function for measuring the distance between scaled GMM super vectors s and for speaker verification, as proposed in [23]:

In the next section, we will show the secure computation for the function kernel of the above distance function.

(iii) Secure 3-party computation for

Let be a biometric authentication function returning 1 if , and 0 otherwise; where is some threshold for the object class k. Note that by means of the secure computation shown in (i) and (ii), party P0 can privately compute (i.e., , ) while P1 can compute , where . Let's assume P0 has V0, P1 has V1, and P2 has . The 3-party secure computation for is realized below:

Step 1: Sender P2, Receiver P1. P2 sends to P1; where R is some random number defined by P2, and k is an encryption secret.
Step 2a: Sender P1, Receiver P2. P1 computes and sends P2 ; where is some positive random number defined by P1.
Step 2b: Sender P1, Receiver P0. P1 computes and sends P0 .
Step 3: Sender P0, Receiver P2. P0 computes and sends P2 .
Step 4: Sender P2, Receiver P1. P2 multiplies the replies from P0 and P1, decrypts, de-normalizes the results, and sends them to P1; i.e.,

Step 5: Sender P1, Receiver P2. P1 de-normalizes and sends to P2.

Upon completion of step 5, P2 can compute based on the value of .

VII. PROOF-OF-CONCEPT: SPEAKER VERIFICATION SYSTEM

For proof-of-concept, we now show how the secure computation approach can be applied to state-of-the-art speaker verification using a Support Vector Machine (SVM). As reported elsewhere [23], the function kernel for that measures the distance between scaled


GMM super vectors s and for speaker verification can be defined as follows:

The function kernel above for SVM could be privately computed under a 3-party secure computation protocol with the PPCSC described previously. Let be the 2-party secure computation function as in step 3 of Section V for deriving x such that (A1 + A2)x = b1 + b2. When , , the upper and lower portions of are equivalent to and , respectively. When a third party P3 (e.g., an authenticator) with only obtains x, P3 can derive and needed for the linear kernel function of GMM above, without P1 and P2 ever exposing these two terms (encrypted or not) in any message exchange with P3, nor to each other.

Furthermore, typical function kernels for the speaker factor space discussed in [23], based on the inner product and the radial basis function, can be realized in a straightforward manner as follows:

where and are the homomorphic encryption and decryption functions, respectively.

Brief Overview of the Speaker Verification System: Speaker verification can be based on the SVM just discussed or on the Kullback-Leibler divergence measure discussed in Section VI. SVM has been reported to deliver excellent performance when the universal background noise model, the user's speech input, and sufficient samples of impostor speakers are available for training. Having sufficient impostor samples helps the SVM classifier avoid expanding the user input space to cover regions where no user data is present, thus lowering the risk of a smaller-than-actual impostor space leading to false acceptance.

To demonstrate the practicality of our approach and to better understand its effectiveness, we have developed a prototype speaker verification system using the open source Asterisk and Asterisk-Java. For our experimentation, the system is based on the Kullback-Leibler divergence measure because we do not have the universal background noise model and sufficient impostor samples required by the SVM approach.

Overview of User-System Interaction and Speech Processing: Our speaker verification system prototype allows a speaker to call into the system and claim an identity based on his/her phone number. When a speaker calls into the system, his/her voice is sampled at an 8 kHz sampling rate. The entire chunk of the voice is partitioned into 16-ms frames (i.e., 128 data points per frame). Typical delay time is assumed to be no less than 20 ms. In other words, the first 20 ms of the voice is assumed to be transient background noise. An end point detection algorithm [19] is applied in the pre-processing step to eliminate the transient background noise. The speech processing steps for extracting the Mel cepstrum from 20 Mel frequency filter banks are summarized below (due to Thrasyvoulou [18]).

Step 1) Data are normalized by the difference between the max and min within a frame.
Step 2) Data are then pre-emphasized by boosting the signal 20 dB/decade.
Step 3) Frame data are smoothed by a Hamming window w(n) = 0.54 - 0.46 cos(2*pi*n/(N - 1)), 0 <= n <= N - 1; where N is the frame size.
Step 4) The Mel cepstrum is derived; where N is the frame size, S(k) is the FFT of the frame data, is the filter from the Mel-frequency filter bank, and is the number of triangular weighting filters.
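Steps 1-3 of the front end can be sketched as follows. The 0.95 pre-emphasis coefficient and the 440 Hz test tone are our own assumptions for illustration, not values from the paper:

```python
import numpy as np

FRAME = 128  # 16 ms at 8 kHz sampling

def preprocess_frame(frame):
    # Step 1: normalize by the max-min range within the frame.
    span = frame.max() - frame.min()
    frame = frame / span if span > 0 else frame
    # Step 2: pre-emphasis, a first-order high-pass boosting the signal
    # roughly 20 dB/decade (0.95 is a common, assumed coefficient).
    frame = np.append(frame[0], frame[1:] - 0.95 * frame[:-1])
    # Step 3: Hamming window w(n) = 0.54 - 0.46 cos(2*pi*n/(N-1)).
    n = np.arange(FRAME)
    return frame * (0.54 - 0.46 * np.cos(2 * np.pi * n / (FRAME - 1)))

# Step 4 then takes the magnitude spectrum into the Mel filter bank.
signal = np.sin(2 * np.pi * 440 * np.arange(FRAME) / 8000.0)
spectrum = np.abs(np.fft.rfft(preprocess_frame(signal)))
assert spectrum.shape == (FRAME // 2 + 1,)
```

The 20 triangular Mel filters of step 4 would then be applied to `spectrum`, followed by the log and cosine transform that yield the cepstral coefficients.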

Privacy Preserved Biometric Enrollment and Verification: During speech processing, the Mel spectrum feature vectors of the running sequence of 16-ms time frames are extracted to derive the mean and covariance of the corresponding multivariate Gaussian model. For speaker verification, the basis is the mean and covariance of in step 4 just described, and the secure computation for the Kullback-Leibler measure described in Section VI. In other words, the mean and covariance of a reference template are reconstructed through the secure computation protocol during real-time authentication. The secure computation protocol involves an Asterisk-PBX system that acts as a proxy for authentication, communicating with three (biometric custodian database) servers to reconstruct the mean and covariance of the Mel spectrum of the reference voice template.

VIII. PRELIMINARY EXPERIMENTAL STUDY

Experimental Procedure: Eight individuals assuming the identities of thirteen different users participated in the experiment. The identity of a user is defined by the biometric voiceprint of the individual and the phone device used in the experiment. For example, an individual assumes two user identities if the individual uses, say, a landline phone and a mobile phone during the experiment.

Performance Evaluation: Verification Accuracy: The main focus of the experimental study is on evaluating the accuracy of the speaker verification system. More specifically, we investigate the accuracy as a function of the threshold setting for acceptance and rejection, the intra-variability of a speaker, the similarity of the voice characteristics among the participants, and the


Fig. 2. KL-distance distribution of user/impostor.

choice of phone devices used in the enrollment and verification process.

In this study, the figures of merit for evaluating the accuracy of the speaker verification system are the false acceptance rate (FAR) and the false rejection rate (FRR) with respect to different threshold settings. The voiceprint of an individual is characterized by psychoacoustic modeling of speech features using the Mel frequency spectrum described earlier. We model the statistical behavior of the voiceprint as a multivariate Gaussian model with the model parameters being a 20×1 mean vector of the Mel frequency spectrum and a 20×20 covariance matrix of the Mel frequency spectrum.

The intra-variability of a speaker is the Kullback-Leibler (KL) distance, as defined in Section VI, measuring the difference between the Gaussian model of the enrolled reference voiceprint and that of the voiceprint provided for verification. The intra-variability of every speaker as measured by KL-distance is shown on the y-axis in Fig. 2. The similarity of the voice characteristics among the participants (with their 4-digit IDs on the x-axis) is also shown in Fig. 2. For example, when user 1326 impersonates user 6169, the distribution of the KL-distance ranges between 8 and 27.

As noted in Fig. 2, there is no clear separation between the range accounting for the intra-variability of a user and the range accounting for the similarity of the voice characteristics among the participants. In other words, there is no single threshold cut-off that could yield a performance with .

Fig. 3. Effect of device pair on KL distance.

Eight different phone devices were used for biometric enrollment and verification. Every combination of a phone device used for enrollment and a phone device used for verification is referred to as a device pair. A device pair value is either xy or 1xx. For example, a device pair value 43 (i.e., , ) refers to a combination where the phone device indexed as 4 is used for enrollment, and the phone device indexed as 3 is used for verification. A device pair value 122 refers to the case where the phone device indexed as 2 is used for both enrollment and verification. The distribution of the KL-distance with respect to the device pair value is shown in Fig. 3.

As shown in Fig. 3, there are a few outliers when the same phone device is used for enrollment and verification. If these outliers are excluded, the KL-distance ranges from roughly five to the low 20s. Furthermore, when different phones are used for enrollment and verification, the KL-distance spreads wider (ranging from 7.5 to 43).

To evaluate the accuracy of the speaker verification system, the False Acceptance Rate (FAR) and False Rejection Rate (FRR) under different threshold settings are derived to facilitate plotting Receiver Operating Characteristic (ROC) curves. The system performance in terms of verification accuracy is investigated under different scenarios: 1) a single rejection threshold is applied to all participants; 2) a single rejection threshold is customized for each participant; and 3) a rejection/acceptance interval, instead of a single threshold value, is customized for each participant.
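Deriving FAR/FRR from KL-distance samples under a threshold sweep can be sketched as follows; the sample values below are hypothetical, not the study's data:

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """Accept a claim when its KL distance falls below the threshold."""
    far = np.mean(np.asarray(impostor) < threshold)   # impostors accepted
    frr = np.mean(np.asarray(genuine) >= threshold)   # genuine users rejected
    return far, frr

# Hypothetical KL-distance samples for genuine and impostor attempts.
genuine = [5.1, 6.3, 7.8, 9.2, 11.0]
impostor = [8.5, 12.4, 15.0, 20.1, 27.3]

# Sweeping the threshold traces the FAR/FRR trade-off behind an ROC curve:
# raising it lowers FRR but raises FAR, and vice versa.
for t in [7.0, 9.5, 13.0]:
    far, frr = far_frr(genuine, impostor, t)
    assert 0.0 <= far <= 1.0 and 0.0 <= frr <= 1.0
```

The equal error rate (EER) is read off where the two curves cross, i.e., the threshold at which FAR equals FRR.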

In this study, the system rejection threshold is changed every session, where a session typically spans a day or two. When a single rejection threshold is applied to all participants, the variations in FAR and FRR under different threshold settings are shown in Fig. 4.

Fig. 4 shows that the equal error rate (EER) occurs at . The optimal system performance with respect to using a single threshold setting (9.68) is , and .

When the system is reconfigured so that it operates under a customized threshold for each participant, the optimal system performance improves to the following: , , and . We note that the FRR is drastically improved over the previous case when a


Fig. 4. ROC using a single threshold.

Fig. 5. ROC (individual).

customized threshold is used for each participant. Fig. 5 shows the ROC curve for each participant, as well as the trend of .

Finally, if the system is configured to use a threshold interval, instead of a single threshold value, for each participant, the optimal system performance can be further improved to , and . In summary, the optimal system performance in terms of verification accuracy is shown at the bottom of the previous page.

We believe that the above performance result is useful for providing insight into the level of expectation for deploying biometric voice verification in a real-world environment, because the study is not restricted to a noise-controlled environment, nor to any specific phone devices, which are two important user factors beyond the control of biometric system development or deployment. This is particularly so if such a system is to be deployed in a constraint-free environment.

IX. CONCLUSION

A set of conditions for modeling security and privacy in complex distributed systems involving multiple parties for biometric data retrieval was proposed. Based on these conditions, we developed novel, practical secure computation techniques for biometric data retrieval and authentication. Our contribution in this research is the set of techniques for biometric data retrieval and data exchange that are provably private and secure according to the conditions of our models. Of particular significance is that these techniques are practical for complex distributed systems involving multiple parties. For proof-of-concept, we developed a speaker verification system and applied these secure computation techniques to protect the security and privacy of the data exchange for authentication purposes. Our experimental study revealed several important open questions.

First, all participants in the secure computation are assumed to be semi-honest. What if some participant deviates from the rules of the secure communication protocol during the data exchange? What level of reasonableness can one assume about participant behavior? Second, the speaker verification experiment showed that allowing unrestricted choices of phone devices for biometric voice acquisition and unconstrained background noise are two challenges that warrant additional study. If additional information about a device and noise could be acquired, how could it be used to improve the threshold settings for authentication? Furthermore, the experimental study showed that the accuracy performance could be improved using threshold intervals. It would be interesting to further investigate whether multiple threshold intervals or fusing identity information obtained from other biometric modalities could significantly improve the performance. And if so, to what extent could we successfully apply secure computation to balance the need for security and privacy? These are some of the open questions for future research.

ACKNOWLEDGMENT

The author is grateful to the reviewers for many useful suggestions that helped to improve this manuscript, and to M. Bicer for his effort during the proofreading stage.


REFERENCES

[1] H. R. Fineburg and E. A. Intzekostas, “Understanding privacy laws in connection with biometric identification in the United States and the rest of the world,” in Proc. Biometric Consortium Conf., Arlington, VA, Sep. 2005.

[2] E. Newton, L. Sweeney, and B. Malin, “Preserving privacy by de-identifying facial images,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 2, pp. 232–243, Feb. 2005.

[3] R. Gross, E. Airoldi, B. Malin, and L. Sweeney, Integrating Utility Into Face De-Identification.

[4] Y. Sutcu, Q. Li, and N. Memon, “Protecting biometric templates with sketch: Theory and practice,” IEEE Trans. Inf. Forensics Security, vol. 2, no. 3, pp. 503–512, Sep. 2007.

[5] J. Wickramasuriya, M. Datt, S. Mehrotra, and N. Venkatasubramanian, “Privacy protecting data collection in media spaces,” in Proc. ACM Int. Conf. Multimedia, New York, 2004.

[6] S. K. Modi and S. J. Elliott, “Impact of image quality on performance: Comparison of young and elderly fingerprints,” in Proc. 6th Int. Conf. Recent Advances in Soft Computing, K. Sirlantzis, Ed., 2006, pp. 449–454.

[7] A. C. Yao, “Protocols for secure computations,” in Proc. 23rd IEEE Symp. Foundations of Computer Science, 1982.

[8] O. Goldreich, S. Micali, and A. Wigderson, “How to play any mental game,” in Proc. 19th Annu. ACM Symp. Theory of Computing, 1987, pp. 218–229.

[9] O. Goldreich, Secure Multi-Party Computation [Online]. Available: http://www.wisdom.weizmann.ac.il/~oded/pp.html

[10] W. Du and M. J. Atallah, “Privacy-preserving cooperative scientific computations,” in Proc. 14th IEEE Computer Security Foundations Workshop, 2001, pp. 273–282.

[11] W. Du and M. J. Atallah, “Secure multi-party computation problems and their applications: A review and open problems,” in Proc. New Security Paradigms Workshop, 2001, pp. 11–20.

[12] G. Brassard, C. Crepeau, and J. Robert, “All-or-nothing disclosure of secrets,” in Advances in Cryptology: CRYPTO '86, LNCS, 1987, pp. 234–238.


[13] S. Even, O. Goldreich, and A. Lempel, “A randomized protocol for signing contracts,” Commun. ACM, vol. 28, pp. 637–647, 1985.

[14] M. Naor and B. Pinkas, “Efficient oblivious transfer protocols,” in Proc. 12th Annu. ACM-SIAM Symp. Discrete Algorithms, Washington, D.C., 2001, pp. 448–457.

[15] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 1992.

[16] N. Muller, L. Magaia, and B. M. Herbst, “Singular value decomposition, eigenfaces, and 3D reconstructions,” SIAM Rev., vol. 46, no. 3, pp. 518–545, 2004.

[17] P. Quintiliano and A. Rosa, “Face recognition applied to computer forensics,” Int. J. Forensic Comput. Sci., vol. 1, pp. 19–27, 2006.

[18] T. Thrasyvoulou and S. Benton, Speech Parameterization Using the Mel Scale (Part II), 2003.

[19] G. Saha, S. Chakraborty, and S. Senapati, “A new silence removal & endpoint detection algorithm for speech & speaker recognition applications,” in Proc. Nat. Conf. Communications, Kharagpur, India, Jan. 2005.

[20] M. Reiter, Introduction to Usable Privacy & Security [Online]. Available: http://www.cups.cs.cmu.edu/courses/ups-sp06/slides/060124-overview-privacy.ppt

[21] B. K. Sy, “Secure computation for privacy preserving biometric data retrieval and authentication,” in Proc. EuroISI 2008, LNCS vol. 5376, 2008, pp. 143–154.

[22] [Online]. Available: http://www.qcwireless.net/biometric_ppr/he_primer.pdf

[23] N. Dehak et al., “Support vector machines and joint factor analysis for speaker verification,” in Proc. IEEE ICASSP, 2009, pp. 4337–4340.

Bon K. Sy received the M.Sc. and Ph.D. degrees in electrical and computer engineering from Northeastern University, Boston, MA, in 1986 and 1988, respectively.

He is a Computer Science Professor with the City University of New York. He has over 70 publications on funded research, two patents, and a book entitled Information-Statistical Data Mining (New York: Springer, 2007). He is a certified CISSP and has served as a technology expert witness for the NYCC Technology Committee in a Government Hearing on broadband access. His current research interest is in secure multiparty computation as applied to biometrics and IT security, privacy, and trust.




Specific technical questions to address in this research using the above example are:

1) What provable security and privacy properties should be introduced for biometric data retrieval, and for subsequent applications such as biometric verification or identification?

2) What secure multiparty computation scheme is appropriate for the data escrow agencies and other parties to collaborate in computing the biometric data?

The objective of this research is threefold. First, we examine models that encapsulate security and privacy properties in terms of their reasonableness and appropriateness for biometric data retrieval. Second, we develop secure multiparty computation techniques for recovering biometric data and for authentication that are provably secure and private according to our models. Third, we demonstrate a practical implementation of a privacy preserving speaker verification system for proof-of-concept.

The contribution of this research is a novel and practical scheme for privacy preserving biometric data retrieval and authentication, and a proof-of-concept application to state-of-the-art speaker verification techniques and system implementation. The main idea of our scheme is to first delineate the conditions for biometric data retrieval, as well as the “capabilities” of the participating entities, in the form of privacy and security models. Biometric data retrieval is then formulated as a secure multiparty computation problem for solving a system of algebraic equations, where the solution of the algebraic system is a feature vector of the biometric data. Biometric authentication is also formulated as a secure computation problem for computing the Kullback-Leibler (KL) distance between the Gaussian model of a credential presented for verification and the Gaussian model of the corresponding biometric reference template. The decision on acceptance/rejection is then based on a comparison of the KL



distance to a preset threshold, where the credential could be composed of an “electronic badge/ID” and some biometric data. The integrity and confidentiality of the data exchange during the handshake among the participating entities are protected by applying asymmetric encryption.

II. REVIEW ON PRIOR RESEARCH

There are two main avenues to privacy preserving data retrieval, namely lossy and lossless retrieval. Lossy retrieval protects private content typically by means of perturbation, randomization, or masking of the original data to the extent that it could still be useful for the end users [2], [3]. In a lossy retrieval, the original content is not preserved for the end user. Lossless retrieval, on the other hand, protects computational privacy while preserving content [4]. In other words, the end user can retrieve the original content but is limited to what is allowed by the computational mechanism of the retrieval process. Lossy retrieval is sufficient in some applications such as video surveillance [5], which actually relies on the “lossy” nature to achieve privacy protection. However, in this research we concentrate on lossless retrieval, which guarantees private computation and reconstruction of the original biometric data.

In biometrics, we argue that data must retain the features of the intrinsic physical or behavioral traits of a human being if such features are to be useful for practical authentication applications such as verification or identification of an individual. A slight variation in biometric data may alter the features to an extent that prevents a direct application for the intended purposes. For example, fingerprint recognition performance is very sensitive to the quality of elderly fingerprint images [6]. An elderly fingerprint has a large number of minutiae points. Poor image quality skews the frequency distribution of the minutiae points and affects the recognition performance.

Our focus in this research is on privacy preserving lossless retrieval, and specifically on secure multiparty computation [7], [8] that is provably secure and private. Secure multiparty computation (SMC) deals with the problem in which multiple parties with private inputs would like to jointly compute some function of their inputs, but no party wishes to reveal its private input to the other participants. For example, each data custodian with a partial biometric template and a law enforcement agency with sample biometric data participate in SMC to jointly compute the output of a matching function for biometric identification. The multiparty computation problem was first introduced by Yao [7] and extended by Goldreich et al. [8], and by many others.

Goldreich [9] pointed out that solutions to specific problems should be developed and customized for efficiency reasons. Du and Atallah [10], [11] presented a series of specific solutions to specific problems; e.g., privacy-preserving cooperative scientific computations, privacy-preserving database query, and privacy-preserving geometric computation. In their privacy-preserving cooperative scientific computations research [10], they proposed a protocol between two parties to solve the problem (A1 + A2)x = b1 + b2, where matrix A1 and vector b1 belong to party P1, and matrix A2 and vector b2 belong to party P2. At the end of the protocol, both parties know the solution x while

nobody knows the other party's private inputs. Each party's private data are protected by the 1-out-of-N oblivious transfer protocol [12], [13] and by splitting A1, A2, b1, and b2 into a set of random matrices. However, a 1-out-of-N oblivious transfer in certain applications could be computationally expensive [14].

In this research we tackle privacy preserving biometric data retrieval in a way similar to privacy-preserving cooperative scientific computation (PPCSC). We first solve y in (A1 + A2)Py = b1 + b2, and then reconstruct the solution for the original problem through x = Py. The parties P1 and P2, as in PPCSC, will generate invertible random matrices and , respectively. However, instead of applying the 1-out-of-N oblivious transfer protocol, we employ homomorphic encryption and singular value decomposition (SVD) on A1 and A2 to achieve privacy protection. Our approach is to take each private matrix and break it down into matrices through SVD, which gives a partial view of the information needed for computing the biometric data to be retrieved. We then use SMC and homomorphic encryption to share the partial information between the participants in such a way that the original biometric data can be reconstructed in the PPCSC without revealing any private information not intended for sharing.

As noted in previous research by others, the cost of SVD is proportional to its complexity O(mnr) [15]; where m and n are the numbers of rows and columns of an m×n matrix, and r is the rank of the matrix. On the other hand, the complexity of the 1-out-of-N oblivious transfer protocol grows with d [16]; where d is the size of the secure evaluation circuit. Recent development has suggested that the efficiency of oblivious transfer can be improved to the order of the security parameter for PPCSC. As suggested in [12], a typical value of this security parameter for providing reasonable security is 256. Yet the rank r of the SVD, which is related to the number of dimensions chosen for representing biometric features, is typically less than that. For example, it has been reported elsewhere that eigenface recognition [4] can achieve reasonably good results with the size of the eigenface vector being 20; i.e., r = 20. In our speaker verification system, r is approximately 50. On the other hand, a state-of-the-art speaker verification system using a Support Vector Machine may extend the dimension of the feature vector—thus the value of r—to about 300 for improved performance (reaching above 95%).
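The role of the rank r in the comparison above can be made concrete with a small numerical sketch (the matrix size and rank here are arbitrary illustrations, not the system's actual dimensions). Using numpy's SVD, a rank-r matrix factors into U, a diagonal matrix of singular values, and V^T:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 40, 30, 20                    # e.g., r = 20 as in eigenface recognition
# Build an m x n matrix of rank r as a product of two thin random factors
M = rng.random((m, r)) @ rng.random((r, n))
U, s, Vt = np.linalg.svd(M)
rank = int(np.sum(s > 1e-10 * s[0]))    # numerical rank from the singular values
assert rank == r
# The factors reconstruct M: U @ diag(s) @ Vt
S = np.zeros((m, n))
np.fill_diagonal(S, s)
assert np.allclose(U @ S @ Vt, M)
```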

III. MODELS

The main characteristic of our proposed privacy model is separation of duty among four entities; namely, the (judge) authority, the (FBI) biometric data inquirer, the biometric data custodians, and some central authority CA (a role similar to, for example, VeriSign). In other words, no single party alone is allowed to, or could, retrieve the biometric data. The only exception is the owner of the biometric data, who has an "electronic badge" for retrieving his/her own biometric data.

Biometric data retrieval by law enforcement agencies can be achieved only when all relevant parties collaborate in a secure multiparty computation. Furthermore, to prevent collusion, an explicit approval from the authority in the form of an "electronic warrant" is also required for the retrieval process. The following


scenario is the basis of our privacy model, which comprises the seven conditions (Py1–Py7) that follow.

Assume that an individual claims to be Chuck and has biometric data Y (say, voiceprint). This voiceprint may not be associated with the identity "Chuck" if the individual is an impostor. Let XB be the authentic biometric data (say, the voiceprint of the true Chuck) that is associated with the identity Chuck. Let EI be the information associated with Chuck that is known by Alice (FBI agent); e.g., EI could be a hash value of SSN. Bob (judge) has information JD (e.g., RSA private key), and the "true" Chuck has information EB (electronic badge).

Py1: Alice (FBI agent) should not know Y, JD, and XB (which also implies Alice cannot compute XB from EI).
Py2: The individual claiming to be Chuck should not know XB unless Y is similar to XB. If Y is similar to XB, then the individual is Chuck; i.e., not an impostor. This implies the individual is capable of obtaining biometric data, but does not know whether the data is sufficiently close to that of a specific individual such as Chuck.
Py3: The individual claiming to be Chuck should not know JD.
Py4: Bob (the judge) should not know Y, XB, or EB.
Py5: If Alice presents EI to Bob and Bob agrees, Bob can use JD and EI to generate EW (electronic warrant) that can be used to compute XB.
Py6: If (the true) Chuck has EB, then EB and Y together can compute the similarity between Y and XB.
Py7: Every entity has an electronic identity EI issued by a Central Authority (CA); whereas the EIs are publicly known.

IV. SECURE COMPUTATION

Biometric data for an individual Pi is conceived as a feature vector xB. The end goal of biometric data retrieval relevant to Pi is to obtain xB. To realize the goal of privacy in regard to the right of control over biometric information access, this research imposes two conditions: 1) biometric data retrieval by a third party must be approved by an independent authority and 2) the identity of a requester must be verified prior to receiving biometric data.

Ideally, biometric data retrieval should be both computationally secure and information-theoretically secure. Biometric data retrieval is information-theoretically secure if the risk of an information leak does not depend on the amount of computational power available to an adversary. A main focus of this research is to develop secure computation techniques that are provably secure—with respect to the models in the previous sections—for both biometric data retrieval and authentication.

To realize the goal of privacy just mentioned at the beginning of this section, a sequence of secure computations that is information-theoretically secure is introduced for biometric data retrieval. The key concept behind it is to solve a two-party secure computation problem leading to the value of x satisfying (A1 + A2)x = b1 + b2, and then to reconstruct the feature vector xB accordingly. In this two-party secure computation problem, let P1 be the biometric data inquirer (i.e., the FBI agent Alice) mentioned earlier, and let P2 be the biometric data custodian. We first present an overview of the critical parts, and then the details in the next sections.

Part 1: P1 and P2 each generate their private matrices A1 and A2, but obtain a common EW vector from the judge.
Part 2: P1 and P2 enroll with a Central Authority CA, and jointly compute A1 + A2 for CA.
Part 3: CA computes the hash value of A1 + A2 for P1; i.e., b1 = h(A1 + A2, t); where t is some information about P1 known by CA (and P2).
Part 4: P1 and P2 engage in a secure computation to find x satisfying (A1 + A2)x = b1 + b2; where b2 = xB + EW.
Part 5: Upon deriving x, P2 can verify the identity of P1 by first sending CA the value of x, then engaging CA in determining whether (A1 + A2)x − b1 − b2 is zero; i.e.,
(i) P2 provides its input, and
(ii) CA provides its input
to jointly compute (A1 + A2)x − b1 − b2.
Part 6: If the secure computation in part 5 is zero, P2 sends P1 the value of A2 x.
Part 7: P1 computes xB = (A1 + A2)x − b1 − EW. (P2 at this point discards its intermediate results, as they become re-identifiable after step 6.)
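As a plain-arithmetic sanity check of parts 1 through 7 (ignoring the secure-computation machinery, treating b1 as a given token rather than a hash value, and assuming the blinding relation b2 = xB + EW together with the recovery xB = (A1 + A2)x − b1 − EW; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
xB = rng.random(n)                      # custodian-side biometric feature vector
EW = rng.random(n)                      # electronic warrant vector from the judge
A1, A2 = rng.random((n, n)), rng.random((n, n))
b1 = rng.random(n)                      # one-time token from CA (h(A1+A2, t) in the text)
b2 = xB + EW                            # custodian blinds the template with the warrant
x = np.linalg.solve(A1 + A2, b1 + b2)   # the solution of part 4's algebraic system
xB_rec = (A1 + A2) @ x - b1 - EW        # part 7: remove the token and the warrant
assert np.allclose(xB_rec, xB)
```

Without both b1 and EW, the quantity (A1 + A2)x reveals only the blinded value xB + EW.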

With reference to parts 1 through 7, secure computation between P1 and P2 can succeed only if both have EW. b1 in part 3 serves as a one-time trustworthy token issued by CA to assure the identity of P1 who requests the biometric data xB. Note that CA may know b2 through the computation in part 5, but could not know xB unless CA knows EW. In other words, CA plays the role of a verifier with zero knowledge about the biometric data xB. With reference to part 6, any passive sniffing cannot obtain xB without knowing A1, b1, EW, and x. Finally, P1 can derive xB in part 7 only if P1 has the correct value of x and EW.

In part 4, P1 provides A1 and b1 while P2 provides A2 and b2 to jointly compute x such that during the intermediate steps P1 cannot learn A2 and b2, and P2 cannot learn A1 and b1. The basic idea behind this is to reformulate (A1 + A2)x = b1 + b2 as P1(A1 + A2)P2 y = P1(b1 + b2), and then to reconstruct x = P2 y after finding y. In the next section, we will show a Singular Value Decomposition (SVD) approach to achieve the private computation of x.
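The masked reformulation can be checked numerically. In this sketch the two random invertible masks are named P and Q to avoid clashing with the party names P1 and P2; this verifies only the algebra, not the secure protocol itself (each mask would be private to one party):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A1, A2 = rng.random((n, n)), rng.random((n, n))
b1, b2 = rng.random(n), rng.random(n)
# Random invertible masks (diagonally dominated to guarantee invertibility)
P = rng.random((n, n)) + n * np.eye(n)
Q = rng.random((n, n)) + n * np.eye(n)
# Solve the masked system P (A1+A2) Q y = P (b1+b2), then unmask via x = Q y
y = np.linalg.solve(P @ (A1 + A2) @ Q, P @ (b1 + b2))
x = Q @ y
assert np.allclose((A1 + A2) @ x, b1 + b2)   # x solves the original system
```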

In part 5, note that P2 has b2 while CA has b1; whereas x satisfies (A1 + A2)x = b1 + b2 with P1 providing A1 and b1, and P2 providing A2 and b2. If P1 is an impostor, P1 will not know b1, and will have to make up a b1' different from b1. On the other hand, if b1' equals b1, then P1 is not expected to be an impostor.¹

Finally, it is noteworthy that A1 and A2 play two important roles in the privacy preserving biometric data retrieval process. First, the PPCSC algebraic formulation provides a convenient way to incorporate an error detection mechanism. For example, augmenting A1 and A2 with one additional row of all 1s provides a checksum mechanism for detecting communication errors.

¹In a highly unlikely event of degeneration, there could be some b1' ≠ b1 such that (A1 + A2)x = b1' + b2.


Second, A1 and A2 enable the data exchange between P1 and P2 to be information-theoretically secure. Without A1 and A2, P1 can still obtain the biometric data xB from P2. A simple mechanism could be a straightforward AES symmetric encryption or RSA asymmetric encryption; i.e., P2 sends P1 E(PkP1, xB). In AES symmetric encryption, PkP1 is a common secret key known to both P1 and P2. In RSA asymmetric encryption, it is the public key of P1 that P2 knows. However, both mechanisms cannot be information-theoretically secure. If an adversary captures E(PkP1, xB) through sniffing, the risk of exposing xB then depends on the computational power available. In AES symmetric encryption, available computational power amounts to the chance of discovering the secret key through brute force or other approaches. In RSA asymmetric encryption, available computational power amounts to the chance of successfully factoring the product of two large prime numbers—a key step known in cryptography for reverse engineering DkP1 (the private key of P1), and thus xB.

In contrast, note that xB is never exposed in the above PPCSC algebraic formulation for reconstructing the biometric data. The only time xB is ever exposed during the message exchange protocol over the network is through b2 in the form of xB + EW. Even if b2 is not encrypted in the network communication and is captured by the adversary in plain text, the adversary still cannot reverse engineer xB without EW—no matter how much computational power the adversary has; thus, the scheme assures information-theoretic security.

V. DETAILS OF SECURE COMPUTATION AND COMMUNICATION

For completeness, the details of the secure computation for biometric data retrieval discussed elsewhere [21] are repeated below. The following notations are defined to facilitate the description of the biometric data retrieval process.

Assume the (FBI) party P1 has A1 and b1, the biometric data custodian P2 has A2 and b2, and the (judicial authority) party P3 has EW, where the Ai and bi are some private matrices and vectors. Let's also assume each party Pi keeps a secret RSA private key DkPi, and shares with each other the corresponding public key PkPi.

Step 0: Enrollment process for all entities
Every party enrolls with the Central Authority (CA) to obtain an electronic identity EI. In addition, CA relies on a secure computation function that computes A1 + A2, and maintains the record in its database with a retrieval function f(EIi, EIj). Furthermore, CA has a function h that computes a unique vector b1 by hashing the matrix information related to the pair of entities comprised of the FBI and the biometric data custodian on a given t.

Secure 3-Party Computation Protocol for Step 0: The function of A1 + A2 is privately computed by a third party through exponentiation such that neither CA nor the third party could know A1 and A2.

Step 0.1 Content: (k, m), where k is an encryption key, and m is a random message.
Sender: Central Authority (CA).
Receiver 1: FBI (P1) with private matrix A1.
Receiver 2: Biometric data custodian (party P2) with private matrix A2.
Step 0.2a Content: P1's encrypted contribution.
Sender: P1 (FBI). Receiver: Third party.
Step 0.2b Content: P2's encrypted contribution.
Sender: P2 (Biometric data custodian). Receiver: Third party.
Step 0.3 Content: the product of the two contributions.
Sender: Third party, multiplying the message from P1 at step 0.2a and that from P2 at step 0.2b. Receiver: CA.

CA computes b1 upon completion of step 0.3.

Step 1: Request for an electronic warrant
The (FBI) party P1 generates a request R(Pi) for an electronic warrant on an entity Pi. P1 uses its private key DkP1 to sign the request R(Pi). P1 then encrypts the signed request, as well as the unsigned version of the request, using the public key PkP3 of the (judicial authority) party P3. P1 sends the encrypted message to P3.
Step 2: Issuance of electronic warrant
P3 decrypts the message using its private key, and uses P1's public key to verify the source of the sender by first un-signing the signed request using the public key of P1, and then by comparing it against the unsigned request. If the comparison yields a consistent result, then P3 issues an electronic warrant in the form of a vector EW such that a value referencing Pi can be computed by hashing the value of EW. P3 then signs EW using its private key DkP3, and sends the encryption of the signed EW to both P1 and P2 securely using their corresponding public keys.

Secure Communication for Steps 1 and 2: The communication protocol for step 1 is summarized in Fig. 1 (and similarly for step 2 as well).

In Fig. 1, Alice keeps the private key DkA and shares her public key PkA with Bob. Similarly, Bob keeps the private key DkB and shares his public key PkB with Alice. In asymmetric encryption, D(DkX, E(PkX, m)) = m, where X is A or B. In Fig. 1, the signing process for the message m is to encrypt it with Alice's private key DkA, which achieves the same purpose as creating a digest or a signature of m using hashing, except both parties do not have to agree on a common hashing function. The signed message consists of two parts, m and E(DkA, m), and is encrypted using Bob's public key PkB. Since only Bob has the private key DkB, only he can decrypt the signed message. In addition, since the public key of Alice is known by Bob, Bob can verify the integrity of m by comparing the message m in the decrypted result with the decryption of the signature; i.e., m = D(PkA, E(DkA, m)).
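The sign-then-encrypt exchange of Fig. 1 can be illustrated with textbook RSA on toy primes (an illustration of the algebra only; the primes, exponents, and message are arbitrary and far too small for real use):

```python
def make_keys(p, q, e=17):
    """Return (public, private) RSA keys as (exponent, modulus) tuples."""
    n, phi = p * q, (p - 1) * (q - 1)
    return (e, n), (pow(e, -1, phi), n)

pk_a, sk_a = make_keys(61, 53)          # Alice's keys (modulus 3233)
pk_b, sk_b = make_keys(89, 97)          # Bob's keys (larger modulus, so Alice's values fit)
m = 65                                   # the request
sig = pow(m, *sk_a)                      # Alice signs: "encrypt" m with her private key
ct = [pow(v, *pk_b) for v in (m, sig)]   # encrypt the pair (m, sig) under Bob's public key
m2, sig2 = (pow(v, *sk_b) for v in ct)   # Bob decrypts with his private key
assert m2 == m
assert pow(sig2, *pk_a) == m2            # verify: D(PkA, E(DkA, m)) = m
```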

Step 3: Secure 2-party computation for data retrieval
Define the 2-party secure computation function f2(A1, b1; A2, b2) for solving (A1 + A2)x = b1 + b2; where Ai and bi (i = 1, 2) are some matrices and constant vectors respectively.

The (FBI) party P1 assigns A1 as input for f2, and obtains b1 from CA; whereas P1 provides EIi and t


to CA so that CA can retrieve the enrollment record and compute b1. The biometric data custodian P2 assigns A2 as input for f2, and b2 = xB + EW; where xB is the feature vector for entity Pi. (Note: the hash value of EW can determine the identity of the individual whose biometric data is to be retrieved.)

Compute f2(A1, b1; A2, b2) to arrive at a solution x satisfying (A1 + A2)x = b1 + b2; i.e., x = (A1 + A2)^-1 (b1 + b2).

Secure Computation Protocol for Step 3: The key challenge in step 3 is the 2-party secure computation for solving the algebraic system (A1 + A2)x = b1 + b2. We introduce a secure computation based on singular value decomposition (SVD) for solving the algebraic system (A1 + A2)x = b1 + b2.

Instead of directly solving (A1 + A2)x = b1 + b2, we solve P1(A1 + A2)P2 y = P1(b1 + b2), and recover x from x = P2 y. By applying SVD to P1 and P2, we obtain P1 = U1 Σ1 V1^T and P2 = U2 Σ2 V2^T, where the Ui and Vi are orthonormal matrices and the Σi are diagonal matrices, for i = 1, 2. The two-party secure computation for solving P1(A1 + A2)P2 y = P1(b1 + b2) is realized as follows.

P1: (Party 1) FBI. P2: (Party 2) Biometric data custodian.
Step 3.1 Sender: P1. Receiver: P2.
Step 3.2 Sender: P2. Receiver: P1.
Step 3.3 Sender: P1. Receiver: P2.
Remark: P2 can construct the masked system by decrypting the received message and multiplying the decrypted outcome with V2^T.
Step 3.4 Sender: P1. Receiver: P2.
Remark: Introducing the random mask is optional.
Step 3.5 Sender: P2. Receiver: P1.
Step 3.6 Sender: P1. Receiver: P2.
Remark: From steps 3.3 and 3.6, P2 solves for y in P1(A1 + A2)P2 y = P1(b1 + b2), then x = P2 y.

In the above steps, El(k, M) is defined as a left-homomorphic encryption function with two parameters: k is an encryption secret and M is a matrix. A left-homomorphic encryption El has two properties similar to the scalar version of homomorphic encryption; i.e., El(k, A)·M = El(k, A·M) and D(k, El(k, A)·M) = A·M, where A is an m×n matrix, M is an n×k matrix, and the multiplication results in an m×k matrix. Likewise, Er(k, M) is the right-homomorphic encryption function bearing the properties M·Er(k, A) = Er(k, M·A) and D(k, M·Er(k, A)) = M·A.
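One simple toy instantiation of a left-homomorphic encryption, offered only as an assumption of ours and not necessarily the construction detailed in [22], is masking by a secret invertible matrix K on the left, so that "encryption" commutes with right-multiplication by a plaintext matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 3, 4, 2
A, M = rng.random((m, n)), rng.random((n, k))
K = rng.random((m, m)) + m * np.eye(m)       # the encryption secret: an invertible mask

def El(K, X):                                 # left-homomorphic "encryption": K X
    return K @ X

def Dl(K, X):                                 # decryption: K^-1 X
    return np.linalg.solve(K, X)

assert np.allclose(El(K, A) @ M, El(K, A @ M))   # E(k, A) M = E(k, A M)
assert np.allclose(Dl(K, El(K, A) @ M), A @ M)   # D(k, E(k, A) M) = A M
```

A right-homomorphic counterpart follows symmetrically with an invertible mask applied on the right.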

Due to space constraints, we detail the following in a supplementary document on our web site [22]: an illustration of the realization of the left and right homomorphic encryptions, and the proof of correctness of the claim about information-theoretic security.

Step 4a: Feature reconstruction
Define the computation function g(w, v). Party P1 (FBI) provides input w. Party P2 provides input v. The result of g(w, v) is the biometric data xB for P1.
Step 4b: Identity verification
Define the function Δ. The biometric data custodian P2 provides the identity information EIi of P1, that of itself EIj, (optional) t, and x as obtained in step 3.6 to CA. CA computes f(EIi, EIj), and then b1 = h(f(EIi, EIj), t). P2 then provides its input to jointly compute with CA the value of Δ. Δ is either 0, indicating the authenticity of P1, or nonzero, indicating otherwise.

Computation Protocol for Step 4: Recall b2 = xB + EW and (A1 + A2)x = b1 + b2. Since P1 knows EW and b1, the biometric data xB can be derived by computing xB = (A1 + A2)x − b1 − EW, as described in step 4a.

P1: (Party 1) FBI. P2: (Party 2) Biometric data custodian.
Step 4a Sender: P2 (with EW and its private inputs). Receiver: P1.

Upon completion of step 4a, which is essentially an asymmetric encryption using P1's public key, P1 can derive xB via xB = (A1 + A2)x − b1 − EW. Furthermore, we can observe from step 4a that P1 can extract the biometric data only if P1 has EW.

VI. BIOMETRIC AUTHENTICATION

In this section, we describe a method for achieving privacy preserving biometric authentication that is based on a mixed Gaussian model to summarize the statistical behavior of the biometric data. For the Kullback-Leibler divergence measure, the secure computation concerns only the privacy preserving retrieval of the mean vector and covariance matrix of the corresponding multivariate normal distributions N(μ0, Σ0) and N(μ1, Σ1), as the Kullback-Leibler divergence measure is defined as follows:

KL(N(μ0, Σ0) || N(μ1, Σ1)) = (1/2)[tr(Σ1^-1 Σ0) + (μ1 − μ0)^T Σ1^-1 (μ1 − μ0) − d + ln(det Σ1/det Σ0)];

where d is the dimension of the feature vector.

Secure Computation for Kullback-Leibler Measure: Referring to the Kullback-Leibler measure, tr(Σ1^-1 Σ0) and (μ1 − μ0)^T Σ1^-1 (μ1 − μ0) are the two main terms that require secure and private computation. The steps that realize the secure computation of these two main terms are shown below.
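As an in-the-clear reference implementation (without the secure computation), the Kullback-Leibler divergence between two multivariate Gaussians can be computed directly from the mean vectors and covariance matrices using the standard closed form:

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) for multivariate Gaussians."""
    d = len(mu0)
    S1inv = np.linalg.inv(S1)
    dm = mu1 - mu0
    return 0.5 * (np.trace(S1inv @ S0) + dm @ S1inv @ dm - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

mu, S = np.zeros(2), np.eye(2)
assert abs(kl_gauss(mu, S, mu, S)) < 1e-12            # identical models give 0
assert abs(kl_gauss(mu, S, mu + 1.0, S) - 1.0) < 1e-9  # unit shift of a 2-D mean
```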


Fig. 1. Asymmetric encryption and signing.

(i) Secure computation for tr(Σ1^-1 Σ0): the sender initializes (Step 1), iterates over each element (Step 2), and sends the accumulated result (Step 3).

(ii) Secure computation for (μ1 − μ0)^T Σ1^-1 (μ1 − μ0); where P0 has μ0, P1 has μ1, and P2 has Σ1. The computation proceeds through steps 1a, 1b, 2, 3, and 4.
Remark: (a) and (d) are necessary only under the assumption that the receiving party is oblivious to the information received in step 1a) and step 2), respectively.

With the appropriate substitutions, the secure computation just shown privately computes (μ1 − μ0)^T Σ1^-1 (μ1 − μ0). Alternatively, with the substitutions for scaled GMM supervectors, it computes the distance function for measuring the distance between scaled GMM supervectors s and s' for speaker verification as proposed in [23]:

d(s, s')^2 = Σ_k λ_k (μ_k^s − μ_k^s')^T Σ_k^-1 (μ_k^s − μ_k^s').

In the next section we will show the secure computation for the function kernel of the above distance function.

(iii) Secure 3-party computation for the authentication decision

Let f be a biometric authentication function returning 1 if the distance measure is below some threshold, and 0 otherwise; where the threshold is specific to the object class k. Note that by means of the secure computation shown in (i) and (ii), party P0 can privately compute its term (i.e., V0), while P1 can compute its term (i.e., V1). Let's assume P0 has V0, P1 has V1, and P2 has the class threshold. The 3-party secure computation for f is realized below:

Step 1: Sender P2, Receiver P1.
P2 sends E(k, R) to P1; where R is some random number defined by P2, and k is an encryption secret.
Step 2a: Sender P1, Receiver P2.
P1 computes and sends P2 its masked value; where the mask is some positive random number defined by P1.
Step 2b: Sender P1, Receiver P0.
P1 computes and sends P0 its masked value.
Step 3: Sender P0, Receiver P2.
P0 computes and sends P2 its masked value.
Step 4: Sender P2, Receiver P1.
P2 multiplies the replies by P0 and P1, decrypts, de-normalizes the results, and sends them to P1.
Step 5: Sender P1, Receiver P2.
P1 de-normalizes and sends the result to P2.

Upon completion of step 5, P2 can compute f based on the value received.

VII. PROOF-OF-CONCEPT: SPEAKER VERIFICATION SYSTEM

For proof-of-concept, we now show how the secure computation approach can be applied to state-of-the-art speaker verification using a Support Vector Machine (SVM). As reported elsewhere [23], the function kernel that measures the distance between scaled


GMM supervectors s and s' for speaker verification can be defined as follows:

K(s, s') = Σ_k (√λ_k Σ_k^(-1/2) μ_k^s)^T (√λ_k Σ_k^(-1/2) μ_k^s').

The function kernel above for the SVM could be privately computed under a 3-party secure computation protocol with the PPCSC described previously. Let f2(A1, b1; A2, b2) be the 2-party secure computation function as in step 3 of Section V for deriving x such that (A1 + A2)x = b1 + b2. With the appropriate substitutions for A1, b1, A2, and b2, the upper and lower portions of x are equivalent to the two inner-product factors above, respectively. When a third party P3 (e.g., an authenticator) only obtains x, P3 can derive the two terms needed for the linear kernel function of the GMM above—without P1 and P2 ever exposing these two terms (encrypted or not) in any message exchange communication with P3, nor to each other.

Furthermore, typical function kernels for the speaker factor space discussed in [23], based on the inner product and the radial basis function, can be realized in a straightforward manner; where E and D denote homomorphic encryption and decryption respectively.

Brief Overview of the Speaker Verification System: Speaker verification can be based on the SVM just discussed or on the Kullback-Leibler divergence measure discussed in Section VI. SVM has been reported to deliver excellent performance when the universal background noise model, the user's speech input, and sufficient samples of impostor speakers are available for training. Having sufficient impostor samples helps the SVM classifier avoid expanding the user input space to cover regions where no user data is present, thus lowering the risk of having a smaller-than-actual impostor space leading to false acceptance.

To demonstrate the practicality of our approach and to better understand its effectiveness, we have developed a prototype speaker verification system using the open source Asterisk and Asterisk-Java. For our experimentation, our system is based on the Kullback-Leibler divergence measure because we do not have the universal background noise model and sufficient impostor samples required by the SVM approach.

Overview of User-System Interaction and Speech Processing: Our speaker verification system prototype allows a speaker to call into the system and identify himself/herself based on his/her phone number. When a speaker calls into the system, his/her voice is sampled at an 8 kHz sampling rate. The entire chunk of the voice is partitioned into 16-ms frames (i.e., 128 data points per frame). The typical delay time is assumed to be no less than 20 ms. In other words, the first 20 ms of the voice is assumed to be transient background noise. An end point detection algorithm [17] is applied in the pre-processing step to eliminate the transient background noise. The speech processing steps for extracting the Mel cepstrum from 20 Mel frequency filter banks are summarized below (due to Thrasyvoulou [18]).
Step 1) Data are normalized by the difference between the max and min within a frame.
Step 2) Data are then pre-emphasized by boosting the signal 20 dB/decade.
Step 3) Frame data are smoothed by a Hamming window w(n) = 0.54 − 0.46 cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1; where N is the frame size.
Step 4) The Mel cepstrum is derived; where N is the frame size, S(k) is the FFT of the frame data, Hl(k) is the l-th filter from the Mel-frequency filter bank, and L is the number of triangular weighting filters.
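Front-end steps 1 through 3 can be sketched as follows on a synthetic tone (a toy: the pre-emphasis coefficient 0.95 is our assumption for a first-order high-frequency boost, and end point detection is omitted):

```python
import numpy as np

fs, N = 8000, 128                          # 8 kHz sampling; 16-ms frames (128 points)
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)            # one second of a synthetic 440 Hz tone
frames = x[: len(x) // N * N].reshape(-1, N)
# Step 1: normalize by the max-min difference within each frame
span = frames.max(axis=1, keepdims=True) - frames.min(axis=1, keepdims=True)
frames = frames / np.where(span == 0, 1.0, span)
# Step 2: pre-emphasis, boosting high frequencies (first-order filter)
frames = np.concatenate([frames[:, :1], frames[:, 1:] - 0.95 * frames[:, :-1]], axis=1)
# Step 3: Hamming window w(n) = 0.54 - 0.46 cos(2*pi*n/(N-1))
w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(N) / (N - 1))
frames = frames * w
S = np.abs(np.fft.rfft(frames, axis=1))    # |FFT| per frame, the S(k) input of step 4
```

Step 4 would then apply the triangular Mel filter bank and a cosine transform to S.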

Privacy Preserved Biometric Enrollment and Verification: During speech processing, the Mel spectrum feature vectors of the running sequence of 16-ms time frames are extracted to derive the mean and covariance of the corresponding multivariate Gaussian model. For speaker verification, the basis is the mean and covariance from step 4 just described, and the secure computation for the Kullback-Leibler measure described in Section VI. In other words, the mean and covariance of a reference template are reconstructed based on the secure computation protocol during the real-time authentication. The secure computation protocol involves an Asterisk-PBX system that acts as a proxy for authentication, communicating with three (biometric custodian database) servers to reconstruct the mean and covariance of the Mel spectrum of the reference voice template.

VIII. PRELIMINARY EXPERIMENTAL STUDY

Experimental Procedure: Eight individuals assuming the identities of thirteen different users participated in the experiment. The identity of a user is defined by the biometric voiceprint of the individual and the phone device used in this experiment. For example, an individual will assume two user identities if the individual uses both a landline phone and a mobile phone during the experiment.

Performance Evaluation: Verification Accuracy: The main focus of the experimental study is on evaluating the accuracy of the speaker verification system. More specifically, we investigate the accuracy as a function of the threshold setting for acceptance and rejection, the intra-variability of a speaker, the similarity of the voice characteristics among the participants, and the


Fig. 2. KL-distance distribution of user/impostor.

choice of phone devices used in the enrollment and verification process.

In this study, the figures of merit for evaluating the accuracy of the speaker verification system are the false acceptance rate (FAR) and the false rejection rate (FRR) with respect to different threshold settings. The voiceprint of an individual is characterized by psychoacoustic modeling of speech features using the Mel frequency spectrum described earlier. We model the statistical behavior of the voiceprint as a multivariate Gaussian model, with the model parameters being a 20×1 mean vector of the Mel frequency spectrum and a 20×20 covariance matrix of the Mel frequency spectrum.

The intra-variability of a speaker is measured by the Kullback-Leibler (KL) distance function defined in Section VI, which measures the difference between the Gaussian model of the enrolled reference voiceprint and that of the voiceprint provided for verification. The intra-variability of every speaker as measured by the KL-distance is shown on the y-axis in Fig. 2. The similarity of the voice characteristics among the participants (with their 4-digit IDs on the x-axis) is also shown in Fig. 2. For example, when user 1326 impersonates user 6169, the distribution of the KL-distance ranges between 8 and 27.

As noted in Fig. 2, there is no clear separation between the range accounting for the intra-variability of a user and the range accounting for the similarity of the voice characteristics among the participants. In other words, there is no single threshold cut-off that could yield a performance with FAR = FRR = 0.

Eight different phone devices were used for biometric enrollment and verification. Every combination of a phone device used for an enrollment and a phone device used for verification is referred to as a device pair. A device pair value is either xy or 1xx. For example, a device pair value 43 (i.e., x = 4, y = 3) refers to a combination where the phone device indexed as 4 is used for an enrollment, and the phone device indexed as 3 is used for verification. A device pair value 122 refers to the case where the phone device indexed as 2 is used for both an enrollment and verification. The distribution of the KL-distance with respect to the device pair value is shown in Fig. 3.

Fig. 3. Effect of device pair on KL distance.

As shown in Fig. 3, there are a few outliers when the same phone device is used for enrollment and verification. If these outliers are excluded, the KL-distance ranges from roughly five to the low 20s. Furthermore, when different phones are used for enrollment and verification, the KL-distance spreads wider (ranging from 7.5 to 43).

To evaluate the accuracy of the speaker verification system, the False Acceptance Rate (FAR) and False Rejection Rate (FRR) under different threshold settings are derived to facilitate plotting Receiver Operating Characteristic (ROC) curves. The system performance in terms of verification accuracy is investigated under different scenarios: 1) a single rejection threshold is applied to all participants; 2) a single rejection threshold is customized for each participant; and 3) a rejection/acceptance interval, instead of a single threshold value, is customized for each participant.
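FAR and FRR as functions of the threshold can be derived from genuine and impostor KL-distance samples as follows (the sample distances below are made up for illustration and are not the study's data):

```python
import numpy as np

# Hypothetical KL distances: genuine attempts (user vs. own template)
# and impostor attempts (user vs. someone else's template)
genuine = np.array([5.1, 6.3, 7.2, 8.0, 9.5, 11.0])
impostor = np.array([8.5, 10.2, 12.7, 15.0, 21.0, 27.0])

def far_frr(threshold):
    far = float(np.mean(impostor <= threshold))  # impostors wrongly accepted
    frr = float(np.mean(genuine > threshold))    # genuine users wrongly rejected
    return far, frr

thresholds = np.linspace(4.0, 28.0, 241)
rates = np.array([far_frr(t) for t in thresholds])
eer_idx = int(np.argmin(np.abs(rates[:, 0] - rates[:, 1])))
# thresholds[eer_idx] approximates the equal-error-rate operating point
```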

In this study, the system rejection threshold is changed every session; whereas a session typically spans a day or two. When a single rejection threshold is applied to all participants, the variations in FAR and FRR under different threshold settings are shown in Fig. 4.

Fig. 4 shows where the equal error rate (EER) occurs. The optimal system performance is obtained using a single threshold setting of 9.68.

When the system is reconfigured so that it operates under a customized threshold for each participant, the optimal system performance is improved. We note that the FRR is drastically improved over the previous case when a


Fig. 4. ROC using one single threshold.

Fig. 5. ROC (individual).

customized threshold is used for each participant. Fig. 5 below shows the ROC curve for each participant, as well as the corresponding trend.

Finally, if the system is configured to use a threshold interval, instead of a single threshold value, for each participant, the optimal system performance can be further improved. In summary, the optimal system performance in terms of verification accuracy is shown at the bottom of the previous page.

We believe that the performance results above are useful for providing insight into the level of expectation for deploying biometric voice verification in a real world environment. This is because the study was not restricted to a noise-controlled environment, nor to any specific phone devices—two important user factors that are beyond the control of biometric system development or deployment. This is particularly so if such a system is to be deployed in a constraint-free environment.

IX. CONCLUSION

A set of conditions for modeling security and privacy in complex distributed systems involving multiple parties for biometric data retrieval was proposed. Based on these conditions, we developed novel, practical secure computation techniques for biometric data retrieval and authentication. Our contribution in this research is the set of techniques for biometric data retrieval and data exchange that are provably private and secure according to the conditions of our models. Of particular significance is that these techniques are practical for complex distributed systems involving multiple parties. For proof-of-concept, we developed a speaker verification system and applied these secure computation techniques to protect the security and privacy of the data exchange for authentication purposes. Our experimental study revealed several important open questions.

First, all participants in the secure computation are assumed to be semi-honest. What if some participant deviates from the rules of the secure communication protocol during the data exchange? What level of reasonableness can one assume about participant behavior? Second, the speaker verification experiment showed that allowing unrestricted choices of phone devices for biometric voice acquisition and unconstrained background noise are two challenges that warrant additional study. If additional information about a device and noise could be acquired, how could it be used to enhance the threshold settings for an authentication? Furthermore, the experimental study showed that the accuracy performance could be improved using threshold intervals. It would be an interesting study to further investigate whether multiple threshold intervals or fusing identity information obtained from another biometric modality could significantly improve the performance. And if so, to what extent could we successfully apply secure computation to balance the need for security and privacy? These are some of the open questions for future research.

ACKNOWLEDGMENT

The author is grateful to the reviewers for many useful suggestions that helped improve this manuscript, and to M. Bicer for his proofreading effort.

REFERENCES

[1] H. R. Fineburg and E. A. Intzekostas, “Understanding privacy laws in connection with biometric identification in the United States and the rest of the world,” in Proc. Biometric Consortium Conf., Arlington, VA, Sep. 2005.

[2] E. Newton, L. Sweeney, and B. Malin, “Preserving privacy by de-identifying facial images,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 2, pp. 232–243, Feb. 2005.

[3] R. Gross, E. Airoldi, B. Malin, and L. Sweeney, Integrating Utility Into Face De-Identification.

[4] Y. Sutcu, Q. Li, and N. Memon, “Protecting biometric templates with sketch: Theory and practice,” IEEE Trans. Inf. Forensics Security, vol. 2, no. 3, pp. 503–512, Sep. 2007.

[5] J. Wickramasuriya, M. Datt, S. Mehrotra, and N. Venkatasubramanian, “Privacy protecting data collection in media spaces,” in Proc. ACM Int. Conf. Multimedia, New York, 2004.

[6] S. K. Modi and S. J. Elliott, “Impact of image quality on performance: Comparison of young and elderly fingerprints,” in Proc. 6th Int. Conf. Recent Advances in Soft Computing, K. Sirlantzis, Ed., 2006, pp. 449–454.

[7] A. C. Yao, “Protocols for secure computations,” in Proc. 23rd IEEE Symp. Foundations of Computer Science, 1982.

[8] O. Goldreich, S. Micali, and A. Wigderson, “How to play any mental game,” in Proc. 19th Annu. ACM Symp. Theory of Computing, 1987, pp. 218–229.

[9] O. Goldreich, Secure Multi-Party Computation [Online]. Available:http://www.wisdom.weizmann.ac.il/~oded/pp.html

[10] W. Du and M. J. Atallah, “Privacy-preserving cooperative scientific computations,” in Proc. 14th IEEE Computer Security Foundations Workshop, 2001, pp. 273–282.

[11] W. Du and M. J. Atallah, “Secure multi-party computation problems and their applications: A review and open problems,” in Proc. New Security Paradigms Workshop, 2001, pp. 11–20.

[12] G. Brassard, C. Crepeau, and J. Robert, “All-or-nothing disclosure of secrets,” in Advances in Cryptology (CRYPTO '86), LNCS, 1987, pp. 234–238.

[13] S. Even, O. Goldreich, and A. Lempel, “A randomized protocol for signing contracts,” Commun. ACM, vol. 28, pp. 637–647, 1985.

[14] M. Naor and B. Pinkas, “Efficient oblivious transfer protocols,” in Proc. 12th Annu. ACM-SIAM Symp. Discrete Algorithms, Washington, DC, 2001, pp. 448–457.

[15] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge, U.K.: Cambridge University Press, 1992.

[16] N. Muller, L. Magaia, and B. M. Herbst, “Singular value decomposition, eigenfaces, and 3D reconstructions,” SIAM Rev., vol. 46, no. 3, pp. 518–545, 2004.

[17] P. Quintiliano and A. Rosa, “Face recognition applied to computer forensics,” Int. J. Forensic Comput. Sci., vol. 1, pp. 19–27, 2006.

[18] T. Thrasyvoulou and S. Benton, Speech Parameterization Using the Mel Scale (Part II), 2003.

[19] G. Saha, S. Chakraborty, and S. Senapati, “A new silence removal & endpoint detection algorithm for speech & speaker recognition applications,” in Proc. Nat. Conf. Communications, Kharagpur, India, Jan. 2005.

[20] M. Reiter, Introduction to Usable Privacy & Security [Online]. Available: http://www.cups.cs.cmu.edu/courses/ups-sp06/slides/060124-overview-privacy.ppt

[21] B. K. Sy, “Secure computation for privacy preserving biometric data retrieval and authentication,” in Proc. EuroISI 2008, LNCS, vol. 5376, Dec. 2008, pp. 143–154.

[22] [Online]. Available: http://www.qcwireless.net/biometric_ppr/he_primer.pdf

[23] N. Dehak et al., “Support vector machines and joint factor analysis for speaker verification,” in Proc. IEEE ICASSP, 2009, pp. 4337–4340.

Bon K. Sy received the M.Sc. and Ph.D. degrees in electrical and computer engineering from Northeastern University, Boston, MA, in 1986 and 1988, respectively.

He is a Computer Science Professor with the City University of New York. He has over 70 publications on funded research, two patents, and a book entitled Information-Statistical Data Mining (New York: Springer, 2007). He is a certified CISSP and has served as a technology expert witness for the NYCC Technology Committee in a government hearing on broadband access. His current research interest is in secure multiparty computation as applied to biometrics and IT security, privacy, and trust.