SECURE QUERY COMMUNICATION AND PROCESSING PROTOCOLS FOR
CRITICAL CLOUD APPLICATIONS
by
Liangliang Xiao
APPROVED BY SUPERVISORY COMMITTEE:
___________________________________________
I-ling Yen, Chair
___________________________________________
Ding-Zhu Du
___________________________________________
Dung T. Huynh
___________________________________________
Murat Kantarcioglu
Copyright 2012
Liangliang Xiao
All Rights Reserved
Dedicated to my family.
SECURE QUERY COMMUNICATION AND PROCESSING PROTOCOLS FOR
CRITICAL CLOUD APPLICATIONS
by
LIANGLIANG XIAO, B.S., M.S.
DISSERTATION
Presented to the Faculty of
The University of Texas at Dallas
in Partial Fulfillment
of the Requirements
for the Degree of
DOCTOR OF PHILOSOPHY IN
COMPUTER SCIENCE
THE UNIVERSITY OF TEXAS AT DALLAS
December, 2012
ACKNOWLEDGEMENTS
First and foremost, I would like to thank my advisor, Dr. I-ling Yen, for her guidance, advice,
valuable assistance, and unending support. She is truly the best advisor I have ever met. Her
sharp and clear insights into various aspects made sure that I focused on the most interesting
problems. Her painstaking work in helping me write this Dissertation and other
technical papers is also greatly appreciated. Thank you, Dr. Yen, for everything you have done for
me. I would also like to show my deepest appreciation to the members of my Dissertation
committee and professors who have advised me: Dr. Dung T Huynh, Dr. Ding-Zhu Du, Dr.
Murat Kantarcioglu, Dr. Bhavani Thuraisingham, Dr. Farokh B. Bastani, and Dr. Vincent Ng.
I want to thank my classmates and friends who in one way or another were of assistance to me. I
appreciate all of them for their understanding, encouragements, and suggestions: Osbert Bastani,
Manghui Tu, Jicheng Fu, Tong Gao, Qingkai Ma, Jian Huang, Wenke Zhang, Yunqi Ye, Yunlin
Dong, Longsheng Xia, Panfeng Xue, Yansheng Zhang, Na Zhao, Wei Zhu, Daichao Lu, and
Guang Zhou.
Last but not least, my family has been with me for all the decisions that I have made and
always stood by me. I want to thank my father and my mother for their nourishment and
guidance.
My research is partly sponsored by the NSF Net-Centric Software and Systems I/UCRC under
Award No. 0855944, the NSF Fundamental Research Program under Award No. 1128270, and
the Air Force Office of Scientific Research under Award No. FA-9550-08-1-0260.
August 2012
SECURE COMPUTATION AND COMMUNICATION PROTOCOLS FOR
CRITICAL CLOUD APPLICATIONS
Publication No. ___________________
Liangliang Xiao, Ph.D.
The University of Texas at Dallas, 2012
ABSTRACT
Supervising Professor: Dr. I-ling Yen
For over a decade, it has been common practice for companies to outsource their online business
logic to Web hosting service providers. Generally, the databases as well as the business logic of
a company are hosted by a third party to save IT management time and cost. Cloud
computing pushes this paradigm further. There are many cloud-based data centers which
store very large amounts of data from different sources and support data-centric computations.
Security can be a major concern for such data centers when the data they host are sensitive. A
data center may be attacked and compromised. There is also the potential for insider attacks.
If there is a change in management, such as a reorganization or buyout, the potential threat
increases due to the additional exposure to multiple management personnel and the lack of
established policies regarding the handling of critical information in such situations.
The security problems with outsourced databases can be solved if the critical data are
encrypted. This naturally leads to the problem of how the data center can perform computations on
encrypted data. Some general computations in data-intensive systems include arithmetic
operations and search (exact-match search and range search). Several secure computation
techniques in the literature can help achieve these computations, including homomorphic
encryption (HE), order-preserving encryption (OPE), prefix-preserving encryption (PPE), and
multi-party secure computation. Multi-party secure computation can securely perform addition
and multiplication operations on shared data, but it requires O(n^2) communication overhead
for each multiplication operation, where n is the number of shares, and hence has a high
communication cost. HE allows arithmetic computation (addition and multiplication) on the
plaintexts to be performed directly on the ciphertexts. OPE preserves the order of the plaintexts.
Thus, range search queries can be processed directly on the encrypted data. PPE requires that
the length of the longest common prefix of two plaintexts equal that of their ciphertexts. Thus,
prefix-matching search and range search can be performed directly on the encrypted data.
However, there are limitations in the existing works on HE, OPE, and PPE. Current circuit
based HE schemes are computationally very expensive, and the security analyses of OPE and PPE
are not sufficient. Moreover, the existing HE, OPE, and PPE schemes all assume a single encryption
key. Thus, it is difficult to apply them to multi-user systems where the users have different access
privileges to the database. In this Dissertation, we overcome some of the limitations of HE/OPE/PPE
in existing works. We construct an efficient (non-circuit based) HE scheme and prove its security,
analyze the security of OPE and PPE schemes, and develop mechanisms to extend HE, OPE, and PPE
to multi-user systems. The results presented in this Dissertation advance the
state of the art in secure computation.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
1.1 Homomorphic Encryption
1.2 Order-Preserving Encryption
1.3 Prefix-Preserving Encryption
1.4 Overview and Contributions
1.5 Dissertation Layout
CHAPTER 2 LITERATURE SURVEY
2.1 Homomorphic Encryption
2.2 Order-Preserving Encryption
2.3 Prefix-Preserving Encryption
CHAPTER 3 SYSTEM MODEL
3.1 Single-user and Multiple-user Systems
3.2 Database Model
3.3 Basic Definitions of the Encryption Algorithms
3.4 Request and Response Protocols
3.5 Limitations of Database Encryption
3.6 Adversary Model
CHAPTER 4 HOMOMORPHIC ENCRYPTION
4.1 Preliminaries
4.2 The Homomorphic Encryption Scheme
4.3 Homomorphic Encryption in Multi-user Systems
4.4 Performance of Our Homomorphic Encryption Scheme
4.5 Summary
CHAPTER 5 ORDER-PRESERVING ENCRYPTION
5.1 Background
5.2 Security of OPE
5.3 The Limitation of the Ideal OPE Object
5.4 Generalized OPE
5.5 Overview of OPE to Multi-user Systems
5.6 The Basic DOPE Protocol for Multi-user Systems
5.7 The OE-DOPE Protocol for Multi-user Systems
5.8 Performance Study
5.9 Summary
CHAPTER 6 PREFIX-PRESERVING ENCRYPTION
6.1 Ideal PPE Object
6.2 Security of PPE
6.3 PPE for Multi-user Systems
6.4 Performance Study
6.5 Summary
CHAPTER 7 SUMMARY AND FUTURE RESEARCH
APPENDIX
A.1 Security Proof for OPE
A.2 Security Proof for PPE
REFERENCES
VITA
LIST OF TABLES
4.1 The performance of the communication protocol
5.1 Performance of Hyper and Poly OPE schemes
5.2 Comparisons of the basic-DOPE and OE-DOPE with Hyper OPE scheme
5.3 Comparisons of the basic-DOPE and OE-DOPE with Poly OPE scheme
5.4 Performances of the OE-DOPE for different q
6.1 Encryption Cost (in milliseconds) Comparisons for Different Protocols
LIST OF FIGURES
2.1 The PPE algorithm
3.1 Single-user and Multi-user Systems
4.1 Request processing protocol
4.2 The size of 𝑚 against the speed of addition and multiplication
5.1 DOPE scheme $\mathcal{S}_{p_{\lambda,\mu}}(\mathcal{K}_{p_{\lambda,\mu}}, \mathcal{E}_{p_{\lambda,\mu}}, \mathcal{D}_{p_{\lambda,\mu}})$
5.2 The pseudo code for the basic-DOPE protocol
5.3 The structure and message flow of the basic-DOPE protocol
5.4 The OE-DOPE protocol
5.5 Message Flow of the OE-DOPE protocol
6.1 The DLLCP attack
6.2 The Protocol 𝑃𝑑
6.3 The Reduction Algorithm RA
6.4 Computation Cost of Secret Sharing over Zp and G (Share Number m = 6)
6.5 Encryption Cost Comparisons for Different Protocols
A.1 Numerically Computed c' = z0 / log m Against m
CHAPTER 1
INTRODUCTION
For over a decade, it has been common practice for companies to outsource their online business
logic to Web hosting service providers. Generally, the databases as well as the business logic of
a company are hosted by a third party to save IT management time and cost. Cloud
computing pushes this paradigm further. There are many cloud-based data centers which
store very large amounts of data from different sources and support data-centric computation.
Security can be a major concern for such data centers when the data they host are sensitive. A
data center may be attacked and compromised. There is also the potential for insider attacks. If
there is a change in management, such as a reorganization or buyout [44], the potential threat
increases due to the additional exposure to multiple management personnel and the lack of
established policies regarding the handling of critical information in such situations.
The security problems with the outsourced databases can be solved if the critical data are
encrypted. Naturally it leads to the problem of how the data center can perform computation on
encrypted data [61]. Some general computations in data intensive systems include arithmetic
operations and search (exact match search and range search). Correspondingly, several secure
computation techniques, such as homomorphic encryption (HE) [14, 16, 27, 31, 47, 50, 59, 62,
65, 67, 71], order-preserving encryption (OPE) [3, 6, 12, 13, 38, 39, 53], and prefix-preserving
encryption (PPE) [4, 48, 78] are promising solutions to this problem.
HE allows computation (addition and multiplication) on the plaintexts to be performed directly
on the ciphertexts. In other words, 𝐻𝐸(𝑥 + 𝑦) and 𝐻𝐸(𝑥 ⋅ 𝑦) can be computed from
𝐻𝐸(x) and 𝐻𝐸(y) by publicly known algorithms, where 𝐻𝐸 is the HE algorithm and x and y
are two plaintexts. Hence, most arithmetic computations can be performed directly on the
ciphertexts without first decrypting them.
OPE preserves the order of the plaintexts, i.e., the encryption algorithm 𝑂𝑃𝐸 satisfies
x < y ⇔ 𝑂𝑃𝐸(x) < 𝑂𝑃𝐸(y) for any plaintexts x and y. Thus, range search queries can be processed
directly on the ciphertexts. OPE can facilitate exact-match search as well. Exact-match search is
not difficult to realize with deterministic encryption schemes [8, 9, 11], where the
encryption algorithm 𝐷𝐸 satisfies x = y ⇔ 𝐷𝐸(x) = 𝐷𝐸(y). Thus, the equality test of the
plaintexts x and y can be performed directly on the ciphertexts 𝐷𝐸(x) and 𝐷𝐸(y). But without
knowing the order of the data, it is difficult to implement efficient search algorithms unless
content-addressable memory is used [54]. Thus, it is beneficial to use OPE for exact-match
search queries as well. Although OPE reveals the ordering information of the data,
the ciphertext still does not directly reveal the full plaintext.
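The following is a minimal Python sketch of how a server could answer a range query using only order-preserving ciphertexts. The "key" here is a hypothetical secret lookup table of strictly increasing values, chosen for illustration only; it is not one of the OPE constructions cited above, and the names `toy_ope_keygen`, `toy_ope_encrypt`, and `server_range_search` are assumptions of this sketch.

```python
import bisect
import random

def toy_ope_keygen(domain_size, range_size, seed):
    """Toy key: a secret, strictly increasing table from plaintexts to ciphertexts."""
    rng = random.Random(seed)  # NOT cryptographically secure; illustration only
    return sorted(rng.sample(range(range_size), domain_size))

def toy_ope_encrypt(key, x):
    # Plaintext x (an index in the domain) maps to the x-th smallest table entry.
    return key[x]

def toy_ope_decrypt(key, c):
    # Binary search over the increasing table inverts the map.
    return bisect.bisect_left(key, c)

def server_range_search(sorted_ciphertexts, c_lo, c_hi):
    """Runs on the server, which sees only ciphertexts and never the key."""
    i = bisect.bisect_left(sorted_ciphertexts, c_lo)
    j = bisect.bisect_right(sorted_ciphertexts, c_hi)
    return sorted_ciphertexts[i:j]

key = toy_ope_keygen(domain_size=100, range_size=10_000, seed=7)
stored = sorted(toy_ope_encrypt(key, x) for x in [42, 7, 93, 15, 60, 8])

# The client asks for plaintexts in [8, 50] by encrypting only the two endpoints.
hits = server_range_search(stored, toy_ope_encrypt(key, 8), toy_ope_encrypt(key, 50))
result = [toy_ope_decrypt(key, c) for c in hits]
print(result)  # [8, 15, 42]
```

Because the table is strictly increasing, comparing ciphertexts gives the same answer as comparing plaintexts, which is all the binary search needs. The same property is what leaks ordering information to the server.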
PPE requires that the length of the longest common prefix of plaintexts x and y equal the length
of the longest common prefix of 𝑃𝑃𝐸(x) and 𝑃𝑃𝐸(y), where 𝑃𝑃𝐸 is the encryption algorithm.
Thus, the prefix-matching operation can be performed directly on the ciphertexts. PPE can also
support range search, since a range search can be transformed into prefix-matching searches [48].
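As an illustration of the prefix-preserving property, the sketch below encrypts a bit string by XOR-ing each bit with a pseudorandom bit that depends only on the key and on the bits preceding it. This bit-by-bit form is a common way to obtain prefix preservation; the use of HMAC-SHA256 as the pseudorandom function, the demo key, and the function names are assumptions of this sketch, not the scheme analyzed in this Dissertation.

```python
import hashlib
import hmac

def toy_ppe_encrypt(key: bytes, bits: str) -> str:
    """Encrypt a string of '0'/'1' characters so that common prefixes are preserved."""
    out = []
    for i, b in enumerate(bits):
        # The flip bit depends only on the key and on the preceding plaintext prefix.
        digest = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()
        flip = digest[0] & 1
        out.append(str(int(b) ^ flip))
    return "".join(out)

def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n

key = b"demo key (hypothetical)"
x, y = "10110010", "10111100"
cx, cy = toy_ppe_encrypt(key, x), toy_ppe_encrypt(key, y)
print(lcp_len(x, y), lcp_len(cx, cy))  # 4 4
```

Two plaintexts that agree on their first k bits use identical flip bits for positions 1 through k+1, so their ciphertexts agree on the first k bits and differ at bit k+1, exactly as the plaintexts do.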
In a dataset, different users may have different access rights to its data. In some cases, the
users in an organization may have the same privilege to access the entire dataset stored at an
external service provider. Thus, all users can be treated as a single user, and we call such
systems single-user systems. In single-user systems, the master encryption keys can be
distributed to all the users since there is no need to enforce any access control policies. However,
if different users have different access rights, then a user may not be able to access some
data that are readable or writable by another user. We call such systems multi-user systems.
In multi-user systems, the master encryption keys cannot be distributed to all the users; otherwise
the system will not be able to enforce access control [10, 69]. Also, the server can collude with
any one of the users to compromise the entire dataset. For classical encryption schemes, a
potential solution is to use different encryption keys for different data. But it may not be easy to
design an HE/OPE/PPE scheme to support computation (such as arithmetic computation and
range search) on data encrypted using different keys. Hence, some key management and secure
communication schemes have to be established to protect the master encryption key while
allowing multiple users (with different access rights) to encrypt and decrypt data that can be used
by the data center. In the following three sections, we summarize existing secure computation
techniques and their problems. We also discuss their deficiencies in handling multi-user systems.
In Section 1.4, we discuss our efforts in improving the existing secure computation approaches
and summarize our main contributions. Section 1.5 gives the layout of the remaining part of this
dissertation.
1.1 Homomorphic Encryption
Homomorphic encryption (HE) [14, 16, 27, 31, 47, 50, 59, 62, 65, 67, 71] can be public-
key based or symmetric-key based. It is a promising solution for allowing arithmetic operations to
be performed on encrypted data. In HE, the encryption algorithm $\mathcal{E}$ satisfies
$\mathcal{E}(x + y) = \mathcal{E}(x) \boxplus \mathcal{E}(y)$ and $\mathcal{E}(x \cdot y) = \mathcal{E}(x) \boxdot \mathcal{E}(y)$
for any plaintexts x and y, where $\boxplus$ and $\boxdot$ refer to two (special) operations on two
ciphertexts. It has long been known that “partial” HEs exist, which are homomorphic
with respect to one arithmetic operation (either addition or multiplication). For instance, Paillier
encryption [55] is a “partial” HE whose encryption algorithm is homomorphic with respect
to addition but not with respect to multiplication, and the well-known RSA [60] is
another “partial” HE whose encryption algorithm is homomorphic with respect to
multiplication but not with respect to addition. However, the problem of
constructing HEs with respect to both addition and multiplication (some of the literature uses the
term “fully” homomorphic encryption, denoted by FHE) remained open for decades. Polly
Cracker [27] is one of the earliest proposed HE algorithms, but unfortunately its security
has not been proved. Some approaches weaken the requirement of HE; e.g., the HE
in [14] allows only one multiplication operation, and the HE in [62] doubles the size of the
ciphertexts after each operation and hence allows only logarithmically many operations to be
performed on the ciphertexts. Recently, Gentry constructed the first such HE [31]. Since then, many
other constructions [16, 65, 67, 71] have followed. We call these constructions Boolean circuit
based HEs. They are based on different hard problems but share a similar design idea. In a Boolean
circuit based HE, the inputs are expressed as binary strings and the computation is represented
by a Boolean circuit accordingly. The encryption algorithm adds some noise to the plaintext
such that the decryption algorithm can successfully decrypt it if the noise is below some
bound. The binary operations in the circuit can be performed directly on the ciphertexts, but
the noise in the ciphertext accumulates as the computation continues. Thus, a
bootstrapping process is needed to decrease the noise in the intermediate ciphertexts and prevent it
from exceeding the bound. All existing Boolean circuit based HEs have high computation
complexity. Although efforts [16, 67] have been made to decrease the complexity, the
computation is still too expensive to be applied in practice [32, 67]. Also, existing HE
schemes only consider a single key, which is infeasible for a practical system with multiple
users. These problems are further discussed below.
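The two “partial” homomorphisms mentioned above can be checked numerically. The sketch below uses textbook RSA and textbook Paillier with deliberately tiny primes; the parameter values, the fixed randomizers, and the helper names are assumptions chosen for reproducibility, and nothing here is secure or specific to the scheme constructed in this Dissertation.

```python
import math

# Toy primes; real deployments use primes hundreds of digits long.
p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)

# --- Textbook RSA: multiplicatively homomorphic ---
e = 17
d = pow(e, -1, phi)            # modular inverse (Python 3.8+)

def rsa_enc(x):
    return pow(x, e, n)

def rsa_dec(c):
    return pow(c, d, n)

# --- Textbook Paillier: additively homomorphic ---
n2 = n * n
g = n + 1
lam = phi // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)

def paillier_enc(m, r):
    # r must be coprime to n; it is fixed here for reproducibility, random in practice.
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def paillier_dec(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Multiplying RSA ciphertexts multiplies the plaintexts (mod n).
rsa_product = rsa_dec(rsa_enc(7) * rsa_enc(11) % n)
# Multiplying Paillier ciphertexts adds the plaintexts (mod n).
paillier_sum = paillier_dec(paillier_enc(1200, 17) * paillier_enc(345, 23) % n2)
print(rsa_product, paillier_sum)  # 77 1545
```

Neither scheme supports the other operation on ciphertexts, which is the gap that schemes homomorphic in both addition and multiplication aim to close.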
Computation time: The existing Boolean circuit based HE schemes evaluate
functions by performing binary operations on the corresponding circuit. Moreover, they require
the bootstrapping technique to decrease the noise in the ciphertexts. Because of these two
factors, the existing HE schemes are too expensive to be used in real applications. For
example, Gentry’s homomorphic encryption scheme [31] requires more than 900 seconds to add
two 32-bit integers and more than 18 hours to multiply two 32-bit integers (based on the
performance data given in [32]). It is therefore desirable to improve the efficiency of HE
schemes.
Single key problem: Although in HE schemes, the database does not need to know the
decryption key and can perform computation on encrypted data, the decryption key is still
needed by the users in order to decrypt the retrieved data. The existing HE schemes implicitly
assume that the users know the decryption keys. But giving the decryption key to all the users
will prevent the system from achieving proper access control. Also, the server can collude with
any user to compromise the entire database. A potential solution is to use different encryption
and decryption keys for different data. But it may not be easy to design an HE scheme to support
computations on data that are encrypted using different encryption keys.
1.2 Order-Preserving Encryption
Order-preserving encryption (OPE) [3, 6, 12, 13, 38, 39, 53] is deterministic and symmetric-
key based. It requires that the ciphertexts preserve the order of the plaintexts, i.e.,
$x < y \iff \mathcal{E}(x) < \mathcal{E}(y)$
for any plaintexts x and y, where $\mathcal{E}$ is the OPE algorithm. Thus, range search can be performed
efficiently and directly on the encrypted data using conventional DBMS techniques, such as
building a B+ tree on the ciphertexts. OPE does not have perfect security, since the ciphertexts
leak the ordering information of the plaintexts. On the other hand, when it is desirable to
have reasonable performance for range query processing while achieving a reasonable degree
of security protection, an OPE scheme can be used as long as there is a good understanding of
its security risks. Unfortunately, the existing security analysis of OPE is not sufficient. Also, similar
to HE, existing OPE schemes only consider a single encryption key and can be impractical in
real systems. In the following, the problems in existing approaches are further elaborated.
Security analysis: The existing security analysis for OPE schemes is not sufficient.
Most works either prove security against author-defined attacks or illustrate
security based on experiments. The authors of [12] initiated the cryptographic study of OPE
schemes. They first define the ideal OPE object, where the encryption function is selected
uniformly at random from all order-preserving functions. Since the ideal OPE object is
computationally infeasible, they construct a real OPE scheme which is computationally
indistinguishable from the ideal OPE object. Thus, the real OPE scheme achieves the security
"implied" by the ideal OPE object. However, the security of the ideal object has not yet been
analyzed. If the security of the ideal OPE object is unacceptable, then the proof of
indistinguishability between the real OPE scheme and the ideal OPE object gives little
security assurance.
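To make the ideal OPE object concrete, the sketch below enumerates and samples order-preserving functions for a tiny domain and range. It relies on the standard observation that a strictly increasing map from a domain of size M into a range of size N is determined by its image, an M-element subset of the range, so there are C(N, M) such functions. The function names and the toy sizes are assumptions of this sketch.

```python
import itertools
import math
import random

def all_order_preserving(M, N):
    """Every strictly increasing map {0..M-1} -> {0..N-1}, as a tuple of images."""
    return list(itertools.combinations(range(N), M))

def sample_ideal_ope(M, N, rng):
    """Pick one order-preserving function uniformly at random (toy sizes only)."""
    return tuple(sorted(rng.sample(range(N), M)))

funcs = all_order_preserving(3, 6)
print(len(funcs), math.comb(6, 3))  # 20 20

f = sample_ideal_ope(3, 6, random.Random(1))  # not a cryptographic RNG
print(f in funcs)  # True

# For realistic sizes the function cannot be stored as an explicit table:
# a 32-bit domain would already need 2**32 table entries.
print(math.comb(2**16, 2**8).bit_length() > 1000)  # True
```

The last line hints at why a real scheme has to generate the function lazily from a short key instead of sampling and storing it, which is the role of the computationally indistinguishable construction.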
Single key problem: OPE and HE have a similar security problem when applied to multi-
user systems. Unlike HE (which can be public-key based or symmetric-key based), OPE is
symmetric-key based only. Thus, the same key is needed both to read data from and to write data
to the database. Conventional OPE schemes implicitly assume that the users know the master
encryption key. But as discussed in Section 1.1, it is not secure to let all users with different
access privileges know the key. Meanwhile, it may not be easy to design an OPE that supports
comparisons on data encrypted using different keys.
1.3 Prefix-Preserving Encryption
Prefix-preserving encryption (PPE) [4, 48, 78] is a deterministic symmetric-key based
encryption algorithm. The longest common prefix of any two ciphertexts has the same length as
the longest common prefix of the corresponding plaintexts. More formally, given any plaintexts
x and y,
$|\mathrm{LCP}(x, y)| = |\mathrm{LCP}(\mathcal{E}(x), \mathcal{E}(y))|$
where $\mathcal{E}$ is the PPE algorithm and LCP denotes the function that returns the longest
common prefix of its two arguments. Since a plaintext has a matching prefix with x if and only if
the corresponding ciphertext has a matching prefix with $\mathcal{E}(x)$, prefix-matching search can be
realized in logarithmic time if the ciphertexts are organized in a standard tree data structure.
Besides prefix-matching search, PPE can also support range search on encrypted data because
a range query on [a, b] can be transformed into at most $2\log_2 b - 1$ prefix-matching
queries [48]. Like OPE, PPE has weakened security, since some prefix information about the
plaintexts is leaked by the ciphertexts. Thus, security analysis of PPE becomes crucial. However,
the existing security analysis of PPE is not sufficient. Also, similar to OPE, existing PPE
schemes only consider a single encryption key, which is infeasible for a practical system. In the
following, the problems in existing approaches are further discussed.
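The transformation of a range query into prefix-matching queries can be sketched as follows. The greedy decomposition below is a standard technique for covering an integer range with aligned binary blocks; it is offered as an illustration and is not necessarily the exact procedure of [48]. The fixed bit width `w` and the function name are assumptions of this sketch.

```python
def range_to_prefixes(a: int, b: int, w: int) -> list:
    """Cover the integer range [a, b] (w-bit values) with binary prefixes.

    Each returned string p stands for "all w-bit values whose first len(p) bits are p".
    """
    prefixes = []
    while a <= b:
        # Grow the block starting at a while it stays aligned and inside [a, b].
        size = 1
        while a % (2 * size) == 0 and a + 2 * size - 1 <= b:
            size *= 2
        prefix_len = w - (size.bit_length() - 1)
        prefixes.append(format(a, f"0{w}b")[:prefix_len])
        a += size
    return prefixes

print(range_to_prefixes(3, 12, 4))  # ['0011', '01', '10', '1100']
```

With a PPE scheme, the client would encrypt each plaintext prefix and send the resulting ciphertext prefixes to the server, which then performs ordinary prefix matching on the stored ciphertexts. In this example, the ten values 3 through 12 are covered by four prefix queries.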
Security analysis: The existing security analysis for PPE schemes is not sufficient. Most of
the existing security proofs are either against the author-defined attacks, or based on
experiments. The authors in [4] initiate the cryptographic study of PPE schemes. Analogous to
the security analysis approach for OPE in [12], they first define the ideal PPE object where the
encryption function is uniformly randomly selected from all prefix-preserving functions, and
construct the real PPE scheme which is computationally indistinguishable from the ideal PPE
object. Thus, the real PPE scheme achieves the security "implied" by the ideal PPE object.
However, such approach has the same problem as that of OPE: the security of the ideal PPE
objects has not yet been analyzed and, hence, the security analysis for PPE schemes is not
complete.
Single key problem: It is similar to the single key problem for OPE as discussed in
Section 1.2.
1.4 Overview and Contributions
In this Dissertation, we attempt to overcome some of the limitations on computation on
encrypted data (HE/OPE/PPE) in the existing works: We construct an efficient (non-circuit
based) HE scheme and prove its security, analyze the security of OPE and PPE schemes, and
develop mechanisms for HE/OPE/PPE to extend them to multi-user systems. We further
elaborate our contributions in this Dissertation in the following.
The contributions on HE
We construct a non-circuit based encryption scheme that is homomorphic in both
addition and multiplication. We downgrade the security requirement to achieve
efficiency. Although the algorithm is not semantically secure, we have proved that
when facing an adversary with up to 𝑚 ln poly(𝜆) chosen plaintext and ciphertext pairs,
the security of our algorithm is equivalent to the large integer factorization problem.
Here, 𝑚 is any predetermined constant that is polynomial in the security parameter 𝜆.
Note that the security of the commonly used RSA encryption is no harder than the large
integer factorization problem [60]. Thus, our homomorphic encryption scheme can be
used in applications where semantic security is not required and one-wayness security
is sufficient.
We conduct experiments to compare the performance of our algorithm with that of
Gentry's algorithm. When configured to withstand an attack with at least 1000 chosen plaintext and
ciphertext pairs, our algorithm performs an addition in only a tenth of a millisecond and
a multiplication in 108 milliseconds. In contrast, Gentry’s homomorphic encryption
scheme [31] requires more than 900 seconds to add two 32-bit integers and more than
18 hours to multiply two 32-bit integers (based on the performance data given in [32]).
As can be seen, our algorithm has real-world applicability, especially when large-
scale plaintext attacks are not an issue.
We consider multi-user systems and propose a protocol based on similarity transformation to
allow our symmetric-key based homomorphic encryption scheme to be used in such
systems. In the request protocol, the secret data in the query are encrypted under the
user's distinct key and then transformed by the database server into ciphertexts under the
common master key by means of the similarity transformation. The response protocol is similar
to the request protocol but proceeds in reverse.
The contributions on OPE
We prove the one-wayness security of the ideal OPE object to complete the security
analysis of the OPE constructed in [12] (A similar result is also given in [13] after our
work was published as a technical report and submitted to conferences). According to
the result, the real OPE schemes which are computationally indistinguishable from the
ideal OPE object (e.g. the real OPE scheme constructed in [12]) also achieve the one-
wayness security.
We show that the ideal OPE object is not the highest possible secure OPE when the
plaintext domain contains two elements. It raises the question of how to construct more
secure OPE for general plaintext domains. We then present two generalized OPE
(GOPE) algorithms that satisfy stronger notions of security than the ideal OPE object.
We develop protocols to support multi-user data-centric systems where any OPE can be
applied to protect the sensitive data that need to be searched in encrypted form. The
digit-based OPE (DOPE) protocol is devised to turn any OPE into a distributed
OPE. Accordingly, the encryption key is distributed to a group of key agents so
that they can jointly encrypt the data while no single entity knows the key. The oblivious
encryption (OE) protocol, based on the oblivious transfer concept, is also proposed to
further enhance the security of DOPE protocol. We prove that the OE-DOPE protocol
achieves the one-wayness security if the underlying OPE has the one-wayness security.
Experiments are conducted to show that our protocols have reasonable overheads.
The contributions on PPE
We successfully define a security notion, IND-PCPA, to exactly qualify the security of
PPE. Specifically, we design the DLLCP attack to show that it is necessary to weaken
the security notion from IND-CPA to IND-PCPA for the ideal PPE object. We then
prove that the ideal PPE object is secure under IND-PCPA. Thus the PPE schemes
which are computationally indistinguishable from the ideal PPE object achieve the
highest security level IND-PCPA.
We develop a distributed PPE protocol to support the multi-user systems, making PPE
feasible for practical use. The major invention in this protocol is the distributed PPE
encryption by a group of key agents. We cryptographically prove the security of our
protocol by defining an ideal model for PPE protocols and showing that our PPE
protocol is computationally indistinguishable from the ideal model. Experiments are
conducted to study the performance of the protocol, showing that our protocols have
reasonable overheads.
1.5 Dissertation Layout
The rest of this dissertation is organized as follows. First, a thorough literature survey is
given in Chapter 2. Specifically, it discusses the state-of-the-art technologies and research works
in distributed storage system, HE, OPE, and PPE.
In Chapter 3, we introduce our system model including the single-user and multi-user
systems, database model, various encryption schemes, request and response protocols,
limitations of database encryption, and adversary model.
In Chapter 4, we design a non-circuit based homomorphic encryption scheme and extend it
to multi-user systems. We analyze the security of our algorithm and conduct the experiments to
compare the performance of our algorithm with the existing ones.
In Chapter 5, we study the security of OPE schemes, propose and construct generalized
OPE (GOPE), and extend OPE to multi-user systems. First, we prove that the ideal OPE object
achieves the one-wayness security and, hence, the real OPE schemes which are computationally
indistinguishable from the ideal OPE object also achieve the one-wayness security. Then we
show that the ideal OPE object may not be the most secure OPE and, hence, propose and
construct GOPE to improve the security of OPE. In order to extend OPE to multi-user systems,
we develop digit based OPE (DOPE) which can be based on any OPE, the corresponding basic
DOPE protocol, and further improve the security of DOPE protocol to OE-DOPE protocol by the
techniques including OE (oblivious encryption), vector permutation, and data mutation.
In Chapter 6, we study the security of PPE schemes and extend PPE to multi-user systems.
We first invent the security notion IND-PCPA and prove it qualifies the security of the ideal PPE
object. Thus, the real PPE schemes which are computationally indistinguishable from the ideal
PPE object are also secure under IND-PCPA. Then we revise an existing PPE (secure under
IND-PCPA) to the “distributed” version so that it can be extended to multi-user systems.
In Chapter 7, we conclude the PhD research and discuss some future research directions.
CHAPTER 2
LITERATURE SURVEY
We consider secure computation as the capabilities of performing computation on encrypted or
secret shared data. The literature of secure computation includes multi-party computation [18,
19, 34, 80], homomorphic encryption, order-preserving encryption, and prefix-preserving
encryption.
In multi-party computation, each data item is mapped to n shares, which are distributed to n servers.
The data item can be reconstructed from any t (< n) shares, but any t − 1 shares reveal no information
about the original data. The servers can execute the multi-party computation protocol on the
shares to achieve addition and/or multiplication of any two data items. During both the storage time
and the computation time, the adversary cannot retrieve any information about the data even if it
compromises t − 1 servers. Generally, computation on secret-shared data cannot be done without
information exchanges between the servers holding the shares. Thus, communication cost can be
a concern in multi-party computation.
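A minimal sketch of (t, n) threshold sharing, using Shamir's polynomial scheme as one common instantiation, is given below. The prime modulus, the seeded random generator, and the function names are assumptions made for reproducibility; a real system would use a cryptographically secure source of randomness.

```python
import random

P = 2**61 - 1  # a Mersenne prime, used as the field modulus in this sketch

def share(secret, t, n, rng):
    """Split `secret` into n shares such that any t of them reconstruct it."""
    # Random polynomial of degree t - 1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at 0 over the field of integers modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

rng = random.Random(2012)  # illustration only; not a secure RNG
t, n = 3, 5
a_shares = share(1000, t, n, rng)
b_shares = share(234, t, n, rng)

# Each server adds its two shares locally; no communication is needed for addition.
sum_shares = [(x, (ya + yb) % P) for (x, ya), (_, yb) in zip(a_shares, b_shares)]

print(reconstruct(a_shares[:t]), reconstruct(sum_shares[1:4]))  # 1000 1234
```

Addition works share by share because the sum of two degree t − 1 polynomials is again of degree t − 1. Multiplying shares pointwise doubles the polynomial degree, so the servers must interact to reduce it, which is where the communication cost of multi-party computation arises.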
In homomorphic encryption, the data is encrypted and the computation (addition and
multiplication) on any two data can be directly performed on the ciphertexts. For order-
preserving encryption, the comparison operation on any two data can be directly performed on
the ciphertexts. For prefix-preserving encryption, the prefix matching operation on any two data
can be directly performed on the ciphertexts. In this Dissertation we focus on the secure
computation based on homomorphic encryption, order-preserving encryption, and prefix-preserving
encryption. In the following subsections, existing works on homomorphic encryption,
order-preserving encryption, and prefix-preserving encryption are discussed.
2.1 Homomorphic Encryption
Homomorphic encryption (HE) enables the arithmetic computation (addition and
multiplication) on the plaintexts to be directly performed on the ciphertexts. The main works on
HE algorithms are Boolean circuit based, where the plaintext is a single bit. All operations on
various operand types can then be achieved by constructing the corresponding circuits. In [14],
an encryption scheme based on elliptic curve is proposed, which allows computation on the
ciphertexts directly if the computation involves at most one multiplication with any number of
additions. In [62], a homomorphic encryption scheme has been constructed. This scheme doubles
the ciphertext for each binary operation. In [31], Gentry designs an HE scheme based on the
mathematical object ideal lattices and uses the bootstrapping technique to clean the noise in the
ciphertexts. It is semantically secure and the security of the scheme is based on the splitkey
distinguishing problem. However, the computational complexity of the scheme is $\tilde{O}(\lambda^6)$ for
evaluating a gate over two bits, where $\lambda$ is the security parameter and
$\tilde{O}(g(x)) = O(g(x) \log^k g(x))$ for some $k$. van Dijk et al. conceptually simplify Gentry’s
construction by using a different hard problem, the approximate-GCD problem over
integers [71]. The authors in [67] improve the computational complexity of HE in [31] from
$\tilde{O}(\lambda^6)$ to $\tilde{O}(\lambda^3)$, and the computational complexity of HE in [71] from $\tilde{O}(\lambda^{17})$ to $\tilde{O}(\lambda^{7.25})$. The
authors in [16] further improve the computational complexity of [31] to $\tilde{O}(\lambda^2)$. In [65], a
different HE scheme is constructed based on the elementary theory of algebraic number fields.
However, all of these existing HE schemes have high time complexities [32, 67].
Another approach to the constructions of HE is non-circuit based. The idea is to construct
HE algorithms with plaintexts over a finite domain such as finite field where the addition and
multiplication can be performed directly on the ciphertexts. Compared to circuit-based
approaches, this approach can be more efficient since it does not require additional circuit
computation overhead. In [27], the HE algorithm called Polly Cracker is proposed, which
encrypts a plaintext over a field by adding a random polynomial that vanishes under operations
using the technique of Gröbner bases. Following Fellows and Koblitz’s work, many non-circuit
based HE algorithms using Gröbner bases have been proposed [47] [50] [59]; however, they
have all either been broken [17] [26] [28] [42] [68] or lack conclusive security evidence. In [5],
Armknecht et al. construct a symmetric-key homomorphic encryption scheme based on coding
theory, where the plaintext is encrypted to a b-dimensional codeword (vector). It is semantically
secure against b−1 known plaintext attacks if the Decisional Synchronized Codeword Problem
(DSCP) is hard. However, the scheme can only support a pre-determined (fixed) number of
multiplications.
2.2 Order-Preserving Encryption
Order preserving encryption (OPE) [3, 6, 12, 13, 39, 53] is a very important technique for
database related applications due to its capability of supporting range query processing [4, 15,
41, 48, 64, 66] directly on encrypted data without needing to decrypt them and expose them to
potential attackers who may have compromised the system.
There are various constructions of the OPE scheme. In [6], the proposed OPE algorithm
first generates a sequence of random numbers (r1,…,rn,…) and then encrypts an integer x to the
sum of the first x random numbers (i.e. $E(x) = \sum_{1 \le i \le x} r_i$). In [39], a sequence of strictly increasing
polynomial functions (f1,…,fn) are used to construct the OPE algorithm. The encryption of an
integer x is the outcome of the iterative operations of those functions on x (i.e. E(x) =
(f1○…○fn)(x)). In [39], the OPE algorithm is constructed by using a mapping function composed
of a partition and an identification functions. The partition function divides the plaintext domain
into multiple partitions, and the identification function assigns an ordered identifier (integer) to
each partition. Then, the mapping function maps the plaintext x to the identifier of the partition it
belongs to. Since different integers may be mapped to the same identifier, the OPE algorithm
may output false comparison results. In [3], the authors construct the OPE algorithm following
three steps: modeling the input and target distributions, flattening the plaintext database into a
flat database, and transforming the flat database into the cipher database.
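As an illustration of the first construction above (encrypting x as the sum of the first x random numbers), the following is a minimal sketch. It assumes that the random numbers are strictly positive and are derived from a secret key with HMAC-SHA256; both the derivation and the function names are assumptions of this example rather than details taken from [6].

```python
import hmac
import hashlib

def _r(key: bytes, i: int) -> int:
    """i-th pseudorandom positive increment derived from the key (assumed derivation)."""
    digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return 1 + int.from_bytes(digest[:4], "big")  # always >= 1, so sums strictly increase

def ope_encrypt(key: bytes, x: int) -> int:
    """E(x) = r_1 + ... + r_x for a non-negative integer x (time linear in x)."""
    return sum(_r(key, i) for i in range(1, x + 1))

def ope_decrypt(key: bytes, c: int) -> int:
    """Recover x by accumulating increments until the running sum reaches c."""
    total, x = 0, 0
    while total < c:
        x += 1
        total += _r(key, x)
    if total != c:
        raise ValueError("not a valid ciphertext under this key")
    return x
```

Because every increment is positive, x1 < x2 implies E(x1) < E(x2), so comparisons can be evaluated on ciphertexts. The linear cost in x makes this sketch practical only for small plaintext domains.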
OPE does not have perfect security since the ciphertexts leak the ordering information of
the plaintexts. On the other hand, when it is desirable to have a reasonable performance for
range query processing while achieving a reasonable degree of security protection, the OPE
scheme can be used as long as there is a good understanding of its security risks. However, how
secure the OPE scheme is has not been sufficiently analyzed, and further research is needed to
investigate its security properties. Some partial security analysis has been performed on some
OPE algorithms. In [3], the authors construct an OPE scheme and analyze its security, but the
analysis has some limitations: (1) It assumes that the adversaries can only view ciphertexts.
(2) The analysis is not based on cryptographic analysis, but based on experiments, i.e., they use
Kolmogorov-Smirnov test to show that the distribution of the ciphertexts and the target
distribution cannot be distinguished. The authors in [12] initiate the cryptographic study of OPE
schemes. They first define the security notion IND-OCPA where the adversary can query the
left-or-right encryption oracle with ordered plaintext pairs. An encryption scheme is secure under
IND-OCPA if the advantage of an efficient adversary (probability to distinguish whether the
returned ciphertexts are encrypted from the left or the right plaintexts) is negligible. It shows that
the OPE scheme is susceptible to the big jump attack, and cannot be secure under IND-OCPA
unless its ciphertext-space is exponential in the size of the plaintext-space. Consequently, there is
no efficient OPE scheme that is secure under IND-OCPA for superpolynomial-sized domains.
Then the paper takes an alternative approach: it defines the security notion POPF-CCA and
constructs an OPE scheme that is secure under POPF-CCA. In POPF-CCA, an “ideal” OPE
object is defined where the encryption function is uniformly randomly selected from all order-
preserving functions. For plaintext domain [m] = {i | 1 ≤ i ≤ m} and ciphertext range [n] = {j | 1 ≤
j ≤ n}, with, for example, $n = m^2$ and $m = \Omega(2^{\lambda})$ where λ is the security parameter, it is computationally
infeasible to generate the encryption function of the ideal OPE object since doing so requires
generating exponentially many (w.r.t. λ) random bits. Thus, the ideal OPE object is used as the
security goal and a “real” OPE scheme is said to be secure under POPF-CCA if it is
computationally indistinguishable from the ideal OPE object. In [12], two real OPE schemes are
constructed, where a plaintext x is mapped to its ciphertext by a “binary-search-like” process in
the ciphertext space (plaintext space) with the searched points being mapped back to the
plaintext (ciphertext) space using the hypergeometric distribution (negative hypergeometric
distribution). More specifically, let the plaintext domain be [mi] and the ciphertext range be [ni]
in step i. The middle point $y_i \in [n_i]$ ($x_i \in [m_i]$) will be mapped to $x_i \in [m_i]$ ($y_i \in [n_i]$) with
the probability

$$\binom{y_i}{x_i} \binom{n_i - y_i}{m_i - x_i} \binom{n_i}{m_i}^{-1} \qquad \left( \binom{y_i - 1}{x_i - 1} \binom{n_i - y_i}{m_i - x_i} \binom{n_i}{m_i}^{-1} \right).$$

It has been proved
in [12] that the real OPE scheme is computationally indistinguishable from the ideal OPE object. In
other words, the real OPE scheme is secure under POPF-CCA.
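The two probabilities above can be checked numerically. The sketch below evaluates the hypergeometric and negative hypergeometric probability mass functions with exact rational arithmetic, under the assumption that a random order-preserving function from [m] to [n] corresponds to a uniformly random m-element subset of [n]. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def hgd_pmf(x: int, y: int, m: int, n: int) -> Fraction:
    """Probability that exactly x of the m plaintexts map to ciphertexts <= y."""
    if x < 0 or x > y or m - x < 0 or m - x > n - y:
        return Fraction(0)
    return Fraction(comb(y, x) * comb(n - y, m - x), comb(n, m))

def nhgd_pmf(y: int, x: int, m: int, n: int) -> Fraction:
    """Probability that the x-th plaintext (1 <= x <= m) maps to ciphertext y."""
    if y < x or m - x > n - y:
        return Fraction(0)
    return Fraction(comb(y - 1, x - 1) * comb(n - y, m - x), comb(n, m))
```

Summing `hgd_pmf` over all x for a fixed y, or `nhgd_pmf` over all y for a fixed x, gives exactly 1, which confirms that each expression defines a probability distribution.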
However, while in [12], the authors reduce the security of real OPE scheme to the security
of the ideal OPE object, they do not analyze the security of the ideal OPE object. As an obvious
counter example, the ideal object is not secure when 𝑛 = 𝑚. Indeed, there exists no secure OPE
scheme when 𝑛 = 𝑚 because the encryption algorithm is necessarily the identity function.
In [12], the authors left open the questions of how to measure the security of ideal OPE object. In
[13, 77], it has been shown that the ideal OPE object achieves one-wayness security and, hence,
the real OPE schemes which are computationally indistinguishable from the ideal OPE object
(e.g. the construction in [12]) also achieve one-wayness security. In [13] the authors also
generalize the concept of OPE to EOE (efficient orderable encryption), where the ciphertexts are
allowed to be non-numerical data objects so that a dedicated comparison algorithm is needed to
compare the ciphertexts. Then a “committed” EOE is constructed with the assumption that the
database is static and completely known to the user in advance of encryption so that the user can
encrypt the database once and for all. The constructed “committed” EOE is proved to be secure under
IND-OCPA.
2.3 Prefix-Preserving Encryption
Prefix-preserving encryption (PPE) [4, 48, 78] is a special encryption such that the longest
common prefix of any two ciphertexts is of the same length as the longest common prefix of the
corresponding plaintexts. Such property enables PPE to support IP addresses anonymization,
prefix-matching search or even range search on ciphertexts.
PPE was first proposed in [78] for securely processing real-world Internet traffic traces
without disclosing the IP addresses in them. Since the private information regarding the senders
and receivers of packets may be inferred from the trace, it is highly desirable for the traffic trace
owners to anonymize the IP addresses before making them publicly available for research (e.g.,
routing performance analysis, or clustering of end-systems). However, the classical encryption
algorithms (e.g. AES) will destroy the prefix relationships among the IP addresses which are
important information for the research. Hence the authors in [78] construct a PPE to anonymize
the IP addresses. It is constructed bit by bit, where the i-th bit of the ciphertext is constructed by
applying an instantiating function to the previous i−1 bits of the plaintext to preserve the prefix
consistency. Specifically, let $x = x_1 \ldots x_l \in \{0,1\}^l$ be the plaintext and $y = y_1 \ldots y_l \in \{0,1\}^l$ be the
corresponding ciphertext. Then

$$y_i = x_i \oplus L(R(x_1, \ldots, x_{i-1}, k))$$

for 1 ≤ i ≤ l, where “⊕” denotes the XOR (exclusive or) operator, L denotes the least significant
bit operator, R can be any pseudorandom function (L○R is called the instantiating function), and
k is the encryption key.
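The bit-by-bit construction can be sketched as follows. HMAC-SHA256 stands in for the pseudorandom function R, which is an assumption of this example (the construction only requires some pseudorandom function), and bit strings are represented as Python strings of '0' and '1' for readability.

```python
import hmac
import hashlib

def _flip_bit(key: bytes, prefix_bits: str) -> int:
    """L(R(x_1..x_{i-1}, k)): least significant bit of a keyed PRF over the prefix."""
    digest = hmac.new(key, prefix_bits.encode("ascii"), hashlib.sha256).digest()
    return digest[-1] & 1

def ppe_encrypt(key: bytes, x: str) -> str:
    """y_i = x_i XOR L(R(x_1..x_{i-1}, k)) for every bit position i."""
    return "".join(str(int(b) ^ _flip_bit(key, x[:i])) for i, b in enumerate(x))

def ppe_decrypt(key: bytes, y: str) -> str:
    """Recover the plaintext left to right; each bit needs the plaintext prefix before it."""
    x = ""
    for bit in y:
        x += str(int(bit) ^ _flip_bit(key, x))
    return x

def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two bit strings."""
    n = 0
    for p, q in zip(a, b):
        if p != q:
            break
        n += 1
    return n
```

Two plaintexts that agree on their first j bits receive the same flip bits for positions 1 through j+1, so their ciphertexts agree on exactly the same number of leading bits.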
In [4], the authors designed a PPE to support secure processing of prefix-matching queries
(such as searching area-code starting with 310), where the prefix is generalized to a sequence of
blocks (e.g. 64 bits or 4 UTF-16 characters) instead of a sequence of bits in [78]. The
construction is shown in Figure 2.1. Let m = m[1]…m[l] be the plaintext partitioned into l blocks
and C = C[1]…C[l] be the ciphertext partitioned into l blocks. Each block of the ciphertext is
constructed iteratively from the plaintext by a block cipher E with the keys eK and eK’, and a
hash function H with the key hK.
    C[0] ← 0^n; m[0] ← 0^n;
    For i = 1,…,l do
        R ← m[i−1] || C[i−1];
        P[i] ← H(hK, R) ⊕ m[i];
        C[i] ← E(eK, P[i]) ⊕ H(hK, R);
    R ← m[l] || C[l];
    P[l+1] ← H(hK, R) ⊕ 0^n;
    C[l+1] ← E(eK’, P[l+1]) ⊕ H(hK, R);
    Return C[1]…C[l+1];
Figure 2.1. The PPE algorithm.
To search for matching entries with a prefix x, the system encrypts x into $\mathcal{E}(x)$ and uses $\mathcal{E}(x)$ as the
prefix to perform prefix matching on the ciphertexts. Since the plaintext has a matching prefix of
x if and only if the corresponding ciphertext has a matching prefix of $\mathcal{E}(x)$, the prefix-matching
computation can be achieved in logarithmic time if the ciphertexts are organized in some
standard tree data structures.
In [48], the authors suggested that PPE constructed in [78] can also be used to support
range search on encrypted data. For example, to search for all data in the interval [32, 111] =
[00100000, 01101111], the query can be transformed into prefix-matching queries for prefixes
{001*, 010*, 0110*} where * denotes an arbitrary suffix. Generally, for a given range [a, b], the
range query can be transformed into at most $2\log_2 b - 1$ prefix-matching queries.
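The range-to-prefix transformation can be sketched with a greedy decomposition that repeatedly takes the largest aligned block starting at the lower end of the remaining range. The function name and the fixed bit width are assumptions of this example; it operates on plaintext bounds only, before any encryption is applied.

```python
def range_to_prefixes(a: int, b: int, width: int) -> list:
    """Cover the integer range [a, b] with binary prefixes over `width`-bit values."""
    prefixes = []
    while a <= b:
        # Grow the block while it stays aligned at `a` and inside [a, b].
        size = 1
        while (a % (size * 2) == 0
               and a + size * 2 - 1 <= b
               and size * 2 <= 2 ** width):
            size *= 2
        wildcard_bits = size.bit_length() - 1  # number of free low-order bits
        prefixes.append(format(a, "0{}b".format(width))[:width - wildcard_bits])
        a += size
    return prefixes
```

For the interval [32, 111] over 8-bit values this returns the prefixes 001, 010 and 0110, matching the example above. Each returned prefix would then be encrypted and used in a separate prefix-matching query.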
Like OPE, the existing work does not offer sufficient security analysis of the PPE schemes.
Most of the existing security analyses of the PPE schemes are informal: either they prove the
security of the PPE schemes against the author-defined attacks, or they illustrate the security of
the PPE schemes based on experiments. The authors in [4] initiate the cryptographic study of
PPE scheme, where the security notion is defined based on the ideal PPE object (which is
analogous to OPE). The ideal PPE object is a special PPE such that the encryption function is
uniformly randomly selected from all prefix-preserving functions. Although the ideal PPE object
cannot be constructed efficiently, it is used as the security goal. A real PPE scheme is defined to
be “secure” if it is computationally indistinguishable from the ideal PPE object. According to
this security definition, the authors proved that their real PPE construction (in Figure 2.1) is
“secure” (i.e. computationally indistinguishable from the ideal PPE object). In fact, the authors
in [78] have also proved that their PPE scheme ($y_i = x_i \oplus L(R(x_1, \ldots, x_{i-1}, k))$) is computationally
indistinguishable from the ideal PPE object, except that they did not use cryptographic
terminology. Unfortunately, the current cryptographic security analyses of PPE schemes are not
complete since no existing work analyzes the security of the ideal PPE object.
CHAPTER 3
SYSTEM MODEL
In order to protect the database system against potential attacks, the data stored in the database
need to be encrypted so that the adversary cannot retrieve the data even if the database is
compromised. Various secure computation schemes (HE, OPE, PPE) can be used to better
protect the data so that computation can be performed directly on encrypted data without needing
to decrypt them. Thus, the database server does not need to hold the encryption keys, greatly
enhancing system security. Since data are used differently in SQL queries, they can be encrypted
differently to facilitate different types of computations. In a database system, the types of
computations that may be performed on the data are generally attribute-dependent, i.e., all the
data under a certain attribute (the same column of a table) generally have the same types of
computations on them. Thus we classify the attributes based on the potential computations on
them and determine the encryption scheme for each attribute accordingly. In Section 3.2, we
discuss how to classify data attributes and how to select the corresponding encryption schemes.
Then, in Section 3.3, we define some general notations for the HE, OPE, and PPE algorithms and
the specific properties they have to satisfy. To facilitate security analysis of the secure
computation schemes (HE, OPE, and PPE), we define the adversary model including the types of
attacks and the adversary structure, in Section 3.6.
When data are encrypted by HE, OPE, PPE, etc., the server can perform computation
without needing to know the encryption keys. But the keys are still needed in order to decrypt the
data (in some cases, the users also need the keys to encrypt the data). However, it is not always
possible to let all the users have the encryption keys. We consider two types of user models, the
single-user and the multi-user systems. In single-user systems, all users (who come from the same
organization) have the same access privilege to the whole database. Thus, all users can be treated
as the same single user, and the encryption keys can be distributed to all users. In multi-user
systems, different users have different access privileges to different parts of the database and,
hence, the encryption key should not be given to the users. In Section 3.1 we discuss the detailed
single-user and multi-user models and how to manage the encryption keys in multi-user systems.
In an encrypted database, users can still send query requests to the database server and receive
the corresponding responses. But unlike in a conventional database, in an encrypted database the (plain)
queries have to be transformed into a suitable (encrypted) form. In Section 3.4, we show how to
transform the queries in the encrypted database. After the server receives the encrypted queries,
the queries will be processed directly under the encrypted forms, and the (encrypted) results will
be sent back to the users. However, current secure computation schemes have limitations for
query processing and these limitations are discussed in Section 3.5.
3.1 Single-user and Multiple-user Systems
We consider two types of systems based on the types of users, including single-user and
multi-user systems. In a single-user system, all users have the same privilege in accessing the
database, i.e., every user can access the entire database. Thus, the users can share the same key
without security concerns. Also, all the users are treated as the same single user. In multi-user
systems, users have different access privileges to the data stored on the server. Let DB denote the
server and U = {Uj | 1 ≤ j ≤ u} denote the set of users. Note that single-user systems refer to the
case of u = 1 and multi-user systems refer to the case of u > 1. As discussed in Chapter 1, applying
HE/OPE/PPE to multi-user systems raises the single key problem. In order to solve the
problem, we assume that in multi-user systems a group of key agents in the set KA = {KAj | 1 ≤ j
≤ v} are deployed between the users and the DB to manage the encryption keys and mediate the
communication.
Figure 3.1. Single-user and Multi-user Systems.
In a single-user system, the user and DB will authenticate each other before the user can
access the stored data. In multi-user systems, the key agents will validate the access rights of
each user. Additionally, they are responsible to relay the communication between the user and
DB, and transform the data in the messages based on various developed protocols. We assume
that any two entities in the system are connected by a public communication channel (e.g. the
internet), and the communications are protected by conventional techniques such as encryption,
authentication, digital signature, and public-key infrastructure (PKI).
3.2 Database Model
Without loss of generality, we assume that the server hosts a relational database with the
schema R(A1,…,An) and the set of attributes A = {Ai |1≤i ≤n} in both single-user and multi-
user systems. The data in R need to be encrypted to protect their security. Since different
attributes may be used differently in SQL queries [40], they should be encrypted by different
algorithms to allow query processing in encrypted form. A typical SQL query has the following
form
SELECT f(Ai)
FROM R
WHERE u < Aj < v AND Ak = w;
where f denotes an aggregation function such as SUM, AVG, etc. As can be seen, the data can be
used in two different ways: (1) arithmetic computation (such as computing the function f on Ai),
(2) range search (such as searching on Aj for data in between u and v) and exact match search
(such as searching on Ak for the data with key w). Some data attributes are simply stored and
retrieved without being operated on. Correspondingly, different encryption schemes are
considered for different types of data attributes: the homomorphic encryption (HE) scheme is
used for arithmetic computation attributes, the order-preserving/prefix-preserving encryption
(OPE/PPE) scheme is used for range search and exact match search attributes, and the
probabilistic encryption (PE) scheme is used for no operation attributes.
3.3 Basic Definitions of the Encryption Algorithms
In the previous section, we have introduced different attributes and the corresponding
encryption schemes required for each type of data attribute. Here, we give the basic definitions
for the encryption schemes, including HE, OPE, PPE, CDE and CPE.
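An HE scheme allows the arithmetic operations to be performed directly on encrypted
data. Let $(\mathcal{K}_{HE}, \mathcal{E}_{HE}, \mathcal{D}_{HE})$ denote an encryption scheme, where $\mathcal{K}_{HE}$ is the corresponding key
generation algorithm, and $\mathcal{E}_{HE}$ and $\mathcal{D}_{HE}$ are the encryption and decryption algorithms. We
present the formal definition of HE scheme $\mathcal{S}_{HE} = (\mathcal{K}_{HE}, \mathcal{E}_{HE}, \mathcal{D}_{HE}, \mathcal{H}_{HE})$ as follows.
Definition 3.2.1: Let $\mathcal{S}_{HE} = (\mathcal{K}_{HE}, \mathcal{E}_{HE}, \mathcal{D}_{HE}, \mathcal{H}_{HE})$ be a homomorphic encryption
scheme. $\mathcal{K}_{HE}$, $\mathcal{E}_{HE}$, and $\mathcal{D}_{HE}$ form an encryption scheme $(\mathcal{K}_{HE}, \mathcal{E}_{HE}, \mathcal{D}_{HE})$ with $\mathcal{K}_{HE}$ being the
corresponding key generation algorithm, and $\mathcal{E}_{HE}$ and $\mathcal{D}_{HE}$ being the encryption and decryption
algorithms such that

$$\mathcal{D}_{HE}(\mathcal{E}_{HE}(x, k), k) = x$$

for any plaintext x and key k. $\mathcal{H}_{HE}$ is a polynomial time algorithm such that

$$\mathcal{D}_{HE}\big(\mathcal{H}_{HE}(\mathcal{E}_{HE}(o_1, k), \ldots, \mathcal{E}_{HE}(o_l, k), f(x_1, \ldots, x_l)), k\big) = f(o_1, \ldots, o_l)$$

for any plaintexts $o_1, \ldots, o_l$ and any polynomial function $f(x_1, \ldots, x_l)$, where $\mathcal{E}_{HE}(o_i, k)$ are the
original ciphertexts, 1 ≤ i ≤ l, and $\mathcal{H}_{HE}(\mathcal{E}_{HE}(o_1, k), \ldots, \mathcal{E}_{HE}(o_l, k), f(x_1, \ldots, x_l))$ is the computed
ciphertext.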
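To make the homomorphic property concrete, the sketch below uses a toy version of the well-known Paillier cryptosystem. This is a swapped-in example and not the scheme developed in this Dissertation: Paillier is public-key and probabilistic, and it supports only addition of plaintexts and multiplication by a known constant, not arbitrary polynomial functions f. The tiny primes and all function names are assumptions chosen for illustration and offer no security.

```python
import math
import random

def keygen(p: int = 1009, q: int = 1013):
    """Toy key generation from two small primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # inverse of lam mod n; valid for generator g = n + 1
    return n, (lam, mu)

def encrypt(n: int, m: int, rng=random) -> int:
    """c = (n+1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n: int, sk, c: int) -> int:
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add(n: int, c1: int, c2: int) -> int:
    """Ciphertext of m1 + m2, computed without the secret key."""
    return (c1 * c2) % (n * n)

def mul_const(n: int, c: int, a: int) -> int:
    """Ciphertext of a * m for a publicly known constant a."""
    return pow(c, a, n * n)
```

Here `add` and `mul_const` play the role of the evaluation algorithm for the restricted class of linear functions: decrypting their output yields the same value as applying the function to the plaintexts.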
An OPE scheme preserves the order of the plaintexts so that it allows comparisons to be
performed directly on encrypted data. We present the formal definition of OPE scheme
$\mathcal{S}_{OPE} = (\mathcal{K}_{OPE}, \mathcal{E}_{OPE}, \mathcal{D}_{OPE})$ as follows.
Definition 3.2.2 (OPE Scheme [12]): Suppose that $\mathcal{S}_{OPE} = (\mathcal{K}_{OPE}, \mathcal{E}_{OPE}, \mathcal{D}_{OPE})$ is a
deterministic symmetric-key encryption scheme, where $\mathcal{K}_{OPE}: \{0,1\}^* \to \{0,1\}^*$ is a key
generation algorithm, $\mathcal{E}_{OPE}: [m] \times \{0,1\}^* \to [n]$ is a deterministic symmetric-key encryption
algorithm, and $\mathcal{D}_{OPE}: [n] \times \{0,1\}^* \to [m]$ is a decryption algorithm such that

$$\mathcal{D}_{OPE}(\mathcal{E}_{OPE}(x, k), k) = x$$

for any plaintext x and key k. We say that $\mathcal{S}_{OPE}$ is an OPE scheme if $\mathcal{E}_{OPE}$ satisfies the “order-preserving property”:

$$x_1 < x_2 \Rightarrow \mathcal{E}_{OPE}(x_1, k) < \mathcal{E}_{OPE}(x_2, k)$$

for any $x_1, x_2 \in [m]$ and key k.
A PPE algorithm has the prefix-preserving property: the longest common prefix of any two
ciphertexts is of the same length as the longest common prefix of the corresponding plaintexts.
Assume that the plaintexts and ciphertexts are in $\{0,1\}^l$, where $\{0,1\}^l$ denotes the set of binary strings of
length l. Let LCP(x1, x2) denote the longest common prefix function which returns the longest
common prefix of two binary strings x1 and x2, and |LCP(x1, x2)| denote the length of LCP(x1, x2).
Then the PPE scheme $\mathcal{S}_{PPE} = (\mathcal{K}_{PPE}, \mathcal{E}_{PPE}, \mathcal{D}_{PPE})$ can be defined as follows.
Definition 3.2.3 (PPE Scheme [78]): A PPE scheme $\mathcal{S}_{PPE} = (\mathcal{K}_{PPE}, \mathcal{E}_{PPE}, \mathcal{D}_{PPE})$ is a
deterministic symmetric-key encryption scheme, where $\mathcal{K}_{PPE}: \{0,1\}^* \to \{0,1\}^*$ is a key
generation algorithm, $\mathcal{E}_{PPE}: \{0,1\}^l \times \{0,1\}^* \to \{0,1\}^l$ is a deterministic symmetric-key encryption
algorithm, and $\mathcal{D}_{PPE}: \{0,1\}^l \times \{0,1\}^* \to \{0,1\}^l$ is a decryption algorithm such that

$$\mathcal{D}_{PPE}(\mathcal{E}_{PPE}(x, k), k) = x$$

for any plaintext x and key k. The encryption algorithm $\mathcal{E}_{PPE}$ satisfies the “prefix-preserving”
property:

$$|LCP(x_1, x_2)| = |LCP(\mathcal{E}_{PPE}(x_1, k), \mathcal{E}_{PPE}(x_2, k))|$$

for any $x_1, x_2 \in \{0,1\}^l$ and key k.
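Let $\mathcal{S}_{PE} = (\mathcal{K}_{PE}, \mathcal{E}_{PE}, \mathcal{D}_{PE})$ denote the probabilistic encryption scheme, where $\mathcal{K}_{PE}$ is the
key generation algorithm, $\mathcal{E}_{PE}$ is the probabilistic encryption algorithm, and $\mathcal{D}_{PE}$ is the
decryption algorithm. Probabilistic encryption schemes are well studied and have efficient constructions in the existing
literature [35, 37, 46].
To protect the data stored on the DB, the relational database R(A1, …, An) will be
encrypted to $R(A_1^{\mathcal{E}_1}, \ldots, A_n^{\mathcal{E}_n})$, where $A_i^{\mathcal{E}_i}$ denotes that attribute Ai is encrypted by $\mathcal{E}_i$, which is selected
from $\mathcal{E}_{HE}$, $\mathcal{E}_{OPE}$, $\mathcal{E}_{PPE}$, and $\mathcal{E}_{PE}$, 1 ≤ i ≤ n.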
3.4 Request and Response Protocols
In the query request-and-response process, users send SQL queries to DB, the queries are
processed by DB, and the results are sent back to users. In single-user systems, the user holds all
the encryption keys. When the user wants to send a request, e.g. the SQL query in Section 3.2, to
DB, he/she will first authenticate himself/herself to DB by any conventional authentication
mechanism. Then the user will encrypt the data in the predicate (in the WHERE clause).
Specifically, the range search condition “u < Aj < v” will be encrypted to “$\mathcal{E}_{OPE}(u) < A_j^{\mathcal{E}_{OPE}} < \mathcal{E}_{OPE}(v)$”
and the exact match search condition “Ak = w” will be encrypted to “$A_k^{\mathcal{E}_{OPE}} = \mathcal{E}_{OPE}(w)$”.
The transformed SQL query
SELECT f($A_i^{\mathcal{E}_{HE}}$)
FROM R
WHERE $\mathcal{E}_{OPE}(u)$ < $A_j^{\mathcal{E}_{OPE}}$ < $\mathcal{E}_{OPE}(v)$ AND $A_k^{\mathcal{E}_{OPE}}$ = $\mathcal{E}_{OPE}(w)$;
will be sent to DB. DB will process the query directly on the encrypted data: it first selects all
tuples t satisfying $\mathcal{E}_{OPE}(u) < t(A_j^{\mathcal{E}_{OPE}}) < \mathcal{E}_{OPE}(v)$ and $t(A_k^{\mathcal{E}_{OPE}}) = \mathcal{E}_{OPE}(w)$, then performs f on the
data $t(A_i^{\mathcal{E}_{HE}})$, and sends back the response with the encrypted query result $f(t(A_i^{\mathcal{E}_{HE}}))$. Finally, the
user decrypts the response to get the query result f(t(Ai)).
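The predicate transformation in the single-user request protocol can be sketched as follows. The `toy_ope` function is a stand-in order-preserving map invented for this example (it is neither secure nor one of the OPE schemes discussed in this Dissertation), the table layout is hypothetical, and the homomorphic aggregation step f is omitted; the sketch only shows that the server can evaluate the WHERE clause on ciphertexts without holding the key.

```python
import hmac
import hashlib

def toy_ope(key: bytes, x: int) -> int:
    """Toy order-preserving map for non-negative ints: x*1000 plus a keyed offset in [0, 999]."""
    d = hmac.new(key, str(x).encode("ascii"), hashlib.sha256).digest()
    return x * 1000 + int.from_bytes(d[:2], "big") % 1000

def encrypt_predicate(key: bytes, u: int, v: int, w: int):
    """Client side: encrypt the constants of the predicate  u < Aj < v AND Ak = w."""
    return toy_ope(key, u), toy_ope(key, v), toy_ope(key, w)

def server_select(rows, enc_u: int, enc_v: int, enc_w: int):
    """Server side: evaluate the predicate on ciphertexts only; no key is needed."""
    return [r for r in rows if enc_u < r["Aj"] < enc_v and r["Ak"] == enc_w]
```

A client would encrypt each stored value with the same key before upload, send the three encrypted constants with the query, and decrypt whatever the server returns.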
In multi-user systems, the classical deterministic/probabilistic encryption keys will be
distributed to the authorized users. Thus, the authorized users can access the data belonging to
the exact match search and no operation attributes, just like the situation of single-user systems.
However, as discussed in Chapter 1, it is not secure to distribute the encryption keys of
HE/OPE/PPE to users. Instead, they will be distributed to the key agents such that the key agents
will serve as the mediators between the users and DB. The details of the key distribution, request
protocol, and response protocol of HE/OPE/PPE for multi-user systems will be introduced in
Chapters 4, 5 and 6.
3.5 Limitations of Database Encryption
In the relational database R(A1,…,An), it is possible that both the computation and search
operations may be performed on some attribute Ai,1≤i ≤n. Then the encryption of Ai becomes
more difficult. The ideal solution is to encrypt Ai by an encryption algorithm that is both
homomorphic and order-preserving. Unfortunately, such encryption can be very hard to achieve.
To support both arithmetic and comparison operations, Ai can be encrypted by both HE and OPE
algorithms. In other words, the encrypted database becomes $R(\ldots, A_i^{\mathcal{E}_{HE}}, A_i^{\mathcal{E}_{OPE}}, \ldots)$. However,
such a solution may not work fully. When an arithmetic operation is performed on data with
attribute Ai, the results (if to be stored back to DB) can only be in $A_i^{\mathcal{E}_{HE}}$, not $A_i^{\mathcal{E}_{OPE}}$.
3.6 Adversary Model
There could be internal or external attackers against the systems. For example, the multi-
user systems’ entities such as users, DB, and key agents, may collude to acquire additional
information that they are not authorized to access. Or an external attacker may eavesdrop on the
communications (this type of attack will not succeed since the communications are protected by
conventional techniques (Section 3.1)) or even compromise some system entities to acquire
information. We unify the possible attacking situations to the probabilistic polynomial time
(PPT) adversary 𝒜 who tries to compromise some entities in the system. If some entities are
compromised, 𝒜 may control these entities. Thus, 𝒜 can either follow the protocols (called
passive adversary) or deviate from the protocols (called active adversary). Generally, the active
adversary can be coped with by more complicated mechanisms based on the secure mechanisms
against the passive adversary [35, 36]. We therefore consider passive adversary in this
Dissertation and leave the active adversary to the future work.
For single-user systems, if the adversary 𝒜 compromises the user, then 𝒜 can retrieve the
identity and the encryption keys to access all the data stored on the server DB. Note that such
attack cannot be prevented. Hence, it suffices to consider the security of single-user systems
under the situation where 𝒜 compromises DB. If it happens, 𝒜 can view all the encrypted data
stored on DB. Thus, the security of single-user systems is equivalent to that of the encryption
algorithms $\mathcal{E}_{HE}$, $\mathcal{E}_{OPE}$, $\mathcal{E}_{PPE}$, and $\mathcal{E}_{CE}$. Since the security of the classical encryption algorithm has
been studied in many existing works, we will discuss the security of HE, OPE, and PPE in
Chapters 4, 5, and 6, respectively.
For multi-user systems, we assume that the adversary structure (the collection of all the
sets of entities which 𝒜 may compromise) is
$$Z = \{\, U_{\mathcal{A}} \cup KA_{\mathcal{A}},\ U_{\mathcal{A}} \cup KA_{\mathcal{A}} \cup \{DB\} \mid U_{\mathcal{A}} \subset U,\ KA_{\mathcal{A}} \subset KA \,\},$$
where $U_{\mathcal{A}}$ is the set of compromised users and $KA_{\mathcal{A}}$ is the set of compromised key agents (note
that $U_{\mathcal{A}}$ and $KA_{\mathcal{A}}$ could be empty). If 𝒜 compromises some users, then 𝒜 can retrieve their
identities to access the corresponding data stored on the server DB. Therefore, the security of the
system lies in whether 𝒜 can gain information about the data of the authorized users in $U - U_{\mathcal{A}}$.
For the security of $R(A_1^{\mathcal{E}_1}, \ldots, A_n^{\mathcal{E}_n})$, it suffices to consider the security of each $A_i^{\mathcal{E}_i}$, 1 ≤ i ≤ n. We
will discuss the security of HE, OPE, and PPE for multi-user systems in Chapters 4, 5, and 6,
respectively.
CHAPTER 4
HOMOMORPHIC ENCRYPTION PROTOCOL
A homomorphic encryption (HE) scheme allows the operations on the plaintexts to be directly
performed on the ciphertexts. Consequently, the clients can encrypt their critical data by HE and
outsource the corresponding ciphertexts to the storage servers such that their data can be
processed by the servers without decrypting them. Over the last three decades, the problem of
HE has been studied extensively. The main works on HE algorithms are circuit-based.
Unfortunately, all of the existing circuit-based HE schemes have high time complexities.
The non-circuit based HE problem is still an open one. The idea is to construct HE
algorithms with plaintexts over a finite domain such as finite field. Compared to circuit-based
approaches, this approach can be more efficient since it does not require additional circuit
computation and bootstrapping. However, existing non-circuit based HE algorithms have all
either been broken or lack conclusive security evidence.
We construct an efficient non-circuit based encryption scheme that is homomorphic in both
addition and multiplication in this chapter. In Section 4.1, we prove a few preliminary lemmas
related to our algorithm and define the concept of the HE scheme. In Section 4.2 we construct
the HE scheme, which is based on eigenvalues and eigenvectors of matrices. We prove that when
facing an adversary with up to $m \ln \mathrm{poly}(\lambda)$ chosen plaintext and ciphertext pairs, the security of
our algorithm is equivalent to the large integer factorization problem. Here, $m$ is any
predetermined constant that is polynomial in the security parameter $\lambda$. Note that breaking
the commonly used RSA encryption is no harder than the large integer factorization
problem [60]. Thus, our HE scheme can be used in applications where semantic security is not
required and one-wayness security is sufficient. In Section 4.3 we extend the encryption scheme
to multi-user systems by considering multiple user keys and establishing the corresponding
request/response communication protocol. The data stored on the database server are encrypted
using HE with a “master key”. Different user keys are assigned to different users.
Correspondingly, the server holds a matching key with respect to each user key. When sending a
request to the server, user Ci encrypts the secret data in the request using the user key, and sends the
ciphertext with the request to the server. The server transforms the encryption key from the user
key to the “master key” by the similarity transform on the ciphertext using the matching key.
Similarly, when the server sends a response to user Ci, the ciphertext is transformed by the
similarity transform using the matching key and sent with the response to Ci. Ci decrypts it with
the user key to obtain the plaintext d. To avoid collusion of the user and the server to reconstruct the master
key, one or more agents in between Ci and the server using the same key transformation
technique can be introduced to enhance system security. The real-world performance of our
algorithm and the communication protocols in multi-user systems is evaluated in Section 4.4.
Finally we summarize this chapter in Section 4.5.
4.1 Preliminaries
4.1.1 The Ring 𝒁𝑵
In our homomorphic encryption scheme, all computations are performed in the ring of
integers 𝑍𝑁 . Let 𝜆 denote the security parameter and let poly(𝜆) denote some fixed polynomial
in 𝜆. We construct 𝑁 as follows. First, 2𝑚 prime numbers 𝑝𝑖 and 𝑞𝑖 of size 𝜆/2 bits are chosen,
where 𝑚 ln poly(𝜆) is the number of plaintext attacks which the algorithm can withstand. Then
let $f_i = p_i q_i$ and $N = \prod_{i=1}^{m} f_i$. In the following, we show that such an $N$ can be constructed in
polynomial time. Specifically, we show that it is possible to find 2𝑚 prime numbers of 𝜆/2 bits
for some given 𝑚 and 𝜆. Note that 𝑚 is required to be polynomial in 𝜆 to ensure that there are
enough primes of length 𝜆/2 bits.
Lemma 4.1.1: Given $m$ and $\lambda$, where $m = O(\mathrm{poly}(\lambda))$ (more precisely, $m \ll 2^{(\lambda-4)/2}$), it is
possible to obtain $2m$ prime numbers of length $\lambda/2$ bits in polynomial time.
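Proof. By the Prime Number Theorem, there are approximately $x / \ln x$ prime numbers
$p \le x$. Consider primes of length $k$ bits. There are
$$\frac{2^k}{\ln 2^k} - \frac{2^{k-1}}{\ln 2^{k-1}} \approx \frac{1}{\ln 2}\left(\frac{2^k}{k} - \frac{2^{k-1}}{k-1}\right) = \frac{1}{\ln 2}\cdot\frac{(k-1)2^k - k2^{k-1}}{k(k-1)} = \frac{1}{\ln 2}\cdot\frac{2k2^{k-1} - 2^k - k2^{k-1}}{k(k-1)} = \frac{1}{\ln 2}\cdot\frac{k2^{k-1} - 2^k}{k(k-1)} = \frac{1}{\ln 2}\cdot\frac{k-2}{k(k-1)}\,2^{k-1}$$
such primes. Since we need to find $2m$ primes, at any point there are at least
$\frac{1}{\ln 2}\cdot\frac{k-2}{k(k-1)}\,2^{k-1} - 2m$ primes left of length $k$ bits. Note that there are
$2^k - 2^{k-1} = 2^{k-1}$ integers of length $k$ bits, so at any point the probability that a random
number chosen is prime is at least
$$\frac{\frac{1}{\ln 2}\cdot\frac{k-2}{k(k-1)}\,2^{k-1} - 2m}{2^{k-1}} = \frac{1}{\ln 2}\cdot\frac{k-2}{k(k-1)} - \frac{2m}{2^{k-1}}.$$
For $k = \lambda/2$ this becomes
$$\frac{1}{\ln 2}\cdot\frac{\frac{\lambda}{2}-2}{\frac{\lambda}{2}\left(\frac{\lambda}{2}-1\right)} - \frac{2m}{2^{\lambda/2-1}} = \frac{1}{\ln 2}\cdot\frac{2\lambda-8}{\lambda(\lambda-2)} - \frac{m}{2^{(\lambda-4)/2}}.$$
If $m$ is polynomial in $\lambda$ (i.e. $m \ll 2^{(\lambda-4)/2}$), then this probability is nonnegligible. Since it is possible to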
check whether a number is prime in polynomial time, each of the primes can be found in
polynomial time. Since 𝑚 is polynomial in 𝜆, the number of primes we must find is polynomial
in 𝜆, so the time complexity of the algorithm is polynomial in 𝜆.
Theoretically, for $\lambda = 1024$, $m$ is only bounded by $m \ll 2^{(1024-4)/2} = 2^{510}$. Therefore, for
practical purposes, 𝑚 can be chosen to be arbitrarily large.
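The construction in Lemma 4.1.1 can be sketched as rejection sampling with a primality test. The following is a minimal illustration only, not the implementation used in this dissertation: the function names (`is_probable_prime`, `random_prime`, `build_modulus`) are hypothetical, and the Miller-Rabin test is an assumed choice of polynomial-time primality check.

```python
import secrets
from math import prod

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_probable_prime(n, rounds=40):
    """Miller-Rabin test (assumed choice; any polynomial-time test would do)."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2       # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                        # a witnesses that n is composite
    return True

def random_prime(bits):
    """Rejection-sample an odd candidate with exactly `bits` bits until it is prime."""
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

def build_modulus(m, lam):
    """Return ([f_1, ..., f_m], N) with f_i = p_i * q_i and N = f_1 * ... * f_m."""
    primes = []
    while len(primes) < 2 * m:                  # 2m distinct primes of lam/2 bits
        p = random_prime(lam // 2)
        if p not in primes:
            primes.append(p)
    factors = [primes[2 * i] * primes[2 * i + 1] for i in range(m)]
    return factors, prod(factors)
```

For example, `build_modulus(3, 32)` returns three factors of roughly 32 bits each together with their product; realistic parameters would use a much larger security parameter.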
Next, we show that factoring any of the 𝑓𝑖 is infeasible if large integer factorization
(factoring numbers of the form 𝑓 = 𝑝𝑞 for large primes 𝑝 and 𝑞) is infeasible.
Lemma 4.1.2: Suppose that large integer factorization is infeasible. Let 𝑓𝑖 = 𝑝𝑖𝑞𝑖 for large
primes $p_i$ and $q_i$, where $1 \le i \le m$. Then given $F = \{f_i\}_{i=1}^{m}$, it is infeasible to factor any of the
𝑓𝑖 ∈ 𝐹, i.e. there does not exist a PPT (probabilistic polynomial time) algorithm that can return a
factor of 𝑓𝑖 for any 1 ≤ 𝑖 ≤ 𝑚 with nonnegligible probability.
Proof. Assume that a PPT algorithm $A_1$ exists that, given a set of large integers
$F = \{f_i\}_{i=1}^{m}$, can randomly factor one of the $f_i$ with some probability $p'$. Let $A_2$ be an algorithm to
factor some large integer $f = pq$ for primes $p$ and $q$ as follows. First construct the set
$F = \{f_1, \ldots, f_{m-1}, f_m = f\}$ where $f_i$ is random for $1 \le i \le m-1$, and then run $A_1$ on this set $F$, and
return the result. Note that $A_1$ successfully factors one of the $f_i$ with probability $p'$, and the
probability that $i = m$ is $\frac{1}{m}$. Thus $A_2$ is successful with nonnegligible probability $\frac{p'}{m}$. Clearly $A_2$ is
a PPT algorithm, completing the proof.
4.1.2 Matrices over 𝒁𝑵
In the algorithm, we need to randomly choose an invertible matrix $k \in GL_4(Z_N)$. Lemma
4.1.3 demonstrates that an arbitrary matrix over $Z_N$ is likely to be invertible, so that it is possible
to choose $k$ in polynomial time. For convenience, in the next lemma, we define $N = \prod_{i=1}^{m} p_i$,
since our results do not depend on the prime factors of $N$ coming in pairs.
Lemma 4.1.3: Let $N = \prod_{i=1}^{m} p_i$, where the $p_i$ are prime and $m = O(\mathrm{poly}(\lambda))$. A random
matrix $T \in M_4(Z_N)$ is invertible with a high probability.
Proof. Note that an $l \times l$ matrix $T' = (\alpha_1, \ldots, \alpha_l)^t$ over a field $Z_p$, where $t$ denotes the
transpose of a matrix and $\alpha_i \in Z_p^{\,l}$, is not invertible if and only if $\exists\, c_i \in Z_p$ with $c_{i_0} \ne 0$ for
some $i_0$, s.t. $\sum_{i=1}^{l} c_i \alpha_i = 0$. The condition is equivalent to $\alpha_{i_0} = \sum_{i \ne i_0} c_i' \alpha_i$, where $c_i' = -c_{i_0}^{-1} c_i$.
Since $T'$ has $l^2$ entries, there are in total $p^{l^2}$ possibilities for $T'$. Since $i \ne i_0$, there are a total of
$p^{l-1}$ possibilities for the $c_i'$, and $p^{l^2-l}$ possibilities for the $\alpha_i$. Hence, given $T'$, the probability that
$\alpha_{i_0} = \sum_{i \ne i_0} c_i' \alpha_i$ is $\frac{p^{l-1}\,p^{l^2-l}}{p^{l^2}} = \frac{1}{p}$. Note that $1 \le i_0 \le l$, so $T'$ is not invertible with probability at
most $\frac{l}{p}$.
We show that a matrix $T$ over the ring $Z_N$ is invertible if and only if it is invertible over
every field $Z_{p_i}$, where $1 \le i \le m$. Note that if $T$ is invertible over $Z_N$, then we can construct
$T^{-1} \bmod p_i$. Since $TT^{-1} = I$ (the identity matrix), $TT^{-1} = I \bmod p_i$, so $T^{-1} \bmod p_i$ is the
inverse of $T \bmod p_i$, so $T$ is invertible over each field $Z_{p_i}$. Now assume that $T$ is invertible over
each field $Z_{p_i}$ with inverse $S_i$, and consider the set of linear congruences $S = S_i \bmod p_i$, which
has a unique solution over $Z_N$ by the Chinese Remainder Theorem. Then we have the set of
linear congruences $TS = TS_i \bmod p_i = I \bmod p_i$. But $I$ is clearly a solution to these
congruences, and by the Chinese Remainder Theorem, this solution is unique over $Z_N$, so $TS = I$,
so $T$ is invertible with inverse $S$. As shown above, this means that $T \in M_4(Z_N)$ is not
invertible with negligible probability
$$1 - \prod_{i=1}^{m}\left(1 - \frac{4}{p_i}\right) \le \sum_{i=1}^{m}\frac{4}{p_i} \le \frac{4m}{\min_{1 \le i \le m} p_i}.$$
This completes the proof.
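Lemma 4.1.3 suggests choosing the key by rejection sampling. The sketch below is illustrative only and relies on the standard fact, assumed here rather than taken from the text, that a square matrix over $Z_N$ is invertible exactly when its determinant is a unit modulo $N$. The helper names `det_mod` and `random_invertible_matrix` are hypothetical.

```python
import secrets
from math import gcd

def det_mod(M, N):
    """Determinant of a square matrix modulo N by cofactor expansion along row 0."""
    if len(M) == 1:
        return M[0][0] % N
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det_mod(minor, N)
    return total % N

def random_invertible_matrix(N, size=4):
    """Sample uniformly from M_size(Z_N) until the determinant is a unit mod N."""
    while True:
        T = [[secrets.randbelow(N) for _ in range(size)] for _ in range(size)]
        if gcd(det_mod(T, N), N) == 1:
            return T
```

With large prime factors almost every sample is accepted, in line with the bound $4m / \min_i p_i$; for a toy modulus such as 210 several retries are typically needed.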
4.2 The Homomorphic Encryption Scheme
In this section, we present a fully homomorphic encryption scheme consisting of a key
generation algorithm $\mathcal{K}$, an encryption algorithm $\mathcal{E}$, a decryption algorithm $\mathcal{D}$, and a
computation algorithm, and prove that the scheme is secure under the assumption that it is
infeasible to factor large integers of the form $f = pq$ for large primes $p$ and $q$. In Subsection 4.2.1
we introduce the idea of the design. In Subsection 4.2.2, we discuss the encryption, decryption,
and key generation algorithms $(\mathcal{K}, \mathcal{E}, \mathcal{D})$. Subsection 4.2.3 proves the security of $(\mathcal{K}, \mathcal{E}, \mathcal{D})$ and
Subsection 4.2.4 derives the time complexity of the encryption and decryption algorithms.
Subsection 4.2.5 gives the computation algorithm and proves its correctness. The complexity of
the computation algorithm is derived in Subsection 4.2.6.
4.2.1 Design Concept
We first discuss the design idea of our homomorphic encryption scheme. We start from
Rabin’s encryption algorithm. Given a plaintext $x$, the encryption algorithm is
$$\mathcal{E}(x) = x^2 \bmod N$$
where $N = f = pq$. Although Rabin’s encryption algorithm is homomorphic in multiplication,
it is not homomorphic in addition. Thus, we can try to generalize the ciphertext domain from $Z_N$
to the ring of matrices over $Z_N$. In particular, consider the encryption algorithm
$$\mathcal{E}_1(x) = \begin{pmatrix} x & 0 \\ 0 & r \end{pmatrix} \bmod N.$$
The homomorphic properties in addition and multiplication can be easily verified.
However, since $x$ is the eigenvalue of the eigenvector $v_{1,0} = (1,0)^t$, the adversary can easily
recover $x$ given the ciphertext by solving the linear equation system $\mathcal{E}_1(x)\,v_{1,0} = x\,v_{1,0}$.
To cope with the problem above, we apply a randomly selected similarity transform $k$ to
$\begin{pmatrix} x & 0 \\ 0 & r \end{pmatrix}$, which is the ciphertext in $\mathcal{E}_1$ and which we call the pre-transformed ciphertext from now on. Note
that the eigenvector corresponding to $x$ is also transformed by $k$. As a result, the encryption
algorithm becomes
$$\mathcal{E}_2(x, k) = k^{-1}\begin{pmatrix} x & 0 \\ 0 & r \end{pmatrix} k \bmod N$$
where $k$ is a randomly selected $2 \times 2$ invertible matrix. We give some informal reasoning as to
why such an algorithm should be secure. Note that the similarity transformation transforms the
eigenvector of $x$ from $v_{1,0}$ to $k^{-1}v_{1,0}$. Since the adversary does not know the key $k$, he/she does
not know the transformed eigenvector, so he/she cannot establish the linear equation system to
obtain the plaintext. Also, although the adversary can derive the characteristic equation
$$\det\left(zI - \mathcal{E}_2(x, k)\right) = 0 \bmod N \iff z^2 - (x + r)z + xr = 0 \bmod N,$$
it is infeasible for the adversary to solve the quadratic equation, because this is equivalent to the
factorization of $N$ (the security of Rabin’s encryption algorithm is also based on the infeasibility
of solving quadratic equations in $Z_N$).
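Unfortunately, the encryption algorithm $\mathcal{E}_2$ cannot resist the chosen plaintext attack.
Suppose that the adversary gets the plaintext and ciphertext pair
$$\left(x,\ \mathcal{E}_2(x, k) = k^{-1}\begin{pmatrix} x & 0 \\ 0 & r \end{pmatrix} k\right).$$
Then the adversary can establish the equation $(\mathcal{E}_2(x, k) - xI)\,v = 0$, and derive the
transformed eigenvector because
$$(\mathcal{E}_2(x, k) - xI)\,v = 0 \;\Rightarrow\; k^{-1}\left(\begin{pmatrix} x & 0 \\ 0 & r \end{pmatrix} - xI\right) kv = 0 \;\Rightarrow\; \begin{pmatrix} 0 & 0 \\ 0 & r - x \end{pmatrix} kv = 0 \;\Rightarrow\; kv = v_{1,0} \;\Rightarrow\; v = k^{-1}v_{1,0}.$$
Consequently, the adversary can solve for $y$, if given an additional ciphertext
$\mathcal{E}_2(y, k) = k^{-1}\begin{pmatrix} y & 0 \\ 0 & r \end{pmatrix} k \bmod N$, by solving the linear equation system $\mathcal{E}_2(y, k)\,v = y\,v$.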
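The chosen plaintext attack on the $2 \times 2$ scheme can be replayed on toy numbers. The sketch below is illustrative only: the modulus $N = 77$, the key, the plaintexts, and the random values are made-up, and the helper names are hypothetical. It assumes that the first coordinate of the recovered eigenvector is invertible modulo $N$, which holds for the values chosen here.

```python
def matmul(A, B, N):
    """Matrix product modulo N."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) % N
             for j in range(len(B[0]))] for i in range(len(A))]

def encrypt2(x, r, k, k_inv, N):
    """Toy 2x2 scheme: k^{-1} diag(x, r) k mod N."""
    return matmul(matmul(k_inv, [[x, 0], [0, r]], N), k, N)

def kernel_vector(x, C, N):
    """A vector v with (C - xI) v = 0. For a singular 2x2 matrix whose first
    row is (a, b), the vector (b, -a) is annihilated by both rows."""
    a, b = (C[0][0] - x) % N, C[0][1] % N
    return [b, (-a) % N]

def recover_plaintext(v, C_y, N):
    """Solve C_y v = y v for y from the first coordinate (v[0] must be a unit)."""
    w0 = (C_y[0][0] * v[0] + C_y[0][1] * v[1]) % N
    return w0 * pow(v[0], -1, N) % N

N = 77                                  # toy modulus 7 * 11
k = [[2, 1], [1, 1]]                    # toy key with determinant 1
k_inv = [[1, N - 1], [N - 1, 2]]        # its inverse modulo N

known_x = 5
C_x = encrypt2(known_x, 20, k, k_inv, N)    # one known plaintext/ciphertext pair
v = kernel_vector(known_x, C_x, N)          # transformed eigenvector, up to scaling

C_y = encrypt2(9, 30, k, k_inv, N)          # a fresh ciphertext of an unknown plaintext
recovered = recover_plaintext(v, C_y, N)    # equals the hidden plaintext 9
```

A single known pair suffices because every ciphertext shares the same transformed eigenvector, which is exactly the weakness the higher-dimensional design is meant to remove.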
One remedy is to use different keys to encrypt different plaintexts, but then the
homomorphic properties in addition and multiplication are lost. Also, increasing the number of
primes in N cannot improve security since the attack does not depend on the number of primes in
N.
To improve the encryption algorithm so that it can withstand the chosen plaintext attack,
we associate the eigenvalue $x$ with two eigenvectors $v_1$ and $v_2$ instead of one. We choose the
same $v_1$ for all plaintexts so that the homomorphic properties in addition and multiplication are
guaranteed. On the other hand, $v_2$ cannot be the same for all plaintexts; otherwise, any linear
combination of $v_1$ and $v_2$ is an eigenvector of all plaintexts, and the scheme becomes the same as $\mathcal{E}_2$. Instead,
we randomly choose $v_2$ following a probability distribution $D$ (to be given). Note that since $x$ is
associated with two eigenvectors and there are choices in $v_2$, we need to work with matrices of a
higher dimension. For example, we can have $v_1 = (1,0,0,0)^t$ and $v_2$ can be randomly chosen
between $(0,1,0,0)^t$ and $(0,0,1,0)^t$ following $D$.
When encrypting a plaintext, we construct the pre-transformed ciphertext $C$ with $v_1$ and $v_2$
as eigenvectors of $C$ corresponding to eigenvalue $x$. Then we perform a similarity transformation
with key $k$. Let $\mathcal{E}_3$ denote this encryption scheme.
Now, the adversary has to derive the transformed eigenvector 𝑘−1 𝑣1 in order to
compromise the encryption scheme. Suppose that the adversary gets a single plaintext and
ciphertext pair. Then he/she cannot derive 𝑘−1 𝑣1 because any linear combination of 𝑘−1𝑣1 and
𝑘−1 𝑣2 is an eigenvector of 𝐶 with eigenvalue x.
Next consider the case where the adversary gets two pairs of plaintexts and ciphertexts. If
$v_2 = v_1$, then, same as above, the adversary cannot derive $v_1$. If $v_2 \ne v_1$, then the probability of
success with which the adversary can derive $k^{-1}v_1$ after an attack with $m'$ chosen plaintexts
follows a probability distribution related to $D$. Without loss of generality, let $p_{m'}$ denote the
probability for the adversary to derive $v_1$ after an attack with $m'$ chosen plaintext and ciphertext
pairs. To improve the strength of encryption, we want to apply $\mathcal{E}_3$ multiple times. More
precisely, let $N = \prod_{i=1}^{m} f_i$, where $f_i = p_i q_i$. For each $i$, we apply $\mathcal{E}_3$ to encrypt the plaintext $x$
over $Z_{f_i}$. Then, the ciphertext over $Z_N$ is obtained by applying the Chinese Remainder Theorem
to the individual encryption results over $Z_{f_i}$, for all $i$. Let $\mathcal{E}_4$ denote this new encryption scheme.
Now, the adversary has to derive the corresponding eigenvectors over all $Z_{f_i}$ in order to
recover the plaintext over $Z_N$. The probability for the adversary to derive all corresponding
eigenvectors over all $Z_{f_i}$ after an attack with $m'$ chosen plaintexts decreases to $p_{m'}^{\,m}$. Hence, by
carefully selecting parameters $D$ and $m$, $\mathcal{E}_4$ can resist the chosen plaintext attack with a
predetermined number $m'$ of plaintexts.
The discussion above is to give a high level concept of our design of a non-circuit based
homomorphic encryption scheme. We will formally prove the security of the encryption scheme
by reduction to the large integer factorization problem. In particular we will show that if there
exists a PPT algorithm that can reverse the ciphertext with nonnegligible probability after the
chosen plaintext attack with 𝑚’ plaintexts, then there exists a PPT algorithm to factor 𝑁.
4.2.2 Encryption and Decryption Algorithms
To formally set up the encryption algorithm following the concept for constructing $\mathcal{E}_4$, we
first pick a random $4 \times 4$ invertible matrix $k$ as the key for our encryption scheme. This can be
done efficiently as proved in Lemma 4.1.3. We then construct the diagonal matrix
$\mathrm{diag}(x, a, b, c)$, where $a$, $b$, and $c$, the solutions to sets of linear congruences depending on $x$
and a random value $r \in Z_N$, are computed using the Chinese Remainder Theorem. The
corresponding ciphertext $C$ is the similarity transform of this matrix by $k$,
$C = k^{-1}\,\mathrm{diag}(x, a, b, c)\,k$. The encryption algorithm is formally presented as follows.
Given $\{f_i\}_{i=1}^{m}$ with $N = \prod_{i=1}^{m} f_i$, and a plaintext $x \in Z_N$, we encrypt $x$ into a ciphertext
$C \in M_4(Z_N)$ as follows:
1. Choose a random value 𝑟 ∈ 𝑍𝑁.
2. Define a set of numbers $a_i$, $b_i$, and $c_i$, for $1 \le i \le m$, as follows. For each $i$, exactly
one of $a_i$, $b_i$, and $c_i$ is equal to $x$. Let $a_i = x$ with probability $1 - \frac{1}{m+1}$, $b_i = x$ with
probability $\frac{1}{2(m+1)}$, and $c_i = x$ with probability $\frac{1}{2(m+1)}$. Set the other two values equal
to $r$. This way, for each $i$, one of the values $a_i$, $b_i$, and $c_i$ equals $x$ and the other two
equal $r$.
3. By the Chinese Remainder Theorem, let $a$, $b$, and $c$ be the solutions to the sets of
simultaneous congruences $a = a_i \bmod f_i$, $b = b_i \bmod f_i$, and $c = c_i \bmod f_i$, for
$1 \le i \le m$.
4. Let $C = k^{-1}\,\mathrm{diag}(x, a, b, c)\,k$.
Given ciphertext $C$ and key $k$, the decryption algorithm computes the plaintext
$x = (kCk^{-1})_{00}$, i.e., the top-left entry of $kCk^{-1}$.
The correctness of the encryption scheme is proved in Lemma 4.2.1.
Lemma 4.2.1: The encryption scheme $(\mathcal{K}, \mathcal{E}, \mathcal{D})$ is correct.
Proof. Note that $\left((k^{-1})^{-1}\,k^{-1}\,\mathrm{diag}(x, a, b, c)\,k\,k^{-1}\right)_{00} = \mathrm{diag}(x, a, b, c)_{00} = x$.
The following is an example of our encryption and decryption algorithms. Let $m = 2$, and
consider the values $f_1 = 3 \times 5 = 15$ and $f_2 = 2 \times 7 = 14$, so that $N = 210$. We randomly
choose key
$$k = \begin{pmatrix} 17 & 44 & 169 & 126 \\ 91 & 121 & 84 & 85 \\ 85 & 71 & 119 & 25 \\ 0 & 85 & 201 & 44 \end{pmatrix} \quad \text{with inverse} \quad k^{-1} = \begin{pmatrix} 35 & 31 & 29 & 0 \\ 44 & 57 & 113 & 29 \\ 74 & 157 & 37 & 194 \\ 59 & 27 & 152 & 103 \end{pmatrix}.$$
We encrypt the plaintext $42 \in Z_{210}$ using our encryption algorithm. First we randomly
choose $r = 91 \in Z_{210}$. Next, we let $a_1 = x = 42$ and $b_1 = c_1 = r = 91$, and let $a_2 = c_2 = r = 91$
and $b_2 = x = 42$. Then we use the Chinese Remainder Theorem to solve for the values $a$, $b$,
and $c$ where $a = 42 \bmod 15$ and $a = 91 \bmod 14$, and where $b = 91 \bmod 15$ and
$b = 42 \bmod 14$, and where $c = 91 \bmod 15$ and $c = 91 \bmod 14$, and get $a = 147$, $b = 196$, and
$c = 91$. Finally we construct our ciphertext,
$$C = k^{-1}\,\mathrm{diag}(42, 147, 196, 91)\,k = k^{-1} \cdot \begin{pmatrix} 42 & 0 & 0 & 0 \\ 0 & 147 & 0 & 0 \\ 0 & 0 & 196 & 0 \\ 0 & 0 & 0 & 91 \end{pmatrix} \cdot k = \begin{pmatrix} 77 & 91 & 154 & 35 \\ 35 & 84 & 49 & 189 \\ 175 & 133 & 140 & 119 \\ 35 & 98 & 49 & 175 \end{pmatrix}.$$
Then we decrypt our ciphertext and extract the plaintext $x$,
$$x = (kCk^{-1})_{00} = \left(k \cdot \begin{pmatrix} 77 & 91 & 154 & 35 \\ 35 & 84 & 49 & 189 \\ 175 & 133 & 140 & 119 \\ 35 & 98 & 49 & 175 \end{pmatrix} \cdot k^{-1}\right)_{00} = \begin{pmatrix} 42 & 0 & 0 & 0 \\ 0 & 147 & 0 & 0 \\ 0 & 0 & 196 & 0 \\ 0 & 0 & 0 & 91 \end{pmatrix}_{00} = 42.$$
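The encryption and decryption steps of this subsection can be sketched in a few lines, using the toy parameters of the example above ($f_1 = 15$, $f_2 = 14$, $N = 210$). This is an illustration under stated assumptions rather than the implementation evaluated later: the function names and the optional `r` and `slots` arguments (which fix the random choices so the example can be reproduced) are hypothetical, and Python's `random` module is used only for readability; it is not a cryptographically secure source of randomness.

```python
import random
from math import prod

def matmul(A, B, N):
    """Matrix product modulo N."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) % N
             for j in range(len(B[0]))] for i in range(len(A))]

def crt(residues, moduli):
    """Solve z = residues[i] mod moduli[i] for pairwise coprime moduli."""
    N = prod(moduli)
    total = 0
    for r, f in zip(residues, moduli):
        M = N // f
        total += r * M * pow(M, -1, f)
    return total % N

def choose_slot(m, rng):
    """Which of a_i, b_i, c_i carries x: 'a' with probability 1 - 1/(m+1),
    'b' and 'c' with probability 1/(2(m+1)) each."""
    u = rng.random()
    if u < 1 - 1 / (m + 1):
        return "a"
    return "b" if u < 1 - 1 / (2 * (m + 1)) else "c"

def encrypt(x, k, k_inv, moduli, r=None, slots=None, rng=random):
    N, m = prod(moduli), len(moduli)
    if r is None:
        r = rng.randrange(N)                              # step 1
    if slots is None:
        slots = [choose_slot(m, rng) for _ in moduli]     # step 2
    a, b, c = (crt([x if s == name else r for s in slots], moduli)
               for name in "abc")                         # step 3
    D = [[x % N, 0, 0, 0], [0, a, 0, 0], [0, 0, b, 0], [0, 0, 0, c]]
    return matmul(matmul(k_inv, D, N), k, N)              # step 4

def decrypt(C, k, k_inv, N):
    """Top-left entry of k C k^{-1}."""
    return matmul(matmul(k, C, N), k_inv, N)[0][0]

MODULI = [15, 14]
K = [[17, 44, 169, 126], [91, 121, 84, 85], [85, 71, 119, 25], [0, 85, 201, 44]]
K_INV = [[35, 31, 29, 0], [44, 57, 113, 29], [74, 157, 37, 194], [59, 27, 152, 103]]

# Reproduce the worked example: a_1 = x, b_2 = x, r = 91.
C = encrypt(42, K, K_INV, MODULI, r=91, slots=["a", "b"])
plaintext = decrypt(C, K, K_INV, 210)
```

Calling `encrypt` without `r` and `slots` draws them at random as in steps 1 and 2, so repeated encryptions of the same plaintext generally give different ciphertexts that all decrypt to the same value.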
4.2.3 Security of the Encryption Scheme
We proceed with the proof of security by reductions. In Lemma 4.2.2 we establish the
existence of matrices $k_i$ which will be used in the proof of security because, as will be shown in
Lemma 4.2.3, given the ciphertext $C$, an adversary cannot distinguish between $\mathcal{E}(x, k)$ and
$\mathcal{E}(y, k_i k)$. Lemma 4.2.4 demonstrates that this fact implies that, if a PPT algorithm exists to
extract the plaintext $x$, with nonnegligible probability, from the ciphertext $C$ without the key $k$,
then a PPT algorithm exists to factor $f_i \in F$, for some arbitrarily chosen $F = \{f_i\}_{i=1}^{m}$, for some $i$,
with nonnegligible probability. Then, by Lemma 4.1.2, there would exist a PPT algorithm for
large integer factorization with nonnegligible probability. Lemmas 4.2.5 and 4.2.6 prove that,
given $m' \le m$ plaintext and ciphertext pairs $\{(x_l, C_l)\}_{l=1}^{m'}$, if a PPT algorithm exists to find the
plaintext $x$ given the corresponding ciphertext $C$ with nonnegligible probability, then a PPT
algorithm exists to factor $f_i \in F = \{f_i\}_{i=1}^{m}$, implying that there exists a PPT algorithm for large
integer factorization with nonnegligible probability. Finally, Theorem 4.2.7 proves the same
result with the weaker condition $m' \le m \ln \mathrm{poly}(\lambda)$. In other words, given fewer than
$m \ln \mathrm{poly}(\lambda)$ plaintext and ciphertext pairs, decryption of a ciphertext without the key is at least
as hard as the well-known integer factorization problem.
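Lemma 4.2.2: For $1 \le i \le m$, there exists a unique element $k_i \in GL_4(Z_N)$ so that
$$k_i = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \bmod p_i,$$
$k_i = I \bmod q_i$, and $k_i = I \bmod f_j$ for $j \ne i$, where $I$ is the identity
matrix in $GL_4(Z_N)$. Additionally, $k_i = k_i^{-1}$.
Proof. The first claim follows from the Chinese Remainder Theorem. The fact that
$k_i = k_i^{-1}$ follows from the fact that
$$k_i^2 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \bmod p_i = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \bmod p_i = I \bmod p_i$$
and the trivial fact that $I = I^{-1}$.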
Lemma 4.2.3: Given plaintext $x$, key $k$ and random element $r$, there exist $y$ and $s$ such
that $\mathcal{E}(x, k) = \mathcal{E}(y, k_i k)$. Additionally, $q_i$ divides $y - x$, and $p_i$ does not divide $y - x$ with
probability $\frac{1}{m+1}\left(1 - \frac{1}{p_i}\right)$.
Proof. Note that $C' = \mathrm{diag}(x, a, b, c)$ satisfies the
congruences $C' = \mathrm{diag}(x, a_i, b_i, c_i) \bmod f_i$, so
$$k_i C' k_i^{-1} \bmod p_i = k_i\,\mathrm{diag}(x, a_i, b_i, c_i)\,k_i^{-1} \bmod p_i = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x & 0 & 0 & 0 \\ 0 & a_i & 0 & 0 \\ 0 & 0 & b_i & 0 \\ 0 & 0 & 0 & c_i \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \bmod p_i = \mathrm{diag}(a_i, x, c_i, b_i) \bmod p_i.$$
Also, $k_i C' k_i^{-1} \bmod q_i = I C' I \bmod q_i = C' \bmod q_i$ and similarly $k_i C' k_i^{-1} \bmod f_j = C' \bmod f_j$
for $j \ne i$. Let $D' = \mathrm{diag}(y, a', b', c')$. Then the set of congruences
$D' = \mathrm{diag}(a_i, x, c_i, b_i) \bmod p_i$, $D' = C' \bmod q_i$, and $D' = C' \bmod f_j$ has a unique solution by the
Chinese Remainder Theorem. Note that this solution satisfies $D' = k_i C' k_i^{-1}$, so $k_i^{-1} D' k_i = C'$
and $(k_i k)^{-1} D' (k_i k) = k^{-1} k_i^{-1} D' k_i k = k^{-1} C' k = \mathcal{E}(x, k)$. But this means that
$\mathcal{E}(x, k) = (k_i k)^{-1} D' (k_i k) = \mathcal{E}(y, k_i k)$, proving the first claim. Note that additionally $y = x \bmod q_i$ so that
$q_i$ divides $y - x$, but $y = a_i \bmod p_i$. Note that $a_i = r$ (i.e., $a_i \ne x$) with probability $\frac{1}{m+1}$.
Additionally, $r \ne x \bmod p_i$ with probability $1 - \frac{1}{p_i}$ since $r$ is chosen uniformly at random on $Z_N$,
and thus with probability $1 - \frac{1}{p_i}$, $p_i$ does not divide $r - x$. Thus $q_i$ divides $y - x$, and $p_i$ does
not divide $y - x$, with probability $\frac{1}{m+1}\left(1 - \frac{1}{p_i}\right)$, proving the second claim.
Lemma 4.2.4: If a PPT algorithm $A_d(C)$ exists that, given $C = \mathcal{E}(x, k)$, returns $x$ with
probability $p$, then there exists a PPT algorithm $A_f$ that returns a factor of $f_i$ for some $i$ with
probability $p' = \frac{p}{m+1}\left(1 - \frac{1}{p_i}\right)$.
Proof. Let the algorithm $A_f$ first choose a random plaintext $x$ and a random key $k$, and
construct ciphertext $C = \mathcal{E}(x, k)$ using the encryption scheme. Then, $A_f$ runs $A_d(C)$ to obtain
value $o$. Then $A_f$ returns $\gcd(f_i, o - x)$. Note that $A_f$ is clearly a PPT algorithm assuming that
$A_d$ is a PPT algorithm. We also show that $A_f$ is correct with some probability $p'$. Note that since
$\mathcal{E}(x, k) = \mathcal{E}(y, k_i k)$, $A_d$ will also return $y$ with probability $p$. Note that $q_i = \gcd(f_i, o - x) =
\gcd(f_i, y - x)$ with probability $\frac{1}{m+1}\left(1 - \frac{1}{p_i}\right)$ since $q_i$ divides $y - x$ but $p_i$ does not with this
probability. Since $f_i$ has only factors $p_i$ and $q_i$, the GCD of this pair of numbers is thus $q_i$. If $p$ is
nonnegligible, then $A_f$ is thus correct with nonnegligible probability $p' = \frac{p}{m+1}\left(1 - \frac{1}{p_i}\right)$. Clearly
$A_f$ is a PPT algorithm if $A_d$ is a PPT algorithm, completing the proof.
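The factor extraction at the heart of Lemmas 4.2.3 and 4.2.4 can be checked on the toy parameters $f_1 = 15$, $f_2 = 14 = 2 \times 7$, $x = 42$, $r = 91$, where $a_2 = r$. The sketch below is illustrative only: the helper `alternate_plaintext` is hypothetical and simply builds the value $y$ described in Lemma 4.2.3 with the Chinese Remainder Theorem; it does not model the decryption oracle $A_d$.

```python
from math import gcd, prod

def crt(residues, moduli):
    """Solve z = residues[i] mod moduli[i] for pairwise coprime moduli."""
    N = prod(moduli)
    total = 0
    for r, f in zip(residues, moduli):
        M = N // f
        total += r * M * pow(M, -1, f)
    return total % N

def alternate_plaintext(x, a_i, p_i, q_i, other_moduli):
    """The y of Lemma 4.2.3: y = a_i mod p_i, y = x mod q_i, y = x mod f_j (j != i)."""
    residues = [a_i % p_i, x % q_i] + [x % f for f in other_moduli]
    moduli = [p_i, q_i] + list(other_moduli)
    return crt(residues, moduli)

x, r = 42, 91
p_2, q_2 = 2, 7                                   # f_2 = 14, and a_2 = r in the toy example
y = alternate_plaintext(x, a_i=r, p_i=p_2, q_i=q_2, other_moduli=[15])

# An oracle answer o = y reveals the factor q_2 of f_2.
factor = gcd(p_2 * q_2, y - x)
```

Here $y = 147$ differs from $x$ only modulo $p_2$, so the greatest common divisor with $f_2$ isolates $q_2 = 7$.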
Lemma 4.2.5: Let $m'$ be the number of plaintext and ciphertext pairs the adversary has
access to. If for some $m'$ there exists an algorithm $A_d\left(C = \mathcal{E}(x, k),\ \{(x_l, C_l = \mathcal{E}(x_l, k))\}_{l=1}^{m'}\right)$
such that, given $m'$ chosen plaintext and ciphertext pairs $(x_l, C_l)$ and a ciphertext $C$, returns $x$
with some probability $p$, then there exists a PPT algorithm $A_f$ using $A_d$ as an oracle to factor $f_i$
for some $i$ with probability
$$p' = p\left(1 - \frac{1}{p_i}\right)\left(1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m}\right).$$
Proof. As before, let the algorithm $A_f$ first choose random plaintexts $x_l \in Z_N$ for $1 \le l \le m'$,
an additional random plaintext $x \in Z_N$, and a random key $k$, and construct ciphertexts
$C_l = \mathcal{E}(x_l, k)$ and $C = \mathcal{E}(x, k)$ using the encryption scheme. Then let $A_f$ run $A_d(C)$ to obtain $o$.
Then let $A_f$ return $\gcd(f_i, o - x)$ as a factor of $f_i$ for some $i$. We find the probability that $A_f$
succeeds in factoring $f_i$ for some $i$. Consider the case where, for some $i_0$, for all of the
ciphertexts $C_l$, $a_{l,i_0} = x_l$, and where for the ciphertext $C$, $a_{i_0} = r$. Here $a_{l,i_0}$ refers to the $a_{i_0}$ in
the encryption process of $C_l$, and $a_{i_0}$ refers to the $a_{i_0}$ in the encryption process of $C$. If this is the
case, then as seen in the proof of Lemma 4.2.4, $\mathcal{E}(x_l, k) = \mathcal{E}(x_l, k_{i_0} k)$ (since $a_{l,i_0}$ is assumed to
equal $x_l$), so the adversary cannot differentiate $k$ from $k_{i_0} k$. Additionally, as in Lemma 4.2.4,
$\mathcal{E}(x, k) = \mathcal{E}(y, k_{i_0} k)$ for some $y$ for which $q_{i_0}$ divides $y - x$. Also, $y = a_{i_0} \bmod p_{i_0} = r \bmod p_{i_0}$,
so $p_{i_0}$ does not divide $y - x$ with probability $1 - \frac{1}{p_{i_0}}$. Since the adversary
cannot differentiate $k$ from $k_{i_0} k$, if running $A_d(C)$ returns $x$ with probability $p$, it must also
return $y$ with probability $p$. Then the probability that $o = y$ is $p$, and if this happens the
probability that $q_{i_0}$ divides $o - x = y - x$ and $p_{i_0}$ does not is $1 - \frac{1}{p_{i_0}}$. Thus if $i_0$ exists then
$A_f$ succeeds with probability $p\left(1 - \frac{1}{p_{i_0}}\right)$.
To find the probability that such an $i_0$ exists, note that the probability that, for a specific $l$
and $i$, $a_{l,i} = x_l$ is $1 - \frac{1}{m+1}$, so the probability that for all $l$, $a_{l,i} = x_l$ is $\left(1 - \frac{1}{m+1}\right)^{m'}$. The
probability that, additionally, $a_i = r$ is $\frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}$. Then the probability that this does not
occur for a given $i$ is $1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}$, so the probability that this does not occur for any $i$
is $\left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m}$, and finally the probability that this occurs for some $i$ is thus
$$1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m}.$$
Finally we integrate the derivations and get
$$p' = p\left(1 - \frac{1}{p_i}\right)\left(1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m}\right).$$
Lemma 4.2.6: Assuming that the probability of factoring a $\lambda$-bit integer in polynomial time
is negligible, the encryption scheme is secure for $m' \le m$.
Proof. As seen in the equation for $p'$ in Lemma 4.2.5, $1 - \frac{1}{p_i}$ is nonnegligible. If
$1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m}$ is also nonnegligible, then $p$ is negligible if and only if $p'$ is
negligible. This implies that the encryption scheme is secure: otherwise, if a PPT adversary
can attack the scheme with nonnegligible success probability $p$, then there exists a PPT
algorithm to factor some integer with nonnegligible success probability $p'$, which contradicts
Lemma 4.1.2. Therefore, the encryption scheme is secure if
$1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m}$ is nonnegligible.
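Note that $\frac{d}{d\alpha}\left(1 - \frac{1}{\alpha}\right)^{\alpha} = \left(1 - \frac{1}{\alpha}\right)^{\alpha}\left[\ln\left(1 - \frac{1}{\alpha}\right) + \frac{1}{\alpha - 1}\right] > 0$ for $\alpha > 1$, since
$\ln(1 - u) > -\frac{u}{1-u}$ for $0 < u < 1$. Thus the function is monotonically increasing, so over the
integers $m \ge 1$ the quantity $\left(1 - \frac{1}{m+1}\right)^{m+1}$ achieves its minimum at $m = 1$, where it takes the value
$\left(1 - \frac{1}{1+1}\right)^{1+1} = \frac{1}{4}$. Additionally, since $\lim_{\alpha \to \infty}\left(1 - \frac{1}{\alpha}\right)^{\alpha} = \frac{1}{e}$ and since $\left(1 - \frac{1}{\alpha}\right)^{\alpha}$ is
monotonically increasing, $\left(1 - \frac{1}{\alpha}\right)^{\alpha} \le \frac{1}{e}$. Then we obtain
$$1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m} \ge 1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m+1}\right)^{m} \ge 1 - \left(1 - \frac{1}{4(m+1)}\right)^{m}$$
$$= 1 - \left(\left(1 - \frac{1}{4(m+1)}\right)^{4(m+1)}\right)^{\frac{m}{4(m+1)}} \ge 1 - \left(\frac{1}{e}\right)^{\frac{m}{4(m+1)}} \ge 1 - \left(\frac{1}{2}\right)^{\frac{1}{4(1+1)}} = 1 - \left(\frac{1}{2}\right)^{\frac{1}{8}} \ge 1 - .92 = .08$$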
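The chain of inequalities above can be sanity-checked numerically. The sketch below is illustrative only and uses hypothetical function names; it evaluates the exact probability from Lemma 4.2.5 and the intermediate bound $1 - (1/e)^{m/(4(m+1))}$ in floating point for a range of $m$ and $m' \le m$.

```python
import math

def hit_probability(m, m_prime):
    """1 - (1 - (1/(m+1)) * (1 - 1/(m+1))**m')**m, the factor from Lemma 4.2.5."""
    single = (1 / (m + 1)) * (1 - 1 / (m + 1)) ** m_prime
    return 1 - (1 - single) ** m

def analytic_lower_bound(m):
    """1 - (1/e)**(m / (4(m+1))), the intermediate bound in the chain."""
    return 1 - math.exp(-m / (4 * (m + 1)))

def check_bounds(max_m=300):
    """Return True if the chain holds for all 1 <= m <= max_m and sampled m' <= m."""
    for m in range(1, max_m + 1):
        for m_prime in {1, max(1, m // 2), m}:
            exact = hit_probability(m, m_prime)
            if not (exact >= analytic_lower_bound(m) >= 0.08):
                return False
    return True
```

For instance, `hit_probability(1, 1)` equals 0.25, comfortably above the constant 0.08 derived in the proof.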
We now prove the security of our homomorphic encryption algorithm.
Theorem 4.2.7: The bound of 𝑚′ in Lemma 4.2.6 can be weakened to 𝑚′ ≤ 𝑚 ln poly(𝜆),
where poly(𝜆) denotes some fixed polynomial in 𝜆.
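Proof. As shown in Lemma 4.2.6, the encryption scheme is secure if
$1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m}$ is nonnegligible. In other words, we require that
$$1 - \left(1 - \frac{1}{m+1}\left(1 - \frac{1}{m+1}\right)^{m'}\right)^{m} = \frac{1}{\mathrm{poly}(\lambda)}$$
for some fixed polynomial in $\lambda$. Then
$$m' = \frac{\ln\left[(m+1)\left(1 - \left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right)^{\frac{1}{m}}\right)\right]}{\ln\left(1 - \frac{1}{m+1}\right)}.$$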
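Before estimating the lower bound of $m'$, we first derive two inequalities. Note that
$$\frac{d}{d\alpha}\left[\ln(1 - \alpha) + \frac{\alpha}{1 - \alpha}\right] = \frac{\alpha}{(1 - \alpha)^2} > 0 \text{ for } 0 < \alpha < 1, \quad \text{and} \quad \left.\left[\ln(1 - \alpha) + \frac{\alpha}{1 - \alpha}\right]\right|_{\alpha = 0} = 0.$$
Therefore $\ln(1 - \alpha) > -\frac{\alpha}{1 - \alpha}$ for $0 < \alpha < 1$. Also, note that
$$\frac{d}{d\alpha}\left(\alpha - 1 + e^{-\alpha}\right) = 1 - e^{-\alpha} > 0 \text{ for } 0 < \alpha < 1, \quad \text{and} \quad \left.\left(\alpha - 1 + e^{-\alpha}\right)\right|_{\alpha = 0} = 0.$$
Therefore $\alpha > 1 - e^{-\alpha}$ for $0 < \alpha < 1$. Thus,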
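$$\ln\left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right) > -\frac{\frac{1}{\mathrm{poly}(\lambda)}}{1 - \frac{1}{\mathrm{poly}(\lambda)}} = -\frac{1}{\mathrm{poly}(\lambda) - 1}$$
$$\rightarrow\ \ln\left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right)^{\frac{1}{m}} = \frac{1}{m}\ln\left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right) > -\frac{1}{m(\mathrm{poly}(\lambda) - 1)}$$
$$\rightarrow\ \left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right)^{\frac{1}{m}} > e^{-\frac{1}{m(\mathrm{poly}(\lambda) - 1)}}$$
$$\rightarrow\ 1 - \left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right)^{\frac{1}{m}} < 1 - e^{-\frac{1}{m(\mathrm{poly}(\lambda) - 1)}} < \frac{1}{m(\mathrm{poly}(\lambda) - 1)}$$
$$\rightarrow\ \ln\left[(m+1)\left(1 - \left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right)^{\frac{1}{m}}\right)\right] < \ln\frac{m+1}{m(\mathrm{poly}(\lambda) - 1)} = -\ln \mathrm{poly}(\lambda) \qquad (4.1)$$
The last equation in (4.1) is obtained by replacing $\mathrm{poly}(\lambda)$ with the polynomial
$\mathrm{poly}(\lambda) + 1 + \frac{\mathrm{poly}(\lambda)}{m}$. Also, we have
$$\ln\left(1 - \frac{1}{m+1}\right) > -\frac{\frac{1}{m+1}}{1 - \frac{1}{m+1}} = -\frac{1}{m} \ \rightarrow\ \frac{1}{\ln\left(1 - \frac{1}{m+1}\right)} < -m \qquad (4.2)$$
Hence, by multiplying (4.1) and (4.2), we get
$$m' = \frac{\ln\left[(m+1)\left(1 - \left(1 - \frac{1}{\mathrm{poly}(\lambda)}\right)^{\frac{1}{m}}\right)\right]}{\ln\left(1 - \frac{1}{m+1}\right)} > m \ln \mathrm{poly}(\lambda).$$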
We have shown that, under an attack with 𝑚 ln poly(𝜆) chosen plaintext and ciphertext
pairs, the security of our encryption scheme reduces to the large integer factorization problem.
Note that there is no constraint on the
chosen plaintexts. In particular, the adversary can choose a plaintext multiple times.
4.2.4 Complexity of the Encryption and Decryption Algorithms
We need to choose $2m$ primes in the encryption scheme. As shown in Lemma 4.1.1 this
takes polynomial time. Note that the primes can be precomputed. The decryption algorithm
involves only two matrix multiplications, which, as shown later in Section 4.2.6, take
$O(m\lambda \log(m\lambda) \log\log(m\lambda))$ time. The encryption algorithm requires two matrix
multiplications and also an algorithm to solve the $m$ linear congruences that define the values $a$,
$b$, and $c$. It takes time $O(m\lambda)$ to construct the solution to these linear congruences, so the overall
complexity for encryption is also $O(m\lambda \log(m\lambda) \log\log(m\lambda))$.
4.2.5 Computation Algorithms
Multiplication and addition of encrypted elements is simply normal matrix multiplication
and addition, respectively.
Lemma 4.2.8: The multiplication and addition algorithms are correct.
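Proof. First we show that addition is correct. Note that
$$\mathcal{E}(x, k) + \mathcal{E}(y, k) = k^{-1}\,\mathrm{diag}(x, a, b, c)\,k + k^{-1}\,\mathrm{diag}(y, a', b', c')\,k = k^{-1}\left(\mathrm{diag}(x, a, b, c) + \mathrm{diag}(y, a', b', c')\right)k$$
$$= k^{-1}\,\mathrm{diag}(x + y, a + a', b + b', c + c')\,k = \mathcal{E}(x + y, k).$$
Next we show that multiplication is correct. Note that
$$\mathcal{E}(x, k)\,\mathcal{E}(y, k) = k^{-1}\,\mathrm{diag}(x, a, b, c)\,k\,k^{-1}\,\mathrm{diag}(y, a', b', c')\,k = k^{-1}\,\mathrm{diag}(x, a, b, c)\,\mathrm{diag}(y, a', b', c')\,k$$
$$= k^{-1}\,\mathrm{diag}(xy, aa', bb', cc')\,k = \mathcal{E}(xy, k).$$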
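The two identities in Lemma 4.2.8 can be exercised on the toy key of Section 4.2.2 ($N = 210$). The sketch below is illustrative only: the helper names are hypothetical, and the diagonal entries are passed in directly instead of being drawn at random. The second ciphertext encrypts $y = 13$ with $r' = 5$ and $a'_1 = a'_2 = y$, a made-up choice for this example.

```python
N = 210
K = [[17, 44, 169, 126], [91, 121, 84, 85], [85, 71, 119, 25], [0, 85, 201, 44]]
K_INV = [[35, 31, 29, 0], [44, 57, 113, 29], [74, 157, 37, 194], [59, 27, 152, 103]]

def matmul(A, B):
    """4x4 matrix product modulo N."""
    return [[sum(A[i][t] * B[t][j] for t in range(4)) % N for j in range(4)]
            for i in range(4)]

def matadd(A, B):
    """Entry-wise matrix sum modulo N."""
    return [[(A[i][j] + B[i][j]) % N for j in range(4)] for i in range(4)]

def encrypt_diag(diag):
    """k^{-1} diag(x, a, b, c) k mod N for already chosen diagonal entries."""
    D = [[diag[i] if i == j else 0 for j in range(4)] for i in range(4)]
    return matmul(matmul(K_INV, D), K)

def decrypt(C):
    """Top-left entry of k C k^{-1}."""
    return matmul(matmul(K, C), K_INV)[0][0]

C_x = encrypt_diag([42, 147, 196, 91])   # x = 42 as in the worked example
C_y = encrypt_diag([13, 13, 5, 5])       # y = 13 with r' = 5

sum_plain = decrypt(matadd(C_x, C_y))    # (42 + 13) mod 210
prod_plain = decrypt(matmul(C_x, C_y))   # (42 * 13) mod 210
```

Both operations act on ciphertexts only; the key is needed solely for the final decryption.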
4.2.6 Complexity of the Computation Algorithms
We now consider the complexity of our multiplication and addition algorithms. First
consider the size of the integers in the ring $Z_N$. The value $N$ is the product of $m$ numbers of
length $\lambda$ bits, so it is approximately an $m\lambda$-bit number. There exist efficient algorithms for
multiplication of $b$-bit integers with complexity $O(b \log b \log\log b)$. For $b = m\lambda$ this becomes
$O(m\lambda \log(m\lambda) \log\log(m\lambda))$. In our case matrix multiplication involves 64 multiplications and 64
additions. Since addition can be done in linear time, the algorithm is dominated by multiplication
and thus has complexity $O(m\lambda \log(m\lambda) \log\log(m\lambda))$. Matrix addition is linear and thus has complexity
$O(m\lambda)$.
4.3 Homomorphic Encryption in Multi-user Systems
The homomorphic encryption scheme discussed in Section 4.2 cannot be securely used in
practical systems. To allow computation on encrypted data, the data stored on the database server
should be encrypted by the same master key. The master key then has to be shared by multiple
users who need to access the data. A user may need to encrypt secret data and send them with a
request to the server. Also, the server may send a response, which contains some encrypted data,
to a user and the user needs to decrypt the ciphertexts in the response. Having all users holding
the master key can compromise the security of the system, especially if the users are from many
different domains.
Our solution is to let each user hold a unique user key 𝑘𝑖 and use a transformation function to
transform the ciphertext encrypted by the user key 𝑘𝑖 to the ciphertext encrypted by the master
key 𝑘. Such a transformation scheme may not always be easy to obtain for some encryption
algorithms. In our scheme, data are encrypted into matrix representation and a similarity
transform function can be used to achieve the goal. We develop the corresponding
communication protocol for sending secret data between the users and the server using
individual user keys and then use similarity transform to convert the user keys to the master
encryption key. In Subsection 4.3.1, we define a model for the multi-user systems. In
Subsection 4.3.2, we present the protocols for the user to send requests and receive responses
while using her own user key to encrypt and decrypt the data in the request and the response,
respectively.
4.3.1 Settings
As in the multi-user system described in Chapter 3, we consider a single server DB hosting a database and
a set of users $U = \{U_i \mid i \ge 1\}$ accessing the data stored on DB. For security assurance, a key
agent KA1 is added in between the server and the users. Thus, the adversary structure is
$$Z = \{U_{\mathcal{A}},\ \{DB\} \cup U_{\mathcal{A}},\ \{KA_1\} \cup U_{\mathcal{A}},\ \{DB, KA_1\} \cup U_{\mathcal{A}} \mid U_{\mathcal{A}} \subseteq U\},$$
where $U_{\mathcal{A}}$ is the set of compromised users, which could be empty. We let user $U_i$ hold a user key
$k_i$, KA1 hold the first matching key of $k_i$, denoted as $k_i'$, and DB hold the second matching key of
$k_i$, denoted as $k_i''$, where $k = k_i \cdot k_i' \cdot k_i''$ is the master key of the system. The keys are
generated and distributed by a trusted party TP at the system initialization time.
The data hosted by the DB may have different criticality levels and may be protected in
different ways. We only consider the data that should be protected during computation time and
encrypt them using our homomorphic encryption scheme. Additional protection can be added by
encrypting these data using a conventional encryption scheme, such as AES, when they are in
memory or disk and decrypt them in CPU. Also, we assume that the communications between
any two entities are via secure channels (e.g. messages are properly encrypted and
communication keys are properly established). The adversary cannot know the content of the
communication unless it compromises at least one entity. (In our protocol, we do not discuss the
additional protection mechanisms but only consider the steps relevant to our homomorphic
encryption.)
4.3.2 The Multi-User Access Protocol
Here, we first introduce the similarity transformation function, which plays the central role
in the construction of the protocols, and discuss its property. Let $\varphi$ be the similarity
transformation function
$$\varphi(C, k') = k'^{-1} \cdot C \cdot k'$$
where $C = \mathcal{E}(x, k)$ is the ciphertext, and $k$, $k'$ are encryption keys. Then $\varphi$ can transform the
encryption key from $k$ to $k \cdot k'$ based on the following lemma.
Lemma 4.3.1: If $C = \mathcal{E}(x, k)$, then $\varphi(C, k') = \mathcal{E}(x, k \cdot k')$.
Proof. $\varphi(C, k') = \varphi(\mathcal{E}(x, k), k') = k'^{-1} \cdot \mathcal{E}(x, k) \cdot k' = k'^{-1} \cdot k^{-1}\,\mathrm{diag}(x, a, b, c)\,k \cdot k'$
$= (k \cdot k')^{-1} \cdot \mathrm{diag}(x, a, b, c) \cdot (k \cdot k') = \mathcal{E}(x, k \cdot k')$.
The process of the system consists of two phases: the system initialization phase, and the
request-response phase. At the system initialization time, a trusted party TP generates and
distributes the keys to the DB, KA1, and users, then TP exits the system and destroys all the key
related knowledge. In the request-response phase, the user Ui sends a request to DB, and DB
processes it, and sends a response to Ui.
Key generation and distribution. TP generates the master key $k$ by the method discussed
in Section 4.2.2. Then TP generates many key triples $(k_i, k'_i, k''_i)$. It uses the same method
(Section 4.2.2) to randomly generate user key $k_i$ and the first matching key $k'_i$. Then, it
computes the second matching key $k''_i = k'^{-1}_i \cdot k_i^{-1} \cdot k$. TP sends $k_i$ to user Ui, $k'_i$ to KA1, and
$k''_i$ to DB. In a static system, TP exits the system after key initialization and distribution. In a
dynamic system where new users may join the system dynamically, a key manager KM is also
introduced to manage user keys. TP sends the list of unused user keys to KM to be distributed to
new users later. TP exits after key initialization and distribution. Note that matching keys can be
associated using their indices.
Request-response protocol. The main issue in the request-response protocol is how to
encrypt the data to be sent with the request and how to decrypt the data in the response. The
pseudo code for the request-response protocol is given in Figure 4.1. In the protocol, the critical
data in the request is encrypted (Line 1), and the encryption key is then transformed (Lines 2 and
3) by the KA1 and DB, respectively. Then, in Line 4, the DB processes the request and generates
the response (with encrypted data in the response). The encryption key of the critical data in
the response is transformed (Lines 5 and 6) by the DB and KA1, respectively. Finally, the user
decrypts and gets the result (Line 7).
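The seven lines of the request-response protocol can be traced end to end with a toy model. This is only a sketch under stated assumptions: 2 × 2 matrices over a small prime field replace the 4 × 4 matrices over $Z_N$, decryption is assumed to undo the conjugation and read the first diagonal entry, the DB's processing in Line 4 is taken to be one homomorphic squaring, and all function names are hypothetical.

```python
import random

P = 1_000_003  # toy prime modulus standing in for Z_N (assumption)

def mul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(2)) % P for j in range(2)]
            for i in range(2)]

def inv(k):
    d = pow((k[0][0] * k[1][1] - k[0][1] * k[1][0]) % P, -1, P)
    return [[k[1][1] * d % P, -k[0][1] * d % P],
            [-k[1][0] * d % P, k[0][0] * d % P]]

def keygen(rng):
    while True:
        k = [[rng.randrange(P) for _ in range(2)] for _ in range(2)]
        if (k[0][0] * k[1][1] - k[0][1] * k[1][0]) % P:
            return k

def enc(x, k, rng):
    # toy E(x, k) = k^{-1} * diag(x, random padding) * k
    return mul(mul(inv(k), [[x % P, 0], [0, rng.randrange(P)]]), k)

def dec(c, k):
    # assumed decryption: undo the conjugation, read the first diagonal entry
    return mul(mul(k, c), inv(k))[0][0]

def phi(c, k2):
    return mul(mul(inv(k2), c), k2)

rng = random.Random(7)
# TP: master key k and one key triple with k_i * k'_i * k''_i = k
k = keygen(rng)
k_i, k1_i = keygen(rng), keygen(rng)
k2_i = mul(mul(inv(k1_i), inv(k_i)), k)

d = 1234
c1 = enc(d, k_i, rng)        # (1) Ui encrypts d under k_i
c2 = phi(c1, k1_i)           # (2) KA1 moves the key to k_i * k'_i
c3 = phi(c2, k2_i)           # (3) DB moves the key to the master key k
r = mul(c3, c3)              # (4) DB computes on ciphertexts: d' = d * d
r5 = phi(r, inv(k2_i))       # (5) DB moves the key back to k_i * k'_i
r6 = phi(r5, inv(k1_i))      # (6) KA1 moves the key back to k_i
result = dec(r6, k_i)        # (7) Ui decrypts and obtains d'
print(dec(c3, k), result)
```

In this sketch neither KA1 nor DB ever holds $k_i$, and KA1 never holds $k$; each party only applies its own matching key.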
(1) Ui prepares a request q with sensitive data $d$. It encrypts $d$ with $k_i$, obtaining
$E(d, k_i)$, and sends q with $E(d, k_i)$ to KA1.
(2) KA1 computes $\varphi(E(d, k_i), k'_i) = E(d, k_i \cdot k'_i)$ and sends the updated request q to
DB.
(3) DB computes $\varphi(E(d, k_i \cdot k'_i), k''_i) = E(d, k_i \cdot k'_i \cdot k''_i) = E(d, k)$.
(4) DB processes q with $E(d, k)$ and generates response r with encrypted data
$E(d', k)$.
(5) DB computes $\varphi(E(d', k), k''^{-1}_i) = E(d', k_i \cdot k'_i)$ and sends $E(d', k_i \cdot k'_i)$ with r to
KA1.
(6) KA1 further computes $\varphi(E(d', k_i \cdot k'_i), k'^{-1}_i) = E(d', k_i)$ and sends $E(d', k_i)$ with r
to Ui.
(7) Ui receives the response, decrypts $E(d', k_i)$ with $k_i$, and gets $d'$.
Figure 4.1. Request processing protocol.
Theorem 4.3.2: Suppose that the adversary attacks the multi-user system with the
adversary structure AS. The system is secure if the adversary collects fewer than $m \ln \mathrm{poly}(\lambda)$
plaintext-ciphertext pairs.
Proof. We consider three compromising situations with respect to the adversary structure
AS.
Case 1. The adversary compromises DB and KA1. Then $k''_i$ and $k'_i$ are compromised, but
the master key $k$ and the key $k_i$ of the user Ui are intact. Therefore the adversary can reverse
neither the ciphertext encrypted by $k$ stored on DB, nor the ciphertext encrypted by $k_i$ sent from
Ui to KA1.
Case 2. The adversary compromises DB and $U_{\mathcal{A}}$. Then the keys $k''_i$ are compromised, and $k_j$ is
compromised if Uj $\in U_{\mathcal{A}}$, but the master key $k$ and the key $k_{j'}$ of the user Uj' are intact if Uj'
$\notin U_{\mathcal{A}}$. Therefore the adversary can reverse neither the ciphertext encrypted by $k$ stored on DB, nor
the ciphertext encrypted by $k_{j'} \cdot k'_{j'}$ sent from KA1 to DB.
Case 3. The adversary compromises KA1 and $U_{\mathcal{A}}$. Then the keys $k'_i$ are compromised, and $k_j$ is
compromised if Uj $\in U_{\mathcal{A}}$, but the key $k_{j'}$ of the user Uj' is intact if Uj' $\notin U_{\mathcal{A}}$. Therefore the
adversary can reverse neither the ciphertext encrypted by $k_{j'}$ sent from Uj' to KA1, nor the ciphertext
encrypted by $k_{j'} \cdot k'_{j'}$ sent from DB to KA1.
This completes the proof.
4.4 Performance of Our Homomorphic Encryption Scheme
We implemented our algorithm and evaluated its execution time. The large integer
multiplication and addition were implemented using the GNU Multiple Precision (GMP)
Arithmetic Library [33]. In Figure 4.2, we give the number of milliseconds required to perform
addition and multiplication of encrypted data ($4 \times 4$ matrices over the ring $Z_N$). The
computations were performed on a 2.16GHz Intel Core 2 Duo Processor. The security parameter
considered was $\lambda = 1024$. The data was gathered from running 10000 additions and
multiplications of randomly selected numbers of length $m\lambda$ bits.
Figure 4.2. The size of $m$ against the speed of addition and multiplication of two encrypted data
(in milliseconds). Two panels, Addition and Multiplication, each plot time (ms) against $m$.
As can be seen from Figure 4.2, for a small enough $m$, the algorithm is very efficient. For
example, for $\lambda = 1024$ and $m = 16$, the algorithm runs multiplication in only 108 milliseconds
and runs addition in a tenth of a millisecond. For such $\lambda$ and $m$, we can choose
$\mathrm{poly}(\lambda) = \lambda^{10} = 2^{100}$, which by the bound $m \ln \mathrm{poly}(\lambda)$ of Theorem 4.3.2 translates into
1109 chosen plaintexts in an attack that our algorithm can withstand.
This makes the algorithm practical for real world implementation where large scale plaintext
attacks are not an issue.
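The figure of 1109 plaintexts follows directly from the bound $m \ln \mathrm{poly}(\lambda)$ in Theorem 4.3.2. A short check of the arithmetic, with variable names chosen here only for illustration:

```python
import math

lam = 1024          # security parameter lambda
m = 16              # parameter m
poly_exponent = 10  # poly(lambda) = lambda^10

# ln(lambda^10) = 10 * ln(lambda), and lambda^10 = (2^10)^10 = 2^100
bound = m * poly_exponent * math.log(lam)
print(lam ** poly_exponent == 2 ** 100)  # True
print(math.floor(bound))                 # 1109
```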
For the purpose of comparison, we estimate the computation time of Gentry's
homomorphic encryption scheme [31]. In [32], the performance of the primitive operations has
been studied: The bootstrapping (re-crypt) time is 6 seconds, which dominates the time of the
operations. For $l$-bit numbers, the addition circuit needs $5l$ gates and the multiplication circuit
needs $11l^2$ gates [51]. Thus, Gentry's homomorphic encryption scheme needs $5 \cdot 32 \cdot 6$ (> 900)
seconds to add two 32 bit numbers, and $11 \cdot 32^2 \cdot 6$ (> 67000) seconds to multiply two 32 bit
numbers. This is far slower than our homomorphic encryption scheme (0.1 milliseconds to add
two 32 bit numbers and 108 milliseconds to multiply two 32 bit numbers).
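The estimate for Gentry's scheme is a product of the gate counts and the re-crypt time. The following sketch reproduces it; the function names are hypothetical, and the model (one 6-second re-crypt per gate) is the one stated in the paragraph above.

```python
RECRYPT_SECONDS = 6  # bootstrapping (re-crypt) time per gate, as reported in [32]

def gentry_add_seconds(l):
    """Estimated time to add two l-bit numbers: 5*l gates."""
    return 5 * l * RECRYPT_SECONDS

def gentry_mul_seconds(l):
    """Estimated time to multiply two l-bit numbers: 11*l^2 gates."""
    return 11 * l ** 2 * RECRYPT_SECONDS

add_s, mul_s = gentry_add_seconds(32), gentry_mul_seconds(32)
print(add_s, mul_s)  # 960 67584
```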
We also study the performance of our request and response protocols for multi-user data
processing systems. Assume that the size of the secret data in the request and response is the
same as the security parameter. To factor in the communication cost, we simulate the case where
the user is at UTD, the KA1 is at ASU, and the DB is at UCLA. The performance results for
$\lambda = 1024$ and $m = 16$ are shown in Table 4.1, and the execution times are measured in
milliseconds.
Table 4.1. The performance of the communication protocol
Operation | Process Description | Performance
Encryption | Encrypt one data item | 215 ms
Decryption | Decrypt one data item | 34 ms
Transform | Local one-time transformation | 215 ms
The user sends a request to DB | User encrypts the data and sends the request to KA1, KA1 transforms the embedded data and sends the request to DB, DB further transforms the data in the request. | 807 ms (85 ms if the data in the request is not encrypted and is sent from user to DB directly)
DB sends a response to the user | DB transforms the data in the response and sends the response to KA1, KA1 transforms the data and sends the response to the user, the user decrypts the data | 806 ms (77 ms if the data in the response is not encrypted and is sent from DB to user directly)
As can be seen, by using the communication protocol with our homomorphic encryption
algorithm, sending a request and receiving a response takes approximately 10 times longer
than in the case where encryption is not used (807 ms versus 85 ms, and 806 ms versus 77 ms),
but it is still a reasonable cost to achieve the desired security.
4.5 Summary
In this chapter, we presented a novel non-circuit based homomorphic encryption algorithm.
Our scheme is fully homomorphic, but it is not semantically secure. The security of our
homomorphic encryption scheme is equivalent to the well known large integer factorization
problem (which is also the security basis for RSA), but it requires a chosen bound on the
plaintext attack. Even though Gentry's and the subsequent solutions are semantically secure, their
time complexity is too high for practical use. Also, their circuit based approach suffers from a
significant overhead. Our scheme yields a very practical time complexity for encryption,
decryption, and computation on ciphertexts. Specifically, to withstand a chosen plaintext attack
with over 1000 plaintexts, the algorithm runs addition in only a tenth of a millisecond and
multiplication in about a hundred milliseconds. In contrast, Gentry's algorithm requires more than 900
seconds for addition and more than 67000 seconds for multiplication.
Our homomorphic encryption algorithm is symmetric-key based while most of the existing
algorithms are public key based. The only advantage of the public key homomorphic encryption
schemes is the possibility of encrypting data without needing to know the private key, i.e., so that
many clients can issue the requests to the encrypted database. However, in almost all
applications, it is necessary, but not secure, for the client to know the private key in order to read
back and decrypt the data in the response. Our request-response communication protocol for our
symmetric-key homomorphic encryption scheme can secure the request and response processes
in multi-user systems.
CHAPTER 5
ORDER PRESERVING ENCRYPTION SCHEMES
An order-preserving encryption (OPE) scheme is a deterministic symmetric-key encryption scheme.
The ciphertexts of OPE preserve the order of the plaintexts. Thus search queries can be
processed efficiently using conventional DBMS techniques, e.g. establishing the B+ tree on
ciphertexts encrypted by OPE. However, on the other hand, OPE is not a perfectly secure
encryption scheme since ciphertexts inevitably leak the order information of the plaintexts. It is
therefore important to know exactly how much security an OPE scheme can provide.
Unfortunately, the existing security analyses for OPE schemes are either informal (the
security analysis is based on experiments [3]) or incomplete (the security analysis reduces the
security of an OPE scheme to the ideal OPE object (a special OPE) without further analyzing the
security of the ideal OPE object [12]). After presenting the formal definition of the ideal OPE
object in Section 5.1, we complete the security analysis in [12] by proving the one-wayness
security of the ideal OPE object [75, 77] in Section 5.2. We estimate the expected number of bits
of information $z_h$ (formulated by the average min-entropy) of the plaintext that remain secret from
the adversary against a known plaintext attack with h known plaintexts. The result shows that the
ratio of $z_h$ to the length of the plaintext is greater than a constant. Since the probability for any
adversary to fully recover the plaintext is less than or equal to $1/2^{z_h}$, the estimation of $z_h$ implies
the one-wayness security of the ideal OPE object, i.e., the probability for any PPT adversary to
fully recover the plaintext encrypted by the ideal OPE object against an h known plaintext attack
is a negligible function of the security parameter. The security analysis result not only helps
improve our understanding of the security of OPE schemes and guides their parameter selection,
but also provides a general method for analyzing their security. A similar result is also given
in [13] after our work was published as a technical report and submitted to conferences. In [15],
the authors first estimate the probability of a value being a ciphertext's most likely plaintext
(m.l.p.), and then approximate the sum of m.l.p. probabilities over the ciphertext space to get the
average attacking success probability.
The ideal object has been used as the security goal in the security definitions in much of the
existing literature. Intuitively, such an approach tries to make the cipher behave as randomly as
possible in order to achieve the highest security. For deterministic encryption, the security
definition in [7, 49] requires it to be indistinguishable from the “ideal” object that is a function
drawn at random from all possible permutations. For order-preserving encryption, the security
definition in [12] requires it to be indistinguishable from the “ideal” object that is a function
drawn at random from all possible order-preserving functions. For prefix-preserving encryption,
the security definition in [12] requires it to be indistinguishable from the “ideal” object that is a
function drawn at random from all possible prefix-preserving functions. However, it has not been
carefully examined whether the ideal object as defined in [12] has the highest possible security
for OPE schemes. It is meaningless to construct a real scheme indistinguishable from an “ideal”
object which is not secure. It can be shown that the ideal deterministic encryption object achieves
the highest security notion IND-DCPA (Indistinguishability against Distinct Chosen-Plaintext
Attacks) [7, 49]. Consequently, it is valid to use the ideal object in the security definition of
deterministic encryption. For OPE, the authors in [12] attempt to prove that the ideal OPE object
achieves the security notion IND-OCPA (Indistinguishability against Ordered Chosen-Plaintext
Attacks). But they discover that given two randomly selected plaintexts, the distance of the
corresponding ciphertexts is small if the distance of the plaintexts is small, and large if the
distance of the plaintexts is large. Based on this
property, the authors design the big jump attack to retrieve the “distance” information about the
plaintexts from the ciphertexts. Since it leaks more than order information, the ideal OPE object
cannot achieve the security notion IND-OCPA. Hence, there is no proof or any evidence to show
that the ideal OPE object achieves the highest security notion. In fact we prove that the ideal
OPE object is not necessarily the most secure OPE (Section 5.3).
In Section 5.4 we design two generalized OPE (GOPE) algorithms in polynomial-sized and
superpolynomial-sized domains to satisfy stronger notions of security than the ideal OPE object.
First, we consider the security notion IND-OCPA for OPE algorithms in polynomial-sized
domains. Note that the attacks designed so far (such as the big jump attack given in [12] to show
that ideal OPE object cannot achieve IND-OCPA in superpolynomial-sized domains) do not
eliminate the possibility for designing an OPE algorithm that is secure under IND-OCPA in
polynomial-sized domains. In fact, we extend the concept of encryption to design the GOPE
algorithms to achieve IND-OCPA. The difference between OPE and GOPE lies in the fact that
the ciphertexts of OPE are numbers while the ciphertexts of GOPE are allowed to be general
mathematical objects. Hence, the GOPE scheme requires a special comparison algorithm to
compare the ciphertexts.
We also study the security level OPE can achieve in superpolynomial-sized domains. We
weaken the security notion from IND-OCPA to IND-OLCPA. IND-OLCPA has one more
constraint to the adversary compared to IND-OCPA, that is, the range of plaintexts in the oracle
queries is bounded by a polynomial g1, i.e., the difference between the largest and the smallest
plaintexts in the oracle query is less than or equal to g1. We show that the lower bound on the
advantage of an adversary against any OPE algorithm under IND-OLCPA is $1/g$, where g is a
polynomial. Note that this lower bound is not achieved by the ideal OPE object. Accordingly, we
construct another GOPE algorithm to achieve this lower bound under IND-OLCPA.
is constructed based on two building blocks and . is adapted from such that the
ciphertext of a plaintext is secure under IND-OLCPA if can only support comparison between
two plaintexts whose difference is bounded by ( is designed to facilitate the comparison between
two plaintexts should preserve the order of the corresponding plaintexts should also guarantee
that the ciphertexts as follows: The ciphertexts of the first includes and , for any pair of
plaintexts, either or will fulfill the comparison task. Also, since the attacker can only query
plaintexts within the range are indistinguishable and the ciphertexts from have a small statistical
distance. Thus, achieves the lower bound on the advantage of an adversary. As discussed in
Chapter 1, existing OPE schemes have the single-key problem and cannot support multi-user
systems. To solve the single-key problem, we develop protocols to support multi-user data-centric
systems where OPE schemes are used to protect the sensitive data that need to be
searched in encrypted form. We introduce a group of key agents into the system and invent the
protocol to enable “distributed encryption” to assure that the OPE encryption key is not known
by any entity in the system. In Section 5.5, we briefly discuss our approach. Then in Section 5.6
we develop a digit based OPE (DOPE) protocol, where p key agents are deployed between the
DB and users. The master encryption key is shared among the p key agents such that each key agent
holds a different encryption key. Secret data x is mapped into p “digits”, and each of the
p digits is encrypted by a separate key agent with a distinct key using an existing OPE scheme
(any OPE scheme can be used with our protocol). The ciphers of the digits are sent to DB and
integrated to the final ciphertext. Since the cipher of each digit is order-preserving, the integrated
ciphertext is order-preserving.
The basic DOPE protocol has some security problems. A key agent can see the plain digit,
which reveals part of the confidential data x. Additionally, if an adversary compromises the DB
and one key agent, then he can use the key to compromise the same digit of every data in the DB.
To cope with the attacks, we invent the oblivious encryption (OE, alternate to oblivious transfer)
technique in Section 5.7 to enable the key agents to “obliviously” encrypt the “digits” without a
high overhead. Moreover, we use a chain of key agents to encrypt each digit so that the key for
the digit cannot be compromised unless all the key agents in one chain are compromised. To
further prevent the adversary from using the location information (used in OE) and order
information (between the plaintexts and ciphertexts), we require each key agent in the chain
to randomly permute the vector it receives (vector permutation) and to substitute half of the
elements in the vector (data mutation) to randomize the order of the elements
in the vector. We develop a complete solution, the OE-DOPE protocol, based on the basic-DOPE,
OE, and the key agent chain with vector permutation and data mutation approaches. The
performance study of OPE algorithms and the protocols for multi-user systems is conducted in
Section 5.8. Finally we summarize this chapter in Section 5.9.
5.1 Background
Let λ be the security parameter and ν be a negligible function. Let $x \xleftarrow{\$} A$ denote that x is
uniformly randomly selected from set A, $x \xleftarrow{\$} \mathcal{X}$ denote that randomized algorithm $\mathcal{X}$ returns
value x, and $\mathcal{X}^{\mathcal{Y}}$ denote that algorithm $\mathcal{X}$ has access to oracle $\mathcal{Y}$. To facilitate thorough
analysis, we consider OPE schemes in two domains, the polynomial-sized domain and the
superpolynomial-sized domain. If m is a polynomial of λ, then the OPE scheme is in the
polynomial-sized domain, and if m is a superpolynomial of λ, then the OPE scheme is in the
superpolynomial-sized domain. Various security notions are defined for these two domains. We
first introduce the fundamental security notion IND-CPA (indistinguishability under chosen-
plaintext attack) and define it in Definition 5.1.1.
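Definition 5.1.1 (IND-CPA): Let $\mathcal{S} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ be a symmetric-key encryption scheme
and $b \in \{0,1\}$. Let $\mathcal{E}_k(\mathcal{LR}(\cdot,\cdot,b))$ be a left-or-right encryption oracle such that for queries
$\{(x_u^0, x_u^1)\}_{1 \le u \le h}$, it returns
$\mathcal{E}(x_u^b, k) \xleftarrow{\$} \mathcal{E}_k(\mathcal{LR}(x_u^0, x_u^1, b))$
for $1 \le u \le h$. Let $\mathcal{A}$ be an adversary that can access $\mathcal{E}_k(\mathcal{LR}(\cdot,\cdot,b))$ and finally returns a bit b' as a
guess of b. Consider the following experiment.
Experiment $\mathbf{EXP}^{\text{IND-CPA-}b}_{\mathcal{S},\mathcal{A}}$
$k \xleftarrow{\$} \mathcal{K}$; $b' \xleftarrow{\$} \mathcal{A}^{\mathcal{E}_k(\mathcal{LR}(\cdot,\cdot,b))}$; Return b'
The encryption scheme $\mathcal{S}$ is said to be secure under IND-CPA if for every probabilistic
polynomial time (PPT) adversary $\mathcal{A}$, the advantage of $\mathcal{A}$, defined by
$\mathbf{ADV}^{\text{IND-CPA}}_{\mathcal{S},\mathcal{A}} = \Pr[\mathbf{EXP}^{\text{IND-CPA-1}}_{\mathcal{S},\mathcal{A}} = 1] - \Pr[\mathbf{EXP}^{\text{IND-CPA-0}}_{\mathcal{S},\mathcal{A}} = 1]$
is bounded by a negligible function of the security parameter.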
OPE schemes are not secure under IND-CPA because the ciphertexts leak the order
information of plaintexts. Consider an adversary that queries $(x_1^0, x_1^1)$ and $(x_2^0, x_2^1)$, where
$x_1^0 < x_2^0$ and $x_1^1 \ge x_2^1$. If b = 0, $x_1^0$ and $x_2^0$ will be encrypted, where $x_1^0 < x_2^0$; if b = 1,
$x_1^1$ and $x_2^1$ will be encrypted, where $x_1^1 \ge x_2^1$. Since OPE preserves order, the adversary can
distinguish whether the plaintexts are $x_1^0$ and $x_2^0$ or $x_1^1$ and $x_2^1$ by comparing the corresponding
ciphertexts. Thus, the advantage of such an adversary is 1.
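The distinguishing attack above can be played out against any order-preserving map. The sketch below uses a toy OPE (a random strictly increasing table, chosen here only for illustration) and the concrete queries (1, 2) and (2, 1); the function names are hypothetical.

```python
import random

def toy_ope_keygen(m, n, rng):
    """Key = a random strictly increasing map [1..m] -> [1..n] (toy OPE)."""
    return sorted(rng.sample(range(1, n + 1), m))

def encrypt(key, x):
    return key[x - 1]

def lr_oracle(key, b, left, right):
    """Left-or-right oracle: encrypts `left` if b == 0, else `right`."""
    return encrypt(key, right if b else left)

def adversary(oracle):
    # Queries (x1^0, x1^1) = (1, 2) and (x2^0, x2^1) = (2, 1):
    # the left sequence is increasing, the right sequence is not.
    y1 = oracle(1, 2)
    y2 = oracle(2, 1)
    return 0 if y1 < y2 else 1

rng = random.Random(0)
trials, wins = 1000, 0
for _ in range(trials):
    key = toy_ope_keygen(m=16, n=1024, rng=rng)
    b = rng.randrange(2)
    guess = adversary(lambda l, r: lr_oracle(key, b, l, r))
    wins += (guess == b)
print(wins == trials)
```

The adversary guesses b correctly in every trial, which corresponds to an advantage of 1.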
In [12], the security notion is weakened to IND-OCPA (indistinguishability under ordered
chosen-plaintext attack), where the adversary is forbidden to query left and right plaintexts
with different orders.
Definition 5.1.2 (IND-OCPA [12]): IND-OCPA has the same definition as that of IND-CPA
except that the adversary is only allowed to query $\{(x_u^0, x_u^1) \mid 1 \le u \le h\}$, where the condition
$x_u^0 < x_v^0 \iff x_u^1 < x_v^1$, $1 \le u, v \le h$
is satisfied.
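The ordering condition of Definition 5.1.2 is easy to state as a predicate on the query list. A minimal sketch, with a hypothetical function name:

```python
from itertools import product

def is_ocpa_admissible(queries):
    """queries: list of (left, right) plaintext pairs (x_u^0, x_u^1).

    Admissible iff x_u^0 < x_v^0 exactly when x_u^1 < x_v^1 for all u, v.
    """
    return all((lu < lv) == (ru < rv)
               for (lu, ru), (lv, rv) in product(queries, repeat=2))

print(is_ocpa_admissible([(1, 5), (2, 7), (4, 9)]))  # True
print(is_ocpa_admissible([(1, 2), (2, 1)]))          # False
```

The second query list is the one used by the IND-CPA adversary against OPE, which is exactly what IND-OCPA rules out.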
IND-OCPA is the highest security notion (with respect to indistinguishability and left-or-right
encryption oracle) for OPE algorithms. However, in [12], it has been shown that OPE
schemes are susceptible to the following big jump attack under IND-OCPA.
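Definition 5.1.3 (Big jump attack [12]): Consider the following PPT adversary $\mathcal{A}_{BJ}$ with
three oracle queries in the experiment of security notion IND-OCPA.
Adversary $\mathcal{A}_{BJ}^{\mathcal{E}_k(\mathcal{LR}(\cdot,\cdot,b))}$
$x \xleftarrow{\$} \{1, ..., m-1\}$
$y_1 \leftarrow \mathcal{E}_k(\mathcal{LR}(1, x, b))$
$y_2 \leftarrow \mathcal{E}_k(\mathcal{LR}(x, x + 1, b))$
$y_3 \leftarrow \mathcal{E}_k(\mathcal{LR}(x + 1, m, b))$
Return 1 if $y_3 - y_2 > y_2 - y_1$; else return 0.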
In the big jump attack, the attacker chooses left plaintexts 1, x, and x+1, and the right
plaintexts x, x+1, and m, where x is randomly selected from {1, ..., m−1}. From the ciphertexts, if
y3 − y2 > y2 − y1, then the attacker can guess that the right plaintexts were encrypted; if y3 − y2 ≤ y2
− y1, then the attacker can guess that the left plaintexts were encrypted. Since the distance between
two ciphertexts can reflect, to some extent, the distance between the corresponding two plaintexts,
such a guess will have a high probability of being correct. The lower bound on the advantage of the
adversary has been derived in [12] and is cited in Lemma 5.1.1.
Lemma 5.1.1: $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{S},\mathcal{A}_{BJ}} \ge 1 - \frac{2 \log n}{m-1}$
Remark 5.1.1: Note that for efficient OPE, both log m and log n should be bounded by a
polynomial of λ. Therefore $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{S},\mathcal{A}_{BJ}} \ge 1 - \nu(\lambda)$ if m is a superpolynomial of λ, which
implies that it is impossible to construct an OPE that is secure under IND-OCPA if m is a
superpolynomial of λ. However, the lower bound on the advantage of the adversary does not
eliminate the possibility of designing an OPE scheme that is secure under IND-OCPA if m is
bounded by a polynomial of λ.
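The big jump attack of Definition 5.1.3 can be simulated empirically. The sketch below is an illustration under assumptions that are not from the source: it models the OPE as a uniformly random strictly increasing table (feasible only for small m), uses the hypothetical parameters m = 64 and n = 4096, and estimates the advantage by Monte Carlo sampling.

```python
import random

def sample_random_ope(m, n, rng):
    """Uniform random order-preserving f: [m] -> [n], stored as a sorted m-subset."""
    return sorted(rng.sample(range(1, n + 1), m))

def big_jump(enc, m, b, rng):
    """One run of A_BJ against a left-or-right oracle with hidden bit b."""
    lr = lambda left, right: enc(right if b else left)
    x = rng.randint(1, m - 1)
    y1 = lr(1, x)
    y2 = lr(x, x + 1)
    y3 = lr(x + 1, m)
    return 1 if y3 - y2 > y2 - y1 else 0

def estimate_advantage(m, n, trials, seed=0):
    """Estimate Pr[A returns 1 | b = 1] - Pr[A returns 1 | b = 0]."""
    rng = random.Random(seed)
    ones = [0, 0]
    for b in (0, 1):
        for _ in range(trials):
            f = sample_random_ope(m, n, rng)
            ones[b] += big_jump(lambda x: f[x - 1], m, b, rng)
    return (ones[1] - ones[0]) / trials

adv = estimate_advantage(m=64, n=4096, trials=2000)
print(round(adv, 2))
```

With these parameters the bound of Lemma 5.1.1 (taking the logarithm to base 2) evaluates to $1 - 24/63 \approx 0.62$, and the estimated advantage should lie above it.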
Because of the big jump attack, the authors in [12] take an alternative approach: They
define the security notion POPF-CCA (pseudorandom order-preserving function under chosen-
ciphertext attack) based on the ideal OPE object defined as follows.
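Definition 5.1.4 (Ideal OPE Object): We say that $\mathcal{S}^* = (\mathcal{K}^*, \mathcal{E}^*, \mathcal{D}^*)$ is the ideal OPE
object if
- $\mathcal{K}^*$ uniformly randomly selects $f \in OPE_{m,n} = \{g: [m] \to [n] \mid x < x' \Rightarrow g(x) < g(x')\}$;
- $\mathcal{E}^*$ encrypts x to f(x);
- $\mathcal{D}^*$ decrypts y to $f^{-1}(y)$.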
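For small domains the ideal OPE object of Definition 5.1.4 can be written down explicitly. The following is a toy sketch (the class name and table representation are choices made here, and it is not an efficient construction): since a strictly increasing function is determined by its image, a uniformly random m-element subset of [n] yields a uniformly random element of $OPE_{m,n}$.

```python
import bisect
import random

class IdealOPE:
    """Toy model of the ideal OPE object on [m] -> [n]; feasible only for small m."""

    def __init__(self, m, n, rng=None):
        rng = rng or random.Random()
        # Key generation: a uniform m-subset of [n], sorted, is the table of f.
        self.table = sorted(rng.sample(range(1, n + 1), m))

    def encrypt(self, x):
        """Encrypt plaintext x in [1, m] to f(x)."""
        return self.table[x - 1]

    def decrypt(self, y):
        """Decrypt ciphertext y to f^{-1}(y); reject values outside the image of f."""
        i = bisect.bisect_left(self.table, y)
        if i == len(self.table) or self.table[i] != y:
            raise ValueError("not a valid ciphertext")
        return i + 1

ope = IdealOPE(m=10, n=1000, rng=random.Random(3))
cts = [ope.encrypt(x) for x in range(1, 11)]
print(cts == sorted(cts), [ope.decrypt(y) for y in cts] == list(range(1, 11)))
```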
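For a “real” OPE scheme $\mathcal{S} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$, it is secure under POPF-CCA if it is
computationally indistinguishable from the ideal OPE object $\mathcal{S}^* = (\mathcal{K}^*, \mathcal{E}^*, \mathcal{D}^*)$. Formally, the
security notion POPF-CCA is defined as follows.
Definition 5.1.5 (POPF-CCA [12]): Let the advantage of the adversary in POPF-CCA be
$\mathbf{ADV}^{\text{POPF-CCA}}_{\mathcal{S},\mathcal{A}} = \Pr[k \xleftarrow{\$} \mathcal{K} : \mathcal{A}^{\mathcal{E}(k,\cdot),\mathcal{D}(k,\cdot)} = 1] - \Pr[k \xleftarrow{\$} \mathcal{K}^* : \mathcal{A}^{\mathcal{E}^*(k,\cdot),\mathcal{D}^*(k,\cdot)} = 1]$.
The encryption scheme $\mathcal{S}$ is said to be secure under POPF-CCA if $\mathbf{ADV}^{\text{POPF-CCA}}_{\mathcal{S},\mathcal{A}}$ is bounded by
a negligible function of the security parameter for every PPT adversary $\mathcal{A}$.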
Based on the security notion POPF-CCA, the authors in [12] construct a real OPE scheme
and prove that it is secure under POPF-CCA. In other words, in their approach the ideal OPE
object is used as the security goal, and a real OPE scheme is constructed to achieve that security goal.
However, the question is: is the ideal OPE object always the most secure OPE? We construct a
counterexample in Section 5.3 that answers this question negatively.
5.2 Security of OPE
In [12], the authors reduce the security of 𝒮 to the security of the ideal object, but they do not
analyze the security of the ideal OPE object. As an obvious counterexample, the ideal object is
not secure when n = m. Indeed, there exists no secure OPE scheme when n = m because the
encryption algorithm is necessarily the identity function. In [12], the authors left open the
questions of how to measure the security of the ideal OPE object and how to choose n given m.
In this section, we analyze the security of the ideal OPE object. First, we need to establish
the attack model for the analysis. The security notions considered in [12], e.g. IND-CPA,
IND-DCPA, and IND-OCPA, are all related to chosen plaintext attacks. In the security notion of
IND-CPA, the adversary is allowed to make queries of the form $\{(x_i^0, x_i^1)\}_{i=1}^{h}$. Afterwards the
left-or-right encryption oracle will return the ciphertexts $\{\mathcal{E}(x_i^b, k)\}_{i=1}^{h}$ to the adversary, where b is a
randomly selected bit. The security of the encryption scheme depends on how precisely the
adversary can predict b. The form of queries in the game of IND-CPA is specialized to facilitate
the definition of indistinguishability. IND-DCPA and IND-OCPA consider similar security
games except for the fact that they give additional constraints on the queries. Another effective
security game against the OPE scheme is to reverse the order of the chosen plaintext attack, i.e.,
the adversary is given the ciphertext, called the challenge, and subsequently chooses the
plaintexts. In this case, the adversary can reverse $\mathcal{E}(x, k)$ by the following binary-search chosen
plaintext attack. The adversary $\mathcal{A}$ begins the attack by choosing the midpoint $p = \frac{m+1}{2}$, and asks
the encryption oracle to encrypt p. If $\mathcal{E}(x, k) = \mathcal{E}(p, k)$, then the adversary knows that $x = p$.
If $\mathcal{E}(x, k) > \mathcal{E}(p, k)$, then the adversary knows that $x > p$, and $\mathcal{A}$ can continue the attack by
choosing the plaintext $\frac{p+m}{2}$. If $\mathcal{E}(x, k) < \mathcal{E}(p, k)$, then $\mathcal{A}$ knows that $x < p$, and $\mathcal{A}$ continues
the attack by choosing the plaintext $\frac{1+p}{2}$. Thus, after at most log m chosen plaintext attacks, the
adversary can reverse $\mathcal{E}(x, k)$. The security notions in these models are too strong and OPE
schemes cannot achieve the security level of such security games.
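The binary-search chosen plaintext attack described above can be sketched as follows. The toy OPE (a random strictly increasing table), the parameters, and the function names are assumptions made only for this illustration; with integer midpoints the loop uses roughly log m oracle queries.

```python
import random

def binary_search_attack(challenge, encrypt_oracle, m):
    """Recover x in [1, m] from its ciphertext using chosen-plaintext queries."""
    lo, hi = 1, m
    queries = 0
    while lo <= hi:
        p = (lo + hi) // 2          # midpoint of the remaining plaintext interval
        c = encrypt_oracle(p)
        queries += 1
        if c == challenge:
            return p, queries
        if challenge > c:           # order preservation implies x > p
            lo = p + 1
        else:                       # order preservation implies x < p
            hi = p - 1
    raise ValueError("challenge is not the encryption of any plaintext in [1, m]")

m, n = 1024, 1 << 20
table = sorted(random.Random(5).sample(range(1, n + 1), m))  # toy OPE key
enc = lambda x: table[x - 1]

results = [binary_search_attack(enc(x), enc, m) for x in range(1, m + 1)]
print(all(x == r[0] for x, r in zip(range(1, m + 1), results)))
print(max(q for _, q in results))
```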
We develop a new attack model by considering a common scenario in third party hosting
with potential external attacks. Let O denote the owner of a database DB, where DB and its
corresponding querying logic are hosted on the Web by a third party Host. DB is encrypted using
an OPE scheme to protect its secrecy and O holds the encryption key. DB can be accessed by
various clients in CL and O may distribute the encryption key to legitimate clients in CL. The
goal is to protect DB from potential attacks. We assume that a public key infrastructure is in
place and the identities of individuals in O, CL, Host, and outsiders can be authenticated
correctly. Note that it is not possible to protect DB against any key holders in O and CL. At the
same time, it is not possible for an individual (attacker) without a key to arbitrarily choose a
plaintext and obtain the corresponding ciphertext. An attacker may happen to know some
plaintexts and be able to find out the corresponding ciphertexts. Thus, we do not consider chosen
plaintext attacks such as those in [12]. Instead, we consider the known plaintext attack model,
where the adversary is given a ciphertext $\mathcal{E}(x, k)$ (called the challenge) to compromise. The
attack model is formally given in Definition 5.2.1.
Definition 5.2.1 (Attack Model): The known plaintext attack model we consider involves
an adversary with h pairs of known plaintexts and ciphertexts. Let
$KP = \{(x_i, \mathcal{E}(x_i, k)) \mid 1 \le i \le h\}$ denote the set of h plaintext/ciphertext pairs. Then, the
adversary is given a ciphertext $\mathcal{E}(x, k)$ (called the challenge). The goal of the adversary is to
recover x from the challenge $\mathcal{E}(x, k)$ based on KP.
Next, we need to determine how to generate the challenge $\mathcal{E}(x, k)$ in the attack model.
Since the encryption algorithm is deterministic, the adversary can always reverse the
ciphertexts $\mathcal{E}(x_i, k)$, where $1 \le i \le h$, since $x_i$ is a known plaintext. Thus, we assume that x is
selected from $[m]^*$ instead of $[m]$. Note that in the security definition of conventional
deterministic (probabilistic) encryption schemes, it is required that the adversary cannot retrieve
any bit of information of any selected $x \in [m]^*$ ($x \in [m]$) from the corresponding ciphertext
against the known plaintext attack. That is to say, the choice of x should not affect the security
result. However, the OPE scheme cannot reach such a security level. Suppose that the adversary
knows the plaintext/ciphertext pairs in the set $KP = \{(x, \mathcal{E}(x, k)), (x + 2, \mathcal{E}(x + 2, k))\}$, where
$1 \le x \le m - 2$. Since a ciphertext y is encrypted from plaintext x+1 if and only if
$\mathcal{E}(x, k) < y < \mathcal{E}(x + 2, k)$, the adversary can recover plaintext x+1 from $\mathcal{E}(x + 1, k)$ based on KP.
Therefore, worst-case security is not suitable for quantifying the security of the OPE scheme.
Hence, we consider average-case security for the OPE scheme instead. We assign weights to the
elements in $[m]^*$, and consider the expected security on $[m]^*$. Factors such as data access
distribution and the adversary's personal interest could affect weight assignments on $[m]^*$. However,
without prior information of the application environment, there is no way to tell which data is
more/less important. Thus, in this paper, we assume that the elements in $[m]^*$ are evenly
weighted, i.e., x is uniformly selected from $[m]^*$. The security analysis based on this assumption
can be the basis for further analysis considering non-evenly weighted $[m]^*$ for the choice of the
challenge.
According to the attack model discussed above, the security of OPE schemes can be
measured by the one-wayness security defined in Definition 5.2.2.
Definition 5.2.2 (One-Wayness Security): We say that an encryption scheme
$\mathcal{S} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ achieves the one-wayness security if
$\Pr[\mathcal{A}(\mathcal{E}(x, k), KP) = x] = \nu(\lambda)$,
for any PPT (probabilistic polynomial time) adversary $\mathcal{A}$, where x is chosen uniformly randomly
from the plaintext domain, $KP = \{(x_i, \mathcal{E}(x_i, k)) \mid 1 \le i \le h\}$ is the set of h (h is bounded by a
polynomial of λ) plaintext/ciphertext pairs known by $\mathcal{A}$, $x_1, \ldots, x_h$ are also chosen uniformly
randomly from the plaintext domain, and ν denotes a negligible function.
Consider the adversary $\mathcal{A}$ who randomly outputs an element in the plaintext domain. Then
the success probability for $\mathcal{A}$ to reverse the ciphertext $\mathcal{E}(x, k)$ is 1/m. Since λ will be set to be
log m [46], $\mathcal{A}$ succeeds with negligible probability. However, this is not a complete security proof
because there may be other adversaries. Actually we can show that by choosing $n \ge m^3 > 1$, the
ideal OPE object achieves one-wayness security, i.e. the probability for any adversary to fully
recover a plaintext is a negligible function of the security parameter λ = log m if the number h of
known plaintext/ciphertext pairs satisfies $h = o(m^{\varepsilon})$, $0 < \varepsilon < 1$. The proof is relegated to the
appendices and we state the result in Theorem 5.2.1. Therefore, the real OPE schemes that are
computationally indistinguishable from the ideal OPE object also have one-wayness security.
Theorem 5.2.1: The ideal OPE object 𝒮* achieves one-wayness security.
5.3 The Limitation of the Ideal OPE Object
In this section we show that there exists a situation in which the ideal OPE object is not the
most secure OPE. We consider a specific plaintext domain [m] and ciphertext range [n],
construct a real OPE scheme $\mathcal{S} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ for [m] and [n], prove that $\mathcal{S}$ is secure under
IND-OCPA, and prove that the ideal OPE object $\mathcal{S}^*$ for [m] and [n] is not secure under
IND-OCPA.
Plaintext domain and ciphertext range: In this section, let m = 2 and $n = 2^{\lambda}$ where λ is
the security parameter. Then the plaintext domain is [m] = {1, 2} and the ciphertext range is
$[n] = \{j \mid 1 \le j \le 2^{\lambda}\}$.
The real OPE scheme: First we construct a real OPE scheme $\mathcal{S} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ as follows.
- $\mathcal{K}$: It uniformly randomly selects $f \in \{g: [m] \to [n] \mid g(2) = g(1) + 1\}$;
- $\mathcal{E}$: For plaintext x, it returns f(x);
- $\mathcal{D}$: For ciphertext y, it returns $f^{-1}(y)$.
Unlike the ideal OPE object, in the real OPE scheme 𝒮 the encryption function is
uniformly randomly selected from a subset of order-preserving functions. The encryption
function has the property such that 1 is encrypted to a random element r in [1, n−1] while 2 is
encrypted to r+1. To show that the real OPE scheme 𝒮 is secure under IND-OCPA, we
compute the statistical distance between the probability distribution of ciphertexts for plaintext 1
and the probability distribution of ciphertexts for plaintext 2, and prove that it is negligibly small.
Based on this fact, we show that the success probability of every attack in IND-OCPA is also
negligibly small.
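Lemma 5.3.1: Let Δ be the statistical distance between ℰ(1) and ℰ(2). Then Δ = ν(λ).
Proof. According to the definition of ℰ, ℰ(i) ∈ [n] follows the probability distribution
such that
$$\Pr[\mathcal{E}(1) = j] = \begin{cases} \frac{1}{n-1} & \text{for } 1 \le j < n \\ 0 & \text{for } j = n \end{cases} \quad\text{and}\quad \Pr[\mathcal{E}(2) = j] = \begin{cases} 0 & \text{for } j = 1 \\ \frac{1}{n-1} & \text{for } 1 < j \le n \end{cases}$$
Thus
$$\Delta = \frac{1}{2}\sum_{j}\left|\Pr[\mathcal{E}(1) = j] - \Pr[\mathcal{E}(2) = j]\right| = \frac{1}{n-1} = \frac{1}{2^{\lambda}-1} = \nu(\lambda)$$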
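The value of Δ in Lemma 5.3.1 can be checked numerically. The following is a minimal sketch, not part of the original proof; the function names are hypothetical, and exact rational arithmetic is used so that the comparison with 1/(n−1) is exact.

```python
from fractions import Fraction


def real_scheme_distribution(x, n):
    """Exact distribution of E(x) for the toy scheme with g(2) = g(1) + 1.

    The key fixes g(1) = r with r uniform on {1, ..., n-1}, so E(1) = r
    and E(2) = r + 1.
    """
    shift = x - 1  # 0 for plaintext 1, 1 for plaintext 2
    return {r + shift: Fraction(1, n - 1) for r in range(1, n)}


def statistical_distance(p, q):
    """Half the L1 distance between two finite distributions given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(j, 0) - q.get(j, 0)) for j in support) / 2


for lam in (2, 4, 8):
    n = 2 ** lam
    delta = statistical_distance(real_scheme_distribution(1, n),
                                 real_scheme_distribution(2, n))
    # The lemma predicts delta = 1 / (2^lambda - 1).
    print(lam, delta, delta == Fraction(1, n - 1))
```

Only the two boundary ciphertexts j = 1 and j = n contribute to the sum, which is why the distance shrinks exponentially in λ.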
Proposition 5.3.2: 𝒮 is secure under IND-OCPA. Specifically, $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{S},\mathcal{A}} = \nu(\lambda)$ for
every PPT adversary 𝒜.
Proof. Note that the adversary has to query ordered plaintext pairs to ℒℛ in IND-OCPA, so the
possible query sets of the adversary are {(1,1)}, {(2,2)}, {(1,1),(2,2)}, {(1,2)}, and
{(2,1)}. We analyze the security of 𝒮 according to these queries.
(1) The adversary queries {(1,1)} to ℒℛ. In this case, since the left plaintext equals the
right plaintext, the returned ciphertext cannot help the adversary decide whether the left
plaintext or the right plaintext was encrypted. Hence $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{S},\mathcal{A}} = 0$.
(2) The adversary queries {(2,2)} or {(1,1),(2,2)} to ℒℛ. The situation is similar to that in
(1) and hence $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{S},\mathcal{A}} = 0$.
(3) The adversary queries {(1,2)} to ℒℛ. According to Lemma 5.3.1,
$$\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{S},\mathcal{A}} = \Pr[\mathbf{EXP}^{\text{IND-OCPA-1}}_{\mathcal{S},\mathcal{A}} = 1] - \Pr[\mathbf{EXP}^{\text{IND-OCPA-0}}_{\mathcal{S},\mathcal{A}} = 1] \le \Delta = \nu(\lambda).$$
(4) The adversary queries {(2,1)} to ℒℛ. The situation is similar to that in (3) and hence
$\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{S},\mathcal{A}} = \nu(\lambda)$.
According to (1) - (4), $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{S},\mathcal{A}} = \nu(\lambda)$ for every PPT adversary 𝒜.
The ideal OPE object: According to Definition 5.1.4, the ideal OPE object 𝒮∗ =
(𝒦∗, ℰ∗, 𝒟∗) is defined as follows.
- 𝒦∗: It uniformly randomly selects f ← {g: [m] → [n] | g(1) < g(2)};
- ℰ∗: For plaintext x, it returns f(x);
- 𝒟∗: For ciphertext y, it returns f^{−1}(y).
To show that the ideal OPE object 𝒮∗ is not secure under IND-OCPA, we compute the
statistical distance between the probability distribution of ciphertexts for plaintext 1 and the
probability distribution of ciphertexts for plaintext 2, and prove that it is significant (greater than
a positive constant). Based on this fact, we design an attack to distinguish left plaintext 1 and
right plaintext 2 according to the returned ciphertext y by comparing the conditional probabilities
Pr[y | 1] and Pr[y | 2]. It can be shown that the success probability of the attack is non-negligible
(greater than a positive constant).
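Lemma 5.3.3: Let Δ∗ be the statistical distance between ℰ∗(1) and ℰ∗(2). Then Δ∗ = Ω(1).
Proof. Since $|\mathrm{OPE}_{m,n}| = \binom{n}{m}$ and $|\{f \in \mathrm{OPE}_{m,n} \mid f(i) = j\}| = \binom{j-1}{i-1}\binom{n-j}{m-i}$, for i ∈ [m],
ℰ∗(i) ∈ [n] follows the negative hypergeometric distribution
$$\Pr[\mathcal{E}^*(i) = j] = \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n}{m}}, \quad 1 \le j \le n.$$
Thus
$$\Delta^* = \frac{1}{2}\sum_{j}\left|\frac{\binom{j-1}{0}\binom{n-j}{m-1}}{\binom{n}{m}} - \frac{\binom{j-1}{1}\binom{n-j}{m-2}}{\binom{n}{m}}\right| = \frac{1}{2}\sum_{j}\left|\frac{\binom{j-1}{0}\binom{n-j}{1}}{\binom{n}{2}} - \frac{\binom{j-1}{1}\binom{n-j}{0}}{\binom{n}{2}}\right| = \frac{\sum_{j}|n-2j+1|}{2\binom{n}{2}} = \frac{n}{2(n-1)} \ge \frac{1}{2} = \Omega(1)$$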
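The closed form for Δ∗ can be verified by direct enumeration. This is a minimal sketch under the assumption that the ideal object samples uniformly from all order-preserving functions [m] → [n], as stated in the lemma; the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def ideal_ope_distribution(i, m, n):
    """Pr[E*(i) = j] for a uniformly random order-preserving f: [m] -> [n]."""
    total = comb(n, m)
    return {j: Fraction(comb(j - 1, i - 1) * comb(n - j, m - i), total)
            for j in range(1, n + 1)}


def statistical_distance(p, q):
    """Half the L1 distance between two finite distributions given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(j, 0) - q.get(j, 0)) for j in support) / 2


for lam in (2, 4, 8):
    n = 2 ** lam
    delta_star = statistical_distance(ideal_ope_distribution(1, 2, n),
                                      ideal_ope_distribution(2, 2, n))
    # The lemma predicts delta_star = n / (2 (n - 1)), which never drops below 1/2.
    print(lam, delta_star, delta_star == Fraction(n, 2 * (n - 1)))
```

In contrast to the real scheme of Lemma 5.3.1, the distance here stays above 1/2 no matter how large λ becomes.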
For the ideal OPE object, if 1 is encrypted to j, then 2 must be encrypted to a value in
[j+1, n], and hence there are more choices for the encryption of 2 if j is small; similarly, if 2
is encrypted to j, then 1 must be encrypted to a value in [1, j−1], and hence there are more
choices for the encryption of 1 if j is large. Since the encryption function of the ideal OPE
object is uniformly randomly selected from all order-preserving functions, 1 is more likely to be
encrypted to [1, (n+1)/2] and 2 is more likely to be encrypted to [(n+1)/2, n]. Lemma 5.3.3
indicates that the difference between the encryptions of 1 and 2 is significant. This difference
can be used to design an attack, and based on the attack we prove in Proposition 5.3.4 that the
ideal OPE object is not secure under IND-OCPA.
Proposition 5.3.4: For the ideal OPE object 𝒮∗ with the plaintext domain [m] and the
ciphertext range [n], there exists an adversary 𝒜 who can distinguish plaintexts 1 and 2 with one
oracle query under IND-OCPA such that $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{S}^*,\mathcal{A}} = \Omega(1)$. In other words, the ideal OPE
object 𝒮∗ is not secure under IND-OCPA.
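Proof. Since $|\mathrm{OPE}_{m,n}| = \binom{n}{m}$ and $|\{f \in \mathrm{OPE}_{m,n} \mid f(i) = j\}| = \binom{j-1}{i-1}\binom{n-j}{m-i}$, for i ∈ [m],
ℰ∗(i) ∈ [n] follows the negative hypergeometric distribution
$$\Pr[\mathcal{E}^*(i) = j] = \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n}{m}}, \quad 1 \le j \le n.$$
Note that
$$\frac{\binom{j-1}{0}\binom{n-j}{m-1}}{\binom{n}{m}} > \frac{\binom{j-1}{1}\binom{n-j}{m-2}}{\binom{n}{m}} \iff \binom{n-j}{m-1} > (j-1)\binom{n-j}{m-2} \iff n - j - m + 2 > (j-1)(m-1)$$
$$\overset{m=2}{\iff} n - j > j - 1 \iff j < (n+1)/2$$
Thus we construct the PPT adversary 𝒜 with one oracle query in the experiment of
security notion IND-OCPA as follows (note that y ≠ (n+1)/2 since n = 2^λ).
Adversary $\mathcal{A}^{\mathcal{E}^*_k(\mathcal{LR}(\cdot,\cdot,b))}$
    y ← $\mathcal{E}^*_k(\mathcal{LR}(1, 2, b))$
    Return 0 if y < (n+1)/2
    Return 1 if y > (n+1)/2
Then
$$\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{S}^*,\mathcal{A}} = \Pr[\mathbf{EXP}^{\text{IND-OCPA-1}}_{\mathcal{S}^*,\mathcal{A}} = 1] - \Pr[\mathbf{EXP}^{\text{IND-OCPA-0}}_{\mathcal{S}^*,\mathcal{A}} = 1] = \Delta^* = \Omega(1)$$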
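The advantage of the threshold adversary can be computed exactly for small n by enumerating every key of the ideal OPE object. The sketch below is illustrative only: the function names are hypothetical, and it assumes that b = 0 means the left plaintext 1 is encrypted and b = 1 means the right plaintext 2 is encrypted.

```python
from fractions import Fraction
from itertools import combinations


def threshold_adversary(y, n):
    """Guess b = 0 for ciphertexts below (n + 1) / 2 and b = 1 above it."""
    return 0 if 2 * y < n + 1 else 1


def exact_advantage(n):
    """Average the adversary's guess over all order-preserving f: {1, 2} -> [n]."""
    # Each key is a pair (f(1), f(2)) with f(1) < f(2).
    keys = list(combinations(range(1, n + 1), 2))
    # Pr[adversary outputs 1 | b = 1]: the oracle returns f(2).
    p1_given_b1 = Fraction(sum(threshold_adversary(f2, n) for _, f2 in keys), len(keys))
    # Pr[adversary outputs 1 | b = 0]: the oracle returns f(1).
    p1_given_b0 = Fraction(sum(threshold_adversary(f1, n) for f1, _ in keys), len(keys))
    return p1_given_b1 - p1_given_b0


for lam in (2, 3, 4):
    n = 2 ** lam
    print(lam, exact_advantage(n), exact_advantage(n) == Fraction(n, 2 * (n - 1)))
```

For these small parameters the enumerated advantage coincides with Δ∗ = n/(2(n−1)), matching the last line of the proof.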
Remark 5.3.1: The proofs in Lemma 5.3.3 and Proposition 5.3.4 can be generalized to
show that the ideal OPE object is not secure under IND-OCPA for any plaintext domain [m] and
ciphertext range [n].
We summarize the results of this section in the following theorem.
Theorem 5.3.5: The ideal OPE object 𝒮∗ is not the most secure OPE for m = 2 and n = 2^λ.
Specifically, there exists a real OPE scheme 𝒮 secure under IND-OCPA while the ideal OPE
object 𝒮∗ is not secure under IND-OCPA.
5.4 Generalized OPE (GOPE)
5.4.1 Generalized OPE in the Polynomial-sized Domain
We define the concept of the generalized OPE (GOPE) scheme. Unlike OPE whose
ciphertext-space is [n], GOPE adopts general mathematical objects as ciphertexts. Hence a
special comparison algorithm is needed to compare the ciphertexts.
Definition 5.4.1 (GOPE scheme): A GOPE scheme 𝒮 = (𝒦, ℰ, 𝒟, 𝒞) is a symmetric-key
encryption scheme, where 𝒦: {0,1}∗ → {0,1}∗ is a key generation algorithm, ℰ: [m] × {0,1}∗ → R
is an encryption algorithm, 𝒟: R × {0,1}∗ → [m] is a decryption algorithm, and
𝒞: R × R → {=, >, <} is a comparison algorithm. 𝒮 satisfies
$$\Pr[\mathcal{D}(\mathcal{E}(x, k), k) = x] > 1 - \nu(\lambda)$$
for any x ∈ [m] and key k, and
$$\Pr[\mathcal{C}(\mathcal{E}(x, k), \mathcal{E}(x', k)) = w] > 1 - \nu(\lambda)$$
for any x, x’ ∈ [m] and w ∈ {=, >, <} such that x w x’ holds.
Next we construct the GOPE scheme 𝒮2 = (𝒦2, ℰ2, 𝒟2, 𝒞2) with m being a polynomial of
λ, and prove that it is secure under IND-OCPA. In 𝒮2 the ciphertext y for plaintext x is a “set”.
An element in y is a share of the relation between x and x’, for all other plaintexts x’. When
comparing x and x’, the matching pair of shares from x and x’ can be retrieved to reconstruct the
relation (x < x’ or x > x’). Let the symbol “<” be encoded as 1 ∈ Z3 and the symbol “>” be
encoded as 2 ∈ Z3. 𝒮2 is constructed as follows.
- 𝒦2: Given the domain size m, it randomly picks a permutation π of the set {(x, x’) | 1 ≤ x
< x’ ≤ m}, and randomly generates $r_{xx'}$ ∈ Z3 for 1 ≤ x < x’ ≤ m. It returns {(π, $r_{xx'}$) | 1 ≤ x < x’ ≤
m};
- ℰ2: For plaintext x, it returns the ciphertext y = {(π(x’, x), $r_{x'x}$) | x’ < x} ∪ {(π(x, x’), 1 +
$r_{xx'}$) | x’ > x};
- 𝒟2: For ciphertext y, it retrieves (any) two elements (i, s) and (i’, s’) from the set y, and
returns the plaintext x which appears in both π^{−1}(i) and π^{−1}(i’);
- 𝒞2: For ciphertexts y and y’, if y = y’, it returns =. Otherwise, it retrieves (i, s) from the set
y and (i, s’) from the set y’; if s – s’ = 1, it returns <; if s – s’ = 2, it returns > (arithmetic in Z3).
The efficiency, correctness, and security of 𝒮2 are presented in Lemma 5.4.1 and
Theorem 5.4.2.
Lemma 5.4.1: 𝒮2 is efficient and correct.
The efficiency of 𝒮2 and the correctness of the decryption algorithm can be easily verified. It
suffices to verify the correctness of the comparison algorithm. For x = x’, since ℰ2(x, k) = ℰ2(x’, k),
it is correct for the comparison algorithm to return =. For x ≠ x’, there exist unique i, s, s’ such
that (i, s) ∈ ℰ2(x, k) and (i, s’) ∈ ℰ2(x’, k). If x < x’, then ℰ2(x, k) = {⋯, (π(x, x’), 1 + $r_{xx'}$), ⋯}
and ℰ2(x’, k) = {⋯, (π(x, x’), $r_{xx'}$), ⋯}, thus (1 + $r_{xx'}$) – $r_{xx'}$ = 1, hence it is correct for the
comparison algorithm to return <; if x > x’, then ℰ2(x, k) = {⋯, (π(x’, x), $r_{x'x}$), ⋯} and ℰ2(x’, k) =
{⋯, (π(x’, x), 1 + $r_{x'x}$), ⋯}, thus $r_{x'x}$ − (1 + $r_{x'x}$) = −1 = 2 in Z3, hence it is correct for the
comparison algorithm to return >.
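The construction of 𝒮2 and the correctness argument above can be exercised on a small domain. The following is a minimal sketch, not the author's implementation: it models the permutation π as a random bijection from ordered pairs to integer labels (an assumption made for illustration), all function names are hypothetical, and decryption requires m ≥ 3 so that a ciphertext holds at least two elements.

```python
import random
from itertools import combinations


def keygen(m, rng):
    """Sample a random relabeling pi of all pairs (x, x') with x < x',
    plus one random share r[(x, x')] in Z_3 per pair."""
    pairs = list(combinations(range(1, m + 1), 2))
    labels = list(range(len(pairs)))
    rng.shuffle(labels)
    pi = dict(zip(pairs, labels))
    pi_inv = {label: pair for pair, label in pi.items()}
    r = {pair: rng.randrange(3) for pair in pairs}
    return {"m": m, "pi": pi, "pi_inv": pi_inv, "r": r}


def encrypt(x, key):
    """Ciphertext of x: one (label, share) element per other plaintext x'."""
    pi, r, m = key["pi"], key["r"], key["m"]
    lower = {(pi[(xp, x)], r[(xp, x)]) for xp in range(1, x)}
    upper = {(pi[(x, xp)], (1 + r[(x, xp)]) % 3) for xp in range(x + 1, m + 1)}
    return frozenset(lower | upper)


def decrypt(y, key):
    """Return the plaintext common to the pairs behind any two labels in y."""
    (i, _), (ip, _) = sorted(y)[:2]
    (x,) = set(key["pi_inv"][i]) & set(key["pi_inv"][ip])
    return x


def compare(y, yp):
    """Recover '<', '>' or '=' from the single label the two ciphertexts share."""
    if y == yp:
        return "="
    shares = dict(y)
    for i, sp in yp:
        if i in shares:
            return "<" if (shares[i] - sp) % 3 == 1 else ">"
    raise ValueError("ciphertexts share no label")


rng = random.Random(2012)
key = keygen(6, rng)
ciphertexts = {x: encrypt(x, key) for x in range(1, 7)}
print(compare(ciphertexts[2], ciphertexts[5]), decrypt(ciphertexts[4], key))
```

Each ciphertext in this sketch has m − 1 elements, so the total ciphertext size grows linearly in m, which is why the scheme is stated for polynomial-sized domains.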
Theorem 5.4.2: 𝒮2 is secure under IND-OCPA. Specifically, $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{S}_2,\mathcal{A}} = 0$.
Proof. Assume that the adversary queries {($x_u^0$, $x_u^1$) | 1 ≤ u ≤ h} under IND-OCPA.
According to the restriction under IND-OCPA, $x_u^0 = x_v^0 \iff x_u^1 = x_v^1$. Since querying two
identical plaintext pairs does not increase the advantage, it suffices to consider
$x_1^0 < x_2^0 < \dots < x_h^0$ and $x_1^1 < x_2^1 < \dots < x_h^1$. Hence, the adversary views
$(\mathcal{E}_2(x_1^0, k), \cdots, \mathcal{E}_2(x_h^0, k))$ for b = 0, and the adversary views
$(\mathcal{E}_2(x_1^1, k), \cdots, \mathcal{E}_2(x_h^1, k))$ for b = 1. It suffices to prove that the above two
probability distributions are identical because it implies that $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{S}_2,\mathcal{A}} = 0$.
We use mathematical induction on h to prove that the two probability distributions
$(\mathcal{E}_2(x_1^0, k), \cdots, \mathcal{E}_2(x_h^0, k))$ and $(\mathcal{E}_2(x_1^1, k), \cdots, \mathcal{E}_2(x_h^1, k))$ are identical. For h = 1, it is
necessary to show that the probability distribution $\mathcal{E}_2(x_1^0, k)$ equals the probability
distribution $\mathcal{E}_2(x_1^1, k)$. Let Π = {(x, x’) | 1 ≤ x < x’ ≤ m}. Let $I_j$, 1 ≤ j ≤ m−1, be the
probability distribution such that
$$\Pr[I_1 = i_1, \dots, I_{m-1} = i_{m-1}] = \frac{1}{\prod_{0 \le j \le m-2}(|\Pi| - j)}$$
for $(i_1, \dots, i_{m-1}) \in \Pi^{m-1}$ and $i_j \ne i_{j'}$ if j ≠ j’. Let $S_j$, 1 ≤ j ≤ m−1, be the uniform
distribution on Z3. Then according to the construction of ℰ2,
$$\mathcal{E}_2(x_1^0, k) = \{(I_j, S_j) \mid 1 \le j \le m-1\} = \mathcal{E}_2(x_1^1, k)$$
We assume that the two probability distributions are identical for h < h’. For h = h’, we
consider the following two conditional probability distributions
$$X = \mathcal{E}_2(x_{h'}^0, k) \mid \mathcal{E}_2(x_1^0, k) = y_1, \cdots, \mathcal{E}_2(x_{h'-1}^0, k) = y_{h'-1}$$
and
$$Y = \mathcal{E}_2(x_{h'}^1, k) \mid \mathcal{E}_2(x_1^1, k) = y_1, \cdots, \mathcal{E}_2(x_{h'-1}^1, k) = y_{h'-1}$$
where $y_u = \{(i_u^j, s_u^j) \in \Pi \times Z_3 \mid 1 \le j \le m-1\}$, 1 ≤ u ≤ h’−1. $y_1, \dots, y_{h'-1}$ will affect
$\mathcal{E}_2(x_{h'}^0, k)$ ($\mathcal{E}_2(x_{h'}^1, k)$). First, for 1 ≤ u ≤ h’−1, there exists a unique $i_u^j$ (for some j) that
appears in $\mathcal{E}_2(x_{h'}^0, k)$ ($\mathcal{E}_2(x_{h'}^1, k)$) according to the construction of ℰ2. On the other hand,
there exists a unique $i_u^{j'}$ (for some j’) that appears in $y_{u'}$, 1 ≤ u’ ≠ u ≤ h’−1; hence those
$i_u^{j'}$ will not appear in $\mathcal{E}_2(x_{h'}^0, k)$ ($\mathcal{E}_2(x_{h'}^1, k)$). Thus, let
$$\Pi_u = \{i_u^j \mid i_u^j \text{ appears in } y_u \text{ but does not appear in } y_{u'} \text{ for any } 1 \le u' \ne u \le h'-1\},$$
for 1 ≤ u ≤ h’−1. Then there exists $i_u^j \in \Pi_u$ such that $(i_u^j, s_u^j - 1)$ appears in $\mathcal{E}_2(x_{h'}^0, k)$
($\mathcal{E}_2(x_{h'}^1, k)$), 1 ≤ u ≤ h’−1. Note that the elements of a set are unordered; without loss of
generality, for 1 ≤ u ≤ h’−1, let $(I_u, S_u)$ be the probability distribution such that
$\Pr[(I_u, S_u) = (i_u^j, s_u^j - 1)] = \frac{1}{|\Pi_u|}$. The remaining probability distributions are similar to
those for the situation of h = 1. Let
$$\Pi' = \{(x, x') \in \Pi \mid (x, x') \text{ does not appear in } y_u, 1 \le u \le h'-1\}$$
For h’ ≤ u ≤ m−1, let $I_u$ be the probability distribution such that
$$\Pr[I_{h'} = i_{h'}, \dots, I_{m-1} = i_{m-1}] = \frac{1}{\prod_{0 \le j \le m-h'-1}(|\Pi'| - j)}$$
for $(i_{h'}, \dots, i_{m-1}) \in \Pi'^{\,m-h'}$ and $i_u \ne i_{u'}$ if u ≠ u’. Let $S_u$, h’ ≤ u ≤ m−1, be the
uniform distribution on Z3. Then
$$X = \{(I_u, S_u) \mid 1 \le u \le m-1\} = Y. \qquad (5.1)$$
Consequently,
$$\Pr[\mathcal{E}_2(x_1^0, k) = y_1, \dots, \mathcal{E}_2(x_{h'}^0, k) = y_{h'}]$$
$$= \Pr[\mathcal{E}_2(x_1^0, k) = y_1, \dots, \mathcal{E}_2(x_{h'-1}^0, k) = y_{h'-1}] \cdot \Pr[X = y_{h'}]$$
$$= \Pr[\mathcal{E}_2(x_1^1, k) = y_1, \dots, \mathcal{E}_2(x_{h'-1}^1, k) = y_{h'-1}] \cdot \Pr[Y = y_{h'}] \quad \text{(induction hypothesis and (5.1))}$$
$$= \Pr[\mathcal{E}_2(x_1^1, k) = y_1, \dots, \mathcal{E}_2(x_{h'}^1, k) = y_{h'}].$$
Hence the two probability distributions are identical for h = h’, which completes the
induction.
Remark 5.4.1: In order to improve the efficiency of 𝒮2, π and π^{−1} can be substituted with
deterministic symmetric-key encryption and decryption algorithms, and $r_{xx'}$ can be generated by
a pseudorandom number generator. It is obvious that the improved scheme remains secure under
IND-OCPA.
5.4.2 IND-OLCPA
Now we consider a security notion for OPE schemes in superpolynomial-sized domains.
According to the big jump attack, if m is a superpolynomial of λ, then the adversary $\mathcal{A}_{BJ}$ can
achieve $\mathbf{ADV}^{\text{IND-OCPA}}_{\mathcal{S},\mathcal{A}_{BJ}} \ge 1 - \nu(\lambda)$ by using three oracle queries. From this we can conclude that
IND-OCPA is too strong a security notion for OPE schemes in the superpolynomial-sized
domain. Thus, we further weaken IND-OCPA and define the security notion IND-OLCPA
(indistinguishability under ordered and local chosen-plaintext attack), where the range of the
oracle queries is bounded by a polynomial of λ (to prevent the adversary from launching the big
jump attack). The definition of IND-OLCPA is given as follows.
Definition 5.4.2 (IND-OLCPA): The security notion IND-OLCPA has the same definition
as that of IND-CPA except that the adversary is restricted so that it can only query
{($x_u^0$, $x_u^1$) | 1 ≤ u ≤ h} where
$$x_u^0 < x_v^0 \iff x_u^1 < x_v^1 \qquad (5.2)$$
for 1 ≤ u, v ≤ h, and there exists a polynomial g1 such that
$$|x_u^i - x_v^j| \le g_1(\lambda) \qquad (5.3)$$
for 1 ≤ u, v ≤ h and 0 ≤ i, j ≤ 1.
We design the following attack (and call it the small jump attack) against OPE schemes
under IND-OLCPA. Similar to the big jump attack, the small jump attack also decides whether
the ciphertexts are encrypted from the left or the right plaintexts based on the differences in
distances between the ciphertexts.
Definition 5.4.3 (Small jump attack): Consider the following PPT adversary $\mathcal{A}_{SJ}$ with
three oracle queries in the experiment of security notion IND-OLCPA.
Adversary $\mathcal{A}_{SJ}^{\mathcal{E}_k(\mathcal{LR}(\cdot,\cdot,b))}$
    x ←$ {1, ..., m−3}
    y1 ← $\mathcal{E}_k(\mathcal{LR}(x, x, b))$
    y2 ← $\mathcal{E}_k(\mathcal{LR}(x + 1, x + 2, b))$
    y3 ← $\mathcal{E}_k(\mathcal{LR}(x + 3, x + 3, b))$
    Return 1 if y3 − y2 < y2 − y1; else return 0.
In the small jump attack given above, the left plaintexts are x, x+1, and x+3, and the
corresponding right plaintexts are x, x+2, and x+3, where x is randomly selected from {1, ...,
m−3}. The following lemma shows that the small jump attack can distinguish these two cases
with non-negligible probability.
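The small jump attack can be simulated against a toy OPE scheme. The sketch below is illustrative only: it assumes the scheme is a uniformly random strictly increasing table on a small domain (hypothetical helper `random_ope_table`), whereas the analysis in this section concerns superpolynomial m. It also checks the two gap inequalities under which the adversary's guess is correct for both values of b.

```python
import random


def random_ope_table(m, n, rng):
    """A toy OPE key: a uniformly random strictly increasing map [m] -> [n]."""
    values = sorted(rng.sample(range(1, n + 1), m))
    return {x: values[x - 1] for x in range(1, m + 1)}


def small_jump_adversary(table, b, x):
    """Run the three LR queries for a fixed starting point x and hidden bit b."""
    queries = [(x, x), (x + 1, x + 2), (x + 3, x + 3)]
    y1, y2, y3 = (table[pair[b]] for pair in queries)
    return 1 if y3 - y2 < y2 - y1 else 0


def guess_is_forced_correct(table, x):
    """True when the three consecutive ciphertext gaps starting at x make the
    adversary answer correctly for both b = 0 and b = 1."""
    d0 = table[x + 1] - table[x]
    d1 = table[x + 2] - table[x + 1]
    d2 = table[x + 3] - table[x + 2]
    return d0 + d1 > d2 and d0 < d1 + d2


def empirical_advantage(table, m):
    """Pr[output 1 | b = 1] - Pr[output 1 | b = 0] over x uniform in {1, ..., m-3}."""
    xs = range(1, m - 2)
    p1_b1 = sum(small_jump_adversary(table, 1, x) for x in xs) / len(xs)
    p1_b0 = sum(small_jump_adversary(table, 0, x) for x in xs) / len(xs)
    return p1_b1 - p1_b0


rng = random.Random(5)
m, n = 200, 10 ** 6
table = random_ope_table(m, n, rng)
good = sum(guess_is_forced_correct(table, x) for x in range(1, m - 2))
print(good, m - 3, empirical_advantage(table, m))
```

Every starting point x counted in `good` contributes a correct guess for both values of b, so the fraction of such points lower-bounds the advantage in this simulation.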
Lemma 5.4.3: There is no efficient OPE scheme that is secure under IND-OLCPA
(because of $\mathcal{A}_{SJ}$) if m is a superpolynomial of λ. Specifically, there exists a polynomial g such
that $\mathbf{ADV}^{\text{IND-OLCPA}}_{\mathcal{S},\mathcal{A}_{SJ}} \ge \frac{1}{g(\lambda) \cdot g_1(\lambda)}$.
Proof. Let $d_i = \mathcal{E}(i + 1, k) - \mathcal{E}(i, k)$ be the distance between the two ciphertexts, 1 ≤ i < m.
Suppose that the adversary selects x = i in the small jump attack. Then $y_3 - y_2 = d_{i+1} + d_{i+2}$ and
$y_2 - y_1 = d_i$ if b = 0; $y_3 - y_2 = d_{i+2}$ and $y_2 - y_1 = d_i + d_{i+1}$ if b = 1. Therefore adversary $\mathcal{A}_{SJ}$
returns the correct b if the following condition holds.
$$d_i + d_{i+1} > d_{i+2} \text{ and } d_i < d_{i+1} + d_{i+2} \qquad (5.4)$$
Consequently, adversary $\mathcal{A}_{SJ}$ may return an incorrect b if either of the following two
conditions (called small jump and small reverse-jump) holds.
$$d_i + d_{i+1} \le d_{i+2} \qquad (5.5)$$
$$d_i \ge d_{i+1} + d_{i+2} \qquad (5.6)$$
Note that condition (5.5) implies that the distance series increases at least as fast as the
Fibonacci numbers, and condition (5.6) implies that the reversed distance series increases at
least as fast as the Fibonacci numbers. Since the formula of the Fibonacci numbers is
$$\frac{1}{\sqrt{5}}\left(\left(\frac{1+\sqrt{5}}{2}\right)^{i} - \left(\frac{1-\sqrt{5}}{2}\right)^{i}\right)$$
and $\log d_i$ must be bounded by a polynomial, it implies that condition (5.5) (resp. condition (5.6))
cannot hold consecutively a superpolynomial number of times. Moreover, condition (5.6) cannot
hold immediately after condition (5.5). Otherwise
$$d_i + d_{i+1} \le d_{i+2} \text{ and } d_{i+1} \ge d_{i+2} + d_{i+3} \implies d_i + d_{i+1} + d_{i+2} + d_{i+3} \le d_{i+2} + d_{i+1} \implies d_i + d_{i+3} \le 0,$$
which is a contradiction.
Consider $\{(d_i, d_{i+1}, d_{i+2}) \mid 1 \le i \le m-3\}$. Suppose that $(d_i, d_{i+1}, d_{i+2})$ satisfies condition (5.5)
or condition (5.6), and m−3−i is a superpolynomial. Since condition (5.5) (resp. condition (5.6))
cannot hold consecutively a superpolynomial number of times and condition (5.6) cannot hold
immediately after condition (5.5), there must exist a polynomial $g_i$ such that
$(d_{i+g_i}, d_{i+1+g_i}, d_{i+2+g_i})$ satisfies condition (5.4). Hence the points in the set
$$\{i \mid (d_i, d_{i+1}, d_{i+2}) \text{ satisfies condition (5.4)}\}$$
partition [m] into polynomial-sized segments. Let g∙g1 be the maximum polynomial. Then there
are at least $\frac{m-3}{g \cdot g_1}$ many i’s such that $(d_i, d_{i+1}, d_{i+2})$ satisfies condition (5.4). Since adversary $\mathcal{A}_{SJ}$
returns the correct b if it selects x = i and $(d_i, d_{i+1}, d_{i+2})$ satisfies condition (5.4),
$$\mathbf{ADV}^{\text{IND-OLCPA}}_{\mathcal{S},\mathcal{A}_{SJ}} \ge \frac{1}{m-3} \cdot \frac{m-3}{g(\lambda) \cdot g_1(\lambda)} = \frac{1}{g(\lambda) \cdot g_1(\lambda)}$$
Proposition 5.4.4: If the adversary repeats the small jump attack, then the lower bound on
the advantage of the adversary will become 1/g.
Proof. Since the range of plaintexts in the oracle queries is bounded by g1, the probability
that some i in the set {i | $(d_i, d_{i+1}, d_{i+2})$ satisfies condition (5.4)} from the proof of Lemma 5.4.3
falls into the range is at least $\frac{g_1}{m-3} \cdot \frac{m-3}{g \cdot g_1} = 1/g$. Therefore the lower bound on the advantage of
the adversary will increase to 1/g.
5.4.3 Ideal OPE and GOPE in the Superpolynomial-sized Domain
By the same adversary 𝒜 as in Proposition 5.3.4, the ideal OPE object 𝒮∗ does not achieve the
lower bound 1/g on the advantage of the adversary (Proposition 5.4.4) under
IND-OLCPA in superpolynomial-sized domains. Next we design a GOPE scheme 𝒮3 =
(𝒦3, ℰ3, 𝒟3, 𝒞3) in the superpolynomial-sized domain, and prove that it achieves that lower
bound. 𝒮3 is constructed based on two building blocks: 𝒮4 and 𝒮5. 𝒮4 is adapted from 𝒮1
and it is secure under IND-OLCPA; but it can only support “local” comparisons (i.e.
comparisons for ciphertexts whose plaintexts are close by). The ciphertexts of 𝒮5 have proper
remote order to support “remote” comparisons (i.e. comparisons for ciphertexts whose plaintexts
are far apart).
First we design 𝒮4 = (𝒦4, ℰ4, 𝒟4, 𝒞4). Let g2 denote a polynomial. In 𝒮4 the ciphertext
of x can be compared with g2(λ) − 1 (instead of m − 1) other ciphertexts whose plaintexts are
close to x.
- 𝒦4: Given the domain size m, it randomly picks a permutation π of the set {(x, x’) | 1 ≤ x
< x’ ≤ m}, and randomly generates $r_{xx'}$ ∈ Z3 for 1 ≤ x < x’ ≤ m. It returns {(π, $r_{xx'}$) | 1 ≤ x < x’ ≤
m};
- ℰ4: For plaintext x, it returns the ciphertext
y = {(π(x’, x), $r_{x'x}$) | x’ < x} ∪ {(π(x, x’), 1 + $r_{xx'}$) | x < x’ ≤ g2(λ)} if x ≤ g2(λ)/2;
y = {(π(x’, x), $r_{x'x}$) | x − g2(λ)/2 < x’ < x} ∪ {(π(x, x’), 1 + $r_{xx'}$) | x < x’ ≤ x + g2(λ)/2} if g2(λ)/2 < x <
m − g2(λ)/2;
y = {(π(x’, x), $r_{x'x}$) | m − g2(λ) ≤ x’ < x} ∪ {(π(x, x’), 1 + $r_{xx'}$) | x < x’} if x ≥ m − g2(λ)/2;
- 𝒟4: For ciphertext y, it retrieves (any) two elements (i, s) and (i’, s’) from the set y, and
returns the plaintext x which appears in both π^{−1}(i) and π^{−1}(i’);
- 𝒞4: For ciphertexts y and y’, if y = y’, it returns =. Otherwise, it retrieves (i, s) from the set
y and (i, s’) from the set y’. If s – s’ = 1, it returns <. If s – s’ = 2, it returns >.
The correctness, security, and efficiency of 𝒮4 are presented in Lemmas 5.4.5 and 5.4.6.
Lemma 5.4.5: The decryption of 𝒮4 is correct. Also, for plaintexts x1, x2 ∈ [m], the
comparison of ℰ4(x1, k) and ℰ4(x2, k) is correct if |x1 − x2| ≤ (g2(λ) − 1)/2.
Proof. The correctness of the decryption can be easily verified. Note that the ciphertext of x1
(resp. x2) can be compared with g2(λ) − 1 other ciphertexts whose plaintexts are close to x1 (resp. x2).
Hence ℰ4(x1, k) and ℰ4(x2, k) are comparable if |x1 − x2| ≤ (g2(λ) − 1)/2. The comparison is
correct by the proof of Lemma 5.4.1.
Lemma 5.4.6: Suppose that the range of oracle queries under IND-OLCPA is bounded by
polynomial g1. Then 𝒮4 is secure under IND-OLCPA if g2 ≥ 2g1 + 1. Furthermore, 𝒮4 can be
revised to achieve efficiency and remain secure under IND-OLCPA.
Proof. The security proof is analogous to that of Theorem 5.4.2. It is worth noting that
the condition g2 ≥ 2g1 + 1 is used in the inductive step to guarantee that the two conditional
probability distributions are identical. The detailed proof is presented as follows.
Assume that the adversary queries {($x_u^0$, $x_u^1$) | 1 ≤ u ≤ h} under IND-OLCPA. According to
the restriction condition (5.2) under IND-OLCPA, $x_u^0 = x_v^0 \iff x_u^1 = x_v^1$. Since querying two
identical plaintext pairs does not increase the advantage, it suffices to consider
$x_1^0 < x_2^0 < \dots < x_h^0$ and $x_1^1 < x_2^1 < \dots < x_h^1$. Hence, the adversary views
$(\mathcal{E}_4(x_1^0, k), \cdots, \mathcal{E}_4(x_h^0, k))$ for b = 0, and the adversary views
$(\mathcal{E}_4(x_1^1, k), \cdots, \mathcal{E}_4(x_h^1, k))$ otherwise. It suffices to prove that the above
two probability distributions are identical because it implies that $\mathbf{ADV}^{\text{IND-OLCPA}}_{\mathcal{S}_4,\mathcal{A}} = 0$.
We use mathematical induction on h to prove that the two probability distributions
$(\mathcal{E}_4(x_1^0, k), \cdots, \mathcal{E}_4(x_h^0, k))$ and $(\mathcal{E}_4(x_1^1, k), \cdots, \mathcal{E}_4(x_h^1, k))$ are identical. For h = 1, it is
necessary to show that the probability distribution $\mathcal{E}_4(x_1^0, k)$ equals the probability
distribution $\mathcal{E}_4(x_1^1, k)$. Let Π = {(x, x’) | 1 ≤ x < x’ ≤ m}. Let $I_j$, 1 ≤ j ≤ g2(λ) − 1, be the
probability distribution such that
$$\Pr[I_1 = i_1, \dots, I_{g_2(\lambda)-1} = i_{g_2(\lambda)-1}] = \frac{1}{\prod_{0 \le j \le g_2(\lambda)-2}(|\Pi| - j)}$$
for $(i_1, \dots, i_{g_2(\lambda)-1}) \in \Pi^{g_2(\lambda)-1}$ and $i_j \ne i_{j'}$ if j ≠ j’. Let $S_j$, 1 ≤ j ≤ g2(λ) − 1, be the uniform
distribution on Z3. Then according to the construction of ℰ4,
$$\mathcal{E}_4(x_1^0, k) = \{(I_j, S_j) \mid 1 \le j \le g_2(\lambda)-1\} = \mathcal{E}_4(x_1^1, k).$$
We assume that the two probability distributions are identical for h < h’. For h = h’, we
consider the following two conditional probability distributions
$$X = \mathcal{E}_4(x_{h'}^0, k) \mid \mathcal{E}_4(x_1^0, k) = y_1, \cdots, \mathcal{E}_4(x_{h'-1}^0, k) = y_{h'-1}$$
and
$$Y = \mathcal{E}_4(x_{h'}^1, k) \mid \mathcal{E}_4(x_1^1, k) = y_1, \cdots, \mathcal{E}_4(x_{h'-1}^1, k) = y_{h'-1}$$
where $y_u = \{(i_u^j, s_u^j) \in \Pi \times Z_3 \mid 1 \le j \le g_2(\lambda)-1\}$, 1 ≤ u ≤ h’−1. For oracle queries $x_u^i$ and $x_v^j$,
since $|x_u^i - x_v^j| \le g_1(\lambda) \le (g_2(\lambda) - 1)/2$, they are comparable according to Lemma 5.4.5. So
$y_1, \dots, y_{h'-1}$ will affect $\mathcal{E}_4(x_{h'}^0, k)$ ($\mathcal{E}_4(x_{h'}^1, k)$). First, for 1 ≤ u ≤ h’ − 1, there exists a
unique $i_u^j$ (for some j) that appears in $\mathcal{E}_4(x_{h'}^0, k)$ ($\mathcal{E}_4(x_{h'}^1, k)$). On the other hand, there exists a
unique $i_u^{j'}$ (for some j’) that appears in $y_{u'}$, 1 ≤ u’ ≠ u ≤ h’ − 1; hence those $i_u^{j'}$ will not appear
in $\mathcal{E}_4(x_{h'}^0, k)$ ($\mathcal{E}_4(x_{h'}^1, k)$). Thus, let
$$\Pi_u = \{i_u^j \mid i_u^j \text{ appears in } y_u \text{ but does not appear in } y_{u'} \text{ for any } 1 \le u' \ne u \le h'-1\},$$
for 1 ≤ u ≤ h’ − 1. Then there exists $i_u^j \in \Pi_u$ such that $(i_u^j, s_u^j - 1)$ appears in $\mathcal{E}_4(x_{h'}^0, k)$
($\mathcal{E}_4(x_{h'}^1, k)$), 1 ≤ u ≤ h’ − 1. Note that the elements of a set are unordered; without loss of
generality, for 1 ≤ u ≤ h’ − 1, let $(I_u, S_u)$ be the probability distribution such that
$\Pr[(I_u, S_u) = (i_u^j, s_u^j - 1)] = \frac{1}{|\Pi_u|}$. The remaining probability distributions are similar to
those for the situation of h = 1. Let
$$\Pi' = \{(x, x') \in \Pi \mid (x, x') \text{ does not appear in } y_u, 1 \le u \le h'-1\}.$$
For h’ ≤ u ≤ g2(λ) − 1, let $I_u$ be the probability distribution such that
$$\Pr[I_{h'} = i_{h'}, \dots, I_{g_2(\lambda)-1} = i_{g_2(\lambda)-1}] = \frac{1}{\prod_{0 \le j \le g_2(\lambda)-h'-1}(|\Pi'| - j)}$$
for $(i_{h'}, \dots, i_{g_2(\lambda)-1}) \in \Pi'^{\,g_2(\lambda)-h'}$ and $i_u \ne i_{u'}$ if u ≠ u’. Let $S_u$,
h’ ≤ u ≤ g2(λ) − 1, be the uniform distribution on Z3. Then
$$X = \{(I_u, S_u) \mid 1 \le u \le g_2(\lambda)-1\} = Y. \qquad (5.7)$$
Consequently,
$$\Pr[\mathcal{E}_4(x_1^0, k) = y_1, \dots, \mathcal{E}_4(x_{h'}^0, k) = y_{h'}]$$
$$= \Pr[\mathcal{E}_4(x_1^0, k) = y_1, \dots, \mathcal{E}_4(x_{h'-1}^0, k) = y_{h'-1}] \cdot \Pr[X = y_{h'}]$$
$$= \Pr[\mathcal{E}_4(x_1^1, k) = y_1, \dots, \mathcal{E}_4(x_{h'-1}^1, k) = y_{h'-1}] \cdot \Pr[Y = y_{h'}] \quad \text{(induction hypothesis and (5.7))}$$
$$= \Pr[\mathcal{E}_4(x_1^1, k) = y_1, \dots, \mathcal{E}_4(x_{h'}^1, k) = y_{h'}].$$
Hence the two probability distributions are identical for h = h’, which completes the
induction.
To achieve efficiency of 𝒮4, π and π^{−1} can be substituted with deterministic symmetric-
key encryption and decryption algorithms, and $r_{xx'}$ can be generated by a pseudorandom number
generator. It is obvious that the revision is efficient and remains secure under IND-OLCPA.
Note that the original 𝒮4 is given because its GOPE construction is easier to understand.
It is revised to achieve better efficiency. For convenience, from this point onwards,
𝒮4 refers to the revised version. Next we design 𝒮5 = (𝒦5, ℰ5, 𝒞5). Since 𝒮4 supports
decryption and “local” comparisons, 𝒮5 does not need a decryption algorithm but should
support “remote” comparisons. In order to assure security, the ciphertexts should have small
statistical distances if the corresponding plaintexts are close to each other. To achieve this, for
1 ≤ i ≤ l, ℰ5(i, k) are randomly selected from [n’], where n’ and l are positive integers. Then the
subsequent ciphertexts are gradually increased. The construction of 𝒮5 is shown as follows.
- 𝒦5: It randomly selects $r_i$ ∈ [n’] for 0 ≤ i ≤ l − 1 and returns ($r_0$, ..., $r_{l-1}$);
- ℰ5: For plaintext x ∈ [m], we compute a and b, a ≥ 0 and 0 ≤ b < l, such that x − 1 = a ∙ l
+ b. ℰ5 returns the ciphertext y = $r_b$ + a;
- 𝒞5: For ciphertexts y and y’, if y > y’, it returns >; if y < y’, it returns <.
The correctness of 𝒮5 is presented in Lemma 5.4.7.
Lemma 5.4.7: For plaintexts x1, x2 ∈ [m], the comparison of ℰ5(x1, k) and ℰ5(x2, k) is
correct if |x1 − x2| ≥ n’∙l + l.
Proof. Without loss of generality, we assume that x1 < x2. Then x2 − x1 ≥ n’∙l + l. Let
$x_i - 1 = a_i \cdot l + b_i$ satisfying $a_i \ge 0$ and $0 \le b_i < l$, 1 ≤ i ≤ 2. Then
$$n' \cdot l + l \le x_2 - x_1 = (a_2 - a_1) \cdot l + (b_2 - b_1) < (a_2 - a_1) \cdot l + l \implies a_2 - a_1 > n'.$$
Hence,
$$\mathcal{E}_5(x_1, k) = r_{b_1} + a_1 < r_{b_1} + (a_2 - n') = (r_{b_1} - n') + a_2 < r_{b_2} + a_2 = \mathcal{E}_5(x_2, k),$$
which implies that the comparison is correct.
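The construction of 𝒮5 and the remote-comparison guarantee of Lemma 5.4.7 can be sketched as follows. This is an illustrative sketch with hypothetical function names; returning `None` for equal ciphertexts is an assumption, since the construction above leaves that case unspecified.

```python
import random


def keygen5(l, n_prime, rng):
    """Key (r_0, ..., r_{l-1}), each r_i uniform on [n'] = {1, ..., n'}."""
    return [rng.randint(1, n_prime) for _ in range(l)]


def encrypt5(x, key):
    """Write x - 1 = a * l + b with 0 <= b < l and return r_b + a."""
    a, b = divmod(x - 1, len(key))
    return key[b] + a


def compare5(y, yp):
    """Plain integer comparison; None when the ciphertexts coincide."""
    if y > yp:
        return ">"
    if y < yp:
        return "<"
    return None


rng = random.Random(94)
l, n_prime, m = 4, 5, 200
key = keygen5(l, n_prime, rng)
gap = n_prime * l + l  # the distance required by Lemma 5.4.7
remote_ok = all(compare5(encrypt5(x1, key), encrypt5(x2, key)) == "<"
                for x1 in range(1, m + 1)
                for x2 in range(x1 + gap, m + 1))
print(remote_ok)
```

Plaintexts closer than n’∙l + l may compare incorrectly under 𝒮5 alone, which is why 𝒮5 is paired with the local comparisons of 𝒮4.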
If the queries by the adversary against 𝒮5 are in the interval [c∙l + 1, (c+1)∙l], for some c ≥
0, then the adversary cannot distinguish the corresponding ciphertexts because they are
independent and identically distributed random variables generated by ℰ5. If the queries involve
plaintexts in two consecutive intervals [c∙l + 1, (c+1)∙l] and [(c+1)∙l + 1, (c+2)∙l], then the
advantage of the adversary is not 0, but it can be controlled by l and n’. The security of 𝒮5 is
given in the following lemma.
Lemma 5.4.8: Suppose that the range of oracle queries under IND-OLCPA is bounded by
polynomial g1. For polynomial g ≥ 1, $\mathbf{ADV}^{\text{IND-OLCPA}}_{\mathcal{S}_5,\mathcal{A}} \le \frac{1}{g(\lambda)}$ if l > g1(λ) and n’ ≥ g(λ) ∙ g1(λ).
Proof. Assume that the adversary queries {($x_u^0$, $x_u^1$) | 1 ≤ u ≤ h} under IND-OLCPA.
According to the restriction condition (5.2) under IND-OLCPA, $x_u^0 = x_v^0 \iff x_u^1 = x_v^1$. Since
querying two identical plaintext pairs does not increase the advantage, it suffices to consider
$x_1^0 < x_2^0 < \dots < x_h^0$ and $x_1^1 < x_2^1 < \dots < x_h^1$. Let $x_u^i - 1 = a_u^i \cdot l + b_u^i$ satisfying $a_u^i \ge 0$ and
$0 \le b_u^i < l$; then $\mathcal{E}_5(x_u^i, k) = r_{b_u^i} + a_u^i$, 1 ≤ u ≤ h and 0 ≤ i ≤ 1. Hence, the adversary views
$(r_{b_1^0} + a_1^0, \dots, r_{b_h^0} + a_h^0)$ for b = 0, and the adversary views $(r_{b_1^1} + a_1^1, \dots, r_{b_h^1} + a_h^1)$
otherwise. Let Δ be the statistical distance between $(r_{b_1^0} + a_1^0, \dots, r_{b_h^0} + a_h^0)$ and
$(r_{b_1^1} + a_1^1, \dots, r_{b_h^1} + a_h^1)$. Since $\mathbf{ADV}^{\text{IND-OLCPA}}_{\mathcal{S}_5,\mathcal{A}} \le \Delta$, it suffices to prove that
$\Delta \le \frac{1}{g(\lambda)}$.
We study the properties of those probability distributions. Since
$$|(a_u^i - a_v^j) \cdot l + (b_u^i - b_v^j)| = |x_u^i - x_v^j| \le g_1(\lambda) < l,$$
it implies that $|a_u^i - a_v^j| \le 1$ and $b_u^i = b_v^j \implies a_u^i = a_v^j$, 1 ≤ u, v ≤ h and 0 ≤ i, j ≤ 1. Furthermore,
$b_u^i = b_v^j \implies a_u^i = a_v^j$ together with $x_u^0 \ne x_v^0$ if u ≠ v implies that $b_u^0 \ne b_v^0$ if u ≠ v. Therefore
$r_{b_1^0} + a_1^0, \dots, r_{b_h^0} + a_h^0$ are independent uniform distributions on $[n'] + a_1^0, \dots, [n'] + a_h^0$.
Similarly, $r_{b_1^1} + a_1^1, \dots, r_{b_h^1} + a_h^1$ are independent uniform distributions on
$[n'] + a_1^1, \dots, [n'] + a_h^1$. Hence Δ equals the statistical distance between independent uniform
distributions $X_1, \dots, X_h$ on $[n'] + a_1^0, \dots, [n'] + a_h^0$ and independent uniform distributions
$Y_1, \dots, Y_h$ on $[n'] + a_1^1, \dots, [n'] + a_h^1$, i.e.
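$$\Delta = \frac{1}{2}\sum_{w_u \in ([n'] + a_u^0) \cup ([n'] + a_u^1),\ 1 \le u \le h} \left|\Pr[(X_1, \cdots, X_h) = (w_1, \cdots, w_h)] - \Pr[(Y_1, \cdots, Y_h) = (w_1, \cdots, w_h)]\right|$$
Since $|a_u^0 - a_u^1| \le 1$ for 1 ≤ u ≤ h,
$$\Delta \le \frac{h \cdot n'^{\,h-1} + h \cdot n'^{\,h-1}}{2 n'^{\,h}} = \frac{h}{n'} \le \frac{g_1(\lambda)}{n'} \le \frac{1}{g(\lambda)}.$$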
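The bound Δ ≤ h/n’ can be illustrated by brute force on a tiny instance. The sketch below is not part of the proof: it takes the independence of the coordinates as given (as established in the argument above), the shift values are arbitrary illustrative choices with $|a_u^0 - a_u^1| \le 1$, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def shifted_uniform(n_prime, a):
    """Uniform distribution on [n'] + a = {1 + a, ..., n' + a}."""
    return {w: Fraction(1, n_prime) for w in range(1 + a, n_prime + a + 1)}


def joint(dists):
    """Joint distribution of independent coordinates, keyed by tuples."""
    out = {}
    for combo in product(*(d.items() for d in dists)):
        point = tuple(w for w, _ in combo)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        out[point] = prob
    return out


def statistical_distance(p, q):
    """Half the L1 distance between two finite distributions given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(w, 0) - q.get(w, 0)) for w in support) / 2


n_prime = 6
left_shifts = (0, 1, 1)   # illustrative values for a_u^0
right_shifts = (1, 1, 2)  # illustrative values for a_u^1, each within 1 of a_u^0
delta = statistical_distance(
    joint([shifted_uniform(n_prime, a) for a in left_shifts]),
    joint([shifted_uniform(n_prime, a) for a in right_shifts]))
h = len(left_shifts)
print(delta, Fraction(h, n_prime), delta <= Fraction(h, n_prime))
```

Increasing n’ while keeping h fixed shrinks the distance, which is the role played by the condition n’ ≥ g(λ) ∙ g1(λ).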
𝒮3 = (𝒦3, ℰ3, 𝒟3, 𝒞3) is constructed by combining 𝒮4 and 𝒮5. In order to achieve full
comparison capability, g2, l, and n’ are chosen to satisfy the condition (g2 – 1)/2 ≥ n’∙l + l
(Lemmas 5.4.5 and 5.4.7). In order to achieve security, g2, l, and n’ are chosen to satisfy the
conditions g2 ≥ 2g1 + 1, l > g1, and n’ ≥ g∙g1 (Lemmas 5.4.6 and 5.4.8). We can solve these
inequalities, and get l > g1, n’ ≥ g∙g1, and g2 ≥ max{2(n’∙l + l) + 1, 2g1 + 1} = 2(n’∙l + l) + 1.
Specifically, we can set l = g1 + 1, n’ = g∙g1, and g2 = 2(n’∙l + l) + 1. 𝒮3 encrypts plaintext x
into (ℰ4(x, k), ℰ5(x, k)). Since g and g1 are polynomials, 𝒮3 is an efficient encryption scheme.
Given two ciphertexts (ℰ4(x1, k), ℰ5(x1, k)) and (ℰ4(x2, k), ℰ5(x2, k)), 𝒮3 first compares
ℰ4(x1, k) and ℰ4(x2, k) by using 𝒞4; if it fails, 𝒮3 then compares ℰ5(x1, k) and ℰ5(x2, k) by
using 𝒞5. Also, ℰ4(x, k) can be decrypted by 𝒟4. We summarize these results in the following
theorem.
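The parameter choice above can be checked mechanically. The following is a small sketch with hypothetical function names, treating g(λ) and g1(λ) as already-evaluated positive integers.

```python
def choose_parameters(g, g1):
    """Pick (l, n', g2) from integer values of g(lambda) and g1(lambda)."""
    l = g1 + 1
    n_prime = g * g1
    g2 = 2 * (n_prime * l + l) + 1
    return l, n_prime, g2


def satisfies_constraints(g, g1, l, n_prime, g2):
    """Check the comparison and security conditions collected in the text."""
    # Full comparison capability (Lemmas 5.4.5 and 5.4.7): (g2 - 1)/2 >= n'*l + l.
    comparison_ok = g2 - 1 >= 2 * (n_prime * l + l)
    # Security (Lemmas 5.4.6 and 5.4.8).
    security_ok = g2 >= 2 * g1 + 1 and l > g1 and n_prime >= g * g1
    return comparison_ok and security_ok


l, n_prime, g2 = choose_parameters(g=3, g1=4)
print(l, n_prime, g2, satisfies_constraints(3, 4, l, n_prime, g2))
```

Because g2 is a product of polynomially bounded quantities, each 𝒮4 ciphertext keeps a polynomial number of elements, consistent with the efficiency claim.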
Theorem 5.4.9: Suppose that the range of oracle queries under IND-OLCPA is bounded
by polynomial g1. For any polynomial g ≥ 1, there exists an efficient GOPE scheme 𝒮3 such
that $\mathbf{ADV}^{\text{IND-OLCPA}}_{\mathcal{S}_3,\mathcal{A}} \le \frac{1}{g(\lambda)}$.
Proof. The proof is based on Lemmas 5.4.5, 5.4.6, 5.4.7, and 5.4.8.
5.5 Overview of OPE to Multi-user Systems
5.5.1 Settings
As presented in Chapter 3, we consider the system which consists of a single server DB
hosting a database and a set of users U = {Uj | j ≥ 1} accessing the data stored on DB. A set of
key agents in KA manage the key and mediate the communication between the users and the DB.
For convenience, we assume that there are only numerical data in DB. Data of other types
can be represented by numerical data easily. For each critical data item x, the DB maintains two
ciphertexts COPE(x) and CCE(x). COPE(x) is encrypted using a specialized OPE scheme with a
master key k. Note that existing OPE schemes cannot be used directly to support multi-user
systems, so we develop a general approach to adapt any existing OPE scheme into a
corresponding digit-based OPE (DOPE) scheme. The ciphertext COPE(x) of a data item x is
encrypted using DOPE.
CCE(x) is encrypted using a classical encryption scheme (e.g. AES). The purpose of storing
CCE(x) is to support efficient transmission of responses. For each data item x, a different data key
dkx is used to generate CCE(x). A user with access privilege to data item x will be granted key dkx.
In a real implementation, the data items with the same access privileges can be grouped together
into an access domain and only one key is needed for each access domain.
5.5.2 Adjusted Definition of OPE
Note that in the base-b number system, the digits are in the set {0, …, b−1}. To facilitate the
construction of DOPE, we adjust the plaintext domain and ciphertext range. Let the plaintext
domain be {0,1}^λ = {0, …, 2^λ − 1} (instead of [m] = {i | 1 ≤ i ≤ m}) and the ciphertext range be
{0,1}^μ = {0, …, 2^μ − 1} (instead of [n] = {j | 1 ≤ j ≤ n}). The definition of OPE scheme is revised
accordingly as follows.
Definition 5.5.1 (Revised OPE Scheme): Let 𝒮λ,μ = (𝒦λ,μ, ℰλ,μ, 𝒟λ,μ) be an encryption
scheme, where 𝒦λ,μ: {0,1}∗ → {0,1}∗ is the key generation algorithm, ℰλ,μ:
{0,1}^λ × {0,1}∗ → {0,1}^μ is the encryption algorithm, and 𝒟λ,μ: {0,1}^μ × {0,1}∗ → {0,1}^λ is the
decryption algorithm. We say that 𝒮λ,μ is an OPE scheme if ℰλ,μ satisfies the “order-preserving
property”:
x1 < x2 ⟹ ℰλ,μ(x1, k) < ℰλ,μ(x2, k)
for any x1, x2 ∈ {0,1}^λ and key k.
Generally, the value of μ could impact the security of ℰλ,μ. But μ must be bounded by a
polynomial of λ to keep 𝒮λ,μ efficient.
5.5.3 Problem Specification, Adversary Model, and Security Requirement
We construct a new OPE approach for multi-user systems. The approach includes a new
OPE construction that is tightly coupled with a request communication protocol Q and a
response communication protocol P.
In the request protocol Q, a user Ui issues a request (query) q to the DB, where q may
contain some secret data that needs to be transmitted with q to the DB. For simplicity, assume
that there is only one secret data item in q and let x denote that data item. Protocol Q should
transfer q to DB while ensuring the correct and secure computation of COPE(x) and CCE(x) in the
request transmission process. (Note that Ui can encrypt x using dkx and obtain CCE(x). But since
Ui does not have the OPE master key k, it is not possible for Ui to compute COPE(x). Thus, a set
of key agents (KA) are introduced to perform the OPE encryption.)
In the response protocol P, the DB sends back the response r to the user. The response r
may include a set of encrypted data objects {CCE(y1), CCE(y2),…, CCE(yt)} and/or {COPE(y1),
COPE(y2), …, COPE(yt)} (the protocol decides whether to send CCE(yi) or COPE(yi) or both).
Protocol P should ensure the secure delivery of r to Ui and that Ui can decrypt the information in
r to obtain the query results y1, y2,…,yt.
Protocols Q and P have certain security requirements. The system entities, users, DB, and
key agents, may collude to acquire additional information. We unify the possible collusions and
construct a passive adversary 𝒜 who tries to gain extra information by compromising some
entities in the system. We assume that the key agents and DB are better protected than the users.
Therefore we assume that the adversary cannot compromise all key agents simultaneously. Thus,
we assume the adversary structure
AS = {𝑈𝒜 ∪ 𝐾𝐴𝒜, 𝑈𝒜 ∪ 𝐾𝐴𝒜 ∪ {𝐷𝐵} | 𝑈𝒜 ⊂ 𝑈, 𝐾𝐴𝒜 ⊂ 𝐾𝐴},
where 𝑈𝒜 is the set of compromised users and 𝐾𝐴𝒜 is the set of compromised key agents (note
that 𝑈𝒜 and 𝐾𝐴𝒜 could be empty). The system should ensure the security requirement
Pr[𝒜(View) = x] = ν(λ) is satisfied, where ν denotes a negligible function and View is the
instance event randomly selected from the event space of what the adversary 𝒜 can observe in
the system by compromising entities in AS. Let U(x) denote the set of users who can access the
critical data x. Assume that none of the users in U(x) are compromised by 𝒜 . The security
requirement can be interpreted as: for critical data x, if 𝒜 does not compromise the users in U(x),
then the probability for 𝒜 to retrieve x based on the information gathered from the compromised
entities is negligible.
5.5.4 Our Approach
We design a simple and effective response protocol P to deliver the responses efficiently.
We simply include CCE(y1), CCE(y2), …, CCE(yt) in r. The user should have access
rights to y1, y2, …, yt and, hence, should have the encryption keys $dk_{y_1}, dk_{y_2}, \ldots, dk_{y_t}$ to decrypt
the data items in r. Consider the security of the system against adversary 𝒜 (assume that 𝒜 has
not compromised the users in $U_{y_1}, \ldots, U_{y_t}$, where $U_{y_j}$ is the set of users who can access yj). Since
the protocol only transfers CCE(y1), CCE(y2), …, CCE(yt), 𝒜 cannot get the encryption keys
$dk_{y_1}, dk_{y_2}, \ldots, dk_{y_t}$ and cannot compromise y1, y2, …, yt. Note that the design of P is fully
discussed here and will not be discussed further.
The request communication protocol Q cannot avoid the OPE encryption and is more
complex. We design two protocols for Q: basic-DOPE and OE-DOPE, where OE-DOPE offers
better protection of the secret data x in request q during the communication process.
Basic-DOPE and OE-DOPE protocols are discussed in Sections 5.6 and 5.7, respectively. Both
basic-DOPE and OE-DOPE protocols can be used with any existing OPE scheme.
5.6 The Basic DOPE Protocol for Multi-user Systems
In the basic-DOPE protocol, we use p key agents, $KA_0, \ldots, KA_{p-1}$, to encrypt confidential
data x into $C_{OPE}(x)$. The critical data x is divided into p “digits”. The i-th “digit” is sent to $KA_i$ to
be encrypted by the underlying OPE using a key $k_i$, $0 \le i < p$. The encrypted digits are sent to
DB and integrated into the ciphertext $C_{OPE}(x)$.
The basic-DOPE and OE-DOPE protocols are coupled with the encryption algorithm
DOPE. In Subsection 5.6.1, we present the construction of the DOPE encryption algorithm. Then
we prove the correctness and analyze the security of the DOPE encryption scheme in
Subsections 5.6.2 and 5.6.3, respectively. The basic-DOPE protocol is introduced in
Subsection 5.6.4.
5.6.1 Construction of the DOPE Encryption Scheme
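We construct the DOPE encryption scheme $\mathcal{SE}_p^{\lambda,\mu} = (\mathcal{K}_p^{\lambda,\mu}, \mathcal{E}_p^{\lambda,\mu}, \mathcal{D}_p^{\lambda,\mu})$ based on an OPE
scheme $\mathcal{SE}^{\lambda',\mu'}$, where $\lambda = p \cdot \lambda'$ and $\mu = p \cdot \mu'$, as follows. The key generation algorithm
$\mathcal{K}_p^{\lambda,\mu}$ invokes $\mathcal{K}^{\lambda',\mu'}$ to generate the OPE key k consisting of p subkeys $k_j$, $0 \le j < p$. The
encryption algorithm $\mathcal{E}_p^{\lambda,\mu}$ (1) represents the plaintext as p “digits” in the base $2^{\lambda/p}$ number
system, (2) encrypts the p “digits” with $\mathcal{E}^{\lambda',\mu'}$, and (3) integrates the p encrypted “digits” back into a
single ciphertext value in the base $2^{\mu'}$ number system. Accordingly, the decryption algorithm
$\mathcal{D}_p^{\lambda,\mu}$ applies the inverse process of $\mathcal{E}_p^{\lambda,\mu}$ to decrypt the ciphertext. We describe the processes of
$\mathcal{SE}_p^{\lambda,\mu}$ in Figure 5.1.

$\mathcal{K}_p^{\lambda,\mu}$: Invoke $\mathcal{K}^{\lambda',\mu'}$ to generate the set of keys $k = \{k_0, \ldots, k_{p-1}\}$.
$\mathcal{E}_p^{\lambda,\mu}$: Input: plaintext x, $x \in \{0,1\}^{\lambda}$.
    Output: ciphertext $C_{OPE}(x)$, $C_{OPE}(x) \in \{0,1\}^{\mu}$.
    Let $\lambda' = \lambda/p$ and $\mu' = \mu/p$.
    Express x in the base $2^{\lambda'}$ number system, i.e., $x = \sum_{0 \le j < p} x_j \cdot (2^{\lambda'})^j$, $0 \le x_j < 2^{\lambda'}$.
    $C_{OPE}(x) = \sum_{0 \le j < p} \mathcal{E}^{\lambda',\mu'}(x_j, k_j) \cdot (2^{\mu'})^j$.
$\mathcal{D}_p^{\lambda,\mu}$: Input: ciphertext $C_{OPE}(x)$, $C_{OPE}(x) \in \{0,1\}^{\mu}$.
    Output: plaintext x, $x \in \{0,1\}^{\lambda}$.
    Let $\lambda' = \lambda/p$ and $\mu' = \mu/p$.
    Express $C_{OPE}(x)$ in the base $2^{\mu'}$ number system, i.e., $C_{OPE}(x) = \sum_{0 \le j < p} y_j \cdot (2^{\mu'})^j$.
    Compute $x_j = \mathcal{D}^{\lambda',\mu'}(y_j, k_j)$, $0 \le j < p$, and $x = \sum_{0 \le j < p} x_j \cdot (2^{\lambda'})^j$.
Figure 5.1. DOPE scheme $\mathcal{SE}_p^{\lambda,\mu} = (\mathcal{K}_p^{\lambda,\mu}, \mathcal{E}_p^{\lambda,\mu}, \mathcal{D}_p^{\lambda,\mu})$.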
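The digit-wise construction can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the underlying OPE is a toy stand-in (a sorted random sample of the ciphertext range, stored as a lookup table) rather than any scheme from the literature, and the names make_ope, dope_keygen, dope_encrypt, and dope_decrypt are hypothetical.

```python
import random

def make_ope(key, lam, mu):
    """Toy OPE on lam-bit inputs (assumption, not a real scheme):
    a sorted random sample of 2**lam values from the mu-bit range."""
    table = sorted(random.Random(key).sample(range(2 ** mu), 2 ** lam))
    return table, {c: x for x, c in enumerate(table)}

def dope_keygen(seed, p, lam, mu):
    """Generate p independent subkeys, one per (lam/p)-bit digit."""
    rng = random.Random(seed)
    return [make_ope(rng.getrandbits(64), lam // p, mu // p) for _ in range(p)]

def dope_encrypt(x, keys, lam, mu):
    p = len(keys)
    lam_d, mu_d = lam // p, mu // p
    c = 0
    for j, (enc, _) in enumerate(keys):
        x_j = (x >> (j * lam_d)) & ((1 << lam_d) - 1)  # j-th digit, base 2**lam_d
        c += enc[x_j] << (j * mu_d)                    # j-th digit, base 2**mu_d
    return c

def dope_decrypt(c, keys, lam, mu):
    p = len(keys)
    lam_d, mu_d = lam // p, mu // p
    x = 0
    for j, (_, dec) in enumerate(keys):
        y_j = (c >> (j * mu_d)) & ((1 << mu_d) - 1)    # j-th ciphertext digit
        x += dec[y_j] << (j * lam_d)
    return x

LAM, MU, P = 8, 24, 4                                  # mu = 3 * lam, four digits
keys = dope_keygen(2012, P, LAM, MU)
ciphertexts = [dope_encrypt(x, keys, LAM, MU) for x in range(2 ** LAM)]
```

With these toy parameters every 8-bit plaintext is split into four 2-bit digits, and the list ciphertexts can be inspected to see that decryption inverts encryption and that the plaintext order is kept.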
5.6.2 Correctness of the DOPE Encryption Scheme
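We analyze the correctness of $\mathcal{SE}_p^{\lambda,\mu}$ in Proposition 5.6.1.
Proposition 5.6.1: $\mathcal{SE}_p^{\lambda,\mu}$ is correct, i.e., $\mathcal{D}_p^{\lambda,\mu}(\mathcal{E}_p^{\lambda,\mu}(x, k), k) = x$ and $\mathcal{E}_p^{\lambda,\mu}$ is an OPE
algorithm.
Proof. First we prove that $\mathcal{D}_p^{\lambda,\mu}(\mathcal{E}_p^{\lambda,\mu}(x, k), k) = x$. Let $x = \sum_{0 \le j < p} x_j \cdot (2^{\lambda'})^j$ and
$\mathcal{E}_p^{\lambda,\mu}(x, k) = \sum_{0 \le j < p} y_j \cdot (2^{\mu'})^j$ with $y_j = \mathcal{E}^{\lambda',\mu'}(x_j, k_j)$. Then, according to the process shown in Figure 5.1,
$$\mathcal{D}_p^{\lambda,\mu}(\mathcal{E}_p^{\lambda,\mu}(x, k), k) = \sum_{0 \le j < p} \mathcal{D}^{\lambda',\mu'}(y_j, k_j) \cdot (2^{\lambda'})^j = \sum_{0 \le j < p} \mathcal{D}^{\lambda',\mu'}(\mathcal{E}^{\lambda',\mu'}(x_j, k_j), k_j) \cdot (2^{\lambda'})^j = \sum_{0 \le j < p} x_j \cdot (2^{\lambda'})^j = x.$$
Now we prove that $\mathcal{E}_p^{\lambda,\mu}$ is an OPE algorithm. For $x_1 = \sum_{0 \le j < p} x_{1,j} \cdot (2^{\lambda'})^j$ and
$x_2 = \sum_{0 \le j < p} x_{2,j} \cdot (2^{\lambda'})^j$, we consider three situations.
(i) $x_1 < x_2$. Then there exists $j_0$ such that $x_{1,j} = x_{2,j}$ for $j > j_0$ and $x_{1,j} < x_{2,j}$ for $j = j_0$. Therefore
$\mathcal{E}^{\lambda',\mu'}(x_{1,j}, k_j) = \mathcal{E}^{\lambda',\mu'}(x_{2,j}, k_j)$ for $j > j_0$ and $\mathcal{E}^{\lambda',\mu'}(x_{1,j}, k_j) < \mathcal{E}^{\lambda',\mu'}(x_{2,j}, k_j)$ for $j = j_0$.
Since every encrypted digit is smaller than $2^{\mu'}$, the most significant differing digit decides the comparison. Hence
$$\mathcal{E}_p^{\lambda,\mu}(x_1, k) = \sum_{0 \le j < p} \mathcal{E}^{\lambda',\mu'}(x_{1,j}, k_j) \cdot (2^{\mu'})^j < \sum_{0 \le j < p} \mathcal{E}^{\lambda',\mu'}(x_{2,j}, k_j) \cdot (2^{\mu'})^j = \mathcal{E}_p^{\lambda,\mu}(x_2, k).$$
(ii) $x_1 = x_2$. It can be proved that $\mathcal{E}_p^{\lambda,\mu}(x_1, k) = \mathcal{E}_p^{\lambda,\mu}(x_2, k)$ analogously to (i).
(iii) $x_1 > x_2$. It can be proved that $\mathcal{E}_p^{\lambda,\mu}(x_1, k) > \mathcal{E}_p^{\lambda,\mu}(x_2, k)$ analogously to (i).
According to (i), (ii), and (iii), $\mathcal{E}_p^{\lambda,\mu}$ is an OPE algorithm.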
5.6.3 Security of $\mathcal{SE}_p^{\lambda,\mu}$
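According to the construction in Figure 5.1, $\mathcal{E}_p^{\lambda,\mu}(x, k) = \sum_{0 \le j < p} \mathcal{E}^{\lambda',\mu'}(x_j, k_j) \cdot (2^{\mu'})^j$. The
security of $\mathcal{E}_p^{\lambda,\mu}(x, k)$ can be reduced to the security of $\mathcal{E}^{\lambda',\mu'}(x_j, k_j)$, where $\lambda' = \lambda/p$, $\mu' = \mu/p$, and
$0 \le j < p$. According to [77], there exists an OPE scheme $\mathcal{SE}^{\lambda',\mu'}$ that achieves one-wayness security
when (1) $\mu' \ge 3\lambda'$ and (2) h (the number of plaintext-ciphertext pairs known by the adversary)
is bounded by a polynomial in $\lambda'$. Hence the values of μ and p are critical to the security of
$\mathcal{SE}_p^{\lambda,\mu}$. We set $\mu \ge 3\lambda$ to satisfy (1), and set $p = O(\lambda^c)$ to satisfy (2), where 0 < c < 1 is a constant.
The one-wayness security of $\mathcal{SE}_p^{\lambda,\mu}$ is proved in Theorem 5.6.2.
Theorem 5.6.2: Assume that an OPE scheme $\mathcal{SE}^{\lambda',\mu'} = (\mathcal{K}^{\lambda',\mu'}, \mathcal{E}^{\lambda',\mu'}, \mathcal{D}^{\lambda',\mu'})$ achieves
one-wayness security for $\mu' \ge 3\lambda'$. Consider the DOPE scheme $\mathcal{SE}_p^{\lambda,\mu}$ constructed based on $\mathcal{SE}^{\lambda',\mu'}$
in Figure 5.1. Then $\mathcal{SE}_p^{\lambda,\mu}$ also achieves one-wayness security for $\mu \ge 3\lambda$ and $p = O(\lambda^c)$, 0 < c
< 1, even if the adversary knows a proper subset of the keys in k. Specifically,
$$\Pr[\mathcal{A}(\mathcal{E}_p^{\lambda,\mu}(x, k), KP, k') = x] = \nu(\lambda),$$
for $\mu \ge 3\lambda$, where $KP = \{(x_i', \mathcal{E}_p^{\lambda,\mu}(x_i', k)) \mid 1 \le i \le h\}$ and $k' \subset k = \{k_0, \ldots, k_{p-1}\}$.
Proof. We reduce the adversary's view from the plaintext domain $\{0,1\}^{\lambda}$ and ciphertext
range $\{0,1\}^{\mu}$ to $\{0,1\}^{\lambda'}$ and $\{0,1\}^{\mu'}$, where $\lambda' = \lambda/p$ and $\mu' = \mu/p$. Suppose that
$$x = \sum_{0 \le j < p} x_j \cdot (2^{\lambda'})^j.$$
Then $\mathcal{E}_p^{\lambda,\mu}(x, k) = \sum_{0 \le j < p} \mathcal{E}^{\lambda',\mu'}(x_j, k_j) \cdot (2^{\mu'})^j$, which implies that the adversary knows
$\mathcal{E}^{\lambda',\mu'}(x_j, k_j)$ for $0 \le j < p$. Suppose that
$$x_i' = \sum_{0 \le j < p} x_{i,j}' \cdot (2^{\lambda'})^j$$
for $1 \le i \le h$. Then $\mathcal{E}_p^{\lambda,\mu}(x_i', k) = \sum_{0 \le j < p} \mathcal{E}^{\lambda',\mu'}(x_{i,j}', k_j) \cdot (2^{\mu'})^j$, which implies that the adversary knows
$$KP_j = \{(x_{i,j}', \mathcal{E}^{\lambda',\mu'}(x_{i,j}', k_j)) \mid 1 \le i \le h\}$$
for $0 \le j < p$. Since the $k_j$ are independently generated for $0 \le j < p$ and
$\Pr[\mathcal{A}(\mathcal{E}^{\lambda',\mu'}(x_j, k_j), KP_j, k') = x_j] = 1$ for $k_j \in k'$,
$$\Pr[\mathcal{A}(\mathcal{E}_p^{\lambda,\mu}(x, k), KP, k') = x] = \prod_{0 \le j < p} \Pr[\mathcal{A}(\mathcal{E}^{\lambda',\mu'}(x_j, k_j), KP_j, k') = x_j] = \prod_{k_j \notin k'} \Pr[\mathcal{A}(\mathcal{E}^{\lambda',\mu'}(x_j, k_j), KP_j) = x_j].$$
Since $\mu \ge 3\lambda$, $\mu' = \mu/p \ge 3(\lambda/p) = 3\lambda'$. Since h is bounded by a polynomial in λ and $p = O(\lambda^c)$,
0 < c < 1, h is also bounded by a polynomial in $\lambda' = \lambda/p$. Therefore
$$\Pr[\mathcal{A}(\mathcal{E}^{\lambda',\mu'}(x_j, k_j), KP_j) = x_j] = \nu(\lambda')$$
for $k_j \notin k'$. Since $p = O(\lambda^c)$ for some constant 0 < c < 1, a negligible function of $\lambda' = \lambda/p$ is also a
negligible function of λ. Hence $\Pr[\mathcal{A}(\mathcal{E}_p^{\lambda,\mu}(x, k), KP, k') = x] = \nu(\lambda)$.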
5.6.4 The Basic DOPE Communication Protocol
Let $KA_i$, $0 \le i \le p-1$, be the p key agents. Without loss of generality, we assume that p =
O(1), i.e., there is a constant number of key agents. Let $\mathcal{SE}^{\lambda',\mu'} = (\mathcal{K}^{\lambda',\mu'}, \mathcal{E}^{\lambda',\mu'}, \mathcal{D}^{\lambda',\mu'})$ be the
underlying OPE scheme, where $\mu \ge 3\lambda$, $\lambda' = \lambda/p$ and $\mu' = \mu/p$. We assume that at system
initialization time, some trusted party uses $\mathcal{K}_p^{\lambda,\mu}$ to generate $k = (k_0, \ldots, k_{p-1})$ and distributes $k_i$ to
$KA_i$, $0 \le i \le p-1$. The basic-DOPE protocol realizes the DOPE encryption scheme through $KA_i$, $0 \le i
\le p-1$, and its pseudocode is presented in Figure 5.2. Figure 5.3 shows the structure and message
flow of the basic-DOPE protocol.

Let plaintext $x \in \{0,1\}^{\lambda}$, $\lambda' = \lambda/p$, $\mu' = \mu/p$, and p = O(1).
(1) The user $U_i$ expresses x in base $2^{\lambda'}$, i.e., $x = \sum_{0 \le j < p} x_j \cdot (2^{\lambda'})^j$, $0 \le x_j < 2^{\lambda'}$.
    $U_i$ sends $x_j$ to $KA_j$, $0 \le j \le p-1$.
(2) For $0 \le j \le p-1$, $KA_j$ computes $y_j = \mathcal{E}^{\lambda',\mu'}(x_j, k_j)$ and sends $y_j$ to DB.
(3) DB combines $C_{OPE}(x) = \sum_{0 \le j < p} y_j \cdot (2^{\mu'})^j$.
Figure 5.2. The pseudo code for the basic-DOPE protocol.

[Figure: user $U_i$ sends digit $x_j$ to key agent $KA_j$, which holds $(\mathcal{E}^{\lambda',\mu'}, k_j)$ and sends $y_j$ to DB, for $KA_0, \ldots, KA_{p-1}$.]
Figure 5.3. The structure and message flow of the basic-DOPE protocol.

The efficiency, correctness (i.e., the DOPE encryption result is the same as the ciphertext
$C_{OPE}(x)$), and security of the basic-DOPE protocol are established in Theorem 5.6.3.
Theorem 5.6.3: The basic-DOPE protocol is efficient and correct, and achieves
one-wayness security against the adversary structure $AS = \{U_\mathcal{A} \cup KA_\mathcal{A}, U_\mathcal{A} \cup KA_\mathcal{A} \cup \{DB\} \mid
U_\mathcal{A} \subset U, KA_\mathcal{A} \subset KA\}$.
Proof. Since $\lambda' = \lambda/p$ and $\mu' = \mu/p$, the OPE algorithm $\mathcal{E}^{\lambda',\mu'}$ is efficient. Also, the processes
of expressing x in base $2^{\lambda'}$ and combining the encryptions into $C_{OPE}(x)$ are efficient. Therefore, the
basic-DOPE protocol is efficient. The basic-DOPE protocol is correct because the DB receives
$$C_{OPE}(x) = \sum_{0 \le j < p} y_j \cdot (2^{\mu'})^j = \sum_{0 \le j < p} \mathcal{E}^{\lambda',\mu'}(x_j, k_j) \cdot (2^{\mu'})^j = \mathcal{E}_p^{\lambda,\mu}(x, k).$$
For security, we assume that the adversary 𝒜 compromises DB and $U_\mathcal{A} \subset U$, $KA_\mathcal{A} \subset KA$.
Then the adversary knows some plaintext-ciphertext pairs in the set KP, where the plaintexts are
from the users in $U_\mathcal{A}$ and the ciphertexts are from DB. Also, the adversary 𝒜 can retrieve the
keys in k', where $k' = \{k_j \mid KA_j \in KA_\mathcal{A}\}$. Now consider a user $U_i \notin U_\mathcal{A}$ whose data x reaches DB as
$\mathcal{E}_p^{\lambda,\mu}(x, k)$. The adversary's task is then equivalent to recovering x from $\mathcal{E}_p^{\lambda,\mu}(x, k)$ given KP and k'.
Since $\mu \ge 3\lambda$ and p = O(1), according to Theorem 5.6.2, $\Pr[\mathcal{A}(\mathcal{E}_p^{\lambda,\mu}(x, k), KP, k') = x] = \nu(\lambda)$.
Hence, the basic-DOPE protocol achieves one-wayness security against the adversary structure AS.
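The three protocol steps of Figure 5.2 can be simulated with in-process objects standing in for the user, the key agents, and the DB. This is a hedged sketch: the OPE is a toy lookup table (an assumption, not a scheme from the literature), network messages are plain function calls, and the names KeyAgent, user_split, db_combine, and basic_dope are hypothetical.

```python
import random

def make_ope_table(key, lam, mu):
    """Toy OPE (assumption): sorted random sample of the mu-bit range."""
    return sorted(random.Random(key).sample(range(2 ** mu), 2 ** lam))

class KeyAgent:
    """Holds one subkey k_j and encrypts the single digit it receives."""
    def __init__(self, key, lam_d, mu_d):
        self.table = make_ope_table(key, lam_d, mu_d)
        self.seen = []                      # plaintext digits this agent observed
    def encrypt(self, x_j):
        self.seen.append(x_j)
        return self.table[x_j]

def user_split(x, p, lam_d):
    """Step (1): express x in base 2**lam_d, least significant digit first."""
    return [(x >> (j * lam_d)) & ((1 << lam_d) - 1) for j in range(p)]

def db_combine(ys, mu_d):
    """Step (3): integrate the encrypted digits in base 2**mu_d."""
    return sum(y << (j * mu_d) for j, y in enumerate(ys))

P, LAM, MU = 4, 16, 48
LAM_D, MU_D = LAM // P, MU // P
agents = [KeyAgent(key=100 + j, lam_d=LAM_D, mu_d=MU_D) for j in range(P)]

def basic_dope(x):
    digits = user_split(x, P, LAM_D)
    ys = [agents[j].encrypt(digits[j]) for j in range(P)]   # step (2)
    return db_combine(ys, MU_D)

c1, c2 = basic_dope(0x1234), basic_dope(0x1235)
```

The seen lists make explicit what each key agent learns during a request: exactly one digit of every plaintext it helps to encrypt.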
5.7 The OE-DOPE Protocol for Multi-user Systems
5.7.1 Security Issue in the Basic DOPE Protocol
Consider the following two attacks against the basic-DOPE protocol.
(1) The adversary 𝒜 compromises the key agent $KA_u$. Then 𝒜 can view the “digit” $x_u$ of the
plaintext x in the process of the basic-DOPE protocol.
(2) The adversary 𝒜 compromises DB and the key agent $KA_u$ for some $0 \le u < p$. Then, for
any ciphertext $C_{OPE}(x) = \sum_{0 \le u < p} \mathcal{E}^{\lambda',\mu'}(x_u, k_u) \cdot (2^{\mu'})^u$ stored on DB, 𝒜 can use the key $k_u$
retrieved from $KA_u$ to compute $x_u = \mathcal{D}^{\lambda',\mu'}(\mathcal{E}^{\lambda',\mu'}(x_u, k_u), k_u)$.
In both situations, the adversary 𝒜 retrieves the partial information xu of x, which implies
that λ’ = λ/p bits of the critical data item are leaked. However, since we assume that 𝒜 cannot
compromise all key agents, 𝒜 cannot retrieve the whole plaintext x. That is why the basic-DOPE
protocol can achieve one-wayness security. But revealing λ/p bits of some critical data items may
be unacceptable in many applications. Hence, it is desirable to enhance the security of the
request communication protocol.
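Attack (2) amounts to slicing one ciphertext digit out of the stored value and decrypting it with the stolen subkey. The following sketch illustrates this with a toy table-based OPE (an assumption standing in for the real scheme); leak_digit and the parameter values are hypothetical.

```python
import random

def make_ope(key, lam, mu):
    """Toy OPE (assumption): forward table and its inverse."""
    table = sorted(random.Random(key).sample(range(2 ** mu), 2 ** lam))
    return table, {c: x for x, c in enumerate(table)}

P, LAM_D, MU_D = 4, 4, 12
keys = [make_ope(7 + j, LAM_D, MU_D) for j in range(P)]

def encrypt(x):
    """DOPE encryption of a 16-bit value as four 4-bit digits."""
    mask = (1 << LAM_D) - 1
    return sum(keys[j][0][(x >> (j * LAM_D)) & mask] << (j * MU_D) for j in range(P))

def leak_digit(c_ope, u, stolen_key):
    """Adversary with DB access and k_u: extract and decrypt the u-th digit."""
    y_u = (c_ope >> (u * MU_D)) & ((1 << MU_D) - 1)
    return stolen_key[1][y_u]

secret = 0xBEEF
stored = encrypt(secret)                    # what the DB holds
leaked = leak_digit(stored, u=2, stolen_key=keys[2])
```

Only the digit at index 2 is recovered here; the remaining three digits stay protected by the subkeys the adversary does not hold, which matches the λ/p-bit leakage described above.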
5.7.2 Oblivious Encryption
The attack in (2) is relatively easy to prevent. We substitute $KA_u$ by a chain of key agents
$KA_{u,0}, \ldots, KA_{u,q-1}$. The key $k_u$ is also split into $k_{u,0}, \ldots, k_{u,q-1}$ and distributed to $KA_{u,0}, \ldots, KA_{u,q-1}$,
for all u. Critical data $x_u$ is encrypted through the chain $KA_{u,0}, \ldots, KA_{u,q-1}$ by the OPEs $\mathcal{E}^{\lambda_0,\mu_0}, \ldots,
\mathcal{E}^{\lambda_{q-1},\mu_{q-1}}$. The resulting ciphertext, after encryption by the chain of KAs, is order preserving
because the composition of OPEs is still an OPE. Now the adversary cannot retrieve $x_u$ from
$C_{OPE}(x)$ unless it retrieves the q keys $k_{u,0}, \ldots, k_{u,q-1}$ by compromising the key agents $KA_{u,0}, \ldots, KA_{u,q-1}$.
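A small sketch can make the chaining concrete: each agent's OPE takes the previous agent's ciphertext as its plaintext, and the composition remains strictly increasing. The toy table-based OPE and the expansion factor 3 (mirroring μ = 3λ) are assumptions for illustration; chain_encrypt is a hypothetical name.

```python
import random

def make_ope_table(key, lam, mu):
    """Toy OPE (assumption): sorted random sample of the mu-bit range."""
    return sorted(random.Random(key).sample(range(2 ** mu), 2 ** lam))

LAM0, Q = 4, 2
# Bit widths along the chain: each stage triples the width (4 -> 12 -> 36).
widths = [LAM0 * 3 ** j for j in range(Q + 1)]
chain = [make_ope_table(50 + j, widths[j], widths[j + 1]) for j in range(Q)]

def chain_encrypt(x):
    """Apply the agents' OPEs in order; agent j's output is agent j+1's input."""
    for table in chain:
        x = table[x]
    return x

outputs = [chain_encrypt(x) for x in range(2 ** LAM0)]
```

Because every stage is strictly increasing, so is the composition; recovering x from a stored value would require inverting every stage, i.e., every key in the chain.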
In principle, the attack in (1) can be prevented by secure computation, where the user has
the input $x_u$ and the key agent has the input $k_u$. The user and the key agent can securely compute
the function $\mathcal{E}^{\lambda',\mu'}$, $\lambda' = \lambda/p$ and $\mu' = \mu/p$, by any two-party computation protocol. However, existing
two-party computation protocols have high overhead. Therefore we develop the technique of OE
(oblivious encryption) to enable the key agent to encrypt $x_u$ without knowing the actual value of
$x_u$ (i.e., the probability for the key agent to know $x_u$ is negligible). In OE, $x_u$ is further expressed
in the base $2^{\lambda''}$ number system, where $\lambda'' = \lambda'/t$ and $t = (\lambda')^c$, 0 < c < 1. Let $x_{u,0}, \ldots, x_{u,t-1}$ be the t
“micro-digits” of $x_u$. Then, in the “micro-digit” domain $\{0,1\}^{\lambda''}$ of $x_{u,v}$, the user sends a vector
including $x_{u,v}$ and $\lambda'' - 1$ random plaintexts to the key agent $KA_u$, $0 \le v < t$. $KA_u$ encrypts all of the
elements in the vector ($KA_u$ does not know which one is $x_{u,v}$) and sends the encrypted vector to
the DB. At the same time, the user sends the location information $l_{u,v}$ of $x_{u,v}$ among the $\lambda''$
plaintexts to the DB so that the DB can identify the encrypted $x_{u,v}$ and integrate them into
$C_{OPE}(x)$. By further dividing the digits into t micro-digits, the probability for $KA_u$ to successfully
guess $x_u$ drops to $1/(\lambda'')^t$, which is a negligible function of λ.
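For a single micro-digit, the oblivious-encryption exchange might look as follows. This is a sketch under assumptions: the OPE is a toy lookup table, the three parties are plain functions, and the names user_build_vector, ka_encrypt_vector, and db_select are hypothetical.

```python
import random

def make_ope_table(key, lam, mu):
    """Toy OPE (assumption): sorted random sample of the mu-bit range."""
    return sorted(random.Random(key).sample(range(2 ** mu), 2 ** lam))

LAM2, MU2 = 4, 12                           # micro-digit and ciphertext widths
table = make_ope_table(99, LAM2, MU2)       # key material held by KA_u only

def user_build_vector(x_uv, rng):
    """Hide x_uv among LAM2 - 1 distinct random decoys; return vector and location."""
    decoys = rng.sample([v for v in range(2 ** LAM2) if v != x_uv], LAM2 - 1)
    vector = sorted(decoys + [x_uv])
    return vector, vector.index(x_uv)

def ka_encrypt_vector(vector):
    """KA_u encrypts every element without knowing which one is real."""
    return [table[w] for w in vector]

def db_select(enc_vector, location):
    """DB keeps only the element at the location the user reported."""
    return enc_vector[location]

rng = random.Random(1)
x_uv = 11
vector, loc = user_build_vector(x_uv, rng)
picked = db_select(ka_encrypt_vector(vector), loc)
```

The key agent sees LAM2 candidates per micro-digit, so a blind guess at one micro-digit succeeds with probability 1/LAM2, and at all t micro-digits with probability 1/(LAM2)^t.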
5.7.3 Vector Permutation and Data Mutation
The protocol above has a new security issue. If both $KA_{u,0}$ (the first key agent in the chain)
and the DB are compromised, then the location information (sent from the user to DB) can be
used to identify $x_{u,v}$ in the $\lambda''$ random plaintexts (sent from the user to $KA_{u,0}$). Consequently,
$x_u = \sum_{0 \le v < t} x_{u,v} \cdot (2^{\lambda''})^v$ can be derived. To cope with this attack, the key agent $KA_{u,j}$ permutes the vector
(original or encrypted) by a permutation $\pi_{u,v,j}$ (randomly generated by the user) before sending
it to the next key agent $KA_{u,j+1}$. Thus, $l'_{u,v} = \pi_{u,v,q-1} \circ \cdots \circ \pi_{u,v,0}(l_{u,v})$ (instead of $l_{u,v}$) will be the
location information the user sends to the DB. However, using permutations alone cannot
guarantee security because the encryption $\mathcal{E}^{\lambda_{q-1},\mu_{q-1}} \circ \cdots \circ \mathcal{E}^{\lambda_0,\mu_0}$ preserves the order. Thus 𝒜
can still correctly link the $\lambda''$ plaintexts (retrieved from $KA_{u,0}$) to the $\lambda''$ ciphertexts (retrieved
from DB) according to their order and, hence, mount the above attack again. To prevent the adversary
from using the order to establish the links, each key agent $KA_{u,j}$, $1 \le j < q$, substitutes half of the
elements in the vector (the set of locations is provided by the user) with new random
values to change the rank of the ciphertext of $x_{u,v}$ in the vector. Consequently, the adversary
cannot use the location information and order information to identify $x_{u,v}$ in the $\lambda''$ random
plaintexts (sent from the user to $KA_{u,0}$).
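The order-linking attack against a permutation-only variant can be reproduced in a few lines. The sketch below assumes a toy table-based OPE and a single key agent, and shows an adversary that holds the plaintext vector (from $KA_{u,0}$) and the permuted ciphertext vector plus the reported location (from DB); link_by_order is a hypothetical name.

```python
import random

def make_ope_table(key, lam, mu):
    """Toy OPE (assumption): sorted random sample of the mu-bit range."""
    return sorted(random.Random(key).sample(range(2 ** mu), 2 ** lam))

LAM2, MU2 = 4, 12
table = make_ope_table(5, LAM2, MU2)
rng = random.Random(3)

# User side: hide x_uv among decoys and pick a random permutation.
x_uv = 6
decoys = rng.sample([v for v in range(2 ** LAM2) if v != x_uv], LAM2 - 1)
vector = sorted(decoys + [x_uv])
loc = vector.index(x_uv)
perm = list(range(LAM2))
rng.shuffle(perm)                           # perm[i] is the new position of element i

# Key agent side: encrypt, then permute (no substitution of elements).
enc = [None] * LAM2
for i, w in enumerate(vector):
    enc[perm[i]] = table[w]
loc_at_db = perm[loc]                       # location information sent to DB

def link_by_order(plain_vector, enc_vector, loc_db):
    """Match ranks: the r-th smallest ciphertext encrypts the r-th smallest plaintext."""
    rank = sorted(enc_vector).index(enc_vector[loc_db])
    return sorted(plain_vector)[rank]

recovered = link_by_order(vector, enc, loc_at_db)
```

The permutation hides positions but not ranks, so the adversary recovers x_uv every time; replacing half of the elements with fresh random values is what breaks this rank correspondence.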
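We now construct the OE-DOPE protocol. Let $KA = \{KA_{u,j} \mid 0 \le u < p, 0 \le j < q\}$, which is
logically a KA grid of dimension p × q. We assume that there is a fixed number of key agents
and, hence, p, q = O(1). Let $\{0,1\}^{\lambda}$ be the plaintext domain, $\lambda' = \lambda/p$, and $\lambda'' = \lambda'/t$ where $t = (\lambda')^c$
for some constant 0 < c < 1. For $0 \le j < q$, let $\mathcal{SE}^{\lambda_j,\mu_j} = (\mathcal{K}^{\lambda_j,\mu_j}, \mathcal{E}^{\lambda_j,\mu_j}, \mathcal{D}^{\lambda_j,\mu_j})$ be the OPE schemes
satisfying $\lambda_0 = \lambda''$, $\mu_j = \lambda_{j+1}$ for $0 \le j < q-1$, and $\mu_j = 3\lambda_j$ for $0 \le j < q$. Therefore,
$\mathcal{SE}^{\lambda'',\mu''} = (\mathcal{K}^{\lambda'',\mu''}, \mathcal{E}^{\lambda'',\mu''}, \mathcal{D}^{\lambda'',\mu''})$ is also an OPE scheme, where
$\mathcal{E}^{\lambda'',\mu''} = \mathcal{E}^{\lambda_{q-1},\mu_{q-1}} \circ \cdots \circ \mathcal{E}^{\lambda_0,\mu_0}$. Since $\mu_j = 3\lambda_j$ for $0 \le
j < q$, we have $\mu'' = 3^q \cdot \lambda'' > 3\lambda''$. Also, since q is a constant, we have $\mu'' = O(\lambda'')$ and, hence,
$\mathcal{SE}^{\lambda'',\mu''}$ is an efficient OPE scheme. We assume that at system initialization time, some trusted
party has used $\mathcal{K}^{\lambda_j,\mu_j}$ to generate the keys $k_j = (k_{0,j}, \ldots, k_{p-1,j})$ and distributed $k_{u,j}$ to $KA_{u,j}$, $0 \le u < p$
and $0 \le j < q$. The pseudo code of the OE-DOPE protocol is shown in Figure 5.4. The structure and
message flow of the OE-DOPE protocol are illustrated in Figure 5.5.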
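Let plaintext $x \in \{0,1\}^{\lambda}$, $\lambda' = \lambda/p$, and $\lambda'' = \lambda'/t$ where $t = (\lambda')^c$ for some constant 0 < c < 1.
The user $U_i$:
(1) Expresses x in base $2^{\lambda'}$, i.e., $x = \sum_{0 \le u < p} x_u \cdot (2^{\lambda'})^u$, $0 \le x_u < 2^{\lambda'}$, and further expresses $x_u$
    in base $2^{\lambda''}$, i.e., $x_u = \sum_{0 \le v < t} x_{u,v} \cdot (2^{\lambda''})^v$, $0 \le x_{u,v} < 2^{\lambda''}$.
(2) For each $x_{u,v}$, randomly selects $\lambda'' - 1$ distinct elements in $\{0,1\}^{\lambda''}$. Let the elements
    together with $x_{u,v}$ be $w_{u,v,0} < \ldots < w_{u,v,l_{u,v}} = x_{u,v} < \ldots < w_{u,v,\lambda''-1}$.
(3) Randomly generates the permutations $\pi_{u,v,j}: \{0, \ldots, \lambda''-1\} \to \{0, \ldots, \lambda''-1\}$, $0 \le u < p$, $0 \le v < t$,
    and $0 \le j < q$.
(4) Randomly selects the sets of locations $L_{u,v,j}$ satisfying $\pi_{u,v,j-1} \circ \cdots \circ \pi_{u,v,0}(l_{u,v}) \notin L_{u,v,j}
    \subset \{0, \ldots, \lambda''-1\}$, $|L_{u,v,j}| = \lambda''/2$, $0 \le u < p$, $0 \le v < t$, and $1 \le j < q$.
(5) Sends $W_{u,v} = (w_{u,v,0}, \ldots, w_{u,v,\lambda''-1})$ to $KA_{u,0}$, $0 \le u < p$ and $0 \le v < t$;
    sends $\pi_{u,v,0}$ to $KA_{u,0}$ and $(\pi_{u,v,j}, L_{u,v,j})$ to $KA_{u,j}$, $0 \le u < p$, $0 \le v < t$, and $1 \le j < q$;
    sends $l'_{u,v} = \pi_{u,v,q-1} \circ \cdots \circ \pi_{u,v,0}(l_{u,v})$ to DB, $0 \le u < p$ and $0 \le v < t$.
The key agents $KA_{u,j}$:
(1) Let $W^{(0)}_{u,v} = W_{u,v}$. For $0 \le j < q-1$, $KA_{u,j}$ encrypts every element in $W^{(j)}_{u,v}$ by $\mathcal{E}^{\lambda_j,\mu_j}$
    using the key $k_{u,j}$ except those at the locations in $L_{u,v,j}$, uses random values as the
    encryption results for the elements at the locations in $L_{u,v,j}$, permutes the order of
    the encryptions by $\pi_{u,v,j}$, and sends the result $W^{(j+1)}_{u,v}$ to $KA_{u,j+1}$.
(2) $KA_{u,q-1}$ encrypts every element in $W^{(q-1)}_{u,v}$, permutes the order of the encryptions
    by $\pi_{u,v,q-1}$, and sends the result $W^{(q)}_{u,v}$ to DB.
The DB:
(1) Selects the $l'_{u,v}$-th element in $W^{(q)}_{u,v}$ to compute
    $C_{OPE}(x) = \sum_{0 \le u < p} \left( \sum_{0 \le v < t} W^{(q)}_{u,v}[l'_{u,v}] \cdot (2^{\mu''})^v \right) \cdot (2^{\mu'})^u$.
Figure 5.4. The OE-DOPE protocol.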
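The flow of one micro-digit through a chain of q = 2 key agents can be sketched as below. Several points are assumptions made for illustration only: the OPEs are toy lookup tables, every agent with index j ≥ 1 (including the last one) performs the substitution, a single shared random generator stands in for the independent randomness of the parties, and the names user_prepare, key_agent, and run are hypothetical.

```python
import random

def make_ope_table(key, lam, mu):
    """Toy OPE (assumption): sorted random sample of the mu-bit range."""
    return sorted(random.Random(key).sample(range(2 ** mu), 2 ** lam))

LAM2, Q = 4, 2
WIDTHS = [LAM2 * 3 ** j for j in range(Q + 1)]          # bit widths 4, 12, 36
tables = [make_ope_table(20 + j, WIDTHS[j], WIDTHS[j + 1]) for j in range(Q)]

def user_prepare(x_uv, rng):
    """Build the decoy vector, permutations, substitution sets, and final location."""
    decoys = rng.sample([v for v in range(2 ** LAM2) if v != x_uv], LAM2 - 1)
    vec = sorted(decoys + [x_uv])
    loc = vec.index(x_uv)
    perms, subs = [], []
    for j in range(Q):
        # The substitution set for agent j must avoid the current location of x_uv.
        others = [i for i in range(LAM2) if i != loc]
        subs.append(set(rng.sample(others, LAM2 // 2)) if j >= 1 else set())
        perm = list(range(LAM2))
        rng.shuffle(perm)                                # perm[i]: new position of i
        perms.append(perm)
        loc = perm[loc]                                  # track x_uv through the chain
    return vec, perms, subs, loc                         # loc is sent to the DB

def key_agent(j, vec, perm, sub, rng):
    """Encrypt (or replace by a random value), then permute."""
    out = [None] * len(vec)
    for i, w in enumerate(vec):
        c = rng.randrange(2 ** WIDTHS[j + 1]) if i in sub else tables[j][w]
        out[perm[i]] = c
    return out

def run(x_uv, seed):
    rng = random.Random(seed)
    vec, perms, subs, final_loc = user_prepare(x_uv, rng)
    for j in range(Q):
        vec = key_agent(j, vec, perms[j], subs[j], rng)
    return vec[final_loc]                                # element the DB selects

result = run(9, seed=42)
expected = tables[1][tables[0][9]]                       # chain encryption of 9
```

Whatever decoys, permutations, and substitution sets are drawn, the element the DB selects is the chained encryption of the real micro-digit, because its position is never placed in a substitution set.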
As shown in Figure 5.4, the user $U_i$ expresses the plaintext x in the base $2^{\lambda'}$ number system and
further expresses the “digit” $x_u$ in the base $2^{\lambda''}$ number system. For the “micro-digit” $x_{u,v}$, the vector
$W_{u,v} = (w_{u,v,0}, \ldots, w_{u,v,\lambda''-1})$ is created such that $x_{u,v}$ is in $W_{u,v}$ at a random position $l_{u,v}$. The user
also randomly generates the permutations $\pi_{u,v,j}$ and the sets of locations $L_{u,v,j}$, $|L_{u,v,j}| = \lambda''/2$. $(\pi_{u,v,j},
L_{u,v,j})$ is sent to $KA_{u,j}$, $0 \le u < p$, $0 \le v < t$, and $0 \le j < q$, and $l'_{u,v} = \pi_{u,v,q-1} \circ \cdots \circ \pi_{u,v,0}(l_{u,v})$ is sent to
DB, $0 \le u < p$ and $0 \le v < t$. Then $W_{u,v}$ is sent to $KA_{u,0}$ so that the elements in $W_{u,v}$ are encrypted,
substituted, and permuted through $KA_{u,0}$ to $KA_{u,q-1}$. Finally, the DB identifies the encryptions of
$x_{u,v}$ according to $l'_{u,v}$ and integrates them to get $C_{OPE}(x)$. According to this process, $C_{OPE}(x)$ has the
following encryption structure: $C_{OPE}(x)$ is the ciphertext encrypted by the DOPE scheme $\mathcal{SE}_p^{\lambda,\mu}$.
$\mathcal{SE}_p^{\lambda,\mu}$ is based on the underlying DOPE scheme $\mathcal{SE}_t^{\lambda',\mu'}$. And $\mathcal{SE}_t^{\lambda',\mu'}$ is based on the underlying
OPE scheme $\mathcal{SE}^{\lambda'',\mu''}$, where $\mathcal{E}^{\lambda'',\mu''} = \mathcal{E}^{\lambda_{q-1},\mu_{q-1}} \circ \cdots \circ \mathcal{E}^{\lambda_0,\mu_0}$.
We now prove the efficiency, correctness (i.e., the ciphertext $C_{OPE}(x)$ preserves the order of
the plaintexts), and the security of the OE-DOPE protocol in the following theorem.

[Figure: the user sends $W^{(0)}_{u,v}$ and $\pi_{u,v,0}$ to $KA_{u,0}$, which holds $(\mathcal{E}^{\lambda_0,\mu_0}, k_{u,0})$; each row of key agents $KA_{u,0}, \ldots, KA_{u,q-1}$ passes $W^{(1)}_{u,v}, \ldots$ along the chain; $KA_{u,q-1}$, which holds $(\mathcal{E}^{\lambda_{q-1},\mu_{q-1}}, k_{u,q-1})$ and receives $(\pi_{u,v,q-1}, L_{u,v,q-1})$, sends $W^{(q)}_{u,v}$ to DB; the user sends $l'_{u,v}$ to DB. Rows are shown for u = 0, …, p−1.]
Figure 5.5. Message flow of the OE-DOPE protocol.
Theorem 5.7.1: The OE-DOPE protocol is efficient and correct. Furthermore, consider the
adversary structure AS. Suppose that a user $U_i \notin U_\mathcal{A}$ sends x to DB. If the adversary 𝒜 does not
compromise all the key agents $KA_{u,0}, \ldots, KA_{u,q-1}$ simultaneously, then the probability for 𝒜 to
retrieve the “digit” $x_u$ of x is negligible, where $x = \sum_{0 \le u < p} x_u \cdot (2^{\lambda'})^u$.
Proof. The efficiency can be proved by a routine check. In the protocol, the user needs to
create $W_{u,v}$, which includes $\lambda''$ elements in $\{0,1\}^{\lambda''}$, $0 \le u < p$ and $0 \le v < t$. Since p is a constant, $\lambda'' = \lambda'/t
= \lambda/(p \cdot t)$ and $t = (\lambda')^c$, 0 < c < 1, it is efficient for the user to create $W_{u,v}$. Also, it is efficient for the
user to create $\pi_{u,v,j}$ and $L_{u,v,j}$, $0 \le u < p$, $0 \le v < t$, and $0 \le j < q$. Then $KA_{u,j}$ needs to encrypt the
elements in $W^{(j)}_{u,v}$ by using $\mathcal{E}^{\lambda_j,\mu_j}$. Note that $\lambda_0 = \lambda'' = \lambda/(p \cdot t)$, $\mu_j = \lambda_{j+1}$ for $0 \le j < q-1$, and $\mu_j = 3\lambda_j$
for $0 \le j < q$. Since p and q are constants, $\lambda_j$ and $\mu_j$ are $O(\lambda)$. Therefore the encryption $\mathcal{E}^{\lambda_j,\mu_j}$ is
efficient. It is also efficient to apply the permutation $\pi_{u,v,j}$ and the substitutions at $L_{u,v,j}$ to the encryptions. Finally,
it is efficient for the DB to identify the location of the encryptions of $x_{u,v}$ and integrate them to
get $C_{OPE}(x)$.
According to the encryption structure, $C_{OPE}(x)$ is the ciphertext encrypted by the basic
DOPE scheme $\mathcal{SE}_p^{\lambda,\mu}$. $\mathcal{SE}_p^{\lambda,\mu}$ is based on the underlying basic DOPE scheme $\mathcal{SE}_t^{\lambda',\mu'}$. And $\mathcal{SE}_t^{\lambda',\mu'}$ is
based on the underlying OPE scheme $\mathcal{SE}^{\lambda'',\mu''}$, where $\mathcal{E}^{\lambda'',\mu''} = \mathcal{E}^{\lambda_{q-1},\mu_{q-1}} \circ \cdots \circ \mathcal{E}^{\lambda_0,\mu_0}$. Hence
$C_{OPE}(x)$ preserves order and the OE-DOPE protocol is correct.
For security, first note that if $\mathcal{E}^{\lambda_j,\mu_j}$ is a ROPF (random order-preserving function) for $0 \le j
< q$, then the composition $\mathcal{E}^{\lambda_{q-1},\mu_{q-1}} \circ \cdots \circ \mathcal{E}^{\lambda_0,\mu_0}$ is also a ROPF. Therefore the basic DOPE
scheme $\mathcal{SE}_t^{\lambda',\mu'}$ based on the underlying OPE scheme $\mathcal{SE}^{\lambda'',\mu''}$, where $\mathcal{E}^{\lambda'',\mu''} = \mathcal{E}^{\lambda_{q-1},\mu_{q-1}} \circ \cdots \circ
\mathcal{E}^{\lambda_0,\mu_0}$, has one-wayness security according to Theorem 5.6.2. Hence, the basic DOPE scheme
$\mathcal{SE}_p^{\lambda,\mu}$ based on the underlying basic DOPE scheme $\mathcal{SE}_t^{\lambda',\mu'}$ also achieves one-wayness security
according to Theorem 5.6.2. Now suppose that the adversary 𝒜 compromises DB and retrieves
$C_{OPE}(x)$. Then 𝒜 can derive $\mathcal{E}^{\lambda'',\mu''}(x_{u,v}) = \mathcal{E}^{\lambda_{q-1},\mu_{q-1}}(\cdots \mathcal{E}^{\lambda_0,\mu_0}(x_{u,v}, k_{u,0}) \cdots, k_{u,q-1})$ from
$C_{OPE}(x)$, $0 \le u < p$ and $0 \le v < t$. In order to retrieve $x_u$, 𝒜 needs to retrieve all $x_{u,v}$ for $0 \le v < t$.
Since $\mathcal{E}^{\lambda_j,\mu_j}$ has one-wayness security and 𝒜 does not compromise all $KA_{u,j}$ for $0 \le j < q$,
the probability for 𝒜 to derive $x_u$ is negligible. Additionally, since the adversary 𝒜
does not compromise the key agents $KA_{u,0}, \ldots, KA_{u,q-1}$ simultaneously, 𝒜 cannot retrieve all
$\pi_{u,v,j}$ for $0 \le j < q$, or all $k_{u,j}$ for $0 \le j < q$. Furthermore, the order information in $W^{(q)}_{u,v}$ cannot be
used to link its elements to the plaintexts in $W_{u,v}$. Thus, 𝒜 cannot identify $x_{u,v}$ in the vector $W_{u,v}$ even if
it compromises $KA_{u,0}$ and DB. But if the adversary compromises the key agent $KA_{u,j}$, $1 \le j < q$, it
can narrow down $x_{u,v}$ from $\lambda''$ elements to $\lambda''/2$ elements based on $L_{u,v,j}$. Hence the probability for
the adversary to retrieve $x_{u,v}$ in $W_{u,v}$ is at most $2^{q-1}/\lambda''$ (note that the adversary can compromise at
most q − 1 key agents in a chain). Consequently, the probability for the adversary to retrieve
$x_u = \sum_{0 \le v < t} x_{u,v} \cdot (2^{\lambda''})^v$ is at most $(2^{q-1}/\lambda'')^t$. Since p, q = O(1) and $t = (\lambda')^c = (\lambda/p)^c$ for some constant 0 < c < 1,
$(2^{q-1}/\lambda'')^t$ is a negligible function of λ. Hence, the probability for 𝒜 to retrieve $x_u$ of x
is negligible.
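To get a feel for the size of the bound $(2^{q-1}/\lambda'')^t$, it can be evaluated exactly for concrete parameters. The sketch below fixes c = 1/2 (so that t is an integer square root) and uses exact rational arithmetic; the function name guess_bound and the chosen parameter values are illustrative assumptions.

```python
from fractions import Fraction
from math import isqrt

def guess_bound(lam, p, q):
    """Upper bound (2**(q-1) / lam'')**t on recovering one digit, for c = 1/2."""
    lam_digit = lam // p                    # lambda'  : bits per digit
    t = isqrt(lam_digit)                    # t = (lambda')**c with c = 1/2
    lam_micro = lam_digit // t              # lambda'' : bits per micro-digit
    per_micro = Fraction(2 ** (q - 1), lam_micro)
    return per_micro ** t

bound_1024 = guess_bound(1024, 4, 2)        # lambda' = 256, t = 16, lambda'' = 16
bound_256 = guess_bound(256, 4, 2)          # lambda' = 64,  t = 8,  lambda'' = 8
```

For λ = 1024, p = 4, q = 2 the bound is $(2/16)^{16} = 2^{-48}$. The bound is only informative when $2^{q-1} < \lambda''$; for short plaintexts or longer chains the ratio can reach 1, in which case it says nothing.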
Note that if the adversary 𝒜 compromises fewer than q key agents, then 𝒜
does not compromise the key agents $KA_{u,0}, \ldots, KA_{u,q-1}$ simultaneously for any $0 \le u < p$.
According to Theorem 5.7.1, the probability for 𝒜 to retrieve any “digit” $x_u$ of x is negligible.
We summarize the conclusion in the following corollary.
Corollary 5.7.2: Consider the adversary structure AS. Suppose that a user $U_i \notin U_\mathcal{A}$ sends x
to DB. If the adversary 𝒜 compromises fewer than q key agents, then the probability for 𝒜 to
retrieve any “digit” of x is negligible.
5.8 Performance Study of OPE and Protocols for Multi-user Systems
We study the performance of the protocols basic-DOPE and OE-DOPE using different
underlying OPE schemes. To the best of our knowledge, five OPE schemes have been proposed
in the literature [1, 6, 12, 39, 53]. None of them, except for the OPE algorithm proposed in [12],
has a cryptographic security proof. The OPE algorithm proposed in [6] can only be used in a
static system where no new data can be inserted to the database. The algorithm given in [39] is
not a full solution because it cannot compare all the plaintexts. The OPE algorithm developed
in [1] needs to process the whole database to model the data distribution. Thus, we only consider
the OPE algorithms proposed in [12] and [53] in our experimental study.
The performance of the OPE schemes has never been analyzed in the literature. Thus, we
first study the performance of the Hyper and Poly OPE schemes (the schemes of [12] and [53],
respectively). We randomly generate a polynomial with degree 10 for the Poly scheme. The domain
of the plaintext is $\{0,1\}^{\lambda}$ and we choose λ ∈ {8, 16, 32, 64, 96, 128, 256, 512, 1024} and c = 0.5.
The ciphertext range is $\{0,1\}^{\mu}$ and we consider μ = 3λ for the Hyper OPE scheme and μ = 10λ for
the Poly OPE scheme. The experiments are run on a 2.50GHz Intel Core 2 Duo Processor. Table 5.1
shows the execution time in milliseconds for Hyper and Poly to encrypt a single critical data item of λ bits.
As can be seen, the Hyper OPE scheme is far more expensive than the Poly OPE scheme. In
the Hyper OPE scheme, the process for realizing the hypergeometric random variable is very time
consuming. The Poly OPE scheme evaluates the randomly selected polynomial with the
plaintext as input, which is much less time consuming than Hyper OPE. But the Hyper OPE scheme
can be proved to achieve one-wayness security, while there is no security proof for the Poly OPE
scheme.
Table 5.1. Performance of Hyper and Poly OPE schemes (execution time in milliseconds; a dash marks an entry not reported).
λ       Hyper OPE        Poly OPE
8       20.37022         0.0003
16      4965.81013       0.0007
24      520073.44982     0.0008
32      -                0.0010
64      -                0.0027
128     -                0.0077
256     -                0.0261
512     -                0.0929
1024    -                0.3710
Now we compare the performance of the basic-DOPE and the OE-DOPE protocols
integrated with Hyper and Poly OPE schemes. In the two request communication protocols, the
request is sent from the user to the KA and then to the DB. To factor in the communication
latencies between the system entities, we allocate the user, the key agents and the DB to different
PlanetLab [57] computers and measure the communication latencies between them. The user is in
Dallas and the DB is in Los Angeles. The basic-DOPE protocol requires p key agents and we
choose p = 4 (make λ divisible by p). The four key agents are allocated to Phoenix (Arizona), Salt
Lake City (Utah), Carson City (Nevada), and Eugene (Oregon). The OE-DOPE protocol requires
p*q key agents. We use the same p value and consider q = 2. The 4 additional key agents (out of
8) are allocated in the same city as the other key agents in the same chain. The request message
without the critical data is of size 170 bytes (based on the average of the sizes of some common
queries). The critical data size is λ bits.
For comparison purpose, we also consider the “No Encryption” (NE) request
communication protocol, the Hyper request communication protocol, and the Poly request
communication protocol. In NE, the user directly sends the query (with the critical data in
plaintext) to DB. In Hyper/Poly, the user knows the master OPE key, encrypts the
confidential data using the Hyper/Poly OPE scheme with the master key, and sends the ciphertext
directly to DB. Table 5.2 shows the performance comparisons (in milliseconds) of the NE,
Hyper, basic-DOPE and OE-DOPE protocols using the Hyper OPE scheme. Table 5.3 shows the
performance comparisons (in milliseconds) of the NE, Poly, and the basic-DOPE and OE-DOPE
protocols using the Poly OPE scheme.
Table 5.2. Comparisons of the basic-DOPE protocol and OE-DOPE protocol with Hyper OPE
scheme (in milliseconds; a dash marks an entry not reported).
λ      NE       Hyper      basic-DOPE + Hyper    OE-DOPE + Hyper
8      85.87    106.24     506.06                7718.90
16     85.92    5051.73    1525.28               317766.88
32     86.03    -          20537.11              9.19E+07
64     86.23    -          4965977.56            -
96     86.44    -          5.2E+08               -
As shown in Tables 5.2 and 5.3, the OE-DOPE protocol is more expensive than the basic-DOPE
protocol. This is because: (1) There are extra random data to be transmitted and encrypted
in the OE-DOPE protocol to facilitate oblivious encryption. (2) After encryption by one key
agent, the ciphertext grows. The longer the chain, the larger the ciphertext becomes.
(For example, the Poly OPE scheme takes 0.0028 milliseconds to encrypt a 64-bit plaintext, but after
encryption the size of the ciphertext is 640 bits. It then takes 0.16 milliseconds to encrypt
the 640-bit ciphertext, which generates a new ciphertext of 6400 bits.) But the OE-DOPE protocol
achieves a higher security level than the basic-DOPE protocol. Compared with the baseline NE
protocol, the basic-DOPE protocol using the Hyper OPE scheme is over 200 times slower for λ = 32,
and the basic-DOPE protocol using the Poly OPE scheme is roughly 2 times slower for any λ, 8 ≤ λ ≤
1024. The Poly protocol has a similar performance to that of the NE protocol since the
encryption time of Poly OPE is very small for 8 ≤ λ ≤ 1024.
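The ciphertext growth along a chain follows directly from the expansion factor μ/λ of the underlying scheme. A minimal sketch (the helper name chain_widths is hypothetical) reproduces the 64 → 640 → 6400 bit example for Poly (μ = 10λ) and the corresponding widths for Hyper (μ = 3λ):

```python
def chain_widths(lam_bits, expansion, q):
    """Bit width of the data before the chain and after each of the q key agents."""
    widths = [lam_bits]
    for _ in range(q):
        widths.append(widths[-1] * expansion)
    return widths

poly_widths = chain_widths(64, 10, 2)       # Poly: mu = 10 * lambda per stage
hyper_widths = chain_widths(64, 3, 2)       # Hyper: mu = 3 * lambda per stage
```

Since each stage multiplies the width by a constant, the input size of the last agent grows exponentially in q, which is consistent with the sharp increase of the measured times as q grows.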
Table 5.3. Comparisons of the basic-DOPE protocol and OE-DOPE protocol with Poly OPE
scheme (in milliseconds).
λ       NE       Poly     basic-DOPE + Poly    OE-DOPE + Poly
8       85.87    85.87    166.62               194.35
16      85.92    85.92    167.03               200.88
32      86.03    86.03    167.83               214.18
64      86.23    86.23    169.32               239.34
96      86.44    86.44    170.73               262.81
128     86.64    86.64    172.05               285.09
256     87.43    87.45    176.79               366.91
512     88.92    89.01    184.65               513.02
1024    91.62    91.99    197.23               786.72
Note that in OE-DOPE, the key agents are logically deployed in p rows and q columns. We
study the influence of q on the performance of the OE-DOPE protocol. We set p = 4 and vary q
from 2 to 4. The p∙q key agents are physically allocated to Phoenix (Arizona), Salt Lake City
(Utah), Carson City (Nevada), and Eugene (Oregon). Key agents in the same row are allocated in
the same city. The user is still in Dallas and the DB is still in Los Angeles. The request message
without the critical data is of size 170 bytes and the critical data is of size λ bits (we vary λ in the
experiments). The performance results (in milliseconds) of the OE-DOPE protocol using the Hyper
OPE scheme (denoted by OHyper) and the Poly OPE scheme (denoted by OPoly) are given in Table
5.4.
Table 5.4. Performance of the OE-DOPE protocol using Hyper OPE scheme/Poly OPE scheme
for different q (in milliseconds; a dash marks an entry not reported).
λ       OHyper q=2    OHyper q=3    OPoly q=2    OPoly q=3    OPoly q=4
8       7718.9        48776.23      194.35       238.30       402.90
16      317766.88     -             200.88       266.33       516.80
32      9.19E+07      -             214.18       316.26       720.34
64      -             -             239.34       404.95       1183.25
96      -             -             262.81       488.20       1960.84
128     -             -             285.09       580.90       3091.75
256     -             -             366.91       1053.45      11920.64
512     -             -             513.02       1849.54      57943.90
1024    -             -             786.72       5411.71      313460.75
As can be seen, the execution time for the OE-DOPE protocol with the Hyper OPE or Poly
OPE scheme increases with increasing q. The impact of q is more significant for larger λ. With q
= 4 and a 32-bit critical data item, the OE-DOPE protocol with the Poly OPE scheme takes about
0.7 seconds (Table 5.4), which is an acceptable performance.
5.9 Summary
In this chapter, we first studied the security of the OPE schemes under the known plaintext
attack model, where the adversary knows a set of h random plaintext ciphertext pairs, and then is
given a ciphertext (called challenge) to compromise. An encryption scheme is said to achieve the
one-wayness security if the probability for any PPT adversary to fully recover the challenge is
negligible. We show that the ideal OPE object achieves one-wayness security, i.e.,
although the adversary may retrieve some information about the challenge, the probability for the
adversary to fully recover the challenge is a negligible function of $\lambda = \log m$ if the number h of
known plaintext-ciphertext pairs satisfies $h = o(m^{\varepsilon})$, $0 < \varepsilon < 1$, and $n \ge m^3$. In the security proof
(in the appendices), we analyze the expected number of bits $z_h$ of the plaintext remaining secret
from the adversary against known plaintext attacks. $z_h$ can be formulated by the average
min-entropy [22, 25]. First, we derived an upper bound on $z_h$ for any OPE scheme against a known
plaintext attack. Then, we derived a lower bound on $z_h$ for the ideal OPE object. These two
inequalities bound the security that the ideal OPE object can achieve, and indicate the
one-wayness security of the ideal OPE object.
Then, we develop two novel protocols to extend existing OPE schemes for multi-user data-
centric systems. Users can encrypt their secret data using our OPE protocols without knowing
the OPE encryption key. Also, we develop a simple and effective response protocol to allow
efficient delivery of secret data in the response to the user. Our protocols are general and can be
used with any OPE scheme. We have proved their correctness and security. We have also studied
their performance and the results show that the protocols have a fairly reasonable overhead when
the underlying OPE scheme is relatively efficient.
CHAPTER 6
PREFIX-PRESERVING ENCRYPTION
A prefix-preserving encryption (PPE) scheme is a deterministic symmetric-key encryption scheme.
The ciphertexts of a PPE preserve the prefix of the plaintexts, i.e., the longest common prefix of
any two ciphertexts is of the same length as the longest common prefix of the corresponding
plaintexts. Such prefix preserving property enables PPE to support prefix based computations,
such as computation on anonymized IP addresses [78], prefix-matching search [4], and range
search [48].
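The prefix-preserving property can be illustrated with a toy construction in which each ciphertext bit is the plaintext bit XORed with a keyed function of the preceding plaintext prefix. This is only a sketch: using HMAC-SHA256 as the keyed function is an assumption made for the example, and the names ppe_encrypt and lcp_len are hypothetical; it is not presented as the scheme of any cited work.

```python
import hashlib
import hmac

def ppe_encrypt(bits: str, key: bytes) -> str:
    """Toy PPE on a bit string: c_i = p_i XOR f_key(p_0 ... p_{i-1})."""
    out = []
    for i, b in enumerate(bits):
        prefix = bits[:i]
        # One pseudorandom pad bit derived from the key and the preceding prefix.
        pad = hmac.new(key, prefix.encode(), hashlib.sha256).digest()[0] & 1
        out.append(str(int(b) ^ pad))
    return "".join(out)

def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

key = b"demo-key"
a, b = "11000000", "11010101"
ca, cb = ppe_encrypt(a, key), ppe_encrypt(b, key)
```

Two plaintexts that agree on their first k bits receive the same pads for those bits and for bit k, so their ciphertexts agree on exactly the first k bits and differ at bit k, which is the property stated above.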
The security of PPE is weakened since some prefix information of plaintexts is leaked
from ciphertexts. But existing works do not offer sufficient security analysis of the PPE schemes:
either they prove the security against the author-defined attacks, or they illustrate the security
based on experiments. Moreover, the security proofs in [4, 78] are incomplete because they prove
that the real PPE schemes are computationally indistinguishable from the ideal PPE object (a
special PPE) whose security is unknown. If the security of the ideal object is unacceptable, then
the proof of indistinguishability between the real scheme and the ideal object is not very
indicative in security assurance.
In this chapter we first develop a novel mechanism to analyze the security of PPE. We
follow the same approach as that in [12] to seek a necessarily and sufficiently weakened security
notion to qualify the security of the ideal PPE object defined in Section 6.1. First, we prove that
no PPE scheme is secure under IND-CPA by designing a DLLCP attack (let the adversary query
two plaintext pairs with different lengths of longest common prefix strings) in Section 6.2. Then
we weaken the security notion from IND-CPA to IND-PCPA (indistinguishability under the
prefixed chosen-plaintext attack) and prove that (1) such weakened security notion is necessary
(otherwise the DLLCP attack will be successful), and (2) the ideal PPE object is secure under
IND-PCPA in Section 6.2. From (1) and (2), we conclude that the security notion IND-PCPA
exactly qualifies the security of the ideal PPE object.
In Section 6.3, we develop a novel distributed PPE algorithm based on the PPE algorithm
constructed in [78], and extend PPE to multi-user systems based on the distributed PPE
algorithm by using multiple key agents. Consider a PPE system consisting of a server DB
hosting data encrypted by PPE using a master encryption key k. Assume that a user sends a query
to DB which contains a confidential data x. In our PPE protocol, k is secret shared and
distributed to the group of key agents. The user secret shares its confidential data x and passes
the shares to the key agents. The key agents then “distributedly” encrypt the data shares into
cipher shares, which in turn, are assembled into the ciphertext by the DB. We formally prove the
security of our protocol by defining an ideal model for PPE protocols and showing that our PPE
protocol is computationally indistinguishable from the ideal model. We also conduct experiments
to study the performance of the protocol, showing that it has a reasonable overhead.
Experimental studies and performance results for our multi-user PPE protocol are presented in
Section 6.4. Finally, we summarize this chapter in Section 6.5.
6.1 Ideal PPE Object
The ideal PPE object is a special PPE such that the encryption function is uniformly
randomly selected from all the prefix-preserving functions defined as follows.
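Definition 6.1.1 (Ideal PPE Object): We say that $\mathcal{S}^* = (\mathcal{K}^*, \mathcal{E}^*, \mathcal{D}^*)$ is the ideal PPE object
if
- $\mathcal{K}^*$ uniformly randomly selects $f \in F^{PPE}_{\{0,1\}^l,\{0,1\}^l} \triangleq \{ g: \{0,1\}^l \to \{0,1\}^l \mid |LCP(x_1,x_2)| = |LCP(g(x_1),g(x_2))|, \forall x_1,x_2 \in \{0,1\}^l \}$;
- $\mathcal{E}^*$ encrypts x to $f(x)$;
- $\mathcal{D}^*$ decrypts y to $f^{-1}(y)$.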
In Lemma 6.1.1, we (1) prove that the prefix-preserving functions are invertible and,
hence, the ciphertexts of the ideal PPE can be decrypted, and (2) compute the cardinality of
$F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$, which will be used to prove the equivalence of the prefix-preserving function and the
tree-based function in Proposition 8.2.1.
Lemma 6.1.1: f is a bijection for any $f \in F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$, and $|F^{PPE}_{\{0,1\}^l,\{0,1\}^l}| = 2^{2^l-1}$.
Proof. For $f \in F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$, since the domain and range of f have the same (finite)
cardinality, it suffices to prove that f is injective. Assume that $f(x_1) = f(x_2)$. Then $|LCP(x_1,x_2)| =
|LCP(f(x_1),f(x_2))| = l$. Hence $x_1 = x_2$.
Let N(l) denote the number of prefix-preserving functions with domain and range $\{0,1\}^l$.
For l = 1 there are two prefix-preserving functions: f(0) = 0 and f(1) = 1, or f(0) = 1 and
f(1) = 0. Thus, N(1) = 2.
Let $f_1$ and $f_2$ denote any two prefix-preserving functions with domain and range $\{0,1\}^{l-1}$.
Then they can be used to construct the prefix-preserving functions f and g with domain and range
$\{0,1\}^l$. For $x \in \{0,1\}^l$, let $x = x_1 \cdots x_l$ where $x_i \in \{0,1\}$, $1 \le i \le l$. We define
$$f(x_1 \cdots x_l) \triangleq \begin{cases} 0\,f_1(x_2 \cdots x_l) & \text{if } x_1 = 0 \\ 1\,f_2(x_2 \cdots x_l) & \text{if } x_1 = 1 \end{cases} \quad \text{and} \quad g(x_1 \cdots x_l) \triangleq \begin{cases} 1\,f_1(x_2 \cdots x_l) & \text{if } x_1 = 0 \\ 0\,f_2(x_2 \cdots x_l) & \text{if } x_1 = 1 \end{cases}$$
It can be verified that f and g are different prefix-preserving functions and any prefix-preserving
function with domain and range $\{0,1\}^l$ must agree with the form of f or g. Hence, $N(l) =
2N(l-1)^2$. Writing $N(l) = 2^{a(l)}$ turns this into $a(l) = 2a(l-1) + 1$ with $a(1) = 1$, so $a(l) = 2^l - 1$.
We therefore obtain the closed form $N(l) = 2^{2^l-1}$ from the established
equations $N(l) = 2N(l-1)^2$ and $N(1) = 2$.
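The count in Lemma 6.1.1 can be checked by brute force for very small l. The following is a minimal sketch, not part of the original proof; the function names are illustrative. It relies on the first part of the lemma (every prefix-preserving function is a bijection), so it only enumerates permutations of $\{0,1\}^l$.

```python
from itertools import permutations, product


def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two bit strings."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def count_prefix_preserving(l: int) -> int:
    """Count bijections on {0,1}^l that preserve every pairwise LCP length."""
    domain = ["".join(bits) for bits in product("01", repeat=l)]
    count = 0
    # By Lemma 6.1.1 every prefix-preserving function is a bijection,
    # so enumerating permutations of the domain is sufficient.
    for image in permutations(domain):
        f = dict(zip(domain, image))
        if all(
            lcp_len(x1, x2) == lcp_len(f[x1], f[x2])
            for i, x1 in enumerate(domain)
            for x2 in domain[i + 1:]
        ):
            count += 1
    return count


if __name__ == "__main__":
    for l in (1, 2, 3):
        # Second column: brute-force count; third column: 2^(2^l - 1).
        print(l, count_prefix_preserving(l), 2 ** (2 ** l - 1))
```

The enumeration grows as $(2^l)!$, so it is only practical for l up to 3; it is meant as a sanity check of the recurrence, not as a way to work with realistic plaintext lengths.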
Now we present the formal definition of the ideal PPE object in Definition 6.1.2.
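Definition 6.1.2 (Ideal PPE Object): We say that $\mathcal{S}^* = (\mathcal{K}^*, \mathcal{E}^*, \mathcal{D}^*)$ is the ideal PPE object
if
- $\mathcal{K}^*$ uniformly randomly selects $f \in F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$;
- $\mathcal{E}^*$ encrypts x to $f(x)$;
- $\mathcal{D}^*$ decrypts y to $f^{-1}(y)$.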
Remark 6.1.1: The ideal PPE object is computationally infeasible since it involves
choosing f uniformly randomly from the set $F^{PPE}_{\{0,1\}^\lambda,\{0,1\}^\lambda}$, whose size is on the order of $2^{2^\lambda-1}$. In [4],
the authors construct a real PPE scheme $\mathcal{S} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$, and prove that the real PPE scheme is
computationally indistinguishable from the ideal PPE object.
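For experimentation, the ideal object can still be simulated on the inputs actually queried by sampling f lazily. The sketch below assumes the tree-based view of prefix-preserving functions mentioned above, in which output bit i equals input bit i XOR a bit that depends only on the first i−1 input bits. The class name `LazyIdealPPE` is hypothetical. It is a stateful simulator, not a keyed scheme, and it is not the construction of [4].

```python
import secrets


class LazyIdealPPE:
    """Lazily sampled prefix-preserving bijection on l-bit strings (illustrative)."""

    def __init__(self, l: int):
        self.l = l
        self.flip = {}  # prefix -> random flip bit, sampled on first use

    def _bit(self, prefix: str) -> int:
        # One independent uniform bit per prefix (tree node).
        if prefix not in self.flip:
            self.flip[prefix] = secrets.randbits(1)
        return self.flip[prefix]

    def encrypt(self, x: str) -> str:
        assert len(x) == self.l and set(x) <= {"0", "1"}
        # y_i = x_i XOR flip(x_1 ... x_{i-1})
        return "".join(str(int(x[i]) ^ self._bit(x[:i])) for i in range(self.l))

    def decrypt(self, y: str) -> str:
        assert len(y) == self.l and set(y) <= {"0", "1"}
        x = ""
        for i in range(self.l):
            # The plaintext prefix recovered so far selects the flip bit.
            x += str(int(y[i]) ^ self._bit(x))
        return x
```

Only the tree nodes visited by queries are ever sampled, so the memory cost grows with the number of distinct prefixes seen rather than with $2^l$.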
6.2 Security of PPE
Existing cryptographic security proofs for PPE schemes only reduce the security of real
PPE schemes to the security of the ideal PPE object by showing that they are computationally
indistinguishable. However it is not a complete security proof since the security of the ideal PPE
object is unknown and there has been no security analysis in the literature to show its security
level. In this section, we complete the existing security proof by developing a security notion
IND-PCPA and showing that it exactly qualifies the ideal PPE object.
IND-CPA is a well established security notion in cryptography. However, PPE schemes
require the ciphertexts to preserve the prefix of the plaintexts and cannot be qualified by IND-
CPA. Consider the following DLLCP (differentiated length of longest common prefix) attack
against the PPE scheme $\mathcal{S} = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ with respect to IND-CPA in Figure 6.1.
Figure 6.1. The DLLCP attack.
(1) In the experiment $\mathbf{EXP}^{IND-CPA-b}_{\mathcal{S},\mathcal{A}}$, $\mathcal{A}$ chooses the set of plaintext pairs
    $\{ (x_1^0, x_1^1), (x_2^0, x_2^1) \mid |LCP(x_1^0, x_2^0)| \ne |LCP(x_1^1, x_2^1)| \}$
    and sends it to $\mathcal{LR}$;
(2) $\mathcal{LR}$ computes the set of ciphertexts $\{ \mathcal{E}(x_i^b, k) \}_{1 \le i \le 2}$, and sends it back to $\mathcal{A}$;
(3) Finally $\mathcal{A}$ outputs $b' = 0$ if $|LCP(x_1^0, x_2^0)| = |LCP(\mathcal{E}(x_1^b, k), \mathcal{E}(x_2^b, k))|$, and $b' = 1$ otherwise.
In the DLLCP attack (shown in Figure 6.1), the adversary $\mathcal{A}$ queries $(x_1^0, x_1^1)$ and $(x_2^0, x_2^1)$,
where $|LCP(x_1^0, x_2^0)| \ne |LCP(x_1^1, x_2^1)|$. If b = 0, $x_1^0$ and $x_2^0$ will be encrypted; if b = 1, $x_1^1$ and $x_2^1$ will
be encrypted. Since PPE preserves prefix, the adversary can distinguish whether the plaintexts
are $x_1^0$ and $x_2^0$ or $x_1^1$ and $x_2^1$ by comparing $|LCP(y_1, y_2)|$ with $|LCP(x_1^0, x_2^0)|$ and $|LCP(x_1^1, x_2^1)|$, where
$y_1$ and $y_2$ are the ciphertexts returned by the encryption oracle. If $|LCP(y_1, y_2)| = |LCP(x_1^0, x_2^0)|$, then
the plaintexts are $x_1^0$ and $x_2^0$ and, hence, b = 0. If $|LCP(y_1, y_2)| = |LCP(x_1^1, x_2^1)|$, then the plaintexts
are $x_1^1$ and $x_2^1$ and, hence, b = 1. Thus, the advantage of the adversary $\mathcal{A}$ is 1. We summarize the
conclusion in the following lemma.
Lemma 6.2.1: PPE is not secure under IND-CPA.
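The attack can be reproduced against any concrete prefix-preserving scheme. The sketch below is illustrative only: `ppe_encrypt` is a toy stand-in (an HMAC-derived flip bit per prefix), not one of the schemes in [4, 78], and the two query pairs are hypothetical choices satisfying the DLLCP condition.

```python
import hashlib
import hmac
import secrets


def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two bit strings."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def ppe_encrypt(key: bytes, x: str) -> str:
    """Toy PPE: flip bit i with a pseudorandom bit derived from the prefix x[:i]."""
    out = []
    for i, bit in enumerate(x):
        pad = hmac.new(key, x[:i].encode(), hashlib.sha256).digest()[0] & 1
        out.append(str(int(bit) ^ pad))
    return "".join(out)


def dllcp_experiment(b: int) -> int:
    """Run the left-or-right experiment with hidden bit b; return the guess b'."""
    key = secrets.token_bytes(16)
    # Queries (x_i^0, x_i^1): |LCP| of the left plaintexts is 3, of the right ones 0.
    q1 = ("0000", "0000")
    q2 = ("0001", "1000")
    # The left-or-right oracle encrypts component b of each pair.
    y1 = ppe_encrypt(key, q1[b])
    y2 = ppe_encrypt(key, q2[b])
    # Step (3) of Figure 6.1: compare LCP lengths.
    return 0 if lcp_len(y1, y2) == lcp_len(q1[0], q2[0]) else 1


if __name__ == "__main__":
    wins = sum(dllcp_experiment(b) == b for b in (0, 1) for _ in range(100))
    print(f"adversary guessed correctly in {wins} of 200 runs")
```

Because the toy scheme preserves LCP lengths exactly, the guess is correct in every run regardless of the key.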
In [12], it has been shown that an OPE scheme does not satisfy IND-CPA and a weakened
security notion IND-OCPA has been defined to qualify the security of OPE schemes (though no
OPE schemes satisfy IND-OCPA either and no security notion has been found to properly
qualify OPE yet). Inspired by this approach, we define a weakened security notion IND-PCPA to
qualify the security of PPE schemes. According to the DLLCP attack, the adversary should only
be allowed to query the plaintext pairs in the set
$$PPP_h \triangleq \{ (x_i^0, x_i^1) \in \{0,1\}^l \times \{0,1\}^l, 1 \le i \le h \mid |LCP(x_u^0, x_v^0)| = |LCP(x_u^1, x_v^1)|, 1 \le u, v \le h \}.$$
Accordingly, we define the security notion IND-PCPA (indistinguishability under prefixed
chosen-plaintext attack) in Definition 6.2.1.
Definition 6.2.1 (IND-PCPA): IND-PCPA has the same definition as that of IND-CPA
except that the adversary is only allowed to query the prefixed plaintext pairs in the set PPPh.
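The restriction in Definition 6.2.1 is easy to check mechanically. As a small illustration (the helper name `in_ppp` is hypothetical), a left-or-right oracle for IND-PCPA could validate a batch of h query pairs as follows:

```python
def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two bit strings."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def in_ppp(pairs) -> bool:
    """True if the (x^0, x^1) pairs satisfy the PPP_h condition.

    For every u, v the LCP length of the left plaintexts must equal
    the LCP length of the right plaintexts.
    """
    if any(len(x0) != len(x1) for x0, x1 in pairs):
        return False
    return all(
        lcp_len(pairs[u][0], pairs[v][0]) == lcp_len(pairs[u][1], pairs[v][1])
        for u in range(len(pairs))
        for v in range(u + 1, len(pairs))
    )


if __name__ == "__main__":
    # Allowed: both worlds have LCP length 3 between the two plaintexts.
    print(in_ppp([("0000", "1111"), ("0001", "1110")]))
    # Rejected: this is a DLLCP query (LCP length 3 versus 0).
    print(in_ppp([("0000", "0000"), ("0001", "1000")]))
```

The DLLCP queries are exactly the ones this check rejects, which is why the attack no longer applies under IND-PCPA.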
It is obvious that IND-PCPA is the necessarily weakened security notion (with respect to
indistinguishability and left-or-right encryption oracle) for PPE. We show that it is also the
sufficiently weakened security notion for PPE by proving in Theorem 6.2.2 that the ideal PPE
object is secure under IND-PCPA, where the proof of the theorem is relegated to the appendices.
Therefore, the real PPE schemes that are computationally indistinguishable from the ideal PPE
object are also secure under IND-PCPA.
Theorem 6.2.2: The ideal PPE object 𝒮* is secure under IND-PCPA.
6.3 PPE for Multi-user Systems
In this section we develop a security-enhanced protocol to support PPE in multi-user
systems. The multi-user system we consider consists of a single server hosting a database and a
set of users. Let DB denote the server and $U = \{U_j \mid j \ge 1\}$ denote the set of users. The system
operations consist of a request protocol Q, in which a user $U_i$ issues a request (query) q to DB,
where 𝑞 may contain some secret data x that needs to be transmitted with 𝑞 to DB, and a
response protocol 𝑃, in which the DB sends back the response r to the user, where r may include
a returned data object y in encrypted form. Note that a request or a response may include
multiple data objects, but the processing will be the same. For simplicity, we assume that there is
only one secret data item in 𝑞 or r.
The PPE protocol should guarantee functionality requirements including: (1) When q
reaches DB, x should have been encrypted by the PPE using the key k; (2) When r is returned to
the user, the user should be able to obtain the plaintext y of r. The protocol should also satisfy
some security requirements, such as no entity in the system should have the knowledge of the
key k and x should be protected against all system entities but the owner. To avoid informal
security descriptions and facilitate formal security proofs, we define the security requirements
via an ideal model, in which (1) encryption and decryption are performed by a trusted party TP
who holds the key k (key agents replaced by TP); (2) the communication channels between TP
and users/DB are secure. The ideal model implies the highest security level that the real PPE
protocol (including Q and P) can achieve and we will prove that the real protocol is “equivalent”
to the ideal model.
The system entities, users, DB, and key agents, may collude to acquire additional
information. We unify the possible collusions and construct a passive adversary 𝒜 who tries to
gain extra information by compromising some entities in the system. We assume that the key
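agents and DB are better protected than the users, and the adversary can compromise fewer than t
key agents ($t \le \frac{m}{2} + 1$) simultaneously. Thus, the adversary structure is defined as
$$\mathcal{Z} = \{ U_\mathcal{A} \cup KA_\mathcal{A},\; U_\mathcal{A} \cup KA_\mathcal{A} \cup \{DB\} \mid U_\mathcal{A} \subseteq U,\; |KA_\mathcal{A}| < t \le \tfrac{m}{2} + 1 \},$$
where $U_\mathcal{A}$ is the set of compromised users and $KA_\mathcal{A}$ is the set of compromised key agents (note
that $U_\mathcal{A}$ and $KA_\mathcal{A}$ could be empty).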
In Subsection 6.3.1, we introduce the general system design. Then, we discuss the response
and request protocols in Subsections 6.3.2 and 6.3.3, respectively. At the end of Subsection 6.3.3,
we prove that our PPE protocol achieves the functionality requirements, formally define the
security requirement, and prove that our PPE protocol achieves it.
6.3.1 General System Design
For convenience, we assume that there are only numerical data in DB. Data of other types
can be represented by numerical data easily. For each critical data item x, the DB maintains the
ciphertexts $(C_{PPE}(x), C_{CE}(x))$. $C_{PPE}(x)$ is encrypted using a PPE scheme with a master key k.
$C_{CE}(x)$ is encrypted using a classical encryption scheme (e.g., AES). The purpose of storing
$C_{CE}(x)$ is to support efficient transmission of responses. For each data item x, a different data key
$dk_x$ is used to generate $C_{CE}(x)$. A user with access privilege to data item x will be granted key
$dk_x$. In a real implementation, the data items with the same access privileges can be grouped
together into an access domain and only one key is needed for each access domain. For example,
if data items x and y can be accessed by exactly the same set of users, then x and y can be in the
same access domain, i.e., we can have $dk_x = dk_y$.
Request protocol Q should transfer q to DB while ensuring the correct and secure
generation of $C_{PPE}(x)$ and $C_{CE}(x)$ in the request transmission process. Note that $U_i$ can encrypt
x using $dk_x$ and obtain $C_{CE}(x)$. But since $U_i$ does not have the PPE master key k, it is not
possible for $U_i$ to compute $C_{PPE}(x)$. Thus, a set of key agents (KA) is introduced to perform
PPE encryption in the request protocol. Let $KA = \{KA_j \mid 1 \le j \le m\}$ denote the set of key
agents. The key k is shared among the key agents such that no single entity in the system knows
the master encryption key k. The user shares x and sends the shares to the key agents in KA. The
key agents distributedly encrypt the shares of x with the shares of k and send the encrypted
shares to DB. DB reconstructs the ciphertext $C_{PPE}(x)$ from the encrypted shares. The “distributed”
encryption process is similar to the decryption process in threshold public-key cryptosystems
[20, 21, 29, 30, 56, 63]. For the response protocol P, we use a simple but innovative design to
achieve efficiency without going through the key agents.
6.3.2 Response Protocol
In the response protocol P, the DB simply includes $C_{CE}(y_1), C_{CE}(y_2), \cdots, C_{CE}(y_t)$ in r. The user
should have access rights to $y_1, y_2, \cdots, y_t$ and, hence, should have the keys to decrypt
the data items in r. Consider the security of the system against adversary $\mathcal{A}$, assuming that $\mathcal{A}$ has
not compromised the users in $U_{y_1}, \cdots, U_{y_t}$, where $U_{y_j}$ is the set of users who can access $y_j$. Since
the protocol only transfers $C_{CE}(y_1), C_{CE}(y_2), \cdots, C_{CE}(y_t)$, $\mathcal{A}$ cannot get the encryption keys and
cannot compromise $y_1, y_2, \cdots, y_t$. Note that the design of P is fully discussed here and will not
be discussed further.
6.3.3 Request Protocol
We design the request protocol Q, which consists of the distributed PPE protocol $P_{\mathcal{E}_d}$ and
the reduction algorithm RA. In $P_{\mathcal{E}_d}$, the key agents “distributedly” evaluate the PPE algorithm $\mathcal{E}_d$ and the DB
assembles the result shares into the intermediate ciphertext z. In RA, z is reduced (in size) to the
single-bit ciphertext y (one bit per block) based on a mapping function f. In the following subsections, we introduce
the primitives used in the protocol and discuss the details of $P_{\mathcal{E}_d}$ and RA.
Primitives
Here we introduce the primitives used for constructing $P_{\mathcal{E}_d}$, including the secret sharing
algorithm Π and reconstructing algorithm Re over $Z_p$ where p is a prime number, another secret
sharing algorithm Π′ and reconstructing algorithm Re′ over a multiplicative group G in which
the decisional Diffie-Hellman (DDH) problem is hard, the hash function H mapping
strings to $Z_p$, and the hash function H′ mapping strings to G.
Let G be a cyclic group where $|G| = p$ and p is a prime number, and let $g \in G$ be a generator.
Without loss of generality, let G be a multiplicative group with the identity 1. We assume that
the decisional Diffie-Hellman (DDH) problem is hard over G, i.e., $(g, g^u, g^v, g^{uv})$ and
$(g, g^u, g^v, g^w)$ are computationally indistinguishable for randomly selected $u, v, w$ from $Z_p$. Let
$H : \{Null\} \cup \{0,1\}^* \to Z_p$ be a cryptographic hash function, where Null denotes the empty
string. We assume that H is a random oracle, and then $R(x, k) \triangleq g^{H(x) k}$ is a pseudorandom
function according to [52]. Let $H' : \{0,1\}^* \to G$ be a cryptographic hash function. Since a
cryptographic hash function should be collision-free, we assume that $H'(0) \ne H'(1)$.
Let Π and Re be the sharing and reconstructing algorithms of the $(t, m)$ threshold secret
sharing scheme over $Z_p$ [63]. For a secret $x \in Z_p$,
$$\Pi(x) = (s_1, \cdots, s_m) \in Z_p^m,$$
where $s_i = f(i)$ and f is a randomly selected polynomial over $Z_p$ with degree $t - 1$ satisfying
$f(0) = x$. The reconstructing algorithm computes
$$Re(s_{i_1}, \cdots, s_{i_n}) = x$$
from any n ($n \ge t$) shares $s_{i_1}, \cdots, s_{i_n} \in \{s_1, \cdots, s_m\}$ by using Lagrange's interpolation
formula. The $(t, m)$ threshold secret sharing scheme can be extended to the group G [21, 52].
Let Π′ and Re′ be the sharing and reconstructing algorithms of the $(t, m)$ threshold secret sharing
scheme over G. For any secret $x' \in G$,
$$\Pi'(x') = (s'_1, \cdots, s'_m) \in G^m,$$
where $s'_i = x' \cdot g^{f'(i)}$ and $f'$ is a randomly selected polynomial over $Z_p$ with degree $t - 1$
satisfying $f'(0) = 0$. The reconstructing algorithm computes
$$Re'(s'_{i_1}, \cdots, s'_{i_n}) = x'$$
from any n ($n \ge t$) shares $s'_{i_1}, \cdots, s'_{i_n} \in \{s'_1, \cdots, s'_m\}$ by applying Lagrange's interpolation formula
to the exponents.
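The two sharing schemes can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: the group is a toy Schnorr group (the order-1019 subgroup of $Z_{2039}^*$ with generator 4), far too small for real use, and all function names are hypothetical. `share_zp`/`reconstruct_zp` play the roles of Π/Re, and `share_group`/`reconstruct_group` play the roles of Π′/Re′.

```python
import secrets

# Toy parameters (illustrative only): G is the order-P subgroup of Z_Q^*.
P = 1019        # prime order of G
Q = 2 * P + 1   # 2039, also prime
G_GEN = 4       # a quadratic residue mod Q, hence a generator of G


def rand_poly(const: int, t: int):
    """Random polynomial over Z_P of degree at most t-1 with f(0) = const."""
    return [const % P] + [secrets.randbelow(P) for _ in range(t - 1)]


def eval_poly(coeffs, x: int) -> int:
    """Horner evaluation modulo P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def lagrange_at_zero(idxs):
    """Coefficients lam_j with sum_j lam_j * f(j) = f(0) (mod P)."""
    lams = {}
    for j in idxs:
        num, den = 1, 1
        for i in idxs:
            if i != j:
                num = num * (-i) % P
                den = den * (j - i) % P
        lams[j] = num * pow(den, -1, P) % P
    return lams


def share_zp(x: int, t: int, m: int):
    """Pi: shares s_j = f(j) with f(0) = x."""
    f = rand_poly(x, t)
    return {j: eval_poly(f, j) for j in range(1, m + 1)}


def reconstruct_zp(shares):
    """Re: Lagrange interpolation at 0 over Z_P."""
    lams = lagrange_at_zero(list(shares))
    return sum(lams[j] * s for j, s in shares.items()) % P


def share_group(x: int, t: int, m: int):
    """Pi': shares s'_j = x * g^{f'(j)} with f'(0) = 0, for x in G."""
    f = rand_poly(0, t)
    return {j: x * pow(G_GEN, eval_poly(f, j), Q) % Q for j in range(1, m + 1)}


def reconstruct_group(shares):
    """Re': Lagrange interpolation applied to the exponents."""
    lams = lagrange_at_zero(list(shares))
    z = 1
    for j, s in shares.items():
        z = z * pow(s, lams[j], Q) % Q
    return z
```

For example, `reconstruct_zp` applied to any t of the m shares returned by `share_zp(x, t, m)` gives back x, and likewise for the group variant as long as the secret lies in G.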
Protocol $P_{\mathcal{E}_d}$
Suppose that the master encryption key $k \in Z_p$ is shared by Π, i.e., $\Pi(k) = (k_1, \cdots, k_m)$,
and the share $k_i$ is distributed to $KA_i$ for $1 \le i \le m$. We describe the protocol $P_{\mathcal{E}_d}$ to
“distributedly” evaluate the PPE algorithm $\mathcal{E}_d$ in Figure 6.2. It encrypts the plaintext $x =
x_1 \cdots x_l$ to the intermediate ciphertext $z = z_1 \cdots z_l$.
As shown in Figure 6.2, for plaintext $x = x_1 \cdots x_l$, the user shares $H'(x_i)$ and
$H(x_1, \cdots, x_{i-1})$ to the key agents, the key agents distributedly encrypt them, and the DB assembles the
encrypted shares into an intermediate ciphertext $z = z_1 \cdots z_l$. We prove the correctness of this
protocol in Lemma 6.3.1.
Figure 6.2. The Protocol $P_{\mathcal{E}_d}$.
Goal: distributedly encrypt $x = x_1 \cdots x_l$ to $z = z_1 \cdots z_l$
for i = 1 to l do
    the user shares $H'(x_i)$ and $H(x_1, \ldots, x_{i-1})$ by Π′ and Π, respectively;
    let $\Pi'(H'(x_i)) = (h'_{i1}, \ldots, h'_{im})$ and $\Pi(H(x_1, \ldots, x_{i-1})) = (h_{i1}, \ldots, h_{im})$;
    for j = 1 to m do
        the user sends $(h'_{ij}, h_{ij})$ to $KA_j$;
    end for
end for
for i = 1 to l do
    for j = 1 to m do
        $KA_j$ computes $h''_{ij} = h'_{ij} \cdot g^{h_{ij} \cdot k_j}$ and sends it to the DB;
    end for
end for
for i = 1 to l do
    the DB selects n ($n \ge 2t - 1$) shares $h''_{ij_1}, \ldots, h''_{ij_n}$ and computes $z_i = Re'(h''_{ij_1}, \ldots, h''_{ij_n})$;
end for
the DB retrieves the intermediate ciphertext $z = z_1 \cdots z_l$;
Lemma 6.3.1: The DB retrieves the ciphertext encrypted by $\mathcal{E}_d$ at the end of the distributed
protocol $P_{\mathcal{E}_d}$.
Proof. According to $P_{\mathcal{E}_d}$, $H'(x_i)$ and $H(x_1, \cdots, x_{i-1})$ are shared by the user. Let $h'_{ij} =
H'(x_i) \cdot g^{f'_i(j)}$ be the shares of $H'(x_i)$, where $f'_i$ is a randomly selected polynomial over $Z_p$
with degree $t - 1$ satisfying $f'_i(0) = 0$, $1 \le j \le m$. Let $h_{ij} = f_i(j)$ be the shares of
$H(x_1, \cdots, x_{i-1})$, where $f_i$ is a randomly selected polynomial over $Z_p$ with degree $t - 1$ satisfying
$f_i(0) = H(x_1, \cdots, x_{i-1})$, $1 \le j \le m$. The key agent $KA_j$ will compute
$$h''_{ij} = h'_{ij} \cdot g^{h_{ij} \cdot k_j} = H'(x_i) \cdot g^{f'_i(j)} \cdot g^{f_i(j) \cdot k_j} = g^{\log_g H'(x_i) + f'_i(j) + f_i(j) \cdot k_j}, \quad 1 \le j \le m.$$
Notice that $\log_g H'(x_i) + f'_i(j)$ is the share of $\log_g H'(x_i)$ using a polynomial over $Z_p$ with degree $t - 1$, $1 \le j \le m$.
And $f_i(j) \cdot k_j$ is the share of $H(x_1, \ldots, x_{i-1}) \cdot k$ using a polynomial over $Z_p$ with degree
$2t - 2$, $1 \le j \le m$. Therefore $\log_g H'(x_i) + f'_i(j) + f_i(j) \cdot k_j$ is the share of $\log_g H'(x_i) +
H(x_1, \ldots, x_{i-1}) \cdot k$ using a polynomial over $Z_p$ with degree $2t - 2$, $1 \le j \le m$. Hence the DB
reconstructs
$$z_i = H'(x_i) \cdot R(x_1, \ldots, x_{i-1}, k),$$
where $R(x_1, \cdots, x_{i-1}, k) = g^{H(x_1, \cdots, x_{i-1}) \cdot k}$, by using n ($n \ge 2t - 1$) shares based on
Lagrange's interpolation in the exponents, $1 \le i \le l$.
We show that $\mathcal{E}_d$ preserves prefix and that it is secure under IND-PCPA in Lemma 6.3.2.
Lemma 6.3.2: The encryption $\mathcal{E}_d$ preserves prefix (i.e., two plaintexts share a common
prefix of length i if and only if the ciphertexts share i common pre-blocks). Furthermore, it is secure under
IND-PCPA.
Proof. Since H′ and R are deterministic, if two plaintexts x and x′ share a common prefix of length i,
then the ciphertext of x and the ciphertext of x′ share i common pre-blocks. Furthermore, since
$H'(0) \ne H'(1)$, the (i+1)-th blocks of the two ciphertexts are distinct. Therefore $\mathcal{E}_d$ preserves
prefix. It follows from the proofs in [48, 78] that the PPE algorithm $\mathcal{E}$ is computationally
indistinguishable from the ideal PPE object. Thus it is secure under IND-PCPA according to
Theorem 8.2.4. Comparing the formula for $z_i$ in $\mathcal{E}_d$ with the formula for $y_i$ in $\mathcal{E}$, the only difference is
that $z_i$ is a pseudorandom block and $y_i$ is a pseudorandom bit. Hence $\mathcal{E}_d$ is also secure under
IND-PCPA.
Reduction Algorithm
Since $\mathcal{E}_d$ preserves prefix, the ciphertext $z = \mathcal{E}_d(x) = z_1 \cdots z_l$ can support prefix search
already. But $\mathcal{E}_d$ increases the size of the ciphertext since $z_i$ is a block instead of a bit. This can
impact the search performance significantly. We develop a reduction algorithm RA to reduce the
intermediate ciphertext $z = z_1 \cdots z_l$ to the final ciphertext $y = y_1 \cdots y_l$, in which each $y_i$ ($1 \le i \le l$) is a single bit.
We use a mapping function f to record the mapping between $z_i$ and $y_i$. For a node v, let l(v)
denote the left child node, r(v) denote the right child node, vl(v) denote the edge connecting v
and l(v), and vr(v) denote the edge connecting v and r(v). RA is given in Figure 6.3.
In Figure 6.3, if f has recorded that 𝑧𝑖 has already been mapped to a bit, then the mapping
should be retained. Otherwise, 𝑧𝑖 is mapped to a chosen bit and f records this new mapping.
Lemma 6.3.3 proves the security and prefix preserving properties of the algorithm.
Figure 6.3. The Reduction Algorithm RA.
Goal: reduce the intermediate ciphertext $z = z_1 \cdots z_l$ to the final ciphertext $y = y_1 \cdots y_l$;
Initialization: the mapping function f = null;
v = root; i = 1;
while v is not a leaf node do
    if f(vl(v)) = null and f(vr(v)) = null then
        b ←$ {0,1};
        if b = 0 then f(vl(v)) = $z_i$; $y_i$ = 0; v = l(v);
        else f(vr(v)) = $z_i$; $y_i$ = 1; v = r(v);
        end if
    else if f(vl(v)) ≠ null and f(vr(v)) = null then
        if $z_i$ = f(vl(v)) then $y_i$ = 0; v = l(v);
        else f(vr(v)) = $z_i$; $y_i$ = 1; v = r(v);
        end if
    else if f(vr(v)) ≠ null and f(vl(v)) = null then
        if $z_i$ = f(vr(v)) then $y_i$ = 1; v = r(v);
        else f(vl(v)) = $z_i$; $y_i$ = 0; v = l(v);
        end if
    else
        if $z_i$ = f(vl(v)) then $y_i$ = 0; v = l(v);
        else $y_i$ = 1; v = r(v);
        end if
    end if
    i = i + 1;
end while
return $y = y_1 \cdots y_l$;
Lemma 6.3.3: RA is efficient. Furthermore, the encryption algorithm $RA \circ \mathcal{E}_d$ preserves
prefix and is secure under IND-PCPA.
Proof. In RA, the mapping function f only records the mappings that have appeared at the DB.
Moreover, when mapping $z = z_1 \cdots z_l$ to $y = y_1 \cdots y_l$, RA takes l steps. Hence RA is efficient. For
two intermediate ciphertexts $z = z_1 \cdots z_l$ and $z' = z'_1 \cdots z'_l$, if they share i common pre-blocks,
then it can be verified that RA traverses the same i nodes for z and z′; if $z_{i+1} \ne z'_{i+1}$, then it can be
verified that RA traverses different (i+1)-th nodes for z and z′. Hence the first i blocks of z and z′
will map to the same i bits but the (i+1)-th blocks of z and z′ will map to different bits. Therefore
$RA \circ \mathcal{E}_d$ preserves prefix. The security proof is analogous to that in Lemma 6.3.2.
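A compact way to realize RA is a binary trie kept at the DB, where each edge remembers the block that was mapped to it. The following is a sketch under assumptions, not the author's implementation. The class name is hypothetical, and blocks can be any hashable values. Unlike the last branch of Figure 6.3, it raises an error if a third distinct block ever reaches the same node, which cannot happen for honest $\mathcal{E}_d$ outputs.

```python
import secrets


class ReductionAlgorithm:
    """Maps block sequences z_1..z_l to bit strings y_1..y_l via a trie (illustrative)."""

    def __init__(self):
        # A node is a dict: bit (0 or 1) -> (block recorded on that edge, child node).
        self.root = {}

    def reduce(self, z) -> str:
        node, y = self.root, []
        for block in z:
            # Reuse the edge if this block was already mapped at this node.
            bit = next((b for b, (blk, _) in node.items() if blk == block), None)
            if bit is None:
                if not node:
                    bit = secrets.randbits(1)      # both edges free: choose at random
                elif len(node) == 1:
                    bit = 1 - next(iter(node))     # one edge taken: use the other one
                else:
                    raise ValueError("a node can distinguish at most two blocks")
                node[bit] = (block, {})            # record the new mapping in f
            y.append(bit)
            node = node[bit][1]
        return "".join(map(str, y))
```

Each call walks one root-to-depth-l path, so the cost is l dictionary lookups, and the stored state grows only with the prefixes that have actually appeared at the DB.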
Functionalities and Security Requirements Proofs for the Protocols
In this section, we prove that our PPE protocol satisfies the functionality and security
requirements. In Theorem 6.3.4 we prove that the request protocol Q and response protocol P
satisfy the functionality requirements (1) and (2), respectively.
Theorem 6.3.4: The request protocol Q realizes the functionality requirement (1) and the
response protocol P realizes the functionality requirement (2).
Proof. According to Lemmas 6.3.1, 6.3.2, and 6.3.3, the DB receives the ciphertext of x
encrypted by the PPE $RA \circ \mathcal{E}_d$ in Q. Therefore the request protocol Q realizes the functionality
requirement (1). In P the returned data object y will be encrypted. The recipient user will have
the encryption key and, hence, can decrypt the ciphertext and obtain y. Hence the response
protocol P realizes the functionality requirement (2).
We adopt the security definition for multi-party computation [18, 19] to define the security
requirement for our system, which is based on real model and ideal model defined in Definition
6.3.1.
Definition 6.3.1 (Real Model and Ideal Model): The real model is exactly the request
protocol Q and response protocol P. In the ideal model, there are users, the DB, and a trusted
(incorruptible) party TP who holds the key. There are secure communication channels between
the TP and users/DB. In the ideal model the TP receives/sends the message from/to users/DB,
and does all the encryptions/decryptions needed in the protocols Q and P.
Now we define the security requirement in Definition 6.3.2. Essentially, it requires that the
real model is “equivalent” to the ideal model.
Definition 6.3.2 (Security requirement): Let $VIEW_R(Z)$ be the instance event randomly
selected from the event space of what the adversary $\mathcal{A}$ can observe in the real model by
compromising entities in the set $Z \in \mathcal{Z}$. Let $VIEW_I(Z)$ be the instance event randomly selected
from the event space of what the adversary $\mathcal{A}$ can observe in the ideal model by compromising
the entities in the set $Z - KA$. The real model is secure if the adversary cannot retrieve more
information from the real model than from the ideal model, or equivalently, if there exists a PPT
simulator $\mathcal{S}$ such that $VIEW_R(Z)$ is computationally indistinguishable from $\mathcal{S}(VIEW_I(Z))$, i.e.,
the advantage of $\mathcal{A}$, defined by
$$\mathbf{Adv}_\mathcal{A} \triangleq \Pr[\mathcal{A}(VIEW_R(Z)) = 1] - \Pr[\mathcal{A}(\mathcal{S}(VIEW_I(Z))) = 1],$$
is bounded by a negligible function of the security parameter for any $Z \in \mathcal{Z}$.
We prove that our system achieves the security requirement in Theorem 6.3.5.
Theorem 6.3.5: Our system achieves the security requirement in Definition 6.3.2.
Proof. First we consider the security of Q. In both the real model and the ideal model, the
adversary $\mathcal{A}$ can compromise some users and view the same thing. In the real model $\mathcal{A}$ can
compromise fewer than t key agents, while in the ideal model $\mathcal{A}$ cannot compromise the trusted
party TP. Since the user shares $H'(x_i)$ and $H(x_1, \cdots, x_{i-1})$ to the key agents by using the $(t, m)$
secret sharing schemes and $\mathcal{A}$ compromises fewer than t shares, the view of $\mathcal{A}$ consists of random numbers
and, hence, can be simulated by $\mathcal{S}$. In the real model $\mathcal{A}$ can compromise the DB and view the
intermediate ciphertext $z = z_1 \cdots z_l$, the final ciphertext $y = y_1 \cdots y_l$, and the mapping
function f, while in the ideal model $\mathcal{A}$ can only view the final ciphertext y. The only difference
between the intermediate ciphertext z and the final ciphertext y is that $z_i$ is a random block and $y_i$ is a
random bit. Therefore z can be simulated by $\mathcal{S}$ based on y. The mapping function f can be
simulated accordingly based on z and y. Hence, $VIEW_R$ can be simulated by $\mathcal{S}$ based on $VIEW_I$.
Then we consider the security of P. In P only the users who have access rights to the data
will have the key. Thus, the adversary $\mathcal{A}$ cannot get the encryption keys unless $\mathcal{A}$ compromises
the corresponding users. Therefore P achieves the security requirement because the adversary
cannot obtain more information in P than in the ideal model.
6.4 Performance Study
We conduct experiments to study the performance of the request protocol Q. Specifically,
we study the performance of $\mathcal{E}_d$ since it is the dominant factor. First, we consider the secret
sharing factor in $\mathcal{E}_d$. In $\mathcal{E}_d$, two secret sharing schemes have been used: one is Π and Re over $Z_p$,
and the other is Π′ and Re′ over G. Various groups can be used for G and here we use the
Schnorr group, i.e., G is a multiplicative subgroup of $Z_q^*$, where $|G| = p$ and p is a 256-bit prime
number. We implemented the algorithms and ran them for $10^4$ trials on a PC with a 2.50 GHz Intel
Core 2 Duo processor. The average execution times are shown in Figure 6.4.
As shown in Figure 6.4, Π′ and Re′ have a higher computation cost than Π and Re because
the computation of Π′ and Re′ needs extra group operations. Since Lagrange's interpolation
is linear, the reconstruction algorithm has a lower computation cost than the sharing algorithm,
which requires polynomial evaluation. Both the sharing time and the reconstruction time increase
when the threshold t increases (which is obvious from the sharing and reconstruction approach).
Figure 6.4. Computation Cost of Secret Sharing over Zp and G (Share Number m = 6).
[Two panels, each plotting time (milliseconds) against the threshold t = 2, ..., 6: one for Π and Re (time axis 0 to 0.16 ms), one for Π′ and Re′ (time axis 0 to 2.5 ms).]
To factor in the communication latencies between the system entities, we allocate the user,
the key agents and the DB to different PlanetLab computers and measure the communication
latencies between them. The user is in Dallas and the DB is in Los Angeles. Six key agents (i.e.,
m = 6) are allocated to Phoenix (Arizona), Salt Lake City (Utah), Carson City (Nevada), Eugene
(Oregon), Albuquerque (New Mexico), and Denver (Colorado). Both hash functions H and H′
are SHA-2. We assume that the request message without the critical data is of size 170 bytes
(based on the average size of some common queries). The critical data size is l bits and we set l ∈
{8, 16, 32, 64, 128, 256, 512, 1024}. Since the threshold t should satisfy the condition $t < \frac{m}{2} + 1$ [18,
19], we set t = 2, 3, 4. For comparison purposes, we also consider a “No Encryption” protocol and
a “PPE” protocol. In “No Encryption”, the user directly sends the query (with the critical data) to
the DB without any encryption. In “PPE”, we assume that the user has the encryption key and
encrypts the critical data by the PPE constructed in [79], and then sends the query (with the
encrypted critical data) to the DB. The experimental results are given in Figure 6.5 and
summarized in Table 6.1.
Figure 6.5. Encryption Cost Comparisons for Different Protocols.
[Two panels, each plotting time (milliseconds) against the critical data size l = 8, 16, ..., 1024: one for “No Encryption” and “PPE” (time axis 0 to 700 ms), one for $\mathcal{E}_d$ with t = 2, 3, 4 (time axis 0 to 5000 ms).]
As shown in Figure 6.5 and Table 6.1, the encryption cost of “No Encryption” < the
encryption cost of “PPE” < the encryption cost of $\mathcal{E}_d$ for t = 2 < the encryption cost of $\mathcal{E}_d$ for t =
3 < the encryption cost of $\mathcal{E}_d$ for t = 4. The encryption costs of all protocols increase when l
increases because the length of the critical data increases. “No Encryption” requires
approximately 90 milliseconds, and its time increases slowly when the length of the
critical data increases because it does not incur encryption overhead but only incurs
communication overhead. “PPE” requires 92 milliseconds to 609 milliseconds when l increases
from 8 to 1024. Relatively, $\mathcal{E}_d$ incurs a much higher encryption cost than pure “PPE”: about
3.3 times at l = 8 bits and about 6.6 to 7.5 times (depending on t) at l = 1024 bits. This is because it also
incurs the sharing and reconstructing cost as well as a higher communication cost due to the use
of intermediate key agents. However, a multi-user PPE protocol is essential and the cost is
bearable. When t increases, the computation cost of secret sharing increases and, hence, the
encryption cost of $\mathcal{E}_d$ increases, but the increase is relatively slow. Thus, using additional key
agents to enhance security can be a feasible method.
Table 6.1. Encryption Cost (in milliseconds) Comparisons for Different Protocols.
l (bits)   “No Encryption”   “PPE”     ℰ_d (t=2)   ℰ_d (t=3)   ℰ_d (t=4)
8          88.11             92.14     305.01      306.27      309.71
16         88.16             96.22     373.55      376.05      382.94
32         88.27             104.38    483.21      488.21      502.00
64         88.47             120.70    661.87      671.88      699.46
128        88.88             153.33    960.05      980.07      1035.24
256        89.67             218.56    1471.30     1511.34     1621.67
512        91.16             348.95    2371.91     2451.98     2672.65
1024       93.86             609.45    3999.07     4159.23     4600.57
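The relative overhead of the distributed protocol over plain “PPE” follows directly from Table 6.1. The short script below copies the table values verbatim and computes the ratios; the variable and function names are illustrative.

```python
# Encryption cost in milliseconds, copied from Table 6.1:
# l -> (PPE, E_d with t=2, E_d with t=3, E_d with t=4)
TABLE_6_1 = {
    8: (92.14, 305.01, 306.27, 309.71),
    16: (96.22, 373.55, 376.05, 382.94),
    32: (104.38, 483.21, 488.21, 502.00),
    64: (120.70, 661.87, 671.88, 699.46),
    128: (153.33, 960.05, 980.07, 1035.24),
    256: (218.56, 1471.30, 1511.34, 1621.67),
    512: (348.95, 2371.91, 2451.98, 2672.65),
    1024: (609.45, 3999.07, 4159.23, 4600.57),
}


def overhead(l: int, t: int) -> float:
    """Ratio of the E_d cost (threshold t in {2, 3, 4}) to the plain PPE cost."""
    row = TABLE_6_1[l]
    return row[t - 1] / row[0]


if __name__ == "__main__":
    for l in TABLE_6_1:
        ratios = ", ".join(f"t={t}: {overhead(l, t):.2f}x" for t in (2, 3, 4))
        print(f"l={l:5d}  {ratios}")
```

The ratios grow with l for every t, which matches the observation that the per-bit sharing, communication, and reconstruction steps dominate as the critical data gets longer.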
6.5 Summary
In this chapter, we developed the first complete security proof for PPE by qualifying the
security of the ideal PPE object. We created a new security notion, IND-PCPA, and proved that
the ideal PPE object is secure under IND-PCPA and can at best reach IND-PCPA security.
We also built a protocol and extended an existing PPE scheme to support multi-user systems, in
which users do not know the master encryption key (in fact, no single entity in the system knows
the master encryption key) for encrypting and decrypting the data to be sent to and received from
the server, respectively. We addressed this challenge by designing a distributed PPE encryption
scheme for the multi-user protocol. The correctness and security of the multi-user protocol have
been proved rigorously. The performance of the protocol was studied experimentally to illustrate
its feasibility.
CHAPTER 7
SUMMARY AND FUTURE RESEARCH
With the increasing importance of data intensive applications, the design of secure storage
systems becomes a very critical issue. In many situations, the storage servers are required to
process queries issued by the users. In this case, the data should be encrypted by various special
encryption schemes to support secure computations on the encrypted data. In this PhD research,
we study secure computation techniques including HE, OPE, and PPE schemes. We investigate
the existing works on HE, OPE, and PPE schemes, and overcome some of the limitations in
existing works.
We have constructed a novel non-circuit based HE algorithm. Our scheme is fully
homomorphic and we have proved that its security is equivalent to the well-known large integer
factorization problem (which is also the security basis for RSA) under bounded chosen-plaintext
attacks. Our scheme yields a very practical time complexity for encryption, decryption,
and computation. Compared to Gentry’s algorithm, which is fully homomorphic and semantically
secure, our scheme is $10^7$ times faster in addition and $6 \times 10^5$ times faster in multiplication. We
have also extended our HE algorithm to handle multiple users with different access rights,
allowing individual users to encrypt the secret data when sending them with the requests and
decrypt the ciphertexts received in the responses using individualized keys. The request and
response protocols are based on similarity transformation and can achieve the same security as
our homomorphic encryption scheme. Experiments show that our protocols have a satisfactory
performance.
We have also made significant contributions to encryption schemes that facilitate search on
ciphertexts, namely, the OPE schemes. We use information theory to analyze the security of the
ideal OPE object defined in [12]. Specifically, we derive the expected number of bits (z_h) of the
plaintext that can remain secret under h known plaintext attacks. The result shows that although
the adversary may retrieve some information about the plaintext, the probability for the
adversary to fully recover the plaintext is a negligible function of the security parameter. We also
show that the ideal OPE object, as defined in [12], may not be the most secure OPE. Then we
present two generalized OPE (GOPE) algorithms that satisfy stronger notions of security than the
ideal OPE object. To allow multiple users to use the master key for encrypting the data sent to
the database, we have developed two digit based request-response protocols DOPE and OE-
DOPE. Both DOPE and OE-DOPE can be used with any existing OPE scheme to support multi-
user systems. The performance study results show that the protocols have a fairly reasonable
overhead. When the underlying OPE scheme is relatively efficient, the protocols can yield good
performance.
PPE schemes have been developed to facilitate prefix based computations on encrypted
data. However, the attempt to qualify the security of the ideal PPE object has not been successful
in the literature. We have developed the first complete security proof to qualify the security of
the ideal PPE object. Also, we have proven that the ideal PPE object is the most secure PPE
scheme. In contrast to our proof showing that the ideal OPE object may not be the most secure
OPE, this is a positive confirmation of the security limit that any PPE scheme can achieve. We
have also converted an existing PPE scheme to a distributed algorithm, and constructed a
protocol based on the distributed PPE scheme to support multi-user systems. The correctness and
security of the multi-user PPE protocol have been proved rigorously. The performance of the
protocol is studied experimentally to illustrate its feasibility.
We plan to continue the research on the construction and, in particular, the application of
HE, OPE, and PPE schemes. First, we are interested in improving the security and performance
of these schemes. For example, the only OPE scheme that has a security proof is very inefficient,
taking over 10 seconds to encrypt/decrypt an 18-bit data object. Though it is possible to improve
its performance by partitioning the data and encrypting the individual pieces in a distributed
manner, the security of such an approach needs to be carefully analyzed. Moreover, the problem
of what the most secure OPE is and how to construct it is still open, and we plan to investigate
it further.
Second, we plan to apply HE schemes to practical applications. The HE scheme we have
invented has a substantial performance advantage over existing constructions. We plan to apply
it to several important application domains such as distributed key management in large scale
systems and privacy preserving data mining, in which the security requirements match with the
security level our HE scheme can assure. The key management tasks in very large scale systems
may be highly demanding. For example, in future SCADA (supervisory control and data
acquisition) systems, there may be trillions of devices attached to the network and a centralized
key manager will not be feasible. Thus, it is necessary to have many distributed key managers.
This implies that these key managers may no longer be highly trusted, i.e., it is likely that one of
them may be malicious or may have security loopholes and can be compromised more easily.
We plan to investigate techniques by which the generation and refreshing of keying
materials can be done securely using HE algorithms.
In privacy preserving data mining, HE and OPE schemes can be used to protect the data
and support arithmetic and comparison operations. However, the single key problem in existing
HE and OPE schemes prevents them from conducting data mining on federated data centers,
where there are multiple data owners who would like to preserve the secrecy of their data while
allowing some data analysis agents to extract statistical information from the collective data sets.
Our multi-user HE and DOPE protocols can provide potential solutions. Specifically a key agent
(not the data owner or the data analysis agent) can acquire the encrypted data (encrypted using
different keys) from multiple sources and transform them to ciphers with encryption key KNULL
(no single entity in the system would have the key for decrypting ciphers encrypted using
KNULL) following our multi-user HE and/or OPE request protocols. The transformed ciphers can
be passed to the data analysis agent to perform data mining. The analysis results can be sent back
to the key agent to be further transformed into ciphers encrypted with KDAA following our multi-
user HE and/or OPE response protocols. The final ciphers can be decrypted by the data analysis
agent with its individualized key KDAA.
Even with a feasible protocol, it can still be challenging to use HE and OPE schemes to
achieve privacy preserving data mining. In decision tree learning, it is necessary to allocate the
attributes to the nodes of a binary tree based on the information gain, which can be evaluated by
the entropy. In order to compute the entropy of an attribute, the values of the attribute and the
split point need to be compared, which requires an OPE scheme to protect the security. Arithmetic
operations are only needed for computing the numbers of values that are greater than and smaller
than the split point, which can be achieved without an HE scheme. Hence, the privacy preserving
decision tree mining can be achieved with an OPE scheme. Existing privacy preserving decision
tree mining protocols [2] [24] mainly use perturbation techniques which are susceptible to the
data recovery attacks [43] [45]. We plan to compare the security and performance of the privacy
preserving decision tree mining protocol based on multi-user DOPE with that based on
perturbation techniques.
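The observation that split selection needs only comparisons and counts can be illustrated with a small sketch. It is not the protocol proposed in this dissertation: `toy_monotone_map` is a hypothetical, insecure stand-in for an OPE ciphertext (any strictly increasing function), and `best_split` is an illustrative helper name. The sketch only shows that the split chosen by information gain is unchanged when the attribute values are replaced by order-preserving images.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (information gain, indices on the left side) of the best threshold.

    Only comparisons (v <= t) and counts are used, never arithmetic on values.
    """
    base = entropy(labels)
    best = (-1.0, frozenset())
    for t in sorted(set(values)):
        left = frozenset(k for k, v in enumerate(values) if v <= t)
        left_labels = [labels[k] for k in left]
        right_labels = [labels[k] for k in range(len(values)) if k not in left]
        weighted = (len(left_labels) * entropy(left_labels)
                    + len(right_labels) * entropy(right_labels)) / len(values)
        gain = base - weighted
        if gain > best[0]:
            best = (gain, left)
    return best


def toy_monotone_map(x):
    """Hypothetical stand-in for OPE: strictly increasing for x >= 0. NOT secure."""
    return 7 * x * x + 3


ages = [23, 35, 47, 51, 62, 29]
labels = ["no", "no", "yes", "yes", "yes", "no"]
plain = best_split(ages, labels)
mapped = best_split([toy_monotone_map(a) for a in ages], labels)
```

Because the mapped values induce the same ordering, `plain` and `mapped` select the same partition with the same gain.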
Some clustering techniques, such as k-means, require recursively computing the distance
between the data objects (n dimensional tuples) and k centroids, clustering the data objects to the
nearest centroids, and updating the centroids. The data objects can be protected by an HE scheme
and the distances can be computed directly on the ciphertexts. But the problem is how to find the
minimum distance since they are encrypted by the HE algorithm. The existing privacy preserving
k-means clustering protocols [23] [58] [70] either use Yao’s method or XOR homomorphic secret
sharing to evaluate the comparison circuit to find the minimum distance. But these methods incur
high computation and communication overheads. Although the distance includes the relation
information between the data objects and the centroids, it cannot be linked to every single value
in the data objects. Based on this consideration, the distances can be decrypted and compared
accordingly. In order to control information leak, we plan to combine the perturbation technique
to protect the distance information while supporting the correct comparison on the perturbed
distances. We also plan to investigate similar methods for other data mining algorithms.
APPENDIX
A.1 Security Proof for OPE Schemes
We study the security of OPE schemes by deriving the upper and lower bounds on $z_h$. In
Subsection A.1.1, we derive an upper bound on $z_h$ for any OPE scheme
$SE = (\mathcal{K}, \mathcal{E}, \mathcal{D})$ based on a specific known plaintext attack. We then
consider the lower bound on $z_h$ based on the ideal OPE object
$SE^* = (\mathcal{K}^*, \mathcal{E}^*, \mathcal{D}^*)$. However, since the known plaintext/ciphertext
set $KP = \{(x_i, \mathcal{E}^*(x_i, k)) \mid 1 \le i \le h\}$ is not determined, it is difficult to derive
the lower bound on $z_h = H(X_{m^*} \mid Y_{m^*,n}, KP)$ directly. Instead, we take the following
approach to get the lower bound on $z_h$. First, we consider the case $h = 0$ and derive the lower
bound on $z_0$. Let $E_h$ denote the event that the adversary reverses the ciphertext based on $KP$.
Then there is a one-to-one correspondence between $\Pr(E_h)$ and $z_h$, i.e., $\Pr(E_h) = 2^{-z_h}$.
Therefore, the lower bound on $z_0$ can be transformed to an upper bound on $\Pr(E_0)$. Also, note
that $KP$ cuts the domain into $h + 1$ segments and the range into $h + 1$ segments so that the OPE
algorithm encrypts the plaintexts from each sub-domain to the corresponding sub-range. Hence, we
apply the upper bound on $\Pr(E_0)$ to each sub-domain and sub-range pair in order to get
an upper bound on $\Pr(E_h)$ (Subsection A.1.2). Finally, we get the lower bound on $z_h$ by
reversing the one-to-one correspondence between $\Pr(E_h)$ and $z_h$.
A.1.1 Upper Bound on zh for Any OPE Scheme
In the following lemma, we give an upper bound on zh for any OPE scheme. In this lemma
and for the remainder of this paper, the base of the logarithm operator log is 2 and the base
of the natural logarithm operator ln is e.
Lemma A.1.1: For any OPE scheme $SE = (\mathcal{K}, \mathcal{E}, \mathcal{D})$, $z_h \le \log \frac{m-h}{h+1}$.
Proof. Suppose that the adversary knows the plaintext/ciphertext pairs
$\left(x_i = \frac{i(m+1)}{h+1},\ \mathcal{E}(x_i, k)\right)$, $1 \le i \le h$. Assume that $x$ is
uniformly randomly selected from $[m]^* = [m] - \{x_i \mid 1 \le i \le h\}$ and the ciphertext
$\mathcal{E}(x, k)$ is given to the adversary. Note that there exists $i'$ such that
\[
\mathcal{E}\left(\frac{(i'-1)(m+1)}{h+1}, k\right) < \mathcal{E}(x, k) < \mathcal{E}\left(\frac{i'(m+1)}{h+1}, k\right).
\]
Let $E$ denote the event $\frac{(i'-1)(m+1)}{h+1} < x < \frac{i'(m+1)}{h+1}$. Since the encryption
algorithm $\mathcal{E}$ preserves the order of plaintexts, the adversary can conclude that $E$ has
the min-entropy
\[
H_\infty(E) = -\log \frac{1}{\frac{i'(m+1)}{h+1} - \frac{(i'-1)(m+1)}{h+1} - 1}
= \log\left(\frac{i'(m+1)}{h+1} - \frac{(i'-1)(m+1)}{h+1} - 1\right)
= \log \frac{m-h}{h+1}.
\]
Hence $z_h \le \log \frac{m-h}{h+1}$. □
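As a small illustration of the attack used in this proof, the following sketch places $h$ evenly spaced known plaintexts in $[m]$ and evaluates the resulting bound. The function names are illustrative, and the sketch assumes that $h+1$ divides $m+1$ so that the points $x_i = i(m+1)/(h+1)$ are integers.

```python
import math


def even_known_plaintexts(m, h):
    """Known plaintexts x_i = i(m+1)/(h+1); assumes (h+1) divides (m+1)."""
    assert (m + 1) % (h + 1) == 0
    step = (m + 1) // (h + 1)
    return [i * step for i in range(1, h + 1)]


def segment_sizes(m, known):
    """Number of unknown plaintexts strictly between consecutive known ones."""
    bounds = [0] + sorted(known) + [m + 1]
    return [b - a - 1 for a, b in zip(bounds, bounds[1:])]


def zh_upper_bound(m, h):
    """Upper bound log((m-h)/(h+1)) on z_h from Lemma A.1.1, in bits."""
    return math.log2((m - h) / (h + 1))


known = even_known_plaintexts(1023, 3)
sizes = segment_sizes(1023, known)
bound = zh_upper_bound(1023, 3)
```

With $m = 1023$ and $h = 3$, every segment contains $(m-h)/(h+1) = 255$ candidate plaintexts, so at most $\log 255$ bits remain hidden.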
A.1.2 Lower Bound on zh for the Ideal OPE Object
We take the following steps to derive the lower bound on $z_h$ for the ideal OPE object.
First we analyze the special case where the adversary has no knowledge of any
plaintext/ciphertext pairs, i.e., $h = 0$. To do so, we first derive a formula for $z_0$, namely
\[
z_0 = -\log\left( n^{-1} \sum_{j \in [n]} \max_{i \in [m]} \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \right).
\]
We then prove that there exists a constant $0 < c < 1$ such that $z_0 \ge c \log m$ for all
$n > m^2 > 1$. Thus, in the case $h = 0$, the probability for the adversary to recover $x$ is at most
$2^{-z_0} \le 2^{-c \log m} = m^{-c}$. Then we consider the case where the adversary has knowledge of
$h$ plaintext/ciphertext pairs. We derive an upper bound on $\Pr(E_h)$, i.e., the expected
probability for the adversary to fully recover a plaintext given its ciphertext and $h$ known
plaintext/ciphertext pairs. Note that the known plaintext/ciphertext pairs will split the domain
and range into intervals. We first prove a lemma giving an upper bound on the expected number of
“short” intervals, and use the previous result to bound the probability on the remaining “long”
intervals. We then use these results to derive a lower bound on $z_h$. Finally, we numerically
compute the values of $c' = \frac{z_0}{\log m}$.
The Case h = 0
We begin by proving a lower bound on $z_0$ and the corresponding upper bound on $\Pr(E_0)$.
To do so, we first derive the formula of $z_0$ in the following Lemma A.1.2.
Lemma A.1.2: For the ideal OPE object,
\[
z_0 = -\log\left( n^{-1} \sum_{j \in [n]} \max_{i \in [m]} \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \right).
\]
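Proof. Let $\mathrm{SIF}_{m,n}(i, j) = \{f \in \mathrm{SIF}_{m,n} \mid f(i) = j\}$. Then
\[
|\mathrm{SIF}_{m,n}(i, j)| = \binom{j-1}{i-1}\binom{n-j}{m-i}.
\]
Let $\mathrm{SIF}_{m,n}(j) = \{f \in \mathrm{SIF}_{m,n} \mid \exists i \in [m] \text{ s.t. } f(i) = j\}$. Then
\[
|\mathrm{SIF}_{m,n}(j)| = \sum_{i \in [m]} |\mathrm{SIF}_{m,n}(i, j)|
= \sum_{i \in [m]} \binom{j-1}{i-1}\binom{n-j}{m-i} = \binom{n-1}{m-1}.
\]
Let $F_{i,j}$ be a uniform random variable on $\mathrm{SIF}_{i,j}$. Then
\[
\Pr(Y_{m,n} = j) = \sum_{f \in \mathrm{SIF}_{m,n}} \Pr(Y_{m,n} = j \mid F_{m,n} = f) \Pr(F_{m,n} = f)
= \frac{\sum_{f \in \mathrm{SIF}_{m,n}} \Pr(Y_{m,n} = j \mid F_{m,n} = f)}{\binom{n}{m}}
\]
\[
= \frac{\sum_{f \in \mathrm{SIF}_{m,n}(j)} \Pr(Y_{m,n} = j \mid F_{m,n} = f)}{\binom{n}{m}}
= \frac{\sum_{f \in \mathrm{SIF}_{m,n}(j)} \frac{1}{m}}{\binom{n}{m}}
= \frac{|\mathrm{SIF}_{m,n}(j)|}{m\binom{n}{m}}
= \frac{\binom{n-1}{m-1}}{m\binom{n}{m}} = \frac{1}{n}.
\]
Note that $\Pr(Y_{m,n} = j \mid F_{m,n} = f) = m^{-1}$ since for $f \in \mathrm{SIF}_{m,n}(j)$, there
exists $x_{f,j}$ s.t. $f(x_{f,j}) = j$. Since $\Pr(X_m = x_{f,j}) = m^{-1}$ and $Y_{m,n} = F_{m,n}(X_m)$,
$\Pr(Y_{m,n} = j \mid F_{m,n} = f) = m^{-1}$.
Note that
\[
\Pr(X_m = i \mid Y_{m,n} = j) = \frac{|\mathrm{SIF}_{m,n}(i, j)|}{|\mathrm{SIF}_{m,n}(j)|}
= \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}.
\]
Hence,
\[
H_\infty(X_m \mid Y_{m,n}) = -\log \sum_{j \in [n]} \Pr(Y_{m,n} = j)\, 2^{-H_\infty(X_m \mid Y_{m,n} = j)}
= -\log \sum_{j \in [n]} \Pr(Y_{m,n} = j) \max_{i \in [m]} \Pr(X_m = i \mid Y_{m,n} = j)
\]
\[
= -\log\left( n^{-1} \sum_{j \in [n]} \max_{i \in [m]} \Pr(X_m = i \mid Y_{m,n} = j) \right)
= -\log\left( n^{-1} \sum_{j \in [n]} \max_{i \in [m]} \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \right). □
\]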
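The formula of Lemma A.1.2 can be evaluated exactly for small parameters. The sketch below is only an illustration (the function names `z0` and `c_prime` are ours, and it is not the program used to produce the numerical results reported later); it relies on Python's exact integer binomials.

```python
import math


def z0(m, n):
    """Average min-entropy z_0 (in bits) from the formula of Lemma A.1.2."""
    denom = math.comb(n - 1, m - 1)
    total = 0
    for j in range(1, n + 1):
        # Numerator of max_i Pr(X_m = i | Y_{m,n} = j), kept as an exact integer.
        total += max(math.comb(j - 1, i - 1) * math.comb(n - j, m - i)
                     for i in range(1, m + 1))
    return -math.log2(total / (n * denom))


def c_prime(m, n):
    """Ratio c' = z_0 / log m."""
    return z0(m, n) / math.log2(m)


small = z0(2, 5)
ratio = c_prime(10, 101)
```

For $m = 2$, $n = 5$ the maxima over $i$ are $1, \frac34, \frac12, \frac34, 1$, so $z_0 = -\log(4/5) \approx 0.32$ bits.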
Now we derive the lower bound on $z_0$ by proving that for the ideal OPE object, there
exists a constant $0 < c < 1$ such that $z_0 \ge c \log m$ for $n > m^2 > 1$. We first prove five
technical lemmas. Note that there is a max function over $i \in [m]$ in the formula of $z_0$. In
Lemma A.1.3 we show that the maximum value can be achieved only if
$i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$.
The next two lemmas, Lemma A.1.4 and Lemma A.1.5, are preparations for Lemma A.1.6.
Lemma A.1.4 gives an estimate on $1 + x$, which follows from Taylor expansion. Then Lemma
A.1.5 gives an estimate on $\binom{x}{y}$, which is proved by applying Lemma A.1.4 twice. Finally, in
Lemma A.1.6, we apply the conclusions of Lemma A.1.4 and Lemma A.1.5 to prove a bound
on each term of the hypergeometric distribution, which is then simplified. Summation will
prove $z_0 \ge c \log m$ with the given conditions $n > m^2$ and $m > m_c$. We will use the conclusion
of Lemma A.1.7 to show that we can in fact choose $m_c = 1$.
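Lemma A.1.3: Given $j, m, n$, $\frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}$ achieves
the maximum value only if $i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$.
Proof. We assume that $\frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}$ achieves the maximum
value at $i$. Then
\[
\frac{\binom{j-1}{i-2}\binom{n-j}{m-i+1}}{\binom{n-1}{m-1}} \le \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}
\quad\text{and}\quad
\frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \ge \frac{\binom{j-1}{i}\binom{n-j}{m-i-1}}{\binom{n-1}{m-1}}.
\]
Therefore,
\[
\frac{\binom{j-1}{i-2}\binom{n-j}{m-i+1}}{\binom{n-1}{m-1}} \le \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}
\ \Rightarrow\ \frac{1}{(j-i+1)(m-i+1)} \le \frac{1}{(i-1)(n-j-m+i)}
\ \Rightarrow\ i \le \frac{mj}{n+1} + 1
\]
and
\[
\frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \ge \frac{\binom{j-1}{i}\binom{n-j}{m-i-1}}{\binom{n-1}{m-1}}
\ \Rightarrow\ \frac{1}{(j-i)(m-i)} \ge \frac{1}{i(n-j-m+i+1)}
\ \Rightarrow\ i \ge \frac{mj}{n+1}.
\]
Hence $i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$. □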
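Lemma A.1.4: For every $\varepsilon > 0$, there exist $x_\varepsilon > 0$ and
$\left|\alpha + \frac{e}{2}\right| < \varepsilon$ such that for all $|x^{-1}| \ge x_\varepsilon$,
\[
1 + x = (e + \alpha x)^{x}.
\]
Proof. This is a consequence of Taylor expansion. We have
\[
\left(1 + \frac{1}{x}\right)^{x} = e^{x \ln\left(1 + \frac{1}{x}\right)}
= e^{x\left(\frac{1}{x} - \frac{1}{2x^2} + o(x^{-2})\right)}
= e^{1 - \frac{1}{2x} + o(x^{-1})} = e \cdot e^{-\frac{1}{2x} + o\left(\frac{1}{x}\right)}
\]
\[
= e - \frac{e}{2x} + o(x^{-1}) = e + \frac{1}{x}\left(-\frac{e}{2} + o(1)\right). □
\]
Note that in Lemma A.1.4, $\alpha$ is negative for small $\varepsilon$. Also, $\alpha$ is not a constant, but
depends on $x$.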
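The expansion in this proof can be checked numerically. In the sketch below, `alpha` is an illustrative name for the quantity $x\left((1 + 1/x)^x - e\right)$ appearing in the last step of the proof, which should approach $-e/2$ as $x$ grows; `log1p` is used only to limit floating-point cancellation.

```python
import math


def alpha(x):
    """x * ((1 + 1/x)^x - e), which tends to -e/2 as x grows."""
    return x * (math.exp(x * math.log1p(1.0 / x)) - math.e)


for x in (10.0, 1e3, 1e6):
    print(x, alpha(x), -math.e / 2)
```

The printed values move toward $-e/2 \approx -1.359$, and they differ for each $x$, in line with the note that $\alpha$ is not a constant.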
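Lemma A.1.5: For every $\varepsilon > 0$, there exist $0 < c_\varepsilon < 1$ and $y_\varepsilon > 0$
such that for $x \ge y^2$ and $y \ge y_\varepsilon$,
\[
\frac{c_\varepsilon \left(\frac{ex}{y}\right)^{y}}{\sqrt{2\pi y}} \le \binom{x}{y} \le \frac{e^{\varepsilon}\left(\frac{ex}{y}\right)^{y}}{\sqrt{2\pi y}}.
\]
Proof. According to Stirling’s formula,
\[
x! = \sqrt{2\pi x}\left(\frac{x}{e}\right)^{x} e^{\lambda_x}
\quad\text{where}\quad
\frac{1}{12x+1} < \lambda_x < \frac{1}{12x}.
\]
Hence
\[
\binom{x}{y} = \frac{x!}{y!(x-y)!}
= \frac{\sqrt{2\pi x}\left(\frac{x}{e}\right)^{x} e^{\lambda_x}}
{\sqrt{2\pi y}\left(\frac{y}{e}\right)^{y} e^{\lambda_y}\sqrt{2\pi(x-y)}\left(\frac{x-y}{e}\right)^{x-y} e^{\lambda_{x-y}}}
\]
\[
= \frac{\sqrt{\frac{x}{x-y}}\left(\frac{x}{y}\right)^{y}\left(\frac{x}{x-y}\right)^{x-y}}{\sqrt{2\pi y}}\, e^{\lambda_x - \lambda_y - \lambda_{x-y}}
= \frac{\left(\frac{x}{y}\right)^{y}\left(\frac{x}{x-y}\right)^{x-y+\frac12}}{\sqrt{2\pi y}}\, e^{\lambda_x - \lambda_y - \lambda_{x-y}}.
\]
We now apply Lemma A.1.4 twice. For every $\varepsilon > 0$ there exist $x_\varepsilon > 0$ and
$\left|\alpha + \frac{e}{2}\right| < \varepsilon$ such that for $\left|\frac{x-y}{y}\right| \ge x_\varepsilon$ and
$\left|\frac{e(x-y)}{\alpha y}\right| \ge x_\varepsilon$,
\[
\left(\frac{x}{x-y}\right)^{x-y+\frac12} = \left(1 + \frac{y}{x-y}\right)^{x-y+\frac12}
= \left(e + \frac{\alpha y}{x-y}\right)^{\frac{y}{x-y}\left(x-y+\frac12\right)}
= \left(e + \frac{\alpha y}{x-y}\right)^{y\left(1+\frac{1}{2(x-y)}\right)}
\]
where $\alpha$ depends on $\frac{y}{x-y}$. We further have
\[
\left(e + \frac{\alpha y}{x-y}\right)^{y\left(1+\frac{1}{2(x-y)}\right)}
= e^{y\left(1+\frac{1}{2(x-y)}\right)}\left(1 + \frac{\alpha y}{e(x-y)}\right)^{y\left(1+\frac{1}{2(x-y)}\right)}
\]
\[
= e^{y\left(1+\frac{1}{2(x-y)}\right)}\left(e + \alpha\frac{\alpha y}{e(x-y)}\right)^{\frac{\alpha y}{e(x-y)}\, y\left(1+\frac{1}{2(x-y)}\right)}
= e^{y\left(1+\frac{1}{2(x-y)}\right)}\left(e + \frac{\alpha^2 y}{e(x-y)}\right)^{\frac{\alpha y^2\left(1+\frac{1}{2(x-y)}\right)}{e(x-y)}},
\]
where the second $\alpha$ depends on $\frac{\alpha y}{e(x-y)}$. For simplicity we denote the multiplication of the
two $\alpha$’s as $\alpha^2$. What matters here is that the two $\alpha$’s are bounded. For $\varepsilon > 0$, there exists
$x'_\varepsilon > 0$ such that for $\frac{x}{y} \ge x'_\varepsilon$,
\[
e^{y} < e^{y\left(1+\frac{1}{2(x-y)}\right)} = e^{y + \frac{y}{2(x-y)}} \le e^{y+\varepsilon}
\]
and
\[
e \le e + \frac{\alpha^2 y}{e(x-y)} = e + \frac{\alpha^2 \frac{y}{x}}{e\left(1 - \frac{y}{x}\right)} \le e + \varepsilon
\]
and for $x - y \ge x'_\varepsilon$, $\frac{x}{y} \ge x'_\varepsilon$, and $x \ge y^2$,
\[
\frac{\left(-\frac{e}{2} - \varepsilon\right)(1+\varepsilon)}{e(1-\varepsilon)}
\le \frac{\alpha \frac{y^2}{x}\left(1 + \frac{1}{2(x-y)}\right)}{e\left(1 - \frac{y}{x}\right)} \le 0.
\]
Therefore, for $\left|\frac{x-y}{y}\right| \ge x_\varepsilon$, $\left|\frac{e(x-y)}{\alpha y}\right| \ge x_\varepsilon$,
$x - y \ge x'_\varepsilon$, $\frac{x}{y} \ge x'_\varepsilon$, and $x \ge y^2$,
\[
e^{y}(e+\varepsilon)^{\frac{\left(-\frac{e}{2}-\varepsilon\right)(1+\varepsilon)}{e(1-\varepsilon)}}
\le \left(\frac{x}{x-y}\right)^{x-y+\frac12} \le e^{y+\varepsilon}.
\]
Also, there exists $x''_\varepsilon > 0$ such that, for $x \ge x''_\varepsilon$, $y \ge x''_\varepsilon$, and
$x - y \ge x''_\varepsilon$,
\[
1 - \varepsilon \le e^{\lambda_x - \lambda_y - \lambda_{x-y}} < 1.
\]
For $x_\varepsilon$, $x'_\varepsilon$, and $x''_\varepsilon$, there exists $y_\varepsilon > 0$ such that for $x \ge y^2$ and
$y \ge y_\varepsilon$, all of the previous constraints hold, i.e., $\left|\frac{x-y}{y}\right| \ge x_\varepsilon$,
$\left|\frac{e(x-y)}{\alpha y}\right| \ge x_\varepsilon$, $x - y \ge x'_\varepsilon$, $\frac{x}{y} \ge x'_\varepsilon$,
$x \ge x''_\varepsilon$, $y \ge x''_\varepsilon$, $x - y \ge x''_\varepsilon$.
Hence for every $\varepsilon > 0$, there exist
$0 < c_\varepsilon = (1-\varepsilon)(e+\varepsilon)^{\frac{\left(-\frac{e}{2}-\varepsilon\right)(1+\varepsilon)}{e(1-\varepsilon)}} < 1$
and $y_\varepsilon > 0$, such that, for $x \ge y^2$ and $y \ge y_\varepsilon$,
\[
\frac{c_\varepsilon \left(\frac{ex}{y}\right)^{y}}{\sqrt{2\pi y}} \le \binom{x}{y} \le \frac{e^{\varepsilon}\left(\frac{ex}{y}\right)^{y}}{\sqrt{2\pi y}}. □
\]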
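The estimate of Lemma A.1.5 can be sanity-checked numerically. The sketch below (with the illustrative name `binom_ratio`) computes the ratio of $\binom{x}{y}$ to $\left(\frac{ex}{y}\right)^y / \sqrt{2\pi y}$ in log space via `math.lgamma`, so it is a floating-point approximation rather than an exact evaluation. On the boundary $x = y^2$ the ratio should lie below 1 and, as $\varepsilon \to 0$ in the formula for $c_\varepsilon$, close to $e^{-1/2}$.

```python
import math


def binom_ratio(x, y):
    """Ratio binom(x, y) / ((e*x/y)**y / sqrt(2*pi*y)), computed via logarithms."""
    log_binom = math.lgamma(x + 1) - math.lgamma(y + 1) - math.lgamma(x - y + 1)
    log_approx = y * (1 + math.log(x / y)) - 0.5 * math.log(2 * math.pi * y)
    return math.exp(log_binom - log_approx)


for y in (10, 100, 1000):
    print(y, binom_ratio(y * y, y))
```

The ratio stays within a constant factor of 1, which is what the lemma needs in order to replace binomial coefficients by the Stirling-type expression.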
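Lemma A.1.6: Let $\frac12 < \sigma < 1$, $j \in \left[\frac{n}{m^\sigma}, n - \frac{n}{m^\sigma}\right]$, and
$i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$. Then for every $\varepsilon > 0$ there exist
$c_{\varepsilon,\sigma,1}$, $c_{\varepsilon,\sigma,2}$, and $m_{\varepsilon,\sigma}$ such that
\[
c_{\varepsilon,\sigma,1}\, m^{-\frac12} \le \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \le c_{\varepsilon,\sigma,2}\, m^{-\frac{1-\sigma}{2}},
\]
for $n > m^2$ and $m \ge m_{\varepsilon,\sigma}$.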
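Proof. By Lemma A.1.5, we have the following bounds:
(1) for every $\varepsilon > 0$, there exist $0 < c_\varepsilon < 1$ and $y_\varepsilon > 0$ such that for
$j - 1 \ge (i-1)^2$ and $i - 1 \ge y_\varepsilon$,
\[
\frac{c_\varepsilon\left(\frac{e(j-1)}{i-1}\right)^{i-1}}{\sqrt{2\pi(i-1)}} \le \binom{j-1}{i-1} \le \frac{e^{\varepsilon}\left(\frac{e(j-1)}{i-1}\right)^{i-1}}{\sqrt{2\pi(i-1)}},
\]
(2) for every $\varepsilon > 0$, there exist $0 < c_\varepsilon < 1$ and $y_\varepsilon > 0$ such that for
$n - j \ge (m-i)^2$ and $m - i \ge y_\varepsilon$,
\[
\frac{c_\varepsilon\left(\frac{e(n-j)}{m-i}\right)^{m-i}}{\sqrt{2\pi(m-i)}} \le \binom{n-j}{m-i} \le \frac{e^{\varepsilon}\left(\frac{e(n-j)}{m-i}\right)^{m-i}}{\sqrt{2\pi(m-i)}},
\]
(3) for every $\varepsilon > 0$, there exist $0 < c_\varepsilon < 1$ and $y_\varepsilon > 0$ such that for
$n - 1 \ge (m-1)^2$ and $m - 1 \ge y_\varepsilon$,
\[
\frac{c_\varepsilon\left(\frac{e(n-1)}{m-1}\right)^{m-1}}{\sqrt{2\pi(m-1)}} \le \binom{n-1}{m-1} \le \frac{e^{\varepsilon}\left(\frac{e(n-1)}{m-1}\right)^{m-1}}{\sqrt{2\pi(m-1)}}.
\]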
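For (1), we need to derive the condition such that $j - 1 \ge (i-1)^2$ and $i - 1 \ge y_\varepsilon$ hold.
Since $\frac{mj}{n+1} \le i \le \frac{mj}{n+1} + 1$, it implies that $(i-1)^2 \le \left(\frac{mj}{n+1}\right)^2$.
Hence it suffices to derive the condition such that $\left(\frac{mj}{n+1}\right)^2 \le j - 1$ holds. Note that
\[
\left(\frac{mj}{n+1}\right)^2 \le j - 1 \ \Leftrightarrow\ (n+1)^2(j-1) - (mj)^2 \ge 0,
\]
and the dominant term in $(n+1)^2(j-1) - (mj)^2$ is $n^2 j - (mj)^2$. Since
$\frac{n}{m^\sigma} \le j \le n - \frac{n}{m^\sigma}$ and $n > m^2$, we have
\[
n^2 j - (mj)^2 > (n - m^2) j^2 \ge j^2 \ge \left(\frac{n}{m^\sigma}\right)^2 > \left(\frac{n}{m}\right)^2 > m^2.
\]
It implies that $j - 1 \ge (i-1)^2$ holds for sufficiently large $m$. Furthermore, since
\[
i \ge \frac{mj}{n+1} \ge \frac{mn}{(n+1)m^\sigma} \ge \frac{m^{1-\sigma}}{2},
\]
it implies that $i$ converges to infinity when $m$ converges to infinity. Therefore there exists
$m_{\sigma,\varepsilon,1} > 0$ such that for $n > m^2$ and $m \ge m_{\sigma,\varepsilon,1}$,
\[
j - 1 \ge (i-1)^2 \quad\text{and}\quad i - 1 \ge y_\varepsilon.
\]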
For (2), we need to derive the condition such that $n - j \ge (m-i)^2$ and $m - i \ge y_\varepsilon$ hold.
Since $\frac{mj}{n+1} \le i \le \frac{mj}{n+1} + 1$, it implies that $(m-i)^2 \le \left(m - \frac{mj}{n+1}\right)^2$.
Hence it suffices to derive the condition such that $\left(m - \frac{mj}{n+1}\right)^2 \le n - j$ holds. We have
\[
\left(m - \frac{mj}{n+1}\right)^2 \le n - j \ \Leftrightarrow\ (n-j)(n+1)^2 - (m(n+1) - mj)^2 \ge 0.
\]
Since $\frac{n}{m^\sigma} \le j \le n - \frac{n}{m^\sigma}$, the dominant term in
$(n-j)(n+1)^2 - (m(n+1) - mj)^2$ is $n^3 - (mn)^2$. Since $n > m^2$,
\[
n^3 - (mn)^2 = (n - m^2) n^2 \ge n^2 > m^4.
\]
It implies that $n - j \ge (m-i)^2$ holds for sufficiently large $m$. Furthermore, since
\[
m - i \ge m - \frac{mj}{n+1} - 1 \ge m - \frac{m\left(n - \frac{n}{m^\sigma}\right)}{n} - 1 = m^{1-\sigma} - 1,
\]
it implies that $m - i$ converges to infinity when $m$ converges to infinity. Therefore there
exists $m_{\sigma,\varepsilon,2} > 0$ such that for $n > m^2$ and $m \ge m_{\sigma,\varepsilon,2}$,
\[
n - j \ge (m-i)^2 \quad\text{and}\quad m - i \ge y_\varepsilon.
\]
For (3), we have
\[
n - 1 > m^2 - 1 > m^2 - 2m + 1 = (m-1)^2.
\]
Therefore there exists $m_{\varepsilon,3} > 0$ such that for $n > m^2$ and $m \ge m_{\varepsilon,3}$,
\[
n - 1 \ge (m-1)^2 \quad\text{and}\quad m - 1 \ge y_\varepsilon.
\]
Therefore there exists $m_{\sigma,\varepsilon,4} > 0$ such that the estimation of Lemma A.1.5 can be applied
to $\binom{j-1}{i-1}$, $\binom{n-j}{m-i}$, and $\binom{n-1}{m-1}$ for
$j \in \left[\frac{n}{m^\sigma}, n - \frac{n}{m^\sigma}\right]$, $i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$,
$n > m^2$ and $m \ge m_{\sigma,\varepsilon,4}$. Let
\[
T = \frac{\dfrac{\left(\frac{e(j-1)}{i-1}\right)^{i-1}}{\sqrt{2\pi(i-1)}} \cdot \dfrac{\left(\frac{e(n-j)}{m-i}\right)^{m-i}}{\sqrt{2\pi(m-i)}}}
{\dfrac{\left(\frac{e(n-1)}{m-1}\right)^{m-1}}{\sqrt{2\pi(m-1)}}}
= \frac{1}{\sqrt{2\pi}} \cdot \sqrt{\frac{m-1}{(i-1)(m-i)}}
\left(\frac{j-1}{n-1} \cdot \frac{m-1}{i-1}\right)^{i-1}\left(\frac{n-j}{n-1} \cdot \frac{m-1}{m-i}\right)^{m-i}.
\]
Then
\[
\frac{c_\varepsilon^2}{e^{\varepsilon}} \cdot T \le \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \le \frac{e^{2\varepsilon}}{c_\varepsilon} \cdot T.
\]
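In order to estimate $T$, we further let $T_1 = \sqrt{\frac{m-1}{(i-1)(m-i)}}$ and $T_2 = T_{21} T_{22}$, where
$T_{21} = \left(\frac{j-1}{n-1} \cdot \frac{m-1}{i-1}\right)^{i-1}$ and
$T_{22} = \left(\frac{n-j}{n-1} \cdot \frac{m-1}{m-i}\right)^{m-i}$. Thus it remains to estimate $T_1$ and $T_2$.
We first consider $T_1$. Since
$\frac{m \frac{n}{m^\sigma}}{n+1} \le \frac{mj}{n+1} \le i \le \frac{mj}{n+1} + 1 \le \frac{m\left(n - \frac{n}{m^\sigma}\right)}{n+1} + 1$,
we have $\frac{m^{1-\sigma}}{2} \le i \le m - m^{1-\sigma} + 1$. Note that
$(i-1)(m-i) = -\left(i - \frac{m+1}{2}\right)^2 + \left(\frac{m+1}{2}\right)^2 - m$. Consequently
\[
\frac{m^{2-\sigma}}{4} \le \min\left\{\left(\frac{m^{1-\sigma}}{2} - 1\right)\left(m - \frac{m^{1-\sigma}}{2}\right),\ (m - m^{1-\sigma})(m^{1-\sigma} - 1)\right\}
\le (i-1)(m-i) \le \left(\frac{m+1}{2}\right)^2 - m = \frac{(m-1)^2}{4}
\]
for sufficiently large $m$. Therefore there exists $m_{\sigma,\varepsilon,5} > 0$ such that
\[
\frac{2}{\sqrt{m}} \le \frac{2}{\sqrt{m-1}} \le \sqrt{\frac{m-1}{\frac{(m-1)^2}{4}}} \le T_1 = \sqrt{\frac{m-1}{(i-1)(m-i)}}
\le \sqrt{\frac{m-1}{\frac{m^{2-\sigma}}{4}}} \le \frac{2}{m^{\frac{1-\sigma}{2}}}
\]
for $m \ge m_{\sigma,\varepsilon,5}$.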
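Now consider $T_2 = T_{21} T_{22}$. Denote $j = \frac{n}{2} + u$ where
$-\frac{n}{2} + \frac{n}{m^\sigma} \le u \le \frac{n}{2} - \frac{n}{m^\sigma}$, then
$-\frac12 + \frac{1}{m^\sigma} \le \frac{u}{n} \le \frac12 - \frac{1}{m^\sigma}$. Denote
$i = \frac{mj}{n} + v = \left(\frac12 + \frac{u}{n}\right)m + v$. Since
$\frac{mj}{n+1} \le i = \frac{mj}{n} + v \le \frac{mj}{n+1} + 1$, we have
$-\frac{1}{m} \le -\frac{m}{n+1} \le -\frac{mj}{n(n+1)} \le v \le -\frac{mj}{n(n+1)} + 1 \le 1$. Then
\[
T_{21} = \left(\frac{j-1}{n-1} \cdot \frac{m-1}{i-1}\right)^{i-1}
= \left(\frac{\frac{n}{2} + u - 1}{n-1} \cdot \frac{m-1}{\left(\frac12 + \frac{u}{n}\right)m + v - 1}\right)^{\left(\frac12 + \frac{u}{n}\right)m + v - 1}
\]
\[
= \left(\left(1 + \frac{1 - \frac{1}{\frac12 + \frac{u}{n}}}{n-1}\right)
\left(1 - \frac{\frac{v-1}{\frac12 + \frac{u}{n}} + 1}{m + \frac{v-1}{\frac12 + \frac{u}{n}}}\right)\right)^{\left(\frac12 + \frac{u}{n}\right)m + v - 1}
\]
\[
= \left(1 + \frac{1 - \frac{1}{\frac12 + \frac{u}{n}}}{n-1}\right)^{\left(\frac12 + \frac{u}{n}\right)m + v - 1}
\cdot \left(1 - \frac{\frac{v-1}{\frac12 + \frac{u}{n}} + 1}{m + \frac{v-1}{\frac12 + \frac{u}{n}}}\right)^{\left(\frac12 + \frac{u}{n}\right)m + v - 1}.
\]
There exists $m_{\sigma,\varepsilon,6} > 0$ such that for $n > m^2$ and $m \ge m_{\sigma,\varepsilon,6}$,
\[
1 - \varepsilon < \left(1 + \frac{1 - \frac{1}{\frac12 + \frac{u}{n}}}{n-1}\right)^{\left(\frac12 + \frac{u}{n}\right)m + v - 1} < 1 + \varepsilon.
\]
For sufficiently large $m$,
$\left|-\frac{m + \frac{v-1}{\frac12 + \frac{u}{n}}}{\frac{v-1}{\frac12 + \frac{u}{n}} + 1}\right| \ge x_\varepsilon$.
Then, by Lemma A.1.4,
\[
\left(1 - \frac{\frac{v-1}{\frac12 + \frac{u}{n}} + 1}{m + \frac{v-1}{\frac12 + \frac{u}{n}}}\right)^{\left(\frac12 + \frac{u}{n}\right)m + v - 1}
= \left(e - \frac{\alpha\left(\frac{v-1}{\frac12 + \frac{u}{n}} + 1\right)}{m + \frac{v-1}{\frac12 + \frac{u}{n}}}\right)^{-\frac{\frac{v-1}{\frac12 + \frac{u}{n}} + 1}{m + \frac{v-1}{\frac12 + \frac{u}{n}}}\left[\left(\frac12 + \frac{u}{n}\right)m + v - 1\right]}
= \left(e - \frac{\alpha\left(\frac{v-1}{\frac12 + \frac{u}{n}} + 1\right)}{m + \frac{v-1}{\frac12 + \frac{u}{n}}}\right)^{-v + \frac12 - \frac{u}{n}}.
\]
Since $-\frac{1}{m} \le v \le 1$ and $-\frac12 + \frac{1}{m^\sigma} \le \frac{u}{n} \le \frac12 - \frac{1}{m^\sigma}$, we have
$-\frac32 \le -\frac32 + \frac{1}{m^\sigma} \le -v + \frac12 - \frac{u}{n} \le \frac12 + \frac{1}{m} - \frac{1}{m^\sigma} \le \frac12$.
Hence there exists $m_{\sigma,\varepsilon,7} > 0$ such that for $n > m^2$ and $m \ge m_{\sigma,\varepsilon,7}$,
\[
e^{-\frac32} - \varepsilon \le e^{-v + \frac12 - \frac{u}{n}} - \varepsilon
\le \left(e - \frac{\alpha\left(\frac{v-1}{\frac12 + \frac{u}{n}} + 1\right)}{m + \frac{v-1}{\frac12 + \frac{u}{n}}}\right)^{-v + \frac12 - \frac{u}{n}}
\le e^{-v + \frac12 - \frac{u}{n}} + \varepsilon \le e^{\frac12} + \varepsilon.
\]
Hence for $n > m^2$ and $m \ge \max\{m_{\sigma,\varepsilon,6}, m_{\sigma,\varepsilon,7}\}$,
\[
(1 - \varepsilon)\left(e^{-\frac32} - \varepsilon\right) \le T_{21} \le (1 + \varepsilon)\left(e^{\frac12} + \varepsilon\right).
\]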
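Similarly,
\[
T_{22} = \left(\frac{n-j}{n-1} \cdot \frac{m-1}{m-i}\right)^{m-i}
= \left(\frac{\frac{n}{2} - u}{n-1} \cdot \frac{m-1}{\left(\frac12 - \frac{u}{n}\right)m - v}\right)^{\left(\frac12 - \frac{u}{n}\right)m - v}
\]
\[
= \left(\left(1 + \frac{1}{n-1}\right)\left(1 + \frac{\frac{v}{\frac12 - \frac{u}{n}} - 1}{m - \frac{v}{\frac12 - \frac{u}{n}}}\right)\right)^{\left(\frac12 - \frac{u}{n}\right)m - v}
= \left(1 + \frac{1}{n-1}\right)^{\left(\frac12 - \frac{u}{n}\right)m - v}
\left(1 + \frac{\frac{v}{\frac12 - \frac{u}{n}} - 1}{m - \frac{v}{\frac12 - \frac{u}{n}}}\right)^{\left(\frac12 - \frac{u}{n}\right)m - v}.
\]
Then there exists $m_{\sigma,\varepsilon,8} > 0$ such that for $n > m^2$ and $m \ge m_{\sigma,\varepsilon,8}$,
\[
1 - \varepsilon < \left(1 + \frac{1}{n-1}\right)^{\left(\frac12 - \frac{u}{n}\right)m - v} < 1 + \varepsilon.
\]
For sufficiently large $m$,
$\left|\frac{m - \frac{v}{\frac12 - \frac{u}{n}}}{\frac{v}{\frac12 - \frac{u}{n}} - 1}\right| \ge x_\varepsilon$.
Then, by Lemma A.1.4,
\[
\left(1 + \frac{\frac{v}{\frac12 - \frac{u}{n}} - 1}{m - \frac{v}{\frac12 - \frac{u}{n}}}\right)^{\left(\frac12 - \frac{u}{n}\right)m - v}
= \left(e + \frac{\alpha\left(\frac{v}{\frac12 - \frac{u}{n}} - 1\right)}{m - \frac{v}{\frac12 - \frac{u}{n}}}\right)^{\frac{\frac{v}{\frac12 - \frac{u}{n}} - 1}{m - \frac{v}{\frac12 - \frac{u}{n}}}\left[\left(\frac12 - \frac{u}{n}\right)m - v\right]}
= \left(e + \frac{\alpha\left(\frac{v}{\frac12 - \frac{u}{n}} - 1\right)}{m - \frac{v}{\frac12 - \frac{u}{n}}}\right)^{v - \frac12 + \frac{u}{n}}.
\]
Since $-\frac{1}{m} \le v \le 1$ and $-\frac12 + \frac{1}{m^\sigma} \le \frac{u}{n} \le \frac12 - \frac{1}{m^\sigma}$, we have
$-1 \le -1 - \frac{1}{m} + \frac{1}{m^\sigma} \le v - \frac12 + \frac{u}{n} \le 1 - \frac{1}{m^\sigma} \le 1$.
Hence there exists $m_{\sigma,\varepsilon,9} > 0$ such that for $n > m^2$ and $m \ge m_{\sigma,\varepsilon,9}$,
\[
e^{-1} - \varepsilon \le e^{v - \frac12 + \frac{u}{n}} - \varepsilon
\le \left(e + \frac{\alpha\left(\frac{v}{\frac12 - \frac{u}{n}} - 1\right)}{m - \frac{v}{\frac12 - \frac{u}{n}}}\right)^{v - \frac12 + \frac{u}{n}}
\le e^{v - \frac12 + \frac{u}{n}} + \varepsilon \le e + \varepsilon.
\]
Hence for $n > m^2$ and $m \ge \max\{m_{\sigma,\varepsilon,8}, m_{\sigma,\varepsilon,9}\}$,
\[
(1 - \varepsilon)(e^{-1} - \varepsilon) \le T_{22} \le (1 + \varepsilon)(e + \varepsilon).
\]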
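Consequently
\[
(1 - \varepsilon)^2\left(e^{-\frac32} - \varepsilon\right)(e^{-1} - \varepsilon) \le T_2 = T_{21} T_{22} \le (1 + \varepsilon)^2\left(e^{\frac12} + \varepsilon\right)(e + \varepsilon).
\]
Hence for every $\varepsilon > 0$ and $\frac12 < \sigma < 1$, there exists $m_{\sigma,\varepsilon,10} > 0$ such that
\[
\frac{2(1 - \varepsilon)^2\left(e^{-\frac32} - \varepsilon\right)(e^{-1} - \varepsilon)}{\sqrt{2\pi m}}
\le \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}
\le \frac{2(1 + \varepsilon)^2\left(e^{\frac12} + \varepsilon\right)(e + \varepsilon)}{\sqrt{2\pi m^{1-\sigma}}},
\]
for $j \in \left[\frac{n}{m^\sigma}, n - \frac{n}{m^\sigma}\right]$, $i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$,
$n > m^2$, and $m \ge m_{\sigma,\varepsilon,10}$. Let
\[
c_{\varepsilon,\sigma,1} = \frac{2(1 - \varepsilon)^2\left(e^{-\frac32} - \varepsilon\right)(e^{-1} - \varepsilon)}{\sqrt{2\pi}}
\quad\text{and}\quad
c_{\varepsilon,\sigma,2} = \frac{2(1 + \varepsilon)^2\left(e^{\frac12} + \varepsilon\right)(e + \varepsilon)}{\sqrt{2\pi}}.
\]
Then for $\frac12 < \sigma < 1$, $j \in \left[\frac{n}{m^\sigma}, n - \frac{n}{m^\sigma}\right]$,
$i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$, $n > m^2$, and $m \ge m_{\sigma,\varepsilon,10}$,
\[
c_{\varepsilon,\sigma,1}\, m^{-\frac12} \le \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \le c_{\varepsilon,\sigma,2}\, m^{-\frac{1-\sigma}{2}}. □
\]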
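The $m^{-1/2}$ scaling in Lemma A.1.6 can be observed directly for moderate parameters. The sketch below (with the illustrative name `max_term`) evaluates the largest hypergeometric term for a fixed $j$ with exact integer binomials and rescales it by $\sqrt{m}$; the parameter choices $n = m^2 + 1$ and $j = n/2$ are ours and serve only as an example in the middle of the allowed range of $j$.

```python
import math


def max_term(m, n, j):
    """max over i of C(j-1, i-1) * C(n-j, m-i) / C(n-1, m-1)."""
    denom = math.comb(n - 1, m - 1)
    best = max(math.comb(j - 1, i - 1) * math.comb(n - j, m - i)
               for i in range(1, m + 1))
    return best / denom


for m in (25, 100, 400):
    n = m * m + 1
    print(m, max_term(m, n, n // 2) * math.sqrt(m))
```

For $j$ near $n/2$ the rescaled value stays close to $2/\sqrt{2\pi} \approx 0.8$, consistent with the bounds on $T_1$ and $T_2$ obtained in the proof.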
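Lemma A.1.7: For any $m_c > 0$, there exist $n_c > 0$ and $0 < c_{m_c,n_c} < 1$ such that
\[
z_0 \ge c_{m_c,n_c} \log m
\]
for $m \le m_c$ and $n \ge n_c$.
Proof. Note that
\[
\frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}
= \frac{\frac{(j-1)\cdots(j-i+1)}{(i-1)!} \cdot \frac{(n-j)\cdots(n-j-m+i+1)}{(m-i)!}}{\frac{(n-1)\cdots(n-m+1)}{(m-1)!}}
= \frac{(m-1)!}{(i-1)!(m-i)!} \cdot
\frac{\left(\frac{j}{n} - \frac{1}{n}\right)\cdots\left(\frac{j}{n} - \frac{i-1}{n}\right) \cdot \frac{n-j}{n}\cdots\left(\frac{n-j}{n} - \frac{m-i-1}{n}\right)}
{\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{m-1}{n}\right)}.
\]
Since $1 \le i \le m \le m_c$, for $\varepsilon = \left(\frac12\right)^{2m_c - 1}$ there exists $n_c > 0$ such that
\[
\frac{(m-1)!}{(i-1)!(m-i)!}\left(\frac{j}{n}\right)^{i-1}\left(\frac{n-j}{n}\right)^{m-i} - \varepsilon
\le \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}
\le \frac{(m-1)!}{(i-1)!(m-i)!}\left(\frac{j}{n}\right)^{i-1}\left(\frac{n-j}{n}\right)^{m-i} + \varepsilon
\]
for $m \le m_c$ and $n \ge n_c$. Therefore
\[
z_0 = -\log\left( n^{-1} \sum_{j \in [n]} \max_{i \in [m]} \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \right)
\]
\[
= -\log\left( n^{-1} \left( \sum_{1 \le j < \frac{n}{4}} \max_{i \in [m]} \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}
+ \sum_{\frac{n}{4} \le j < \frac{3n}{4}} \max_{i \in [m]} \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}}
+ \sum_{\frac{3n}{4} \le j \le n} \max_{i \in [m]} \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \right) \right)
\]
\[
\ge -\log\left( n^{-1} \left( \frac{n}{2} + \sum_{\frac{n}{4} \le j < \frac{3n}{4}} \max_{i \in [m]} \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \right) \right)
\]
\[
\ge -\log\left( n^{-1} \left( \frac{n}{2} + \sum_{\frac{n}{4} \le j < \frac{3n}{4}} \max_{i \in [m]} \left( \frac{(m-1)!}{(i-1)!(m-i)!}\left(\frac{j}{n}\right)^{i-1}\left(\frac{n-j}{n}\right)^{m-i} + \varepsilon \right) \right) \right)
\]
\[
= -\log\left( n^{-1} \left( \frac{n}{2}(1 + \varepsilon) + \sum_{\frac{n}{4} \le j < \frac{3n}{4}} \max_{i \in [m]} \frac{(m-1)!}{(i-1)!(m-i)!}\left(\frac{j}{n}\right)^{i-1}\left(\frac{n-j}{n}\right)^{m-i} \right) \right).
\]
According to Lemma A.1.3, $i \in \left[\frac{mj}{n+1}, \frac{mj}{n+1} + 1\right]$. For $\frac{n}{4} \le j < \frac{3n}{4}$,
$\frac{m}{8} \le i < \frac{3m}{4} + 1$. Hence
\[
\max_{i \in [m]} \frac{(m-1)!}{(i-1)!(m-i)!}\left(\frac{j}{n}\right)^{i-1}\left(\frac{n-j}{n}\right)^{m-i}
\le \left(\frac{j}{n} + \frac{n-j}{n}\right)^{m-1} - \left(\frac{j}{n}\right)^{m-1}\left(\frac{n-j}{n}\right)^{m-m}
\le 1 - \left(\frac14\right)^{m-1} \le 1 - \left(\frac14\right)^{m_c - 1}.
\]
Consequently, we have
\[
z_0 \ge -\log\left( n^{-1} \left( \frac{n}{2}(1 + \varepsilon) + \sum_{\frac{n}{4} \le j < \frac{3n}{4}} \max_{i \in [m]} \frac{(m-1)!}{(i-1)!(m-i)!}\left(\frac{j}{n}\right)^{i-1}\left(\frac{n-j}{n}\right)^{m-i} \right) \right)
\]
\[
\ge -\log\left( n^{-1} \left( \frac{n}{2}(1 + \varepsilon) + \frac{n}{2}\left(1 - \left(\frac14\right)^{m_c - 1}\right) \right) \right)
\ge -\log\left(1 - \left(\frac12\right)^{2m_c - 1} + \frac{\varepsilon}{2}\right)
\ge -\log\left(1 - \left(\frac12\right)^{2m_c}\right).
\]
Let $c_{m_c,n_c} = \frac{-\log\left(1 - \left(\frac12\right)^{2m_c}\right)}{\log m_c} > 0$, then
$z_0 \ge -\log\left(1 - \left(\frac12\right)^{2m_c}\right) \ge c_{m_c,n_c} \log m_c \ge c_{m_c,n_c} \log m$. □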
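We now prove the lower bound on $z_0$ in the following Theorem A.1.8.
Theorem A.1.8: For the ideal OPE object, there exists a constant $0 < c < 1$ such that for
$n > m^2 > 1$,
\[
z_0 \ge c \log m.
\]
Proof. Note that $z_0$ is the average min-entropy of the hypergeometric distribution. Thus
$z_0 < \log m$, so trivially we have $c < 1$. It remains to prove that we can in fact choose $c > 0$.
We first prove the bound for $n > m^2$ and $m > m_c$, for some $m_c > 0$. Then we prove the
bound for $m \le m_c$ and $n \ge n_c$, for some $n_c > 0$, which will be used to prove that we can
choose $m_c = 1$.
Based on Lemmas A.1.2, A.1.3, and A.1.6, for $\frac12 < \sigma < 1$, $n > m^2$, and $m \ge m_{\varepsilon,\sigma}$, we
have
\[
z_0 = -\log\left( n^{-1} \sum_{j \in [n]} \max_{i \in [m]} \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \right)
\ge -\log\left( n^{-1} \left( \frac{2n}{m^\sigma} + \sum_{j \in \left(\frac{n}{m^\sigma},\, n - \frac{n}{m^\sigma}\right]} \max_{i \in [m]} \frac{\binom{j-1}{i-1}\binom{n-j}{m-i}}{\binom{n-1}{m-1}} \right) \right)
\]
\[
\ge -\log\left( n^{-1} \left( \frac{2n}{m^\sigma} + \left(n - \frac{2n}{m^\sigma}\right) c_{\varepsilon,\sigma,2}\, m^{-\frac{1-\sigma}{2}} \right) \right)
= -\log\left( \frac{2}{m^\sigma} + \left(1 - \frac{2}{m^\sigma}\right) c_{\varepsilon,\sigma,2}\, m^{-\frac{1-\sigma}{2}} \right)
\]
\[
= \frac{1-\sigma}{2} \log m - \log\left( 2m^{\frac{1-3\sigma}{2}} + \left(1 - \frac{2}{m^\sigma}\right) c_{\varepsilon,\sigma,2} \right).
\]
Note that $c_{\varepsilon,\sigma,2} = \frac{2(1+\varepsilon)^2\left(e^{\frac12}+\varepsilon\right)(e+\varepsilon)}{\sqrt{2\pi}}$ in the
proof of Lemma A.1.6. It implies that
\[
\lim_{m \to \infty} \log\left( 2m^{\frac{1-3\sigma}{2}} + \left(1 - \frac{2}{m^\sigma}\right) c_{\varepsilon,\sigma,2} \right) = \log c_{\varepsilon,\sigma,2}.
\]
Therefore for $c_\sigma = \frac{1-\sigma}{4}$, there exists $m_c > m_{\varepsilon,\sigma}$ such that
\[
\frac{1-\sigma}{4} \log m - \log\left( 2m^{\frac{1-3\sigma}{2}} + \left(1 - \frac{2}{m^\sigma}\right) c_{\varepsilon,\sigma,2} \right) > 0
\]
for $m > m_c$. Hence
\[
z_0 \ge \frac{1-\sigma}{2} \log m - \log\left( 2m^{\frac{1-3\sigma}{2}} + \left(1 - \frac{2}{m^\sigma}\right) c_{\varepsilon,\sigma,2} \right)
= \frac{1-\sigma}{4} \log m + \left( \frac{1-\sigma}{4} \log m - \log\left( 2m^{\frac{1-3\sigma}{2}} + \left(1 - \frac{2}{m^\sigma}\right) c_{\varepsilon,\sigma,2} \right) \right)
\ge c_\sigma \log m
\]
for $n > m^2$ and $m > m_c$.
Lemma A.1.7 allows us to conclude the proof of Theorem A.1.8. Note that the set
$\{(m, n) \mid 1 < m \le m_c,\ m^2 < n < n_c\}$ is finite and that $\frac{z_0}{\log m} > 0$ for $(m, n)$ in that set.
Since we have already obtained two nonzero lower bounds on $\frac{z_0}{\log m}$, the first for the case
$n > m^2$ and $m > m_c$ and the second for the case $n \ge n_c$ and $1 < m \le m_c$, we can choose $c$ to be
the minimum of
1. the bound in the first case,
2. the bound in the second case,
3. the set of values $\frac{z_0}{\log m}$ for $(m, n)$ in the finite set of remaining values.
In other words, we can choose $m_c = 1$, completing the proof of Theorem A.1.8. □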
According to information theory, we have the following Corollary A.1.9 based on Theorem
A.1.8. It gives an upper bound on the probability for the adversary to reverse $x$ from the
ciphertext $\mathcal{E}^*(x, f)$.
Corollary A.1.9: Let $f$ be chosen uniformly randomly from $\mathrm{SIF}_{m,n}$ and let $x$ be chosen
uniformly randomly from $[m]$. Let $E_0$ denote the event that the adversary obtains $x$ from
the ciphertext $\mathcal{E}^*(x, f)$. Then for $n > m^2 > 1$,
\[
\Pr(E_0) \le 2^{-c \log m} = m^{-c}.
\]
Proof. According to Theorem A.1.8, there exists $0 < c < 1$ such that for $n > m^2 > 1$,
$z_0 \ge c \log m$. It follows from information theory that the probability for the adversary to
recover $x$ from the ciphertext $\mathcal{E}^*(x, f)$ is $2^{-z_0} \le 2^{-c \log m} = m^{-c}$. □
The General Case
We now consider the case of $h$ known plaintext attacks with the set of plaintext/ciphertext
pairs $KP = \{(x_i, y_i)\}_{i=1}^{h}$, where $y_i = \mathcal{E}^*(x_i, f)$, $1 \le i \le h$. In this case, the plaintexts
$x_i$ will cut the domain into $h + 1$ segments $[1, x_1), (x_1, x_2), \ldots, (x_{h-1}, x_h), (x_h, m]$, and the
ciphertexts $y_i$ will similarly cut the range into $h + 1$ segments $[1, y_1), (y_1, y_2), \ldots, (y_{h-1}, y_h),
(y_h, n]$. Since the encryption algorithm $\mathcal{E}^*$ is order-preserving, it encrypts the plaintexts from
the sub-domains $[x_i + 1, x_{i+1} - 1]$ to the sub-ranges $[y_i + 1, y_{i+1} - 1]$, where $0 \le i \le h$ and
$x_0 = y_0 = 0$, $x_{h+1} = m + 1$, $y_{h+1} = n + 1$. We will proceed by applying Corollary A.1.9 to
each pair of $[x_i + 1, x_{i+1} - 1]$ and $[y_i + 1, y_{i+1} - 1]$, $0 \le i \le h$.
In order to do so, we first give the following lemma. It analyzes the relationship of the
distance between a pair of plaintexts $x$ and $x'$ with the distance between the corresponding
pair of ciphertexts $\mathcal{E}^*(x, f)$ and $\mathcal{E}^*(x', f)$, in particular that for $n \ge m^3$,
$\mathcal{E}^*(x', f) - \mathcal{E}^*(x, f) - 1$ is greater than $(x' - x - 1)^2$ with a dominant probability.
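Lemma A.1.10: Suppose that $n \ge m^3$. Let $x \in [m]$ and $1 \le \delta \le m - x - 1$. Let
$y = \mathcal{E}^*(x, f)$ and choose $\delta'$ to satisfy $y + \delta' + 1 = \mathcal{E}^*(x + \delta + 1, f)$, where $f$ is
chosen uniformly randomly from $\mathrm{SIF}_{m,n}$. Then
\[
\Pr(\delta' \le \delta^2) \le \frac{1}{m^2}.
\]
Proof. Note that there are
\[
\binom{y-1}{x-1}\binom{\delta'}{\delta}\binom{n-y-\delta'-1}{m-x-\delta-1}
\]
functions in $\mathrm{SIF}_{m,n}$ that map $x$ to $y$ and $x + \delta + 1$ to $y + \delta' + 1$. Therefore
\[
\Pr(\delta' \le \delta^2) = \sum_{\delta'=\delta}^{\delta^2}\ \sum_{y=x}^{n-m-\delta'+x+\delta}
\frac{\binom{y-1}{x-1}\binom{\delta'}{\delta}\binom{n-y-\delta'-1}{m-x-\delta-1}}{\binom{n}{m}}
\le n \sum_{\delta'=\delta}^{\delta^2} \frac{\binom{\delta'}{\delta}\binom{n-\delta'-2}{m-\delta-2}}{\binom{n}{m}}.
\]
For $\delta \le \delta' \le \delta^2$,
\[
\binom{\delta'}{\delta} \le \binom{\delta^2}{\delta}
\le \frac{\delta^2}{\delta} \cdot \frac{\delta^2 - 1}{\delta - 1} \cdot \frac{\delta^2 - 2}{\delta - 2}\, \delta^{2(\delta-3)}
\le \delta(\delta+1)(\delta+3)\, m^{2(\delta-3)}.
\]
Also,
\[
n\binom{n-\delta'-2}{m-\delta-2}\binom{n}{m}^{-1}
= n\, \frac{(n-\delta'-2)\cdots(n-m-\delta'+\delta+1)}{(m-\delta-2)!} \cdot \frac{m!}{n\cdots(n-m+1)}
\]
\[
\le n\, \frac{m(m-1)\cdots(m-\delta-1)}{n(n-1)\cdots(n-\delta-1)} \cdot \frac{(n-\delta'-2)\cdots(n-m-\delta'+\delta+1)}{(n-\delta-2)\cdots(n-m+1)}
\le m\left(\frac{1}{m^2}\cdots\frac{1}{m^2} \cdot 1\cdots 1\right) = m^{-2\delta-1}.
\]
Then we have
\[
\Pr(\delta' \le \delta^2) \le \sum_{\delta'=\delta}^{\delta^2} \delta(\delta+1)(\delta+3)\, m^{-7}
\le \delta(\delta+1)\left((\delta+3)(\delta^2-\delta)\right) m^{-7}
< \frac{\delta(\delta+1)(\delta+1)^3}{m^7} < \frac{m^5}{m^7} = \frac{1}{m^2}. □
\]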
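For small parameters, the probability in Lemma A.1.10 can be computed exactly from the counting formula at the start of the proof. The sketch below uses the illustrative name `gap_probability`; the parameter values in the example are ours, chosen so that $n = m^3$.

```python
from fractions import Fraction
from math import comb


def gap_probability(m, n, x, delta, max_gap):
    """Exact Pr(delta' <= max_gap) for f uniform over SIF_{m,n}.

    delta' is the number of range points strictly between f(x) and f(x+delta+1).
    """
    count = 0
    for dp in range(delta, max_gap + 1):
        # y = f(x) ranges over the values that leave room for the remaining points.
        for y in range(x, n - m - dp + x + delta + 1):
            count += (comb(y - 1, x - 1) * comb(dp, delta)
                      * comb(n - y - dp - 1, m - x - delta - 1))
    return Fraction(count, comb(n, m))


m, n, x, delta = 6, 6 ** 3, 2, 2
p_small_gap = gap_probability(m, n, x, delta, delta ** 2)
```

Summing over every admissible $\delta'$ (up to $n - m + \delta$) gives probability 1, which is a quick consistency check on the counting formula.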
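Now we prove the generalization of Corollary A.1.9 to the case of arbitrary $h$.
Proposition A.1.11: Let $f$ be chosen uniformly randomly from $\mathrm{SIF}_{m,n}$, where $n \ge m^3$.
Assume that the adversary knows $h$ plaintext/ciphertext pairs $(x_i, \mathcal{E}^*(x_i, f))$, $1 \le i \le h$. Let $x$
be chosen uniformly randomly from $[m]^* = [m] - \{x_i\}_{i=1}^{h}$ and let $E_h$ denote the event that
the adversary obtains $x$ from the ciphertext $\mathcal{E}^*(x, f)$ based on $KP$. Then
\[
\Pr(E_h) \le \left(\frac{h+1}{m-h}\right)^{c} + \frac{1}{m^2}.
\]
Proof. Without loss of generality, assume that $x_0 = 0 < x_1 < \cdots < x_h < x_{h+1} = m + 1$.
Let $D_j = [x_{j-1} + 1, x_j - 1]$, $1 \le j \le h + 1$. Then
\[
\bigcup_{1 \le j \le h+1} D_j = [m]^* = [m] - \{x_i\}_{i=1}^{h}.
\]
Let $\delta_j = |D_j| = x_j - x_{j-1} - 1$ and $\delta'_j = \mathcal{E}^*(x_j, f) - \mathcal{E}^*(x_{j-1}, f) - 1$, for
$1 \le j \le h + 1$. By Corollary A.1.9 and Lemma A.1.10, we know that
\[
\Pr(E_h \mid x \in D_j) = \Pr(E_h \mid x \in D_j, \delta'_j > \delta_j^2)\Pr(\delta'_j > \delta_j^2) + \Pr(E_h \mid x \in D_j, \delta'_j \le \delta_j^2)\Pr(\delta'_j \le \delta_j^2)
\]
\[
\le \Pr(E_h \mid x \in D_j, \delta'_j > \delta_j^2) + \Pr(\delta'_j \le \delta_j^2)
\le \delta_j^{-c} + m^{-2}.
\]
Since $\sum_{1 \le j \le h+1} \delta_j = m - h$, we have
\[
\sum_{1 \le j \le h+1} \frac{\delta_j^{1-c}}{m-h} \le \sum_{1 \le j \le h+1} \frac{\left((m-h)/(h+1)\right)^{1-c}}{m-h} = \left(\frac{h+1}{m-h}\right)^{c}. \qquad (1)
\]
Thus, for $n \ge m^3$, we have
\[
\Pr(E_h) = \sum_{1 \le j \le h+1} \Pr(E_h \mid x \in D_j)\Pr(x \in D_j)
\le \sum_{1 \le j \le h+1} \left(\delta_j^{-c} + m^{-2}\right)\frac{\delta_j}{m-h}
\]
\[
= \sum_{1 \le j \le h+1} \frac{\delta_j^{1-c}}{m-h} + \sum_{1 \le j \le h+1} \frac{\delta_j}{m^2(m-h)}
\le \left(\frac{h+1}{m-h}\right)^{c} + \frac{1}{m^2}. □
\]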
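Inequality (1) says that, among all ways of splitting the $m - h$ unknown plaintexts into $h + 1$ segments, equal segments maximize the sum. The sketch below compares an even and an uneven split numerically. The function names are illustrative, and the value $c = 0.5$ is an assumed placeholder rather than a value established by the proofs above.

```python
def segment_term(deltas, c):
    """Left-hand side of inequality (1): sum_j delta_j^(1-c) / (m - h)."""
    total = sum(deltas)  # equals m - h
    return sum(d ** (1 - c) for d in deltas) / total


def recovery_bound(m, h, c):
    """Bound of Proposition A.1.11: ((h+1)/(m-h))^c + 1/m^2."""
    return ((h + 1) / (m - h)) ** c + 1 / m ** 2


c = 0.5  # assumed value, for illustration only
m, h = 1023, 3
even = segment_term([255, 255, 255, 255], c)
uneven = segment_term([1, 1, 1, 1017], c)
bound = recovery_bound(m, h, c)
```

Here `even` coincides with $\left(\frac{h+1}{m-h}\right)^c$, while the uneven split gives a strictly smaller value, so the bound is driven by evenly spread known plaintexts.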
Remark A.1.1: Note that if $h = o(m^{\varepsilon})$, $0 < \varepsilon < 1$, the probability
$\Pr(E_h) \le \left(\frac{h+1}{m-h}\right)^{c} + \frac{1}{m^2} \le \left(\frac{m^{\varepsilon}+1}{m-m^{\varepsilon}}\right)^{c} + \frac{1}{m^2}
\approx \frac{1}{m^{(1-\varepsilon)c}} + \frac{1}{m^2}$ is a negligible function of the security parameter $\log m$.
Hence, it implies that in the case $n = m^3$, the probability for the adversary to fully recover
the plaintext is a negligible function of the security parameter $\log m$ if the number of known
plaintext/ciphertext pairs satisfies $h = o(m^{\varepsilon})$, where $0 < \varepsilon < 1$.
Remark A.1.2: Note that the inequality (1) in the proof of Proposition A.1.11 becomes
an equality if and only if $\delta_j = \frac{m-h}{h+1}$, $1 \le j \le h + 1$. This implies that when the known
plaintexts are evenly distributed in the domain, the attack is most effective. Consider the
following two types of known plaintext attacks. In the first case, the adversary knows the
plaintext/ciphertext pair $(1, \mathcal{E}^*(1, f))$, so by Corollary A.1.9, we have $\Pr(E_1) \le \frac{1}{(m-1)^c}$.
In the second case, the adversary knows the plaintext/ciphertext pair
$\left(\frac{m}{2}, \mathcal{E}^*\left(\frac{m}{2}, f\right)\right)$, so by Proposition A.1.11, we have
$\Pr(E_1) \le \left(\frac{2}{m-1}\right)^{c} + \frac{1}{m^2}$. Since
$\frac{1}{(m-1)^c} \le \left(\frac{2}{m-1}\right)^{c} + \frac{1}{m^2}$, this implies
that the second attack is more effective.
Remark A.1.3: Note that the bound given in Proposition A.1.11 for $h = 0$ is asymptotically
identical to that given in Corollary A.1.9. The bound given in Proposition A.1.11
equals 1 when $h$ reaches $\frac{m}{2}$. Actually, if the adversary knows the plaintext/ciphertext pairs
$(x_i, f(x_i))$ where $x_i = 2i - 1$, $1 \le i \le m/2$, then the adversary can reverse any newly given
ciphertext.
Now we show the corresponding lower bound on zh for ideal OPE object in the following
theorem.
Theorem A.1.12: For the ideal OPE object, there exists a constant $0 < c < 1$ such that
for $n \ge m^3 > 1$,
\[
z_h \ge c \log\frac{m-h}{h+1} - \frac{1}{(\ln 2)\, m^{2-c}}.
\]
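Proof. Since it has been proved in Proposition A.1.11 that $\Pr(E_h) \le \left(\frac{h+1}{m-h}\right)^{c} + \frac{1}{m^2}$, we
have
\begin{align*}
z_h = \log\frac{1}{\Pr(E_h)} &\ge \log\frac{1}{\left(\frac{h+1}{m-h}\right)^{c} + \frac{1}{m^2}} \\
&= c \log\frac{m-h}{h+1} - \log\left(1 + \frac{\left(\frac{m-h}{h+1}\right)^{c}}{m^2}\right) \\
&\ge c \log\frac{m-h}{h+1} - \log\left(1 + \frac{1}{m^{2-c}}\right) \\
&\ge c \log\frac{m-h}{h+1} - \frac{1}{(\ln 2)\, m^{2-c}}. \tag{2}
\end{align*}
The correctness of inequality (2) is based on the fact that $\frac{x}{\ln 2} \ge \log(1 + x)$ for $x > 0$, since
\[
\frac{d}{dx}\left(\frac{x}{\ln 2} - \log(1 + x)\right) = \frac{1}{\ln 2} - \frac{1}{(\ln 2)(1+x)} = \frac{x}{(\ln 2)(1+x)} > 0 \quad \text{for } x > 0. \qquad \square
\]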
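The chain of inequalities in the proof can be spot-checked numerically. The following sketch, with hypothetical function names and an assumed constant $c$, compares the quantity $\log(1/p)$ computed from the probability bound $p$ of Proposition A.1.11 with the closed-form lower bound of Theorem A.1.12, and also tests the auxiliary inequality $x/\ln 2 \ge \log(1+x)$ (logarithms to base 2).

```python
import math


def z_from_probability_bound(m: int, h: int, c: float) -> float:
    """log2(1/p) where p = ((h+1)/(m-h))**c + 1/m**2 is the bound of Proposition A.1.11."""
    p = ((h + 1) / (m - h)) ** c + 1 / m**2
    return math.log2(1 / p)


def z_lower_bound(m: int, h: int, c: float) -> float:
    """Closed-form lower bound c*log2((m-h)/(h+1)) - 1/((ln 2) * m**(2-c))."""
    return c * math.log2((m - h) / (h + 1)) - 1 / (math.log(2) * m ** (2 - c))


def log_inequality_holds(x: float) -> bool:
    """Check x/ln 2 >= log2(1 + x), the fact used for inequality (2)."""
    return x / math.log(2) >= math.log2(1 + x)


if __name__ == "__main__":
    m, h, c = 1024, 3, 0.5  # c = 0.5 is an assumed constant
    print(z_from_probability_bound(m, h, c), z_lower_bound(m, h, c))
```

For these sample parameters the two values differ only in the fifth decimal place, which reflects how small the correction term $1/((\ln 2)\, m^{2-c})$ is.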
Numerical Computation of the Value of c
Here we include a graph showing numerically computed values of $c' = \frac{z_0}{\log m}$ as a function of
$m$. We include the cases $n = m^2$ and $n = m^3$. These estimates translate into estimates for
$z_h$, the number of bits of information that are guaranteed to remain secret from the adversary
in the case of an attack with $h$ known plaintext/ciphertext pairs. The corresponding
probability for the adversary to recover the plaintext from the ciphertext given $h$ known
plaintext/ciphertext pairs is bounded by $\left(\frac{h+1}{m-h}\right)^{c'} + \frac{1}{m^2}$.
As can be seen from Figure A.1, for $20 \le m \le 500$, the value of $c'$ is well over 0.4,
indicating that more than 40% of the bits of a plaintext are protected from the adversary,
rendering it unlikely for the adversary to recover the complete plaintext despite the order
preserving nature of the encryption scheme. A more precise analysis of the values of $c'$ for
large $m$ would greatly enhance our understanding of the security of the algorithm for typical
values of $m$, such as $m = 2^{1024}$. We have proved in Theorem A.1.8 that $c' \to c$ as $m \to \infty$
for both $n = m^2$ and $n = m^3$. We conjecture that $c \approx 0.5$ in both cases.

Figure A.1. Numerically Computed $c' = z_0/\log m$ Against $m$.
A.2 Security Proof for PPE Schemes
Existing cryptographic security proofs for PPE schemes only reduce the security of real PPE
schemes to the security of the ideal PPE object by showing that the two are computationally
indistinguishable. However, this is not a complete security proof, since the security of the ideal
PPE object is unknown and no security analysis in the literature establishes its
security level. In this section, we complete the existing security proof by proving that the
ideal PPE object is secure under IND-PCPA.
To prove that the ideal PPE object is secure under IND-PCPA, we need to show that the
number of prefix-preserving functions mapping $x_i^0$ to $E^*(x_i^b, k)$ equals the number of
prefix-preserving functions mapping $x_i^1$ to $E^*(x_i^b, k)$, where $(x_i^0, x_i^1)$ are the plaintext pairs
the adversary queries, $1 \le i \le h$. In other words, there is no bias for the adversary's guess.
However, the proof is not straightforward, because it must use the prefix-preserving property
to count the prefix-preserving functions mapping $x_i^0$ (resp. $x_i^1$) to $E^*(x_i^b, k)$,
where $x_i^0$, $x_i^1$, and $E^*(x_i^b, k)$ are indeterminates, $1 \le i \le h$.
To overcome this difficulty, we represent the prefix-preserving function by a tree-based
function. The tree-based function consists of a plaintext tree and a ciphertext tree. The
plaintext tree is a complete binary tree. Each edge connecting a parent node to its left child
node is labeled by 0, and each edge connecting a parent node to its right child node is labeled
by 1. Each leaf node in the plaintext tree is labeled by the binary string composed of the
labels of the edges from the root to itself (the label represents the plaintext string). The
ciphertext tree is the same as the plaintext tree except for its labels. Each edge connecting
a parent node to its left child node can be labeled by 0 or 1. If it is labeled by 0 (resp.
1), then the corresponding edge connecting the parent node to its right child node must
be labeled by 1 (resp. 0). A tree-based function maps the i-th leaf node in the plaintext
tree to the i-th leaf node in the ciphertext tree. This implies that the labels of each path in the
ciphertext tree (from the root node to the leaf node) represent the ciphertext of the plaintext
represented by the corresponding path in the plaintext tree. In other words, the label of the
i-th leaf node in the ciphertext tree represents the ciphertext of the label of the i-th leaf node
in the plaintext tree.
Once the prefix-preserving function is represented by the tree-based function, it suffices
to show that the number of tree-based functions mapping $x_i^0$ to $E^*(x_i^b, k)$ equals the number
of tree-based functions mapping $x_i^1$ to $E^*(x_i^b, k)$, $1 \le i \le h$. An important observation is
that, given $h$ plaintext/ciphertext pairs, some labels of the edges in the ciphertext tree are
determined while others are not. Also, the number of undetermined edge labels
in the ciphertext tree decides the number of tree-based functions. Therefore the security
proof reduces to showing that the number of undetermined edge labels in the
ciphertext tree given $(x_i^0, E^*(x_i^b, k))$ equals the number of undetermined edge labels in
the ciphertext tree given $(x_i^1, E^*(x_i^b, k))$, $1 \le i \le h$. We use mathematical induction on $h$ to
prove the equality of these two numbers.
A.2.1 Tree-Based Function Definition
Before defining the tree-based function, we first define some preliminary concepts. Then,
the tree-based function is formally defined in Definition A.2.3.
Definition A.2.1: Let $T = (V^T, E^T)$ be a tree, where $V^T$ denotes the set of nodes and
$E^T$ denotes the set of edges. The nodes in $V^T$ can be partitioned into the set of internal
nodes $V_I^T$ and the set of leaf nodes $V_L^T$, where $V^T = V_I^T \cup V_L^T$ and $V_I^T \cap V_L^T = \emptyset$.
Let $v_i^{L,T}$ denote the $i$-th leaf node in $T$, where the leaf nodes are indexed from the leftmost
leaf node (the first) to the rightmost leaf node (the $|V_L^T|$-th), $1 \le i \le |V_L^T|$. For $v \in V_L^T$
with depth $n$, let $P(v)$ denote the path from the root to $v$ and $P(v)[1], \cdots, P(v)[n+1]$ denote
the nodes on the path, where $P(v)[1]$ is the root, $P(v)[n+1] = v$, and $P(v)[2], \cdots, P(v)[n]$
are internal nodes connecting the root and $v$. Let $P_I(v) = \{P(v)[i] \mid 1 \le i \le n\}$ denote the
set of internal nodes on the path $P(v)$, $P_L(v) = \{v\}$ denote the set of leaf nodes on the path
$P(v)$, and $P_E(v) = \{P(v)[i]P(v)[i+1] \mid 1 \le i \le n\}$ denote the set of edges on the path
$P(v)$. □
In the tree-based function, the domain of the plaintexts and the range of the ciphertexts
are two labeled trees. The labeling rules (defined in Definition A.2.2) guarantee the
prefix-preserving property of the tree-based function. In Definition A.2.2, we first define the
internal nodes labeled (INL) tree, where the internal nodes are labeled with 0 or 1, and then define
the nodes and edges labeled (NEL) tree, where the labels are extended from the internal nodes
to the edges and leaf nodes.
Definition A.2.2 (INL and NEL trees): An internal nodes labeled (INL) tree is defined to
be a pair $(T, \mathcal{L})$, where $T = (V^T, E^T)$ is a tree and
\[
\mathcal{L} : V_I^T \to \{0, 1\}
\]
is a label function over internal nodes, which is called the INL function. An INL tree
$(T, \mathcal{L})$ uniquely defines the nodes and edges labeled (NEL) tree $(T, \mathcal{L}^*)$, where the NEL
function
\[
\mathcal{L}^* : V^T \cup E^T \to \{0, 1\}^*
\]
is defined by the following rules.
(1) For $v \in V_I^T$, $\mathcal{L}^*(v) \triangleq \mathcal{L}(v)$.
(2) Let $e \in E^T$, where $e = v_e^1 v_e^2$ and $v_e^1$, $v_e^2$ denote the two endpoints of $e$. Without loss of
generality, assume that $v_e^1$ is the parent node and $v_e^2$ is the child node. Then $\mathcal{L}^*(e) \triangleq \mathcal{L}(v_e^1)$
if $v_e^2$ is the left child node of $v_e^1$; $\mathcal{L}^*(e) \triangleq 1 \oplus \mathcal{L}(v_e^1)$ if $v_e^2$ is the right child node of $v_e^1$.
(3) For $v \in V_L^T$ with depth $n$, $\mathcal{L}^*(v)$ is a string of $n$ bits. Let $P_E(v) = \{P(v)[i]P(v)[i+1] \mid 1 \le i \le n\}$
denote the set of edges on the path $P(v)$, where $P(v)[1]$ is the root,
$P(v)[n+1] = v$, and $P(v)[2], \cdots, P(v)[n]$ are internal nodes connecting the root and $v$. Then
$\mathcal{L}^*(v) \triangleq \mathcal{L}^*(P(v)[1]P(v)[2]) \cdots \mathcal{L}^*(P(v)[n]P(v)[n+1])$. □
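The three labeling rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: internal nodes of a complete binary tree are named by their root-to-node path ("" for the root, "0" for its left child, and so on), the INL function is a plain dictionary from these names to bits, and `nel_leaf_labels` is a hypothetical helper name.

```python
def nel_leaf_labels(inl: dict[str, int], height: int) -> list[str]:
    """Leaf labels L*(v) in left-to-right order for a complete binary tree.

    `inl` maps each internal node (named by its position path) to its INL label.
    Rule (2): the left edge carries L(v), the right edge carries 1 xor L(v).
    Rule (3): a leaf label is the concatenation of the edge labels on its path.
    """
    leaves: list[str] = []

    def walk(node: str, label: str) -> None:
        if len(node) == height:  # reached a leaf
            leaves.append(label)
            return
        bit = inl[node]
        walk(node + "0", label + str(bit))      # left child
        walk(node + "1", label + str(1 ^ bit))  # right child

    walk("", "")
    return leaves


if __name__ == "__main__":
    all_zero = {"": 0, "0": 0, "1": 0}
    print(nel_leaf_labels(all_zero, 2))                   # labels of an all-zero INL tree
    print(nel_leaf_labels({"": 1, "0": 0, "1": 1}, 2))    # labels after flipping some nodes
```

With the all-zero INL function the leaves are labeled by their own positions in binary, which is the situation of a tree whose left edges are all labeled 0 and right edges all labeled 1.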
Now we are ready to define the tree-based function, which is given in Definition A.2.3.
The tree-based function is defined with respect to two NEL trees, which are called the
plaintext tree and the ciphertext tree, respectively. It maps the label of the i-th leaf node
in the plaintext tree to the label of the i-th leaf node in the ciphertext tree.
Definition A.2.3 (tree-based function): The tree-based function is defined with respect
to two NEL trees: the plaintext tree $PT_l = (T_{PT_l}, \mathcal{L}^*_{PT_l})$ and the ciphertext tree
$CT_l = (T_{CT_l}, \mathcal{L}^*_{CT_l})$, where $T_{PT_l}$ and $T_{CT_l}$ are two complete binary trees with height $l$. In the
plaintext tree $PT_l$, the INL function $\mathcal{L}_{PT_l}(v) \equiv 0$ for any internal node $v \in V_I^{T_{PT_l}}$. But
in the ciphertext tree $CT_l$, the INL function $\mathcal{L}_{CT_l}(v)$ could be 0 or 1 for any internal node
$v \in V_I^{T_{CT_l}}$. The INL functions $\mathcal{L}_{PT_l}$ and $\mathcal{L}_{CT_l}$ uniquely define the NEL functions $\mathcal{L}^*_{PT_l}$ and
$\mathcal{L}^*_{CT_l}$ following the rules defined in Definition A.2.2.
Given $PT_l = (T_{PT_l}, \mathcal{L}^*_{PT_l})$ and $CT_l = (T_{CT_l}, \mathcal{L}^*_{CT_l})$, we define the corresponding tree-based
function
\[
f_{PT_l,CT_l} : \{0, 1\}^l \to \{0, 1\}^l, \qquad
f_{PT_l,CT_l}\big(\mathcal{L}^*_{PT_l}(v_i^{L,T_{PT_l}})\big) \triangleq \mathcal{L}^*_{CT_l}(v_i^{L,T_{CT_l}}),
\]
where $v_i^{L,T_{PT_l}}$ and $v_i^{L,T_{CT_l}}$ denote the $i$-th leaf nodes in the plaintext tree and ciphertext
tree, respectively, $1 \le i \le 2^l$. Let $TBF_l$ denote the set of all tree-based functions, i.e.,
$TBF_l = \{f_{PT_l,CT_l} \mid \mathcal{L}_{CT_l} : V_I^{T_{CT_l}} \to \{0, 1\}\}$. □
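A tree-based function can be evaluated directly from the INL function of the ciphertext tree. The sketch below is an illustration rather than the dissertation's implementation: it assumes internal nodes are named by their position path (so the plaintext prefix `x[:u]` names the node visited at depth `u`), and it uses the observation that the ciphertext edge label at that node is the node's INL label XOR the plaintext bit, which follows from rule (2) of Definition A.2.2 with an all-zero plaintext tree. The names `tree_based_function` and `lcp_len` are hypothetical.

```python
from itertools import product


def tree_based_function(inl_ct: dict[str, int], height: int):
    """Build f_{PT,CT} from the ciphertext tree's INL function `inl_ct`."""

    def f(x: str) -> str:
        if len(x) != height:
            raise ValueError("plaintext must have exactly `height` bits")
        # At depth u the path passes through internal node x[:u]; going left
        # (bit 0) yields label L(v), going right (bit 1) yields 1 xor L(v).
        return "".join(str(inl_ct[x[:u]] ^ int(x[u])) for u in range(height))

    return f


def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for p, q in zip(a, b):
        if p != q:
            break
        n += 1
    return n


if __name__ == "__main__":
    f = tree_based_function({"": 1, "0": 0, "1": 1}, 2)
    domain = ["".join(bits) for bits in product("01", repeat=2)]
    for x in domain:
        print(x, "->", f(x))
```

In this example the leaves 00, 01, 10, 11 are mapped to 10, 11, 01, 00, and every pair of plaintexts keeps the length of its longest common prefix.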
Remark A.2.1: In the definition of the tree-based function $f_{PT_l,CT_l}$, the plaintext tree
$PT_l$ is fixed since $\mathcal{L}_{PT_l}$ is fixed, but the ciphertext tree is not fixed. Since $\mathcal{L}_{CT_l}$ uniquely
defines $\mathcal{L}^*_{CT_l}$, it also determines the ciphertext tree $CT_l$. Therefore, the INL function
$\mathcal{L}_{CT_l}$ of the ciphertext tree uniquely determines the tree-based function $f_{PT_l,CT_l}$.
We show the equivalence of the tree-based function and the prefix-preserving function in
Proposition A.2.1.
Proposition A.2.1: $TBF_l = F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$.
Proof. First we show that $TBF_l \subseteq F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$. For any $x_1, x_2 \in \{0, 1\}^l$, there exist
$v_{j_1}^{L,T_{PT}}, v_{j_2}^{L,T_{PT}} \in V_L^{T_{PT}}$ such that $\mathcal{L}^*_{PT}(v_{j_1}^{L,T_{PT}}) = x_1$ and $\mathcal{L}^*_{PT}(v_{j_2}^{L,T_{PT}}) = x_2$. According to the
definition of $\mathcal{L}^*_{PT}$, the paths $P(v_{j_1}^{L,T_{PT}})$ and $P(v_{j_2}^{L,T_{PT}})$ share $|LCP(x_1, x_2)|$ common
edges. Therefore, on the ciphertext tree, the paths $P(v_{j_1}^{L,T_{CT}})$ and $P(v_{j_2}^{L,T_{CT}})$ also share
$|LCP(x_1, x_2)|$ common edges. For any tree-based function $f_{PT,CT} \in TBF_l$, we have
\[
f_{PT,CT}(x_1) = f_{PT,CT}\big(\mathcal{L}^*_{PT}(v_{j_1}^{L,T_{PT}})\big) \triangleq \mathcal{L}^*_{CT}(v_{j_1}^{L,T_{CT}})
\]
and
\[
f_{PT,CT}(x_2) = f_{PT,CT}\big(\mathcal{L}^*_{PT}(v_{j_2}^{L,T_{PT}})\big) \triangleq \mathcal{L}^*_{CT}(v_{j_2}^{L,T_{CT}}).
\]
Hence, $|LCP(f_{PT,CT}(x_1), f_{PT,CT}(x_2))| = |LCP(x_1, x_2)|$ according to the definition of $\mathcal{L}^*_{CT}$.
This implies that $f_{PT,CT} \in F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$, so $TBF_l \subseteq F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$. Second, according to the
definition of $TBF_l$ and the cardinality of $F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$ computed in Lemma 6.1.1,
\[
|TBF_l| = |\{\mathcal{L}_{CT} : V_I^{T_{CT}} \to \{0, 1\}\}| = 2^{|V_I^{T_{CT}}|} = 2^{2^l - 1} = |F^{PPE}_{\{0,1\}^l,\{0,1\}^l}|.
\]
Since $TBF_l \subseteq F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$ and $|TBF_l| = |F^{PPE}_{\{0,1\}^l,\{0,1\}^l}|$, consequently $TBF_l = F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$. □
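For very small $l$, the equality of the two sets can be confirmed by exhaustive enumeration. The following sketch is a brute-force check, not a proof: it assumes that a prefix-preserving function is one satisfying $|LCP(f(x_1), f(x_2))| = |LCP(x_1, x_2)|$ for all pairs (the property used in the proof above), and it enumerates only permutations because such a function is necessarily injective (distinct inputs have a common prefix shorter than $l$, so their images must differ). All function names are hypothetical.

```python
from itertools import permutations, product


def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for p, q in zip(a, b):
        if p != q:
            break
        n += 1
    return n


def tree_based_functions(height: int) -> set[tuple[str, ...]]:
    """All tree-based functions, each given as the tuple of images of 0...0, ..., 1...1."""
    nodes = ["".join(b) for d in range(height) for b in product("01", repeat=d)]
    domain = ["".join(b) for b in product("01", repeat=height)]
    result = set()
    for labels in product((0, 1), repeat=len(nodes)):
        inl = dict(zip(nodes, labels))  # INL function of the ciphertext tree
        image = tuple(
            "".join(str(inl[x[:u]] ^ int(x[u])) for u in range(height)) for x in domain
        )
        result.add(image)
    return result


def prefix_preserving_functions(height: int) -> set[tuple[str, ...]]:
    """All bijections on {0,1}^height that preserve longest-common-prefix lengths."""
    domain = ["".join(b) for b in product("01", repeat=height)]
    size = len(domain)
    result = set()
    for image in permutations(domain):
        if all(
            lcp_len(image[i], image[j]) == lcp_len(domain[i], domain[j])
            for i in range(size)
            for j in range(i + 1, size)
        ):
            result.add(image)
    return result


if __name__ == "__main__":
    for height in (1, 2, 3):
        tbf = tree_based_functions(height)
        ppf = prefix_preserving_functions(height)
        print(height, len(tbf), tbf == ppf)
```

The enumeration grows as $(2^l)!$, so it is only practical for $l \le 3$; it is meant as a sanity check of the counting argument $|TBF_l| = 2^{2^l - 1}$.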
Remark A.2.2: According to the proof of Proposition A.2.1, the prefix-preserving
property of the tree-based function can be geometrically interpreted as follows. Any
$x \in \{0, 1\}^l$ corresponds to the $j$-th leaf node $v_j^{L,T_{PT_l}} \in V_L^{T_{PT_l}}$ in the plaintext tree, where
$j = B(x) + 1$ and $B(x)$ denotes the integer whose binary representation is $x$. For $x_1, x_2 \in \{0, 1\}^l$, the paths
$P(v_{j_1}^{L,T_{PT_l}})$ and $P(v_{j_2}^{L,T_{PT_l}})$ on the plaintext tree share $|LCP(x_1, x_2)|$ common edges.
The tree-based function is prefix-preserving in the sense that the paths $P(v_{j_1}^{L,T_{CT_l}})$
and $P(v_{j_2}^{L,T_{CT_l}})$ on the ciphertext tree also share $|LCP(x_1, x_2)|$ common edges.
Based on Proposition A.2.1, we give an alternative definition for the ideal PPE object in
Definition A.2.4.
Definition A.2.4 (alternative definition of ideal PPE object): It has the same definition
as that of the original ideal PPE object except that $K^*$ uniformly randomly selects $f$ from
$TBF_l$ instead of $F^{PPE}_{\{0,1\}^l,\{0,1\}^l}$. □
A.2.2 Security Proof
Now we prove that the ideal PPE object is secure under IND-PCPA. This also implies that
the real PPE schemes, which are computationally indistinguishable from the ideal PPE object,
achieve the highest security notion for PPE. Essentially, we need to show that in the security
notion IND-PCPA, the number of tree-based functions mapping $x_i^0$ to $E^*(x_i^b, k)$ equals
the number of tree-based functions mapping $x_i^1$ to $E^*(x_i^b, k)$, where $(x_i^0, x_i^1)$ are the queried
plaintext pairs, $1 \le i \le h$. Since $\mathcal{L}_{CT_l}$ uniquely determines the tree-based function, in
order to count those numbers, we need to consider the effect on $\mathcal{L}_{CT_l}$ (part of the mapping
will be determined) of being given the plaintext/ciphertext pairs $(x_i^0, E^*(x_i^b, k))$ or $(x_i^1, E^*(x_i^b, k))$,
$1 \le i \le h$.
Lemma A.2.2: Given $h$ plaintext/ciphertext pairs $(x_i, y_i)$ of $f_{PT_l,CT_l}$, the labels
of the internal nodes on the $h$ paths $P(v_{j_i})$ are determined, where $v_{j_i}$ is decided by $x_i$ and the
labels are decided by $y_i$, $1 \le i \le h$.
Proof. Consider a plaintext/ciphertext pair $(x_i, y_i) \in \{0, 1\}^l \times \{0, 1\}^l$ such that $f_{PT_l,CT_l}(x_i) = y_i$.
We assume that $x_i = \mathcal{L}^*_{PT_l}(v_{j_i}^{L,T_{PT_l}})$, where $v_{j_i}^{L,T_{PT_l}}$ denotes the $j_i$-th leaf node in the plaintext
tree, and $y_i = y_{i1} \cdots y_{il}$, $y_{iu} \in \{0, 1\}$, $1 \le u \le l$. According to the definition of the tree-based
function,
\[
y_i = f_{PT_l,CT_l}(x_i) = f_{PT_l,CT_l}\big(\mathcal{L}^*_{PT_l}(v_{j_i}^{L,T_{PT_l}})\big) = \mathcal{L}^*_{CT_l}(v_{j_i}^{L,T_{CT_l}}).
\]
Therefore, $\mathcal{L}^*_{CT_l}\big(P(v_{j_i}^{L,T_{CT_l}})[u]\,P(v_{j_i}^{L,T_{CT_l}})[u+1]\big) = y_{iu}$ for $1 \le u \le l$ according to the definition
of $\mathcal{L}^*_{CT_l}$. This implies that the labels of the edges on the path $P(v_{j_i}^{L,T_{CT_l}})$ are determined by $y_i$.
Since the labels of the internal nodes on the path and the labels of the edges on the same path
can be mutually decided, the labels of the internal nodes on the path $P(v_{j_i}^{L,T_{CT_l}})$ are decided
by $y_i$. Hence, given the plaintext/ciphertext pairs $(x_i, y_i)$ where $x_i = \mathcal{L}^*_{PT_l}(v_{j_i}^{L,T_{PT_l}})$, for the
INL function $\mathcal{L}_{CT_l}$, the labels of the internal nodes on the path $P(v_{j_i}^{L,T_{CT_l}})$ are determined,
$1 \le i \le h$. □
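The content of Lemma A.2.2 can be illustrated with a short sketch that recovers the forced INL labels from known pairs. It assumes the same representation as before (internal nodes named by plaintext prefixes) and relies on the relation, derived from rule (2) of Definition A.2.2, that the INL label of the node reached by prefix `x[:u]` equals the XOR of the plaintext bit and the ciphertext bit at position `u`. The function name `determined_labels` is hypothetical.

```python
from typing import Optional


def determined_labels(pairs: list[tuple[str, str]]) -> Optional[dict[str, int]]:
    """INL labels of the ciphertext tree forced by plaintext/ciphertext pairs.

    Returns a dict from internal node (named by plaintext prefix) to its label,
    or None if no tree-based function is consistent with all pairs.
    """
    labels: dict[str, int] = {}
    for x, y in pairs:
        if len(x) != len(y):
            raise ValueError("plaintext and ciphertext must have equal length")
        for u in range(len(x)):
            node = x[:u]                 # internal node at depth u on the path of x
            bit = int(x[u]) ^ int(y[u])  # label forced on that node
            if labels.setdefault(node, bit) != bit:
                return None              # two pairs disagree on a shared node
    return labels


if __name__ == "__main__":
    print(determined_labels([("00", "10"), ("01", "11")]))
    print(determined_labels([("00", "10"), ("01", "01")]))
```

The first call fixes the labels of the root and of node "0" and leaves node "1" free; the second call returns `None` because the two pairs would require different labels at the root.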
Consider the adversary counting the number of tree-based functions mapping $x_i^0$ or $x_i^1$ to
$E^*(x_i^b, k)$, $1 \le i \le h$. Since the tree-based function is uniquely determined by the INL
function $\mathcal{L}_{CT_l}$ (Remark A.2.1), this is equivalent to counting the number of INL functions. The
important observation is that, according to Lemma A.2.2, the labels of the internal nodes on
the corresponding $h$ paths are determined. Therefore it suffices to count the remaining undetermined
labels, since they decide the number of INL functions. Following this idea, we use
mathematical induction on $h$ to prove that the two numbers are identical in Lemma A.2.3.
Lemma A.2.3: The number of tree-based functions mapping $x_i^0$ to $E^*(x_i^b, k)$ equals
the number of tree-based functions mapping $x_i^1$ to $E^*(x_i^b, k)$, $1 \le i \le h$.
Proof. Let $x_i^0 = \mathcal{L}^*_{PT_l}(v_{j_i^0}^{L,T_{PT_l}})$, where $v_{j_i^0}^{L,T_{PT_l}}$ denotes the $j_i^0$-th leaf node in the plaintext
tree, and $x_i^1 = \mathcal{L}^*_{PT_l}(v_{j_i^1}^{L,T_{PT_l}})$, where $v_{j_i^1}^{L,T_{PT_l}}$ denotes the $j_i^1$-th leaf node in the plaintext tree,
$1 \le i \le h$. For tree-based functions mapping $x_i^0$ to $E^*(x_i^b, k)$, the labels of the internal nodes
on the path $P(v_{j_i^0}^{L,T_{CT_l}})$ in the ciphertext tree are determined; for tree-based functions mapping
$x_i^1$ to $E^*(x_i^b, k)$, the labels of the internal nodes on the path $P(v_{j_i^1}^{L,T_{CT_l}})$ in the ciphertext tree
are determined, $1 \le i \le h$ (Lemma A.2.2). Hence it suffices to prove that the determined
labels of the internal nodes in the two ciphertext trees are assigned consistent values and
that the numbers of undetermined labels of the internal nodes in the two ciphertext trees are
identical, i.e.,
\[
\Big|\bigcup_{1 \le i \le h} P_I(v_{j_i^0}^{L,T_{CT_l}})\Big| = \Big|\bigcup_{1 \le i \le h} P_I(v_{j_i^1}^{L,T_{CT_l}})\Big| \tag{6.1}
\]
where $P_I(v)$ (defined in Definition A.2.1) denotes the set of internal nodes on the path $P(v)$.
We use mathematical induction on $h$ to prove it. For $h = 1$, it is obvious that the labels in
$P_I(v_{j_1^0}^{L,T_{CT_l}})$ and $P_I(v_{j_1^1}^{L,T_{CT_l}})$ are assigned consistent values with respect to $E^*(x_1^b, k)$ according to
the proof of Lemma A.2.2. Also $|P_I(v_{j_1^0}^{L,T_{CT_l}})| = l = |P_I(v_{j_1^1}^{L,T_{CT_l}})|$, since by Definition A.2.1 each
root-to-leaf path in a tree of height $l$ contains $l$ internal nodes. So we assume that the
induction hypothesis holds for $h < h'$ and consider the situation for $h = h'$. According to the
induction hypothesis, the labels in $\bigcup_{1 \le i \le h'-1} P_I(v_{j_i^0}^{L,T_{CT_l}})$ and $\bigcup_{1 \le i \le h'-1} P_I(v_{j_i^1}^{L,T_{CT_l}})$ are assigned
consistent values. Also,
\[
\Big|\bigcup_{1 \le i \le h'-1} P_I(v_{j_i^0}^{L,T_{CT_l}})\Big| = \Big|\bigcup_{1 \le i \le h'-1} P_I(v_{j_i^1}^{L,T_{CT_l}})\Big|
\quad \text{and} \quad
|P_I(v_{j_{h'}^0}^{L,T_{CT_l}})| = |P_I(v_{j_{h'}^1}^{L,T_{CT_l}})|.
\]
Since $(x_i^0, x_i^1) \in PPP_{h'}$, $LCP(x_{h'}^0, x_i^0) = LCP(x_{h'}^1, x_i^1)$ for $1 \le i \le h' - 1$
according to the definition of $PPP_{h'}$. Since $x_i^0 = \mathcal{L}^*_{PT_l}(v_{j_i^0}^{L,T_{PT_l}})$ and $x_i^1 = \mathcal{L}^*_{PT_l}(v_{j_i^1}^{L,T_{PT_l}})$ for
$1 \le i \le h'$, we have
\[
|P_I(v_{j_{h'}^0}^{L,T_{PT_l}}) \cap P_I(v_{j_i^0}^{L,T_{PT_l}})| = |P_I(v_{j_{h'}^1}^{L,T_{PT_l}}) \cap P_I(v_{j_i^1}^{L,T_{PT_l}})| \quad \text{for } 1 \le i \le h' - 1
\]
according to the definition of the NEL function $\mathcal{L}^*_{PT_l}$. Therefore
\[
|P_I(v_{j_{h'}^0}^{L,T_{CT_l}}) \cap P_I(v_{j_i^0}^{L,T_{CT_l}})| = |P_I(v_{j_{h'}^1}^{L,T_{CT_l}}) \cap P_I(v_{j_i^1}^{L,T_{CT_l}})| \tag{6.2}
\]
for $1 \le i \le h' - 1$ according to the conclusions in Remark A.2.2. Without loss of generality,
we assume $b = 0$. So the labels in $P_I(v_{j_{h'}^0}^{L,T_{CT_l}})$ and $P_I(v_{j_i^0}^{L,T_{CT_l}})$ are assigned consistent values for
$1 \le i \le h' - 1$, i.e., the labels in $P_I(v_{j_{h'}^0}^{L,T_{CT_l}}) \cap P_I(v_{j_i^0}^{L,T_{CT_l}})$ are assigned the same values
whether with respect to $E^*(x_{h'}^b, k)$ or $E^*(x_i^b, k)$ for $1 \le i \le h' - 1$. Consequently, the labels in
$P_I(v_{j_{h'}^1}^{L,T_{CT_l}}) \cap P_I(v_{j_i^1}^{L,T_{CT_l}})$ are assigned the same values whether with respect to $E^*(x_{h'}^b, k)$
or $E^*(x_i^b, k)$ for $1 \le i \le h' - 1$, based on the proof of Lemma A.2.2 and (6.2), which implies
that the labels in $P_I(v_{j_{h'}^1}^{L,T_{CT_l}})$ and $P_I(v_{j_i^1}^{L,T_{CT_l}})$ are assigned consistent values for $1 \le i \le h' - 1$.
Therefore, the labels in $\bigcup_{1 \le i \le h'} P_I(v_{j_i^0}^{L,T_{CT_l}})$ and $\bigcup_{1 \le i \le h'} P_I(v_{j_i^1}^{L,T_{CT_l}})$ are assigned consistent values.
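Also, let $1 \le i_0 \le h' - 1$ be such that
\[
|P_I(v_{j_{h'}^0}^{L,T_{CT_l}}) \cap P_I(v_{j_{i_0}^0}^{L,T_{CT_l}})| = \max_{1 \le i \le h'-1} \{|P_I(v_{j_{h'}^0}^{L,T_{CT_l}}) \cap P_I(v_{j_i^0}^{L,T_{CT_l}})|\}
\]
and
\[
|P_I(v_{j_{h'}^1}^{L,T_{CT_l}}) \cap P_I(v_{j_{i_0}^1}^{L,T_{CT_l}})| = \max_{1 \le i \le h'-1} \{|P_I(v_{j_{h'}^1}^{L,T_{CT_l}}) \cap P_I(v_{j_i^1}^{L,T_{CT_l}})|\}.
\]
Then
\begin{align*}
\Big|\bigcup_{1 \le i \le h'} P_I(v_{j_i^0}^{L,T_{CT_l}})\Big| &= \Big|\Big(\bigcup_{1 \le i \le h'-1} P_I(v_{j_i^0}^{L,T_{CT_l}})\Big) \cup P_I(v_{j_{h'}^0}^{L,T_{CT_l}})\Big| \\
&= \Big|\bigcup_{1 \le i \le h'-1} P_I(v_{j_i^0}^{L,T_{CT_l}})\Big| + |P_I(v_{j_{h'}^0}^{L,T_{CT_l}})| - |P_I(v_{j_{h'}^0}^{L,T_{CT_l}}) \cap P_I(v_{j_{i_0}^0}^{L,T_{CT_l}})| \\
&= \Big|\bigcup_{1 \le i \le h'-1} P_I(v_{j_i^1}^{L,T_{CT_l}})\Big| + |P_I(v_{j_{h'}^1}^{L,T_{CT_l}})| - |P_I(v_{j_{h'}^1}^{L,T_{CT_l}}) \cap P_I(v_{j_{i_0}^1}^{L,T_{CT_l}})| \\
&= \Big|\Big(\bigcup_{1 \le i \le h'-1} P_I(v_{j_i^1}^{L,T_{CT_l}})\Big) \cup P_I(v_{j_{h'}^1}^{L,T_{CT_l}})\Big| = \Big|\bigcup_{1 \le i \le h'} P_I(v_{j_i^1}^{L,T_{CT_l}})\Big|.
\end{align*}
This completes the induction. □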
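The counting claim of Lemma A.2.3 can be spot-checked by brute force on a small tree. The sketch below is an illustration under assumptions: it takes $l = 3$, two hypothetical query sequences whose longest common prefixes have matching lengths, and ciphertexts produced by the identity labeling; the helper names are not from the original text. It counts the tree-based functions consistent with each plaintext sequence and compares the result with $2^{(2^l - 1) - |\bigcup_i P_I|}$.

```python
from itertools import product


def all_inl_functions(height: int):
    """Yield every INL function of the ciphertext tree as a dict node -> bit."""
    nodes = ["".join(b) for d in range(height) for b in product("01", repeat=d)]
    for labels in product((0, 1), repeat=len(nodes)):
        yield dict(zip(nodes, labels))


def evaluate(inl: dict[str, int], x: str) -> str:
    """Apply the tree-based function defined by `inl` to plaintext x."""
    return "".join(str(inl[x[:u]] ^ int(x[u])) for u in range(len(x)))


def count_consistent(plaintexts: list[str], ciphertexts: list[str], height: int) -> int:
    """Number of tree-based functions mapping each plaintext to its ciphertext."""
    return sum(
        all(evaluate(inl, x) == y for x, y in zip(plaintexts, ciphertexts))
        for inl in all_inl_functions(height)
    )


def nodes_on_paths(plaintexts: list[str]) -> set[str]:
    """Union of the internal-node sets P_I over the paths of the given plaintexts."""
    return {x[:u] for x in plaintexts for u in range(len(x))}


if __name__ == "__main__":
    height = 3
    xs0 = ["000", "011"]  # hypothetical left queries, |LCP| = 1
    xs1 = ["101", "110"]  # hypothetical right queries, |LCP| = 1
    ys = ["000", "011"]   # ciphertexts of xs0 under the all-zero labeling
    print(count_consistent(xs0, ys, height), count_consistent(xs1, ys, height))
    print(2 ** (2**height - 1 - len(nodes_on_paths(xs0))))
```

Both counts agree with the closed-form value in this example, as the lemma predicts for query sequences with matching prefix structure.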
In Theorem A.2.4, we prove the security of the ideal PPE object.
Theorem A.2.4: The ideal PPE object $SE^*$ is secure under IND-PCPA.
Proof. According to Proposition A.2.1 and Lemma A.2.3, the number of prefix-preserving
functions mapping $x_i^0$ to $E^*(x_i^b, k)$ equals the number of prefix-preserving functions
mapping $x_i^1$ to $E^*(x_i^b, k)$, $1 \le i \le h$. Therefore $\Pr(\mathrm{Exp}^{\text{IND-PCPA-}b}_{SE^*,A} = 1) = \frac{1}{2}$ for $b = 0, 1$.
Hence,
\[
\mathrm{Adv}^{\text{IND-PCPA}}_{SE^*,A} = \Pr(\mathrm{Exp}^{\text{IND-PCPA-1}}_{SE^*,A} = 1) - \Pr(\mathrm{Exp}^{\text{IND-PCPA-0}}_{SE^*,A} = 1) = 0,
\]
which implies that the ideal PPE object $SE^*$ is secure under IND-PCPA. □
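As a small-scale illustration of the theorem, one can tabulate how often each tuple of ciphertexts occurs when the function is drawn uniformly from $TBF_l$. The sketch below is only a sanity check for $l = 3$ and a single non-adaptive batch of queries, which is a simplifying assumption and not the full IND-PCPA experiment; the function name and the sample queries are hypothetical.

```python
from collections import Counter
from itertools import product


def ciphertext_distribution(plaintexts: list[str], height: int) -> Counter:
    """Count, over all tree-based functions, how often each ciphertext tuple occurs."""
    nodes = ["".join(b) for d in range(height) for b in product("01", repeat=d)]
    counts: Counter = Counter()
    for labels in product((0, 1), repeat=len(nodes)):
        inl = dict(zip(nodes, labels))  # INL function of the ciphertext tree
        cipher = tuple(
            "".join(str(inl[x[:u]] ^ int(x[u])) for u in range(height))
            for x in plaintexts
        )
        counts[cipher] += 1
    return counts


if __name__ == "__main__":
    left = ["000", "011"]   # hypothetical queries x_i^0
    right = ["101", "110"]  # hypothetical queries x_i^1 with matching prefix structure
    print(ciphertext_distribution(left, 3) == ciphertext_distribution(right, 3))
```

When the two query sequences share the same prefix structure, the two tables coincide, so the ciphertexts carry no information about which sequence was encrypted. Sequences with different prefix structure, which are not admissible queries, yield different tables.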
REFERENCES
[1] M. Abd-El-Malek,W.V.Courtright,C.Cranor,etal.,“Ursaminor:versatilecluster-based
storage,” in USENIX Conference on File and Storage Technology, pp. 13-16, 2005.
[2] R. Agrawal and R. Srikant, “Privacy-preserving data mining,” in ACM SIGMOD
International Conference on Management of Data, pp. 439-450, 2000.
[3] R.Agrawal, J.Kiernan,R.Stikant, andY.Xu, “Order-preserving encryption for numeric
data,”in SIGMOD’04, pp. 563-574, 2004.
[4] G.Amanatidis,A.BoldyrevaandA.O’Neill,“Provably-Secure Schemes for Basic Query
Support in Outsourced Databases,” in Working Conference on Data and Applications
Security 2007 Proceedings, Lecture Notes in Computer Science, Vol. 4602, pp. 14-30,
2007.
[5] F. Armknecht, D. Augot, L. Perret, A. Sadeghi, “OnConstructingHomomorphicEncryption
Schemes from Coding Theory,” in IMA Int. Conf, pp. 23-40, 2011.
[6] G.Bebek,“Anti-tamper database research: Inference control techniques,” Technical Report
EECS 433 Final Report, Case Western Reserve University, 2002.
[7] M.Bellare, T.Kohno, andC.Namprempre, “Authenticated encryption in SSH: provably
fixing the SSH binary packet protocol,” in Proceedings of the 9th ACM conference on
Computer and Communications Security (CCS-02), pp. 1-11, 2002.
[8] M. Bellare, A. Boldyreva, and A. O'Neill, “Deterministic and efficiently searchable
encryption,”inCRYPTO'07, pp. 535-552, 2007.
[9] M. Bellare, M. Fischlin, A. O'Neill, and T. Ristenpart, “Deterministic encryption:
Definitional equivalences and constructions without random oracles,” inCRYPTO'08, pp.
360-378, 2008.
[10] E. Bertino, “Database security - concepts, approaches, and challenges,” in Dependable and
Secure Computing, IEEE Transactions on, Vol.2, pp. 2-19, 2005.
[11] A.Boldyreva,S.Fehr,andA.O'Neill,“Onnotionsofsecurityfordeterministic encryption,
andefficientconstructionswithoutrandomoracles,”inCRYPTO'08, pp. 335-359, 2008.
[12] A.Boldyreva,N.Chenette,Y.Lee,A.O'Neill,“Order-Preserving Symmetric Encryption,”
in Advances in Cryptology - Eurocrypt'09, 2009.
[13] A.Boldyreva,N.Chenette,A.O'Neill,“Order-Preserving Encryption Revisited: Improved
Security Analysis and Alternative Solutions,” in Advances in Cryptology - Crypt'11, 2011.
175
[14] D.Boneh,E.Goh,KNissim, “Evaluating2-DNF Formulas on Ciphertexts,” in TCC, pp.
325-341, 2005.
[15] D.Boneh andB.Waters, “Conjunctive, subset, and range queries on encrypted data,” in
TCC, pp. 535-554, 2007.
[16] Z. Brakerski, C. Gentry, V. Vaikuntanathan, “Fully Homomorphic Encryption without
Bootstrapping,” in Electronic Colloquium on Computational Complexity (ECCC) 18: 111,
2011.
[17] S.Bulygin,T.Rai,“CounteringChosen-Ciphertext Attacks against Noncommutative Polly
Cracker Cryptosystems,” Special Semester on Gröbner Bases, Linz, Austria, 2006.
[18] R. Cramer, I. Damgård, U. Maurer, “General secure multi-party computation from any
linear secret-sharing scheme,” in Advances in Cryptology - EUROCRYPT 2000, Lecture
Notes in Computer Science, Springer-Verlag, Vol. 1807, pp. 316-334, 2000.
[19] R. Cramer, I. Damgard, J. Nielsen, “Multiparty Computation, an Introduction,” 2009,
available from http://cs.au.dk/~jbn/smc.pdf.
[20] Y.Desmedt, “Society andgrouporiented cryptography: annewconcept,” inAdvances in
Cryptography - CRYPTO '87, Springer-Verlag LNCS 293, pp. 120-127, 1987.
[21] Y.DesmedtandY.Frankel, “ThresholdCrypto-Systems,” inAdvances in Cryptography -
CRYPTO '89, Springer-Verlag LNCS 435, pp. 307-315, 1989.
[22] Y. Dodis, L. Reyzin, A. Smith, “Fuzzy extractors: How to generate strong keys frombiometricsandothernoisydata,” inSIAM Journal on Computing, Vol. 38, No. 1, pp. 97-
139, 2008.
[23] M.C. Doganay, T.B. Pedersen, Y. Saygin, Erkay Savas, and Albert Levi, “Distributed
Privacy Preserving k-Means Clustering with Additive Secret Sharing,” in International
Workshop on Privacy and Anonymity in the Information Society (PAIS), 2008.
[24] J. Dowd , S. Xu , W. Zhang, “Privacy-preserving decision tree mining based on random
substitutions,” in International Conference on Emerging Trends in Information and
Communication Security, 2005.
[25] S. Dziembowski, K. Pietrzak, “Leakage-Resilient Cryptography,” inFOCS '08, pp. 293-
302, 2008.
[26] R.Endsuleit,W.Geiselmann,R.Steinwandt,“AttackingaPolynomial-Based Cryptosystem:
Polly Cracker,” in International Journal of Information Security, Vol. 1, No. 3, pp. 143-148,
2002.
[27] M.FellowsandN.Koblitz,“CombinatorialCryptosystemsGalore!,” in Finite fields: theory,
applications, and algorithms, Vol. 168, pp. 51-61, 1994.
[28] W.Geiselmann,R.Steinwandt,“CryptanalysisofPolly Cracker,” in IEEE Transactions on
Information Theory 48(11), pp. 2990-2991, 2002.
176
[29] P.Gemmell,“Anintroductiontothresholdcryptography,”inCryptobytes, pp. 7-12, 1997.
[30] R.Gennaro,S.Jarecki,H.KrawczykandT.Rabin,“Robustandefficientsharing of RSA
functions,” in Advances in Cryptology - CRYPTO '96, Springer-Verlag LNCS 1109, pp.
157-172, 1996.
[31] C.Gentry,“Fullyhomomorphicencryptionusingideallattices,” in 41st ACM Symposium on
Theory of Computing (STOC), 2009.
[32] C. Gentry,“Aworking implementation of fully homomorphic encryption,” available from
http://eurocrypt2010rump.cr.yp.to/9854ad3cab48983f7c2c5a2258e27717.pdf.
[33] GMU MP library, available from http://gmplib.org/.
[34] O. Goldreich, S. Micali, and A. Wigderson, “How to play ANY mental game,” in
Proceedings of the nineteenth annual ACM conference on Theory of computing, pp. 218-
229. ACM Press, 1987.
[35] O. Goldreich, “Foundations of Cryptography: Volume 1, Basic Tools,” Cambridge
University Press, ISBN-10: 0521035368, 2007.
[36] O. Goldreich, “Foundations of Cryptography: Volume 2, Basic Applications,” Cambridge
University Press, ISBN-10: 052111991X, 2009.
[37] S. Goldwasser, S. Micali, “Probabilistic Encryption,” in Special issue of Journal of
Computer and Systems Sciences, Vol. 28, No. 2, pp. 270-299, 1984.
[38] S.C. Gultekin Ozsoyoglu, D. Singer, “Anti-tamper databases: Querying encrypted
databases,”inConference on Database and Applications Security, 2003.
[39] H.Hacig¨um¨us,B.R.Iyer,C.Li,andS.Mehrotra,“ExecutingSQLoverencrypteddatain
the database-service-provider model,” in Proceedings of the ACM SIGMOD Conf. on
Management of Data, Madison,Wisconsin, 2002.
[40] H. Hacig um us, B.R. Iyer, andS.Mehrotra,“Efficient Execution of Aggregation Queries
over Encrypted Relational Databases,” in Database Systems for Advanced Applications,
Vol. 2973, pp. 633-650, 2004.
[41] M. Halloush and M. Sharif, “Global heuristic search on encrypted data (GHSED),” in
International Journal of Computer Science Issues (IJCSI), Vol. 1, pp. 13-17, 2009.
[42] D.Hofheinz,R.Steinwandt, “ADifferentialAttack onPollyCracker,” in Proceedings of
IEEE International Symposium on Information Theory, pp. 211, 2002.
[43] Z. Huang, W. Du, and B. Chen, “Deriving private informaiton from randomized data,” in
ACM SIGMOD International Conference on Management of Data, pp. 37-47, 2005.
[44] M. Kane and D. Kawamoto, “Oracle buys PeopleSoft for $10 billion,” available from
http://news.cnet.com/Oracle-buys-PeopleSoft-for-10-billion/2100-1001\_3-5488298.html.
177
[45] H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar, “On the privacy preserving properties
of random data perturbation techniques,” in IEEE International Conference on Data Mining,
2003.
[46] J.Katz,Y. Lindell, “Introduction toModernCryptography: Principles and Protocols,” in
Chapman & Hall/CRC, 2007.
[47] F. Levy-dit-Vehel, L. Perret, “APollyCrackerSystemBasedonSatisfiability,” in Progress
in Computer Science and Applied Logic, pp. 177-192, 2004.
[48] J. Li, E.R.Omiecinski, “Efficiency and security trade-off in supporting range queries on
encrypteddatabases,”inData and Applications Security, pp. 69-83, 2005.
[49] M. Luby and C. Racko, “How to construct pseudo-random permutations from pseudo-
random functions,” in SIAM Journal of Computing, Vol. 17, No. 2, pp. 373-386, 1988.
[50] L. Ly, “Polly Two − A New Algebraic Polynomial-Based Public-Key Scheme,” in
Applicable Algebra in Engineering, Communication and Computing, Vol. 17, pp. 267-283,
2006.
[51] M.M. Mano, “Digital Design,” Prentice Hall; 3 edition, August 1, 2001.
[52] M.Naor,B.Pinkas,O.Reingold, “DistributedPseudo-RandomFunctions andKDCs,” in
Advances in Cryptology EUROCRYPT'99, pp. 327-346, 1999.
[53] G. Ozsoyoglu, D. Singer, S.S. Chung, “Anti-tamper databases: Querying encrypted
databases,” in Proceedings of the 17th Annual IFIP WG 11.3 Working Conference on
Database and Applications Security, Estes Park, Colorado, 2003.
[54] K. Pagiamtzis andA. Sheikholeslami, “Content-addressable memory (CAM) circuits and
architectures: A tutorial and survey,” in IEEE Journal of Solid-State Circuits, Vol. 41, No.
3, pp. 712–727, 2006.
[55] P. Paillier“Public-Key Cryptosystems Based on Composite Degree Residuosity Classes,” in
EUROCRYPT’99, pp. 223-238, 1999.
[56] T. Pederson, “A threshold crypto-system without a trusted dealer,” in Advances in
Cryptology - EUROCRYPT '91, Springer-Verlag LNCS 547, pp. 522-526, 1991.
[57] PlanetLab, available from http://www.planet-lab.org.
[58] P.K. Prasad, C.P. Rangan, “Privacy Preserving BIRCH Algorithm for Clustering overArbitrarilyPartitionedDatabases,”ADMA, pp. 146-157, 2007.
[59] T.Rai, “InfiniteGröbner bases and Noncommutative Polly Cracker Cryptosystems,”PhD
Thesis, Virginia Polytechnique Institute and State Univ, 2004.
[60] R.L. Rivest, A. Shamir, L. Adleman, “A Method for Obtaining Digital Signatures and
Public-Key Cryptosystems,” in Communications of the ACM 21 (2), pp. 120–126, 1978.
178
[61] R.L. Rivest, L. Adleman and M.L. Dertouzos, “On data banks and privacy
homomorphisms,” in Foundations of Secure Computation, eds. R. A. Demillo et al.,
Academic Press, pp. 167-179, 1978.
[62] T.Sander,A.Young,M.Yung,“Non-Interactive CryptoComputing For NC1,” in FOCS'99,
pp. 554-567, 1999.
[63] A.Shamir,“Howtoshareasecret,” in Communications of the ACM, Vol. 22, Issue 1, pp.
612-613, 1979.
[64] E. Shi, J. Bethencourt, T-H.H.Chan,D. Song, andA. Perrig, “Multi-dimensional range
queryoverencrypteddata,”inSymposium on Security and Privacy, pp. 350-364, 2007.
[65] N.P.SmartandF.Vercauteren,“Fullyhomomorphicencryptionwithrelativelysmallkeyand ciphertext sizes,” in PKC’10, pp. 420-443, 2010.
[66] D.X.Song,D.Wagner,A.Perrig,“Practicaltechniquesforsearchesonencrypteddata,”in
IEEE Symposium on Security and Privacy, pp. 44-55, 2000.
[67] D.StehleandR.Steinfeld,“FasterFullyHomomorphicEncryption,” in Cryptology ePrint
Archive: Report 2010/299, 2010.
[68] R.Steinwandt,“ACiphertext-Only Attack on Polly Two,”preprint,2006.
[69] B.M. Thuraisingham, “Multilevel Secure Database Management System,” in Encyclopedia
of Database Systems, pp. 1789-1792, 2009.
[70] J. Vaidya, C. Clifton, “Privacy-preserving k-means clustering over vertically partitioned
data,” inProceedings of the ninth ACM SIGKDD international conference on Knowledge
discovery and data mining, pp. 206- 215, 2003.
[71] M.vanDijk,C.Gentry,S.Halevi,andV.Vaikuntanathan,“Fullyhomomorphicencryption
over the integers,” in EUROCRYPT'10 Proceedings of the 29th Annual international
conference on Theory and Applications of Cryptographic Techniques, pp. 24-43, 2010.
[72] L. Xiao, I. Yen, “A Note for the Ideal Order-Preserving Encryption Object and Generalized
Order-PreservingEncryption,”http://eprint.iacr.org/2012/350.pdf.
[73] L. Xiao, I. Yen, “Security Analysis and Enhancement for Prefix-Preserving Encryption
Schemes,”submittedtoAsiacrypto’12, http://eprint.iacr.org/2012/191.pdf.
[74] L. Xiao, I. Yen, D.T. Huynh, “Extending Order Preserving Encryption for Multi-User
Systems,”submittedtoInfocom’13, http://eprint.iacr.org/2012/192.pdf.
[75] L.Xiao, I.Yen,D.Lin,“SecurityAnalysis for anOrderPreservingEncryptionScheme,”Tech Report UTDCS-06-10, 2010, available from http://utdallas.edu/~xll052000/OPEproof-
TR1.pdf, revised version: http://utdallas.edu/~xll052000/OPEproof-TR2.pdf.
[76] L.Xiao,O. Bastani, I. Yen, “AnEfficientHomomorphic Encryption Protocol forMulti-
UserSystems,”submittedtoICDE’13, http://eprint.iacr.org/2012/193.pdf.
179
[77] L.Xiao,O.Bastani,I.Yen,“SecurityAnalysisforOrderPreservingEncryptionSchemes,” in CISS, 2012.
[78] J.Xu,J.Fan,M.H.Ammar,andS.B.Moon,“Prefix-preserving IP address anonymization:
Measurement-based security evaluation and a new cryptography-based scheme,” in IEEE
International Conference on Network Protocols, pp. 280-289, 2002.
[79] L.Xu,“Hydra:aplatformforsurvivableandsecuredatastoragesystems,” in Proceedings
of the 2005 ACM workshop on Storage security and survivability, pp. 108-144, 2005.
[80] A.C. Yao, “Protocols for Secure Computations (Extended Abstract),” FOCS, pp. 160-164,
1982.
180
VITA