Authenticated Online Data Integration Services
Qian Chen, Haibo Hu, Jianliang Xu
Hong Kong Baptist University
Data Integration Services
[Diagram: the Client sends a query to the Integration Server (IS), which returns the result assembled from the Data Sources]
• Combining data from multiple sources
• Providing users with a unified query interface
Example: Metasearch Engines
[Diagram: the Client sends the query HK -> MEL to the Integration Server (IS), which collects fares from the Airlines]
• Fares offered by the airlines (HK -> MEL): CX105 $617, CX135 $617, QF30 $594, QF98 $698, MH73 $691, MH79 $699
• The cheapest fare is QF30 at $594, but the client receives CX105 at $617
• Incorrect results may be caused by:
  ◦ Hacking attack
  ◦ Incomplete search
  ◦ Program bug
  ◦ Favoring a sponsor
More Examples
• Meta-analysis
  ◦ Life science research (e.g., virus spread and disease control) requires collecting disparate datasets
  ◦ Example: DataNet
  ◦ The server may be compromised by a cyber attack
• Collaborative data fusion
  ◦ Online collaborative data platforms
  ◦ Examples: Wikipedia, Wikisensing, Wikidata
  ◦ The server may alter some critical results for political or financial reasons
Authenticated query processing: enable clients to verify the correctness of query results
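Preliminaries: Merkle Hash Tree
• Data Owner → Service Provider: the dataset {1, 3, 4, 5}, its Merkle Hash Tree (MHT), and $sig(N_{root})$
• Merkle Hash Tree (MHT):
  ◦ Leaves: $N_1 = h(d_1)$, $N_2 = h(d_2)$, $N_3 = h(d_3)$, $N_4 = h(d_4)$
  ◦ Internal nodes: $N_{12} = h(N_1 \| N_2)$, $N_{34} = h(N_3 \| N_4)$
  ◦ Root: $N_{root} = h(N_{12} \| N_{34})$, signed by the data owner as $sig(N_{root})$
• Client → Service Provider: range query Q = [1, 1]; the client then verifies the result:
  ◦ Soundness: 1 ∈ [1, 1], and the root recomputed from the result matches $sig(N_{root})$
  ◦ Completeness: the neighboring values returned with the result fall outside [1, 1]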
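The MHT check above can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes sorted integer values, a power-of-two leaf count, and a hypothetical 8-byte encoding of each value; comparing against a trusted root stands in for verifying $sig(N_{root})$, and the query Q = [3, 3] and all function names are hypothetical. $h(N_1 \| N_2)$ is realized as SHA-1 over the concatenated child digests.

```python
import hashlib

def h(data: bytes) -> bytes:
    # SHA-1 (160 bits), the hash listed in the experiment settings
    return hashlib.sha1(data).digest()

def leaf_hash(value: int) -> bytes:
    # N_i = h(d_i); the 8-byte big-endian encoding of d_i is a hypothetical choice
    return h(value.to_bytes(8, "big", signed=True))

def build_levels(values):
    """Data owner: build the MHT bottom-up over sorted values (count must be a power of two)."""
    levels = [[leaf_hash(v) for v in values]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # parent = h(left || right), as in N_12 = h(N_1 || N_2)
        levels.append([h(prev[k] + prev[k + 1]) for k in range(0, len(prev), 2)])
    return levels  # levels[-1][0] is N_root

def make_vo(levels, i, j):
    """Service provider: digests of the maximal subtrees lying outside leaves i..j."""
    vo = {}
    def visit(level, idx):
        lo = idx * 2 ** level
        hi = lo + 2 ** level - 1
        if hi < i or lo > j:               # disjoint subtree: send only its digest
            vo[(level, idx)] = levels[level][idx]
        elif not (i <= lo and hi <= j):    # partially covered: descend to both children
            visit(level - 1, 2 * idx)
            visit(level - 1, 2 * idx + 1)
        # subtrees fully inside i..j are rebuilt by the client from the returned values
    visit(len(levels) - 1, 0)
    return vo

def client_verify(run, i, vo, height, n, q_lo, q_hi, trusted_root):
    """Client: soundness (recomputed root matches) and completeness (boundaries fall outside Q)."""
    def node(level, idx):
        if (level, idx) in vo:
            return vo[(level, idx)]
        if level == 0:
            return leaf_hash(run[idx - i])
        return h(node(level - 1, 2 * idx) + node(level - 1, 2 * idx + 1))
    sound = node(height, 0) == trusted_root
    j = i + len(run) - 1
    complete = (i == 0 or run[0] < q_lo) and (j == n - 1 or run[-1] > q_hi)
    return sound and complete, [v for v in run if q_lo <= v <= q_hi]

data = [1, 3, 4, 5]                 # the dataset from the slide, d_1..d_4
levels = build_levels(data)
root = levels[-1][0]                # stands in for the signed sig(N_root)
# Hypothetical query Q = [3, 3]: result {3}; boundary records 1 and 4, i.e. leaves 0..2
vo = make_vo(levels, 0, 2)
ok, result = client_verify(data[0:3], 0, vo, len(levels) - 1, len(data), 3, 3, root)
print(ok, result)  # True [3]
```

Positions are checked implicitly: each leaf digest sits at a fixed place in the tree, so a server that drops, alters, or reorders a record cannot reproduce the signed root.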
Roadmap
• Problem definition & challenges
• Design intuition
  ◦ Homomorphic secret sharing seal ()
• Query authentication schemes
  ◦ Prefix tree
  ◦ Indexes: -G-tree & -R-tree
• Experiments
• Conclusion
Problem Definition
• Integrated dataset
  ◦ Each record comes from one of the data sources
• Range query
• Objectives:
  ◦ Enable authenticated query processing: soundness, completeness, freshness
  ◦ Minimize the verification overhead
  ◦ Support efficient data/seal updates
Challenges
• Data are collected from multiple sources independently
  ◦ How to prove completeness?
  ◦ Minimize the cost of authenticated query processing
• Publicly verifiable
  ◦ The client and the data sources do not share secret keys
• A naïve solution
  ◦ The server sends the whole dataset and its signatures to the client, which does a local search
  ◦ Drawback: the overhead is linear in the dataset size
Design Intuition
• Verify far-away non-result values without using the whole dataset
  ◦ proves
  ◦ proves
• Issue: have sigs; don't
  ◦ Need a special signature
• Prefix tree based on to
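One way to see why a prefix can stand in for far-away non-result values: a binary prefix denotes a whole aligned range of the domain, so the complement of a query range is tiled by a number of prefixes logarithmic in the domain size rather than linear in the number of non-result values. The sketch below assumes values are w-bit unsigned integers; `prefix_cover` and `non_result_cover` are hypothetical names, not functions from the slides.

```python
def prefix_cover(lo, hi, w):
    """Minimal list of binary prefixes over w-bit values whose ranges exactly tile [lo, hi]."""
    out = []
    while lo <= hi:
        size = 1
        # grow the block starting at lo while it stays aligned and inside [lo, hi]
        while lo % (2 * size) == 0 and lo + 2 * size - 1 <= hi:
            size *= 2
        free_bits = size.bit_length() - 1      # low-order bits the prefix leaves open
        if free_bits == w:
            out.append("")                     # the empty prefix covers the whole domain
        else:
            out.append(format(lo >> free_bits, "b").zfill(w - free_bits))
        lo += size
    return out

def non_result_cover(lo, hi, w):
    """Prefixes covering every value of the w-bit domain outside the query range [lo, hi]."""
    return prefix_cover(0, lo - 1, w) + prefix_cover(hi + 1, 2 ** w - 1, w)

# 2-bit domain (leaves 00, 01, 10, 11) and Q = [1, 1], as on the Authenticated Prefix Tree slide
print(prefix_cover(1, 1, 2))       # ['01']
print(non_result_cover(1, 1, 2))   # ['00', '1']
```

Here the single prefix `1` vouches for both values 2 and 3 at once, which is the kind of shortcut a plain per-record signature cannot provide.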
Design Intuition
• Secret sharing scheme (for completeness)
  ◦ A secret is shared among users
  ◦ Each user takes one piece of the secret, a secret share
  ◦ The final secret is recovered by combining all shares
• Each data source binds a secret share with each value
  ◦ binds with
• The total secret is recovered from all values
  ◦ has secret share
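The completeness intuition can be illustrated with additive n-out-of-n secret sharing, one standard instance; the slides do not say which sharing scheme the seal embeds, so this choice, the modulus `PRIME`, and the example secret are assumptions. If every value carries one share, a result set missing any value cannot reproduce the total secret.

```python
import secrets

PRIME = 2**127 - 1  # public prime modulus (a Mersenne prime); hypothetical parameter

def split_secret(secret, n):
    """Additive n-out-of-n sharing: n-1 random shares, the last one fixes the sum."""
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def recover(shares):
    """The secret is the sum of all shares modulo PRIME."""
    return sum(shares) % PRIME

secret = 123456789
shares = split_secret(secret, 4)        # e.g. one share bound to each of four values
print(recover(shares) == secret)        # True: every share is present
print(recover(shares[:3]) == secret)    # False unless the omitted share is 0 (probability 2**-127)
```

Any subset of fewer than n shares is uniformly random, so a server cannot guess the missing contribution of an omitted value.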
Main Contributions
• Homomorphic Secret Sharing Seal ()
• Authenticated Prefix Tree
• -G-Tree & -R-Tree
• Extended Queries & Optimizations
Homomorphic Secret Sharing Seal
• Content: from
• Seal design
  ◦ The seal is "additively" homomorphic
  ◦ Seals can be folded by the integration server
Seal Designs
• Homomorphism
  ◦ Completeness: a secret sharing scheme is embedded in the seals
  ◦ Seal folding: generate seals for internal nodes
  ◦ Efficient update: cancel out the old seals
• RSA-based signature
  ◦ Publicly verifiable
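The slides do not give the seal's formula, so the following is a hypothetical toy construction, not the authors' seal. It RSA-signs $H(v) \cdot G^{s}$ for a value $v$ with share $s$, so that multiplying seals (folding) adds the shares in the exponent, an update cancels the old seal with its modular inverse, and anyone holding the public key (N, E) can verify. The key below uses tiny Mersenne primes purely for illustration (the experiments use 2048-bit RSA); `G`, `H`, and all function names are assumptions, and the data sources are assumed to hold the private exponent.

```python
import hashlib

# Toy RSA key (hypothetical): Mersenne primes 2**31 - 1 and 2**61 - 1, far too small for real use
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent held by the data sources
G = 2                                  # hypothetical public base for the shares

def H(value):
    """Hash a data value into Z_N."""
    return int.from_bytes(hashlib.sha256(str(value).encode()).digest(), "big") % N

def seal(value, share):
    """Toy seal: RSA-sign H(value) * G**share, so shares add up in the exponent."""
    return pow(H(value) * pow(G, share, N) % N, D, N)

def fold(seals):
    """Integration server: combine seals by modular multiplication (no private key needed)."""
    out = 1
    for s in seals:
        out = out * s % N
    return out

def update(folded, old_seal, new_seal):
    """Cancel the old seal with its modular inverse, then fold in the new one."""
    return folded * pow(old_seal, -1, N) % N * new_seal % N

def verify(folded, values, total_share):
    """Public check using only (N, E): folded**E == G**total_share * prod H(v) mod N."""
    expected = pow(G, total_share, N)
    for v in values:
        expected = expected * H(v) % N
    return pow(folded, E, N) == expected

s3, s4 = seal(3, 11), seal(4, 31)
folded = fold([s3, s4])
print(verify(folded, [3, 4], 42))            # shares 11 + 31 add up to 42
print(verify(folded, [3, 4], 41))            # a wrong total share is rejected
updated = update(folded, s4, seal(6, 31))    # value 4 replaced by 6, same share
print(verify(updated, [3, 6], 42))
```

Because folding and updating use only multiplication and inversion modulo N, the integration server can maintain internal-node seals without ever holding the private exponent D.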
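Authenticated Prefix Tree
[Diagram: the Data Sources send the dataset & seals to the Integration Server; the Client sends Q = [1, 1] and verifies the answer]
• Data sources: dataset {1, 3, 4, 5} and its seals
• Prefix tree (prefixes 0, 1 and 00, 01, 10, 11):
  ◦ Leaves: $seal(d_1)$, $seal(d_2)$, $seal(d_3)$, $seal(d_4)$
  ◦ Internal nodes: $S_{12} = seal_1 \otimes seal_2$, $S_{34} = seal_3 \otimes seal_4$
  ◦ Root: $S_{root} = S_{12} \otimes S_{34}$
• Client → Integration Server: query Q = [1, 1]; the server returns the result and the RVO {,}
• Client verifies:
  ◦ Soundness: the result lies in [1, 1]
  ◦ Completeness: the RVO covers only values outside [1, 1], and folding all seals recovers the secret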
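To make the prefix-tree bookkeeping visible, the sketch below replaces each seal by its bare secret share and models ⊗ as addition modulo a prime. This is an assumption for illustration only: it has no cryptographic protection (the actual seals hide shares inside RSA signatures), the client is assumed to know the total secret, and `build_tree`, `answer`, and `client_verify` are hypothetical names. In this sketch the RVO for Q = [1, 1] consists of the folded values at prefixes `00` and `1`.

```python
import secrets

PRIME = 2**127 - 1   # modulus for the shares (hypothetical parameter)
W = 2                # bits per value: leaves 00, 01, 10, 11 as on the slide

def prefix_cover(lo, hi):
    """Minimal list of W-bit prefixes whose ranges exactly tile [lo, hi]."""
    out = []
    while lo <= hi:
        size = 1
        while lo % (2 * size) == 0 and lo + 2 * size - 1 <= hi:
            size *= 2
        free_bits = size.bit_length() - 1
        out.append(format(lo >> free_bits, "b").zfill(W - free_bits) if free_bits < W else "")
        lo += size
    return out

def build_tree(secret):
    """Data sources: one additive share per leaf; each internal prefix holds the fold of its children."""
    leaves = [format(x, "b").zfill(W) for x in range(2 ** W)]
    shares = [secrets.randbelow(PRIME) for _ in range(len(leaves) - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    tree = dict(zip(leaves, shares))
    for length in range(W - 1, -1, -1):          # bottom-up, mirroring S_12 = seal_1 (x) seal_2
        for x in range(2 ** length):
            p = format(x, "b").zfill(length) if length else ""
            tree[p] = (tree[p + "0"] + tree[p + "1"]) % PRIME
    return tree

def answer(tree, lo, hi):
    """Integration server: result leaves plus the RVO, i.e. the maximal non-result prefixes."""
    result = {format(x, "b").zfill(W): tree[format(x, "b").zfill(W)] for x in range(lo, hi + 1)}
    rvo = {p: tree[p] for p in prefix_cover(0, lo - 1) + prefix_cover(hi + 1, 2 ** W - 1)}
    return result, rvo

def client_verify(result, rvo, lo, hi, secret):
    """Client: prefixes must tile the domain exactly, and the folded shares must give the secret."""
    if sorted(result) != [format(x, "b").zfill(W) for x in range(lo, hi + 1)]:
        return False
    if sorted(rvo) != sorted(prefix_cover(0, lo - 1) + prefix_cover(hi + 1, 2 ** W - 1)):
        return False
    return (sum(result.values()) + sum(rvo.values())) % PRIME == secret

tree = build_tree(secret=42)
result, rvo = answer(tree, 1, 1)                 # Q = [1, 1]
print(sorted(result), sorted(rvo))               # ['01'] ['00', '1']
print(client_verify(result, rvo, 1, 1, 42))      # True
```

The RVO size depends on how many prefixes tile the non-result region, not on how many non-result values exist, which is what keeps the verification overhead below the naïve solution's.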
-G-Tree
Efficient to update, but may deteriorate under skewed data distributions
-R-Tree
Tightly clustered, but an update may cause cascading tree reconstructions
More Analysis and Optimizations
• Proven security with thorough theoretical analysis
• Cost models for the query authentication schemes
• Optimizations
  ◦ -G-Tree: more efficient update (lazy update)
  ◦ -R-Tree: fewer reconstructions (loose bounds for data values)
• Extension to advanced queries: kNN, skyline
Experiment Settings
• Datasets:
  ◦ Gowalla dataset in the Stanford Large Network Dataset Collection: 6,442,890 user check-ins; 1,280,969 unique locations with a non-spatial score
  ◦ Weather dataset from the NWS Cooperative Observer Program: 10,000 volunteers report daily weather observations
• Server: dual 4-core Intel Xeon X5570 2.93 GHz CPU and 32 GB RAM, running GNU/Linux, OpenJDK 1.6
• Client: Core 2 Quad processor and 4 GB RAM, WinXP
• Cryptography: RSA (2048 bits), AES (256 bits), h(): SHA-1 (160 bits)
Server Construction Cost
The construction cost is linear in the dataset size.
(G-tree: -G-tree; R-tree: -R-tree)
Basic Query Auth Performance
Both index trees outperform the naïve solution. R-tree is better than G-tree since it is more compact.
(G-tree: -G-tree; R-tree: -R-tree)
Index Update Performance
G-tree updates more efficiently than R-tree. Both optimizations improve performance by 20-30%.
(G-tree: -G-tree; R-tree: -R-tree; Lazy-G-tree: lazy-update G-tree; Loose-R-tree: loosely-bounded R-tree)
Summary of Contributions
• Designed a novel signature, the homomorphic secret sharing seal ()
• Proposed two indexes for authenticating multi-dimensional queries over integrated data
• Extended the schemes to advanced queries, proposed update schemes, and provided formal cost models
• Experimentally validated the performance, with proven security
Varying Query Ranges
The cost is linear in the query range (i.e., the result size).
(G-tree: -G-tree; R-tree: -R-tree)
Varying Data Dimensionality
G-tree deteriorates faster than R-tree as the dimensionality grows, since R-tree is more compact.
(G-tree: -G-tree; R-tree: -R-tree)