Upload
pushpalanka-jayawardhana
View
88
Download
0
Tags:
Embed Size (px)
Citation preview
AN EFFICIENT APPROXIMATE PROTOCOL
FOR
PRIVACY PRESERVING ASSOCIATION RULE MINING
MURAT KANTARCIOGLU, ROBERT NIX AND JAIDEEP VAIDYA (2009)
- BY
PUSHPALANKA JAYAWARDHANA
158217G
CONTENTS
• Introduction
• Two Types of Techniques
• Problem
• Proposing Solution
• Related Work
• Bloom Filters
• Redefine the Problem
• Approximate Threshold Dot Product Algorithm
• Security
• Computational and Communicational Cost
• Experimental Results
• Accuracy
• Efficiency
• Conclusion
INTRODUCTION
• Association Rule Mining
• An important data mining model studied extensively by the database and data mining community
• A method for discovering interesting relations between variables in large databases
• "Beer and diaper" story
• Buying Diapers==> Buying Beer
• Privacy Preserving in Association Rule Mining
• More parties are interested in learning the global association rules
• None is willing to reveal the data at individual sites
TWO TYPES OF TECHNIQUES
• Perturbation Based methods
• Locally perturb data before delivering to the data miner
• Special techniques are used to reconstruct the original distribution
• Mining algorithm needs to be modified to consider that data is perturbed
• Have security concerns
• Secure Multiparty Computation Techniques
• Each party builds decision tree
• Only the final decision tree is shared, not any other data
• Use cryptographic techniques for security
• Computationally intensive
PROPOSING SOLUTION
An approximate protocol for computing the dot product of two
vectors owned by two different parties
RELATED WORK
• A similar approximation protocol is already proposed with sampling techniques
• A solution is present with bloom filters
• Rule mining is done centrally
• Goethals' s encryption mechanism
• simple and secured
• Calculate exact dot product
• Run time O(n)
BLOOM FILTERS
• A probabilistic data structure• Used to test on membership of
a set• False positives are possible• No false negatives• Can be used to approximate
the intersection size between two sets
REDEFINE PROBLEM
• Compute the scalar product
• Checks if the scalar product of two distributed vectors is greater than some threshold
X1 . X2 = |S1 ∩ S2| ≥ t
APPROXIMATE THRESHOLD DOT PRODUCT ALGORITHM
• Each party creates own bloom filter, using common parameters.
• size of the bloom filter - m
• hash functions - h1, h2, ..................., hk
• Participate in the secure dot product algorithm using private bloom filters and get the random shares of the dot product result
• each party participates in secure multiplication protocol using private dot product results to get the random share of the multiplication result
• Finally, each party participate in a secure comparison protocol to approximate the final result.
SECURITY
• Preserved under following assumptions
• Parties are semi-honest
• Dot product, multiplication and comparison protocols are secure
COMPUTATIONAL AND COMMUNICATIONAL COST
• O(nk) for hashing for bloom filters, rest is O(1)
• Hashing cost is negligible compared to public key operations
• m<<n --> faster
• Flexible to use if a better secure dot product computing protocol if found in the future
• communication cost propotional to m --> low cost
EXPERIMENTAL RESULTS
• Consider effect of,
• vector length (l),
• vector density (d)
• the actual intersection of the two vectors (i)
• the bloom filter parameters
• m (length of filter)
• k(number of hash functions)
on the performance of the algorithm.
ACCURACY -3
Even for a large vector, same accuracy can be achieved with sub-linear increase in filter length
ACCURACY -4
• At 0 density no error
• Drastically increase error at high densities
• Good for sparse vectors
CONCLUSIONS
• Propose an efficient and secure protocol to approximately compute scalar product in a privacy preserving manner.
• Efficiency is gained by allowing an approximation than an exact answer
• Extending to work with more than 2 parties is a future work