Tools for Privacy Preserving Distributed Data Mining By Michael Holmes

Why Private Data Mining

❖ The CDC may want to use data mining techniques to identify trends in disease outbreaks.

❖ Insurance companies have useful data but can’t disclose it because of privacy concerns.

❖ Is there a way to obtain this data without revealing the identity of the patients?

Private Data Mining Techniques

❖ Secure Sum

❖ Secure Set Union

❖ Secure Size of Set Intersection

❖ Scalar Product

Private Data Mining Toolkit

❖ Association Rules in horizontally partitioned data

❖ Association Rules in vertically partitioned data

❖ EM Clustering

Secure Sum

❖ Securely compute the sum of values held in individual databases.

❖ Have one site (Site 1) randomly generate a number R.

❖ Site 1 adds R to each of its values and sends the masked values to Site 2.

❖ Site 2 then adds its own values to the values received from Site 1 and returns a single number to Site 1.

❖ Site 1 then subtracts R once for each of its N masked values (N * R in total) to recover the correct sum.
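
The masking idea can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: it assumes the ring-style variant in which each site contributes a single integer, a single masked running total is passed from site to site, and all arithmetic is done modulo a public bound. The function name `secure_sum` and the parameter `modulus` are made up for this example.

```python
import secrets


def secure_sum(site_values, modulus):
    """Ring-style secure sum with one private integer per site.

    Assumes every value is non-negative and the true total is below `modulus`.
    """
    mask = secrets.randbelow(modulus)             # Site 1's secret random number R
    running = (mask + site_values[0]) % modulus   # Site 1 passes on its masked value
    for value in site_values[1:]:
        # Each later site only ever sees a masked running total.
        running = (running + value) % modulus
    return (running - mask) % modulus             # Site 1 strips R to get the sum


total = secure_sum([12, 7, 30], modulus=1_000)
```

Here the simulation runs in one process; in a real deployment each addition would happen at a different site, and only `running` would travel over the network.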

Secure Set Union

Secure Size of Set Intersection

❖ Relies on commutative encryption: encrypting with several keys gives the same result regardless of the order in which the keys are applied.

❖ Every party encrypts its data and then sends it to another party.

❖ The next party also encrypts the already encrypted data.

❖ After every party has encrypted the data from every other party, only items held in common produce identical ciphertexts.

❖ Count the duplicates and you know the size of the intersection.
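
A two-party version of these steps can be sketched as follows. The slides do not name a cipher, so this example assumes exponentiation modulo a prime (x^k mod p), which is commutative because (x^a)^b = (x^b)^a. The prime `P` is a toy size chosen for readability, and `keygen`, `to_group`, `encrypt`, and `intersection_size` are illustrative names, not part of any library.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy size, far too small for real use


def keygen():
    """Pick an exponent coprime to P - 1 so encryption is a one-to-one mapping."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key


def to_group(item):
    """Hash a string to an integer in the range 1 .. P - 1."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def encrypt(values, key):
    """Commutative encryption: raise every value to the secret key modulo P."""
    return {pow(v, key, P) for v in values}


def intersection_size(set_a, set_b):
    key_a, key_b = keygen(), keygen()
    # Each party encrypts its own items, then the other party encrypts them again.
    a_twice = encrypt(encrypt(map(to_group, set_a), key_a), key_b)
    b_twice = encrypt(encrypt(map(to_group, set_b), key_b), key_a)
    # Items held by both parties end up as identical double ciphertexts.
    return len(a_twice & b_twice)


size = intersection_size({"flu", "measles", "mumps"}, {"mumps", "flu", "rabies"})
```

Neither party sees the other's plaintext items, but both can see how many ciphertexts were exchanged, which is the database-size leak mentioned later in the deck.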

Scalar Product

❖ Goal: compute the sum of the products xi * yi (the scalar product) of two vectors held by different databases.

❖ Use linear combinations of random numbers to disguise the elements, then remove them computationally once you have the result.
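
The slides do not spell out the linear-combination protocol, so the sketch below swaps in a simpler, different technique to show the same "mask, compute, unmask" idea: a helper that sees no data hands out correlated random numbers, and the two data holders end up with shares that add up to the scalar product. The helper, the party names Alice and Bob, the modulus `M`, and all function names are assumptions made for this example.

```python
import secrets

M = 2**61 - 1  # public modulus; the true scalar product is assumed to be in 0 .. M - 1


def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % M


def helper_randomness(n):
    """Hypothetical helper: produces correlated randomness and never sees x or y."""
    ra_vec = [secrets.randbelow(M) for _ in range(n)]
    rb_vec = [secrets.randbelow(M) for _ in range(n)]
    ra = secrets.randbelow(M)
    rb = (dot(ra_vec, rb_vec) - ra) % M   # ensures ra + rb == ra_vec . rb_vec (mod M)
    return (ra_vec, ra), (rb_vec, rb)


def private_scalar_product(x, y):
    (ra_vec, ra), (rb_vec, rb) = helper_randomness(len(x))
    x_masked = [(a + r) % M for a, r in zip(x, ra_vec)]   # Alice sends this to Bob
    y_masked = [(b + r) % M for b, r in zip(y, rb_vec)]   # Bob sends this to Alice
    v2 = secrets.randbelow(M)                             # Bob keeps this random share
    t = (dot(x_masked, y) + rb - v2) % M                  # Bob sends this to Alice
    v1 = (t - dot(ra_vec, y_masked) + ra) % M             # Alice's share; masks cancel
    return (v1 + v2) % M                                  # the shares add up to x . y


result = private_scalar_product([1, 2, 3], [4, 5, 6])
```

The random masks cancel algebraically in the last two steps, which mirrors the slide's point that the disguising terms are removed once the result is obtained.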

Association Rules in Horizontally Partitioned Data

❖ Candidate Set Generation

❖ Local Pruning

❖ Itemset Exchange (Secure Union Step here)

❖ Support Count Exchange

Association Rules in Vertically Partitioned Data

❖ Uses the scalar product to determine whether the count of an itemset is greater than a threshold.

❖ If the count is above the threshold, you have determined that the database is worth querying.

❖ Can also use Secure Size of Set Intersection to see how much the sites have in common.

❖ Useful with algorithms such as Apriori.
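
Why a scalar product gives an itemset count: if each site stores one attribute as a 0/1 column over the same transactions, the product of two entries is 1 only when both items occur, so the dot product of the columns is the number of transactions containing both. The plain (non-private) sketch below shows just that arithmetic; the column names and the threshold are invented, and in the private setting the dot product would be computed with a secure scalar product protocol instead.

```python
def itemset_count(column_a, column_b):
    """Dot product of two 0/1 columns = transactions that contain both items."""
    return sum(a * b for a, b in zip(column_a, column_b))


# Hypothetical data: one row per transaction, each column held by a different site.
site_a_bread = [1, 1, 0, 1, 1]
site_b_milk = [1, 0, 0, 1, 1]

threshold = 2
count = itemset_count(site_a_bread, site_b_milk)
is_frequent = count > threshold
```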

EM Clustering

❖ Uses secure sum to compute a global total across all participating sites.

❖ Once the global sum is computed, it can be used in the expectation-maximization (EM) method to generate statistical models.
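
As a rough illustration of where the global sums enter EM, the sketch below updates the mean of one cluster from per-site statistics. It assumes one-dimensional data and that each site already has a responsibility (soft membership weight) for each of its points; the built-in `sum` stands in for the secure sum, and all names are illustrative.

```python
def local_statistics(points, responsibilities):
    """Computed privately at each site: weighted sum and total weight for one cluster."""
    weighted_sum = sum(r * x for x, r in zip(points, responsibilities))
    return weighted_sum, sum(responsibilities)


def m_step_mean(per_site_stats):
    """Combine per-site statistics into the new global cluster mean."""
    global_weighted_sum = sum(ws for ws, _ in per_site_stats)  # would be a secure sum
    global_weight = sum(w for _, w in per_site_stats)          # would be a secure sum
    return global_weighted_sum / global_weight


site_1 = local_statistics([1.0, 2.0], [1.0, 1.0])
site_2 = local_statistics([6.0], [1.0])
new_mean = m_step_mean([site_1, site_2])
```

Only the two totals per cluster cross site boundaries; the individual points and their weights stay local.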

Things to Note

❖ These algorithms are not fully private; some information is learned in the process.

❖ For example, in the set intersection protocol, sites can potentially learn the size of each database.

❖ Make sure to pick the appropriate algorithms for what you need to accomplish

❖ Watch out for intermediate information being leaked!

Thank you