PRIVACY AND SECURITY ISSUES IN DATA MINING
Ph.D. Candidate: Anna Monreale
Supervisors: Prof. Dino Pedreschi, Dott.ssa Fosca Giannotti
University of Pisa, Department of Computer Science


Page 1: Privacy and Security Issues in Data Mining

PRIVACY AND SECURITY ISSUES IN DATA MINING

Ph.D. Candidate: Anna Monreale

Supervisors: Prof. Dino Pedreschi, Dott.ssa Fosca Giannotti

University of Pisa, Department of Computer Science

Page 2: Privacy and Security Issues in Data Mining

Privacy-Preserving Data Mining

New privacy-preserving data mining techniques:
  For individual privacy: personal data are private
  For corporate privacy: the knowledge extracted is private

Goal: to develop algorithms for modifying the original data, so that
  private data are protected
  private knowledge remains private even after the mining tasks
  analysis results are still useful

Natural trade-off between privacy quantification and data utility

Page 3: Privacy and Security Issues in Data Mining

Secure Outsourcing of Data Mining

Setting:
  The server has access to the owner's data
  The data owner owns both the data and the knowledge extracted from the data

Requirements:
  All encrypted transactions in the encrypted database D* and the items contained in them are secure
  Given any mining query, the server can compute the encrypted result
  Encrypted mining and analysis results are secure
  The owner can decrypt the results and so reconstruct the exact result
  The space and time incurred by the owner in the process have to be minimal

Page 4: Privacy and Security Issues in Data Mining

A Solution for Pattern Mining: K-anonymity

Attack Model: the attacker knows the set of plain items and their true supports in D exactly, and has access to the encrypted database D*
  Item-based attack: guessing the plain item corresponding to the cipher item e with probability prob(e)
  Itemset-based attack: guessing the plain itemset corresponding to the cipher itemset E with probability prob(E)

Encryption:
  Replacing each plain item in D by a 1-1 substitution cipher
  Adding fake transactions

K-Anonymity: for each cipher item e there are at least k-1 other cipher items with the same support in D*

Decryption: a synopsis allows computing the actual support of every pattern
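The encryption and decryption steps above can be illustrated with a small sketch. It is not the scheme from the presentation. The grouping of items by support, the single-item fake transactions, and the plain dictionary used as a synopsis are simplifying assumptions, and the names `encrypt` and `true_support` are hypothetical. It also assumes that k-anonymity means each cipher item shares its support in D* with at least k-1 other cipher items.

```python
import random
from collections import Counter


def encrypt(db, k, seed=0):
    """Return (encrypted_db, key, synopsis) for a list of transactions."""
    rng = random.Random(seed)
    support = Counter(item for t in db for item in set(t))
    items = sorted(support)
    if len(items) < k:
        raise ValueError("need at least k distinct items")

    # 1-1 substitution cipher: every plain item gets a distinct opaque token.
    tokens = [f"e{i}" for i in range(len(items))]
    rng.shuffle(tokens)
    key = dict(zip(items, tokens))

    # Assumption: group items of similar support into groups of size >= k.
    ranked = sorted(items, key=lambda i: (-support[i], i))
    groups = [ranked[i:i + k] for i in range(0, len(ranked), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())

    # Noise = how many fake occurrences lift an item to its group's top support.
    noise = {}
    for group in groups:
        top = max(support[i] for i in group)
        for i in group:
            noise[i] = top - support[i]

    # Assumption: fake transactions contain a single item each.
    real = [sorted(key[i] for i in set(t)) for t in db]
    fake = [[key[i]] for i in items for _ in range(noise[i])]
    encrypted = real + fake
    rng.shuffle(encrypted)

    # The synopsis stays with the owner and records the noise per cipher item.
    synopsis = {key[i]: noise[i] for i in items}
    return encrypted, key, synopsis


def true_support(encrypted, key, synopsis, pattern):
    """Owner-side decryption of the support of a plain itemset."""
    cipher = {key[i] for i in pattern}
    count = sum(1 for t in encrypted if cipher <= set(t))
    if len(cipher) == 1:
        # Only single-item patterns are inflated by single-item fakes.
        (c,) = cipher
        count -= synopsis[c]
    return count


db = [["a", "b"], ["a", "b", "c"], ["a"], ["c", "d"]]
encrypted, key, synopsis = encrypt(db, k=2)
print(Counter(i for t in encrypted for i in t))   # cipher supports seen by the server
print(true_support(encrypted, key, synopsis, ["b"]))
```

With single-item fake transactions the support of any pattern with two or more items is unchanged, so only single-item supports need correcting. A realistic scheme would need multi-item fake transactions and a correspondingly richer synopsis.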

Page 5: Privacy and Security Issues in Data Mining

Privacy-Preserving DT Framework

Goal: publishing and sharing various forms of data without disclosing sensitive personal information while preserving mining results
  Sequence data
  Query-log data
  ...

Problem: anonymizing sequence data while preserving sequential pattern mining results

Attack Model: Sequence Linking Attack
  The attacker knows part of a sequence and wants to guess the whole correct sequence

Idea: combining k-anonymity and sequence hiding methods, and reformulating the problem as that of hiding k-infrequent sequences
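To make the notion of a k-infrequent sequence concrete, here is a minimal sketch. It assumes that the support of a sequence is the number of dataset sequences containing it as a (not necessarily contiguous) subsequence, and that a sequence is k-infrequent when it occurs in the data but its support is below k. The function names are hypothetical, and the five sequences are those of the running example in this presentation. The enumeration is exhaustive and only suitable for tiny datasets.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order, gaps allowed."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, dataset):
    """Number of dataset sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in dataset)


def k_infrequent(dataset, k):
    """Map every subsequence with support in [1, k-1] to its support."""
    candidates = set()
    for s in dataset:
        for n in range(1, len(s) + 1):
            for idx in combinations(range(len(s)), n):
                candidates.add(tuple(s[i] for i in idx))
    result = {}
    for p in candidates:
        supp = support(p, dataset)
        if supp < k:
            result[p] = supp
    return result


D = [list("BC"), list("ABCD"), list("ABCD"), list("BCE"), list("BCD")]
print(sorted(k_infrequent(D, k=2)))
# [('B', 'C', 'E'), ('B', 'E'), ('C', 'E'), ('E',)]
```

Under these assumptions, an attacker who knows that a sequence contains E can link it to the single record B C E, which is the kind of sequence a sequence linking attack exploits and that the anonymization has to hide.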

Page 6: Privacy and Security Issues in Data Mining

Running example: k = 2

Dataset D      Dataset D'
B C            B C
A B C D        A B C D
A B C D        A B C D
B C E          B C
B C D          A B C D

Steps: Prefix Tree Construction, Tree Pruning, Tree Reconstruction, Generation of D'

Prefix tree of D:
  Root -> B:3 -> C:3 -> {E:1, D:1}
  Root -> A:2 -> B:2 -> C:2 -> D:2

Lcut:
  B C E : 1
  B C D : 1

Tree after pruning:
  Root -> B:1 -> C:1
  Root -> A:2 -> B:2 -> C:2 -> D:2

LCS:
  1. B C
  2. B C D

Tree after LCS 1:
  Root -> B:2 -> C:2
  Root -> A:2 -> B:2 -> C:2 -> D:2

Tree after LCS 2:
  Root -> B:2 -> C:2
  Root -> A:3 -> B:3 -> C:3 -> D:3
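The four steps of the running example (prefix tree construction, tree pruning, tree reconstruction, generation of D') can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation. All function names are hypothetical. The tie-breaking rule (longest LCS first, then the shortest tree sequence) is an assumption chosen so that the sketch reproduces the slide. Nodes whose count drops below k only as a side effect of pruning are kept, as in the pruned tree shown on the slide.

```python
class Node:
    def __init__(self):
        self.count = 0
        self.children = {}


def build(dataset):
    """Prefix tree where each node counts the sequences passing through it."""
    root = Node()
    for seq in dataset:
        node = root
        node.count += 1
        for item in seq:
            node = node.children.setdefault(item, Node())
            node.count += 1
    return root


def sequences(node, prefix=()):
    """Yield (sequence, multiplicity) for the sequences ending at each node."""
    ending = node.count - sum(c.count for c in node.children.values())
    if prefix and ending > 0:
        yield prefix, ending
    for item, child in node.children.items():
        yield from sequences(child, prefix + (item,))


def prune(node, k, prefix=(), cut=None):
    """Cut subtrees with count < k and return the cut sequences (Lcut)."""
    cut = {} if cut is None else cut
    before = node.count
    for item in list(node.children):
        child = node.children[item]
        if child.count < k:
            for seq, n in sequences(child, prefix + (item,)):
                cut[seq] = cut.get(seq, 0) + n
            node.count -= child.count
            del node.children[item]
        else:
            child_before = child.count
            prune(child, k, prefix + (item,), cut)
            node.count -= child_before - child.count
    assert node.count <= before
    return cut


def lcs_len(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def reconstruct(root, cut):
    """Re-add each cut sequence onto the closest sequence left in the tree."""
    targets = [seq for seq, _ in sequences(root)]
    if not targets:
        return
    for seq, n in cut.items():
        # Assumed tie-break: longest LCS, then shortest tree sequence.
        best = max(targets, key=lambda t: (lcs_len(seq, t), -len(t)))
        node = root
        node.count += n
        for item in best:
            node = node.children[item]
            node.count += n


def anonymize(dataset, k):
    root = build(dataset)
    cut = prune(root, k)
    reconstruct(root, cut)
    return [list(seq) for seq, n in sequences(root) for _ in range(n)]


D = [list("BC"), list("ABCD"), list("ABCD"), list("BCE"), list("BCD")]
for seq in anonymize(D, k=2):
    print(" ".join(seq))
```

On the running example the cut sequences are B C E and B C D. B C E is re-added to B C and B C D to A B C D, which gives two copies of B C and three copies of A B C D, the same multiset as D' on the slide.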