k-Anonymous Patterns

M. Atzori (1,2), F. Bonchi (2), F. Giannotti (2), D. Pedreschi (1)

(1) Department of Computer Science, University of Pisa
(2) Information Science and Technology Institute, CNR, Pisa

Speaker: Maurizio Atzori

9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), 2005

Outline

1. Motivation
   - Data Mining and Privacy of Individuals
   - An Example of the Problem Addressed

2. k-Anonymous Patterns
   - Definitions and Properties
   - First Results on Inference Channels

3. Condensed Representation
   - Definitions and Properties
   - Benefits of the Condensed Representation

A Taxonomy of Privacy Preserving Data Mining

1. Intensional Knowledge Hiding
   - Bertino's approach, based on DB Sanitization

2. Extensional Knowledge Hiding
   - Agrawal's approach, based on DB Randomization
   - Sweeney's approach, based on DB Anonymization

3. Distributed Extensional Knowledge Hiding
   - Clifton's approach, based on Secure Multiparty Computation

4. Secure Intensional Knowledge Sharing
   - Clifton's Public/Private/Unknown Attribute Framework
   - Zaïane's Association Rule Sanitization (but also IKH)
   - k-Anonymous Patterns (our approach)

The Purpose

We want to publish data mining results (as in Secure Intensional Knowledge Sharing)

We DON'T want to release information that relates to only a few people and could help trace single individuals

We don't want to specify any other information

A Motivating Example in the Medical Domain

Example

Suppose Dr. Gregory House conducts both regular hospital activities and research

He has a large database with all the sensitive information about his patients

Playing with Data Mining, he discovered interesting trends about pathologies in his patient data

Question

Can Dr. House publish his discoveries to third parties without violating the privacy of his patients?

An Association Rule Can Be Used to Break Anonymity

Example

a1 ∧ a2 ∧ a3 ⇒ a4 [sup = 80, conf = 98.7%]


$\mathrm{sup}(\{a_1, a_2, a_3\}) = \dfrac{\mathrm{sup}(\{a_1, a_2, a_3, a_4\})}{\mathit{conf}} \approx \dfrac{80}{0.987} \approx 81.05$

In other words, sup({a1, a2, a3}) = 81 while sup({a1, a2, a3, a4}) = 80, so we know that there is just one individual for which the pattern a1 ∧ a2 ∧ a3 ∧ ¬a4 holds.
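The arithmetic of this example can be checked with a short sketch. Only the support (80) and the confidence (98.7%) come from the slide; the function name `premise_support` is hypothetical, and rounding to the nearest integer assumes the published confidence was itself rounded.

```python
def premise_support(rule_support: int, confidence: float) -> int:
    """Recover the support of a rule's premise as sup(rule) / conf.

    The result is rounded because supports are whole counts and the
    published confidence is assumed to be a rounded value.
    """
    return round(rule_support / confidence)


rule_sup = 80   # sup({a1, a2, a3, a4}), as published with the rule
conf = 0.987    # published confidence of a1 ∧ a2 ∧ a3 ⇒ a4

premise = premise_support(rule_sup, conf)  # sup({a1, a2, a3})
violators = premise - rule_sup             # individuals with a1, a2, a3 but not a4
print(premise, violators)
```

Nothing beyond the published rule is needed for this computation, which is what makes the rule an anonymity threat.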

Now we know that...

Fact


Even if we mine with a high support threshold, we can infer patterns holding in the original database which were not intentionally released

Such patterns may concern very few individuals

The support value of such patterns can be inferred without accessing the database

What Do We Mean by “k-Anonymous Pattern”?

Definition (Anonymous Pattern)

Given a database D and an anonymity threshold k, a pattern p is said to be k-anonymous if $\mathrm{sup}_D(p) \geq k$ or $\mathrm{sup}_D(p) = 0$.

Definition (Inference Channel)

An Inference Channel is any set of itemsets from which it is possible to infer that a pattern p is not k-anonymous.

We are interested in inference channels that are made of frequent itemsets.
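As an illustration of the first definition, here is a minimal sketch that tests k-anonymity directly against a toy database. It assumes patterns are restricted to conjunctions of items and negated items; the helper names `support` and `is_k_anonymous` and the toy data are hypothetical.

```python
def support(database, positive, negative=()):
    """Count transactions containing every item in `positive` and none in `negative`."""
    pos, neg = set(positive), set(negative)
    return sum(1 for t in database if pos <= t and not (neg & t))


def is_k_anonymous(database, k, positive, negative=()):
    """A pattern is k-anonymous when its support is at least k or exactly 0."""
    s = support(database, positive, negative)
    return s >= k or s == 0


# Toy database: four transactions {a, b} and one transaction {a}.
db = [{"a", "b"}] * 4 + [{"a"}]

print(is_k_anonymous(db, 3, ["a"]))          # pattern a: support 5
print(is_k_anonymous(db, 3, ["a"], ["b"]))   # pattern a ∧ ¬b: support 1
print(is_k_anonymous(db, 3, ["c"]))          # pattern c: support 0
```

An inference channel matters because an attacker reaches the same conclusion from published supports alone, without access to `db`.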

Reducing the Number of Patterns to Check

Theorem

$\forall p \in \mathit{Pat}(\mathcal{I}) : 0 < \mathrm{sup}_D(p) < k \;.\; \exists\, I \subseteq J \in 2^{\mathcal{I}} : C_I^J$

Translation: we can prune the search space by looking for Inference Channels regarding only conjunctive patterns.

This property makes a (naive) Inference Channel Detector algorithm possible.
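A naive detector along these lines might be sketched as follows. This is not the authors' implementation: it assumes the support of the pattern behind a channel $C_I^J$ (all items of I present, all items of J \ I absent) is derived from the published itemset supports by inclusion-exclusion, and it assumes `sup` holds every frequent itemset, the empty set included. The names `channel_support` and `naive_detector` and the toy supports are hypothetical.

```python
from itertools import combinations


def channel_support(I, J, sup):
    """Support of (all of I) AND NOT (any item of J minus I), by inclusion-exclusion."""
    I, J = frozenset(I), frozenset(J)
    extra = tuple(J - I)
    total = 0
    for r in range(len(extra) + 1):
        for added in combinations(extra, r):
            total += (-1) ** r * sup[I | frozenset(added)]
    return total


def naive_detector(sup, k):
    """Return (I, J, f) for every pair I ⊂ J whose derived support f satisfies 0 < f < k."""
    channels = []
    itemsets = list(sup)
    for J in itemsets:
        for I in itemsets:
            if I < J:  # proper subset
                f = channel_support(I, J, sup)
                if 0 < f < k:
                    channels.append((I, J, f))
    return channels


# Toy published supports (assumed complete, empty itemset included).
sup = {
    frozenset(): 100,
    frozenset("a"): 81,
    frozenset("b"): 90,
    frozenset("ab"): 80,
}

for I, J, f in naive_detector(sup, 5):
    print(sorted(I), sorted(J), f)
```

With k = 5 the only channel reported is the pair ({a}, {a, b}), whose derived support is 81 − 80 = 1. The double loop over all pairs of itemsets is what makes this version naive.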

What Do We Mean by “Condensed Representation”?


Definition (Partial Order on Inference Channels)

$C_I^J \preceq C_H^L$ when $I \subseteq H$ and $(J \setminus I) \subseteq (L \setminus H)$

Example

$C_a^{ac} \preceq C_{ab}^{abcd}$

Intuitively, a ∧ ¬c is less specific than a ∧ b ∧ ¬c ∧ ¬d, since the transactions s.t. a ∧ ¬c are a superset of the transactions s.t. a ∧ b ∧ ¬c ∧ ¬d
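The partial order can be checked mechanically. The sketch below encodes a channel $C_I^J$ as the pair of itemsets (I, J); the function name `precedes` is hypothetical, and items are single characters purely for brevity.

```python
def precedes(I, J, H, L):
    """True when C_I^J precedes C_H^L: I is a subset of H and (J minus I) is a subset of (L minus H)."""
    I, J, H, L = map(frozenset, (I, J, H, L))
    return I <= H and (J - I) <= (L - H)


# The example from the slide: I = {a}, J = {a, c}, H = {a, b}, L = {a, b, c, d}.
print(precedes("a", "ac", "ab", "abcd"))
# The reverse direction does not hold, since {a, b} is not a subset of {a}.
print(precedes("ab", "abcd", "a", "ac"))
```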

Definition (Maximal Inference Channel)

$C_I^J$ is maximal w.r.t. D and σ if, for every $C_H^L \succ C_I^J$, $\mathrm{sup}(C_H^L) = f_H^L = 0$

Theoretical Results on Condensed Representation

Theorem (Form of Maximal Inference Channel)

An Inference Channel $C_I^J$ is maximal iff

1. I is closed and
2. J is maximal

Theorem (Lossless Representation)

Every Inference Channel can be computed from the set of Maximal Inference Channels
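The two conditions of the first theorem can be tested on a table of frequent itemsets. The sketch below assumes the standard definitions, which the slide does not spell out: an itemset is closed when no proper superset has the same support, and maximal when no proper superset is frequent. The names `is_closed` and `is_maximal` and the toy supports are hypothetical, and `sup` is assumed to contain exactly the frequent itemsets.

```python
def is_closed(itemset, sup):
    """True when no proper superset in `sup` has the same support as `itemset`."""
    s = frozenset(itemset)
    return all(not (s < other and sup[other] == sup[s]) for other in sup)


def is_maximal(itemset, sup):
    """True when no proper superset of `itemset` appears in `sup` (i.e. is frequent)."""
    s = frozenset(itemset)
    return all(not (s < other) for other in sup)


# Toy frequent itemsets with their supports.
sup = {
    frozenset(): 100,
    frozenset("a"): 80,
    frozenset("b"): 90,
    frozenset("ab"): 80,
}

closed = [sorted(x) for x in sup if is_closed(x, sup)]
maximal = [sorted(x) for x in sup if is_maximal(x, sup)]
print(closed)
print(maximal)
```

Here {a} is not closed because {a, b} has the same support, and {a, b} is the only maximal itemset, so under the theorem only pairs with a closed I and J = {a, b} would need to be examined.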

Condensed Representation of Inference Channels

Smaller search space

1. Memory saving (the number of non-maximal channels can be huge)
2. Faster running times
3. Less distortion if we try to sanitize the set of frequent itemsets (but this point is not discussed in the paper)

[Figure: Number of Maximal Inference Channels (y-axis, 0 to 600) versus Number of Inference Channels (x-axis, 0 to 3000), for MUSHROOM and ACCIDENTS with k = 10 and k = 50.]

Improvements in Time Performance

[Figure: Time in ms (y-axis, 0 to 60000) versus Support in % (x-axis, 20 to 50) for the Naive and Optimized algorithms on MUSHROOM, k = 10.]

Number of Inference Channels

[Figure: Number of Maximal Inference Channels (y-axis, 0 to 800) versus Support in % (x-axis, 20 to 80) on MUSHROOM, for k = 10, 25, 50 and 100.]

Summary

We defined k-anonymous patterns and provided a general characterization of inference channels holding among patterns that may threaten the anonymity of the source data

We developed an effective and efficient algorithm to detect such potential threats, which yields a methodology to check whether the mining results may be disclosed without any risk of violating anonymity

For Further Reading

M. Atzori, F. Bonchi, F. Giannotti, D. Pedreschi. Blocking Anonymity Threats Raised by Frequent Itemset Mining. Fifth IEEE International Conference on Data Mining (ICDM), Houston, Texas, USA, 2005.

L. Sweeney. k-Anonymity: A Model for Protecting Privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5), 2002.
