
Mining Uncertain Data (Sebastiaan van Schaik)


Seminar web data extraction: Mining uncertain data

Sebastiaan van [email protected]

20 January 2011


Introduction

Focus of this presentation: mining of frequent patterns and association rules from (uncertain) data.

Example applications:

discover regularities in customer transactions;

analysing log files: determine how visitors use a website;

Based on:

Mining Uncertain Data with Probabilistic Guarantees [9] (KDD 2010);

Frequent Pattern Mining with Uncertain Data [1] (KDD 2009);

A Tree-Based Approach for Frequent Pattern Mining from Uncertain Data [6] (PAKDD 2008).


Introduction & running example

Frequent pattern (itemset): a set of items that occurs sufficiently often. Example: {fever, headache}

Association rule: a set of items implying another set of items. Example: {fever, headache} ⇒ {nausea}

Patient     Diagnosis
t1  Cheng   {severe cold}
t2  Andrey  {yellow fever, haemochromatosis}
t3  Omer    {schistosomiasis, syringomyelia}
t4  Tim     {Wilson’s disease}
t5  Dan     {Hughes-Stovin syndrome}
t6  Bas     {Henoch-Schönlein purpura}

Running example: patient diagnosis database

Yellow fever?


Measuring ‘interestingness’: support & confidence

Support of an itemset X, sup(X): the number of entries (rows, transactions) that contain X.

Confidence of an association rule X ⇒ Y:

conf(X ⇒ Y) = sup(X ∪ Y) / sup(X)
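To make the two measures concrete, here is a minimal Python sketch (the toy transactions and helper names are illustrative, not from the slides):

```python
def sup(itemset, db):
    """Support: number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def conf(x, y, db):
    """Confidence of the rule X => Y: sup(X u Y) / sup(X)."""
    return sup(x | y, db) / sup(x, db)

# Toy database: each transaction is a set of items.
db = [
    {"fever", "headache", "nausea"},
    {"fever", "headache"},
    {"fever"},
]

print(sup({"fever", "headache"}, db))               # 2
print(conf({"fever", "headache"}, {"nausea"}, db))  # 0.5
```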


Finding association rules: Apriori (1)

Agrawal et al. introduced Apriori in 1994 [2] to mine association rules:

1. Find all frequent itemsets X in database D (Xi is frequent iff sup(Xi) > minsup):
   1. Candidate generation: generate all possible itemsets of length k (starting at k = 1) based on the frequent itemsets of length k − 1;
   2. Test candidates, discard infrequent itemsets;
   3. Repeat with k = k + 1.

Important observation: all subsets X′ of a frequent itemset X are frequent (the Apriori property). This is used to prune candidates before step (2).

Example: if X′ = {fever} is not frequent in database D, then X = {fever, headache} cannot be frequent.
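A rough sketch of this level-wise search in Python (my own simplification; the set-based candidate join and counting are naive compared to real implementations):

```python
from itertools import combinations

def apriori_frequent(db, minsup):
    """Level-wise Apriori sketch: db is a list of item sets."""
    items = {i for t in db for i in t}
    frequent = {}                               # frozenset -> support
    level = [frozenset([i]) for i in items]     # k = 1 candidates
    k = 1
    while level:
        # Test candidates, discard infrequent itemsets.
        counts = {c: sum(1 for t in db if c <= t) for c in level}
        freq_k = {c: s for c, s in counts.items() if s > minsup}
        frequent.update(freq_k)
        # Generate (k+1)-candidates from frequent k-itemsets, keeping only
        # those whose k-subsets are all frequent (Apriori property).
        prev = list(freq_k)
        joined = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        level = [c for c in joined
                 if all(frozenset(s) in freq_k for s in combinations(c, k))]
        k += 1
    return frequent
```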


Finding association rules: Apriori (2)

Apriori continued:

2. Extract association rules from the frequent itemsets X. For each Xi ∈ X:
   1. Generate all non-empty proper subsets S of Xi. For each S:
   2. Test the confidence of the rule S ⇒ (Xi − S).

Example: itemset X = {fever, headache, nausea} is frequent, so test:

{fever, headache} ⇒ {nausea}
{fever, nausea} ⇒ {headache}
{nausea, headache} ⇒ {fever}
{fever} ⇒ {headache, nausea}
(...)
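A matching sketch of the rule-extraction step (assuming `frequent` maps each frequent itemset to its support, as produced by the sketch above; by the Apriori property every subset S is present in that map):

```python
from itertools import chain, combinations

def extract_rules(frequent, minconf):
    """For each frequent itemset Xi, test every rule S => (Xi - S)."""
    rules = []
    for xi, sup_xi in frequent.items():
        # All non-empty proper subsets S of Xi.
        subsets = chain.from_iterable(
            combinations(xi, k) for k in range(1, len(xi)))
        for s in map(frozenset, subsets):
            # conf(S => Xi - S) = sup(Xi) / sup(S)
            c = sup_xi / frequent[s]
            if c >= minconf:
                rules.append((set(s), set(xi - s), c))
    return rules
```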


Introduction to uncertain data

Data might be uncertain, for example:

Location detection using multiple RFID sensors (triangulation);

Sensor readings (temperature, humidity) are noisy;

Face recognition;

Patient diagnosis.

Challenge: how do we model uncertainty and take it into account when mining frequent itemsets and association rules?


Existential probabilities

Existential probability: each item in a tuple carries a probability expressing how likely it is that the item actually belongs to that tuple.

Important assumption: tuple and item independence!

Patient     Diagnosis (including existential probabilities)
t1  Cheng   { 0.9 : a    0.72 : d    0.718 : e   0.8 : f }
t2  Andrey  { 0.9 : a    0.81 : c    0.718 : d   0.72 : e }
t3  Omer    { 0.875 : b  0.857 : c }
t4  Tim     { 0.9 : a    0.72 : d    0.718 : e }
t5  Dan     { 0.875 : b  0.857 : c   0.05 : d }
t6  Bas     { 0.875 : b  0.1 : f }

Simplified probabilistic diagnosis database (adapted from [6])


Possible worlds

D = {t1, t2, ..., tn}   (n transactions)
tj = {(p(j,1), i1), ..., (p(j,m), im)}   (m items in each transaction)

D can be expanded into a set of possible worlds: W = {W1, ..., W2^(nm)}.

Patient     Diagnosis (including prob.)
t1  Cheng   { 0.9 : a    0.72 : d    0.718 : e   0.8 : f }
t2  Andrey  { 0.9 : a    0.81 : c    0.718 : d   0.72 : e }
t3  Omer    { 0.875 : b  0.857 : c }
t4  Tim     { 0.9 : a    0.72 : d    0.718 : e }
t5  Dan     { 0.875 : b  0.857 : c   0.05 : d }
t6  Bas     { 0.875 : b  0.1 : f }

Pr[Wx] = (1 − p(1,a)) · p(1,d) · (1 − p(1,e)) · p(1,f) · p(2,a) · ... · p(6,f)
       = 0.1 · 0.72 · 0.28 · 0.2 · 0.9 · ... · 0.1
       ≈ 0.00000021   (one of the 2^18 possible worlds)
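Under the independence assumption, a world's probability is just a product over every (tuple, item) pair; a small self-contained sketch (the dictionary encoding of the running example is mine):

```python
# db: transaction -> {item: existential probability}
db = {
    "t1": {"a": 0.9, "d": 0.72, "e": 0.718, "f": 0.8},
    "t2": {"a": 0.9, "c": 0.81, "d": 0.718, "e": 0.72},
    "t3": {"b": 0.875, "c": 0.857},
    "t4": {"a": 0.9, "d": 0.72, "e": 0.718},
    "t5": {"b": 0.875, "c": 0.857, "d": 0.05},
    "t6": {"b": 0.875, "f": 0.1},
}

def world_probability(db, world):
    """Probability of one possible world; `world` maps each transaction
    to the set of items that are present in that world."""
    p = 1.0
    for t, items in db.items():
        for item, prob in items.items():
            p *= prob if item in world[t] else 1.0 - prob
    return p

# Example: the world in which every item happens to be present.
world_all = {t: set(items) for t, items in db.items()}
print(world_probability(db, world_all))
```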


Introduction

Approaches to mining frequent itemsets from uncertain data:

U-Apriori [4] and p-Apriori [9]

UF-growth [6]

UFP-tree [1]

...

Further focus:

UF-growth: mining without candidate generation;

p-Apriori: pruning using Chernoff bounds


Expected support

Support of an itemset X turns into a random variable:

E[sup(X)] = Σ_{Wi ∈ W} ( Pr[Wi] · sup_{Wi}(X) )

Enumerating all possible worlds is infeasible. However, because of the independence assumptions, this reduces to:

E[sup(X)] = Σ_{tj ∈ D} ( Π_{x ∈ X} Pr[x, tj] )

(see [7, 6])
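The simplified formula turns into a few lines of Python (same dictionary encoding as in the earlier sketch; an illustration, not an optimised implementation):

```python
from math import prod

def expected_support(db, itemset):
    """E[sup(X)]: per transaction, multiply the existential probabilities
    of X's items (an absent item contributes probability 0)."""
    return sum(
        prod(items.get(x, 0.0) for x in itemset)
        for items in db.values()
    )
```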


Expected support (2)

Patient     Diagnosis (including prob.)
t1  Cheng   { 0.9 : a    0.72 : d    0.718 : e   0.8 : f }
t2  Andrey  { 0.9 : a    0.81 : c    0.718 : d   0.72 : e }
t3  Omer    { 0.875 : b  0.857 : c }
t4  Tim     { 0.9 : a    0.72 : d    0.718 : e }
t5  Dan     { 0.875 : b  0.857 : c   0.05 : d }
t6  Bas     { 0.875 : b  0.1 : f }

Expected support of itemset X = {a, d} in the patient diagnosis database:

sup_{Wx}(X) = 2   (in the example world Wx from the previous slide)

E[sup(X)] = Σ_{Wi ∈ W} ( Pr[Wi] · sup_{Wi}(X) )
          = Σ_{tj ∈ D} ( Π_{x ∈ X} Pr[x, tj] )
          = 0.9 · 0.72 + 0.9 · 0.718 + 0 · 0 + 0.9 · 0.72 + 0 · 0.05 + 0 · 0
          ≈ 1.942
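A quick self-contained check of this sum in Python (same dictionary encoding as before):

```python
from math import prod

db = {
    "t1": {"a": 0.9, "d": 0.72, "e": 0.718, "f": 0.8},
    "t2": {"a": 0.9, "c": 0.81, "d": 0.718, "e": 0.72},
    "t3": {"b": 0.875, "c": 0.857},
    "t4": {"a": 0.9, "d": 0.72, "e": 0.718},
    "t5": {"b": 0.875, "c": 0.857, "d": 0.05},
    "t6": {"b": 0.875, "f": 0.1},
}

X = {"a", "d"}
e_sup = sum(prod(t.get(x, 0.0) for x in X) for t in db.values())
print(round(e_sup, 4))   # 1.9422
```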


Frequent itemsets in probabilistic databases

An itemset X is frequent iff:

UF-growth: E[sup(X )] > minsup (also used in [4, 1] and many others)

p-Apriori: Pr[sup(X ) > minsup] ≥ minprob


Introduction to UF-growth

Apriori versus UF-growth:

Apriori-like algorithms generate and test candidate itemsets;

UF-growth [6] (based on FP-growth [5]) grows a tree from a probabilistic database.

Outline of procedure (example follows):

1. First scan: determine the expected support of all items;
2. Second scan: create a branch for each transaction (merging identical nodes when possible). Each node contains:
   - an item;
   - its probability;
   - its occurrence count.
   Example: (a, 0.9, 2)

An itemset X is frequent iff E[sup(X)] > minsup.
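A toy sketch of the two scans (my own simplified encoding: the tree is a nested dict keyed by (item, probability), which directly expresses the rule that nodes merge only when both the item and its probability coincide):

```python
from collections import defaultdict

def build_uf_tree(db, minsup):
    """db: list of {item: probability} dicts, one per transaction."""
    # First scan: expected support of each single item.
    esup = defaultdict(float)
    for t in db:
        for item, p in t.items():
            esup[item] += p
    frequent = {i for i, s in esup.items() if s > minsup}

    # Second scan: insert each transaction as a branch, keeping frequent
    # items ordered by decreasing expected support. Each node is keyed by
    # (item, probability) and stores [occurrence_count, children].
    root = {}
    for t in db:
        branch = sorted((i for i in t if i in frequent),
                        key=lambda i: -esup[i])
        node = root
        for i in branch:
            entry = node.setdefault((i, t[i]), [0, {}])
            entry[0] += 1          # e.g. the node (a, 0.9) counts its merges
            node = entry[1]
    return esup, root
```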


UF-tree example (1)

Patient     Diagnosis (including prob.)
t1  Cheng   { 0.9 : a    0.72 : d    0.718 : e   0.8 : f }
t2  Andrey  { 0.9 : a    0.81 : c    0.718 : d   0.72 : e }
t3  Omer    { 0.875 : b  0.857 : c }
t4  Tim     { 0.9 : a    0.72 : d    0.718 : e }
t5  Dan     { 0.875 : b  0.857 : c   0.05 : d }
t6  Bas     { 0.875 : b  0.1 : f }

1) Determine the expected support of each item:

E[sup({a})] = 2.7
E[sup({b})] = 2.625
E[sup({c})] = 2.524
E[sup({d})] = 2.20875
E[sup({e})] = 2.1575
E[sup({f})] = 0.9

2) Build the tree (UF-tree figure from [6], not reproduced here).


UF-tree example (2)

Extract frequent patterns from the UF-tree:

E[sup({a, e})] = 1 · 0.72 · 0.9 + 2 · 0.71875 · 0.9 = 1.94175
E[sup({c, e})] = 1 · 0.72 · 0.81 = 0.5832
E[sup({d, e})] = 1 · 0.72 · 0.71875 + 2 · 0.71875 · 0.72 = 1.5525
E[sup({a, d, e})] = 1 · 0.9 · 0.71875 · 0.72 + 2 · 0.9 · 0.72 · 0.71875 = 1.39725
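Each sum groups the item's occurrences along a tree path by their distinct probabilities and weighs them by the stored occurrence counts. A small check of the first line (numbers read off the example tree; the list encoding is mine):

```python
# Under the (a, 0.9) node, e occurs once with probability 0.72 (t2)
# and twice with probability 0.71875 (t1 and t4).
e_paths = [(0.72, 1), (0.71875, 2)]      # (probability of e, count)
p_a = 0.9
print(sum(cnt * p_e * p_a for p_e, cnt in e_paths))   # ≈ 1.94175
```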


UF-growth continued

Mining larger itemsets can be done more efficiently using tree projections.

Remarks:

Nodes can only be merged when items have identical probabilities (otherwise, all occurrence counts equal 1);

Suggested solution in [6]: rounding of probabilities;

Other solution (from [1]): store a carefully constructed summary of the probabilities in each node. This may overestimate the expected support.


Introduction to p-Apriori

Apriori has been extended to support uncertainty;

New pruning techniques [9, 4, 3] improve efficiency;

Note: the Apriori (“downward closure”) property still holds in the probabilistic case [1];

Goal: prune candidates, saving as much time as possible.

In p-Apriori, an itemset X is frequent iff:

Pr[sup(X) > minsup] ≥ minprob


p-Apriori: advanced frequent itemset mining

Sun et al. [9] use a simplified approach to modelling uncertainty: each tuple ti is associated with an existential probability pi.

In p-Apriori: itemset X is frequent if and only if:

Pr[sup(X) > minsup] ≥ minprob

Let cnt(X) denote the number of tuples containing X. Then:

cnt(X) < minsup ⇒ X cannot be frequent

Chernoff bounds¹ provide strong bounds on the tail distributions of sums of independent random variables.

¹ Interesting course: Probability & Computing by James Worrell


p-Apriori: pruning using Chernoff Bounds (1)

Each tuple ti is associated with an existential probability pi. Then:

Yi = { 1 with probability pi
     { 0 with probability 1 − pi

Y = Σi Yi = sup(X)

Furthermore:

µ = E[sup(X)]

δ = (minsup − µ − 1) / µ

Pr[sup(X) ≥ minsup] = Pr[sup(X) > minsup − 1]
                    = Pr[sup(X) > (1 + δ)µ]


p-Apriori: pruning using Chernoff Bounds (2)

Using a Chernoff bound (see [8], theorem 4.3 and exercise 4.1):

Pr[sup(X) ≥ minsup] < 2^(−δµ)       if δ ≥ 2e − 1
Pr[sup(X) ≥ minsup] < e^(−δ²µ/4)    otherwise

Therefore, an itemset X cannot be frequent if:

for δ ≥ 2e − 1:       2^(−δµ) < minprob
for 0 < δ < 2e − 1:   e^(−δ²µ/4) < minprob

Example with minprob = 0.4, minsup = 9 and E[sup(X)] = 3:

e^(−δ²µ/4) = e^(−((9−3−1)/3)² · 3/4) = e^(−25/12) ≈ 0.125 < minprob
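The pruning test is mechanical once µ and δ are known; a sketch in Python (function name mine):

```python
from math import exp, e

def chernoff_prunable(mu, minsup, minprob):
    """True when the Chernoff upper bound on Pr[sup(X) >= minsup] is
    already below minprob, so X cannot be frequent (mu = E[sup(X)])."""
    delta = (minsup - mu - 1) / mu
    if delta <= 0:
        return False                         # bound does not apply
    if delta >= 2 * e - 1:
        bound = 2.0 ** (-delta * mu)
    else:
        bound = exp(-delta * delta * mu / 4)
    return bound < minprob

# The slide's example: minprob = 0.4, minsup = 9, E[sup(X)] = 3.
print(chernoff_prunable(mu=3, minsup=9, minprob=0.4))   # True (bound ≈ 0.125)
```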


p-Apriori: finding frequent patterns (DP)

The p-Apriori algorithm for finding frequent patterns resembles Apriori:

1. Generate the set of candidate k-itemsets Ck based on the frequent itemsets of length k − 1;
2. For each itemset X ∈ Ck:
   1. Try pruning using the Apriori property;
   2. Compute cnt(X), try pruning using the Chernoff bound;
3. For each remaining itemset X ∈ Ck: compute the pmf of sup(X) in O(n²) time and compare against minprob (a sketch of this step follows below).

(Association rules can then be mined from the frequent patterns.)
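Step 3 is the exact test: sup(X) is a Poisson binomial variable over the tuples that contain X, and its pmf can be built with the standard O(n²) dynamic program. A sketch under the slides' tuple-level model (names mine):

```python
def probably_frequent(probs, minsup, minprob):
    """probs[i] = existential probability of the i-th tuple containing X.
    Builds the pmf of sup(X) in O(n^2) and tests Pr[sup >= minsup]."""
    pmf = [1.0]                        # pmf of sup(X) over zero tuples
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1 - p)      # tuple absent from the world
            nxt[k + 1] += q * p        # tuple present, support grows by 1
        pmf = nxt
    return sum(pmf[minsup:]) >= minprob

print(probably_frequent([0.9, 0.8, 0.5], minsup=2, minprob=0.5))  # True
```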


Summary & conclusion

Data mining of uncertain data is a new, fast-moving field;

Data uncertainty introduces a significant complexity layer;

Different algorithms use different definitions and models;

Algorithm performance greatly depends on data.

References

[1] C. C. Aggarwal, Y. Li, and J. Wang. Frequent pattern mining with uncertain data. In Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), pages 29–37, 2009.

[2] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 20th Int. Conf. on Very Large Data Bases (VLDB), 1994.

[3] C. K. Chui and B. Kao. A decremental approach for mining frequent itemsets from uncertain data. In Proc. 12th Pacific-Asia Conf. on Advances in Knowledge Discovery and Data Mining (PAKDD), pages 64–75, 2008.

[4] C. K. Chui, B. Kao, and E. Hung. Mining frequent itemsets from uncertain data. In Advances in Knowledge Discovery and Data Mining (PAKDD), pages 47–58, 2007.

[5] J. Han, J. Pei, Y. Yin, and R. Mao. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8(1):53–87, 2004.

[6] C. Leung, M. Mateo, and D. Brajczuk. A tree-based approach for frequent pattern mining from uncertain data. In Advances in Knowledge Discovery and Data Mining (PAKDD), pages 653–661, 2008.

[7] C. K. S. Leung, B. Hao, and F. Jiang. Constrained frequent itemset mining from uncertain data streams, pages 120–127, 2010.

[8] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.

[9] L. Sun, R. Cheng, and D. W. Cheung. Mining uncertain data with probabilistic guarantees. In Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), pages 273–282, 2010. (Recommended by Dan Olteanu, read by Nov 12 4pm.)