An Efficient Algorithm for Incremental Mining of Association Rules


1

An Efficient Algorithm for Incremental Mining of Association Rules

Chin-Chen Chang, Yu-Chiang Li, Jung-San Lee

RIDE-SDMA’05

Speaker: 董原賓

Advisor: 柯佳伶

2

Introduction

Previous incremental mining algorithms:

FUP (Fast Update Algorithm)

FUP2

Negative border

Note: they all have to rescan the original database.

Problem: publication-like databases

Examples: publication databases, web log records, etc.

The original database is normally much larger than the incremental database.

Solution: NFUP (New Fast Update Algorithm)

3

Definition

DB: the original database

db: the set of newly added transactions

DB+: DB + db (the updated database)

n, $P_n$: db is divided into n partitions, $db = P_1 \cup P_2 \cup \dots \cup P_{n-1} \cup P_n$

$db_{m,n} = P_m \cup P_{m+1} \cup \dots \cup P_{n-1} \cup P_n$
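The suffix notation can be made concrete with a small sketch. It assumes the partitions are stored oldest first in a Python list. The helper names `suffix_db` and `min_count` and the transaction contents are hypothetical and are not taken from the slides.

```python
def suffix_db(partitions, m):
    """Return db_{m,n}: the transactions of P_m, ..., P_n (m is 1-indexed)."""
    return [t for part in partitions[m - 1:] for t in part]


def min_count(partitions, m, min_sup):
    """Minimum occurrence count for an itemset to be frequent in db_{m,n}."""
    return min_sup * len(suffix_db(partitions, m))


# Hypothetical incremental database: n = 2 partitions of 3 transactions each.
partitions = [
    [{"A", "B", "E"}, {"B", "C", "E"}, {"B", "C", "D", "E"}],       # P_1
    [{"A", "B", "C"}, {"A", "B", "C", "F"}, {"C", "D", "E", "F"}],  # P_2
]
db_2_2 = suffix_db(partitions, 2)  # only the latest partition
db_1_2 = suffix_db(partitions, 1)  # the whole incremental database
```

With a minimum support of 50%, `min_count(partitions, 2, 0.5)` gives 1.5 and `min_count(partitions, 1, 0.5)` gives 3.0, the same thresholds used in the example slides.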

4

Definition

α set: itemsets frequent in DB+

β set: itemsets frequent in $db_{m,n}$ ($m \le n$) but infrequent in $db_{m-1,n}$

γ set: itemsets frequent in $db_{m,m}$ but infrequent in $db_{m+1,n}$

X.count: occurrence count of itemset X

X.start: number of the partition at which X becomes frequent

X.type: one of the three types α, β, and γ

5

FUP (Fast Update Algorithm)

In case 2, the itemset's count is easily calculated.

In case 3, FUP needs to rescan the original database.

6

NFUP (New Fast Update Algorithm)

A backward method that only requires scanning the incremental database.

An itemset that is frequent in the incremental database is also important, even if it is infrequent in the updated database.

Partition the incremental database (db) by time interval.

7

NFUP

The set of frequent itemsets of DB is known in advance.

NFUP scans the partitions backward: the last partition is scanned first.

In each partition, the process is performed like that of Apriori.

8

NFUP

9

Scan from $P_n$ to $P_1$ and find the α, β, and γ itemsets in db.

After $P_1$ is scanned, the occurrence counts are accumulated with those of the itemsets of DB.

10

The latest partition is scanned first: initialize the variables and accumulate the occurrence counts.

Still frequent in $P_m$: accumulate the count.

Still frequent in $db_{m,n}$: accumulate the count.

Frequent only in $db_{m+1,n}$: remove from the α set and add to the β set.

Not in any set and frequent in $P_m$: check whether $P_m$ is the latest partition. Yes: α set. No: γ set.
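The rules on this slide can be sketched in Python and run on the per-partition counts reported in the example slides of this deck. This is a simplified illustration, not the authors' implementation. The names `Entry` and `scan_partition` are hypothetical, itemsets are written as plain strings, and entries already in the β or γ set are assumed to stay unchanged in later scans.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    start: int  # X.start: partition number at which X becomes frequent
    count: int  # X.count: accumulated occurrence count
    type: str   # X.type: "alpha", "beta" or "gamma"


def scan_partition(table, counts, m, n, part_size, suffix_size, min_sup):
    """Update table (itemset -> Entry) with the counts found in P_m.

    part_size is |P_m| and suffix_size is |db_{m,n}|.
    """
    for itemset, entry in table.items():
        if entry.type != "alpha":
            continue  # assumption: beta and gamma entries are left unchanged
        entry.count += counts.get(itemset, 0)
        if entry.count >= min_sup * suffix_size:
            entry.start = m      # still frequent in db_{m,n}
        else:
            entry.type = "beta"  # frequent only in db_{m+1,n}
    for itemset, c in counts.items():
        if itemset not in table and c >= min_sup * part_size:
            # New itemset frequent in P_m: alpha if P_m is the latest partition.
            table[itemset] = Entry(m, c, "alpha" if m == n else "gamma")


# Counts as reported on the example slides (3 transactions per partition).
p2 = {"A": 2, "B": 2, "C": 3, "D": 1, "E": 1, "F": 2,
      "AB": 2, "AC": 2, "AF": 1, "BC": 2, "BF": 1, "CF": 2, "ABC": 2}
p1 = {"A": 1, "B": 3, "C": 2, "D": 1, "E": 3, "F": 0,
      "AB": 1, "AC": 0, "BC": 2, "BE": 3, "CE": 2}

table = {}
scan_partition(table, p2, m=2, n=2, part_size=3, suffix_size=3, min_sup=0.5)
scan_partition(table, p1, m=1, n=2, part_size=3, suffix_size=6, min_sup=0.5)
for itemset, e in sorted(table.items()):
    print(itemset, e.type, e.start, e.count)
```

Itemsets that are missing from a partition's counts are treated as having count 0, which is why {CF} and {ABC} fall below the threshold of 3 after P1 is scanned.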

11

Example

Scan P2 (min sup = 50%, threshold 3 x 0.5 = 1.5)

Scan P2, 1-itemset counts: {A: 2} {B: 2} {C: 3} {D: 1} {E: 1} {F: 2}

Run Apriori-gen, scan P2, 2-itemset counts: {AB: 2} {AC: 2} {AF: 1} {BC: 2} {BF: 1} {CF: 2}

Scan P2, 3-itemset counts: {ABC: 2}

For each counted itemset:

Check whether the itemset belongs to the α set.

Else, check that the itemset does not belong to any set.

Check whether the itemset's count >= 1.5.

Check whether P2 is the latest partition: yes, α set; no, γ set.

α set   start  count
{A}     2      2
{B}     2      2
{C}     2      3
{F}     2      2
{AB}    2      2
{AC}    2      2
{BC}    2      2
{CF}    2      2
{ABC}   2      2

β set   start  count
(none)

γ set   start  count
(none)
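The level-wise pass over a single partition can be sketched as follows. The slides do not list the transactions, so the three transactions in `P2` are hypothetical, chosen only so that the resulting counts agree with the ones shown for P2. The function name `apriori_counts` is also an assumption, and the join-and-prune step is a plain textbook form of Apriori-gen rather than the authors' code.

```python
from itertools import combinations


def apriori_counts(partition, min_sup):
    """Count candidate itemsets level by level inside one partition.

    Returns {itemset as sorted tuple: count} for every candidate counted.
    """
    threshold = min_sup * len(partition)
    items = sorted({item for t in partition for item in t})
    candidates = [(item,) for item in items]
    counted = {}
    k = 1
    while candidates:
        # Scan the partition once for the current level.
        level = {c: sum(1 for t in partition if set(c) <= t) for c in candidates}
        counted.update(level)
        frequent = {c for c, cnt in level.items() if cnt >= threshold}
        # Apriori-gen: join frequent k-itemsets that share their first k-1
        # items, then prune candidates that have an infrequent k-subset.
        k += 1
        candidates = []
        for a, b in combinations(sorted(frequent), 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                if all(sub in frequent for sub in combinations(cand, k - 1)):
                    candidates.append(cand)
    return counted


# Hypothetical transactions for P2 (not given in the slides).
P2 = [{"A", "B", "C"}, {"A", "B", "C", "F"}, {"C", "D", "E", "F"}]
counts = apriori_counts(P2, 0.5)
```

Because {D} and {E} fall below the threshold of 1.5, no 2-itemset containing them is generated, and {ABC} is the only 3-itemset candidate.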

12

Example

Scan P1 (min sup = 50%, threshold 3 x 0.5 = 1.5)

Scan P1, 1-itemset counts: {A: 1} {B: 3} {C: 2} {D: 1} {E: 3} {F: 0}

Run Apriori-gen, scan P1, 2-itemset counts: {AB: 1} {AC: 0} {BC: 2} {BE: 3} {CE: 2}

For each counted itemset:

Check whether the itemset belongs to the α set. Yes: accumulate the count; if the count < s x |$db_{m,n}$| = 0.5 x 6 = 3, move the itemset to the β set.

Else, check that the itemset does not belong to any set.

Check whether the itemset's count >= 1.5.

Check whether P1 is the latest partition: yes, α set; no, γ set.

α set before scanning P1:

α set   start  count
{A}     2      2
{B}     2      2
{C}     2      3
{F}     2      2
{AB}    2      2
{AC}    2      2
{BC}    2      2
{CF}    2      2
{ABC}   2      2

Updates after the 1-itemsets:

{A}, {B}, {C} stay in the α set with start 1 and counts 3, 5, 5.

{F} (start 2, count 2) moves to the β set.

{E} (start 1, count 3) enters the γ set.

Updates after the 2-itemsets:

{AB}, {BC} stay in the α set with start 1 and counts 3, 4.

{AC} (start 2, count 2) and {CF} (start 2, count 2) move to the β set.

{BE} (start 1, count 3) and {CE} (start 1, count 2) enter the γ set.

{ABC} (start 2, count 2) moves to the β set.

13

Example

α set   start  count
{A}     1      3
{B}     1      5
{C}     1      5
{AB}    1      3
{BC}    1      4

β set   start  count
{F}     2      2
{AC}    2      2
{CF}    2      2
{ABC}   2      2

γ set   start  count
{E}     1      3
{BE}    1      3
{CE}    1      2

Further values shown on the slide:

count: 7, 8, 9

start: 0, 0, 0

{AB}    1      3
{BC}    1      4
{ABC}   2      2
{AE}    0      3

14

Experiment

Intel Pentium IV 1.5 GHz CPU, 640 MB main memory

Microsoft Windows 2000 Professional

Synthetic datasets:

15

Experiment

16

Experiment

17

Experiment
