AR mining

Implementation and comparison of three AR mining algorithms

Xuehai Wang, Xiaobo Chen, Shen Chen

CSCI6405 class project

Outline

• Motivation

• Dataset

• Apriori based hash tree algorithm

• FP-tree algorithm

• Conclusion

• Reference

Motivation

• Make the time to generate rules as short as possible!

• To understand the three algorithms
– Apriori algorithm
– Apriori with hash tree algorithm
– FP-tree algorithm

• Learn how to improve an algorithm

Dataset

• IBM dataset generator
– Can set item number
– Can set minimal support
– Can set dataset size

Tid  Items
1    2 5 8 9
2    3 4 6 7 12

Apriori principle

• Apriori principle
– A candidate generation-and-test approach [4]
– Every subset of a frequent itemset must also be frequent
– If an itemset is infrequent, its supersets are not generated or tested

• But there is still room for improvement
– Counting the support
– Number of I/O scans
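As a minimal sketch of the principle above, the function below joins frequent (k-1)-itemsets into candidate k-itemsets and prunes any candidate that has an infrequent subset. The name `apriori_gen` and the sorted-tuple representation are illustrative assumptions, not the project's code.

```python
from itertools import combinations


def apriori_gen(frequent, k):
    """Build candidate k-itemsets from frequent (k-1)-itemsets.

    `frequent` is a set of sorted tuples of length k-1 (an assumed
    representation chosen for this sketch).
    """
    candidates = set()
    for a in frequent:
        for b in frequent:
            # Join step: merge two itemsets that share their first k-2 items.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                cand = a + (b[-1],)
                # Prune step (Apriori principle): every (k-1)-subset of a
                # frequent itemset must itself be frequent.
                if all(sub in frequent for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
    return candidates
```

Only the candidates that survive the prune step need their support counted in the next database scan.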

Apriori Hash Tree Alg

• The number of candidate k-itemsets is l
• There are n transactions
• The average transaction size is m
• Cost of calculating the support count:
– Original Apriori Alg: O( n.l.(mk) )
– With hash tree: O( n.log(l).(mk) )

• Candidates are stored in a hash tree structure

Tid  Items
2    1 3 6
3    1 2 3
5    2 3 6

[Figure: 1-itemset candidate hash tree]

1-itemsets, min support = 2

1(3)  2(4)  3(3)  4(1)  5(1)  6(3)

2-itemsets, min support = 2

1 2(2)  1 3(2)  1 6(1)  2 3(2)  2 6(1)  3 6(2)

3-itemsets, min support = 2

1 2 3(1)
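The hash-tree counting walked through above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the class names, the modulo hash, the bucket count and the leaf capacity are all hypothetical choices. It uses only the three transactions that survive in the slide table, so some counts may differ from the figures shown.

```python
class Node:
    def __init__(self):
        self.children = None  # dict bucket -> Node once the leaf has been split
        self.counts = {}      # candidate tuple -> support count (leaves only)


class HashTree:
    """Stores candidate k-itemsets (sorted tuples of ints); insert all first."""

    def __init__(self, k, buckets=3, max_leaf=2):
        self.k, self.buckets, self.max_leaf = k, buckets, max_leaf
        self.root = Node()

    def _bucket(self, item):
        return item % self.buckets  # assumed hash function for integer items

    def insert(self, cand):
        node, depth = self.root, 0
        while node.children is not None:
            node = node.children.setdefault(self._bucket(cand[depth]), Node())
            depth += 1
        node.counts[cand] = 0
        # Split an over-full leaf by hashing on the item at this depth.
        if len(node.counts) > self.max_leaf and depth < self.k:
            old, node.counts, node.children = node.counts, {}, {}
            for c in old:
                child = node.children.setdefault(self._bucket(c[depth]), Node())
                child.counts[c] = 0

    def _collect(self, node, items, start, leaves):
        if node.children is None:
            leaves[id(node)] = node  # remember each reachable leaf once
            return
        for i in range(start, len(items)):
            child = node.children.get(self._bucket(items[i]))
            if child is not None:
                self._collect(child, items, i + 1, leaves)

    def add_transaction(self, transaction):
        items = sorted(set(transaction))
        leaves = {}
        self._collect(self.root, items, 0, leaves)
        present = set(items)
        # Only candidates in the reached leaves are tested against the transaction.
        for leaf in leaves.values():
            for cand in leaf.counts:
                if present.issuperset(cand):
                    leaf.counts[cand] += 1

    def supports(self):
        out, stack = {}, [self.root]
        while stack:
            node = stack.pop()
            if node.children is None:
                out.update(node.counts)
            else:
                stack.extend(node.children.values())
        return out


def naive_supports(candidates, transactions):
    # Baseline: test every candidate against every transaction.
    return {c: sum(1 for t in transactions if set(c) <= set(t)) for c in candidates}


transactions = [[1, 3, 6], [1, 2, 3], [2, 3, 6]]
candidates = [(1, 2), (1, 3), (1, 6), (2, 3), (2, 6), (3, 6)]
tree = HashTree(k=2)
for cand in candidates:
    tree.insert(cand)
for t in transactions:
    tree.add_transaction(t)
```

Both counting methods should agree on the supports. The hash tree only reduces how many candidates each transaction is compared against.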

FP-tree

• Since the dataset to be mined is usually very large, it is impossible to read all transactions into memory at once.

• But I/O scans are very time consuming.

• The FP-tree algorithm tries to fit all the information from the dataset into memory, so it only needs two I/O scans.

FP-tree

• FP-tree algorithm and implementation
– By Xiaobo Chen

FP-tree (Frequent Pattern Tree)

• Mining frequent patterns without candidate generation

• Divide and conquer methodology: decompose mining tasks into smaller ones

FP-tree (Merits of FP-tree algorithm)

• Makes the most of shared common prefixes

• Complete and compact
– All information of a transaction is stored in a path
– The size is constrained by the dataset; consequently, the longest path corresponds to the longest pattern
– Compression ratio: over 100

FP-tree (Construction of FP-tree)

TID   Frequent items bought
100   {f, c, a, m, p}
200   {f, c, a, b, m}
300   {f, b}
400   {c, b, p}
500   {f, c, a, m, p}

min_support = 3

Item  frequency
f     4
c     4
a     3
b     3
m     3
p     3
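The first database scan behind the frequency table above can be sketched as below. The function names are hypothetical, and breaking frequency ties alphabetically is an assumption of this sketch; the slides list f before c, so tied items may come out in a different order here.

```python
from collections import Counter


def first_scan(transactions, min_support):
    """Count each item once per transaction and keep the frequent ones."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_support}


def order_items(transaction, frequent):
    """Drop infrequent items and sort the rest by descending frequency.

    Ties are broken alphabetically (an assumption made for determinism).
    """
    kept = [item for item in set(transaction) if item in frequent]
    return sorted(kept, key=lambda item: (-frequent[item], item))


transactions = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
frequent = first_scan(transactions, min_support=3)
```

Every transaction is rewritten with `order_items` during the second scan, so that transactions sharing frequent items also share a prefix in the tree.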

FP-tree construction (Cont’d)

min_support = 3

Header table (head: pointer to the first tree node for the item)

Item  frequency  head
f     4
c     4
a     3
b     3
m     3
p     3
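A minimal sketch of how the tree and its header table could be built from the ordered transactions in the example. The class and function names are illustrative assumptions, and transaction 400 is taken in header-table order as c, b, p, which is what the conditional pattern base cb:1 for p implies.

```python
class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode
        self.link = None    # next node in the tree that carries the same item


def build_fp_tree(ordered_transactions):
    """Insert each ordered transaction as a path, sharing common prefixes."""
    root = FPNode(None, None)
    header = {}  # item -> first node of its node-link chain (the "head" column)
    tails = {}   # item -> last node of the chain, to append in O(1)
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                if item in tails:
                    tails[item].link = child
                else:
                    header[item] = child
                tails[item] = child
            child.count += 1
            node = child
    return root, header


def chain_counts(header, item):
    """Follow the node links of one item and return the count at each node."""
    counts, node = [], header[item]
    while node is not None:
        counts.append(node.count)
        node = node.link
    return counts


ordered = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]
root, header = build_fp_tree(ordered)
```

Summing the counts along one item's node-link chain gives that item's frequency in the header table.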

FP-tree (Mining Frequent Patterns Using the FP-tree)

• General idea (divide-and-conquer)
– Recursively grow frequent pattern paths using the FP-tree

• Method
– For each item, construct its conditional pattern-base, and then its conditional FP-tree
– Repeat the process on each newly created conditional FP-tree
– Until the resulting FP-tree is empty, or it contains only one path (a single path generates all the combinations of its sub-paths, each of which is a frequent pattern)
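The recursion described above can be sketched in a simplified form. As an assumption of this sketch, each conditional pattern base is kept as a plain list of (prefix, count) pairs rather than being compressed into a conditional FP-tree, and the single-path shortcut is omitted. The function name is hypothetical.

```python
from collections import Counter


def pattern_growth(pattern_base, min_support, suffix=()):
    """Yield (pattern, support) pairs for every frequent pattern.

    `pattern_base` is a list of (items, count) pairs where `items` is a tuple
    in the global frequency order. This stands in for a conditional FP-tree.
    """
    counts = Counter()
    for items, count in pattern_base:
        for item in items:
            counts[item] += count
    for item, support in counts.items():
        if support < min_support:
            continue
        pattern = (item,) + suffix
        yield frozenset(pattern), support
        # Conditional pattern base of `item`: the prefix that precedes it
        # in every path that contains it.
        conditional = []
        for items, count in pattern_base:
            if item in items:
                prefix = items[:items.index(item)]
                if prefix:
                    conditional.append((prefix, count))
        yield from pattern_growth(conditional, min_support, pattern)


base = [(tuple(t), 1) for t in ("fcamp", "fcabm", "fb", "cbp", "fcamp")]
patterns = dict(pattern_growth(base, min_support=3))
```

Because every transaction uses the same item order, each pattern is produced exactly once, starting from its last item and growing towards the front.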

Conditional pattern base for p

fcam:2, cb:1

• Start with last item in order (i.e., p).

• Follow node pointers and traverse only the paths containing p.

• Accumulate all the transformed prefix paths of that item to form a conditional pattern base

Constructing a new FP-tree based on this pattern base leads to only one branch, c:3. Thus we derive only one frequent pattern containing p: the pattern cp.

• Move to next least frequent item in order, i.e., m

• Follow node pointers and traverse only the paths containing m.

• Accumulate all the transformed prefix paths of that item to form a conditional pattern base

Conditional pattern base for m

fca:2, fcab:1

Constructing a new FP-tree based on this pattern base leads to the path fca:3. From this we derive the frequent patterns fcam, fcm, fam, cam, fm, cm, am.

FP-tree (Conditional Pattern-Bases for the example)

Item  Conditional pattern-base      Conditional FP-tree
p     {(fcam:2), (cb:1)}            {(c:3)}|p
m     {(fca:2), (fcab:1)}           {(f:3, c:3, a:3)}|m
b     {(fca:1), (f:1), (c:1)}       Empty
a     {(fc:3)}                      {(f:3, c:3)}|a
c     {(f:3)}                       {(f:3)}|c
f     Empty                         Empty
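The entries of this table can be checked with a short sketch. It reads the prefixes directly from the ordered transactions instead of following node links, which is a simplification, and it assumes single-character item names. Both function names are hypothetical.

```python
from collections import Counter


def conditional_pattern_base(ordered_transactions, item):
    """Collect the prefix in front of `item` in every transaction containing it."""
    base = Counter()
    for t in ordered_transactions:
        if item in t:
            prefix = t[:t.index(item)]
            if prefix:
                base[prefix] += 1
    return dict(base)


def conditional_items(base, min_support):
    """Items that stay frequent inside a conditional pattern base."""
    counts = Counter()
    for prefix, count in base.items():
        for item in prefix:
            counts[item] += count
    return {item: c for item, c in counts.items() if c >= min_support}


ordered = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
base_p = conditional_pattern_base(ordered, "p")
base_m = conditional_pattern_base(ordered, "m")
base_b = conditional_pattern_base(ordered, "b")
```

For b, no item reaches the minimum support inside its pattern base, which is why its conditional FP-tree is empty.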

FP-tree (Why is Frequent pattern Growth fast?)

• Performance studies show that FP-growth is an order of magnitude faster than Apriori, and is also faster than tree-projection

• Reasoning:

– No candidate generation, no candidate test

– Use compact data structure

– Eliminate repeated database scan

– Basic operation is counting and FP-tree building

FP-tree: Expected result

[Figure: FP-growth vs. Apriori: scalability with the support threshold. x-axis: support threshold (%), 0 to 3; series: D1 FP-growth runtime and D1 Apriori runtime]

Conclusion

• FP-tree is faster than the other two algorithms.

• Apriori and the hash tree algorithm are easier to implement.
– We can easily combine them with other methods or tools (e.g., distributed parallel computing).

• The parameters of the dataset are very important too.
– Density, size, min support …

References

• [1] Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001.

• [2] Jiawei Han, Jian Pei, Yiwen Yin. Mining Frequent Patterns without Candidate Generation. ACM SIGMOD, 2000.

• [3] N. Mamoulis. Advanced Database Technologies (slides).

• [4] Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2001.
