Mining Frequent Closed Cubes in 3D Datasets Liping Ji Kian-Lee Tan Anthony K. H. Tung Computer Science Department National University of Singapore


Page 1: Mining Frequent Closed Cubes in 3D Datasets

Mining Frequent Closed Cubes in 3D Datasets

Liping Ji, Kian-Lee Tan, Anthony K. H. Tung

Computer Science Department, National University of Singapore

Page 2: Mining Frequent Closed Cubes in 3D Datasets

Motivation

Frequent Closed Pattern (FCP) Mining: great importance, wide application
Previous works are all limited to 2D FCP mining
    biological data: gene-time, gene-sample
    market basket data: transaction-itemset
Extend 2D FCP mining to the 3D context
    biological data: gene-sample-time
    marketing data: region-time-items

Page 3: Mining Frequent Closed Cubes in 3D Datasets

Background

Frequent Pattern (FP) and Frequent Closed Pattern (FCP)

Transactions: Itemsets
t1: a1 a2 a3 a5
t2: a1 a2 a3
t3: a1 a2 a3 a4
t4: a3 a5

minimum support threshold: minsup = 2


Page 5: Mining Frequent Closed Cubes in 3D Datasets

Background

Frequent Pattern (FP) and Frequent Closed Pattern (FCP)

minimum support threshold: minsup = 2

[Figure: transactions t1 to t4 and their itemsets, with the labels FCP and FP]
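The difference between FP and FCP can be checked on the four transactions of the slide. The following is a minimal brute-force sketch, not the method used in the talk. It assumes the usual definitions: a pattern is frequent if at least minsup transactions contain it, and closed if no proper superset has the same support. The function names are hypothetical.

```python
from itertools import combinations

# Transactions t1..t4 and the threshold from the slide.
transactions = {
    "t1": {"a1", "a2", "a3", "a5"},
    "t2": {"a1", "a2", "a3"},
    "t3": {"a1", "a2", "a3", "a4"},
    "t4": {"a3", "a5"},
}
minsup = 2


def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)


def frequent_patterns():
    """Brute force: test every non-empty itemset against minsup."""
    all_items = sorted(set().union(*transactions.values()))
    found = {}
    for size in range(1, len(all_items) + 1):
        for combo in combinations(all_items, size):
            count = support(set(combo))
            if count >= minsup:
                found[frozenset(combo)] = count
    return found


def closed_patterns(frequent):
    """Keep a pattern only if no proper superset has the same support."""
    return {
        pattern: count
        for pattern, count in frequent.items()
        if not any(pattern < other and count == frequent[other] for other in frequent)
    }


fp = frequent_patterns()
fcp = closed_patterns(fp)
for pattern, count in sorted(fcp.items(), key=lambda kv: -kv[1]):
    print(sorted(pattern), count)
```

On this data the sketch finds 9 frequent patterns, of which 3 are closed: {a3} with support 4, {a1, a2, a3} with support 3, and {a3, a5} with support 2. For instance, {a5} is frequent but not closed, because {a3, a5} has the same support.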

Page 6: Mining Frequent Closed Cubes in 3D Datasets

Background

Binary Mapping

Transactions (T) and itemsets (I):
t1: a1 a2 a3 a5
t2: a1 a2 a3
t3: a1 a2 a3 a4
t4: a3 a5

T\I  a1  a2  a3  a4  a5
t1   1   1   1   0   1
t2   1   1   1   0   0
t3   1   1   1   1   0
t4   0   0   1   0   1
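The binary mapping can be reproduced in a few lines. This is only an illustrative sketch. The names `items` and `matrix` are hypothetical, and the item order a1 to a5 is taken from the table header.

```python
# Transactions t1..t4 from the slide.
transactions = {
    "t1": {"a1", "a2", "a3", "a5"},
    "t2": {"a1", "a2", "a3"},
    "t3": {"a1", "a2", "a3", "a4"},
    "t4": {"a3", "a5"},
}
items = ["a1", "a2", "a3", "a4", "a5"]

# One row per transaction, one column per item: 1 if the item is present, else 0.
matrix = {
    tid: [1 if item in present else 0 for item in items]
    for tid, present in transactions.items()
}

print("T\\I", *items)
for tid, row in matrix.items():
    print(tid, *row)
```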


Page 8: Mining Frequent Closed Cubes in 3D Datasets

Frequent Closed Cube

3D Dataset

[Figure: a 3D dataset; labels: Row, Column, Height, Slice]

Page 9: Mining Frequent Closed Cubes in 3D Datasets

Frequent Closed Cube

Slices by Height Dimension

[Figure: slices h3, h2, h1]

Page 10: Mining Frequent Closed Cubes in 3D Datasets

Frequent Closed Cube

Closed Cube: Maximal

[Figure: slices h3, h2, h1]


Page 12: Mining Frequent Closed Cubes in 3D Datasets

Frequent Closed Cube

Definition: Frequent Closed Cube (FCC)
    Maximal: cannot be extended in any dimension
    Frequent: satisfies the minH, minR, minC thresholds
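The two conditions of the definition can be written as a direct check. The sketch below assumes a binary 3D dataset stored as nested lists `data[h][r][c]`, and assumes that a cube is a sub-block whose cells are all 1. The function names and the small dataset are hypothetical. Testing one extra index per dimension is enough for maximality, since a cube that can be extended by a set of indices can also be extended by any single one of them.

```python
def all_ones(data, heights, rows, cols):
    """True if every cell of the sub-block heights x rows x cols holds 1."""
    return all(data[h][r][c] == 1 for h in heights for r in rows for c in cols)


def is_frequent_closed_cube(data, heights, rows, cols, min_h, min_r, min_c):
    """Check the two conditions of the definition: frequent and maximal."""
    n_h, n_r, n_c = len(data), len(data[0]), len(data[0][0])
    if len(heights) < min_h or len(rows) < min_r or len(cols) < min_c:
        return False  # not frequent
    if not all_ones(data, heights, rows, cols):
        return False  # not a block of 1s at all
    # Maximal: adding any single missing index in any dimension must break the block.
    for h in set(range(n_h)) - set(heights):
        if all_ones(data, [h], rows, cols):
            return False
    for r in set(range(n_r)) - set(rows):
        if all_ones(data, heights, [r], cols):
            return False
    for c in set(range(n_c)) - set(cols):
        if all_ones(data, heights, rows, [c]):
            return False
    return True


# Hypothetical dataset with 2 height slices, 2 rows and 3 columns.
data = [
    [[1, 1, 0], [1, 1, 0]],  # slice 0
    [[1, 1, 1], [1, 1, 0]],  # slice 1
]
print(is_frequent_closed_cube(data, [0, 1], [0, 1], [0, 1], 2, 2, 2))
print(is_frequent_closed_cube(data, [0], [0, 1], [0, 1], 1, 1, 1))
```

The first call prints True. The second prints False, because the block can still be extended with slice 1.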


Page 14: Mining Frequent Closed Cubes in 3D Datasets

RSM vs. CubeMiner

Representative Slice Mining (RSM)
    extends existing 2D FCP mining algorithms for FCC mining
CubeMiner
    operates on the 3D space directly

Page 15: Mining Frequent Closed Cubes in 3D Datasets

RSM

Representative Slice (RS) Generation
    enumerate all possible combinations of slices
2D FCP Mining from each RS
Post-pruning to Remove Unclosed Cubes
    If a 2D FCP is contained in other slices besides its contributing slices, it is unclosed and hence removed; otherwise, it is retained.
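The three RSM steps can be sketched end to end on a small dataset. This is a brute-force illustration under stated assumptions, not the implementation evaluated in the talk. It assumes that the representative slice of a combination of height slices is their cell-wise AND, that `min_c` is at least 1, and it replaces a real 2D FCP miner with exhaustive enumeration. All function names are hypothetical.

```python
from itertools import combinations


def closed_2d(slice_, min_r, min_c):
    """Brute-force 2D FCPs of a binary matrix, returned as (rows, cols) pairs."""
    n_rows, n_cols = len(slice_), len(slice_[0])
    result = set()
    for size in range(min_r, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # Columns that hold 1 in every chosen row.
            cols = frozenset(c for c in range(n_cols) if all(slice_[r][c] for r in rows))
            if len(cols) < min_c:
                continue
            # Close the row set: every row that holds 1 on all of `cols`.
            full_rows = frozenset(r for r in range(n_rows) if all(slice_[r][c] for c in cols))
            result.add((full_rows, cols))
    return result


def rsm(data, min_h, min_r, min_c):
    """Representative Slice Mining over the height dimension (sketch)."""
    n_h, n_r, n_c = len(data), len(data[0]), len(data[0][0])
    cubes = set()
    for size in range(min_h, n_h + 1):
        for heights in combinations(range(n_h), size):
            # Step 1: representative slice, assumed to be the cell-wise AND.
            rs = [[int(all(data[h][r][c] for h in heights)) for c in range(n_c)]
                  for r in range(n_r)]
            # Step 2: 2D FCP mining on the representative slice.
            for rows, cols in closed_2d(rs, min_r, min_c):
                # Step 3: post-pruning. The pattern is unclosed if a slice
                # outside `heights` also contains it.
                others = set(range(n_h)) - set(heights)
                if any(all(data[h][r][c] for r in rows for c in cols) for h in others):
                    continue
                cubes.add((frozenset(heights), rows, cols))
    return cubes


# Hypothetical dataset with 2 height slices, 2 rows and 3 columns.
data = [
    [[1, 1, 0], [1, 1, 0]],
    [[1, 1, 1], [1, 1, 0]],
]
for cube in rsm(data, 1, 1, 1):
    print([sorted(dim) for dim in cube])
```

The outer loop visits every non-empty combination of slices, which grows exponentially with the size of the enumerated dimension.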

Page 16: Mining Frequent Closed Cubes in 3D Datasets

RSM

Slices by Height Dimension

[Figure: slices h3, h2, h1]


Page 19: Mining Frequent Closed Cubes in 3D Datasets

CubeMiner Principle

[Figure: labels α, β, γ]

Page 20: Mining Frequent Closed Cubes in 3D Datasets

CubeMiner Principle

[Figure: labels γ, β, α and α, β, γ]

Page 21: Mining Frequent Closed Cubes in 3D Datasets

CubeMiner: Cutters

[Figure: slice h1 and the cutters from h1]
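The contents of slice h1 did not survive extraction, so the sketch below works on a hypothetical slice. It assumes that a cutter records, for one row of one height slice, the columns that hold 0. That reading is consistent with the cutters "h1, r1, c4" and "h1, r2, c4c5" that appear in the splitting-tree slides, but it is an assumption here. The function name is hypothetical.

```python
def cutters_from_slice(height_label, slice_, row_labels, col_labels):
    """One cutter per row that has at least one 0: (height, row, zero columns)."""
    cutters = []
    for row_label, row in zip(row_labels, slice_):
        zero_cols = [col for col, cell in zip(col_labels, row) if cell == 0]
        if zero_cols:
            cutters.append((height_label, row_label, zero_cols))
    return cutters


# Hypothetical slice h1, chosen only so that the first two cutters match the deck.
h1 = [
    [1, 1, 1, 0, 1],  # r1: zero at c4
    [1, 1, 1, 0, 0],  # r2: zeros at c4 and c5
    [1, 1, 1, 1, 1],  # r3: no zero, so no cutter
    [1, 1, 1, 1, 1],  # r4: no zero, so no cutter
]
cutters = cutters_from_slice(
    "h1", h1, ["r1", "r2", "r3", "r4"], ["c1", "c2", "c3", "c4", "c5"]
)
print(cutters)
```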

Page 22: Mining Frequent Closed Cubes in 3D Datasets

Mining FCC: CubeMiner Splitting Tree

Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
Cutter: h1, r1, c4

Page 23: Mining Frequent Closed Cubes in 3D Datasets

Mining FCC: CubeMiner Splitting Tree

Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
Cutter: h1, r1, c4    Cutter Checking: A.

Cutter Checking: check if the Cutter is applicable (A.)
    Subset of the node: A.
    Otherwise: N.A.

Page 24: Mining Frequent Closed Cubes in 3D Datasets

Mining FCC: CubeMiner Splitting Tree

Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
Cutter: h1, r1, c4
Children: (h2h3, r1~r4, c1~c5)

Left Tree: remove Cutter's left atom h1 from parent node

Page 25: Mining Frequent Closed Cubes in 3D Datasets

Mining FCC: CubeMiner Splitting Tree

Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
Cutter: h1, r1, c4
Children: (h2h3, r1~r4, c1~c5)    (h1~h3, r2~r4, c1~c5)

Middle Tree: remove Cutter's middle atom r1 from parent node

Page 26: Mining Frequent Closed Cubes in 3D Datasets

Mining FCC: CubeMiner Splitting Tree

Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
Cutter: h1, r1, c4
Children: (h2h3, r1~r4, c1~c5)    (h1~h3, r2~r4, c1~c5)    (h1~h3, r1~r4, c1c2c3c5)

Right Tree: remove Cutter's right atom c4 from parent node

Page 27: Mining Frequent Closed Cubes in 3D Datasets

Mining FCC: CubeMiner Splitting Tree

Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
Cutter: h1, r1, c4
Children: (h2h3, r1~r4, c1~c5)    (h1~h3, r2~r4, c1~c5)    (h1~h3, r1~r4, c1c2c3c5)

Next Cutter: checking
h1, r2, c4c5 against each child: N.A.    A.    A.
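The split and the cutter check from these slides can be replayed with plain sets. In this sketch a node and a cutter are both triples of frozensets. The applicability test is written as a non-empty overlap in every dimension, which is an assumption: it was chosen because it reproduces the N.A., A., A. outcome shown for the second cutter, and the deck itself only says "subset of the node". The helper names are hypothetical.

```python
def applicable(node, cutter):
    """Assumed test: the cutter overlaps the node in all three dimensions."""
    return all(node_dim & cut_dim for node_dim, cut_dim in zip(node, cutter))


def split(node, cutter):
    """Left, middle and right children: each removes the cutter's atom from one dimension."""
    heights, rows, cols = node
    cut_h, cut_r, cut_c = cutter
    return (
        (heights - cut_h, rows, cols),  # left tree
        (heights, rows - cut_r, cols),  # middle tree
        (heights, rows, cols - cut_c),  # right tree
    )


def fs(*names):
    """Shorthand for building a frozenset of labels."""
    return frozenset(names)


root = (fs("h1", "h2", "h3"), fs("r1", "r2", "r3", "r4"), fs("c1", "c2", "c3", "c4", "c5"))
first_cutter = (fs("h1"), fs("r1"), fs("c4"))
second_cutter = (fs("h1"), fs("r2"), fs("c4", "c5"))

children = split(root, first_cutter)
checks = [applicable(child, second_cutter) for child in children]
print(checks)

# Splitting the middle child with the second cutter gives the next level.
grandchildren = split(children[1], second_cutter)
for node in grandchildren:
    print([sorted(dim) for dim in node])
```

The check prints `[False, True, True]`: the left child no longer contains h1, so the second cutter does not apply to it.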

Page 28: Mining Frequent Closed Cubes in 3D Datasets

Mining FCC: CubeMiner Splitting Tree

Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
Cutter: h1, r1, c4
Children: (h2h3, r1~r4, c1~c5)    (h1~h3, r2~r4, c1~c5)    (h1~h3, r1~r4, c1c2c3c5)
Next Cutter: h1, r2, c4c5    h1, r2, c4c5
Next level: (h2h3, r2~r4, c1~c5)    (h1~h3, r3r4, c1~c5)    (h1~h3, r2~r4, c1~c3)

Page 29: Mining Frequent Closed Cubes in 3D Datasets

Mining FCC: CubeMiner Splitting Tree

Subset Cube

Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
Cutter: h1, r1, c4
Children: (h2h3, r1~r4, c1~c5)    (h1~h3, r2~r4, c1~c5)    (h1~h3, r1~r4, c1c2c3c5)
Next Cutter: h1, r2, c4c5    h1, r2, c4c5
Next level: (h2h3, r2~r4, c1~c5)    (h1~h3, r3r4, c1~c5)    (h1~h3, r2~r4, c1~c3)


Page 31: Mining Frequent Closed Cubes in 3D Datasets

Mining FCC: CubeMiner Splitting Tree

Left Track Checking

Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
Cutter: h1, r1, c4
Children: (h2h3, r1~r4, c1~c5)    (h1~h3, r2~r4, c1~c5)    (h1~h3, r1~r4, c1c2c3c5)
Next Cutter: h1, r2, c4c5    h1, r2, c4c5
Next level: (h2h3, r2~r4, c1~c5)    (h1~h3, r3r4, c1~c5)    (h1~h3, r2~r4, c1~c3)

Page 32: Mining Frequent Closed Cubes in 3D Datasets

Parallelism

RSM
    Task: mining of each Representative Slice
CubeMiner
    Task: mining of each branch
Processor
    Initial: keep a copy of the whole dataset
    Independent and concurrent, with little communication cost
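The task decomposition for RSM can be illustrated with Python's standard `concurrent.futures` module. This is only a sketch of the idea of independent tasks, not the parallel system measured in the talk. It assumes one task per combination of height slices, a cell-wise AND as the representative slice, and a hypothetical dataset `DATA` that every worker can read in full. Threads are used to keep the example short; a process pool would be the natural choice for CPU-bound mining.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical 3D binary dataset indexed as DATA[h][r][c]; every worker sees all of it.
DATA = [
    [[1, 1, 0], [1, 1, 0]],
    [[1, 1, 1], [1, 1, 0]],
]


def mine_representative_slice(heights):
    """One independent task: build the slice for `heights` (cell-wise AND, an assumption)."""
    n_rows, n_cols = len(DATA[0]), len(DATA[0][0])
    rs = [[int(all(DATA[h][r][c] for h in heights)) for c in range(n_cols)]
          for r in range(n_rows)]
    return heights, rs  # a real task would go on to mine 2D FCPs from rs


# One task per non-empty combination of height slices.
tasks = [combo for size in (1, 2) for combo in combinations(range(len(DATA)), size)]

# Tasks share no intermediate state, so they can run concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = dict(pool.map(mine_representative_slice, tasks))

for heights, rs in results.items():
    print(heights, rs)
```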

Page 33: Mining Frequent Closed Cubes in 3D Datasets

Mining FCC: Experiments

Real yeast cell-cycle regulated genes
    Elutriation Experiments: 14*9*7161
    CDC15 Experiments: 19*9*7761
Synthetic Data: IBM data generator
    Synthetic 1: H*R*C = (8~20)*20*1000
    Synthetic 2: H*R*C = 100*100*10000

Page 34: Mining Frequent Closed Cubes in 3D Datasets

Experiments: Optimize CubeMiner

Elutriation (14*9*7161)

Optimal: sort slices in decreasing order of their number of zeros
Prune off infrequent cubes early

Page 35: Mining Frequent Closed Cubes in 3D Datasets

Experiments: Optimize RSM

Elutriation (14*9*7161)

Optimal: enumerate slices by the smallest dimension
Slice enumeration takes a relatively long processing time

Page 36: Mining Frequent Closed Cubes in 3D Datasets

Experiments: RSM vs. CubeMiner

Synthetic Data (vary size of height dimension)

As the smallest dimension grows, CubeMiner outperforms RSM

Page 37: Mining Frequent Closed Cubes in 3D Datasets

Experiments: Parallelism

CDC15 (Vary Number of Processors)

As the degree of parallelism increases, the response time decreases.

Optimal number of processors

Page 38: Mining Frequent Closed Cubes in 3D Datasets

Conclusion

Notion of Frequent Closed Cube

RSM: efficient when one of the dimensions is small

CubeMiner: superior for large datasets

Parallel RSM and CubeMiner

Page 39: Mining Frequent Closed Cubes in 3D Datasets

Thank You!