Fast discovery of sequential patterns in large databases using effective time-indexing
Information Sciences (2008) 4228-4245
Ming-Yen Lin, Suh-Yin Lee and Sheng-Shun Wang
National Chiao Tung University, Taiwan
Advisor: Prof. Huang, Jen-Peng
Student: TU, JING-GUO

Time Indexing


Page 2: Time Indexing

Outline

Introduction
Related work
Definition
An example
Performance analysis and experimental evaluation
Conclusions

Page 3: Time Indexing

Introduction

The time constraints between elements of a sequential pattern are not specified, so some uninteresting patterns may appear.

For example, without specifying the maximum time gap, one may find a pattern < (b, d, e) (a, f) >, which means an itemset containing a and f will occur after the occurrence of an itemset containing b, d, and e.

However, the pattern could be insignificant if the time interval between the two itemsets is too long, such as over months.

[Figure: timeline of the itemsets pc, printer, and (ink, paper), with the time between them unknown]

Page 4: Time Indexing

Introduction

[Figure: timeline of the itemsets pc, printer, and (ink, paper), with time ticks 1, 2, 3, 4, 5, … 100]

Page 5: Time Indexing

Related work

Sequential pattern mining: GSP (Apriori-based), DELISP

Page 6: Time Indexing

Definition

Definition 1 (frequent item)

An item x is called a frequent item in a sequence database DB if the support of the 1-sequence <(x)> is greater than or equal to minsup.

Definition 2 (type-1, type-2, prefix, stem)

Type      itemset
Type-1    < (a) (b) >
Type-2    < (a, b) >

Page 7: Time Indexing

Definition

Definition 2 (type-1, type-2, prefix, stem), continued

[Figure: the Type-1 pattern < (a) (b) > and the Type-2 pattern < (a, b) >, annotated with their prefix and stem]
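The two pattern types in Definition 2 can be illustrated with a minimal Python sketch. It assumes, beyond what the slide states, that a pattern is formed from a prefix pattern plus a single stem item: a type-1 pattern appends the stem as a new element, and a type-2 pattern merges the stem into the last element. The function names and the tuple-of-tuples representation are hypothetical choices made for this illustration.

```python
def extend_type1(prefix, stem):
    """Append the stem as a new single-item element: <(a)> + b -> <(a)(b)>."""
    return prefix + ((stem,),)


def extend_type2(prefix, stem):
    """Merge the stem into the last element: <(a)> + b -> <(a, b)>."""
    last = tuple(sorted(set(prefix[-1]) | {stem}))
    return prefix[:-1] + (last,)


prefix = (("a",),)                 # the 1-sequence <(a)>
print(extend_type1(prefix, "b"))   # (('a',), ('b',))
print(extend_type2(prefix, "b"))   # (('a', 'b'),)
```

Under this reading, both patterns on the slide share the prefix <(a)> and the stem b, and differ only in where the stem is placed.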

Page 8: Time Indexing

Definition

Definition 3 (it, lst, let)

Transaction   itemset                              TIdx
T1            < 1(a) 2(b) 9(d) 15(c) >             [ 1:1:1 ]
T2            < 1(a) 2(b) 9(d) 15(c) 21(a) >       [ 1:1:1 , 21:21:21 ]

[ x : y : z ]
x = initial-time (it)
y = last start-time (lst)
z = last end-time (let)
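As a small sketch of how such a time index might be built for a single item, the Python below lists one (it, lst, let) triple per occurrence. It assumes the indexed pattern on the slide is the 1-sequence <(a)>, which is consistent with the TIdx values shown but is not stated there; the function name and the list-of-pairs sequence representation are hypothetical.

```python
def item_time_index(sequence, item):
    """Return [(it, lst, let), ...] for the 1-sequence <(item)>.

    For a single item, an occurrence at time t starts and ends at t,
    so its triple is (t, t, t).
    """
    return [(t, t, t) for t, itemset in sequence if item in itemset]


# The two data sequences from the slide, as (time, itemset) pairs.
T1 = [(1, {"a"}), (2, {"b"}), (9, {"d"}), (15, {"c"})]
T2 = T1 + [(21, {"a"})]

print(item_time_index(T1, "a"))  # [(1, 1, 1)]
print(item_time_index(T2, "a"))  # [(1, 1, 1), (21, 21, 21)]
```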

Page 9: Time Indexing

Definition

Definition 3 (it, lst, let)

itemset                              pattern     TIdx
< 1(a) 2(b) 9(d) 15(c) >             (a) (b)     [ 1:2:2 ]
< 1(a) 2(b) 9(d) 25(c) 28(a) >       (a) (c)     [ 1:25:25 ]

[ x : y : z ]
x = initial-time (it)
y = last start-time (lst)
z = last end-time (let)

Page 10: Time Indexing

Definition

Time-constraints

swin = sliding time-window

mingap = minimum time gap

maxgap = maximum time gap

duration = constraint time window

Page 11: Time Indexing

Definition
Lemma 1 (type-1)

let_i + mingap ≤ VTP ≤ lst_i + maxgap

VTP = valid time period
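A minimal Python sketch of the Lemma 1 bounds follows. The function name and argument order are hypothetical; the sample values use the time index [10:17:17] with mingap = 3 and maxgap = 15, the settings that appear in the deck's example.

```python
def vtp_type1(lst, let, mingap, maxgap):
    """Valid time period (low, high) for a type-1 stem, per Lemma 1:
    let + mingap <= VTP <= lst + maxgap.
    """
    return let + mingap, lst + maxgap


low, high = vtp_type1(lst=17, let=17, mingap=3, maxgap=15)
print(low, high)  # 20 32
```

If the lower bound exceeds the upper bound, the period is empty and no type-1 stem can be found for that time index.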

Page 12: Time Indexing

Definition
Lemma 1 (type-1)

let_i + mingap ≤ VTP ≤ lst_i + maxgap

Transaction: C2
itemset: < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
TIdx: [ 10:17:17 ]

Ex: < (b) (e) >

[Figure: timeline of C2 with (a,c), b at 10, e at 17, a at 18, and (c,d) at 24]

Page 13: Time Indexing

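Definition
Lemma 1 (type-1)

let_i + mingap ≤ VTP ≤ lst_i + maxgap

Transaction: C2
itemset: < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
TIdx: [ 10:17:17 ]

Ex: < (b) (e) >

duration = 25

[Figure: timeline of C2 with (a,c), b at 10, e at 17, a at 18, and (c,d) at 24; the duration bound 35 (it + duration = 10 + 25) is marked]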

Page 14: Time Indexing

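Definition
Lemma 1 (type-1)

let_i + mingap ≤ VTP ≤ lst_i + maxgap

Transaction: C2
itemset: < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
TIdx: [ 10:17:17 ]

Ex: < (b) (e) >

maxgap = 15

[Figure: timeline of C2 with (a,c), b at 10, e at 17, a at 18, and (c,d) at 24; the maxgap bound 32 (lst + maxgap = 17 + 15) and the duration bound 35 are marked]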

Page 15: Time Indexing

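Definition
Lemma 1 (type-1)

let_i + mingap ≤ VTP ≤ lst_i + maxgap

Transaction: C2
itemset: < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
TIdx: [ 10:17:17 ]

Ex: < (b) (e) >

mingap = 3

[Figure: timeline of C2 with (a,c), b at 10, e at 17, and (c,d) at 24; the mingap bound 20 (let + mingap = 17 + 3), the maxgap bound 32, and the duration bound 35 are marked]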

Page 16: Time Indexing

Definition
Lemma 1 (type-1)

let_i + mingap ≤ VTP ≤ lst_i + maxgap

Transaction: C2
itemset: < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
TIdx: [ 10:17:17 ]

Ex: < (b) (e) >

[Figure: timeline of C2 with (a,c), b at 10, and e at 17; the VTP spans 20 to 32, and the duration bound 35 is marked]

Page 17: Time Indexing

Definition
Lemma 1 (type-1)

let_i + mingap ≤ VTP ≤ lst_i + maxgap

[Figure: timeline with (a,c), b at 10, and e at 17; the VTP spans 20 to 32, and the duration bound 35 is marked]

Page 18: Time Indexing

Definition
Lemma 2 (type-2)

let_i - swin ≤ VTP ≤ min{ lst_i + swin , it_i + duration }
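The Lemma 2 bounds can be sketched in Python the same way. The function name and argument order are hypothetical; the sample values use single-item time indexes with swin = 2 and duration = 25, the settings that appear in the deck's example.

```python
def vtp_type2(it, lst, let, swin, duration):
    """Valid time period (low, high) for a type-2 stem, per Lemma 2:
    let - swin <= VTP <= min(lst + swin, it + duration).
    """
    return let - swin, min(lst + swin, it + duration)


print(vtp_type2(it=5, lst=5, let=5, swin=2, duration=25))  # (3, 7)
print(vtp_type2(it=6, lst=6, let=6, swin=2, duration=25))  # (4, 8)
```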

Page 19: Time Indexing

Definition
Lemma 2 (type-2)

let_i - swin ≤ VTP ≤ min{ lst_i + swin , it_i + duration }

Transaction: C2
itemset: < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
TIdx: [ 10:17:17 ]

Ex: < (b) (e) >

[Figure: timeline of C2 with (a,c), b at 10, e at 17, and (c,d) at 24; the duration bound 35 is marked]

Page 20: Time Indexing

Definition
Lemma 2 (type-2)

let_i - swin ≤ VTP ≤ min{ lst_i + swin , it_i + duration }

Transaction: C2
itemset: < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
TIdx: [ 10:17:17 ]

Ex: < (b) (e) >

[Figure: timeline of C2 with (a,c), b at 10, and e at 17]

Page 21: Time Indexing

An example

Tran. ID   sequences
C1         < 3(c) 5(a,f) 18(b) 31(a) 45(f) >
C2         < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
C3         < 1(b) 20(b,g) 27(e) 34(d,g) 35(g) >
C4         < 5(a) 10(d) 21(c,d) 26(e) >

Item   Support
a      3
b      3
c      3
d      3
e      3
f      1
g      1

min_Sup = 2
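The support table can be reproduced with a short Python sketch. It counts, for each item, the number of data sequences that contain it at least once, then keeps the items whose support reaches min_Sup. The dictionary layout and function names are hypothetical; the data is the four-sequence database from the slide.

```python
from collections import Counter

# Each data sequence is a list of (time, itemset) pairs.
DB = {
    "C1": [(3, {"c"}), (5, {"a", "f"}), (18, {"b"}), (31, {"a"}), (45, {"f"})],
    "C2": [(6, {"a", "c"}), (10, {"b"}), (17, {"e"}), (18, {"a"}), (24, {"c", "d"})],
    "C3": [(1, {"b"}), (20, {"b", "g"}), (27, {"e"}), (34, {"d", "g"}), (35, {"g"})],
    "C4": [(5, {"a"}), (10, {"d"}), (21, {"c", "d"}), (26, {"e"})],
}


def item_supports(db):
    """Count, per item, how many data sequences contain it."""
    counts = Counter()
    for seq in db.values():
        items_in_seq = set().union(*(itemset for _, itemset in seq))
        counts.update(items_in_seq)  # each sequence counts once per item
    return counts


def frequent_items(db, minsup):
    """Items whose support is greater than or equal to minsup."""
    return sorted(x for x, n in item_supports(db).items() if n >= minsup)


print(sorted(item_supports(DB).items()))
print(frequent_items(DB, minsup=2))  # ['a', 'b', 'c', 'd', 'e']
```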

Page 22: Time Indexing

An example

Tran. ID   sequences                                  <(a)>-TIdx
C1         < 3(c) 5(a,f) 18(b) 31(a) 45(f) >          [ 5:5:5 , 31:31:31 ]
C2         < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >       [ 6:6:6 , 18:18:18 ]
C3         < 1(b) 20(b,g) 27(e) 34(d,g) 35(g) >
C4         < 5(a) 10(d) 21(c,d) 26(e) >               [ 5:5:5 ]

min_Sup = 2

Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

Page 23: Time Indexing

An example

Tran. ID: C1
sequences: < 3(c) 5(a,f) 18(b) 31(a) 45(f) >
item: a
TIdx: [ 5:5:5 , 31:31:31 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C1 with c at 3, (a,f) at 5, b at 18, a at 31, and f at 45]

Page 24: Time Indexing

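An example

Tran. ID: C1
sequences: < 3(c) 5(a,f) 18(b) 31(a) 45(f) >
item: a
TIdx: [ 5:5:5 , 31:31:31 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C1 with c at 3, (a,f) at 5, b at 18, a at 31, and f at 45; the duration bound 30 (it + duration = 5 + 25) is marked]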

Page 25: Time Indexing

An example

Tran. ID: C1
sequences: < 3(c) 5(a,f) 18(b) 31(a) 45(f) >
item: a
TIdx: [ 5:5:5 , 31:31:31 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C1 with c at 3, (a,f) at 5, b at 18, a at 31, and f at 45; the duration bound 30 is marked]

1. let_i + mingap ≤ VTP ≤ lst_i + maxgap

For [ 5:5:5 ]: 8 ≤ VTP ≤ 20

Page 26: Time Indexing

An example

Tran. ID: C1
sequences: < 3(c) 5(a,f) 18(b) 31(a) 45(f) >
item: a
TIdx: [ 5:5:5 , 31:31:31 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C1 with c at 3, (a,f) at 5, b at 18, a at 31, and f at 45; the duration bound 30 is marked]

1. let_i + mingap ≤ VTP ≤ lst_i + maxgap

For [ 5:5:5 ]: 8 ≤ VTP ≤ 20

< (a) (b) > 1

Page 27: Time Indexing

An example

Tran. ID: C1
sequences: < 3(c) 5(a,f) 18(b) 31(a) 45(f) >
item: a
TIdx: [ 5:5:5 , 31:31:31 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C1 with c at 3, (a,f) at 5, b at 18, a at 31, and f at 45]

2. let_i - swin ≤ VTP ≤ min{ lst_i + swin , it_i + duration }

For [ 5:5:5 ]: 3 ≤ VTP ≤ 7

Page 28: Time Indexing

An example

Tran. ID: C1
sequences: < 3(c) 5(a,f) 18(b) 31(a) 45(f) >
item: a
TIdx: [ 5:5:5 , 31:31:31 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C1 with c at 3, (a,f) at 5, b at 18, a at 31, and f at 45]

2. let_i - swin ≤ VTP ≤ min{ lst_i + swin , it_i + duration }

For [ 5:5:5 ]: 3 ≤ VTP ≤ 7

< (a, c) > 1

< (a, f) > 1

Page 29: Time Indexing

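An example

Tran. ID: C1
sequences: < 3(c) 5(a,f) 18(b) 31(a) 45(f) >
item: a
TIdx: [ 5:5:5 , 31:31:31 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C1 with c at 3, (a,f) at 5, b at 18, a at 31, and f at 45; the duration bound 56 (it + duration = 31 + 25) for the second time index is marked]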

Page 30: Time Indexing

An example

Tran. ID: C1
sequences: < 3(c) 5(a,f) 18(b) 31(a) 45(f) >
item: a
TIdx: [ 5:5:5 , 31:31:31 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C1 with c at 3, (a,f) at 5, b at 18, a at 31, and f at 45; the duration bound 56 is marked]

1. let_i + mingap ≤ VTP ≤ lst_i + maxgap

For [ 31:31:31 ]: 34 ≤ VTP ≤ 46

Page 31: Time Indexing

An example

Tran. ID: C1
sequences: < 3(c) 5(a,f) 18(b) 31(a) 45(f) >
item: a
TIdx: [ 5:5:5 , 31:31:31 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C1 with c at 3, (a,f) at 5, b at 18, a at 31, and f at 45; the duration bound 56 is marked]

1. let_i + mingap ≤ VTP ≤ lst_i + maxgap

For [ 31:31:31 ]: 34 ≤ VTP ≤ 46

< (a) (f) > 1

Page 32: Time Indexing

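An example

Tran. ID: C1
sequences: < 3(c) 5(a,f) 18(b) 31(a) 45(f) >
item: a
TIdx: [ 5:5:5 , 31:31:31 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C1 with c at 3, (a,f) at 5, b at 18, a at 31, and f at 45; the duration bound 56 is marked]

Patterns found in C1:
< (a) (b) > 1
< (a) (f) > 1
< (a, c) > 1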

Page 33: Time Indexing

An example

Tran. ID: C2
sequences: < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
item: a
TIdx: [ 6:6:6 , 18:18:18 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C2 with (a,c) at 6, b at 10, e at 17, a at 18, and (c,d) at 24]

1. let_i + mingap ≤ VTP ≤ lst_i + maxgap

For [ 6:6:6 ]: 9 ≤ VTP ≤ 21

Page 34: Time Indexing

An example

Tran. ID: C2
sequences: < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
item: a
TIdx: [ 6:6:6 , 18:18:18 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C2 with (a,c) at 6, b at 10, e at 17, a at 18, and (c,d) at 24]

1. let_i + mingap ≤ VTP ≤ lst_i + maxgap

For [ 6:6:6 ]: 9 ≤ VTP ≤ 21

< (a) (b) > 1

< (a) (e) > 1

< (a) (a) > 1

Page 35: Time Indexing

An example

Tran. ID: C2
sequences: < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
item: a
TIdx: [ 6:6:6 , 18:18:18 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C2 with (a,c) at 6, b at 10, e at 17, a at 18, and (c,d) at 24]

2. let_i - swin ≤ VTP ≤ min{ lst_i + swin , it_i + duration }

For [ 6:6:6 ]: 4 ≤ VTP ≤ 8

< (a, c) > 1

Page 36: Time Indexing

An example

Tran. ID: C2
sequences: < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
item: a
TIdx: [ 6:6:6 , 18:18:18 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C2 with (a,c) at 6, b at 10, e at 17, a at 18, and (c,d) at 24]

1. let_i + mingap ≤ VTP ≤ lst_i + maxgap

For [ 18:18:18 ]: 21 ≤ VTP ≤ 33

< (a) (c) > 1

< (a) (d) > 1

Page 37: Time Indexing

An example

Tran. ID: C2
sequences: < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
item: a
TIdx: [ 6:6:6 , 18:18:18 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C2 with (a,c) at 6, b at 10, e at 17, a at 18, and (c,d) at 24]

Patterns found in C2:
< (a) (a) > 1
< (a) (b) > 1
< (a) (c) > 1
< (a) (d) > 1
< (a) (e) > 1
< (a, c) > 1

Page 38: Time Indexing

An example

Tran. ID: C4
sequences: < 5(a) 10(d) 21(c,d) 26(e) >
item: a
TIdx: [ 5:5:5 ]
Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

[Figure: timeline of C4 with a at 5, d at 10, (c,d) at 21, and e at 26]

1. let_i + mingap ≤ VTP ≤ lst_i + maxgap

For [ 5:5:5 ]: 8 ≤ VTP ≤ 20

< (a) (d) > 1

Page 39: Time Indexing

An example

Tran. ID   sequences                                  <(a)>-TIdx
C1         < 3(c) 5(a,f) 18(b) 31(a) 45(f) >          [ 5:5:5 , 31:31:31 ]
C2         < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >       [ 6:6:6 , 18:18:18 ]
C3         < 1(b) 20(b,g) 27(e) 34(d,g) 35(g) >
C4         < 5(a) 10(d) 21(c,d) 26(e) >               [ 5:5:5 ]

min_Sup = 2

Support of the patterns extended from < (a) >:
< (a) (a) > 1
< (a) (b) > 2
< (a) (c) > 1
< (a) (d) > 2
< (a) (e) > 1
< (a, c) > 2
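The type-1 counts for < (a) > can be reproduced with a rough Python sketch that scans the valid time period of every time index of item a. This is only an illustration under stated assumptions, not the authors' METISP implementation: the upper bound is additionally capped by it + duration (as the timeline figures suggest), stems are restricted to the frequent items a to e, and all names are hypothetical. Type-2 extensions such as < (a, c) > are not covered.

```python
from collections import Counter

DB = {
    "C1": [(3, {"c"}), (5, {"a", "f"}), (18, {"b"}), (31, {"a"}), (45, {"f"})],
    "C2": [(6, {"a", "c"}), (10, {"b"}), (17, {"e"}), (18, {"a"}), (24, {"c", "d"})],
    "C3": [(1, {"b"}), (20, {"b", "g"}), (27, {"e"}), (34, {"d", "g"}), (35, {"g"})],
    "C4": [(5, {"a"}), (10, {"d"}), (21, {"c", "d"}), (26, {"e"})],
}
MINGAP, MAXGAP, DURATION = 3, 15, 25
FREQUENT = {"a", "b", "c", "d", "e"}  # items with support >= min_Sup = 2


def type1_stems(seq, tidx):
    """Frequent items that fall inside the type-1 VTP of any time index."""
    stems = set()
    for it, lst, let in tidx:
        low = let + MINGAP
        # Assumption: cap the Lemma 1 upper bound by the duration bound.
        high = min(lst + MAXGAP, it + DURATION)
        for t, itemset in seq:
            if low <= t <= high:
                stems |= itemset & FREQUENT
    return stems


def count_type1_extensions(db, item):
    """Support of <(item)(x)> for every frequent stem x."""
    counts = Counter()
    for seq in db.values():
        tidx = [(t, t, t) for t, itemset in seq if item in itemset]
        counts.update(type1_stems(seq, tidx))  # one count per sequence
    return counts


counts = count_type1_extensions(DB, "a")
print(sorted(counts.items()))
# [('a', 1), ('b', 2), ('c', 1), ('d', 2), ('e', 1)]
```

With min_Sup = 2, only < (a) (b) > and < (a) (d) > survive among the type-1 extensions, which agrees with the counts listed on the slide.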

Page 40: Time Indexing

An example

Tran. ID   sequences                                  <(a, c)>-TIdx
C1         < 3(c) 5(a,f) 18(b) 31(a) 45(f) >          [ 3:3:5 ]
C2         < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >       [ 6:6:6 ]
C3         < 1(b) 20(b,g) 27(e) 34(d,g) 35(g) >
C4         < 5(a) 10(d) 21(c,d) 26(e) >

min_Sup = 2

Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

< (a, c) (b) > 2

Page 41: Time Indexing

An example

Tran. ID   sequences                                  <(a, c)(b)>-TIdx
C1         < 3(c) 5(a,f) 18(b) 31(a) 45(f) >          [ 3:3:18 ]
C2         < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >       [ 6:6:10 ]
C3         < 1(b) 20(b,g) 27(e) 34(d,g) 35(g) >
C4         < 5(a) 10(d) 21(c,d) 26(e) >

min_Sup = 2

Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

No more patterns can be formed.

Page 42: Time Indexing

An example

Tran. ID   sequences
C1         < 3(c) 5(a,f) 18(b) 31(a) 45(f) >
C2         < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
C3         < 1(b) 20(b,g) 27(e) 34(d,g) 35(g) >
C4         < 5(a) 10(d) 21(c,d) 26(e) >

Min_Sup = 2

Frequent itemset (a): a, (a)(b), (a)(d), (a, c), (a, c)(b)
Frequent itemset (b): b, (b)(a), (b)(d), (b)(e), (b)(e)(d)
Frequent itemset (c): c, (c)(b), (c)(e), (c)(b)(a)
Frequent itemset (d): d
Frequent itemset (e): e, (e)(d)

Page 43: Time Indexing

Dealing with extra-large databases

Page 44: Time Indexing

Performance analysis and experimental evaluation

Average number of transactions per data-sequence = 10

Average number of items per transaction = 2.5

Average size of potentially sequential patterns = 4

Average size of potentially frequent itemsets = 1.25

Number of data sequences in database = 100k

Page 48: Time Indexing

Conclusions

This paper has presented METISP, a time-indexing algorithm for mining sequential patterns with various time constraints, including minimum, maximum, and exact gaps, sliding time-windows, and durations. METISP effectively shrinks the search space of potential patterns.