Fast discovery of sequential patterns in large databases using effective time-indexing
Information Sciences (2008) 4228-4245
Ming-Yen Lin, Suh-Yin Lee and Sheng-Shun Wang
National Chiao Tung University, Taiwan
Advisor: Prof. Huang, Jen-Peng
Student: TU, JING-GUO
Outline
  Introduction
  Related work
  Definition
  An example
  Performance analysis and experimental evaluation
  Conclusions
Introduction
When the time constraints between the elements of a sequential pattern are not specified, some uninteresting patterns may appear.
For example, without specifying the maximum time gap, one may find a pattern <(b, d, e)(a, f)>, which means that an itemset containing a and f will occur after the occurrence of an itemset containing b, d, and e.
However, the pattern could be insignificant if the time interval between the two itemsets is too long, such as over months.
[Figure: timeline of purchases (pc, printer, ink and paper) with an unknown time gap between them]
Related work
Sequential pattern mining
  GSP (apriori)
  DELISP
Definition
Definition 1 (frequent item)
An item x is called a frequent item in a sequence database DB if the support of the 1-sequence <(x)> is greater than or equal to minsup.
Definition 2 (type-1, type-2, prefix, stem)
Type      Itemset
Type-1    <(a)(b)>
Type-2    <(a, b)>
In both forms, (a) is the prefix and b is the stem.
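The two forms in Definition 2 can be illustrated with a minimal sketch. It assumes one reading of the slide: a type-1 pattern appends the stem as a new itemset after the prefix, and a type-2 pattern adds the stem to the last itemset of the prefix. The tuple representation and the names `extend_type1`, `extend_type2`, and `show` are hypothetical and not taken from the paper.

```python
# A pattern is a tuple of itemsets; each itemset is a tuple of items.
# This representation is an assumption made for illustration only.

def extend_type1(prefix, stem):
    """Append the stem as a new itemset: <(a)> + b -> <(a)(b)>."""
    return prefix + ((stem,),)

def extend_type2(prefix, stem):
    """Add the stem to the last itemset: <(a)> + b -> <(a,b)>."""
    last = tuple(sorted(set(prefix[-1]) | {stem}))
    return prefix[:-1] + (last,)

def show(pattern):
    """Render a pattern in the slide's <(...)(...)> notation."""
    return "<" + "".join("(" + ",".join(e) + ")" for e in pattern) + ">"

prefix_a = (("a",),)
print(show(extend_type1(prefix_a, "b")))  # <(a)(b)>
print(show(extend_type2(prefix_a, "b")))  # <(a,b)>
```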
Definition
Definition 3 (it, lst, let)
A time index (TIdx) entry has the form [x : y : z], where x is the initial-time (it), y is the last start-time (lst), and z is the last end-time (let).
Transaction   Sequence                         TIdx
T1            <1(a) 2(b) 9(d) 15(c)>           [1:1:1]
T2            <1(a) 2(b) 9(d) 15(c) 21(a)>     [1:1:1, 21:21:21]
Definition
Definition 3 (it, lst, let), continued
Sequence                         Pattern   TIdx
<1(a) 2(b) 9(d) 15(c)>           (a)(b)    [1:2:2]
<1(a) 2(b) 9(d) 25(c) 28(a)>     (a)(c)    [1:25:25]
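The T1/T2 entries of Definition 3 can be reproduced with a small sketch. It assumes those entries index the occurrences of item a (the slide does not name the pattern) and that a sequence is stored as a list of (time, itemset) pairs. The helper name `single_item_tidx` is hypothetical.

```python
def single_item_tidx(sequence, item):
    """Time index of the 1-pattern <(item)>.

    Each occurrence at time t yields the entry (it, lst, let) = (t, t, t),
    because a single itemset starts and ends at the same time.
    """
    return [(t, t, t) for t, itemset in sequence if item in itemset]

# Sequences T1 and T2 from the slide, as (time, itemset) pairs.
T1 = [(1, {"a"}), (2, {"b"}), (9, {"d"}), (15, {"c"})]
T2 = T1 + [(21, {"a"})]

print(single_item_tidx(T1, "a"))  # [(1, 1, 1)]
print(single_item_tidx(T2, "a"))  # [(1, 1, 1), (21, 21, 21)]
```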
Definition
Time constraints
swin = sliding time-window
mingap = minimum time gap
maxgap = maximum time gap
duration = constraint time window
Definition
Lemma 1 (type-1)
let_i + mingap ≤ VTP ≤ lst_i + maxgap
VTP = valid time period
Ex: <(b)(e)>
Transaction   Sequence                               TIdx
C2            <6(a,c) 10(b) 17(e) 18(a) 24(c,d)>     [10:17:17]
With duration = 25, maxgap = 15, and mingap = 3:
it + duration = 10 + 25 = 35
lst + maxgap = 17 + 15 = 32
let + mingap = 17 + 3 = 20
VTP: 20 ≤ VTP ≤ 32
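The Lemma 1 bounds can be checked with a few lines of code. This is a minimal sketch that follows the formula exactly as written on the slide, which has no duration term for type-1 patterns. The names `TimeIndex` and `vtp_type1` are hypothetical.

```python
from typing import NamedTuple

class TimeIndex(NamedTuple):
    """One time-index entry [it : lst : let]."""
    it: int   # initial-time
    lst: int  # last start-time
    let: int  # last end-time

def vtp_type1(idx, mingap, maxgap):
    """Valid time period for type-1 stems, per Lemma 1 as stated on the slide."""
    return (idx.let + mingap, idx.lst + maxgap)

# <(b)(e)> in C2 has the entry [10:17:17].
print(vtp_type1(TimeIndex(10, 17, 17), mingap=3, maxgap=15))  # (20, 32)
```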
Definition
Lemma 2 (type-2)
let_i - swin ≤ VTP ≤ minimum of { lst_i + swin, it_i + duration }
Ex: <(b)(e)>
Transaction   Sequence                               TIdx
C2            <6(a,c) 10(b) 17(e) 18(a) 24(c,d)>     [10:17:17]
it + duration = 10 + 25 = 35
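The Lemma 2 bounds can be sketched in the same way. The values swin = 2 and duration = 25 are borrowed from the worked example later in the deck, since this slide only shows the duration bound. The function name `vtp_type2` is hypothetical.

```python
def vtp_type2(it, lst, let, swin, duration):
    """Valid time period for type-2 stems, per Lemma 2 as stated on the slide."""
    lower = let - swin
    upper = min(lst + swin, it + duration)
    return (lower, upper)

# <(b)(e)> in C2 has the entry [10:17:17]; swin and duration are taken
# from the worked example (an assumption for this slide).
print(vtp_type2(10, 17, 17, swin=2, duration=25))  # (15, 19)
```

Here the sliding-window bound (19) is tighter than the duration bound (35), so the window decides the upper limit.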
An example
min_sup = 2
Tran. ID   Sequence
C1         <3(c) 5(a,f) 18(b) 31(a) 45(f)>
C2         <6(a,c) 10(b) 17(e) 18(a) 24(c,d)>
C3         <1(b) 20(b,g) 27(e) 34(d,g) 35(g)>
C4         <5(a) 10(d) 21(c,d) 26(e)>
Item   Support
a      3
b      3
c      3
d      3
e      3
f      1
g      1
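The item supports in this table can be recomputed from the four sequences. The sketch below assumes each sequence is stored as a list of (time, itemset) pairs and counts an item once per sequence. The names `DB`, `item_support`, and `frequent_items` are hypothetical.

```python
from collections import Counter

# Example database from the slide: sequence ID -> list of (time, itemset).
DB = {
    "C1": [(3, {"c"}), (5, {"a", "f"}), (18, {"b"}), (31, {"a"}), (45, {"f"})],
    "C2": [(6, {"a", "c"}), (10, {"b"}), (17, {"e"}), (18, {"a"}), (24, {"c", "d"})],
    "C3": [(1, {"b"}), (20, {"b", "g"}), (27, {"e"}), (34, {"d", "g"}), (35, {"g"})],
    "C4": [(5, {"a"}), (10, {"d"}), (21, {"c", "d"}), (26, {"e"})],
}

def item_support(db):
    """Number of sequences that contain each item at least once."""
    support = Counter()
    for seq in db.values():
        items_in_seq = set().union(*(itemset for _, itemset in seq))
        support.update(items_in_seq)
    return support

def frequent_items(db, min_sup):
    """Items whose support is greater than or equal to min_sup (Definition 1)."""
    return sorted(x for x, s in item_support(db).items() if s >= min_sup)

print(frequent_items(DB, min_sup=2))  # ['a', 'b', 'c', 'd', 'e']
```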
An example
min_sup = 2
Time constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25
Tran. ID   Sequence                               <(a)>-TIdx
C1         <3(c) 5(a,f) 18(b) 31(a) 45(f)>        [5:5:5, 31:31:31]
C2         <6(a,c) 10(b) 17(e) 18(a) 24(c,d)>     [6:6:6, 18:18:18]
C3         <1(b) 20(b,g) 27(e) 34(d,g) 35(g)>     -
C4         <5(a) 10(d) 21(c,d) 26(e)>             [5:5:5]
An example
Extending <(a)> in C1
Tran. ID   Sequence                               <(a)>-TIdx
C1         <3(c) 5(a,f) 18(b) 31(a) 45(f)>        [5:5:5, 31:31:31]
Time constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

Entry [5:5:5] (it + duration = 5 + 25 = 30):
1. let_i + mingap ≤ VTP ≤ lst_i + maxgap
   8 ≤ VTP ≤ 20
   <(a)(b)> 1
2. let_i - swin ≤ VTP ≤ minimum of { lst_i + swin, it_i + duration }
   3 ≤ VTP ≤ 7
   <(a,c)> 1
   <(a,f)> 1

Entry [31:31:31] (it + duration = 31 + 25 = 56):
1. let_i + mingap ≤ VTP ≤ lst_i + maxgap
   34 ≤ VTP ≤ 46
   <(a)(f)> 1

Patterns found in C1:
<(a)(b)> 1
<(a)(f)> 1
<(a,c)> 1
An example
Extending <(a)> in C2
Tran. ID   Sequence                               <(a)>-TIdx
C2         <6(a,c) 10(b) 17(e) 18(a) 24(c,d)>     [6:6:6, 18:18:18]
Time constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

Entry [6:6:6]:
1. let_i + mingap ≤ VTP ≤ lst_i + maxgap
   9 ≤ VTP ≤ 21
   <(a)(b)> 1
   <(a)(e)> 1
   <(a)(a)> 1
2. let_i - swin ≤ VTP ≤ minimum of { lst_i + swin, it_i + duration }
   4 ≤ VTP ≤ 8
   <(a,c)> 1

Entry [18:18:18]:
1. let_i + mingap ≤ VTP ≤ lst_i + maxgap
   21 ≤ VTP ≤ 33
   <(a)(c)> 1
   <(a)(d)> 1

Patterns found in C2:
<(a)(a)> 1
<(a)(b)> 1
<(a)(c)> 1
<(a)(d)> 1
<(a)(e)> 1
<(a,c)> 1
An example
Extending <(a)> in C4
Tran. ID   Sequence                               <(a)>-TIdx
C4         <5(a) 10(d) 21(c,d) 26(e)>             [5:5:5]
Time constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

Entry [5:5:5]:
1. let_i + mingap ≤ VTP ≤ lst_i + maxgap
   8 ≤ VTP ≤ 20
   <(a)(d)> 1
An example
min_sup = 2
Tran. ID   Sequence                               <(a)>-TIdx
C1         <3(c) 5(a,f) 18(b) 31(a) 45(f)>        [5:5:5, 31:31:31]
C2         <6(a,c) 10(b) 17(e) 18(a) 24(c,d)>     [6:6:6, 18:18:18]
C3         <1(b) 20(b,g) 27(e) 34(d,g) 35(g)>     -
C4         <5(a) 10(d) 21(c,d) 26(e)>             [5:5:5]
Support counts for the extensions of <(a)>:
<(a)(a)> 1
<(a)(b)> 2
<(a)(c)> 1
<(a)(d)> 2
<(a)(e)> 1
<(a,c)> 2
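The support counting for the extensions of <(a)> can be sketched end to end. This is a simplified illustration, not the METISP implementation: it handles only a single-item prefix, applies Lemma 1 and Lemma 2 to every time-index entry, considers only frequent items as stems, and counts each stem once per sequence. All names (`DB`, `FREQUENT`, `stems_in`, `extend_single_item`) are hypothetical.

```python
from collections import Counter

DB = {
    "C1": [(3, {"c"}), (5, {"a", "f"}), (18, {"b"}), (31, {"a"}), (45, {"f"})],
    "C2": [(6, {"a", "c"}), (10, {"b"}), (17, {"e"}), (18, {"a"}), (24, {"c", "d"})],
    "C3": [(1, {"b"}), (20, {"b", "g"}), (27, {"e"}), (34, {"d", "g"}), (35, {"g"})],
    "C4": [(5, {"a"}), (10, {"d"}), (21, {"c", "d"}), (26, {"e"})],
}
SWIN, MINGAP, MAXGAP, DURATION = 2, 3, 15, 25
MIN_SUP = 2
FREQUENT = {"a", "b", "c", "d", "e"}  # frequent items from the support table

def stems_in(seq, lo, hi, exclude=()):
    """Frequent items that occur at some time t with lo <= t <= hi."""
    return {x for t, itemset in seq if lo <= t <= hi
            for x in itemset if x in FREQUENT and x not in exclude}

def extend_single_item(db, item):
    """Support of type-1 and type-2 extensions of the 1-pattern <(item)>."""
    type1, type2 = Counter(), Counter()
    for seq in db.values():
        s1, s2 = set(), set()
        for t, itemset in seq:
            if item not in itemset:
                continue
            it = lst = let = t  # time-index entry [t:t:t]
            s1 |= stems_in(seq, let + MINGAP, lst + MAXGAP)           # Lemma 1
            s2 |= stems_in(seq, let - SWIN,
                           min(lst + SWIN, it + DURATION), {item})    # Lemma 2
        type1.update(s1)  # each stem counts once per sequence
        type2.update(s2)
    return type1, type2

type1, type2 = extend_single_item(DB, "a")
frequent = sorted([f"<(a)({x})>" for x, n in type1.items() if n >= MIN_SUP] +
                  [f"<(a,{x})>" for x, n in type2.items() if n >= MIN_SUP])
print(frequent)  # ['<(a)(b)>', '<(a)(d)>', '<(a,c)>']
```

Under these assumptions the type-1 counts match the slide (a 1, b 2, c 1, d 2, e 1), and the frequent extensions are <(a)(b)>, <(a)(d)>, and <(a,c)>.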
An example
min_sup = 2
Time constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25
Tran. ID   Sequence                               <(a,c)>-TIdx
C1         <3(c) 5(a,f) 18(b) 31(a) 45(f)>        [3:3:5]
C2         <6(a,c) 10(b) 17(e) 18(a) 24(c,d)>     [6:6:6]
C3         <1(b) 20(b,g) 27(e) 34(d,g) 35(g)>     -
C4         <5(a) 10(d) 21(c,d) 26(e)>             -
<(a,c)(b)> 2
An example
min_sup = 2
Time constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25
Tran. ID   Sequence                               <(a,c)(b)>-TIdx
C1         <3(c) 5(a,f) 18(b) 31(a) 45(f)>        [3:3:18]
C2         <6(a,c) 10(b) 17(e) 18(a) 24(c,d)>     [6:6:10]
C3         <1(b) 20(b,g) 27(e) 34(d,g) 35(g)>     -
C4         <5(a) 10(d) 21(c,d) 26(e)>             -
No more patterns can be formed.
An example
Min_Sup = 2
Tran. ID   Sequence
C1         <3(c) 5(a,f) 18(b) 31(a) 45(f)>
C2         <6(a,c) 10(b) 17(e) 18(a) 24(c,d)>
C3         <1(b) 20(b,g) 27(e) 34(d,g) 35(g)>
C4         <5(a) 10(d) 21(c,d) 26(e)>
Frequent itemsets, grouped by first item:
a: a, (a)(b), (a,c), (a)(d), (a,c)(b)
b: b, (b)(a), (b)(d), (b)(e), (b)(e)(d)
c: c, (c)(b), (c)(e), (c)(b)(a)
d: d
e: e, (e)(d)
Dealing with extra-large databases
Performance analysis and experimental evaluation
Average number of transactions per data-sequence = 10
Average number of items per transaction = 2.5
Average size of potentially sequential patterns = 4
Average size of potentially frequent itemsets = 1.25
Number of data sequences in database = 100k
Conclusions
This paper has presented METISP, a time-indexing algorithm for mining sequential patterns with various time constraints, including minimum-, maximum-, and exact-gaps, sliding time-windows, and durations. METISP effectively shrinks the search space of potential patterns.