Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li


Time Series Shapelets: A New Primitive for Data Mining

Lexiang Ye and Eamonn Keogh

University of California, Riverside

KDD 2009

Presented by: Zhenhui Li


Classification in Time Series

• Application: Finance, Medicine

• 1-Nearest Neighbor
  – Pros: accurate, robust, simple
  – Cons: time and space complexity (lazy learning); results are not interpretable


Solution

• Shapelets
  – time series subsequence
  – representative of a class
  – discriminative from other classes


MOTIVATING EXAMPLE


Figure: false nettles vs. stinging nettles leaf example, with the shapelet highlighted. Panels: Shapelet Dictionary (shapelet I, threshold 5.1); Leaf Decision Tree (node I, yes/no branches, leaf labels 0 and 1).


BRUTE-FORCE ALGORITHM


Candidates Pool

Extract subsequences of all possible lengths
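The extraction step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `generate_candidates` and the toy data are assumptions made here, and each series is taken to be a plain Python list.

```python
def generate_candidates(dataset, min_len, max_len):
    """Yield (series_index, start, subsequence) for every window of every allowed length.

    Hypothetical helper: it enumerates the candidates pool by sliding a window
    of each length in [min_len, max_len] over each series.
    """
    for idx, series in enumerate(dataset):
        # A window cannot be longer than the series it is taken from.
        for length in range(min_len, min(max_len, len(series)) + 1):
            for start in range(len(series) - length + 1):
                yield idx, start, series[start:start + length]


# Toy data, invented for illustration only.
toy = [[1, 2, 3, 4], [5, 6, 7]]
pool = list(generate_candidates(toy, min_len=2, max_len=3))
print(len(pool))  # 8
```

A generator is used so the pool does not have to be held in memory all at once, which matters once the number of candidates grows into the millions.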


Testing the utility of a candidate shapelet

• Arrange the time series objects
  – based on their distance from the candidate

• Find the optimal split point (maximal information gain)

• Pick the candidate achieving best utility as the shapelet
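The split-point search above can be sketched as below. This is an illustrative sketch under stated assumptions: the names `entropy` and `best_split` are made up here, distances are assumed to be precomputed, and the threshold is placed midway between two neighbouring distances, which is one common convention rather than something the slides specify.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class-count mapping."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c > 0)


def best_split(distances, labels):
    """Return (information_gain, threshold) of the best split on the distance line."""
    order = sorted(zip(distances, labels), key=lambda p: p[0])
    n = len(order)
    total = Counter(labels)
    base = entropy(total)
    left = Counter()
    best_gain, best_threshold = 0.0, None
    for i in range(n - 1):
        left[order[i][1]] += 1
        if order[i][0] == order[i + 1][0]:
            continue  # no split point exists between two equal distances
        right = total - left
        gain = base - (i + 1) / n * entropy(left) - (n - i - 1) / n * entropy(right)
        if gain > best_gain:
            best_gain = gain
            best_threshold = (order[i][0] + order[i + 1][0]) / 2
    return best_gain, best_threshold


# Toy distances from one candidate to four labelled objects (invented values).
gain, threshold = best_split([1.0, 2.0, 5.0, 6.0], ["A", "A", "B", "B"])
print(gain, threshold)
```

The candidate whose best split has the highest gain over the whole pool would then be kept as the shapelet.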

Figure: time series objects ordered by distance from the candidate (origin 0), with the split point and its information gain marked.


Problem

• Total number of candidates: $\sum_{l=\mathrm{MINLEN}}^{\mathrm{MAXLEN}} \sum_{T_i \in D} (|T_i| - l + 1)$

• Each candidate: compute the distance between this candidate and each training sample

• Trace dataset
  – 200 instances, each of length 275
  – 7,480,200 shapelet candidates
  – approximately three days of brute-force search
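As a quick arithmetic check, the candidate count quoted for the Trace dataset can be reproduced by summing the number of windows per length. The bounds MINLEN = 3 and MAXLEN = 275 are an assumption made here (the slide does not state them); they are chosen because they yield the quoted figure. The function name `count_candidates` is hypothetical.

```python
def count_candidates(series_lengths, min_len, max_len):
    """Count all subsequences with length in [min_len, max_len] over all series."""
    return sum(
        max(n - length + 1, 0)  # a series of length n has n - length + 1 windows
        for n in series_lengths
        for length in range(min_len, max_len + 1)
    )


# 200 instances of length 275; MINLEN = 3 and MAXLEN = 275 are assumed values.
print(count_candidates([275] * 200, min_len=3, max_len=275))  # 7480200
```

Each of these candidates must then be compared against every training sample, which is what makes the brute-force search so slow.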


Speedup

• Distance calculations from time series objects to shapelet candidates are the most expensive part

• Reduce the time in two ways
  – Distance Early Abandon
    • reduce the distance computation time between two time series
  – Admissible Entropy Pruning
    • reduce the number of distance calculations


DISTANCE EARLY ABANDON


Figure: candidate S compared against time series T (x-axis 0 to 100).

Figure: best matching location of S in T, Dist = 0.4.

Figure: at another location of S in T, the calculation is abandoned at the point where Dist > 0.4.


Distance Early Abandon

• We only need the minimum Dist

• Method
  – Keep the best-so-far distance
  – Abandon the calculation if the current distance is larger than the best so far
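The method above can be sketched as follows. This is a minimal sketch with assumptions stated plainly: it uses squared Euclidean distance on raw values (no normalization), and the function name `subsequence_dist` is chosen here for illustration.

```python
import math


def subsequence_dist(series, candidate):
    """Minimum Euclidean distance between candidate and any same-length window of series."""
    m = len(candidate)
    best_sq = math.inf  # best-so-far squared distance
    for start in range(len(series) - m + 1):
        acc = 0.0
        for j in range(m):
            diff = series[start + j] - candidate[j]
            acc += diff * diff
            if acc >= best_sq:
                break  # early abandon: this window cannot beat the best so far
        else:
            best_sq = acc  # loop finished without abandoning, so this is a new best
    return math.sqrt(best_sq)


print(subsequence_dist([0, 0, 1, 2, 1, 0], [1, 2, 1]))  # 0.0
```

Because only the minimum over all windows is needed, abandoning a window early never changes the result; it only skips work on windows that have already lost.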


ADMISSIBLE ENTROPY PRUNING


Admissible Entropy Pruning

• We only need the best shapelet for each class

• For a candidate shapelet
  – We don’t need to calculate the distance to every training sample
  – If, after computing the distance to some training samples, the upper bound on the candidate's information gain is lower than the gain of the best candidate so far:
    • Stop the calculation
    • Try the next candidate
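One way to sketch the pruning test is shown below. It is an illustration under assumptions, not the authors' implementation: the optimistic bound is obtained by placing all not-yet-measured objects of each class at one extreme end of the distance line and trying every such placement; the names `best_gain` and `gain_upper_bound` and the toy numbers are invented here.

```python
import math
from collections import Counter
from itertools import product


def entropy(counts):
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c > 0)


def best_gain(points):
    """Best information gain over all split points of (distance, label) pairs."""
    points = sorted(points, key=lambda p: p[0])
    n = len(points)
    total = Counter(label for _, label in points)
    base = entropy(total)
    left = Counter()
    best = 0.0
    for i in range(n - 1):
        left[points[i][1]] += 1
        if points[i][0] == points[i + 1][0]:
            continue  # cannot split between equal distances
        right = total - left
        gain = base - (i + 1) / n * entropy(left) - (n - i - 1) / n * entropy(right)
        best = max(best, gain)
    return best


def gain_upper_bound(seen, remaining):
    """Optimistic gain: seen is [(distance, label)], remaining maps label -> unmeasured count."""
    lo = -1.0  # left of every real distance (distances are non-negative)
    hi = max((d for d, _ in seen), default=0.0) + 1.0  # right of every seen distance
    labels = [label for label, count in remaining.items() if count > 0]
    bound = 0.0
    for ends in product((lo, hi), repeat=len(labels)):
        optimistic = list(seen)
        for label, end in zip(labels, ends):
            optimistic.extend([(end, label)] * remaining[label])
        bound = max(bound, best_gain(optimistic))
    return bound


# Toy state: four distances measured, two objects not yet measured (invented values).
seen = [(0.3, "A"), (0.4, "B"), (0.5, "A"), (0.6, "B")]
remaining = {"A": 1, "B": 1}
best_so_far = 0.9  # gain of the best candidate found so far (assumed)
bound = gain_upper_bound(seen, remaining)
if bound < best_so_far:
    print("prune candidate")
```

Trying every end placement costs 2^k evaluations for k classes with unmeasured objects, which is cheap for the small class counts in the datasets shown but would need care with many classes.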


Figure: false nettles and stinging nettles objects ordered by distance from the candidate (origin 0).

Figure: two orderings on the distance line, with information gain I = 0.42 and I = 0.29.


Figure (Classification): false nettles and stinging nettles leaves classified using the Shapelet Dictionary (shapelet I, threshold 5.1) and the Leaf Decision Tree (node I, yes/no branches, leaf labels 0 and 1).
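Classification with a single shapelet node can be sketched as a threshold test on the distance to the shapelet. The sketch below is hypothetical throughout: the function names, the toy shapelet, the threshold and the class labels are all invented here, and which branch maps to which class is an assumption, since the figure does not survive in this transcript.

```python
import math


def subsequence_dist(series, shapelet):
    """Minimum Euclidean distance between shapelet and any same-length window of series."""
    m = len(shapelet)
    return min(
        math.sqrt(sum((series[s + j] - shapelet[j]) ** 2 for j in range(m)))
        for s in range(len(series) - m + 1)
    )


def classify(series, shapelet, threshold, near_label, far_label):
    """One decision-tree node: compare the distance to the shapelet with the threshold."""
    if subsequence_dist(series, shapelet) <= threshold:
        return near_label
    return far_label


# Toy shapelet and threshold (invented values, not those from the slides).
shapelet, threshold = [0, 3, 0], 1.0
print(classify([0, 0, 3, 0, 0], shapelet, threshold, "A", "B"))  # A
print(classify([0, 0, 0, 0, 0], shapelet, threshold, "A", "B"))  # B
```

A deeper tree would chain such nodes, each with its own shapelet and threshold from the shapelet dictionary.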


EXPERIMENTAL EVALUATION


Performance Comparison

Original Lightning Dataset

Length 2000

Training 2000

Testing 18000


Projectile Points


Figure: Shapelet Dictionary with shapelets I (Clovis) and II (Avonlea), labeled 11.24 and 85.47 (x-axis 0 to 400, y-axis 0 to 1.0); Arrowhead Decision Tree with nodes I and II and classes Clovis and Avonlea.

Method Accuracy Time

Shapelet 0.80 0.33

Rotation Invariant Nearest Neighbor 0.68 1013


Wheat Spectrography

Figure: one sample from each class (x-axis 0 to 1200, y-axis 0 to 1).

Wheat Dataset

Length 1050

Training 49

Testing 276


Figure: Shapelet Dictionary with shapelets I to VI (x-axis 0 to 300, y-axis 0.0 to 0.4); Wheat Decision Tree with nodes I to VI and leaf classes shown in the order 2, 4, 0, 1, 3, 6, 5.

Method Accuracy Time

Shapelet 0.720 0.86

Nearest Neighbor 0.543 0.65


The Gun/NoGun Problem

Method Accuracy Time

Shapelet 0.933 0.016

Rotation Invariant Nearest Neighbor 0.913 0.064

Figure: Shapelet Dictionary with shapelet I (No Gun), labeled 238.94 (x-axis 0 to 100); Gun Decision Tree with node I and leaves No Gun and Gun (labels 1 and 0).


Conclusions

• Interpretable results

• More accurate/robust

• Significantly faster at classification


Discussions – Comparison

Hong Cheng, Xifeng Yan, Jiawei Han, and Chih-Wei Hsu, "Discriminative Frequent Pattern Analysis for Effective Classification" (ICDE'07)

Hong Cheng, Xifeng Yan, Jiawei Han, and Philip S. Yu, "Direct Discriminative Pattern Mining for Effective Classification" (ICDE'08)

Similarities:

• motivation: discriminative frequent pattern = shapelet

• technique: use an upper bound on information gain to speed up the search

Differences:

• application: general feature selection vs. time series (no explicit features)

• split node: binary (contains/does not contain a pattern) vs. numeric value (smaller/larger than a value)


Discussions – other topics

• Similar ideas could be applied to other research topics
  – graph
  – image
  – spatio-temporal
  – social network
  – …


Discussions – other topics

• Graph classification:

Xifeng Yan, Hong Cheng, Jiawei Han, and Philip S. Yu, "Mining Significant Graph Patterns by Scalable Leap Search", Proc. 2008 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'08), Vancouver, BC, Canada, June 2008.


Discussions – other topics

• moving object classification

Discriminative sub-movement


Discussions – other topics

• Social network
  – classify normal/spamming users


Discussions – other topics

• Social network
  – classify normal/spamming users
  – How to find discriminative features on a social network?
    • social network structure
    • user behaviour


Discussions – other topics

• The idea could be adapted to improve performance in other applications, but the adaptation is not straightforward.


Thank You

Questions?