
Time Series Shapelets: A New Primitive for Data Mining

Lexiang Ye and Eamonn Keogh, University of California, Riverside

KDD 2009

Presented by: Zhenhui Li

Classification in Time Series

• Application: Finance, Medicine

• 1-Nearest Neighbor
  – Pros: accurate, robust, simple
  – Cons: time and space complexity (lazy learning); results are not interpretable
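The 1-NN baseline can be sketched in a few lines of Python (a minimal illustration with hypothetical names; the paper's experiments use more elaborate, e.g. rotation-invariant, distance measures):

```python
def one_nn_classify(query, train_series, train_labels):
    """Lazy learning: no model is built up front; every query scans the
    entire training set, which is where the time/space cost comes from."""
    def dist(a, b):
        # Plain Euclidean distance between two equal-length series.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    best = min(range(len(train_series)), key=lambda i: dist(query, train_series[i]))
    return train_labels[best]

# Toy example with two length-4 training series.
train = [[0, 0, 1, 1], [1, 1, 0, 0]]
labels = ["rising", "falling"]
print(one_nn_classify([0, 0.1, 0.9, 1.0], train, labels))  # -> rising
```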


Solution

• Shapelets
  – a time series subsequence
  – representative of a class
  – discriminative from other classes

MOTIVATING EXAMPLE

[Figure: leaf contours of false nettles and stinging nettles; the discovered shapelet, the Shapelet Dictionary, and the Leaf Decision Tree that splits on distance to the shapelet (threshold 5.1)]

BRUTE-FORCE ALGORITHM


Candidates Pool

Extract subsequences of all possible lengths
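The extraction step can be rendered directly as code (a sketch; names are illustrative):

```python
def all_candidates(dataset, minlen, maxlen):
    """Brute-force candidate pool: every subsequence of every series
    whose length lies in [minlen, maxlen]."""
    pool = []
    for series in dataset:
        for l in range(minlen, maxlen + 1):
            for start in range(len(series) - l + 1):
                pool.append(series[start:start + l])
    return pool

# A single length-5 series with minlen=3, maxlen=4 yields 3 + 2 = 5 candidates.
print(len(all_candidates([[1, 2, 3, 4, 5]], 3, 4)))  # -> 5
```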

Testing the utility of a candidate shapelet

• Arrange the time series objects
  – based on the distance from the candidate

• Find the optimal split point (maximal information gain)

• Pick the candidate achieving the best utility as the shapelet
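The split-point search above can be sketched as follows, assuming a two-way split on the distance to the candidate (function names are hypothetical):

```python
import math

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def best_split(dist_label_pairs):
    """Order objects by distance to the candidate, test every split point,
    and return the threshold with maximal information gain."""
    pairs = sorted(dist_label_pairs)
    labels = [lab for _, lab in pairs]
    base = entropy(labels)
    best_gain, best_thresh = 0.0, None
    for i in range(1, len(pairs)):
        left, right = labels[:i], labels[i:]
        gain = base - (len(left) * entropy(left)
                       + len(right) * entropy(right)) / len(labels)
        if gain > best_gain:
            best_gain = gain
            best_thresh = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_thresh, best_gain

# Perfectly separated classes: the gain equals the full 1 bit of entropy.
print(best_split([(0.1, "A"), (0.2, "A"), (0.8, "B"), (0.9, "B")]))  # -> (0.5, 1.0)
```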

[Figure: objects ordered on a number line by distance to the candidate; the split point is chosen to maximize information gain]

Problem

• Total number of candidates is very large

• For each candidate: compute the distance between the candidate and every training sample

• Trace dataset
  – 200 instances, each of length 275
  – 7,480,200 shapelet candidates
  – approximately three days of computation

The total number of shapelet candidates is

\[ \sum_{l=\mathrm{MINLEN}}^{\mathrm{MAXLEN}} \; \sum_{T_i \in D} \left( |T_i| - l + 1 \right) \]
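As a sanity check, the candidate count can be reproduced in code. The slides do not state MINLEN and MAXLEN; the 7,480,200 figure for the Trace dataset is matched under the assumption MINLEN = 3 and MAXLEN = 275:

```python
def num_candidates(series_lengths, minlen, maxlen):
    """Sum over candidate lengths l of (|T_i| - l + 1) for every series T_i."""
    return sum(len_i - l + 1
               for len_i in series_lengths
               for l in range(minlen, min(maxlen, len_i) + 1))

# Trace dataset: 200 series, each of length 275.  MINLEN = 3 and
# MAXLEN = 275 are assumptions; they reproduce the figure on the slide.
print(num_candidates([275] * 200, 3, 275))  # -> 7480200
```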

Speedup

• Distance calculations from time series objects to shapelet candidates are the most expensive part

• Reduce the time in two ways
  – Distance Early Abandon
    • reduce the distance computation time between two time series
  – Admissible Entropy Pruning
    • reduce the number of distance calculations


DISTANCE EARLY ABANDON

[Figure: candidate S slid along time series T; the best matching location gives Dist = 0.4; at a later location the running distance exceeds 0.4 and the calculation is abandoned]

Distance Early Abandon

• We only need the minimum Dist

• Method
  – Keep the best-so-far distance
  – Abandon the calculation if the current distance is larger than the best so far
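A minimal sketch of the early-abandoned subsequence distance (illustrative names; squared Euclidean distance is used so no square root is needed for comparisons):

```python
def early_abandon_dist(s, window, best_so_far):
    """Squared Euclidean distance between candidate s and one window of T,
    abandoned as soon as the running sum exceeds the best distance so far."""
    total = 0.0
    for a, b in zip(s, window):
        total += (a - b) ** 2
        if total > best_so_far:
            return None  # abandoned: this window cannot be the best match
    return total

def subsequence_dist(s, t):
    """Minimum squared distance from s to any window of t, with early abandon."""
    best = float("inf")
    for start in range(len(t) - len(s) + 1):
        d = early_abandon_dist(s, t[start:start + len(s)], best)
        if d is not None:
            best = d
    return best

# s occurs exactly inside t, so the minimum distance is 0; the last
# window's computation is abandoned after its first point.
print(subsequence_dist([1, 2], [5, 1, 2, 9]))  # -> 0.0
```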

ADMISSIBLE ENTROPY PRUNING

Admissible Entropy Pruning

• We only need the best shapelet for each class

• For a candidate shapelet
  – We don't need to calculate the distance for every training sample
  – If, after processing some of the training samples, the upper bound of the information gain falls below that of the best shapelet found so far:
    – Stop the calculation
    – Try the next candidate
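One way to compute such an admissible upper bound (a sketch, not necessarily the paper's exact bookkeeping): place every object whose distance has not yet been computed on whichever side of the split is most favorable for its class, and take the best resulting gain. Because information gain over fixed counts is maximized at these extreme placements, the bound never underestimates the achievable gain:

```python
import math
from itertools import product

def entropy(counts):
    """Entropy of a class-count vector, in bits."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def info_gain(left, right):
    """Information gain of splitting the pooled counts into left/right."""
    n = sum(left) + sum(right)
    total = [l + r for l, r in zip(left, right)]
    return entropy(total) - (sum(left) * entropy(left)
                             + sum(right) * entropy(right)) / n

def gain_upper_bound(seen_left, seen_right, remaining):
    """Optimistic gain: for each class, send ALL remaining (uncomputed)
    objects to one side of the split; return the best gain over every
    such extreme placement."""
    best = 0.0
    for assign in product([0, 1], repeat=len(remaining)):
        left, right = list(seen_left), list(seen_right)
        for cls, side in enumerate(assign):
            (left if side == 0 else right)[cls] += remaining[cls]
        best = max(best, info_gain(left, right))
    return best

# 3 vs 3 objects already placed, 2 of each class still uncomputed:
# in the best case the split becomes perfect, so the bound is 1 bit.
print(gain_upper_bound([3, 0], [0, 3], [2, 2]))  # -> 1.0
```

In the pruning loop, a candidate is discarded as soon as this bound drops below the information gain of the best shapelet found so far.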

[Figure: false nettles and stinging nettles ordered by distance for two candidates, with information gains I = 0.42 and I = 0.29]

Classification

[Figure: the learned Shapelet Dictionary and Leaf Decision Tree (threshold 5.1) applied to classify leaves as false nettles or stinging nettles]

EXPERIMENTAL EVALUATION

Performance Comparison

Original Lightning Dataset
  Length: 2000
  Training: 2000
  Testing: 18000

Projectile Points

[Figure: projectile-point Shapelet Dictionary with Clovis and Avonlea shapelets (split values 11.24 and 85.47) and the Arrowhead Decision Tree]

Method                                Accuracy   Time
Shapelet                              0.80       0.33
Rotation Invariant Nearest Neighbor   0.68       1013

Wheat Spectrography

[Figure: one sample from each class]

Wheat Dataset
  Length: 1050
  Training: 49
  Testing: 276

[Figure: Shapelet Dictionary (shapelets I–VI) and the Wheat Decision Tree separating classes 0–6]

Method             Accuracy   Time
Shapelet           0.720      0.86
Nearest Neighbor   0.543      0.65

The Gun/NoGun Problem

Method                                Accuracy   Time
Shapelet                              0.933      0.016
Rotation Invariant Nearest Neighbor   0.913      0.064

[Figure: Gun/NoGun Shapelet Dictionary and the Gun Decision Tree classifying No Gun vs. Gun]

Conclusions

• Interpretable results

• More accurate/robust

• Significantly faster at classification

Discussions - Comparison

Hong Cheng, Xifeng Yan, Jiawei Han, and Chih-Wei Hsu, “Discriminative Frequent Pattern Analysis for Effective Classification” (ICDE'07)

Hong Cheng, Xifeng Yan, Jiawei Han, and Philip S. Yu, "Direct Discriminative Pattern Mining for Effective Classification", (ICDE'08)

Similarities:
• motivation: discriminative frequent pattern = shapelet
• technique: use an upper bound of information gain to speed up the search

Differences:
• application: general feature selection vs. time series (no explicit features)
• split node: binary (contains/does not contain a pattern) vs. numeric value (smaller/larger than a threshold)

Discussions – other topics

• Similar ideas could be applied to other research topics– graph– image– spatio-temporal– social network– ….

Discussions – other topics

• Graph classification:

Xifeng Yan, Hong Cheng, Jiawei Han, and Philip S. Yu, “Mining Significant Graph Patterns by Scalable Leap Search”, Proc. 2008 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'08), Vancouver, BC, Canada, June 2008.

Discussions – other topics

• moving object classification

Discriminative sub-movement

Discussions – other topics

• Social network
  – classify normal vs. spamming users
  – How to find discriminative features on a social network?
    • social network structure
    • user behaviour

Discussions – other topics

• For different applications, this idea could be adapted to improve performance, but the adaptation is not straightforward.

Thank You

Questions?
