28
Pitfalls in Benchmarking Data Stream Classification and How to Avoid Them Albert Bifet 1 , Jesse Read 2 , Indr ˙ e ˇ Zliobait ˙ e 3 Bernhard Pfahringer 4 , Geoff Holmes 4 1 Yahoo! Research Barcelona 2 Universidad Carlos III, Madrid, Spain 3 Aalto University and Helsinki Institute for Information Technology (HIIT), Finland 4 University of Waikato, Hamilton, New Zealand ECML-PKDD 2013, 25 September 2013

Pitfalls in benchmarking data stream classification and how to avoid them

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Pitfalls in benchmarking data stream classification and how to avoid them

Pitfalls in Benchmarking Data StreamClassification and How to Avoid Them

Albert Bifet1, Jesse Read2, Indre Zliobaite3

Bernhard Pfahringer4, Geoff Holmes4

1Yahoo! Research Barcelona2Universidad Carlos III, Madrid, Spain

3Aalto University and Helsinki Institute for Information Technology (HIIT), Finland4University of Waikato, Hamilton, New Zealand

ECML-PKDD 2013, 25 September 2013

Page 2: Pitfalls in benchmarking data stream classification and how to avoid them

Data Streams

Data StreamsI Sequence is potentially infiniteI High amount of data: sublinear spaceI High speed of arrival: sublinear time per exampleI Once an element from a data stream has been processed

it is discarded or archived

Big Data & Real Time

Page 3: Pitfalls in benchmarking data stream classification and how to avoid them

1. Motivation

Page 4: Pitfalls in benchmarking data stream classification and how to avoid them

Electricity Dataset

I Popular benchmark for testing adaptive classifiersI Collected from the Australian New South Wales Electricity

Market.I Contains 45,312 instances which record electricity prices

at 30 minute intervals.I The class label identifies the change of the price (UP or

DOWN) related to a moving average of the last 24 hours.

Page 5: Pitfalls in benchmarking data stream classification and how to avoid them

Electricity Dataset, Accuracy

0 1 2 3 4

·104

0

20

40

60

80

100

Time, instances

Acc

urac

y,%

VFDT Majority ClassNaive Bayes

Page 6: Pitfalls in benchmarking data stream classification and how to avoid them

Electricity Dataset, Accuracy

0 1 2 3 4

·104

0

20

40

60

80

100

Time, instances

Acc

urac

y,%

Magic Classifier VFDTMajority Class Naive Bayes

Page 7: Pitfalls in benchmarking data stream classification and how to avoid them

Electricity Dataset, Kappa Statistic

0 1 2 3 4

·104

0

20

40

60

80

100

Time, instances

Kap

paS

tatis

tic,%

VFDT Naive Bayes

Page 8: Pitfalls in benchmarking data stream classification and how to avoid them

Electricity Dataset, Kappa Statistic

0 1 2 3 4

·104

0

20

40

60

80

100

Time, instances

Kap

paS

tatis

tic,%

Magic Classifier VFDTNaive Bayes

Page 9: Pitfalls in benchmarking data stream classification and how to avoid them

Electricity Dataset, Accuracy

Algorithm name Acc. (%) Algorithm name Acc. (%)DDM 89.6* Local detection 80.4Learn++.CDS 88.5 Perceptron 79.1KNN-SPRT 88.0 AUE2 77.3GRI 88.0 ADWIN 76.6FISH3 86.2 EAE 76.6EDDM-IB1 85.7 Prop. method 76.1Magic classifier 85.3 Cont. λ-perc. 74.1ASHT 84.8 CALDS 72.5bagADWIN 82.8 TA-SVM 68.9DWM-NB 80.8* tested on a subset

Page 10: Pitfalls in benchmarking data stream classification and how to avoid them

2. Problem

Page 11: Pitfalls in benchmarking data stream classification and how to avoid them

No-Change classifier: Weather classifier

Prediction for tomorrow: the same astoday

Page 12: Pitfalls in benchmarking data stream classification and how to avoid them

Electricity Dataset, Accuracy

0 1 2 3 4

·104

0

20

40

60

80

100

Time, instances

Acc

urac

y,%

No-Change VFDTMajority Class Naive Bayes

Page 13: Pitfalls in benchmarking data stream classification and how to avoid them

Electricity Dataset, Kappa Statistic

0 1 2 3 4

·104

0

20

40

60

80

100

Time, instances

Kap

paS

tatis

tic,%

No-Change VFDTNaive Bayes

Page 14: Pitfalls in benchmarking data stream classification and how to avoid them

Characteristics of the Electricity Dataset

0.5 1 1.5 2 2.5 3 3.5 4 4.5

·104

20

30

40

50

60

Time, instances

Cla

sspr

ior,

%

Page 15: Pitfalls in benchmarking data stream classification and how to avoid them

Characteristics of the Electricity Dataset

20 40 60 80 100 120 140 160 180 200

0

0.5

1

Lag, instances

Aut

ocor

rela

tion

Page 16: Pitfalls in benchmarking data stream classification and how to avoid them

3. Proposal

Page 17: Pitfalls in benchmarking data stream classification and how to avoid them

New Evaluation for Stream Classifiers

Kappa Statistic

I p0: classifier’s prequential accuracyI pc : probability that a chance classifier makes a correct

prediction.I κ statistic

κ =p0 − pc

1 − pc

I κ = 1 if the classifier is always correctI κ = 0 if the predictions coincide with the correct ones as

often as those of the chance classifier

Page 18: Pitfalls in benchmarking data stream classification and how to avoid them

New Evaluation for Stream Classifiers

Kappa Plus Statistic

I p0: classifier’s prequential accuracyI pe: no-change classifier’s prequential accuracy

I κ+ statisticκ+ =

p0 − pe

1 − pe

I κ+ = 1 if the classifier is always correctI κ+ = 0 if the predictions coincide with the correct ones as

often as those of the no-change classifier

Page 19: Pitfalls in benchmarking data stream classification and how to avoid them

Electricity Market Dataset Accuracy

0 1 2 3 4

·104

60

80

100

Time, instances

Acc

urac

y,%

No-Change HATLev. Bagging

Page 20: Pitfalls in benchmarking data stream classification and how to avoid them

Electricity Market Dataset κ

0 1 2 3 4

·104

0

20

40

60

80

100

Time, instances

Kap

paS

tatis

tic,%

No-Change HATLev. Bagging

Page 21: Pitfalls in benchmarking data stream classification and how to avoid them

Electricity Market Dataset κ+

0 1 2 3 4

·104

−300

−200

−100

0

100

Time, instances

Kap

paP

lus

Sta

tistic

,%

No-Change HATLev. Bagging

Page 22: Pitfalls in benchmarking data stream classification and how to avoid them

SWT: Temporally Augmented Classifier

SWT: meta strategy that builds meta instances by augmentingthe original input attributes with the values of recent classlabels from the past

Pr [class is c] ≡ h(x t , ct−`, . . . , ct−1)

for the t-th test instance, where ` is the size of the slidingwindow over the most recent true labels.

Page 23: Pitfalls in benchmarking data stream classification and how to avoid them

Electricity Market Dataset κ+

0 1 2 3 4

·104

−300

−200

−100

0

100

Time, instances

Kap

paP

lus

Sta

tistic

,%

No-Change SWT HATSWT Lev. Bagging

Page 24: Pitfalls in benchmarking data stream classification and how to avoid them

Electricity Market Dataset κ+

0 1 2 3 4

·104

−300

−200

−100

0

100

Time, instances

Kap

paP

lus

Sta

tistic

,%

No-Change HATLev. Bagging

Page 25: Pitfalls in benchmarking data stream classification and how to avoid them

Electricity Market Dataset κ+

0 1 2 3 4

·104

−300

−200

−100

0

100

Time, instances

Kap

paP

lus

Sta

tistic

,%

No-Change SWT HATSWT Lev. Bagging

Page 26: Pitfalls in benchmarking data stream classification and how to avoid them

Forest Cover Type Dataset

0 2 4

·105

60

80

100

Time, instances

Acc

urac

y,%

No-Change HATLev. Bagging

0 2 4

·105

0

20

40

60

80

100

Time, instances

Kap

paS

tatis

tic,%

No-Change HATLev. Bagging

0 2 4

·105

−300

−200

−100

0

100

Time, instances

Kap

paP

lus

Sta

tistic

,%

No-Change HATLev. Bagging

0 2 4

·105

0

20

40

60

80

100

Time, instances

Acc

urac

y,%

No-Change SWT HATSWT Lev. Bagging

0 2 4

·105

0

20

40

60

80

100

Time, instances

Kap

paS

tatis

tic,%

No-Change SWT HATSWT Lev. Bagging

0 2 4

·105

−300

−200

−100

0

100

Time, instances

Kap

paP

lus

Sta

tistic

,%No-Change SWT HATSWT Lev. Bagging

Page 27: Pitfalls in benchmarking data stream classification and how to avoid them

Conclusions

Temporal dependence in data stream mining

I new κ+ measureI a wrapper classifier SWT

Pitfalls in Benchmarking Data StreamClassification and How to Avoid Them

Page 28: Pitfalls in benchmarking data stream classification and how to avoid them

Thanks!

Pitfalls in Benchmarking Data StreamClassification and How to Avoid Them