Classification, Regression and Other Learning Methods
CS240B Presentation, Peter Huang, June 4, 2014


Page 1: Classification, Regression and Other Learning Methods

Classification, Regression and Other Learning Methods

CS240B Presentation

Peter Huang, June 4, 2014

Page 2: Outline

Motivation

Introduction to Data Streams and Concept Drift

Survey of Ensemble Methods:
  Bagging: KDD '01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification
  Weighted Bagging: KDD '03: Mining Concept-Drifting Data Streams using Ensemble Classifiers
  Adaptive Boosting: KDD '04: Fast and Light Boosting for Adaptive Mining of Data Streams

Summary

Conclusion

Page 3: Motivation

A significant amount of recent research has focused on mining data streams

Real-world applications include financial data analysis, credit card fraud detection, network monitoring, sensor networks, and many others

Algorithms for mining data streams must overcome challenges not seen in traditional data mining, particularly performance constraints and unending data sets

Traditional algorithms must be made non-blocking, fast and light, and must adapt to data stream issues such as concept drift

Page 4: Data Streams

A data stream is a continuous stream of data items, in the form of tuples or vectors, that arrive at a high rate and are subject to unknown changes such as concept drift or shift

Algorithms that process data streams must be:
  Iterative: read data sequentially
  Efficient: fast and light in computation and memory
  Single-pass: account for the surplus of data
  Adaptive: account for concept drift
  Any-time: able to provide the best answer continuously

Page 5: Data Stream Classification

Various types of methods are used to classify data streams

Single classifier
  Sliding window on recent data, fixed or variable
  Naive Bayes, C4.5, RIPPER
  Support vector machines, neural networks
  k-NN, linear regression

Decision trees
  BOAT algorithm
  VFDT, Hoeffding tree
  CVFDT

Ensemble methods
  Bagging
  Boosting
  Random forest

Page 6: Concept Drift

Concept drift is an implicit property of data streams

A concept may change or drift over time due to sudden or gradual changes in the external environment

Mining changes is one of the core issues of data mining, useful in many real-world applications

Two types of concept change: gradual and shift

Methods to adapt to concept drift (a sketch of the forgetting-factor idea follows below):
  Ensemble methods: majority or weighted voting
  Exponential forgetting: forgetting factor
  Replacement methods: create a new classifier
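As a minimal sketch of the exponential-forgetting idea (the function name and the default factor value below are illustrative assumptions, not from the slides), a drift-sensitive error estimate can be maintained in Python as:

    # Exponentially forgetting error estimate: recent observations count
    # more than old ones, so the estimate tracks a drifting concept.
    def update_error_estimate(prev_estimate, new_error, forgetting=0.95):
        # A factor close to 1 forgets slowly; close to 0 forgets quickly.
        return forgetting * prev_estimate + (1.0 - forgetting) * new_error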

Page 7: Types of Concept Drift

Two types of concept change: gradual and shift

Shift: change in mean, class/distribution change

Gradual: change in mean and variance, trends

Page 8: Ensemble Classifiers

Ensemble methods are a class of classification methods that naturally handles concept drift

They combine the predictions of multiple base models, each learned using a base learner

Combining multiple models is known to consistently outperform individual models

Either traditional averaging or weighted averaging is used to classify data stream items, as sketched below
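A minimal sketch of the two combination schemes; the predict(x) interface on the base models is an assumed placeholder:

    from collections import Counter

    def majority_vote(classifiers, x):
        # Traditional averaging: every base model gets an equal vote.
        votes = Counter(c.predict(x) for c in classifiers)
        return votes.most_common(1)[0][0]

    def weighted_vote(classifiers, weights, x):
        # Weighted averaging: each vote counts in proportion to the
        # model's weight (e.g., derived from its recent accuracy).
        scores = {}
        for c, w in zip(classifiers, weights):
            label = c.predict(x)
            scores[label] = scores.get(label, 0.0) + w
        return max(scores, key=scores.get)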

Page 9: Survey of Ensemble Methods

Bagging: KDD '01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification

Weighted Bagging: KDD ’03: Mining Concept-Drifting Data Streams using Ensemble Classifiers

Adaptive Boosting: KDD ’04: Fast and Light Boosting for Adaptive Mining of Data Streams

Page 10: KDD '01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification

Approaches the problem of large-scale or streaming classification by building a committee or ensemble of classifiers, each constructed on a subset of the available data points

Essentially introduces the concept of ensemble classification for streams

Uses the traditional scheme of averaging predictions

Later improved in KDD '03, KDD '04, and subsequent work

Page 11: Ensemble of Classifiers

Fixed ensemble size, up to around 20-25 classifiers

A new classifier replaces the lowest-quality classifier in the existing ensemble

Building blocks are decision trees constructed using C4.5

An operational parameter is whether or not to prune the trees

In the experiments, pruning decreased overall accuracy because of over-fitting

Adapts to concept drift by changing membership over time; accuracy on a new concept follows a gradual, Gaussian-like CDF curve

Page 12: Streaming Ensemble Pseudocode

while more data points are available:
    read d points, create training set D
    build classifier C_i using D
    evaluate C_(i-1) on D
    evaluate all classifiers in ensemble E on D
    if E not full:
        insert C_(i-1) into E
    else if Quality(C_(i-1)) > Quality(E_j) for some j:
        replace E_j with C_(i-1)

Quality is measured by the ability to classify points in the current test set
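A minimal Python sketch of the loop above, assuming hypothetical build_classifier(block) and quality(classifier, block) helpers (the paper uses C4.5 trees; any block-trainable learner fits this interface):

    def streaming_ensemble(stream, build_classifier, quality, max_size=25):
        # stream yields fixed-size blocks of labeled points (the d points).
        ensemble, prev = [], None
        for block in stream:
            current = build_classifier(block)        # build C_i on block D
            if prev is not None:
                prev_quality = quality(prev, block)  # evaluate C_(i-1) on D
                if len(ensemble) < max_size:
                    ensemble.append(prev)            # E not full: just insert
                else:
                    scores = [quality(c, block) for c in ensemble]
                    worst = min(range(len(ensemble)), key=scores.__getitem__)
                    if prev_quality > scores[worst]: # beat the weakest member
                        ensemble[worst] = prev
            prev = current
        return ensemble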

Page 13: Replacement of Existing Classifiers

Illustration (numbers represent classifier quality):

Existing ensemble of classifiers: 78 84 75 80 70 (average quality 77.4)
A newly trained classifier with quality 85 replaces the weakest member (70)
New ensemble of classifiers: 78 84 75 80 85 (average quality 80.4)
The next trained classifier (quality 68) would not replace any member

Page 14: Experimental Results: Adult Data

Page 15: Experimental Results: SEER Data

Page 16: Experimental Results: Web Data

Page 17: Experimental Results: Concept Drift

Page 18: Survey of Ensemble Methods

Bagging: KDD ’01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification

Weighted Bagging: KDD ’03: Mining Concept-Drifting Data Streams using Ensemble Classifiers

Adaptive Boosting: KDD ’04: Fast and Light Boosting for Adaptive Mining of Data Streams

Page 19: KDD '03: Mining Concept-Drifting Data Streams using Ensemble Classifiers

A general framework for mining concept-drifting data streams using an ensemble of weighted classifiers

Improves on ensemble classification by using weighted averaging instead of traditional averaging

Each classifier's weight is inversely related to its expected error (MSE): w_i = MSE_r - MSE_i, where MSE_r is the MSE of a classifier that predicts at random

Eliminates the effect of examples representing outdated concepts by assigning lower weights to classifiers trained on them

Page 20: Ensemble of Classifiers

Fixed ensemble size; the top K classifiers are kept

A new classifier replaces lower-weighted classifiers in the existing ensemble

Building blocks are decision trees constructed using C4.5

Adapts to concept drift by removing incorrect classifiers and/or reducing their weights

Page 21: Streaming Ensemble Pseudocode

while more data points are available:
    read d points, create training set S
    build classifier C' from S
    compute error rate of C' via cross-validation on S
    derive weight w' for C': w' = MSE_r - MSE_C'
    for each classifier C_i in ensemble C:
        apply C_i on S to derive MSE_i
        compute weight w_i = MSE_r - MSE_i
    C <- top K weighted classifiers in C U {C'}

return C

Quality is measured by the ability to classify points in the current test set
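A minimal Python sketch of the weighted-bagging loop, assuming hypothetical build_classifier(block) and mse_on(classifier, block) helpers; MSE_r is the mean squared error of a classifier that predicts at random (0.25 for uniform binary classes):

    def weighted_bagging(stream, build_classifier, mse_on, K=10, mse_random=0.25):
        ensemble = []  # list of (classifier, weight) pairs
        for block in stream:
            new_c = build_classifier(block)
            # w = MSE_r - MSE_i: classifiers no better than random get
            # weight <= 0 and drop out of the top K.
            new_w = mse_random - mse_on(new_c, block)  # paper uses cross-validation here
            rescored = [(c, mse_random - mse_on(c, block)) for c, _ in ensemble]
            rescored.append((new_c, new_w))
            rescored.sort(key=lambda cw: cw[1], reverse=True)
            ensemble = rescored[:K]  # keep the top-K weighted classifiers
        return ensemble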

Page 22: Data Expiration Problem

Identify in a timely manner those data in the training set that are no longer consistent with the current concepts

A simple approach discards data after they become old, that is, after a fixed period of time T has passed since their arrival

If T is large, the training set is likely to contain outdated concepts, which reduces classification accuracy

If T is small, the training set may not have enough data, and as a result, the learned model will likely carry a large variance due to over-fitting.

Page 23: Expiration Problem Illustrated

Page 24: Replacement of Existing Classifiers

Illustration (newer classifiers on the right; numbers represent MSE, so lower is better):

Existing stream of classifiers: 12 15 19 21 10
A new classifier with MSE 13 is trained on the latest example block
It replaces the highest-error member (MSE 21)
Resulting ensemble of classifiers used: 12 15 19 13 10

Page 25: Experimental Results: Average Error

Page 26: Experimental Results: Error Rates

Page 27: Experimental Results: Concept Drift

Page 28: Survey of Ensemble Methods

Bagging: KDD ’01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification

Weighted Bagging: KDD ’03: Mining Concept-Drifting Data Streams using Ensemble Classifiers

Adaptive Boosting: KDD ’04: Fast and Light Boosting for Adaptive Mining of Data Streams

Page 29: KDD '04: Fast and Light Boosting for Adaptive Mining of Data Streams

A novel adaptive boosting ensemble method for the continuous mining of data streams

Improves on ensemble classification by boosting the weight of incorrectly classified samples

The weight of incorrect samples is w_i = (1 - e_j) / e_j, where e_j is the ensemble's error rate on the current block

Uses the traditional scheme of averaging predictions

Page 30: Ensemble of Classifiers

Fixed ensemble size; the most recent M classifiers are kept

Boosting the weight of incorrect samples provides a number of formal guarantees on performance

Building blocks are decision trees constructed using C4.5

Adapts to concept drift by change detection, restarting the ensemble from scratch when change is detected

Page 31: Streaming Ensemble Pseudocode

Given: E_b = {C_1, ..., C_m}; block B_j = {(x_1, y_1), ..., (x_n, y_n)}

while more data points are available:
    read n points, create training block B_j
    compute the ensemble prediction on each of the n points
    change detection: E_b <- {} if change is detected
    if E_b != {}:
        compute error rate e_j of E_b on B_j
        set the weight of misclassified samples to w_i = (1 - e_j) / e_j
    else:
        w_i = 1
    learn new classifier C_(m+1) from B_j
    update E_b <- E_b U {C_(m+1)}, removing C_1 if m = M
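A minimal Python sketch of the boosting loop, assuming hypothetical build_classifier(block, weights), sample_errors(ensemble, block) (per-sample 0/1 mistakes of the ensemble vote), and detect_change(...) helpers:

    def adaptive_boosting_ensemble(stream, build_classifier, sample_errors,
                                   detect_change, M=10):
        ensemble = []
        for block in stream:                  # block B_j of n labeled points
            if detect_change(ensemble, block):
                ensemble = []                 # drift detected: restart from scratch
            weights = [1.0] * len(block)
            if ensemble:
                errors = sample_errors(ensemble, block)  # 1 if misclassified
                e_j = sum(errors) / len(block)           # ensemble error rate
                if 0.0 < e_j < 0.5:
                    boost = (1.0 - e_j) / e_j            # w_i = (1 - e_j) / e_j
                    weights = [boost if e else 1.0 for e in errors]
            ensemble.append(build_classifier(block, weights))
            if len(ensemble) > M:
                ensemble.pop(0)               # drop the oldest classifier C_1
        return ensemble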

Page 32: Change Detection

To detect change, test the null hypothesis H0 against the alternative hypothesis H1

Two-stage method: first a significance test, then a hypothesis test (a generic sketch follows below)
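The slide does not reproduce the paper's exact test statistics, so the following is only a generic illustration of such a test: a two-proportion z-test on whether the error rate on the newest block is significantly higher than a reference rate (rejecting H0 signals change):

    from math import sqrt

    def error_rate_increased(ref_errors, ref_n, new_errors, new_n, z_crit=2.58):
        # H0: the error rate is unchanged; H1: it has increased.
        # z_crit of about 2.58 corresponds to a one-sided 0.5% level.
        p_ref, p_new = ref_errors / ref_n, new_errors / new_n
        p_pool = (ref_errors + new_errors) / (ref_n + new_n)
        se = sqrt(p_pool * (1.0 - p_pool) * (1.0 / ref_n + 1.0 / new_n))
        return se > 0 and (p_new - p_ref) / se > z_crit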

Page 33: Replacement of Existing Classifiers

Illustration (newer classifiers on the right; numbers represent accuracy):

Existing ensemble of classifiers: 85 88 90 87 84 78
New classifier learned from the latest block: 86
Boosted ensemble: 88 90 87 84 86 (the oldest classifier is removed and the new one appended)

Page 34: Experimental Results: Concept Drift

Page 35: Experimental Results: Comparison

Page 36: Experimental Results: Time and Space

Page 37: Summary

Bagging: KDD '01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification
  Introduced the bagging ensemble for data streams

Weighted Bagging: KDD '03: Mining Concept-Drifting Data Streams using Ensemble Classifiers
  Adds weighting to improve accuracy and handle drift

Adaptive Boosting: KDD '04: Fast and Light Boosting for Adaptive Mining of Data Streams
  Adds boosting to further improve accuracy and speed

Page 38: Thank You

Questions?

Page 39: Sources

Adams, Niall M., et al. "Efficient Streaming Classification Methods." (2010).

Street, W. Nick, and Yong Seog Kim. "A streaming ensemble algorithm (SEA) for large-scale classification." Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2001.

Wang, Haixun, et al. "Mining concept-drifting data streams using ensemble classifiers." Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003.

Chu, Fang, and Carlo Zaniolo. "Fast and light boosting for adaptive mining of data streams." Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2004. 282-292.