Classification, Regression and Other Learning Methods
CS240B Presentation
Peter Huang
June 4, 2014
Outline
Motivation
Introduction to Data Streams and Concept Drift
Survey of Ensemble Methods:
Bagging: KDD ’01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification
Weighted Bagging: KDD ’03: Mining Concept-Drifting Data Streams using Ensemble Classifiers
Adaptive Boosting: KDD ’04: Fast and Light Boosting for Adaptive Mining of Data Streams
Summary
Conclusion
Motivation
A significant amount of recent research has focused on mining data streams
Real-world applications include: financial data analysis, credit card fraud, network monitoring, sensor networks, and many others
Algorithms for mining data streams have to overcome challenges not seen in traditional data mining, particularly performance and unending data sets
Traditional algorithms must be made non-blocking, fast and light, and must adapt to data stream issues
Data Streams
A data stream is a continuous stream of data items, in the form of tuples or vectors, that arrive at a high rate and are subject to unknown changes such as concept drift or shift
Algorithms that process data streams must be:
Iterative – read data sequentially
Efficient – fast and light in computation and memory
Single-pass – account for the surplus of data
Adaptive – account for concept drift
Any-time – able to provide the best answer continuously
Data Stream Classification
Various types of methods are used to classify data streams:
Single classifier – sliding window on recent data (fixed or variable); Naive Bayes, C4.5, RIPPER; support vector machines, neural networks; k-NN, linear regression
Decision trees – BOAT algorithm; VFDT, Hoeffding tree; CVFDT
Ensemble methods – bagging, boosting, Random Forest
Concept Drift
Concept drift is an implicit property of data streams
A concept may change or drift over time due to sudden or gradual changes in the external environment
Mining changes is one of the core issues of data mining, useful in many real-world applications
Two types of concept change: gradual and shift
Methods to adapt to concept drift:
Ensemble methods – majority or weighted voting
Exponential forgetting – forgetting factor
Replacement methods – create a new classifier
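As an illustrative sketch of the forgetting-factor idea (the function and the parameter `lam` are hypothetical names, not taken from any of the surveyed papers), a running estimate can discount old observations exponentially so that the statistic tracks drift:

```python
def exponential_forgetting(stream, lam=0.95):
    """Yield a running mean that exponentially discounts old items.

    lam near 1.0 forgets slowly; smaller lam adapts faster to drift.
    """
    mean = 0.0      # decayed sum of observations
    weight = 0.0    # decayed count, used for normalization
    for x in stream:
        mean = lam * mean + x
        weight = lam * weight + 1.0
        yield mean / weight   # current drift-adapted estimate

# On a stream whose concept shifts from 0 to 1, the estimate follows the shift:
estimates = list(exponential_forgetting([0, 0, 0, 1, 1, 1], lam=0.5))
```

With an aggressive `lam=0.5`, the estimate moves from 0.0 toward 1.0 within a few items of the shift, whereas a plain cumulative mean would lag far behind.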
Types of Concept Drift
Two types of concept change: gradual and shift
Shift: change in mean, class/distribution change
Gradual: change in mean and variance, trends
Ensemble Classifiers
Ensemble methods are one approach to classification that naturally handles concept drift
Combines the predictions of multiple base models, each learned using a base learner
It is known that combining multiple models consistently outperforms individual models
Use either traditional averaging or weighted averaging to classify data stream items
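A minimal sketch of the two combination schemes just mentioned; `majority_vote` and `weighted_vote` are illustrative names, not functions from the surveyed papers:

```python
from collections import Counter

def majority_vote(predictions):
    """Unweighted ensemble prediction: the most common class label wins."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Weighted ensemble prediction: sum each classifier's weight per label."""
    totals = {}
    for label, w in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)

# Three classifiers vote on one stream item:
preds = ["A", "B", "B"]
majority_vote(preds)                   # simple count favors "B"
weighted_vote(preds, [0.9, 0.3, 0.2])  # weight 0.9 outweighs 0.3 + 0.2
```

The same votes can produce different answers under the two schemes: a single high-weight classifier can override a numerical majority of weak ones.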
Survey of Ensemble Methods
Bagging: KDD ’01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification
Weighted Bagging: KDD ’03: Mining Concept-Drifting Data Streams using Ensemble Classifiers
Adaptive Boosting: KDD ’04: Fast and Light Boosting for Adaptive Mining of Data Streams
KDD ’01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification
Approaches the problem of large-scale or streaming classification by building a committee, or ensemble, of classifiers, each constructed on a subset of the available data points
Essentially introduces the concept of ensemble classification
Uses the traditional scheme of averaging predictions
Later improved in KDD ’03, KDD ’04, and more
Ensemble of Classifiers
Fixed ensemble size, up to around 20-25 classifiers
A new classifier replaces the lowest-quality classifier in the existing ensemble
Building blocks are decision trees constructed using C4.5
An operational parameter is whether or not to prune the trees
In experiments, pruning decreased overall accuracy because of over-fitting
Adapts to concept drift by changing over time, following a Gaussian-like CDF of gradual change
Streaming Ensemble Pseudocode
while more data points are available:
    read d points, create training set D
    build classifier Ci using D
    evaluate Ci-1 on D
    evaluate all classifiers in ensemble E on D
    if E is not full:
        insert Ci-1 into E
    else if Quality(Ci-1) > Quality(Ej) for some j:
        replace Ej with Ci-1
Quality is measured by ability to classify points in current test set
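The SEA loop above can be sketched in Python; `build_classifier` and `quality` are assumed placeholders standing in for the C4.5 learner and the block-evaluation step described in the paper:

```python
def sea_ensemble(blocks, build_classifier, quality, max_size=25):
    """SEA sketch: maintain a fixed-size committee of classifiers.

    blocks                  -- iterable of training blocks of d points each
    build_classifier(block) -- returns a trained model (the paper uses C4.5)
    quality(model, block)   -- score of the model on the block
    """
    ensemble = []
    previous = None            # classifier built on the previous block
    for block in blocks:
        current = build_classifier(block)
        if previous is not None:
            if len(ensemble) < max_size:
                ensemble.append(previous)   # room left: just insert
            else:
                # replace the lowest-quality member if the newcomer beats it
                scores = [quality(c, block) for c in ensemble]
                worst = scores.index(min(scores))
                if quality(previous, block) > scores[worst]:
                    ensemble[worst] = previous
        previous = current     # will be evaluated on the next block
    return ensemble
```

Note that the classifier built on a block only becomes an insertion candidate on the following block, mirroring the pseudocode's evaluation of Ci-1 on D.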
Replacement of Existing Classifiers
Existing ensemble of classifiers (quality scores): 78 84 75 80 70; newly trained classifier: 85
The new classifier (85) replaces the lowest-quality member (70)
New ensemble of classifiers: 78 84 75 80 85; next trained classifier: 68
Average ensemble quality improves from 77.4 to 80.4; the next classifier (68) scores below every member and would be rejected
Experimental Results: Adult Data
Experimental Results: SEER Data
Experimental Results: Web Data
Experimental Results: Concept Drift
Survey of Ensemble Methods
Bagging: KDD ’01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification
Weighted Bagging: KDD ’03: Mining Concept-Drifting Data Streams using Ensemble Classifiers
Adaptive Boosting: KDD ’04: Fast and Light Boosting for Adaptive Mining of Data Streams
KDD ’03: Mining Concept-Drifting Data Streams using Ensemble Classifiers
General framework for mining concept-drifting data streams using an ensemble of weighted classifiers
Essentially improves the concept of ensemble classification by replacing traditional averaging with weighted averaging
Weight is inversely related to a classifier's expected error (MSE), such that wi = MSEr – MSEi, where MSEr is the MSE of a classifier that predicts at random
Eliminates the effect of examples representing outdated concepts by assigning them lower weight
Ensemble of Classifiers
Fixed ensemble size, top K classifiers kept
New classifiers replace lower-weighted classifiers in the existing ensemble
Building blocks are decision trees constructed using C4.5
Adapts to concept drift by removing and/or reducing weight of incorrect classifiers
Streaming Ensemble Pseudocode
while more data points are available:
    read d points, create training set S
    build classifier C' from S
    compute error rate of C' via cross-validation on S
    derive weight w' for C': w' = MSEr – MSE'
    for each classifier Ci in C:
        apply Ci on S to derive MSEi
        compute weight wi = MSEr – MSEi
    C ← top K weighted classifiers in C ∪ {C'}
return C
Quality is measured by ability to classify points in current test set
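One iteration of the weighted-bagging loop can be sketched as follows; `build_classifier`, `mse`, and `mse_random` (MSEr, the MSE of a random-guessing classifier) are assumed placeholders, and for brevity the new classifier is scored on the current block rather than by cross-validation as in the paper:

```python
def weighted_ensemble_step(ensemble, block, build_classifier, mse, mse_random, k):
    """One iteration of the KDD '03 weighted-bagging loop (sketch).

    ensemble        -- list of (classifier, weight) pairs
    mse(clf, block) -- mean squared error of clf on the block
    mse_random      -- MSEr; weight wi = MSEr - MSEi, so classifiers
                       worse than random get non-positive weight
    Returns the top k classifiers, each paired with its new weight.
    """
    new_clf = build_classifier(block)
    candidates = [(new_clf, mse_random - mse(new_clf, block))]
    for clf, _ in ensemble:                    # re-weight existing members
        candidates.append((clf, mse_random - mse(clf, block)))
    candidates.sort(key=lambda cw: cw[1], reverse=True)
    return candidates[:k]                      # keep the top K by weight
```

Because every weight is recomputed on the newest block, classifiers trained on outdated concepts decay naturally instead of lingering at a stale score.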
Data Expiration Problem
Identify in a timely manner those data in the training set that are no longer consistent with the current concepts
A simple approach discards data after they become old, that is, after a fixed period of time T has passed since their arrival
If T is large, the training set is likely to contain outdated concepts, which reduces classification accuracy
If T is small, the training set may not have enough data, and the learned model will likely carry a large variance due to over-fitting
Expiration Problem Illustrated
Replacement of Existing Classifiers
Existing stream of classifiers, numbers representing MSE (newer classifiers on the right): 12 15 19 21 10
A new classifier trained on the latest examples has MSE 13
Ensemble of classifiers used after replacement: 12 15 19 13 10 (the highest-error classifier, MSE 21, is dropped)
Experimental Results: Average Error
Experimental Results: Error Rates
Experimental Results: Concept Drift
Survey of Ensemble Methods
Bagging: KDD ’01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification
Weighted Bagging: KDD ’03: Mining Concept-Drifting Data Streams using Ensemble Classifiers
Adaptive Boosting: KDD ’04: Fast and Light Boosting for Adaptive Mining of Data Streams
KDD ’04: Fast and Light Boosting for Adaptive Mining of Data Streams
Novel adaptive boosting ensemble method to solve the problem of continuously mining data streams
Essentially improves the concept of ensemble classification by boosting incorrectly classified samples
Weight of incorrectly classified samples is wi = (1 – ej)/ej, where ej is the ensemble's error rate
Uses the traditional scheme of averaging predictions
Ensemble of Classifiers
Fixed ensemble size, the most recent M classifiers kept
Boosting the weight of incorrect samples provides a number of formal guarantees on performance
Building blocks are decision trees constructed using C4.5
Adapts to concept drift by change detection, starting ensemble from scratch
Streaming Ensemble Pseudocode
Eb = {C1, ..., Cm}, Bj = {(x1, y1), ..., (xn, yn)}
while more data points are available:
    read n points, create training block Bj
    compute ensemble prediction on each of the n points
    change detection: Eb ← {} if change detected
    if Eb ≠ {}:
        compute error rate ej of Eb on Bj
        set new sample weights wi = (1 – ej)/ej
    else:
        set wi = 1
    learn new classifier Cm+1 from Bj
    update Eb ← Eb ∪ {Cm+1}, remove C1 if m = M
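The sample re-weighting step can be sketched directly from the formula above; `boost_sample_weights` is an illustrative helper name, not the paper's code:

```python
def boost_sample_weights(misclassified, ensemble_error):
    """Set sample weights for the next classifier, per the KDD '04 rule.

    misclassified  -- booleans, True where the ensemble got the sample wrong
    ensemble_error -- the ensemble's error rate ej on the block (0 < ej < 1)

    Wrong samples get weight (1 - ej)/ej; correct ones keep weight 1, so
    the next classifier concentrates on the hard examples.
    """
    boost = (1.0 - ensemble_error) / ensemble_error
    return [boost if wrong else 1.0 for wrong in misclassified]
```

For example, with a 20% error rate a misclassified sample is weighted (1 – 0.2)/0.2 = 4 times a correct one; as the ensemble improves (smaller ej), the boost grows, pushing new classifiers ever harder toward the residual errors.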
Change Detection
To detect change, test the null hypothesis H0 against the alternative hypothesis H1
Two-stage method: first a significance test, then a hypothesis test
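As a hedged illustration of this idea (the paper's exact statistics are not reproduced here), a first-stage check might be a two-proportion z-test on the ensemble's recent error rate against its historical rate; the function name and the `z_thresh` default are assumptions:

```python
import math

def error_rate_changed(hist_err, hist_n, recent_err, recent_n, z_thresh=2.0):
    """First-stage change check (sketch): two-proportion z-test.

    hist_err, hist_n     -- historical error rate and sample count (H0)
    recent_err, recent_n -- error rate and count on the latest block
    Flags change when the rise in error exceeds z_thresh standard errors.
    """
    # pooled error rate under the null hypothesis of "no change"
    p = (hist_err * hist_n + recent_err * recent_n) / (hist_n + recent_n)
    se = math.sqrt(p * (1 - p) * (1 / hist_n + 1 / recent_n))
    return (recent_err - hist_err) / se > z_thresh
```

Under these assumptions, a jump from 10% to 30% error on a 100-point block is flagged, while a small drift to 12% is not; a flagged change would then trigger resetting the ensemble, as in the pseudocode.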
Replacement of Existing Classifiers
Existing ensemble, numbers representing accuracy (newer classifiers on the right): 85 88 90 87 84; a new classifier trained without boosting reaches 78
With boosted sample weights, the new classifier reaches 86
Boosted ensemble after dropping the oldest member (85): 88 90 87 84 86
Experimental Results: Concept Drift
Experimental Results: Comparison
Experimental Results: Time and Space
Summary
Bagging: KDD ’01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification – introduced the bagging ensemble for data streams
Weighted Bagging: KDD ’03: Mining Concept-Drifting Data Streams using Ensemble Classifiers – adds weighting to improve accuracy and handle drift
Adaptive Boosting: KDD ’04: Fast and Light Boosting for Adaptive Mining of Data Streams – adds boosting to further improve accuracy and speed
Thank You
Questions?
Sources
Adams, Niall M., et al. "Efficient Streaming Classification Methods." (2010).
Street, W. Nick, and Yong Seog Kim. "A streaming ensemble algorithm (SEA) for large-scale classification." Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2001.
Wang, Haixun, et al. "Mining concept-drifting data streams using ensemble classifiers." Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003.
Chu, Fang, and Carlo Zaniolo. "Fast and light boosting for adaptive mining of data streams." Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2004. 282-292.