This paper was presented at the International Conference on Data Mining, 2013.
wRACOG: A Gibbs Sampling-Based Oversampling Technique
Barnan Das, Narayanan C. Krishnan, Diane J. Cook
School of Electrical Engineering and Computer Science
Washington State University
Imbalanced Class Distribution
Automated Prompting for Older Adults
Class Distribution
Minority class: 149 data points
Majority class: 3831 data points
Total number of data points: 3980
Solution?
Preprocessing
Sampling
• Over-sampling the minority class
• Under-sampling the majority class
Oversampling
• Existing techniques (e.g., SMOTE) generate samples based on the spatial location of samples in Euclidean space
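To make the "spatial location" idea concrete, here is a simplified SMOTE-style sketch (not the full SMOTE algorithm, and not part of wRACOG): each synthetic point is placed on the segment between a minority sample and one of its k nearest minority neighbors in Euclidean space. The function name `smote_like` and its parameters are hypothetical.

```python
import numpy as np


def smote_like(minority: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbors (Euclidean distance)."""
    rng = np.random.default_rng(seed)
    n = len(minority)
    # Pairwise Euclidean distances between all minority samples.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    k = min(k, n - 1)
    neighbors = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, minority.shape[1]))
    for s in range(n_new):
        i = rng.integers(n)                # random minority sample
        j = neighbors[i, rng.integers(k)]  # one of its k nearest neighbors
        gap = rng.random()                 # position along the segment, in [0, 1)
        synthetic[s] = minority[i] + gap * (minority[j] - minority[i])
    return synthetic


X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new = smote_like(X_min, n_new=10, k=2)
print(new.shape)  # (10, 2)
```

Every new point depends only on distances between existing samples; nothing in this procedure models how the attributes are jointly distributed.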
Proposed Approach
Preprocessing technique to oversample minority class
• Approximate the discrete probability distribution using Chow-Liu’s algorithm
• Generate new minority class data points using Gibbs sampling
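Chow-Liu's algorithm approximates the joint distribution of the n attributes by a product of first-order conditionals along a tree. As a worked statement of the standard result (the notation $\pi(i)$ for the parent of $x_i$ in the tree and $r$ for the root is ours), the approximation and its KL divergence from the true distribution are:

```latex
P(\mathbf{x}) \approx P_t(\mathbf{x}) = \prod_{i=1}^{n} P\big(x_i \mid x_{\pi(i)}\big),
\qquad P\big(x_r \mid x_{\pi(r)}\big) \equiv P(x_r)

D_{\mathrm{KL}}\big(P \,\|\, P_t\big)
  = -\sum_{i \ne r} I\big(x_i, x_{\pi(i)}\big) + \sum_{i=1}^{n} H(x_i) - H(\mathbf{x})
```

Only the first sum depends on the tree, so the tree that minimizes the divergence is the one that maximizes the total mutual information along its edges.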
Approximating Discrete Probability Distribution
Minority Class
Mutual Information Between Attributes
$I(x_i, x_j)$ for $i = 1, 2, \ldots, n-1$; $j = 2, 3, \ldots, n$; $i < j$
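A minimal sketch of the mutual-information step, assuming every attribute is discrete and probabilities are estimated by empirical frequencies over the minority-class rows. Attribute indices are 0-based here, so the pairs with $i < j$ become `combinations(range(n), 2)`. The names `mutual_information` and `pairwise_mi` are our own.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in nats, between two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def pairwise_mi(rows):
    """Return {(i, j): I(x_i, x_j)} for every attribute pair with i < j (0-based)."""
    columns = list(zip(*rows))
    return {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }


# Toy minority rows with three binary attributes; attribute 2 copies attribute 0.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
mi = pairwise_mi(rows)
```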
Maximum-weighted Dependence Tree
Chow-Liu Dependence Tree
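The maximum-weighted dependence tree is a maximum-weight spanning tree over the attributes, with the pairwise mutual information values as edge weights. The sketch below uses Kruskal's algorithm (Prim's algorithm would work equally well) and then directs the edges away from an arbitrarily chosen root so that each attribute has a single parent. Function names and the choice of root are assumptions for illustration.

```python
def chow_liu_tree(n_attrs, weights):
    """Maximum-weight spanning tree over attributes 0..n_attrs-1 (Kruskal).

    weights: {(i, j): I(x_i, x_j)} for i < j. Returns undirected edges (i, j).
    """
    parent = list(range(n_attrs))  # union-find forest

    def find(u):
        # Follow parent links to the set representative, halving the path.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    edges = []
    # Consider the most mutually informative pairs first.
    for (i, j), _w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
            if len(edges) == n_attrs - 1:
                break
    return edges


def orient_tree(n_attrs, edges, root=0):
    """Direct the tree away from `root`; returns {attribute: parent or None}."""
    adj = {i: [] for i in range(n_attrs)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    par = {root: None}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in par:
                par[v] = u
                stack.append(v)
    return par


edges = chow_liu_tree(3, {(0, 1): 0.1, (0, 2): 0.9, (1, 2): 0.5})
print(edges)                  # [(0, 2), (1, 2)]
print(orient_tree(3, edges))  # {0: None, 2: 0, 1: 2}
```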
Gibbs Sampling
For all attributes
Chow-Liu Dependence Tree
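Because the Chow-Liu approximation factorizes over the tree, the conditional that a Gibbs sampler needs for attribute $x_i$ given all other attributes depends only on its tree neighbors. A sketch of that standard identity, with $\pi(i)$ the parent and $\mathrm{ch}(i)$ the children of $x_i$ (our notation; at the root, $P(v \mid x_{\pi(i)})$ is replaced by $P(v)$):

```latex
P_t\big(x_i = v \mid \mathbf{x}_{-i}\big)
  = \frac{P\big(v \mid x_{\pi(i)}\big) \prod_{c \in \mathrm{ch}(i)} P\big(x_c \mid v\big)}
         {\sum_{v'} P\big(v' \mid x_{\pi(i)}\big) \prod_{c \in \mathrm{ch}(i)} P\big(x_c \mid v'\big)}
```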
Minority Class Samples
Majority Class Samples
Markov Chains
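A minimal sketch of the Markov chains on this slide, under several assumptions of our own: attributes are discrete, the dependence tree is given as a hypothetical `parent` mapping (attribute index to parent index, `None` for the root), conditionals are estimated from the minority rows only with add-one smoothing, and each sweep resamples the attributes in a fixed order. One chain starts at each minority sample.

```python
import random
from collections import Counter, defaultdict


def fit_conditionals(rows, parent):
    """Estimate P(x_i | x_parent(i)), or P(x_i) at the root, from minority rows.

    Add-one (Laplace) smoothing is an assumption of this sketch, so unseen
    parent/child combinations never get probability zero.
    """
    values = [sorted(set(col)) for col in zip(*rows)]
    counts = {i: defaultdict(Counter) for i in parent}
    for row in rows:
        for i, p in parent.items():
            counts[i][None if p is None else row[p]][row[i]] += 1

    def cond(i, v, parent_value):
        c = counts[i][parent_value]
        return (c[v] + 1) / (sum(c.values()) + len(values[i]))

    return values, cond


def gibbs_sweep(x, parent, children, values, cond, rng):
    """Resample every attribute once from its full conditional on the tree."""
    x = list(x)
    for i in range(len(x)):
        p = parent[i]
        weights = []
        for v in values[i]:
            w = cond(i, v, None if p is None else x[p])  # P(v | parent value)
            for c in children[i]:
                w *= cond(c, x[c], v)                    # P(child value | v)
            weights.append(w)
        x[i] = rng.choices(values[i], weights=weights)[0]
    return x


def run_chains(minority_rows, parent, n_sweeps, seed=0):
    """Start one chain at each minority sample and record every visited state."""
    rng = random.Random(seed)
    values, cond = fit_conditionals(minority_rows, parent)
    children = defaultdict(list)
    for i, p in parent.items():
        if p is not None:
            children[p].append(i)
    chains = []
    for start in minority_rows:
        state, history = list(start), []
        for _ in range(n_sweeps):
            state = gibbs_sweep(state, parent, children, values, cond, rng)
            history.append(tuple(state))
        chains.append(history)
    return chains


rows = [(0, 0, 1), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
chains = run_chains(rows, parent={0: None, 1: 0, 2: 1}, n_sweeps=5)
print(len(chains), len(chains[0]))  # 4 5
```

In this sketch sampled values are restricted to values already observed in the minority rows, which is a simplification.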
RACOG & wRACOG: (wrapper-based) RApidly COnverging Gibbs sampler
Differ in how samples are selected from the Markov chains
RACOG:
• Based on burn-in and lag
• Stopping criterion: predefined number of iterations
• Effectiveness of new samples is not judged
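RACOG's selection rule can be sketched in a few lines: each chain runs for a predefined number of iterations, the first `burn_in` states are discarded, and one state is kept every `lag` steps after that. The function and parameter names are ours, and the integer "states" in the usage line are stand-ins for sampled data points.

```python
def racog_samples(chains, burn_in, lag):
    """Pool RACOG-style selections from all chains.

    Each chain is the ordered list of states it visited during a fixed number
    of iterations. No classifier is consulted, so the usefulness of the
    selected samples is never judged.
    """
    new = []
    for chain in chains:
        # Drop the burn-in states, then keep one state every `lag` steps.
        new.extend(chain[burn_in::lag])
    return new


print(racog_samples([list(range(10)), list(range(100, 110))], burn_in=3, lag=2))
# [3, 5, 7, 9, 103, 105, 107, 109]
```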
wRACOG:
• Iterative training on the dataset, adding misclassified data points
• Stopping criterion: no further improvement of the performance measure (TP rate)
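A sketch of the wrapper idea, written against hypothetical callables: `generate()` returns a batch of new minority-class samples (for example from the Gibbs chains), `train(rows, labels)` fits any classifier, and `predict(model, row)` labels one row. Measuring the TP rate on a separate validation set, and discarding the last batch when it does not improve that rate, are assumptions of this sketch rather than details given on the slide.

```python
def tp_rate(model, predict, rows, labels, positive=1):
    """Sensitivity (true positive rate) of `model` on a labeled validation set."""
    pos = [r for r, y in zip(rows, labels) if y == positive]
    if not pos:
        return 0.0
    return sum(predict(model, r) == positive for r in pos) / len(pos)


def wracog(train_rows, train_labels, val_rows, val_labels,
           generate, train, predict, positive=1, max_iters=50):
    """Add only the new minority samples the current model misclassifies,
    retrain, and stop once the validation TP rate stops improving."""
    rows, labels = list(train_rows), list(train_labels)
    model = train(rows, labels)
    best = tp_rate(model, predict, val_rows, val_labels, positive)
    for _ in range(max_iters):
        hard = [c for c in generate() if predict(model, c) != positive]
        if not hard:
            break
        trial_rows = rows + hard
        trial_labels = labels + [positive] * len(hard)
        trial_model = train(trial_rows, trial_labels)
        score = tp_rate(trial_model, predict, val_rows, val_labels, positive)
        if score <= best:  # no further improvement: keep the previous model
            break
        rows, labels, model, best = trial_rows, trial_labels, trial_model, score
    return model, rows, labels


# Toy 1-D example with a 1-nearest-neighbor "classifier".
def train_1nn(rows, labels):
    return list(zip(rows, labels))


def predict_1nn(model, r):
    return min(model, key=lambda m: abs(m[0] - r))[1]


model, rows, labels = wracog([0.0, 10.0, 11.0, 12.0], [1, 0, 0, 0],
                             [1.0, 6.0, 11.5], [1, 1, 0],
                             generate=lambda: [5.5], train=train_1nn, predict=predict_1nn)
print(rows)  # [0.0, 10.0, 11.0, 12.0, 5.5]
```

Only samples the current model gets wrong are added, so each batch is judged by its effect on the classifier, in contrast to RACOG.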
Experimental Setup
Datasets
• prompting
• abalone
• car
• nursery
• letter
• connect-4
Classifiers
• C4.5 decision tree
• SVM
• k-Nearest Neighbor
• Logistic Regression
Other Methods
• SMOTE
• SMOTEBoost
• RUSBoost
Results (Sensitivity)
Results (G-mean)
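The metrics on these results slides follow standard definitions: sensitivity is the true positive rate on the minority class, and G-mean is the geometric mean of sensitivity and specificity. The sketch below computes them from confusion-matrix counts (function names are our own) and applies them to a classifier that always predicts the majority class on the prompting data counts.

```python
import math


def sensitivity(tp, fn):
    """True positive rate on the minority (positive) class: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate on the majority (negative) class: TN / (TN + FP)."""
    return tn / (tn + fp)


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sensitivity(tp, fn) * specificity(tn, fp))


def accuracy(tp, fn, tn, fp):
    """Fraction of all points classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Always predicting the majority class on 149 minority / 3831 majority points:
print(round(accuracy(tp=0, fn=149, tn=3831, fp=0), 4))  # 0.9626
print(g_mean(tp=0, fn=149, tn=3831, fp=0))              # 0.0
```

High accuracy with zero sensitivity and zero G-mean illustrates why these two metrics are more informative than accuracy for imbalanced data.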
Results (ROC)
New Samples Generated
Iterations of Gibbs Sampler
Conclusion
• Oversampling technique to address imbalanced classes
• Takes probability distribution of minority class into account
• Performs better than the other sampling methods evaluated (SMOTE, SMOTEBoost, RUSBoost)
Backup Slides