wRACOG: A Gibbs Sampling-Based Oversampling Technique

Barnan Das, Narayanan C. Krishnan, Diane J. Cook
School of Electrical Engineering and Computer Science
Washington State University




This paper was presented at the International Conference on Data Mining, 2013.




Imbalanced Class Distribution


Automated Prompting for Older Adults



Class Distribution

• Minority class: 149
• Majority class: 3831
• Total number of data points: 3980
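The counts above imply a strong imbalance. The short computation below uses nothing beyond the three numbers on the slide; it quantifies the imbalance and shows why plain accuracy is misleading on such data.

```python
# Class counts taken from the "Class Distribution" slide.
minority = 149
majority = 3831

total = minority + majority                # should match the slide's 3980
imbalance_ratio = majority / minority      # majority points per minority point
minority_share = minority / total          # fraction of data in the minority class
trivial_accuracy = majority / total        # accuracy of always predicting "majority"

print(f"total            = {total}")
print(f"imbalance ratio  = {imbalance_ratio:.1f} : 1")
print(f"minority share   = {minority_share:.2%}")
print(f"trivial accuracy = {trivial_accuracy:.2%}")
```

This prints a ratio of about 25.7 : 1 and a minority share of 3.74%, so a classifier that never predicts the minority class still reaches 96.26% accuracy while missing every minority point.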


Solution?

Preprocessing

Sampling
• Over-sampling the minority class
• Under-sampling the majority class

Oversampling
• Typically based on the spatial location of samples in Euclidean space
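SMOTE, one of the comparison methods listed under Experimental Setup, is the classic example of oversampling in Euclidean space: each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours. The sketch below is a minimal NumPy illustration of that interpolation step, not the reference SMOTE implementation; the function name `smote_like` and the parameters `k` and `seed` are hypothetical.

```python
import numpy as np


def smote_like(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic rows by SMOTE-style interpolation.

    X_min: (m, d) array of numeric minority-class samples.
    """
    rng = np.random.default_rng(seed)
    m = X_min.shape[0]
    k = min(k, m - 1)
    # Pairwise Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_min.shape[1]))
    for t in range(n_new):
        i = rng.integers(m)                      # random minority seed point
        j = neighbours[i, rng.integers(k)]       # one of its k nearest neighbours
        gap = rng.random()                       # position along the segment, in [0, 1)
        synthetic[t] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synthetic


X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote_like(X_min, n_new=10, k=2)
print(new_points.round(3))
```

Every synthetic point lies between existing minority points, so the result depends only on geometry; for discrete attributes an interpolated value need not even be a valid category.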


Proposed Approach

• Preprocessing technique to oversample the minority class
• Approximate the discrete probability distribution using Chow-Liu’s algorithm
• Generate new minority class data points using Gibbs sampling


Approximating Discrete Probability Distribution


Minority Class

Mutual Information Between Attributes

$I(x_i, x_j)$, for $i = 1, 2, \ldots, n-1$; $j = 2, 3, \ldots, n$; $i < j$

Maximum-weighted Dependence Tree

Chow-Liu Dependence Tree
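The steps above weight every attribute pair by its empirical mutual information, $I(x_i, x_j) = \sum_{a,b} P(a,b) \log \frac{P(a,b)}{P(a)\,P(b)}$ (with $a$ and $b$ ranging over the values of $x_i$ and $x_j$), and keep a maximum-weight spanning tree, so that the joint distribution is approximated by $P(\mathbf{x}) \approx \prod_i P(x_i \mid x_{\mathrm{pa}(i)})$. The following is a minimal pure-Python sketch, assuming discrete attributes stored as tuples, raw frequency estimates, and Prim's algorithm rooted at attribute 0; the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(rows, i, j):
    """Empirical mutual information I(x_i, x_j) in nats for discrete columns i, j."""
    n = len(rows)
    pij = Counter((r[i], r[j]) for r in rows)
    pi = Counter(r[i] for r in rows)
    pj = Counter(r[j] for r in rows)
    mi = 0.0
    for (a, b), c in pij.items():
        # P(a,b) * log(P(a,b) / (P(a) P(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (pi[a] * pj[b]))
    return mi


def chow_liu_tree(rows):
    """Return {child: parent} for a maximum-weight spanning tree over the
    attributes, rooted at attribute 0 (Prim's algorithm)."""
    d = len(rows[0])
    weight = {}
    for i, j in combinations(range(d), 2):       # all pairs with i < j
        weight[(i, j)] = weight[(j, i)] = mutual_information(rows, i, j)

    parent = {0: None}
    while len(parent) < d:
        # Add the heaviest edge joining the current tree to a new attribute.
        u, v = max(((u, v) for u in parent for v in range(d) if v not in parent),
                   key=lambda e: weight[e])
        parent[v] = u
    return parent


rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
tree = chow_liu_tree(rows)
print(tree)   # attribute 1 copies attribute 0, so edge (0, 1) carries the weight
```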


Gibbs Sampling


For all attributes

Chow-Liu Dependence Tree
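Gibbs sampling resamples one attribute at a time from its full conditional. In a tree-structured model that conditional involves only the attribute's parent and children: $P(x_i \mid \mathbf{x}_{-i}) \propto P(x_i \mid x_{\mathrm{pa}(i)}) \prod_{c:\,\mathrm{pa}(c)=i} P(x_c \mid x_i)$. The sketch assumes a parent map in the form `{child: parent}` with the root mapped to `None`, and Laplace-smoothed conditional tables; `alpha` is an assumed smoothing constant not stated on the slides, and `fit_cpts` and `gibbs_sweep` are hypothetical names.

```python
import random
from collections import Counter


def fit_cpts(rows, parent, alpha=1.0):
    """Laplace-smoothed estimates of P(x_i | x_parent(i)) from discrete rows.
    Returns (domains, prob) where prob(i, value, parent_value) is a probability."""
    d = len(rows[0])
    domains = [sorted({r[i] for r in rows}) for i in range(d)]
    joint = Counter()   # counts of (i, value, parent_value)
    marg = Counter()    # counts of (i, parent_value)
    for r in rows:
        for i in range(d):
            pv = None if parent[i] is None else r[parent[i]]
            joint[(i, r[i], pv)] += 1
            marg[(i, pv)] += 1

    def prob(i, v, pv):
        return (joint[(i, v, pv)] + alpha) / (marg[(i, pv)] + alpha * len(domains[i]))

    return domains, prob


def gibbs_sweep(x, parent, domains, prob, rng):
    """Resample every attribute once from its full conditional given the rest."""
    x = list(x)
    children = {i: [c for c, p in parent.items() if p == i] for i in parent}
    for i in range(len(x)):
        pv = None if parent[i] is None else x[parent[i]]
        weights = []
        for v in domains[i]:
            w = prob(i, v, pv)                   # P(x_i = v | parent)
            for c in children[i]:
                w *= prob(c, x[c], v)            # P(x_c | x_i = v) for each child
            weights.append(w)
        x[i] = rng.choices(domains[i], weights=weights)[0]
    return x


rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
parent = {0: None, 1: 0, 2: 0}
domains, prob = fit_cpts(rows, parent)
rng = random.Random(0)
x = list(rows[0])                                # start at an existing minority sample
for _ in range(5):
    x = gibbs_sweep(x, parent, domains, prob, rng)
print(x)
```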


Gibbs Sampling


[Figure panels: Minority Class Samples; Majority Class Samples; Markov Chains]
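New samples are taken from the Markov chains the Gibbs sampler produces. RACOG's rule (burn-in and lag) discards the first states of each chain, which may still reflect the starting point, and then keeps every lag-th state to reduce correlation between consecutive samples. A minimal sketch, assuming one chain is started from each minority class sample; `run_chain`, `racog_select`, and the toy step function are illustrative stand-ins, and in practice `step` would be one Gibbs sweep.

```python
def run_chain(x0, step, n_iter, burn_in, lag):
    """Run a Markov chain for n_iter steps from x0 and keep every lag-th
    state after the first burn_in states."""
    kept = []
    x = x0
    for t in range(1, n_iter + 1):
        x = step(x)
        if t > burn_in and (t - burn_in) % lag == 0:
            kept.append(x)
    return kept


def racog_select(seeds, step, n_iter, burn_in, lag):
    """Start one chain per seed (e.g. per minority class sample) and pool the kept states."""
    new_samples = []
    for s in seeds:
        new_samples.extend(run_chain(s, step, n_iter, burn_in, lag))
    return new_samples


# Toy "chain" that just counts steps, so the kept iteration indices are visible.
print(run_chain(0, lambda x: x + 1, n_iter=20, burn_in=10, lag=3))
print(racog_select([0, 100], lambda x: x + 1, n_iter=20, burn_in=10, lag=3))
```

With 20 iterations, a burn-in of 10, and a lag of 3, only states 13, 16, and 19 of each chain are kept; the number of samples is fixed in advance and never checked against a classifier.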


(wrapper-based) RApidly COnverging Gibbs sampler: RACOG & wRACOG


RACOG and wRACOG differ in how they select samples from the Markov chains.

RACOG:
• Based on burn-in and lag
• Stopping criterion: predefined number of iterations
• Effectiveness of new samples is not judged

wRACOG:
• Iterative training on the dataset, adding misclassified data points
• Stopping criterion: no further improvement of the performance measure (TP rate)
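The wRACOG selection rule can be read as a wrapper around a classifier. The sketch below is one interpretation of the slide, not the authors' code: each round adds only the candidate samples the current model misclassifies, retrains, and stops once the TP rate stops improving. Measuring the TP rate on a held-out set is an assumption here, and `generate`, `train`, `train_1nn`, `max_rounds`, and the toy data are hypothetical; in practice `generate` would draw from the Gibbs sampler and `train` would fit one of the classifiers listed under Experimental Setup.

```python
def tp_rate(predict, X_val, y_val, positive=1):
    """TP rate (sensitivity): share of positive validation points predicted positive."""
    pos = [x for x, y in zip(X_val, y_val) if y == positive]
    return sum(predict(x) == positive for x in pos) / len(pos)


def wracog_loop(X, y, X_val, y_val, generate, train, positive=1, max_rounds=50):
    """Wrapper-style oversampling loop in the spirit of wRACOG.

    generate(): returns a batch of candidate minority samples (e.g. from Gibbs chains).
    train(X, y): fits a classifier and returns a predict(x) function.
    """
    X, y = list(X), list(y)
    predict = train(X, y)
    best = tp_rate(predict, X_val, y_val, positive)
    for _ in range(max_rounds):
        # Keep only the candidates the current model misclassifies.
        hard = [x for x in generate() if predict(x) != positive]
        if not hard:
            break
        X_new, y_new = X + hard, y + [positive] * len(hard)
        new_predict = train(X_new, y_new)
        score = tp_rate(new_predict, X_val, y_val, positive)
        if score <= best:            # no further improvement in TP rate: stop
            break
        X, y, predict, best = X_new, y_new, new_predict, score
    return X, y, best


def train_1nn(X, y):
    """Toy 1-nearest-neighbour classifier on 1-D points (illustration only)."""
    pairs = list(zip(X, y))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Toy data: label 1 is the minority class.
X_train, y_train = [0.0, 1.0, 2.0, 3.0, 10.0], [0, 0, 0, 0, 1]
X_val, y_val = [5.0, 9.0, 1.5], [1, 1, 0]
batches = iter([[4.0, 9.5], [6.0]])          # stand-in for Gibbs-sampled candidates
X_out, y_out, best = wracog_loop(X_train, y_train, X_val, y_val,
                                 generate=lambda: next(batches, []),
                                 train=train_1nn)
print(X_out, best)
```

In the toy run, the candidate 4.0 is misclassified and added (TP rate rises from 0.5 to 1.0), 9.5 is already classified correctly and skipped, and the loop stops when the next batch contains nothing the model gets wrong.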


Experimental Setup


Datasets
• prompting
• abalone
• car
• nursery
• letter
• connect-4

Classifiers
• C4.5 decision tree
• SVM
• k-Nearest Neighbor
• Logistic Regression

Other Methods
• SMOTE
• SMOTEBoost
• RUSBoost


Results (Sensitivity)



Results (G-mean)
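Sensitivity and G-mean are not defined on the slides. The sketch below uses the standard definitions, assumed to match the paper's: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and G-mean = sqrt(sensitivity × specificity). The function name is hypothetical. Applied to the 149 / 3831 class split, it shows why G-mean is informative: the trivial all-majority classifier scores 0.

```python
import math


def sens_spec_gmean(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, G-mean) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)     # TP rate on the minority (positive) class
    specificity = tn / (tn + fp)     # TN rate on the majority class
    return sensitivity, specificity, math.sqrt(sensitivity * specificity)


# Always predicting the majority class on the 149 / 3831 split.
y_true = [1] * 149 + [0] * 3831
trivial = sens_spec_gmean(y_true, [0] * len(y_true))
print(trivial)

# A small mixed example: 1 of 2 positives and 3 of 4 negatives correct.
mixed = sens_spec_gmean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
print(mixed)
```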



Results (ROC)



New Samples Generated



Iterations of Gibbs Sampler



Conclusion


• Oversampling technique to address imbalanced class distributions
• Takes the probability distribution of the minority class into account
• Performs better than the other sampling methods compared (SMOTE, SMOTEBoost, RUSBoost)



Backup Slides
