What’s in a Label? Business value of “soft” vs “hard” cluster ensembles solutions-2 Nicole Huyghe & Anita Prinzie

Sawtooth 2012 what's in a label


DESCRIPTION

Sawtooth Conference 2012, Orlando, Florida. What's in a label? The business value of hard versus soft clustering, by Nicole Huyghe and Anita Prinzie


Page 1: Sawtooth 2012   what's in a label

What’s in a Label? Business value of “soft” vs “hard” cluster ensembles

solutions-2

Nicole Huyghe & Anita Prinzie

Page 2: Sawtooth 2012   what's in a label

Answers the who and the why

Page 3: Sawtooth 2012   what's in a label

Theme 1

Theme 2

Theme 3

...

Theme 9

Theme 10

Cluster Ensemble

Page 4: Sawtooth 2012   what's in a label

HARD OR SOFT CLUSTER ENSEMBLE

Page 5: Sawtooth 2012   what's in a label

Stability Integrity Accuracy Size

Page 6: Sawtooth 2012   what's in a label

Stability

Similarity Index (Lange et al., 2004): the percentage of pairs of observations that belong to the same cluster in both clustering C and clustering C’.
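A minimal sketch of the pair-counting idea behind this index. The function name and the toy labels are hypothetical, and taking all n(n-1)/2 pairs as the denominator is an assumption: the slide does not state the normalization, and other choices are possible.

```python
from itertools import combinations


def similarity_index(labels_c, labels_c_prime):
    """Share of observation pairs that sit in the same cluster in both clusterings.

    Assumption: the denominator is the number of all pairs, n * (n - 1) / 2.
    """
    if len(labels_c) != len(labels_c_prime):
        raise ValueError("both clusterings must label the same observations")
    n = len(labels_c)
    if n < 2:
        raise ValueError("need at least two observations")

    # Count pairs (i, j) that are clustered together in C and in C'.
    together_in_both = sum(
        1
        for i, j in combinations(range(n), 2)
        if labels_c[i] == labels_c[j] and labels_c_prime[i] == labels_c_prime[j]
    )
    return together_in_both / (n * (n - 1) // 2)


# Toy example: cluster names differ between runs, only co-membership matters.
c = [0, 0, 0, 1, 1, 1]
c_prime = [1, 1, 1, 0, 0, 2]
print(similarity_index(c, c_prime))
```

Because only co-membership is compared, the arbitrary numbering of clusters in each run does not affect the result. Under this normalization even two identical clusterings score below 1, so values are best compared between solutions with the same number of segments.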

Page 7: Sawtooth 2012   what's in a label

Cluster Integrity – Heterogeneity

Total separation of clusters: based on the distance between cluster centers
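One distance-based formulation is the total separation term of the SD validity index in Halkidi et al. (2000), which the reference list cites. Whether the authors used exactly this formula is an assumption; the function name and the example centers are hypothetical.

```python
import math


def total_separation(centers):
    """Total separation between cluster centers (SD-index style).

    Dis = (D_max / D_min) * sum_k 1 / (sum_z ||v_k - v_z||)
    Assumption: Euclidean distance between centers.
    """
    k = len(centers)
    if k < 2:
        raise ValueError("need at least two cluster centers")

    # Full matrix of center-to-center distances (diagonal is zero).
    dist = [[math.dist(a, b) for b in centers] for a in centers]
    pairwise = [dist[i][j] for i in range(k) for j in range(i + 1, k)]
    d_max, d_min = max(pairwise), min(pairwise)
    if d_min == 0:
        raise ValueError("two cluster centers coincide")

    return (d_max / d_min) * sum(1.0 / sum(row) for row in dist)


# Three hypothetical centers on a line, 5 units apart.
print(total_separation([(0, 0), (3, 4), (6, 8)]))
```

Under this formulation smaller values indicate centers that are further apart and more evenly spread.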

Page 8: Sawtooth 2012   what's in a label

Cluster Integrity - Homogeneity

Scatter (compactness): average ratio of the cluster variance to the variance of the dataset.
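A minimal sketch of this ratio, assuming the SD-index convention of Halkidi et al. (2000): the variance of a cluster and of the dataset are each summarized by the Euclidean norm of their per-dimension variance vector. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def variance_vector(points):
    """Per-dimension (population) variance of a list of equal-length points."""
    n = len(points)
    dims = len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    return [sum((p[d] - means[d]) ** 2 for p in points) / n for d in range(dims)]


def norm(vector):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in vector))


def scatter(points, labels):
    """Average ratio of within-cluster variance to the variance of the dataset."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    dataset_norm = norm(variance_vector(points))
    if dataset_norm == 0:
        raise ValueError("dataset has zero variance")

    ratios = [norm(variance_vector(members)) / dataset_norm
              for members in clusters.values()]
    return sum(ratios) / len(ratios)


# Two tight, well-separated clusters give a low scatter value.
points = [(0, 0), (2, 0), (10, 0), (12, 0)]
labels = [0, 0, 1, 1]
print(scatter(points, labels))
```

Lower values mean more compact, that is more homogeneous, clusters.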

Page 9: Sawtooth 2012   what's in a label

Accuracy

Adjusted Rand Index (Hubert and Arabie, 1985): level of agreement between the predicted segment and the real segment correcting for the expected level of agreement.
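The correction for chance can be sketched with the standard contingency-table form of the Adjusted Rand Index. The function name and the nine-observation example are hypothetical and are not taken from the slide.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(real, predicted):
    """Adjusted Rand Index between real and predicted segment labels."""
    if len(real) != len(predicted):
        raise ValueError("both label lists must cover the same observations")
    n = len(real)

    # Pairs that agree within each contingency cell, each real segment, each predicted segment.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(real, predicted)).values())
    sum_real = sum(comb(c, 2) for c in Counter(real).values())
    sum_pred = sum(comb(c, 2) for c in Counter(predicted).values())

    expected = sum_real * sum_pred / comb(n, 2)   # agreement expected by chance
    max_index = (sum_real + sum_pred) / 2         # agreement of a perfect match
    if max_index == expected:
        # Degenerate case (e.g. a single segment in both): treated as perfect agreement.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


real = [0, 0, 0, 1, 1, 1, 2, 2, 2]
predicted = [0, 0, 1, 1, 1, 1, 2, 2, 2]   # one observation assigned to the wrong segment
print(adjusted_rand_index(real, predicted))
```

A value of 1 means the predicted segments reproduce the real ones exactly, while values near 0 mean the agreement is no better than chance.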

[Figure: observations 1 to 9 grouped into segments, shown in two panels: Reality and Prediction.]

Page 10: Sawtooth 2012   what's in a label

Size

Uniformity deviation: average deviation of each segment's share from the uniform segment size (1/number of segments).
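A minimal sketch of this measure. Using the absolute deviation and counting only segments that contain at least one observation are assumptions, since the slide gives no formula; the function name and the example are hypothetical.

```python
from collections import Counter


def uniformity_deviation(labels):
    """Average absolute deviation of segment shares from the uniform share 1/k.

    Assumption: k is the number of non-empty segments found in `labels`.
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    uniform_share = 1 / k
    return sum(abs(count / n - uniform_share) for count in counts.values()) / k


# Segments holding 50%, 30% and 20% of respondents.
labels = [0] * 5 + [1] * 3 + [2] * 2
print(uniformity_deviation(labels))
```

Perfectly balanced segments score 0, and the value grows as a few segments dominate or shrink.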

Page 11: Sawtooth 2012   what's in a label

Rheumatism

Osteoporosis

Software journey

Page 12: Sawtooth 2012   what's in a label

Stability Heterogeneity

Accuracy Homogeneity

H>S H>S

H>S H>S S>H

S>H S>H

Page 13: Sawtooth 2012   what's in a label

LC gives smaller segments

[Figure: segment sizes for Soft CCEA, Soft LC, Hard LC and Hard CCEA, in three panels: Rheumatism, Osteoporosis, Software journey.]

Page 14: Sawtooth 2012   what's in a label

MIXED EVIDENCE

Page 15: Sawtooth 2012   what's in a label

Fixed Factors

x 10100 100 100 100

Page 16: Sawtooth 2012   what's in a label

High confidence

Low confidence

High confidence

Low confidence

Page 17: Sawtooth 2012   what's in a label

Sim. Index soft > hard

Sim. Index hard > soft

Stability: SOFT is better

Strong similarity

Weak similarity

High confidence

Low confidence

Page 18: Sawtooth 2012   what's in a label

Homogeneity: SOFT is better

Scatter hard > soft

Strong similarity

Weak similarity

High confidence

Low confidence

Page 19: Sawtooth 2012   what's in a label

Heterogeneity: Hard is better

Tot. Sep. soft > hard

Strong similarity

Weak similarity

High confidence

Low confidence

Page 20: Sawtooth 2012   what's in a label

Size: Hard is better

Strong similarity

Weak similarity

Uni. dev. soft > hard

High confidence

Low confidence

Page 21: Sawtooth 2012   what's in a label

HARD ENSEMBLES GIVE BETTER BUSINESS SEGMENTS

Page 22: Sawtooth 2012   what's in a label

Do we cause rising questions?

Anita Prinzie, Nicole [email protected]

www.solutions2.be

Page 23: Sawtooth 2012   what's in a label

References

• Fred, A.L.N. and Jain, A.K. (2005), Combining Multiple Clusterings Using Evidence Accumulation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 835-850.

• Lange, T., Roth, V., Braun, M.L. and Buhmann, J.M. (2004), Stability-Based Validation of Clustering Solutions, Neural Computation, 16, 1299-1323.

• Halkidi, M., Vazirgiannis, M. and Batistakis, Y. (2000), Quality Scheme Assessment in the Clustering Process, Proc. of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, 265-276.

• Hubert, L. and Arabie, P. (1985), Comparing Partitions, Journal of Classification, 2(1), 193-218.

• Nieweglowski, L. (2007), CLV package, R software.

• Martin, A., Quinn, K.M. and Park, J.H. (2003-2012), Markov Chain Monte Carlo Package (MCMCpack), R software.