EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING
L.A. García-Escudero, A. Gordaliza, C. Matrán and A. Mayo-Iscar
Dpto. de Estadística e I.O., Universidad de Valladolid (SPAIN)
1.- Introduction
• Two key questions:
¦ How to adequately choose the number of groups in a clustering problem?
¦ How to measure the strength of data point cluster assignments?
• It is impossible to answer these two questions without:
¦ Stating clearly which probabilistic model is assumed.
¦ Putting constraints on the allowed clusters' scatters.
¦ Stating clearly what we understand by noise.
2.- Model-Based Clustering
• Many statistical practitioners view Cluster Analysis as a collection of mostly heuristic techniques for partitioning multivariate data.
• This view relies on the fact that most cluster techniques are not explicitly based on a probabilistic model:
“...lead the naive investigator into believing that he or she did not make any assumption at all, and that the results therefore are ‘objective’...” (Flury 1997, page 123)
⇒ A properly stated underlying probabilistic model is convenient
• Two model-based clustering approaches:

¦ Mixture approach:

$$\prod_{i=1}^{n}\Bigl[\,\sum_{j=1}^{k}\pi_j\,\phi(x_i;\theta_j)\Bigr] \;\Rightarrow\; \text{EM-algorithm}$$

(assign $x_i$ to cluster $j$ whenever $\pi_j\phi(x_i;\theta_j) > \pi_l\phi(x_i;\theta_l)$ for $l \neq j$).

¦ “Crisp” (0-1) approach:

$$\prod_{j=1}^{k}\,\prod_{i\in R_j}\phi(x_i;\theta_j) \;\Rightarrow\; \text{CEM-algorithm}$$

($R_j$ = indexes of the $x_i$'s assigned to cluster $j$).
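A minimal sketch of the CEM idea in the simplest setting, assuming a common spherical covariance $\sigma^2 I$ (so the C-step is k-means-like); the function name `cem`, the initialization and all defaults are ours, not from the talk:

```python
# Minimal CEM (Classification EM) sketch for Gaussian clusters with a
# common spherical covariance sigma^2 * I. Illustrative only: single
# start, no safeguard against clusters becoming empty.
import numpy as np
from scipy.stats import multivariate_normal

def cem(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    centers = X[rng.choice(n, k, replace=False)].astype(float)
    pi = np.full(k, 1.0 / k)          # group weights
    sigma2 = X.var()                  # common spherical scale
    for _ in range(n_iter):
        # C-step: crisp assignment maximizing pi_j * phi(x_i; theta_j)
        logd = np.stack([np.log(pi[j]) +
                         multivariate_normal.logpdf(X, centers[j],
                                                    sigma2 * np.eye(p))
                         for j in range(k)], axis=1)
        labels = logd.argmax(axis=1)
        # M-step: update weights, centers and the common scale
        for j in range(k):
            mask = labels == j
            pi[j] = mask.mean()
            if mask.any():
                centers[j] = X[mask].mean(axis=0)
        sigma2 = np.mean((X - centers[labels]) ** 2)
    return labels, centers, pi
```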
• Noise in real problems ⇒ Robust Clustering

• Two robust clustering approaches providing a “theoretical well-based clustering criterion in presence of outliers” (Bock 2002):

¦ Mixture modeling: The noise is fitted through mixture components (Fraley and Raftery, Peel and McLachlan, ...).

¦ Trimming approach: A fraction α of the most outlying data is trimmed (Gallegos and Ritter, Cuesta-Albertos et al., García-Escudero et al., Neykov et al., ...).

⇒ We will focus on the trimming approach!!
3.- TCLUST methodology
• Spurious-Outlier Model (Gallegos 2001 and Gallegos and Ritter 2005):

$$\Bigl[\prod_{j=1}^{k}\,\prod_{i\in R_j} f(x_i;\mu_j,\Sigma_j)\Bigr]\Bigl[\prod_{i\notin R} g_{\psi_i}(x_i)\Bigr]$$

¦ $f(\cdot;\mu,\Sigma)$ is a $p$-variate normal p.d.f.
¦ $R = \cup_{j=1}^{k} R_j$ contains the $[n(1-\alpha)]$ regular data points.
¦ The $g_{\psi_i}$ are some p.d.f.'s for the non-regular data.
• If no conditions are imposed on the Σj's ⇒ Not a well-defined problem.
• Restrictions are needed:

¦ Same spherical covariance matrices (i.e., $\Sigma_j = \sigma^2 \cdot I$) ⇒ Trimmed k-means (Cuesta-Albertos et al. 1997).

¦ Same (not necessarily spherical) covariance matrices ($\Sigma_j = \Sigma$) ⇒ Determinantal criteria (Gallegos and Ritter 2005).

¦ Different covariances but with equal scales ($|\Sigma_1| = \dots = |\Sigma_k|$) ⇒ Heterogeneous robust clustering (Gallegos 2001, 2003).
• A different constraint:

$$M_n = \max_{j=1,\dots,k}\,\max_{l=1,\dots,p}\lambda_l(\Sigma_j) \quad\text{and}\quad m_n = \min_{j=1,\dots,k}\,\min_{l=1,\dots,p}\lambda_l(\Sigma_j),$$

where $\lambda_l(\Sigma_j)$ are the eigenvalues of the $\Sigma_j$.

• Fix a constant $c$ such that

$$M_n/m_n \le c \quad\text{(eigenvalues-ratio restriction)}.$$

¦ $c$ controls the strength of the restriction:
· $c = 1$ ⇒ Trimmed k-means.
· Large $c$ ⇒ An almost unrestricted solution.

• It extends Hathaway's restrictions $\sigma_i^2 \le c \cdot \sigma_j^2$ for $1 \le i, j \le k$.
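One simple way to picture the eigenvalues-ratio restriction is to truncate all cluster eigenvalues into an interval $[m, c\cdot m]$. The sketch below does this with a crude fixed threshold `m`; the actual TCLUST algorithm (García-Escudero et al. 2008) determines the truncation level optimally, so this is only an illustrative assumption:

```python
# Toy enforcement of the eigenvalue-ratio restriction M_n/m_n <= c:
# truncate every eigenvalue into [m, c*m] for a common threshold m.
import numpy as np

def truncate_eigenvalues(covs, c, m=None):
    """covs: list of (p, p) covariance matrices; c: allowed ratio."""
    eig = [np.linalg.eigh(S) for S in covs]   # (eigenvalues, eigenvectors)
    all_vals = np.concatenate([vals for vals, _ in eig])
    if all_vals.max() <= c * all_vals.min():
        return covs                           # restriction already holds
    if m is None:
        m = np.median(all_vals)               # crude common threshold
    return [vecs @ np.diag(np.clip(vals, m, c * m)) @ vecs.T
            for vals, vecs in eig]
```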
• Weights: We consider group weights πj ∈ [0, 1].
• Trimming + Eigenvalue restrictions + Weights ⇒ TCLUST
(García-Escudero et al. (2008), Annals of Statistics, 36, 1324-1345)

¦ Existence of both theoretical and sample solutions.
¦ Consistency.
¦ Feasible algorithm.
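A compact sketch of one TCLUST-style concentration step (trim, reassign, update and re-impose the restriction), reusing `truncate_eigenvalues` from above. It is an illustrative simplification under our own conventions, not the published feasible algorithm:

```python
# One TCLUST-style iteration: trim the alpha fraction with the smallest
# best-cluster density, then update weights, means and covariances, and
# re-impose the eigenvalue-ratio restriction. Illustrative only.
import numpy as np
from scipy.stats import multivariate_normal

def tclust_step(X, centers, covs, pi, alpha, c):
    n, p = X.shape
    k = len(pi)
    logd = np.stack([np.log(pi[j]) +
                     multivariate_normal.logpdf(X, centers[j], covs[j])
                     for j in range(k)], axis=1)
    best = logd.max(axis=1)
    keep = best >= np.quantile(best, alpha)   # discard ~[n*alpha] points
    labels = logd.argmax(axis=1)
    for j in range(k):
        mask = keep & (labels == j)
        pi[j] = mask.sum() / keep.sum()       # weights over kept points
        if mask.sum() > 1:
            centers[j] = X[mask].mean(axis=0)
            covs[j] = np.cov(X[mask].T) + 1e-8 * np.eye(p)
    covs = truncate_eigenvalues(covs, c)      # enforce M_n/m_n <= c
    return centers, covs, pi, labels, keep
```

Iterating this step from several random starts and keeping the best solution is the usual pattern for algorithms of this type.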
3. Guidance in choosing k
• Many procedures for choosing k in “crisp” clustering are based on monitoring the size of the (log-)“likelihoods”:

$$k \mapsto \sum_{j=1}^{k}\sum_{i\in R_j}\log\phi(x_i;\theta_j) \quad\text{for } k = 1, 2, \dots$$

• Examples:

¦ $\Sigma_j = \sigma^2 I$ (k-means) ⇒ Friedman and Rubin 1967, Engelman and Hartigan 1969, Calinski and Harabasz 1974, ...

¦ $\Sigma_j = \Sigma$ (determinant criterion) ⇒ Marriott 1971, ...

• Trimmed versions can also be considered ⇒ García-Escudero et al. 2003.
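A sketch of how this monitoring might be coded, fitting a Gaussian to each cluster of a k-means partition; the helper name and the use of scikit-learn are our own choices:

```python
# Monitor the "crisp" classification log-likelihood over k, using k-means
# partitions and per-cluster Gaussian fits. Illustrative only.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.cluster import KMeans

def classification_loglik(X, k, seed=0):
    labels = KMeans(n_clusters=k, n_init=10,
                    random_state=seed).fit_predict(X)
    p = X.shape[1]
    total = 0.0
    for j in range(k):
        Xj = X[labels == j]
        cov = np.cov(Xj.T) + 1e-8 * np.eye(p)   # small ridge for stability
        total += multivariate_normal.logpdf(Xj, Xj.mean(axis=0), cov).sum()
    return total

# for k in (1, 2, 3): print(k, classification_loglik(X, k))
```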
• Drawback: The log-likelihoods strictly increase as k increases:
[Figure: scatter plots of the same data set (axes x1, x2) with the fitted groups for k = 1, k = 2 and k = 3.]
• Log-likelihoods: -4765.8 (k = 1) < -3530.1 (k = 2) < -3197.7 (k = 3)
• Figure: log-likelihood against the number of groups (k = 1, 2, 3).
• Solutions:
¦ Searching for an “elbow”.
¦ Nonlinear transformation by Sugar and James 2003.
• TCLUST does not suffer from this problem:

[Figure: TCLUST fits of the same data set (axes x1, x2) for k = 1, k = 2 and k = 3.]

• Log-likelihoods: -4765.8 (k = 1) < -4203.2 (k = 2) = -4203.2 (k = 3)

• Recall the presence of weights (which can be set to zero) ⇒ π3 = 0 in the k = 3 solution, so the k = 3 fit coincides with the k = 2 one!
• This fact was already noticed by Bryant (1991) when dealing with the so-called Penalized Classification Maximum Likelihood...
• Importance of the “trimming” and the scatter constraint:

[Figure: (a) k = 3, α = 0 and large c; (b) k = 2, α = 0.1 and moderate c. “◦” marks the trimmed points in panel (b).]
4.- Classification Trimmed Likelihood Curves
• Based on the TCLUST methodology, we monitor:

$$(\alpha, k) \mapsto L(\alpha, k) := \sum_{j=1}^{k} n_j \log \pi_j + \sum_{j=1}^{k}\sum_{i\in R_j}\log\phi(x_i;\theta_j),$$

for $k = 1, 2, \dots$ and $\alpha \in (0, 1)$.
• The smallest k such that L(α, k) ≃ L(α, k + 1) (for almost every α) ⇒ this k is a good choice for the number of groups.

• L(α, k) increases very quickly up to α = α0 ⇒ α0 is a good choice for the trimming level.

• The curves also provide information about the group sizes.
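A sketch of how these curves could be traced over a grid of (α, k), given any fitter returning a TCLUST-type solution in the format of the `tclust_step` sketch above (in R, the tclust package provides the ctlcurves() function for this purpose):

```python
# Classification trimmed likelihood curves L(alpha, k) over a grid.
# `fit_tclust(X, k, alpha)` is a placeholder returning
# (centers, covs, pi, labels, keep), e.g. by iterating tclust_step.
import numpy as np
from scipy.stats import multivariate_normal

def trimmed_loglik(X, centers, covs, pi, labels, keep):
    L = 0.0
    for j in range(len(pi)):
        mask = keep & (labels == j)
        if mask.any():
            L += mask.sum() * np.log(pi[j])   # n_j * log(pi_j) term
            L += multivariate_normal.logpdf(X[mask], centers[j],
                                            covs[j]).sum()
    return L

def ctl_curves(X, fit_tclust, ks=(1, 2, 3),
               alphas=np.linspace(0.0, 0.5, 11)):
    return {(a, k): trimmed_loglik(X, *fit_tclust(X, k, a))
            for k in ks for a in alphas}
```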
Example 1: Mixture $0.9 \cdot N\!\left(\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}4 & 1\\ 1 & 1\end{pmatrix}\right) + 0.1 \cdot N\!\left(\begin{pmatrix}5\\-5\end{pmatrix}, \begin{pmatrix}30 & 0\\ 0 & 30\end{pmatrix}\right)$
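A sketch to simulate data from the Example 1 mixture; the sample size and seed are our assumptions, since the talk does not state them:

```python
# Draw n points from 0.9*N((0,0), [[4,1],[1,1]]) + 0.1*N((5,-5), 30*I).
import numpy as np

rng = np.random.default_rng(0)
n = 1000                                   # assumed sample size
comp = rng.random(n) < 0.9                 # component indicator
X = np.where(comp[:, None],
             rng.multivariate_normal([0, 0], [[4, 1], [1, 1]], n),
             rng.multivariate_normal([5, -5], [[30, 0], [0, 30]], n))
```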
[Figure: curves L(α, k) for k = 1, 2, 3 with α ∈ [0, 0.5]; scatter plot of the (unrestricted) fit; and the differences between L(α, k) and L(α, k − 1) for k = 2 vs. 1 and k = 3 vs. 2.]
Example 2: $0.3 \cdot N\!\left(\begin{pmatrix}-5\\5\end{pmatrix}, \begin{pmatrix}4 & 1\\ 1 & 1\end{pmatrix}\right) + 0.3 \cdot N\!\left(\begin{pmatrix}5\\-5\end{pmatrix}, \begin{pmatrix}4 & -1\\ -1 & 1\end{pmatrix}\right) + 0.3 \cdot N\!\left(\begin{pmatrix}5\\5\end{pmatrix}, \begin{pmatrix}4 & 0\\ 0 & 4\end{pmatrix}\right) + 0.1 \cdot N\!\left(\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}30 & 0\\ 0 & 30\end{pmatrix}\right)$
[Figure: curves L(α, k) for k = 1, 2, 3, 4 with α ∈ [0, 0.6]; scatter plot of the (unrestricted) fit; and the differences for k = 2 vs. 1, k = 3 vs. 2 and k = 4 vs. 3.]
Example 3: “The topography of multivariate normal mixtures” (Ray and Lindsay 2005) ⇒ Mixture with 2 components.
[Figure: curves L(α, k) for k = 1, 2, 3 with α ∈ [0, 0.7]; scatter plot of the (unrestricted) fit; and the differences for k = 2 vs. 1 and k = 3 vs. 2.]
Mixture with 2 components and 3 modes:

[Figure: scatter plot of the data (axes x, y) and the fit with α = 0.5.]
5.- Strength of cluster assignments
• Confirmatory tool: Were the choices made for k and α satisfactory?

• The strength of the cluster assignment of observation $x_i$ to group $j$:

$$D_j(x_i;\theta) = \pi_j\phi(x_i;\theta_j)$$
• If $D_{(1)}(x;\theta) \le \dots \le D_{(k)}(x;\theta)$, define some Bayes factors as:

$$\mathrm{BF}(i) = \log\bigl(D_{(k-1)}(x_i;\theta)/D_{(k)}(x_i;\theta)\bigr).$$
¦ Small BF(i) ⇒ Clear cluster assignment for the observation xi.
• Bayes factors for trimmed points: Given the maximum possible strength

$$d_i = \max_{j=1,\dots,k}\{\pi_j\phi(x_i;\theta_j)\} = D_{(k)}(x_i;\theta),$$

we have:

¦ $d_{(1)} \le \dots \le d_{([n\alpha])} \le \dots \le d_{(n)}$ ⇒ the $[n\alpha]$ observations with the smallest $d_i$ are trimmed.

¦ Bayes factors for trimmed data ⇒ $\mathrm{BF}(i) = \log\bigl(D_{(k)}(x_i;\theta)/d_{([n\alpha])}\bigr)$.

¦ The smaller BF(i), the more clearly observation i should be trimmed.
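A sketch computing both kinds of Bayes factors from a fitted solution in the format of the earlier sketches; all BF(i) values are ≤ 0, with values close to 0 flagging the most doubtful decisions:

```python
# Assignment strengths D_j(x_i) = pi_j * phi(x_i; theta_j) and the two
# "Bayes factor" diagnostics, computed on the log scale (requires k >= 2).
import numpy as np
from scipy.stats import multivariate_normal

def bayes_factors(X, centers, covs, pi, keep):
    k = len(pi)
    logD = np.stack([np.log(pi[j]) +
                     multivariate_normal.logpdf(X, centers[j], covs[j])
                     for j in range(k)], axis=1)
    logD.sort(axis=1)                 # columns now D_(1) <= ... <= D_(k)
    n_trim = int((~keep).sum())       # [n*alpha] trimmed observations
    d_sorted = np.sort(logD[:, -1])   # ordered d_i = D_(k)(x_i)
    log_d_thr = d_sorted[n_trim - 1] if n_trim > 0 else -np.inf
    return np.where(keep,
                    logD[:, -2] - logD[:, -1],   # log(D_(k-1)/D_(k))
                    logD[:, -1] - log_d_thr)     # log(D_(k)/d_([n*alpha]))
```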
• Graphical display I: “Silhouette” plot
[Figure: (a) the clustered data (axes x1, x2) with k = 3 and α = 0.1; (b) “silhouette”-type plot of the Bayes factors by groups.]
• Graphical display II: “Most doubtful assignments”
¦ Label the observations i with BF(i) ≥ log(3/4):

[Figure: the data with the observations satisfying BF(i) > log(3/4) labelled.]
[PCA, discriminant or Bhattacharyya coordinates (Hennig and Christlieb 2002) if p > 2...]