View
223
Download
2
Category
Tags:
Preview:
Citation preview
Reza Bosagh Zadeh (Carnegie Mellon)
Shai Ben-David (Waterloo)
UAI 09, Montréal, June 2009
A UNIQUENESS THEOREMFOR CLUSTERING
TALK OUTLINE
Questions being addressed
Introduce Axioms
Introduce Properties
Uniqueness Theorem for Single-Linkage
Taxonomy of Partitioning Functions
Bosagh Zadeh, Ben-David, UAI 2009
WHAT IS CLUSTERING?
Given a collection of objects (characterized by feature vectors, or just a matrix of pair-wise similarities), detects the presence of distinct groups, and assign objects to groups.
40 45 50 55
74
76
78
80
82
84
Bosagh Zadeh, Ben-David, UAI 2009
SOME BASIC UNANSWERED QUESTIONS
Are there principles governing all clustering paradigms?
Which clustering paradigm should I use for a given task?
Bosagh Zadeh, Ben-David, UAI 2009
“Clustering” is an ill-defined problem
There are many different clustering tasks, leading to different clustering paradigms:
THERE ARE MANY CLUSTERING TASKS
Bosagh Zadeh, Ben-David, UAI 2009
“Clustering” is an ill-defined problem
There are many different clustering tasks, leading to different clustering paradigms:
THERE ARE MANY CLUSTERING TASKS
Bosagh Zadeh, Ben-David, UAI 2009
WE WOULD LIKE TO DISCUSS THE BROAD NOTION OF CLUSTERING
Independently of any particular algorithm, particular objective function, or particular generative data model
Bosagh Zadeh, Ben-David, UAI 2009
WHAT FOR?
Choosing a suitable algorithm for a given task.
Axioms: to capture intuition about clustering in general.
Expected to be satisfied by all clustering functions
Properties: to capture differences between different clustering paradigms
Bosagh Zadeh, Ben-David, UAI 2009
TIMELINE
Jardine, Sibson 1971 Considered only hierarchical functions
Kleinberg 2003 Presented an impossibility result
This paper Presents a uniqueness result for Single-Linkage
Bosagh Zadeh, Ben-David, UAI 2009
THE BASIC SETTING
For a finite domain set SS, a distance function is a symmetric mapping
d:SxSd:SxS → RR++ such that d(x,y)=0d(x,y)=0 iff x=yx=y.
A partitioning function takes a dissimilarity function on SS and returns a partition of SS.
We wish to define the axioms that distinguish clustering functions, from any other functions that output domain partitions.
Bosagh Zadeh, Ben-David, UAI 2009
KLEINBERG’S AXIOMS Scale Invariance F(F(λλd)=F(d)d)=F(d) for all d d and all strictly positive
λλ.
RichnessThe range of F(d)F(d) over all dd is the set of all possible partitionings
ConsistencyIf d’d’ equals dd except for shrinking
distances within clusters of F(d)F(d) or stretching between-cluster distances,
then F(d) = F(d’).F(d) = F(d’).
Bosagh Zadeh, Ben-David, UAI 2009
KLEINBERG’S AXIOMS Scale Invariance F(F(λλd)=F(d)d)=F(d) for all d d and all strictly positive
λλ.
RichnessThe range of F(d)F(d) over all dd is the set of all possible partitionings
ConsistencyIf d’d’ equals dd except for shrinking
distances within clusters of F(d)F(d) or stretching between-cluster distances,
then F(d) = F(d’).F(d) = F(d’).Inconsistent! No algorithm can satisfy all 3 of these.
Bosagh Zadeh, Ben-David, UAI 2009
CONSISTENT AXIOMS: Scale Invariance F(F(λλd, k)=F(d, k)d, k)=F(d, k) for all d d and all strictly positive
λλ.
k-RichnessThe range of F(d, k)F(d, k) over all dd is the set of all possible k-k-partitionings
ConsistencyIf d’d’ equals dd except for shrinking distances
within clusters of F(d, k)F(d, k) or stretching between-cluster distances,
then F(d, k)=F(d’, k).F(d, k)=F(d’, k).Consistent! (And satisfied by Single-Linkage, Min-Sum, …)
Fix kk
Bosagh Zadeh, Ben-David, UAI 2009
Definition. Call any partitioning function which satisfies
a Clustering Function
Scale Invariancek-RichnessConsistency
CLUSTERING FUNCTIONS
Bosagh Zadeh, Ben-David, UAI 2009
TWO CLUSTERING FUNCTIONS
Single-Linkage
1. Start with with all points in their own cluster
2. While there are more than k clusters Merge the two most
similar clusters
Similarity between two clusters is the similarity of the most similar two
points from differing clusters
Min-Sum k-Clustering
Find the k-partitioning Γ which minimizes
(Is NP-Hard to optimize)
Scale Invariancek-RichnessConsistency
Both Functions satisfy:
Hierarchical
Not Hierarchical
Proofs in paper.
Bosagh Zadeh, Ben-David, UAI 2009
CLUSTERING FUNCTIONS
Single-Linkage and Min-Sum are both Clustering functions.
How to distinguish between them in an Axiomatic framework? Use Properties
Not all properties are desired in every clustering situation: pick and choose properties for your task
Bosagh Zadeh, Ben-David, UAI 2009
PROPERTIES - ORDER-CONSISTENCY
Order-ConsistencyIf two datasets dd and d’d’ have the same
ordering of the distances, then for all kk, F(d, k)=F(d’, k)F(d, k)=F(d’, k)
Bosagh Zadeh, Ben-David, UAI 2009
o In other words the clustering function only cares about whether a pair of points are closer/further than another pair of points.
o Satisfied by Single-Linkage, Max-Linkage, Average-Linkage…
o NOT satisfied by most objective functions (Min-Sum, k-means, …)
PATH-DISTANCE
In other words, we find the path from x to y, which has the smallest longest jump in it.
1 2
7
14
e.g.
Pd( , ) = 2
Since the path from above has a jump of distance 2
Undrawn edges are large
Bosagh Zadeh, Ben-David, UAI 2009
PATH-DISTANCE
Imagine each point is an island, and we would like to go from island a to island b. As if we’re trying to cross a river by jumping on rocks.
Being human, we are restricted in how far we can jump from island to island.
Path-Distance would have us find the path with the smallest longest jump, ensuring that we could complete all the jumps successfully.
Bosagh Zadeh, Ben-David, UAI 2009
PROPERTIES – PATH-DISTANCE COHERENCE
Path-Distance CoherenceIf two datasets dd and d’d’ have the same
induced path distance then for all kk, F(d, k)=F(d’, k)F(d, k)=F(d’, k)
Bosagh Zadeh, Ben-David, UAI 2009
UNIQUENESS THEOREM
Theorem (This work) Single-Linkage is the only clustering function
satisfying Order-Consistency and Path Distance-Coherence
Bosagh Zadeh, Ben-David, UAI 2009
UNIQUENESS THEOREM
Theorem (This work) Single-Linkage is the only clustering function satisfying
Order-Consistency and Path-Distance-Coherence
Is Path-Distance-Coherence doing all the work? No. Consistency is necessary for uniqueness k-Richness is necessary
“X is Necessary”: All other axioms/properties satisfied, just X missing, still not enough to get uniqueness
Bosagh Zadeh, Ben-David, UAI 2009
PRACTICAL CONSIDERATIONS
Single-Linkage is not always the right function to use. Because Path-Distance-Coherence is not always
desirable.
It’s not always immediately obvious when we want a function to focus on the Path Distance
Introduce a different formulation involving Minimum Spanning Trees
Bosagh Zadeh, Ben-David, UAI 2009
2020
PROPERTIES - MST-COHERENCE
F
If
Then
MST-CoherenceIf two datasets dd and d’d’ have the same
Minimum Spanning Tree then for all kk, F(d, k)=F(d’, k)F(d, k)=F(d’, k)
, 2
F, 220
20 , 2
Bosagh Zadeh, Ben-David, UAI 2009
A TAXONOMY OF CLUSTERING FUNCTIONS
Min-Sum satisfies neither MST-Coherence nor Order-Consistency
Future work: Characterize other clustering functions
Bosagh Zadeh, Ben-David, UAI 2009
THANKS FOR YOUR ATTENTION!
Bosagh Zadeh, Ben-David, UAI 2009
Recommended