An Impossibility Theorem for Clustering
By Jon Kleinberg
Definitions
Clustering function: operates on a set S of n ≥ 2 points and the distances among them: f(S, d) = Γ, where Γ is a partition of S
Distance function: d : S × S → R; the distance is 0 only for d(i,i). Does not require the triangle inequality.
Many different clustering criteria
k-center, k-median, k-means, Inter-Intra, etc.
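k-center
Minimize the maximum distance from a point to its nearest center
k-median
Minimize the average distance from a point to its nearest center
k-means
Minimize the sum of squared distances from each point to its nearest center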
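To make the three objectives concrete, here is a minimal sketch that scores a fixed choice of centers for points on a line. The function names, the sample points and the candidate centers are assumptions made for illustration; they are not from the slides. The sketch only evaluates each objective and does not search for the best centers.

```python
def nearest_center_distances(points, centers):
    """Distance from every point to its closest center (points on a line)."""
    return [min(abs(p - c) for c in centers) for p in points]

def k_center_cost(points, centers):
    # k-center: the worst-case distance to the nearest center
    return max(nearest_center_distances(points, centers))

def k_median_cost(points, centers):
    # k-median: the average distance to the nearest center
    dists = nearest_center_distances(points, centers)
    return sum(dists) / len(dists)

def k_means_cost(points, centers):
    # k-means: the sum of squared distances to the nearest center
    return sum(x * x for x in nearest_center_distances(points, centers))

# Hypothetical data: five points and two candidate sets of k = 2 centers.
points = [0, 1, 4, 9, 10]
for centers in ([1, 9], [0, 10]):
    print(centers,
          k_center_cost(points, centers),
          k_median_cost(points, centers),
          k_means_cost(points, centers))
```

Each criterion would then pick the set of centers that minimizes its own score, which is why different criteria can prefer different clusterings of the same data.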
Inter-Intra
T(C)
D(C)
Maximize D(C) – T(C)
Motivation
Each criterion optimizes different features
Is there one clustering criterion with phenomenal cosmic powers?
Method
Give three intuitive axioms that any criterion should satisfy
Surprise: Not possible to satisfy all three
Reminiscent of Arrow’s Impossibility Theorem, where no ranking rule can satisfy a small set of natural axioms at once
Axiom 1 – Scale-Invariance
For any distance function d and any β > 0 we have that f(S,d) = f(S,βd)
Axiom 2 - Richness
Range(f) is equal to the set of all partitions of S
i.e. every possible clustering can be generated given the right distances
Axiom 3 - Consistency
Let d and d’ be two distance functions. If f(d) = Γ and d’ is such that the distance between any two points in the same cluster is no larger than in d and the distance between any two points in different clusters is no smaller than in d, then f(d’) = Γ
[Figure: d(i,j) compared with d’(i,j) for pairs of points within a cluster and between clusters]
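The condition in the Consistency axiom can be checked mechanically. The sketch below tests whether a new distance function shrinks (or keeps) every within-cluster distance and grows (or keeps) every between-cluster distance for a given partition. The helper names `is_gamma_transformation` and `line_metric`, and the four sample points, are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def is_gamma_transformation(points, partition, d, d_new):
    """True if d_new never increases a within-cluster distance and
    never decreases a between-cluster distance, relative to d."""
    cluster_of = {p: idx for idx, block in enumerate(partition) for p in block}
    for i, j in combinations(points, 2):
        same_cluster = cluster_of[i] == cluster_of[j]
        if same_cluster and d_new(i, j) > d(i, j):
            return False
        if not same_cluster and d_new(i, j) < d(i, j):
            return False
    return True

def line_metric(positions):
    """Distance function for points placed at the given positions on a line."""
    return lambda i, j: abs(positions[i] - positions[j])

# Hypothetical example: four points, clustered as {0, 1} and {2, 3}.
points = [0, 1, 2, 3]
gamma = [{0, 1}, {2, 3}]
d = line_metric([0, 1, 5, 6])
d_tighter = line_metric([0, 0.5, 7, 7.5])  # clusters tighter and further apart
d_looser = line_metric([0, 2, 5, 6])       # first cluster spread out

ok = is_gamma_transformation(points, gamma, d, d_tighter)
bad = is_gamma_transformation(points, gamma, d, d_looser)
print(ok, bad)
```

Consistency then says: whenever this check passes for Γ = f(d), the clustering function must return the same Γ on the new distances.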
Definition
Anti-chain: A collection of partitions is an anti-chain if it does not contain two distinct partitions such that one is a refinement of the other
A function whose range is an anti-chain cannot satisfy Richness: the set of all partitions contains partitions that refine one another
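A small sketch of the definition, assuming partitions are represented as sets of frozensets. The helpers `part`, `is_refinement` and `is_antichain` are illustrative names, not from the slides.

```python
def part(*blocks):
    """Build a partition as a frozenset of frozensets."""
    return frozenset(frozenset(b) for b in blocks)

def is_refinement(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(block <= big for big in coarse) for block in fine)

def is_antichain(partitions):
    """True if no two distinct partitions are related by refinement."""
    parts = list(partitions)
    return not any(p != q and is_refinement(p, q) for p in parts for q in parts)

# All five partitions of S = {1, 2, 3}.
singletons = part({1}, {2}, {3})
whole = part({1, 2, 3})
two_block = [part({1, 2}, {3}), part({1, 3}, {2}), part({2, 3}, {1})]
all_partitions = [singletons, whole] + two_block

print(is_antichain(two_block))       # the two-block partitions alone
print(is_antichain(all_partitions))  # the full set of partitions
```

The two-block partitions form an anti-chain, but the full set does not, because the all-singletons partition refines every other one. This is why a range that is an anti-chain cannot be rich.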
Main Result
For each n ≥ 2, there is no clustering function f that satisfies Scale-Invariance, Richness and Consistency
Implied by proof that if f satisfies Scale-Invariance and Consistency, then Range(f) is an anti-chain
Reminder of Axioms
Scale-Invariance: For any distance function d and any β > 0 we have that f(d) = f(βd)
Richness: Range(f) is equal to the set of all partitions of S
Consistency: Let d and d’ be two distance functions. If f(d) = Γ and d’ is such that the distance between any two points in the same cluster is no larger than in d and the distance between any two points in different clusters is no smaller than in d, then f(d’) = Γ
Single Linkage
Cluster by repeatedly linking the closest pair of points that are not yet in the same cluster
[Figure: points on a line at 0, 1, 4, 9, 10, 12, 15, 19, 20]
Any two axioms
For every pair of axioms, there is a stopping condition for single linkage that satisfies both
Consistency + Richness: only link if distance is less than r
Consistency + SI: stop when you have k connected components
Richness + SI: if x is the diameter of the graph, only add edges with weight at most βx, for a fixed β < 1
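The three stopping conditions can be sketched with one single-linkage routine that takes the stopping rule as a parameter. This is an illustrative implementation using union-find over edges sorted by weight; the function name `single_linkage`, the `can_link` callback, and the parameter values r = 3, k = 3 and β = 0.125 are assumptions. The points are the ones listed on the Single Linkage slide.

```python
from itertools import combinations

def single_linkage(points, dist, can_link):
    """Link pairs in order of increasing distance while can_link allows it.
    can_link(weight, components) decides whether the next edge may be used."""
    parent = {p: p for p in points}

    def find(p):
        # Follow parent pointers to the component representative.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    edges = sorted(combinations(points, 2), key=lambda e: dist(*e))
    components = len(points)
    for a, b in edges:
        if not can_link(dist(a, b), components):
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1

    clusters = {}
    for p in points:
        clusters.setdefault(find(p), []).append(p)
    return sorted(clusters.values())

points = [0, 1, 4, 9, 10, 12, 15, 19, 20]
dist = lambda a, b: abs(a - b)
diameter = max(dist(a, b) for a, b in combinations(points, 2))

# Consistency + Richness: only link if distance is less than r (here r = 3).
by_distance = single_linkage(points, dist, lambda w, c: w < 3)
# Consistency + Scale-Invariance: stop at k components (here k = 3).
by_count = single_linkage(points, dist, lambda w, c: c > 3)
# Richness + Scale-Invariance: only use edges of weight at most beta * diameter.
by_scale = single_linkage(points, dist, lambda w, c: w <= 0.125 * diameter)

print(by_distance)
print(by_count)
print(by_scale)
```

Multiplying every point by a constant leaves the k-component and β-scaled results unchanged as groupings, while the fixed threshold r does not scale with the data, which matches the slide's pairing of each rule with the axioms it satisfies.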
Centroid-Based Clustering
(k,g)-centroid clustering function: Choose T, a set of k centroid points, such that Σ_{i∈S} g(d(i,T)) is minimized, where d(i,T) is the distance from i to its nearest centroid in T
If g is identity, we get k-median, etc.
Result: For every k ≥ 2, every function g, and n significantly larger than k, the (k,g)-centroid clustering function does not satisfy Consistency.
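Proof: A contradiction
[Figure: two groups of points, X (size m) and Y (size λm). Points within X are at distance r, points within Y at distance ε, and points in different groups at distance r+δ.]
With one centroid in X and one in Y: Σ_{i∈S} g(d(i,T)) ≈ m·g(r) + λm·g(ε)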
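A new distance function
[Figure: X is split into X0 (size m/2) and X1 (size m/2). Points within X0 or within X1 are now at distance r’ < r; points in X0 and X1 remain at distance r from each other. Y (size λm) is unchanged: distance ε within Y and r+δ to X.]
With one centroid in X0 and one in X1: Σ_{i∈S} g(d(i,T)) ≈ m·g(r’) + λm·g(r+δ)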
Wrapping Up
If we pick λ, r, r’, ε and δ right then we can have: m·g(r) + λm·g(ε) > m·g(r’) + λm·g(r+δ)
But then our new centers are in X0 and X1
But the new distance function only shrinks distances inside X, so by Consistency it should still give us X and Y.
This covers the case where k is 2.
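The k = 2 argument can be replayed numerically on a tiny instance. The sketch below brute-forces the best pair of centroids with g as the identity (k-median), first under the original distances and then under the modified ones. The concrete sizes and values (m = 6, two points in Y, r = 10, r’ = 1, δ = 1, ε = 1), the tie-breaking rule and all function names are assumptions picked for illustration; they are not from the slides.

```python
from itertools import combinations

def centroid_clustering(points, dist, k, g=lambda x: x):
    """Brute-force (k,g)-centroid clustering: try every set T of k points,
    keep the cheapest, then assign each point to its nearest centroid
    (ties go to the first centroid in T)."""
    best_cost, best_T = None, None
    for T in combinations(points, k):
        cost = sum(g(min(dist(i, t) for t in T)) for i in points)
        if best_cost is None or cost < best_cost:
            best_cost, best_T = cost, T
    clusters = {t: set() for t in best_T}
    for i in points:
        nearest = min(best_T, key=lambda t: dist(i, t))
        clusters[nearest].add(i)
    return frozenset(frozenset(c) for c in clusters.values())

# Assumed instance: X = X0 ∪ X1 has m = 6 points, Y has 2 points.
X0, X1, Y = {0, 1, 2}, {3, 4, 5}, {6, 7}
X = X0 | X1
points = sorted(X | Y)
r, r_new, delta, eps = 10, 1, 1, 1

def d(i, j):
    """Original distances: r within X, eps within Y, r + delta across."""
    if i == j:
        return 0
    if i in X and j in X:
        return r
    if i in Y and j in Y:
        return eps
    return r + delta

def d_new(i, j):
    """Modified distances: r_new within X0 or within X1, otherwise as in d."""
    if i != j and ((i in X0 and j in X0) or (i in X1 and j in X1)):
        return r_new
    return d(i, j)

before = centroid_clustering(points, d, k=2)
after = centroid_clustering(points, d_new, k=2)
# d_new never increases any distance and leaves X-to-Y distances unchanged.
only_shrinks = all(d_new(i, j) <= d(i, j) for i in points for j in points)

print(sorted(sorted(c) for c in before))
print(sorted(sorted(c) for c in after))
```

Under d the result is {X, Y}. Under d_new only distances inside the cluster X shrink, yet the cheapest centroids move into X0 and X1 and the partition changes, which is the violation of Consistency the proof describes.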
Discussion: Relaxing Axioms
Refinement-consistency: if d’ is an f(d)-transformation of d (within-cluster distances shrink, between-cluster distances grow), then f(d’) is a refinement of f(d)
Near-Richness: all partitions except the trivial one can be obtained
With these two relaxations, there is a clustering function that satisfies Scale-Invariance, Refinement-consistency and Near-Richness.
What other relaxations could we have?
Discussion
Does this mean there is a law of continuous employment for clustering criterion creators?
Is the clustering function properly defined? (Allow overlaps? Allow outliers?)
Are these the right axioms? (All partitions possible vs. power set)
Axioms for graph clustering?
Questions?