Yinyang K-Means: A Drop-In Replacement of the Classic K-Means
with Consistent Speedup
Yufei Ding, Yue Zhao, Xipeng Shen (North Carolina State University, USA)
Madanlal Musuvathi, Todd Mytkowicz (Microsoft Research, Redmond, USA)
Popular Clustering Algorithm: K-Means
One of the most popular algorithms for clustering.
Decades of use across many domains since Lloyd proposed it in 1957.
Classic Algorithm [by Lloyd]
Set initial centers → Assign points to clusters based on d(p, c) ∀p, c → Update centers w/ new centroids → repeat until convergence
Group N points into K clusters.
Each assignment step calculates N*K distances, which causes slow clustering for large problems.
Are all distance calculations necessary?
If c is not the closest center to p, it’s safe to skip computing d(p, c).
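For reference, a minimal sketch of the classic algorithm in Python/NumPy (function and variable names are illustrative, not from the paper):

import numpy as np

def lloyd_kmeans(points, centers, max_iter=100):
    # Classic Lloyd's K-Means: every assignment step computes
    # all N*K point-to-center distances.
    for _ in range(max_iter):
        # Assignment step: d(p, c) for every point p and center c.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to its cluster's centroid
        # (keep the old center if a cluster becomes empty).
        new_centers = np.array([points[labels == k].mean(axis=0)
                                if np.any(labels == k) else centers[k]
                                for k in range(len(centers))])
        if np.allclose(new_centers, centers):  # convergence
            break
        centers = new_centers
    return labels, centers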
Prior Work
• K-D Tree [Kanungo et al., 2002]
  • Slowdowns for high dimensions
• Incremental Optimization [Elkan, 2003; Drake & Hamerly, 2012; Hamerly, 2010]
  • Use the triangle inequality to filter out unnecessary distance calculations
  • Memory cost many times that of the input
  • Slowdowns in some cases (medium dim, large K & N)
• Approximation [Wang et al., 2012]
  • Produces different results from Lloyd's K-Means
  • Cannot inherit the trust the classic algorithm has earned
Lloyd’s K-Means remains the dominant choice in practice!
Goal of This Work
• Speed: Much faster in all settings (N, K, D)
• Trust: Same results as Lloyd’s K-Means produces
• Simplicity: Easy to implement and deploy
Develop a drop-in replacement of Lloyd’s K-Means.
Overview of Yinyang K-Means
Minimize # of distance calculations:
• Carefully maintain and leverage lower bounds & upper bounds
• A 3-level filter
• Space-conscious elastic design
A harmony w/ contrary forces, like Yin and Yang.
On average, 9.36X faster than classic K-Means; no slowdown regardless of N, K, or dim.
Triangle Inequality
• Fundamental tool for getting bounds
d(x, c’) - d(c’, c) ≤ d(x, c) ≤ d(x, c’) + d(c’, c)
x: data point; c’: center in iteration i; c: center in iteration i+1
lower bound: lb(x, c); upper bound: ub(x, c)
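As a sketch, these bounds can be carried from one iteration to the next without recomputing any distances (Python; the drift names are illustrative):

def loosen_bounds(ub_prev, lb_prev, drift_assigned, drift_other):
    # After a center moves by d(c, c'), the triangle inequality keeps
    # the old bounds valid once they are loosened by the drift:
    #   d(x, c) <= d(x, c') + d(c, c')  -> upper bound grows
    #   d(x, c) >= d(x, c') - d(c, c')  -> lower bound shrinks
    ub = ub_prev + drift_assigned  # drift of x's assigned center
    lb = lb_prev - drift_other     # drift bound for the other center
    return ub, lb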
Assignment Step
• Three levels of filters: global filter → group filter → local filter
• Overhead vs. benefits
Global Filtering
• One single check to decide whether a point can possibly change its center assignment.
[Figure: point x and centers c1…c4 moving from their prior positions c’1…c’4.]
Prior iter: G’ = {c’1, c’2, …, c’k} - c’1
This iter: G = {c1, c2, …, ck} - c1
Question: Is c1 the closest center for x in this iteration?
Global Filtering
• Compute ub(x, c1) and lb(x, G):
  ub(x, c1) = ub(x, c’1) + 𝛥c1
  lb(x, G) = lb(x, G’) - maxdelta
  where 𝛥c1 = d(c1, c’1) and maxdelta = Max(𝛥(c)) ∀c
• Principle I: if ub(x, c1) ≤ lb(x, G), then x does not change its assignment.
• One single check decides whether a point can possibly change its center assignment.
Orig time: O(n•k•d)
Global Filtering
Overhead:
• Space cost: O(n) to store bounds (orig space: O(n•d))
• Time cost: O(k•d + n) to compute center shifts and bounds
ub(x, c1) = ub(x, c’1) + 𝛥c1; lb(x, G) ≤ lb(x, G’) - maxdelta
Benefits: ~60% of redundant distance computations can be removed
• Limiting factor: maxdelta = Max(𝛥(c)) ∀c
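A sketch of one assignment pass with the global filter (Python/NumPy; a simplified illustration of Principle I, not the paper's implementation):

import numpy as np

def assign_with_global_filter(points, centers, labels, ub, lb, drifts):
    # ub[i]: upper bound on d(x_i, its assigned center)
    # lb[i]: lower bound on d(x_i, every other center)
    # drifts[j]: how far center j moved in the last update, d(c_j, c'_j)
    ub = ub + drifts[labels]   # ub(x, c1) = ub(x, c'1) + delta(c1)
    lb = lb - drifts.max()     # lb(x, G)  = lb(x, G')  - maxdelta
    for i in np.flatnonzero(ub > lb):  # only survivors of the filter
        d = np.linalg.norm(points[i] - centers, axis=1)  # full K distances
        order = np.argsort(d)
        labels[i] = order[0]
        ub[i] = d[order[0]]    # tighten: exact distance to the new center
        lb[i] = d[order[1]]    # exact distance to the runner-up
    return labels, ub, lb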
Group Filtering
• Grouping centers into m groups: {G1, G2, …, Gm}
• Rules to compute ub(x, c1) and lb(x, Gi):
  ub(x, c1) = ub(x, c’1) + 𝛥c1
  lb(x, Gi) ≤ lb(x, G’i) - maxdelta(Gi)
  where 𝛥c1 = d(c1, c’1) and maxdelta(Gi) = Max(𝛥(c)) ∀c ∊ Gi
• Group filtering:
  Principle II: if ub(x, c1) ≤ lb(x, Gi), then no center in Gi can be closest to x.
[Figure: centers partitioned into groups G1, G2, G3; each group’s lower bound is loosened only by that group’s maximum center drift.]
Group Filtering
• How to do the grouping?
  Partial K-Means on the initial centers (one-time preprocessing)
  Overhead: O(k•m•d•iter) (we set iter to 5)
Choosing m (elasticity):
• m = k/10 if space allows
• the largest value that fits in memory otherwise
Orig time: O(n•k•d)
Group Filtering
• Efficiency and overhead
Overhead:
• Space cost: O(n•m) to maintain bounds across iterations (orig space: O(n•d))
• Time cost: O(k•d + n•m) to compute center shifts and bounds
ub(x, c1) = ub(x, c’1) + 𝛥c1; lb(x, Gi) ≤ lb(x, G’i) - maxdelta(Gi)
Efficiency: ~80% of redundant distance computations can be removed
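A sketch of the group-level check for a single point (Python; lb_groups and group_drifts are illustrative names for the per-group state):

def surviving_groups(ub_x, lb_groups, group_drifts):
    # lb_groups[g]: lower bound on d(x, c) over all centers c in group G_g
    # group_drifts[g]: maxdelta(G_g), the largest center drift within G_g
    lb_groups = [lb - drift for lb, drift in zip(lb_groups, group_drifts)]
    # Principle II: if ub(x, c1) <= lb(x, G_g), no center in G_g can be
    # closest to x, so the whole group is skipped at once.
    survivors = [g for g, lb in enumerate(lb_groups) if ub_x > lb]
    return survivors, lb_groups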
Local Filtering
Check each remaining center cj in a surviving group Gi, and skip cj if min2(x) ≤ lb(x, Gi) - 𝛥cj.
min2(x): the second-shortest distance from x found so far.
No extra space cost!
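A sketch of the local check inside one surviving group (Python/NumPy; illustrative names):

import numpy as np

def local_filter(x, group_centers, lb_group, drifts, min2):
    # min2: distance from x to its second-closest center found so far.
    # lb_group - drifts[j] still lower-bounds d(x, c_j) this iteration,
    # so if it is already >= min2, c_j cannot affect x's assignment.
    remaining = []
    for j, c in enumerate(group_centers):
        if min2 <= lb_group - drifts[j]:
            continue  # filtered: no distance computed for c_j
        remaining.append((j, np.linalg.norm(x - c)))  # must compute d(x, c_j)
    return remaining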
Update Step
Set initial centers → Assign points to clusters based on d(p, c) ∀p, c → Update centers w/ new centroids → Convergence
For how the update step is optimized, see the paper.
Evaluation
• Compared to three other methods:
  • Classic (Lloyd’s) K-Means
  • Elkan’s K-Means [2003]
  • Drake’s K-Means [2012]
• Input: real-world data sets (with different N, K, Dim)
  • 4 from the UCI machine learning repository [Bache & Lichman, 2013]
  • 4 other commonly used image data sets [Wang et al., 2012]
• Implemented in GraphLab (a map-reduce framework)
• Two machines:
  • 16GB memory, 8-core i7-3770K processor
  • 4GB memory, 4-core Core2 CPU
Baseline: classic K-Means (16GB, 8-core Intel Ivy Bridge machine)
[Figures: speedup (X) of Yinyang K-Means over the baseline across the data sets.]
Please read our paper for details!
Final Takeaways
• Yinyang K-Means: a drop-in replacement
  • Consistently much faster
    • Minimizes distance calculations (via a Yinyang harmony)
    • Superior cost-benefit tradeoff
    • Elastic design
  • Inherited trust
    • Produces the same results as the classic K-Means
Code: http://research.csc.ncsu.edu/nc-caps/yykmeans.tar.bz2