Yinyang K-Means: A Drop-In Replacement of the Classic K-Means
with Consistent Speedup
Yufei Ding, Yue Zhao, Xipeng Shen (North Carolina State University, USA)
Madanlal Musuvathi, Todd Mytkowicz (Microsoft Research, Redmond, USA)
Popular Clustering Algorithm: K-Means
One of the most popular algorithms for clustering.
Decades of use across many domains since Lloyd proposed it in 1957.
Classic Algorithm [by Lloyd]
Set initial centers → Assign points to clusters based on d(p, c) ∀p, c → Update centers w/ new centroids → repeat until convergence
Group N points into K clusters.
Each assignment step calculates N*K distances, which causes slow clustering for large problems.
Are all distance calculations necessary?
If c is not the closest center to p, it’s safe to skip computing d(p, c).
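For reference, a minimal sketch of the classic algorithm in Python/NumPy (function and variable names are illustrative, not from the paper):

import numpy as np

def lloyd_kmeans(points, centers, max_iter=100):
    # Classic Lloyd's K-Means: every assignment step computes
    # all N*K point-to-center distances.
    for _ in range(max_iter):
        # Assignment step: d(p, c) for every point p and center c.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to its cluster's centroid
        # (keep the old center if a cluster becomes empty).
        new_centers = np.array([points[labels == k].mean(axis=0)
                                if np.any(labels == k) else centers[k]
                                for k in range(len(centers))])
        if np.allclose(new_centers, centers):  # convergence
            break
        centers = new_centers
    return labels, centers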
Prior Work
• K-D Tree [Kanungo et al., 2002]
  • Slowdowns for high dimensions
• Incremental Optimization [Elkan, 2003; Drake & Hamerly, 2012; Hamerly, 2010]
  • Use the triangle inequality to filter out unnecessary distance calculations
  • Memory cost many times that of the input
  • Slowdowns in some cases (medium dim, large K & N)
• Approximation [Wang et al., 2012]
  • Produces different results from Lloyd's K-Means
  • Cannot inherit the trust the classic algorithm has earned
Lloyd’s K-Means remains the dominant choice in practice!
Goal of This Work
• Speed: Much faster in all settings (N, K, D)
• Trust: Same results as Lloyd’s K-Means produces
• Simplicity: Easy to implement and deploy
Develop a drop-in replacement of Lloyd’s K-Means.
Overview of Yinyang K-Means
Minimize # of distance calculations:
• Carefully maintain and leverage lower bounds & upper bounds
• A 3-level filter
• Space-conscious elastic design
A harmony w/ contrary forces, like Yin and Yang.
On average, 9.36X faster than classic K-Means; no slowdown regardless of N, K, or dim.
Triangle Inequality
• Fundamental tool for getting bounds
d(x, c’) - d(c’, c) ≤ d(x, c) ≤ d(x, c’) + d(c’, c)
x: data point; c’: center in iteration i; c: center in iteration i+1
lower bound: lb(x, c); upper bound: ub(x, c)
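As a sketch, these bounds can be carried from one iteration to the next without recomputing any distances (Python; the drift names are illustrative):

def loosen_bounds(ub_prev, lb_prev, drift_assigned, drift_other):
    # After a center moves by d(c, c'), the triangle inequality keeps
    # the old bounds valid once they are loosened by the drift:
    #   d(x, c) <= d(x, c') + d(c, c')  -> upper bound grows
    #   d(x, c) >= d(x, c') - d(c, c')  -> lower bound shrinks
    ub = ub_prev + drift_assigned  # drift of x's assigned center
    lb = lb_prev - drift_other     # drift bound for the other center
    return ub, lb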
Assignment Step
• Three levels of filters: global filter → group filter → local filter
• Overhead vs. benefits
Global Filtering
• One single check to decide whether a point can possibly change its center assignment.
[Figure: point x and centers c1…c4 moving from their prior positions c’1…c’4.]
Prior iter: G’ = {c’1, c’2, …, c’k} - c’1
This iter: G = {c1, c2, …, ck} - c1
Question: Is c1 the closest center for x in this iteration?
Global Filtering
• Compute ub(x, c1) and lb(x, G):
  ub(x, c1) = ub(x, c’1) + 𝛥c1
  lb(x, G) = lb(x, G’) - maxdelta
  where 𝛥c1 = d(c1, c’1) and maxdelta = Max(𝛥(c)) ∀c
• Principle I: if ub(x, c1) ≤ lb(x, G), then x does not change its assignment.
• One single check decides whether a point can possibly change its center assignment.
Orig time: O(n•k•d)
Global Filtering
Overhead:
• Space cost: O(n) to store bounds (orig space: O(n•d))
• Time cost: O(k•d + n) to compute center shifts and bounds
ub(x, c1) = ub(x, c’1) + 𝛥c1; lb(x, G) ≤ lb(x, G’) - maxdelta
Benefits: ~60% of redundant distance computations can be removed
• Limiting factor: maxdelta = Max(𝛥(c)) ∀c
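A sketch of one assignment pass with the global filter (Python/NumPy; a simplified illustration of Principle I, not the paper's implementation):

import numpy as np

def assign_with_global_filter(points, centers, labels, ub, lb, drifts):
    # ub[i]: upper bound on d(x_i, its assigned center)
    # lb[i]: lower bound on d(x_i, every other center)
    # drifts[j]: how far center j moved in the last update, d(c_j, c'_j)
    ub = ub + drifts[labels]   # ub(x, c1) = ub(x, c'1) + delta(c1)
    lb = lb - drifts.max()     # lb(x, G)  = lb(x, G')  - maxdelta
    for i in np.flatnonzero(ub > lb):  # only survivors of the filter
        d = np.linalg.norm(points[i] - centers, axis=1)  # full K distances
        order = np.argsort(d)
        labels[i] = order[0]
        ub[i] = d[order[0]]    # tighten: exact distance to the new center
        lb[i] = d[order[1]]    # exact distance to the runner-up
    return labels, ub, lb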
Group Filtering
• Grouping centers into m groups: {G1, G2, …, Gm}
• Rules to compute ub(x, c1) and lb(x, Gi):
  ub(x, c1) = ub(x, c’1) + 𝛥c1
  lb(x, Gi) ≤ lb(x, G’i) - maxdelta(Gi)
  where 𝛥c1 = d(c1, c’1) and maxdelta(Gi) = Max(𝛥(c)) ∀c ∊ Gi
• Group filtering:
  Principle II: if ub(x, c1) ≤ lb(x, Gi), then no center in Gi can be closest to x.
[Figure: centers partitioned into groups G1, G2, G3; each group’s lower bound is loosened only by that group’s maximum center drift.]
Group Filtering
• How to do the grouping?
  Partial K-Means on the initial centers (one-time preprocessing)
  Overhead: O(k•m•d•iter) (we set iter to 5)
Choosing m (elasticity):
• m = k/10 if space allows
• the largest value that fits in memory otherwise
Orig time: O(n•k•d)
Group Filtering
• Efficiency and overhead
Overhead:
• Space cost: O(n•m) to maintain bounds across iterations (orig space: O(n•d))
• Time cost: O(k•d + n•m) to compute center shifts and bounds
ub(x, c1) = ub(x, c’1) + 𝛥c1; lb(x, Gi) ≤ lb(x, G’i) - maxdelta(Gi)
Efficiency: ~80% of redundant distance computations can be removed
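A sketch of the group-level check for a single point (Python; lb_groups and group_drifts are illustrative names for the per-group state):

def surviving_groups(ub_x, lb_groups, group_drifts):
    # lb_groups[g]: lower bound on d(x, c) over all centers c in group G_g
    # group_drifts[g]: maxdelta(G_g), the largest center drift within G_g
    lb_groups = [lb - drift for lb, drift in zip(lb_groups, group_drifts)]
    # Principle II: if ub(x, c1) <= lb(x, G_g), no center in G_g can be
    # closest to x, so the whole group is skipped at once.
    survivors = [g for g, lb in enumerate(lb_groups) if ub_x > lb]
    return survivors, lb_groups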
Local Filtering
Check each remaining center cj in a surviving group Gi, and skip cj if min2(x) ≤ lb(x, Gi) - 𝛥cj.
min2(x): the second-shortest distance from x found so far.
No extra space cost!
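A sketch of the local check inside one surviving group (Python/NumPy; illustrative names):

import numpy as np

def local_filter(x, group_centers, lb_group, drifts, min2):
    # min2: distance from x to its second-closest center found so far.
    # lb_group - drifts[j] still lower-bounds d(x, c_j) this iteration,
    # so if it is already >= min2, c_j cannot affect x's assignment.
    remaining = []
    for j, c in enumerate(group_centers):
        if min2 <= lb_group - drifts[j]:
            continue  # filtered: no distance computed for c_j
        remaining.append((j, np.linalg.norm(x - c)))  # must compute d(x, c_j)
    return remaining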
Update Step
Set initial centers → Assign points to clusters based on d(p, c) ∀p, c → Update centers w/ new centroids → Convergence
For how the update step is optimized, see the paper.
Evaluation
• Compared to three other methods:
  • Classic (Lloyd’s) K-Means
  • Elkan’s K-Means [2003]
  • Drake’s K-Means [2012]
• Input: real-world data sets (with different N, K, Dim)
  • 4 from the UCI machine learning repository [Bache & Lichman, 2013]
  • 4 other commonly used image data sets [Wang et al., 2012]
• Implemented in GraphLab (a map-reduce framework)
• Two machines:
  • 16GB memory, 8-core i7-3770K processor
  • 4GB memory, 4-core Core2 CPU
Baseline: classic K-Means (16GB, 8-core Intel Ivy Bridge machine)
[Figures: speedup (X) of Yinyang K-Means over the baseline across the data sets.]
Please read our paper for details!
Final Takeaways
• Yinyang K-Means: a drop-in replacement
  • Consistently much faster
    • Minimizes distance calculations (via a Yinyang harmony)
    • Superior cost-benefit tradeoff
    • Elastic design
  • Inherited trust
    • Produces the same results as the classic K-Means
Code: http://research.csc.ncsu.edu/nc-caps/yykmeans.tar.bz2