BICO: BIRCH meets Coresets for k-meansls2- · Introduction Insights from BIRCH Coreset Theory BICO...

Preview:

Citation preview

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO: BIRCH meets Coresets for k -means

Hendrik Fichtenberger, Marc Gille, Melanie Schmidt, ChrisSchwiegelshohn, Christian Sohler

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Clustering Algorithms: Practice and Theory

The k -means ProblemGiven a point set P ⊆ Rd ,compute a set C ⊆ Rd

with |C| = k centerswhich minimizescost(P,C)

=∑p∈P

minc∈C||c − p||2,

the sum of the squareddistances.

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Clustering Algorithms: Practice and Theory

Popular k -means algorithms. . .Lloyd’s algorithm (1982)k -means++ (2007)several approximation algorithms (recent)

. . . for Big DataBIRCH (1996)MacQueen’s k -means (1967)several approximations using coresets (recent)

Implementability, Speed and good quality?

Stream-KM++ (2010)BICO

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Clustering Algorithms: Practice and Theory

Popular k -means algorithms. . .Lloyd’s algorithm (1982)k -means++ (2007)several approximation algorithms (recent)

. . . for Big DataBIRCH (1996)MacQueen’s k -means (1967)several approximations using coresets (recent)

Implementability, Speed and good quality?

Stream-KM++ (2010)BICO

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Clustering Algorithms: Practice and Theory

Popular k -means algorithms. . .Lloyd’s algorithm (1982)k -means++ (2007)several approximation algorithms (recent)

. . . for Big DataBIRCH (1996)MacQueen’s k -means (1967)several approximations using coresets (recent)

Implementability, Speed and good quality?

Stream-KM++ (2010)BICO

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Clustering Algorithms: Practice and Theory

Popular k -means algorithms. . .Lloyd’s algorithm (1982) moderate speedk -means++ (2007) moderate speed & qualityseveral approximation algorithms (recent) quality

. . . for Big DataBIRCH (1996) fastMacQueen’s k -means (1967) fastseveral approximations using coresets (recent) quality

Implementability, Speed and good quality?

Stream-KM++ (2010) next slidesBICO next slides

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Clustering Algorithms: Practice and Theory

Popular k -means algorithms. . .Lloyd’s algorithm (1982) moderate speedk -means++ (2007) moderate speed & qualityseveral approximation algorithms (recent) quality

. . . for Big DataBIRCH (1996) fastMacQueen’s k -means (1967) fastseveral approximations using coresets (recent) quality

Implementability, Speed and good quality?Stream-KM++ (2010) next slides

BICO next slides

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Clustering Algorithms: Practice and Theory

Popular k -means algorithms. . .Lloyd’s algorithm (1982) moderate speedk -means++ (2007) moderate speed & qualityseveral approximation algorithms (recent) quality

. . . for Big DataBIRCH (1996) fastMacQueen’s k -means (1967) fastseveral approximations using coresets (recent) quality

Implementability, Speed and good quality?Stream-KM++ (2010) next slidesBICO next slides

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Clustering Algorithms: Practice and Theory

Costs

3e+012

3.5e+012

4e+012

4.5e+012

5e+012

5.5e+012

6e+012

6.5e+012

7e+012

15 20 25 30

Cos

ts

Number of Centers

BigCross

StreamKM++BICO

MacQueenBIRCH

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Clustering Algorithms: Practice and Theory

Running Time

0

200

400

600

800

1000

1200

10 20 30 40 50

Run

ning

Tim

e

Number of Centers

Census

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

Ideastart with BIRCH for the basic design because it is very fastanalyze its flawsdevelop an improved algorithm based ontheoretical observations

Now: Description of BIRCH

WarningBIRCH has several phaseswe are only interested in the main phase(and a little in the rebuilding phase)

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

Ideastart with BIRCH for the basic design because it is very fastanalyze its flawsdevelop an improved algorithm based ontheoretical observations

Now: Description of BIRCH

WarningBIRCH has several phaseswe are only interested in the main phase(and a little in the rebuilding phase)

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

Ideastart with BIRCH for the basic design because it is very fastanalyze its flawsdevelop an improved algorithm based ontheoretical observations

Now: Description of BIRCH

WarningBIRCH has several phaseswe are only interested in the main phase(and a little in the rebuilding phase)

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

BIRCH

stores points in a treeeach node represents a subset of the input point setsubset is summarized by the number of points, the centroidof the set and the squared distances to the centroidall points in the same subset get the same center

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

BIRCHstores points in a tree

each node represents a subset of the input point setsubset is summarized by the number of points, the centroidof the set and the squared distances to the centroidall points in the same subset get the same center

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

BIRCHstores points in a treeeach node represents a subset of the input point set

subset is summarized by the number of points, the centroidof the set and the squared distances to the centroidall points in the same subset get the same center

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

BIRCHstores points in a treeeach node represents a subset of the input point setsubset is summarized by the number of points, the centroidof the set and the squared distances to the centroid

all points in the same subset get the same center

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

BIRCHstores points in a treeeach node represents a subset of the input point setsubset is summarized by the number of points, the centroidof the set and the squared distances to the centroidall points in the same subset get the same center

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

BIRCHnodes in the tree represent subsets of pointspoints at the same node get the same center

Insertion of a new pointWhen a new point p is added to the tree

BIRCH searches for the ‘closest’ node according to∑q∈(S∪{p})

(q − µ(S ∪ {p}))2 −∑

q∈S(q − µ(S))2

p is added to the node representing subset S∗ if∑q∈(S∗∪{p})

(q − µS)2/(|S∗|+ 1) ≤ T 2 for a given threshold T

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

BIRCHnodes in the tree represent subsets of pointspoints at the same node get the same center

Insertion of a new pointWhen a new point p is added to the tree

BIRCH searches for the ‘closest’ node according to∑q∈(S∪{p})

(q − µ(S ∪ {p}))2 −∑

q∈S(q − µ(S))2

p is added to the node representing subset S∗ if∑q∈(S∗∪{p})

(q − µS)2/(|S∗|+ 1) ≤ T 2 for a given threshold T

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

BIRCHnodes in the tree represent subsets of pointspoints at the same node get the same center

Insertion of a new pointWhen a new point p is added to the tree

BIRCH searches for the ‘closest’ node according to∑q∈(S∪{p})

(q − µ(S ∪ {p}))2 −∑

q∈S(q − µ(S))2

p is added to the node representing subset S∗ if∑q∈(S∗∪{p})

(q − µS)2/(|S∗|+ 1) ≤ T 2 for a given threshold T

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

BIRCHnodes in the tree represent subsets of pointspoints at the same node get the same center

Insertion of a new pointWhen a new point p is added to the tree

BIRCH searches for the ‘closest’ node according to∑q∈(S∪{p})

(q − µ(S ∪ {p}))2 −∑

q∈S(q − µ(S))2

p is added to the node representing subset S∗ if∑q∈(S∗∪{p})

(q − µS)2/(|S∗|+ 1) ≤ T 2 for a given threshold T

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

150 points drawn uniformly around (−0.5,0) and (0,0.5)75 points drawn uniformly from [−4,−2]× [4,2] as noiseCenters and partitions computed by BIRCH and BICO

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

150 points drawn uniformly around (−0.5,0) and (0,0.5)75 points drawn uniformly from [−4,−2]× [4,2] as noiseCenters and partitions computed by BIRCH and BICO

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

150 points drawn uniformly around (−0.5,0) and (0,0.5)75 points drawn uniformly from [−4,−2]× [4,2] as noiseCenters and partitions computed by BIRCH and BICO

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

Insights from BIRCHFast point by point updatesTree structure

Insertion decision should be improved

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

How BIRCH computes a summary of the data

Insights from BIRCHFast point by point updatesTree structure

Insertion decision should be improved

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Coresets

Coresetssmall summary of given datatypically of constant or polylogarithmic sizecan be used to approximate the cost of the original data

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Coresets

CoresetsGiven a set of points P, a weighted subset S ⊂ P is a(k , ε)-coreset if for all sets C ⊂ C of k centers it holds

|costw (S,C)− cost(P,C)| ≤ ε cost(P,C)

where costw (S,C) =∑

p∈S minc∈C w(p)(.p, c).

3 2

2

4

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Coresets

CoresetsGiven a set of points P, a weighted subset S ⊂ P is a(k , ε)-coreset if for all sets C ⊂ C of k centers it holds

|costw (S,C)− cost(P,C)| ≤ ε cost(P,C)

where costw (S,C) =∑

p∈S minc∈C w(p)(.p, c).

3 2

2

4

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Coresets

CoresetsGiven a set of points P, a weighted subset S ⊂ P is a(k , ε)-coreset if for all sets C ⊂ C of k centers it holds

|costw (S,C)− cost(P,C)| ≤ ε cost(P,C)

where costw (S,C) =∑

p∈S minc∈C w(p)(.p, c).

3 2

2

4

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Coresets

CoresetsGiven a set of points P, a weighted subset S ⊂ P is a(k , ε)-coreset if for all sets C ⊂ C of k centers it holds

|costw (S,C)− cost(P,C)| ≤ ε cost(P,C)

where costw (S,C) =∑

p∈S minc∈C w(p)(.p, c).

3 2

2

4

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Coresets

CoresetsGiven a set of points P, a weighted subset S ⊂ P is a(k , ε)-coreset if for all sets C ⊂ C of k centers it holds

|costw (S,C)− cost(P,C)| ≤ ε cost(P,C)

where costw (S,C) =∑

p∈S minc∈C w(p)(.p, c).

3 2

2

4

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Coresets

CoresetsGiven a set of points P, a weighted subset S ⊂ P is a(k , ε)-coreset if for all sets C ⊂ C of k centers it holds

|costw (S,C)− cost(P,C)| ≤ ε cost(P,C)

where costw (S,C) =∑

p∈S minc∈C w(p)(.p, c).

3 2

2

4

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Coresets

Merge & Reduceread data in blockscompute a coresetfor each block→ smerge coresets in atree fashion space s · log n

Runtime: No asymptotic increase, but overhead in practice

BIRCH uses point-wise updates :-)

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Coresets

Merge & Reduceread data in blockscompute a coresetfor each block→ smerge coresets in atree fashion space s · log n

Runtime: No asymptotic increase, but overhead in practice

BIRCH uses point-wise updates :-)

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Coresets

Insights from Coreset ThreoryLimit the induced error

Goal: Point set P ′ in each node should induce at mostε · cost(P ′,C) error (for an optimal solution C)

Base insertion decision on induced errorReplacing all points in a node by the (weighted) centroid islike moving all points to the centroidInduced error is connected to the 1-means cost of the set

Side noteAvoiding Merge & Reduce is a good idea

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

Coresets

Insights from Coreset ThreoryLimit the induced error

Goal: Point set P ′ in each node should induce at mostε · cost(P ′,C) error (for an optimal solution C)

Base insertion decision on induced errorReplacing all points in a node by the (weighted) centroid islike moving all points to the centroidInduced error is connected to the 1-means cost of the set

Side noteAvoiding Merge & Reduce is a good idea

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

BIRCHnodes in the tree represent subsets of pointspoints at the same node get the same centerimprove insertion decision

AdjustmentsNodes additionally have a reference point and a range

Closest is now determined by Euclidean distanceWe say a node is full with regard to a point p if adding p tothe node increases its 1-means cost above a threshold T

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

BIRCHnodes in the tree represent subsets of pointspoints at the same node get the same centerimprove insertion decision

AdjustmentsNodes additionally have a reference point and a range

Closest is now determined by Euclidean distanceWe say a node is full with regard to a point p if adding p tothe node increases its 1-means cost above a threshold T

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

BIRCHnodes in the tree represent subsets of pointspoints at the same node get the same centerimprove insertion decision

AdjustmentsNodes additionally have a reference point and a rangeClosest is now determined by Euclidean distance

We say a node is full with regard to a point p if adding p tothe node increases its 1-means cost above a threshold T

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

BIRCHnodes in the tree represent subsets of pointspoints at the same node get the same centerimprove insertion decision

AdjustmentsNodes additionally have a reference point and a rangeClosest is now determined by Euclidean distanceWe say a node is full with regard to a point p if adding p tothe node increases its 1-means cost above a threshold T

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

1. Find closest reference point2. If node is not in range3. Then create a new node4. Else add to node if possible5. If not, go one level down,6. And Find closest child, goto 2.

Threshold TRadius Ri

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

1. Find closest reference point2. If node is not in range3. Then create a new node4. Else add to node if possible5. If not, go one level down,6. And Find closest child, goto 2.

Threshold TRadius Ri

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

1. Find closest reference point2. If node is not in range3. Then create a new node4. Else add to node if possible5. If not, go one level down,6. And Find closest child, goto 2.

Threshold TRadius Ri

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

1. Find closest reference point2. If node is not in range3. Then create a new node4. Else add to node if possible5. If not, go one level down,6. And Find closest child, goto 2.

Threshold TRadius Ri

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

1. Find closest reference point2. If node is not in range3. Then create a new node4. Else add to node if possible5. If not, go one level down,6. And Find closest child, goto 2.

Threshold TRadius Ri

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

1. Find closest reference point2. If node is not in range3. Then create a new node4. Else add to node if possible5. If not, go one level down,6. And Find closest child, goto 2.

Threshold TRadius Ri

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

Theorem

For T ≈ OPT/(k · log n · 8d · εd+2) and Ri :=√

T/(8 · 2i),the set of centroids weighted by the number of points in thesubset is a (1 + ε)-coresetfor constant d , the number of nodes is O(k · log n · ε−(d+2))

ProblemWe do not know OPT and thus cannot compute T !

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

Theorem

For T ≈ OPT/(k · log n · 8d · εd+2) and Ri :=√

T/(8 · 2i),the set of centroids weighted by the number of points in thesubset is a (1 + ε)-coresetfor constant d , the number of nodes is O(k · log n · ε−(d+2))

ProblemWe do not know OPT and thus cannot compute T !

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

Rebuilding algorithmDouble T when maximum number of nodes is reached‘Rebuild’ the tree according to new T

Let T ′ and R′i be before and T and Ri be after the doublingMove all nodes one level down and create empty first levelNotice that Ri =

√T/(8 · 2i) =

√T ′/8 · 2i−1 = R′i−1

⇒ Radius doesn’t change!

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

Rebuilding algorithmDouble T when maximum number of nodes is reached‘Rebuild’ the tree according to new T

Let T ′ and R′i be before and T and Ri be after the doublingMove all nodes one level down and create empty first levelNotice that Ri =

√T/(8 · 2i) =

√T ′/8 · 2i−1 = R′i−1

⇒ Radius doesn’t change!

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

For every node in the tree1. Find ref. point one level higher2. If none in range, move node

and children one level up3. If it is in range, check T4. If sum is smaller than T ,

merge nodes6. Else make it a child

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

For every node in the tree1. Find ref. point one level higher2. If none in range, move node

and children one level up3. If it is in range, check T4. If sum is smaller than T ,

merge nodes6. Else make it a child

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

For every node in the tree1. Find ref. point one level higher2. If none in range, move node

and children one level up3. If it is in range, check T4. If sum is smaller than T ,

merge nodes6. Else make it a child

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

For every node in the tree1. Find ref. point one level higher2. If none in range, move node

and children one level up3. If it is in range, check T4. If sum is smaller than T ,

merge nodes6. Else make it a child

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

For every node in the tree1. Find ref. point one level higher2. If none in range, move node

and children one level up3. If it is in range, check T4. If sum is smaller than T ,

merge nodes6. Else make it a child

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

For every node in the tree1. Find ref. point one level higher2. If none in range, move node

and children one level up3. If it is in range, check T4. If sum is smaller than T ,

merge nodes6. Else make it a child

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

For every node in the tree1. Find ref. point one level higher2. If none in range, move node

and children one level up3. If it is in range, check T4. If sum is smaller than T ,

merge nodes6. Else make it a child

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

For every node in the tree1. Find ref. point one level higher2. If none in range, move node

and children one level up3. If it is in range, check T4. If sum is smaller than T ,

merge nodes6. Else make it a child

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

For every node in the tree1. Find ref. point one level higher2. If none in range, move node

and children one level up3. If it is in range, check T4. If sum is smaller than T ,

merge nodes6. Else make it a child

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

For every node in the tree1. Find ref. point one level higher2. If none in range, move node

and children one level up3. If it is in range, check T4. If sum is smaller than T ,

merge nodes6. Else make it a child

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

For every node in the tree1. Find ref. point one level higher2. If none in range, move node

and children one level up3. If it is in range, check T4. If sum is smaller than T ,

merge nodes6. Else make it a child

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

Merging might result in violations of the range of nodes

Increases the coreset size by a constant factorBut does not destroy the coreset property

⇒ BICO computes a coreset in the data stream setting ¨

. . . if we compute lower bound on T

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

Merging might result in violations of the range of nodes

Increases the coreset size by a constant factorBut does not destroy the coreset property

⇒ BICO computes a coreset in the data stream setting ¨

. . . if we compute lower bound on T

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

Merging might result in violations of the range of nodesIncreases the coreset size by a constant factor

But does not destroy the coreset property

⇒ BICO computes a coreset in the data stream setting ¨

. . . if we compute lower bound on T

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

Merging might result in violations of the range of nodesIncreases the coreset size by a constant factorBut does not destroy the coreset property

⇒ BICO computes a coreset in the data stream setting ¨

. . . if we compute lower bound on T

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

Merging might result in violations of the range of nodesIncreases the coreset size by a constant factorBut does not destroy the coreset property

⇒ BICO computes a coreset in the data stream setting ¨

. . . if we compute lower bound on T

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH meets Coresets

Merging might result in violations of the range of nodesIncreases the coreset size by a constant factorBut does not destroy the coreset property

⇒ BICO computes a coreset in the data stream setting ¨

. . . if we compute lower bound on T

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

The actual solution is computed with k -means++.

Adjustments

Add heuristic speed-up to find closest reference point

Speed-upProject all ref. points to d random 1-dim. subspacesProject new point p to the same subspacesCount how many ref. points are in range of p in everysubspaceSearch nearest neighbor in the shortest list

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

The actual solution is computed with k -means++.

AdjustmentsSet coreset size to 200k

Add heuristic speed-up to find closest reference point

Speed-upProject all ref. points to d random 1-dim. subspacesProject new point p to the same subspacesCount how many ref. points are in range of p in everysubspaceSearch nearest neighbor in the shortest list

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

The actual solution is computed with k -means++.

AdjustmentsSet coreset size to 200kAdd heuristic speed-up to find closest reference point

Speed-upProject all ref. points to d random 1-dim. subspacesProject new point p to the same subspacesCount how many ref. points are in range of p in everysubspaceSearch nearest neighbor in the shortest list

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

The actual solution is computed with k -means++.

AdjustmentsSet coreset size to 200kAdd heuristic speed-up to find closest reference point

Speed-upProject all ref. points to d random 1-dim. subspacesProject new point p to the same subspacesCount how many ref. points are in range of p in everysubspaceSearch nearest neighbor in the shortest list

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

The actual solution is computed with k -means++.

AdjustmentsSet coreset size to 200kAdd heuristic speed-up to find closest reference point

Speed-upProject all ref. points to d random 1-dim. subspacesProject new point p to the same subspacesCount how many ref. points are in range of p in everysubspaceSearch nearest neighbor in the shortest list

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

Data SetsData Sets used in StreamKM++ paper from UCI repository:Tower, CoverType, Census and BigCross (cross product)CalTech128 by Rene Grzeszick, group of Prof. Finkconsists of 128 SIFT descriptors of an object database

Data Set SizesBigCross CalTech128 Census CoverType Tower

n 11620300 3168383 2458285 581012 4915200d 57 128 68 55 3

n · d 662357100 405553024 167163380 31955660 14745600

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

Data SetsData Sets used in StreamKM++ paper from UCI repository:Tower, CoverType, Census and BigCross (cross product)CalTech128 by Rene Grzeszick, group of Prof. Finkconsists of 128 SIFT descriptors of an object database

Data Set SizesBigCross CalTech128 Census CoverType Tower

n 11620300 3168383 2458285 581012 4915200d 57 128 68 55 3

n · d 662357100 405553024 167163380 31955660 14745600

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

ImplementationsAuthor’s implementations for StreamKM++ and BIRCHimplementation for MacQueen’s k -means fromESMERALDA (framework by group of Prof. Fink)

ExperimentsExperiments done on mud1-6 and mud8100 runs for every test instancevalues shown in the diagrams are mean values

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

ImplementationsAuthor’s implementations for StreamKM++ and BIRCHimplementation for MacQueen’s k -means fromESMERALDA (framework by group of Prof. Fink)

ExperimentsExperiments done on mud1-6 and mud8100 runs for every test instancevalues shown in the diagrams are mean values

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

Costs

3e+012

3.5e+012

4e+012

4.5e+012

5e+012

5.5e+012

6e+012

6.5e+012

7e+012

15 20 25 30

Cos

ts

Number of Centers

BigCross

StreamKM++BICO

MacQueenBIRCH

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

Costs

1e+011

1.5e+011

2e+011

2.5e+011

3e+011

3.5e+011

4e+011

4.5e+011

5e+011

5.5e+011

6e+011

10 20 30 40 50

Cos

ts

Number of Centers

Covertype

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

Costs

1e+012

1.02e+012

1.04e+012

1.06e+012

1.08e+012

1.1e+012

1.12e+012

1.14e+012

1.16e+012

250

Costs

Number of Centers

BigCross

StreamKM++BICO

MacQueenBIRCH

1e+011

1.5e+011

2e+011

2.5e+011

3e+011

250

Number of Centers

CalTech

1e+011

1.2e+011

1.4e+011

1.6e+011

1.8e+011

2e+011

2.2e+011

2.4e+011

2.6e+011

1000

Number of Centers

CalTech

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

Running time

0

200

400

600

800

1000

1200

10 20 30 40 50

Run

ning

Tim

e

Number of Centers

Census

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

Running time

0

1000

2000

3000

4000

5000

6000

7000

250

Ru

nn

ing

Tim

e

Number of Centers

Census

StreamKM++

BICOBICO KMeans++

MacQueen

BIRCH

0

20000

40000

60000

80000

100000

120000

140000

160000

180000

1000

Number of Centers

Census

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

Running time

0

2000

4000

6000

8000

10000

12000

14000

16000

18000

250

Number of Centers

CalTech

0

100000

200000

300000

400000

500000

600000

1000

Number of Centers

CalTech

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

Running time

0

2000

4000

6000

8000

10000

12000

14000

250

Ru

nn

ing

Tim

e

Number of Centers

BigCross

StreamKM++

BICOBICO KMeans++

MacQueen

BIRCH

0

20000

40000

60000

80000

100000

120000

140000

160000

180000

1000

Number of Centers

BigCross

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

Trade-OffBIRCH MQ m = 200k 100k 25k

Time 616 4241 20271 5444 618Costs 58 · 1010 72 · 1010 51 · 1010 52 ·1010 55 · 1010

Tests run on on BigCross with k = 1000BICO is less than 1.5 times slower than MacQueen withm = 100k while still computing reasonable costsfaster than BIRCH for m = 25k , still much better cost thanBIRCH and MacQueen

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BICO is cool :-)

ZieleImplementierung in bekanntem Framework / Anbindung

Baustein fur andere AlgorithmenVergleich verschiedener Strategien fur Nearest Neighbor

BICO: BIRCH meets Coresets for k -means

Introduction Insights from BIRCH Coreset Theory BICO Experiments End

BIRCH MQ m = 200k 100k 25kTime 616 4241 20271 5444 618Costs 58 · 1010 72 · 1010 51 · 1010 52 ·1010 55 · 1010

Thank you for your attention!

BICO: BIRCH meets Coresets for k -means

Recommended