Modified global k-means algorithm for minimum sum-of-squares clustering problems
Pattern Recognition (PR, 2008)

Presenter: Lin, Shu-Han
Authors: Adil M. Bagirov

Intelligent Database Systems Lab, N.Y.U.S.T. I.M.

Outline

  • Motivation
  • Objective
  • Methodology
  • Experiments
  • Conclusion
  • Comments

Motivation

k-Means algorithm
  • is sensitive to the choice of starting points
  • is inefficient for solving clustering problems in large data sets

Global k-Means (GKM) algorithm
  • is an incremental algorithm (it adds one cluster center at a time)
  • uses each data point as a candidate for the k-th cluster center
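The incremental scheme described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (`sq_dist`, `centroid`, `sse`, `kmeans`, `global_kmeans`) and the toy data are assumptions, and the objective is taken as the plain sum of squared distances.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(xs) / len(points) for xs in zip(*points))


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans(points, centers, max_iter=100):
    """Plain k-means (Lloyd iterations) from the given starting centers."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        updated = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers


def global_kmeans(points, k):
    """Add one center at a time, trying every data point as the new center."""
    centers = [centroid(points)]  # the 1-cluster solution is the data centroid
    for _ in range(2, k + 1):
        trials = (kmeans(points, centers + [p]) for p in points)
        centers = min(trials, key=lambda c: sse(points, c))
    return centers


data = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
result = global_kmeans(data, 3)
print(sorted(result))     # [(0.5,), (10.5,), (20.5,)]
print(sse(data, result))  # 1.5
```

Each added center costs one full k-means run per data point, which is why this basic form becomes expensive on large data sets.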

Objectives

Propose a new version of GKM

Methodology – k-Means

The k-means algorithm is sensitive to the choice of a starting point.
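A small sketch can make this sensitivity concrete. The data set, the two starting configurations, and the function names below are assumptions chosen for illustration; the objective is the plain sum of squared distances. The four points form a wide rectangle, and k-means stops at a different local solution depending on where it starts.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans(points, centers, max_iter=100):
    """Plain k-means (Lloyd iterations) from the given starting centers."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # New center = mean of the cluster; an empty cluster keeps its center.
        updated = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return centers


# Corners of a 4 x 1 rectangle.
data = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

bad = kmeans(data, [(2.0, 0.0), (2.0, 1.0)])   # splits the data top/bottom
good = kmeans(data, [(0.0, 0.0), (4.0, 0.0)])  # splits the data left/right

print(sse(data, bad))   # 16.0
print(sse(data, good))  # 1.0
```

Both results are fixed points of the k-means iteration, but the first one is sixteen times worse in terms of the objective.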

Methodology – The GKM algorithm

Objective function

Methodology – Objective function

Old version

Reformulated version
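The formulas on this slide did not survive extraction. For orientation, the two standard formulations of the minimum sum-of-squares clustering problem are given here. The notation is an assumption: data points $a^1,\dots,a^m \in \mathbb{R}^n$, cluster centers $x^1,\dots,x^k$, and assignment weights $w_{ij}$. The slide may use different symbols or scaling.

```latex
% Assignment-based ("old") formulation: w_{ij} = 1 iff point a^i belongs to cluster j
\min_{x,\,w}\ \psi(x, w) = \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} w_{ij}\, \lVert x^j - a^i \rVert^2
\quad \text{s.t.} \quad \sum_{j=1}^{k} w_{ij} = 1,\quad w_{ij} \in \{0, 1\}

% Reformulated version: the assignment is absorbed into a min over the centers
\min_{x^1,\dots,x^k}\ f_k(x^1,\dots,x^k) = \frac{1}{m} \sum_{i=1}^{m} \min_{j=1,\dots,k} \lVert x^j - a^i \rVert^2
```

In the second form only the centers remain as variables, at the price of a nonsmooth objective.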

Methodology – fast GKM algorithm

Old version

Proposed version (auxiliary cluster function)
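The slide's formulas are missing from the extraction. As a hedged reconstruction from the global k-means literature (the notation is an assumption), let $d^i_{k-1}$ be the squared distance from data point $a^i$ to the nearest of the $k-1$ centers already found. The fast GKM criterion and the auxiliary cluster function can then be written as follows.

```latex
% Fast GKM ("old version"): pick the data point a^j with the largest guaranteed decrease
b_j = \sum_{i=1}^{m} \max\bigl\{ 0,\ d^i_{k-1} - \lVert a^j - a^i \rVert^2 \bigr\}

% Auxiliary cluster function ("proposed version"), defined for any y in R^n
\bar{f}_k(y) = \frac{1}{m} \sum_{i=1}^{m} \min\bigl\{ d^i_{k-1},\ \lVert y - a^i \rVert^2 \bigr\}

% Link between the two, using min{d, t} = d - max{0, d - t}
\bar{f}_k(a^j) = \frac{1}{m} \sum_{i=1}^{m} d^i_{k-1} - \frac{b_j}{m}
```

Under these definitions, maximizing $b_j$ over the data points is the same as minimizing $\bar{f}_k$ over the data points only. The proposed version instead minimizes $\bar{f}_k$ over all $y \in \mathbb{R}^n$.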

Methodology – modified GKM algorithm

Proposed version
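The algorithm on this slide did not survive extraction. The following is a minimal sketch of one plausible reading, assuming the starting point for the k-th center is found by minimizing the auxiliary cluster function with a simple centroid iteration started from every data point. All function names are hypothetical, the objective is the plain sum of squared distances, and the exact procedure in the paper may differ.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(xs) / len(points) for xs in zip(*points))


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans(points, centers, max_iter=100):
    """Plain k-means (Lloyd iterations) from the given starting centers."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        updated = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers


def auxiliary(points, d_prev, y):
    """Auxiliary cluster function (unscaled): sum_i min(d_prev[i], ||y - a_i||^2)."""
    return sum(min(d, sq_dist(p, y)) for p, d in zip(points, d_prev))


def refine_start(points, d_prev, y, max_iter=100):
    """Move y to the centroid of the points it would attract, until stable."""
    for _ in range(max_iter):
        attracted = [p for p, d in zip(points, d_prev) if sq_dist(p, y) < d]
        if not attracted:
            break
        updated = centroid(attracted)
        if updated == y:
            break
        y = updated
    return y


def modified_global_kmeans(points, k):
    """Add one center at a time, starting k-means from an auxiliary-function minimizer."""
    centers = [centroid(points)]
    for _ in range(2, k + 1):
        d_prev = [min(sq_dist(p, c) for c in centers) for p in points]
        starts = [refine_start(points, d_prev, p) for p in points]
        best = min(starts, key=lambda y: auxiliary(points, d_prev, y))
        centers = kmeans(points, centers + [best])
    return centers


data = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
result = modified_global_kmeans(data, 3)
print(sorted(result))     # [(0.5,), (10.5,), (20.5,)]
print(sse(data, result))  # 1.5
```

In this sketch only one k-means run is needed per added center, and the starting point is no longer restricted to the data points themselves.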

Experiments

MS k-means: Multi-start k-means
GKM: fast Global K-Means
MGKM: Modified Global K-Means

Overall (14 datasets, 140 results)
  • The MS k-means algorithm finds the best known (or near best known) solutions 42 (33.3%) times
  • The GKM algorithm finds them 76 (60.3%) times
  • The MGKM algorithm finds them 102 (81.0%) times

Large k in large data sets (m)
  • The MS k-means algorithm fails to find the best known (or near best known) solutions
  • The GKM algorithm finds such solutions 22 (45.8%) times
  • The MGKM algorithm finds such solutions 42 (87.5%) times
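The reported counts and percentages can be cross-checked with a short calculation. The helper name below is hypothetical; it recovers the number of results that each count and percentage pair implies.

```python
def implied_total(count, percent):
    """Number of results implied by a count and its reported percentage."""
    return round(count / (percent / 100))


overall = {"MS k-means": (42, 33.3), "GKM": (76, 60.3), "MGKM": (102, 81.0)}
large_k = {"GKM": (22, 45.8), "MGKM": (42, 87.5)}

overall_totals = {name: implied_total(c, p) for name, (c, p) in overall.items()}
large_k_totals = {name: implied_total(c, p) for name, (c, p) in large_k.items()}

print(overall_totals)  # {'MS k-means': 126, 'GKM': 126, 'MGKM': 126}
print(large_k_totals)  # {'GKM': 48, 'MGKM': 48}
```

The overall percentages are consistent with 126 results and the large-k percentages with 48. The "140 results" stated on the slide does not match this arithmetic, and the slide gives no explanation for the difference.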

Conclusions

A new version of GKM
  • Changes the computation of starting points, by minimizing the auxiliary cluster function for a given tolerance
  • Is more effective than GKM, especially on large data sets

The choice of starting points in k-means is crucial

Comments

Advantage
  • Theoretical analysis

Drawback
  • The authors should explain why what they modify is important, or why it needs to be modified.

Application
  • GKM outperforms the k-means algorithm