A dissimilarity measure for the K-Modes clustering algorithm
Presenter: Bo-Sheng Wang
Authors: Fuyuan Cao, Jiye Liang, Deyu Li, Liang Bai, Chuangyin Dang
KBS, 2012
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• In this paper, the limitations of the simple matching dissimilarity measure and Ng’s dissimilarity measure are revealed using some illustrative examples.
Limitations of simple matching dissimilarity measure
• Simple matching is a common approach. The simple matching dissimilarity measure is defined as:
$$\delta(x, y) = \begin{cases} 1, & \text{if } x \neq y \\ 0, & \text{otherwise} \end{cases}$$
• However, simple matching often results in:
– Weak intra-cluster similarity.
– Disregard for the similarity hidden between categorical values.
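A minimal Python sketch of simple matching, assuming objects are tuples of categorical values and that the per-attribute 0/1 scores are summed over all attributes (the summation and the sample objects are illustrative assumptions, not taken from the slides):

```python
from typing import Sequence


def simple_matching(x: Sequence[str], y: Sequence[str]) -> int:
    """Count the attributes on which two categorical objects differ."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    # Each attribute contributes 1 on a mismatch and 0 on a match.
    return sum(1 for xa, ya in zip(x, y) if xa != ya)


# Hypothetical objects with three categorical attributes.
a = ("red", "small", "round")
b = ("red", "large", "round")
c = ("blue", "large", "square")

print(simple_matching(a, b))  # 1
print(simple_matching(a, c))  # 3
```

Every mismatch costs exactly 1 regardless of how common or rare the two values are in the data, which is the sense in which the similarity hidden between categorical values is ignored.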
Limitations of Ng’s dissimilarity measure
• For the k-Modes algorithm with Ng’s dissimilarity measure, the simple matching dissimilarity measure is still used in the first iteration.
– Like simple matching, it disregards the similarity hidden between categorical values.
Objectives
• Based on the ideas of biological and genetic taxonomy and the rough membership function, a new dissimilarity measure for the k-Modes algorithm is defined.
• The dissimilarity measure between a mode of a cluster and an object is obtained by improving Ng’s dissimilarity measure.
Methodology
• Review some basic concepts of rough set theory.
– Definition 1: Categorical information system IS = (U, A, V, f)
– Definition 2: Binary relation IND(P)
– Definition 3: The rough membership function μ_P^X: U → [0,1]
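The formulas for these definitions did not survive on the slide. As a rough illustration, the sketch below uses the standard rough-set definitions: two objects are related by IND(P) when they agree on every attribute in P, and the rough membership of x in a set X is |[x]_P ∩ X| / |[x]_P|, where [x]_P is the equivalence class of x. The table layout, function names, and sample data are assumptions made for this example.

```python
from collections import defaultdict


def equivalence_classes(table, attrs):
    """Partition object ids by their values on attrs (the classes of IND(P))."""
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attrs)
        classes[key].add(obj)
    return list(classes.values())


def rough_membership(table, attrs, x, X):
    """Return |[x]_P ∩ X| / |[x]_P| for P = attrs."""
    key = tuple(table[x][a] for a in attrs)
    # [x]_P: all objects that share x's values on every attribute in P.
    block = {o for o, row in table.items()
             if tuple(row[a] for a in attrs) == key}
    return len(block & set(X)) / len(block)


# Hypothetical categorical information system with four objects.
table = {
    "x1": {"color": "red", "size": "small"},
    "x2": {"color": "red", "size": "large"},
    "x3": {"color": "blue", "size": "large"},
    "x4": {"color": "red", "size": "small"},
}

print(equivalence_classes(table, ["color", "size"]))
print(rough_membership(table, ["color"], "x1", {"x1", "x3"}))
```

With P = {color}, the class of x1 is {x1, x2, x4}, and only x1 lies in X = {x1, x3}, so the membership value is 1/3.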
Methodology - A new dissimilarity measure between two objects
• Definition 4: A similarity measure between objects x and y with respect to a.
• Definition 5: The dissimilarity measure between x and y with respect to P.
• Example: A new dissimilarity measure between two objects
– Simple matching dissimilarity measure
– New dissimilarity measure
Methodology - A new dissimilarity measure between a mode and an object
• Ng’s dissimilarity measure
• Definition 7: The new dissimilarity measure between x_i and z_l with respect to P
• Example: A new dissimilarity measure between a mode and an object
– Ng’s dissimilarity measure
– New dissimilarity measure
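The slides name Ng’s dissimilarity measure without showing its formula. The sketch below follows the form in which that measure is usually stated: a mismatching attribute costs 1, and a matching attribute costs 1 minus the fraction of objects in the mode's cluster that share the mode's value on that attribute. Treat this form, the function name, and the sample cluster as assumptions; the proposed new measure is not reproduced because its definition is not preserved here.

```python
def ng_dissimilarity(mode, x, cluster):
    """Ng-style dissimilarity between a cluster mode and an object x.

    cluster is the list of objects currently assigned to the mode's cluster.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one object")
    total = 0.0
    for j, (zj, xj) in enumerate(zip(mode, x)):
        if zj != xj:
            # Mismatch: same cost as simple matching.
            total += 1.0
        else:
            # Match: cost shrinks as the mode's value becomes more
            # frequent inside the cluster.
            matches = sum(1 for obj in cluster if obj[j] == zj)
            total += 1.0 - matches / len(cluster)
    return total


# Hypothetical cluster of four objects with two attributes.
cluster = [("red", "small"), ("red", "large"),
           ("red", "small"), ("blue", "small")]
mode = ("red", "small")

print(ng_dissimilarity(mode, ("red", "large"), cluster))  # 1.25
print(ng_dissimilarity(mode, ("red", "small"), cluster))  # 0.5
```

Because the measure needs cluster membership counts, it cannot be evaluated before any objects have been assigned, which is consistent with simple matching being used in the first iteration.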
Methodology - Convergence and complexity analysis
• The objective of clustering a set of n = |U| objects into k clusters is to find W and Z that minimize:
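The objective function itself is missing from the extracted slide. A plausible reconstruction, assuming the standard k-Modes cost function with the proposed measure (written NDis_P(z_l, x_i) in these slides) as the dissimilarity, and the usual hard-partition constraints on W:

```latex
F(W, Z) = \sum_{l=1}^{k} \sum_{i=1}^{n} w_{li} \, \mathrm{NDis}_P(z_l, x_i)
\quad \text{subject to} \quad
w_{li} \in \{0, 1\}, \qquad
\sum_{l=1}^{k} w_{li} = 1 \ (1 \le i \le n), \qquad
0 < \sum_{i=1}^{n} w_{li} < n \ (1 \le l \le k)
```

Under this reading, W = [w_{li}] is a k × n membership matrix with entries in {0, 1}, and Z = {z_1, ..., z_k} is the set of cluster modes.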
• This process can be formulated as the following k-Modes algorithm:
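The algorithm listing is not preserved in the extracted text. The sketch below shows the usual two-step k-Modes iteration (fix the modes and reassign objects, then fix the assignment and recompute the modes). It uses simple matching as a stand-in dissimilarity, not the proposed measure, and the function names, `max_iter`, `seed`, and the sample data are all hypothetical.

```python
import random
from collections import Counter


def simple_matching(x, y):
    """Number of attributes on which x and y differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def k_modes(data, k, max_iter=100, seed=0):
    """Cluster categorical tuples into k groups; returns (labels, modes)."""
    rng = random.Random(seed)
    modes = rng.sample(data, k)  # initial modes drawn from the data
    labels = None
    for _ in range(max_iter):
        # Step 1: fix the modes Z, assign each object to its nearest mode.
        new_labels = [
            min(range(k), key=lambda l: simple_matching(modes[l], x))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the procedure has converged
        labels = new_labels
        # Step 2: fix the assignment W, recompute each mode attribute-wise
        # as the most frequent value among the cluster's members.
        for l in range(k):
            members = [x for x, lab in zip(data, labels) if lab == l]
            if members:
                modes[l] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels, modes


# Hypothetical data set with three categorical attributes.
data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r"),
]
labels, modes = k_modes(data, k=2)
print(labels)
print(modes)
```

The result depends on the initial modes, so different seeds may yield different partitions. Swapping in another dissimilarity only changes the function used in Step 1, although a measure that depends on cluster statistics would also need the current cluster members passed in.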
• Now we consider the convergence of the k-Modes algorithm with the proposed dissimilarity measure NDis_P(z_l, x_i).
• Proof. For a given W, we have:
Experiments
• Evaluation on scalability
• Evaluation on clustering efficiency
Conclusions
• The new measure unifies the dissimilarity measure between two objects and the dissimilarity measure between an object and a mode.
• The k-Modes algorithm with the new dissimilarity measure can be used safely and effectively on large data sets.
• Experimental results on synthetic data sets and five real data sets from UCI show the effectiveness of the new dissimilarity measure.
Comments
• Advantages
– The method can save some time.
• Applications
– Dissimilarity measure