The Minimum Code Length for Clustering Using the Gray Code


  • 8/3/2019 The Minimum Code Length for Clustering Using the Gray Code

    1/93

    ECML PKDD September ,

    The Minimum Code Length for

    Clustering Using the Gray Code

    Mahito SUGIYAMA, Akihiro YAMAMOTO

    Kyoto University

    JSPS Research Fellow


    Contributions

    - The MCL (Minimum Code Length): a new measure to score clustering results
      - The code length needed to distinguish each cluster under some fixed encoding scheme for real-valued variables
    - COOL (COding-Oriented cLustering): a general clustering approach
      - Always finds the best clusters (i.e., the global optimal solution), which minimize the MCL, in O(nd)
      - Parameter tuning is not needed
    - G-COOL (COOL with the Gray code)
      - Achieves internal cohesion and external isolation
      - Finds arbitrarily shaped clusters

    Demonstration (Synthetic Dataset)

    [Figure: scatter plot of a two-dimensional synthetic dataset on [0, 1] × [0, 1], followed by the clustering results of G-COOL and K-means on the same data.]

    Results (Real datasets)

    [Figure: original images and their binary-filtered versions, followed by the G-COOL and K-means clustering results, for the Delta, Dragon, Europe, Norway, and Ganges images.]

    Outline

    - Overview
    - Background and Our Strategy
    - MCL and Clustering
    - COOL Algorithm
    - G-COOL: COOL with the Gray Code
    - Experiments
    - Conclusion

    Clustering Focusing on Compression

    - The MDL approach [Kontkanen et al., ]
      - Data encoding has to be optimized
      - All encoding schemes are (implicitly) considered
      - The time complexity is $O(n^2)$
    - The Kolmogorov complexity approach [Cilibrasi, ]
      - Measures the distance between data points based on compression of finite sequences
      - Difficult to apply to multivariate data
      - The actual clustering process is traditional agglomerative hierarchical clustering
      - The time complexity is $O(n^2)$
    - Neither approach is suitable for massive data

    Our Strategy

    Requirements:
    - Fast, and linear in the data size
    - Robust to changes in input parameters
    - Can find arbitrarily shaped clusters

    Solutions:
    - Fix an encoding scheme for continuous variables
      - Motivated by Computable Analysis [Weihrauch, ]
    - Clustering = discretizing real-valued data
      - Always finds the best results w.r.t. the MCL
    - Use the Gray code for real numbers [Tsuiki, ]
      - Discretized data points overlap, and adjacent clusters are merged


    MCL (Minimum Code Length)

    - The MCL is the code length of the maximally compressed clusters under a fixed encoding scheme
    - The MCL is calculated in $O(nd)$ by using radix sort
      - $n$ and $d$ are the number of data points and the dimension, respectively
    - Example: $X = \{0.1, 0.2, 0.8, 0.9\}$,
      $\mathcal{C}_1 = \{\{0.1, 0.2\}, \{0.8, 0.9\}\}$, $\mathcal{C}_2 = \{\{0.1\}, \{0.2, 0.8\}, \{0.9\}\}$
      - Use binary encoding
      - Which is preferred?

    Binary Encoding

    [Figure: binary encoding of [0, 1]; the horizontal axis is the value and the vertical axis is the bit position (0 to 4).]

    - 0.1 → 00011...
    - 0.2 → 00110...
    - 0.8 → 11001...
    - 0.9 → 11100...

    MCL with Binary Encoding

    [Figure: four data points A, B, C, D on [0, 1] (ticks at 0, 0.25, 0.5, 0.75, 1), with an id/value table.]

    - With the level-1 prefixes 0 and 1: MCL = 1 + 1 = 2
    - With the level-3 prefixes 000, 001, 110, 111 (level 2: 00, 01, 10, 11): MCL = 3 × 4 = 12

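    Definition of MCL

    - Fix an embedding $\varphi: \mathbb{R}^d \to \Sigma^{\omega}$ ($\Sigma = \{0, 1\}$ usually)
    - For $p \in \mathrm{range}(\varphi)$ and $P \subset \mathrm{range}(\varphi)$, define
      $\Phi(p \mid P) := \{\, w \mid p \in {\uparrow}w \text{ and } P \cap {\uparrow}v = \emptyset \text{ for all } v \text{ such that } |v| = |w| \text{ and } p \in {\uparrow}v \,\}$
      - Each element in $\Phi(p \mid P)$ is a prefix that discriminates $p$ from $P$
    - For a partition $\mathcal{C} = \{C_1, \dots, C_K\}$ of a data set $X$, $\mathrm{MCL}(\mathcal{C}) := \sum_{i \in \{1, \dots, K\}} L_i(\mathcal{C})$, where
      $L_i(\mathcal{C}) := \min\{\, |W| \mid \varphi(C_i) \subseteq {\uparrow}W \text{ and } W \subseteq \bigcup_{x \in C_i} \Phi(\varphi(x) \mid \varphi(X \setminus C_i)) \,\}$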
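The definition can be made concrete for the one-dimensional binary encoding. The Python sketch below is an illustration only, not the authors' implementation: `binary_prefix` and `mcl_binary` are hypothetical helper names, points are assumed to lie in [0, 1), and `max_level` is an arbitrary cutoff. For each cluster it collects, for every point, the shortest prefix that no point outside the cluster shares, and sums the lengths of the distinct prefixes.

```python
def binary_prefix(x, k):
    """Return the first k bits of the binary expansion of x in [0, 1)."""
    bits = []
    for _ in range(k):
        x *= 2
        bit = 1 if x >= 1 else 0
        bits.append(str(bit))
        x -= bit
    return "".join(bits)


def mcl_binary(partition, max_level=32):
    """MCL of a partition of 1-D points under the binary encoding (sketch)."""
    total = 0
    for i, cluster in enumerate(partition):
        others = [y for j, c in enumerate(partition) if j != i for y in c]
        prefixes = set()
        for x in cluster:
            for k in range(1, max_level + 1):
                w = binary_prefix(x, k)
                # w discriminates x if no point outside the cluster shares it.
                if all(binary_prefix(y, k) != w for y in others):
                    prefixes.add(w)
                    break
            else:
                raise ValueError("points not separable within max_level bits")
        total += sum(len(w) for w in prefixes)
    return total


print(mcl_binary([[0.1, 0.2], [0.8, 0.9]]))    # 2
print(mcl_binary([[0.1], [0.2, 0.8], [0.9]]))  # 12
```

The two printed values match the MCL of the two candidate partitions of the example data set X = {0.1, 0.2, 0.8, 0.9}, so the first partition is preferred.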


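    Minimizing MCL and Clustering

    - Clustering under the MCL criterion is to find the global optimal solution that minimizes the MCL
      - Find $\mathcal{C}_{\mathrm{op}}$ such that $\mathcal{C}_{\mathrm{op}} \in \operatorname{argmin}_{\mathcal{C} \in \mathcal{C}(X)_{\geq K}} \mathrm{MCL}(\mathcal{C})$,
        where $\mathcal{C}(X)_{\geq K} = \{\mathcal{C} \text{ is a partition of } X \mid \#\mathcal{C} \geq K\}$
    - We give the lower bound $K$ on the number of clusters as an input parameter
      - $\mathcal{C}_{\mathrm{op}}$ becomes the single set $\{X\}$ without this assumption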


    Optimization by COOL

    - COOL solves the optimization problem in $O(nd)$
      - $n$ and $d$ are the number of data points and the dimension, respectively
      - The naive approach takes exponential time and space
    - The computing process of the MCL becomes the clustering process itself via discretization
    - COOL is level-wise, and makes the level-$k$ partition $\mathcal{C}^k$ from $k = 1, 2, \dots$, which satisfies the following condition:
      - For all $x, y \in X$, they are in the same cluster $\iff$ $v = w$ for some $v \sqsubset \varphi(x)$ and $w \sqsubset \varphi(y)$ with $|v| = |w| = k$
    - Level-$k$ partitions form a hierarchy
      - For $C \in \mathcal{C}^k$, there exists $\mathcal{D} \subseteq \mathcal{C}^{k+1}$ such that $\bigcup \mathcal{D} = C$
      - For all $C \in \mathcal{C}_{\mathrm{op}}$, there exists $k$ such that $C \in \mathcal{C}^k$
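The level-wise discretization can be sketched in Python for the binary encoding. This is a hedged illustration, not the authors' code: the function names are hypothetical, level k is assumed to mean k binary digits per coordinate, a dictionary stands in for the radix sort mentioned earlier, and the data values for the points A to E are made up.

```python
from collections import defaultdict


def level_k_partition(points, k):
    """Group points of [0, 1]^d by the first k binary digits of each coordinate."""
    cells = defaultdict(list)
    for p in points:
        # int(x * 2**k) is the index of the width-2^-k cell containing x;
        # min(...) keeps x = 1.0 inside the last cell.
        key = tuple(min(int(x * 2 ** k), 2 ** k - 1) for x in p)
        cells[key].append(p)
    return list(cells.values())


def first_level_with_k_clusters(points, k_min, max_level=32):
    """Return (level, partition) for the first level with at least k_min clusters."""
    for k in range(1, max_level + 1):
        partition = level_k_partition(points, k)
        if len(partition) >= k_min:
            return k, partition
    raise ValueError("fewer than k_min distinct cells within max_level")


# Hypothetical 1-D values for the points A, B, C, D, E.
data = [(0.1,), (0.3,), (0.55,), (0.7,), (0.9,)]
print([len(level_k_partition(data, k)) for k in (1, 2, 3)])  # [2, 4, 5]
print(first_level_with_k_clusters(data, 3)[0])               # 2
```

Each level needs one pass over the n points with d operations per point, and every level-(k+1) cell lies inside exactly one level-k cell, which is the hierarchy property stated above.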

    COOL with Binary Encoding

    [Figure: five data points A, B, C, D, E on [0, 1] (ticks at 0, 0.25, 0.5, 0.75, 1), discretized level by level.]

    Level  A    B    C    D    E    MCL
    1      0    0    1    1    1    1 + 1 = 2
    2      00   01   10   10   11   2 × 4 = 8
    3      00   01   100  101  11   6 + 6 = 12

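    Noise Filtering by COOL

    - Noise filtering is easily implemented in COOL
      - Define $\mathcal{C}_{\geq N} := \{C \in \mathcal{C} \mid \#C \geq N\}$ for a partition $\mathcal{C}$
      - Regard a cluster $C$ as noise if $\#C < N$
    - Example: given $\mathcal{C} = \{\{0.1\}, \{0.4, 0.5, 0.6\}, \{0.9\}\}$ and $N = 2$,
      $\mathcal{C}_{\geq 2} = \{\{0.4, 0.5, 0.6\}\}$, and 0.1 and 0.9 are noise
    - We give the lower bound $N$ on the cluster size as an input parameter

    [Figure: the example on [0, 1] (ticks at 0, 0.25, 0.5, 0.75, 1) with N = 2.]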
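The filtering rule is short enough to write out directly. The following Python sketch uses a hypothetical helper name, `filter_noise`, and represents a partition as a list of lists; it reproduces the example with N = 2.

```python
def filter_noise(partition, n_min):
    """Split a partition into clusters of size >= n_min and noise points."""
    kept = [cluster for cluster in partition if len(cluster) >= n_min]
    noise = [x for cluster in partition if len(cluster) < n_min for x in cluster]
    return kept, noise


kept, noise = filter_noise([[0.1], [0.4, 0.5, 0.6], [0.9]], 2)
print(kept)   # [[0.4, 0.5, 0.6]]
print(noise)  # [0.1, 0.9]
```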


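    Algorithm of COOL

    Input: A data set $X$, two lower bounds $K$ and $N$
    Output: The optimal partition $\mathcal{C}_{\mathrm{op}}$ and noise

    function COOL($X$, $K$, $N$)
        Find partitions $\mathcal{C}^1_{\geq N}, \dots, \mathcal{C}^m_{\geq N}$ such that $\#\mathcal{C}^{m-1}_{\geq N} < K \leq \#\mathcal{C}^m_{\geq N}$
        $(\mathcal{C}_{\mathrm{op}}, \mathrm{MCL}(\mathcal{C}_{\mathrm{op}})) \leftarrow$ FINDCLUSTERS($X$, $K$, $\{\mathcal{C}^1_{\geq N}, \dots, \mathcal{C}^m_{\geq N}\}$)
        return $(\mathcal{C}_{\mathrm{op}}, X \setminus \bigcup \mathcal{C}_{\mathrm{op}})$

    function FINDCLUSTERS($X$, $K$, $\{\mathcal{C}^1, \dots, \mathcal{C}^m\}$)
        Find $k$ such that $\#\mathcal{C}^{k-1} < K$ and $\#\mathcal{C}^k \geq K$
        $\mathcal{C}_{\mathrm{op}} \leftarrow \mathcal{C}^k$
        if $K = 2$ then return $(\mathcal{C}_{\mathrm{op}}, \mathrm{MCL}(\mathcal{C}_{\mathrm{op}}))$
        for each $C$ in $\mathcal{C}^1 \cup \dots \cup \mathcal{C}^{k-1}$
            $(\mathcal{C}, L) \leftarrow$ FINDCLUSTERS($X \setminus C$, $K - 1$, $\{\mathcal{C}^1, \dots, \mathcal{C}^k\}$)
            if $\mathrm{MCL}(\mathcal{C} \cup C) < \mathrm{MCL}(\mathcal{C}_{\mathrm{op}})$ then $\mathcal{C}_{\mathrm{op}} \leftarrow \mathcal{C} \cup C$
        return $(\mathcal{C}_{\mathrm{op}}, \mathrm{MCL}(\mathcal{C}_{\mathrm{op}}))$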


    Gray Code

    - Real numbers in [0, 1] are encoded with 0, 1, and $\bot$
      - Binary: 0.1 → 00011..., 0.25 → 00111...
      - Gray: 0.1 → 00010..., 0.25 → 0⊥100...
    - Originally, another binary encoding of natural numbers
      - Especially important in applications of conversion between analog and digital information [Knuth, ]
    - The Gray code embedding is an injection $\varphi_G$ that maps $x \in [0, 1]$ to an infinite sequence $p_0 p_1 p_2 \dots$, where
      - $p_i := 1$ if $2^{-i} m - 2^{-(i+1)} < x < 2^{-i} m + 2^{-(i+1)}$ for an odd $m$,
      - $p_i := 0$ if the same holds for an even $m$, and
      - $p_i := \bot$ if $x = 2^{-i} m - 2^{-(i+1)}$ for some integer $m$
    - For a vector $x = (x_1, \dots, x_d)$, $\varphi_G(x) = p^1_1 \dots p^d_1 p^1_2 \dots p^d_2 \dots$
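The three cases of the definition translate directly into code. The sketch below is an illustration under stated assumptions: `gray_code` is a hypothetical name, only a finite prefix is produced, and exact rational arithmetic (`fractions.Fraction`) is used so that the boundary case is detected exactly rather than through floating-point comparison.

```python
from fractions import Fraction


def gray_code(x, length):
    """First `length` symbols of the Gray code embedding of x in [0, 1]."""
    x = Fraction(x)
    symbols = []
    for i in range(length):
        half = Fraction(1, 2 ** (i + 1))  # 2^-(i+1)
        step = 2 * half                   # 2^-i
        # Write x + half = step * m + r with 0 <= r < step, so that
        # x = step * m - half + r.
        m, r = divmod(x + half, step)
        if r == 0:
            symbols.append("⊥")           # x sits exactly on a boundary
        else:
            symbols.append("1" if m % 2 else "0")
    return "".join(symbols)


print(gray_code(Fraction(1, 10), 5))  # 00010
print(gray_code(Fraction(1, 4), 5))   # 0⊥100
print(gray_code(Fraction(4, 5), 5))   # 10101
```

The printed prefixes agree with the codes quoted for 0.1 and 0.25 above.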

    Gray Code Embedding

    [Figure: Gray code embedding of [0, 1]; the horizontal axis is the value and the vertical axis is the bit position.]

    - 0.1 → 00010...
    - 0.25 → 0⊥100...
    - 0.8 → 10101...

    COOL with Gray Code (G-COOL)

    [Figure: the five data points A, B, C, D, E on [0, 1] (ticks at 0, 0.25, 0.5, 0.75, 1); with the Gray code, adjacent intervals overlap, so a data point can match two prefixes at the same level.]

    Level  A    B       C       D       E       MCL
    1      0    0, 1    1, 1    1, 1    1       1 × 2 = 2
    2      00   01, 10  10, 10  10, 11  11, 11  2 × 3 = 6

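    Theoretical Analysis of G-COOL

    - Use the Gray code as the fixed encoding in COOL
      - It achieves internal cohesion and external isolation
    - Theorem: For the level-$k$ partition $\mathcal{C}^k$, $x, y \in X$ are in the same cluster if $d_\infty(x, y) < 2^{-(k+1)}$
      - Thus $x, y$ are in different clusters only if $d_\infty(x, y) \geq 2^{-(k+1)}$
      - $d_\infty(x, y) = \max_{i \in \{1, \dots, d\}} |x_i - y_i|$ ($L_\infty$ metric)
      - Two adjacent intervals overlap, and they are agglomerated
    - Corollary: In the optimal partition $\mathcal{C}_{\mathrm{op}}$, for all $x \in C$ ($C \in \mathcal{C}_{\mathrm{op}}$), its nearest neighbor $y \in C$
      - $y$ is the nearest neighbor of $x$ $\iff$ $y \in \operatorname{argmin}_{y \in X} d_\infty(x, y)$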
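The agglomeration behind the theorem can be illustrated in one dimension. The sketch below rests on a reading of the Gray code definition that is an assumption here, not a statement from the slides: the level-k Gray cells on [0, 1] are taken to be the open intervals of width 2^-k that start at every multiple of 2^-(k+1), so two points share a cell when their half-cells of width 2^-(k+1) are equal or adjacent. The name `gcool_level_k_1d` and the data values are hypothetical, and points lying exactly on a half-cell boundary are treated as belonging to the half-cell on their right for simplicity.

```python
import math


def gcool_level_k_1d(values, k):
    """Chain 1-D points whose half-cells (width 2^-(k+1)) are equal or adjacent."""
    width = 2.0 ** -(k + 1)
    ordered = sorted(values)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if math.floor(cur / width) - math.floor(prev / width) <= 1:
            clusters[-1].append(cur)   # overlapping cells: merge
        else:
            clusters.append([cur])     # gap of at least one empty half-cell
    return clusters


data = [0.05, 0.4, 0.55, 0.7, 0.85]
print(len(gcool_level_k_1d(data, 1)))  # 1
print(gcool_level_k_1d(data, 2))       # [[0.05], [0.4, 0.55, 0.7, 0.85]]
```

Under this reading, any two points closer than 2^-(k+1) fall into equal or adjacent half-cells and therefore end up in the same cluster, which is the statement of the theorem. For comparison, a plain binary discretization at level 2 would split the same data into four clusters.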

    Demonstration of G-COOL

    [Figure: two scatter plots on [0, 1] × [0, 1] comparing the clusters found by G-COOL with those found by COOL with the binary encoding.]

    Experimental Methods

    - Analyze G-COOL empirically with synthetic and real datasets, compared to DBSCAN and K-means
    - Synthetic datasets were generated by the R package clusterGeneration [Qiu and Joe, ]
      - n = 1,500 for each cluster and d = 3
    - Real datasets were geospatial images from Earth-as-Art
      - Reduced to pixels and translated into binary images
    - All data were normalized by min-max normalization
    - G-COOL was implemented in R (version ..)
    - Internal and external measures were used
      - Internal: MCL, connectivity, Silhouette width
      - External: adjusted Rand index

    Earth-as-Art image gallery: http://eros.usgs.gov/imagegallery/
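Min-max normalization maps every coordinate into [0, 1], which is the range the encodings above operate on. A minimal Python sketch follows; the experiments themselves were run in R, and `min_max_normalize` is a hypothetical helper whose handling of constant columns (mapping them to 0.0) is an assumption.

```python
def min_max_normalize(rows):
    """Rescale every column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


print(min_max_normalize([[1, 10], [3, 20], [5, 30]]))
# [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```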

    Results (Synthetic datasets)

    - [Figure: MCL, connectivity, Silhouette width, adjusted Rand index, and runtime (s) versus the number of clusters (2, 4, 6) for G-COOL, DBSCAN, and K-means.]
    - [Figure: MCL, connectivity, Silhouette width, adjusted Rand index, and runtime (s) of G-COOL versus the noise parameter N (0 to 10).]
    - Data show mean ± s.e.m.; each experiment was performed 20 times

    Results (Real datasets)

    [Figure: original images, their binary-filtered versions, and the G-COOL and K-means results for Delta, Dragon, Europe, Norway, and Ganges.]

    [Table: for each dataset (Delta, Dragon, Europe, Norway, Ganges): n, K, running time (s), and MCL, each for G-COOL (GC) and K-means (KM).]

    Conclusion

    - Integrated clustering and its evaluation in a coding-oriented manner
      - An effective solution for two essential problems: how to measure the goodness of results and how to find good clusters
      - No distance calculation and no data distribution are needed
    - Key ideas:
      - Fixing an encoding scheme for real-valued variables
        - Introduced the MCL, focusing on compression of clusters
        - Formulated clustering with the MCL, and constructed COOL, which finds the global optimal solution in linear time
      - The Gray code
        - We showed the efficiency and effectiveness of G-COOL theoretically and experimentally


    Appendix


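    Notation

    - A datum $x \in \mathbb{R}^d$, a data set $X = \{x_1, \dots, x_n\}$
      - $\#X$ is the number of elements in $X$
      - $X \setminus Y$ is the relative complement of $Y$ in $X$
    - Clustering is a partition of $X$ into $K$ subsets (clusters) $C_1, \dots, C_K$
      - $C_i \neq \emptyset$ and $C_i \cap C_j = \emptyset$
      - We call $\mathcal{C} = \{C_1, \dots, C_K\}$ a partition of $X$
      - $\mathcal{C}(X) = \{\mathcal{C} \mid \mathcal{C} \text{ is a partition of } X\}$
    - The sets of finite and infinite sequences over an alphabet $\Sigma$ are denoted by $\Sigma^{*}$ and $\Sigma^{\omega}$, respectively
      - The length $|w|$ is the number of symbols other than $\bot$
        - If $w = 11100\bot\bot\bot\dots$, then $|w| = 5$
      - For a set of sequences $W$, $|W| = \sum_{w \in W} |w|$
    - An embedding of $\mathbb{R}^d$ is an injective function $\varphi$ from $\mathbb{R}^d$ to $\Sigma^{\omega}$
    - For $p, q \in \Sigma^{\omega}$, define $p \sqsubseteq q$ if $p_i = q_i$ for all $i$ with $p_i \neq \bot$
      - Intuitively, $q$ is more concrete than $p$
      - For $w \in \Sigma^{*}$, we write $w \sqsubset p$ if $w\bot^{\omega} \sqsubseteq p$
    - ${\uparrow}w = \{p \in \mathrm{range}(\varphi) \mid w \sqsubset p\}$ for $w \in \Sigma^{*}$
      - ${\uparrow}W = \{p \in \mathrm{range}(\varphi) \mid w \sqsubset p \text{ for some } w \in W\}$ for $W \subseteq \Sigma^{*}$
    - The following monotonicity holds
      - $\varphi^{-1}({\uparrow}v) \subseteq \varphi^{-1}({\uparrow}w)$ iff $v \sqsupseteq w$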


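    Optimization by COOL

    - The optimal partition $\mathcal{C}_{\mathrm{op}}$ can be constructed from the level-$k$ partitions
      - For all $C \in \mathcal{C}_{\mathrm{op}}$, there exists $k$ such that $C \in \mathcal{C}^k$
    - The level-$k$ partitions have a hierarchical structure
      - For each $C \in \mathcal{C}^k$ we have $\bigcup \mathcal{D} = C$ for some $\mathcal{D} \subseteq \mathcal{C}^{k+1}$
      - COOL is similar to divisive hierarchical clustering
    - COOL always outputs the global optimal partition $\mathcal{C}_{\mathrm{op}}$
    - The time complexity is $O(nd)$ (best) and $O(nd + K!)$ (worst)
      - Usually $K \ll n$ holds, hence $O(nd)$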

    Clustering Process of COOL

    [Figure: step-by-step clustering of a two-dimensional dataset on [0, 1] × [0, 1]. For K = 2, the level-1 partition yields clusters 1 to 3; for K = 5, the level-2 partition yields clusters 1 to 6.]

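    The Multi-Dimensional Gray Code

    - Use the wrapping function $(p^1, \dots, p^d) \mapsto p^1_1 \dots p^d_1 p^1_2 \dots p^d_2 \dots$
    - Define the $d$-dimensional Gray code embedding $\varphi_G^d$ by taking $\varphi_G^d(x_1, \dots, x_d)$ to be the wrapping of $(\varphi_G(x_1), \dots, \varphi_G(x_d))$
    - We abbreviate $d$ of $\varphi_G^d$ if it is understood from the context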
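The wrapping function interleaves the per-coordinate sequences symbol by symbol. A small Python sketch on finite prefixes follows; `wrap` is a hypothetical name, and the inputs are assumed to be equal-length prefixes such as those of 0.1 and 0.25 quoted in the Gray code slide.

```python
def wrap(sequences):
    """Interleave equal-length symbol sequences: first symbols, then second, ..."""
    return "".join("".join(symbols) for symbols in zip(*sequences))


# Prefixes of the Gray codes of 0.1 and 0.25, wrapped into one 2-D prefix.
print(wrap(["00010", "0⊥100"]))  # 000⊥011000
```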


    Internal Measures

    - Connectivity [Handl et al., ]
      - $\mathrm{Conn}(\mathcal{C}) = \sum_{x \in X} \sum_{i=1}^{M} f(x, \mathrm{nn}(x, i)) / i$
        - $\mathrm{nn}(x, i)$ is the $i$-th neighbor of $x$; $f(x, y)$ is 0 if $x$ and $y$ belong to the same cluster, and 1 otherwise
        - $M$ is an input parameter (we set as )
      - Takes values from 0 to $\infty$; should be minimized
    - Silhouette width
      - The average of the Silhouette value $S(x)$ over all $x$
        - $S(x) = (b(x) - a(x)) / \max(b(x), a(x))$
        - $a(x) = |C|^{-1} \sum_{y \in C} d(x, y)$ $(x \in C)$
        - $b(x) = \min_{D \neq C} |D|^{-1} \sum_{y \in D} d(x, y)$
      - Takes values from $-1$ to 1; should be maximized
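Both measures can be computed by brute force from their formulas. The Python sketch below is an illustration, not the evaluation code used in the experiments: the function names are hypothetical, the Euclidean distance (`math.dist`, Python 3.8+) is an assumed default because the slides do not state the metric, and at least two clusters are required for the Silhouette width.

```python
import math


def connectivity(points, labels, m, dist=math.dist):
    """Sum of 1/i over each point's i-th nearest neighbor lying in another cluster."""
    total = 0.0
    for i, x in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(x, points[j]))
        for rank, j in enumerate(others[:m], start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank
    return total


def silhouette_width(points, labels, dist=math.dist):
    """Average of S(x) = (b(x) - a(x)) / max(b(x), a(x)) over all points."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    scores = []
    for x, label in zip(points, labels):
        own = clusters[label]
        a = sum(dist(x, y) for y in own) / len(own)
        b = min(sum(dist(x, y) for y in c) / len(c)
                for key, c in clusters.items() if key != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


pts = [(0.0,), (0.1,), (0.9,), (1.0,)]
print(connectivity(pts, [0, 0, 1, 1], m=1))  # 0.0
print(connectivity(pts, [0, 1, 0, 1], m=1))  # 4.0
print(silhouette_width(pts, [0, 0, 1, 1]) > silhouette_width(pts, [0, 1, 0, 1]))  # True
```

The well-separated labeling gives the lower connectivity and the higher Silhouette width, matching the directions stated above.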


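    External Measures

    - Adjusted Rand index
      - Let the result be $\mathcal{C} = \{C_1, \dots, C_K\}$ and the correct partition be $\mathcal{D} = \{D_1, \dots, D_M\}$
      - Suppose $n_{ij} := \#\{x \in X \mid x \in C_i, x \in D_j\}$. Then the adjusted Rand index is

        $$\frac{\sum_{i,j} \binom{n_{ij}}{2} - \left(\sum_i \binom{\#C_i}{2} \sum_j \binom{\#D_j}{2}\right) / \binom{n}{2}}{\frac{1}{2}\left(\sum_i \binom{\#C_i}{2} + \sum_j \binom{\#D_j}{2}\right) - \left(\sum_i \binom{\#C_i}{2} \sum_j \binom{\#D_j}{2}\right) / \binom{n}{2}}$$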
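The formula maps term by term onto pair counts. The Python sketch below uses a hypothetical function name and represents both partitions as label lists of equal length; it does not guard against the degenerate case where the denominator is zero (for example, when both partitions consist of a single cluster).

```python
from collections import Counter
from math import comb


def adjusted_rand_index(result, truth):
    """Adjusted Rand index between two labelings of the same n points."""
    n = len(result)
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(result, truth)).values())
    pairs_result = sum(comb(c, 2) for c in Counter(result).values())
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    expected = pairs_result * pairs_truth / comb(n, 2)
    maximum = (pairs_result + pairs_truth) / 2
    return (pairs_joint - expected) / (maximum - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # approximately -0.5
```

The index is 1 when the two partitions agree up to a renaming of the labels, and it can become negative when they agree less than expected by chance.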


    Discussion

    - Results for synthetic datasets
      - Best performance under the internal measures
      - (Nearly) best performance under the internal measures
      - G-COOL is efficient and effective
      - DBSCAN is sensitive to input parameters
      - The MCL works well as an internal measure
    - Results for real datasets
      - Neither good nor bad
      - There are no clear clusters originally
    - G-COOL is a good clustering method


    Related Work

    - Partitional methods [Chaoji et al., ]
    - Mass-based methods [Ting and Wells, ]
    - Density-based methods (DBSCAN [Ester et al., ])
    - Hierarchical clustering methods (CURE [Guha et al., ], CHAMELEON [Karypis et al., ])
    - Grid-based methods (STING [Wang et al., ], WaveCluster [Sheikholeslami et al., ])


    Future Work

    - Speeding up by using tree structures such as BDDs
    - Application to anomaly detection
    - Theoretical analysis, in particular the relation to Computable Analysis
      - Admissibility is a key property


    References

    [Berkhin, ] P. Berkhin. A survey of clustering data mining techniques. Grouping Multidimensional Data, pages , .

    [Chaoji et al., ] V. Chaoji, M. A. Hasan, S. Salem, and M. J. Zaki. SPARCL: An effective and efficient algorithm for mining arbitrary shape-based clusters. Knowledge and Information Systems, ():, .

    [Cilibrasi and Vitányi, ] R. Cilibrasi and P. M. B. Vitányi. Clustering by compression. IEEE Transactions on Information Theory, ():, .

    [Ester et al., ] M. Ester, H. P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of KDD, , , .

    [Guha et al., ] S. Guha, R. Rastogi, and K. Shim. CURE: An efficient clustering algorithm for large databases. Information Systems, ():, .

    [Handl et al., ] J. Handl, J. Knowles, and D. B. Kell. Computational cluster validation in post-genomic data analysis. Bioinformatics, ():, .

    [Karypis et al., ] G. Karypis, H. Eui-Hong, and V. Kumar. CHAMELEON: Hierarchical clustering using dynamic modeling. Computer, ():, .

    [Kontkanen and Myllymäki, ] P. Kontkanen and P. Myllymäki. An empirical comparison of NML clustering algorithms. In Proceedings of Information Theory and Statistical Learning, .

    [Kontkanen et al., ] P. Kontkanen, P. Myllymäki, W. Buntine, J. Rissanen, and H. Tirri. An MDL framework for data clustering. In P. Grünwald, I. J. Myung, and M. Pitt, editors, Advances in Minimum Description Length: Theory and Applications. MIT Press, .

    [Knuth, ] D. E. Knuth. The Art of Computer Programming, Volume , Fascicle : Generating All Tuples and Permutations. Addison-Wesley Professional, .

    [Qiu and Joe, ] W. Qiu and H. Joe. Generation of random clusters with specified degree of separation. Journal of Classification, :, .

    [Sheikholeslami et al., ] G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for very large spatial databases. In Proceedings of the th International Conference on Very Large Data Bases, pages , .

    [Ting and Wells, ] K. M. Ting and J. R. Wells. Multi-dimensional mass estimation and mass-based clustering. In Proceedings of th IEEE International Conference on Data Mining, pages , .

    [Tsuiki, ] Hideki Tsuiki. Real number computation through Gray code embedding. Theoretical Computer Science, ():, .

    [Wang et al., ] W. Wang, J. Yang, and R. Muntz. STING: A statistical information grid approach to spatial data mining. In Proceedings of the rd International Conference on Very Large Data Bases, pages , .

    [Weihrauch, ] K. Weihrauch. Computable Analysis. Springer, .