ECML PKDD, September

The Minimum Code Length for Clustering Using the Gray Code

Mahito Sugiyama†,‡, Akihiro Yamamoto†
† Kyoto University   ‡ JSPS Research Fellow
Contributions

1. The MCL (Minimum Code Length)
   – A new measure to score clustering results
   – Needed to distinguish each cluster under some fixed encoding scheme for real-valued variables
2. COOL (COding-Oriented cLustering)
   – A general clustering approach
   – Always finds the best clusters (i.e., the global optimal solution), which minimize the MCL, in O(nd)
   – No parameter tuning is needed
3. G-COOL (COOL with the Gray code)
   – Achieves internal cohesion and external isolation
   – Finds arbitrarily shaped clusters
Demonstration (Synthetic Dataset)

[Scatter plot of the synthetic dataset on [0, 1] × [0, 1]]
G-COOL

[Clustering result of G-COOL on the dataset]
K-means

[Clustering result of K-means on the dataset]
Results (Real datasets)

[Original images and their binary filterings]
Delta   Dragon   Europe   Norway   Ganges
Results (Real datasets)

[Clustering results of G-COOL and K-means]
Delta   Dragon   Europe   Norway   Ganges
Outline

1. Overview
2. Background and Our Strategy
3. MCL and Clustering
4. COOL Algorithm
5. G-COOL: COOL with the Gray Code
6. Experiments
7. Conclusion
Clustering Focusing on Compression

• The MDL approach [Kontkanen et al.]
  – Data encoding has to be optimized
    ∘ All encoding schemes are (implicitly) considered
    ∘ Time complexity ⩾ O(n²)
• The Kolmogorov complexity approach [Cilibrasi and Vitányi]
  – Measures the distance between data points based on compression of finite sequences
    ∘ Difficult to apply to multivariate data
  – The actual clustering process is traditional agglomerative hierarchical clustering
    ∘ Time complexity ⩾ O(n²)
• Neither approach is suitable for massive data
Our Strategy

• Requirements:
  1. Fast, and linear in the data size
  2. Robust to changes in input parameters
  3. Can find arbitrarily shaped clusters
• Solutions:
  1. Fix an encoding scheme for continuous variables
     – Motivated by Computable Analysis [Weihrauch]
  2. Clustering = discretizing real-valued data
     – Always finds the best results w.r.t. the MCL
  3. Use the Gray code for real numbers [Tsuiki]
     – Discretized data points overlap, and adjacent clusters are merged
MCL (Minimum Code Length)

• The MCL is the code length of the maximally compressed clusters under a fixed encoding scheme
• The MCL is computed in O(nd) using radix sort
  – n and d are the number of data points and the dimension, resp.

Example: X = {0.1, 0.2, 0.8, 0.9}
  𝒞1 = {{0.1, 0.2}, {0.8, 0.9}}
  𝒞2 = {{0.1}, {0.2, 0.8}, {0.9}}
  – Use binary encoding
  – Which is preferred?
Binary Encoding

[Figure: binary expansions on [0, 1]]
0.1 → 00011…   0.2 → 00110…   0.8 → 11001…   0.9 → 11100…
MCL with Binary Encoding

[Number line 0 – 0.25 – 0.5 – 0.75 – 1 with points A, B, C, D]

id  value
A   0.1
B   0.2
C   0.8
D   0.9
MCL with Binary Encoding

[Level 1: code 0 ↦ [0, 0.5), code 1 ↦ [0.5, 1)]

id  value  level-1 code
A   0.1    0
B   0.2    0
C   0.8    1
D   0.9    1

For 𝒞1 = {{0.1, 0.2}, {0.8, 0.9}}:  MCL = 1 + 1 = 2
MCL with Binary Encoding

[Levels 1–3; level-3 codes: 000, 001, 110, 111]

id  value  level-3 code
A   0.1    000
B   0.2    001
C   0.8    110
D   0.9    111

For 𝒞2 = {{0.1}, {0.2, 0.8}, {0.9}}:  MCL = 3 ⋅ 4 = 12
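The two example computations above can be reproduced with a short sketch. This is not the authors' O(nd) radix-sort algorithm; it is a naive 1-D illustration, assuming that taking the shortest discriminating binary prefix of each point and deduplicating within each cluster attains the minimum in these small examples:

```python
def bits(x, depth=20):
    """Finite prefix of the binary expansion of x in [0, 1)."""
    s = ""
    for _ in range(depth):
        x *= 2
        b = int(x)
        s += str(b)
        x -= b
    return s

def shortest_pure_prefix(x, others, depth=20):
    """Shortest prefix of bits(x) whose dyadic interval excludes all `others`."""
    bx = bits(x, depth)
    obits = [bits(y, depth) for y in others]
    for k in range(1, depth + 1):
        w = bx[:k]
        if all(not b.startswith(w) for b in obits):
            return w
    raise ValueError("depth too small to separate the points")

def mcl(partition, depth=20):
    """MCL of a partition of 1-D points: per cluster, sum the lengths of
    the deduplicated shortest prefixes that discriminate its points from
    the rest of the data set."""
    X = [x for C in partition for x in C]
    total = 0
    for C in partition:
        others = [y for y in X if y not in C]
        W = {shortest_pure_prefix(x, others, depth) for x in C}
        total += sum(len(w) for w in W)
    return total
```

With X = {0.1, 0.2, 0.8, 0.9}, `mcl([[0.1, 0.2], [0.8, 0.9]])` gives 2 and `mcl([[0.1], [0.2, 0.8], [0.9]])` gives 12, matching the slides.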
Definition of MCL

• Fix an embedding γ : ℝ^d → Σ^ω (usually Σ = {0, 1})
• For p ∈ range(γ) and P ⊂ range(γ), define
    Φ(p ∣ P) ≔ { w ∈ Σ* ∣ p ∈ ↑w, and P ∩ ↑v = ∅ for all v such that |v| = |w| and p ∈ ↑v }
  – Each element of Φ(p ∣ P) is a prefix that discriminates p from P

For a partition 𝒞 = {C1, …, CK} of a data set X,
    MCL(𝒞) ≔ ∑_{i ∈ {1,…,K}} L_i(𝒞),  where
    L_i(𝒞) ≔ min{ |W| ∣ γ(C_i) ⊆ ↑W and W ⊆ ⋃_{x ∈ C_i} Φ(γ(x) ∣ γ(X ∖ C_i)) }
Minimizing MCL and Clustering

Clustering under the MCL criterion is to find the global optimal solution that minimizes the MCL
  – Find 𝒞op such that
      𝒞op ∈ argmin_{𝒞 ∈ 𝒞(X)_{⩾K}} MCL(𝒞),
    where 𝒞(X)_{⩾K} = { 𝒞 ∣ 𝒞 is a partition of X and #𝒞 ⩾ K }
• We give the lower bound K on the number of clusters as an input parameter
  – 𝒞op becomes the single set {X} without this assumption
Optimization by COOL

• COOL solves the optimization problem in O(nd)
  – n and d are the number of data points and the dimension, resp.
    ∘ The naïve approach takes exponential time and space
  – The computing process of the MCL becomes the clustering process itself via discretization
• COOL is level-wise: it builds the level-k partition 𝒞k for k = 1, 2, …, which satisfies the following condition:
  – For all x, y ∈ X, they are in the same cluster ⟺ v = w for some v ⊏ γ(x) and w ⊏ γ(y) with |v| = |w| = k
    ∘ Level-k partitions form a hierarchy
    ∘ For each C ∈ 𝒞k, there exists 𝒟 ⊆ 𝒞k+1 such that ⋃𝒟 = C
• For all C ∈ 𝒞op, there exists k such that C ∈ 𝒞k
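The level-wise condition above can be sketched for plain binary encoding in one dimension: two points fall in the same level-k cluster exactly when their binary expansions share the first k bits. This `level_k_partition` helper is an illustration of that condition, not the paper's implementation:

```python
from collections import defaultdict

def level_k_partition(points, k):
    """Level-k partition under binary encoding: group 1-D points in
    [0, 1) by the first k bits of their binary expansion."""
    def prefix(x):
        digits = []
        for _ in range(k):
            x *= 2
            b = int(x)
            digits.append(str(b))
            x -= b
        return "".join(digits)
    groups = defaultdict(list)
    for x in points:
        groups[prefix(x)].append(x)
    return list(groups.values())
```

Refining k splits clusters but never merges them, which is the hierarchy property stated above: `level_k_partition([0.1, 0.2, 0.8, 0.9], 1)` yields two clusters, while level 3 splits them into four singletons.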
COOL with Binary Encoding

[Number line 0 – 0.25 – 0.5 – 0.75 – 1 with five points A, B, C, D, E]
COOL with Binary Encoding

[Level 1: intervals 0 and 1]

id  code
A   0
B   0
C   1
D   1
E   1

MCL = 1 + 1 = 2
COOL with Binary Encoding

[Level 2: intervals 00, 01, 10, 11]

id  code
A   00
B   01
C   10
D   10
E   11

MCL = 2 ⋅ 4 = 8
COOL with Binary Encoding

[Level 3 refines interval 10 into 100 and 101]

id  code
A   00
B   01
C   100
D   101
E   11

MCL = 6 + 6 = 12
Noise Filtering by COOL

• Noise filtering is easily implemented in COOL
• Define 𝒞⩾N ≔ {C ∈ 𝒞 ∣ #C ⩾ N} for a partition 𝒞
  – A cluster C is regarded as noise if #C < N
• Example: Given 𝒞 = {{0.1}, {0.4, 0.5, 0.6}, {0.9}}
  – 𝒞⩾2 = {{0.4, 0.5, 0.6}}, and 0.1 and 0.9 are noise
• We give the lower bound N on the cluster size as an input parameter

[Number line with N = 2: the middle three points form a cluster, the two outliers are noise]
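The 𝒞⩾N filter above is a one-liner in practice. A minimal sketch (the function name is ours, not the paper's):

```python
def filter_noise(partition, N):
    """Keep clusters with at least N points; everything else is noise."""
    clusters = [C for C in partition if len(C) >= N]
    noise = [x for C in partition if len(C) < N for x in C]
    return clusters, noise
```

On the slide's example, `filter_noise([[0.1], [0.4, 0.5, 0.6], [0.9]], 2)` returns the single cluster {0.4, 0.5, 0.6} and reports 0.1 and 0.9 as noise.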
Algorithm of COOL

Input: a data set X, two lower bounds K and N
Output: the optimal partition 𝒞op and the noise

function COOL(X, K, N)
  1: Find partitions 𝒞1⩾N, …, 𝒞m⩾N such that ‖𝒞m−1⩾N‖ < K ⩽ ‖𝒞m⩾N‖
  2: (𝒞op, MCL(𝒞op)) ← FINDCLUSTERS(X, K, {𝒞1⩾N, …, 𝒞m⩾N})
  3: return (𝒞op, X ∖ ⋃𝒞op)

function FINDCLUSTERS(X, K, {𝒞1, …, 𝒞m})
  1: Find k such that ‖𝒞k−1‖ < K and ‖𝒞k‖ ⩾ K
  2: 𝒞op ← 𝒞k
  3: if K = 2 then return (𝒞op, MCL(𝒞op))
  4: for each C in 𝒞1 ∪ … ∪ 𝒞k−1
  5:   (𝒞, L) ← FINDCLUSTERS(X ∖ C, K − 1, {𝒞1, …, 𝒞k})
  6:   if MCL(𝒞 ∪ C) < MCL(𝒞op) then 𝒞op ← C ∪ 𝒞
  7: return (𝒞op, MCL(𝒞op))
Gray Code

• Real numbers in [0, 1] are encoded with 0, 1, and ⊥
    Binary: 0.1 → 00011…,  0.25 → 00111…
    Gray:   0.1 → 00010…,  0.25 → 0⊥100…
• Originally another binary encoding of natural numbers
  – Especially important in applications converting between analog and digital information [Knuth]

The Gray code embedding is an injection γG that maps x ∈ [0, 1] to an infinite sequence p0 p1 p2 …, where
  – p_i ≔ 1 if 2^{−i}m − 2^{−(i+1)} < x < 2^{−i}m + 2^{−(i+1)} for an odd m,
    p_i ≔ 0 if the same holds for an even m, and
    p_i ≔ ⊥ if x = 2^{−i}m − 2^{−(i+1)} for some integer m
  – For a vector x = (x1, …, xd), γG(x) = p¹₁ … p^d₁ p¹₂ … p^d₂ …
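The definition above translates directly into code: p_i is the parity of the integer m nearest to x·2^i, and p_i = ⊥ exactly when x·2^i lies halfway between two integers. A sketch using exact rationals so that boundary (dyadic) points are detected reliably; pass a `Fraction` for exact results:

```python
from fractions import Fraction

def gray_expansion(x, depth=16):
    """Gray-code expansion p0 p1 p2 ... of x in [0, 1], following the
    definition: p_i = parity of the nearest integer m to x * 2^i, and
    p_i = ⊥ when x = 2^-i * m - 2^-(i+1) for some integer m."""
    x = Fraction(x)
    digits = []
    for i in range(depth):
        t = x * 2**i
        if (t + Fraction(1, 2)).denominator == 1:  # exactly halfway: boundary
            digits.append("⊥")
        else:
            digits.append(str(round(t) % 2))       # parity of nearest integer
    return "".join(digits)
```

This reproduces the slide's examples: `gray_expansion(Fraction(1, 10), 5)` gives "00010" and `gray_expansion(Fraction(1, 4), 5)` gives "0⊥100".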
Gray Code Embedding

[Figure: Gray code expansions on [0, 1]]
0.1 → 00010…   0.25 → 0⊥100…   0.8 → 10101…
COOL with Gray Code (G-COOL)

[Number line 0 – 0.25 – 0.5 – 0.75 – 1 with five points A, B, C, D, E]
COOL with Gray Code (G-COOL)

[Level 1: intervals 0 and 1, plus the overlapping interval ⊥1]

id  codes
A   0
B   0, ⊥1
C   1, ⊥1
D   1, ⊥1
E   1

MCL = 1 ⋅ 2 = 2
COOL with Gray Code (G-COOL)

[Level 2: intervals 00, 01, 10, 11, plus the overlapping intervals 0⊥1, ⊥10, 1⊥1]

id  codes
A   00
B   01, ⊥10
C   10, ⊥10
D   10, 1⊥1
E   11, 1⊥1

MCL = 2 ⋅ 3 = 6
COOL with Binary Encoding

[Level 1: intervals 0 and 1]

id  code
A   0
B   0
C   1
D   1
E   1

MCL = 1 + 1 = 2
Theoretical Analysis of G-COOL

• Use the Gray code as the fixed encoding in COOL
  – It achieves internal cohesion and external isolation
• Theorem: For the level-k partition 𝒞k, x, y ∈ X are in the same cluster if d∞(x, y) < 2^{−(k+1)}
  – Thus x, y are in different clusters only if d∞(x, y) ⩾ 2^{−(k+1)}
  – d∞(x, y) = max_{i ∈ {1,…,d}} |x_i − y_i| (the L∞ metric)
    ∘ Two adjacent intervals overlap, and they are agglomerated
• Corollary: In the optimal partition 𝒞op, for every x ∈ C (C ∈ 𝒞op), its nearest neighbor y is also in C
  – y is a nearest neighbor of x ⟺ y ∈ argmin_{y ∈ X} d∞(x, y)
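The theorem's guarantee can be illustrated numerically in one dimension: linking any two points closer than 2^{−(k+1)} and taking the transitive closure yields groups that the level-k Gray partition is guaranteed to keep together. The sketch below applies exactly this single-linkage rule to sorted 1-D points; it illustrates the guarantee, not the authors' prefix-based implementation:

```python
def gray_level_k_clusters(points, k):
    """Single-linkage merging of sorted 1-D points with threshold
    2^-(k+1): any two points closer than the threshold end up in the
    same cluster, mirroring the level-k guarantee of the theorem."""
    thr = 2 ** -(k + 1)
    pts = sorted(points)
    clusters, cur = [], [pts[0]]
    for a, b in zip(pts, pts[1:]):
        if b - a < thr:
            cur.append(b)      # closer than 2^-(k+1): same cluster
        else:
            clusters.append(cur)
            cur = [b]
    clusters.append(cur)
    return clusters
```

For example, at level k = 1 (threshold 0.25), the points {0.1, 0.12, 0.6, 0.62, 0.9} fall into three clusters.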
Demonstration of G-COOL

[Clustering results on [0, 1] × [0, 1]: G-COOL vs. COOL with the binary encoding]
Experimental Methods

• Analyze G-COOL empirically on synthetic and real datasets, compared to DBSCAN and K-means
  – Synthetic datasets were generated by the R package clusterGeneration [Qiu and Joe]
    ∘ n = 1,500 for each cluster and d = 3
  – Real datasets were geospatial images from Earth-as-Art
    ∘ Reduced in resolution and translated into binary images
  – All data were normalized by min-max normalization
• G-COOL was implemented in R
• Internal and external measures were used
  – Internal: MCL, connectivity, Silhouette width
  – External: adjusted Rand index
Results (Synthetic datasets)

[Plot: MCL (log scale; lower is better) vs. number of clusters (2–6) for G-COOL, DBSCAN, and K-means]
Data show mean ± s.e.m.; each experiment was performed 20 times
Results (Synthetic datasets)

[Plots: connectivity (lower is better) and Silhouette width (higher is better) vs. number of clusters (2–6) for G-COOL, DBSCAN, and K-means]
Results (Synthetic datasets)

[Plots: adjusted Rand index (higher is better) and runtime (s) vs. number of clusters (2–6) for G-COOL, DBSCAN, and K-means]
Results (Synthetic datasets)

[Plot: MCL vs. the noise parameter N (0–10) for G-COOL]
Data show mean ± s.e.m.; each experiment was performed 20 times
Results (Synthetic datasets)

[Plots: connectivity and Silhouette width vs. the noise parameter N (0–10)]
Results (Synthetic datasets)

[Plots: adjusted Rand index and runtime (s) vs. the noise parameter N (0–10)]
Results (Real datasets)

[Original images and their binary filterings]
Delta   Dragon   Europe   Norway   Ganges
Results (Real datasets)

[Clustering results of G-COOL and K-means]
Delta   Dragon   Europe   Norway   Ganges
Results (Real datasets)

Name     n    K    Running time (s)    MCL
                   GC       KM         GC    KM
Delta              .        .
Dragon             .        .
Europe             .        .
Norway             .        .
Ganges             .        .

GC: G-COOL, KM: K-means
Conclusion

• Integrated clustering and its evaluation in a coding-oriented manner
  – An effective solution for two essential problems: how to measure the goodness of results and how to find good clusters
    ∘ No distance calculation and no assumptions about the data distribution
• Key ideas:
  1. Fixing an encoding scheme for real-valued variables
     – Introduced the MCL, focusing on compression of clusters
     – Formulated clustering with the MCL, and constructed COOL, which finds the global optimal solution in linear time
  2. The Gray code
     – Showed the efficiency and effectiveness of G-COOL both theoretically and experimentally
Appendix
Notation (1/2)

• A datum x ∈ ℝ^d; a data set X = {x1, …, xn}
  – #X is the number of elements of X
  – X ∖ Y is the relative complement of Y in X
• Clustering is a partition of X into K subsets (clusters) C1, …, CK
  – Ci ≠ ∅ and Ci ∩ Cj = ∅
  – We call 𝒞 = {C1, …, CK} a partition of X
  – 𝒞(X) = {𝒞 ∣ 𝒞 is a partition of X}
• The sets of finite and infinite sequences over an alphabet Σ are denoted by Σ* and Σ^ω, resp.
  – The length |w| is the number of symbols other than ⊥
    ∘ If w = 11⊥100⊥⊥…, then |w| = 5
  – For a set of sequences W, |W| = ∑_{w ∈ W} |w|
Notation (2/2)

• An embedding of ℝ^d is an injective function γ from ℝ^d to Σ^ω
• For p, q ∈ Σ^ω, define p ⩽ q if p_i = q_i for all i with p_i ≠ ⊥
  – Intuitively, q is more concrete than p
• For w ∈ Σ*, we write w ⊏ p if w⊥^ω ⩽ p
  – ↑w = {p ∈ range(γ) ∣ w ⊏ p} for w ∈ Σ*
  – ↑W = {p ∈ range(γ) ∣ w ⊏ p for some w ∈ W} for W ⊆ Σ*
• The following monotonicity holds:
  – γ^{−1}(↑v) ⊆ γ^{−1}(↑w) iff v⊥^ω ⩾ w⊥^ω
Optimization by COOL

• The optimal partition 𝒞op can be constructed from the level-k partitions

  For all C ∈ 𝒞op, there exists k such that C ∈ 𝒞k

• The level-k partitions have a hierarchical structure
  – For each C ∈ 𝒞k we have ⋃𝒟 = C for some 𝒟 ⊆ 𝒞k+1
  – COOL is similar to divisive hierarchical clustering

  COOL always outputs the global optimal partition 𝒞op

• The time complexity is O(nd) (best) and O(nd + K!) (worst)
  – Usually K ≪ n holds, hence O(nd)
Clustering Process of COOL

[Animation on a 2-D dataset in [0, 1]²: with K = 2, the level-1 grid yields clusters 1–3; with K = 5, the level-2 grid refines them into clusters 1–6]
The Multi-Dimensional Gray Code

• Use the wrapping function φ(p1, …, pd) ≔ p¹₁ … p^d₁ p¹₂ … p^d₂ …

  Define the d-dimensional Gray code embedding γ^d_G : ℐ^d → Σ^ω_{⊥,d} by
  γ^d_G(x1, …, xd) ≔ φ(γG(x1), …, γG(xd))

• We omit d from γ^d_G when it is clear from the context
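The wrapping function φ is a plain digit interleaving. A minimal sketch on finite digit strings (infinite sequences are truncated to the shortest input):

```python
def wrap(*seqs):
    """Wrapping function φ: interleave d digit sequences as
    p1_1 ... pd_1 p1_2 ... pd_2 ..., truncated to the shortest input."""
    n = min(map(len, seqs))
    return "".join(s[i] for i in range(n) for s in seqs)
```

For instance, `wrap("abc", "xyz")` returns "axbycz": position 1 of every coordinate, then position 2, and so on.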
Internal Measures

• Connectivity [Handl et al.]
  – Conn(𝒞) = ∑_{x ∈ X} ∑_{i=1}^{M} f(x, nn(x, i))/i
    ∘ nn(x, i) is the i-th nearest neighbor of x; f(x, y) is 0 if x and y belong to the same cluster, and 1 otherwise
    ∘ M is an input parameter
  – Takes values from 0 to ∞; should be minimized
• Silhouette width
  – The average of the Silhouette value S(x) over all x:
      S(x) = (b(x) − a(x)) / max(b(x), a(x))
    ∘ a(x) = ‖C‖^{−1} ∑_{y ∈ C} d(x, y)  (x ∈ C)
    ∘ b(x) = min_{D ∈ 𝒞 ∖ {C}} ‖D‖^{−1} ∑_{y ∈ D} d(x, y)
  – Takes values from −1 to 1; should be maximized
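The connectivity formula above translates into a short sketch for 1-D points with |x − y| as the distance (this naive O(n² log n) version is ours; libraries use k-d trees for the neighbor search):

```python
def connectivity(points, labels, M=10):
    """Connectivity index: for each point, each of its M nearest
    neighbours that lies in a different cluster contributes 1/rank.
    Lower is better; 0 means every neighbourhood is cluster-pure."""
    n = len(points)
    total = 0.0
    for j in range(n):
        # neighbours of point j ranked by distance (1-D here)
        order = sorted((i for i in range(n) if i != j),
                       key=lambda i: abs(points[i] - points[j]))
        for rank, i in enumerate(order[:M], start=1):
            if labels[i] != labels[j]:
                total += 1.0 / rank
    return total
```

For the four points 0.1, 0.2, 0.8, 0.9 with labels (0, 0, 1, 1) and M = 2, each point's second-nearest neighbor crosses the cluster boundary, so the index is 4 · 1/2 = 2.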
External Measures

• Adjusted Rand index
  – Let the result be 𝒞 = {C1, …, CK} and the correct partition be 𝒟 = {D1, …, DM}
  – Let n_ij ≔ ‖{x ∈ X ∣ x ∈ Ci and x ∈ Dj}‖, a_i ≔ ‖Ci‖, and b_j ≔ ‖Dj‖. Then the index is

      [ ∑_{i,j} C(n_ij, 2) − (∑_i C(a_i, 2) ∑_j C(b_j, 2)) / C(n, 2) ]
      ÷ [ ½ (∑_i C(a_i, 2) + ∑_j C(b_j, 2)) − (∑_i C(a_i, 2) ∑_j C(b_j, 2)) / C(n, 2) ]

    where C(m, 2) denotes the binomial coefficient "m choose 2"
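The formula above follows directly from the contingency counts of the two partitions. A minimal sketch (assuming the partitions are given as per-point label lists):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two labelings: pair-counting agreement,
    corrected for chance via the hypergeometric expectation."""
    n = len(labels_a)
    nij = Counter(zip(labels_a, labels_b))   # contingency counts n_ij
    ai = Counter(labels_a)                   # row marginals
    bj = Counter(labels_b)                   # column marginals
    index = sum(comb(v, 2) for v in nij.values())
    sum_a = sum(comb(v, 2) for v in ai.values())
    sum_b = sum(comb(v, 2) for v in bj.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected)
```

Identical labelings score 1; chance-level agreement scores around 0.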
Discussion

• Results for synthetic datasets
  – Best performance under the internal measures
  – (Nearly) best performance under the external measure
  – G-COOL is efficient and effective
    ∘ DBSCAN is sensitive to input parameters
  – The MCL works well as an internal measure
• Results for real datasets
  – Not good, and not bad
    ∘ There are no clear clusters originally
• G-COOL is a good clustering method
Related Work

• Partitional methods [Chaoji et al.]
• Mass-based methods [Ting and Wells]
• Density-based methods (DBSCAN [Ester et al.])
• Hierarchical clustering methods (CURE [Guha et al.], CHAMELEON [Karypis et al.])
• Grid-based methods (STING [Wang et al.], WaveCluster [Sheikholeslami et al.])
Future Work

• Speeding up by using tree structures such as BDDs
• Application to anomaly detection
• Theoretical analysis, in particular the relation with Computable Analysis
  – Admissibility is a key property
References

[Berkhin] P. Berkhin. A survey of clustering data mining techniques. Grouping Multidimensional Data.
[Chaoji et al.] V. Chaoji, M. A. Hasan, S. Salem, and M. J. Zaki. SPARCL: An effective and efficient algorithm for mining arbitrary shape-based clusters. Knowledge and Information Systems.
[Cilibrasi and Vitányi] R. Cilibrasi and P. M. B. Vitányi. Clustering by compression. IEEE Transactions on Information Theory.
[Ester et al.] M. Ester, H. P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of KDD.
[Guha et al.] S. Guha, R. Rastogi, and K. Shim. CURE: An efficient clustering algorithm for large databases. Information Systems.
[Handl et al.] J. Handl, J. Knowles, and D. B. Kell. Computational cluster validation in post-genomic data analysis. Bioinformatics.
[Karypis et al.] G. Karypis, H. Eui-Hong, and V. Kumar. CHAMELEON: Hierarchical clustering using dynamic modeling. Computer.
[Knuth] D. E. Knuth. The Art of Computer Programming: Generating All Tuples and Permutations. Addison-Wesley Professional.
[Kontkanen and Myllymäki] P. Kontkanen and P. Myllymäki. An empirical comparison of NML clustering algorithms. In Proceedings of Information Theory and Statistical Learning.
[Kontkanen et al.] P. Kontkanen, P. Myllymäki, W. Buntine, J. Rissanen, and H. Tirri. An MDL framework for data clustering. In P. Grünwald, I. J. Myung, and M. Pitt, editors, Advances in Minimum Description Length: Theory and Applications. MIT Press.
[Qiu and Joe] W. Qiu and H. Joe. Generation of random clusters with specified degree of separation. Journal of Classification.
[Sheikholeslami et al.] G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for very large spatial databases. In Proceedings of the International Conference on Very Large Data Bases.
[Ting and Wells] K. M. Ting and J. R. Wells. Multi-dimensional mass estimation and mass-based clustering. In Proceedings of the IEEE International Conference on Data Mining.
[Tsuiki] H. Tsuiki. Real number computation through Gray code embedding. Theoretical Computer Science.
[Wang et al.] W. Wang, J. Yang, and R. Muntz. STING: A statistical information grid approach to spatial data mining. In Proceedings of the International Conference on Very Large Data Bases.
[Weihrauch] K. Weihrauch. Computable Analysis. Springer.