Clustering
Constraint-Based Cluster Analysis
1
Constraint-Based Clustering
- Constraint-based clustering finds clusters that satisfy user-specified preferences or constraints
- It is desirable for the clustering process to take user preferences and constraints into consideration
  - Expected number of clusters
  - Maximal / minimal cluster size
  - Weights for dimensions / important dimensions
- Mining becomes focused
2
Categories of Constraints
- Constraints on individual objects
  - Example: luxury mansions worth over a million dollars
  - Processed through selection
- Constraints on the selection of clustering parameters
  - Number of clusters, radius, MinPts
  - Not strictly constraint-based clustering
- Constraints on distance or similarity functions
  - Different measures for specific attributes / objects
  - Weighting process
  - Clustering with obstacle objects
- User-specified constraints on properties of individual clusters
  - Clusters satisfy given properties
- Semi-supervised clustering based on partial supervision
  - Pair-wise constraints
3
Clustering with Obstacle Objects
- A city contains rivers, lakes, bridges, roads, etc.
  - Obstacles must be avoided
  - The distance function between objects must be redefined
- Straight-line distance is meaningless
- With a partitioning approach, distance calculation with obstacles becomes expensive
  - k-means is not suitable, as a cluster centre may lie on an obstacle
  - k-medoids can be used, and the distance between objects can be determined using triangulation
4
Clustering with Obstacles
- Point p is visible from q in region R if the straight line between p and q does not intersect any obstacle
- Visibility graph (VG)
  - Each vertex of an obstacle has a corresponding node
  - There is an edge between two nodes only if the corresponding vertices are visible to each other
  - Additional points can be added and paths can be determined
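
The visibility test and the resulting obstructed distance can be sketched as follows. This is a minimal illustration, not the method from the slides: it assumes obstacles are thin wall segments rather than filled polygons, ignores collinear edge cases, and the function names (`visible`, `obstructed_distance`) are hypothetical. The query points p and q are added to the graph as extra nodes, and Dijkstra's algorithm finds the shortest path.

```python
import heapq
import math

def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): +1, 0 or -1."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def _crosses(p, q, a, b):
    """True if segment pq properly crosses segment ab (touching endpoints do not count)."""
    return (_orient(p, q, a) * _orient(p, q, b) < 0
            and _orient(a, b, p) * _orient(a, b, q) < 0)

def visible(p, q, walls):
    """p is visible from q if the straight line pq crosses no wall (assumed thin segments)."""
    return not any(_crosses(p, q, a, b) for a, b in walls)

def obstructed_distance(p, q, walls):
    """Length of the shortest path from p to q over the visibility graph."""
    # Nodes: the two query points plus every wall endpoint (duplicates removed).
    nodes = list(dict.fromkeys([p, q] + [v for w in walls for v in w]))
    dist = {n: math.inf for n in nodes}
    dist[p] = 0.0
    heap = [(0.0, p)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == q:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v in nodes:
            # An edge exists only between mutually visible nodes.
            if v != u and visible(u, v, walls):
                nd = d + math.dist(u, v)
                if nd < dist[v]:
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
    return math.inf

wall = [((1, -1), (1, 1))]                      # one wall between the two points
print(visible((0, 0), (2, 0), wall))            # the wall blocks the direct line
print(obstructed_distance((0, 0), (2, 0), wall))
```

Edges are tested on demand here for brevity; a practical version would build the graph once and reuse it for many distance queries.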
5
Clustering with Obstacles
- To reduce the cost of distance computation, points can be grouped into micro-clusters
  - Triangulate the region
  - Group nearby points in the same triangle into micro-clusters
  - Process micro-clusters instead of points
- Shortest paths are computed in terms of:
  - VV indices: for each pair of obstacle objects
  - MV indices: for each pair of a micro-cluster and an obstacle object
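
The grouping step can be sketched as below, under simplifying assumptions: the triangulation is taken as given (a list of triangles), and every point inside a triangle goes into a single micro-cluster summarized by its count and centroid. The helper names are hypothetical, and a fuller version might split one triangle into several micro-clusters of nearby points.

```python
from collections import defaultdict

def _side(a, b, p):
    """Cross product telling on which side of line ab the point p lies."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def in_triangle(p, tri):
    """True if p lies inside or on the boundary of triangle tri."""
    a, b, c = tri
    s = (_side(a, b, p), _side(b, c, p), _side(c, a, p))
    return not (any(v < 0 for v in s) and any(v > 0 for v in s))

def micro_clusters(points, triangles):
    """Map triangle index -> (point count, centroid) for the points it contains."""
    groups = defaultdict(list)
    for p in points:
        for i, tri in enumerate(triangles):
            if in_triangle(p, tri):
                groups[i].append(p)
                break  # a boundary point is assigned to the first matching triangle
    summary = {}
    for i, pts in groups.items():
        n = len(pts)
        summary[i] = (n, (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n))
    return summary

# A square split into two triangles (an assumed, hand-made triangulation).
tris = [((0, 0), (4, 0), (0, 4)), ((4, 0), (4, 4), (0, 4))]
print(micro_clusters([(1, 1), (1, 2), (3, 3)], tris))
```

Later stages can then work on the centroids, weighted by count, instead of on the individual points.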
6
Clustering with Obstacles
7
User-Constrained Cluster Analysis
- Example: relocating package delivery centres
  - N customers: high-value and ordinary customers
  - Determine locations for k service stations
  - Constraints: each station should serve
    - At least 100 high-value customers
    - At least 5000 ordinary customers
- This is a constrained optimization problem
  - A direct mathematical approach is expensive
8
User-Constrained Cluster Analysis
- Micro-clustering
  - Initially find a partition into k groups satisfying the given constraints
  - Iteratively refine the solution
    - Move m customers from cluster Ci to Cj if Ci has at least m surplus customers
    - The move is made only if the total sum of distances (objects to centres) is reduced
    - Refinement can be directed by selecting promising points
    - Deadlock (a state in which a constraint cannot be satisfied) has to be avoided
  - Micro-clusters can be processed instead of points
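
The refinement loop can be illustrated with a deliberately simplified sketch. The assumptions are: a single customer type, a single minimum-size constraint per cluster (`min_size`), fixed centres, and moves of one customer at a time (m = 1). The function name `refine` is hypothetical. A customer moves only when its current cluster has surplus and the move shortens its distance to the centre, so the total sum of distances strictly decreases with every move.

```python
import math

def refine(points, centers, assign, min_size):
    """Greedy refinement of an initial partition that already satisfies min_size."""
    assign = list(assign)  # work on a copy
    sizes = [assign.count(j) for j in range(len(centers))]
    improved = True
    while improved:
        improved = False
        for i, p in enumerate(points):
            src = assign[i]
            if sizes[src] <= min_size:
                continue  # no surplus: moving this customer would break the constraint
            best = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            if math.dist(p, centers[best]) < math.dist(p, centers[src]):
                assign[i] = best  # the move reduces the total distance
                sizes[src] -= 1
                sizes[best] += 1
                improved = True
    return assign

centers = [(0, 0), (10, 0)]
points = [(0, 0), (1, 0), (9, 0), (10, 0)]
print(refine(points, centers, [0, 0, 0, 1], min_size=1))
```

Because every accepted move lowers the total distance and there are finitely many partitions, the loop terminates. Recomputing centres between passes and handling two customer classes are left out of this sketch.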
9
Semi-Supervised Cluster Analysis
- Constraint-based semi-supervised clustering
  - Relies on user-provided labels or constraints
  - Initialize based on labeled objects
  - Modify the objective function
- Distance-based semi-supervised clustering
  - An adaptive distance measure is trained to satisfy the labels or constraints
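
One common way to express such constraints is as must-link and cannot-link pairs that are checked during assignment. The sketch below is an illustration of that idea only, not an algorithm given in the slides. The names `violates` and `constrained_assign` are hypothetical, centres are assumed fixed, and points are assigned in input order to the nearest centre that breaks no constraint.

```python
import math

def violates(i, cluster, assign, must_link, cannot_link):
    """True if putting point i into `cluster` breaks a pairwise constraint."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        # The partner is already placed, but in a different cluster.
        if other is not None and assign.get(other) not in (None, cluster):
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        # The partner already sits in the candidate cluster.
        if other is not None and assign.get(other) == cluster:
            return True
    return False

def constrained_assign(points, centers, must_link, cannot_link):
    """Assign each point to the nearest centre that satisfies all constraints."""
    assign = {}
    for i, p in enumerate(points):
        order = sorted(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for j in order:
            if not violates(i, j, assign, must_link, cannot_link):
                assign[i] = j
                break
        else:
            raise ValueError(f"no feasible cluster for point {i}")
    return assign

points = [(0, 0), (1, 0), (9, 0), (10, 0)]
centers = [(0, 0), (10, 0)]
print(constrained_assign(points, centers, must_link=[(0, 3)], cannot_link=[]))
```

The `ValueError` branch corresponds to a situation where the constraints cannot all be honoured with the given assignment order.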
10
CLTree (CLustering based on decision TREEs)
- Integrates unsupervised clustering with supervised classification
- Transforms the clustering task into a classification task
  - The points to be clustered are labeled Y
  - A set of non-existence points, labeled N, is added
11
Semi-Supervised Cluster Analysis
- Non-existence points
  - They are not added physically
  - For decision tree construction only the number of N points is needed, not the actual points
  - At the root node, the number of inherited “N” points is 0
  - At any current node E, if the number of “N” points inherited from the parent node of E is less than the number of “Y” points in E, then the number of “N” points for E is increased to the number of “Y” points in E
  - The basic idea is to use as many “N” points as there are “Y” points
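
The counting rule above amounts to taking a maximum, as the short sketch below shows. The second helper, `inherited_n`, goes beyond what the slide states: it assumes the N points are spread uniformly over the node's region, so a child produced by a cut inherits N points in proportion to its share of the cut interval. Both function names are hypothetical.

```python
def n_points_for_node(inherited_n, y_count):
    """Number of N points used at a node: raised to the Y count when fewer were inherited."""
    return max(inherited_n, y_count)

def inherited_n(n_points, lo, hi, cut):
    """Assumed uniform spread: split n_points between the children of a cut at `cut`."""
    left = n_points * (cut - lo) / (hi - lo)
    return left, n_points - left

print(n_points_for_node(0, 25))    # root: nothing inherited, so the Y count is used
print(n_points_for_node(40, 25))   # enough inherited N points, count unchanged
print(inherited_n(40, lo=0, hi=10, cut=2.5))
```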
12
Semi-Supervised Cluster Analysis
- Decision tree splitting
  - Information gain
  - CLTree forms initial cuts and looks ahead to find better partitions that cut less into cluster regions
- CLTree
  - Handles high-dimensional space
  - Subspace clusters are determined
  - Empty regions can also be detected
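
For the information-gain criterion, a minimal sketch of scoring one candidate cut from Y and N counts is given below. It uses the standard entropy-based definition of information gain for two classes. The function names are hypothetical, both sides of the cut are assumed non-empty, and the look-ahead step is not covered.

```python
import math

def entropy(y, n):
    """Two-class entropy (in bits) of a region holding y Y-points and n N-points."""
    total = y + n
    h = 0.0
    for c in (y, n):
        if c > 0:
            h -= (c / total) * math.log2(c / total)
    return h

def info_gain(left, right):
    """Gain of a cut; left and right are (Y count, N count) pairs for the two sides."""
    (yl, nl), (yr, nr) = left, right
    tl, tr = yl + nl, yr + nr
    total = tl + tr
    parent = entropy(yl + yr, nl + nr)
    return parent - (tl / total) * entropy(yl, nl) - (tr / total) * entropy(yr, nr)

# A cut that separates a dense (all-Y) side from an empty (all-N) side scores highest.
print(info_gain((10, 0), (0, 10)))
print(info_gain((5, 5), (5, 5)))
```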
13