SCAN: A Structural Clustering Algorithm for Networks

Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, and Thomas Schweiger

KDD’07

An Introduction to DBSCAN

DBSCAN is a density-based algorithm.– Density = number of points within a specified radius (Eps)

– A point is a core point if it has more than a specified number of points (MinPts) within Eps

These are points that are at the interior of a cluster

– A border point has fewer than MinPts within Eps, but is in the neighborhood of a core point

– A noise point is any point that is not a core point or a border point.

DBSCAN: Core, Border, and Noise Points

DBSCAN Algorithm

Eliminate noise points Perform clustering on the remaining points

DBSCAN: Core, Border and Noise Points

Original Points Point types: core, border and noise

Eps = 10, MinPts = 4

When DBSCAN Works Well

Original Points Clusters

• Resistant to Noise

• Can handle clusters of different shapes and sizes

DBSCAN: Determining EPS and MinPts

Idea is that for points in a cluster, their kth nearest neighbors are at roughly the same distance

Noise points have the kth nearest neighbor at farther distance

So, plot sorted distance of every point to its kth nearest neighbor

Network Clustering Problem

Networks made up of the mutual relationships of data elements usually have an underlying structure. Because relationships are complex, it is difficult to discover these structures. How can the structure be made clear?

Stated another way, given simply information of who associates with whom, could one identify clusters of individuals with common interests or special relationships (families, cliques, terrorist cells).

An Example of Networks

How many clusters? What size should they

be? What is the best

partitioning? Should some points

be differentiated?

A Social Network Model

Individuals in a tight social group, or clique, know many of the same people, regardless of the size of the group.

Individuals who are hubs know many people in different groups but belong to no single group. Politicians, for example bridge multiple groups.

Individuals who are outliers reside at the margins of society. Hermits, for example, know few people and belong to no group.

The Neighborhood of a Vertex

Define () as the immediate neighborhood of a vertex (i.e. the set of people that an individual knows ).

Structure Similarity

The desired features tend to be captured by a measure we call Structural Similarity

Structural similarity is large for members of a clique and small for hubs and outliers.

|)(||)(|

|)()(|),(

Structural Connectivity [1]

-Neighborhood: Core: Direct structure reachable:

Structure reachable: transitive closure of direct structure reachability

Structure connected:

}),(|)({)( wvvwvN

|)(|)(, vNvCORE

)()(),( ,, vNwvCOREwvDirRECH

),(),(:),( ,,, wuRECHvuRECHVuwvCONNECT

[1] M. Ester, H. P. Kriegel, J. Sander, & X. Xu (KDD'97)

Structure-Connected Clusters

Structure-connected cluster C– Connectivity:

– Maximality:

Hubs:– Not belong to any cluster– Bridge to many clusters

Outliers:– Not belong to any cluster– Connect to less clusters

),(:, , wvCONNECTCwv

CwwvREACHCvVwv ),(:, ,

outlier

Algorithm

= 2 = 0.7

Algorithm

= 2 = 0.7

Algorithm

= 2 = 0.7

Algorithm

= 2 = 0.7

Algorithm

= 2 = 0.7

Algorithm

= 2 = 0.7

0.730.73

Algorithm

= 2 = 0.7

Algorithm

= 2 = 0.7

Algorithm

= 2 = 0.7

Algorithm

= 2 = 0.7

Algorithm

= 2 = 0.7

Algorithm

= 2 = 0.7 0.51

Algorithm

= 2 = 0.7

Running Time

Running time = O(|E|) For sparse networks = O(|V|)

[2] A. Clauset, M. E. J. Newman, & C. Moore, Phys. Rev. E 70, 066111 (2004).

Conclusion

We propose a novel network clustering algorithm:

– It is fast O(|E|), for scale free networks: O(|V|)

– It can find clusters, as well as hubs and outliers

SCAN: A Structural Clustering Algorithm for Networks

Documents

Email clustering algorithm

1 Clustering Algorithms Applications Hierarchical Clustering k -Means Algorithms CURE Algorithm

SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, and Thomas Schweiger KDD’07

UE 141 Clustering - University at Buffalojing/ue141/sp13/doc/Clustering1.pdfAgglomerative Clustering Algorithm • More popular hierarchical clustering technique • Basic algorithm

CARSO: Clustering Algorithm for Road Surveillance and Overtakinghuszak/publ/CARSO - Clustering Algorithm... · 2019-07-01 · CARSO: Clustering Algorithm for Road Surveillance and

DATA MINING LECTURE 8 Clustering The k-means algorithm Hierarchical Clustering The DBSCAN algorithm

"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web

Implementation of Clustering Algorithm using Graph

Genetic Algorithm Clustering

HCS Clustering Algorithm A Clustering Algorithm Based on Graph Connectivity

Trajectory clustering - Traclus Algorithm

K means Clustering Algorithm

K-means-Clustering Based Evolutionary Algorithm for Multi … · algorithm, genetic algorithm(GA), to solve MORAP. Using the K-means-clustering algorithm to divide the population

- itechprosolutions.initechprosolutions.in/.../uploads/2015/09/...Clustering-Algorithm-docx.… · Web viewFeature Selection based Fast Clustering Algorithm with Minimum Spanning

Introduction to Clustering algorithm

An efficient k-means clustering algorithm: analysis and ...mount/Projects/KMeans/pami02.pdfk-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy

Metaheuristic Clustering Algorithm · 2018-03-21 ·

1 Clustering Algorithms Hierarchical Clustering k -Means Algorithms CURE Algorithm

1 Microarray Clustering. 2 Outline Microarrays Hierarchical Clustering K-Means Clustering Corrupted Cliques Problem CAST Clustering Algorithm

HCS Clustering Algorithm