Twitter Community Extraction by Markov Clustering


Twitter Community Extraction
(Beware of geeks bearing .gifs?)


Our Data Ourselves Project:

Giles Greenway, Tobias Blanke, Jennifer Pybus & Mark Coté

Departments of Digital Humanities & Culture, Media & Creative Industries, King's College London

Our aims are to increase our understanding of the nature and role of the data that young people produce when they use platforms and applications on their smartphones.

Social network community extraction is part of this.

Markov Clustering (MCL)

Assumptions: There are clusters of Twitter users with densely connected networks of friend/follower relationships.

If you take a random walk around the network, you are likely to stay within the cluster you started in.
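The random-walk assumption can be illustrated with a short simulation. This is a minimal sketch, assuming a hypothetical toy network of two 5-node cliques joined by a single edge; `two_cliques` and `fraction_staying` are illustrative names, not part of the project's code. Most three-step walks that start inside one clique are still inside it when they stop.

```python
import random

# Hypothetical toy network: two 5-node cliques, {0..4} and {5..9},
# joined by a single "bridge" edge between nodes 4 and 5.
def two_cliques(size=5):
    adj = {n: set() for n in range(2 * size)}
    for offset in (0, size):
        for i in range(offset, offset + size):
            for j in range(i + 1, offset + size):
                adj[i].add(j)
                adj[j].add(i)
    adj[size - 1].add(size)  # the bridge edge
    adj[size].add(size - 1)
    return adj

def fraction_staying(adj, start, cluster, steps=3, trials=2000, seed=42):
    """Fraction of short random walks from `start` that end inside `cluster`."""
    rng = random.Random(seed)
    stayed = 0
    for _ in range(trials):
        node = start
        for _ in range(steps):
            # Step to a uniformly random neighbour (sorted for reproducibility).
            node = rng.choice(sorted(adj[node]))
        if node in cluster:
            stayed += 1
    return stayed / trials

adj = two_cliques()
print(fraction_staying(adj, start=0, cluster=set(range(5))))
```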

MCL: A Trivial Example

1: Build an adjacency matrix for the graph.

2: Normalize the columns to produce transition probabilities.
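Steps 1 and 2 might look like this in NumPy, which the project used for MCL. This is a sketch under stated assumptions: the six-node graph is hypothetical, friend/follower links are treated as undirected, and a self-loop is added to every node, a common MCL convention that the slides do not mention.

```python
import numpy as np

# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6

# Step 1: adjacency matrix (assumption: links treated as undirected, so A is symmetric).
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Assumption: add self-loops, a common MCL convention not stated in the slides.
A += np.eye(n)

# Step 2: divide each column by its sum; M[i, j] is the probability of stepping from j to i.
M = A / A.sum(axis=0, keepdims=True)
print(M.sum(axis=0))  # every column should sum to 1
```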

3: Square the matrix to get probabilities after two steps.
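A sketch of the expansion step on a hypothetical three-node path graph 0-1-2 (with self-loops, columns already normalized): squaring the column-stochastic matrix sums over every intermediate node, so entry [i, j] of `M @ M` is the probability of going from j to i in exactly two steps.

```python
import numpy as np

# Hypothetical 3-node path graph 0-1-2 with self-loops; columns normalized (steps 1-2).
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
M = A / A.sum(axis=0, keepdims=True)

# Step 3 (expansion): M @ M gives the two-step transition probabilities.
M2 = M @ M

# Sanity check: entry [i, j] is the sum over intermediate nodes k of P(j -> k) * P(k -> i).
i, j = 2, 0
assert np.isclose(M2[i, j], sum(M[k, j] * M[i, k] for k in range(3)))
print(M2[2, 0])  # 1/6: the only two-step route from node 0 to node 2 is 0 -> 1 -> 2
```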

4: Square the matrix element-wise and re-normalize the columns.

5: Rinse and repeat steps 3 and 4 until convergence.
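Putting steps 1 to 5 together gives a minimal MCL loop. This is a sketch, not the project's implementation: the function name `mcl`, the inflation exponent of 2 (matching the element-wise squaring above), the self-loops, the iteration cap and the convergence tolerance are all assumptions, and the toy graph is hypothetical.

```python
import numpy as np

def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Minimal Markov Clustering sketch: expansion (matrix squaring) and
    inflation (element-wise power, then column re-normalization) until M stops changing."""
    M = adjacency.astype(float) + np.eye(len(adjacency))  # assumption: add self-loops
    M /= M.sum(axis=0, keepdims=True)                      # column-stochastic
    for _ in range(max_iter):
        prev = M
        M = M @ M                                          # step 3: expansion
        M = M ** inflation                                 # step 4: element-wise power...
        M /= M.sum(axis=0, keepdims=True)                  # ...and re-normalize columns
        if np.allclose(M, prev, atol=tol):                 # step 5: stop at convergence
            break
    return M

# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(np.round(mcl(A), 3))
```

Raising the inflation exponent above 2 typically yields smaller, tighter clusters; each iteration costs a dense matrix product, which is consistent with MCL being slow on large networks.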

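At convergence, the matrix entries will typically be 0 or 1. Interpret the rows as: "If I'm at this row's node, which column nodes are credible start-points?" Each row with non-zero entries defines one cluster: the column nodes in that row.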
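Reading clusters off a converged matrix can be sketched as follows. The converged matrices here are hypothetical and written by hand, and `clusters_from_converged` and its tolerance are illustrative names, not the project's code.

```python
import numpy as np

def clusters_from_converged(M, tol=1e-6):
    """Each row with non-zero entries (an 'attractor') collects the column
    nodes whose random walks end up there; those columns form one cluster."""
    found = []
    for row in M:
        members = frozenset(np.flatnonzero(row > tol).tolist())
        # Skip empty rows, and merge attractors that share the same cluster.
        if members and members not in found:
            found.append(members)
    return [sorted(c) for c in found]

# Hypothetical converged matrix: node 2 attracts {0, 1, 2}, node 3 attracts {3, 4, 5}.
M = np.zeros((6, 6))
M[2, :3] = 1.0
M[3, 3:] = 1.0
print(clusters_from_converged(M))  # [[0, 1, 2], [3, 4, 5]]
```

When two attractors share a cluster, their rows carry fractional entries (e.g. 0.5) over the same columns, which is why identical column sets are merged.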

Does it work?

Gephi's OpenOrd layout is meant to emphasise clusters. Are nodes in the same cluster close together?

Compare with Gephi's own modularity algorithm, the Louvain method.

MCL was applied to the Twitter accounts of two digital culture researchers, with ~7000 once-removed friend/follower relationships.

Why did Gephi put these two in the same modularity class?

Researchers rated clusters for both methods.


Cluster is identifiable and relevant: 20% / 0% !

Cluster is not identifiable, but possibly relevant: 37%

Cluster is neither identifiable nor relevant: 43%


Acquire Twitter data with Twython/Celery/Redis/RabbitMQ.

Store Twitter data with Neo4j/Py2Neo.

Perform MCL with NumPy.

Export to Gephi with NetworkX.
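The export step might look like this with NetworkX. The node names and cluster labels are hypothetical, and `mcl_cluster` is an illustrative attribute name; `nx.set_node_attributes` and `nx.write_gexf` are standard NetworkX functions, and GEXF is a graph format Gephi reads natively.

```python
import networkx as nx

# Hypothetical data: a small friend/follower graph and cluster labels produced by MCL.
G = nx.Graph()
G.add_edges_from([("alice", "bob"), ("bob", "carol"), ("alice", "carol"),
                  ("dave", "erin"), ("carol", "dave")])
mcl_cluster = {"alice": 0, "bob": 0, "carol": 0, "dave": 1, "erin": 1}

# Attach each node's cluster as a node attribute so Gephi can colour or filter by it.
nx.set_node_attributes(G, mcl_cluster, name="mcl_cluster")

# Write a GEXF file that can be opened in Gephi.
nx.write_gexf(G, "twitter_mcl.gexf")
```

In Gephi, the exported attribute sits alongside the modularity class that Gephi computes itself, so both clusterings can be compared on the same layout.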


The Louvain method works by combining smaller clusters to maximize modularity. Do the very high node degrees in Twitter networks harm its performance?
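Modularity, the quantity the Louvain method maximizes, can also score an MCL clustering, which puts the two methods on a common scale. The following is a minimal sketch of the standard Newman-Girvan modularity for an undirected graph, not of the Louvain method itself; the function name and the toy graph are hypothetical.

```python
import numpy as np

def modularity(A, labels):
    """Newman-Girvan modularity of a partition of an undirected graph:
    Q = (1 / 2m) * sum_ij [A_ij - k_i * k_j / 2m] * delta(c_i, c_j)."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                            # node degrees
    two_m = k.sum()                              # 2m = twice the number of edges
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]    # delta(c_i, c_j)
    return ((A - np.outer(k, k) / two_m) * same).sum() / two_m

# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(modularity(A, [0, 0, 0, 1, 1, 1]))  # ≈ 0.357 (5/14): the two triangles
print(modularity(A, [0, 0, 0, 0, 0, 0]))  # ≈ 0: everything in one community
```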

MCL produces highly relevant clusterings, albeit rather slowly.
