
StructMatrix: large-scale visualization of graphs by means of structure detection and dense matrices


Hugo Gualdron, Robson L. F. Cordeiro, Jose F Rodrigues-Jr

University of Sao Paulo

In collaboration with Carnegie Mellon University (Prof. Christos Faloutsos and PhD Danai Koutra)

Funding by research agency Fapesp (2013/03906-0, 2014/07879-0, 2015/18335)

In: The Fifth IEEE ICDM Workshop on Data Mining in Networks, Atlantic City, NJ, USA, November 2015

http://www.icmc.usp.br/pessoas/junio

Jose F Rodrigues-Jr (University of Sao Paulo) 1 / 20


Introduction

Motivation

Big Data!!!

A lot of information, much of it in the form of relationships;

Large-scale graphs: graphs generated by applications in which users or entities are distributed over large geographical areas, even the entire planet;

Social networks, recommendation networks, road nets, e-commerce, computer networks, client-product logs, and many others.

Data analysis is the differentiator in industrial competition.

General Electric & Accenture.


Problem

Such graphs are too big:

node-link visualization cannot handle graphs of even a thousand vertices;

adjacency matrices are limited by the number of pixels on the screen;

in any case, the sheer number of nodes prevents reasoning about them one by one;

non-visual analytical techniques may produce far too many patterns for human cognition.

Still, we want to characterize the structure of graphs for:

understanding the overall structure, beyond distribution-based analyses;

spotting outliers and trends that are not dominant;

requesting details on demand about subregions of the graph topology.


Layouts: node-link and adjacency matrix

[Figure: two panels, Node-link and Adjacency matrix]

Scalability: hundreds of nodes (node-link); thousands of nodes (adjacency matrix)


Methodology overview

Assumptions:

graphs are made of recurrent simple structures (cliques, bipartite cores, stars, and chains);

such structures are more meaningful than individual nodes;

even at lower resolutions, a visualization preserves the main properties of the graph.

Hypothesis: we reach more scalable and meaningful graph visualizations with:

graph summarization by detecting recurrent structures of the graph;

dense adjacency matrices.


Methodology

Proposed method: StructMatrix

Our method has two parts:

1 An algorithm to detect substructures;

2 A dense adjacency matrix of the structures that were detected.


1. Structure detection

We designed a graph partitioning algorithm based on the fact that real-world graphs obey power-law distributions;

In such graphs, a few nodes have very high degree and the majority of nodes have low degree;

Kang and Faloutsos [1] demonstrated that removing nodes in decreasing order of degree strips the hubs from the giant connected component (CC), creating satellite (much smaller) connected components;

This ordered removal lends itself to a structural scan of the graph.
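The effect of hub removal can be illustrated with a small sketch. It is not the authors' implementation: the toy graph and the helper names `connected_components` and `remove_hub_edges` are assumptions made for illustration. It only shows that stripping the edges of the highest-degree vertex breaks one connected component into several small satellites.

```python
from collections import defaultdict


def connected_components(adj):
    """Return the connected components of an undirected graph as a list of sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(comp)
    return components


def remove_hub_edges(adj, k):
    """Return a copy of adj in which the k highest-degree vertices lose all their edges."""
    hubs = set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k])
    return {v: (set() if v in hubs else adj[v] - hubs) for v in adj}


# Hypothetical toy graph: hub 0 links three small groups that are
# otherwise disconnected from each other.
edges = [(0, 1), (0, 3), (0, 5), (1, 2), (3, 4), (5, 6)]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

before = connected_components(adj)
after = connected_components(remove_hub_edges(adj, k=1))
sizes_after = sorted(len(c) for c in after)
print(len(before), sizes_after)
```

In this toy case the single component splits into the isolated hub plus three two-node satellites.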


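1. Structure detection – Structure vocabulary

[Figure: the StructMatrix vocabulary ψ: false star (fs), star (st), chain (ch), near clique (nc), full clique (fc), near bipartite core (nb), full bipartite core (fb)]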


1. Structure detection – Algorithm

1 If the queue has connected components, StructMatrix takes the first element for processing.


2 StructMatrix selects the highest-degree vertices (up to 1% of the vertices) and removes their edges.


2 We get a set of smaller connected subcomponents.


3 We classify the subcomponents according to the vocabulary.


1. Structure detection – Structure classification

\[ \alpha = \frac{n^2}{4} \qquad \beta = \frac{n(n-1)}{2} \qquad \varepsilon = 0.2 \]
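The slide gives only the thresholds, so the decision rules in the sketch below are assumptions: β is used as the edge count of a full clique, (1 − ε) as the tolerance for the "near" variants, and α as a quick upper bound on the edges a bipartite graph on n nodes can have. Here n is taken to be the number of nodes of the component. The function names are hypothetical, the input is assumed to be one connected component given as an adjacency dict, and false stars (fs) are not covered.

```python
def from_edges(edges):
    """Build an undirected adjacency dict {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def two_colour(adj):
    """Return the two sides of a bipartition, or None if adj is not bipartite."""
    colour = {}
    for start in adj:
        if start in colour:
            continue
        colour[start] = 0
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in colour:
                    colour[w] = 1 - colour[v]
                    stack.append(w)
                elif colour[w] == colour[v]:
                    return None
    side_a = {v for v, c in colour.items() if c == 0}
    return side_a, set(adj) - side_a


def classify(adj, eps=0.2):
    """Label one connected component; the rules are illustrative assumptions."""
    n = len(adj)
    if n < 3:
        return None
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # edge count
    alpha = n * n / 4          # bipartite graphs on n nodes have at most n^2/4 edges
    beta = n * (n - 1) / 2     # simple graphs on n nodes have at most n(n-1)/2 edges
    max_deg = max(len(nbrs) for nbrs in adj.values())
    if m == beta:
        return "fc"            # full clique
    if m >= (1 - eps) * beta:
        return "nc"            # near clique
    if m == n - 1 and max_deg == n - 1:
        return "st"            # star: one centre joined to every other node
    if m == n - 1 and max_deg == 2:
        return "ch"            # chain: a tree whose maximum degree is 2
    if m <= alpha:
        sides = two_colour(adj)
        if sides:
            full = len(sides[0]) * len(sides[1])
            if m == full:
                return "fb"    # full bipartite core
            if m >= (1 - eps) * full:
                return "nb"    # near bipartite core
    return None                # unidentified


path4 = from_edges([(0, 1), (1, 2), (2, 3)])
star5 = from_edges([(0, i) for i in range(1, 5)])
k4 = from_edges([(i, j) for i in range(4) for j in range(i + 1, 4)])
k23 = from_edges([(a, b) for a in (0, 1) for b in (2, 3, 4)])
print(classify(path4), classify(star5), classify(k4), classify(k23))
```

The order of the tests matters: a star is also a complete bipartite graph, so the star rule is checked before the bipartite rules.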


1. Structure detection – Algorithm

4 We store the classified subcomponents; the ones that were not identified go back to the queue, waiting for a new round of shattering.


5 We proceed to the next element in the queue.
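The numbered steps above can be sketched as a queue-driven loop. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, `classify` is any callable that returns a label or `None`, at least one vertex is always treated as a hub so that each round makes progress, and removed hubs are simply discarded (the slides do not say how the original method accounts for them).

```python
from collections import deque


def split_components(adj):
    """Split an adjacency dict into one adjacency dict per connected component."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, nodes = [start], []
        while stack:
            v = stack.pop()
            nodes.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        parts.append({v: set(adj[v]) for v in nodes})
    return parts


def shatter(adj, fraction=0.01):
    """Remove the edges of the highest-degree vertices (1% of them, at least one)."""
    k = max(1, int(len(adj) * fraction))
    hubs = set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k])
    return {v: (set() if v in hubs else adj[v] - hubs) for v in adj}


def detect_structures(adj, classify):
    """Run steps 1 to 5 until the queue of components is empty."""
    queue = deque(split_components(adj))
    found = []
    while queue:                                    # steps 1 and 5
        component = queue.popleft()
        for part in split_components(shatter(component)):  # step 2
            if len(part) < 2:
                continue                            # isolated vertex, nothing to classify
            label = classify(part)                  # step 3
            if label is not None:
                found.append((label, set(part)))    # step 4: store
            else:
                queue.append(part)                  # step 4: back to the queue
    return found


def toy_classify(part):
    """Toy classifier (assumption): recognise only full cliques of 3+ nodes."""
    n = len(part)
    is_clique = all(len(nbrs) == n - 1 for nbrs in part.values())
    return "fc" if n >= 3 and is_clique else None


# Hypothetical toy graph: hub 0 bridges two triangles.
edges = [(0, 1), (0, 2), (0, 4), (0, 5),
         (1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

found = detect_structures(adj, toy_classify)
result = sorted((label, sorted(nodes)) for label, nodes in found)
print(result)
```

The loop terminates because every requeued part excludes the hubs of its parent component and is therefore strictly smaller.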


1. Structure detection – Structure detection results

Graph           # Structures   fs    st    ch    nc    fc    nb    fb
DBLP                 160,885   76%   5%    2%    2%    15%   <1%   -
WWW-barabasi          15,652   32%   52%   5%    3%    2%    4%    2%
cit-HepPh             14,479   79%   13%   6%    1%    <1%   <1%   <1%
Wikipedia-vote         1,706   65%   33%   2%    -     -     <1%   -
Epinions               8,774   52%   31%   14%   <1%   <1%   2%    <1%
Roadnet PA            51,175   23%   45%   27%   -     -     5%    -
Roadnet CA            88,993   27%   39%   29%   -     -     4%    -
Roadnet TX            62,614   25%   43%   28%   -     -     4%    -


1. Structure detection – Runtime

Compared with the VoG algorithm (Koutra et al. [2]), StructMatrix has better performance and a bigger vocabulary.


2. Visualization – Projection

After structure detection, we build a structure-to-structure adjacency matrix whose edge weights indicate the number of edges between the nodes of each pair of structures;

Although smaller than the original matrix, for million-scale graphs the struct matrix is still too large to fit on the screen;

For this reason, we create a dense matrix according to a direct proportion (x, y) → (ρx, ρy), with:

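\[
\rho_x = \left\lceil (Res_x - 1)\,\frac{x - x_{min}}{x_{max} - x_{min}} + \frac{1}{2} \right\rceil
\qquad
\rho_y = \left\lceil (Res_y - 1)\,\frac{y - y_{min}}{y_{max} - y_{min}} + \frac{1}{2} \right\rceil
\tag{1}
\]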

where (x, y) are points of the original matrix and Resx, Resy are the target resolutions; the higher the resolution, the more details are presented, so these parameters allow details to be explored interactively.
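Equation (1) can be written directly in code. The sketch below is illustrative only: the function names are hypothetical, the bounds are taken from the data, x_max > x_min and y_max > y_min are assumed, and summing the weights of entries that land in the same cell is an assumption, since the slide does not say how collisions are combined.

```python
import math
from collections import defaultdict


def project(x, y, bounds, res_x, res_y):
    """Map a matrix coordinate (x, y) to a 1-based cell (rho_x, rho_y), as in Equation (1)."""
    x_min, x_max, y_min, y_max = bounds
    rho_x = math.ceil((res_x - 1) * (x - x_min) / (x_max - x_min) + 0.5)
    rho_y = math.ceil((res_y - 1) * (y - y_min) / (y_max - y_min) + 0.5)
    return rho_x, rho_y


def dense_matrix(entries, res_x, res_y):
    """Accumulate sparse (x, y, weight) entries into a res_x-by-res_y grid."""
    xs = [x for x, _, _ in entries]
    ys = [y for _, y, _ in entries]
    bounds = (min(xs), max(xs), min(ys), max(ys))
    grid = defaultdict(int)
    for x, y, weight in entries:
        # Assumption: weights of entries sharing a cell are summed.
        grid[project(x, y, bounds, res_x, res_y)] += weight
    return dict(grid)


# Hypothetical entries of a 1000 x 1000 struct matrix shown at 100 x 100.
entries = [(0, 0, 1), (999, 999, 2), (500, 250, 3), (501, 251, 4)]
grid = dense_matrix(entries, res_x=100, res_y=100)
print(grid)
```

The smallest coordinate maps to cell 1 and the largest to cell Res, so every entry falls inside the target grid; neighbouring entries such as (500, 250) and (501, 251) share a cell at this resolution.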


2. Visualization – Layout

We organize the matrix by structure type and by number of edges; the size of each structure (its number of nodes) is given by color.
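The ordering described above amounts to a two-key sort. In the sketch below, the `Structure` record, the order of the types in `TYPE_ORDER`, and the choice of decreasing edge count are all assumptions; the slides state only that rows are grouped by type and ordered by number of edges.

```python
from collections import namedtuple

# Hypothetical record: type label, number of nodes, number of edges.
Structure = namedtuple("Structure", "label nodes edges")

# Assumed display order of the structure types.
TYPE_ORDER = ["fs", "st", "ch", "nc", "fc", "nb", "fb"]


def layout_order(structures):
    """Sort by structure type, then by decreasing number of edges."""
    return sorted(structures, key=lambda s: (TYPE_ORDER.index(s.label), -s.edges))


structures = [
    Structure("st", 5, 4),
    Structure("fs", 10, 30),
    Structure("st", 9, 8),
    Structure("fc", 4, 6),
]
ordered = [(s.label, s.edges) for s in layout_order(structures)]
print(ordered)
```

The `nodes` field plays no part in the ordering; in the visualization it would drive the color of each cell.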


Experiments


Experiments – Real datasets – WWW-barabasi

WWW-barabasi: webpages and links between them.

Stars (st and fs) refer to webpages with many out-links.

Most webpages have fewer than one thousand connections; however, some have unusually many, on the order of thousands.


Experiments – Real datasets – Road nets

[Figure: three panels, Pennsylvania, California, and Texas]

The three road graphs have a similar structure, as all are U.S. road networks;

There is a hierarchical connectivity: bigger to smaller cities;

Surprising grid-like structure (due to symmetry): intersections refer to hub cities, and lines refer to inter-city paths.


Comparison: Structure-to-structure vs Node-to-node.

[Figure: two panels, California (structure-to-structure) and California (node-to-node)]

Main differences:

1 The partitioning according to structures;

2 The ordering by number of edges to other structures;

3 There is a hierarchical connectivity: bigger to smaller cities;

4 Surprising grid-like structure: intersections refer to hub cities, and lines refer to inter-city paths.


Experiments – Real datasets – DBLP

[Figure: two panels, Overall and FC-FC zoom]

DBLP is mainly characterized by false stars, possibly because advisors have students and the students connect to one another;

By zooming into FC-FC, one can see outliers, for instance k3 = “The Biomolecular Interaction Network Database and related tools 2005 update”, with 75 authors.


Conclusions

Contributions

Visualization technique: we introduce a processing and visualization methodology that combines algorithmic techniques and design to achieve large-scale visualizations;

Analytical scalability: our technique extends the most scalable technique found in the literature; in addition, it is engineered to plot millions of edges in a matter of seconds;

Practical analysis: we show that large-scale graphs have well-defined behaviors concerning the distribution of structures, their sizes, and how they relate to one another; finally, using a standard laptop, our techniques allowed us to experiment on real, large-scale graphs from high-impact domains, namely WWW, Wikipedia, Roadnet, and DBLP.


References

[1] U. Kang and C. Faloutsos, “Beyond 'caveman communities': Hubs and spokes for graph compression and mining,” in ICDM, 2011, pp. 300–309.

[2] D. Koutra, U. Kang, J. Vreeken, and C. Faloutsos, “VoG: Summarizing and understanding large graphs,” in SDM, 2014.
