Graph Community Detection Algorithm for Distributed Memory Parallel Computing Systems

(Credit: X. Que, F. Checconi, F. Petrini, J. A. Gunnels, IBM Research)

Alexander Pozdneev, Technical Consultant, High Performance Computing, IBM

March 5, 2015, GraphHPC-2015


Outline

1 Introduction

2 Scalable graph community detection with the Louvain algorithm

3 Conclusion

4 References

© 2015 IBM Corporation


Outline

1 Introduction
  Big Data challenges
  IBM leadership in graph processing

2 Scalable graph community detection with the Louvain algorithm

3 Conclusion

4 References


Big Data challenges


IBM leadership in graph processing: Graph500

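| Date | Rank | System | Model | Nodes | Cores | Scale | GTEPS |
|---|---|---|---|---|---|---|---|
| Nov’14 | 1 | Sequoia | Q | 96k | 1.5M | 41 | 23751 |
| Jun’14 | 2 | Sequoia | Q | 64k | 1M | 40 | 16599 |
| Nov’13 | 1 | Sequoia | Q | 64k | 1M | 40 | 15363 |
| Jun’13 | 1 | Sequoia | Q | 64k | 1M | 40 | 15363 |
| Nov’12 | 1 | Sequoia | Q | 64k | 1M | 40 | 15363 |
| Jun’12 | 1 | Sequoia/Mira | Q | 32k | 512k | 38 | 3541 |
| Nov’11 | 1 | BG/Q prototype | Q | 4k | 64k | 32 | 253 |
| Jun’11 | 1 | Intrepid/Jugene | P | 32k | 128k | 38 | 18 |
| Nov’10 | 1 | Intrepid | P | 8k | 32k | 36 | 7 |

Rank: position in the Graph500 list; Model: Q = Blue Gene/Q, P = Blue Gene/P; Scale: base-2 logarithm of the number of graph vertices; GTEPS: billions of traversed edges per second.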


Outline

1 Introduction

2 Scalable graph community detection with the Louvain algorithm
  The graph community detection problem
    Problem definition
    The modularity metric
  Sequential Louvain algorithm
    Modularity gain
  Hash-based data organization
  Novel convergence heuristic
  Parallel Louvain algorithm
    Community state propagation
    Community refinement
    Graph reconstruction
  Scalability analysis

3 Conclusion

4 References


Reference

X. Que, F. Checconi, F. Petrini, J. Gunnels. "Scalable Community Detection with the Louvain Algorithm". 29th IEEE International Parallel & Distributed Processing Symposium, Hyderabad International Convention Centre, Hyderabad, India, May 25-29, 2015. http://dx.doi.org/10.1109/IPDPS.2015.59


The graph community detection problem

• An important problem that spans many research areas:
  ▸ health care
  ▸ social networks
  ▸ systems biology
  ▸ power grid optimization, etc.

• Graph community detection algorithms attempt to identify:
  ▸ modules
  ▸ their hierarchical organization

• Challenges that limit overall scalability and performance:
  ▸ fine-grained communication
  ▸ irregular access patterns to memory and the interconnect

• Open research problem:
  ▸ strong scalability
  ▸ high quality of community detection


Graph community detection: problem definition

• A weighted directed graph $G = (V, E)$:
  ▸ $V$: set of vertices
  ▸ $E$: set of edges
  ▸ for $u, v \in V$, an edge $e(u, v) \in E$ has weight $w_{u,v}$

• The goal of community detection is to partition graph $G$ into a set $C$ of disjoint communities $c_i$:

$$\bigcup_{c_i \in C} c_i = V$$

$$c_i \cap c_j = \emptyset, \quad \forall\, c_i, c_j \in C,\ i \neq j$$
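The two conditions above can be checked mechanically. A minimal sketch, assuming communities are given as Python sets of vertex ids; the function name `is_partition` is hypothetical:

```python
def is_partition(vertices, communities):
    """Check that communities are pairwise disjoint and cover all vertices."""
    covered = set()
    for c in communities:
        if covered & c:                        # c_i ∩ c_j is non-empty for some i != j
            return False
        covered |= c
    return covered == set(vertices)            # union of all c_i must equal V
```

For example, `{0, 1}` and `{2, 3}` partition vertices 0..3, while `{0, 1}` and `{1, 2, 3}` do not, because vertex 1 appears in both.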

• Vertices in the same community are densely connected
• Vertices in different communities are sparsely connected
• The modularity [Newman, 2004] quantifies the quality of a community structure
• Empirically, a higher modularity value indicates a better partition


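Graph community detection: the modularity metric

Modularity $Q$:

$$Q = \sum_{c \in C} \left[ \frac{\Sigma^c_{in}}{2m} - \left( \frac{\Sigma^c_{tot}}{2m} \right)^2 \right]$$

$\Sigma^c_{in}$: the sum of the weights of all internal edges of community $c$,

$$\Sigma^c_{in} = \sum_{\substack{u, v \in c \\ e(u,v) \in E}} w_{u,v}$$

$\Sigma^c_{tot}$: the sum of the weights of edges incident to any vertex in $c$,

$$\Sigma^c_{tot} = \sum_{\substack{u \in c \text{ or } v \in c \\ e(u,v) \in E}} w_{u,v}$$

$m$: normalization factor, the sum of the weights across the graph,

$$m = \sum_{e(u,v) \in E} w_{u,v}$$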
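A minimal Python sketch of the metric, assuming an undirected graph given as `(u, v, w)` tuples with each edge listed once and no self-loops. Under that assumption $m$ is the total edge weight, $\Sigma^c_{in}$ counts each internal edge in both directions, and $\Sigma^c_{tot}$ is read as the sum of weighted degrees of the vertices in $c$ (so an internal edge contributes twice). This is the usual undirected convention, under which the single-community partition has $Q = 0$. The function name `modularity` is illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected weighted graph.

    edges: (u, v, w) tuples, each undirected edge listed once, no self-loops.
    community: dict mapping every vertex to its community id.
    """
    m = sum(w for _, _, w in edges)            # total edge weight
    sigma_in = defaultdict(float)              # internal weight, both directions
    sigma_tot = defaultdict(float)             # sum of weighted degrees in c
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        sigma_tot[cu] += w
        sigma_tot[cv] += w
        if cu == cv:
            sigma_in[cu] += 2 * w
    return sum(sigma_in[c] / (2 * m) - (sigma_tot[c] / (2 * m)) ** 2
               for c in sigma_tot)
```

For two unit-weight triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3), splitting along the triangles gives $Q = 2\,(6/14 - (7/14)^2) = 5/14 \approx 0.357$, while putting every vertex in one community gives $Q = 0$.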


Sequential Louvain algorithm

The Louvain algorithm [Blondel, 2008] is a popular greedy algorithm for community detection.

1. Put all vertices into distinct communities (one per vertex)
2. Refine communities
   ▸ For each vertex $i$:
     • compute $\Delta Q_{i \to c(j)}$ for each neighbor $j$
     • join the community $c(j)$ that yields the largest positive gain $\Delta Q$
   ▸ Repeat until no move yields a gain
3. Reconstruct the graph
   ▸ The communities become supervertices
   ▸ The weights of edges between communities are summed
4. Repeat steps 2 and 3 until convergence
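The steps above can be sketched in Python for an undirected weighted graph stored as a symmetric adjacency dict (`adj[u][v] = w`), with at least one edge and no self-loops in the input. This is a plain sequential sketch, not the parallel implementation discussed in this talk, and the helper names (`build_adj`, `one_level`, `aggregate`, `louvain`) are hypothetical. For the move decision it uses the standard sequential form of the gain, $k_{u,c} - \Sigma^c_{tot}\, k_u / (2M)$, where $k_u$ is the weighted degree of $u$, $k_{u,c}$ the weight from $u$ into $c$, and $2M$ the sum of all weighted degrees. For a fixed vertex this is a positive multiple of the modularity gain, so it ranks candidate communities the same way.

```python
from collections import defaultdict


def build_adj(edges):
    """Symmetric adjacency dict from (u, v, w) tuples (no self-loops)."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    return dict(adj)


def one_level(adj):
    """Step 2: move vertices greedily until no move improves modularity."""
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    two_m = sum(degree.values())               # 2M: sum of weighted degrees
    comm = {u: u for u in adj}                 # step 1: one community per vertex
    tot = dict(degree)                         # Sigma_tot of each community
    moved = True
    while moved:
        moved = False
        for u in adj:
            links = defaultdict(float)         # k_{u,c} for neighbouring communities
            for v, w in adj[u].items():
                if v != u:                     # a self-loop does not affect the choice
                    links[comm[v]] += w
            old = comm[u]
            tot[old] -= degree[u]              # take u out of its community
            best = old
            best_gain = links[old] - tot[old] * degree[u] / two_m
            for c, k_uc in links.items():
                gain = k_uc - tot[c] * degree[u] / two_m
                if gain > best_gain:           # move only on a strict improvement
                    best, best_gain = c, gain
            tot[best] += degree[u]
            comm[u] = best
            moved = moved or best != old
    return comm


def aggregate(adj, comm):
    """Step 3: communities become supervertices; edge weights are summed."""
    new_adj = defaultdict(lambda: defaultdict(float))
    for u, nbrs in adj.items():
        row = new_adj[comm[u]]                 # keeps isolated communities too
        for v, w in nbrs.items():
            row[comm[v]] += w                  # internal edges become a self-loop
    return {c: dict(nbrs) for c, nbrs in new_adj.items()}


def louvain(adj):
    """Step 4: repeat steps 2 and 3 until no vertex changes community."""
    membership = {u: u for u in adj}           # original vertex -> supervertex
    while True:
        comm = one_level(adj)
        if all(comm[u] == u for u in adj):
            return membership
        membership = {u: comm[s] for u, s in membership.items()}
        adj = aggregate(adj, comm)
```

On two unit-weight triangles joined by a single edge, `louvain(build_adj(edges))` places each triangle in its own community. The returned labels are arbitrary vertex ids, so compare group membership rather than label values.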


Modularity gain

Modularity gain when moving vertex $u$ into community $c$:

$$\Delta Q_{u \to c} = \frac{w_{u \to c}}{2m} - \frac{w(u)\, \Sigma^c_{tot}}{2m^2}$$

$\Sigma^c_{tot}$: the sum of the weights of edges incident to any vertex in $c$,

$$\Sigma^c_{tot} = \sum_{\substack{u \in c \text{ or } v \in c \\ e(u,v) \in E}} w_{u,v}$$

$w(u)$: the sum of the weights of the edges incident to vertex $u$

$w_{u \to c}$: the sum of the weights of the edges from vertex $u$ to vertices in community $c$,

$$w_{u \to c} = \sum_{v \in c} w_{u,v}$$
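A direct transcription of the formulas above, as a sketch. Here `adj` is a hypothetical adjacency dict mapping each vertex to a `{neighbour: weight}` dict, `members` is the vertex set of community $c$, and $\Sigma^c_{tot}$ and $m$ are passed in already computed:

```python
def weight_to_community(adj, u, members):
    """w_{u->c}: total weight of edges from u to vertices of community c."""
    return sum(w for v, w in adj[u].items() if v in members)


def modularity_gain(w_u_to_c, w_u, sigma_tot_c, m):
    """Delta Q_{u->c} transcribed term by term from the formula above."""
    return w_u_to_c / (2 * m) - w_u * sigma_tot_c / (2 * m * m)
```

For instance, with $w_{u \to c} = 2$, $w(u) = 3$, $\Sigma^c_{tot} = 4$ and $m = 7$, the gain is $2/14 - 12/98 = 1/49$.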


Hash-based data organization


Novel convergence heuristic

The fraction of vertices updated during each iteration of the inner loop:

$$\epsilon = p_1 \cdot e^{-p_2 \cdot \mathit{iter}} \tag{7}$$

Regression analysis and the dynamic threshold for the LFR benchmark.
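The slide only says that $p_1$ and $p_2$ come from regression analysis. One plausible sketch, assuming the observed fraction of updated vertices is recorded for each inner-loop iteration (indexed from 0), is an ordinary least-squares fit of $\ln \epsilon = \ln p_1 - p_2 \cdot \mathit{iter}$. The function names are hypothetical, and this is not claimed to be the authors' fitting procedure.

```python
import math


def fit_decay(fractions):
    """Fit eps = p1 * exp(-p2 * iter) by least squares on ln(eps).

    fractions[i] is the (positive) fraction of vertices updated in iteration i;
    at least two iterations are needed.
    """
    xs = range(len(fractions))
    ys = [math.log(f) for f in fractions]
    n = len(fractions)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope         # (p1, p2)


def threshold(p1, p2, iteration):
    """Equation (7): expected fraction of vertices updated at this iteration."""
    return p1 * math.exp(-p2 * iteration)
```

Fitting on the log scale turns the exponential model into a straight line, so a closed-form linear regression suffices. Once fitted, `threshold(p1, p2, it)` gives the expected update fraction at iteration `it`.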


Parallel Louvain algorithm


Community state propagation

Update of Out_Table


Community refinement


Graph reconstruction

Reconstruction of In_Table


Scalability analysis: weak scaling

P7-IH, BTER: $2^{22}$ vertices per node, average degree of 32


Scalability analysis: strong scaling


Conclusion

• Highly scalable parallel Louvain algorithm for distributed memory systems

• Preserves or slightly improves:
  ▸ the convergence properties
  ▸ the overall modularity
  ▸ the quality of the detected communities

• A novel implementation strategy to store and process dynamic graphs
• Validation on a wide variety of real-world social graphs
• Scalability:
  ▸ BTER, 4B vertices/138B edges: 1k P7-IH nodes (32k threads)
  ▸ R-MAT, 8B vertices/138B edges: 8k BG/Q nodes (512k threads)


References

M. E. J. Newman, M. Girvan. Finding and Evaluating Community Structure in Networks. Phys. Rev. E, 69(2):026113, Feb. 2004.

V. Blondel et al. Fast Unfolding of Communities in Large Networks. J. Stat. Mech., P10008, 2008.

X. Que et al. Scalable Community Detection with the Louvain Algorithm. 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Hyderabad, India, 25-29 May 2015, pp. 28-37.


Outline

5 Modularity maximization techniques


Modularity maximization techniques

• Greedy optimization: applies different approaches to merge vertices into communities for higher modularity

• Simulated annealing: adopts a probabilistic procedure for global optimization of modularity

• Extremal optimization: a heuristic search procedure
• Spectral optimization: uses eigenvalues and eigenvectors of a special matrix (for example, the modularity matrix) for modularity optimization
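As an illustration of spectral optimization, the sketch below bisects a graph using the leading eigenvector of the modularity matrix $B_{ij} = A_{ij} - k_i k_j / 2m$ (Newman's leading-eigenvector method): vertices are split by the sign of their eigenvector component. Assumptions: an undirected graph on vertices $0..n-1$ given as `(u, v, w)` tuples without self-loops, a dense matrix for brevity, and plain power iteration shifted by a Gershgorin bound so that the largest (rather than the largest-magnitude) eigenvalue dominates. The function name is hypothetical.

```python
import math


def spectral_bisection(n, edges, iters=500):
    """Split vertices 0..n-1 by the sign of the leading eigenvector of B.

    B[i][j] = A[i][j] - k[i] * k[j] / (2m) is the modularity matrix.
    """
    A = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:                      # undirected, no self-loops
        A[u][v] += w
        A[v][u] += w
    k = [sum(row) for row in A]                # weighted degrees
    two_m = sum(k)
    B = [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]
    # Shifting by a Gershgorin bound makes every eigenvalue of B + shift*I
    # non-negative, so power iteration converges to the eigenvector of the
    # largest eigenvalue of B (assuming it is simple and the start vector
    # has a component along it).
    shift = max(sum(abs(x) for x in row) for row in B)
    x = [float(i + 1) for i in range(n)]       # arbitrary non-symmetric start
    for _ in range(iters):
        y = [sum(B[i][j] * x[j] for j in range(n)) + shift * x[i]
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    left = [i for i in range(n) if x[i] > 0]
    right = [i for i in range(n) if x[i] <= 0]
    return left, right
```

On two unit-weight triangles joined by one edge, the two halves returned are the two triangles. The overall sign of the eigenvector is arbitrary, so either triangle may come out first.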


Disclaimer

All the information, representations, statements, opinions and proposals in this document are correct and accurate to the best of our present knowledge but are not intended (and should not be taken) to be contractually binding unless and until they become the subject of separate, specific agreement between us. Any IBM Machines provided are subject to the Statements of Limited Warranty accompanying the applicable Machine. Any IBM Program Products provided are subject to their applicable license terms. Nothing herein, in whole or in part, shall be deemed to constitute a warranty. IBM products are subject to withdrawal from marketing and/or service upon notice, and changes to product configurations, or follow-on products, may result in price changes. Any references in this document to “partner” or “partnership” do not constitute or imply a partnership in the sense of the Partnership Act 1890. IBM is not responsible for printing errors in this proposal that result in pricing or information inaccuracies.


Legal information

IBM, the IBM logo, BladeCenter, System Storage and System x are trademarks of International Business Machines Corporation in the United States and/or other countries. For a complete list of IBM trademarks, see www.ibm.com/legal/copytrade.shtml. Other company, product and service names may be trademarks or service marks of others. © 2015 International Business Machines Corporation. All rights reserved. References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which it operates; information about the availability of products or services may change without notice. For the latest information about IBM products and services available in your region, contact your nearest IBM sales office or an authorized business partner. All statements regarding IBM's intentions and future plans are subject to change without notice. Information about non-IBM products was obtained from the manufacturers of those products or from their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility or any other claims related to non-IBM products. Questions about the capabilities of non-IBM products should be addressed to the suppliers of those products. This information may contain technical inaccuracies or typographical errors. Changes may be made to the information in this publication; these changes will be incorporated in new editions of the publication. IBM may make changes to the products or services described in this publication at any time without notice. Any references to non-IBM websites are provided for convenience only and in no way serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product.
