
Page 1


Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. 2014-17387 PE

Sparse Matrix Partitioning for Parallel Eigenanalysis of Large Static and Dynamic Graphs

Michael Wolf, Sandia National Laboratories
Ben Miller, MIT Lincoln Laboratory

IEEE HPEC 2014
September 10, 2014

The Lincoln Laboratory portion of this work is sponsored by the Intelligence Advanced Research Projects Activity (IARPA) under Air Force Contract FA8721-05-C-0002. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA or the U.S. Government.


Page 2

Big Data and High Performance Computing

[Plot: memory (GB) required to store the sparse adjacency matrix vs. number of graph vertices (millions), with ISR, Cyber, and Web problem scales marked against the terabyte and petabyte thresholds]

Use high performance computing to address compute challenges posed by problem scales of interest to DoD/IC

NSA Big Graph Experiment (P. Burkhardt and C. Waring, "An NSA Big Graph experiment," National Security Agency, Tech. Rep. NSA-RD-2013-056002v1, 2013):

Web scale: 50 billion vertices, 1 trillion edges; 271 EB adjacency matrix, 29.5 TB adjacency list, 29.1 TB edge list. (Internet graph from the Opte Project, http://www.opte.org/maps; web graph from the SNAP database, http://snap.stanford.edu/data)

Social scale: 1 billion vertices, 100 billion edges; 111 PB adjacency matrix, 2.92 TB adjacency list, 2.92 TB edge list. (Twitter graph from the Gephi dataset, http://www.gephi.org)

Brain scale: 100 billion vertices, 100 trillion edges; 2.08 mN_A bytes ("molar bytes", with N_A = 6.022 × 10^23 mol^-1) adjacency matrix, 2.84 PB adjacency list, 2.84 PB edge list. (Human connectome; Gerhard et al., Frontiers in Neuroinformatics 5(3), 2011)

2  

Page 3

Motivating Graph Analytics Applications

Cyber

•  Graphs represent communication patterns of computers on a network

•  1,000,000s – 1,000,000,000s network events

•  GOAL: Detect cyber attacks or malicious software

Social

•  Graphs represent relationships between individuals or documents

•  10,000s – 10,000,000s individuals and interactions

•  GOAL: Identify hidden social networks

ISR

•  Graphs represent entities and relationships detected through multiple sources

•  1,000s – 1,000,000s tracks and locations

•  GOAL: Identify anomalous patterns of life

Detection of anomalies in massive datasets (very large graphs)

Page 4

Statistical Detection Framework for Graphs

Develop fundamental graph signal processing concepts

Demonstrate in simulation

Apply to real data

[Diagram: detection framework combining graph theory and detection theory: noise (H0) vs. signal '+' noise (H1), separated by a threshold]

4  

Signal Processing for Graphs (SPG)

Page 5

Computational Focus: Dimensionality Reduction

[Diagram: SPG processing chain: graph model construction, residual decomposition, component selection, anomaly detection and identification, with temporal integration; the dimensionality reduction stage is highlighted]

•  Dimensionality reduction dominates SPG computation
•  Eigen decomposition is the key computational kernel
•  Parallel implementation required for very large graph problems: fit into memory, minimize runtime

Need fast parallel eigensolvers

Eigensystem (example: modularity matrix)

Solve: B x_i = λ_i x_i,  i = 1, ..., m,  where B = A − E[A] and E[A] = k k^T / (2|e|)

|e| – number of edges in graph G(A); k – degree vector, k_i = degree(v_i), v_i ∈ G(A)
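Because the rank-one term k k^T / (2|e|) would make E[A] dense, B is best applied as an implicit operator; a later slide notes that the matrix is not formed explicitly. A minimal serial sketch of that idea, using SciPy's LinearOperator and eigsh purely as a stand-in for the Anasazi Block Krylov-Schur solver actually used:

```python
# Hedged sketch (serial SciPy stand-in for the Trilinos/Anasazi solver in the talk):
# apply the modularity matrix B = A - k k^T / (2|e|) implicitly, never forming the
# dense rank-one term, and feed it to an iterative eigensolver.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, eigsh

def modularity_operator(A):
    """A: symmetric sparse adjacency matrix (scipy CSR)."""
    k = np.asarray(A.sum(axis=1)).ravel()    # degree vector
    two_e = k.sum()                          # 2|e| for an undirected graph
    def matvec(x):
        return A @ x - k * (k @ x) / two_e   # B x = A x - k (k^T x) / (2|e|)
    n = A.shape[0]
    return LinearOperator((n, n), matvec=matvec, dtype=float)

# Example: a few leading eigenpairs of B for a small random graph
A = sp.random(1000, 1000, density=0.01, format="csr")
A = A + A.T                                  # symmetrize
A.data[:] = 1.0                              # 0/1 adjacency
B = modularity_operator(A)
vals, vecs = eigsh(B, k=4, which="LA")       # largest (algebraic = real) eigenvalues
print(vals)
```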

Page 6

Outline

§  Anomaly Detection in Very Large Graphs
§  Eigenanalysis and Performance Challenges
§  Improving Sparse Matrix-Vector Multiplication (SpMV) Performance through Data Partitioning
§  Partitioning: Dynamic Graphs and Sampling
§  Summary

6  

Page 7

Dimensionality Reduction: Parallel Implementation

§  Using Anasazi (Trilinos) eigensolver
    §  Block Krylov-Schur
    §  Eigenpairs corresponding to eigenvalues with largest real component
    §  User-defined operators (don't form matrix explicitly)

§  Initial numerical experiments
    §  R-Mat (a = 0.5, b = 0.125, c = 0.125, d = 0.25)
        §  Average nonzeros per row: 8
        §  Number of rows: 2^22 to 2^32
    §  Two systems
        §  Hopper* (NERSC) -- Cray XE6 supercomputer
        §  LLGrid (MIT LL) – compute cluster (10 GB ethernet)

§  Initially: 1D random row distribution (good load balance)

* This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
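For context, a minimal sketch of a generic R-MAT edge generator with the quadrant probabilities quoted above; this is the standard recursive construction, not necessarily the exact generator used for these experiments:

```python
# Hedged sketch: generic R-MAT edge generator (recursive quadrant sampling) with
# the slide's parameters. Not necessarily the exact generator used in the talk.
import random

def rmat_edge(scale, a=0.5, b=0.125, c=0.125, d=0.25):
    """Draw one edge of a 2^scale x 2^scale R-MAT matrix."""
    row = col = 0
    for _ in range(scale):
        r = random.random()
        row <<= 1
        col <<= 1
        if r < a:              # upper-left quadrant
            pass
        elif r < a + b:        # upper-right
            col |= 1
        elif r < a + b + c:    # lower-left
            row |= 1
        else:                  # lower-right
            row |= 1
            col |= 1
    return row, col

# ~8 nonzeros per row on average, as in the experiments
scale = 16
edges = [rmat_edge(scale) for _ in range(8 * 2**scale)]
```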

Page 8

Weak Scaling: Eigensolver

Solved the system for graphs of up to 4 billion vertices

[Plot: runtime (s) to find the 1st eigenvector vs. number of cores (16 to 16,384) on Hopper with 1D random partitioning; R-Mat modularity matrix, 2^18 vertices per core, reaching 4 billion vertices at the largest core count]

8  

* This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

NERSC Hopper*

Page 9

Strong Scaling: Eigensolver

Scalability limited and runtime increases for large numbers of cores

[Plot: runtime (s) to find the 1st eigenvector vs. number of cores (1 to 16,384) for LLGrid 1D and Hopper 1D; R-Mat modularity matrix, 2^23 vertices, 1D random partitioning]

9  

* This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Page 10

Sparse Matrix-Vector Multiplication

§  Sparse matrix-dense vector multiplication (SpMV) is the key computational kernel in the eigensolver

§  Performance of SpMV is challenging for matrices resulting from power-law graphs
    §  Load imbalance
    §  Irregular communication
    §  Little data locality

§  Important to improve performance of SpMV (see the sketch below)
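For reference, a minimal serial sketch of the SpMV kernel in compressed sparse row (CSR) form; the distributed variants discussed on the following slides differ only in which nonzeros and vector entries each process owns and in the communication of x and y entries:

```python
# Hedged sketch: serial CSR sparse matrix-dense vector multiply, y = A x.
# The distributed variants on the following slides partition the nonzeros
# (and the x/y entries) across processes and add communication phases.
import numpy as np

def spmv_csr(indptr, indices, data, x):
    """indptr/indices/data: CSR arrays of A; x: dense vector."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for nz in range(indptr[i], indptr[i + 1]):
            y[i] += data[nz] * x[indices[nz]]
    return y

# Usage with a scipy matrix:
# import scipy.sparse as sp
# A = sp.random(1000, 1000, density=0.01, format="csr")
# y = spmv_csr(A.indptr, A.indices, A.data, np.ones(A.shape[1]))
```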


10  

Page 11

Strong Scaling: SpMV

Scalability limited and runtime increases for large numbers of cores

[Plot: time (s) for one SpMV operation vs. number of cores (1 to 4,096) with 1D random partitioning; R-Mat, 2^23 vertices]

* This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

NERSC Hopper*

Page 12

Outline

§  Anomaly Detection in Very Large Graphs
§  Eigenanalysis and Performance Challenges
§  Improving Sparse Matrix-Vector Multiplication (SpMV) Performance through Data Partitioning
§  Partitioning: Dynamic Graphs and Sampling
§  Summary

12  

Page 13

Data Partitioning to Improve SpMV

§  Partition matrix nonzeros
§  Partition vector elements

13  

[Figure: example SpMV y = A x on a small (8 x 8) sparse matrix, illustrating how matrix nonzeros and the x/y vector elements are distributed among processes]

Page 14

Partitioning Objective

§  Ideally we would minimize the total execution time of SpMV
§  Settle for easier objectives
    §  Balance computational work
    §  Minimize a communication metric (see the sketch below)
        §  Total communication volume
        §  Number of messages

§  Can partition matrices in different ways
    §  1D
    §  2D

§  Can model the problem in different ways
    §  Graph
    §  Bipartite graph
    §  Hypergraph
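As an illustration of the communication metrics above, a minimal sketch that computes total volume and message count for a 1D row partition, under the assumption that x_j and y_j are owned by the part that owns row j (the function and setup are illustrative, not from the talk):

```python
# Hedged sketch: communication volume and message count for SpMV under a
# 1D row partition, assuming x_j and y_j live on the part that owns row j.
import numpy as np
import scipy.sparse as sp

def comm_metrics_1d_row(A, part_of_row):
    """A: square scipy CSR matrix; part_of_row[i]: part owning row i (and x_i, y_i)."""
    A = A.tocsc()                        # iterate column by column
    n_parts = int(part_of_row.max()) + 1
    volume = 0
    msgs = set()                         # distinct (sender, receiver) pairs
    for j in range(A.shape[1]):
        rows = A.indices[A.indptr[j]:A.indptr[j + 1]]
        owner = part_of_row[j]           # part that owns x_j
        consumers = set(part_of_row[rows]) - {owner}
        volume += len(consumers)         # x_j is sent once to each other consumer
        msgs.update((owner, p) for p in consumers)
    return volume, len(msgs), n_parts

# Usage:
# A = sp.random(1 << 12, 1 << 12, density=0.002, format="csr")
# part = np.random.randint(0, 64, size=A.shape[0])   # 1D random partition
# vol, nmsg, P = comm_metrics_1d_row(A, part)
```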

14  

Page 15

1D Partitioning

§  1D Column: each process is assigned the nonzeros for a set of columns
§  1D Row: each process is assigned the nonzeros for a set of rows

15

Page 16

Communication Pattern: 1D Block Partitioning

NNZ/process: min 1.17E+06, max 1.18E+06, avg 1.18E+06, max/avg 1.00
# Messages (Phase 1): total 126, max 2
Volume (Phase 1): total 2.58E+05, max 4.10E+03

[Communication matrix: source process vs. destination process, P = 64]

Nice properties: great load balance, small number of messages, low communication volume

2D finite difference matrix (9-point stencil); number of rows: 2^23; nonzeros/row: 9

16  

Page 17

Communication Pattern: 1D Random Partitioning

NNZ/process: min 1.05E+06, max 1.07E+06, avg 1.06E+06, max/avg 1.01
# Messages (Phase 1): total 4032, max 63
Volume (Phase 1): total 5.48E+07, max 8.62E+05

[Communication matrix: source process vs. destination process, P = 64]

Challenges: all-to-all communication

R-Mat (0.5, 0.125, 0.125, 0.25); number of rows: 2^23; nonzeros/row: 8

Nice properties: Great load balance

17  

Page 18

2D Partitioning

§  2D partitioning
    §  More flexibility: no particular part owns an entire row/column; more general sets of nonzeros
§  Use the flexibility of 2D partitioning to bound the number of messages
    §  2D Random Cartesian* (see the sketch after this slide)
        §  Block Cartesian with rows/columns randomly distributed
        §  Cyclic striping to minimize number of messages
    §  2D Cartesian Hypergraph**
        §  Use hypergraph partitioning to minimize communication volume
        §  Con: more costly to partition than random

*Hendrickson et al.; Bisseling; Yoo et al. **Boman, Devine, Rajamanickam, "Scalable Matrix Computations on Large Scale-Free Graphs Using 2D Partitioning," SC2013.

[Figure: nonzero structure (permuted) under 2D Random Cartesian* and 2D Cartesian Hypergraph** partitioning]

18  
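A minimal sketch of the 2D random Cartesian idea as I read it: rows and columns are randomly relabeled and then blocked on a pr x pc process grid, so each process exchanges vector entries only within its block row and block column (at most pr + pc - 2, roughly 2*sqrt(P), partners rather than P - 1). Cyclic striping is omitted here.

```python
# Hedged sketch: 2D random Cartesian assignment of matrix nonzeros.
# Rows/columns are randomly relabeled, then blocked on a pr x pc process grid,
# so SpMV communication is confined to block rows and block columns.
import numpy as np

def random_cartesian_2d(n, pr, pc, seed=0):
    """Return functions mapping a nonzero (i, j) and a vector index to a process."""
    rng = np.random.default_rng(seed)
    row_perm = rng.permutation(n)            # random row relabeling
    col_perm = rng.permutation(n)            # random column relabeling
    row_block = lambda i: row_perm[i] * pr // n
    col_block = lambda j: col_perm[j] * pc // n
    def owner_of_nonzero(i, j):
        return row_block(i) * pc + col_block(j)   # process grid in row-major order
    return owner_of_nonzero, row_block, col_block

# Usage: P = 64 processes on an 8 x 8 grid
# owner, rblk, cblk = random_cartesian_2d(n=1 << 23, pr=8, pc=8)
# owner(12345, 67890)  -> process id in [0, 63]
```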

Page 19

Communication Pattern: 2D Random Partitioning, Cartesian Blocks (2DR)

NNZ/process: min 1.04E+06, max 1.05E+06, avg 1.05E+06, max/avg 1.01
# Messages (Phase 1): total 448, max 7
Volume (Phase 1): total 2.57E+07, max 4.03E+05

[Communication matrix: source process vs. destination process, P = 64]

Number of rows: 2^23; nonzeros/row: 8

Nice properties: no all-to-all communication; total volume lower than 1DR

1DR = 1D Random

R-Mat (0.5, 0.125, 0.125, 0.25)

19  

Page 20

Communication Pattern: 2D Random Partitioning, Cartesian Blocks (2DR)

NNZ/process: min 1.04E+06, max 1.05E+06, avg 1.05E+06, max/avg 1.01
# Messages (Phase 2): total 448, max 7
Volume (Phase 2): total 2.57E+07, max 4.03E+05

[Communication matrix: source process vs. destination process, P = 64]

Number of rows: 2^23; nonzeros/row: 8

Nice properties: no all-to-all communication; total volume lower than 1DR

1DR = 1D Random

R-Mat (0.5, 0.125, 0.125, 0.25)

20  

Page 21

Communication Pattern: 2D Cartesian Hypergraph Partitioning

NNZ/process: min 5.88E+05, max 1.29E+06, avg 1.05E+06, max/avg 1.23
# Messages (Phase 1): total 448, max 7
Volume (Phase 1): total 2.33E+07, max 4.52E+05

[Communication matrix: source process vs. destination process, P = 64]

R-Mat (0.5, 0.125, 0.125, 0.25); number of rows: 2^23; nonzeros/row: 8

Challenges: load imbalance worse than 2DR

Nice properties: no all-to-all communication; total volume lower than 2DR

2DR = 2D Random Cartesian

21  

Page 22

Improved Strong Scaling: SpMV

2D methods show improved scalability

[Plot: time (s) for one SpMV operation vs. number of cores (1 to 4,096) for 1D random, 2D Cartesian random, and 2D Cartesian hypergraph partitioning; R-Mat with 2^23 vertices/rows, NERSC Hopper*]

* This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Page 23

Improved Strong Scaling: Eigensolver

2D methods show improved scalability

[Plot: runtime (s) to find the 1st eigenvector vs. number of cores (1 to 16,384) for 1D Random, 2D Random, and 2D Hypergraph partitioning; R-Mat modularity matrix, 2^23 vertices]

23  

* This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

NERSC Hopper*

Page 24

Outline

§  Anomaly Detection in Very Large Graphs
§  Eigenanalysis and Performance Challenges
§  Improving Sparse Matrix-Vector Multiplication (SpMV) Performance through Data Partitioning
§  Partitioning: Dynamic Graphs and Sampling
§  Summary

24  

Page 25

Challenge with Hypergraph Partitioning

§  The high partitioning cost of hypergraph methods must be amortized by computing many SpMV operations

§  Detection* requires at most 1000s of SpMV operations
§  Expensive partitions need to be effective for multiple graphs

[Plot: time (s) to partition and then compute N SpMV operations vs. N (1 to 1,000,000) for 2D random and 2D hypergraph partitioning; R-Mat with 2^23 vertices, 1024 cores, NERSC Hopper. The 2D hypergraph method does not pay off until roughly 40,000 SpMVs.]

*L1 norm method: computing 100 eigenvectors
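The amortization argument reduces to simple arithmetic; in the sketch below only the roughly 40,000-SpMV crossover is taken from the plot above, and the timing values are illustrative placeholders, not measurements from the talk:

```python
# Hedged sketch: breakeven number of SpMVs for an expensive partitioning method.
# t_part_*: one-time partitioning cost; t_spmv_*: per-SpMV time. The values below
# are illustrative placeholders, not measurements from the talk.
def breakeven_spmvs(t_part_cheap, t_spmv_cheap, t_part_costly, t_spmv_costly):
    """Smallest N with t_part_costly + N*t_spmv_costly <= t_part_cheap + N*t_spmv_cheap."""
    if t_spmv_costly >= t_spmv_cheap:
        return None                          # never pays off
    extra_setup = t_part_costly - t_part_cheap
    saving_per_spmv = t_spmv_cheap - t_spmv_costly
    return extra_setup / saving_per_spmv

# e.g. if hypergraph partitioning costs ~200 s more but saves ~5 ms per SpMV:
print(breakeven_spmvs(t_part_cheap=1.0, t_spmv_cheap=0.020,
                      t_part_costly=201.0, t_spmv_costly=0.015))   # -> 40000.0
```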

Page 26

Experiment: Partitioning for Dynamic Graphs

§  Key question: how long will a partition remain effective?
§  Initial experiment (sketched below)
    §  Evolving R-Mat matrices: fixed number of rows, fixed R-Mat parameters (a, b, c, d)
    §  Start with a given number of nonzeros (|e0|)
    §  Iteratively add nonzeros until the target number of nonzeros (|en|) is reached

[Diagram: evolving graph: initial graph G0 with e0 edges, G1 with e1 edges, ..., final graph Gn with en edges; the same partition P0, computed on G0, is reused throughout]
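A minimal sketch of that experiment loop as I read it, using a model quality metric (e.g. the communication-volume function from the earlier sketch) in place of the measured SpMV times reported on the next slide:

```python
# Hedged sketch: reuse the initial partition P0 while the graph grows, and track a
# partition-quality proxy as edges are added. Illustrative only: the talk measures
# actual SpMV time on Hopper rather than a model metric.
import numpy as np
import scipy.sparse as sp

def evolving_graph_experiment(n, growth_steps, edges_per_step,
                              edge_sampler, partition_fn, quality_fn):
    """edge_sampler() -> (i, j); partition_fn(A) -> part array; quality_fn(A, part) -> score."""
    rows, cols, history, part = [], [], [], None
    for _ in range(growth_steps + 1):
        new = [edge_sampler() for _ in range(edges_per_step)]     # add a batch of edges
        rows += [i for i, _ in new]
        cols += [j for _, j in new]
        A = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
        if part is None:
            part = partition_fn(A)             # partition P0, computed once on G0
        history.append(quality_fn(A, part))    # how good is P0 for the grown graph?
    return history

# e.g. edge_sampler = lambda: rmat_edge(scale) from the earlier sketch, and
# quality_fn = lambda A, p: comm_metrics_1d_row(A, p)[0]  (communication volume)
```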

26  

Page 27

Results: Partitioning for Dynamic Graphs

27  

[Plot: average SpMV time (s) vs. |ei| / |e0| (from 1 up to about 17) for 2DR and 2DH partitions on NERSC Hopper*; 2DR = 2D Random Cartesian, 2DH = 2D Cartesian Hypergraph]

Hypergraph partition is surprisingly effective even after more than 16x |e0| edges have been added

* This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Page 28

Sampling and Partitioning for Web/SN Graphs

§  Sampling + Partitioning (see the sketch below):
    1.  Produce a smaller graph G' by sampling edges of graph G (uniform random sampling), keeping the vertex set the same
    2.  Partition G' (2D Cartesian Hypergraph)
    3.  Apply the partition to G

28  

[Diagram: graph sampling and partitioning. Input graph G = (V, E); sample E to obtain G' = (V, E'); partition G' into G1' = (V1, E1') and G2' = (V2, E2'); apply the partition to G to obtain G1 = (V1, E1) and G2 = (V2, E2)]

Idea: Partition sampled graph to reduce partitioning time
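A minimal sketch of steps 1 and 3; the 2D Cartesian hypergraph partitioner of step 2 is a parallel library call in practice, so it is left as a user-supplied function here:

```python
# Hedged sketch: edge sampling before partitioning, then applying the resulting
# partition to the full graph. The hypergraph partitioner itself (step 2) is
# assumed to be supplied externally and is not implemented here.
import numpy as np
import scipy.sparse as sp

def sample_edges(A, rate, seed=0):
    """Keep each nonzero of CSR matrix A independently with probability `rate`."""
    rng = np.random.default_rng(seed)
    coo = A.tocoo()
    keep = rng.random(coo.nnz) < rate
    return sp.csr_matrix((coo.data[keep], (coo.row[keep], coo.col[keep])),
                         shape=A.shape)       # same vertex set, fewer edges

def sample_partition_apply(A, rate, partition_fn):
    """1) sample edges, 2) partition the sampled graph, 3) apply partition to A."""
    A_sampled = sample_edges(A, rate)
    part = partition_fn(A_sampled)            # e.g. 2D Cartesian hypergraph partitioner
    return part                               # vertex numbering is unchanged, so it applies to A directly

# Usage:
# part = sample_partition_apply(A, rate=0.5, partition_fn=my_hypergraph_partitioner)
```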

Page 29

Partitioning + Sampling: Partitioning Time

**The University of Florida Sparse Matrix Collection

29  

Edge sampling greatly reduces partitioning time (by up to 8x)

[Plot: hollywood-2009 graph, partitioning time (s) vs. sampling rate (1.0 down to 0.1) for P = 16, 64, 256, 1024; 2D Cartesian hypergraph partitioning on NERSC Hopper*]

hollywood-2009**: Actor network 1.1 M vertices, 110 M edges

* This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Page 30

Partitioning + Sampling: SpMV Time

**The University of Florida Sparse Matrix Collection

30  

Resulting SpMV time does not increase for modest sampling

[Plot: hollywood-2009 graph, SpMV time (s) vs. sampling rate (1.0 down to 0.1) for P = 16, 64, 256, 1024; 2D Cartesian hypergraph partitioning on NERSC Hopper*]

hollywood-2009**: Actor network 1.1 M vertices, 110 M edges

* This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Page 31

31  

* This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Challenge with Hypergraph Partitioning Revisited

[Plot: time (s) to partition and then compute N SpMV operations vs. N (1 to 100,000) for 2DR, 2DH, and 2DH with sampling; hollywood-2009, 1024 cores, NERSC Hopper*. Breakeven against 2DR drops from roughly 100,000 SpMVs (2DH) to roughly 10,000 SpMVs (2DH with sampling).]

Sampling reduces overhead of hypergraph partitioning (fewer SpMVs needed to amortize partitioning cost)

2DR = 2D Random Cartesian 2DH = 2D Cartesian Hypergraph

Page 32

Summary  

§  Outlined an HPC approach to detecting anomalies in big data
    §  Key component is the eigensolver
    §  Solving the resulting eigensystems is challenging
        §  Load imbalance
        §  Poor data locality
§  SpMV is the key computational kernel
    §  1D data partitioning limits performance due to all-to-all communication
    §  2D data partitioning can be used to improve scalability
§  2D hypergraph partitioning is promising but expensive
    §  Sampling can improve 2D hypergraph partitioning performance for web/SN graphs

32