11
Graph Algorithms for Irregular, Unstructured Data John Feo Center for Adaptive Supercomputing Software Pacific Northwest National Laboratory July, 2010

Graph Algorithms for Irregular, Unstructured Data John Feo Center for Adaptive Supercomputing Software Pacific Northwest National Laboratory July, 2010

Embed Size (px)

Citation preview

Page 1: Graph Algorithms for Irregular, Unstructured Data John Feo Center for Adaptive Supercomputing Software Pacific Northwest National Laboratory July, 2010

Graph Algorithms for Irregular, Unstructured Data

John Feo

Center for Adaptive Supercomputing SoftwarePacific Northwest National Laboratory

July, 2010

Page 2: Graph Algorithms for Irregular, Unstructured Data John Feo Center for Adaptive Supercomputing Software Pacific Northwest National Laboratory July, 2010

Analytic methods and applications

Community thought leaders

Blog Analysis

Community Activities

FaceBook - 300 M users

Connect-the-dots

Bus

Hayashi

Zaire

Train

Anthrax

MoneyEndo

National Security

People, Places, & Actions

Semantic Web

Anomaly detection

Security

N-x contingency analysis

SmartGrid

Page 3: Graph Algorithms for Irregular, Unstructured Data John Feo Center for Adaptive Supercomputing Software Pacific Northwest National Laboratory July, 2010

Data analytics

Sample queries: Allegiance switching: identify entities that switch communities.Community structure: identify the genesis and dissipation of communitiesPhase change: identify significant change in the network structure

Traditional graph partitioning often fails:Topology: Interaction graph is low-diameter and has no good separatorsIrregularity: Communities are not uniform in sizeOverlap: individuals are members of one or more communities

1000x growth

in 3 years!

has more than 300 million active users

Page 4: Graph Algorithms for Irregular, Unstructured Data John Feo Center for Adaptive Supercomputing Software Pacific Northwest National Laboratory July, 2010

Graphs are not grids

Graphs arising in informatics are very different from the grids used in scientific computing

Static or slowly involving

Planar

Nearest neighbor communication

Work performed per cell or node

Work modifies local data

Scientific Grids

Dynamic

Non-planar

Communications are non-local and dynamic

Work performed by crawlers or autonomous agents

Work modifies data in many places

Graphs for Data Informatics

Page 5: Graph Algorithms for Irregular, Unstructured Data John Feo Center for Adaptive Supercomputing Software Pacific Northwest National Laboratory July, 2010

Small-world and scale-free

In low diameter graphswork explodes

difficult to partition

high percentage of nodes are visited

“Six degrees of separation”

Large hubs are in grey

In scale-free graphs difficult to partition

work concentrates in a few nodes

Page 6: Graph Algorithms for Irregular, Unstructured Data John Feo Center for Adaptive Supercomputing Software Pacific Northwest National Laboratory July, 2010

PathsShortest path

Betweenness

Min/max flow

StructuresSpanning trees

Connected components

Graph isomorphism

GroupsMatching/Coloring

Partitioning

Equivalence

Graph methods

Influential FactorsDegree distribution

Normal

Scale-free

Planar or non-planar

Static or dynamic

Weighted or unweightedWeight distribution

Typed or untyped edges

Load imbalanceNon-planar

Concurrent insertsand deletions

Difficult to partition

Page 7: Graph Algorithms for Irregular, Unstructured Data John Feo Center for Adaptive Supercomputing Software Pacific Northwest National Laboratory July, 2010

Challenges

Problem sizeTon of bytes, not ton of flops

Little data locality

Have only parallelism to tolerate latencies

Low computation to communication ratioSingle word access

Threads limited by loads and stores

Frequent synchronizationNode, edge, record

Work tends to be dynamic and imbalancedLet any processor execute any thread

Page 8: Graph Algorithms for Irregular, Unstructured Data John Feo Center for Adaptive Supercomputing Software Pacific Northwest National Laboratory July, 2010

Grids, Uniform, and Scale-Free GraphsUSA Roadmap

Uniform

Scale-Free

METIS Partitioner

Page 9: Graph Algorithms for Irregular, Unstructured Data John Feo Center for Adaptive Supercomputing Software Pacific Northwest National Laboratory July, 2010

System requirements

Global shared memoryNo simple data partitions

Local storage for thread private data

Network support for single word accessesTransfer multiple words when locality exists

Multi-threaded processorsHide latency with parallelism

Single cycle context switching

Multiple outstanding loads and stores per thread

Full-and-empty bitsEfficient synchronization

Wait in memory

Message driven operationsDynamic work queues

Hardware support for thread migration

Cray XMT

Page 10: Graph Algorithms for Irregular, Unstructured Data John Feo Center for Adaptive Supercomputing Software Pacific Northwest National Laboratory July, 2010

Center for Adaptive Supercomputer Software

Driving Development of

Next-Generation Massively Multithreading ArchitecturesSponsored by DODSponsored by DOD

Page 11: Graph Algorithms for Irregular, Unstructured Data John Feo Center for Adaptive Supercomputing Software Pacific Northwest National Laboratory July, 2010

Summary

The new HPC is irregular and sparse

There are commercial and consumer applications

If the applications are important enough, machines will be built

HPC is too large and too diverse for “one size fits all”

We need to build the right machines for the problems we have to solve