SWARM: A Parallel Programming Framework for Multicore Processors David A. Bader, Varun N. Kanade and Kamesh Madduri

Source: hpc.pnl.gov/mtaap/mtaap07/mtaap_files/madduri.pdf



SWARM: A Parallel Programming Framework for Multicore Processors
David A. Bader, Varun N. Kanade and Kamesh Madduri


Our Contributions

• SWARM: SoftWare and Algorithms for Running on Multicore, a portable open-source parallel framework for multicore systems
– http://multicore-swarm.sourceforge.net/

• A library of fundamental parallel primitives: prefix-sums, list ranking, sorting and selection, symmetry breaking etc.

• Computational model for analyzing algorithms on multicore systems


Talk Outline

• Introduction
• Model for algorithm design on multicore systems
  – Case studies
  – Performance
• SWARM
  – Programming framework
  – Algorithm design
  – Performance


Multicore Computing

High-performance multicore programming requirements:
– Exploit concurrency at the algorithmic level and design efficient parallel algorithms
– Judiciously utilize memory bandwidth
– Minimize inter-processor communication and synchronization


Multicore Computing

Addressing these requirements:
• Model for multicore algorithm design
• SWARM: A Parallel Programming Framework


Multicore Algorithm Design

• Architectural model: p homogeneous processing cores, each with a dedicated L1 cache and a shared L2 cache

• Memory bandwidth: L1 cache > L2 cache > main memory


Complexity Model

Algorithm complexity is given by:
• Computational complexity:
  – RAM model of computation; complexity as a function of input size
  – T_C(n, p) = max over i = 1, ..., p of T_{C,i}(n, p)
• Memory accesses:
  – B: number of blocks transferred from main memory to the shared L2 cache; σ: bandwidth parameter
  – Aggarwal-Vitter I/O model of computation
  – T_M(n) = B(n) · σ^(-1)
• Synchronization:
  – S(n): complexity of synchronization operations (barriers, locks); L: synchronization overhead parameter
  – T_S(n) = S(n) · L


Multicore Algorithm Design

• Complexity is given by the tuple <T_C, T_M, T_S>
• Cache-aware approaches
  – Data layout optimizations, blocking/tiling, padding
  – Merge sort case study
• Cache-oblivious algorithms
• Minimize synchronization overhead
  – Lock-free algorithms, atomic operations
• All these paradigms are considered in the design of SWARM's parallel primitives and library
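The blocking/tiling idea above can be illustrated with a small generic sketch (example code, not part of the SWARM library): a tiled matrix transpose that restricts each pass to a TILE x TILE block so the cache lines loaded for a block are fully reused before eviction. TILE, transpose_naive, and transpose_tiled are illustrative names.

```c
#include <stddef.h>

#define TILE 32  /* tile edge chosen so a pair of tiles fits comfortably in cache */

/* naive transpose: the write to dst strides by n, so for large n each
   write touches a different cache line with little reuse */
void transpose_naive(const int *src, int *dst, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* tiled transpose: both src and dst accesses stay inside one
   TILE x TILE block at a time, improving spatial locality */
void transpose_tiled(const int *src, int *dst, int n) {
    for (int ii = 0; ii < n; ii += TILE)
        for (int jj = 0; jj < n; jj += TILE)
            for (int i = ii; i < ii + TILE && i < n; i++)
                for (int j = jj; j < jj + TILE && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}
```

Both routines compute the same result; only the traversal order (and hence the memory behavior captured by the T_M term) differs.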


Test Platform

• Sun Fire T2000 (UltraSparc T1 "Niagara" processor)
  – 8 multithreaded cores
  – 4 threads per core
  – Shared 3 MB on-chip L2 cache
  – 1.0 GHz clock speed


[Figure: Tiling optimization benchmark on the Sun Fire T2000. X-axis: problem size (log scale, 2^15 to 2^28); Y-axis: speedup achieved by the cache-aware approach over the naive approach (roughly 0.96 to 1.14). Configurations: 2 cores with 1 thread per core; 4 cores with 1 thread per core; 8 cores with 1 thread per core; 8 cores with 2 threads per core.]


SWARM

• Multicore programming framework and a collection of multicore-optimized libraries
• Portable, open-source
  – http://multicore-swarm.sourceforge.net/
  – Current: version 1.1
• Examples and efficient implementations of various parallel programming primitives
  – Prefix-sums, pointer-jumping, divide and conquer, pipelining, graph algorithms, symmetry breaking
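As a sketch of how the prefix-sums primitive decomposes across cores, the classic three-phase scheme (local scan of each core's block, scan of the block totals, then an offset fix-up) can be written as below. The per-block loops are the parts a framework like SWARM would run in parallel; this sequential version and its names are illustrative, not SWARM's implementation.

```c
#include <stdlib.h>

/* Inclusive prefix sums of in[0..n-1] into out[0..n-1], decomposed
   over p contiguous blocks as a p-core algorithm would be. */
void prefix_sums(const int *in, int *out, int n, int p) {
    int block = (n + p - 1) / p;            /* block size per "core" */
    int *totals = (int *)malloc(p * sizeof(int));

    /* Phase 1 (parallel across cores): local inclusive scan per block */
    for (int c = 0; c < p; c++) {
        int lo = c * block;
        int hi = (c + 1) * block < n ? (c + 1) * block : n;
        int sum = 0;
        for (int i = lo; i < hi; i++) { sum += in[i]; out[i] = sum; }
        totals[c] = sum;
    }

    /* Phase 2 (one core): exclusive scan of the p block totals */
    int carry = 0;
    for (int c = 0; c < p; c++) {
        int t = totals[c];
        totals[c] = carry;
        carry += t;
    }

    /* Phase 3 (parallel across cores): add each block's offset */
    for (int c = 0; c < p; c++) {
        int lo = c * block;
        int hi = (c + 1) * block < n ? (c + 1) * block : n;
        for (int i = lo; i < hi; i++) out[i] += totals[c];
    }
    free(totals);
}
```

Phases 1 and 3 are embarrassingly parallel over the blocks; only the short Phase 2 scan of p values is serial, which is what makes the primitive scale.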


SWARM

• POSIX-threads-based framework
• Support for parallel primitives
  – Data parallel, control, memory management
  – barrier, replicate, scan, broadcast
• Incremental approach to parallelizing applications
  – Minimal modifications to existing code


Typical SWARM Usage

Original C code:

    int main (int argc, char **argv) {
        /* sequential code */
        ....
        fun1(a, b);
        /* more sequential code */
        fun2(c, d);
        ....
    }

Identify compute-intensive functions, then parallelize them with the SWARM library:

    int main (int argc, char **argv) {
        SWARM_Init(&argc, &argv);
        /* sequential code */
        ....
        SWARM_Run((void *) fun1(a, b));
        /* more sequential code */
        fun2(c, d);
        ....
        SWARM_Finalize();
    }


Data parallelism: pardo directive

/* pardo example: partitioning a "for" loop among the cores */
pardo(i, start, end, incr) {
    A[i] = B[i] + C[i];
}

• pardo ("parallel do") implicitly partitions the loop iterations among the cores, with no explicit coordination needed.
• SWARM provides both block and cyclic partitioning options.
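The block and cyclic partitioning options can be sketched as plain index arithmetic. These helpers are illustrative, not SWARM's pardo implementation; nthreads and rank here play the roles of the THREADS and MYTHREAD variables from the framework.

```c
/* Block partition: thread `rank` gets one contiguous chunk of
   [start, end), with the remainder spread over the first threads. */
void block_bounds(int start, int end, int nthreads, int rank,
                  int *lo, int *hi) {
    int n = end - start;
    int chunk = n / nthreads;
    int rem = n % nthreads;
    *lo = start + rank * chunk + (rank < rem ? rank : rem);
    *hi = *lo + chunk + (rank < rem ? 1 : 0);
}

/* Cyclic partition: thread `rank` takes iterations rank, rank+nthreads,
   rank+2*nthreads, ... Returns how many iterations it executed. */
int cyclic_sum(const int *B, const int *C, int *A,
               int start, int end, int nthreads, int rank) {
    int count = 0;
    for (int i = start + rank; i < end; i += nthreads) {
        A[i] = B[i] + C[i];   /* the pardo loop body from the slide */
        count++;
    }
    return count;
}
```

Block partitioning favors spatial locality (each thread streams one contiguous range); cyclic partitioning balances load better when per-iteration cost varies.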


Control

• SWARM control primitives restrict which threads participate in a context.

// THREADS: total number of execution threads
// MYTHREAD: the rank of a thread, from 0 to THREADS-1

/* example: execute code on a specific thread */
on_thread (3) {
    ....
}

/* example: execute code on just one thread */
on_one_thread {
    ...
}


Memory management

• SWARM provides two directives:
  – SWARM_malloc: dynamically allocate a shared structure
  – SWARM_free: release shared memory back to the heap

/* example: allocate a shared array of size n */
A = (int *)SWARM_malloc(n*sizeof(int), TH);

/* example: free the array A */
SWARM_free(A);


Synchronization

• Barrier: SWARM_Barrier()
  – Two variants
• Locks
  – POSIX threads mutex locks, user-level atomic locks
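For illustration, a counting barrier of the kind a SWARM_Barrier() call provides can be built from a POSIX mutex and condition variable. This is a generic textbook construction under that assumption, not SWARM's actual implementation.

```c
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int nthreads;   /* threads that must arrive before anyone proceeds */
    int arrived;    /* threads arrived in the current round */
    int round;      /* generation counter so the barrier is reusable */
} barrier_t;

void barrier_init(barrier_t *b, int nthreads) {
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->cv, NULL);
    b->nthreads = nthreads;
    b->arrived = 0;
    b->round = 0;
}

void barrier_wait(barrier_t *b) {
    pthread_mutex_lock(&b->lock);
    int my_round = b->round;
    if (++b->arrived == b->nthreads) {
        /* last arrival resets the count and releases everyone */
        b->arrived = 0;
        b->round++;
        pthread_cond_broadcast(&b->cv);
    } else {
        /* wait for the round to advance; loop guards against
           spurious wakeups */
        while (b->round == my_round)
            pthread_cond_wait(&b->cv, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}
```

Compile and link with -lpthread. The generation counter is what distinguishes this from a naive counter-only barrier: it lets the same barrier object be reused across successive synchronization points.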


Other communication primitives

• Broadcast: supplies each processing core with the address of a shared buffer by replicating the memory address.

• Reduce: performs a reduction with a binary associative operator, such as addition, multiplication, maximum, minimum, bitwise-AND, or bitwise-OR.

/* broadcast function signatures */
int  SWARM_Bcast_i  (int myval, THREADED);
int* SWARM_Bcast_ip (int* myval, THREADED);
char SWARM_Bcast_c  (char myval, THREADED);

/* reduce function signatures */
int    SWARM_Reduce_i(int myval, reduce_t op, THREADED);
double SWARM_Reduce_d(double myval, reduce_t op, THREADED);
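A minimal sketch of what a reduction like SWARM_Reduce_i computes: each thread contributes one value and every thread receives the combined result under a binary associative operator. Here the per-thread contributions are modeled as an array and the combine runs sequentially; reduce_op_t is an illustrative stand-in for the library's reduce_t, not its real definition.

```c
/* illustrative operator tags mirroring the operators listed above */
typedef enum { RED_SUM, RED_PROD, RED_MAX, RED_MIN, RED_AND, RED_OR } reduce_op_t;

/* vals[i] plays the role of thread i's `myval`; n plays THREADS */
int reduce_i(const int *vals, int n, reduce_op_t op) {
    int acc = vals[0];
    for (int i = 1; i < n; i++) {
        switch (op) {
        case RED_SUM:  acc += vals[i]; break;
        case RED_PROD: acc *= vals[i]; break;
        case RED_MAX:  acc = vals[i] > acc ? vals[i] : acc; break;
        case RED_MIN:  acc = vals[i] < acc ? vals[i] : acc; break;
        case RED_AND:  acc &= vals[i]; break;  /* bitwise AND */
        case RED_OR:   acc |= vals[i]; break;  /* bitwise OR */
        }
    }
    return acc;
}
```

Because every listed operator is associative, a parallel implementation is free to combine contributions in a tree rather than left to right, which is what makes an O(log p) reduction possible.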


SWARM: Merge Sort Performance, Sun Fire T2000


SWARM: List Ranking Performance, Sun Fire T2000


SWARM: List Ranking Performance, Intel dual-core Xeon 5150


Acknowledgment of Support

• National Science Foundation
  – CSR: A Framework for Optimizing Scientific Applications (06-14915)
  – CAREER: High-Performance Algorithms for Scientific Applications (06-11589; 00-93039)
  – ITR: Building the Tree of Life -- A National Resource for Phyloinformatics and Computational Phylogenetics (EF/BIO 03-31654)
  – ITR/AP: Reconstructing Complex Evolutionary Histories (01-21377)
  – DEB: Comparative Chloroplast Genomics: Integrating Computational Methods, Molecular Evolution, and Phylogeny (01-20709)
  – ITR/AP(DEB): Computing Optimal Phylogenetic Trees under Genome Rearrangement Metrics (01-13095)
  – DBI: Acquisition of a High Performance Shared-Memory Computer for Computational Science and Engineering (04-20513)
• IBM PERCS / DARPA High Productivity Computing Systems (HPCS)
  – DARPA Contract NBCH30390004
• IBM Shared University Research (SUR) Grant
• Sony-Toshiba-IBM (STI)
• Microsoft Research
• Sun Academic Excellence Grant


Conclusions

• SWARM: SoftWare and Algorithms for Running on Multicore, a portable open-source parallel framework for multicore systems
  – http://multicore-swarm.sourceforge.net/

• We present a complexity model for algorithm design on multicore systems
  – It is critical to optimize memory access patterns and synchronization on multicore systems

• Future work: more SWARM libraries and primitives