Parallel Processing 1
Parallel Processing (CS 676)
Overview
Jeremy R. Johnson
Goals
• Parallelism: to run large and difficult programs fast.
• Course: to become effective parallel programmers
  – “How to Write Parallel Programs”
  – “Parallelism will become, in the not too distant future, an essential part of every programmer’s repertoire”
  – “Coordination – a general phenomenon of which parallelism is one example – will become a basic and widespread phenomenon in CS”
• Why?
  – Some problems require extensive computing power to solve
  – The most powerful computer, by definition, is a parallel machine
  – Parallel computing is becoming ubiquitous
  – Distributed & networked computers with simultaneous users require coordination
Top 500
LINPACK Benchmark
• Solve a dense N × N system of linear equations, Ax = b, using Gaussian elimination with partial pivoting
  – (2/3)N^3 + 2N^2 FLOPs
• High-Performance LINPACK (HPL) is used to measure performance for the TOP500 (introduced by Jack Dongarra)
A = LU:

[ a11 a12 a13 ]   [ 1   0   0   ] [ u11 u12 u13 ]
[ a21 a22 a23 ] = [ l21 1   0   ] [ 0   u22 u23 ]
[ a31 a32 a33 ]   [ l31 l32 1   ] [ 0   0   u33 ]
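The computation the benchmark times can be sketched in a few lines. The following is a plain NumPy illustration of Gaussian elimination with partial pivoting (real HPL uses blocked, distributed BLAS kernels, so this is only a sketch of the algorithm; the function name and example matrix are illustrative):

```python
import numpy as np

def solve_with_partial_pivoting(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    # Forward elimination: roughly (2/3) n^3 flops.
    for k in range(n - 1):
        # Partial pivoting: bring the row with the largest |pivot| to row k.
        p = k + np.argmax(np.abs(A[k:, k]))
        A[[k, p]] = A[[p, k]]
        b[[k, p]] = b[[p, k]]
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]        # multiplier l_ik
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # Back substitution: roughly n^2 flops.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
b = np.array([1., 2., 3.])
x = solve_with_partial_pivoting(A, b)
print(np.allclose(A @ x, b))  # -> True
```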
Parallel Processing 5
Example LU Decomposition
• Solve the following linear system
• Find LU decomposition A = PLU
y + z = 1
x + z = 1
x + y = 1

    [ 0 1 1 ]
A = [ 1 0 1 ]
    [ 1 1 0 ]
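Assuming SciPy is available, the A = PLU factorization of this matrix can be computed and checked directly (the permutation P appears because the zero in position (1,1) forces a pivot swap):

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
b = np.ones(3)

P, L, U = lu(A)                      # factorization A = P @ L @ U
print(np.allclose(A, P @ L @ U))     # -> True

# Solve Ax = b with two triangular solves: Ly = P^T b, then Ux = y.
y = solve_triangular(L, P.T @ b, lower=True)
x = solve_triangular(U, y)
print(np.allclose(x, [0.5, 0.5, 0.5]))  # -> True
```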
Big Machines
Cray-2
DoE, Lawrence Livermore National Laboratory (1985)
3.9 gigaflops, 8-processor vector machine

Cray X-MP/4
DoE, LANL, … (1983)
941 megaflops, 4-processor vector machine
Big Machines
Cray Jaguar
ORNL (2009)
1.75 petaflops, 224,256 AMD Opteron cores

Tianhe-1A
NSC Tianjin, China (2010)
2.507 petaflops, 14,336 Xeon X5670 processors, 7,168 Nvidia Tesla M2050 GPUs
Need for Parallelism
Multicore
Intel Core i7
Multicore
IBM Blue Gene/L (2004–2007)
478.2 teraflops, 65,536 “compute nodes”

IBM Cyclops64
80 gigaflops, 80 cores @ 500 MHz, multiply-accumulate units
GPU
Nvidia GTX 480
1.34 teraflops, 480 stream processors (700 MHz)
Fermi chip, 3 billion transistors
Google Server
• 2003: 15,000 servers ranging from 533 MHz Intel Celeron to dual 1.4 GHz Intel Pentium III
• 2005: 200,000 servers
• 2006: upwards of servers
Drexel Machines
• Tux
  – 5 nodes
    • 4 Quad-Core AMD Opteron 8378 processors (2.4 GHz)
    • 32 GB RAM
• Draco
  – 20 nodes
    • Dual Xeon X5650 processors (2.66 GHz)
    • 6 GTX 480 GPUs
    • 72 GB RAM
  – 4 nodes
    • 6 C2070 GPUs
Programming Challenge
• “But the primary challenge for an 80-core chip will be figuring out how to write software that can take advantage of all that horsepower.”
• Read more: http://news.cnet.com/Intel-shows-off-80-core-processor/2100-1006_3-6158181.html
Basic Idea
• One way to solve a problem fast is to break it into pieces and arrange for all of the pieces to be solved simultaneously.
• The more pieces, the faster the job goes, up to the point where the pieces become too small to make the effort of breaking them up and distributing them worth the bother.
• A “parallel program” is a program that uses this breaking-up and handing-out approach to solve large or difficult problems.
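The breaking-up and handing-out idea can be sketched with Python's multiprocessing module. The problem here (summing squares over a range) and the names chunk_sum, pieces, etc. are illustrative choices, not from the slides:

```python
from multiprocessing import Pool

def chunk_sum(bounds):
    """Solve one piece: sum i*i over the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    N, pieces = 1_000_000, 4
    step = N // pieces
    # Break the problem into pieces...
    chunks = [(k * step, (k + 1) * step if k < pieces - 1 else N)
              for k in range(pieces)]
    # ...hand them out to worker processes, then combine the results.
    with Pool(pieces) as pool:
        total = sum(pool.map(chunk_sum, chunks))
    print(total == sum(i * i for i in range(N)))  # -> True
```

With more pieces the workers finish sooner, but past a point the cost of creating and coordinating them outweighs the gain, exactly the trade-off described above.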