CSCI-455/522
Introduction to High Performance Computing
Lecture 1
Why We Need Powerful Computers

• Traditional scientific and engineering paradigm:
  – Do theory or paper design.
  – Perform experiments or build the system.
• Limitations:
  – Too expensive -- building a throw-away passenger jet.
  – Too slow -- waiting for climate or galactic evolution.
  – Too dangerous -- weapons, drug design, climate experimentation.
• Computational science paradigm:
  – Use high performance computer systems to model the phenomenon in detail, using known physical laws and efficient numerical methods.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd Edition, by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved. 1.3
Grand Challenge Problems
One that cannot be solved in a reasonable amount of time with today’s computers. Obviously, an execution time of 10 years is always unreasonable.
Examples
• Modeling large DNA structures
• Global weather forecasting
• Modeling motion of astronomical bodies
More Grand Challenge Problems

• Crash simulation
• Earthquake and structural modeling
• Medical studies -- e.g. genome data analysis
• Web service and web search engines
• Financial and economic modeling
• Transaction processing
• Drug design -- e.g. protein folding
• Nuclear weapons -- test by simulations
Weather Forecasting
• Atmosphere modeled by dividing it into 3-dimensional cells.
• Calculations of each cell repeated many times to model passage of time.
Global Weather Forecasting Example
• Suppose the whole global atmosphere is divided into cells of size 1 mile × 1 mile × 1 mile to a height of 10 miles (10 cells high) - about 5 × 10^8 cells.
• Suppose each calculation requires 200 floating point operations. In one time step, 10^11 floating point operations are necessary.
• To forecast the weather over 7 days using 1-minute intervals, a computer operating at 1 Gflops (10^9 floating point operations/s) takes 10^6 seconds, or over 10 days.
• To perform the calculation in 5 minutes requires a computer operating at 3.4 Tflops (3.4 × 10^12 floating point operations/s).
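The arithmetic above can be checked with a short script. This is a back-of-envelope sketch using only the constants stated on the slide; nothing here comes from a real forecasting code.

```python
# Back-of-envelope check of the forecasting figures above.
# All constants are taken from the slide; only the arithmetic is new.

cells = 5e8                  # ~1-mile cells to a height of 10 miles
flops_per_cell = 200         # floating point operations per cell per step
ops_per_step = cells * flops_per_cell      # 1e11 ops per time step

steps = 7 * 24 * 60          # 7 days at 1-minute intervals
total_ops = ops_per_step * steps           # ~1e15 ops for the forecast

rate = 1e9                   # a 1 Gflops machine
seconds = total_ops / rate
print(f"{seconds:.2e} s (~{seconds / 86400:.1f} days) at 1 Gflops")

target = 5 * 60              # finish within 5 minutes
print(f"required rate: {total_ops / target:.2e} flops")
```

Running it reproduces the slide's figures: about 10^6 seconds at 1 Gflops, and about 3.4 Tflops to finish in 5 minutes.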
Modeling Motion of Astronomical Bodies
• Each body attracted to each other body by gravitational forces. Movement of each body predicted by calculating total force on each body.
• With N bodies, there are N - 1 forces to calculate for each body, or approx. N^2 calculations in total. (N log2 N for an efficient approximate algorithm.)
• After determining new positions of bodies, calculations repeated.
• A galaxy might have, say, 10^11 stars.
• Even if each calculation were done in 1 ms (an extremely optimistic figure), it would take 10^9 years for one iteration using the N^2 algorithm and almost a year for one iteration using an efficient N log2 N approximate algorithm.
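The gap between the two algorithms is visible from the operation counts alone. A minimal sketch; the Barnes-Hut mention is an assumption about which N log N method is meant, since the slide does not name one:

```python
from math import log2

# Work per iteration for N = 1e11 bodies: all-pairs force calculation
# vs. an N log N approximation (e.g. a Barnes-Hut-style tree code).

N = 1e11
direct = N ** 2              # ~1e22 force calculations per iteration
approx = N * log2(N)         # ~3.7e12 with the approximate algorithm

print(f"direct: {direct:.1e}  approx: {approx:.1e}")
print(f"the approximation does ~{direct / approx:.0e}x less work")
```

The approximate algorithm does roughly a billion times less work per iteration, which is why it turns an impossible computation into a merely expensive one.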
Astrophysical N-body simulation by Scott Linssen (undergraduate UNC-Charlotte student).
Parallel Computing
• Using more than one computer, or a computer with more than one processor, to solve a problem.
Motives

• Usually faster computation. The very simple idea is that n computers operating simultaneously can achieve the result n times faster; in practice it will not be n times faster, for various reasons.
• Other motives include: fault tolerance, larger amount of memory available, ...
How I Used to Make Breakfast…

How I Set the Family to Work…

How I Finally Got to the Office on Time…
Von Neumann Architecture
[Figure: a CPU connected to memory.]

• Memory is typically much slower than the CPU.
• Memory may not have the bandwidth required to match the CPU.
• Some improvements have included:
  – Interleaved memory
  – Caching
  – Pipelining
  – Superscalar execution
Moore's Law

Processor-Memory Performance Gap

[Figure: performance on a log scale (1 to 1000) vs. year, 1980-2000. Processor (µProc) performance improves at about 60%/yr while DRAM improves at about 7%/yr, so the gap between them widens steadily; "Moore's Law" labels the processor curve.]

From D. Patterson, CS252, Spring 1998 ©UCB
Parallel vs. Serial Computers
• Two big advantages of parallel computers:
  1. total performance
  2. total memory
• Parallel computers enable us to solve problems that:
  – benefit from, or require, fast solution
  – require large amounts of memory
  – example that requires both: weather forecasting
Background
• Parallel computers - computers with more than one processor - and their programming - parallel programming - have been around for more than 40 years.
Gill writes in 1958:
“... There is therefore nothing new in the idea of parallel programming, but its application to computers. The author cannot believe that there will be any insuperable difficulty in extending it to computers. It is not to be expected that the necessary programming techniques will be worked out overnight. Much experimenting remains to be done. After all, the techniques that are commonly used in programming today were only won at the cost of considerable toil several years ago. In fact the advent of parallel programming may do something to revive the pioneering spirit in programming which seems at the present to be degenerating into a rather dull and routine occupation ...”
Gill, S. (1958), “Parallel Programming,” The Computer Journal, vol. 1, April, pp. 2-10.
Speedup Factor
S(p) = (execution time using one processor, with the best sequential algorithm) / (execution time using a multiprocessor with p processors) = ts / tp

where ts is the execution time on a single processor and tp is the execution time on the multiprocessor.

S(p) gives the increase in speed gained by using the multiprocessor.

Use the best sequential algorithm for the single-processor time; the underlying algorithm for the parallel implementation might be (and usually is) different.
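The definition translates directly into code. A trivial helper, not from the text; the example timings are made up for illustration:

```python
def speedup(t_seq: float, t_par: float) -> float:
    """S(p) = ts / tp, where ts uses the best sequential algorithm."""
    return t_seq / t_par

# e.g. a job taking 100 s sequentially and 30 s on 4 processors:
print(speedup(100.0, 30.0))   # ~3.33, below the linear bound of 4
```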
Speedup factor can also be cast in terms of computational steps:

S(p) = (number of computational steps using one processor) / (number of parallel computational steps with p processors)

Time complexity can also be extended to parallel computations.
Maximum Speedup
Maximum speedup is usually p with p processors (linear speedup).
Possible to get superlinear speedup (greater than p) but usually a specific reason such as:
• Extra memory in the multiprocessor system
• Nondeterministic algorithm
Maximum Speedup: Amdahl's Law

Let f be the fraction of the computation that must be performed serially.

(a) One processor: total time ts, made up of a serial section taking f·ts and parallelizable sections taking (1 - f)·ts.

(b) Multiple processors (p processors): the parallelizable sections take (1 - f)·ts/p, giving

tp = f·ts + (1 - f)·ts/p
Speedup factor is given by:

S(p) = ts / (f·ts + (1 - f)·ts/p) = p / (1 + (p - 1)·f)

This equation is known as Amdahl's law.
Even with an infinite number of processors, the maximum speedup is limited to 1/f.

Example
With only 5% of the computation being serial (f = 0.05), the maximum speedup is 20, irrespective of the number of processors.
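Amdahl's law and its 1/f limit are easy to tabulate. A sketch using the formula above with f = 0.05:

```python
def amdahl(p: int, f: float) -> float:
    """Amdahl's law: S(p) = p / (1 + (p - 1) * f)."""
    return p / (1 + (p - 1) * f)

f = 0.05                          # 5% serial, as in the example
for p in (10, 100, 1000, 10**6):
    print(f"p = {p:>7}: S = {amdahl(p, f):.2f}")
# S(p) creeps toward, but never reaches, 1/f = 20.
```

Even a million processors yield a speedup of barely 20, which is the point of the example.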
Superlinear Speedup Example - Searching

(a) Searching each sub-space sequentially: the search space is divided into p sub-spaces, each taking ts/p to scan in full. Suppose the solution lies in the (x + 1)th sub-space (x indeterminate) and is found a short time Δt after that sub-space's search begins, so the sequential search takes x·ts/p + Δt.
(b) Searching each sub-space in parallel: all p sub-spaces are searched simultaneously, so the solution is found in time Δt.
Speed-up is then given by:

S(p) = (x·ts/p + Δt) / Δt
The worst case for sequential search is when the solution is found in the last sub-space. Then the parallel version offers the greatest benefit:

S(p) = (((p - 1)/p)·ts + Δt) / Δt → ∞  as Δt tends to zero
The least advantage for the parallel version is when the solution is found in the first sub-space of the sequential search:

S(p) = Δt / Δt = 1

The actual speed-up depends upon which sub-space holds the solution, but it could be extremely large.
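The best- and worst-case results above come from a single expression. A sketch, with ts and Δt chosen arbitrarily for illustration:

```python
def search_speedup(x: int, p: int, ts: float, dt: float) -> float:
    """Speedup of parallel search when the solution sits in sub-space x + 1.

    Sequential: x full sub-space scans of ts/p each, then dt more.
    Parallel:   all sub-spaces scanned at once; solution found after dt.
    """
    return (x * ts / p + dt) / dt

p, ts, dt = 8, 100.0, 1e-3
print(search_speedup(0, p, ts, dt))      # solution in first sub-space: S = 1
print(search_speedup(p - 1, p, ts, dt))  # solution in last sub-space: huge S
```

With these numbers, the last-sub-space case gives a speedup of tens of thousands: far beyond p = 8, which is exactly the superlinear effect being illustrated.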
Amdahl’s Law provides a theoretical upper limit on parallel speedup assuming that there are no costs for communications. In reality, communications (and I/O) will result in a further degradation of performance.
Amdahl's Law vs. Reality

[Figure: speedup vs. number of processors (0 to 250) for a parallel fraction fp = 0.99. The Amdahl's law curve climbs toward roughly 70 at 250 processors, while the "reality" curve falls increasingly below it as communication costs grow.]
Gustafson's Law

• Amdahl's Law thus predicts that there is a maximum scalability for an application, determined by its parallel fraction, and this limit is generally not large.
• There is a way around this: increase the problem size.
  – Bigger problems mean bigger grids or more particles: bigger arrays.
  – The number of serial operations generally remains constant, while the number of parallel operations increases: the parallel fraction increases.
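The contrast between the two laws can be sketched numerically. Here s is the serial fraction of the parallel run time, giving Gustafson's scaled speedup S(p) = p - (p - 1)·s; feeding the same s into Amdahl's formula is a simplification made only for this illustration:

```python
def gustafson(p: int, s: float) -> float:
    """Scaled speedup: S(p) = p - (p - 1) * s."""
    return p - (p - 1) * s

def amdahl(p: int, f: float) -> float:
    """Fixed-size speedup: S(p) = p / (1 + (p - 1) * f)."""
    return p / (1 + (p - 1) * f)

s = 0.05
for p in (10, 100, 1000):
    print(f"p = {p:>4}: Gustafson {gustafson(p, s):7.2f}   Amdahl {amdahl(p, s):5.2f}")
# Scaled speedup keeps growing with p; the fixed-size bound stalls below 20.
```

This is the essence of the slide's argument: when the problem grows with the machine, the serial work is a shrinking fraction of the total, so speedup is no longer capped at 1/f.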