22
ECE669 L4: Parallel Applications February 10, 2004 ECE 669 Parallel Computer Architecture Lecture 4 Parallel Applications

ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

ECE 669

Parallel Computer Architecture

Lecture 4

Parallel Applications

Page 2: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Outline

° Motivating Problems (application case studies)

° Classifying problems

° Parallelizing applications

° Examining tradeoffs

° Understanding communication costs

• Remember: software and communication!

Page 3: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Simulating Ocean Currents

° Model as two-dimensional grids• Discretize in space and time• finer spatial and temporal resolution => greater accuracy

° Many different computations per time step- set up and solve equations

• Concurrency across and within grid computations° Static and regular

(a) Cross sections (b) Spatial discretization of a cross section

Page 4: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Creating a Parallel Program

° Pieces of the job:• Identify work that can be done in parallel

- work includes computation, data access and I/O• Partition work and perhaps data among processes• Manage data access, communication and synchronization

° Simplification:• How to represent big problem using simple computation and

communication

° Identifying the limiting factor• Later: balancing resources

Page 5: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

4 Steps in Creating a Parallel Program

P0

Tasks Processes Processors

P1

P2 P3

p0 p1

p2 p3

p0 p1

p2 p3

Partitioning

Sequentialcomputation

Parallelprogram

Assignment

Decomposition

Mapping

Orchestration

° Decomposition of computation in tasks

° Assignment of tasks to processes

° Orchestration of data access, comm, synch.

° Mapping processes to processors

Page 6: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Decomposition

° Identify concurrency and decide level at which to exploit it

° Break up computation into tasks to be divided among processors

• Tasks may become available dynamically• No. of available tasks may vary with time

° Goal: Enough tasks to keep processors busy, but not too many

• Number of tasks available at a time is upper bound on achievable speedup

Page 7: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Limited Concurrency: Amdahl’s Law

° Most fundamental limitation on parallel speedup

° If fraction s of seq execution is inherently serial,speedup <= 1/s

° Example: 2-phase calculation• sweep over n-by-n grid and do some independent computation• sweep again and add each value to global sum

° Time for first phase = n2/p

° Second phase serialized at global variable, so time = n2

° Speedup <= or at most 2

° Trick: divide second phase into two• accumulate into private sum during sweep• add per-process private sum into global sum

° Parallel time is n2/p + n2/p + p, and speedup at best

2n2

n2

p + n2

2n2

2n2 + p2

Page 8: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Understanding Amdahl’s Law

1

p

1

p

1

n2/p

n2

p

wor

k do

ne c

oncu

rren

tly

n2

n2

Timen2/p n2/p

(c)

(b)

(a)

Page 9: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Concurrency Profiles

• Area under curve is total work done, or time with 1 processor• Horizontal extent is lower bound on time (infinite processors)

• Speedup is the ratio: , base case:

• Amdahl’s law applies to any overhead, not just limited concurrency

Con

curr

ency

150

219

247

286

313

343

380

415

444

483

504

526

564

589

633

662

702

733

0

200

400

600

800

1,000

1,200

1,400

Clock cycle number

fk k

fkkp∑

k=1

∑k=1

∞1

s + 1-sp

Page 10: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Applications

° Classes of problems• Continuum• Particle• Graph, Combinatorial

° Goal: Demystifying

° Differential equations ---> Parallel Program

Page 11: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Particle Problems

° Simulate the interactions of many particles evolving over time

° Computing forces is expensive• Locality• Methods take advantage of force law: G m1m2

r2

•Many time-steps, plenty of concurrency across stars within one

Star on which for cesare being computed

Star too close toappr oximate

Small gr oup far enough away toappr oximate to center of mass

Large gr oup farenough away toappr oximate

Page 12: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Graph problems

• Traveling salesman• Network flow• Dynamic programming

° Searching, sorting, lists,

° Generally unstructured

•2

1

5

4

3

•6

2

3

2

2

21

1

5

5

6

4

7

Page 13: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Continuous systems

° Hyperbolic

° Parabolic

° Elliptic

° Examples:• Heat diffusion

• Electrostatic potential

• Electromagnetic waves

∂ 2AC2∂T2 = ∇2A+ B

Laplace: B is zeroPoisson: B is non-zero

∂A

C∂T= ∇2A+ B

0 = ∇2A +B

Page 14: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Numerical solutions

° Let’s do finite difference first

° Solve• Discretize• Form system of equations• Solve ---> • Direct methods

• Indirect methods• Iterative

finite difference methodsfinite element methods

.

.

.

Result insystem of equations

Eg.

∂ A∂ T

=∂ 2 A∂ x 2

Page 15: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Discretize

° Time• Where

° Space

° 1st• Where

° 2nd

• Can use other discretizations- Backward- Leap frog

Forward difference

∂ A∂ x

=A i + 1 − A i

∆ x

∂ A∂ T

=A n + 1 − A n

∆ t

∆ t =

1T steps

A12A11

Space

Boundary conditions

n-2n-1

n

Time

∂∂ x

∂ A∂ x

=

A i + 1 − A i( ) − A i − A i − 1( )∆ x 2

∂ 2 A∂ x 2 =

A i + 1 − 2 A i + A i − 1

∆ x 2

∆ x =1

X grid points

Page 16: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

1D Case

° Or

∂ A∂ T

=∂ 2 A∂ x 2

+ B

A i

n + 1 =∆ t

∆ x 2 A i + 1n − 2 A i

n + A i −1n[ ]+ B i ∆ t + A i

n

A xn + 1

A in + 1

A 2n + 1

A in + 1

= ∆ t

∆ x 2

− 2 ∆ t

∆ x 2+ 1

∆ t

∆ x 2

A xn

A i

A 2n

A in

+ B

0

0

A in + 1 − A i

n

∆ t=

1

∆ x 2 A i + 1n − 2 A i

n + Ai-1n[ ] + Bi

Page 17: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Poisson’s

For

Or

A x = b

∂ 2 A

∂ x 2+ B = 0

∀ i 0 =

1

∆ x 2Ai +1 − 2A i + A i −1[ ] + B i

.

.

.

1

∆ x 2

− 2∆ x 2

1∆ x 2

A 0

A 1

Ai

=

B 0

B 1

Bi

.

.

.

.

.

.

.

.

.

0

0

Page 18: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

2-D case

∂ A∂ T

=∂ 2 A

∂ x 2+

∂ 2 A

∂ y 2+ B

i , jn +1A − i , j

nA∆ t

=1

∆ s 2 i +1, jnA + i −1, j

nA + i , j +1nA + i , j −1

nA − 4 A i , jn[ ]+ i , jB

i , jn +1Α =

∆ t∆s2 i +1, j

nΑ + i −1, jnΑ + i , j +1

nΑ + i , j −1nΑ − i , j

n4Α[ ]+ Bi , j ∆t + Α i, jn

i, jn +1A[ ]= ?[ ] i, j

nA[ ]+ i, jB[ ]

n

A11 A12 A13 . . .

A21 A22 . . .

∆ s

∆s

° What is the form of this matrix?

Page 19: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Current status

° We saw how to set up a system of equations

° How to solve them

° Poisson: Basic idea

° In iterative methods

• Iterate till no difference• The ultimate parallel method

IterativeDirect

Jacobi, ...Multigrid...

.

.

.

Or

0 for Laplace

0 =

1

∆ s 2 i +1 , jA + i −1 , jA + i , j + 1A + i , j − 1kA − i , j4 A[ ] + i , jB

Ai , j =

Ai +1, j + Ai −1, j + i , j +1A + i , j −1A4

+ i , jC

i , jk +1A = i +1, j

kA + i −1, jkA + i , j +1

kA + i , j −1kA

4= i , jC

Page 20: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

In Matrix notation Ax = b

° Set up a system of equations.

° Now, solve

° Direct:

° Iterative:

Direct methodsSemi-direct - CGIterative

Gaussian elim.Recursive dbl.

JacobiMG

Solve Ax=b directly LU

Ax = b= -Ax+b

Mx = Mx - Ax + bMx = (M - A) x + b

Mx k+1 = (M - A) xk + b

Solve iteratively

Page 21: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Machine model

• Data is distributed among memories (ignore initial I/O costs)• Communication over network-explicit• Processor can compute only on data in local memory. To effect

communication, processor sends data to other node (writes into other memory).

Interconnectionnetwork

M

P

M M

P P

Page 22: ECE 669 Parallel Computer Architecture - UMass Amherst · ECE669 L4: Parallel Applications February 10, 2004 Concurrency Profiles • Area under curve is total work done, or time

ECE669 L4: Parallel Applications February 10, 2004

Summary

° Many types of parallel applications

• Attempt to specify as classes (graph, particle, continuum)

° We examine continuum problems as a series of finite differences

° Partition in space and time

° Distribute computation to processors

° Understand processing and communication tradeoffs