Parallel Numerical Algorithms: Iterative Linear Systems, Differential Equations and Finite Difference Methods

Parallel and Distributed Computing

Department of Computer Science and Engineering (DEI)
Instituto Superior Técnico

December 6, 2012

CPD (DEI / IST) Parallel and Distributed Computing – 23 2012-12-6 1 / 40

Outline

Iterative Methods for Linear Systems

Relaxation Methods

Linear Second-order Partial Differential Equations (PDEs)

Finite difference methods

Example: steady-state heat distribution

Ghost points

Solving Linear Systems: recall

Direct Methods: solution is sought directly, at once

Gaussian Elimination

LU Factorization

Iterative Methods: solution is sought iteratively, by improvement

Relaxation Methods

Krylov Methods

Preconditioning

Iterative Methods for Linear Systems

Iterative methods for solving linear system Ax = b begin with initial guess for solution and successively improve it until solution is as accurate as desired

In theory, infinite number of iterations might be required to converge to exact solution

In practice, iteration terminates when residual ‖r‖ = ‖b − Ax‖, or some other measure of error, is as small as desired

Iterative methods are especially useful when matrix A is sparse because, unlike direct methods, no fill is incurred

Jacobi Method

Beginning with initial guess $x^{(0)}$, Jacobi method computes next iterate by solving for each component of x in terms of others

$$x_i^{(k+1)} = \Big( b_i - \sum_{j \ne i} a_{ij} x_j^{(k)} \Big) / a_{ii}, \quad i = 1, \dots, n$$

If D, L, and U are diagonal, strict lower triangular, and strict upper triangular portions of A, then Jacobi method can be written as

$$x^{(k+1)} = D^{-1} \big( b - (L + U) x^{(k)} \big)$$
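A minimal Python sketch of the component-wise Jacobi update, assuming a small dense matrix stored as a list of rows. The function name `jacobi`, the tolerance, the iteration cap and the 3 × 3 test system are illustrative choices rather than part of the lecture; the stopping test uses the max-norm of the residual.

```python
def jacobi(A, b, x0, tol=1e-10, max_iter=10_000):
    """Jacobi iteration for Ax = b; A is a list of rows with nonzero diagonal."""
    n = len(b)
    x = list(x0)
    for k in range(1, max_iter + 1):
        # Each new component uses only values from the previous iterate,
        # so the new iterate is built in a separate list.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Max-norm of the residual r = b - A x_new.
        residual = max(
            abs(b[i] - sum(A[i][j] * x_new[j] for j in range(n)))
            for i in range(n)
        )
        x = x_new
        if residual < tol:
            return x, k
    return x, max_iter


# Strictly diagonally dominant example whose exact solution is (1, 1, 1).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]
x, iterations = jacobi(A, b, [0.0, 0.0, 0.0])
```

Because every entry of `x_new` depends only on the old `x`, the list comprehension could be split across tasks without changing the result.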

Jacobi Method

Jacobi method requires nonzero diagonal entries, which can usually be accomplished by permuting rows and columns if not already true

Jacobi method requires duplicate storage for x, since no component can be overwritten until all new values have been computed

Components of new iteration do not depend on each other, so they can be computed simultaneously

Jacobi method does not always converge, but it is guaranteed to converge under conditions that are often satisfied (e.g., if matrix is strictly diagonally dominant), though convergence rate may be very slow

Gauss-Seidel Method

Faster convergence can be achieved by using each new component value as soon as it has been computed rather than waiting until next iteration

This gives Gauss-Seidel method

$$x_i^{(k+1)} = \Big( b_i - \sum_{j < i} a_{ij} x_j^{(k+1)} - \sum_{j > i} a_{ij} x_j^{(k)} \Big) / a_{ii}$$

Using same notation as for Jacobi, Gauss-Seidel method can be written

$$x^{(k+1)} = (D + L)^{-1} \big( b - U x^{(k)} \big)$$
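A minimal Python sketch of one way to code the Gauss-Seidel update, assuming a small dense matrix stored as a list of rows. The function name `gauss_seidel`, the tolerance and the test system are illustrative assumptions. The only structural difference from a Jacobi loop is that `x` is overwritten in place.

```python
def gauss_seidel(A, b, x0, tol=1e-10, max_iter=10_000):
    """Gauss-Seidel iteration for Ax = b, updating x in place."""
    n = len(b)
    x = list(x0)
    for k in range(1, max_iter + 1):
        for i in range(n):
            # For j < i, x[j] already holds the new value from this sweep;
            # for j > i, x[j] still holds the value from the previous sweep.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
        # Max-norm of the residual r = b - A x.
        residual = max(
            abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
            for i in range(n)
        )
        if residual < tol:
            return x, k
    return x, max_iter


# Symmetric positive definite example whose exact solution is (1, 1, 1).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]
x, sweeps = gauss_seidel(A, b, [0.0, 0.0, 0.0])
```

The inner loop over `i` must run in order, which is the sequential dependence that makes this method harder to parallelize than Jacobi.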

Gauss-Seidel Method

Gauss-Seidel requires nonzero diagonal entries. Gauss-Seidel does not require duplicate storage for x, since component values can be overwritten as they are computed

But each component depends on previous ones, so they must be computed successively

Gauss-Seidel does not always converge, but it is guaranteed to converge under conditions that are somewhat weaker than those for Jacobi method (e.g., if matrix is symmetric and positive definite)

Gauss-Seidel converges about twice as fast as Jacobi, but may still be very slow

Parallel Implementation

Iterative methods for linear systems are composed of basic operations such as

vector updates

inner products

matrix-vector multiplication

solution of triangular systems

In parallel implementation, both data and operations are partitioned across multiple tasks

In addition to communication required for these basic operations, necessary convergence test may require additional communication (e.g., sum or max reduction)

Partitioning of Vectors

Iterative methods typically require several vectors, including solution x, right-hand side b, residual r = b − Ax, and possibly others

Even when matrix A is sparse, these vectors are usually dense

These dense vectors are typically uniformly partitioned among p tasks, with given task holding same set of component indices of each vector

Thus, vector updates require no communication, whereas inner products of vectors require reductions across tasks
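The difference between the two operations can be sketched in Python by simulating the p tasks with an ordinary loop. This is only an illustration under that assumption: there is no real message passing, and the helper names `block_range` and `partitioned_update_and_dot` are hypothetical. The final `sum(partial)` stands in for the collective sum reduction.

```python
def block_range(rank, p, n):
    """Index range [lo, hi) owned by task `rank` in a uniform block partition."""
    return rank * n // p, (rank + 1) * n // p


def partitioned_update_and_dot(alpha, x, y, p):
    """Compute z = alpha*x + y and the inner product x . y over p simulated tasks."""
    n = len(x)
    z = [0.0] * n
    partial = []
    for rank in range(p):  # each pass of this loop stands for one task
        lo, hi = block_range(rank, p, n)
        # Vector update: touches only indices owned by this task.
        for i in range(lo, hi):
            z[i] = alpha * x[i] + y[i]
        # Local contribution to the inner product.
        partial.append(sum(x[i] * y[i] for i in range(lo, hi)))
    # Combining the partial sums is the step that needs a reduction across tasks.
    return z, sum(partial)


x = [float(i) for i in range(1, 9)]
y = [2.0] * 8
z, dot = partitioned_update_and_dot(0.5, x, y, p=3)
```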

Partitioning of Sparse Matrix

Sparse matrix A can be partitioned among tasks by rows, by columns, or by submatrices

Partitioning by submatrices may give uneven distribution of nonzeros among tasks; indeed, some submatrices may contain no nonzeros at all

Partitioning by rows or by columns tends to yield more uniform distribution because sparse matrices typically have about same number of nonzeros in each row or column

Sparse MatVec with 1-D Partitioning

Sparse MatVec with 2-D Partitioning

Row Partitioning of Sparse Matrix

Suppose that each task is assigned n/p rows, yielding p tasks, where for simplicity we assume that p divides n

In dense matrix-vector multiplication, since each task owns only n/p components of vector operand, communication is required to obtain remaining components

If matrix is sparse, however, few components may actually be needed, and these should preferably be stored in neighboring tasks

Assignment of rows to tasks by contiguous blocks or cyclically would not, in general, result in desired proximity of vector components

Two-Dimensional Partitioning

Parallel Jacobi and Gauss-Seidel

Contiguous groups of variables are assigned to each task, so most communication is internal, and external communication is limited to nearest neighbors in 1-D mesh

More generally, Jacobi method usually parallelizes well if underlying grid is partitioned in this manner, since all components of x can be updated simultaneously

Unfortunately, Gauss-Seidel methods require successive updating of solution components in given order (in effect, solving triangular system), rather than permitting simultaneous updating as in Jacobi method

Red-Black Ordering

Apparent sequential order can be broken, however, if components are reordered according to coloring of underlying graph

For 5-point discretization on square grid, for example, color alternate nodes in each dimension red and others black, giving color pattern of chess or checker board

Then all red nodes can be updated simultaneously, as can all black nodes, so algorithm proceeds in alternating phases, first updating all nodes of one color, then those of other color, repeating until convergence
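The two-phase update can be sketched in Python for the 5-point Laplace stencil on a grid stored as a list of rows. This is a sequential illustration only; the names `red_black_sweep` and `solve`, the tolerance and the test grid are assumptions made for the example. Color is taken as the parity of i + j.

```python
def red_black_sweep(u):
    """One red-black Gauss-Seidel sweep for the 5-point Laplace stencil, in place."""
    rows, cols = len(u), len(u[0])
    for color in (0, 1):  # 0 = "red", 1 = "black"
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                if (i + j) % 2 == color:
                    # All four neighbors have the opposite color, so points of
                    # the same color do not depend on each other.
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                      + u[i][j - 1] + u[i][j + 1])
    return u


def solve(u, tol=1e-8, max_sweeps=10_000):
    """Repeat sweeps until the largest change in one sweep drops below tol."""
    for sweep in range(1, max_sweeps + 1):
        old = [row[:] for row in u]
        red_black_sweep(u)
        change = max(abs(u[i][j] - old[i][j])
                     for i in range(len(u)) for j in range(len(u[0])))
        if change < tol:
            return sweep
    return max_sweeps


# Boundary fixed at 1.0, interior starts at 0.0; the exact solution is 1.0 everywhere.
size = 6
grid = [[1.0 if i in (0, size - 1) or j in (0, size - 1) else 0.0
         for j in range(size)] for i in range(size)]
sweeps = solve(grid)
```

Within each color phase the order of the `i`, `j` loops does not affect the result, which is what allows the phase to run in parallel.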

Row-Wise Ordering for 2-D Grid

Red-Black Ordering for 2-D Grid

Partial Differential Equations

Partial Differential Equations, PDE

Equation containing derivatives of a function of two or more variables (ordinary differential equation if single variable).

$$u = f(x, y), \qquad u_y = \frac{\partial f}{\partial y}, \qquad u_{xy} = \frac{\partial^2 f}{\partial x \, \partial y}$$

Second-Order PDE: contains no partial derivatives of order more than two.

Linear second-order PDEs are of the form:

$$A u_{xx} + 2B u_{xy} + C u_{yy} + E u_x + F u_y + G u = H$$

A, B, C, E, F, G and H are functions of x and y only.

Linear second-order PDEs

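$$A u_{xx} + 2B u_{xy} + C u_{yy} + E u_x + F u_y + G u = H$$

1. $\pi u_{xy} + x^2 u_{yy} = \sin(xy)$

2. $u_{xx}^2 + u_{yy} = 0$

3. $u \, u_{xy} + 4x \, u_{yy} = x + y$

4. $\sin(xy) \, u_x + 6xy \, u_{xy} = 0$

Examples 1 and 4 are of this form; 2 and 3 are not, since 2 squares $u_{xx}$ and in 3 the coefficient of $u_{xy}$ is u itself.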

Elliptic PDEs

$$A u_{xx} + 2B u_{xy} + C u_{yy} + E u_x + F u_y + G u = H$$

Elliptic PDEs: $B^2 - AC < 0$

Poisson Equation: $u_{xx} + u_{yy} = f(x, y)$ (A = C = 1, B = 0)

(if f(x, y) = 0, called Laplace Equation)

potential problems: electric, magnetic, gravitational

distribution of heat or charge in conductors

certain fluid and torsion problems

Parabolic PDEs

$$A u_{xx} + 2B u_{xy} + C u_{yy} + E u_x + F u_y + G u = H$$

Parabolic PDEs: $B^2 - AC = 0$

Heat Equation or Diffusion Equation: $k u_{xx} = u_t$ (A = k, B = C = 0)

heat conduction in solids

diffusion of liquids and gases

Hyperbolic PDEs

$$A u_{xx} + 2B u_{xy} + C u_{yy} + E u_x + F u_y + G u = H$$

Hyperbolic PDEs: $B^2 - AC > 0$

Wave Equation: $c^2 u_{xx} = u_{tt}$ (A = $c^2$, B = 0, C = −1)

wave propagation model

vibration of strings and membranes
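The three classes differ only in the sign of $B^2 - AC$, which a short Python helper can check at a given point. The function name `classify` and the sample coefficient values (k = 0.5, c = 2) are illustrative assumptions; the coefficients are passed as numbers already evaluated at the point of interest.

```python
def classify(A, B, C):
    """Classify A u_xx + 2B u_xy + C u_yy + ... = H by the sign of B^2 - AC."""
    discriminant = B * B - A * C
    if discriminant < 0:
        return "elliptic"
    if discriminant == 0:
        return "parabolic"
    return "hyperbolic"


poisson = classify(1.0, 0.0, 1.0)   # u_xx + u_yy = f:   A = C = 1, B = 0
heat = classify(0.5, 0.0, 0.0)      # k u_xx = u_t:      A = k,  B = C = 0
wave = classify(4.0, 0.0, -1.0)     # c^2 u_xx = u_tt:   A = c^2, B = 0, C = -1
```

Since A, B and C may depend on x and y, an equation with variable coefficients can change class from one region of the domain to another.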

Finite Difference Methods

In general, it is not possible to obtain an analytical solution to a PDE.

Finite Difference Methods

Numerical methods that obtain an approximate solution of PDEs by dividing the variables (often time and space) into discrete intervals.

Approximation to the first derivative of f(x), f′(x):

$$f'(x) \approx \frac{f(x + h/2) - f(x - h/2)}{h}$$

Finite Difference Methods

Approximation to the second derivative of f(x), f′′(x):

$$f''(x) \approx \frac{\dfrac{f(x + h/2 + h/2) - f(x + h/2 - h/2)}{h} - \dfrac{f(x - h/2 + h/2) - f(x - h/2 - h/2)}{h}}{h}$$

$$= \frac{f(x + h) - f(x) - \big( f(x) - f(x - h) \big)}{h^2}$$

$$= \frac{f(x + h) - 2 f(x) + f(x - h)}{h^2}$$
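Both centered formulas can be checked numerically in a few lines of Python. The sketch below assumes f = sin, x = 1 and h = 10⁻³ purely as an example, and compares the approximations against the known derivatives cos(x) and −sin(x); the function names `d1` and `d2` are hypothetical.

```python
import math


def d1(f, x, h):
    """Centered approximation of f'(x) using the points x - h/2 and x + h/2."""
    return (f(x + h / 2) - f(x - h / 2)) / h


def d2(f, x, h):
    """Centered approximation of f''(x) using the points x - h, x, x + h."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


x, h = 1.0, 1e-3
err1 = abs(d1(math.sin, x, h) - math.cos(x))    # exact f'(x)  =  cos(x)
err2 = abs(d2(math.sin, x, h) - (-math.sin(x)))  # exact f''(x) = -sin(x)
```

Both errors shrink roughly in proportion to h² until floating-point rounding takes over, so making h extremely small does not keep improving the result.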

Steady-State Heat Distribution

Problem: find the steady-state heat distribution over a thin square plate.

[Figure: square plate with steam along three edges and ice along the fourth]

Underlying PDE is the Poisson Equation:

$$u_{xx} + u_{yy} = f(x, y), \qquad 0 \le x \le a, \quad 0 \le y \le b$$

Boundary conditions:

$$u(x, 0) = G_1(x), \quad u(x, b) = G_2(x), \qquad 0 \le x \le a$$

$$u(0, y) = G_3(y), \quad u(a, y) = G_4(y), \qquad 0 \le y \le b$$

Steady-State Heat Distribution

Sequential program: create a 2-dimensional grid, with dimensions x and y divided into n and m pieces, respectively. Iteratively compute $u(x_i, y_j)$ until convergence.

Let h = a/n and k = b/m be the grid spacings in x and y.

$$u_{xx}(x_i, y_j) \approx \frac{u(x_i + h, y_j) - 2u(x_i, y_j) + u(x_i - h, y_j)}{h^2} = \frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{h^2}$$

$$u_{yy}(x_i, y_j) \approx \frac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{k^2}$$

Inserting in the original equation:

$$\frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{h^2} + \frac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{k^2} = f_{i,j}$$

Steady-State Heat Distribution

$$\frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{h^2} + \frac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{k^2} = f_{i,j}$$

In our case,

no internal sources, $f_{i,j} = 0$

laterals and bottom at 100°, $u_{0,j} = u_{n,j} = u_{i,0} = 100$

top at 0°, $u_{i,m} = 0$

also, if we use h = k

simplifies to:

$$u_{i+1,j} - 2u_{i,j} + u_{i-1,j} + u_{i,j+1} - 2u_{i,j} + u_{i,j-1} = 0$$

Solving for $u_{i,j}$:

$$u_{i,j} = \frac{u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1}}{4}$$

Steady-State Heat Distribution

$$u_{i,j} = \frac{u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1}}{4}$$

Jacobi method:

start with an initial guess of $u_{i,j}$

compute $u_{i,j}$ at iteration q from values of iteration q − 1

stop when the change in values between iterations is within a given error tolerance
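The three steps above can be sketched as a short sequential Python program. The grid size, the tolerance, the zero initial guess and the names `make_plate` and `jacobi_heat` are assumptions for illustration; row 0 is taken as the top edge (0°) and the other three edges are held at 100°.

```python
def make_plate(size, hot=100.0, cold=0.0):
    """size x size grid: lateral and bottom edges hot, top edge cold, interior 0."""
    u = [[0.0] * size for _ in range(size)]
    for r in range(size):
        u[r][0] = u[r][size - 1] = hot   # lateral edges
    for c in range(size):
        u[size - 1][c] = hot             # bottom edge
        u[0][c] = cold                   # top edge
    return u


def jacobi_heat(u, tol=1e-6, max_iter=100_000):
    """Iterate u_ij = average of its four neighbors until changes fall below tol."""
    size = len(u)
    for q in range(1, max_iter + 1):
        new = [row[:] for row in u]      # values of iteration q
        diff = 0.0
        for r in range(1, size - 1):
            for c in range(1, size - 1):
                # Right-hand side reads only values of iteration q - 1.
                new[r][c] = 0.25 * (u[r - 1][c] + u[r + 1][c]
                                    + u[r][c - 1] + u[r][c + 1])
                diff = max(diff, abs(new[r][c] - u[r][c]))
        u = new
        if diff < tol:
            return u, q
    return u, max_iter


plate, iterations = jacobi_heat(make_plate(10))
tiny, _ = jacobi_heat(make_plate(3))     # a single interior point
```

With a single interior point the result is simply the mean of its four boundary neighbors, (0 + 100 + 100 + 100) / 4 = 75, which makes a convenient sanity check.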

Parallel Heat Distribution

Primitive task: $u_{i,j}$ in each iteration

Communication: adjacent cells, $u_{i-1,j}$, $u_{i+1,j}$, $u_{i,j-1}$, $u_{i,j+1}$

Agglomeration: row-wise block-striped

[Figure: grid of cells numbered 1 to 16, partitioned into row-wise strips]

Sideways neighbors are within the same process; what about top/bottom?

At the end of each iteration, each process sends its top and bottom rows to the neighboring processes.

Ghost Points

Ghost Points

Memory locations used to store redundant copies of data held by neighboring processes

Allocating ghost points as extra rows / columns simplifies parallel algorithms by allowing same loop to update all cells.

How many rows for ghost points?
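The role of ghost rows can be illustrated with a small Python simulation of the row-wise decomposition, using one ghost row on each side of every strip. The processes are simulated sequentially in one program, so the "exchange" is a plain list copy rather than real message passing, and all function names are hypothetical. The result is compared with the same Jacobi step applied to the undivided grid.

```python
def jacobi_step(u):
    """One Jacobi sweep over the interior of a grid; first/last rows and columns are fixed."""
    new = [row[:] for row in u]
    for r in range(1, len(u) - 1):
        for c in range(1, len(u[0]) - 1):
            new[r][c] = 0.25 * (u[r - 1][c] + u[r + 1][c]
                                + u[r][c - 1] + u[r][c + 1])
    return new


def split_into_strips(u, p):
    """Give each of p tasks a block of interior rows plus one ghost row above and below."""
    interior = len(u) - 2
    strips = []
    for rank in range(p):
        lo = 1 + rank * interior // p
        hi = 1 + (rank + 1) * interior // p
        # Rows lo-1 and hi are copies of neighbor rows (or the physical boundary).
        strips.append([row[:] for row in u[lo - 1:hi + 1]])
    return strips


def exchange_ghost_rows(strips):
    """Refresh ghost rows with the neighbors' newly computed edge rows."""
    for rank in range(len(strips) - 1):
        upper, lower = strips[rank], strips[rank + 1]
        upper[-1] = lower[1][:]    # lower neighbor's first owned row
        lower[0] = upper[-2][:]    # upper neighbor's last owned row


def parallel_step(strips):
    # Thanks to the ghost rows, each task runs the same loop as the full grid.
    new_strips = [jacobi_step(s) for s in strips]
    exchange_ghost_rows(new_strips)
    return new_strips


def gather(strips):
    """Reassemble the full grid from the rows each task owns."""
    rows = [strips[0][0]]
    for s in strips:
        rows.extend(s[1:-1])
    rows.append(strips[-1][-1])
    return rows


grid = [[float((r * 7 + c * 3) % 11) for c in range(6)] for r in range(8)]
reference, strips = grid, split_into_strips(grid, 3)
for _ in range(5):
    reference = jacobi_step(reference)
    strips = parallel_step(strips)
```

This sketch uses exactly one ghost row per side and exchanges after every step; it assumes each task owns at least one row.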

Ghost Points

if only one row/column transmitted, communication time may be dominated by message latency

if we send two, we can advance simulation two time steps before another communication ⇒ however, we need to replicate computation

Complexity Analysis of Row-wise Parallel Algorithm

(analysis per iteration, and a single ghost row)

Sequential time complexity: $\Theta(n^2)$

Parallel computational complexity: $\Theta(n^2 / p)$

Parallel communication complexity: $\Theta(n)$ (two sends and two receives of n elements)

Isoefficiency Analysis of Row-wise Parallel Algorithm

Isoefficiency analysis: $T(n, 1) \ge C \, T_0(n, p)$

($T(n, 1)$ sequential time; $T_0(n, p)$ parallel overhead)

Sequential time complexity: $T(n, 1) = \Theta(n^2)$

Parallel overhead dominated by communication: $\Theta(n)$

$$T_0(n, p) = p \times \Theta(n) = \Theta(pn)$$

$$n^2 \ge C p n \;\Rightarrow\; n \ge C p$$

Scalability function: $M(f(p)) / p$

$$M(n) = n^2 \;\Rightarrow\; \frac{M(Cp)}{p} = \frac{C^2 p^2}{p} = C^2 p$$

⇒ System is not highly scalable.

Parallel Heat Distribution

Primitive task: $u_{i,j}$ in each iteration

Communication: adjacent cells, $u_{i-1,j}$, $u_{i+1,j}$, $u_{i,j-1}$, $u_{i,j+1}$

Agglomeration: checkerboard block decomposition

[Figure: grid of cells numbered 1 to 16, partitioned into square blocks]

Ghost points?

At the end of each iteration, each process sends the top and bottom rows and the left and right columns of its block to the neighbouring processes.
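The exchange can be sketched without any message-passing library. The code below is a minimal single-process simulation, not the course's implementation: it assumes the usual four-neighbour average as the update rule, a square q × q arrangement of blocks, and one ghost row/column per side. All function names (`jacobi_step`, `split`, `exchange`, `gather`, `solve_blocks`, `solve_sequential`, `make_grid`) are hypothetical. List copies stand in for the sends and receives.

```python
def jacobi_step(u):
    """One sweep of the four-neighbour average; border cells are only read."""
    new = [row[:] for row in u]
    for i in range(1, len(u) - 1):
        for j in range(1, len(u[0]) - 1):
            new[i][j] = (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]) / 4.0
    return new


def split(u, q):
    """Cut the interior of u into q x q blocks, each with a one-cell ghost border."""
    b = (len(u) - 2) // q
    return [[[row[c * b:c * b + b + 2] for row in u[r * b:r * b + b + 2]]
             for c in range(q)] for r in range(q)]


def exchange(blocks):
    """Refresh ghost rows and columns from the neighbours' boundary cells."""
    q = len(blocks)
    b = len(blocks[0][0]) - 2
    for r in range(q):
        for c in range(q):
            blk = blocks[r][c]
            if r > 0:        # top ghost row <- last interior row of the block above
                blk[0][1:b + 1] = blocks[r - 1][c][b][1:b + 1]
            if r < q - 1:    # bottom ghost row <- first interior row of the block below
                blk[b + 1][1:b + 1] = blocks[r + 1][c][1][1:b + 1]
            for i in range(1, b + 1):
                if c > 0:        # left ghost column <- last interior column on the left
                    blk[i][0] = blocks[r][c - 1][i][b]
                if c < q - 1:    # right ghost column <- first interior column on the right
                    blk[i][b + 1] = blocks[r][c + 1][i][1]


def gather(blocks, u):
    """Copy the block interiors back into a full grid that keeps u's boundary."""
    q = len(blocks)
    b = len(blocks[0][0]) - 2
    out = [row[:] for row in u]
    for r in range(q):
        for c in range(q):
            for i in range(1, b + 1):
                out[r * b + i][c * b + 1:c * b + b + 1] = blocks[r][c][i][1:b + 1]
    return out


def solve_blocks(u, q, iterations):
    """Block-wise iteration: local sweep on every block, then ghost exchange."""
    blocks = split(u, q)
    for _ in range(iterations):
        blocks = [[jacobi_step(blk) for blk in row] for row in blocks]
        exchange(blocks)
    return gather(blocks, u)


def solve_sequential(u, iterations):
    """Reference: the same sweeps applied to the undivided grid."""
    for _ in range(iterations):
        u = jacobi_step(u)
    return u


def make_grid(n, top=100.0):
    """(n+2) x (n+2) grid: hot top edge, other edges and the interior at 0."""
    u = [[0.0] * (n + 2) for _ in range(n + 2)]
    u[0] = [top] * (n + 2)
    return u
```

Calling `solve_blocks(make_grid(4), 2, 10)` should give the same grid as `solve_sequential(make_grid(4), 10)`, since each block sees exactly the neighbour values it would read in the undivided grid.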

Complexity Analysis of Checkerboard Decomposition

(analysis per iteration, and a single ghost row/column)

Sequential time complexity: Θ(n²)

Parallel computational complexity: Θ(n²/p)

Parallel communication complexity: Θ(n/√p)

(four sends and four receives of n/√p elements)
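To make these bounds concrete, the following sketch counts the work of one interior process per iteration. It assumes p is a perfect square whose root divides n, and it ignores message start-up latency. The function name `checkerboard_costs` is hypothetical.

```python
import math


def checkerboard_costs(n, p):
    """Per-process, per-iteration counts for an n x n grid on p processes."""
    q = math.isqrt(p)
    if q * q != p or n % q != 0:
        raise ValueError("p must be a perfect square whose root divides n")
    side = n // q  # each block is side x side, with side = n / sqrt(p)
    return {
        "updates": side * side,         # n^2 / p cell updates
        "elements_sent": 4 * side,      # four edges of n / sqrt(p) elements
        "elements_received": 4 * side,  # four ghost edges of the same size
    }


if __name__ == "__main__":
    for p in (4, 16, 64):
        print(p, checkerboard_costs(1024, p))
```

Quadrupling p divides the computation per process by four but only halves the communication volume per process.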

Isoefficiency Analysis of Checkerboard Parallel Algorithm

Isoefficiency analysis: T(n, 1) ≥ C T0(n, p)

(T(n, 1): sequential time; T0(n, p): parallel overhead)

Sequential time complexity: T(n, 1) = Θ(n²)

Parallel overhead dominated by communication: Θ(n/√p)

T0(n, p) = p × Θ(n/√p) = Θ(n√p)

n² ≥ Cn√p ⇒ n ≥ C√p

Scalability function: M(f(p))/p

M(n) = n² ⇒ M(C√p)/p = C²p/p = C²

⇒ System is perfectly scalable!
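A small numerical check of this result is sketched below. It uses the standard efficiency model E = T(n, 1) / (T(n, 1) + T0(n, p)) and assumes the hidden constants are 1, so T(n, 1) = n² and T0(n, p) = n√p. The function name `efficiency` is hypothetical.

```python
import math


def efficiency(n, p):
    """Modelled efficiency for the checkerboard algorithm (unit constants assumed)."""
    t_seq = n * n               # T(n, 1) = n^2
    overhead = n * math.sqrt(p)  # T0(n, p) = n * sqrt(p)
    return t_seq / (t_seq + overhead)


if __name__ == "__main__":
    c = 8
    for p in (4, 16, 64, 256):
        n = c * math.isqrt(p)   # grow n as C * sqrt(p)
        # Efficiency stays at C / (C + 1) while memory per process n^2 / p stays at C^2.
        print(p, n, efficiency(n, p), n * n / p)
```

With n = C√p the modelled efficiency stays at C/(C + 1) and each process holds a constant C² cells. If n is held fixed instead, efficiency drops as p grows.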

Review

Iterative Methods for Linear Systems

Relaxation Methods

Linear Second-order Partial Differential Equations (PDEs)

Finite difference methods

Example: steady-state heat distribution

Ghost points

Next Class

Map-Reduce

ccNUMA
