A Novel Approach to Solving Large-Scale Linear Systems

Ken Habgood, Itamar Arel

Department of Electrical Engineering & Computer Science

GABRIEL CRAMER (1704-1752)

April 3, 2009

Outline

- Problem statement and motivation
- Novel approach
- Revisiting Cramer’s rule
- Matrix condensation
- Illustration of the proposed scheme
- Implementation results
- Challenges ahead

Solving large-scale linear systems

- Many scientific applications
  - Computer models in finance, biology, physics
  - Real-time load flow calculations for electric utilities
  - Short-circuit fault and economic analysis
- Consumer generation on the electric grid (hybrid cars, solar panels) may soon require real-time calculations

Why try to improve?

- We want parallel processing for speed
- Current schemes use Gaussian elimination
  - Mainstream approach: LU decomposition
  - O(N^3) computational complexity; the O(N^2) updates within each of the N elimination steps are parallelizable
  - With N^2 processors, the factorization takes O(N) time
- So far so good … The “catch”: irregular communication patterns for load balancing across processing nodes
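To make the complexity claims concrete, here is a minimal Python sketch of Doolittle LU factorization without pivoting (an assumption: production codes pivot). The outer loop runs N sequential steps; within a step, once the multipliers are known, every update of the trailing submatrix is independent, which is where the O(N^2)-per-step parallelism comes from. The names `lu_decompose` and `matmul` are hypothetical, not from the slides.

```python
from fractions import Fraction


def lu_decompose(a):
    """Doolittle LU factorization without pivoting, in exact arithmetic.

    Returns (L, U) with L unit lower-triangular and U upper-triangular such
    that L @ U == A. Raises ZeroDivisionError on a zero pivot.
    """
    n = len(a)
    u = [[Fraction(x) for x in row] for row in a]
    l = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):  # N sequential elimination steps
        if u[k][k] == 0:
            raise ZeroDivisionError("zero pivot (this sketch does not pivot)")
        # The multipliers of step k depend only on column k ...
        for i in range(k + 1, n):
            l[i][k] = u[i][k] / u[k][k]
        # ... and the (N-k)^2 trailing updates are mutually independent,
        # so with ~N^2 processors each step takes O(1) parallel time.
        for i in range(k + 1, n):
            for j in range(k, n):
                u[i][j] -= l[i][k] * u[k][j]
    return l, u


def matmul(x, y):
    """Plain triple-loop matrix product, used only to check the result."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


A = [[4, 3, 2], [2, 1, 3], [3, 2, 1]]
L, U = lu_decompose(A)
print(matmul(L, U) == A)  # True
```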

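Cramer’s rule revisited

For the 2 × 2 system

ax + by = e

cx + dy = f

$$x = \frac{\begin{vmatrix} e & b \\ f & d \end{vmatrix}}{\begin{vmatrix} a & b \\ c & d \end{vmatrix}}$$

In general, the solution to a linear system Ax = b is given by

$$x_i = \frac{|A_i(b)|}{|A|},$$

where A_i(b) denotes A with its ith column replaced by b.

Evaluating each determinant by cofactor expansion gives O(N!) computational complexity.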
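A direct transcription of Cramer’s rule, as a minimal Python sketch: `det_cofactor` expands along the first row, so each determinant costs O(N!) operations, and N + 1 determinants are needed. The function names are illustrative, not from the slides; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def det_cofactor(m):
    """Determinant by Laplace (cofactor) expansion along the first row: O(N!) work."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det_cofactor(minor)
    return total


def cramer_solve(a, b):
    """x_i = |A_i(b)| / |A|, where A_i(b) is A with column i replaced by b."""
    d = det_cofactor(a)
    if d == 0:
        raise ValueError("singular matrix: Cramer's rule does not apply")
    x = []
    for i in range(len(a)):
        a_i = [row[:i] + [b_r] + row[i + 1:] for row, b_r in zip(a, b)]
        x.append(Fraction(det_cofactor(a_i), d))
    return x


# 2x + y = 5, x - y = 1
print([str(v) for v in cramer_solve([[2, 1], [1, -1]], [5, 1])])  # ['2', '1']
```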

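Chio’s matrix condensation

For a 3 × 3 matrix A:

$$|A| = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}
= a_1^{-(n-2)} \begin{vmatrix}
\begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} & \begin{vmatrix} a_1 & c_1 \\ a_2 & c_2 \end{vmatrix} \\[4pt]
\begin{vmatrix} a_1 & b_1 \\ a_3 & b_3 \end{vmatrix} & \begin{vmatrix} a_1 & c_1 \\ a_3 & c_3 \end{vmatrix}
\end{vmatrix}
= a_1^{-(n-2)} \begin{vmatrix} b_2' & c_2' \\ b_3' & c_3' \end{vmatrix}$$

(here n = 3, so the prefactor is 1/a_1).

- $a_{1,1}$ cannot be 0
- If $a_{1,1} = 1$, then $a_{1,1}^{-(n-2)} = 1$

In general, let D denote the (n − 1) × (n − 1) matrix obtained by replacing each element $a_{i,j}$ (i, j ≥ 2) by

$$d_{i,j} = \begin{vmatrix} a_{1,1} & a_{1,j} \\ a_{i,1} & a_{i,j} \end{vmatrix} = a_{1,1}a_{i,j} - a_{i,1}a_{1,j}.$$

Then

$$|A| = \frac{|D|}{a_{1,1}^{\,n-2}}.$$

Applied recursively, this computes a determinant with O(N^3) computational complexity (O(N^2) work per condensation step, N steps).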
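A minimal recursive sketch of the determinant via Chio’s condensation, assuming every leading entry encountered is nonzero (the sketch raises instead of reordering rows). Each condensation step is one O(N^2) sweep of 2 × 2 determinants, and N steps give the O(N^3) total. The names `chio_condense` and `det_chio` are illustrative.

```python
from fractions import Fraction


def chio_condense(m):
    """One Chio step: entry (i, j), i, j >= 1, becomes |m00 m0j; mi0 mij|."""
    p = m[0][0]
    return [[p * m[i][j] - m[i][0] * m[0][j] for j in range(1, len(m[0]))]
            for i in range(1, len(m))]


def det_chio(m):
    """Determinant via |A| = |D| / a11^(n-2), applied recursively.

    Assumes a nonzero leading entry at every level; this sketch raises
    instead of reordering rows.
    """
    n = len(m)
    if n == 1:
        return Fraction(m[0][0])
    if n == 2:
        return Fraction(m[0][0] * m[1][1] - m[0][1] * m[1][0])
    if m[0][0] == 0:
        raise ZeroDivisionError("a11 is zero; reorder rows before condensing")
    return det_chio(chio_condense(m)) / Fraction(m[0][0]) ** (n - 2)


A = [[4, 3, 2], [2, 1, 3], [3, 2, 1]]
print(chio_condense(A))  # [[-2, 8], [-1, -2]]
print(det_chio(A))       # 3
```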

Highlights of the approach

- Chio’s condensation combined with Cramer’s rule results in O(N^4) complexity (N + 1 determinants, each O(N^3))
- Goal: remain at O(N^3) and retain the attractive parallel processing potential
- Solution: clever bookkeeping to reduce computations
  - “Mirror” the matrix before applying condensation
  - Each matrix solves for half of the unknowns
  - Condense each until the matrix size matches the number of unknowns
  - Mirror the matrices again
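The slides do not spell out the bookkeeping, so the following Python sketch rests on stated assumptions: it takes “mirroring” to mean reversing the order of the unknowns (the columns of A, with b kept last), and it swaps rows whenever a leading entry is zero. Condensing the augmented matrix [A | b] eliminates the first unknown at each step, so condensing down to k rows leaves a k × k system in the last k unknowns; the mirror copy does the same for the first unknowns, and the halves are solved recursively. All function names are hypothetical, and the authors’ scheme may differ in detail.

```python
from fractions import Fraction


def condense_augmented(m):
    """One Chio step on [A | b]: eliminates the first unknown.

    If the leading entry is zero, the first row with a nonzero leading
    entry is swapped to the top (a row swap does not change the solution).
    """
    m = [row[:] for row in m]
    k = next((i for i, row in enumerate(m) if row[0] != 0), None)
    if k is None:
        raise ValueError("singular system: first column is all zero")
    m[0], m[k] = m[k], m[0]
    p = m[0][0]
    return [[p * m[i][j] - m[i][0] * m[0][j] for j in range(1, len(m[0]))]
            for i in range(1, len(m))]


def condense_to(m, k):
    """Condense until only k equations (in the last k unknowns) remain."""
    while len(m) > k:
        m = condense_augmented(m)
    return m


def mirror(m):
    """ASSUMED meaning of 'mirroring': reverse the unknowns, keep b last."""
    return [row[-2::-1] + [row[-1]] for row in m]


def solve_recursive(m):
    """Hypothetical condense-and-mirror solver for a nonsingular [A | b]."""
    n = len(m)
    if n == 1:  # 1 x 1 system: Cramer's rule reduces to x = b / a
        return [Fraction(m[0][1], m[0][0])]
    half = n // 2
    tail = solve_recursive(condense_to(m, n - half))      # x_(half+1) .. x_n
    head = solve_recursive(condense_to(mirror(m), half))  # x_half .. x_1
    return head[::-1] + tail


aug = [[2, 1, -1, 3, 14], [1, 0, 2, 1, 2], [3, -1, 1, 0, 0], [0, 2, 1, 1, 6]]
print(solve_recursive(aug))  # expected values: 1, 2, -1, 3 (as Fractions)
```

Counted in arithmetic operations, each call condenses two copies from n rows down to about n/2 (O(n^3) work) and then recurses on the halves, so T(N) = 2T(N/2) + O(N^3) = O(N^3).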

Illustration: a linear system with 9 unknowns, written as the augmented matrix [A | b]:

$$\left[\begin{array}{rrrrrrrrr|r}
3 & -1 & 4 & 3 & 2 & 3 & -4 & 4 & -5 & -4 \\
3 & -1 & 2 & -2 & 4 & -2 & -3 & -2 & -2 & 4 \\
2 & -5 & 3 & 3 & -5 & 0 & -5 & -2 & 0 & 0 \\
1 & -1 & 4 & 1 & -2 & 2 & 0 & 0 & 1 & 1 \\
2 & 0 & -4 & -1 & 4 & -5 & 2 & -4 & -1 & -4 \\
-5 & -4 & -4 & 4 & -5 & -3 & 0 & -5 & 0 & -5 \\
4 & -3 & -1 & -5 & 1 & -2 & -4 & -1 & -1 & 0 \\
0 & 0 & 2 & 2 & -1 & 2 & 4 & -5 & 2 & 0 \\
0 & -4 & 0 & 0 & 3 & 4 & -3 & -2 & 0 & 4
\end{array}\right]$$

Mirroring the matrix

[Slide animation: building the mirror copy of the matrix.]

9 unknowns to solve for

[Slide figure: the original augmented matrix shown beside its mirror copy.]

Mirroring the matrix (cont’)

[Slide animation frames: the original and mirrored matrices.]

Chio’s matrix condensation

Condensing the example system about its leading entry $a_{1,1} = 3$ replaces each entry $a_{i,j}$ (i, j ≥ 2, including the right-hand-side column) by $d_{i,j} = 3a_{i,j} - a_{i,1}a_{1,j}$. This leaves an 8 × 9 augmented system in the unknowns $x_2, \dots, x_9$:

$$\left[\begin{array}{rrrrrrrr|r}
0 & -6 & -15 & 6 & -15 & 3 & -18 & 9 & 24 \\
-13 & 1 & 3 & -19 & -6 & -7 & -14 & 10 & 8 \\
-2 & 8 & 0 & -8 & 3 & 4 & -4 & 8 & 7 \\
2 & -20 & -9 & 8 & -21 & 14 & -20 & 7 & -4 \\
-17 & 8 & 27 & -5 & 6 & -20 & 5 & -25 & -35 \\
-5 & -19 & -27 & -5 & -18 & 4 & -19 & 17 & 16 \\
0 & 6 & 6 & -3 & 6 & 12 & -15 & 6 & 0 \\
-12 & 0 & 0 & 9 & 12 & -9 & -6 & 0 & 12
\end{array}\right]$$

For the coefficient part, $|A| = |D| / 3^{7}$. Note that the new leading entry is 0.

The value in the a1,1 position cannot be zero
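The first condensation step of the example can be checked mechanically. This Python sketch hard-codes the 9-unknown augmented matrix as read from the illustration, applies one Chio step about $a_{1,1} = 3$, and shows that the new leading entry is 0. Moving a row with a nonzero leading entry to the top is one standard remedy and is assumed here; the slides do not say which remedy the authors use. A row swap leaves the solution unchanged and flips the sign of the determinant. The names `chio_step` and `swap_in_pivot` are hypothetical.

```python
# 9-unknown example system [A | b], as read from the illustration slides.
EXAMPLE = [
    [3, -1, 4, 3, 2, 3, -4, 4, -5, -4],
    [3, -1, 2, -2, 4, -2, -3, -2, -2, 4],
    [2, -5, 3, 3, -5, 0, -5, -2, 0, 0],
    [1, -1, 4, 1, -2, 2, 0, 0, 1, 1],
    [2, 0, -4, -1, 4, -5, 2, -4, -1, -4],
    [-5, -4, -4, 4, -5, -3, 0, -5, 0, -5],
    [4, -3, -1, -5, 1, -2, -4, -1, -1, 0],
    [0, 0, 2, 2, -1, 2, 4, -5, 2, 0],
    [0, -4, 0, 0, 3, 4, -3, -2, 0, 4],
]


def chio_step(m):
    """One Chio condensation step about m[0][0], applied to every column including b."""
    p = m[0][0]
    return [[p * m[i][j] - m[i][0] * m[0][j] for j in range(1, len(m[0]))]
            for i in range(1, len(m))]


def swap_in_pivot(m):
    """Assumed remedy: bring the first row with a nonzero leading entry to the top.

    Returns (matrix, sign); sign is -1 when a swap was made, because swapping
    two rows flips the sign of the determinant.
    """
    k = next((i for i, row in enumerate(m) if row[0] != 0), None)
    if k is None:
        raise ValueError("first column is all zero: matrix is singular")
    if k == 0:
        return m, 1
    m = [row[:] for row in m]
    m[0], m[k] = m[k], m[0]
    return m, -1


D = chio_step(EXAMPLE)
print(D[0])            # [0, -6, -15, 6, -15, 3, -18, 9, 24]
D2, sign = swap_in_pivot(D)
print(D2[0][0], sign)  # -13 -1
```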

[Slide figure: a further-condensed augmented matrix (5 × 6).]

Mirroring the matrix

[Slide animation: the condensed matrix shown beside its mirror copy.]

Chio’s matrix condensation

Further condensation leaves a 4 × 4 system [A′ | b′]:

$$[A' \mid b'] = \left[\begin{array}{rrrr|r}
-6 & 19 & -17 & 8 & -8 \\
8 & -10 & 14 & -7 & -6 \\
-1 & 6 & -9 & 4 & 1 \\
4 & 20 & -12 & 11 & -6
\end{array}\right]$$

Applying Cramer’s rule

Columns 3 and 4 of A′ correspond to $x_8$ and $x_9$:

$$x_9 = \frac{|A'_4(b')|}{|A'|} = \frac{7728}{2688} = 2.875$$

$$x_8 = \frac{|A'_3(b')|}{|A'|} = \frac{180}{2688} \approx 0.067$$
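The two Cramer ratios on this slide can be reproduced exactly. This Python sketch takes the reduced 4 × 4 system as shown, computes the determinants by cofactor expansion (adequate at this size), and replaces column 4 (for $x_9$) or column 3 (for $x_8$) by b′. The helper names `det` and `replace_column` are hypothetical.

```python
from fractions import Fraction

# Reduced 4 x 4 system [A' | b'] from the "Applying Cramer's rule" slide.
REDUCED = [
    [-6, 19, -17, 8, -8],
    [8, -10, 14, -7, -6],
    [-1, 6, -9, 4, 1],
    [4, 20, -12, 11, -6],
]


def det(m):
    """Cofactor expansion along the first row; fine for a 4 x 4 block."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


def replace_column(a, col, vec):
    """A_col(b): a copy of a with column `col` (0-based) replaced by vec."""
    return [r[:col] + [v] + r[col + 1:] for r, v in zip(a, vec)]


A = [row[:-1] for row in REDUCED]
b = [row[-1] for row in REDUCED]
d = det(A)
x9 = Fraction(det(replace_column(A, 3, b)), d)  # column 4 of A'
x8 = Fraction(det(replace_column(A, 2, b)), d)  # column 3 of A'
print(d, x9, round(float(x8), 3))  # 2688 23/8 0.067
```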

Overview of data flow structure

Mirroring the matrix keeps the algorithm at O(N^3).

[Figure: the original matrix (N) and its mirror are each reduced by Chio’s condensation to N/2 unknowns; each half is mirrored into an image and condensed again to N/4, and so on. Example: 24 variables → two systems of 12 variables → four systems of 6 variables → eight 3 × 3 systems.]

Parallel computations

- Similar to LU decomposition (access by rows)
- Broadcast communication only
- Send-ahead on lead row values
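Why row-wise condensation needs only a broadcast: each new row depends on its own old row and on the lead row alone. In this Python sketch, contiguous blocks of rows stand in for processing nodes and a thread pool stands in for the processors. That is an assumption for illustration only: CPython threads will not speed up this arithmetic, and a real implementation would use MPI or similar. The names `condense_block` and `parallel_condense` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def condense_block(lead_row, rows):
    """Work for one node: condense its own rows using only the broadcast lead row."""
    p = lead_row[0]
    return [[p * r[j] - r[0] * lead_row[j] for j in range(1, len(r))] for r in rows]


def parallel_condense(m, workers=3):
    """One Chio step with the non-lead rows split into contiguous blocks.

    The lead row m[0] is the only data every block needs ("broadcast").
    """
    lead, rest = m[0], m[1:]
    size = max(1, -(-len(rest) // workers))  # ceiling division: rows per block
    blocks = [rest[i:i + size] for i in range(0, len(rest), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda blk: condense_block(lead, blk), blocks))
    return [row for blk in results for row in blk]  # pool.map preserves order


M = [[-6, 19, -17, 8, -8], [8, -10, 14, -7, -6], [-1, 6, -9, 4, 1], [4, 20, -12, 11, -6]]
print(parallel_condense(M)[0])  # [-92, 52, -22, 100]
```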

Mirroring provides an advantage

- The algorithm mirrors as the matrix reduces in size
- Load is naturally redistributed among processors
- LU decomposition needs blocking and interleaving to avoid idle processors, which leads to complex communication patterns (overhead)

Figure 9.2: Parallel Scientific Computing in C++ and MPI, George Em Karniadakis and Robert M. Kirby II

Paradigm shift – key points

- Apply Cramer’s rule
- Employ matrix condensation for efficient determinant calculations
- Highly parallel O(N^3) process
- Clever bookkeeping to re-use information
- Final result: O(N^3) computation with O(N^2) communication
- Key advantage: regular communication patterns with low communication overhead and a balanced processing load

Implementation results

- Trial platform: single-core Pentium M @ 1.5 GHz, 64 KB L1 cache, 1 MB L2 cache
- Coded in C, with SSE used for the core function (Chio’s condensation)
- Memory access optimized using cache blocking
- Double-precision variables and calculations
- Result: ~2.4x slower than MATLAB (consistently)

Challenges ahead

- Further code improvement/optimization (the current L2 miss rate is high)
- Precision improvement
- Parallel implementation
- GPU implementation
- Distributed architecture implementation
- Sparse matrix optimization
- Other linear algebra applications (e.g., matrix inversion)

Thank you
