A NOVEL APPROACH TO SOLVING LARGE-SCALE LINEAR SYSTEMS
Ken Habgood, Itamar Arel
Department of Electrical Engineering & Computer Science
April 3, 2009
[Portrait: Gabriel Cramer (1704-1752)]
Outline
Problem statement and motivation
Novel approach
  Revisiting Cramer’s rule
  Matrix condensation
Illustration of the proposed scheme
Implementation results
Challenges ahead
Solving large-scale linear systems
Many scientific applications
  Computer models in finance, biology, physics
  Real-time load flow calculations for electric utilities
  Short-circuit fault, economic analysis
Consumer generation on the electric grid may soon require real-time calculations (hybrid cars, solar panels)
Why try to improve?
We want parallel processing for speed
Current schemes use Gaussian elimination
  Mainstream approach: LU decomposition
  O(N^3) computational complexity, O(N^2) parallelizable
  If you had N^2 processors: O(N) time
So far so good …
The “catch”: irregular communication patterns for load balancing across processing nodes
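Cramer’s rule revisited
For the 2 × 2 system
  ax + by = e
  cx + dy = f
the first unknown is
$$x = \frac{\begin{vmatrix} e & b \\ f & d \end{vmatrix}}{\begin{vmatrix} a & b \\ c & d \end{vmatrix}}$$
The solution to a linear system Ax = b is given by $x_i = |A_i(b)| / |A|$, where $A_i(b)$ denotes A with its i-th column replaced by b
O(N!) computational complexity (when each determinant is expanded by cofactors)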
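
As an illustration of the rule above, here is a minimal Python sketch. The function names `det` and `cramer_solve` and the 2 × 2 example system are assumptions made for this example, not part of the slides. The determinant is computed by cofactor expansion, which is the factorial-cost route the slide refers to, so this is only practical for very small systems.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor (Laplace) expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        # Minor: drop the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def cramer_solve(A, b):
    """Solve A x = b via x_i = |A_i(b)| / |A| (hypothetical helper)."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular")
    xs = []
    for i in range(len(A)):
        # A_i(b): A with its i-th column replaced by b.
        A_i = [row[:i] + [b_k] + row[i + 1:] for row, b_k in zip(A, b)]
        xs.append(Fraction(det(A_i), d))
    return xs

# Example system (an assumption for illustration): 2x + y = 5, x + 3y = 10
solution = cramer_solve([[2, 1], [1, 3]], [5, 10])
print(solution)  # x = 1, y = 3
```

Exact `Fraction` arithmetic is used so the ratios of determinants are not blurred by rounding.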
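Chio’s matrix condensation
For a 3 × 3 matrix A (n = 3):
$$|A| = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} = a_1^{-(n-2)} \begin{vmatrix} \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} & \begin{vmatrix} a_1 & c_1 \\ a_2 & c_2 \end{vmatrix} \\ \begin{vmatrix} a_1 & b_1 \\ a_3 & b_3 \end{vmatrix} & \begin{vmatrix} a_1 & c_1 \\ a_3 & c_3 \end{vmatrix} \end{vmatrix} = a_1^{-(n-2)} \begin{vmatrix} b_2' & c_2' \\ b_3' & c_3' \end{vmatrix}$$
$a_{1,1}$ cannot be 0
If $a_{1,1}$ is 1 then $a_{1,1}^{-(n-2)} = 1$
Let D denote the (n-1) × (n-1) matrix obtained by replacing each element $a_{i,j}$ (i, j ≥ 2) by
$$\begin{vmatrix} a_{1,1} & a_{1,j} \\ a_{i,1} & a_{i,j} \end{vmatrix}$$
Then
$$|A| = \frac{|D|}{a_{1,1}^{\,n-2}}$$
Recursive determinant calculation
O(N^3) computational complexity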
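
The recursion above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the names `chio_condense` and `chio_det` are hypothetical, exact `Fraction` arithmetic is chosen for clarity, and the row swap used when the lead element is zero is an addition of this example (the slides only state that the lead element cannot be zero).

```python
from fractions import Fraction

def chio_condense(m):
    """One Chio step: replace each a[i][j] (i, j >= 1) by the 2x2
    determinant a[0][0]*a[i][j] - a[i][0]*a[0][j]."""
    p = m[0][0]
    return [[p * m[i][j] - m[i][0] * m[0][j] for j in range(1, len(m[0]))]
            for i in range(1, len(m))]

def chio_det(matrix):
    """Determinant via repeated condensation: |A| = |D| / a11^(n-2)."""
    m = [[Fraction(v) for v in row] for row in matrix]
    scale = Fraction(1)  # accumulated product of a11^(n-2) factors
    sign = 1
    while len(m) > 1:
        n = len(m)
        if m[0][0] == 0:
            # Assumed safeguard: swap in a row with a nonzero lead element.
            for r in range(1, n):
                if m[r][0] != 0:
                    m[0], m[r] = m[r], m[0]
                    sign = -sign
                    break
            else:
                return Fraction(0)  # whole first column is zero
        scale *= m[0][0] ** (n - 2)
        m = chio_condense(m)
    return sign * m[0][0] / scale

print(chio_det([[2, 1, 0], [1, 3, 1], [0, 1, 4]]))  # 18
```

Each step costs on the order of n^2 multiplications and there are n - 1 steps, which is where the cubic total comes from.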
Highlights of the approach
Chio’s condensation combined with Cramer’s rule results in O(N^4)
Goal: remain at O(N^3)
Retain attractive parallel processing potential
Solution: clever bookkeeping to reduce computations
  “Mirror” the matrix before applying condensation
  Each matrix solves for half of the unknowns
  Condense each until the matrix size matches the number of unknowns
  Mirror the matrices again
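
The steps listed above can be sketched as follows. This is a simplified illustration, not the authors' implementation: "mirroring" is modelled here as simply reversing the order of the unknowns' columns, only one level of mirroring is applied, and none of the bookkeeping that keeps the full method at O(N^3) is reproduced. The names `condense_augmented`, `cramer_small` and `solve_by_mirroring` are hypothetical, and the row swap for a zero lead element is an assumption of this sketch.

```python
from fractions import Fraction

def det(m):
    """Cofactor expansion; only used on the small condensed systems."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))

def condense_augmented(aug, keep):
    """Apply Chio steps to [A | b] until only the last `keep` unknowns remain."""
    m = [[Fraction(v) for v in row] for row in aug]
    while len(m) > keep:
        # Assumed safeguard: move a row with a nonzero lead element to the top
        # (raises StopIteration if the system is singular).
        r = next(i for i, row in enumerate(m) if row[0] != 0)
        m[0], m[r] = m[r], m[0]
        p = m[0][0]
        m = [[p * row[j] - row[0] * m[0][j] for j in range(1, len(row))]
             for row in m[1:]]
    return m

def cramer_small(aug):
    """Cramer's rule on a k x (k+1) augmented system."""
    k = len(aug)
    A = [row[:k] for row in aug]
    b = [row[k] for row in aug]
    d = det(A)
    return [det([row[:i] + [bi] + row[i + 1:] for row, bi in zip(A, b)]) / d
            for i in range(k)]

def solve_by_mirroring(A, b):
    n = len(A)
    k = n // 2
    original = [list(row) + [bi] for row, bi in zip(A, b)]
    mirror = [list(row)[::-1] + [bi] for row, bi in zip(A, b)]  # unknowns reversed
    tail = cramer_small(condense_augmented(original, k))     # x[n-k], ..., x[n-1]
    head = cramer_small(condense_augmented(mirror, n - k))   # x[n-k-1], ..., x[0]
    return head[::-1] + tail

x = solve_by_mirroring([[2, 1, 0], [1, 3, 1], [0, 1, 4]], [4, 10, 14])
print(x)  # the three unknowns 1, 2, 3
```

Condensing the augmented matrix works because each Chio step is a row operation on [A | b], so the smaller system keeps the same values for the unknowns that remain.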
Mirroring the matrix
9 unknowns to solve for. Augmented matrix [A | b]:
   3  -1   4   3   2   3  -4   4  -5 |  -4
   3  -1   2  -2   4  -2  -3  -2  -2 |   4
   2  -5   3   3  -5   0  -5  -2   0 |   0
   1  -1   4   1  -2   2   0   0   1 |   1
   2   0  -4  -1   4  -5   2  -4  -1 |  -4
  -5  -4  -4   4  -5  -3   0  -5   0 |  -5
   4  -3  -1  -5   1  -2  -4  -1  -1 |   0
   0   0   2   2  -1   2   4  -5   2 |   0
   0  -4   0   0   3   4  -3  -2   0 |   4

Mirroring the matrix (cont’)
Mirrored copy (columns 6 to 9 negated):
   3  -1   4   3   2  -3   4  -4   5 |  -4
   3  -1   2  -2   4   2   3   2   2 |   4
   2  -5   3   3  -5   0   5   2   0 |   0
   1  -1   4   1  -2  -2   0   0  -1 |   1
   2   0  -4  -1   4   5  -2   4   1 |  -4
  -5  -4  -4   4  -5   3   0   5   0 |  -5
   4  -3  -1  -5   1   2   4   1   1 |   0
   0   0   2   2  -1  -2  -4   5  -2 |   0
   0  -4   0   0   3  -4   3   2   0 |   4
One copy has 5 unknowns to solve for, the other 4 unknowns to solve for.
Chio’s matrix condensation
The lead element is a1,1 = 3. Each remaining element is replaced by the 2 × 2 determinant formed with the lead row and the lead column. For row 2:
$$\begin{vmatrix} 3 & -1 \\ 3 & -1 \end{vmatrix} = 0, \quad \begin{vmatrix} 3 & 4 \\ 3 & 2 \end{vmatrix} = -6, \quad \ldots, \quad \begin{vmatrix} 3 & -4 \\ 3 & 4 \end{vmatrix} = 24$$

Chio’s matrix condensation (cont’)
Condensed 8 × 9 matrix after the first step:
   0  -6 -15   6 -15   3 -18   9 |  24
 -13   1   3 -19  -6  -7 -14  10 |   8
  -2   8   0  -8   3   4  -4   8 |   7
   2 -20  -9   8 -21  14 -20   7 |  -4
 -17   8  27  -5   6 -20   5 -25 | -35
  -5 -19 -27  -5 -18   4 -19  17 |  16
   0   6   6  -3   6  12 -15   6 |   0
 -12   0   0   9  12  -9  -6   0 |  12
The value in the a1,1 position cannot be zero
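
The first condensation step of the worked example can be checked with a short Python sketch. The 9 × 10 augmented matrix below is read off the slide frames and `chio_step` is a hypothetical helper name; the sketch performs a single step only and does not attempt the row exchange that the next step would need.

```python
def chio_step(m):
    """One Chio condensation step on a matrix given as a list of rows.
    Every entry becomes a 2x2 determinant built from the lead row and
    lead column; the result has one row and one column fewer."""
    lead = m[0]
    if lead[0] == 0:
        raise ValueError("lead element a11 must be nonzero")
    return [[lead[0] * row[j] - row[0] * lead[j] for j in range(1, len(row))]
            for row in m[1:]]

# Augmented matrix [A | b] of the 9-unknown example (last column is b).
augmented = [
    [ 3, -1,  4,  3,  2,  3, -4,  4, -5, -4],
    [ 3, -1,  2, -2,  4, -2, -3, -2, -2,  4],
    [ 2, -5,  3,  3, -5,  0, -5, -2,  0,  0],
    [ 1, -1,  4,  1, -2,  2,  0,  0,  1,  1],
    [ 2,  0, -4, -1,  4, -5,  2, -4, -1, -4],
    [-5, -4, -4,  4, -5, -3,  0, -5,  0, -5],
    [ 4, -3, -1, -5,  1, -2, -4, -1, -1,  0],
    [ 0,  0,  2,  2, -1,  2,  4, -5,  2,  0],
    [ 0, -4,  0,  0,  3,  4, -3, -2,  0,  4],
]

condensed = chio_step(augmented)
print(condensed[0])   # [0, -6, -15, 6, -15, 3, -18, 9, 24]
print(condensed[-1])  # [-12, 0, 0, 9, 12, -9, -6, 0, 12]
```

The first condensed row starts with 0, so calling `chio_step` again on `condensed` raises the error above; a row with a nonzero lead element has to be moved to the top first.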
Mirroring the matrix
[Slide frames: a 5 × 6 augmented matrix (5 unknowns) and its mirrored copy, in which columns 4 and 5 are negated]
One copy has 3 unknowns to solve for, the other 2 unknowns to solve for.

Chio’s matrix condensation
Condensed 4 × 5 matrix:
  -6  19 -17   8 |  -8
   8 -10  14  -7 |  -6
  -1   6  -9   4 |   1
   4  20 -12  11 |  -6
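Applying Cramer’s rule
Answer for x9 (last column replaced by the right-hand side):
$$x_9 = \frac{\begin{vmatrix} -6 & 19 & -17 & -8 \\ 8 & -10 & 14 & -6 \\ -1 & 6 & -9 & 1 \\ 4 & 20 & -12 & -6 \end{vmatrix}}{\begin{vmatrix} -6 & 19 & -17 & 8 \\ 8 & -10 & 14 & -7 \\ -1 & 6 & -9 & 4 \\ 4 & 20 & -12 & 11 \end{vmatrix}} = \frac{7728}{2688} = 2.875 \approx 3$$
Answer for x8 (third column replaced by the right-hand side):
$$x_8 = \frac{\begin{vmatrix} -6 & 19 & -8 & 8 \\ 8 & -10 & -6 & -7 \\ -1 & 6 & 1 & 4 \\ 4 & 20 & -6 & 11 \end{vmatrix}}{\begin{vmatrix} -6 & 19 & -17 & 8 \\ 8 & -10 & 14 & -7 \\ -1 & 6 & -9 & 4 \\ 4 & 20 & -12 & 11 \end{vmatrix}} = \frac{180}{2688} \approx 0.07$$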
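
The two ratios above can be reproduced by combining Cramer's rule with determinants computed by Chio's condensation. The sketch below does this naively, recomputing one determinant per unknown, which is the O(N^4) combination the slides set out to improve on. The function names are hypothetical, the 4 × 4 matrix and right-hand side are read off the slide frames, and the row swap for a zero lead element is an assumption of this example.

```python
from fractions import Fraction

def chio_det(matrix):
    """Determinant by repeated Chio condensation, |A| = |D| / a11^(n-2)."""
    m = [[Fraction(v) for v in row] for row in matrix]
    sign, scale = 1, Fraction(1)
    while len(m) > 1:
        # Assumed safeguard: bring a row with a nonzero lead element to the top.
        pivot = next((r for r, row in enumerate(m) if row[0] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != 0:
            m[0], m[pivot] = m[pivot], m[0]
            sign = -sign
        scale *= m[0][0] ** (len(m) - 2)
        m = [[m[0][0] * row[j] - row[0] * m[0][j] for j in range(1, len(row))]
             for row in m[1:]]
    return sign * m[0][0] / scale

def cramer_with_chio(A, b):
    """x_i = |A_i(b)| / |A| with every determinant computed by chio_det."""
    d = chio_det(A)
    return [chio_det([row[:i] + [bi] + row[i + 1:] for row, bi in zip(A, b)]) / d
            for i in range(len(A))]

A = [[-6, 19, -17, 8],
     [8, -10, 14, -7],
     [-1, 6, -9, 4],
     [4, 20, -12, 11]]
b = [-8, -6, 1, -6]

x = cramer_with_chio(A, b)
x8, x9 = x[2], x[3]
print(chio_det(A), float(x8), float(x9))  # 2688, about 0.067, 2.875
```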
Overview of data flow structure
Mirroring of the matrix keeps the algorithm at O(N^3).
[Diagram: the original matrix (N) and its mirror (N) are each reduced by Chio’s condensation to size N/2; each N/2 matrix and its mirror image are then reduced to N/4, and so on]
Example with 24 variables:
  Original matrix (N): 24 variables
  2 matrices with 12 variables each
  4 matrices with 6 variables each
  8 systems of size 3 × 3
Parallel computations
Similar to LU decomposition (access by rows)
Broadcast communication only
Send-ahead on lead row values
Mirroring provides an advantage
  Algorithm mirrors as the matrix reduces in size
  Load naturally redistributed among processors
LU decomposition needs blocking and interleaving to avoid idle processors, which leads to complex communication patterns (overhead)
Figure 9.2: Parallel Scientific Computing in C++ and MPI. George Em Karniadakis and Robert M. Kirby II
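
The row-wise structure described on this slide can be illustrated with a small Python sketch: each condensed row depends only on its own source row and the lead row, so the lead row is the only data that has to be shared ("broadcast"). Threads stand in for processing nodes here purely for illustration; this is not the authors' parallel design, the names `condense_row` and `chio_step_parallel` are hypothetical, and CPython threads will not actually speed up this arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def condense_row(lead_row, row):
    """Condense one row using only the lead row and the row itself."""
    p = lead_row[0]
    return [p * row[j] - row[0] * lead_row[j] for j in range(1, len(row))]

def chio_step_parallel(m, workers=4):
    """One Chio step with rows handed out to a pool of workers."""
    lead = m[0]  # the only data every worker needs
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the row order of the input.
        return list(pool.map(lambda row: condense_row(lead, row), m[1:]))

result = chio_step_parallel([[2, 1, 0], [1, 3, 1], [0, 1, 4]])
print(result)  # [[5, 2], [2, 8]]
```

Because every row does the same amount of work at each step, the load stays even as the matrix shrinks, which is the balance property the slide contrasts with LU decomposition.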
Paradigm shift – key points
Apply Cramer’s rule
Employ matrix condensation for efficient determinant calculations
  Highly parallel O(N^3) process
Clever bookkeeping to re-use information
Final result: O(N^3) computation with O(N^2) communication
Key advantage: regular communication patterns with low communication overhead and balanced processing load
Implementation results
Trial platform
  Single-core Pentium M @ 1.5 GHz
  64 KB L1 cache, 1 MB L2 cache
Coded in C with SSE used for the core function (Chio’s condensation)
Memory access optimized using cache blocking
Double-precision variables and calculations
Result: ~2.4x slower than Matlab (consistently)
Challenges ahead
Further code improvement/optimization
  Current L2 miss rate is high
Precision improvement
Parallel implementation
  GPU implementation
  Distributed architecture implementation
Sparse matrix optimization
Other linear algebra applications (e.g. matrix inversion)
Thank you