A CONDENSATION-BASED LOW COMMUNICATION LINEAR SYSTEMS SOLVER UTILIZING CRAMER'S RULE
Ken Habgood, Itamar Arel
Department of Electrical Engineering & Computer Science
The University of Tennessee
GABRIEL CRAMER (1704-1752)
EECS Department / University of Tennessee http://mil.engr.utk.edu
Outline
Motivation & problem statement
Algorithm review
Numerical accuracy & stability
Parallel Implementation
Communication Results
Source: http://tridane.faculty.asu.edu
Introduction
Mainstream approach: Gaussian elimination, e.g. LU decomposition
Goal: an efficient parallel solver with lower communication overhead
Targeting an unpopular approach: Cramer’s Rule
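As a reminder of what Cramer's Rule computes: each unknown is a ratio of two determinants, x_i = det(A_i) / det(A), where A_i is A with column i replaced by b. The sketch below is a minimal illustration, not the authors' implementation. The names det and cramer_solve are hypothetical, integer inputs are assumed, and the naive cofactor expansion is used only to keep the example short; it is far too slow for large systems.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row (tiny matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, value in enumerate(m[0]):
        # Minor: drop row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total

def cramer_solve(a, b):
    """Solve a x = b for a square integer matrix a using Cramer's Rule."""
    a = [list(row) for row in a]
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular")
    solution = []
    for i in range(len(a)):
        # A_i: copy of a with column i replaced by the right-hand side b.
        a_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(a)]
        solution.append(Fraction(det(a_i), d))
    return solution

x = cramer_solve([[2, 1], [1, 3]], [3, 5])
# x == [Fraction(4, 5), Fraction(7, 5)]
```

Every unknown needs its own determinant, which is why an efficient way of evaluating many determinants matters for this approach.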
LU Communication Pattern
Source: http://www.caam.rice.edu/~timwar/MA471F03/
Communication for distributed LU decomposition
L00U00   U01   U02
L10      A11   A12
L20      A21   A22
Three sequential steps
1. Top-left processor computes and sends
2. Row and column leads compute and send
3. Remaining processors factorize their blocks
One-to-one communication
Idle time while the leads are processing
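The ordering constraint behind the three steps can be seen in a serial sketch of right-looking LU. This is an illustration under simplifying assumptions, not a distributed implementation: each "block" is a single matrix entry, no pivoting is performed, and the function name lu_right_looking is hypothetical.

```python
from fractions import Fraction

def lu_right_looking(a):
    """Return the packed LU factors of a square matrix (no pivoting).

    Entries below the diagonal hold L (unit diagonal implied);
    entries on and above the diagonal hold U.
    """
    n = len(a)
    lu = [[Fraction(x) for x in row] for row in a]
    for k in range(n):
        # Step 1: the diagonal ("top left") owner finalizes the pivot U[k][k].
        pivot = lu[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        # Step 2: the column lead needs the pivot to form the multipliers
        # L[i][k]; the row lead's entries U[k][j] are final at this point.
        for i in range(k + 1, n):
            lu[i][k] /= pivot
        # Step 3: every remaining entry needs both L[i][k] and U[k][j],
        # so it cannot be updated until steps 1 and 2 have finished.
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu

packed = lu_right_looking([[2, 1], [4, 5]])
# packed == [[2, 1], [2, 3]], i.e. L = [[1, 0], [2, 1]] and U = [[2, 1], [0, 3]]
```

In a distributed setting each of these dependencies becomes a message, which is where the idle time on the slide comes from.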
Proposed Algorithm Flow
Matrix “Mirroring”
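Mirroring example
\[
\begin{bmatrix}
a_{1,4} & a_{2,4} & a_{3,4} & a_{4,4} \\
a_{1,3} & a_{2,3} & a_{3,3} & a_{4,3} \\
a_{1,2} & a_{2,2} & a_{3,2} & a_{4,2} \\
a_{1,1} & a_{2,1} & a_{3,1} & a_{4,1}
\end{bmatrix}
\xrightarrow{\text{mirror}}
\begin{bmatrix}
a_{4,4} & a_{3,4} & a_{2,4} & a_{1,4} \\
a_{4,3} & a_{3,3} & a_{2,3} & a_{1,3} \\
a_{4,2} & a_{3,2} & a_{2,2} & a_{1,2} \\
a_{4,1} & a_{3,1} & a_{2,1} & a_{1,1}
\end{bmatrix}
\]
Applying Chio’s condensation yields:
\[
\begin{bmatrix}
a''_{1,4} & a''_{2,4} \\
a''_{1,3} & a''_{2,3}
\end{bmatrix}
\qquad
\begin{bmatrix}
a''_{4,4} & a''_{3,4} \\
a''_{4,3} & a''_{3,3}
\end{bmatrix}
\]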
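Chio's condensation reduces an n x n determinant to an (n-1) x (n-1) one: with pivot a_{1,1} nonzero, det(A) = det(B) / a_{1,1}^(n-2), where b_{i,j} = a_{1,1} a_{i+1,j+1} - a_{i+1,1} a_{1,j+1}. The sketch below applies that step repeatedly to evaluate a determinant. It is a minimal illustration, not the authors' code: the name chio_det is hypothetical, the pivot is always the top-left entry, the row swap on a zero pivot is an assumption of this sketch, and mirroring is not shown.

```python
from fractions import Fraction

def chio_det(matrix):
    """Determinant of a non-empty square matrix via repeated Chio condensation."""
    a = [[Fraction(x) for x in row] for row in matrix]
    sign = 1
    divisor = Fraction(1)
    while len(a) > 1:
        n = len(a)
        # Find a row with a nonzero first entry to serve as the pivot row.
        pivot_row = next((r for r in range(n) if a[r][0] != 0), None)
        if pivot_row is None:
            return Fraction(0)  # first column is all zeros
        if pivot_row != 0:
            a[0], a[pivot_row] = a[pivot_row], a[0]
            sign = -sign  # a row swap flips the sign of the determinant
        p = a[0][0]
        # Each condensation step scales the determinant by p ** (n - 2).
        divisor *= p ** (n - 2)
        # Replace every remaining entry by a 2 x 2 determinant with the pivot.
        a = [[p * a[i][j] - a[i][0] * a[0][j] for j in range(1, n)]
             for i in range(1, n)]
    return sign * a[0][0] / divisor

d = chio_det([[1, 2, 3], [4, 5, 6], [7, 8, 10]])
# d == -3
```

Each step touches every remaining entry once, so a full condensation costs O(N^3) operations overall.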
Accuracy and Numerical Stability
Backward error estimation
Theoretical estimate of rounding error
E matrix depends on two items
    The largest element in A or b
    The growth factor of the algorithm
Same growth factor as LU-decomposition with partial pivoting
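For reference, the standard textbook forms of these quantities are sketched below. The symbols \hat{x} (computed solution) and \rho (growth factor) are notation assumed here, not taken from the slides, and the bound shown is the classical one for Gaussian elimination with partial pivoting.

```latex
(A + E)\,\hat{x} = b
\qquad
\rho = \frac{\max_{i,j,k} \left| a_{ij}^{(k)} \right|}{\max_{i,j} \left| a_{ij} \right|}
\qquad
\rho \le 2^{\,n-1} \quad \text{(partial pivoting)}
```

Here a_{ij}^{(k)} denotes the entries of the intermediate matrix after k elimination steps; a small \rho means the intermediate values stay close in size to the input, which keeps E small.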
Forward Error Comparisons
Matrix Size    κ(A)        Max Matlab   Max GSL     Avg Matlab   Avg GSL
1000 x 1000    506930      2.39E-09     1.93E-10    1.03E-10     5.38E-12
2000 x 2000    790345      4.52E-09     5.36E-09    1.01E-10     7.27E-12
3000 x 3000    1540152     1.95E-08     1.84E-08    1.12E-10     2.09E-11
4000 x 4000    12760599    4.81E-08     5.62E-08    1.43E-10     7.91E-11
5000 x 5000    765786      2.92E-08     4.39E-08    1.18E-10     3.46E-11
6000 x 6000    1499430     8.67E-08     8.70E-08    1.37E-10     6.04E-11
7000 x 7000    3488010     9.92E-08     8.95E-08    1.27E-10     5.15E-11
8000 x 8000    8154020     9.09E-08     9.43E-08    1.86E-10     7.85E-11
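The "Max" and "Avg" columns presumably report the largest and the mean absolute difference between this solver's solution vector and a reference solution from Matlab or GSL. That reading is an assumption, as is the helper name compare_solutions in this minimal sketch.

```python
def compare_solutions(x, x_ref):
    """Return (max, mean) absolute difference between two solution vectors."""
    if len(x) != len(x_ref) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    diffs = [abs(u - v) for u, v in zip(x, x_ref)]
    return max(diffs), sum(diffs) / len(diffs)

max_diff, avg_diff = compare_solutions([1.0, 2.0, 3.0], [1.0, 2.5, 3.0])
# max_diff == 0.5
```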
Forward Error - Residual
Matrix Size    κ(A)        Max Residual   Avg Residual
1000 x 1000    506930      3.14E-08       4.46E-09
2000 x 2000    790345      6.72E-09       9.48E-10
3000 x 3000    1540152     2.79E-08       3.28E-09
4000 x 4000    12760599    1.06E-05       1.34E-06
5000 x 5000    765786      2.00E-08       2.65E-09
6000 x 6000    1499430     2.95E-08       3.86E-09
7000 x 7000    3488010     1.99E-08       2.44E-09
8000 x 8000    8154020     1.94E-08       2.32E-09
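A residual check substitutes the computed solution back into the system and measures how far A x is from b. The sketch below assumes the table reports the largest and the mean entry of |A x - b|; both that interpretation and the name residual_stats are assumptions.

```python
def residual_stats(a, x, b):
    """Return (max, mean) of the entrywise absolute residual |a x - b|."""
    residuals = [
        abs(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i)
        for row, b_i in zip(a, b)
    ]
    return max(residuals), sum(residuals) / len(residuals)

max_res, avg_res = residual_stats([[2, 1], [1, 3]], [1, 1], [3, 5])
# max_res == 1 and avg_res == 0.5
```

Unlike the forward-error comparison, this check needs no reference solver, only the original A and b.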
MATLAB Matrix Gallery
Special Matrix   Description                                            Avg Matlab   Residual    Matlab Residual
clement          Tridiagonal matrix with zero diagonal entries          1.40E-05     7.43E+133   7.85E+144
lehmer           Symmetric positive definite matrix                     2.49E-06     7.20E-09    3.89E-06
circul           Circulant matrix                                       3.23E-08     1.53E-13    1.04E-09
chebspec         Chebyshev spectral differentiation matrix              9.12E-02     3.74E+04    2.0E-01
lesp             Tridiagonal matrix with real, sensitive eigenvalues    9.56E-11     5.11E-16    7.30E-10
minij            Symmetric positive definite matrix                     5.14E-10     1.71E-08    6.59E-06
orthog           Orthogonal and nearly orthogonal matrices              1.03E-07     1.09E-14    2.80E-08
randjorth        Random J-orthogonal matrix                             1.55E-04     1.68E-00    1.13E-04
Serial Performance
Results support the theoretical ~2.5x complexity ratio
Algorithm Processing Flow
Overview of Parallel Implementation
Parallel Implementation (cont’d)
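Communication Complexity
Two phases of parallel communication
    Parallel Chio’s
    Gather Columns
Overall Bandwidth
[Equations not recoverable from the extracted text. The surviving fragments appear to show sums over k from 0 to N and from 0 to log2 N, terms in N, P, and F, and an overall bandwidth expressed in bytes per double.]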
N: Original matrix size, P: number of processors, F: gather columns size
Communication Overhead
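Where’s the Breakeven Point?
Point at which communication “dead time” matches the computational workload
[Derivation not recoverable from the extracted text. The surviving fragments appear to show an equation in p, N, and d_C with the constants 1.5 and 2.5, rearranged to equal 0.]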
Assuming d_C = 0.05 and N = 1000, the breakeven point would be P ≈ 142 processors
Closing Thoughts …
Proposed O(N^3) Cramer’s Rule method
Significantly lower communications overhead
Many more “broadcasts” than “unicasts”
Communication is a function of problem size, not of the number of processors
Next steps …
    Optimize parallel implementation
    Sparse matrix version
Thank you