
What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?
Shengxin Zhu, The University of Oxford
Prof. Xingping Liu and Prof. Tongxiang Gu

National Key Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics

SNSCC'12, shengxin.zhu@maths.ox.ac.uk

Outline

- Brief introduction to heterogeneous supercomputers
- Computational kernels of Krylov methods
- Influence of communication
- Case study: GPBiCG(m,l)
- Challenging problems
- Conclusion


Introduction to heterogeneous supercomputers: Dawning 5000A
- Nodes:
- Bandwidth:
- Memory:

Dawning 5000A ranking history (TOP500):
- 11/2008: 11th
- 06/2009: 15th
- 11/2009: 19th
- 06/2010: 24th
- 11/2010: 35th
- 06/2011: 40th
- 11/2011: 58th

TOP500, November 2011:
- 1st: K computer (JP)
- 2nd: NUDT (CN)
- 3rd: Cray (US)
- 4th: Dawning (CN)


Computational kernels of Krylov methods
- Vector update: parallel in nature
- Mat-vec: computation intensive; multi-core technology (CUDA/OpenMP)
- Inner product: communication intensive (CPU/MPI)
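The three kernels named above can be sketched in pure Python (a real solver would call BLAS/cuSPARSE; the function names here are illustrative, not from the slides):

```python
def axpy(alpha, x, y):
    """Vector update y <- alpha*x + y: no dependence between entries,
    so it parallelizes trivially."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def csr_matvec(vals, col_idx, row_ptr, x):
    """Sparse matrix-vector product with A in CSR format: compute
    intensive, and rows are independent, which suits multi-core,
    CUDA, or OpenMP implementations."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[col_idx[k]]
        y.append(s)
    return y

def dot(x, y):
    """Inner product: in a distributed run each process computes only a
    local partial sum, and a global reduction (e.g. MPI_Allreduce) must
    combine them -- the communication-intensive step."""
    return sum(xi * yi for xi, yi in zip(x, y))
```

For example, A = [[2, 0], [1, 3]] in CSR is `vals=[2.0, 1.0, 3.0]`, `col_idx=[0, 0, 1]`, `row_ptr=[0, 1, 3]`, and `csr_matvec(vals, col_idx, row_ptr, [1.0, 1.0])` gives `[2.0, 4.0]`.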


Influence of communication: a first glance
S. Zhu, MSc Thesis, CAEP, 2010
- Computation is cheap
- Communication is expensive
Based on Aztec, by Prof. Tuminaro et al. at Sandia


The real reason communication is time-consuming, by analogy:
- Small workshops (local communication): narrow focus, little preparation time
- Conferences (global communication): diverse participants, much more preparation time


Strategies for minimizing communication
- Replace the dot product with something else (e.g. semi-Chebyshev): workshops only, no conference if possible. Inner-product-free methods, Gu, Liu, and Mo (2002).
- Reorganize the algorithm (fewer conferences, each accepting more talks): residual replacement strategies due to van der Vorst (2000s); CA-KSMs, Demmel et al. (2008).
- Overlap communication with computation.
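The third strategy, overlapping communication with computation, can be sketched as follows. This is a minimal stand-alone illustration, not the slides' implementation: a worker thread stands in for a non-blocking collective such as MPI_Iallreduce, and all names are hypothetical.

```python
# Post the global reduction of a dot product asynchronously, do independent
# local work while it is "in flight", then wait for the result.
from concurrent.futures import ThreadPoolExecutor
import time

def global_reduce(local_partials):
    """Stand-in for MPI_Iallreduce: sum the per-process partial sums."""
    time.sleep(0.01)               # pretend network latency
    return sum(local_partials)

def overlapped_step(local_partials, local_work):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(global_reduce, local_partials)  # "post" the reduction
        work_result = local_work()     # overlap: computation with no dependence on the dot
        sigma = future.result()        # "wait" for the reduction to complete
    return sigma, work_result
```

For example, `overlapped_step([1.0, 2.0, 3.0], lambda: 42)` returns `(6.0, 42)`; the latency of the reduction is hidden behind the local work instead of being paid serially.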


A case study: parallelizing GPBiCG(m,l) (S. Fujino, 2002)
Special cases:
- GPBiCG(1,0) = BiCGSTAB
- GPBiCG(0,1) = GPBiCG
- GPBiCG(1,1) = BiCGSTAB2
The family could be used to design a breakdown-free BiCGSTAB method.


GPBiCG(m,l) (S. Fujino, 2002)


GPBiCG(m,l) (S. Fujino, 2002)


Algorithm Design of PGPBiCG(m,l) Method


PGPBiCG(m,l) method (reducing the number of global communications)
Algorithm reconstruction: three global communications are merged into one.
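The reconstruction idea (three global communications merged into one) can be illustrated with a toy sketch, assuming the usual trick of packing the local partial sums of several inner products into one vector so a single reduction pays the latency once. The function names are illustrative, not from the actual PGPBiCG(m,l) code.

```python
def allreduce_vec(packed):
    """Stand-in for one vector MPI_Allreduce with MPI_SUM: each tuple in
    `packed` holds one process's partial sums of the three inner products."""
    return tuple(sum(col) for col in zip(*packed))

def three_reductions(pa, pb, pc, allreduce):
    # Naive organization: three separate global communications,
    # paying the reduction latency three times per iteration.
    return allreduce(pa), allreduce(pb), allreduce(pc)

def one_reduction(pa, pb, pc):
    # Reconstructed organization: pack the three partial sums per process
    # and perform a single global communication.
    packed = list(zip(pa, pb, pc))
    return allreduce_vec(packed)
```

Both organizations return the same three global inner products, e.g. `one_reduction([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])` gives `(3.0, 7.0, 11.0)`, but the packed version incurs one latency instead of three.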


Performance: based on Aztec, by Prof. R.S. Tuminaro et al. at Sandia


Convergence analysis
- Residual replacement strategies
- Backward stability analysis


Challenging problems
- Computing inner products accurately. See "Mindless" by Kahan; Ogita and Rump et al., Accurate Sum and Dot Product, SIAM J. Sci. Comput., 2005 (cited 188 times); the PLASMA team.
- Backward stability analysis of residual replacement methods. Carson and Demmel, A residual replacement strategy for improving the maximum attainable accuracy of communication-avoiding Krylov subspace methods, April 20, 2012.
- Reliable dot-product computation algorithms.
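A compensated dot product in the spirit of the Ogita-Rump line of work can be sketched in a few lines: error-free transformations capture the exact rounding error of each product and sum, yielding a result as if accumulated in roughly doubled precision. This is a minimal sketch for IEEE doubles, not the paper's full Dot2 presentation.

```python
def two_sum(a, b):
    """Knuth's error-free transformation: a + b = s + e exactly."""
    s = a + b
    z = s - a
    e = (a - (s - z)) + (b - z)
    return s, e

def split(a, factor=134217729.0):  # 2**27 + 1 for IEEE doubles
    c = factor * a
    hi = c - (c - a)
    return hi, a - hi

def two_prod(a, b):
    """Dekker/Veltkamp error-free transformation: a * b = p + e exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e

def dot2(x, y):
    """Compensated dot product: carry the rounding errors in `s`."""
    p, s = two_prod(x[0], y[0])
    for xi, yi in zip(x[1:], y[1:]):
        h, r = two_prod(xi, yi)
        p, q = two_sum(p, h)
        s += q + r
    return p + s
```

On an ill-conditioned example such as x = [1e16, 1.0, -1e16], y = [1.0, 1.0, 1.0], the naive sum returns 0.0 while `dot2` recovers the exact value 1.0.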


Conclusion
- Avoid communication; compute reliably.
- Inner-product computation is very likely the most challenging kernel for heterogeneous HPC, while mat-vec is important for both.
- Software abstraction and threaded programming help; combined with redesigned algorithms they will do better.
- Math/algorithms and CS/performance meet at the applications interface: Aztec; pOSKI (Parallel Optimized Sparse Kernel Interface library, v1.0, May 2, 2012); Hypre; PETSc; Trilinos.


Thanks!


More than ten thousand processors are connected by a network, so global communication becomes more and more serious; this motivates an initial study of communication complexity.
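A standard cost model (an assumption of this write-up, not reproduced from the slide; the constants are purely illustrative) makes the point quantitative: a tree-based allreduce of an n-word message on p processors costs roughly (alpha + beta*n)*ceil(log2 p), while the local dot-product work shrinks like 1/p, so beyond some p the communication term dominates.

```python
import math

def comm_time(p, n=1, alpha=1e-5, beta=1e-9):
    """Tree-based allreduce: latency alpha, per-word cost beta,
    ceil(log2 p) communication rounds."""
    return (alpha + beta * n) * math.ceil(math.log2(p))

def comp_time(p, N=10**8, gamma=1e-9):
    """Local share of a length-N dot product at gamma seconds per flop."""
    return gamma * N / p

# With these illustrative constants, local computation dominates at modest
# p, but the log-growing global reduction dominates in the ten-thousand-
# processor regime the slide describes.
```

For example, at p = 64 the computation term still dominates, while at p = 16384 the single global reduction already costs more than each process's local work.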


Methods in the literature (based on the former two strategies)
- de Sturler and van der Vorst: parallel GMRES(m) and CG methods (1995)
- Bücker and Sauren: parallel QMR method (1997)
- Yang and Brent: improved CGS, BiCG and BiCGSTAB methods (2002-03)
- Gu, Liu et al.: ICR, IBiCR, IBiCGSTAB(2) and PQMRCGSTAB methods (2004-2010)
- Demmel et al.: CA-KSMs (2008-)
- Gu, Liu and Mo: MSD-CG, the multiple search direction conjugate gradient method (2004), which replaces the inner-product computations by solving small linear systems and eliminates global inner products completely. The idea has been generalized to MPCG by Greif and Bridson (2006).


Comparison of the computational counts of the two algorithms


Comparison of the computational counts of the two algorithms (continued)


Mathematical model of the time consumption


Scalability analysis


The optimal number of processors


Convergence Analysis


Numerical Experiments: timing and improvements


Numerical Experiments: Speedup


Conclusions
- The PGPBiCG(m,l) method is more scalable and more parallel for solving large sparse unsymmetric linear systems on distributed parallel architectures.
- Performance analysis, isoefficiency analysis, and numerical experiments have been carried out for the PGPBiCG(m,l) and GPBiCG(m,l) methods.
- The parallel communication performance can be improved by a factor larger than 3; the PGPBiCG(m,l) method has better parallel speedup than the GPBiCG(m,l) method.
- Further performance improvements: overlapping computation with communication; numerical stability.

