Page 1: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Ahmed Sameh and Ananth Grama
Computer Science Department, Purdue University
http://www.cs.purdue.edu/people/{sameh/ayg}

Linear Solvers Grant Kickoff Meeting, 9/26/06.

Page 2: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Project Overview

Objectives and Methodology

• Design scalable sparse solvers (direct, iterative, and hybrid) and evaluate their scaling/communication characteristics.

• Evaluate architectural features and their impact on scalable solver performance.

• Evaluate performance and productivity aspects of programming models – PGAS languages (CAF, UPC) and MPI.

Challenges and Impact

• Generalizing the space of linear solvers.

• Implementation and analysis on parallel platforms.

• Performance projection to the petascale.

• Guidance for architecture and programming model design / performance envelope.

• Benchmarks and libraries for HPCS.

Milestones / Schedule

• Final deliverable: Comprehensive evaluation of the scaling properties of existing (and novel) solvers.

• Six-month target: Comparative performance of solvers on multicore SMPs and clusters.

• Twelve-month target: Comprehensive evaluation of CAF/UPC/MPI implementations on the Cray X1, BlueGene, and JS20/21.

Page 3: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Introduction

• A critical aspect of high productivity is the identification of points/regions in the algorithm/architecture/programming-model space that are amenable to petascale systems.

• This project aims to identify such points in the context of commonly used sparse linear system solvers and to develop novel solvers.

• These novel solvers emphasize reduction in memory/remote accesses at the expense of (possibly) higher FLOP counts – yielding much better actual performance.

Page 4: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Project Rationale

• Sparse solvers form the most commonly used kernels on HPC machines.

• The design of HPC architectures and programming models must be influenced by their suitability for these (and related) kernels.

• The extreme need for concurrency and the emergence of novel architectural models require a fundamental re-examination of conventional solvers.

Page 5: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Project Goals

• Develop a generalization of direct and iterative solvers – the Spike polyalgorithm.

• Implement this generalization on various architectures (multicore, multicore SMP, multicore SMP aggregates) and programming models (PGAS languages, messaging APIs).

• Analytically quantify performance and project to petascale platforms.

• Compare relative performance, identify architecture/programming model features, and guide algorithm/architecture/programming model co-design.

Page 6: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Background

• Personnel:

– Ahmed Sameh, Samuel Conte Chair in Computer Science, has worked on the development of parallel sparse solvers for four decades.

– Ananth Grama, Professor and University Scholar, has worked on both the numerical aspects of sparse solvers and analytical frameworks for parallel systems.

– (To be named – Postdoctoral Researcher)* will be primarily responsible for implementation and benchmarking.

*We have identified three candidates for this position and will shortly be hiring one of them.

Page 7: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Background

• Technical:

– We have built extensive infrastructure for parallel sparse solvers, including the Spike parallel toolkit, augmented-spectral ordering techniques, and multipole-based preconditioners.

– We have diverse hardware infrastructure, including Intel/AMD multicore SMP clusters, JS20/21 blade servers, BlueGene/L, and the Cray X1.

Page 8: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Background

• Technical (continued):

– We have initiated the installation of Co-Array Fortran and Unified Parallel C on our machines and are porting our toolkits to these PGAS languages.

– We have extensive experience in the analysis of performance and scalability of parallel algorithms, including the development of the isoefficiency metric for scalability.

Page 9: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Technical Highlights

• The SPIKE Toolkit

• (Dr. Sameh, could you include a few slides here).
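
Since the toolkit's own slides are not included in this transcript, the following is a minimal sketch of the basic SPIKE idea for a banded system split across two partitions, assuming NumPy. The partition size, half-bandwidth, diagonally dominant test matrix, and all variable names are illustrative only and do not reflect the toolkit's actual interfaces or its polyalgorithmic variants.

```python
# Minimal two-partition SPIKE sketch for a banded system A x = f.
# Illustration only: sizes, the test matrix, and names are made up.
import numpy as np

rng = np.random.default_rng(0)
m, k = 6, 1                       # partition size and half-bandwidth (assumed)
n = 2 * m

# Build a diagonally dominant banded matrix A with half-bandwidth k.
A = np.zeros((n, n))
for i in range(n):
    for j in range(max(0, i - k), min(n, i + k + 1)):
        A[i, j] = rng.uniform(-1, 1)
    A[i, i] += 2 * k + 1          # enforce diagonal dominance
f = rng.uniform(-1, 1, n)

# Partition: diagonal blocks A1, A2 and k-by-k coupling blocks B, C.
A1, A2 = A[:m, :m], A[m:, m:]
B = A[m - k:m, m:m + k]           # couples partition 1 to partition 2
C = A[m:m + k, m - k:m]           # couples partition 2 to partition 1
f1, f2 = f[:m], f[m:]

# Factor A = D * S: independent solves with the diagonal blocks produce
# the "spikes" V, W and the modified right-hand side g = D^{-1} f.
V = np.linalg.solve(A1, np.vstack([np.zeros((m - k, k)), B]))   # m x k
W = np.linalg.solve(A2, np.vstack([C, np.zeros((m - k, k))]))   # m x k
g1 = np.linalg.solve(A1, f1)
g2 = np.linalg.solve(A2, f2)

# Reduced system: couples only the k unknowns on each side of the interface.
R = np.block([[np.eye(k), V[-k:, :]],
              [W[:k, :],  np.eye(k)]])
r = np.concatenate([g1[-k:], g2[:k]])
y = np.linalg.solve(R, r)
x1_bot, x2_top = y[:k], y[k:]

# Back-substitute within each partition (again independent per partition).
x1 = g1 - V @ x2_top
x2 = g2 - W @ x1_bot
x = np.concatenate([x1, x2])

print("residual:", np.linalg.norm(A @ x - f))   # should be near machine precision
```

The point of the factorization is that the diagonal-block solves and the final back-substitution are independent per partition; only the small reduced system couples the partitions, which is what keeps communication low on parallel machines.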

Page 10: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Technical Highlights

• Analysis of Scaling Properties

– In early work, we developed the isoefficiency metric for scalability.

– With the likely scenario of utilizing up to 100K processing cores, this work becomes critical.

– Isoefficiency quantifies the performance of a parallel system (a parallel program and the underlying architecture) as the number of processors is increased.

Page 11: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Technical Highlights

• Isoefficiency Analysis

– The efficiency of any parallel program running on a fixed problem instance goes down as the number of processors increases.

– For a family of parallel programs (formally referred to as scalable programs), increasing the problem size results in an increase in efficiency.

Page 12: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Technical Highlights

• Isoefficiency is the rate at which the problem size must be increased, relative to the number of processors, to maintain constant efficiency.

• This rate is critical, since it is ultimately limited by total memory size.

• Isoefficiency is a key indicator of a program’s ability to scale to very large machine configurations.

• Isoefficiency analysis will be used extensively for performance projection and scaling properties.
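
As a concrete illustration of the definition above, here is a minimal sketch using the standard model E = 1 / (1 + T_o(W, p) / W), where W is the problem size (serial work) and T_o is the total parallel overhead. The overhead model T_o = c · p · log2(p) (typical of a global reduction) is assumed purely for illustration and is not a model of any solver in this project; the sketch computes the W needed to hold efficiency constant as the processor count grows toward the 100K-core regime mentioned above.

```python
# Isoefficiency illustration under an assumed overhead model.
import math

def efficiency(W, p, c=1.0):
    To = c * p * math.log2(p)          # assumed overhead model T_o(p)
    return 1.0 / (1.0 + To / W)

def isoefficiency_W(p, E=0.8, c=1.0):
    # Problem size needed to hold efficiency at E: W = (E / (1 - E)) * T_o(p).
    return (E / (1.0 - E)) * c * p * math.log2(p)

for p in (64, 1024, 16384, 102400):    # up to ~100K cores, as in the slides
    W = isoefficiency_W(p)
    print(f"p = {p:7d}  W needed = {W:12.0f}  check E = {efficiency(W, p):.3f}")
```

For this overhead model the required problem size grows as p log p, which is what ultimately collides with the total-memory limit noted in the bullet above.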

Page 13: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Architecture

• We target the following currently available architectures:

– IBM JS20/21 and BlueGene/L platforms
– Cray X1/XT3
– AMD Opteron multicore SMP and SMP clusters
– Intel Xeon multicore SMP and SMP clusters

• These platforms represent a wide range of currently available architectural extremes.

Page 14: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Implementation

• Current implementations are MPI-based.

• The Spike toolkit (iterative as well as direct solvers) will be ported to:

– POSIX threads and OpenMP
– UPC and CAF
– Titanium and X10 (if releases are available)

• These implementations will be comprehensively benchmarked across platforms.

Page 15: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Benchmarks/Metrics

• We aim to formally specify a number of benchmark problems (sparse systems arising in structures, CFD, and fluid-structure interaction).

• We will abstract architecture characteristics – processor speed, memory bandwidth, link bandwidth, bisection bandwidth.

• We will quantify solvers on the basis of wall-clock time, FLOP count, parallel efficiency, scalability, and projected performance to petascale systems.
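
As a small illustration of the derived metrics named above, the sketch below computes speedup, parallel efficiency, and sustained FLOP rate from wall-clock times. All numbers are placeholders, not measurements of any solver in this project.

```python
# Derived metrics from wall-clock timings (placeholder data only).
flops = 4.0e12                          # assumed operation count for the run
t_serial = 1000.0                       # assumed serial time in seconds
runs = {16: 70.0, 64: 20.0, 256: 6.5}   # processors -> wall-clock seconds (assumed)

for p, t_p in runs.items():
    speedup = t_serial / t_p
    eff = speedup / p                   # parallel efficiency
    gflops = flops / t_p / 1e9          # sustained GFLOP/s on p processors
    print(f"p={p:4d}  time={t_p:7.1f}s  speedup={speedup:6.1f}  "
          f"efficiency={eff:5.2f}  rate={gflops:8.1f} GFLOP/s")
```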

Page 16: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Progress/Accomplishments

• Implementation of the parallel Spike polyalgorithm toolkit.

• Incorporation of a number of direct (SuperLU, MUMPS) and iterative solvers (preconditioned Krylov subspace methods) into Spike.

• Evaluation of Spike on IBM SP and Intel multicore platforms, and integration into the Intel MKL library.

Page 17: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Milestones

• Final deliverable: Comprehensive evaluation of the scaling properties of existing (and new) solvers.

• Six-month target: Comparative performance of solvers on multicore SMPs and clusters.

• Twelve-month target: Comprehensive evaluation of CAF/UPC/MPI implementations on the Cray X1, BlueGene, and JS20/21.

Page 18: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Financials

• The total cost of this project is approximately $150K for its one-year duration.

• The budget primarily covers a postdoctoral researcher's salary/benefits and limited summer support for the PIs.

• Together, these three project personnel are responsible for accomplishing project milestones and reporting.

Page 19: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Concluding Remarks

• This project takes a comprehensive view of linear system solvers and of the suitability of petascale HPC systems for them.

• Its results will directly influence the ongoing and future development of HPC systems.

• A number of major challenges are likely to emerge, both as a result of this project, and from impending architectural innovations.

Page 20: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Concluding Remarks

• Architectural features include:

– Scalable multicore platforms: 64 to 128 cores on the horizon.

– Heterogeneous multicore: cores are likely to be heterogeneous – some with floating-point units, others with vector units, yet others with programmable hardware (indeed, such chips are already common in cell phones).

– Significantly higher pressure on the memory subsystem.

Page 21: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Concluding Remarks

• Impact of architectural features on algorithms and programming models:

– Affinity scheduling is important for performance – need to specify tasks that must be co-scheduled (suitable programming abstractions needed).

– Programming constructs for utilizing heterogeneity.

Page 22: Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

Concluding Remarks

• Impact of architectural features on algorithms and programming models:

– FLOPs are cheap, memory references are expensive – explore new families of algorithms that optimize for (i.e., minimize) the latter.

– Algorithmic techniques and programming constructs for specifying algorithmic asynchrony (used to mask system latency).

– Many of the optimizations are likely to be beyond the technical reach of application programmers – hence the need for scalable library support.

– Increased emphasis on scalability analysis.