Introduction to scientific computing using PETSc and Trilinos
Václav Hapla, David Horák, Michal Merta
PRACE Spring School, Cracow 2012
Should we reinvent the wheel?
many complex but well-known and often-used algorithms (LU, CG, matrix-vector multiply, …) have already been implemented and tested, and are ready to use!
a software framework is software providing generic functionality that can be selectively changed by user code, thus providing application-specific software (wikipedia.org)
motivation: programmers should focus on new, original algorithms that add value
Frameworks for scientific computing – why?
are parallelized on the data level (vectors & matrices) using MPI
use BLAS and LAPACK – de facto standard for dense LA
have their own implementation of sparse BLAS
include robust preconditioners, linear solvers (direct and iterative) and nonlinear solvers
can cooperate with many other external solvers and libraries (e.g. MATLAB, MUMPS, UMFPACK, …)
already support CUDA and hybrid parallelization
are licensed as open-source
Both PETSc and Trilinos…
PETSc
„essential object orientation“
for programmers used to procedural programming but seeking modular code
recommended for C and FORTRAN users
Trilinos
„pure object orientation“
for programmers who are not scared of OOP, appreciate good SW design and have some experience with C++
extensibility and reusability
Potential users
PETSc Library Václav Hapla
David Horák
PETSc project
PETSc programming primitives
Objects in PETSc
Vectors, index sets and matrices in PETSc
Linear solvers
Debugging & profiling
Outline of PETSc tutorial
PETSc project Václav Hapla
PETSc = Portable, Extensible Toolkit for Scientific computation
developed by Argonne National Laboratory since 1991
data structures and routines for the scalable parallel solution of scientific applications modeled by PDE
coded primarily in C, with good FORTRAN support; can also be called from C++ and Python codes
homepage: www.mcs.anl.gov/petsc
current stable version is 3.2
PETSc project (1)
petsc-dev (development branch) is evolving intensively
code and mailing lists open to anybody
portable to any parallel system supporting MPI
tightly coupled systems (Cray XT5, BG/P, Earth Simulator, Sun Blade, SGI Altix)
loosely coupled systems, such as networks of workstations (Linux, Windows, IBM, Mac, Sun)
iPhone support
PETSc project (2)
Developing parallel, nontrivial PDE solvers that deliver high performance is still difficult and requires months (or even years) of concentrated effort. PETSc is a toolkit that can ease these difficulties and reduce the development time, but it is not a black-box PDE solver, nor a silver bullet.
Barry Smith
(PETSc Team)
Role of PETSc
„We will continually add new features and enhanced functionality in upcoming releases; small changes in usage and calling sequences of PETSc routines will continue to occur. Although keeping one's code accordingly up-to-date can be annoying, all PETSc users will be rewarded in the long run with a cleaner, better designed, and easier-to-use interface.“
Changes
Documentation
all documentation available at http://www.mcs.anl.gov/petsc/documentation/index.html
PETSc users manual – PDF (fully searchable, hypertext)
help topics – general topics such as „error handling“, „multigrid“, „shared memory“
manual pages – individual routines, split into 4 categories:
Beginner - basic usage
Intermediate - setting options for algorithms and data structures
Advanced - setting more advanced options and customization
Developer - interfaces intended primarily for library developers
PETSc is layered on top of MPI
MPI provides low-level tools to exchange data primitives between processes
PETSc provides medium-level tools such as
inserting a matrix element at an arbitrary location
parallel matrix-vector product
you do not need to know much about MPI
but you can call any MPI routine directly if needed
same code for sequential and parallel runs
Parallelism in PETSc
PETSc cooperates with... (1)
Python: petsc4py
Documentation utilities: Sowing, lgrind, c2html
MPI: MPICH, MPE, Open MPI
Dense LA: BLAS, LAPACK, BLACS, ScaLAPACK, PLAPACK
Graphs & load balancing: ParMetis, Chaco, Jostle, Party, Scotch, Zoltan
Direct linear solvers: MUMPS, Spooles, SuperLU, SuperLU_Dist, UMFPack
PETSc cooperates with... (2)
Iterative linear solvers: PaStiX, HYPRE
Multigrid: Trilinos ML
Eigenvalue solvers: BLOPEX
FFT: FFTW
Time-stepping: Sundials
Meshing: Triangle, TetGen, FIAT, FFC, Generator
Data exchange: HDF5
Boost
TAO - Toolkit for Advanced Optimization
SLEPc - Scalable Library for Eigenvalue Problems
fluidity - a finite element/volume fluids code
Prometheus - scalable unstructured finite element solver
freeCFD - general purpose CFD solver
OpenFVM - finite volume based CFD solver
OOFEM - object oriented finite element library
libMesh - adaptive finite element library
Packages that use/extend PETSc (1)
MOOSE - Multiphysics Object-Oriented Simulation Environment developed at INL built on top of libmesh on top of PETSc
DEAL.II - sophisticated C++ based finite element simulation package
PHMAL - The Parallel Hierarchical Adaptive MultiLevel Project
Chaste - Cancer, Heart and Soft Tissue Environment
Packages that use/extend PETSc (2)
PETSc has been used for modeling in all of these areas: Acoustics, Aerodynamics, Air Pollution, Arterial Flow, Bone Fractures, Brain Surgery, Cancer Surgery, Cancer Treatment, Carbon Sequestration, Cardiology, Cells, CFD, Combustion, Concrete, Corrosion, Data Mining, Dentistry, Earthquakes...
Applications (1)
Applications (2)
Fracture mechanics
Mechanics- elasticity
Real-time surgery
Magma dynamics
PETSc installation in a nutshell
Václav Hapla
stable releases of PETSc can be downloaded via HTTP as a tarball
petsc-3.2-p7.tar.gz - full distribution (including all current patches) with documentation
petsc-lite-3.2-p7.tar.gz - smaller version with no documentation (all documentation may be accessed online)
Download - tarball
stable releases as well as the current development version can be downloaded using the Mercurial version control system
caution – the build system has its own separate repository!
stable:
hg clone http://petsc.cs.iit.edu/petsc/releases/petsc-3.2
hg clone http://petsc.cs.iit.edu/petsc/releases/BuildSystem-3.2 \
petsc-3.2/config/BuildSystem
dev:
hg clone http://petsc.cs.iit.edu/petsc/petsc-dev
hg clone http://petsc.cs.iit.edu/petsc/BuildSystem \
petsc-dev/config/BuildSystem
Download - Mercurial
./configure script written in Python
realizes PETSc auto-tuning capabilities
sets many internal variables and macros depending on the machine
generates makefile
--help – prints all options
see www.mcs.anl.gov/petsc/documentation/installation.html
Configuration
PETSC_DIR and PETSC_ARCH are variables that control the configuration and build process of PETSc
These variables can be set as environment variables or specified on the command line.
PETSC_DIR points to the location of the PETSc installation that is used.
Multiple PETSc versions can coexist on the same file-system. By changing PETSC_DIR value, one can switch between these installed versions of PETSc.
PETSC_DIR
PETSC_ARCH variable gives a name to a configuration and build.
configure uses this value to store the generated makefiles in ${PETSC_DIR}/${PETSC_ARCH}/conf.
make uses this value to determine the location of these makefiles
program libraries (.a or .so) of PETSc and downloaded external packages are stored into ${PETSC_DIR}/${PETSC_ARCH}/lib
Thus one can install multiple variants of PETSc libraries - by providing different PETSC_ARCH values to each configure build.
Then one can switch between using these variants of libraries by switching the PETSC_ARCH value used.
PETSC_ARCH
PETSc supports tens of external packages
[pkg] = mumps, superlu, parmetis, sprng, netcdf, ...
download and compile automatically:
--download-[pkg] - downloads the package and installs it for you in $PETSC_DIR/$PETSC_ARCH/lib
use existing installation
--with-[pkg]=<bool> test for [pkg]
--with-[pkg]-dir=<dir> the root directory of the [pkg] installation
--with-[pkg]-include=<dirs>
--with-[pkg]-lib=<libraries: e.g.[/Users/..../libboost.a,...]>
External packages
./configure --with-batch
for machines with a batch system
configure generates a special executable binary, conftest
run conftest on one computing node (e.g. submit the batch script)
it will generate a new ./reconfigure-$PETSC_ARCH script with machine-specific variables set (cache size etc.)
run that reconfigure script to complete the configuration stage
Batch mode
after the configuration stage completes successfully, you get a message like this:
Configure stage complete. Now build PETSc libraries with (cmake build):
make PETSC_DIR=/home/vhapla/devel/petsc-dev \
PETSC_ARCH=debug-so-mpich2-gnu all
you can copy and paste the make command
it will compile the source files and build the program library
it can make use of CMake if installed
significant speedup of compilation
shows progress percentage
Compilation
PETSc programming primitives Václav Hapla
#include "petsc.h"
#undef __FUNCT__
#define __FUNCT__ "main"
int main(int argc,char **argv)
Declare the name of each routine by redefining __FUNCT__ macro to get more useful tracebacks on error
Program header in C
program init
implicit none
#include "finclude/petsc.h"
FORTRAN has more limited error handling; one cannot use the __FUNCT__ macro
If you are familiar with C, please use C.
We will focus on PETSc C interface.
Program header in F
You can include all PETSc headers at once:
#include "petsc.h" //includes all PETSc headers
Or you can include specific headers:
#include "petscsys.h" //framework routines
#include "petscvec.h" //vectors
#include "petscmat.h" //matrices
Higher level headers include all lower level headers needed
#include "petscksp.h" //includes vec,mat,dm,pc
What headers to include?
Initialize & Finalize (1)
static char help[] = "Empty program.\n\n";
#include <petscsys.h>
int main(int argc,char **argv)
{
PetscErrorCode ierr;
ierr = PetscInitialize(&argc,&argv,(char *)0,help);CHKERRQ(ierr);
ierr = PetscFinalize();CHKERRQ(ierr);
return 0;
}
Every PETSc program begins with the call to PetscInitialize()
ends with the call to PetscFinalize()
they call MPI_Init(), MPI_Finalize()
Initialize & Finalize (2)
argc,argv - propagate command line arguments to PETSc and MPI
help - additional help messages to print when the executable is invoked with the cmd-line-arg -help (will be discussed later)
PETSc is written in C
C has no exceptions
instead of throwing an exception, every routine returns an integer error code (PetscErrorCode type)
the error code is „caught“ by the CHKERRQ(ierr) macro
PetscErrorCode ierr;
ierr = SomePetscRoutine();CHKERRQ(ierr);
Error handling (1)
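The return-code pattern described above can be sketched in plain C without PETSc. This is only an illustration of the idea, not PETSc's implementation: ErrorCode, CHECK, safe_divide and mean_of_ratios are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins (not PETSc names): an integer error type and a
   check macro that reports the failure site and returns the code to the caller. */
typedef int ErrorCode;
#define CHECK(e) \
  do { \
    if ((e) != 0) { \
      fprintf(stderr, "error %d at %s:%d\n", (e), __FILE__, __LINE__); \
      return (e); \
    } \
  } while (0)

/* Integer division that reports division by zero through its return value. */
ErrorCode safe_divide(int a, int b, int *result)
{
  assert(result != NULL);
  if (b == 0) return 1;  /* nonzero means failure */
  *result = a / b;
  return 0;              /* zero means success */
}

/* Calls safe_divide twice; the first failure propagates up unchanged
   and *result is left untouched. */
ErrorCode mean_of_ratios(int a, int b, int c, int *result)
{
  ErrorCode ierr;
  int r1, r2;
  ierr = safe_divide(a, c, &r1); CHECK(ierr);
  ierr = safe_divide(b, c, &r2); CHECK(ierr);
  *result = (r1 + r2) / 2;
  return 0;
}
```

Because every level checks and returns, a failure deep in the call chain reaches the top-level caller together with one diagnostic line per level, which is how an error traceback can be assembled without exceptions.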
#include <petscsys.h>
int main(int argc,char **argv)
{
PetscErrorCode ierr;
ierr = PetscFinalize(); CHKERRQ(ierr);
return 0;
}
This code raises the following error: PetscInitialize() must be called before PetscFinalize()
(+ stacktrace)
Error handling (2)
Communicators
communicator = an opaque object of MPI_Comm type that defines process group and synchronization channel
PETSc built-in communicators: PETSC_COMM_SELF – just this process – for serial objects
PETSC_COMM_WORLD – all processes – for parallel objects
MPI can split communicators, spawn processes on new communicators – PETSc does not deal with it
Function Collectiveness
1. Not Collective – no communication or synchronization: VecGetLocalSize(), MatSetValues()
2. Logically Collective – checked when running in debug mode: KSPSetType(), PCMGSetCycleType()
3. Neighbor-wise Collective – point-to-point communication between two processes: VecScatterBegin(), MatMult()
4. Collective – global communication, synchronous: VecNorm(), MatAssemblyBegin(), KSPCreate()
PETSc provides many useful utilities
prefixed by Petsc
parallel flow control: Barrier, SequentialPhaseBegin/End
memory management and checking: Malloc,Free,MallocValidate,MallocDump
Utility routines (1)
logging: PetscLogEventRegister/Begin/End
string handling: Strcat/cmp/cpy/len/tolower/replace/ToArray
MATLAB engine interface: MatlabEngineCreate/Destroy/Evaluate
and many more
Utility routines (2)
PetscInt n = 20;
PetscScalar v = -3.5, w = 3.1e9;
PetscReal x = 2.55, y = 1e-9;
PETSc has its own typedefs for numeric data types
It is better to use them instead of built-in C types
Better portability and easier switching between
real and complex numbers
32-bit and 64-bit numbers
Primitive datatypes
PETSc provides routines for managing the options database
in your program, you can call routines PetscOptionsGetInt,
PetscOptionsGetString,
PetscOptionsGetReal, etc. to obtain the values
Options (1)
Example:
in command-line ./yourapp -myint 10 -myreal 1e3
in program yourapp:
PetscReal myreal; PetscInt myint;
PetscOptionsGetInt(PETSC_NULL,"-myint",&myint,
PETSC_NULL);
PetscOptionsGetReal(PETSC_NULL,"-myreal",&myreal,
PETSC_NULL);
Options (2)
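To make the lookup concrete, here is a minimal plain-C sketch of what an integer option query does with a command line such as the one above. get_int_option is a hypothetical helper, not a PETSc routine; the sketch assumes options always come as "-name value" pairs and, like the usual default-value idiom, leaves the output untouched when the option is absent.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a PETSc routine): scan argv for "name value".
   On a match, store the parsed integer and set *found to 1; otherwise
   leave *value untouched and set *found to 0. */
void get_int_option(int argc, char **argv, const char *name, int *value, int *found)
{
  int i;
  assert(name != NULL && value != NULL && found != NULL);
  *found = 0;
  for (i = 1; i + 1 < argc; i++) {
    if (strcmp(argv[i], name) == 0) {
      *value = (int)strtol(argv[i + 1], NULL, 10);
      *found = 1;
      return;
    }
  }
}
```

A caller would typically initialize the variable to its default, call the helper, and use the variable regardless of whether the option was given.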
-help command-line argument prints essential info about the PETSc-based program:
program description (the last argument of PetscInitialize())
options specific for the program
general built-in options
built-in options relevant for the program
PETSc version
command-line help
trainee@pss2012vm:~/petsc-tutorial$ ./ex2 -help
Solves a linear system in parallel with KSP.
Input parameters include:
-random_exact_sol : use a random exact solution vector
-view_exact_sol : write exact solution vector to stdout
-m <mesh_x> : number of mesh points in x-direction
-n <mesh_n> : number of mesh points in y-direction
-----------------------------------------------------------
Petsc Release Version 3.2.0, Patch 7, Thu Mar 15 09:30:51 CDT 2012
...
-----------------------------------------------------------
Options for all PETSc programs:
-help: prints help method for each option
-on_error_abort: cause an abort when an error is detected. Useful
only when run in the debugger
...
command-line help - example
command line
filename in the third argument of PetscInitialize()
~/.petscrc
$PWD/.petscrc
$PWD/petscrc
PetscOptionsInsertFile()
PetscOptionsInsertString()
PETSC_OPTIONS environment variable
command line option -options_file [file]
Ways to set options
C: PetscErrorCode PetscPrintf(MPI_Comm,
const char format[],...)
prints to standard output
only from the first processor in the communicator comm
F: PetscPrintf(MPI_Comm, character(*),
PetscErrorCode)
limited support in FORTRAN
only single character string can be passed
Print to standard output
static char help[] = "Hello world program.\n\n";
#include <petscsys.h>
int main(int argc,char **argv)
{
PetscErrorCode ierr;
PetscMPIInt rank;
PetscInitialize(&argc,&argv,(char *)0,help);
MPI_Comm_rank(PETSC_COMM_WORLD,&rank);
PetscPrintf(PETSC_COMM_SELF,"Hello World from %d\n",rank);
PetscFinalize();
return 0;
}
PETSc Hello world in C
program main
integer ierr, rank
#include "finclude/petsc.h"
call PetscInitialize(PETSC_NULL_CHARACTER, ierr)
call MPI_Comm_rank(PETSC_COMM_WORLD, rank, ierr)
if (rank .eq. 0) then
print *, 'Hello World from ', rank
endif
call PetscFinalize(ierr)
end
PETSc Hello world in F
static char help[] = "Hello world program.\n\n";
#include <petscsys.h>
int main(int argc,char **argv)
{
PetscErrorCode ierr;
PetscMPIInt rank;
ierr = PetscInitialize(&argc,&argv,(char *)0,help);CHKERRQ(ierr);
ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
ierr = PetscPrintf(PETSC_COMM_SELF,"Hello World from %d\n",
rank);CHKERRQ(ierr);
ierr = PetscFinalize();CHKERRQ(ierr);
return 0;
}
PETSc Hello world in C - with error checking
To obtain output of the first processor followed by that of the second, etc., one can call:
PetscSynchronizedPrintf(PETSC_COMM_WORLD,
"Hello World from %d\n",rank);
PetscSynchronizedFlush(PETSC_COMM_WORLD);
Output:
Hello World from 0
Hello World from 1
Hello World from 2
Synchronized print
Objects in PETSc Václav Hapla
Hierarchy of components
[Figure: layered diagram ordered by level of abstraction. Top: Application (user). PETSc layer: Nonlinear solvers (SNES), Time Steppers (TS), Linear solvers (KSP), Preconditioners (PC), Matrices (Mat), Vectors (Vec), Index Sets (IS). Bottom: MPI, BLAS, LAPACK. Side labels: level of abstraction, parallelization.]
every object in PETSc belongs to some communicator
MPI_Comm is the first argument of every object's constructor
two objects can only interact if they belong to the same communicator
Objects and communicators
PETSc uses specific and limited inheritance
every object in PETSc is an instance of a class: Vec, Mat, KSP, SNES, …
functions called on objects (= methods in C++) are prefixed by a class name: MatMult(Mat,…)
class is specified when the object is created using the proper Create function (= constructor in C++):
Mat A;
MatCreate(comm, &A);
PETSc object oriented design: classes
PETSc object oriented design: types
classes are further subdivided into types: seqaij,mpidense,composite,…
= seq. sparse, par. dense, implicit matrix addition/multiplication
the type of an object is specified during the object's lifetime:
Mat A;
MatCreate(comm, &A);
MatSetType(A, MATSEQAIJ);
Mat A,B; Vec x; KSP solver; are opaque objects
you don't access inner fields directly
in include/petscmat.h you can find typedef struct _p_Mat* Mat;
so B = A only copies pointer, not data
prevents unwanted data copying
makes pointer handling easier
allows hiding implementation from public interface → polymorphism
PETSc object oriented design: opaque objects
Polymorphism
MatMult(Mat A,Vec x,Vec y); //y = A*x
public interface
uniform for all types of matrices: sequential, parallel, dense, sparse, …
documented
calls private implementation based on type: MatMult_SeqDense(Mat A,Vec x,Vec y)
hidden, specific for each matrix type
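The dispatch from a uniform public routine to a type-specific private one can be sketched in plain C with a function pointer stored in the object. This is an illustration of the mechanism only; the Matrix struct and the mult_dense, mult_diagonal and matrix_mult names are assumptions made for this sketch and do not reflect PETSc's internal layout.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Matrix Matrix;
struct Matrix {
  size_t n;            /* square n x n operator */
  const double *data;  /* storage, interpreted by the implementation */
  void (*mult)(const Matrix *A, const double *x, double *y);
};

/* Private implementation: data holds n*n entries in row-major order. */
void mult_dense(const Matrix *A, const double *x, double *y)
{
  size_t i, j;
  for (i = 0; i < A->n; i++) {
    y[i] = 0.0;
    for (j = 0; j < A->n; j++) y[i] += A->data[i * A->n + j] * x[j];
  }
}

/* Private implementation: data holds only the n diagonal entries. */
void mult_diagonal(const Matrix *A, const double *x, double *y)
{
  size_t i;
  for (i = 0; i < A->n; i++) y[i] = A->data[i] * x[i];
}

/* Public interface: y = A*x, dispatched on the stored function pointer. */
void matrix_mult(const Matrix *A, const double *x, double *y)
{
  assert(A != NULL && A->mult != NULL);
  A->mult(A, x, y);
}
```

Code that calls matrix_mult never needs to know which storage format is behind the handle, so a new format only requires a new private routine.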
PetscObject (1)
Every PETSc object can be cast to PetscObject:
Mat A;
PetscObject obj;
obj = (PetscObject) A;
PetscObject provides general methods such as:
Get/SetName() – name the object (used for printing, MATLAB interface, etc.)
GetType() – the type of the object
GetComm() – the communicator the object belongs to
PetscObject (2)
Mat A;
const char *type;
MPI_Comm comm;
PetscObjectGetComm((PetscObject)A,&comm);
PetscObjectGetType((PetscObject)A,&type);
//is the same as
MatGetType(A,&type);
PETSc inheritance
classes
types
...
...
once again: method names must be prefixed by the class name: Vec,Mat,KSP, etc.
all PETSc built-in classes support the following methods
Create() - create the object
Get/SetType() - set the implementation type
Common methods (1)
SetFromOptions() - set all options of the object from the options database
Get/SetOptionsPrefix() - set a specific option prefix for the given object
SetUp() - prepare the object inner state for computation
View() - print object info to specified output
Destroy() - deallocate the memory used by the object
Common methods (2)
The Destroy method uses simple reference counting.
It decrements the object's reference count and nullifies the caller's pointer.
If the count is still > 0, nothing else happens.
If the reference count reaches 0
call type-specific private destroy routine
deallocate the whole object
So PETSc uses a „destroy always“ paradigm: every holder of a reference calls Destroy explicitly
Unlike smart pointers in the new C++ standard, Boost or Trilinos RCP, which use a „destroy never“ paradigm
Destroy
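A minimal plain-C sketch of this reference-counting scheme follows. The Object type and the object_create, object_reference and object_destroy functions are hypothetical names for illustration; the sketch assumes a single-threaded program and a payload of n > 0 doubles.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
  int refct;     /* number of live references */
  double *data;  /* payload owned by the object */
} Object;

/* Create an object holding n doubles; the creator holds the first reference. */
int object_create(size_t n, Object **out)
{
  Object *o;
  assert(out != NULL && n > 0);
  o = malloc(sizeof *o);
  if (o == NULL) return 1;
  o->data = calloc(n, sizeof *o->data);
  if (o->data == NULL) { free(o); return 1; }
  o->refct = 1;
  *out = o;
  return 0;
}

/* Register one more holder of the same object. */
void object_reference(Object *o)
{
  assert(o != NULL);
  o->refct++;
}

/* Drop one reference and nullify the caller's pointer;
   free the object only when the last reference is gone. */
void object_destroy(Object **o)
{
  if (o == NULL || *o == NULL) return;
  if (--(*o)->refct == 0) {
    free((*o)->data);
    free(*o);
  }
  *o = NULL;
}
```

Each holder calls object_destroy exactly once on its own pointer; the order of those calls does not matter, and the memory is released by whichever call happens last.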
PETSc contains special PetscViewer class for printing to stdout, files (several text and binary formats), strings or even socket connection
basic usage:
PetscViewer viewer;
PetscViewerCreate(comm, &viewer);
PetscViewerSetType(viewer, PETSCVIEWERASCII);
PetscViewerDestroy(&viewer);
prints only from the first processor of comm
Viewers (1)
predefined viewers: PETSC_VIEWER_STDOUT_WORLD, PETSC_VIEWER_BINARY_SELF, ...
every PETSc object can be viewed by the viewer:
PetscViewer v; Mat A; Vec x;
...
MatView(A,v);
VecView(x,v);
Viewers (2)
#include <petscviewer.h>
int main(int argc,char **args)
{
PetscViewer viewer;
PetscInt i;
PetscInitialize(&argc,&args,(char *)0,(char *)0);
PetscViewerCreate(PETSC_COMM_WORLD, &viewer);
PetscViewerSetType(viewer, PETSCVIEWERASCII);
PetscViewerFileSetMode(viewer, FILE_MODE_APPEND);
PetscViewerFileSetName(viewer, "test.txt");
for(i = 0; i <= 5; i++) {
PetscViewerASCIIPrintf(viewer, "test line %d\n", i);
}
PetscViewerDestroy(&viewer);
PetscFinalize();
return 0;
}
PetscViewer Example (1)
This program will append the following text to the file test.txt:
test line 0
test line 1
test line 2
test line 3
test line 4
test line 5
PetscViewer Example (2)
Vectors, index sets and matrices in PETSc
David Horák
Vec v;
VecCreate(MPI_Comm comm,&v);
VecDestroy(&v);
a vector is an array of PetscScalars
the vector object is not completely created in one call, you must at least set sizes:
VecSetSizes(Vec v, PetscInt m, PetscInt M);
Create another vector with the same type and layout:
VecDuplicate(Vec v,Vec *w);
Vec: Vectors
Create a vector from an existing array
Create vector from user provided array:
VecCreateSeqWithArray(MPI_Comm comm,
PetscInt n, const PetscScalar array[],
Vec *v)
VecCreateMPIWithArray(MPI_Comm comm,
PetscInt n, PetscInt N,
const PetscScalar array[], Vec *vv)
Global size can be specified as PETSC_DECIDE.
Local size can be specified as PETSC_DECIDE.
Vector parallel layout
Query vector layout:
VecGetOwnershipRange(Vec x, PetscInt *low,
PetscInt *high)
Create general layout:
PetscSplitOwnership(MPI_Comm comm,PetscInt *n,
PetscInt *N)
Ownership Range
Vec x;
Set all entries of vector to constant value: VecSet(Vec,PetscScalar)
VecSet(x,1.0);
Set individual elements (global indexing!):
VecSetValues(Vec,PetscInt,const PetscInt*,
const PetscScalar*,InsertMode);
PetscInt i = 1; PetscScalar v = 3.14;
VecSetValues(x,1,&i,&v,INSERT_VALUES);
//equivalent to
VecSetValue(x,i,v,INSERT_VALUES);
Setting vector values (1)
Setting vector values (2)
Set more entries at once:
PetscInt ii[2]; PetscScalar vv[2];
ii[0]=1; ii[1]=2; vv[0]=2.7; vv[1]=3.1;
VecSetValues(x,2,ii,vv,INSERT_VALUES);
The last argument can be INSERT_VALUES - replace original value
ADD_VALUES - add to original value
VecSetValues is not collective, values are cached
after setting all values, you must call the assembly routines to exchange values between processors:
VecAssemblyBegin(Vec x);
VecAssemblyEnd(Vec x);
get a copy of entries of x with indices ix to an array y:
VecGetValues(Vec x, PetscInt ni, const PetscInt ix[],
PetscScalar y[])
user must provide an allocated array y
get the pointer to the internal array:
Vec x; PetscScalar *a;
VecGetArray(x,&a);
/* do something with the array */
VecRestoreArray(x,&a);
local entries only; see VecScatter for general (off-process) access
Getting values
PetscInt localsize,first,i;
PetscScalar *a;
VecGetLocalSize(x,&localsize);
VecGetOwnershipRange(x,&first,PETSC_NULL);
VecGetArray(x,&a);
for (i=0; i<localsize; i++)
printf("Vector element %d : %e\n",
(int)(first+i),(double)PetscRealPart(a[i]));
VecRestoreArray(x,&a);
Getting values example
VecAXPY(Vec y,PetscScalar a,Vec x); /* y = y + a*x */
VecAYPX(Vec y,PetscScalar a,Vec x); /* y = a*y + x */
VecScale(Vec x, PetscScalar a);
VecDot(Vec x, Vec y, PetscScalar *r); /* several variants */
VecMDot(Vec x,PetscInt n,const Vec y[],PetscScalar *r);
VecNorm(Vec x,NormType type, PetscReal *r);
VecSum(Vec x, PetscScalar *r);
VecCopy(Vec x, Vec y);
VecSwap(Vec x, Vec y);
Basic operations (1)
VecPointwiseMult(Vec w,Vec x,Vec y);
VecPointwiseDivide(Vec w,Vec x,Vec y);
VecMAXPY(Vec y,PetscInt n, const PetscScalar *a, Vec x[]);
VecMax(Vec x, PetscInt *idx, PetscReal *r);
VecMin(Vec x, PetscInt *idx, PetscReal *r);
VecAbs(Vec x);
VecReciprocal(Vec x);
VecShift(Vec x,PetscScalar s);
Basic operations (2)
Index Set is a set of indices
generalization of an integer array
can be distributed (if comm has more than one process)
general IS:
IS is; PetscInt indices[]={1,3,7}; PetscInt n=3;
ISCreateGeneral(comm,n,indices,PETSC_COPY_VALUES,&is);
/* indices can now be freed */
ISCreateGeneral(comm,n,indices,PETSC_OWN_VALUES,&is);
/* alternative: is takes ownership of indices (which must then come
from PetscMalloc) and frees them when ISDestroy(&is) is called */
IS: Index Sets (1)
IS: Index Sets (2)
stride IS
in MATLAB: is = 0:2:2*(n-1)
in PETSc:
ISCreateStride(comm,n,0,2,&is);
ISDestroy(&is);
Various manipulations: ISSum, ISDifference, ISInvertPermutation
To get the values given by isx from x and put them at positions
determined by isy into y:
VecScatterCreate(Vec x,IS isx,Vec y,IS isy,VecScatter*)
VecScatterBegin(VecScatter,Vec x,Vec y,InsertMode,
ScatterMode)
VecScatterEnd(VecScatter,Vec x,Vec y,InsertMode,
ScatterMode)
VecScatterDestroy(VecScatter*)
IS & VecScatters
Creating a vector and a scatter context that copies all values of MPI
vector vin to each processor into Seq. vector vout :
VecScatterCreateToAll(Vec vin,VecScatter *ctx,Vec *vout)
Creating an output vector and a scatter context used to copy all
values of MPI vector vin into the seq. vector vout on process 0:
VecScatterCreateToZero(Vec vin,VecScatter *ctx,Vec *vout)
Standard sequence follows: VecScatterBegin(), VecScatterEnd(),
VecScatterDestroy()
Other VecScatters
The usual create/destroy calls:
MatCreate(MPI_Comm comm,Mat *A);
MatDestroy(Mat *A);
Several more aspects to creation:
MatSetType(A,MATSEQAIJ); /*or MATMPIAIJ,MATAIJ */
MatSetSizes(Mat A,PetscInt m,PetscInt n,PetscInt M,
PetscInt N);
MatSeqAIJSetPreallocation(Mat B, PetscInt nz,
const PetscInt nnz[]);
Local or global size can be PETSC_DECIDE.
Mat: Matrices
MatCreateSeqAIJ(MPI_Comm comm, PetscInt m, PetscInt n,
PetscInt nz, const PetscInt nnz[],Mat *A);
nz - expected number of nonzeros per row (or slight overestimate)
nnz - array of expected row lengths (or slight overestimates)
considerable savings over dynamic allocation!
Matrix creation all in one
MatCreateMPIAIJ(MPI_Comm comm,PetscInt m,
PetscInt n,PetscInt M,PetscInt N,
PetscInt d_nz,const PetscInt d_nnz[],
PetscInt o_nz,const PetscInt o_nnz[],
Mat *A);
d_nz - # of nonzeros per row in diagonal part
o_nz - # of nonzeros per row in off-diagonal part
d_nnz - array of # of nonzeros per row in diagonal part
o_nnz - array of # of nonzeros per row in off-diagonal part
Matrix creation all in one
Basic matrix types
MATAIJ, MATSEQAIJ, MATMPIAIJ
basic sparse format, known as compressed row format, CRS, Yale
MATAIJ is identical to MATSEQAIJ when constructed with a single process communicator, and MATMPIAIJ otherwise.
MATBAIJ, MATSEQBAIJ, MATMPIBAIJ
extensions of the AIJ formats described above
store matrix elements by fixed-sized dense blocks
intended especially for use with multicomponent PDEs
multiple DOFs per mesh node
MATDENSE, MATSEQDENSE, MATMPIDENSE
dense matrices
MatGetSize(Mat mat, PetscInt *M, PetscInt* N);
MatGetLocalSize(Mat mat, PetscInt *m, PetscInt* n);
MatGetOwnershipRange(Mat A, PetscInt *first_row,
PetscInt *last_row); /* last_row is one past the last local row */
Querying parallel structure
MatGetVecs(Mat mat,Vec *right,Vec *left)
right - vector that the matrix can be multiplied against
left - vector that the matrix vector product can be stored in
both can be PETSC_IGNORE
Compatible vectors
PETSc matrix creation is very flexible
No sparsity pattern
any processor can set any element => potential for lots of malloc calls
malloc is very expensive
tell PETSc the matrix' sparsity structure (do construction loop twice: once counting, once making)
MatSeqAIJSetPreallocation(Mat B,
PetscInt nz, const PetscInt nnz[]);
Matrix Preallocation
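The "loop twice: once counting, once making" idea can be sketched in plain C for a tridiagonal matrix with stencil (-1, 2, -1). The first pass produces the per-row counts (the role played by the nnz array above), the second allocates once and fills. count_row_nonzeros and build_tridiagonal are hypothetical names, and the compressed-row arrays are a textbook layout rather than PETSc's own.

```c
#include <assert.h>
#include <stdlib.h>

/* Pass 1: count the nonzeros in each row of the n x n tridiagonal matrix. */
void count_row_nonzeros(int n, int *nnz)
{
  int i;
  assert(n > 0 && nnz != NULL);
  for (i = 0; i < n; i++) nnz[i] = 1 + (i > 0) + (i < n - 1);
}

/* Pass 2: allocate exactly once from the counts, then fill the rows.
   Returns 0 on success; the caller frees *rowptr, *colidx and *val. */
int build_tridiagonal(int n, int **rowptr, int **colidx, double **val)
{
  int i, k, total;
  int *nnz;
  assert(n > 0 && rowptr != NULL && colidx != NULL && val != NULL);
  nnz = malloc((size_t)n * sizeof *nnz);
  if (nnz == NULL) return 1;
  count_row_nonzeros(n, nnz);

  *rowptr = malloc(((size_t)n + 1) * sizeof **rowptr);
  if (*rowptr == NULL) { free(nnz); return 1; }
  (*rowptr)[0] = 0;
  for (i = 0; i < n; i++) (*rowptr)[i + 1] = (*rowptr)[i] + nnz[i];
  free(nnz);

  total = (*rowptr)[n];
  *colidx = malloc((size_t)total * sizeof **colidx);
  *val = malloc((size_t)total * sizeof **val);
  if (*colidx == NULL || *val == NULL) {
    free(*rowptr); free(*colidx); free(*val);
    return 1;
  }

  for (i = 0; i < n; i++) {
    k = (*rowptr)[i];
    if (i > 0)     { (*colidx)[k] = i - 1; (*val)[k] = -1.0; k++; }
    (*colidx)[k] = i; (*val)[k] = 2.0; k++;
    if (i < n - 1) { (*colidx)[k] = i + 1; (*val)[k] = -1.0; }
  }
  return 0;
}
```

No array is ever resized during the fill, which is the saving that preallocation buys compared with growing storage entry by entry.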
Set one value:
MatSetValue(Mat v, PetscInt i,PetscInt j,
PetscScalar va,InsertMode mode);
where insert mode is INSERT_VALUES, ADD_VALUES
Set logically 2-D array of values:
MatSetValues(Mat A,
PetscInt m, const PetscInt idxm[],
PetscInt n, const PetscInt idxn[],
const PetscScalar values[], InsertMode mode);
Setting values
MatSetValues is not collective, values are cached
MatAssemblyBegin(Mat A,MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(Mat A,MAT_FINAL_ASSEMBLY);
cannot mix inserting and adding values
need to do an assembly in between (MAT_FLUSH_ASSEMBLY) before switching modes
Assembling the matrix
MatGetValues(Mat mat, PetscInt m, const PetscInt
idxm[], PetscInt n, const PetscInt idxn[],
PetscScalar v[])
Gets a block of values given by idxm and idxn from a matrix, only returns a local block
mat - the matrix
v - a logically two-dimensional array for storing the values
m, idxm - the number of rows and their global indices
n, idxn - the number of columns and their global indices
The user must allocate space (m*n PetscScalars) for the values v which are then returned in a row-oriented format, analogous to that used by default in MatSetValues()
Getting Values
Values are often not needed: many matrix operations supported
Matrix elements can only be obtained locally
PetscErrorCode MatGetRow(Mat mat,PetscInt row,
PetscInt *ncols,const PetscInt *cols[],
const PetscScalar *vals[]);
PetscErrorCode MatRestoreRow(/*same parameters*/);
Getting values in array
Extract one parallel submatrix:
MatGetSubMatrix(Mat mat, IS isrow, IS iscol,
MatReuse cll, Mat *newmat)
Extract multiple single-processor matrices:
MatGetSubMatrices(Mat mat, PetscInt n,
const IS irow[], const IS icol[],
MatReuse scall, Mat *submat[])
Collective call, but different index sets per processor
Submatrices
MatTranspose(Mat A, MatReuse reuse, Mat *B)
computes an out-of-place transpose B of a matrix A if reuse=MAT_INITIAL_MATRIX or
an in-place transpose of a matrix A if reuse=MAT_REUSE_MATRIX and B=A
MatMultTranspose()
MatMultTransposeAdd()
MatIsTranspose()
Matrix Transpose
matrix-vector
MatMult(Mat A,Vec in,Vec out);
MatMultAdd
MatMultTranspose
MatMultTransposeAdd
simple operations on matrices
MatNorm
MatScale
MatDiagonalScale
Matrix operations
Implicit matrices
some of the matrix types in PETSc are not stored by elements but they behave like normal matrices in some operations
nomenclature: matrix-free, implicit, not assembled, not formed, not stored ...
the most important operation is a matrix-vector product (MatMult) which can be considered an application of a linear operator
when using an iterative solver, this operation suffices to solve a linear system
matrix type MATTRANSPOSE
implicit transpose of a matrix
maintains pointer to the original matrix
its MatMult just calls MatMultTranspose of an underlying matrix and vice versa
MatTranspose (1)
Mat A, Ati, Ate;
Vec x, yi, ye;
//assemble somehow matrix A and vector x
MatCreateTranspose(A, &Ati);
MatTranspose(A, MAT_INITIAL_MATRIX,&Ate);
MatGetVecs(Ati,PETSC_NULL,&yi); //keep the assembled x, only create yi
VecDuplicate(yi, &ye);
MatMult(Ati,x,yi);
MatMult(Ate,x,ye);
//norm(yi-ye) is close to 0
MatTranspose (2)
MatComposite
Mat F,G;
Mat arr[3] = {C, B, A}; // reverse order!
// F = A*B*C (implicitly)
MatCreateComposite(comm, 3, arr, &F);
MatCompositeSetType(F,
MAT_COMPOSITE_MULTIPLICATIVE);
// G = A+B+C (implicitly)
MatCreateComposite(comm, 3, arr, &G);
MatCompositeSetType(G, MAT_COMPOSITE_ADDITIVE);
matrix type MATCOMPOSITE
implicit matrix sum or product
matrix type MATSHELL
no predefined operation
arbitrary size
any operations can be defined by the user (C function pointers) using MatShellSetOperation function
can have a context with additional data
MatShellSetContext(Mat mat,void *ctx);
MatShellGetContext(Mat mat,void **ctx);
Shell matrices
#undef __FUNCT__
#define __FUNCT__ "mymatmult"
/* user-defined matrix-vector multiply */
PetscErrorCode mymatmult(Mat mat,Vec in,Vec out) {
MyType *matData;
PetscFunctionBegin;
MatShellGetContext(mat,(void**)&matData);
/* compute out from in, using matData */
PetscFunctionReturn(0);
}
Shell matrix example (1)
Shell matrix example (2)
Mat A;
PetscInt m,n,M,N;
MyType Adata;
...
MatCreate(comm,&A);
MatSetSizes(A,m,n,M,N);
MatSetType(A,MATSHELL);
MatShellSetOperation(A,MATOP_MULT,
(void(*)(void)) mymatmult);
MatShellSetContext(A,(void*)&Adata);
...
Linear solvers David Horák
Solving a linear system Ax = b with Gaussian elimination can take a lot of time and memory.
alternative: iterative solvers use successive approx. of the solution:
convergence not always guaranteed
possibly much faster / less memory
basic operation: y = Ax executed once per iteration
convergence can be accelerated by a preconditioner B ≈ A^(-1)
KSP & PC: Iterative solvers
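As a concrete instance of these points, the plain-C sketch below implements preconditioned conjugate gradients, one well-known Krylov method, chosen here only as an example and valid for symmetric positive definite A. The operator A and the preconditioner B are passed as callbacks with a context pointer, so the solver needs nothing but their action on a vector. All names (ApplyFn, pcg_solve, laplacian_apply, jacobi_apply) are assumptions of this sketch, not PETSc API.

```c
#include <assert.h>
#include <stdlib.h>

/* An operator is just "apply to a vector"; ctx carries its private data. */
typedef void (*ApplyFn)(void *ctx, int n, const double *in, double *out);

static double dot(int n, const double *a, const double *b)
{
  double s = 0.0;
  int i;
  for (i = 0; i < n; i++) s += a[i] * b[i];
  return s;
}

/* Preconditioned CG starting from x = 0. Returns the iteration count on
   convergence (||r|| <= rtol*||b||), or -1 on allocation failure or
   when maxit iterations were not enough. */
int pcg_solve(ApplyFn apply_A, void *ctx_A, ApplyFn apply_B, void *ctx_B,
              int n, const double *b, double *x, double rtol, int maxit)
{
  double *r, *z, *p, *q, rz, bb;
  int i, it, result = -1;
  assert(apply_A != NULL && apply_B != NULL && n > 0);
  r = malloc(4 * (size_t)n * sizeof *r);
  if (r == NULL) return -1;
  z = r + n; p = z + n; q = p + n;
  for (i = 0; i < n; i++) { x[i] = 0.0; r[i] = b[i]; }
  bb = dot(n, b, b);
  if (bb == 0.0) { free(r); return 0; }    /* b = 0, so x = 0 is exact */
  apply_B(ctx_B, n, r, z);
  for (i = 0; i < n; i++) p[i] = z[i];
  rz = dot(n, r, z);
  for (it = 1; it <= maxit; it++) {
    double alpha, rz_new;
    apply_A(ctx_A, n, p, q);               /* the one product with A per iteration */
    alpha = rz / dot(n, p, q);
    for (i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * q[i]; }
    if (dot(n, r, r) <= rtol * rtol * bb) { result = it; break; }
    apply_B(ctx_B, n, r, z);
    rz_new = dot(n, r, z);
    for (i = 0; i < n; i++) p[i] = z[i] + (rz_new / rz) * p[i];
    rz = rz_new;
  }
  free(r);
  return result;
}

/* Matrix-free A: tridiagonal stencil (-1, 2, -1); no entries are stored. */
void laplacian_apply(void *ctx, int n, const double *in, double *out)
{
  int i;
  (void)ctx;
  for (i = 0; i < n; i++)
    out[i] = 2.0 * in[i] - (i > 0 ? in[i - 1] : 0.0) - (i < n - 1 ? in[i + 1] : 0.0);
}

/* Jacobi preconditioner B = diag(A)^(-1); ctx points to the diagonal of A. */
void jacobi_apply(void *ctx, int n, const double *in, double *out)
{
  const double *diag = ctx;
  int i;
  for (i = 0; i < n; i++) out[i] = in[i] / diag[i];
}
```

The return value separates success from failure, so the caller must check it before trusting x: reaching maxit leaves x at the last iterate, which may be far from the solution.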
All KSP solvers in PETSc are iterative
direct solvers - one iteration with perfect preconditioning (LU, Cholesky)
Object oriented: solvers only need matrix action, so can handle shell matrices
Preconditioners
Far-reaching control through command-line options
Tolerances
Convergence and divergence reason
Custom monitors and convergence tests
Basic concepts
KSPCreate(comm,&solver);
// general:
KSPSetOperators(solver,A,B,DIFFERENT_NONZERO_PATTERN);
// common:
KSPSetOperators(solver,A,A,DIFFERENT_NONZERO_PATTERN);
// also SAME_NONZERO_PATTERN and SAME_PRECONDITIONER
/* optional, before KSPSolve */ KSPSetUp(solver);
KSPSolve(solver,rhs,sol);
KSPDestroy(&solver); // pointer argument since PETSc 3.2
Iterative solver basics
KSPSetType(solver,KSPGMRES);
KSP can be controlled from the commandline:
KSPSetFromOptions(solver);
/* right before KSPSolve or KSPSetUp */
then the -ksp_... options are parsed, e.g.:
-ksp_type gmres
-ksp_gmres_restart 20
-ksp_view
Solver type
Iterative solvers can fail
solve call itself gives no feedback
solution may be completely wrong
KSPGetConvergedReason(solver,&reason)
positive for convergence, negative for divergence
KSPGetIterationNumber(solver,&nits)
after how many iterations did the method stop?
Convergence
KSPSetTolerances(solver,rtol,atol,dtol,maxit);
Monitors can also be set in code, but easier:
-ksp_monitor
-ksp_monitor_true_residual
Monitors and convergence tests
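How the four tolerances interact can be sketched as a small decision function. This is an assumption-laden illustration modelled on the default test described in the PETSc manual (converged when the residual norm drops below max(rtol*||b||, atol), diverged when it exceeds dtol*||b||); the function `check_convergence` and the reason codes are hypothetical, and only their signs follow the convention from the slides.

```c
/* Hypothetical reason codes: positive = converged, negative = diverged. */
enum {
    ITERATING = 0,
    CONV_RTOL = 1,  /* residual small relative to ||b|| */
    CONV_ATOL = 2,  /* residual small in absolute terms */
    DIV_DTOL  = -1, /* residual grew too much */
    DIV_ITS   = -2  /* iteration limit reached */
};

/* Decide whether to stop, given the current residual norm rnorm,
   the right-hand side norm bnorm and the iteration number it. */
int check_convergence(double rnorm, double bnorm, int it,
                      double rtol, double atol, double dtol, int maxit)
{
    double target = rtol * bnorm;
    if (atol > target) target = atol;

    if (rnorm <= target)       return (rnorm <= atol) ? CONV_ATOL : CONV_RTOL;
    if (rnorm >= dtol * bnorm) return DIV_DTOL;
    if (it >= maxit)           return DIV_ITS;
    return ITERATING;
}
```

A caller would test the sign of the returned value, in the same way the sign of the reason from KSPGetConvergedReason is tested.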
Many options for the (mathematically) sophisticated user, some specific to one method
KSPSetInitialGuessNonzero
KSPGMRESSetRestart
KSPSetPCSide (named KSPSetPreconditionerSide before PETSc 3.2)
KSPSetNormType
Advanced options
MatNullSpace sp;
MatNullSpaceCreate /* constant vector */
(PETSC_COMM_WORLD,PETSC_TRUE,0,PETSC_NULL,&sp);
MatNullSpaceCreate /* general vectors */
(PETSC_COMM_WORLD,PETSC_FALSE,5,vecs,&sp);
KSPSetNullSpace(ksp,sp);
The solver will now properly remove the null space at each iteration.
Null spaces
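Removing a null space from a vector amounts to an orthogonal projection. The following plain-C sketch shows the idea; it is not PETSc code, and the function `remove_nullspace` and its argument layout are assumptions for this illustration. It further assumes that the supplied basis vectors are orthonormal and, when the constant vector is also removed, orthogonal to it.

```c
#include <stddef.h>

/* Remove null-space components from v (length n).
   has_const: if nonzero, remove the constant vector (subtract the mean).
   basis:     k orthonormal vectors stored one after another (k*n doubles);
              may be NULL when k == 0. */
void remove_nullspace(double *v, size_t n, int has_const,
                      const double *basis, size_t k)
{
    if (has_const && n > 0) {
        double mean = 0.0;
        for (size_t i = 0; i < n; i++) mean += v[i];
        mean /= (double)n;
        for (size_t i = 0; i < n; i++) v[i] -= mean;
    }
    for (size_t j = 0; j < k; j++) {
        const double *q = basis + j * n;
        double dot = 0.0;
        for (size_t i = 0; i < n; i++) dot += q[i] * v[i];
        /* v <- v - (q . v) q removes the component along q. */
        for (size_t i = 0; i < n; i++) v[i] -= dot * q[i];
    }
}
```

Applying such a projection to the iterates keeps the solution of a singular but consistent system (for example a pure Neumann problem) free of null-space components.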
PC usually created as part of KSP: separate create and destroy calls exist, but are (almost) never needed
KSP solver; PC precon;
KSPCreate(comm,&solver);
KSPGetPC(solver,&precon);
PCSetType(precon,PCJACOBI);
PCILU, PCJACOBI, PCASM, PCBJACOBI, PCMG, etc.
Controllable through commandline options:
-pc_type ilu -pc_factor_levels 3
PC basics
Iterative method with direct solver as preconditioner would converge in one step
Direct methods in PETSc are implemented as a special iterative method: KSPPREONLY only applies the preconditioner and skips stopping criteria etc.
All direct methods are preconditioner type PCLU:
myprog -pc_type lu -ksp_type preonly \
-pc_factor_mat_solver_package mumps
KSP direct methods
IS isr, isc; MatFactorInfo info;
MatFactorInfoInitialize(&info); // set default factorization options
MatGetOrdering(A,MATORDERING_NATURAL,&isr,&isc);
MatLUFactor(A,isr,isc,&info);
// MatLUFactorSymbolic(), MatLUFactorNumeric()
// MatCholeskyFactor(A, isr, &info);
MatSolve(A,b,x);
MatSolves(Mat A,Vecs bs,Vecs xs)
// Solves A x = b, given a factored matrix, for a collection of vectors
MatMatSolve(Mat A,Mat B,Mat X)
//Solves A X = B, given a factored matrix
Low-level direct methods
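The pattern behind these calls is "factor once, solve many times". A minimal dense plain-C sketch of that pattern follows; it is not the PETSc implementation, uses no pivoting (corresponding to the natural ordering) and the names `lu_factor` and `lu_solve` are hypothetical.

```c
#include <stddef.h>

/* In-place LU factorization without pivoting of a dense n-by-n matrix
   stored row by row. On exit the strict lower triangle holds L (unit
   diagonal implied) and the upper triangle holds U.
   Returns 0 on success, -1 if a zero pivot is met. */
int lu_factor(double *A, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        if (A[k * n + k] == 0.0) return -1;
        for (size_t i = k + 1; i < n; i++) {
            A[i * n + k] /= A[k * n + k];
            for (size_t j = k + 1; j < n; j++)
                A[i * n + j] -= A[i * n + k] * A[k * n + j];
        }
    }
    return 0;
}

/* Solve A x = b using the factors produced by lu_factor. */
void lu_solve(const double *LU, size_t n, const double *b, double *x)
{
    /* Forward substitution: L y = b (y is stored in x). */
    for (size_t i = 0; i < n; i++) {
        x[i] = b[i];
        for (size_t j = 0; j < i; j++) x[i] -= LU[i * n + j] * x[j];
    }
    /* Back substitution: U x = y. */
    for (size_t i = n; i-- > 0;) {
        for (size_t j = i + 1; j < n; j++) x[i] -= LU[i * n + j] * x[j];
        x[i] /= LU[i * n + i];
    }
}
```

The factorization costs O(n^3) for a dense matrix while each subsequent solve costs O(n^2), so reusing the factors for several right-hand sides is much cheaper than solving from scratch each time.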
Krylov Subspace Methods
Using PETSc linear algebra, just add:
KSPSetOperators(KSP ksp, Mat A, Mat M,
MatStructure flag);
KSPSolve(KSP ksp, Vec b, Vec x);
Can access subobjects
KSPGetPC(KSP ksp, PC *pc)
Preconditioners must obey PETSc interface
Basically just the KSP interface
Can change solver dynamically from the command line
-ksp_type bicgstab
Linear solvers - summary
Newton and Picard Methods
Using PETSc linear algebra, just add:
SNESSetFunction(SNES snes,Vec r,residualFunc,
void *ctx);
SNESSetJacobian(SNES snes, Mat A, Mat M,
jacFunc,void *ctx);
SNESSolve(SNES snes, Vec b, Vec x);
Can access subobjects
SNESGetKSP(SNES snes, KSP *ksp)
Can customize subobjects from the cmd line
Set the subdomain preconditioner to ILU with -sub_pc_type ilu
Nonlinear solvers - summary
1 Sequential LU
ILUDT (SPARSEKIT2, Yousef Saad, U of MN)
EUCLID & PILUT (Hypre, David Hysom, LLNL)
ESSL (IBM)
SuperLU (Jim Demmel and Sherry Li, LBNL)
Matlab
UMFPACK (Tim Davis, U. of Florida)
LUSOL (MINOS, Michael Saunders, Stanford)
2 Parallel LU
MUMPS (Patrick Amestoy, IRIT)
SPOOLES (Cleve Ashcraft, Boeing)
SuperLU_Dist (Jim Demmel and Sherry Li, LBNL)
3 Parallel Cholesky
DSCPACK (Padma Raghavan, Penn. State)
MUMPS (Patrick Amestoy, Toulouse)
CHOLMOD (Tim Davis, Florida)
3rd party direct solvers in PETSc
1 Parallel ICC
BlockSolve95 (Mark Jones and Paul Plassmann, ANL)
2 Parallel ILU
PaStiX (Mathieu Faverge, INRIA)
3 Parallel Sparse Approximate Inverse
Parasails (Hypre, Edmund Chow, LLNL)
SPAI 3.0 (Marcus Grote and Barnard, NYU)
4 Sequential Algebraic Multigrid
RAMG (John Ruge and Klaus Stüben, GMD)
SAMG (Klaus Stüben, GMD)
5 Parallel Algebraic Multigrid
Prometheus (Mark Adams, PPPL)
BoomerAMG (Hypre, LLNL)
ML (Trilinos, Ray Tuminaro and Jonathan Hu, SNL)
3rd party preconditioners in PETSc
DM: Data management and grid manipulation
SNES: Nonlinear solvers
TS: Time stepping
PETSc components not covered in this tutorial
Debugging & profiling
Launch the debugger
-start_in_debugger [gdb,dbx,noxterm]
-on_error_attach_debugger [gdb,dbx,noxterm]
Attach debugger only to some parallel processes: -debugger_nodes 0,1
Put a breakpoint in PetscError() to catch errors as they occur
Debugging - stepping
PETSc tracks memory overwrites at both ends of arrays
the CHKMEMQ macro causes a check of all allocated memory
track memory overwrites by bracketing them with CHKMEMQ
PETSc checks for leaked memory
use PetscMalloc() and PetscFree() for all allocation
print unfreed memory on PetscFinalize() with -malloc_dump
Simply the best tool today is valgrind (http://www.valgrind.org)
it checks memory access, cache performance, memory usage...
needs --trace-children=yes when running under MPI
Debugging - memory checking
PETSc has integrated profiling (timing, flops, memory usage, MPI messages)
Option -log_summary prints a report on PetscFinalize()
PETSc allows user-defined events
PetscLogEventRegister(), PetscLogEventBegin/End()
to create and to manage events reporting time, calls, flops, communication, etc.
Memory usage is tracked by object
Events may also be nested and will aggregate in a nested fashion
Profiling is separated into stages
PetscLogStageRegister(), PetscLogStagePush/Pop()
to create and to manage stages identified by an integer handle
Stages may be nested, but will not aggregate in a nested fashion
Profiling
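What "events aggregate in a nested fashion" means can be shown with a toy flop counter in plain C. This is a sketch only and not the PETSc logging code; `event_register`, `event_begin`, `event_end`, `log_flops` and the accessor functions are hypothetical names. Flops logged while an inner event is active are also credited to every enclosing event.

```c
#define MAX_EVENTS 8

typedef struct {
    const char *name;
    long        calls;
    double      flops;
} Event;

static Event events[MAX_EVENTS];
static int   n_events = 0;
static int   active[MAX_EVENTS]; /* stack of currently open events */
static int   depth = 0;

/* Register a new event; returns its integer handle or -1 if the table is full. */
int event_register(const char *name)
{
    if (n_events >= MAX_EVENTS) return -1;
    events[n_events].name  = name;
    events[n_events].calls = 0;
    events[n_events].flops = 0.0;
    return n_events++;
}

/* Open an event: count the call and push it on the stack. */
void event_begin(int id)
{
    events[id].calls++;
    if (depth < MAX_EVENTS) active[depth++] = id;
}

/* Close an event: pop it if it is the innermost open one. */
void event_end(int id)
{
    if (depth > 0 && active[depth - 1] == id) depth--;
}

/* Credit n flops to every open event, so outer events include inner work. */
void log_flops(double n)
{
    for (int d = 0; d < depth; d++) events[active[d]].flops += n;
}

double event_flops(int id) { return events[id].flops; }
long   event_calls(int id) { return events[id].calls; }
```

With an outer "solve" event wrapping an inner "matmult" event, work logged inside the inner event shows up in both totals, whereas stages, by contrast, are described above as not aggregating in this way.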
output of -log_summary (figure not reproduced here)
Example profiling
References
Introduction to PETSc, TACC, Jan 17, 2012 (Victor Eijkhout). Slides
Short Course at the Graduate University, Chinese Academy of Sciences, Beijing, China, July 2010 (Matthew Knepley). Slides
Tutorial at ICES, UT Austin, TX September 2011 (Matthew Knepley). Slides
PETSc homepage, http://www.mcs.anl.gov/petsc/
PETSc Users Manual, http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf
PETSc Developer Guide, http://www.mcs.anl.gov/petsc/developers/developers.pdf
Thank you for your attention!