
LISZT - Stanford University · forum.stanford.edu/events/posterslides/LisztaDSLforMesh... · 2010-05-26


LISZT! A DSL for Mesh-Based PDEs

Z. DeVito, M. Medina, N. Joubert, M. Barrientos, E. Elsen, S. Oakley, J. Alonso, E. Darve, F. Ham, P. Hanrahan

GPGPU RUNTIME

Liszt Code

val Flux = FieldWithConst[Cell,Float](0.f)
while (t < 2.f) {
  for (f <- interior_set) {
    val normal = face_unit_normal(f)
    val vDotN = dot(globalVelocity, normal)
    val area = face_area(f)
    var flux = 0.f
    val cell = if (vDotN >= 0.f) inside(f) else outside(f)
    flux = area * vDotN * Phi(cell)
    Flux(inside(f) : Cell) -= flux
    Flux(outside(f) : Cell) += flux
  }
  for (f <- inlet_set) {
    val area = face_area(f)
    val vDotN = dot(globalVelocity, face_unit_normal(f))
    Flux(outside(f) : Cell) += area * vDotN * phi_sine_function(t)
  }
}
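The interior-face loop above scatters each face's upwind flux into the two cells on either side of the face. As an illustrative sketch only (plain Python, not Liszt; all names are hypothetical stand-ins for Liszt fields), the same pattern on a 1D mesh with unit face areas and normals:

```python
# Illustrative sketch of the face-based flux scatter above (not Liszt code).
# 1D mesh: cells i and i+1 share interior face i; a uniform velocity plays
# the role of dot(globalVelocity, normal) with unit areas and normals.

def scatter_fluxes(phi, velocity):
    n = len(phi)                      # number of cells
    flux = [0.0] * n                  # Flux field, zero-initialized like FieldWithConst
    for f in range(n - 1):            # interior faces
        # upwind: take phi from the cell the flow is coming from
        cell = f if velocity >= 0.0 else f + 1
        df = velocity * phi[cell]
        flux[f]     -= df             # inside(f) loses flux
        flux[f + 1] += df             # outside(f) gains flux
    return flux

print(scatter_fluxes([1.0, 2.0, 3.0], 0.5))   # -> [-0.5, -0.5, 1.0]
```

Note that two different faces write to the same cell, which is exactly the write conflict the GPU runtime must eliminate.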

val Flux = FieldWithConst[Cell,Float](0.f)
def determineInclusions() : Unit = {
  for (f <- inlet_set)    { isInlet_face(f) = 1 }
  for (f <- interior_set) { isInterior_face(f) = 1 }
}
for (c <- cells(mesh)) {
  for (f <- faces(c)) {
    if (isInterior_face(f) > 0) {
      val normal = face_unit_normal(f)
      val vDotN = dot(globalVelocity, normal)
      val area = face_area(f)
      val cell = if (vDotN >= 0.f) inside(f) else outside(f)
      var flux = area * vDotN * Phi(cell)
      if (ID(c) == insideID(f))  Flux(c) -= flux
      if (ID(c) == outsideID(f)) Flux(c) += flux
    }
  }
}
for (c <- cells(mesh)) {
  for (f <- faces(c)) {
    if (isInlet_face(f) > 0) {
      if (ID(c) == outsideID(f)) {
        val area = face_area(f)
        val vDotN = dot(globalVelocity, face_unit_normal(f))
        Flux(c) += area * vDotN * phi_sine_function(t)
      }
    }
  }
}

Liszt GPU Code

IMPLICIT METHODS

ARCHITECTURE

Liszt is a domain specific language that exposes a high-level interface for building mesh-based solvers of PDEs. This frees scientists from architecture-specific implementations and increases programmer productivity tenfold. Current PSAAP solvers are tied to a specific platform, while Liszt solvers are portable across architectures. Our compiler achieves this by using domain knowledge in its program analysis stage to produce high performance code for a variety of platforms.

Liszt has a stable implementation for finite difference methods with a fully functional MPI-based backend. Liszt now supports implicit methods by providing native sparse matrix operations, as used by our implementation of the Joe RANS solver. Program transformations for our GPU runtime are in development, and our preliminary GPU runtime provides explicit finite difference support. A full stack of debugging, visualization, and compiler tools is now available.

OVERVIEW

State-of-the-art finite element and finite difference methods use implicit solvers to provide stability and performance. Implicit methods depend on global solves of sparse matrices. Liszt has added language-level support for solving sparse matrices, and integrates the PETSc solver as a backend.
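The pattern described here can be sketched in a few lines: assemble a sparse matrix whose nonzeros follow the mesh topology, then hand it to an iterative solver. In this hedged illustration SciPy's GMRES stands in for PETSc (Liszt's actual backend), and the 1D mesh and matrix entries are invented for the example:

```python
# Sketch of an implicit solve: assemble a sparse matrix tied to a 1D mesh
# topology, then solve with GMRES. SciPy stands in for PETSc here; the real
# Liszt backend hands the assembled system to PETSc.
import numpy as np
from scipy.sparse import lil_matrix, csr_matrix
from scipy.sparse.linalg import gmres

n = 8                                    # cells in a hypothetical 1D mesh
A = lil_matrix((n, n))
for f in range(n - 1):                   # one contribution per interior face
    c0, c1 = f, f + 1                    # cells on either side of face f
    A[c0, c0] += 2.0; A[c1, c1] += 2.0   # diagonal (self) coupling
    A[c0, c1] -= 1.0; A[c1, c0] -= 1.0   # off-diagonal (neighbor) coupling
A[0, 0] += 1.0; A[n - 1, n - 1] += 1.0   # boundary terms keep A nonsingular
A = csr_matrix(A)                        # convert for fast mat-vec products

rhs = np.ones(n)
phi, info = gmres(A, rhs)                # info == 0 signals convergence
res = np.linalg.norm(A @ phi - rhs)
print("converged:", info == 0, "residual:", res)
```

Because every nonzero of A corresponds to a cell pair sharing a face, the assembly loop is just another mesh traversal, which is what lets Liszt expose it at the language level.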

Sparse matrices are tied to the topology of the mesh, allowing for simple referencing. Implicit formulations of finite difference methods have a regular matrix structure, which Liszt currently supports. Higher-order finite element methods require multiple, different submatrices per element in their matrix formulations; support for these is currently in development.

The implicit version of Joe has been ported to Liszt, reducing its codebase from 3106 lines to 1520 lines (disregarding the 20,000+ lines of MPI boilerplate code in C++ Joe). MPI performance is comparable for the explicit and implicit versions of Joe.

The Liszt framework cross-compiles Scala-embedded DSL code to C++. Three implementations of the runtime exist: an MPI-based runtime for clusters, an OpenMP-based runtime for SMPs, and a preliminary GPU backend.

The GPU backend implements gathers and reductions in native NVidia C, and manages mesh and field data on the GPU. The JIT phase for the GPU performs transformations to convert standard scatter-based operations into gathers, allowing arbitrary code to be executed on the GPU.
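The scatter-to-gather transformation can be sketched in plain Python (hypothetical names; not the actual JIT output): each cell loops over its own faces and keeps only its share of each face's flux, so no two loop iterations write to the same cell, which is what makes the loop safe to run as one GPU thread per cell.

```python
# Sketch of the scatter-to-gather transformation described above (not the
# actual Liszt JIT output). Each cell visits the faces that touch it and
# accumulates only its own share of each face's flux, so iterations over
# cells are independent -- the property that makes GPU execution safe.

def gather_fluxes(phi, velocity):
    n = len(phi)
    flux = [0.0] * n
    for c in range(n):                    # one independent iteration per cell
        for f in (c - 1, c):              # faces touching cell c (1D mesh)
            if f < 0 or f >= n - 1:
                continue                  # skip nonexistent boundary faces
            cell = f if velocity >= 0.0 else f + 1   # upwind cell
            df = velocity * phi[cell]
            if c == f:                    # c is inside(f)
                flux[c] -= df
            if c == f + 1:                # c is outside(f)
                flux[c] += df
    return flux

# Produces the same values as the scatter form for the same inputs.
print(gather_fluxes([1.0, 2.0, 3.0], 0.5))   # -> [-0.5, -0.5, 1.0]
```

The cost of the transformation is redundant work (each face's flux is computed twice, once per adjacent cell), traded for conflict-free parallel writes.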

CURRENT AND FUTURE WORK

We are currently working on:
• DSL advances through Polymorphic Embedding
• GPGPU-specific loop transformations
• FEM & DG support through canonical elements

Future work:
• Release private beta at upcoming Codeathon
• Uncertainty quantification support
• Transformations between scatters, gathers & reduces
• A hybrid runtime combining MPI and GPGPU

double (*A)[5][5] = new double[ncv][5][5];
double (*phi)[5]  = new double[ncv][5];
double (*rhs)[5]  = new double[ncv][5];
for (int ifa = 0; ifa < nfa; ifa++) {
   int icv0 = cvofa[ifa][0];
   int icv1 = cvofa[ifa][1];
   int noc00, noc01, noc11, noc10;
   getImplDependencyIndex(noc00, noc01, noc11, noc10, icv0, icv1);
   calcEulerFluxMatrices_HLLC(Apl, Ami);
   for (int i = 0; i < 5; i++)
      for (int j = 0; j < 5; j++) {
         A[noc00][i][j] += Apl[i][j];
         A[noc01][i][j] += Ami[i][j];
      }
}
int *nbocv_v_global = new int[ncv_g];
for (int icv = 0; icv < ncv; icv++)
   nbocv_v_global[icv] = cvora[mpi_rank] + icv;
updateCvData(nbocv_v_global, REPLACE_DATA);
PetscSolver petscSolver(..., cvora, nbocv_i, nbocv_v, 5);
petscSolver.solveGMRES(A, phi, rhs, cvora, nbocv_i,
                       nbocv_v, nbocv_v_global, 5, ...);

val A = new SparseMatrix[Float]
var phi = new SparseVector[Float]
val rhs = new SparseVector[Float]
for (c <- cells(mesh)) {
  for (f <- faces(c)) {
    val Apl = AplMatrixStorageField(f)
    val Ami = AmiMatrixStorageField(f)
    val cc = inside(f)
    A(c,c) += Apl
    A(c,cc) += Ami
  }
}
phi = A/rhs

Joe Implicit Code
Liszt Implicit Code
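The A(c,c) += Apl and A(c,cc) += Ami accumulations above add one 5x5 flux-Jacobian block per face visit. A hypothetical NumPy sketch of that block assembly (the 1D mesh, the inside() convention, and the block values are all invented for illustration; in Joe the blocks come from calcEulerFluxMatrices_HLLC):

```python
# Sketch of per-face 5x5 block assembly, mirroring A(c,c) += Apl and
# A(c,cc) += Ami in the Liszt implicit code above. All values hypothetical.
import numpy as np

ncells, nvars = 4, 5
A = np.zeros((ncells, ncells, nvars, nvars))   # dense stand-in for SparseMatrix
Apl = 2.0 * np.eye(nvars)                      # invented flux-Jacobian blocks
Ami = -1.0 * np.eye(nvars)

def faces_of(c):       # faces touching cell c on a 1D mesh (face f joins cells f, f+1)
    return [f for f in range(ncells - 1) if f == c or f + 1 == c]

def inside(f):         # hypothetical convention: lower-numbered cell is inside(f)
    return f

for c in range(ncells):
    for f in faces_of(c):
        A[c, c] += Apl                # A(c,c)  += Apl
        A[c, inside(f)] += Ami        # A(c,cc) += Ami

print(A[1, 1, 0, 0], A[1, 0, 0, 0])   # -> 3.0 -1.0
```

Because each (cell, cell) block index is derived from mesh adjacency, the DSL can infer the sparsity pattern for PETSc without the explicit noc00/noc01 index bookkeeping visible in the Joe version.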

[Architecture diagram: SC.scala is compiled by the Scala Compiler to SC.class; the Liszt JIT, configured by liszt.cfg, performs Program Analysis and Platform-specific Transforms, then MPI, GPU, and SMP Codegen. The MPI build compiles with MPICXX and runs under mpirun; the GPU build compiles with NVCC to ptx; the SMP build compiles with GCC and runs on threads; a Viz build compiles with GCC and uses Cocoa.]

!"

#"

$%"

&'"

(&"

!" #" $%" &'" (&"

Sp

eed

up

over

Scala

r

Number of nodes

Joe Explicit Euler

!"

#"

$%"

&'"

(&"

!" #" $%" &'" (&"

Sp

eed

up

over

Scala

r

Number of nodes

Joe Implicit Euler

LisztViz is an extension of our single-core runtime that provides mesh visualization of the simulation system. LisztViz eases debugging by making all symbols visible through watchpoints in the execution stream.

The GPU implementation demands separating code into CPU drivers and GPU kernels, managing memory transfers, and transforming types. This happens in two passes: "transform" and "codegen".

PERFORMANCE