Implementation of Deflated Preconditioned Conjugate Gradient on the GPU

Rohit Gupta

Responsible Professor: Prof. Dr. Ir. Henk Sips

Supervisors: Prof. Dr. Ir. Kees Vuik and Ir. C.W.J. Lemmens



Two-Phase Fluid Flow

Model for Two Phase Computation

Boundary Conditions

$$
\begin{pmatrix}
4 & -1 & & & & & & \\
-1 & 4 & -1 & & & & & \\
 & -1 & 4 & -1 & & & & \\
 & & -1 & 4 & 0 & & & \\
 & & & 0 & 4 & -1 & & \\
 & & & & -1 & 4 & -1 & \\
 & & & & & -1 & 4 & -1 \\
 & & & & & & -1 & 4
\end{pmatrix}
$$

$$
\begin{pmatrix}
3 & 0 & & & & & & \\
0 & 1.5015 & -0.5005 & & & & & \\
 & -0.5005 & 2.002 & -0.5005 & & & & \\
 & & -0.5005 & 2.002 & -0.5005 & & & \\
 & & & -0.5005 & 2.002 & -0.5005 & & \\
 & & & & -0.5005 & 2.002 & -0.5005 & \\
 & & & & & -0.5005 & 2.002 & -0.5005 \\
 & & & & & & -0.5005 & 2.002
\end{pmatrix}
$$


GPU Architecture


Memory Bandwidth


Launch Configuration

Key Optimizations

• Coalescing
• Minimum Memory Transfers
• Minimum Divergence
• Caching

Related Work

• SpMV Kernel
• Preconditioning Kernels
• Deflation Kernels


Two-Phase Fluid Flow

Implementation GPU

• SpMV Kernel
• Preconditioning Kernels
• Deflation Kernels

Implementation CPU

• Meschach / GotoBLAS
• Compiler Options
• Same Data Structures

Flavors

• Conjugate Gradient (CG)
• CG with Preconditioning
• CG with Preconditioning and Deflation
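The first flavor, plain CG, can be sketched in a few lines. This is a minimal CPU illustration, not the GPU implementation from the talk: it assumes a dense list-of-lists matrix, and the names `conjugate_gradient`, `matvec`, `dot`, and `tridiag` are hypothetical helpers introduced only for this example.

```python
# Plain conjugate gradient for a symmetric positive definite system A x = b.
# Illustrative sketch only: dense Python lists, no GPU, hypothetical names.

def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Return (x, iterations) with ||b - A x||_2 <= tol * ||b||_2."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x for the zero initial guess
    p = list(r)                 # first search direction
    rs_old = dot(r, r)
    b_norm = dot(b, b) ** 0.5
    for k in range(max_iter):
        if rs_old ** 0.5 <= tol * b_norm:
            return x, k
        Ap = matvec(A, p)       # the SpMV step in a sparse implementation
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter

def tridiag(n, diag=4.0, off=-1.0):
    """Small tridiagonal SPD test matrix (assumed test problem)."""
    return [[diag if i == j else off if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]
```

Calling `conjugate_gradient(tridiag(8), b)` returns the approximate solution together with the number of iterations used. The other two flavors keep this loop and change only how the residual is transformed before it becomes a search direction.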

Preconditioning

• Diagonal
• Incomplete Cholesky
• Incomplete Poisson
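Two of these preconditioners are simple enough to sketch on the CPU. The example below assumes that Incomplete Poisson means the commonly cited form $M^{-1} = K K^T$ with $K = I - L D^{-1}$, where $L$ is the strict lower triangle of $A$ and $D$ its diagonal; the slides do not spell out the formula, so treat that as an assumption. All function names (`pcg`, `make_diagonal`, `make_incomplete_poisson`, `tridiag`) are hypothetical.

```python
# Preconditioned CG with two of the preconditioners named on the slide.
# Dense Python lists and all function names are illustrative assumptions.

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def make_diagonal(A):
    """Diagonal (Jacobi) preconditioner: z = D^{-1} r."""
    d = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, d)]

def make_incomplete_poisson(A):
    """Assumed form: M^{-1} = K K^T, K = I - L D^{-1}, L = strict lower part of A."""
    n = len(A)
    d = [A[i][i] for i in range(n)]

    def apply(r):
        # y = K^T r = r - D^{-1} L^T r
        y = [r[i] - sum(A[j][i] * r[j] for j in range(i + 1, n)) / d[i]
             for i in range(n)]
        # z = K y = y - L D^{-1} y
        return [y[i] - sum(A[i][j] * y[j] / d[j] for j in range(i))
                for i in range(n)]
    return apply

def pcg(A, b, apply_minv, tol=1e-10, max_iter=1000):
    """Preconditioned CG; apply_minv(r) returns M^{-1} r."""
    x = [0.0] * len(b)
    r = list(b)
    z = apply_minv(r)
    p = list(z)
    rz_old = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        Ap = matvec(A, p)
        alpha = rz_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_minv(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz_old) * pi for zi, pi in zip(z, p)]
        rz_old = rz_new
    return x, max_iter

def tridiag(n, diag=4.0, off=-1.0):
    """Small tridiagonal SPD test matrix (assumed test problem)."""
    return [[diag if i == j else off if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]
```

In this form the Incomplete Poisson preconditioner needs no triangular solve: applying it is two products with explicitly known factors, which is the usual argument for it on parallel hardware. Incomplete Cholesky, by contrast, requires forward and backward substitution and is omitted here.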

Deflation

• Important Operations
• Subdomains
• Deflation Kernels
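A small sketch may help connect subdomains to the deflation operations. It assumes the standard subdomain deflation setup, which the slides do not state explicitly: each column of $Z$ is 1 on one subdomain and 0 elsewhere, $E = Z^T A Z$, and the deflation operator is $P = I - A Z E^{-1} Z^T$. The helper names (`subdomain_vectors`, `solve_dense`, `make_deflation`) are hypothetical.

```python
# Subdomain deflation sketch: P r = r - A Z E^{-1} Z^T r, with E = Z^T A Z.
# Dense Python lists; every helper name here is an illustrative assumption.

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def subdomain_vectors(n, num_sub):
    """Columns of Z: column j is 1 on subdomain j and 0 elsewhere."""
    size = n // num_sub
    cols = []
    for j in range(num_sub):
        lo = j * size
        hi = n if j == num_sub - 1 else lo + size
        cols.append([1.0 if lo <= i < hi else 0.0 for i in range(n)])
    return cols

def solve_dense(E, rhs):
    """Gaussian elimination with partial pivoting for the small system E y = rhs."""
    m = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(E)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    y = [0.0] * m
    for r in reversed(range(m)):
        s = M[r][m] - sum(M[r][k] * y[k] for k in range(r + 1, m))
        y[r] = s / M[r][r]
    return y

def make_deflation(A, Z):
    """Precompute A Z and E once, then return a function applying P."""
    AZ = [matvec(A, z) for z in Z]                      # columns of A Z
    E = [[dot(zi, azj) for azj in AZ] for zi in Z]      # E = Z^T A Z

    def apply_P(r):
        y = solve_dense(E, [dot(z, r) for z in Z])      # y = E^{-1} Z^T r
        out = list(r)
        for yj, az in zip(y, AZ):                       # out = r - A Z y
            out = [oi - yj * ai for oi, ai in zip(out, az)]
        return out
    return apply_P
```

Precomputing `AZ` and `E` outside the iteration is the design choice worth noting: inside the solver, each application of `P` then costs a few inner products, one small solve, and one product with `AZ`.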

Methodology

• Step-by-Step Optimizations
• Profiler for Advice
• Modularity and Readability

Conjugate Gradient - Vanilla

• DIA Format is best
• SpMV dominates GPU execution

[Equation: relative error $\frac{\|X_k - X_{exact}\|_2}{\|X_k\|_2}$, stated against powers of ten with exponents 5 and 3]
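The DIA (diagonal) storage format named above keeps one dense array per non-zero diagonal plus a list of offsets. The sketch below shows the layout and the matching SpMV on the CPU; it is not the talk's kernel, and `dense_to_dia` and `dia_matvec` are hypothetical names. On a GPU, a common mapping is one thread per row, so neighbouring threads read neighbouring entries of each diagonal.

```python
# DIA (diagonal) storage and sparse matrix-vector product, CPU sketch.
# Function names are illustrative assumptions, not taken from the slides.

def dense_to_dia(A):
    """Return (offsets, data) where data[d][i] == A[i][i + offsets[d]].

    Positions that fall outside the matrix are padded with 0.0.
    """
    n = len(A)
    offsets = sorted({j - i for i in range(n) for j in range(n) if A[i][j] != 0.0})
    data = [[A[i][i + off] if 0 <= i + off < n else 0.0 for i in range(n)]
            for off in offsets]
    return offsets, data

def dia_matvec(offsets, data, x):
    """y = A x using only the stored diagonals."""
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, data):
        # Rows for which column i + off lies inside the matrix.
        for i in range(max(0, -off), min(n, n - off)):
            y[i] += diag[i] * x[i + off]
    return y
```

For a tridiagonal matrix this stores three arrays of length `n` with offsets `[-1, 0, 1]` and needs no column-index array, which keeps memory traffic low when the non-zeros really do sit on a few diagonals.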

Preconditioning: Block Incomplete Cholesky

• Preconditioning dominates execution
• More Blocks – Better SpeedUp
• Shared Memory Use restricted

Deflation

• Speed Up recovers
• Preconditioning gets company in $AZE^{-1}b$
• More Deflation Vectors – More Speed Up

Deflated Preconditioned Conjugate Gradient

• Storage of $AZ$ optimized
• Speed Up Suffers – CPU performs better
• Optimizations on CPU

Deflated Preconditioned Conjugate Gradient

• Incomplete Poisson Preconditioning
• Speed Up favoring move

Relative error (appears twice on the slide): $\frac{\|X_k - X_{exact}\|_2}{\|X_k\|_2}$

Deflated Preconditioned Conjugate Gradient

• Matrix Vector Multiplication $E^{-1}b$ optimized
• Kernels utilize 85% of bandwidth
• Clubbing Kernels – Reducing Memory Traffic

Two Phase Matrix

• Density Contrast 1000:1
• False Stop

[Equation: relative error $\frac{\|X_k - X_{exact}\|_2}{\|X_k\|_2}$, stated against a power of ten with exponent 1]
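One plausible reading of "False Stop" is that a residual-based stopping test is satisfied while the true error is still large, which becomes more likely as the coefficient contrast grows. The toy example below illustrates that effect only; the 2x2 diagonal system and the iterate `x_k` are made up, and the relative error is normalised by the norm of the iterate, which is an assumption about the slide's formula.

```python
# Illustration of how a residual-based stopping test can trigger early.
# The 2x2 diagonal system and the iterate below are made up for this example.

def norm2(v):
    return sum(vi * vi for vi in v) ** 0.5

def relative_residual(A, x, b):
    """||b - A x||_2 / ||b||_2, the quantity a solver can actually monitor."""
    r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
    return norm2(r) / norm2(b)

def relative_error(x, x_exact):
    """||x - x_exact||_2 / ||x||_2, normalised by the iterate (assumption)."""
    return norm2([xi - ei for xi, ei in zip(x, x_exact)]) / norm2(x)

A = [[1.0, 0.0], [0.0, 1.0e-3]]   # coefficient contrast 1000:1
x_exact = [1.0, 1.0]
b = [1.0, 1.0e-3]                 # b = A x_exact
x_k = [1.0, 0.0]                  # hypothetical iterate: second unknown untouched

res = relative_residual(A, x_k, b)
err = relative_error(x_k, x_exact)
print(f"relative residual = {res:.3e}, relative error = {err:.3e}")
```

Here the relative residual is about three orders of magnitude smaller than the relative error, so a loose tolerance on the residual would stop the iteration long before the solution is accurate.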

Performance Timeline

[Figure: Speed Up (axis from 0 to 30) over Iterative Optimizations; legend: SpeedUp Factors]

Analysis

• Very close to possible peak performance
• Bandwidth bound Kernels
• Platform Utilization – Startling Facts

Future Work

• Multi-GPU Multi-CPU
• 3D Domains, More Interfaces, Mixed Precision
• Different Preconditioners, Grid Types


Conclusions

• Deflation suits the many core platform
• Two Phase accuracy suffers
• Deflation with IP Preconditioning
