KIT – The Research University in the Helmholtz Association
Hartwig Anzt, Terry Cojean, Yuhsiang M. Tsai
Steinbuch Centre for Computing (SCC)
www.kit.edu
The Sparse Matrix Vector Product on High-End GPUs
SIAM Conference on Parallel Processing for Scientific Computing (PP20)
February 12-15, 2020
Hyatt Regency Seattle | Seattle, Washington, U.S.
Mike Tsai
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration and the Helmholtz Impuls und Vernetzungsfond VH-NG-1241.
02/13/2020 Hartwig Anzt: The Sparse Matrix Vector Product on High-End GPUs
SpMV on GPUs – Moving away from the NVIDIA hegemony
• In the past, NVIDIA GPUs dominated the GPGPU market;
• We see an increasing adoption of AMD GPUs in leadership supercomputers:
  • Frontier system in Oak Ridge (2021)
  • El Capitan in Lawrence Livermore National Lab? (2023)
• AMD is heavily investing in the HIP software development ecosystem;
  • HIP programming is similar to CUDA programming;
  • HIP libraries are similar to cuBLAS, cuSPARSE, …
• The race is on!
• How can we prepare the Ginkgo sparse linear algebra library for cross-platform portability?
• Are the CUDA-optimized kernels suitable for AMD GPUs?
• How does the performance compare across different GPUs?
Extend Ginkgo’s hardware scope to AMD GPUs
• Core: library infrastructure and algorithm implementations (iterative solvers, preconditioners, …).
  • The library core contains the architecture-agnostic algorithm implementation;
  • Runtime polymorphism selects the right kernel depending on the target architecture;
• Kernels: architecture-specific kernels execute the algorithm on the target architecture;
  • Reference kernels (SpMV, solver kernels, preconditioner kernels, …): sequential kernels to check the correctness of the algorithm design and of the optimized kernels;
  • OpenMP kernels (SpMV, solver kernels, preconditioner kernels, …): optimized architecture-specific kernels;
  • CUDA-GPU kernels (SpMV, solver kernels, preconditioner kernels, …): optimized architecture-specific kernels;
https://github.com/ginkgo-project/ginkgo
Part of https://xsdk.info/
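The layering on this slide can be illustrated with a small, self-contained C++ sketch. It is not Ginkgo's actual API: the names Executor, ReferenceExecutor, UnrolledExecutor and core_update are hypothetical, and a manually unrolled CPU loop merely stands in for an optimized backend such as OpenMP or CUDA. The point is only the structure: the core routine talks to one interface, the kernel is selected at runtime, and the sequential reference kernel serves as the correctness check for the optimized one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface (not Ginkgo's API): every backend offers the
// same kernel signature, here y = alpha * x + y.
struct Executor {
    virtual ~Executor() = default;
    virtual std::string name() const = 0;
    virtual void axpy(double alpha, const std::vector<double>& x,
                      std::vector<double>& y) const = 0;
};

// Sequential reference kernel: simple enough to be trusted.
struct ReferenceExecutor final : Executor {
    std::string name() const override { return "reference"; }
    void axpy(double alpha, const std::vector<double>& x,
              std::vector<double>& y) const override
    {
        for (std::size_t i = 0; i < x.size(); ++i) {
            y[i] += alpha * x[i];
        }
    }
};

// Stand-in for an optimized, architecture-specific kernel
// (assumption: manual 4-way unrolling plays the role of the optimization).
struct UnrolledExecutor final : Executor {
    std::string name() const override { return "unrolled"; }
    void axpy(double alpha, const std::vector<double>& x,
              std::vector<double>& y) const override
    {
        const std::size_t n = x.size();
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            y[i] += alpha * x[i];
            y[i + 1] += alpha * x[i + 1];
            y[i + 2] += alpha * x[i + 2];
            y[i + 3] += alpha * x[i + 3];
        }
        for (; i < n; ++i) {  // remainder loop
            y[i] += alpha * x[i];
        }
    }
};

// Architecture-agnostic core routine: it only sees the Executor
// interface, so the virtual call picks the kernel at runtime.
std::vector<double> core_update(const Executor& exec, double alpha,
                                const std::vector<double>& x,
                                std::vector<double> y)
{
    assert(x.size() == y.size());
    exec.axpy(alpha, x, y);
    return y;
}
```

Calling core_update once with a ReferenceExecutor and once with an UnrolledExecutor on the same input and comparing the two results mirrors the role the slide assigns to the reference kernels.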
Extend Ginkgo’s hardware scope to AMD GPUs
The kernel layer gains two modules next to the Reference, OpenMP, and CUDA kernels:
• HIP-GPU kernels (SpMV, solver kernels, preconditioner kernels, …): optimized architecture-specific kernels;
• Common: shared kernels. To avoid code duplication, common contains the kernels shared between CUDA and HIP (depending on parameter configurations).
• Kernels shared between the CUDA and AMD backends (depending on parameter settings) are relocated into the "common" module.
• New code is necessary for HIP-specific optimizations and for implementing functionality currently missing in the HIP ecosystem (e.g. cooperative groups).
[Figure: code layout with the labels CUDA, common, HIP, and new]
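One way to picture a kernel that is shared "depending on parameter settings" is a single templated body that each backend instantiates with its own configuration. The sketch below is a host-side C++ emulation under stated assumptions: the slides do not say which parameters are configured, so the warp size (32 on NVIDIA GPUs, 64 wavefront lanes on AMD GCN GPUs such as the Radeon VII) is chosen as an example, and the names CudaConfig, HipConfig, and shared_dot are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-backend parameter sets (assumption: the warp or
// wavefront size is one of the parameters that differ).
struct CudaConfig {
    static constexpr std::size_t warp_size = 32;
};
struct HipConfig {
    static constexpr std::size_t warp_size = 64;
};

// Shared kernel body, written once. The inner loops emulate on the host
// what a warp-level tree reduction does with shuffle instructions.
template <typename Config>
double shared_dot(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(a.size() == b.size());
    constexpr std::size_t width = Config::warp_size;
    static_assert((width & (width - 1)) == 0, "width must be a power of two");
    double total = 0.0;
    for (std::size_t base = 0; base < a.size(); base += width) {
        std::array<double, width> lane{};  // inactive lanes stay zero
        for (std::size_t l = 0; l < width && base + l < a.size(); ++l) {
            lane[l] = a[base + l] * b[base + l];
        }
        // Tree reduction: halve the number of active lanes in each step.
        for (std::size_t offset = width / 2; offset > 0; offset /= 2) {
            for (std::size_t l = 0; l < offset; ++l) {
                lane[l] += lane[l + offset];
            }
        }
        total += lane[0];
    }
    return total;
}

// Thin backend-specific entry points that only pick the configuration.
double cuda_like_dot(const std::vector<double>& a, const std::vector<double>& b)
{
    return shared_dot<CudaConfig>(a, b);
}
double hip_like_dot(const std::vector<double>& a, const std::vector<double>& b)
{
    return shared_dot<HipConfig>(a, b);
}
```

Only the two thin entry points are backend-specific here; everything else lives in one place, which is the code-duplication argument made on the slide.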
How does Ginkgo compare to the vendor libraries - COO SpMV
[Figures: Ginkgo vs hipSPARSE on Radeon VII; Ginkgo vs cuSPARSE on V100]
Results and interactive performance explorer available at: https://ginkgo-project.github.io/gpe/
How does Ginkgo compare to the vendor libraries - CSR SpMV
[Figures: Ginkgo vs hipSPARSE on Radeon VII; Ginkgo vs cuSPARSE on V100]
How does Ginkgo compare to the vendor libraries - ELL SpMV
[Figures: Ginkgo vs hipSPARSE on Radeon VII; Ginkgo vs cuSPARSE on V100]
How does Ginkgo compare to the vendor libraries - hybrid SpMV
[Figures: Ginkgo vs hipSPARSE on Radeon VII; Ginkgo vs cuSPARSE on V100]
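For readers less familiar with the formats compared on these slides, the following C++ sketch shows sequential SpMV (y = A x) for COO, CSR, and ELL. The struct layouts and the function name spmv are illustrative assumptions, not the data structures of Ginkgo or the vendor libraries; ELL is stored column-major with zero padding, which is a common choice on GPUs. The hybrid format, commonly built as an ELL part plus a COO remainder, is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// COO: one (row, column, value) triple per stored nonzero.
struct Coo {
    std::size_t num_rows;
    std::vector<int> row_idxs;
    std::vector<int> col_idxs;
    std::vector<double> values;
};

// CSR: row_ptrs[r] .. row_ptrs[r + 1] delimits the nonzeros of row r.
struct Csr {
    std::size_t num_rows;
    std::vector<int> row_ptrs;
    std::vector<int> col_idxs;
    std::vector<double> values;
};

// ELL: every row is padded to max_nnz_per_row entries; entry j of row r
// sits at index j * num_rows + r (column-major). Padding uses value 0.
struct Ell {
    std::size_t num_rows;
    std::size_t max_nnz_per_row;
    std::vector<int> col_idxs;
    std::vector<double> values;
};

std::vector<double> spmv(const Coo& a, const std::vector<double>& x)
{
    std::vector<double> y(a.num_rows, 0.0);
    for (std::size_t k = 0; k < a.values.size(); ++k) {
        y[a.row_idxs[k]] += a.values[k] * x[a.col_idxs[k]];
    }
    return y;
}

std::vector<double> spmv(const Csr& a, const std::vector<double>& x)
{
    assert(a.row_ptrs.size() == a.num_rows + 1);
    std::vector<double> y(a.num_rows, 0.0);
    for (std::size_t r = 0; r < a.num_rows; ++r) {
        for (int k = a.row_ptrs[r]; k < a.row_ptrs[r + 1]; ++k) {
            y[r] += a.values[k] * x[a.col_idxs[k]];
        }
    }
    return y;
}

std::vector<double> spmv(const Ell& a, const std::vector<double>& x)
{
    assert(a.values.size() == a.num_rows * a.max_nnz_per_row);
    std::vector<double> y(a.num_rows, 0.0);
    for (std::size_t j = 0; j < a.max_nnz_per_row; ++j) {
        for (std::size_t r = 0; r < a.num_rows; ++r) {
            const std::size_t k = j * a.num_rows + r;
            y[r] += a.values[k] * x[a.col_idxs[k]];  // padding adds 0
        }
    }
    return y;
}
```

All three functions compute the same product; the formats differ in memory footprint and in how evenly the work can be spread over GPU threads, which is what the performance comparison on the slides measures.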
Performance Profile on AMD’s Radeon VII
[Figure: performance profile]
Performance Profile on NVIDIA’s V100
[Figure: performance profile]
Compiling HIP code for NVIDIA GPUs – comparison against native CUDA code
• Native CUDA vs. HIP compiled for NVIDIA GPUs
• Same kernel
• All tests on NVIDIA V100 (Summit)
• We expect CUDA to be slightly faster
Outliers? Machine noise? HIP faster than CUDA on an NVIDIA GPU?
Outlier stats on 100 runs of 20 reps each: [figure]
Reproducible – but relevant?
Compiling HIP code for NVIDIA GPUs – comparison against native CUDA code
• Running on V100 GPU
• 2,800 test matrices
• Compare key functionality:
  • Ginkgo Sellp SpMV
  • Ginkgo Coo SpMV
  • Vendor’s Csr SpMV
  • Ginkgo’s CG solver
[Figure: plot regions labeled "HIP code faster" and "CUDA code faster"]
Slight advantages on the CUDA side, but usually <5%.
How do GPU architectures compare in terms of SpMV performance?
[Figures: Ginkgo COO SpMV; vendor library COO SpMV]
How do GPU architectures compare in terms of SpMV performance?
[Figures: Ginkgo CSR SpMV; vendor library CSR SpMV]
How do GPU architectures compare in terms of SpMV performance?
[Figures: Ginkgo ELL SpMV; vendor library ELL SpMV]
Summary: The Sparse Matrix Vector Product on High-End GPUs
• AMD and its HIP software ecosystem are becoming a relevant alternative to NVIDIA CUDA;
• Significant similarities between the languages (HIP/CUDA) allow for shared kernel implementations;
• HIP allows compiling for NVIDIA GPUs, in most cases with moderate performance loss compared to native CUDA code;
• AMD GPUs and NVIDIA GPUs are comparable in sparse linear algebra performance;
• We deployed comprehensive cross-platform SpMV functionality in the Ginkgo library;
• We provide a comprehensive SpMV performance study for interactive exploration: https://ginkgo-project.github.io/gpe/
Check out our poster on Ginkgo at the poster session tonight in PP2 at 6pm, 5th floor: Ginkgo - a Node-Level Sparse Linear Algebra Library for High Performance Computing
Backup: Ginkgo development workflow
[Diagram with the labels: Developer, Push, Source Code Repository, Continuous Integration (CI), CI Build, CI Test, Code Review, Trusted Reviewer, Merge into Master Branch, CI Benchmark Tests, Schedule in Batch System, HPC System, Performance Data Repository, Web-Application, Users]
https://ginkgo-project.github.io/
https://ginkgo-project.github.io/gpe/
Backup: Ginkgo Design
• Open-source C++ framework for sparse linear algebra.
• Sparse linear solvers, preconditioners, SpMV etc.
• Focused on multicore and manycore accelerators;
• Software quality and sustainability efforts guided by xSDK community policies: https://xsdk.info/
• Static polymorphism for templating precisions
  • ValueType (default: Z, C, D, S), IndexType (int32/64)
• Smart pointers to avoid memory leaks
• Runtime polymorphism for operators and kernels
  • Kernels have the same signature for different architectures
  • Executor determines which kernel is used; it determines where data lives and where operations are executed
• LinOp class for any linear operator:
  • Matrices
  • Solvers
  • Preconditioners
  • …
[Diagram labels: generate, apply, …]
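The design points above (a common LinOp base class, templated precision, smart pointers, generate and apply) can be sketched in a few lines of C++. This is a simplified illustration, not Ginkgo's actual class hierarchy: the classes Diagonal and DiagonalSolver and all signatures are hypothetical, the Executor is left out, and a diagonal matrix is used only because its "solver" fits in one line.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base class: anything that maps a vector b to a vector x.
// ValueType is a template parameter (static polymorphism for precision).
template <typename ValueType>
class LinOp {
public:
    virtual ~LinOp() = default;
    virtual void apply(const std::vector<ValueType>& b,
                       std::vector<ValueType>& x) const = 0;
};

// A matrix is a LinOp: apply computes x = D * b.
template <typename ValueType>
class Diagonal : public LinOp<ValueType> {
public:
    explicit Diagonal(std::vector<ValueType> diag) : diag_(std::move(diag)) {}

    void apply(const std::vector<ValueType>& b,
               std::vector<ValueType>& x) const override
    {
        assert(b.size() == diag_.size());
        x.resize(b.size());
        for (std::size_t i = 0; i < b.size(); ++i) {
            x[i] = diag_[i] * b[i];
        }
    }

    const std::vector<ValueType>& diag() const { return diag_; }

private:
    std::vector<ValueType> diag_;
};

// A solver is also a LinOp: generate binds it to a system matrix,
// apply computes x = D^{-1} * b.
template <typename ValueType>
class DiagonalSolver : public LinOp<ValueType> {
public:
    static std::unique_ptr<DiagonalSolver> generate(
        std::shared_ptr<const Diagonal<ValueType>> system)
    {
        return std::unique_ptr<DiagonalSolver>(
            new DiagonalSolver(std::move(system)));
    }

    void apply(const std::vector<ValueType>& b,
               std::vector<ValueType>& x) const override
    {
        const auto& d = system_->diag();
        assert(b.size() == d.size());
        x.resize(b.size());
        for (std::size_t i = 0; i < b.size(); ++i) {
            x[i] = b[i] / d[i];
        }
    }

private:
    explicit DiagonalSolver(std::shared_ptr<const Diagonal<ValueType>> system)
        : system_(std::move(system))
    {}

    // Shared ownership keeps the matrix alive as long as the solver exists.
    std::shared_ptr<const Diagonal<ValueType>> system_;
};
```

Because matrix and solver share the same interface, code that only needs "something that can be applied" can hold a pointer to LinOp and stay unaware of which of the two it received.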
Backup: GPU format conversion