56
CARLO NARDONE, Sr. Solution Architect EMEA ICTP Workshop on Accelerated HPC in Computational Sciences, May-June 2015 NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMT

NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

CARLO NARDONE, Sr. Solution Architect EMEA

ICTP Workshop on Accelerated HPC in Computational Sciences, May-June 2015

NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMT

Page 2: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

2

THE WORLD LEADER IN VISUAL COMPUTING

GAMING ENTERPRISE TECHNOLOGY HPC & CLOUD AUTO

Page 3: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

3

1 Intro

2 Future GPU Generation

3 Development Software Trends

4 Tesla Platform System Management

AGENDA

Page 4: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

4

Power for CPU-only

Exaflop Supercomputer = Power for the Bay Area, CA

(San Francisco + San Jose)

HPC’S BIGGEST CHALLENGE: POWER

Page 5: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

5

US TO BUILD TWO FLAGSHIP SUPERCOMPUTERS

IBM POWER9 CPU + NVIDIA Volta GPU

NVLink High Speed Interconnect

>40 TFLOPS per Node, >3,400 Nodes

2017

SUMMIT SIERRA 150-300 PFLOPS

Peak Performance > 100 PFLOPS

Peak Performance

Major Step Forward on the Path to Exascale

Page 6: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

6

Development Platform for Embedded

Computer Vision, Robotics, Medical

Tegra K1 SoC

Quad core A15 + Kepler GPU

192 CUDA Enabled cores

326 Gflops @ 5 Watt

$192

JETSON TK1

THE WORLD’S 1st EMBEDDED SUPERCOMPUTER

Page 7: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

7 7

ENERGY COST DP FMA FLOP ON CPU

Page 8: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

8

COMMUNICATION ENERGY

Page 9: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

9

1 Intro

2 Future GPU Generation

3 Development Software Trends

4 Tesla Platform System Management

AGENDA

Page 10: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

10

TESLA PLATFORM DRIVES INNOVATION

Unlocking New Opportunities in the HPC ecosystem

x86

X86, Power and ARM NVLink

Page 11: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

11

SG

EM

M /

W

2012 2014 2008 2010 2016

48

36

12

0

24

60

2018

72

Tesla Fermi

Kepler

Maxwell

Pascal Mixed Precision 3D Memory NVLink

Volta

GPU ROADMAP Pascal 2x SGEMM/W

Page 12: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

12

Peak Performance

PASCAL: NEXT GENERATION TESLA GPU

Stacked Memory

NVLink High-Speed Interconnect Unified Memory

>3 TeraFLOPS 4x Higher Bandwidth (~1 TB/s)

Larger Capacity (16 GB)

80 GB/sec

POWER CPU & GPU-to-GPU Interconnect

Single Memory Space

Lower Developer Effort

Page 13: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

13

PASCAL GPU FEATURING NVLINK AND STACKED MEMORY

NVLINK GPU high speed interconnect

80-200 GB/s

3D Stacked Memory 4x Higher Bandwidth (~1 TB/s)

3x Larger Capacity

4x More Energy Efficient per bit

Page 14: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

14

3D Chip-on-Wafer integration

Many X bandwidth

2.5X capacity

4X energy efficiency

0

200

400

600

800

1000

1200

2008 2010 2012 2014 2016

Memory Bandwidth

3D STACKED MEMORY

Page 15: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

15

TSV (THROUGH SILICON VIA) The Enabling Technology for 3D Memory Stack

Page 16: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

16

HBM (HIGH BANDWIDTH MEMORY) STANDARD

Page 17: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

17

NVLINK and Unified Memory

Page 18: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

18

BANDWIDTH BOTTLENECKS

CPU GPU

PCIe

PCI Express

CPU Memory

GPU Memory

16GB/sec

60GB/sec

288GB/sec

Page 19: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

19

FUTURE INTERCONNECT: NVLINK

Differential with embedded clock

PCIe programming model (w/ DMA+)

Unified Memory

Cache coherency in Gen 2.0

5 to 12X PCIe

CPU GPU

PCIe

Page 20: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

20

5X MORE BANDWIDTH FOR SCALING

GPU

PCIe SWITCH

CPU GPU GPU GPU

Page 21: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

21

UNIFIED MEMORY

Traditional Developer View of Heterogenous System

Developer View With Unified Memory

Unified Memory System Memory

GPU Memory

Page 22: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

22

SIMPLIFIED MEMORY MANAGEMENT CODE

void sortfile(FILE *fp, int N) { char *data; data = (char *)malloc(N); fread(data, 1, N, fp); qsort(data, N, 1, compare); use_data(data); free(data); }

void sortfile(FILE *fp, int N) { char *data; cudaMallocManaged(&data, N); fread(data, 1, N, fp); qsort<<<...>>>(data,N,1,compare); cudaDeviceSynchronize(); use_data(data); cudaFree(data); }

CPU Code CUDA 6 Code with Unified Memory

Page 23: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

23

UNIFIED MEMORY

Traditional Developer View of Heterogenous System

Developer View With Unified Memory

Unified Memory with NVLink

System Memory

GPU Memory

Share Data Structures at

CPU Memory Speeds, not PCIe speeds

Oversubscribe GPU Memory

Page 24: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

24

SUPER SIMPLIFIED MEMORY MANAGEMENT

void sortfile(FILE *fp, int N) { char *data; data = (char *)malloc(N); fread(data, 1, N, fp); qsort(data, N, 1, compare); use_data(data); free(data); }

void sortfile(FILE *fp, int N) { char *data; data = (char *)malloc(N); fread(data, 1, N, fp); qsort<<<...>>>(data,N,1,compare); cudaDeviceSynchronize(); use_data(data); free(data); }

CPU Code Pascal with Unified Memory

Page 25: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

25

1 Intro

2 Future GPU Generation

3 Development Software Trends

4 Tesla Platform System Management

AGENDA

Page 26: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

26

VISION: MAINSTREAM PARALLEL PROGRAMMING

Enable more programmers to write parallel software

Give programmers the choice of language

Embrace and evolve key language standards

Page 27: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

27

C++ PARALLEL ALGORITHMS LIBRARY

• Complete set of parallel primitives:

for_each, sort, reduce, scan, etc.

• ISO C++ committee voted unanimously to

accept as official tech. specification working draft

N3960 Technical Specification Working Draft: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3960.pdf

Prototype: https://github.com/n3554/n3554

std::vector<int> vec = ... // previous standard sequential loop std::for_each(vec.begin(), vec.end(), f); // explicitly sequential loop std::for_each(std::seq, vec.begin(), vec.end(), f); // permitting parallel execution std::for_each(std::par, vec.begin(), vec.end(), f);

Page 28: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

28

INLINE PARALLELISM Language features enable parallelism in-line with sequential code

Example: Parallel Black-Scholes kernel in (future) standard C++

Standard parallel algorithms library (Projected C++17)

Lambda, std::begin/end (C++11)

Can substitute vendor-specific execution policy (std::par nvidia::gpu)

std::for_each( std::par, std::begin(options), std::end(options), [](Option &i)

{

const double d1 = (log((i.S/i.X))+(i.r+i.v*i.v/2)*i.T)/(i.v*sqrtf(i.T));

const double d2 = d1-i.v*sqrt(i.T);

i.call = i.S * CND(d1)-i.X * exp(-i.r*i.T)*CND(d2);

i.put = i.X * exp(-i.r * i.T) * CND(-d2) - i.S * CND(-d1);

});

Page 29: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

29

DEVELOPER PLATFORM WITH OPEN ECOSYSTEM ACCELERATE APPLICATIONS ACROSS MULTIPLE CPUS

x86

Libraries

Programming

Languages

Compiler

Directives

AmgX

cuBLAS

/

Page 30: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

30

DROP-IN ACCELERATION WITH GPU LIBRARIES

Up to 10x speedups out of the box

Automatically scale with multi-GPU libraries

75% of developers use GPU

libraries to accelerate their

application

AmgX cuFFT

NPP cuBLAS cuRAND

cuSPARSE MATH

Page 31: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

31

ACCELERATED LIBRARIES

Page 32: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

32

Page 33: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

33

Page 34: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

34

DEEP LEARNING WITH CUDNN cuDNN is a library for deep learning primitives

GPUs

cuDNN

Frameworks

Applications

Tesla TX-1 Titan

Caffe (CPU*)

1x

Caffe (GPU) 11x

Caffe (cuDNN)

14x

Baseline Caffe compared to Caffe

accelerated by cuDNN on K40

Page 35: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

35

OpenACC

Page 36: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

36

OPENACC

OpenACC is a specification for high-level, compiler directives for expressing parallelism for accelerators

Aims to be performance portable to a wide range of accelerators

Multiple Vendors, Multiple Devices, One Specification

The OpenACC specification was first released in November 2011

Original members: CAPS, Cray, NVIDIA, Portland Group

OpenACC 2.0 was released in June 2013, expanding functionality and improving portability

Standard for Directives-Based Acceleration

Page 37: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

37

OPENACC DIRECTIVES

Program myscience

... serial code ...

!$acc kernels

do k = 1,n1

do i = 1,n2

... parallel code ...

enddo

enddo

!$acc end kernels

...

End Program myscience

CPU GPU

Your original

Fortran or C code

Simple Compiler hints

Compiler Parallelizes code

Works on many-core GPUs &

multicore CPUs

Inspired by OpenMP, but more

descriptive than prescriptive

OpenACC

Compiler

Hint

Page 38: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

38

OPENACC MEMBERS AND PARTNERS

Page 39: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

39

OPENACC & OPENMP MEMBERS OVERLAP

Page 40: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

40

OPENACC 2.0 VS OPENMP 4.0

Page 41: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

41

Exploit GPU with LESS effort; maintain ONE legacy source code

APPLYING OPENACC TO SOURCE CODES

Examples: REAL-WORLD application tuning using directives

ELAN Computational

Electro- Magnetics

COSMO Weather

• Goals: optimize w/ less effort, preserve code base

• Kernels 6.5X to 13X faster than 16-core Xeon

• Overall speedup 3.2X

• Goal: preserve physics code (22% of runtime), augmenting dynamics kernels in CUDA

• Physics speedup 4.2X vs. multi-core Xeon

Results from EMGS, MeteoSwiss/CSCS, NCSA/Cray/NVIDIA

GAMESS CCSD(T) Molecular Modeling

• Goals: 3X speedup (2 kernels = 98% of runtime); scale to 1536 nodes

• Overall speedup 3.1X vs. 8-core Interlagos

Page 42: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

42

OPENACC RESOURCES

www.openacc.org

PGI OpenACC C/Fortran compiler free trial: www.pgroup.com

See latest preso at GTC15

S5192: Intro to OpenACC

S5195: Advanced OpenACC

S5196: Comparing OpenACC and OpenMP

Page 43: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

43

CUDA 7 Highlights

Page 44: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

44

CUDA 7

C++11 language features

Increases productivity with lambdas, auto, and more

New cuSOLVER library

Accelerates key LAPACK routines and direct sparse solvers

Runtime Compilation

Advanced feature used to generate highly optimized kernels at runtime

Production Release: Mar 2015 at GTC15

cuSOLVER 7.0, K40

MKL 11.0.4, i7-3930K CPU @ 3.20GHz

0x

1x

2x

3x

4x

5x

6x

7x

SPOTRF DPOTRF CPOTRF ZPOTRF

cuSOLVER Speedup vs CPU on LAPACK Cholesky Factorization Routine

Page 45: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

Application

// launch foo()

Runtime

Compilation

Library

(libnvrtc)

Compile CUDA kernel source at run time

Compiled kernels can be cached on disk

Run-time C++ template specialization

Before: Compile templates multiple times—once for each datatype that *might* be needed by any user

Now: only compile for datatypes a specific user needs

Reduces compile time and compiled code size

No compromise on performance

Simplifies deployment of DSLs that target CUDA-C instead of NVVM IR

__global__

foo(..) { .. }

Compiled

Kernel

CUDA RUNTIME COMPILATION Preview Feature in CUDA 7.0

Page 46: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

46

GPU 0 GPU 1

CUDA

MPI

Rank 0

CUDA

MPI

Rank 1

CUDA

MPI

Rank 2

CUDA

MPI

Rank 3

HYPERQ/MPI (MPS): MULTIPLE GPUS PER NODE

MPS Server

MPS Server efficiently overlaps work from multiple ranks to each GPU

lrank=$OMPI_COMM_WORLD_LOCAL_RANK

case ${lrank} in

[0]) export CUDA_VISIBLE_DEVICES=0; numactl —cpunodebind=0 ./executable;;

[1]) export CUDA_VISIBLE_DEVICES=1; numactl —cpunodebind=1 ./executable;;

[2]) export CUDA_VISIBLE_DEVICES=0; numactl —cpunodebind=0 ./executable;;

[3]) export CUDA_VISIBLE_DEVICES=1; numactl —cpunodebind=1 ./executable;;

esac

Page 47: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

47

Mark Harris GTC15 preso S5820

Page 48: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

48

1 Intro

2 Future GPU Generation

3 Development Software Trends

4 Tesla Platform System Management

AGENDA

Page 49: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

49

Development Data Center Infrastructure

Tesla Accelerated Computing Platform

GPU

Accelerators Interconnect

System

Management

Compiler

Solutions

GPU Boost

GPU Direct

NVLink

NVML

LLVM

Profile and

Debug

CUDA Debugging API

Development

Tools

Programming

Languages

Infrastructure

Management Communication System Solutions

/

Software

Solutions

Libraries

cuBLAS

NVIDIA TESLA PLATFORM

Page 50: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

50

NVIDIA TESLA ECOSYSTEM

Page 51: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

51

GPU-AWARE JOB SCHEDULERS (1)

Page 52: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

52

GPU-AWARE JOB SCHEDULERS (2)

Page 53: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

53

GPU-AWARE CLUSTER MGMT

Page 54: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

54

MONITORING SYSTEM WITH NVML SUPPORT

Page 55: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

55

GTC15 preso S5144

Page 56: NVIDIA GPU COMPUTING TRENDS IN HW, SW AND SYSMGMTindico.ictp.it/event/a14302/session/10/contribution/41/material/0/2.pdf · ICTP Workshop on Accelerated HPC in Computational Sciences,

THANK YOU!

[email protected]

+39 335 5828197