33
Jensen Huang, Founder & CEO | SC17 | Nov. 13, 2017 ACCELERATED COMPUTING: THE PATH FORWARD

ACCELERATED COMPUTING: THE PATH FORWARD

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Jensen Huang, Founder & CEO | SC17 | Nov. 13, 2017

ACCELERATED COMPUTING:THE PATH FORWARD

2

Domain-Specialized Accelerator

Pioneered CUDA Accelerated Computing Extending Performance Post-Moore’s Law

1980 1990 2000 2010 2020

103

105

107

1.5X per year

Single-threaded perf

GPU-Accelerated

Computing

1.1X per year

40 Years of CPU Trend Data

— Tech Walker

COMPUTING AFTER MOORE’S LAW

3

NVIDIA ACCELERATED COMPUTING

Every Computer MakerVolta in Production Every Cloud

| 5,120 CUDA cores

| 7.8 FP64 TFLOPS

21B xtors

15.7 FP32

125 Tensor TFLOPS

VOLTA TAKING OFF

4

ANNOUNCINGNVIDIA TESLA V100 IN MICROSOFT AZURE

NCv3 Series with NVIDIA Tesla V100

Sign up today at www.aka.ms/ncv3

5

ANNOUNCING JAPAN’S AIST ADOPTS NVIDIA VOLTA FOR ABCI SUPERCOMPUTER

Most Powerful AI Supercomputer in Japan

4,352 Tesla V100 GPUs

37 PetaFLOPS FP64 HPC Performance

0.55 ExaFLOPS AI Performance

6

109% Y-Y DATACENTER GROWTH

$0

$500

$1,000

$1,500

FY10 FY11 FY12 FY13 FY14 FY15 FY16 FY17 YTD 18

$1.5B

$1B

$.5B

Full Fiscal Year

1st Three Quarters

FYQ3

7

CUDAComputeWorks

NVIDIA AI SDKcuDNN, NCCL, TensorRT, DIGITS

Every Framework

NGCGRID vPC

Quadro vWS

HPC CSP TRAINING CSP INFERENCE PUBLIC CLOUD INDUSTRIES ENTERPRISE

$12B Market 80% of Apps by 2020 20M Inference Servers $25B Market 600M Amazon Packages $3T IT Industry

— MacHow2

NVIDIA TESLA DATACENTER PLATFORM

8

DEEP LEARNING COMES TO HPC

OPTIMIZED HPC SOFTWARE

ARCHITECTING MODERN DATACENTERS

9

ARCHITECTING MODERN DATACENTERS

APPS

% W

ORKLO

AD

Strong Scaling

Weak Scaling

Deep Learning

Apps Accelerated

Parallel Speed-Up

% Workload

% Sequential “Amdahl’s Law”

10

ARCHITECTING MODERN DATACENTERS

Strong Core CPU

Volta 5,120 CUDA Cores

125 TFLOPS Tensor Core

NVLink for Strong Scaling

11

0

10

20

30

40

50

60

70

80

20 40 60 80 1000

# of CPUs

ns/

day

AMBER Simulation of CRISPR

ARCHITECTING MODERN DATACENTERS

48 CPU Nodes Comet Supercomputer

12

0

10

20

30

40

50

60

70

80

20 40 60 80 1000

# of CPUs

1 Node with 4x V100 GPUs

48 CPU Nodes Comet Supercomputer

ns/

day

AMBER Simulation of CRISPR

ARCHITECTING MODERN DATACENTERS

The Power of Accelerated Computing

13

ARCHITECTING MODERN DATACENTERS

APPS

% W

ORKLO

AD

% Accelerated

REACH

APPS

Apps Accelerated

Parallel Speed-Up

% Workload

% Sequential “Amdahl’s Law”

14Intersect360 Research, Nov 2017 “HPC Application Support for GPU Computing”

VASP

AMBER

NAMD

GROMACS

Gaussian

Simulia Abaqus

WRF

OpenFOAM

ANSYS

LS-DYNA

BLAST

LAMMPS

ANSYS Fluent

Quantum Espresso

GAMESS

500+ Accelerated ApplicationsTop 15 HPC Applications

70% OF THE WORLD’S SUPERCOMPUTINGWORKLOAD ACCELERATED

15

Mixed Workload: Materials Science (VASP)Life Sciences (AMBER)Physics (MILC)Deep Learning (ResNet-50)

160 Self-hosted Servers

96 KWatts

4X BETTER HPC SYSTEM TCO

16

Mixed Workload: Materials Science (VASP)Life Sciences (AMBER)Physics (MILC)Deep Learning (ResNet-50)

12 Accelerated Servers w/4 V100 GPUs

20 KWatts

1/3 the Cost

1/4 the Space

1/5 the Power

4X BETTER HPC SYSTEM TCO

17

Image of ResNet 50 network

(…) Preferred Networks Nov '171024 Tesla P100

IBM Aug '17256 Tesla P100

Facebook June '17256 Tesla P100

Preferred Networks Feb '17128 TitanX (Maxwell)

48 Mins60 Mins

15 Mins

4.4 Hours

Time to Train

ResNet-50 ResNet-50 | Dataset: Imagenet | Trained for 90 Epochs

NVIDIA POWERS WORLD’S FASTEST DEEP LEARNING PERFORMANCE

18

ERRORS

REGRESSION TESTING (FP16/INT8)

INFERENCE (FP16/INT8)

TRAINING (FP32/FP16)

SIMULATION (FP64/FP32)

NEW DATA

TRAINING SET REGRESSION SET NEW DATA

DEEP LEARNING COMES TO HPC

19

U. FLORIDA & UNC: DRUG DISCOVERY300,000X Molecular Energetics Prediction

SLAC: ASTROPHYSICSGravitational Lensing: From Weeks to 10ms

U.S. DoE: CLEAN ENERGY33% More Accurate Neutrino Detection

PRINCETON & ITER: PARTICLE PHYSICS90% Accuracy for Fusion Sustainment

U. PITT: DRUG DISCOVERY35% Higher Accuracy for Protein Scoring

UNSW: PHYSICS14X Faster Bose-Einstein Condensate Creation

DEEP LEARNING COMES TO HPC

20

RE-SIM INFERENCETRAINING3D-SIM

TRAINING SET REGRESSION SET NEW DATA

ERRORS

NEW DATA

NVIDIA DRIVE AI CAR PLATFORM

21

40 PetaFLOPS Peak FP64 Performance | 660 PetaFLOPS DL FP16 Performance | 660 NVIDIA DGX-1 Server Nodes

NVIDIA SATURNV WITH VOLTA

22

Containerized in NVDocker

Optimization Across the Full Stack

Always Up-to-Date

Fully Tested and Maintained by NVIDIA

— MacHow2

NVIDIA GPU CLOUD

CLOUD CONTAINER REGISTRY

23

24

Containerized in NVDocker

Optimized for GPU-accelerated Systems

Up-to-Date Containers

Available NOW

Sign up at nvidia.com/gpu-cloud

ANNOUNCINGNVIDIA GPU CLOUD FOR HPC

CLOUD CONTAINER REGISTRYFOR ACCELERATED HPC APPS

25

GAMESS GROMACS CP2K PYFR BIOINFORMATICS CFD

LAMMPS NAMD QUANTUM ESPRESSO QMCPACK MATERIAL SCIENCE PHYSICS

RELION

SPECFEM3D

CARTESIAN

SPECFEM3D

GLOBE SEISMIC

HPC APPS COMING TO NGC

NOW NEXT

26

27

— MacHow2

VISUALIZATION IS VITAL TO SCIENCE

28

— MacHow2

VISUALIZATION IS VITAL TO SCIENCE

29

Large-scale Volumetric Rendering

Physically Accurate Ray Tracing

Production-quality Images

Seamless integration with ParaView

Early Access NOW

Signup now at nvidia.com/gpu-cloud

ANNOUNCINGNVIDIA GPU CLOUD FOR HPC VISUALIZATION

UNIFIED VISUALIZATIONFOR LARGE DATA SETS

ParaView with NVIDIA OptiX

ParaView with NVIDIA Holodeck

ParaView with NVIDIA IndeX

30

31

NVIDIA GPU CLOUDOPTIMIZED HPC SOFTWARE

DEEP LEARNING HPC APPS HPC VIS

32

NVIDIA ACCELERATED COMPUTING

500+ Accelerated Apps

NVIDIA GPU Cloud for HPC

NVIDIA GPU Cloud

for HPC Vis

— MacHow2

Deep Learning

Comes to HPC

TO EXASCALE, AND BEYOND

Volta in Every Cloud, Every Computer Maker

Accelerated Computing’s Time Has Come