China GTC

Marc Hamilton, VP Solutions Architecture & Engineering

NVIDIA HPC STRATEGY

NDA INFORMATION

All Information is under NDA until Sept 12, 2016 | 8:00pm PDT

OUR TEN YEARS IN HPC

2006 2008 2010 2012 2014 2016

Fermi: World’s First HPC GPU

Oak Ridge Deploys World’s Fastest Supercomputer w/ GPUs

World’s First Atomic Model of HIV Capsid

GPU-Trained AI Machine Beats World Champion in Go

Stanford Builds AI Machine using GPUs

World’s First 3-D Mapping of Human Genome

CUDA Launched

World’s First GPU Top500 System

Google Outperforms Humans in ImageNet

Discovered How H1N1 Mutates to Resist Drugs

AlexNet beats expert code by huge margin using GPUs

CREDENTIALS BUILT OVER TIME

300K CUDA Developers, 4x Growth in 4 years

Majority of HPC Applications are GPU-Accelerated, 410 and Growing

100% of Deep Learning Frameworks are Accelerated

[Chart: # of GPU-accelerated applications by year. 2011: 113, 2012: 206, 2013: 242, 2014: 287, 2015: 370, 2016: 410]

300K CUDA Developers: Academia, Games, Finance, Manufacturing, Internet, Oil & Gas, National Labs, Automotive, Defense, M & E

TORCH

THEANO

CAFFE

MATCONVNET

PURINE

MOCHA.JL

MINERVA

MXNET*

BIG SUR

TENSORFLOW

WATSON

CNTK

A NEW COMPUTING MODEL

Something Big That Will Change the Landscape of HPC

Deep Learning Object Detection: DNN + Data + HPC

Traditional Computer Vision: Experts + Time

Deep Learning Achieves “Superhuman” Results

[Chart: ImageNet, 2009 to 2016, y-axis 0% to 100%. Series: Traditional CV, Deep Learning]

DEEP LEARNING FUELING SCIENCE

Classify Satellite Images for Carbon Monitoring

Analyze Obituaries on the Web for Cancer-related Discoveries

Determine Drug Treatments to Increase Child’s Chance of Survival

NASA AMES

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

END-TO-END PRODUCT FAMILY

STRONG-SCALE HPC
Data centers running HPC and DL apps scaling to multiple GPUs
Tesla P100 with NVLink

MIXED-APPS HPC
HPC data centers running mix of CPU and GPU workloads
Tesla P100 with PCI-E

HYPERSCALE HPC
Hyperscale deployment for deep learning training & inference
Training - Tesla P100
Inference - Tesla P40 & P4

MOST PERVASIVE HPC PLATFORM EVER BUILT

ACCESS ANYWHERE

BUY ANYWHERE

LEARN EVERYWHERE

240+ Resellers Worldwide

1000 Universities Teaching CUDA

78 Countries

300K CUDA Developers

MASSIVE LEAP IN PERFORMANCE

[Chart: Speed-up vs Dual Socket Broadwell, 0x to 30x. Applications: NAMD, VASP, MILC, HOOMD-Blue, AMBER, Caffe/AlexNet. Series: 2x K80, 2x P100 (PCIe), 4x P100 (PCIe)]

CPU: Xeon E5-2697v4, 2.3 GHz, 3.6 GHz Turbo. Caffe AlexNet: batch size of 256 images. VASP, NAMD, HOOMD-Blue, and AMBER: average speedup across a basket of tests.

CUDA 8 – WHAT’S NEW

P100 Support: New Pascal Architecture, Stacked Memory, NVLINK, FP16 math

Unified Memory: Large Datasets, Demand Paging, New Tuning APIs, Standard C/C++ Allocators

Libraries: New nvGRAPH library, cuBLAS improvements for Deep Learning

Developer Tools: Critical Path Analysis, 2x faster compile time, OpenACC profiling, Debug CUDA Apps on display GPU

NVIDIA CONFIDENTIAL. FOR USE UNDER NDA

“We were extremely impressed that we can run OpenACC on a CPU with no code change and get equivalent performance to our OpenMP/MPI implementation.”

Wayne Gaudin and Oliver Perks, Atomic Weapons Establishment, UK

OpenACC Performance Portability: CloverLeaf

Hydrodynamics Application OpenACC Performance Portability

[Chart: Speedup vs 1 CPU Core]

Benchmarked Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz, Accelerator: Tesla K80 (dual GPU)
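The "no code change" point in the quote above can be illustrated with a minimal sketch. This is not CloverLeaf code: `axpy` is a hypothetical kernel chosen only for illustration. A single OpenACC directive marks the loop as parallel. A compiler without OpenACC support simply ignores the unknown pragma and builds a serial loop, while an OpenACC compiler can target multicore CPUs or a GPU from the same source through build options (the specific options depend on the compiler and are not stated in the deck).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel (not from CloverLeaf): y[i] = a * x[i] + y[i].
 * The OpenACC directive asks the compiler to parallelize the loop and
 * describes the data movement needed if the target is a discrete GPU.
 * Without OpenACC support the pragma is ignored and the loop runs serially,
 * producing the same result. */
void axpy(int n, double a, const double *x, double *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `axpy(4, 2.0, x, y)` on two four-element arrays updates `y` in place. The source file is identical for every target, and only the build configuration changes.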
