China GTC
Marc Hamilton, VP Solutions Architecture & Engineering
NVIDIA HPC STRATEGY
NDA INFORMATION
All Information is under NDA until Sept 12, 2016 | 8:00pm PDT
OUR TEN YEARS IN HPC
2006 to 2016
Fermi: World’s First HPC GPU
Oak Ridge Deploys World’s Fastest Supercomputer w/ GPUs
World’s First Atomic Model of HIV Capsid
GPU-Trained AI Machine Beats World Champion in Go
Stanford Builds AI Machine using GPUs
World’s First 3-D Mapping of Human Genome
CUDA Launched
World’s First GPU Top500 System
Google Outperforms Humans in ImageNet
Discovered How H1N1 Mutates to Resist Drugs
AlexNet beats expert code by huge margin using GPUs
CREDENTIALS BUILT OVER TIME
300K CUDA Developers, 4x Growth in 4 years
Majority of HPC Applications are GPU-Accelerated, 410 and Growing
100% of Deep Learning Frameworks are Accelerated
GPU-Accelerated Applications by Year
2011: 113
2012: 206
2013: 242
2014: 287
2015: 370
2016: 410
Application segments: Academia, Games, Finance, Manufacturing, Internet, Oil & Gas, National Labs, Automotive, Defense, M&E
Accelerated deep learning frameworks: TORCH, THEANO, CAFFE, MATCONVNET, PURINE, MOCHA.JL, MINERVA, MXNET*, BIG SUR, TENSORFLOW, WATSON, CNTK
A NEW COMPUTING MODEL
Something Big That Will Change the Landscape of HPC
Traditional Computer Vision: Experts + Time
Deep Learning Object Detection: DNN + Data + HPC
Deep Learning Achieves “Superhuman” Results
[Chart: ImageNet results (0 to 100%) by year, 2009 to 2016; series: Traditional CV, Deep Learning]
DEEP LEARNING FUELING SCIENCE
Classify Satellite Images for Carbon Monitoring
Analyze Obituaries on the Web for Cancer-related Discoveries
Determine Drug Treatments to Increase Child’s Chance of Survival
NASA AMES
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
END-TO-END PRODUCT FAMILY
MIXED-APPS HPC
HPC data centers running a mix of CPU and GPU workloads
Tesla P100 with PCIe
STRONG-SCALE HPC
Data centers running HPC and DL apps scaling to multiple GPUs
Tesla P100 with NVLink
HYPERSCALE HPC
Hyperscale deployment for deep learning training & inference
Training: Tesla P100
Inference: Tesla P40 & P4
MOST PERVASIVE HPC PLATFORM EVER BUILT
ACCESS ANYWHERE BUY ANYWHERE LEARN EVERYWHERE
240+ Resellers Worldwide
1000 Universities Teaching CUDA
78 Countries
300K CUDA Developers
MASSIVE LEAP IN PERFORMANCE
[Chart: Speed-up vs. Dual Socket Broadwell (0x to 30x) for NAMD, VASP, MILC, HOOMD-Blue, AMBER and Caffe/AlexNet; configurations: 2x K80, 2x P100 (PCIe), 4x P100 (PCIe)]
CPU: Xeon E5-2697 v4, 2.3 GHz (3.6 GHz Turbo). Caffe AlexNet: batch size of 256 images. VASP, NAMD, HOOMD-Blue and AMBER: average speed-up across a basket of tests.
CUDA 8 – WHAT’S NEW
P100 Support: New Pascal Architecture, Stacked Memory, NVLINK, FP16 math
Unified Memory: Large Datasets, Demand Paging, New Tuning APIs, Standard C/C++ Allocators
Libraries: New nvGRAPH library, cuBLAS improvements for Deep Learning
Developer Tools: Critical Path Analysis, 2x faster compile time, OpenACC profiling, Debug CUDA Apps on display GPU
NVIDIA CONFIDENTIAL. FOR USE UNDER NDA
OpenACC Performance Portability: CloverLeaf
“We were extremely impressed that we can run OpenACC on a CPU with no code change and get equivalent performance to our OpenMP/MPI implementation.”
Wayne Gaudin and Oliver Perks, Atomic Weapons Establishment, UK
[Chart: CloverLeaf hydrodynamics application, OpenACC performance portability; speed-up vs. 1 CPU core]
Benchmarked on Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz; accelerator: Tesla K80 (dual GPU)
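The CloverLeaf quote claims the same OpenACC source runs on a CPU with no code change. The C sketch below illustrates that idea under stated assumptions: it uses a toy 1-D diffusion update as a stand-in for a hydrodynamics kernel, and the kernel and the function names `diffusion_step` and `diffusion_total` are hypothetical, not taken from CloverLeaf. The only OpenACC-specific line is the `#pragma acc parallel loop` directive; a compiler without OpenACC support ignores it and runs the loop serially.

```c
#include <assert.h>
#include <stdlib.h>

/* One explicit step of 1-D diffusion with mirrored (zero-flux) edges.
   Hypothetical stand-in kernel; NOT CloverLeaf source code. */
void diffusion_step(int n, const double *restrict u, double *restrict u_new, double c)
{
    /* With an OpenACC compiler this loop may be offloaded to a GPU or spread
       across CPU cores, depending on the target chosen at compile time.
       Without OpenACC support the pragma is ignored and the loop runs serially. */
    #pragma acc parallel loop copyin(u[0:n]) copyout(u_new[0:n])
    for (int i = 0; i < n; ++i) {
        double left  = (i > 0)     ? u[i - 1] : u[i];  /* mirror at left edge */
        double right = (i < n - 1) ? u[i + 1] : u[i];  /* mirror at right edge */
        u_new[i] = u[i] + c * (left - 2.0 * u[i] + right);
    }
}

/* Runs `steps` updates starting from a unit spike at the midpoint and
   returns the sum of the field, which the mirrored edges conserve.
   Returns -1.0 if allocation fails. */
double diffusion_total(int n, int steps, double c)
{
    assert(n > 0 && steps >= 0);
    double *u = calloc((size_t)n, sizeof *u);
    double *v = calloc((size_t)n, sizeof *v);
    if (u == NULL || v == NULL) {
        free(u);
        free(v);
        return -1.0;
    }
    u[n / 2] = 1.0;
    for (int s = 0; s < steps; ++s) {
        diffusion_step(n, u, v, c);
        double *tmp = u;  /* swap current and next buffers */
        u = v;
        v = tmp;
    }
    double total = 0.0;
    for (int i = 0; i < n; ++i)
        total += u[i];
    free(u);
    free(v);
    return total;
}
```

Compiled with an OpenACC-capable compiler (for example PGI with `-acc` plus `-ta=tesla` for a GPU or `-ta=multicore` for CPU cores, or GCC with `-fopenacc`), the same loop can be parallelized without source changes. Calling `diffusion_total(64, 10, 0.25)` should return a total very close to 1.0. A real application would typically wrap the time loop in a `#pragma acc data` region so arrays stay on the GPU between steps instead of being copied on every call.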