Taipei | May 19, 2011
Sanford H. Russell
Director of CUDA Marketing, NVIDIA Corporation
The Evolution of
Modern Parallel Computing
2
GPU Computing
Is not CPU vs. GPU
It is CPU + GPU
3
GPU Computing
Application Code
GPU: Only Critical Functions (Parallelize using CUDA Programming Model)
+
CPU: Rest of Sequential CPU Code
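Because only the critical functions move to the GPU, the sequential remainder on the CPU limits the overall gain. A minimal sketch, assuming Amdahl's law is a reasonable model here; the function name and the sample fractions are illustrative and do not come from the slides.

```python
def overall_speedup(parallel_fraction: float, gpu_speedup: float) -> float:
    """Amdahl's law: whole-application speedup when only
    `parallel_fraction` of the runtime is accelerated by `gpu_speedup`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if gpu_speedup <= 0:
        raise ValueError("gpu_speedup must be positive")
    sequential = 1.0 - parallel_fraction
    return 1.0 / (sequential + parallel_fraction / gpu_speedup)


# Hypothetical workload: 90% of the runtime sits in critical functions
# that run 20x faster on the GPU; the remaining 10% stays on the CPU.
print(round(overall_speedup(0.90, 20.0), 2))
```

Under these assumed numbers the application as a whole speeds up by roughly 7x rather than 20x, which is why choosing the right critical functions matters more than raw kernel speed.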
4
GPU Computing Milestones
2002: GPGPU, programming on top of OpenGL (OGL)
2007: G80, NVIDIA's first parallel computing architecture (single precision)
2007: C compiler, C SDK for GPU
5
GPU Computing Milestones
2009: Fermi architecture, the first true HPC-class GPU, with DP, ECC and C++ support
2010: Parallel Nsight for Microsoft Visual Studio, an industry-standard IDE
6
Tesla GPUs Power 3 of Top 5 Supercomputers
#1: Tianhe-1A, 7168 Tesla GPUs, 2.5 PFLOPS
#3: Nebulae, 4650 Tesla GPUs, 1.2 PFLOPS
#4: Tsubame 2.0, 4224 Tesla GPUs, 1.194 PFLOPS
7
Tesla in 3 of Top 5 Supercomputers
[Chart: Performance in Gigaflops (axis 0 to 2500) for Tianhe-1A, Jaguar, Nebulae, Tsubame, Hopper II]
8
Tesla Best Performance/Watt
[Chart: Performance in Gigaflops (axis 0 to 2500) and Power in Megawatts (axis 0 to 8) for Tianhe-1A, Jaguar, Nebulae, Tsubame, Hopper II]
9
World’s Fastest HPC Processor
Tesla M2090: The 512 Core Fermi
512 CUDA Cores
665 GFlops
178 GB/s memory B/W
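The peak-compute and memory-bandwidth figures can be combined into a rough balance ratio: how many floating-point operations a kernel needs per byte of memory traffic before it is limited by compute rather than by bandwidth. A back-of-envelope sketch using only the two slide figures; the 8-byte operand size assumes double-precision data, which the slide does not state.

```python
PEAK_GFLOPS = 665.0        # from the slide
MEM_BANDWIDTH_GBS = 178.0  # from the slide
BYTES_PER_OPERAND = 8      # assumption: 64-bit double-precision values

# Operations the GPU can perform in the time it takes to move one byte.
flops_per_byte = PEAK_GFLOPS / MEM_BANDWIDTH_GBS
# The same ratio expressed per operand loaded from memory.
flops_per_operand = flops_per_byte * BYTES_PER_OPERAND

print(f"{flops_per_byte:.2f} flops per byte")
print(f"{flops_per_operand:.1f} flops per operand loaded")
```

Kernels that do less arithmetic per operand than this ratio are likely to be bound by the 178 GB/s memory bandwidth rather than by the 665 GFlops peak.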
10
Industry and Research Partners
Oil and Gas: Reverse Time Migration, Kirchhoff Time Migration, Reservoir Sim
Edu/Research: Astrophysics, Molecular Dynamics, Weather / Climate Modeling
Government: Signal Processing, Satellite Imaging, Video Analytics, Synthetic Aperture Radar
Life Sciences: Bio-chemistry, Bio-informatics, Material Science, Sequence Analysis, Genomics
Finance: Risk Analytics, Monte Carlo, Options Pricing, Insurance Modeling
Manufacturing: Structural Mechanics, Computational Fluid Dynamics, Machine Vision, Electromagnetics
11
Lattice QCD
Simulating Quarks - Lattice QCD Simulation
“Performance of Blue Gene/L at 1% the Cost”
Professor Ting-Wei Chiu, Department of Physics, National Taiwan University
15 Tflops for $200,000
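The price-performance claim reduces to simple arithmetic. A sketch using the slide's two figures; the implied cost of the comparison system is only what the "1% the Cost" quote would mean if read literally, not a number given in the source.

```python
SYSTEM_COST_USD = 200_000  # from the slide
SYSTEM_TFLOPS = 15         # from the slide
COST_MULTIPLE = 100        # "1% the Cost", read literally as a 100x ratio

usd_per_tflop = SYSTEM_COST_USD / SYSTEM_TFLOPS
implied_comparison_cost = SYSTEM_COST_USD * COST_MULTIPLE

print(f"${usd_per_tflop:,.0f} per Tflop")
print(f"${implied_comparison_cost:,} implied cost of the comparison system")
```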
12
Abaqus: Accelerated by CUDA
2x Faster
Faster = Better Quality
Engine Block (s4b), 5 Million Degrees of Freedom
Simulate More Scenarios
More Fuel Efficient Engine
Lower CO2 Emissions
The Future
14
ARM is Pervasive and Open
… and supported by Microsoft
[Chart: Annual Shipments, Units in Billions (axis 0 to 9), 2005 to 2014e, series: ARM, x86]
Source: ARM, Mercury Research, NVIDIA
15
Project Denver
NVIDIA-Designed High Performance ARM Core
16
CUDA GPU Roadmap
[Chart: DP GFLOPS per Watt (axis 2 to 16), 2007 to 2013; architectures in order: Tesla, Fermi, Kepler, Maxwell]
17
Summary and Call to Action
Heterogeneous Computing has achieved commercial volume
CPU + GPU
Developers:
Developing on NVIDIA-based systems allows you to scale your work onto Heterogeneous Clusters and Supercomputers
Scientists:
Publish your work, share with your peers
Industry:
Buy systems that are heterogeneous-capable