Taipei | May 19, 2011

Sanford H. Russell, Director of CUDA Marketing, NVIDIA Corporation

The Evolution of Modern Parallel Computing

2

GPU Computing

Is not CPU vs. GPU

It is CPU + GPU

3

GPU Computing

Application Code = GPU + CPU

GPU: Only Critical Functions (Parallelize using CUDA Programming Model)

CPU: Rest of Sequential CPU Code
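As a rough illustration of why only the critical functions move to the GPU, the sketch below applies Amdahl's law. The law is not named on the slide and is assumed here as a simple model. The function name `overall_speedup`, the 20x GPU speedup, and the runtime fractions are hypothetical values chosen for illustration, not measurements.

```python
def overall_speedup(critical_fraction, gpu_speedup):
    """Amdahl's law: whole-application speedup when only part of the
    runtime (the critical functions) is accelerated."""
    if not 0.0 <= critical_fraction <= 1.0:
        raise ValueError("critical_fraction must be between 0 and 1")
    if gpu_speedup <= 0:
        raise ValueError("gpu_speedup must be positive")
    sequential_part = 1.0 - critical_fraction            # stays on the CPU
    accelerated_part = critical_fraction / gpu_speedup   # moved to the GPU
    return 1.0 / (sequential_part + accelerated_part)


if __name__ == "__main__":
    # Hypothetical profiles: share of runtime spent in the critical functions.
    for fraction in (0.50, 0.90, 0.99):
        speedup = overall_speedup(fraction, gpu_speedup=20.0)
        print(f"{fraction:.0%} of runtime on GPU at 20x: {speedup:.2f}x overall")
```

Under this model the sequential CPU code bounds the achievable gain, so profiling to find the critical functions is the first step before any code is moved to the GPU.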

4

GPU Computing Milestones

2002: GPGPU. Programming on top of OGL

2007: G80. NVIDIA First Parallel Computing Architecture (Single Precision)

2007: C Compiler. C SDK for GPU

5

GPU Computing Milestones

2009: 1st True HPC Class GPU. Fermi Architecture. DP, ECC and C++ support

2010: Industry Standard IDE. Parallel Nsight for Microsoft Visual Studio

6

Tesla GPUs Power 3 of Top 5 Supercomputers

#1: Tianhe-1A, 7168 Tesla GPUs, 2.5 PFLOPS

#3: Nebulae, 4650 Tesla GPUs, 1.2 PFLOPS

#4: Tsubame 2.0, 4224 Tesla GPUs, 1.194 PFLOPS

7

Tesla in 3 of Top 5 Supercomputers

[Chart: Performance in Gigaflops (axis 0 to 2500) for Tianhe-1A, Jaguar, Nebulae, Tsubame, Hopper II]

8

Tesla Best Performance/Watt

[Chart: Performance in Gigaflops (axis 0 to 2500) and Power in Megawatts (axis 0 to 8) for Tianhe-1A, Jaguar, Nebulae, Tsubame, Hopper II]

9

World’s Fastest HPC Processor

Tesla M2090: The 512 Core Fermi

512 CUDA Cores

665 GFlops

178 GB/s memory B/W
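The three figures on this slide can be related with a little arithmetic. The sketch below is a back-of-the-envelope check, not an official derivation. The 1.3 GHz clock, the count of 2 floating-point operations per fused multiply-add, and the half-rate double precision are assumptions that do not appear on the slide. The function names are hypothetical.

```python
# Figures quoted on the slide for the Tesla M2090.
CUDA_CORES = 512
PEAK_DP_GFLOPS = 665.0
MEMORY_BANDWIDTH_GBS = 178.0

# Assumptions (not on the slide): processor clock, flops per fused
# multiply-add, and double precision running at half the single-precision rate.
ASSUMED_CLOCK_GHZ = 1.3
ASSUMED_FLOPS_PER_FMA = 2
ASSUMED_DP_RATE = 0.5


def estimated_peak_dp_gflops(cores, clock_ghz, flops_per_fma, dp_rate):
    """Peak double-precision GFLOPS = cores x clock x flops per cycle x DP rate."""
    return cores * clock_ghz * flops_per_fma * dp_rate


def flops_per_byte(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity a kernel needs before peak compute, rather than
    memory bandwidth, becomes the limit."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    estimate = estimated_peak_dp_gflops(
        CUDA_CORES, ASSUMED_CLOCK_GHZ, ASSUMED_FLOPS_PER_FMA, ASSUMED_DP_RATE
    )
    balance = flops_per_byte(PEAK_DP_GFLOPS, MEMORY_BANDWIDTH_GBS)
    print(f"Estimated peak DP: {estimate:.1f} GFLOPS (slide quotes {PEAK_DP_GFLOPS:.0f})")
    print(f"Balance point: {balance:.2f} flops per byte of memory traffic")
```

A kernel that performs fewer floating-point operations per byte than the balance point is limited by the 178 GB/s memory bandwidth rather than by the 665 GFlops peak.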

10

Industry and Research Partners

Finance: Risk Analytics, Monte Carlo, Options Pricing, Insurance modeling

Government: Signal Processing, Satellite Imaging, Video Analytics, Synthetic Aperture Radar

Edu/Research: Astrophysics, Molecular Dynamics, Weather / Climate Modeling

Oil and gas: Reverse Time Migration, Kirchhoff Time Migration, Reservoir Sim

Life Sciences: Bio-chemistry, Bio-informatics, Material Science, Sequence Analysis, Genomics

Manufacturing: Structural Mechanics, Computational Fluid Dynamics, Machine Vision, Electromagnetics

http://www.nvidia.com/page/home.html

http://www.simulia.com/index.html

11

Lattice QCD

Simulating Quarks - Lattice QCD Simulation

“Performance of Blue Gene/L at 1% the Cost”

Professor Ting-Wei Chiu, Department of Physics, National Taiwan University

15 Tflops for $200,000
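The two claims on this slide can be turned into unit costs. The sketch below uses only the quoted numbers (15 Tflops, $200,000, and the 1% cost ratio). The implied price of a comparable system is an inference from that quote, not a figure stated in the talk, and the function names are hypothetical.

```python
# Numbers quoted on the slide.
QUOTED_TFLOPS = 15.0
QUOTED_COST_USD = 200_000.0
QUOTED_COST_PERCENT = 1.0   # "at 1% the Cost"


def cost_per_tflops(cost_usd, tflops):
    """Dollars paid per Tflops of quoted performance."""
    return cost_usd / tflops


def implied_reference_cost(cost_usd, cost_percent):
    """Cost of the comparison system implied by the quoted percentage.
    This is an inference from the quote, not a stated figure."""
    return cost_usd * 100.0 / cost_percent


if __name__ == "__main__":
    print(f"GPU cluster: ${cost_per_tflops(QUOTED_COST_USD, QUOTED_TFLOPS):,.0f} per Tflops")
    print(f"Implied comparison system: ${implied_reference_cost(QUOTED_COST_USD, QUOTED_COST_PERCENT):,.0f}")
```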

12

Abaqus: Accelerated by CUDA

2x Faster

Faster = Better Quality

Engine Block s4b: 5 Million Degrees of Freedom

Simulate More Scenarios

More Fuel Efficient Engine

Lower CO2 Emissions

The Future

14

ARM is Pervasive and Open

… and supported by Microsoft

[Chart: Annual Shipments, Units in Billions (axis 0 to 9), ARM and x86, 2005 to 2014e]

Source: ARM, Mercury Research, NVIDIA

15

Project Denver

NVIDIA-Designed High Performance ARM Core

16

CUDA GPU Roadmap

[Chart: DP GFLOPS per Watt (axis 2 to 16) over 2007, 2009, 2011, 2013 for the Tesla, Fermi, Kepler, and Maxwell architectures]

17

Summary and Call to Action

Heterogeneous Computing (CPU + GPU) has achieved commercial volume

Developers: Developing on NVIDIA-based systems allows you to scale your work onto Heterogeneous Clusters and Supercomputers

Scientists: Publish your work, share with your peers

Industry: Buy systems that are heterogeneous-capable
