The Role of Accelerated Computing in the Multi-Core Era
Chuck Moore, Senior Fellow
Advanced Micro Devices
May 17, 2006
The Role of Accelerated Computing in the Multi-core Era2 November 10, 2007
Key Points in this Talk
1. The semiconductor industry is dependent upon ongoing customer value:
[Figure: a virtuous cycle linking build, customer value, $$$, and invest]
2. Programming for Multi-Core is a difficult challenge, but it is really just the leading edge of the bigger challenges yet to come
It’s Time to Reorient Around Customer Value
Our industry is obsessed with Performance
Outline
• Important Background
    – A Few High-level Trends
    – Some Thoughts on SMP and Multi-core Computing
• The Accelerated Computing Imperative
    – Dense Computing: GPUs and GP-GPUs
    – The broader potential
• A Framework for Accelerated Computing enablement
    – The Role of Architecture
    – The Emerging Layers of Computation
• Summary
A Few High-level Trends
[Figure: six small charts, each with a "we are here" marker]
• The Complexity Wall: IPC vs. issue width
• Moore’s Law ☺!: integration (log scale) vs. time
• The Power Wall: power budget (TDP) vs. time
• The Frequency Wall: frequency vs. time
• Locality: performance vs. cache size
• Single thread Perf (!): single-thread perf (?) vs. time
Chart annotations:
• DFM, variability, reliability, wire delay
• Server: power = $$; DT: eliminate fans; Mobile: battery
So, how can we add customer value?
Customer Value beyond just Performance
AMD Native Dual Core Opteron
• Pervasive 64-bit capability
• + Dual-core technology
• + Systems architecture innovation
AMD Native Quad Core Opteron
• Seamless upgradeability
• + Virtualization
• + Performance-per-watt innovation
• + Quad-core technology
SMP and Multi-Core to the long term rescue?
SMP Performance (Hypothetical values)
[Figure: relative performance (0 to 8) vs. number of processors (1 to 16), with four curves: single-threaded application, responsiveness, naive parallel application, and sophisticated parallel app. The chart builds up over five slides, adding one callout at a time:]
• Single-thread performance actually goes down! (power constraints)
• For most users, responsiveness benefit diminishes after 2-4 processors
• Writing scalable parallel programs is HARD. Perhaps too hard? This is a 30 year old problem!
• Even the best parallel programs roll over at a modest number of processors
Optimized SMP and Multi-core Platforms
• In the near-term, there is definitely potential here
    – Commodity multi-core processors break the “chicken & egg” barrier
    – Impressive amount of interesting research firing up:
        – TM, coherency filters, hierarchical scheduling, MREs, VMs, etc.
    – Lots of good activity on the Tools front
    – More to come
• Some workloads will do well with this, but many will not:
    – As it turns out, software isn’t really that soft
        – The underlying structural assumption is often serial processing
        – Transitioning the concurrency model is a very big deal
    – Amdahl’s Law seriously inhibits unstructured parallelism
• In reality, SMP/Multi-core challenges are just an early indicator of the shifts yet to come
    – Power constraints will force these to be “performance heterogeneous”
    – Advances in synchronization and NUMA will give rise to new options…
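The Amdahl's Law point on this slide can be made concrete with a small sketch. The parallel fractions used here (0.50, 0.90, 0.95) are illustrative assumptions, not values from the talk, and the function name `amdahl_speedup` is ours.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup under Amdahl's Law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Assumed parallel fractions, chosen only for illustration.
    for p in (0.50, 0.90, 0.95):
        row = ", ".join(
            f"n={n}: {amdahl_speedup(p, n):.2f}x" for n in (1, 2, 4, 8, 16)
        )
        print(f"p={p:.2f} -> {row}")
```

Even with 90% of the work parallelizable, 16 processors give at most 6.4x in this idealized model. The model leaves out synchronization and memory costs, so real programs would be expected to flatten sooner than this bound suggests.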
The Accelerated Processing Imperative
x86 applications, workloads and usage models continue to rapidly diversify
Java, XML, web services
3D, digital media
HD, DRM
E-mail, GUI, PowerPoint, web browsers
Spreadsheets, word-processing
[Figure: x86 software complexity and diversity vs. time (1981, 1990s, 2000s, 2010s), starting from the first x86 PC, the IBM model 5150]
[Figure: the x86 software complexity and diversity chart (1981, 1990s, 2000s, 2010s), overlaid with processor eras: ≤16-bit single core, 32-bit single core, 64-bit single core, 64-bit homogeneous multi-CPU, accelerated processors, and platform acceleration. Era labels: PERF., PERF., POWER/PERF., DIVERSITY. Product markers: 486, AMD64, Dual-Core AMD Opteron™ processors]
By the end of the decade, homogeneous multi-core becomes increasingly inadequate
Compute Density: Graphics Processor Performance ☺
[Figure: graphics processor performance over time, with two annotations]
• Added Shader Programmability
• GPGPU: Beyond just graphics and gaming
Ruby Statistics

                           DoubleCross   The Assassin   Whiteout
Year                       2004          2005           2006
Ruby Polygons              80,000        80,000         200,000
Avg. Triangles/Frame       227,212       546,087        1,069,503
Max Triangles/Frame        556,305       1,018,312      2,150,521
No. of Pixel Shaders       100           316            210
Avg. Pixel Shader Length   20            74             142
Facial Animation Targets   4             4              > 128
ALU:Tex Ratio              4:1           7:1            13:1
Realities of GP-GPU Power Efficiency
Generalized GPU provides unprecedented opportunity for performance-per-watt
[Figure: FLOPS-per-watt*, Dual-Core CPU vs. GP-GPU, with the GP-GPU at as much as 20x. *Source: AMD]
• 500 GigaFLOPS per GPU
• 1 TeraFLOPS in a CrossFire configuration
• More than 2 GigaFLOPS-per-watt
• Available today – not just theoretical
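The figures on this slide imply some simple arithmetic, sketched below. The sketch assumes a CrossFire configuration pairs two GPUs (consistent with 2 × 500 GigaFLOPS = 1 TeraFLOPS, but not stated on the slide). It also treats "more than 2 GigaFLOPS-per-watt" and "as much as 20x" as if they were met exactly, so the derived values are rough bounds rather than measurements. All names are ours.

```python
# Figures quoted on the slide.
GFLOPS_PER_GPU = 500.0          # "500 GigaFLOPS per GPU"
GPU_GFLOPS_PER_WATT = 2.0       # "more than 2 GigaFLOPS-per-watt", used as a lower bound
GPU_OVER_CPU_EFFICIENCY = 20.0  # "as much as 20x", used as an upper bound

# Assumption (not stated on the slide): CrossFire pairs two GPUs.
GPUS_IN_CROSSFIRE = 2


def crossfire_gflops(gflops_per_gpu: float, gpu_count: int) -> float:
    """Aggregate peak throughput, assuming perfect scaling across GPUs."""
    return gflops_per_gpu * gpu_count


def max_watts(gflops: float, min_gflops_per_watt: float) -> float:
    """Power ceiling implied by a throughput and a minimum efficiency."""
    return gflops / min_gflops_per_watt


def implied_cpu_gflops_per_watt(gpu_gflops_per_watt: float, ratio: float) -> float:
    """CPU efficiency implied by the GPU efficiency and the GPU:CPU ratio."""
    return gpu_gflops_per_watt / ratio


if __name__ == "__main__":
    print(crossfire_gflops(GFLOPS_PER_GPU, GPUS_IN_CROSSFIRE))  # GigaFLOPS
    print(max_watts(GFLOPS_PER_GPU, GPU_GFLOPS_PER_WATT))       # watts per GPU
    print(implied_cpu_gflops_per_watt(GPU_GFLOPS_PER_WATT, GPU_OVER_CPU_EFFICIENCY))
```

Under these assumptions the pair reaches 1000 GigaFLOPS, each GPU draws at most 250 W, and the dual-core CPU would sit around 0.1 GigaFLOPS-per-watt.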
The Role of Accelerated Computing in the Multi-core Era20 November 10, 2007
HPC: Remember Attack of the Killer Micros?
• 1/10th the performance, but at 1/100th the cost
• Absolute performance “good enough”
• Productivity greater on a workstation than on a super
[Figure: performance in Mflop/s (log scale, 0.01 to 10000) vs. year (1980 to 1998), with two trend lines. Micros: 8087, 80287, 6881, 80387, R2000, i860, RS6000/540, Alpha, RS6K/590, Alpha. Supers: Cray 1S, Cray X-MP, Cray 2, Cray Y-MP, Cray C90, Cray T90. Chart Source: Gordon Bell and Jim Gray, ISCA 2000]
The Role of Accelerated Computing in the Multi-core Era21 November 10, 2007
History Repeating Itself?
Familiar vector-style programming model
$1K - $5K PCs get amazing computational power via GPU
Traditional “computing” is an order of magnitude behind
The Role of Accelerated Computing in the Multi-core Era22 November 10, 2007
Attack of the Killer GPUs
Attack of the Killer Micros
You just can’t ignore this …
The Role of Accelerated Computing in the Multi-core Era23 November 10, 2007
GPU Performance = End of the CPU? NO!
Amdahl’s Law is Alive and Well...
We need both!
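One way to see why both are needed is Amdahl's Law applied to an accelerator. In the worked example below, the offloadable fraction p = 0.9 and the accelerator speedup s = 20 are assumed values for illustration only; neither comes from the talk.

```latex
S(p, s) = \frac{1}{(1 - p) + \dfrac{p}{s}}, \qquad
S(0.9,\, 20) = \frac{1}{0.1 + 0.045} \approx 6.9, \qquad
\lim_{s \to \infty} S(0.9,\, s) = \frac{1}{0.1} = 10
```

With these numbers, the non-offloaded part accounts for 0.1 / 0.145, or about 69%, of the remaining run time. A faster accelerator buys little more, and further gains depend on how quickly the CPU runs the rest.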
The Role of Accelerated Computing in the Multi-core Era24 November 10, 2007
Accelerated Computing has very broad potential -- A Continuum of Solutions
[Figure: accelerator attach points spanning the non-coherent domain and the coherent domain, with “Torrenza” and “Fusion” labels. Block labels: CPU, NB, chipset, PCI-E™, AMD Opteron™ socket, AMD processor]
• Add-in: PCIe™ accelerator, HTX™ accelerator
• Chipset accelerator
• Socket compatible accelerator
• Package level integration (MCM)
• Chip level integration (SoC)
• Integrated acceleration
The Role of Accelerated Computing in the Multi-core Era25 November 10, 2007
Torrenza: Enabling Partners to Build on the Concept of Accelerated Computing
• Enablement
    – Horizontal technology to open markets
    – I/O, Math, FPGA
• Network Processing
    – Established $B market in network platform
    – Likely migration to server platform
    – Content, Security
• Enterprise Technologies
    – Identified data center opportunities
    – XML, Offload, SMP, Java, SOA, Storage
• Media
    – Highly competitive market in flux
    – Known growth opp.
    – IPTV, Processing, Transcoding
• Telco
    – VoIP, IMS
The Role of Accelerated Computing in the Multi-core Era26 November 10, 2007
Outline
• Important BackgroundA Few High-level TrendsSome Thoughts on SMP and Multi-core Computing
• The Accelerated Computing ImperativeDense Computing: GPUs and GP-GPUsThe broader potential
• A Framework for Accelerated Computing enablementThe Role of ArchitectureThe Emerging Layers of Computation
• Summary
The Role of Accelerated Computing in the Multi-core Era27 November 10, 2007
The Role of Architecture
• Architecture: The contract between layers of Hardware and Software
• Provides formalism and standardization
    – Defines Compatibility
    – Compatibility has been a key enabler in our industry – this will continue
    – History shows that viable products don’t bet on wildly incompatible solutions
• Symbiotic Relationship between Hardware and Software
    – SW is typically the enabler for new HW features or new types of HW
        – Actual results dominated by the weakest link in this relationship
        – SW value chain often values features more than HW optimization
    – Software complexity driven to extreme levels – this can’t continue
• Architecture gives rise to The Emerging Layers of Computation
    – Can we use this to simplify the programming models?
The Role of Accelerated Computing in the Multi-core Era28 November 10, 2007
The Emerging Layers of Computation
Start with an Analogy to the Communications Industry
Computing Model: Hardware, Operating System, Application
Indicators of a bigger picture?
• Browsers
• Web services
• JVM, CLR
• Virtualization
• Intelligent I/O
• Advanced APIs
• Reconfiguration
• Dynamic binary translation
• Fault tolerant “meta apps”
The Role of Accelerated Computing in the Multi-core Era29 November 10, 2007
The Emerging Layers of Computation
[Figure: a layered stack with example technologies called out beside it]
Layers: Physical Layer; Platform Layer; Native Runtime Layer; Network Layer; Network Runtime Layer
Stack elements: RAW Hardware; x86 Compatible Hardware; Devices; Compatible Hardware Platform; Hypervisor (virtual platform); Traditional OS; APIs, Libs; MREs; Applications; Networked Platform; Network Services; Network-aware Applications (web services); Data Center Runtime Environment; Data Center Applications
Examples: VMware; Arch extensions; GPUs; DirectX; Proxied offload; AJAX; Redundant hardware; Error recovery; Microcode; Dynamic translation; apps; SETI@Home
The Role of Accelerated Computing in the Multi-core Era30 November 10, 2007
Lots of Interesting Implications
[Figure: the layer stack from the previous slide (RAW Hardware through Data Center Applications), annotated with the callouts below]
• Accelerated Computing
• Special purpose HW
• New types of Programmable HW
• The Data Center Compute Platform
• Parallel Applications using CMP/SMP
The Role of Accelerated Computing in the Multi-core Era31 November 10, 2007
Summary: The Case for Accelerated Computing
• Traditional “host” offload to dense compute accelerator
    – Use APIs to enable this without heroic programming efforts
    – Proven techniques already in use with DirectX & GPUs today
    – ISA compatibility yields to API and Platform Compatibility
• Many application classes have reasonably common “kernels”
    – Video encoding; Encryption; Data Movement; Java/CLR …
• Broad range of possible accelerator designs & attach points
    – Coherent domain or non-coherent domain
    – Dedicated special-purpose HW or programmable processor
• Lots of Challenges
    – Managing context state
    – Virtualizing the context state
    – Communications/Messaging: “It’s the synchronization, stupid”
    – Memory BW and Data Movement (keep up with computation)
    – New and appropriate APIs
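The idea of API-level offload, where the application calls a named kernel and the platform decides where it runs, might be sketched as follows. This is a hypothetical illustration, not an AMD API: `KernelRegistry`, `register_host`, `register_accelerator`, and the `xor_mask` kernel are invented names, and the "accelerator" here is just another Python function standing in for real hardware.

```python
from typing import Callable, Dict

# A kernel takes a byte buffer and returns a byte buffer.
Kernel = Callable[[bytes], bytes]


class KernelRegistry:
    """Maps a kernel name to a host implementation and an optional accelerated one."""

    def __init__(self) -> None:
        self._host: Dict[str, Kernel] = {}
        self._accel: Dict[str, Kernel] = {}

    def register_host(self, name: str, fn: Kernel) -> None:
        self._host[name] = fn

    def register_accelerator(self, name: str, fn: Kernel) -> None:
        self._accel[name] = fn

    def run(self, name: str, data: bytes) -> bytes:
        # Prefer the accelerator when one is registered; otherwise use the host CPU.
        fn = self._accel.get(name) or self._host.get(name)
        if fn is None:
            raise KeyError(f"no implementation registered for kernel {name!r}")
        return fn(data)


def host_xor_mask(data: bytes) -> bytes:
    """Toy stand-in for an encryption kernel, run on the host CPU."""
    return bytes(b ^ 0x5A for b in data)


if __name__ == "__main__":
    registry = KernelRegistry()
    registry.register_host("xor_mask", host_xor_mask)
    before = registry.run("xor_mask", b"hello")

    # Attaching an accelerator changes where the kernel runs, not the calling code.
    registry.register_accelerator("xor_mask", host_xor_mask)
    after = registry.run("xor_mask", b"hello")
    print(before == after)
```

The application depends only on the kernel name and the buffer contract, which is one reading of "ISA compatibility yields to API and Platform Compatibility". The sketch leaves out the challenges listed on the slide, such as context state, synchronization, and data movement.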
The Role of Accelerated Computing in the Multi-core Era32 November 10, 2007
Thank You !
Questions?
©2007. Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Opteron, and combinations thereof, are trademarks of Advanced Micro Devices, Inc.
Other names are for informational purposes only and may be trademarks of their respective owners.