8/2/2019 Introduction Parallel Heterogeneous Computing Final
Introduction to Parallel and Heterogeneous Computing
Benedict R. Gaster | October, 2010
Agenda
Motivation
A little terminology
Hardware in a heterogeneous world
Software in a heterogeneous world
The Free Lunch is Over
Herb Sutter (2005)
Software can no longer depend on hardware delivering:
Increased clock speed
Execution optimization (e.g., instruction-level parallelism)
Larger caches
How has this been addressed, and how is it being addressed now?
Solution
Parallelism
(lots of it!)
Quick stop to cover a bit of terminology
Definitions
Parallelism
A property of a computation where portions of the calculations are independent of each other, allowing them to be executed at the same time.
For example, consider the following pseudo code:
    float a = E + A;
    float b = E + B;
    float c = E + C;
    float d = E + D;
    float r = a + b + c + d;
The assignments to a, b, c, and d are independent of one another, so they can be run in parallel; only the final sum r depends on all four.
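The pseudocode above can be sketched in C++ as follows. This is a minimal illustration, assuming a hypothetical helper named sum_independent and using std::async from the standard library; it is not taken from the original slides. Starting a task per scalar addition costs far more than the addition itself, so the sketch only shows where the independence and the dependency sit.

```cpp
#include <future>

// Hypothetical helper (not from the slides): computes
// r = (E + A) + (E + B) + (E + C) + (E + D).
float sum_independent(float E, float A, float B, float C, float D) {
    auto add = [E](float x) { return E + x; };

    // The four additions share no data dependencies, so each one is
    // launched as its own asynchronous task and they may overlap in time.
    std::future<float> a = std::async(std::launch::async, add, A);
    std::future<float> b = std::async(std::launch::async, add, B);
    std::future<float> c = std::async(std::launch::async, add, C);
    std::future<float> d = std::async(std::launch::async, add, D);

    // r depends on all four results, so the get() calls act as the
    // synchronization point before the final sum.
    return a.get() + b.get() + c.get() + d.get();
}
```

The only ordering constraint is the last line: everything before it is free to run in any order or at the same time.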
Definitions
Concurrency
A logical programming abstraction used to arbitrate communication between multiple processing entities (like processes or threads).
For example, concurrency can be used to build user interfaces and other asynchronous tasks.
Concurrency is NOT the same as parallelism
It does not preclude running tasks in parallel, but parallel execution is not a necessary component.
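As a rough illustration of concurrency as arbitration of communication, the sketch below passes messages between two threads through a small mailbox. The names Mailbox and handle_events are hypothetical and chosen only for this example. The program is correct whether the two threads run on separate cores or take turns on a single core, which is the distinction the definition draws.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical mailbox: a mutex and a condition variable arbitrate
// communication between a sending thread and a receiving thread.
class Mailbox {
public:
    void send(int msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(msg);
        }
        cv_.notify_one();  // wake a receiver that may be waiting
    }

    int receive() {
        std::unique_lock<std::mutex> lock(m_);
        // Block until a message is available.
        cv_.wait(lock, [this] { return !q_.empty(); });
        int msg = q_.front();
        q_.pop();
        return msg;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> q_;
};

// A stand-in "UI" thread posts three events (1, 2, 3); the calling
// thread handles them as they arrive and returns their sum.
int handle_events() {
    Mailbox box;
    std::thread ui([&box] {
        for (int i = 1; i <= 3; ++i) box.send(i);
    });
    int total = 0;
    for (int i = 0; i < 3; ++i) total += box.receive();
    ui.join();
    return total;
}
```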
Definitions
Heterogeneous Computing
A system comprised of two or more compute engines with significant structural differences
In our case, a low latency x86 CPU and a high throughput Radeon GPU
Fusion
Bringing together two or more components and joining them into a single unified whole
In our case, combining CPUs and GPUs on a single silicon die for higher performance and lower power
Hardware in a heterogeneous world
AMD Balanced Platform Advantage
Delivers optimal performance for a wide range of platform configurations
Serial/Task-Parallel Workloads
Graphics Workloads
Other Highly Parallel Workloads
CPU is ideal for scalar processing
Out-of-order x86 cores with low-latency memory access
Optimized for sequential and branching algorithms
Runs existing applications very well
GPU is ideal for parallel processing
GPU shaders optimized for throughput computing
Ready for emerging workloads
Media processing, simulation, natural UI, etc.
Three Eras of Processor Performance
Single-Core Era
[Chart: Single-thread Performance vs. Time, with a "we are here" marker]
Enabled by: Moore's Law, Voltage Scaling, MicroArchitecture
Constrained by: Power, Complexity
Multi-Core Era
[Chart: Throughput Performance vs. Time (# of Processors), with a "we are here" marker]
Enabled by: Moore's Law, Desire for Throughput, 20 years of SMP arch
Constrained by: Power, Parallel SW availability, Scalability
Heterogeneous Systems Era
[Chart: Targeted Application Performance vs. Time (Data-parallel exploitation), with a "we are here" marker]
Enabled by: Moore's Law, Abundant data parallelism, Power efficient GPUs
Temporarily constrained by: Programming models, Communication overheads
GPU SP ALU Performance
[Chart: single-precision (SP) ALU performance; labeled series: HD4870, HD5870, CPU]
GPU DP ALU Performance
[Chart: double-precision (DP) ALU performance; labeled series: HD4870, HD5870, CPU]
GPU BW Performance expectations over time
[Chart: GPU bandwidth (BW) over time; vertical axis from 0 to 300, units not recoverable; labeled points: HD4870, HD5870]
GPU Computing Efficiency Trend
[Chart: two series, GFLOPS/W and GFLOPS/mm2. Data labels in extraction order: 7.50, 4.56, 4.50, 2.24, 2.21, 0.92, 2.01, 1.06, 1.07, 0.42. Called-out values: 14.47 GFLOPS/W and 7.90 GFLOPS/mm2.]
Fusion APUs: Putting it all together
[Diagram labels]
Axes: Microprocessor Advancement, GPU Advancement
Microprocessor Advancement: Single-Thread Era, Multi-Core Era, Heterogeneous Systems Era
GPU Advancement: Graphics Driver-based programs, OCL/DC Driver-based programs, System-level Programmable
Programmer Accessibility: Unacceptable, Experts Only, Mainstream
Throughput Performance
High Performance Task Parallel Execution
Power-efficient Data Parallel Execution
Heterogeneous Computing: Fusion APU
Why AMD Fusion APUs? A balanced approach is optimal
The GPU is the Game Changer
Enormous parallel computing capacity
Outstanding performance per watt per dollar
Very efficient hardware threading
SIMD architecture well matched to media workloads: video, audio, graphics
Positioned to enable the emergence of immersive media-based experiences
x86 CPU owns the SW Universe
Windows, MacOS and Linux Franchises
Many thousands of applications
Well matched to branchy scalar code
Established programming and memory model
Mature tool chain
Backward compatible for 15 years of applications and OSs
Highly Programmable
Power Efficient
Massive Throughput
Best of both worlds
PC with Discrete GPU
Fusion APU Based PC
The Benefits of Fusion
Unparalleled processing capabilities in mobile form factors
Shared memory for the CPU and GPU
Eliminates copies, increasing performance
Reduces dispatch overhead
Lower latency from the GPU to memory
Power efficient design
Enables architectural innovations between CPU, GPU and the Memory System
Scalable architecture that can target a broad range of platforms from mobile to data center
These machines are being built, but...
Heterogeneous systems are being built, and there is no question that we will build more of them
There are new emerging workloads that contain enough parallelism to use them, but...
This is not enough!
The question then becomes:
How do we program 10, 100, or even >1000 cores?
The future of performance is entirely about software!
AMD Fusion Developer Summit
Find out more about
Fusion APUs;
Programming models for Fusion; and
Much more
June 13-16, 2011
Seattle, Washington, USA
http://sites.amd.com/us/fusion/apu/Pages/fusion-developer-summit.aspx
OpenCL Programming Webinar Series
Designed to help advance your experience in parallel programming, with a focus on OpenCL
Much of what will be taught is useful for parallel programming in general
Beginner Tracks
Introduction to OpenCL
OpenCL Programming in Detail
Using OpenCL C Language
Advanced Tracks
Device Fission Extension for OpenCL
Optimization Techniques I
Optimization Techniques II
http://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspx
Software in a heterogeneous world
What lies ahead?
Guy Steele (2009)
The Future Is Parallel: What's a Programmer to Do?
Million dollar question with many (many) answers!
[Diagram: programming models arranged between two poles, task parallelism and data parallelism]
OpenMP, MPI, OpenCL, Java Threads, Task Parallel Library, CUDA, Concurrent ML, Threading Building Blocks, Cilk, POSIX, Win32 Threads, Kite, Brook+, Accelerator, X10, Fortress
Different types of parallelism
Braided parallelism
Task-decomposition
Data-decomposition
Task-decomposition
Divides the problem by type of task to be done
For example, in modern games:
Computations are organized as tasks/jobs
Some may be fine-grained (short-running)
Others long-running and data-parallel
Tasking runtime must account for:
Task dependencies
Synchronization
Load balancing
Etc
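A minimal sketch of task decomposition with a dependency, assuming three hypothetical per-frame jobs (update_physics, update_ai, render) invented for illustration and using std::async rather than any particular game engine's tasking runtime. The two update jobs are independent tasks; render depends on both, so it waits on their futures.

```cpp
#include <future>

// Hypothetical per-frame jobs; the arithmetic is a placeholder for real work.
int update_physics(int frame) { return frame * 2; }
int update_ai(int frame) { return frame + 1; }
int render(int physics, int ai) { return physics + ai; }

int run_frame(int frame) {
    // Physics and AI are different kinds of work with no dependency
    // between them, so they are submitted as separate tasks.
    std::future<int> physics =
        std::async(std::launch::async, update_physics, frame);
    std::future<int> ai =
        std::async(std::launch::async, update_ai, frame);

    // render depends on both results; get() blocks until each task has
    // finished, which is the synchronization the runtime must provide.
    return render(physics.get(), ai.get());
}
```

Here the dependency graph is written by hand. A real tasking runtime would also schedule and load-balance many such tasks across a fixed pool of worker threads.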
Load balancing - work stealing
Internally, most tasking runtimes use a work-stealing implementation
Work stealing has provably good locality and work distribution properties
Seminal reference: Cilk: an efficient multithreaded runtime system. Blumofe et al. SIGPLAN Notices, 1995.
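The core data structure behind work stealing can be sketched as a per-worker double-ended queue. The class below is a simplified illustration under stated assumptions: the name WorkQueue is hypothetical, tasks are plain integers, and a single mutex guards the deque, whereas production runtimes often use lock-free deques. It is not the algorithm from the Cilk paper, only the same access pattern: the owner works at one end and thieves take from the other.

```cpp
#include <deque>
#include <mutex>

// Hypothetical per-worker queue. The owning worker pushes and pops at the
// back (newest task first, which tends to keep recently touched data hot);
// an idle worker steals from the front (oldest task).
class WorkQueue {
public:
    void push(int task) {
        std::lock_guard<std::mutex> lock(m_);
        tasks_.push_back(task);
    }

    // Called by the owning worker. Returns false if the queue is empty.
    bool pop(int& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (tasks_.empty()) return false;
        out = tasks_.back();
        tasks_.pop_back();
        return true;
    }

    // Called by another worker that has run out of work.
    bool steal(int& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (tasks_.empty()) return false;
        out = tasks_.front();
        tasks_.pop_front();
        return true;
    }

private:
    std::mutex m_;
    std::deque<int> tasks_;
};
```

Because owner and thief operate on opposite ends, they rarely contend for the same task, and load balances itself: only idle workers pay the cost of stealing.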
Popular task runtimes (CPU only)
Unmanaged C/C++
Intel's Threading Building Blocks
Apple's Grand Central Dispatch
OpenMP (Parallelism should not be tacked on!)
Managed languages
Microsoft's Task Parallel Library for .NET 4
Different OS, different options!
Data-decomposition
Divides the problem into elements to be processed, assigning a subset of elements to each parallel worker
For example, in modern games:
Particle systems
1,000, maybe 100,000, even millions of particles
Forces and actions computed independently (locality can be used to describe interaction)
Data-parallel execution must account for:
Local communication
Synchronization
Etc.
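A minimal sketch of data decomposition for a particle system, assuming a hypothetical integrate function that advances one-dimensional positions by velocity times a time step. The element range is split into contiguous chunks and each chunk is handed to one std::thread worker; the sketch assumes vel holds at least as many elements as pos. On a GPU the same per-element body would typically become a kernel, which is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical particle update: pos[i] += vel[i] * dt for every particle.
// Assumes vel.size() >= pos.size().
void integrate(std::vector<float>& pos, const std::vector<float>& vel,
               float dt, unsigned workers) {
    const std::size_t n = pos.size();
    if (workers == 0) workers = 1;
    // Ceiling division so that every element lands in exactly one chunk.
    const std::size_t chunk = (n + workers - 1) / workers;

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) continue;  // nothing left for this worker
        // Each worker touches a disjoint index range, so no locking is
        // needed inside the loop.
        pool.emplace_back([&pos, &vel, dt, begin, end] {
            for (std::size_t i = begin; i < end; ++i) pos[i] += vel[i] * dt;
        });
    }
    // Synchronization: wait for every chunk before the caller reads pos.
    for (std::thread& t : pool) t.join();
}
```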
Popular data-parallel languages
Unmanaged C/C++
Khronos' Open Computing Language (OpenCL) (CPU + GPU)
NVIDIA's CUDA (GPU only)
OpenMP (CPU only): Parallelism should not be tacked on!
Managed languages
Microsoft's Accelerator II for .NET 4 (CPU + GPU via DX9)
AMD's Aparapi (A PARallel API) for Java (CPU + GPU via OpenCL)
Different OS, different options!
Task and data-parallelism together
Reference: Aaron Lefohn. Programming Larrabee: Beyond Data Parallelism. Beyond Programmable Shading Course, SIGGRAPH 2008.
Braided Parallelism
Job graph from DICE's Battlefield: Bad Company 2
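One way to picture braided parallelism is as task parallelism on the outside with data parallelism inside each task. The sketch below is an assumption-laden illustration, not the job system shown on the slide: the function names scale_all and braided and the two workloads (particles, audio) are hypothetical.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Data-parallel inner layer: multiply every element by k, with the index
// range split between a helper thread and the calling thread.
void scale_all(std::vector<int>& v, int k) {
    const std::size_t mid = v.size() / 2;
    std::thread lower([&v, k, mid] {
        for (std::size_t i = 0; i < mid; ++i) v[i] *= k;
    });
    for (std::size_t i = mid; i < v.size(); ++i) v[i] *= k;
    lower.join();
}

// Task-parallel outer layer: two unrelated jobs run as separate tasks,
// and each job is itself data-parallel. Returns the sum of both results.
int braided(std::vector<int>& particles, std::vector<int>& audio) {
    std::future<int> job_a = std::async(std::launch::async, [&particles] {
        scale_all(particles, 2);
        return std::accumulate(particles.begin(), particles.end(), 0);
    });
    std::future<int> job_b = std::async(std::launch::async, [&audio] {
        scale_all(audio, 3);
        return std::accumulate(audio.begin(), audio.end(), 0);
    });
    return job_a.get() + job_b.get();
}
```

In a heterogeneous system the outer tasks would map naturally to CPU cores, while the inner data-parallel loops are candidates for the GPU.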
Fusion APUs: Putting it all together
[Diagram labels]
Axes: Microprocessor Advancement, GPU Advancement
Microprocessor Advancement: Single-Thread Era, Multi-Core Era, Heterogeneous Systems Era
GPU Advancement: Graphics Driver-based programs, OCL/DC Driver-based programs, System-level Programmable
Programmer Accessibility: Unacceptable, Experts Only, Mainstream
Throughput Performance
High Performance Task Parallel Execution
Power-efficient Data Parallel Execution
Heterogeneous Computing: Fusion APU
Braided Parallelism: a natural programming model for heterogeneous computing
Conclusion and Questions
Trademark Attribution
AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners.
© 2009 Advanced Micro Devices, Inc. All rights reserved.