An Execution Model for Heterogeneous Multicore Architectures
Gregory Diamos, Andrew Kerr, and Sudhakar Yalamanchili
Computer Architecture and Systems Laboratory
Center for Experimental Research in Computer Systems
School of Electrical and Computer Engineering
Georgia Institute of Technology
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY
Software Challenges of Heterogeneity
Programming Model
Execution Model
Portability
Performance
System Space
Figure: the system space, plotted along two axes.
System Size and Configuration: from a single GPU with a multicore CPU up to multi-node, multi-GPU, multicore CPU systems.
Level of Abstraction: Runtime Translation of Data-Parallel IR (Ocelot); Runtime Execution Model (Harmony).
Scalable Portable Execution – Harmony Runtime
Figure: three CPUs and three accelerators (ACC, each with local memory, cache, DMA, and FIFO) alongside memory on a shared network (e.g., HyperTransport, QPI, PCIe); the Harmony run-time manages chunks, each with its own inputs and outputs, across these resources.
Transparent scheduling and execution management of chunks
Binary compatibility across system sizes
Cap Model 3:
    readInputs();
    computeInvariants();
    for all chunks {
        simulateChunk();
    }
    generateResults();
Minimize or avoid retuning and porting applications as accelerators are added
Advanced optimizations: speculation, performance prediction, kernel fusion
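The Cap Model 3 loop above can be read as chunk-level work that a runtime places on devices without changing the application. The following is a minimal Python sketch under stated assumptions, not Harmony's API: the device names, the per-device `speed` numbers standing in for a performance-prediction model, and the greedy earliest-finish placement are all hypothetical, and threads stand in for accelerator launches.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple


def schedule_chunks(work: Sequence[float],
                    devices: Sequence[Tuple[str, float]]) -> Dict[str, List[int]]:
    """Greedy earliest-finish-time placement of chunks on devices.

    work[i] is the predicted work of chunk i; each device is (name, speed),
    where speed is a hypothetical relative throughput standing in for a
    performance-prediction model.
    """
    busy_until = {name: 0.0 for name, _ in devices}
    placement: Dict[str, List[int]] = {name: [] for name, _ in devices}
    for chunk_id, units in enumerate(work):
        # Pick the device on which this chunk is predicted to finish first.
        name, speed = min(devices, key=lambda d: busy_until[d[0]] + units / d[1])
        busy_until[name] += units / speed
        placement[name].append(chunk_id)
    return placement


def simulate_chunk(chunk: Sequence[float], mean: float) -> float:
    # Per-chunk work (illustrative): sum of squared deviations from the invariant.
    return sum((x - mean) ** 2 for x in chunk)


def run_model(inputs: Sequence[float], n_chunks: int,
              devices: Sequence[Tuple[str, float]]) -> float:
    mean = sum(inputs) / len(inputs)                  # computeInvariants()
    size = -(-len(inputs) // n_chunks)                # ceiling division
    chunks = [inputs[i:i + size] for i in range(0, len(inputs), size)]
    placement = schedule_chunks([float(len(c)) for c in chunks], devices)

    def run_on_device(chunk_ids: List[int]) -> Dict[int, float]:
        # Stand-in for launching the assigned chunks on one device.
        return {cid: simulate_chunk(chunks[cid], mean) for cid in chunk_ids}

    partials: Dict[int, float] = {}
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        for result in pool.map(run_on_device, placement.values()):
            partials.update(result)
    # generateResults(): combine per-chunk results in chunk order.
    return sum(partials[cid] for cid in sorted(partials)) / len(inputs)
```

For example, `run_model([1.0, 2.0, 3.0, 4.0], 2, [("cpu", 1.0), ("gpu", 4.0)])` computes the population variance of the inputs while the application code never names a device. Ties go to the first device listed, because `min` returns the first minimal element; a real runtime would refine its predictions from measured times rather than rely on fixed speeds.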
Emerging Environment
Figure: components of the emerging software environment.
Language front ends (CUDA/OpenCL, Datalog), a kernel IR, Ocelot, a CUDA JIT, an LLVM interface, an emulator, a GPGPU simulator, and the Harmony run time; supported ISAs: MIPS, SPARC, x86, etc. Also credited: Prof. H. Kim.
Component status (the figure's assignment of each status to a component is not recoverable):
• Single node/multi-GPU
• Test and debug
• In progress (Fall 2009)
• Summer 2009, with Prof. Nate Clark
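The value of a shared kernel IR is that one kernel can be handed to several back ends. The toy sketch below is not Ocelot's IR or interface: it assumes a hypothetical three-address, element-wise IR (instructions of the form `(dest, op, src_a, src_b)`, with thread inputs in registers `in0`, `in1`, ...) and shows an emulator back end that interprets the IR once per thread next to a translation back end that generates and compiles Python source once, as a stand-in for a JIT.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical IR instruction: (dest, op, src_a, src_b).
Instr = Tuple[str, str, str, str]

OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "sub": lambda a, b: a - b,
}
SYMBOLS = {"add": "+", "mul": "*", "sub": "-"}


def emulate(kernel: List[Instr], out: str,
            inputs: Sequence[Sequence[float]]) -> List[float]:
    """Emulator back end: interpret the IR once for every thread."""
    results = []
    for tid in range(len(inputs[0])):
        regs = {f"in{i}": arr[tid] for i, arr in enumerate(inputs)}
        for dest, op, a, b in kernel:
            regs[dest] = OPS[op](regs[a], regs[b])
        results.append(regs[out])
    return results


def translate(kernel: List[Instr], out: str,
              n_inputs: int) -> Callable[..., List[float]]:
    """Translation back end: emit Python source for one thread, compile once.

    Register names are assumed to be valid Python identifiers.
    """
    params = ", ".join(f"in{i}" for i in range(n_inputs))
    lines = [f"def thread({params}):"]
    lines += [f"    {d} = {a} {SYMBOLS[op]} {b}" for d, op, a, b in kernel]
    lines.append(f"    return {out}")
    namespace: Dict[str, object] = {}
    exec("\n".join(lines), namespace)
    thread = namespace["thread"]
    # Launch one "thread" per element of the input arrays.
    return lambda *arrays: [thread(*xs) for xs in zip(*arrays)]
```

Both back ends should produce identical results for the same kernel, e.g. for `out = in0 * in1 + in0`. Because `exec` runs arbitrary source, this pattern is only acceptable for trusted, internally generated IR.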
Emerging HVM Platform Architecture
With K. Schwan and A. Gavrilovska
Problem Scaling – Risk Analysis Application
With the latest CPUs (2x faster) and GPUs (4x faster), the GPU advantage should grow by 2x.
Figure: measured execution times; GPU interactive overhead dominates.
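The expected 2x follows from the ratio of the two speedups. The second line assumes, purely for illustration, that GPU time splits into a fixed interaction overhead $O$ that does not speed up and a compute part $C$ that does; under that assumption a dominant overhead erases the expected gain.

```latex
S = \frac{T_{\mathrm{CPU}}}{T_{\mathrm{GPU}}}, \qquad
S' = \frac{T_{\mathrm{CPU}}/2}{T_{\mathrm{GPU}}/4} = 2S

T_{\mathrm{GPU}} = O + C \;\Rightarrow\;
S' = \frac{T_{\mathrm{CPU}}/2}{O + C/4} = \frac{T_{\mathrm{CPU}}}{2O + C/2}
   < \frac{T_{\mathrm{CPU}}}{(O + C)/2} = 2S \quad (O > 0),
\qquad \lim_{C \to 0} S' = \frac{S}{2}
```

In the limit where overhead dominates, the newer hardware would actually halve the GPU advantage instead of doubling it.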
Other Applications
GPU Compilation Flow
Figure: an abstract syntax tree of Datalog clauses is mapped onto an execution group of GPU execution units (EUs); predicates (P) become data structures, execution units become compute kernels, and the runtime places the kernels on GPU and CPU cores.
Mapping steps:
• Clauses to execution units
• Execution units to algorithms (kernels)
• Predicates to data structures
• Runtime mapping of kernels to cores
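The four mapping steps can be illustrated on the classic transitive-closure program. This is a hypothetical sketch, not the compiler described here: predicates become Python sets of tuples, each clause becomes an execution unit built from join and project kernels evaluated semi-naively, and kernel launches are assigned to cores round-robin as a placeholder for the runtime mapping.

```python
from typing import Dict, List, Set, Tuple

# Predicates to data structures: each predicate is a set of tuples.
Relation = Set[Tuple[int, ...]]


def join(left: Relation, right: Relation) -> Relation:
    """Join kernel: pair rows where left[1] == right[0] (a simple hash join)."""
    index: Dict[int, List[int]] = {}
    for a, b in right:
        index.setdefault(a, []).append(b)
    return {(x, y, z) for x, y in left for z in index.get(y, [])}


def project(rows: Relation, cols: Tuple[int, ...]) -> Relation:
    """Project kernel: keep only the listed columns."""
    return {tuple(row[c] for c in cols) for row in rows}


def evaluate_path(edge: Relation,
                  cores: List[str]) -> Tuple[Relation, List[Tuple[str, str]]]:
    """Semi-naive evaluation of
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- edge(X, Y), path(Y, Z).
    Returns the path relation and a (kernel, core) launch trace."""
    trace: List[Tuple[str, str]] = []

    def launch(kernel: str) -> None:
        # Hypothetical runtime mapping: assign launches to cores round-robin.
        trace.append((kernel, cores[len(trace) % len(cores)]))

    # Execution unit for clause 1: copy edge into path.
    launch("copy")
    path = set(edge)
    delta = set(path)
    # Execution unit for clause 2: join edge only with newly derived facts.
    while delta:
        launch("join")
        joined = join(edge, delta)
        launch("project")
        new = project(joined, (0, 2)) - path
        path |= new
        delta = new
    return path, trace
```

On the chain `1 -> 2 -> 3 -> 4`, the loop runs until no new facts appear and yields all six reachable pairs. Semi-naive evaluation is used so that each iteration joins only against the facts derived in the previous one rather than recomputing the whole relation.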