An Execution Model for Heterogeneous Multicore Architectures

Gregory Diamos, Andrew Kerr, and Sudhakar Yalamanchili

Computer Architecture and Systems Laboratory
Center for Experimental Research in Computer Systems

School of Electrical and Computer Engineering
Georgia Institute of Technology


Software Challenges of Heterogeneity

Programming Model

Execution Model

Portability

Performance


System Space

System Size and Configuration

Level of Abstraction

Multi-node, Multi-GPU, Multicore CPU

Single GPU, Multicore CPU

Runtime Execution Model (Harmony)

Runtime Translation of Data-Parallel IR (Ocelot)


Scalable Portable Execution – Harmony Runtime

Accelerators (ACC), each with local memory, cache, DMA, and FIFO

Network (e.g., HyperTransport, QPI, PCIe)

CPUs

Harmony Run-time

Chunks with inputs and outputs

Memory

Transparent scheduling, execution management of chunks

Binary compatibility across system sizes

Cap Model 3

    readInputs();
    computeInvariants();
    for all chunks {
        simulateChunk();
    }
    generateResults();

Minimize/avoid retuning and porting applications as you add accelerators

Advanced optimizations: speculation, performance prediction, kernel fusion
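
The Cap Model 3 loop on this slide can be read as a set of independent chunks that a run-time schedules onto whatever workers exist. The Python sketch below is only an illustration under assumptions: the function names mirror the slide's pseudocode, but their bodies, the `run` helper, and the `num_workers` parameter are hypothetical, and a thread pool stands in for Harmony's scheduling across CPUs and accelerators.

```python
from concurrent.futures import ThreadPoolExecutor

def read_inputs():
    # Hypothetical input: eight chunks of four numbers each.
    return [list(range(i, i + 4)) for i in range(0, 32, 4)]

def compute_invariants(chunks):
    # Hypothetical invariant shared by every chunk: a scale factor.
    return {"scale": 2}

def simulate_chunk(chunk, invariants):
    # A chunk reads only its own inputs and the invariants,
    # so chunks can run in any order on any worker.
    return sum(x * invariants["scale"] for x in chunk)

def generate_results(partials):
    return sum(partials)

def run(num_workers):
    chunks = read_inputs()
    invariants = compute_invariants(chunks)
    # The pool stands in for the run-time: the application never
    # names a device, only the work to be done per chunk.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(lambda c: simulate_chunk(c, invariants), chunks))
    return generate_results(partials)

if __name__ == "__main__":
    # Same program, different "system sizes", same answer.
    print(run(1), run(4))
```

Changing `num_workers` changes only how the chunks are distributed, not the application code, which is the property the slide describes as binary compatibility across system sizes.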




Emerging Environment

Run Time (Harmony)

CUDA JIT, LLVM I/F, Emulator

GPGPU Simulator

Supported ISAs (MIPS, SPARC, x86, etc.)

Language Front End

Kernel IR

Language Front End

Ocelot

Prof. H. Kim

Status: Single node/multi-GPU

Status: Test and Debug

Status: In progress (Fall 2009)

Status: Summer 2009, with Prof. Nate Clark

Datalog

CUDA/OpenCL


Emerging HVM Platform Architecture

With K. Schwan and A. Gavrilovska


Problem Scaling – Risk Analysis Application

With latest CPUs (2x faster) and GPUs (4x faster), GPU advantage should grow by 2x

Measured execution times

GPU interactive overhead dominates
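
The expected 2x growth and the effect of overhead can be put side by side with a small calculation. This is a sketch with made-up numbers, not the measured data from the talk: `cpu_time`, `gpu_compute`, and `overhead` are hypothetical, and the overhead is assumed not to shrink when the hardware gets faster.

```python
def gpu_advantage(cpu_time, gpu_compute, overhead):
    # Advantage = CPU time divided by total GPU time (compute + fixed overhead).
    return cpu_time / (gpu_compute + overhead)

# Hypothetical baseline timings in seconds.
cpu_time, gpu_compute = 8.0, 1.0

def growth(overhead):
    old = gpu_advantage(cpu_time, gpu_compute, overhead)
    # New hardware: CPU 2x faster, GPU compute 4x faster, overhead unchanged.
    new = gpu_advantage(cpu_time / 2, gpu_compute / 4, overhead)
    return new / old

if __name__ == "__main__":
    print(growth(0.0))  # no overhead: the advantage doubles
    print(growth(1.0))  # overhead equal to the old compute time: the advantage shrinks
```

With zero overhead the ratio is 4/2 = 2, matching the slide's expectation. Once the fixed overhead is comparable to the compute time, the faster GPU contributes little to the total and the advantage no longer grows.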


Other Applications


Execution Group

GPU Compilation Flow

Abstract Syntax Tree (Datalog Clauses)

GPU (EU), GPU (EU), GPU (EU)

GPU Core, CPU Core

Data Structures, Compute Kernels

Runtime

Clauses to Execution Units

Execution Units to Algorithms (Kernels)

Predicates to Data Structures

Runtime Mapping of Kernels to Cores
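
As a rough illustration of the mappings listed above, the sketch below evaluates one Datalog clause by hand. It is a toy built on assumptions, not the compiler from the talk: the clause `path(X, Z) :- edge(X, Y), edge(Y, Z)`, the set-of-tuples representation for predicates, and the `join_kernel` function are all hypothetical choices made for this example.

```python
def join_kernel(left, right):
    # Stand-in for a compute kernel: join two binary relations on
    # the second column of `left` and the first column of `right`.
    index = {}
    for y, z in right:
        index.setdefault(y, []).append(z)
    return {(x, z) for x, y in left for z in index.get(y, [])}

# Predicate -> data structure: a relation stored as a set of tuples.
edge = {(1, 2), (2, 3), (3, 4)}

# Clause -> execution unit -> kernel:
# path(X, Z) :- edge(X, Y), edge(Y, Z).
path = join_kernel(edge, edge)

if __name__ == "__main__":
    print(sorted(path))
```

In this reading, each clause becomes one unit of work built from relational kernels such as the join, and a run-time would then be free to place each kernel on a GPU or CPU core.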