December 4, 2014 1 http://legion.stanford.edu

Alex Aiken

Legion Overview


Logistics

• Wireless
  - Choose “Stanford Visitor” network, follow directions
• Bootcamp slides @ legion.stanford.edu
• Thursday
  - Extending the schedule by 15 minutes
  - Parking
  - Lunch
  - Dinner
• Friday
  - Different building: Gates 505


Team

• Alex Aiken
• Mike Bauer (Nvidia)
• Zhihao Jia
• Wonchan Lee
• Elliott Slaughter
• Sean Treichler

• Charles Ferenbaugh
• Sam Gutierrez
• Pat McCormick


Legion

• A programming model for heterogeneous, distributed machines
• Heterogeneous
  - Mixed CPUs and GPUs
• Distributed
  - Large spread, and variability, of communication latencies
  - Caches, RAM, NUMA, network, …


One Slide History

• Started in 2011
• First version in 2012
• S3D implementation in 2013
  - Collaboration with Jackie Chen’s group at Sandia
  - Part of the ExaCT Center
  - Drove many feature changes/additions
  - And many optimizations/improvements
• Emphasis on scaling up in 2014
  - S3D on 8,000 Titan nodes


Legion S3D Heptane Performance

• 1.73X - 2.85X faster between 1024 and 8192 nodes

[Figure: throughput per node (points/s, 0 to 200000) versus number of nodes (1 to 8192). Series: Titan Legion 48³, 64³, 96³; Keeneland Legion 48³, 64³, 96³; Titan OpenACC 48³, 64³]


Bootcamp Focus

• Writing Legion programs
  - Different from the academic papers
  - Cover many pragmatic, usability aspects
• This morning: The programming model
  - Tasks, regions, mapping
• This afternoon: Everything else
  - Structuring applications
  - Debugging & profiling
• Tomorrow: Working with application groups


Philosophy

• Designed to be a real programming system
• Good abstractions, clear semantics
• But can also “open the hood”
  - Ways to drop down to lower levels of abstraction
  - Within the programming model


Example Code


First Point

• Legion has sequential semantics
  - Easy to reason about
  - But see discussion of advanced features this afternoon
• Not like
  - MPI
  - OpenACC
  - CUDA


Second Point

• A programming model
  - Embedded in C++
  - But see discussion of future Legion compiler later today


Third Point

• A runtime system
  - All decisions are made dynamically
  - Again, see discussion of Legion compiler …


Tasks

• A task
  - Is the unit of parallel computation in Legion
  - Takes regions (typed collections) as arguments
  - Can launch subtasks


Task Tree

• Legion programs can launch arbitrary trees of tasks
• By default, execute in the order launched
• Runtime system automatically identifies parallel tasks

[Figure: a tree of tasks (T), with a root task, three subtasks, and four tasks at the next level]


Regions: Index & Field Spaces

[Figure: an index space (1, 2, 3, . . ., N) shown together with a field space]


Regions

• Two Dimensions
• Unbounded set of rows
• Bounded set of columns
  - Fields
• Tasks declare
  - Which fields they use
  - And how they use them
• Regions can be partitioned

[Figure: a region of circuit nodes drawn as a table. Each row is a Node; the columns are the fields Voltage, Capac., Induct., Charge]
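
The rows-by-fields picture of a region, and the idea of partitioning it, can be sketched in a few lines of C++. This is a toy model under stated assumptions and not the Legion API: `ToyRegion`, `ToySubregion`, and `partition_equal` are hypothetical names, the number of rows is fixed here for simplicity (the slide describes an unbounded set of rows), and the partition is a plain split into contiguous blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are NOT the Legion API.
// A region is modeled as rows (an index space) by named columns (fields).
struct ToyRegion {
    std::size_t num_rows;
    std::map<std::string, std::vector<double>> fields;  // one column per field

    ToyRegion(std::size_t rows, const std::vector<std::string>& field_names)
        : num_rows(rows) {
        for (const auto& name : field_names) {
            fields[name] = std::vector<double>(rows, 0.0);
        }
    }
};

// A subregion is a half-open row range [begin, end) of its parent region.
struct ToySubregion {
    std::size_t begin;
    std::size_t end;
};

// Split the rows into `pieces` contiguous, disjoint, nearly equal blocks.
std::vector<ToySubregion> partition_equal(const ToyRegion& region,
                                          std::size_t pieces) {
    assert(pieces > 0);
    std::vector<ToySubregion> result;
    const std::size_t base = region.num_rows / pieces;
    const std::size_t extra = region.num_rows % pieces;  // first blocks get +1
    std::size_t start = 0;
    for (std::size_t i = 0; i < pieces; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        result.push_back(ToySubregion{start, start + len});
        start += len;
    }
    return result;
}
```

For example, `ToyRegion nodes(10, {"voltage", "capacitance", "inductance", "charge"})` followed by `partition_equal(nodes, 4)` gives four disjoint row ranges that together cover all ten rows.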


Partitioning

[Figure: region tree with root N and subregions S and P]


Partitioning

[Figure: region tree with root N, subregions S and P, and pieces p1 … pn, s1 … sn, g1 … gn]


Organize Into Pieces

[Figure: region tree with root N, subregions S and P, and pieces p1 … pn, s1 … sn, g1 … gn]


Embedded in C++

• Can write any C++ code within a task
  - Local state, pointers, etc.
  - Must follow discipline when using Legion API
• Regions are first class
  - Can be passed as arguments, stored in data structures


Populating Regions

• Can’t read/update a region without an instance
  - Instances hold a valid current copy of the data


Populating Regions

• To read/update a region, need an accessor
  - A handle to reference, or iterate through, elements


Regions: Privileges & Coherence


Back to the Simulation

One task per circuit piece

Read/Write on wires pieces

Read Only on everything else
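
Declared privileges like these are what let a runtime find parallel tasks without changing the sequential meaning of the program. The sketch below is a simplified illustration, not Legion's actual analysis and not the Legion API: `Privilege`, `RegionUse`, `ToyTask`, `interferes`, and `make_piece_task` are hypothetical names, regions are identified by plain strings, and differently named regions are assumed to be disjoint.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: hypothetical types, NOT the Legion API.
enum class Privilege { ReadOnly, ReadWrite };

// One declared use of a (sub)region by a task.
struct RegionUse {
    std::string region;  // name of the region or subregion
    Privilege privilege;
};

struct ToyTask {
    std::string name;
    std::vector<RegionUse> uses;
};

// Two tasks interfere if they name the same region and at least one of them
// may write it. Assumption: regions with different names do not overlap.
bool interferes(const ToyTask& a, const ToyTask& b) {
    for (const auto& ua : a.uses) {
        for (const auto& ub : b.uses) {
            const bool same_region = (ua.region == ub.region);
            const bool any_write = ua.privilege == Privilege::ReadWrite ||
                                   ub.privilege == Privilege::ReadWrite;
            if (same_region && any_write) return true;
        }
    }
    return false;
}

// One task per circuit piece: read/write on its own wires piece,
// read-only on a shared "nodes" region (the region names are illustrative).
ToyTask make_piece_task(int piece) {
    const std::string id = std::to_string(piece);
    ToyTask task;
    task.name = "piece_task_" + id;
    task.uses.push_back(RegionUse{"wires_" + id, Privilege::ReadWrite});
    task.uses.push_back(RegionUse{"nodes", Privilege::ReadOnly});
    return task;
}
```

Under these assumptions, `interferes(make_piece_task(0), make_piece_task(1))` is false, so the two piece tasks could run in parallel, while any task that writes `"nodes"` would interfere with both and would have to respect launch order.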


The Crux

• Crucial design decisions in a Legion program are:
• What are the regions?
  - How are the regions partitioned into subregions?
• What are the tasks?
  - How are the tasks decomposed into subtasks?
• Often tension between the two
  - These decisions drive the program’s design


Summary

• The programmer
  - Describes the structure of the program’s data
    · Regions
  - The tasks that operate on that data
• The Legion implementation
  - Guarantees tasks appear to execute in sequential order
  - Ensures tasks have valid versions of their regions


Mapping Interface

• Programmer selects:
  - Where tasks run
  - Where regions are placed
• Mapping computed dynamically
• Decouple correctness from performance

[Figure: tasks t1 to t5 and regions rc, rw (rw1, rw2), rn (rn1, rn2) mapped onto a machine with x86 cores, a CUDA processor, caches ($), NUMA memories, FB, and DRAM]


Mapping

• Mapper interface = callback interface
  - Legion runtime calls user-supplied methods
  - Can do arbitrary computation to make decisions
  - But often very simple

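The callback style of the mapper interface can be illustrated with a small, self-contained C++ sketch. It is a toy under stated assumptions and does not reproduce the Legion mapper API: `ProcKind`, `TaskInfo`, `ToyMapper`, `PreferGpuMapper`, `select_processor`, and `map_tasks` are all hypothetical names, and `map_tasks` merely stands in for the runtime calling user-supplied methods.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: hypothetical types, NOT the Legion mapper API.
enum class ProcKind { CPU, GPU };

struct TaskInfo {
    std::string name;
    bool has_gpu_variant;  // assumption: the task may or may not have GPU code
};

// The "runtime" calls these methods back whenever it needs a decision.
class ToyMapper {
public:
    virtual ~ToyMapper() = default;
    virtual ProcKind select_processor(const TaskInfo& task) = 0;
};

// A very simple policy: use the GPU whenever a GPU variant exists.
class PreferGpuMapper : public ToyMapper {
public:
    ProcKind select_processor(const TaskInfo& task) override {
        return task.has_gpu_variant ? ProcKind::GPU : ProcKind::CPU;
    }
};

// Stand-in for the runtime: ask the mapper where each task should run.
std::vector<ProcKind> map_tasks(ToyMapper& mapper,
                                const std::vector<TaskInfo>& tasks) {
    std::vector<ProcKind> placement;
    for (const auto& task : tasks) {
        placement.push_back(mapper.select_processor(task));
    }
    return placement;
}
```

Swapping `PreferGpuMapper` for another `ToyMapper` subclass changes where tasks are placed but not what they compute, which is the sense in which mapping decouples correctness from performance.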

The Crux, Revisited

• Crucial design decisions in a Legion program are
  - What are the regions?
  - What are the tasks?
• In particular, mapping decisions depend on design of the regions and tasks
  - Not the other way around


Debugging


LegionProf (Heptane 48³)

[Figure: LegionProf profile. Labels: 4 AMD Interlagos integer cores for Legion runtime; 8 AMD Interlagos FP cores for application; NVIDIA Kepler K20; dynamic analysis for (rhsf+2); clean-up/meta tasks]


Questions?
