Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
December 4, 2014 1 http://legion.stanford.edu
Alex Aiken
Legion Overview
December 4, 2014 2 http://legion.stanford.edu
Logistics
Wireless Choose “Stanford Visitor” network, follow directions Bootcamp slides @ legion.stanford.edu
Thursday
Extending the schedule by 15 minutes Parking Lunch Dinner
Friday Different building: Gates 505
December 4, 2014 3 http://legion.stanford.edu
Team
Alex Aiken Mike Bauer (Nvidia) Zhihao Jia Wonchan Lee Elliott Slaughter Sean Treichler
Charles Ferenbaugh Sam Gutierrez Pat McCormick
December 4, 2014 4 http://legion.stanford.edu
Legion
A programming model for heterogeneous, distributed machines
Heterogeneous Mixed CPUs and GPUs
Distributed Large spread, and variability, of communication latencies Caches, RAM, NUMA, network, …
December 4, 2014 5 http://legion.stanford.edu
One Slide History
Started in 2011
First version in 2012
S3D implementation in 2013 Collaboration with Jackie Chen’s group at Sandia Part of the ExaCT Center Drove many feature changes/additions And many optimizations/improvements
Emphasis on scaling up in 2014 S3D on 8,000 Titan nodes
December 4, 2014 6 http://legion.stanford.edu
Legion S3D Heptane Performance
1.73X - 2.85X faster between 1024 and 8192 nodes
1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192Nodes
0
50000
100000
150000
200000
Thro
ughp
utPe
rNod
e(P
oint
s/s)
Titan Legion 483
Titan Legion 643
Titan Legion 963
Keeneland Legion 483
Keeneland Legion 643
Keeneland Legion 963
Titan OpenACC 483
Titan OpenACC 643
December 4, 2014 7 http://legion.stanford.edu
Bootcamp Focus
Writing Legion programs Different from the academic papers Cover many pragmatic, usability aspects
This morning: The programming model
Tasks, regions, mapping
This afternoon: Everything else Structuring applications Debugging & profiling
Tomorrow: Working with application groups
December 4, 2014 8 http://legion.stanford.edu
Philosophy
Designed to be a real programming system
Good abstractions, clear semantics
But can also “open the hood” Ways to drop down to lower-levels of abstraction Within the programming model
December 4, 2014 9 http://legion.stanford.edu
Example Code
December 4, 2014 10 http://legion.stanford.edu
First Point
Legion has a sequential semantics Easy to reason about But see discussion of advanced features this afternoon
Not like MPI OpenACC CUDA
December 4, 2014 11 http://legion.stanford.edu
Second Point
A programming model embedded in C++ but see discussion of future Legion compiler later today
December 4, 2014 12 http://legion.stanford.edu
Third Point
A runtime system All decisions are made dynamically Again, see discussion of Legion compiler …
December 4, 2014 13 http://legion.stanford.edu
Tasks
A task is The unit of parallel computation in Legion Takes regions (typed collections) as arguments Can launch subtasks
December 4, 2014 14 http://legion.stanford.edu
Task Tree
Legion programs can launch arbitrary trees of tasks
By default, execute in the order launched
Runtime system automatically identifies parallel tasks
T
T T T
T T T T
December 4, 2014 15 http://legion.stanford.edu
Regions: Index & Field Spaces
11 2 3 . . . N
Field x =
December 4, 2014 16 http://legion.stanford.edu
Regions
Two Dimensions
Unbounded set of rows
Bounded set of columns Fields
Tasks declare Which fields they use And how they use them
Regions can be partitioned
Node
Node
Node
Node
Node
Node
Node
Node
Node
Node
Voltage Capac. Induct. Charge
December 4, 2014 17 http://legion.stanford.edu
Partitioning
SP
N
December 4, 2014 18 http://legion.stanford.edu
Partitioning
N
SP
p1 pn … s1 sn … g1 gn …
December 4, 2014 19 http://legion.stanford.edu
Organize Into Pieces
N
SP
p1 pn … s1 sn … g1 gn …
December 4, 2014 20 http://legion.stanford.edu
Embedded in C++
Can write any C++ code within a task Local state, pointers, etc. Must follow discipline when using Legion API
Regions are first class Can be passed as arguments, stored in data structures
December 4, 2014 21 http://legion.stanford.edu
Populating Regions
Can’t read/update a region without an instance Instances hold a valid current copy of the data
December 4, 2014 22 http://legion.stanford.edu
Populating Regions
To read/update a region, need an accessor A handle to reference, or iterate through, elements
December 4, 2014 23 http://legion.stanford.edu
Regions: Privileges & Coherence
December 4, 2014 24 http://legion.stanford.edu
Back to the Simulation
One task per circuit piece
Read/Write on wires pieces Read Only on everything else
December 4, 2014 25 http://legion.stanford.edu
The Crux
Crucial design decisions in a Legion program are:
What are the regions? How are the regions partitioned into subregions?
What are the tasks? How are the tasks decomposed into subtasks?
Often tension between the two These decisions drive the program’s design
December 4, 2014 26 http://legion.stanford.edu
Summary
The programmer Describes the structure of the program’s data
Regions The tasks that operate on that data
The Legion implementation
Guarantees tasks appear to execute in sequential order Ensures tasks have valid versions of their regions
December 4, 2014 27 http://legion.stanford.edu
Mapping Interface Programmer selects:
Where tasks run Where regions are placed
Mapping computed dynamically
Decouple correctness from performance
27
t1
t2
t3
t4t5
rc
rw
rw1 rw2
rn
rn1 rn2
$
$
$
$
N U M A
N U M A
FB
D R A M
x86
CUDA
x86
x86
x86
December 4, 2014 28 http://legion.stanford.edu
Mapping
Mapper interface = callback interface Legion runtime calls user-supplied methods Can do arbitrary computation to make decisions
But often very simple
December 4, 2014 29 http://legion.stanford.edu
The Crux, Revisited
Crucial design decisions in a Legion program are What are the regions? What are the tasks?
In particular, mapping decisions depend on design of the regions and tasks
Not the other way around
December 4, 2014 30 http://legion.stanford.edu
Debugging
December 4, 2014 31 http://legion.stanford.edu
LegionProf (Heptane 483)
4 AMD Interlagos
Integer cores for Legion Runtime
8 AMD Interlagos FP
cores for application
NVIDIA Kepler K20
Dynamic Analysis for (rhsf+2) Clean-up/meta tasks
December 4, 2014 32 http://legion.stanford.edu
Questions?