Regent
Elliott Slaughter
December 8, 2015
http://legion.stanford.edu

Regent - Legion Programming System · Regent A language for the Legion programming model Implicit parallelism, sequential semantics Tasks + automatic discovery of dependences Automatic



Regent

• A language for the Legion programming model
  • Implicit parallelism, sequential semantics
  • Tasks + automatic discovery of dependences
  • Automatic data movement

A(r)
for i = 0, 3 do
  B(p[i])
end
C(r)

[Figure: dependence graph in which A precedes three B tasks, which in turn precede C]
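The dependences in this graph follow from what each task declares it will access. As a minimal C++ sketch of that idea, and not of Regent's or Legion's actual analysis, the block below models a region argument as a half-open range of element indices plus a set of fields, and assumes two launches conflict only when their ranges overlap, they share a field, and at least one of them writes. The struct `Launch` and the function `interferes` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of one task launch: which elements it touches
// (a half-open range [lo, hi) of a region), which fields, and whether it writes.
struct Launch {
  std::string name;
  int lo, hi;                    // element range [lo, hi)
  std::set<std::string> fields;  // fields named in the privilege
  bool writes;                   // true for "writes" or "reads writes"
};

// Two launches conflict if they touch overlapping elements on a common field
// and at least one of them writes; read-read sharing never conflicts.
bool interferes(const Launch& a, const Launch& b) {
  bool overlap = std::max(a.lo, b.lo) < std::min(a.hi, b.hi);
  bool shared_field = std::any_of(a.fields.begin(), a.fields.end(),
      [&](const std::string& f) { return b.fields.count(f) > 0; });
  return overlap && shared_field && (a.writes || b.writes);
}
```

Using the privileges declared for A, B and C in this deck, and modeling r as elements [0, 30) with p[i] as [10i, 10i + 10), A conflicts with every B, each B conflicts with C, and the three B launches are mutually independent, which is the shape of the graph above.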


Regent vs Legion API

A(r)
for i = 0, 3 do
  B(p[i])
end
C(r)

• Regent simplifies the Legion programming model
• Regent achieves performance identical to hand-tuned Legion

// Unmap the parent task's instance of r before launching subtasks on it.
runtime->unmap_region(ctx, physical_r);

// A(r): read-write access to fields x and y of r.
TaskLauncher launcher_A(TASK_A, TaskArgument());
launcher_A.add_region_requirement(
    RegionRequirement(r, READ_WRITE, EXCLUSIVE, r));
launcher_A.add_field(0, FIELD_X);
launcher_A.add_field(0, FIELD_Y);
runtime->execute_task(ctx, launcher_A);

// for i = 0, 3 do B(p[i]) end: one index launch over points 0..2;
// projection 0 (identity) gives point i the subregion p[i].
Domain domain = Domain::from_rect<1>(
    Rect<1>(Point<1>(0), Point<1>(2)));
IndexLauncher launcher_B(TASK_B, domain, TaskArgument(), ArgumentMap());
launcher_B.add_region_requirement(
    RegionRequirement(p, 0 /* projection */, READ_WRITE, EXCLUSIVE, r));
launcher_B.add_field(0, FIELD_X);
runtime->execute_index_space(ctx, launcher_B);

// C(r): read-only access to fields x and y of r.
TaskLauncher launcher_C(TASK_C, TaskArgument());
launcher_C.add_region_requirement(
    RegionRequirement(r, READ_ONLY, EXCLUSIVE, r));
launcher_C.add_field(0, FIELD_X);
launcher_C.add_field(0, FIELD_Y);
runtime->execute_task(ctx, launcher_C);

// Re-establish the parent task's mapping of r.
runtime->map_region(ctx, physical_r);


Pushing the Performance Envelope with Compilation

[Figure: task granularity (fine-grained to coarse-grained) plotted against scale (small to large), with regions labeled Dynamic Analysis and Static Analysis]


Data Model

task A(r : region(…))
where writes(r.{x, y}) do
  …
end

task B(r : region(…))
where reads writes(r.x) do
  …
end

task C(r : region(…))
where reads(r.{x, y}) do
  …
end

task main()
  var r = region(…)
  var p = partition(equal, r, …)
  A(r)
  for i = 0, 3 do
    B(p[i])
  end
  C(r)
end

[Figure: a region drawn as a table, with "fields" and "keys" as its two axes]
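Conceptually, a region is a table whose rows are keys (elements of an index space) and whose columns are fields, and `partition(equal, r, …)` splits the keys into subregions of nearly equal size. The sketch below models only the key set, as the integer range [0, n), and is an illustration rather than Regent's implementation; `equal_partition` and `disjoint_and_complete` are hypothetical helpers that work with half-open ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Split the keys [0, n) into k contiguous, half-open pieces whose sizes
// differ by at most one (the first n % k pieces get one extra key).
std::vector<std::pair<int, int>> equal_partition(int n, int k) {
  std::vector<std::pair<int, int>> pieces;
  int base = n / k, extra = n % k, lo = 0;
  for (int i = 0; i < k; ++i) {
    int size = base + (i < extra ? 1 : 0);
    pieces.push_back({lo, lo + size});
    lo += size;
  }
  return pieces;
}

// For pieces listed in key order, the partition of [0, n) is disjoint and
// complete exactly when each piece starts where the previous one ended and
// the last one ends at n.
bool disjoint_and_complete(const std::vector<std::pair<int, int>>& pieces, int n) {
  int expected_lo = 0;
  for (const auto& piece : pieces) {
    if (piece.first != expected_lo || piece.second < piece.first) return false;
    expected_lo = piece.second;
  }
  return expected_lo == n;
}
```

For example, splitting 10 keys three ways yields [0, 4), [4, 7) and [7, 10). Disjointness is what lets the B tasks on different subregions run independently.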


Execution Model

var r = region(…)
var p = partition(disjoint, r, …)
A(r)
for i = 0, 3 do
  B(p[i])
end
C(r)

[Figure: execution timeline with one runtime thread, which handles the launches A(r), B(p[i]) and C(r), and four app threads, which run task A, three B tasks, and C]
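Tasks with no dependence path between them may run at the same time on different app threads. One hedged way to make that concrete: number the tasks in issue order, take the dependence edges as given, and place each task in a wave one greater than the latest wave among its predecessors; tasks in the same wave can overlap. The `waves` helper below is a hypothetical illustration, not how the Legion runtime schedules work.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Tasks are numbered in program (issue) order, so every dependence edge
// (from, to) has from < to and predecessors are finished before their
// successors are visited. A task's wave is one more than the latest wave
// among its predecessors; tasks in the same wave have no dependence path
// between them and may run concurrently.
std::vector<int> waves(int num_tasks, const std::vector<std::pair<int, int>>& edges) {
  std::vector<int> wave(num_tasks, 0);
  for (int t = 0; t < num_tasks; ++t)
    for (const auto& e : edges)
      if (e.second == t) wave[t] = std::max(wave[t], wave[e.first] + 1);
  return wave;
}
```

With A = 0, the B tasks = 1 to 3 and C = 4, and edges from A to each B and from each B to C, the result is {0, 1, 1, 1, 2}: the three B tasks share a wave.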


Regions

fspace point {
  x : int,
  y : int,
  z : int
}

fspace node(list : region(node)) {
  idx : int2d,
  next : ptr(node(list), list),
}

task main()
  var bag = ispace(ptr, 28)
  var grid = ispace(int2d, {x = 4, y = 7})
  var points = region(grid, point)
  var list = region(bag, node(list))
  …
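`ispace(int2d, {x = 4, y = 7})` describes 4 × 7 = 28 two-dimensional points, the same number of elements as the unstructured `ispace(ptr, 28)`. One possible physical layout for `region(grid, point)` is one array per field (struct-of-arrays) indexed by a linearized point. The sketch below assumes a linearization with x varying fastest; it is not a description of how Legion lays out instances, which is a mapping decision. `PointRegion` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical struct-of-arrays storage for region(grid, point), where
// grid = ispace(int2d, {x = nx, y = ny}) and point = { x, y, z : int }.
struct PointRegion {
  int nx, ny;
  std::vector<int> x, y, z;  // one array per field, zero-initialized

  PointRegion(int nx_, int ny_)
      : nx(nx_), ny(ny_), x(nx_ * ny_), y(nx_ * ny_), z(nx_ * ny_) {}

  // Number of elements in the index space.
  std::size_t volume() const { return x.size(); }

  // Assumed linearization of point (i, j): i (the x coordinate) varies fastest.
  std::size_t linear(int i, int j) const {
    assert(0 <= i && i < nx && 0 <= j && j < ny);
    return static_cast<std::size_t>(j) * nx + i;
  }
};
```

With this layout, `points[{1, 2}].x` would live at `x[2 * 4 + 1]`, that is `x[9]`.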


Fills and Copies

task main()
  var grid, points, list = …
  fill(points.{x, y, z}, 0)
  copy(points.{x, y}, list.idx.{x, y})
  …
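`fill` writes one value into every element of the listed fields, and `copy` moves the listed source fields into the listed destination fields pairwise (here `points.x` into `list.idx.x` and `points.y` into `list.idx.y`). The sketch below models each field as a separate array; `fill_fields` and `copy_fields` are hypothetical helpers, and it assumes that source and destination have the same number of elements and that the k-th element of one corresponds to the k-th element of the other.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Field = std::vector<int>;  // one field of a region, stored as an array

// fill(r.{f1, f2, ...}, v): assign v to every element of each listed field.
void fill_fields(std::vector<Field*> fields, int value) {
  for (Field* f : fields) std::fill(f->begin(), f->end(), value);
}

// copy(src.{a, b}, dst.{c, d}): copy field a into c and b into d, element by
// element. Assumes matching field counts and equal element counts.
void copy_fields(const std::vector<const Field*>& src, const std::vector<Field*>& dst) {
  assert(src.size() == dst.size());
  for (std::size_t k = 0; k < src.size(); ++k) {
    assert(src[k]->size() == dst[k]->size());
    std::copy(src[k]->begin(), src[k]->end(), dst[k]->begin());
  }
}
```

In Regent both operations are issued as operations on the regions, so they take part in the same dependence analysis as task launches; the sketch only shows their element-wise effect.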


Tasks

task init_pointers(grid : ispace(int2d),
                   points : region(grid, point),
                   list : region(node(list)))
where reads(points), reads writes(list.{idx, next}) do
  …
end

task main()
  var grid, points, list = …
  init_pointers(grid, points, list)
  …
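The `where` clause of `init_pointers` lists exactly the privileges the task needs, and Regent checks at each call site that the caller holds them; `main` created `points` and `list`, so it holds full privileges on both. The sketch below models that check as inclusion over per-field flags. It is a simplified illustration under stated assumptions (no reductions, no aliasing between regions), and `Privileges` and `covers` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Privileges as bit flags, so "reads writes" is READS | WRITES.
enum : unsigned { READS = 1u, WRITES = 2u };

// Hypothetical privilege set: (region, field) -> flags.
using Privileges = std::map<std::pair<std::string, std::string>, unsigned>;

// A call is allowed only if, for every (region, field) the callee requests,
// the caller holds at least the same flags.
bool covers(const Privileges& caller, const Privileges& callee) {
  for (const auto& entry : callee) {
    unsigned needed = entry.second;
    auto it = caller.find(entry.first);
    unsigned held = (it == caller.end()) ? 0u : it->second;
    if ((held & needed) != needed) return false;
  }
  return true;
}
```

`reads(points)` expands to read access on every field of `point` (x, y and z), which is how it is encoded in the usage check.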


Control

task main()
  var grid, points, list = …
  if c1 then
    …
  elseif c2 then
    …
  else
    …
  end
  while c do
    …
  end
  for idx = 0, n do
    …
  end
  for idx in grid do
    …
  end
  for elt in list do
    …
  end
  …


Pointers

task main()
  var grid, points, list = …
  var last = null(ptr(node(list), list))
  for idx in grid do
    var elt = new(ptr(node(list), list))
    elt.next = last
    last = elt
    elt.idx = idx
    points[idx].{x, y, z} += 1
  end
  …
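The loop builds a linked list by pushing each new node onto the front, so the list ends up in reverse iteration order. The sketch below mimics that pattern in C++ under stated assumptions: the node region is a `std::vector`, a pointer is an index into it with -1 standing in for `null`, and the grid is visited with x varying fastest (an assumed order). `Node` and `build_list` are hypothetical names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical model of region(bag, node(list)): nodes live in one array and
// a "pointer" is an index into it, with -1 playing the role of null.
struct Node {
  std::pair<int, int> idx;  // the int2d point this node refers to
  int next;                 // index of the next node, or -1
};

// Allocate one node per grid point, store the point in field idx, and link
// the node to the previous head. Returns the final head of the list.
int build_list(int nx, int ny, std::vector<Node>& nodes) {
  int last = -1;  // corresponds to null(ptr(node(list), list))
  for (int j = 0; j < ny; ++j)
    for (int i = 0; i < nx; ++i) {
      nodes.push_back(Node{{i, j}, last});
      last = static_cast<int>(nodes.size()) - 1;
    }
  return last;
}
```

For the 4 × 7 grid this allocates 28 nodes; the head refers to point (3, 6), and following `next` from it visits every node once before reaching -1.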


Vectorization

task inc(grid : ispace(int2d),
         points : region(grid, point),
         list : region(node(list)))
where reads(list), reduces+(points) do
  __demand(__vectorize)
  for elt in list do
    points[elt.idx].{x, y, z} += 1
  end
end


CUDA

__demand(__cuda)
task inc(grid : ispace(int2d),
         points : region(grid, point),
         list : region(node(list)))
where reads(list), reduces+(points) do
  for elt in list do
    points[elt.idx].{x, y, z} += 1
  end
end


C Functions

local cstdio = terralib.includec("stdio.h")
local cmath = terralib.includec("math.h")

task main()
  cstdio.printf("Hello, %f\n", cmath.sin(1.0))
  …


Legion Interop

terralib.linklibrary("my.so")
local my = terralib.includec("my.h")

task main()
  my.legion_task(__runtime(), __context())
  …


Metaprogramming

function make_inc(t, v)
  local task inc(r : region(t))
  where reads writes(r) do
    for x in r do
      @x += v  -- x is a pointer into r; @x is the element it points to
    end
  end
  return inc
end

local inc1 = make_inc(int, 1)

task main()
  var r = …
  inc1(r)
  …
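In Regent, `make_inc` is an ordinary Lua function: it runs while the program is being compiled and returns a new task specialized to the element type `t` and the increment `v`. C++ has no direct counterpart, but a loose analogy is a function template that returns a specialized callable. The sketch below is only that analogy: the template parameter `T` plays the role of `t` (deduced here from `v`), the region is modeled as a `std::vector<T>`, and nothing here is a Legion task.

```cpp
#include <cassert>
#include <vector>

// Loose C++ analogy to make_inc: given an increment v of type T, return a
// new specialized callable that adds v to every element of a "region"
// (modeled here as a std::vector<T>).
template <typename T>
auto make_inc(T v) {
  return [v](std::vector<T>& r) {
    for (T& x : r) x += v;  // add v to each element in place
  };
}
```

Each call such as `make_inc(1)` or `make_inc(0.5)` produces a distinct specialization, much as each call to the Lua `make_inc` produces a distinct task.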


Optimization: Index Launches (Before)

var r = region(…)
var p = partition(disjoint, r, …)
A(r)
for i = 0, 3 do
  B(p[i])
end
C(r)

[Figure: execution timeline with one runtime thread and four app threads]


Optimization: Index Launches (After)

var r = region(…)
var p = partition(disjoint, r, …)
A(r)
for i = 0, 3: B(p[i])  -- one index launch covering all three iterations
C(r)

[Figure: execution timeline with one runtime thread and four app threads]
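Replacing the loop with one index launch is only safe when no two iterations can touch the same data. The sketch below checks one sufficient condition under simplified assumptions: every iteration calls the same task on a subregion `p[f(i)]` of a partition `p` that is known to be disjoint, and `f` sends different iterations to different subregions. `can_index_launch` and its parameters are hypothetical, and the sketch ignores read-only and reduction arguments, which can be safe even when subregions are shared.

```cpp
#include <cassert>
#include <functional>
#include <set>

// "for i = lo, hi do T(p[f(i)]) end" (hi exclusive) can become one index
// launch if different iterations cannot write overlapping data: p must be
// disjoint and f must map distinct iterations to distinct subregions.
bool can_index_launch(bool partition_is_disjoint, int lo, int hi,
                      const std::function<int(int)>& f) {
  if (!partition_is_disjoint) return false;
  std::set<int> seen;
  for (int i = lo; i < hi; ++i)
    if (!seen.insert(f(i)).second) return false;  // two iterations share p[f(i)]
  return true;
}
```

The loop on this slide, with `f(i) = i` over a disjoint `p`, passes; a projection such as `f(i) = i / 2`, which sends iterations 0 and 1 to the same subregion, does not.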


Optimization: Leaf Tasks (Before)

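var r = region(…)
var p = partition(disjoint, r, …)
A(r)
for i = 0, 3 do
  B(p[i])
end
C(r)

[Figure: execution timeline with one runtime thread and four app threads; annotations on A(r) and B(p[i]) ask "how many subtasks?" and answer "don't know until here", i.e. the runtime cannot tell in advance whether a task will launch subtasks of its own]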


Optimization: Leaf Tasks (After)

var r = region(…)
var p = partition(disjoint, r, …)
A(r)
for i = 0, 3 do
  B(p[i])
end
C(r)

[Figure: the same timeline, with A, the three B tasks, and C marked as leaf tasks (tasks that launch no subtasks)]


Optimization: Mapping (Before)

var r = region(…)
var p = partition(disjoint, r, …)
A(r)
for i = 0, 3 do
  B(p[i])
end
C(r)

[Figure: execution timeline with one runtime thread and four app threads; one thread reads r while A(r) concurrently writes r, a data race]


Optimization: Mapping (Runtime)

unmap(r)
A(r)
map(r) -- blocks
for i = 0, 3 do
  unmap(r)
  B(p[i])
  map(r) -- blocks
end

[Figure: execution timeline with one runtime thread and four app threads, marking each unmap of r and each blocking map of r]


Optimization: Mapping (Compiler)

unmap(r)
A(r)
for i = 0, 3 do
  B(p[i])
end
C(r)
map(r) -- blocks

[Figure: execution timeline with one runtime thread and four app threads, in which r is unmapped once before A(r) and remapped once after C(r)]


Other Optimizations

• Futures
• Pointer Check Elision
• Dynamic Branch Elision
• Vectorization
• CUDA Kernel Generation


Work In Progress: Static Dependences

var r = region(…)
var p = partition(disjoint, r, …)
A(r)
for i = 0, 3 do
  B(p[i])
end
C(r)

[Figure: execution timeline with one runtime thread and four app threads running A, three B tasks, and C]



Work In Progress: SPMD

[Figure: timeline for nodes 0 to 3, each with a runtime thread and app threads, annotated with launch delay and analysis cost]


Work In Progress: SPMD

[Figure: the same four-node timeline, with the nodes synchronizing through phase barriers]


Pushing the Performance Envelope with Compilation

[Figure: task granularity (fine-grained to coarse-grained) plotted against scale (small to large), with regions labeled Dynamic Analysis and Static Analysis]


Questions?


Lines of Code

Non-comment, non-blank lines of code:

Application   Regent   Reference
Circuit          825       1,701
PENNANT        1,789       2,416
MiniAero       2,836       3,993


Circuit: Absolute Performance

[Figure: Circuit performance in GFLOPS versus total CPUs (1 to 80), Regent compared with Legion]


PENNANT: Absolute Performance

[Figure: PENNANT performance in zones per second (in millions) versus total CPUs (1 to 24), Regent compared with OpenMP]


MiniAero: Absolute Performance

[Figure: MiniAero performance in cells per second (in millions), Regent compared with MPI+Kokkos; left panel, single node, 1 to 8 total CPUs; right panel, multiple nodes, 1 to 4 nodes with 8 CPUs per node]


Impact of Optimizations

[Figure: bar charts for Circuit (GFLOPS) and MiniAero (cells per second, in millions) under the configurations none, all-idx-map, all-idx-leaf, all-idx, all-fut, all-map, all-leaf, all-vec and all, where all-X denotes all optimizations except X; the legend distinguishes individual optimizations disabled from pairs of optimizations disabled and marks the best single-threaded performance]


Impact of Optimizations: PENNANT

[Figure: bar chart of zones per second (in millions) under the configurations none, all-idx-map, all-idx-leaf, all-idx, all-fut, all-map, all-leaf, all-vec, all-dbr and all]