Synthesis-Aided Compiler for Low-Power Spatial Architectures

Phitchaya Mangpo Phothilimthana, Tikhon Jelvis, Rohin Shah, Nishant Totla, Sarah Chasins, and Ras Bodik



• no cache
• limited interconnect
• spatial & temporal partitioning
• unusual ISA
• small memory
• narrow bitwidth

Need a new way of building a compiler!


Synthesis-Aided Compiler

Classical vs. Synthesis Compiler

Approach
• Classical: apply heuristic transformations
• Synthesis-aided: find best program in defined search space

Required Components
• Classical: transformations, legality analysis, heuristics
• Synthesis-aided: defined search space, equivalence checker, abstract cost function

Output’s Performance
• Classical: depends on heuristic quality
• Synthesis-aided: optimal in defined search space

Building Effort
• Classical: high
• Synthesis-aided: low


Case study: GreenArrays Spatial Processor

Specs

• Stack-based 18-bit architecture

• 144 tiled cores

• Limited communication (neighbors only)

• No cache, no shared memory

• < 300 bytes of memory per core

• 32 instructions

Example challenges of programming spatial architectures like GA144:

• Bitwidth slicing: Represent 32-bit numbers by two 18-bit words

• Function partitioning: Break functions into a pipeline with just a few operations per core.
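
The bitwidth-slicing challenge can be illustrated with a minimal Python sketch. It assumes the low 18 bits go into one word and the remaining 14 bits into a second word; the helper names `split32`, `join32`, and `add32` are hypothetical and are not taken from Chlorophyll.

```python
WORD_BITS = 18                      # GA144 word size
WORD_MASK = (1 << WORD_BITS) - 1    # 0x3FFFF
HIGH_BITS = 32 - WORD_BITS          # 14 bits remain for the second word
HIGH_MASK = (1 << HIGH_BITS) - 1


def split32(value):
    """Split a 32-bit unsigned value into (low, high) words."""
    value &= 0xFFFFFFFF
    return value & WORD_MASK, value >> WORD_BITS


def join32(low, high):
    """Rebuild the 32-bit value from its two words."""
    return ((high & HIGH_MASK) << WORD_BITS) | (low & WORD_MASK)


def add32(a, b):
    """Add two sliced numbers word by word, propagating the carry."""
    low = a[0] + b[0]
    carry = low >> WORD_BITS
    high = (a[1] + b[1] + carry) & HIGH_MASK   # wrap around at 32 bits
    return low & WORD_MASK, high
```

Every 32-bit operation in the source program turns into several 18-bit operations plus carry handling, which is part of why hand-writing such code is tedious.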

On the FIR benchmark, GA144 is 11x faster and simultaneously 9x more energy efficient than the TI MSP430 [Avizienis, Ljung].

Our Contributions

Spatial programming model
– Flexible control over partitioning

Low-effort approach to compiler construction
– Solved a compilation problem as a synthesis problem
– To scale synthesis, decomposed it into subproblems

Empirical evaluation
– Easy-to-build compiler architecture
– Performance within 2x of expert-written code

Compiler Workflow

Input: Spatial Program

1. Partitioner (constraint solving minimizing # of msgs). Output: Program + partition annotation (logical cores)

2. Layout (Quadratic Assignment Problem). Output: Program + location annotation & routing info (physical cores)

3. Code Separator (a traditional transformation). Output: Per-core code with communication code

4. Code Generator (superoptimization). Output: Per-core optimized machine code


Spatial programming model

How does one want to program a spatial architecture?

1. Write algorithm in high-level language without dealing with low-level details

2. Have control over partitioning of data and computation if desired

int a, b;

int ans = a * b;

int@1 a, b;

int@3 ans = a *@2 b;

Partition Type pins data and operators to specific partitions (logical cores):

• Partition 1: a, b
• Partition 2: *
• Partition 3: ans

Similar to [Chandra et al. PPoPP’08]


Do not need to handle data routing and communication code
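
To make the implied communication concrete, here is a minimal Python sketch that derives the sends for the annotated program `int@1 a, b; int@3 ans = a *@2 b;`. The dictionary and edge-list encoding and the `messages` helper are assumptions made for illustration, not the compiler's internal representation.

```python
# Partition annotations from the slide: int@1 a, b; int@3 ans = a *@2 b;
placement = {"a": 1, "b": 1, "*": 2, "ans": 3}

# Dataflow edges (producer, consumer) of `ans = a * b`.
edges = [("a", "*"), ("b", "*"), ("*", "ans")]


def messages(placement, edges):
    """Return (src_partition, dst_partition, value) for every edge that
    crosses a partition boundary; edges inside one partition need no send."""
    return [(placement[p], placement[c], p)
            for p, c in edges
            if placement[p] != placement[c]]


sends = messages(placement, edges)
for src, dst, value in sends:
    print(f"partition {src} sends {value} to partition {dst}")
```

With this placement, both operands travel from partition 1 to partition 2 and the product travels from partition 2 to partition 3, so three messages are needed. These are the sends and receives the programmer does not have to write by hand.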


Distributed Partition Type

int@6 k[10];
(figure: all of k in partition 6)

int@{[0:5]=6, [5:10]=7} k[10];
(figure: k split between partitions 6 and 7)

int::2@{[0:10]=(6,7)} k[10];
(figure: partitions 6 and 7)
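
The range form of the distributed annotation can be read as a lookup from array index to partition. The Python sketch below assumes the ranges are half-open, so `[0:5]` covers indices 0 through 4; the slide does not state this, and the `partition_of` helper is hypothetical.

```python
def partition_of(index, ranges):
    """Look up the partition that holds array element `index`.

    `ranges` is a list of (start, stop, partition) triples. Ranges are
    assumed to be half-open: [start:stop] covers start .. stop - 1.
    """
    for start, stop, partition in ranges:
        if start <= index < stop:
            return partition
    raise IndexError(f"index {index} is not covered by any range")


# int@{[0:5]=6, [5:10]=7} k[10];
k_ranges = [(0, 5, 6), (5, 10, 7)]
owners = [partition_of(i, k_ranges) for i in range(10)]
print(owners)
```

Under the half-open assumption, the first five elements live in partition 6 and the last five in partition 7.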


Unspecified Partitions

How to compile a partially annotated program?

int a, b;

int@3 ans = a * b;

Every missing annotation is treated as an unknown partition (??) for the compiler to fill in:

int@?? a, b;

int@3 ans = a *@?? b;

Partitioning Synthesizer

Partitioner: constraint solving to minimize # of msgs

How Does Partitioning Synthesizer Work?

Idea: infer partition types subject to

• Communication count constraint: # of messages is minimized

• Space constraint: code and data fit in each core

How: use Rosette (by Emina Torlak, Session 9A)

• Implement a type checker

• Get type inference for free
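
The two constraints can be sketched in Python with exhaustive search standing in for the solver. This is not how Rosette works internally. The unit sizes, the `capacity` parameter, and the `infer_partitions` helper are assumptions made for illustration only.

```python
from itertools import product


def infer_partitions(nodes, edges, fixed, sizes, num_partitions, capacity):
    """Assign every unannotated node to a partition.

    Minimizes the number of cross-partition edges (messages) subject to
    a per-partition space limit. Brute force replaces constraint solving.
    """
    free = [n for n in nodes if n not in fixed]
    best, best_cost = None, None
    for choice in product(range(1, num_partitions + 1), repeat=len(free)):
        placement = dict(fixed)
        placement.update(zip(free, choice))
        # Space constraint: everything placed on a partition must fit.
        used = {}
        for n in nodes:
            used[placement[n]] = used.get(placement[n], 0) + sizes[n]
        if any(u > capacity for u in used.values()):
            continue
        # Communication count: edges whose endpoints are on different partitions.
        cost = sum(placement[p] != placement[c] for p, c in edges)
        if best_cost is None or cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost


nodes = ["a", "b", "*", "ans"]
edges = [("a", "*"), ("b", "*"), ("*", "ans")]
sizes = {"a": 1, "b": 1, "*": 1, "ans": 1}      # hypothetical space units

# int@?? a, b;  int@3 ans = a *@?? b;
roomy, roomy_cost = infer_partitions(nodes, edges, {"ans": 3}, sizes, 3, capacity=4)
tight, tight_cost = infer_partitions(nodes, edges, {"ans": 3}, sizes, 3, capacity=2)
```

When a core has room for everything, the cheapest answer puts all four items in partition 3 with no messages. Shrinking the capacity to two items per partition forces a split, and the best placement then needs two messages.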


Layout Synthesizer

Layout: Quadratic Assignment Problem
• Simulated annealing
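
A minimal simulated-annealing sketch for the layout step, in Python. The slide names only the Quadratic Assignment Problem and simulated annealing. The cost function (messages times Manhattan distance on the core grid), the linear cooling schedule, and all names below are assumptions, not the compiler's actual settings.

```python
import math
import random


def layout_cost(assign, traffic):
    """Sum over communicating pairs of messages times Manhattan distance."""
    total = 0
    for (p, q), msgs in traffic.items():
        (r1, c1), (r2, c2) = assign[p], assign[q]
        total += msgs * (abs(r1 - r2) + abs(c1 - c2))
    return total


def anneal_layout(partitions, traffic, rows, cols, steps=2000, seed=0):
    """Place logical partitions on a rows x cols grid of physical cores."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    # Grid cells beyond the logical partitions stay empty (None).
    slots = list(partitions) + [None] * (len(cells) - len(partitions))

    def to_assign(s):
        return {p: cells[i] for i, p in enumerate(s) if p is not None}

    cost = layout_cost(to_assign(slots), traffic)
    best, best_cost = list(slots), cost
    for step in range(steps):
        temp = max(1e-3, 1.0 - step / steps)       # linear cooling
        i, j = rng.sample(range(len(slots)), 2)
        slots[i], slots[j] = slots[j], slots[i]     # propose a swap
        new_cost = layout_cost(to_assign(slots), traffic)
        accept = new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp)
        if accept:
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(slots), cost
        else:
            slots[i], slots[j] = slots[j], slots[i]  # undo rejected swap
    return to_assign(best), best_cost


# Hypothetical traffic: partition 1 sends 2 messages to 2, and 2 sends 1 to 3.
traffic = {(1, 2): 2, (2, 3): 1}
placement, cost = anneal_layout([1, 2, 3], traffic, rows=2, cols=2)
```

Because the search keeps the best layout seen so far, the returned cost is never worse than the initial row-by-row placement.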

Code Separator

Code Separator: a traditional transformation

Code Generator

Code Generator: superoptimization

Classical compiler backend: Optimizing Code Gen (IR1, IR2, IR3) producing optimized machine code

Our compiler backend: Naïve Code Gen producing machine code, then Superoptimizer producing superoptimized machine code

Optimal program = minimum cost

[Massalin et al. ASPLOS’87, Bansal et al. ASPLOS’06, Gulwani et al. PLDI’11, …]

Superoptimizer

The superoptimizer takes Program P and produces Program P’ (Minimize Cost, CEGIS).

• Counter-Example-Guided Inductive Synthesis (CEGIS)
  – encode program as SMT formula
  – solve using Z3

• Minimizing one of:
  – Execution time
  – Energy consumption
  – Program length
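
The CEGIS loop can be sketched in Python on a toy machine. The slides encode programs as SMT formulas and solve them with Z3. This sketch swaps in plain enumeration for the synthesis step and exhaustive testing for the verification step. The 8-bit single-register instruction set is hypothetical, and the cost being minimized is program length.

```python
from itertools import product

MASK = 0xFF  # toy 8-bit machine; the real target uses 18-bit words

# Hypothetical single-register instruction set, for illustration only.
OPS = {
    "inc": lambda x: (x + 1) & MASK,
    "dec": lambda x: (x - 1) & MASK,
    "shl": lambda x: (x << 1) & MASK,
    "neg": lambda x: (-x) & MASK,
    "not": lambda x: (~x) & MASK,
}


def run(program, x):
    """Execute a sequence of instruction names on input x."""
    for op in program:
        x = OPS[op](x)
    return x


def cegis(spec, max_len):
    """Return a shortest program equivalent to `spec`, or None."""
    examples = [0]                       # start from a single test input
    for length in range(max_len + 1):    # shorter programs first
        for cand in product(OPS, repeat=length):
            # Inductive step: the candidate must agree on known examples.
            if any(run(cand, x) != run(spec, x) for x in examples):
                continue
            # Verification step: look for a counterexample input.
            cex = next((x for x in range(MASK + 1)
                        if run(cand, x) != run(spec, x)), None)
            if cex is None:
                return list(cand)        # equivalent on every input
            examples.append(cex)         # the counterexample guides the search
    return None


negate = cegis(["not", "inc"], max_len=2)
double_plus_two = cegis(["inc", "inc", "dec", "shl"], max_len=3)
```

Each counterexample the verifier returns rules out the current candidate and usually many later ones, so most candidates are discarded by a few cheap test runs rather than a full equivalence check.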

Problem with Superoptimizers

• Synthesizing the entire program is not scalable.

– State-of-the-art synthesizers can generate up to 25 instructions [Schkufza et al. ASPLOS’13, Gulwani et al. PLDI’11].

• Must decompose the superoptimization.

Modular Superoptimizer

[Figure: Naïve Code Gen produces basic blocks A, B, C, D, and E, connected by a loop and an if. Basic block A consists of instructions i through ix. The superoptimizer works on one segment at a time, where a segment is <= 16 instructions.]

Naïve Way to Decompose: Fixed Windows

[Figure: block A (instructions i through ix) is cut into fixed windows, segment i, segment j, and segment k. Each segment goes through the superoptimizer (CEGIS, Minimize Running Time) to produce i’, j’, and k’ in block A’. The figure also marks a Timeout.]

A Better Way: Sliding Window

[Figure, animated over four slides: a sliding window moves over instructions i through ix of block A and feeds the superoptimizer (CEGIS, Minimize Running Time). The slides show three outcomes for a window: Timeout; No segment with lower cost, where i appears in block A’; and Find segment with lower cost, where segment j becomes j’ and then segment k becomes k’ in block A’.]
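
The three outcomes in the sliding-window figure (timeout, no segment with lower cost, segment with lower cost found) suggest a loop like the Python sketch below. The exact policy is an assumption: here a timeout shrinks the window, and "no lower cost" keeps the first instruction and slides forward by one. The `superopt` callback and the toy rewrite rule are hypothetical stand-ins for the CEGIS superoptimizer.

```python
def sliding_window(block, superopt, max_window=16):
    """Optimize `block` segment by segment.

    `superopt(segment)` returns a cheaper equivalent list, returns None
    when no cheaper segment exists, or raises TimeoutError.
    """
    out, pos = [], 0
    while pos < len(block):
        size = min(max_window, len(block) - pos)
        while size > 0:
            segment = block[pos:pos + size]
            try:
                better = superopt(segment)
            except TimeoutError:
                size -= 1                  # assumed policy: shrink and retry
                continue
            if better is not None:
                out.extend(better)         # found a segment with lower cost
                pos += size
            else:
                out.append(block[pos])     # keep first instruction, slide on
                pos += 1
            break
        else:
            # Even a one-instruction window timed out: keep it unchanged.
            out.append(block[pos])
            pos += 1
    return out


def toy_superopt(segment):
    """Stand-in superoptimizer: times out on segments longer than two
    instructions and knows a single rewrite, `not inc` -> `neg`."""
    if len(segment) > 2:
        raise TimeoutError
    if segment == ["not", "inc"]:
        return ["neg"]
    return None


optimized = sliding_window(["shl", "not", "inc", "dec"], toy_superopt, max_window=3)
```

In this example, cutting the block into fixed windows of two instructions would split `not inc` across two segments and miss the rewrite, while the sliding window finds it after moving forward by one instruction.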


Address Space Compression

Trick to speed up synthesis time

1. compress: Program P with memory x, y, array z (10 entries) becomes Program Pc with memory x, y, array z (2 entries)

2. Superoptimizer: Program Pc becomes Program P’c with memory x, y, array z (2 entries)

3. decompress: Program P’c becomes Program P’ with memory x, y, array z (10 entries)

4. Then verify if P ≡ P’

Empirical Evaluation

Hypothesis 1: Synthesis generates faster code than a heuristic compiler.

Comparisons:
• Synthesizing partitioner vs. heuristic partitioner (a greedy algorithm)
• Precise layout vs. less precise layout (assumes each message is sent once)
• Superoptimizer vs. no superoptimizer
• Sliding window vs. fixed window

Results:
• + precise layout: up to 1.8x speedup
• + synthesizing partitioner: 5% speedup
• + superoptimizer: 15% speedup (4% from sliding window)

[Figure: bar chart comparing all heuristic (or less optimal), + precise layout, + synthesizing partitioner, + superoptimizer, and + sliding window]

Hypothesis 2: Our compiler produces code comparable to the expert’s code.

On MD5 benchmark, the expert uses many advanced tricks:
• 10 cores
• Self-modifying code
• Circular array data structure
• Different modes of operations for different cores
  – Instruction fetch from local memory
  – Instruction fetch from neighbors

We define success to be within 2x of the expert’s code.

On 4 smaller benchmarks, Chlorophyll was on average
• 46% slower
• 44% less energy-efficient

On a larger benchmark (MD5), Chlorophyll was
• 65% slower
• 69% less energy-efficient

Hypothesis 3: Chlorophyll increases programmer productivity and offers the ability to explore different implementations quickly.

[Figure: bar chart of Execution Time (Normalized) for intern, FIR-1, FIR-2, and FIR-4; annotations: 1 summer, 2 hours]

Using program synthesis as a core compiler building block enables us to build a new compiler with low effort that still produces efficient code.

Thank you