Spiral: an empirical search system for program generation and optimization

David Padua
Department of Computer Science
University of Illinois at Urbana-Champaign



Program optimization today

• The optimization phase of a compiler applies a series of transformations to achieve its objectives.

• The compiler uses the outcome of program analysis to determine which transformations are correctness-preserving.

• Compiler transformation and analysis techniques are reasonably well-understood.

• Since many of the compiler optimization problems have “exponential complexity”, heuristics are needed to drive the application of transformations.


Optimization drivers

• Developing driving heuristics is laborious.

• One reason for this is the lack of methodologies and tools to build optimization drivers.

• As a result, although there is much in common among compilers, their optimization phases are usually re-implemented from scratch.


Optimization drivers (Cont.)

• A consequence: machines and languages that are not widely popular (and some popular ones too) usually lack good compilers.
  – DSP, network processor, and embedded system programming is often done in assembly language.
  – Evaluation of new architectural features requiring compiler involvement is not always meaningful.
  – Languages such as APL, MATLAB, LISP, … suffer from chronic low performance.
  – New languages are difficult to introduce (although compilers are only part of the problem).


A methodology based on the notion of search space

• Program transformations often have several possible target versions:
  – Loop unrolling: how many times to unroll.
  – Loop tiling: the size of the tile.
  – Loop interchange: the order of the loop headers.
  – Register allocation: which registers are stored in memory to make room for new values.

• The process of optimization can be seen as a search in the space of possible program versions.
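The search-space view can be made concrete with a minimal Python sketch. It assumes hypothetical candidate values for three of the transformations above (unroll degree, tile size, loop order; register allocation is left out). Each combination is one program version, so the size of the space is the product of the per-transformation choices.

```python
from itertools import permutations, product

# Hypothetical choices for each transformation (illustrative values only).
UNROLL_DEGREES = [1, 2, 4, 8]
TILE_SIZES = [16, 32, 64, 128]
LOOP_ORDERS = list(permutations("ijk"))  # all 6 orders of a 3-deep loop nest

def search_space():
    """Yield every combination of transformation parameters (one program version each)."""
    for unroll, tile, order in product(UNROLL_DEGREES, TILE_SIZES, LOOP_ORDERS):
        yield {"unroll": unroll, "tile": tile, "order": "".join(order)}

versions = list(search_space())
print(len(versions))  # 4 * 4 * 6 = 96
print(versions[0])    # {'unroll': 1, 'tile': 16, 'order': 'ijk'}
```

Adding one more transformation with c choices multiplies the count by c, which is why combined spaces grow so quickly.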


Empirical search (iterative compilation)

• Perhaps the simplest application of the search space model is empirical search, where several versions are generated and executed on the target machine and the fastest one is selected.

T. Kisuki, P. M. W. Knijnenburg, M. F. P. O'Boyle, and H. A. G. Wijshoff. Iterative compilation in program optimization. In Proc. CPC2000, pages 35-44, 2000.
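A minimal sketch of the empirical loop in Python, under stated assumptions: the "program versions" are hypothetical Python functions that sum a list with different degrees of unrolling, generated as source text and compiled with `exec` as a stand-in for a real code generator, and `time.perf_counter` stands in for running each version on the target machine. Each version is checked against a reference result before it is timed, and the fastest correct one is kept.

```python
import time

def generate_version(unroll):
    """Generate and compile Python source for a summation loop unrolled `unroll` times."""
    body = " + ".join(f"xs[i + {k}]" for k in range(unroll))
    src = (
        "def version(xs):\n"
        "    total = 0.0\n"
        f"    n = len(xs) - len(xs) % {unroll}\n"
        f"    for i in range(0, n, {unroll}):\n"
        f"        total += {body}\n"
        "    for x in xs[n:]:\n"          # remainder iterations
        "        total += x\n"
        "    return total\n"
    )
    namespace = {}
    exec(src, namespace)  # runs only the source generated above
    return namespace["version"]

def run_time(fn, data, repeats=5):
    """Best-of-`repeats` wall-clock time of fn(data), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best

def empirical_search(candidates, data):
    reference = sum(data)
    timings = {}
    for unroll in candidates:
        version = generate_version(unroll)
        # Only versions that compute the right answer are eligible.
        if abs(version(data) - reference) > 1e-6 * abs(reference):
            continue
        timings[unroll] = run_time(version, data)
    best = min(timings, key=timings.get)
    return best, timings

data = [float(i % 7) for i in range(10_000)]
best, timings = empirical_search([1, 2, 4, 8], data)
print("fastest unroll degree on this machine:", best)
```

Which degree wins depends on the interpreter and the machine, which is the reason to measure instead of predict.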


Empirical search and traditional compilers

• Searching is not a new approach; compilers have applied it in the past, but using architectural prediction models instead of actual runs:
  – KAP searched for the best loop header order.
  – SGI's MIPSpro and IBM PowerPC compilers select the best degree of unrolling.
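By contrast with empirical search, a model-driven compiler predicts instead of measuring. A minimal Python sketch, assuming a made-up cost model (the cycle counts, register budget, and spill penalty are hypothetical and are not the heuristics of any compiler named above): unrolling amortizes loop overhead, but once the unrolled body needs more registers than the machine has, spills add cost.

```python
# Hypothetical machine and loop parameters for the model (illustrative only).
TRIP_COUNT = 1024      # iterations of the original loop
BODY_CYCLES = 4        # cycles per original loop body
OVERHEAD_CYCLES = 2    # increment/compare/branch per executed iteration
REGISTERS = 16         # available registers
REGS_PER_COPY = 3      # registers live per copy of the body
SPILL_CYCLES = 10      # cost per spilled register per executed iteration

def predicted_cycles(unroll):
    """Predict the cost of unrolling `unroll` times without running anything."""
    iterations = TRIP_COUNT // unroll
    cycles = iterations * (unroll * BODY_CYCLES + OVERHEAD_CYCLES)
    spilled = max(0, unroll * REGS_PER_COPY - REGISTERS)
    return cycles + iterations * spilled * SPILL_CYCLES

best = min([1, 2, 4, 8], key=predicted_cycles)
print(best, predicted_cycles(best))  # 4 4608
```

The search is cheap because nothing is executed, but the answer is only as good as the model.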


Limitations of empirical search

• Empirical search is conceptually simple and portable.

• However:
  – The search space tends to be too large, especially when several transformations are combined.
  – It is not clear how to apply this method when program behavior is a function of the input data set.
• Heuristics/search strategies are needed.
• The availability of performance “formulas” could help evaluate transformations across input data sets and facilitate search.
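One common response to a space that is too large to enumerate is to sample it. A minimal Python sketch of budgeted random search, assuming a hypothetical `measure` function that stands in for compiling and timing a version; here it is a synthetic cost surface so the example runs anywhere. The space is small enough to list so that `random.sample` can draw distinct versions, and the seed makes runs reproducible.

```python
import random
from itertools import product

UNROLL = [1, 2, 4, 8, 16]
TILE = [8, 16, 32, 64, 128, 256]
ORDERS = ["ijk", "ikj", "jik", "jki", "kij", "kji"]

def measure(config):
    """Hypothetical stand-in for 'compile and run': a synthetic cost surface."""
    unroll, tile, order = config
    return abs(unroll - 4) + abs(tile - 64) / 16 + (0 if order == "ikj" else 1)

def random_search(budget, seed=0):
    """Evaluate `budget` distinct random versions and return the cheapest one."""
    rng = random.Random(seed)
    space = list(product(UNROLL, TILE, ORDERS))          # 180 versions
    sample = rng.sample(space, min(budget, len(space)))  # distinct versions to try
    return min(sample, key=measure)

print(random_search(budget=30))
```

With a budget equal to the whole space this degenerates into exhaustive search; smaller budgets trade result quality for search time.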


Compilers and Library Generators

[Diagram. Labels: Source Program, Internal representation, Algorithm, Program Transformation, Program Generation.]


Empirical search in program/library generators

• Examples:
  – FFTW [M. Frigo, S. Johnson]
  – Spiral (FFT/signal processing) [J. Moura (CMU), M. Veloso (CMU), J. Johnson (Drexel), …]
  – ATLAS (linear algebra) [R. Whaley, A. Petitet, J. Dongarra]
  – PHiPAC [J. Demmel et al.]


SPIRAL

• The approach:
  – Mathematical formulation of signal processing algorithms
  – Automatically generate algorithm versions
  – A generalization of the well-known FFTW
  – Use compiler techniques to translate formulas into implementations
  – Adapt to the target platform by searching for the optimal version


Fast DSP Algorithms As Matrix Factorizations

• Computing y = F4 x is carried out as:
  t1 = A4 x (permutation)
  t2 = A3 t1 (two F2's)
  t3 = A2 t2 (diagonal scaling)
  y = A1 t3 (two F2's)
• That is, F4 = A1 A2 A3 A4. The cost is reduced because A1, A2, A3, and A4 are structured sparse matrices.
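The four steps can be checked numerically. A minimal pure-Python sketch, assuming the DFT convention y[k] = Σ_l ω_n^(kl) x[l] with ω_n = e^(-2πi/n) (the slide does not state one): each stage applies one sparse factor directly to the vector, and the result is compared with the dense definition.

```python
import cmath

def dft(x):
    """Dense definition: y[k] = sum_l w^(k*l) x[l], w = exp(-2*pi*i/n)."""
    n = len(x)
    return [sum(x[l] * cmath.exp(-2j * cmath.pi * k * l / n) for l in range(n))
            for k in range(n)]

def fft4(x):
    """Compute F4 x as the four sparse stages A4, A3, A2, A1."""
    w = cmath.exp(-2j * cmath.pi / 4)             # omega_4 (= -i under this convention)
    t1 = [x[0], x[2], x[1], x[3]]                 # A4: stride-2 permutation
    t2 = [t1[0] + t1[1], t1[0] - t1[1],           # A3: two F2's on adjacent pairs
          t1[2] + t1[3], t1[2] - t1[3]]
    t3 = [t2[0], t2[1], t2[2], w * t2[3]]         # A2: diagonal (twiddle) scaling
    return [t3[0] + t3[2], t3[1] + t3[3],         # A1: two F2's on stride-2 pairs
            t3[0] - t3[2], t3[1] - t3[3]]

x = [1, 2 + 1j, -3, 0.5]
print(all(abs(a - b) < 1e-12 for a, b in zip(fft4(x), dft(x))))  # True
```

The staged version uses 8 complex additions and one multiplication by ω_4, compared with 16 multiplications for the dense matrix-vector product.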


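Tensor Product Formulation of Cooley-Tukey Theorem

$$F_{rs} = (F_r \otimes I_s)\, T^{rs}_s\, (I_r \otimes F_s)\, L^{rs}_r$$

where $T^{rs}_s$ is a diagonal matrix and $L^{rs}_r$ is a stride permutation.

Example

$$F_4 = (F_2 \otimes I_2)\, T^4_2\, (I_2 \otimes F_2)\, L^4_2 =
\begin{pmatrix} 1&0&1&0\\ 0&1&0&1\\ 1&0&-1&0\\ 0&1&0&-1 \end{pmatrix}
\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&\omega_4 \end{pmatrix}
\begin{pmatrix} 1&1&0&0\\ 1&-1&0&0\\ 0&0&1&1\\ 0&0&1&-1 \end{pmatrix}
\begin{pmatrix} 1&0&0&0\\ 0&0&1&0\\ 0&1&0&0\\ 0&0&0&1 \end{pmatrix}$$

where $\omega_4$ is the primitive fourth root of unity that defines $F_4$.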


Formulas for Matrix Factorizations

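$$F_4 = (F_2 \otimes I_2)\, T^4_2\, (I_2 \otimes F_2)\, L^4_2$$

R1: $$F_{rs} = (F_r \otimes I_s)\, T^{rs}_s\, (I_r \otimes F_s)\, L^{rs}_r$$

R2: $$F_n = \left( \prod_{i=1}^{k} (I_{n_i^-} \otimes F_{n_i} \otimes I_{n_i^+})\,(I_{n_i^-} \otimes T^{n_i n_i^+}_{n_i^+}) \right) \left( \prod_{i=k}^{1} (I_{n_i^-} \otimes L^{n_i n_i^+}_{n_i}) \right)$$

where $n = n_1 \cdots n_k$, $n_i^- = n_1 \cdots n_{i-1}$, $n_i^+ = n_{i+1} \cdots n_k$ (empty products equal 1), and the second product is written from $i = k$ on the left to $i = 1$ on the right.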
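Both rules can be checked as matrix identities. A minimal pure-Python sketch, assuming ω_n = e^(-2πi/n), that L^{rs}_r gathers the input at stride r (so L^4_2 maps (x0, x1, x2, x3) to (x0, x2, x1, x3), matching the permutation matrix in the F4 example), and that T^{rs}_s is the direct sum over j = 0..r-1 of diag(ω_{rs}^{jq}), q = 0..s-1. Matrices are lists of rows so the block needs no third-party packages.

```python
import cmath
import math
from functools import reduce

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    """Tensor (Kronecker) product a ⊗ b."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def F(n):
    """DFT matrix [w^(k*l)] with w = exp(-2*pi*i/n) (assumed convention)."""
    return [[cmath.exp(-2j * cmath.pi * k * l / n) for l in range(n)] for k in range(n)]

def L(n, r):
    """Stride permutation L^n_r: output j*(n//r) + m reads input j + r*m."""
    s = n // r
    p = [[0] * n for _ in range(n)]
    for j in range(r):
        for m in range(s):
            p[j * s + m][j + r * m] = 1
    return p

def T(n, s):
    """Twiddle diagonal T^n_s: entry j*s + q equals w_n^(j*q), j < n//s, q < s."""
    d = identity(n)
    for j in range(n // s):
        for q in range(s):
            d[j * s + q][j * s + q] = cmath.exp(-2j * cmath.pi * j * q / n)
    return d

def rule_r1(r, s):
    n = r * s
    return reduce(matmul, [kron(F(r), identity(s)), T(n, s),
                           kron(identity(r), F(s)), L(n, r)])

def rule_r2(factors):
    left, right = [], []
    for i, ni in enumerate(factors):
        minus = math.prod(factors[:i])       # n_i^-
        plus = math.prod(factors[i + 1:])    # n_i^+
        left.append(kron(kron(identity(minus), F(ni)), identity(plus)))
        left.append(kron(identity(minus), T(ni * plus, plus)))
        right.insert(0, kron(identity(minus), L(ni * plus, ni)))  # order i = k ... 1
    return reduce(matmul, left + right)

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

print(close(rule_r1(2, 2), F(4)), close(rule_r1(3, 4), F(12)))      # True True
print(close(rule_r2([2, 2, 2]), F(8)), close(rule_r2([2, 3, 2]), F(12)))  # True True
```

With k = 2 the second product reduces to a single stride permutation and R2 coincides with R1; for k = 3 and all factors equal to 2, the permutation product is the bit reversal of the iterative radix-2 FFT.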


Factorization Trees

• F8 : R1 → (F2, F4 : R1 → (F2, F2))
• F8 : R1 → (F4 : R1 → (F2, F2), F2)
• F8 : R2 → (F2, F2, F2)

Different computation order, different data access pattern, different performance.
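The effect of choosing a tree can be sketched with a recursive FFT in Python. Assumptions: ω_n = e^(-2πi/n); a tree is either an integer leaf (a small DFT computed directly) or a pair (left, right) meaning "apply R1 with r = size(left), s = size(right)"; R2 nodes are not modeled. Every tree gives the same result, but the sub-transforms are visited in different orders and with different strides.

```python
import cmath

def size(tree):
    return tree if isinstance(tree, int) else size(tree[0]) * size(tree[1])

def dft(x):
    n = len(x)
    return [sum(x[l] * cmath.exp(-2j * cmath.pi * k * l / n) for l in range(n))
            for k in range(n)]

def fft(x, tree):
    """Compute F_n x following the factorization tree (R1 at every inner node)."""
    if isinstance(tree, int):
        return dft(x)                              # leaf: direct small DFT
    left, right = tree
    r, s = size(left), size(right)
    n = r * s
    # L^n_r followed by (I_r ⊗ F_s): r transforms of size s on stride-r subsequences
    u = [fft(x[j::r], right) for j in range(r)]
    # T^n_s: twiddle scaling
    v = [[u[j][q] * cmath.exp(-2j * cmath.pi * j * q / n) for q in range(s)]
         for j in range(r)]
    # (F_r ⊗ I_s): s transforms of size r across the blocks
    y = [0] * n
    for q in range(s):
        col = fft([v[j][q] for j in range(r)], left)
        for p in range(r):
            y[p * s + q] = col[p]
    return y

x = [1, 2, 3, 4, 5, 6, 7, 8]
ref = dft(x)
for tree in [(2, (2, 2)), ((2, 2), 2), 8]:
    print(tree, all(abs(a - b) < 1e-9 for a, b in zip(fft(x, tree), ref)))
```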


Walsh-Hadamard Transform


Optimal Factorization Trees

• Depend on the platform
• Difficult to deduce
• Can be found by empirical search
  – The search space is very large
  – Different search algorithms
    • Random, DP (dynamic programming), GA (genetic algorithms), hill-climbing, exhaustive
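A minimal Python sketch of dynamic-programming search over binary (R1) factorization trees for sizes 2^k. The leaf sizes and `leaf_cost` are hypothetical stand-ins for measuring straight-line code, and the cost of an inner node is modeled from R1 (s copies of F_r, r copies of F_s, n twiddle multiplications) rather than measured, so the numbers are synthetic. DP assumes the best tree for size m is also best when it appears inside a larger tree; that is a heuristic, since context such as strides can change a subtree's performance.

```python
from functools import lru_cache

LEAF_SIZES = (2, 4, 8)  # hypothetical: sizes for which straight-line code exists

def leaf_cost(n):
    """Hypothetical measured time of a size-n straight-line DFT."""
    return n * n / 4

@lru_cache(maxsize=None)  # memoization: each size is solved once
def best(k):
    """Return (cost, tree) of the cheapest tree found by DP for size 2**k."""
    n = 2 ** k
    candidates = []
    if n in LEAF_SIZES:
        candidates.append((leaf_cost(n), n))
    for a in range(1, k):
        cost_l, left = best(a)
        cost_r, right = best(k - a)
        r, s = 2 ** a, 2 ** (k - a)
        # R1: s copies of F_r, r copies of F_s, plus n twiddle multiplications
        candidates.append((s * cost_l + r * cost_r + n, (left, right)))
    return min(candidates, key=lambda c: c[0])

print(best(5))  # (128.0, (4, 8))
```

DP evaluates O(k) candidates per size instead of every tree, which is what makes it practical when the number of formulas grows as fast as the table of search-space sizes shows.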


Size of Search Space

N       # of formulas      N        # of formulas
2^1     1                  2^9      20793
2^2     1                  2^10     103049
2^3     3                  2^11     518859
2^4     11                 2^12     2646723
2^5     45                 2^13     13649969
2^6     197                2^14     71039373
2^7     903                2^15     372693519
2^8     4279               2^16     1968801519
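The growth in the table can be reproduced by counting trees. A minimal Python sketch, assuming (this is not stated on the slide) that a formula for N = 2^n corresponds to an ordered tree with n leaves, each leaf an F2, in which every internal node has at least two children: one application of R1 (two children) or R2 (two or more). Under that assumption the counts are the little Schröder numbers.

```python
def count_formulas(max_leaves):
    """Return [T(1), ..., T(max_leaves)], where T(n) counts ordered trees with
    n leaves in which every internal node has at least two children."""
    T = [0] * (max_leaves + 1)
    C = [0] * (max_leaves + 1)  # C(k): ordered sequences of trees with k leaves in total
    C[0] = 1
    for n in range(1, max_leaves + 1):
        if n == 1:
            T[1] = 1
        else:
            # first child has m < n leaves; the remaining (>= 1) children hold n - m
            T[n] = sum(T[m] * C[n - m] for m in range(1, n))
        C[n] = sum(T[m] * C[n - m] for m in range(1, n + 1))
    return T[1:]

for k, count in enumerate(count_formulas(16), start=1):
    print(f"2^{k}: {count}")
```

The printed counts match the table except at 2^13, where this recurrence gives 13648869 rather than 13649969.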


More Search Choices

• Programming:
  – Loop unrolling
  – Memory allocation
  – In-lining

• Platform choices:
  – Compiler optimization options


The SPIRAL System

[Diagram of the SPIRAL system. Components: DSP Transform (input), Formula Generator, SPL Program, SPL Compiler, C/FORTRAN Programs, Performance Evaluation on the target machine, Search Engine, DSP Library (output).]


Spiral

• Spiral does the factorization at installation time and generates one library routine for each size.

• FFTW generates only codelets (input size ≤ 64) and performs the factorization at run time.


A Simple SPL Program

Annotations: Definition, Directive, Formula, Comment

; This is a simple SPL program
(define A (matrix(1 2)(2 1)))
(define B (diagonal(3 3)))
#subname simple
(tensor (I 2)(compose A B))
;; This is an invisible comment
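What the program denotes can be computed directly. A minimal Python sketch, assuming the usual reading of these SPL operators: `(matrix (1 2) (2 1))` lists rows, `(diagonal (3 3))` is diag(3, 3), `compose` is the matrix product, `(I 2)` is the 2 x 2 identity, and `tensor` is the Kronecker product. The `#subname simple` directive appears to name the generated routine; the hypothetical `simple` function below only mimics the computation y = M x that such a routine would perform.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    """Tensor (Kronecker) product a ⊗ b."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

A = [[1, 2], [2, 1]]          # (define A (matrix(1 2)(2 1)))
B = [[3, 0], [0, 3]]          # (define B (diagonal(3 3)))
I2 = [[1, 0], [0, 1]]         # (I 2)
M = kron(I2, matmul(A, B))    # (tensor (I 2)(compose A B))

def simple(x):
    """y = M x, the computation a routine generated from the formula would perform."""
    return [sum(M[i][j] * x[j] for j in range(4)) for i in range(4)]

print(M)                      # [[3, 6, 0, 0], [6, 3, 0, 0], [0, 0, 3, 6], [0, 0, 6, 3]]
print(simple([1, 1, 1, 1]))   # [9, 9, 9, 9]
```

Because of the I2 ⊗ structure, a compiler can emit one 2 x 2 kernel and apply it to each half of the input instead of a dense 4 x 4 product.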


Templates

(template (F n) [ n >= 1 ]
  ( do i=0,n-1
      y(i)=0
      do j=0,n-1
        y(i)=y(i)+W(n,i*j)*x(j)
      end
    end ))

Annotations: Pattern (F n), Condition [ n >= 1 ], I-code (the loop nest).


SPL Compiler

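[Diagram of the SPL compiler: SPL Formula → Parsing → Abstract Syntax Tree → Intermediate Code Generation → I-Code → Intermediate Code Restructuring → I-Code → Optimization → I-Code → Target Code Generation → FORTRAN, C. Template Definitions are parsed into a Template Table, which drives Intermediate Code Generation.]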


Intermediate Code Restructuring

• Loop unrolling
  – Degree of unrolling can be controlled globally or case by case

• Scalar function evaluation
  – Replace scalar functions with constant values or array accesses

• Type conversion
  – Type of input data: real or complex
  – Type of arithmetic: real or complex
  – Same SPL formula, different C/Fortran programs
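A minimal Python sketch of two of these restructurings applied to the (F n) template: both loops are fully unrolled, and the scalar function W(n, k) is replaced by nothing when it equals 1, by a negation when it equals -1, and otherwise by a reference to a precomputed constant. The naming scheme `Wn_k` for those constants and the convention W(n, k) = e^(-2πik/n) are assumptions, and the output is C-like text, not the SPL compiler's actual I-code.

```python
import cmath

def w(n, k):
    """W(n, k) = exp(-2*pi*i*k/n), rounded so that exact values such as -1 compare equal."""
    z = cmath.exp(-2j * cmath.pi * k / n)
    return complex(round(z.real, 12), round(z.imag, 12))

def unroll_dft_template(n):
    """Fully unroll the (F n) loop nest into one statement per output element."""
    statements = []
    for i in range(n):
        terms = []
        for j in range(n):
            k = (i * j) % n
            c = w(n, k)
            if c == 1:
                terms.append(f"x[{j}]")            # multiplication by 1 folded away
            elif c == -1:
                terms.append(f"-x[{j}]")           # multiplication by -1 becomes a negation
            else:
                terms.append(f"W{n}_{k}*x[{j}]")   # access to a precomputed constant
        expr = terms[0]
        for t in terms[1:]:
            expr += f" - {t[1:]}" if t.startswith("-") else f" + {t}"
        statements.append(f"y[{i}] = {expr};")
    return statements

for line in unroll_dft_template(4):
    print(line)
```

For n = 2 this yields `y[0] = x[0] + x[1];` and `y[1] = x[0] - x[1];`, i.e., F2 with no multiplications left.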


Optimizations

• Formula Generator
  – High-level scheduling
  – Loop transformation

• SPL Compiler
  – High-level optimizations: constant folding, copy propagation, CSE, dead code elimination

• C/Fortran Compiler
  – Low-level optimizations: instruction scheduling, register allocation
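The high-level optimizations are classical local passes. A minimal Python sketch of all four on straight-line three-address code, assuming each variable is assigned exactly once (which holds for fully unrolled code with fresh temporaries). The `(dest, op, a, b)` tuple format is hypothetical; a forward pass does constant folding, copy propagation, and CSE, and a backward pass removes dead code.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def optimize(code, outputs):
    """Optimize single-assignment code; op '=' is a copy and ignores b."""
    value = {}   # var -> constant or variable known to hold the same value
    seen = {}    # (op, a, b) -> variable that already holds this expression
    out = []
    for dest, op, a, b in code:
        a, b = value.get(a, a), value.get(b, b)
        if op == "=":
            value[dest] = a                          # copy propagation
            continue
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            value[dest] = OPS[op](a, b)              # constant folding
            continue
        key = (op, a, b)
        if op in ("+", "*") and str(b) < str(a):
            key = (op, b, a)                         # commutative: canonical operand order
        if key in seen:
            value[dest] = seen[key]                  # CSE: reuse the earlier result
            continue
        seen[key] = dest
        out.append((dest, op, a, b))
    for o in outputs:
        if o in value:
            out.append((o, "=", value[o], None))     # materialize outputs that were folded away
    # Dead code elimination: keep only instructions that feed an output.
    live, kept = set(outputs), []
    for dest, op, a, b in reversed(out):
        if dest in live:
            kept.append((dest, op, a, b))
            live.update(v for v in (a, b) if isinstance(v, str))
    return kept[::-1]

code = [
    ("c", "+", 1, 1),          # folded to 2
    ("t1", "+", "x0", "x1"),
    ("t2", "+", "x1", "x0"),   # same value as t1 (CSE)
    ("t3", "*", "t2", "c"),    # becomes t1 * 2
    ("t4", "-", "x0", "x1"),   # dead
    ("y0", "=", "t3", None),
    ("y1", "-", "t1", "t2"),
]
for instr in optimize(code, ["y0", "y1"]):
    print(instr)
```

The pass does no algebraic simplification, so `y1` stays as `t1 - t1` rather than becoming 0.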


Basic Optimizations (FFT, N = 2^5, SPARC, f77 -fast -O5)


Basic Optimizations (FFT, N = 2^5, MIPS, f77 -O3)


Basic Optimizations (FFT, N = 2^5, PII, g77 -O6 -malign-double)


Performance Evaluation

• Evaluating the performance of the code generated by the SPL compiler

• Platforms: SPARC, MIPS, PII

• Search strategy: dynamic programming


Pseudo MFlops

• Estimation of the # of FP operations:
  – FFT (radix-2): $5n\log_2 n - 10n + 16$

$$\text{Pseudo MFlops} = \frac{\#\text{ of FP operations in the algorithm}}{\text{Execution time } (\mu\text{s})}$$
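Because the numerator depends only on n, pseudo MFlops for a given size is a rescaled inverse of execution time, so it ranks formulas of the same size exactly as their running times do. A minimal Python sketch of the two formulas, assuming n is a power of two and the execution time is given in microseconds:

```python
import math

def fft_flops(n):
    """Estimated FP operations of a radix-2 FFT of size n (n a power of two)."""
    return 5 * n * math.log2(n) - 10 * n + 16

def pseudo_mflops(n, time_us):
    """Pseudo MFlops = estimated FP operations / execution time in microseconds."""
    return fft_flops(n) / time_us

print(fft_flops(1024))            # 40976.0
print(pseudo_mflops(1024, 10.0))  # 4097.6
```

Dividing an operation count by microseconds gives millions of operations per second directly, so no further scaling is needed.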


FFT Performance (N = 2^1 to 2^6)

[Plots for SPARC, MIPS, and PII]


FFT Performance (N = 2^7 to 2^20)

[Plots for SPARC, MIPS, and PII]


Important Questions

• What lessons can be learned from this work?

• Can this approach be used in other domains?
