
NCSU 2/24/06 1

Array Dependence Analysis with the Chains of Recurrences Framework for Loop Optimization

Robert van Engelen

Florida State University

Also thanks to J. Birch, Y. Shou, and K. Gallivan


Outline

- Motivation
- Restructuring compilers
- Chains of recurrences algebra and associated algorithms for the GCC and Polaris compilers
- Nonlinear array dependence testing for loop restructuring and vectorization
- Experimental results
- Conclusions


Motivation

Intel CTO: “the increased power requirements of newer chips will lead to CPUs that are hotter than the surface of the sun by 2010”

- Enter multi-core CPUs
  - Increase the overall system speed by adding CPU cores
  - Speed up multi-threaded applications
  - Can effectively lower the power consumption
- Enter (more?) multi-media extensions
  - Vector-like instruction sets: MMX, SSE, AltiVec
  - Speed up multi-media codes, such as JPEG, MPEG


Code Optimization by Hand or Automatic?

- Rewriting applications by hand to exploit parallelism is doable, if:
  - Tasks can be identified that run independently, such as a Web browser’s rendering and communications tasks
  - Coarse-grain parallelism: tasks must have sufficient work
- Rewriting applications by hand to exploit lots of fine-grain parallelism is not doable
  - Thousands of read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW) data dependences must be analyzed


Restructuring Compilers

A restructuring compiler typically applies source-code transformations automatically to meet various performance enhancement criteria:
- Exploit parallelism in loops by reordering the loop structure to run loop iterations in parallel
- Find small loops to replace with vector instructions
- Optimize data locality by reordering code to change memory access order and cache

All code changes are safe as long as RAW, WAR, and WAW data dependences are preserved!


Example: Loop Fission

- Loop fission splits a single loop into multiple loops
- Allows vectorization and parallelization of the new loops when the original loop was sequential
- Loop fission must preserve all dependence relations of the original loop

Original loop nest, with dependence S3 (=,<) S4:

S1 DO I = 1, 10
S2   DO J = 1, 10
S3     A(I,J) = B(I,J) + C(I,J)
S4     D(I,J) = A(I,J-1) * 2.0
S5   ENDDO
S6 ENDDO

After fission of the J loop, with dependence S3 (=,<) S4:

S1 DO I = 1, 10
S2   DO J = 1, 10
S3     A(I,J) = B(I,J) + C(I,J)
Sx   ENDDO
Sy   DO J = 1, 10
S4     D(I,J) = A(I,J-1) * 2.0
S5   ENDDO
S6 ENDDO

After vectorization and parallelization, with dependence S3 (=,<) S4:

S1 PARALLEL DO I = 1, 10
S3   A(I,1:10) = B(I,1:10) + C(I,1:10)
S4   D(I,1:10) = A(I,0:9) * 2.0
S6 ENDDO
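The effect of the S3 (=,<) S4 dependence on fission can be checked with a small simulation. The following is a minimal Python sketch, not part of the original slides: the helper names `run` and `make_input` and the input values are illustrative assumptions, and the arrays carry an extra row and column so that A(I,0) exists.

```python
def make_input(n=10):
    # Hypothetical input data; any values would do.
    B = [[float(i + j) for j in range(n + 1)] for i in range(n + 1)]
    C = [[float(i * j) for j in range(n + 1)] for i in range(n + 1)]
    return B, C

def run(B, C, variant, n=10):
    """Run the loop nest as 'original', 'fission' (S3 loop first),
    or 'reversed' (S4 loop first, which violates the dependence)."""
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    D = [[0.0] * (n + 1) for _ in range(n + 1)]

    def s3(i, j):
        A[i][j] = B[i][j] + C[i][j]

    def s4(i, j):
        D[i][j] = A[i][j - 1] * 2.0

    for i in range(1, n + 1):
        if variant == "original":
            for j in range(1, n + 1):
                s3(i, j)
                s4(i, j)
        else:
            first, second = (s3, s4) if variant == "fission" else (s4, s3)
            for j in range(1, n + 1):
                first(i, j)
            for j in range(1, n + 1):
                second(i, j)
    return A, D
```

Running S3's loop before S4's loop keeps every write of A(I,J) ahead of the read of A(I,J-1), so the results match the original; swapping the two loops reads A before it is written.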


Loop Fission: Algorithm

Compute the acyclic condensation of the dependence graph to find a legal order of the loops

S1 DO I = 1, 10
S2   A(I) = A(I) + B(I-1)
S3   B(I) = C(I-1)*X + Z
S4   C(I) = 1/B(I)
S5   D(I) = sqrt(C(I))
S6 ENDDO

Dependence graph (nodes S2, S3, S4, S5; edges labeled 0 or 1):

S3 (<) S2
S4 (<) S3
S3 (=) S4
S4 (=) S5

Acyclic condensation: nodes S2, S5, and the cycle {S3, S4}

After loop fission:

S1 DO I = 1, 10
S3   B(I) = C(I-1)*X + Z
S4   C(I) = 1/B(I)
Sx ENDDO
S2 A(1:10) = A(1:10) + B(0:9)
S5 D(1:10) = sqrt(C(1:10))
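One way to compute an acyclic condensation is to find the strongly connected components of the dependence graph and order them topologically. The sketch below uses Tarjan's algorithm, which is a choice made here for illustration and not necessarily what the slides' compilers use; `condensation_order`, `NODES`, and `EDGES` are hypothetical names.

```python
NODES = ["S2", "S3", "S4", "S5"]
# One edge per dependence: source statement -> sink statement.
EDGES = [("S3", "S2"), ("S4", "S3"), ("S3", "S4"), ("S4", "S5")]

def condensation_order(nodes, edges):
    """Return the strongly connected components in a topological order."""
    succ = {n: [] for n in nodes}
    for src, dst in edges:
        succ[src].append(dst)
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(sorted(comp))

    for n in nodes:
        if n not in index:
            visit(n)
    # Tarjan emits components in reverse topological order.
    return sccs[::-1]
```

For the loop above, the component {S3, S4} comes first, which matches the fissioned code: the sequential S3/S4 loop runs before the vector statements S2 and S5.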


Example: Loop Interchange

- Changes the loop nesting order
- Allows vectorization of an outer loop and more effective parallelization of an inner loop
- Can be used to improve spatial locality
- Loop interchange must preserve all dependence relations of the original loop

Original loop nest, with dependence S3 (=,<) S3:

S1 DO I = 1, N
S2   DO J = 1, M
S3     A(I,J) = A(I,J-1) + B(I,J)
S4   ENDDO
S5 ENDDO

After interchange, with dependence S3 (<,=) S3:

S2 DO J = 1, M
S1   DO I = 1, N
S3     A(I,J) = A(I,J-1) + B(I,J)
S4   ENDDO
S5 ENDDO

After vectorization of the I loop, with dependence S3 (<,=) S3:

S2 DO J = 1, M
S3   A(1:N,J) = A(1:N,J-1) + B(1:N,J)
S5 ENDDO


Loop Interchange: Algorithm

Compute the direction matrix and find which columns (and therefore which loops) can be permuted without violating dependence relations in the original loop nest

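S1 DO I = 1, N
S2   DO J = 1, M
S3     DO K = 1, L
S4       A(I+1,J+1,K) = A(I,J,K) + A(I,J+1,K+1)
S5     ENDDO
S6   ENDDO
S7 ENDDO

Dependences:

S4 (<,<,=) S4
S4 (<,=,>) S4

Direction matrix (columns I, J, K):

< < =
< = >

Permuting the columns to loop order (J, K, I) gives

< = <
= > <

Invalid: the leftmost non-"=" entry of the second row is ">"

Permuting the columns to loop order (J, I, K) gives

< < =
= < >

Valid: the leftmost non-"=" entry of every row is "<"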
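The legality check on a permuted direction matrix is mechanical, so it is easy to sketch. The Python below is an illustration written for this transcript (the names `is_legal`, `MATRIX`, and `LOOPS` are assumptions): a loop order is accepted when, in every row, the leftmost entry that is not "=" is "<".

```python
from itertools import permutations

MATRIX = [("<", "<", "="), ("<", "=", ">")]   # rows: dependences, columns: I, J, K
LOOPS = "IJK"

def is_legal(matrix, order):
    """order is a tuple of column indexes, outermost loop first."""
    for row in matrix:
        for col in order:
            if row[col] == "<":
                break            # dependence carried by this loop: preserved
            if row[col] == ">":
                return False     # sink would execute before its source
    return True

legal = ["".join(LOOPS[c] for c in order)
         for order in permutations(range(3))
         if is_legal(MATRIX, order)]
```

For this direction matrix the accepted loop orders are IJK, IKJ, and JIK; every order that moves K ahead of I, or J and K both ahead of I, is rejected.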


Complications

- Loop restructuring is complicated by:
  - The presence of several induction variables
  - Nonlinear and symbolic array index expressions
  - The use of pointer arithmetic instead of arrays in C
  - Non-unit loop strides and unstructured loops
  - Control flow
- Need loop normalization and preprocessing
  - Apply induction variable substitution
  - Convert pointer dereferences to array accesses
  - Normalize the loop iteration space


Induction Variable Substitution

Example loop:

I = 0
J = 1
while (I<N)
  I = I+1
  … = A[J]
  J = J+2
  K = 2*I
  A[K] = …
endwhile

After IV substitution (IVS) (note the affine indexes):

for i=0 to N-1
  S1: … = A[2*i+1]
  S2: A[2*i+2] = …
endfor

After dependence testing and parallelization:

forall (i=0,N-1)
  … = A[2*i+1]
  A[2*i+2] = …
endforall

GCD test to solve dependence equation 2 i_d - 2 i_u = -1. Since 2 does not divide 1 there is no data dependence.

[Figure: array A[] with alternating written (W) and read (R) elements; the reads are A[2*i+1] and the writes are A[2*i+2].]
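The GCD test used here rests on a standard fact: a linear equation a·i_d + b·i_u = c has integer solutions only if gcd(a, b) divides c. A minimal Python sketch follows; the function name `gcd_test_may_depend` is an assumption made for this example, and the test ignores loop bounds, so a True result only means a dependence cannot be ruled out.

```python
from math import gcd

def gcd_test_may_depend(a, b, c):
    """Return False when a*i_d + b*i_u = c has no integer solution."""
    g = gcd(a, b)          # math.gcd returns a non-negative value
    if g == 0:
        return c == 0
    return c % g == 0

# Write A[2*i_d+2] against read A[2*i_u+1]: 2*i_d - 2*i_u = -1
independent = not gcd_test_may_depend(2, -2, -1)
```

A brute-force scan over a range of iterations confirms that the even write indexes never meet the odd read indexes.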


IV Recognition on SSA Forms

I1 = 3
M1 = 0
do
  I2 = φ(I1,I3)
  J1 = φ(?,J3)
  K1 = φ(?,K2)
  L1 = φ(?,L2)
  M2 = φ(M1,M3)
  J2 = 3
  I3 = I2+1
  L2 = M2+1
  M3 = L2+2
  J3 = I3+J2
  K2 = 2*J3
while (…)

I2(i) = 3+i
J1(i) = 7+i
L2(i) = 1+3i
K1(i) = 14+2i
M2(i) = 3i

Spanning tree

[Cytron91, Wolfe92]


Symbolic Differencing

do
  x = x+z
  y = z+1
  z = y+1
while (…)

Iteration   x          diff   diff   y      diff   z      diff
1           x+z                      z+1           z
2           x+2z+2     z+2           z+3    2      z+2    2
3           x+3z+6     z+4    2      z+5    2      z+4    2

Use abstract interpretation to evaluate loop iterations and construct symbolic difference table of the IV values

x(i) = x_0 + z_0 i + (i^2 - i)
y(i) = z_0 + 2i + 1
z(i) = z_0 + 2i

[Haghighat95]
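The closed forms can be sanity-checked numerically. The Python sketch below is a numeric stand-in for the symbolic difference table (it is not Haghighat's method itself); `run_loop` and `differences` are hypothetical helpers, and the indexing assumption is stated in the comments: after k+1 executions of the body, x equals x(k+1), y equals y(k), and z equals z(k+1).

```python
def run_loop(x0, z0, iterations):
    """Execute the loop body and record (x, y, z) after each iteration."""
    x, z, rows = x0, z0, []
    for _ in range(iterations):
        x = x + z
        y = z + 1
        z = y + 1
        rows.append((x, y, z))
    return rows

def differences(seq):
    """First differences of a sequence, as in one column of the table."""
    return [b - a for a, b in zip(seq, seq[1:])]

rows = run_loop(10, 4, 6)          # x_0 = 10, z_0 = 4 chosen arbitrarily
xs = [row[0] for row in rows]
second_diffs = differences(differences(xs))
```

A constant second difference of 2 in the x column is what signals that x is a quadratic polynomial in the iteration number.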


Pointer-to-Array Conversion

Lsp_az speech codec segment from ETSI with pointer updates:

f += 2;
lsp += 2;
for (i = 2; i <= 5; i++)
{
  *f = f[-2];
  for (j = 1; j < i; j++, f--)
    *f += f[-2]-2*(*lsp)*f[-1];
  *f -= 2*(*lsp);
  f += i;
  lsp += 2;
}

Lsp_az speech codec segment after pointer-to-array conversion:

for (i = 0; i <= 3; i++)
{
  f[i+2] = f[i];
  for (j = 0; j <= i; j++)
    f[i-j+2] += f[i-j]-2*lsp[2*i+2]*f[i-j+1];
  f[1] -= 2*lsp[2*i+2];
}

Note that all array index expressions are affine.

[vanEngelen01, Franke01]
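The equivalence of the two segments can be checked by emulating the C pointers with integer offsets. This Python sketch is an illustration only: `pointer_version` and `array_version` are hypothetical names, the offsets `fp` and `lp` stand in for the pointers f and lsp, and the test data is arbitrary (f needs at least 6 elements and lsp at least 9).

```python
def pointer_version(f, lsp):
    f = list(f)
    fp, lp = 2, 2                      # f += 2; lsp += 2;
    for i in range(2, 6):
        f[fp] = f[fp - 2]              # *f = f[-2];
        j = 1
        while j < i:
            f[fp] += f[fp - 2] - 2 * lsp[lp] * f[fp - 1]
            j += 1
            fp -= 1                    # j++, f--
        f[fp] -= 2 * lsp[lp]           # *f -= 2*(*lsp);
        fp += i
        lp += 2
    return f

def array_version(f, lsp):
    f = list(f)
    for i in range(4):
        f[i + 2] = f[i]
        for j in range(i + 1):
            f[i - j + 2] += f[i - j] - 2 * lsp[2 * i + 2] * f[i - j + 1]
        f[1] -= 2 * lsp[2 * i + 2]
    return f

F = [1, 2, 3, 4, 5, 6]
LSP = list(range(1, 10))
```

Both functions perform the same updates in the same order, so they return identical arrays for any input of sufficient length.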


Control-Flow Issues

Conditional array accesses and conditionally updated induction variables present problems:

do {
  K = 3;
  K = K+J;
  if (…)
    J = K;
  else
    J = J+3;
  A[J] = …
} while (J<N)

Extensive analysis reveals that J:=J+3

DO I=1,10
  IF …
    J = J+2
  ELSE
    J = I
  ENDIF
  A(J) = …
ENDDO

Problem: J has no single recurrence form

for (…) {
  if (…)
    A[I] = …
  else
    … = A[J]
}

Assume RAW and WAR dependences


Chains of Recurrences for Compiler Optimization

Chains of recurrence forms and algebra can be used to:
- Detect (non)linear coupled IVs
- Analyze pointer arithmetic
- Effectively handle control flow
- Implement array dependence testing


Chains of Recurrences

A chain of recurrences (CR) represents a polynomial or exponential function or mix evaluated over a unit-distance grid [Zima92]

Basic form: {init, ⊙, stride}, where ⊙ is + or *

Iteration   {init, ⊙, stride}                  f(i) = 2i+1 = {1,+,2}   f(i) = 2^i = {1,*,2}
i = 0       init                               1                       1
i = 1       init ⊙ stride                      3                       2
i = 2       init ⊙ stride ⊙ stride             5                       4
i = 3       init ⊙ stride ⊙ stride ⊙ stride    7                       8


Chains of Recurrences: General Formulation

- The key idea is to represent a non-constant CR stride in CR form itself, thereby forming a chain of recurrences
- Example: f(i) = i^2 = {0, +, s(i-1)} = {0, +, 1, +, 2} where s(i) = {1, +, 2}

Iteration   {init, ⊙, s(i-1)}             s(i) = {1, +, 2}   f(i) = {0, +, s(i-1)}
i = 0       init                          1                  0
i = 1       init ⊙ s(0)                   3                  1
i = 2       init ⊙ s(0) ⊙ s(1)            5                  4
i = 3       init ⊙ s(0) ⊙ s(1) ⊙ s(2)     7                  9
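A flattened CR can be evaluated over the grid with one update per coefficient per step. The sketch below is a small Python illustration; the list encoding `[c0, op1, c1, op2, c2, ...]` and the name `cr_values` are assumptions made for this example, not notation from the slides.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def cr_values(cr, count):
    """Evaluate a flattened CR such as [0, '+', 1, '+', 2] at i = 0..count-1."""
    coeffs = list(cr[0::2])
    ops = [OPS[symbol] for symbol in cr[1::2]]
    values = []
    for _ in range(count):
        values.append(coeffs[0])
        # Update left to right so each coefficient uses the old value
        # of its right-hand neighbor.
        for k, op in enumerate(ops):
            coeffs[k] = op(coeffs[k], coeffs[k + 1])
    return values
```

With this encoding, `[1, '+', 2]` yields the odd numbers, `[1, '*', 2]` the powers of two, and `[0, '+', 1, '+', 2]` the squares. A mixed chain such as `[1, '*', 1, '+', 1]` yields the factorials.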


CRs for Expediting Function Evaluations on Grids

- Suppose f(i) = a + b·i + c·i^2 = {a, +, {b+c, +, 2c}}
- We have two IVs x and y:
  f(i) = x = {x_0, +, y} with x_0 = a
  s(i) = y = {y_0, +, 2c} with y_0 = b+c
- Implement loop to update x and y for efficient evaluation of f(i) over a unit-distance grid i = 0, …, n:

x = a
y = b+c
for i=0 to n
  f[i] = x
  x = x+y
  y = y+2*c
endfor

[Figure: plot over iterations 1 to 10 (vertical axis 0 to 30), annotated with the stride s(i).]


Multi-Dimensional Example

Let f(i,j) = i^2 + i·j + 1

1. Create IV k for f(i,j) in j-loop:
   f(i,j) = k_j = {p_i, +, r_i}_j with p_i = i^2 + 1 and r_i = i

2. Create IVs for p_i and r_i in i-loop:
   p_i = {p_0, +, q_i}_i with p_0 = 1
   q_i = {q_0, +, 2}_i with q_0 = 1
   r_i = {r_0, +, 1}_i with r_0 = 0

3. Implement k, p, q, and r in i-j-loop nest

p = 1
q = 1
r = 0
for i = 0 to n
  k = p
  for j = 0 to m
    f[i,j] = k
    k = k+r
  endfor
  p = p+q
  q = q+2
  r = r+1
endfor
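As a check on the construction, the loop nest can be run and compared with direct evaluation of f(i,j). This is a straightforward Python transcription written for this transcript; `fill_with_cr` is a hypothetical name.

```python
def fill_with_cr(n, m):
    """Fill f[i][j] for 0 <= i <= n, 0 <= j <= m using additions only."""
    f = [[0] * (m + 1) for _ in range(n + 1)]
    p, q, r = 1, 1, 0
    for i in range(n + 1):
        k = p                  # k starts each row at p_i = i^2 + 1
        for j in range(m + 1):
            f[i][j] = k
            k = k + r          # stride r_i = i in the j direction
        p = p + q
        q = q + 2
        r = r + 1
    return f

table = fill_with_cr(6, 4)
```

Every entry agrees with i^2 + i·j + 1, although the loop body contains no multiplication.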


CR Construction with the CR Algebra

To construct the CR form of a symbolic function f(i):

1. Replace i with CR {0,+,1}
2. Apply CR algebra rewrite rules (selected rules shown):

   {x, +, y} + c  ⇒  {x+c, +, y}
   c·{x, +, y}  ⇒  {c·x, +, c·y}
   {x, +, y} + {u, +, v}  ⇒  {x+u, +, y+v}
   {x, +, y} * {u, +, v}  ⇒  {x·u, +, y·{u, +, v} + v·{x, +, y} + y·v}

Example:
f(i) = c·(i+a) = c·({0, +, 1}+a) = c·{a, +, 1} = {c·a, +, c}
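The selected rewrite rules are enough to build CR forms of polynomials mechanically. The following Python sketch handles only additive chains with numeric coefficients, encoded as plain lists `[c0, c1, ...]` for {c0, +, c1, +, ...}; the encoding and the names `cr_add` and `cr_mul` are assumptions made for illustration.

```python
def cr_add(f, g):
    """{x, +, y} + {u, +, v} => {x+u, +, y+v}, applied coefficient-wise."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

def cr_mul(f, g):
    """{x, +, y} * {u, +, v} => {x*u, +, y*{u,+,v} + v*{x,+,y} + y*v}."""
    if len(f) == 1:                      # constant times CR
        return [f[0] * c for c in g]
    if len(g) == 1:
        return [g[0] * c for c in f]
    y, v = f[1:], g[1:]                  # the strides are CRs themselves
    stride = cr_add(cr_add(cr_mul(y, g), cr_mul(v, f)), cr_mul(y, v))
    return [f[0] * g[0]] + stride

I = [0, 1]                               # the CR {0, +, 1} for i
square = cr_mul(I, I)                    # {0, +, 1, +, 2}
cube = cr_mul(I, square)                 # {0, +, 1, +, 6, +, 6}
```

The slide's example c·(i+a) works the same way: with c = 3 and a = 5, `cr_mul([3], cr_add(I, [5]))` gives `[15, 3]`, that is {c·a, +, c}.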


Loop Analysis with CR Forms

The basic idea:
- Scan the loop to detect IV updates
- Construct the CR form for each IV using the CR algebra

do
  J = J+I
  I = I+3
  P = 2*P
while (…)

J = {J_0, +, I}, that is, J = {J_0, +, {I_0, +, 3}}
I = {I_0, +, 3}
P = {P_0, *, 2}

[vanEngelen01]


Algorithm 1: Find Recurrences

Input: Loop L with live variable information
Output: Set S of recurrence relations of IVs

1. Start with set S = { ⟨v, v⟩ | v is live at loop header }
2. Search L from bottom to top:
   for each assignment v = x of expression x to scalar variable v, update tuples ⟨u, y⟩ in S by replacing v in y with x

Initially S = {⟨H, H⟩, ⟨I, I⟩, ⟨J, J⟩, ⟨K, K⟩}

Loop L          Step   Changes to S
do
  M = 2         5      S5 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J-H+2⟩, ⟨K, K+2*I⟩}
  L = J-H       4      S4 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J-H+M⟩, ⟨K, K+M*I⟩}
  J = L+M       3      S3 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, L+M⟩, ⟨K, K+M*I⟩}
  K = K+M*I     2      S2 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J⟩, ⟨K, K+M*I⟩}
  I = I+1       1      S1 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J⟩, ⟨K, K⟩}
while (…)
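The bottom-to-top substitution can be sketched with plain string rewriting. The Python below is a simplification for illustration, not the actual compiler algorithm: expressions are strings, variables are single identifiers, substituted expressions are parenthesized instead of simplified, and `find_recurrences` and `BODY` are hypothetical names.

```python
import re

# Loop body in program order as (variable, expression) pairs.
BODY = [("M", "2"), ("L", "J-H"), ("J", "L+M"), ("K", "K+M*I"), ("I", "I+1")]

def find_recurrences(body, live):
    """Return {v: expression for the value of v after one iteration}."""
    S = {v: v for v in live}                       # S = { <v, v> }
    for var, expr in reversed(body):               # bottom to top
        pattern = re.compile(r"\b%s\b" % re.escape(var))
        S = {u: pattern.sub(lambda m: "(" + expr + ")", y)
             for u, y in S.items()}
    return S

S = find_recurrences(BODY, ["H", "I", "J", "K"])
```

The result for J is the string `((J-H)+(2))` and for K it is `(K+(2)*I)`, which are unsimplified versions of the tuples ⟨J, J-H+2⟩ and ⟨K, K+2*I⟩ in S5.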


Algorithm 2: Compute CR Forms

Input: Set S with recurrence relations
Output: CR forms for IVs in S

1. For each relation ⟨v, x⟩ in S do:
   if x is of the form v then v = v_0 (v is loop invariant)
   if x is of the form v + y then v = {v_0, +, y}
   if x is of the form v * y then v = {v_0, *, y}
   if x does not contain v then v = {v_0, #, y} (v is wrap around)

2. Simplify the CR forms with the CR algebra rewrite rules

Recurrence relation in S   CR form                Simplified CR form
⟨H, H⟩                     H = H_0                H = H_0
⟨I, I+1⟩                   I = {I_0, +, 1}        I = {I_0, +, 1}
⟨J, J-H+2⟩                 J = {J_0, +, 2-H}      J = {J_0, +, 2-H_0}
⟨K, K+2*I⟩                 K = {K_0, +, 2*I}      K = {K_0, +, 2I_0, +, 2}


Algorithm 3: Solve

Input: CR forms for IVs
Output: Closed-form solutions for IVs (when possible)

1. For each CR form of v apply the CR inverse algebra, assuming loop is normalized for i = 0, …, n

2. Certain “exotic” mixed non-polynomial and non-exponential CR forms may not have closed forms

Loop L:

do
  M = 2
  L = J-H
  J = L+M
  K = K+M*I
  I = I+1
while (…)

Simplified CR form           Closed form
J = {J_0, +, 2-H_0}          J(i) = J_0 + (2-H_0)*i
K = {K_0, +, 2I_0, +, 2}     K(i) = K_0 + i^2 + (2I_0-1)*i
I = {I_0, +, 1}              I(i) = I_0 + i
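The closed forms can be confirmed by running the loop and comparing the values seen at the loop header in iteration i. This Python check is an illustration; `simulate` and `closed_form` are hypothetical names and the initial values are arbitrary.

```python
def simulate(H0, I0, J0, K0, iterations):
    """Record (I, J, K) at the loop header for i = 0, 1, ..."""
    H, I, J, K = H0, I0, J0, K0
    trace = []
    for _ in range(iterations):
        trace.append((I, J, K))
        M = 2
        L = J - H
        J = L + M
        K = K + M * I
        I = I + 1
    return trace

def closed_form(H0, I0, J0, K0, i):
    """Closed forms from the table, evaluated at iteration i."""
    return (I0 + i,
            J0 + (2 - H0) * i,
            K0 + i * i + (2 * I0 - 1) * i)

trace = simulate(5, 7, 11, 13, 8)
```

The quadratic term in K(i) comes from summing the linear stride 2·I over the first i iterations.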


Example 1

Initially S = {⟨x, x⟩, ⟨z, z⟩}

Loop L            Step   Changes to S
x = 2
z = 0
do
  A(x) = A(z)
  x = x+z         3      S3 = {⟨x, x+z⟩, ⟨z, z+2⟩}
  y = z+1         2      S2 = {⟨x, x⟩, ⟨z, z+2⟩}
  z = y+1         1      S1 = {⟨x, x⟩, ⟨z, y+1⟩}
while (z<N)

CR form:
x = {x_0, +, z}
z = {z_0, +, 2}

Closed form:
x(i) = x_0 + z_0 i + i^2 - i
z(i) = z_0 + 2i

Resulting loop:

do i=0,2*N-2
  A(i*i-i+2) = A(2*i)
end do


Example 2

TRFD code segment from Perfect Benchmark with IV updates:

DO I=1,M
  DO J=1,I
    ij = ij+1
    ijkl = ijkl+I-J+1
    DO K=I+1,M
      DO L=1,K
        ijkl = ijkl+1
        xijkl[ijkl]=xkl[L]
      ENDDO
    ENDDO
    ijkl = ijkl+ij+left
  ENDDO
ENDDO

TRFD after aggressive induction variable substitution (IVS):

DO I=0,M-1
  DO J=0,I
    DO K=0,M-I-2
      DO L=0,I+K+1
        tmp = ijkl+L+I*(K+(M+M*M+2*left+6)/4)+J*(left+(M+M*M)/2)+((I*I*M*M)+2*(K*K+3*K+I*I*(left+1))+M*I*I)/4+2
        xijkl[tmp] = xkl[L+1]
      ENDDO
    ENDDO
  ENDDO
ENDDO


Example 3 (SSA)

Source code:

a = 1;
while (a<10) {
  x = a+2;
  a = a+1;
}

SSA form:

    a0 = 1
    if (a0>=10) goto L2
L1: a1 = φ(a0, a2)
    x0 = a1 + 2
    a2 = a1+1
    if (a2<10) goto L1
L2:

[Figure: SSA graph linking a0 = 1, a1 = φ(a0, a2), a2 = a1 + 1, and x0 = a1 + 2.]

a1 = {1,+,1}

GCC 4.x uses our approach applied to SSA form.

Note: GCC developers refer to CRs as “scalar evolutions”


Example 4 (SSA)

Source code:

x = 0;
i = 1;
while (i<10) {
  x = x+i;
  i = i+1;
}

SSA form:

    x0 = 0
    i0 = 1
    if (i0>=10) goto L2
L1: x1 = φ(x0, x2)
    i1 = φ(i0, i2)
    x2 = x1+i1
    i2 = i1+1
    if (i2<10) goto L1
L2:

[Figure: SSA graph linking i0 = 1, i1 = φ(i0, i2), i2 = i1 + 1, x0 = 0, x1 = φ(x0, x2), and x2 = x1 + i1.]

i1 = {1,+,1}
x1 = {0,+,i1} = {0,+,1,+,1}


Example 5 (SSA)

Source code:

j = 0;
i = 1;
while (i<10) {
  if (p)
    j = j+2;
  else
    j = j+3;
  i = i+1;
}

SSA form:

    j0 = 0
    i0 = 1
    if (i0>=10) goto L2
L1: i1 = φ(i0, i2)
    j1 = φ(j0, j4)
    if (!p) goto L3
    j2 = j1+2
    goto L4
L3: j3 = j1+3
L4: j4 = φ(j2, j3)
    i2 = i1+1
    if (i2<10) goto L1
L2:

[Figure: SSA graph linking j0 = 0, j1 = φ(j0, j4), j2 = j1 + 2, j3 = j1 + 3, and j4 = φ(j2, j3).]

{0,+,2} ≤ j1 ≤ {0,+,3}
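The bounding pair of CRs can be checked by simulation: whatever the branch outcomes are, j1 at iteration i stays between 2i and 3i. The Python sketch below is an illustration with hypothetical names (`simulate_j`, `within_bounds`); the random branch outcomes stand in for the unknown condition p.

```python
import random

def simulate_j(choices):
    """Record j at the loop header (j1) for each iteration."""
    j, trace = 0, []
    for p in choices:
        trace.append(j)
        j = j + 2 if p else j + 3
    return trace

def within_bounds(trace):
    """Check {0,+,2} <= j1 <= {0,+,3}, i.e. 2*i <= j1(i) <= 3*i."""
    return all(2 * i <= j <= 3 * i for i, j in enumerate(trace))

rng = random.Random(0)        # fixed seed so the run is repeatable
traces = [simulate_j([rng.random() < 0.5 for _ in range(9)])
          for _ in range(100)]
```

The bounds are tight: taking the first branch every time follows {0,+,2} exactly, and taking the second branch every time follows {0,+,3}.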


Recognizing Mixed Functional Forms and Reductions

Factorial

Loop L:

I = 1
do
  F = F*I
  I = I+1
while (…)

Simplified CR form: F = {F_0, *, 1, +, 1} and I = {1, +, 1}
Closed form: F = F_0 * i!

Reduction

Loop L:

I = 0; S = 0
do
  S = S+A[I]
  I = I+2
while (…)

Simplified CR form: S = {0, +, A[{0, +, 2}]} and I = {0, +, 2}
Closed form: S = ∑ A[2i]


Pointer Access Descriptions of Pointer and Array References

- A pointer access description (PAD) [vanEngelen01] is a CR form of a pointer or array reference in a loop nest
- PADs are computed with the CR-based IV algorithms

Each row of the table is a reference placed in the body of this loop:

short a[…], *p;
int i;
p = a;
for (i=0; …; i++) {
  (loop code)
}

Loop Code       PAD                  Sequence
a[i]            {a, +, 1}            a[0],a[1],a[2],a[3]
a[2*i+1]        {a+1, +, 2}          a[1],a[3],a[5],a[7]
a[(i*i-i)/2]    {a, +, 0, +, 1}      a[0],a[0],a[1],a[3]
a[1<<i]         {a+1, +, 1, *, 2}    a[1],a[2],a[4],a[8]
p++             {a, +, 1}            a[0],a[1],a[2],a[3]
p+=i            {a, +, 0, +, 1}      a[0],a[0],a[1],a[3]


CR-Enhanced Array Dependence Testing

Basic idea: construct dependence equations in CR form for both pointer and array accesses
- Determine the solution intervals by computing the value ranges of the equations in CR form
- If the solution space is empty, there is no dependence


Example

float a[…], *p, *q;
p = a;
q = a+2*n;
for (i=0; i<n; i++) {
   t = *p;
S: *p++ = *q;
   *q-- = t;
}

Dependence equation:
{a, +, 1}_{i_d} = {a+2n, +, -1}_{i_u}

Constraints:
0 ≤ i_d ≤ n-1
0 ≤ i_u ≤ n-1

Rewrite dependence equation:
{a, +, 1}_{i_d} = {a+2n, +, -1}_{i_u}
⇒ {a, +, 1}_{i_d} - {a+2n, +, -1}_{i_u} = 0
⇒ {{-2n, +, 1}_{i_u}, +, 1}_{i_d} = 0

Compute solution interval:
Low[{{-2n, +, 1}_{i_u}, +, 1}_{i_d}] = Low[{-2n, +, 1}_{i_u}] = -2n
Up[{{-2n, +, 1}_{i_u}, +, 1}_{i_d}] = Up[{-2n, +, 1}_{i_u} + n-1] = Up[-2n + 2n - 2] = -2

No dependence: 0 lies outside the solution interval [-2n, -2]

S *

p = {a, +, 1}
q = {a+2n, +, -1}
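For this example the interval computation is simple enough to write out directly. The Python sketch below is an illustration under the assumption that the rewritten equation is h(i_d, i_u) = -2n + i_u + i_d with both indexes in 0..n-1; `solution_interval` and `may_depend` are hypothetical names.

```python
def solution_interval(n):
    """Value range of h(i_d, i_u) = -2n + i_u + i_d for 0 <= i_d, i_u <= n-1.

    Both strides are +1, so h increases monotonically in each index and
    its extremes are reached at the corners of the iteration space.
    """
    low = -2 * n                      # i_d = i_u = 0
    up = -2 * n + 2 * (n - 1)         # i_d = i_u = n-1
    return low, up

def may_depend(n):
    """A dependence requires h = 0 somewhere inside the interval."""
    low, up = solution_interval(n)
    return low <= 0 <= up
```

Since the upper bound is always -2, zero is never reached and `may_depend(n)` is False for every n ≥ 1, which agrees with enumerating all index pairs.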


Determining the Value Range of a CR Form

Suppose x(i) = {x_0, +, s(i-1)} for i = 0, …, n
- If s(i-1) > 0 then x(i) is monotonically increasing
- If s(i-1) < 0 then x(i) is monotonically decreasing

If a function is monotonic on its domain, then it is trivial to find its exact value range
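A simple sufficient condition for monotonicity of an additive CR is that all stride coefficients share one sign. The Python sketch below uses that condition, which is an assumption chosen for this example and is weaker than a full sign analysis; `cr_range` is a hypothetical name and the CR is encoded as a list `[c0, c1, ...]` for {c0, +, c1, +, ...}.

```python
def cr_range(coeffs, n):
    """Return (low, up) of the CR over i = 0..n, or None if the sign
    of the stride cannot be determined by this simple rule."""
    strides = coeffs[1:]
    if all(c >= 0 for c in strides):
        increasing = True             # stride never negative for i >= 0
    elif all(c <= 0 for c in strides):
        increasing = False            # stride never positive for i >= 0
    else:
        return None
    c = list(coeffs)
    for _ in range(n):                # step the CR to obtain x(n)
        for k in range(len(c) - 1):
            c[k] += c[k + 1]
    first, last = coeffs[0], c[0]
    return (first, last) if increasing else (last, first)
```

For example, `cr_range([0, 1, 2], 5)` is `(0, 25)` because {0,+,1,+,2} is i^2, while a mixed-sign chain such as `[0, -1, 2]` returns None and would need a finer analysis.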


Example: Nonlinear and Symbolic Dependence Testing

float a[…], *p, *q;
p = q = a;
for (i=0; i<n; i++) {
  for (j=0; j<=i; j++)
    *q += *++p;
  q++;
}

p = {{a+1, +, 1, +, 1}_i, +, 1}_j = a[(i^2+i)/2+j+1]
q = {a, +, 1}_i = a[i]

CR dep. test disproves flow dependence (<, <)

DO i = 1, M+1
  S1: A[I*N+10] = ...
  S2: ... = A[2*I+K]
  K = 2*K+N
ENDDO

S1: A[{N+10, +, N}_i]
S2: A[{K_0+2N, +, K_0+N+2, *, 2}_i]

CR range test disproves dependence when K+N > 10 and K > 2


Results

- Implemented a CR-enhanced trapezoidal Banerjee test
  - Relatively simple test
  - Enhanced with support for nonlinear forms
  - Enhanced with support for conditional flow
  - Construct dependence equations in CR form
- Implementation based on the Polaris compiler
  - Pros: can compare to powerful dependence tests such as Omega and Range test
  - Cons: Fortran only


Additional Independences Filtered over Omega Test

[Figure: bar chart (0% to 100%) comparing CR-EVT and Omega on the Perfect Benchmark codes DYFESM, MDG, OCEAN, QCD, and TRFD and the LAPACK codes GEP, NEP, and SEP.]


Additional Independences Filtered over Range Test

[Figure: bar chart (0% to 100%) comparing CR-EVT and Range on DYFESM, MDG, OCEAN, QCD, TRFD, GEP, NEP, and SEP.]


Additional Independences Filtered over Omega+Range

[Figure: bar chart (0% to 100%) comparing CR-EVT and Omega+Range on DYFESM, MDG, OCEAN, QCD, TRFD, GEP, NEP, and SEP.]


Percentage of Conditional IVs w/o Closed Forms in LAPACK

[Figure: bar chart (0% to 100%) of Conditional IVs versus Other IVs for GEP, NEP, and SEP.]


Timing Comparison: Perf Bench.

[Figure: bar chart of time in seconds (0 to 10) for Range, Omega, CR-EVT, and CR-EVT (opt) on DYFESM, MDG, OCEAN, QCD, and TRFD.]


Timing Comparison: LAPACK

[Figure: bar chart of time in seconds (0 to 70) for Range, Omega, CR-EVT, and CR-EVT (opt) on GEP, NEP, and SEP.]


Conclusions

- A CR-based compiler framework has advantages:
  - Applicable to CFG, AST, and SSA forms
  - Handles conditional flow
  - Handles nonlinear and symbolic induction variable expressions
  - Allows array and pointer-based dependence testing to be applied directly to the CR forms without induction variable substitution
- Future work:
  - Improve GCC implementation
  - Enhance other dependence tests with CR forms


Further Reading

Robert van Engelen, Johnnie Birch, Yixin Shou, Burt Walsh, and Kyle Gallivan, “A Unified Framework for Nonlinear Dependence Testing and Symbolic Analysis”, in the proceedings of the ACM International Conference on Supercomputing (ICS), 2004, pages 106-115.

Robert van Engelen, Johnnie Birch, and Kyle Gallivan, “Array Dependence Testing with the Chains of Recurrences Algebra”, in the proceedings of the IEEE International Workshop on Innovative Architectures for Future Generation High-Performance Processors and Systems (IWIA), January 2004, pages 70-81.

Robert van Engelen and Kyle Gallivan, “An Efficient Algorithm for Pointer-to-Array Access Conversion for Compiling and Optimizing DSP Applications”, in proceedings of the 2001 International Workshop on Innovative Architectures for Future Generation High-Performance Processors and Systems (IWIA), January 2001, pages 80-89.

Robert van Engelen, “Efficient Symbolic Analysis for Optimizing Compilers”, in proceedings of the International Conference on Compiler Construction, ETAPS 2001, LNCS 2027, pages 118-132.


The End