CMPUT680 - Winter 2001

CMPUT 680 - Compiler Design and Optimization

Register Minimization X Register Saturation

José Nelson Amaral
http://www.cs.ualberta.ca/~amaral/courses/680

Reading List

Touati, S.-A.-A., “Register Saturation in Superscalar and VLIW Codes,” 10th International Conference on Compiler Construction, Genova, Italy, April 2001, pp. 213-228.

Touati, S.-A.-A., Thomasset, F., “Register Saturation in Data Dependence Graphs,” Research Report RR-3978, INRIA, July 2000.

Touati, S.-A.-A., “Optimal Register Saturation in Acyclic Superscalar and VLIW Codes,” Research Report, INRIA, Nov. 2000.

Minimum Register Instruction Sequence (MRIS)

Problem

Given the Data Dependence Graph G for a basic block, derive an instruction sequence S for G that is optimal in the sense that its register requirement is minimum.
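A minimal sketch of the MRIS objective, under two assumptions that are not stated on the slide: a register is freed as soon as the last consumer of its value executes (so that consumer may reuse it), and the store i defines no value. The graph is the nine-instruction example used later in these slides. The names register_need and all_sequences are illustrative, and the exhaustive search is only practical for very small basic blocks.

```python
# Successor lists of the nine-instruction example (a is a load, i is a store).
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}


def register_need(ddg, seq, no_result=frozenset({"i"})):
    """Peak number of simultaneously live values when seq is executed."""
    uses_left = {u: len(vs) for u, vs in ddg.items()}
    preds = {v: [u for u in ddg if v in ddg[u]] for v in ddg}
    live = peak = 0
    for v in seq:
        for u in preds[v]:
            uses_left[u] -= 1
            if uses_left[u] == 0:      # v is the last consumer of u
                live -= 1              # assumption: the register is freed here
        if v not in no_result:         # v defines a value that needs a register
            live += 1
        peak = max(peak, live)
    return peak


def all_sequences(ddg):
    """Yield every topological order (legal instruction sequence) of the DAG."""
    indeg = {v: 0 for v in ddg}
    for u in ddg:
        for v in ddg[u]:
            indeg[v] += 1
    order = []

    def extend():
        if len(order) == len(ddg):
            yield list(order)
            return
        for v in sorted(ddg):
            if indeg[v] == 0 and v not in order:
                order.append(v)
                for w in ddg[v]:
                    indeg[w] -= 1
                yield from extend()
                for w in ddg[v]:
                    indeg[w] += 1
                order.pop()

    yield from extend()


best = min(all_sequences(DDG), key=lambda s: register_need(DDG, s))
print(best, register_need(DDG, best))
```

Under these assumptions the search reports a minimum of three registers for this block, which agrees with the three colors found by the lineage method later in the deck.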

Intuition for Our Solution

[Figure: Data Dependence Graph with nodes arranged in levels: a; b, c, d, e; f, g; h; i]

Our intuition is to find subsets of nodes that can definitely share a register to inform the instruction sequencing algorithm.

Instruction Lineages

[Figure: Data Dependence Graph (nodes a; b, c, d, e; f, g; h; i) with the lineage nodes a, b, f, h marked]

An instruction lineage is a sequence of instructions in which a single register is passed from instruction to instruction (except for the last).

L1 = [a, b, f, h, i)

How can we ensure that instructions a, b, f, and h will be able to share the same register?

Sequencing Edges

[Figure: Augmented Data Dependence Graph (nodes a; b, c, d, e; f, g; h; i)]

L1 = [a, b, f, h, i)

The lineage formation imposes a scheduling restriction on the DDG: the selected heir of a node must be the last of its siblings to appear in the instruction sequence.

Thus the lineage formation inserts sequencing edges in the DDG.

Node Height

[Figure: Augmented Data Dependence Graph (nodes a; b, c, d, e; f, g; h; i)]

L1 = [a, b, f, h, i)

If the introduction of sequencing edges were to produce a cycle in the DDG, it would be impossible to find a legal instruction sequence.

Thus we use the height of the nodes, recomputed after each lineage formation, to select the heir. Ties are broken arbitrarily.
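The effect of sequencing edges on node heights can be sketched as follows. This is an illustration, not the authors' implementation: the helper names are hypothetical, and the height convention (a sink has height 1) is an assumption chosen because it reproduces the "height of 5" quoted on the next slide.

```python
from functools import lru_cache

# Successor lists of the example DDG.
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}


def add_sequencing_edges(ddg, node, heir):
    """Copy ddg and force heir to come after every other consumer of node."""
    aug = {u: list(vs) for u, vs in ddg.items()}
    for sibling in ddg[node]:
        if sibling != heir and heir not in aug[sibling]:
            aug[sibling].append(heir)   # sequencing edge sibling -> heir
    return aug


def heights(ddg):
    """Number of nodes on the longest path from each node to a sink.

    Assumes ddg is acyclic; a cycle would make the recursion diverge.
    """
    @lru_cache(maxsize=None)
    def h(v):
        return 1 + max((h(w) for w in ddg[v]), default=0)

    return {v: h(v) for v in ddg}


# L1 = [a, b, f, h, i) makes b the heir of a, so c, d, e must precede b.
aug = add_sequencing_edges(DDG, "a", "b")
print(heights(DDG))
print(heights(aug))
```

With the edges c -> b, d -> b and e -> b in place, the heights of c, d and e grow from 4 to 5 while b stays at 4.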

Lineage Formation

[Figure: Augmented Data Dependence Graph (nodes a; b, c, d, e; f, g; h; i)]

L1 = [a, b, f, h, i)

For the next lineage, the highest nodes not in a lineage are c, d, and e, all with a height of 5.

L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)

Lineage Interference

[Figure: Augmented Data Dependence Graph (nodes a; b, c, d, e; f, g; h; i)]

L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)

Two lineages Lu = [u1, u2, …, um) and Lv = [v1, v2, …, vn) definitely overlap if:

(i) u1 reaches vn, and
(ii) v1 reaches um.

Lineage Interference Graph

[Figure: Augmented Data Dependence Graph (nodes a; b, c, d, e; f, g; h; i) and Lineage Interference Graph with nodes L1, L2, L3, L4]

L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)

Which lineages does lineage L1 definitely overlap with?

How about lineages L2 and L4?

Lineage Fusion Condition

[Figure: Augmented Data Dependence Graph and Lineage Interference Graph with nodes L1, L2, L3, L4]

L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)

Two lineages Lu = [u1, u2, …, um) and Lv = [v1, v2, …, vn) can be fused into a single lineage if:

(i) u1 reaches vn, and
(ii) v1 does not reach um.

Lineage Fusion Condition

[Figure: Augmented Data Dependence Graph and Lineage Interference Graph with nodes L1, L2, L3, L4]

L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)

Which lineages can be fused in the example?

d reaches f, and c does not reach g.

Thus L4 can be fused with L2 to form L5 = [d, g) ∪ [c, f).
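The overlap and fusion tests can be checked mechanically. The sketch below assumes that the only sequencing edges in the augmented DDG are c -> b, d -> b and e -> b (those implied by choosing b as the heir of a in L1); the function names are illustrative.

```python
# Augmented DDG: data edges plus the assumed sequencing edges into b.
AUG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"],
    "c": ["f", "b"], "d": ["g", "b"], "e": ["g", "b"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}

# Each lineage is written as its node sequence; the last node is the open end.
LINEAGES = {"L1": "abfhi", "L2": "cf", "L3": "egh", "L4": "dg"}


def reaches(graph, src, dst):
    """True if dst can be reached from src (every node reaches itself)."""
    stack, seen = [src], set()
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(graph[v])
    return False


def definitely_overlap(graph, lu, lv):
    """(i) u1 reaches vn and (ii) v1 reaches um."""
    return reaches(graph, lu[0], lv[-1]) and reaches(graph, lv[0], lu[-1])


def can_fuse(graph, lu, lv):
    """(i) u1 reaches vn and (ii) v1 does not reach um."""
    return reaches(graph, lu[0], lv[-1]) and not reaches(graph, lv[0], lu[-1])


names = sorted(LINEAGES)
lig_edges = [(p, q) for i, p in enumerate(names) for q in names[i + 1:]
             if definitely_overlap(AUG, LINEAGES[p], LINEAGES[q])]
fusable = [(p, q) for p in names for q in names
           if p != q and can_fuse(AUG, LINEAGES[p], LINEAGES[q])]
print(lig_edges)
print(fusable)
```

On this graph L1 overlaps every other lineage, L2 and L4 do not overlap, and the only ordered pair that satisfies the fusion condition is (L4, L2).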

Lineage Fusion

[Figure: Augmented Data Dependence Graph and Lineage Interference Graph with nodes L1, L2, L3, L4]

L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)

When Lu = [u1, u2, …, um) and Lv = [v1, v2, …, vn) are fused:

(1) a scheduling edge from um to v1 is introduced in the augmented DDG,
(2) Lu and Lv are removed from the LIG, and
(3) a new lineage Lw = Lu ∪ Lv is inserted in the LIG.

Lineage Fusion Condition

[Figure: Augmented Data Dependence Graph and Lineage Interference Graph with nodes L1, L3, L5]

L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) ∪ [c, f)

Thus the fusion of L4 with L2 forms L5 = [d, g) ∪ [c, f).

How many colors do we need to color the LIG?

Lineage Fusion Condition

[Figure: Augmented Data Dependence Graph and Lineage Interference Graph with nodes L1, L3, L5]

L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) ∪ [c, f)

We need three colors.
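A small sketch of the coloring step. It assumes the LIG after fusion is the triangle L1, L3, L5 (consistent with the three colors above) and uses a plain greedy coloring, which is a heuristic in general but is exact for a triangle. The mapping of colors to RA, RB, RC is an illustrative choice.

```python
# Lineage Interference Graph after fusing L4 with L2 into L5 (assumed triangle).
LIG = {
    "L1": {"L3", "L5"},
    "L3": {"L1", "L5"},
    "L5": {"L1", "L3"},
}
REGISTERS = ["RA", "RB", "RC"]


def greedy_coloring(graph):
    """Give each node the smallest color not used by an already colored neighbor."""
    color = {}
    for node in sorted(graph):
        used = {color[n] for n in graph[node] if n in color}
        color[node] = next(c for c in range(len(graph)) if c not in used)
    return color


coloring = greedy_coloring(LIG)
assignment = {lineage: REGISTERS[c] for lineage, c in coloring.items()}
print(assignment)
```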

Can we find an instruction sequence?

Sequencing by List Scheduling

[Figure: Augmented Data Dependence Graph (nodes a; b, c, d, e; f, g; h; i) and Lineage Interference Graph with nodes L1, L3, L5 colored with registers RA, RB, RC]

L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) ∪ [c, f)

Registers: RA, RB, RC

Sequence (built one instruction per step): a, d, e, g, c, b, f, h, i
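To check that the final sequence a, d, e, g, c, b, f, h, i really fits in three registers, the sketch below executes it with one register per lineage and compares the stored result with a direct evaluation. The operations come from the example code given later in the deck; the lineage-to-register mapping (L1 in RA, L3 in RB, L5 in RC) and the helper names are assumptions made for illustration.

```python
SEQUENCE = list("adegcbfhi")

# Register that holds the value defined by each instruction.
REG = {
    "a": "RA", "b": "RA", "f": "RA", "h": "RA",   # L1 = [a, b, f, h, i)
    "e": "RB", "g": "RB",                         # L3 = [e, g, h)
    "d": "RC", "c": "RC",                         # L5 = [d, g) U [c, f)
}

# instruction -> (operand instructions, operation)
OPS = {
    "b": (["a"], lambda t1: t1 + 4),
    "c": (["a"], lambda t1: t1 * 8),
    "d": (["a"], lambda t1: t1 - 4),
    "e": (["a"], lambda t1: t1 / 2),
    "f": (["b", "c"], lambda t2, t3: t2 * t3),
    "g": (["d", "e"], lambda t4, t5: t4 - t5),
    "h": (["f", "g"], lambda t6, t7: t6 * t7),
}


def run_with_registers(x):
    """Execute SEQUENCE on a tiny register file; return (stored y, registers used)."""
    regs, y = {}, None
    for n in SEQUENCE:
        if n == "a":
            regs[REG["a"]] = x                    # t1 := ld(x)
        elif n == "i":
            y = regs[REG["h"]]                    # st(y, t8)
        else:
            operands, fn = OPS[n]
            # Operands are read before the destination register is overwritten.
            regs[REG[n]] = fn(*(regs[REG[o]] for o in operands))
    return y, len(regs)


def reference(x):
    """Direct evaluation of the basic block with one temporary per value."""
    t1 = x
    return ((t1 + 4) * (t1 * 8)) * ((t1 - 4) - (t1 / 2))


print(run_with_registers(10), reference(10))
```

If a lineage overwrote a register whose value was still needed, the two results would differ, so agreement is a quick sanity check of the sequence and the register assignment.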

Summary of Our Solution Method

DDG
-> Form Lineage Interference Graph (LIG): a “good” construction algorithm for the LIG (dynamic)
-> Derive HRB: an effective heuristic method to calculate the HRB
-> Extended list scheduling guided by HRB: an efficient scheduling method (no backtracking)
-> A good instruction sequence

Register Saturation (Touati)

Given a data dependence graph G, the register saturation (RS) of G is the maximal register need over all schedules of G.

Touati’s strategy is to compute the RS of G and, if the RS exceeds the number of available registers, to reduce the RS by introducing new arcs in G.

The intuition is that by using either (1) all available registers or (2) the maximal number of registers that G can use, instruction-level parallelism is maximized.

The HRB and the RS

Govind, Gao, Yang, Amaral, and Zhang had earlier proposed an alternative method: to find a heuristic register bound (HRB) to be used as guidance in a modified list scheduling. Their goal is to find a schedule that uses a minimum number of registers.

To compare both methods we will apply Touati’s method to Govind et al.’s example, and Govind et al.’s method to Touati’s example.

Potential Killers

To find RS(G), we need to know which operation must kill each value generated. Touati defines the set of operations that are potential killers of the value generated by an operation $u \in G$:

$pkill_G(u) = \{ v \in Cons(u) \mid \downarrow v \cap Cons(u) = \{v\} \}$

where $\downarrow v$ is the set of all descendants of v, including v, and $w \in Cons(u)$ iff $(u, w) \in G$.

Thus a node v is a potential killer of the value generated by a node u if and only if v consumes u and no other descendant of v consumes u.
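The definition above translates almost directly into code. This is a sketch with illustrative names; DDG is the nine-instruction example from these slides, and TRIANGLE is a hypothetical three-node graph added only to show a consumer that is not a potential killer.

```python
# Successor lists: ddg[u] is Cons(u), the consumers of u.
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}

# Hypothetical graph: w consumes u and is also a descendant of v.
TRIANGLE = {"u": ["v", "w"], "v": ["w"], "w": []}


def descendants(ddg, v):
    """All nodes reachable from v, including v itself."""
    seen, stack = set(), [v]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(ddg[n])
    return seen


def pkill(ddg, u):
    """Consumers v of u such that no other descendant of v consumes u."""
    cons = set(ddg[u])
    return {v for v in cons if descendants(ddg, v) & cons == {v}}


for u in "abcdefgh":
    print(u, sorted(pkill(DDG, u)))
print(sorted(pkill(TRIANGLE, "u")))
```

In TRIANGLE, v cannot kill u because its descendant w still reads u afterwards, so only w is a potential killer.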

Potential Killing Graph

The edges of the Potential Killing Graph of a DDG G, $PK(G) = (V, E_{PK})$, are defined as follows:

$E_{PK} = \{ (u, v) \mid u \in V_R \wedge v \in pkill_G(u) \}$

$V_R$ is the set of operations that define a value, i.e., operations that need a register.

Govind’s Example: Data Dependence Graph

(a) t1 := ld(x);
(b) t2 := t1 + 4;
(c) t3 := t1 * 8;
(d) t4 := t1 - 4;
(e) t5 := t1 / 2;
(f) t6 := t2 * t3;
(g) t7 := t4 - t5;
(h) t8 := t6 * t7;
(i) st(y,t8);

[Figure: DDG G with nodes arranged in levels: a; b, c, d, e; f, g; h; i]

Govind’s Example: Potential Kill Graph

[Figure: DDG G with nodes arranged in levels: a; b, c, d, e; f, g; h; i]

pkillG(a) = {b, c, d, e}
pkillG(b) = {f}
pkillG(c) = {f}
pkillG(d) = {g}
pkillG(e) = {g}
pkillG(f) = {h}
pkillG(g) = {h}
pkillG(h) = {i}

Govind’s Example: Potential Kill Graph

[Figure: DDG G and PK(G) side by side, both with nodes a; b, c, d, e; f, g; h; i]

In this example the DDG G and the potential kill graph PK(G) are identical. In general that is not the case.

Choosing the Killer

If a node u has more than one potential killer, Touati defines a killing function, k(u), that specifies which one among the potential killers of u will actually kill u.

A killing function imposes a scheduling order on the DDG: all other consumers of u in Cons(u) must be scheduled before k(u) is scheduled.

To represent these scheduling constraints, Touati defines an extended DAG, Gk, induced by the killing function k.

Govind’s Example: Killing Function

[Figure: PK(G) with nodes a; b, c, d, e; f, g; h; i]

pkillG(a) = {b, c, d, e}
pkillG(b) = {f}
pkillG(c) = {f}
pkillG(d) = {g}
pkillG(e) = {g}
pkillG(f) = {h}
pkillG(g) = {h}
pkillG(h) = {i}

In this example, node a is the only node with multiple potential killers.

Govind’s Example: Killing Function

[Figure: Gk, the DDG with additional arcs from c, d, and e to b]

If we choose k(a) = b, we obtain the Gk shown in the figure.

Selecting a Good Set of Killers...

If the killing function for multiple nodes with multiple potential killers is chosen arbitrarily, it might induce cycles in Gk.

A valid killing function is one that does not induce cycles in Gk.
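Validity can be tested by building Gk for each candidate killing function and checking it for cycles. The sketch below uses the potential-killer lists of the "Non-Trivial Example" that appears later in the deck. The slides do not give that example's edge list, so the DDG here is an assumption: its edges are taken to be exactly the listed potential-kill pairs. All function names are illustrative.

```python
from itertools import product

# Assumed DDG: one edge per listed potential-kill pair.
DDG = {"a": ["f"], "b": ["d", "e"], "c": ["d", "e"],
       "d": ["g"], "e": ["f"], "f": ["g"], "g": []}
PKILL = {"a": ["f"], "b": ["d", "e"], "c": ["d", "e"],
         "d": ["g"], "e": ["f"], "f": ["g"]}


def extended_dag(ddg, k):
    """Gk: every other consumer of u gets an arc to the chosen killer k(u)."""
    gk = {u: set(vs) for u, vs in ddg.items()}
    for u, killer in k.items():
        for other in ddg[u]:
            if other != killer:
                gk[other].add(killer)
    return gk


def is_acyclic(graph):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    indeg = {v: 0 for v in graph}
    for u in graph:
        for v in graph[u]:
            indeg[v] += 1
    ready = [v for v in graph if indeg[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for v in graph[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return removed == len(graph)


nodes = sorted(PKILL)
killing_functions = [dict(zip(nodes, choice))
                     for choice in product(*(PKILL[u] for u in nodes))]
valid = [k for k in killing_functions if is_acyclic(extended_dag(DDG, k))]
for k in valid:
    print(k)
```

Under the stated assumption, choosing different killers for b and c forces d before e and e before d at the same time, so only the two killing functions that pick the same killer for both survive.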

Avoiding Vengeance...

A killer must kill before it has children, thus the descendants of k(u) cannot be simultaneously alive with u. Touati defines the Disjoint Value Graph, $DV_k(G) = (V_R, E_{DV})$, by:

$E_{DV} = \{ (u, v) \mid u, v \in V_R \wedge v \in \downarrow_R k(u) \}$

where $\downarrow_R k(u)$ denotes the descendants of k(u) that belong to $V_R$.

An edge (u, v) in DVk(G) means that the live interval of u is always before the live interval of v in any schedule of Gk.

Govind’s Example: Disjoint Value Graph

k(a) = {b}, k(b) = {f}, k(c) = {f}, k(d) = {g}, k(e) = {g}, k(f) = {h}, k(g) = {h}, k(h) = {i}

[Figure: Gk and DVk(G), both with nodes a; b, c, d, e; f, g; h; i. DVk(G) is simplified by transitive reduction.]

Register Need and Maximal Antichains

The register need of any schedule of Gk is always less than or equal to the size of a maximal antichain in DVk(G).

An antichain in a graph G = (V, E) is a set of nodes A such that there are no paths between the nodes in A:

$A = \{ u, v \in V \mid (u, v) \notin E^c \wedge (v, u) \notin E^c \}$

where $E^c$ is the transitive closure of E: $(u, v) \in E^c$ iff there is a path $p = (u, \dots, v)$ in G.

Govind’s Example: Maximal Antichain

[Figure: DVk(G) with nodes a; b, c, d, e; f, g; h; i]

The maximal antichain in this example is:

AMk = {a, c, d, e}

Thus this graph, with this killing function, can use at most 4 registers.
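The construction of DVk(G) and the search for a maximal antichain can be sketched by brute force for this small example. The code assumes k(a) = b as on the slides and treats the store i as defining no value; the function names are illustrative, and the subset enumeration is exponential, so it only suits tiny graphs.

```python
from itertools import combinations

DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}
# Killing function with k(a) = b; its keys are the value nodes V_R.
K = {"a": "b", "b": "f", "c": "f", "d": "g", "e": "g", "f": "h", "g": "h", "h": "i"}


def extended_dag(ddg, k):
    """Gk: every other consumer of u gets an arc to the chosen killer k(u)."""
    gk = {u: set(vs) for u, vs in ddg.items()}
    for u, killer in k.items():
        for other in ddg[u]:
            if other != killer:
                gk[other].add(killer)
    return gk


def descendants(graph, v):
    """All nodes reachable from v, including v itself."""
    seen, stack = set(), [v]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(graph[n])
    return seen


def disjoint_value_edges(ddg, k):
    """(u, v) for every value v among the descendants of k(u) in Gk."""
    gk = extended_dag(ddg, k)
    return {(u, v) for u in k for v in descendants(gk, k[u]) if v in k}


def max_antichain(nodes, edges):
    """Largest set of nodes with no path between any two of them."""
    succ = {n: set() for n in nodes}
    for u, v in edges:
        succ[u].add(v)
    reach = {n: descendants(succ, n) for n in nodes}
    for size in range(len(nodes), 0, -1):
        for cand in combinations(sorted(nodes), size):
            if all(v not in reach[u] and u not in reach[v]
                   for u, v in combinations(cand, 2)):
                return set(cand)
    return set()


edges = disjoint_value_edges(DDG, K)
antichain = max_antichain(set(K), edges)
print(sorted(antichain), len(antichain))
```

The search returns {a, c, d, e}; {b, c, d, e} is another antichain of the same size, so the saturation bound of 4 does not depend on which one is reported.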

Register Saturating Scheduling

Touati proves that:

For every valid killing function k, there is always a schedule that makes all the values in the maximal antichain of the disjoint value DAG DVk(G) simultaneously alive.

Saturating Killing Function

To find the register saturation of a DDG, we need to find a killing function that maximizes the size of the maximal antichain in DVk(G).

In other words, we need to find a killing function that maximizes the number of nodes that are not connected by a path in DVk(G).

Touati calls this the maximizing maximal antichain (MMA) problem. A solution to the MMA problem is a saturating killing function. MMA is NP-complete.

Heuristic to Compute Register Saturation

To compute the register saturation, Touati starts by decomposing the potential kill graph PK(G) into connected bipartite components.

A bipartite component, $cb = (S_{cb}, T_{cb}, E_{cb})$, is a graph with a set of source nodes $S_{cb}$, a set of target nodes $T_{cb}$, and a set of edges $E_{cb}$. cb must obey the following conditions:

If $e \in E_{PK}$, $e' \in E_{cb}$, and e, e' share an endpoint, then $e \in E_{cb}$.

$\nexists\, e, e' \in E_{cb}$ such that $target(e) = source(e')$.

Bipartite Decomposition of PK(G)

A bipartite decomposition of the potential killing graph PK(G) is a set of bipartite components such that for every edge $e \in PK(G)$, there is a bipartite component cb in the decomposition such that $e \in E_{cb}$.

Touati proves that, given a DDG G, there is only one bipartite decomposition of its potential killing graph.

Govind’s Example: Bipartite Decomposition

[Figure: PK(G) (nodes a; b, c, d, e; f, g; h; i) next to its bipartite decomposition, drawn as the level pairs a over b, c, d, e; b, c, d, e over f, g; f, g over h; h over i]
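One way to compute the decomposition is a union-find over the edges of PK(G). The sketch below rests on an interpretation that the slides do not spell out: two edges are placed in the same component when they share a source or share a target, so a node is treated separately in its role as a target and in its role as a source. The function name is illustrative.

```python
# Edges of PK(G) for the example, as (value, potential killer) pairs.
PK_EDGES = [
    ("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"),
    ("b", "f"), ("c", "f"), ("d", "g"), ("e", "g"),
    ("f", "h"), ("g", "h"), ("h", "i"),
]


def bipartite_decomposition(edges):
    """Group edges that share a source or share a target (assumed reading)."""
    parent = list(range(len(edges)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    first_with_source, first_with_target = {}, {}
    for i, (s, t) in enumerate(edges):
        for table, key in ((first_with_source, s), (first_with_target, t)):
            if key in table:
                parent[find(i)] = find(table[key])
            else:
                table[key] = i

    components = {}
    for i, (s, t) in enumerate(edges):
        sources, targets = components.setdefault(find(i), (set(), set()))
        sources.add(s)
        targets.add(t)
    return sorted((sorted(S), sorted(T)) for S, T in components.values())


components = bipartite_decomposition(PK_EDGES)
for sources, targets in components:
    print(sources, "->", targets)
```

With this reading the example splits into five components, and only the one with source a has more than one target, which matches the remark on the Saturating Killing Set slide for this example.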

Saturating Killing Set

Touati defines the Saturating Killing Set of a connected bipartite component cb, SKS(cb), as a subset of the target nodes, $T'_{cb} \subseteq T_{cb}$, such that:

(1) All the source nodes, $S_{cb}$, are contained in the union of the predecessors of the nodes in $T'_{cb}$.

(2) $T'_{cb}$ contains a minimum number of nodes.

Computing the SKS is an NP-complete problem.
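For a small component the two conditions can be satisfied by exhaustive search, trying target subsets in order of increasing size. This is a brute-force sketch, not Touati's heuristic. The component is built from the potential-killer lists of b, c and g in the "A More Non-Trivial Example" slide later in the deck, and the function name is illustrative.

```python
from itertools import combinations

# source -> its potential killers (targets) within one bipartite component.
COMPONENT = {
    "b": {"d", "e"},
    "c": {"e", "j", "k"},
    "g": {"d", "j", "k"},
}


def saturating_killing_set(component):
    """Smallest set of targets whose predecessors cover every source."""
    sources = set(component)
    targets = sorted(set().union(*component.values()))
    for size in range(1, len(targets) + 1):
        for cand in combinations(targets, size):
            chosen = set(cand)
            if all(component[s] & chosen for s in sources):
                return chosen
    return set()


sks = saturating_killing_set(COMPONENT)
print(sorted(sks))
```

No single target covers b, c and g, so the minimum size here is two; the search returns the first such pair in sorted order, and other pairs of the same size may exist.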

Govind’s Example: Saturating Killing Set

[Figure: bipartite decomposition, drawn as the level pairs a over b, c, d, e; b, c, d, e over f, g; f, g over h; h over i]

In this example the computation of the SKS is trivial. The only component with a non-unitary target set is the top one.

The selection of any single node in the set Tcb = {b, c, d, e} covers the set Scb = {a}. Thus the selection can be arbitrary.

Govind’s Example

As we saw earlier with k(a) = b, the register saturation in Govind’s example is 4, and a schedule that has four values alive at the same time can be found.

Using the lineage method, Govind et al. found a schedule for their example that uses three registers. What does Touati’s method do if only three registers are available?

Reducing RS

Touati proposes an algorithm to reduce the register saturation while trying not to increase the length of the critical path.

The algorithm starts by computing the maximal antichain AMk. Then it starts an iterative process in which the first step is to construct the set Uk of all admissible serializations between the saturating values in AMk, together with their costs.

Admissible Serializations

A serialization $u \to v$ means that the kill of u must always be carried out before the definition of v.

If v is one of the potential killers of u, then to produce the serialization $u \to v$ we must add arcs from all other potential killers of u to v. This way we ensure that the live ranges of u and v will not overlap.

If v is not a potential killer of u, then to produce the serialization $u \to v$ we must add arcs from all nodes $u' \in pkill_G(u)$ to v, as long as there is no path from v to u'.

Cost of Serializations

The cost function of a serialization is defined as

$\omega(u \to v) = (\mu_1, \mu_2)$

$\mu_1$ predicts the reduction in the saturation value produced by the serialization. It is computed by:

$\mu_1 = \omega_1 - \omega_2$

$\omega_1$ is the number of saturating values serialized after u if this serialization is carried out.

$\omega_2$ is the number of descendants of u that can become simultaneously alive with u.

$\mu_2$ is the increase in the critical path.

Govind’s Example: Reducing RS

[Figure: Gk with nodes a; b, c, d, e; f, g; h; i]

With the killing function k(a) = {b}, the saturating values are:

AMk = {a, c, d, e}

pkillG(a) = {b, c, d, e}

For a serialization $u \to v$ to be admissible, the following condition must be true:

$\forall v' \in pkill(u): \neg (v < v')$

i.e., there are no paths from v to any potential killer of u.

Thus, there is no admissible serialization from a to any of the other saturating values, because $b \in pkill_G(a)$ and there are paths from c, d, and e to b in Gk.

$c \to d$ and $c \to e$ are not admissible serializations either, because $f \in pkill_G(c)$ and d < f, e < f.

$d \to e$ is not admissible because $g \in pkill_G(d)$ and e < g; $e \to d$ is not admissible because $g \in pkill_G(e)$ and d < g.

Thus the admissible serializations in this example are $d \to c$ and $e \to c$.
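The admissibility test can be run over all ordered pairs of saturating values. The sketch assumes Gk is the example DDG plus the arcs c -> b, d -> b, e -> b induced by k(a) = b. Combining the two cases of the Admissible Serializations slide, it ignores v itself when v is already a potential killer of u; that combination and the function names are a reading of the slides, not Touati's code.

```python
# Gk for k(a) = b: data edges plus arcs from c, d, e to b.
GK = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"],
    "c": ["f", "b"], "d": ["g", "b"], "e": ["g", "b"],
    "f": ["h"], "g": ["h"],
    "h": ["i"], "i": [],
}
PKILL = {"a": ["b", "c", "d", "e"], "b": ["f"], "c": ["f"], "d": ["g"],
         "e": ["g"], "f": ["h"], "g": ["h"], "h": ["i"]}
SATURATING = ["a", "c", "d", "e"]   # AMk


def reaches(graph, src, dst):
    """True if there is a path (possibly empty) from src to dst."""
    stack, seen = [src], set()
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(graph[v])
    return False


def admissible(graph, pkill, u, v):
    """u -> v is admissible if v reaches no potential killer of u other than v."""
    return not any(reaches(graph, v, w) for w in pkill[u] if w != v)


def arcs_to_add(pkill, u, v):
    """Arcs that enforce u -> v: from each potential killer of u (except v) to v."""
    return [(w, v) for w in pkill[u] if w != v]


admissible_pairs = [(u, v) for u in SATURATING for v in SATURATING
                    if u != v and admissible(GK, PKILL, u, v)]
print(admissible_pairs)
print([arcs_to_add(PKILL, u, v) for u, v in admissible_pairs])
```

Both admissible serializations translate into the same single arc from g to c.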

Govind’s Example: Reducing RS

[Figure: Gk with nodes a; b, c, d, e; f, g; h; i]

In this example both serializations will cause the scheduling edge (g, c) to be added to the graph. Thus their cost is equivalent.

Note that, for this example, reducing RS is equivalent to the lineage fusion technique in Govind et al.’s approach.

Govind’s Algorithm in Touati’s Example

Now we will apply the lineage-based method proposed by Govind et al. to the DDG presented by Touati.

In the next slide we transcribe the code and the DDG as presented by Touati.

A Trivial Example

[Figure: DDG and PKG, each with nodes x, y, k, t, z]

pkillG(x) = {k}
pkillG(y) = {z}
pkillG(z) = {k}
pkillG(k) = {z}

There are no choices to be made, as each node has only one potential killer.

A Trivial Example (cont.)

[Figure: DDG and DV graph, each with nodes x, y, k, t, z]

The DV graph is identical to the PKG in this case, and the solution is trivial: the maximal antichain in the DV graph is {x, y, z}.

A Non-Trivial Example

[Figure: DDG with nodes a, b, c, d, e, f, g]

pkillG(a) = {f}
pkillG(b) = {d, e}
pkillG(c) = {d, e}
pkillG(d) = {g}
pkillG(e) = {f}
pkillG(f) = {g}

A Non-Trivial Example

[Figure: the DDG and the four disjoint value graphs DVk1 to DVk4, each with nodes a, b, c, d, e, f, g]

k1 = {(b,d), (c,d)}
k2 = {(b,d), (c,e)}
k3 = {(b,e), (c,d)}
k4 = {(b,e), (c,e)}

There are eight killing functions (DV Graphs)

[Figure: eight DV graphs, each with nodes a, b, c, d, e, f, one per killing function]

k = {(a,b), (b,d), (c,d)}
k = {(a,b), (b,d), (c,e)}
k = {(a,b), (b,e), (c,d)}
k = {(a,b), (b,e), (c,e)}
k = {(a,c), (b,d), (c,d)}
k = {(a,c), (b,d), (c,e)}
k = {(a,c), (b,e), (c,d)}
k = {(a,c), (b,e), (c,e)}

Maximal antichains

[Figure: the maximal antichains of the eight DV graphs, one per killing function listed on the previous slide]

A More Non-Trivial Example

[Figure: DDG with nodes a; b, c, g; d, e, j, k; f, m; n]

pkillG(a) = {b, c, g}
pkillG(b) = {d, e}
pkillG(c) = {e, j, k}
pkillG(d) = {f}
pkillG(e) = {m}
pkillG(f) = {n}
pkillG(g) = {d, j, k}
pkillG(j) = {f}
pkillG(k) = {m}

There are 3*2*3*3 = 54 killing functions.

Govind’s Algorithm in Touati’s Example

(a) fload [i1], fRa
(b) fload [i2], fRb
(c) fload [i3], fRc
(d) fmult fRa, fRb, fRd
(e) imultadd fRa, fRb, fRc, iRe
(g) ftoint fRc, iRg
(i) iadd iRg, 4, iRi
(f) fmultadd_setz fRb, iRi, fRc, fRf, gf
(h) fdiv fRd, iRe, fRh
(j) gf ? fadd_setbnz fRj, 1 , fRj, gj
(k) gf | gj ? fsub fRk, 1 , fRk

[Figure: DDG with nodes a, b, c; d, e, f; h, k; g; i; j and edges labeled fRc, iRg, iRi, gf, gj, iRe, fRd]

Touati concentrates on the blue edges, which represent the flow of floating-point values.

Govind’s Algorithm in Touati’s Example

[Figure: simplified DDG with nodes a, b, c; d, e, f; h, k; g; i; j]

We will also concentrate on the floating-point value flow. Thus the simplified DDG is shown in the figure.

Although the modified list scheduling requires a source and a sink node, the lineage formation process does not consider the source and the sink nodes.

Govind’s Algorithm in Touati’s Example

[Figure: simplified DDG (nodes a, b, c; d, e, f; h, k; g; i; j) annotated with node heights, which are updated after each recomputation]

Step 1: Compute the heights
Step 2: First lineage formation: L1 = [a, e)
Step 3: Second lineage formation: L2 = [b, f)
Recompute heights
Step 4: Third lineage formation: L3 = [c, f)
Recompute heights
Step 5: Fourth lineage formation: L4 = [d, h)

Govind’s Algorithm in Touati’s Example

[Figure: simplified DDG annotated with node heights]

L1 = [a, e)
L2 = [b, f)
L3 = [c, f)
L4 = [d, h)

Lineage Source Nodes: {a, b, c, d}

Lineage End Nodes: {e, f, h}

Reach Relation:

     e  f  h
a    1  1  1
b    1  1  1
c    1  1  0
d    1  1  1

Govind’s Algorithm in Touati’s Example

[Figure: simplified DDG annotated with node heights]

L1 = [a, e)
L2 = [b, f)
L3 = [c, f)
L4 = [d, h)

Reach Relation:

     e  f  h
a    1  1  1
b    1  1  1
c    1  1  0
d    1  1  1

Because d can reach f, but c cannot reach h, we can fuse lineages L4 and L3 to create a new lineage L5 = [d, h) ∪ [c, f). This fusion requires a sequencing edge from h to c.

Govind’s Algorithm in Touati’s Example

[Figure: simplified DDG annotated with node heights]

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) ∪ [c, f)

Because there are no more 0’s in the Reach relation matrix, no further lineage fusion is possible.

Govind’s Algorithm in Touati’s Example

[Figure: simplified DDG annotated with node heights, and Lineage Interference Graph with nodes L1, L2, L5]

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) ∪ [c, f)

We need three colors: L1 = RA, L2 = RB, L5 = RC.

Govind’s Algorithm in Touati’s Example

[Figure: simplified DDG (nodes a, b, c; d, e, f; h, k; g; i; j) and Lineage Interference Graph with nodes L1, L2, L5 colored with registers RA, RB, RC]

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) ∪ [c, f)

Registers: RA, RB, RC

Sequence (built one instruction per step): a, b, d, h, c, e, g, f

Comparing the Methods

Touati’s method allows the creation of schedules that use from 7 down to 3 registers (in his CC 2001 paper he reduced the need from 7 to 4), according to the number of registers available for the basic block.

Govind et al.’s method will always create a schedule using three registers for this basic block, regardless of the number of registers available for the basic block.

Conjecture

If the scheduler in an out-of-order instruction issue processor is optimal and the register renaming has an infinite number of hidden registers, both methods should be equivalent, and the lineage-based one is simpler.

With a limited number of hidden registers for renaming, and a sub-optimal runtime scheduler, Touati’s method is likely to produce better results because it makes better use of the available registers.

Research Questions

How well do the two methods compare in an actual superscalar processor such as the MIPS R12K?

Touati claims that his method will work well in VLIW machines too. How would it compare with the lineage method in the IA-64?

The allocation of registers to basic blocks by the global register allocator might affect Touati’s method significantly. How can his local register allocation (LRA) be integrated with a global register allocation (GRA)?
