High-level Synthesis: Scheduling, Allocation, Assignment

Mani Srivastava
UCLA - EE Department
Room: 6731-H Boelter Hall
Email: [email protected]
Tel: 310-267-2098
WWW: http://www.ee.ucla.edu/~mbs

Copyright 2003 Mani Srivastava

Note: Several slides in this Lecture are from Prof. Miodrag Potkonjak, UCLA CS

Overview

High Level Synthesis

Scheduling, Allocation and Assignment

Estimations

Transformations

Allocation, Assignment, and Scheduling

Figure: data-flow graph of additions (+), subtractions (-), and shifts (>>) with delay elements (D)

Allocation: How much? (e.g., 2 adders, 1 shifter, 24 registers)

Assignment: Where? (e.g., Shifter 1)

Schedule: When? (e.g., Time Slot 4)

Techniques Well Understood and Mature

Scheduling and Assignment

Figure: data-flow graph with additions +1, +2, +3 and multiplications *1, *2, *3, scheduled into 4 control steps

Control Step   Schedule 1   Schedule 2
1              +1           +3
2              +2           +1 *2
3              +3 *1        +2 *3
4              *2 *3        *1
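To see what the two schedules on this slide cost in hardware, the sketch below counts, for each schedule, the peak number of same-type operations in any control step. Assuming each unit executes one operation per control step, that peak is the number of adders and multipliers the schedule needs. The dictionary names and `units_needed` are illustrative, not from the slides.

```python
from collections import Counter

# Control step -> operations started in that step, as listed for the two schedules.
SCHEDULE_1 = {1: ["+1"], 2: ["+2"], 3: ["+3", "*1"], 4: ["*2", "*3"]}
SCHEDULE_2 = {1: ["+3"], 2: ["+1", "*2"], 3: ["+2", "*3"], 4: ["*1"]}


def units_needed(schedule):
    """Peak count of same-type operations in one control step ('+' -> adders, '*' -> multipliers)."""
    need = Counter()
    for ops in schedule.values():
        per_step = Counter(op[0] for op in ops)  # first character is the operation type
        for kind, count in per_step.items():
            need[kind] = max(need[kind], count)
    return dict(need)


print(units_needed(SCHEDULE_1))  # {'+': 1, '*': 2}
print(units_needed(SCHEDULE_2))  # {'+': 1, '*': 1}
```

Both schedules take 4 control steps, but Schedule 2 gets by with a single multiplier.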

ASAP Scheduling Algorithm
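The algorithm figure for this slide is not reproduced. As a hedged sketch, ASAP scheduling can be written as one pass over the operations in topological order: a source starts in step 1, and every other operation starts as soon as all of its predecessors have finished. The graph encoding (`preds`, `delay`) and the example operations are hypothetical. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def asap(preds, delay):
    """ASAP start times.

    preds: operation -> list of predecessor operations
    delay: operation -> execution delay in control steps
    """
    graph = {v: list(preds.get(v, [])) for v in delay}
    start = {}
    for v in TopologicalSorter(graph).static_order():
        # Sources start in step 1; others wait until their latest predecessor finishes.
        start[v] = max((start[p] + delay[p] for p in graph[v]), default=1)
    return start


# Hypothetical graph: multiplies m1, m2, m3; add a1; subtract s1.
PREDS = {"a1": ["m1", "m2"], "s1": ["a1", "m3"]}
DELAY = {"m1": 1, "m2": 1, "m3": 1, "a1": 1, "s1": 1}
print(sorted(asap(PREDS, DELAY).items()))
# [('a1', 2), ('m1', 1), ('m2', 1), ('m3', 1), ('s1', 3)]
```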

ASAP Scheduling Example

ASAP: Another Example

Figure: sequence graph and its ASAP schedule

ALAP Scheduling Algorithm
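Again the algorithm figure is missing. A minimal sketch of ALAP under a latency constraint works backwards in reverse topological order: a sink must finish by step `latency`, and every other operation must finish before its earliest successor starts. The graph and names are hypothetical, and the error check for an infeasible bound is an added convenience.

```python
from graphlib import TopologicalSorter


def alap(preds, delay, latency):
    """ALAP start times when all operations must finish by control step `latency`."""
    graph = {v: list(preds.get(v, [])) for v in delay}
    succs = {v: [] for v in delay}
    for v, ps in graph.items():
        for p in ps:
            succs[p].append(v)
    start = {}
    # Reverse topological order, so every successor is placed before its predecessors.
    for v in reversed(list(TopologicalSorter(graph).static_order())):
        start[v] = min((start[s] for s in succs[v]), default=latency + 1) - delay[v]
        if start[v] < 1:
            raise ValueError(f"latency {latency} is too small for operation {v}")
    return start


PREDS = {"a1": ["m1", "m2"], "s1": ["a1", "m3"]}
DELAY = {"m1": 1, "m2": 1, "m3": 1, "a1": 1, "s1": 1}
print(sorted(alap(PREDS, DELAY, latency=4).items()))
# [('a1', 3), ('m1', 2), ('m2', 2), ('m3', 3), ('s1', 4)]
```

For this graph ASAP gives m1, m2, m3 in step 1, a1 in step 2, and s1 in step 3. The difference ALAP minus ASAP is each operation's mobility. With `latency=3` (the critical path length), m1, m2, a1, and s1 get zero mobility and only m3 keeps one step of slack.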

ALAP Scheduling Example

ALAP: Another Example

Figure: sequence graph and its ALAP schedule (latency constraint = 4)

Observation about ALAP & ASAP

No priority is given to nodes on the critical path. As a result, less critical nodes may be scheduled ahead of critical nodes.

This is no problem if hardware is unlimited. However, if the resources are limited, the less critical nodes may block the critical nodes and thus produce inferior schedules.

List scheduling techniques overcome this problem by utilizing a more global node selection criterion

List Scheduling and Assignment

List_Scheduling() {
    Create_Candidate_List();
    while (Candidate_List != NULL) {
        Select_Candidate();
        Schedule_Candidate();
    }
}

Figure: the same data-flow graph (+1, +2, +3, *1, *2, *3) list-scheduled into 4 control steps

List Scheduling Algorithm using Decreasing Criticalness Criterion
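The algorithm figure is not reproduced. One way to realize the List_Scheduling() loop with a decreasing-criticalness criterion is sketched below. It assumes unit-delay operations and measures criticalness as the number of operations on the longest path to a sink. That is a common choice but an assumption here. The example graph and all names are hypothetical.

```python
from graphlib import TopologicalSorter


def list_schedule(preds, op_type, limits):
    """Resource-constrained list scheduling of unit-delay operations.

    preds:   operation -> list of predecessor operations
    op_type: operation -> resource type (e.g. "mul", "alu")
    limits:  resource type -> number of available units
    """
    graph = {v: list(preds.get(v, [])) for v in op_type}
    succs = {v: [] for v in op_type}
    for v, ps in graph.items():
        for p in ps:
            succs[p].append(v)
    # Criticalness: number of operations on the longest path from v to a sink.
    prio = {}
    for v in reversed(list(TopologicalSorter(graph).static_order())):
        prio[v] = 1 + max((prio[s] for s in succs[v]), default=0)

    start, step = {}, 1
    while len(start) < len(op_type):
        # Candidate list: unscheduled operations whose predecessors all finished earlier.
        ready = [v for v in op_type
                 if v not in start and all(start.get(p, step) < step for p in graph[v])]
        used = {}
        # Select candidates in decreasing criticalness while a unit of their type is free.
        for v in sorted(ready, key=lambda u: (-prio[u], u)):
            kind = op_type[v]
            if used.get(kind, 0) < limits[kind]:
                used[kind] = used.get(kind, 0) + 1
                start[v] = step
        step += 1
    return start


OP_TYPE = {"m1": "mul", "m2": "mul", "m3": "mul", "a1": "alu", "a2": "alu"}
PREDS = {"a1": ["m1", "m2"], "a2": ["a1", "m3"]}
print(sorted(list_schedule(PREDS, OP_TYPE, {"mul": 1, "alu": 1}).items()))
# [('a1', 3), ('a2', 4), ('m1', 1), ('m2', 2), ('m3', 3)]
```

With one multiplier and one ALU, giving the less critical m3 the multiplier in step 1 would delay m1 and m2 to steps 2 and 3, a1 to step 4, and a2 to step 5. The criticalness order avoids that and finishes in 4 steps.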

Scheduling

NP-complete problem

Techniques: optimal; heuristics (iterative improvement); heuristics (constructive)

Various versions of the problem: unconstrained minimum latency, resource-constrained minimum latency, timing constrained

If all resources are identical, the problem reduces to multiprocessor scheduling

The minimum-latency multiprocessor problem is intractable

Scheduling - Optimal Techniques

Integer Linear Programming

Branch and Bound

Integer Linear Programming

Given: integer-valued matrix $A_{m \times n}$ and vectors $B = (b_1, b_2, \ldots, b_m)$, $C = (c_1, c_2, \ldots, c_n)$

Minimize: $C^T X$

Subject to: $AX \ge B$

where $X = (x_1, x_2, \ldots, x_n)$ is an integer-valued vector

Integer Linear Programming

Problem: For a set of (dependent) computations $\{t_1, t_2, \ldots, t_n\}$, find the minimum number of units needed to complete the execution by $k$ control steps.

Integer linear programming: Let $y_0$ be an integer variable. For each control step $i$ ($1 \le i \le k$):
define variable $x_{ij}$ as $x_{ij} = 1$ if computation $t_j$ is executed in the $i$th control step, and $x_{ij} = 0$ otherwise;
define variable $y_i = x_{i1} + x_{i2} + \cdots + x_{in}$.

Integer Linear Programming

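Integer linear programming: For each computation dependency where $t_i$ has to be done before $t_j$, introduce a constraint:

$$k\,x_{1i} + (k-1)\,x_{2i} + \cdots + x_{ki} \;\ge\; k\,x_{1j} + (k-1)\,x_{2j} + \cdots + x_{kj} + 1 \qquad (*)$$

Since $t_i$ executes in exactly one step $s_i$, the left side equals $k - s_i + 1$, so (*) is equivalent to $s_i < s_j$.

Minimize: $y_0$

Subject to:
$x_{1i} + x_{2i} + \cdots + x_{ki} = 1$ for all $1 \le i \le n$
$y_i \le y_0$ for all $1 \le i \le k$
all computation dependencies of type (*)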

An Example

Figure: dependency graph of six computations c1 to c6

6 computations, 3 control steps

An Example

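Introduce variables: $x_{ij}$ for $1 \le i \le 3$, $1 \le j \le 6$

$y_i = x_{i1} + x_{i2} + x_{i3} + x_{i4} + x_{i5} + x_{i6}$ for $1 \le i \le 3$

$y_0$

Dependency constraints: e.g., execute c1 before c4:

$$3x_{11} + 2x_{21} + x_{31} \;\ge\; 3x_{14} + 2x_{24} + x_{34} + 1$$

Execution constraints:

$x_{1i} + x_{2i} + x_{3i} = 1$ for $1 \le i \le 6$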

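An Example

Minimize: $y_0$

Subject to: $y_i \le y_0$ for all $1 \le i \le 3$; dependency constraints; execution constraints

One solution: $y_0 = 2$, with $x_{11} = 1$, $x_{12} = 1$, $x_{23} = 1$, $x_{24} = 1$, $x_{35} = 1$, $x_{36} = 1$ and all other $x_{ij} = 0$ (c1 and c2 in step 1, c3 and c4 in step 2, c5 and c6 in step 3)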
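To make the formulation concrete, the sketch below builds the $x_{ij}$ matrix for a given step assignment, checks the execution constraints and the dependency constraints of type (*), and evaluates $y_0 = \max_i y_i$. Only the dependency "c1 before c4" is stated on the slide. The other edges in `DEPS` are hypothetical, chosen so that the slide's solution is feasible. The last line enumerates all $3^6$ assignments. That is a brute-force stand-in for an ILP solver and is only practical at toy sizes.

```python
from itertools import product

K, N = 3, 6  # control steps, computations c1..c6
# (a, b): ca must execute before cb. Only (1, 4) is from the slide; the rest are hypothetical.
DEPS = [(1, 4), (2, 4), (4, 5), (3, 6)]


def x_matrix(steps):
    """steps[j - 1] is the control step of cj; x[i][j] = 1 iff cj runs in step i."""
    x = [[0] * (N + 1) for _ in range(K + 1)]  # row/column 0 unused, keeps 1-based indices
    for j, s in enumerate(steps, start=1):
        x[s][j] = 1
    return x


def weighted(x, j):
    """k*x1j + (k-1)*x2j + ... + 1*xkj, which equals K - step(cj) + 1."""
    return sum((K - i + 1) * x[i][j] for i in range(1, K + 1))


def feasible(x):
    executes_once = all(sum(x[i][j] for i in range(1, K + 1)) == 1 for j in range(1, N + 1))
    ordered = all(weighted(x, a) >= weighted(x, b) + 1 for a, b in DEPS)  # constraints (*)
    return executes_once and ordered


def max_units(x):
    """y0 must be at least every y_i, the number of computations in step i."""
    return max(sum(x[i][1:]) for i in range(1, K + 1))


slide = x_matrix([1, 1, 2, 2, 3, 3])  # x11 = x12 = x23 = x24 = x35 = x36 = 1
print(feasible(slide), max_units(slide))  # True 2

best = min(max_units(x) for x in map(x_matrix, product(range(1, K + 1), repeat=N)) if feasible(x))
print(best)  # 2
```

Since 6 computations cannot fit in 3 steps with fewer than 2 units, $y_0 = 2$ is optimal regardless of the assumed edges.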

ILP Model of Scheduling

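Binary decision variables $x_{il}$ ($x_{il} = 1$ if operation $i$ starts in control step $l$):

$i = 0, 1, \ldots, n$; $l = 1, 2, \ldots, \lambda + 1$, where $\lambda$ is the latency bound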

Start time is unique

ILP Model of Scheduling (contd.)

Sequencing relationships must be satisfied

Resource bounds must be met: let the upper bound on the number of resources of type $k$ be $a_k$

Minimum-latency Scheduling Under Resource-constraints

Let $t$ be the vector whose entries are the start times

Formal ILP model
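The constraint equations on the ILP-model slides were figures that did not survive extraction. For reference, one standard way to write this model follows De Micheli's textbook formulation. The notation is assumed here, not taken from the slides: $d_i$ is the execution delay of operation $v_i$, $\mathcal{T}(v_i)$ its resource type, $n_{res}$ the number of resource types, and $E$ the edge set of the sequence graph. The start time of $v_i$ is $t_i = \sum_l l \, x_{il}$, and the objective is to minimize $c^T t$ subject to:

$$\sum_{l} x_{il} = 1, \qquad i = 0, 1, \ldots, n$$

$$\sum_{l} l \, x_{il} \;\ge\; \sum_{l} l \, x_{jl} + d_j, \qquad \forall\, (v_j, v_i) \in E$$

$$\sum_{i:\, \mathcal{T}(v_i) = k} \;\; \sum_{m = l - d_i + 1}^{l} x_{im} \;\le\; a_k, \qquad k = 1, \ldots, n_{res}, \quad l = 1, \ldots, \lambda + 1$$

$$x_{il} \in \{0, 1\}$$

The three constraint families correspond to "start time is unique", "sequencing relationships must be satisfied", and "resource bounds must be met".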

Example

Two types of resources:
Multiplier
ALU (adder, subtraction, comparison)

Both take 1 cycle execution time

Example (contd.)

Heuristic (list scheduling) gives latency = 4 steps

Use ALAP and ASAP (with no resource constraints) to get bounds on start times

ASAP matches the latency of the heuristic, so the heuristic is optimum, but let us ignore this!

Constraints?

Example (contd.)

Start time is unique

Example (contd.)

Sequencing constraints (note: only non-trivial ones are listed, i.e., those with more than one possible start time for at least one operation)

Example (contd.)

Resource constraints

Example (contd.)

Consider $c = [0, 0, \ldots, 1]^T$:
minimum-latency schedule; since the sink has no mobility ($x_{n,5} = 1$), any feasible schedule is optimum

Consider $c = [1, 1, \ldots, 1]^T$:
finds the earliest start times for all operations; equivalently, minimizes $\sum_i \sum_l l \, x_{il}$

Example Solution: Optimum Schedule Under Resource Constraint

Example (contd.)

Assume a multiplier costs 5 units of area and an ALU costs 1 unit of area

Same uniqueness and sequencing constraints as before

Resource constraints are in terms of unknown variables $a_1$ and $a_2$:

$a_1$ = number of multipliers

$a_2$ = number of ALUs

Example (contd.) Resource constraints

Example Solution

Minimize $c^T a = 5 \cdot a_1 + 1 \cdot a_2$

Solution with cost 12

Precedence-constrained Multiprocessor Scheduling

All operations are done by the same type of resource: an intractable problem, and intractable even if all operations have unit delay

Scheduling - Iterative Improvement

Kernighan-Lin (deterministic), Simulated Annealing, Lottery Iterative Improvement, Neural Networks, Genetic Algorithms, Taboo Search

Scheduling - Constructive Techniques

Most Constrained

Least Constraining

Force Directed Scheduling

Goal is to reduce hardware by balancing concurrency

Iterative algorithm, one operation scheduled per iteration

Information (i.e. speed & area) fed back into scheduler

The Force Directed Scheduling Algorithm

Step 1

Determine ASAP and ALAP schedules

Figure: ASAP and ALAP schedules of the example data-flow graph (multiply, add, subtract, and compare operations)

Step 2

Determine the time frame of each op:
Length of box ~ possible execution cycles
Width of box ~ probability of assignment
Uniform distribution, area assigned = 1

Figure: time frames over C-steps 1 to 4 (an op whose frame spans two C-steps has width 1/2; one spanning three C-steps has width 1/3)

Step 3

Create distribution graphs (DGs):
Sum of probabilities of each op type
Indicates concurrency of similar ops

$DG(i) = \sum_{\text{Op}} \text{Prob}(\text{Op}, i)$, summed over all operations of the given type

Figure: DG for Multiply; DG for Add, Sub, Comp
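Steps 2 and 3 can be sketched together: given each operation's time frame (ASAP step, ALAP step), spread probability 1 uniformly over the frame and sum per C-step for one operation type. Exact fractions keep the 1/2 and 1/3 widths readable. The time frames, operation names, and function name are hypothetical.

```python
from fractions import Fraction


def distribution_graph(frames, op_type, kind, n_steps):
    """DG(i) for one operation type.

    frames:  operation -> (first C-step, last C-step) of its time frame (ASAP, ALAP)
    op_type: operation -> operation type
    Each operation is equally likely in every C-step of its frame (total area 1).
    """
    dg = [Fraction(0)] * n_steps
    for v, (first, last) in frames.items():
        if op_type[v] != kind:
            continue
        p = Fraction(1, last - first + 1)
        for i in range(first, last + 1):
            dg[i - 1] += p  # dg[0] holds C-step 1
    return dg


# Hypothetical time frames for three multiplies and one add over 4 C-steps.
FRAMES = {"m1": (1, 1), "m2": (1, 2), "m3": (1, 3), "a1": (2, 3)}
TYPES = {"m1": "mul", "m2": "mul", "m3": "mul", "a1": "alu"}
print(distribution_graph(FRAMES, TYPES, "mul", 4))
# [Fraction(11, 6), Fraction(5, 6), Fraction(1, 3), Fraction(0, 1)]
```

The multiply DG sums to 3, one unit of area per multiply. Its peak in C-step 1 is the kind of concurrency that force-directed scheduling tries to spread out.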

Diff Eq Example: Precedence Graph Recalled

Diff Eq Example: Time Frame & Probability Calculation

Diff Eq Example: DG Calculation

Conditional Statements

Operations in different branches are mutually exclusive
Operations of the same type can be overlapped onto the DG
The probability of the most likely operation is added to the DG

Figure: DG for Add for a fork/join conditional with add and subtract operations in each branch

Self Forces
Scheduling an operation will affect overall concurrency
Every operation has a 'self force' for every C-step of its time frame
Analogous to the effect of a spring: $f = Kx$
Desirable scheduling will have negative self force
Will achieve better concurrency (lower potential energy)

$\text{Force}(i) = DG(i) \cdot x(i)$

DG(i) ~ current distribution graph value

x(i) ~ change in the operation's probability

$\text{Self Force}(j) = \sum_{i=t}^{b} \text{Force}(i)$, where $t$ and $b$ are the top and bottom C-steps of the operation's time frame

Example: attempt to schedule a multiply in C-step 1 (its time frame spans C-steps 1 and 2, so $x(1) = 1 - 1/2 = 0.5$ and $x(2) = 0 - 1/2 = -0.5$)

Self Force(1) = Force(1) + Force(2)
= (DG(1) * x(1)) + (DG(2) * x(2))
= [2.833 * (0.5) + 2.333 * (-0.5)] = +0.25

This is positive, so scheduling the multiply in the first C-step would be bad

Figure: DG for Multiply and time frames over C-steps 1 to 4 (probabilities 1/2 and 1/3)
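The self-force arithmetic of this example can be reproduced with the short sketch below. It assumes the multiply's time frame is C-steps 1 to 2 (uniform probability 1/2 in each) and takes the DG values 2.833 and 2.333 from the slide. The function name is illustrative.

```python
def self_force(dg, frame, target):
    """Self force of fixing an operation into C-step `target`.

    dg:    C-step -> current DG value for the operation's type
    frame: (t, b), top and bottom C-steps of the operation's time frame
    Force(i) = DG(i) * x(i), with x(i) the change in the operation's probability in C-step i.
    """
    t, b = frame
    old_p = 1 / (b - t + 1)  # uniform probability before scheduling
    total = 0.0
    for i in range(t, b + 1):
        x = (1.0 if i == target else 0.0) - old_p
        total += dg[i] * x
    return total


DG_MUL = {1: 2.833, 2: 2.333}  # DG values from the example
print(round(self_force(DG_MUL, (1, 2), 1), 3))  # 0.25
print(round(self_force(DG_MUL, (1, 2), 2), 3))  # -0.25
```

Scheduling the multiply into C-step 2 instead gives a self force of -0.25. By the criterion above, that is the better choice when only self force is considered.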

Diff Eq Example: Self Force for Node 4

Predecessor & Successor Forces

Scheduling an operation may affect the time frames of other linked operations

This may negate the benefits of the desired assignment

Predecessor/Successor Forces = sum of the self forces of any implicitly scheduled operations

Figure: example graph of multiply, add, subtract, and compare operations

Diff Eq Example: Successor Force on Node 4

If node 4 is scheduled in step 1: no effect on the time frame of successor node 8
Total force = Force4(1) = +0.25

If node 4 is scheduled in step 2: this causes node 8 to be scheduled into step 3, so the successor force must be calculated

Diff Eq Example: Final Time Frame and Schedule

Diff Eq Example: Final DG

Lookahead

Temporarily modify the constant DG(i) to include the effect of the iteration being considered

$\text{Force}(i) = \text{temp\_DG}(i) \cdot x(i)$, where $\text{temp\_DG}(i) = DG(i) + x(i)/3$

Consider the previous example:

$\text{Self Force}(1) = (DG(1) + x(1)/3)\,x(1) + (DG(2) + x(2)/3)\,x(2) = 0.5\,(2.833 + 0.5/3) - 0.5\,(2.333 - 0.5/3) = +0.41667$

This is even worse than before
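The lookahead variant changes only the DG term. This sketch reproduces the +0.41667 above under the same assumptions as the plain self-force example: a time frame of C-steps 1 to 2 and the slide's DG values. The function name is illustrative.

```python
def lookahead_self_force(dg, frame, target):
    """Self force with lookahead: Force(i) = temp_DG(i) * x(i), temp_DG(i) = DG(i) + x(i)/3."""
    t, b = frame
    old_p = 1 / (b - t + 1)  # uniform probability before scheduling
    total = 0.0
    for i in range(t, b + 1):
        x = (1.0 if i == target else 0.0) - old_p
        temp_dg = dg[i] + x / 3  # temporarily include this assignment's own effect on the DG
        total += temp_dg * x
    return total


DG_MUL = {1: 2.833, 2: 2.333}
print(round(lookahead_self_force(DG_MUL, (1, 2), 1), 5))  # 0.41667
```

Each term gains $x(i)^2/3 \ge 0$, so lookahead never lowers a force relative to the plain self force. Here it adds $2 \cdot (0.5^2/3) \approx 0.167$.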

Minimization of Bus Costs

Basic algorithm is suitable for a narrow class of problems
Algorithm can be refined to consider "cost" factors
Number of buses ~ number of concurrent data transfers
Number of buses = maximum transfers in any C-step
Create a modified DG to include transfers: Transfer DG

$\text{Trans DG}(i) = \sum_{\text{op}} \left[\text{Prob}(\text{op}, i) \cdot \text{Opn\_No\_InOuts}\right]$

Opn_No_InOuts ~ combined distinct inputs/outputs for the op

Calculate force with this DG and add it to the self force
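A hedged sketch of the Transfer DG follows. It uses the same uniform time-frame probabilities as the operation DG, weighted by each operation's number of distinct inputs and outputs. The time frames, in/out counts, and the function name are hypothetical.

```python
def transfer_dg(frames, inouts, n_steps):
    """Trans DG(i) = sum over ops of Prob(op, i) * Opn_No_InOuts(op).

    frames: op -> (first, last) C-step of its time frame (uniform probability)
    inouts: op -> number of distinct inputs and outputs of the op
    """
    dg = [0.0] * n_steps
    for op, (first, last) in frames.items():
        p = 1 / (last - first + 1)
        for i in range(first, last + 1):
            dg[i - 1] += p * inouts[op]
    return dg


# Hypothetical: a multiply fixed in C-step 1 and an add free to run in C-step 1 or 2,
# each with two inputs and one output.
print(transfer_dg({"m1": (1, 1), "a1": (1, 2)}, {"m1": 3, "a1": 3}, 2))  # [4.5, 1.5]
```

The peak (4.5 expected transfers in C-step 1) is what the transfer force tries to flatten, since the number of buses equals the maximum number of transfers in any C-step.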

Minimization of Register Costs

Minimum registers required is given by the largest number of data arcs crossing a C-step boundary

Create storage operations at the output of any operation that transfers a value to a destination in a later C-step

Generate a storage DG for these "operations"

Length of storage operation depends on the final schedule

Figure: storage distribution for S (source s, destination d), showing the ASAP lifetime, MAX lifetime, and ALAP lifetime
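For a fixed schedule, this register lower bound can be computed directly. The sketch assumes unit-delay operations and counts one register per produced value, matching one storage operation per operation output. Each value is live from the step that produces it until its last consumer, and primary inputs are ignored. The graph and both schedules are hypothetical.

```python
def min_registers(start, preds):
    """Largest number of values crossing any C-step boundary.

    start: op -> C-step (unit delay, so a value is produced at the end of that step)
    preds: op -> list of ops whose results it reads
    Boundary b separates C-step b from C-step b + 1.
    """
    last_use = {}
    for consumer, producers in preds.items():
        for p in producers:
            last_use[p] = max(last_use.get(p, 0), start[consumer])
    n_steps = max(start.values())
    crossing = [
        sum(1 for p, end in last_use.items() if start[p] <= b < end)
        for b in range(1, n_steps)
    ]
    return max(crossing, default=0)


PREDS = {"a1": ["m1", "m2"], "s1": ["a1", "m3"]}
print(min_registers({"m1": 1, "m2": 1, "m3": 1, "a1": 2, "s1": 3}, PREDS))  # 3
print(min_registers({"m1": 1, "m2": 1, "m3": 2, "a1": 2, "s1": 3}, PREDS))  # 2
```

Moving m3 out of the crowded first C-step lowers the bound from 3 to 2. The storage force aims for this same balancing effect.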

Minimization of Register Costs (contd.)

$[\text{avg life}] = \dfrac{[\text{ASAP life}] + [\text{ALAP life}] + [\text{MAX life}]}{3}$

storage DG(i) = $\dfrac{[\text{avg life}]}{[\text{max life}]}$ (no overlap between ASAP & ALAP)

storage DG(i) = $\dfrac{[\text{avg life}] - [\text{overlap}]}{[\text{max life}] - [\text{overlap}]}$ (if overlap)

Calculate and add "Storage" Force to Self Force

Figure: ASAP schedule, 7 registers minimum; Force Directed schedule, 5 registers minimum

Pipelining

Figure: functional pipelining (DG for Multiply for two overlapped instances, Instance and Instance') and structural pipelining (a multiply pipelined across C-steps)

Functional Pipelining
Pipelining across multiple operations
Must balance distribution across groups of concurrent C-steps
Cut DG horizontally and superimpose
Finally perform regular Force Directed Scheduling

Structural Pipelining
Pipelining within an operation
For non data-dependent operations, only the first C-step need be considered
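The "cut DG horizontally and superimpose" step for functional pipelining can be sketched as folding the DG modulo the pipeline's initiation interval. Here the initiation interval means the number of C-steps between successive instances, so C-steps i and i + interval execute concurrently. The term, the function name, and the DG values are assumptions for illustration. Force-directed scheduling would then run on the folded DG.

```python
def fold_dg(dg, interval):
    """Cut a DG into slices of `interval` C-steps and superimpose them.

    dg[i] is the DG value of C-step i + 1. With a new instance started every `interval`
    C-steps, C-steps i and i + interval execute concurrently, so their values add up.
    """
    folded = [0.0] * interval
    for i, value in enumerate(dg):
        folded[i % interval] += value
    return folded


print(fold_dg([1.0, 0.5, 1.5, 1.0], 2))  # [2.5, 1.5]
```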

Other Optimizations

Local timing constraints
Insert dummy timing operations -> restricted time frames

Multiclass FU's
Create multiclass DG by summing probabilities of relevant ops

Multistep/chained operations
Carry propagation delay information with operation
Extend time frames into other C-steps as required

Hardware constraints
Use force as priority function in list scheduling algorithms

Scheduling using Simulated Annealing

Reference: Devadas, S.; Newton, A.R. "Algorithms for hardware allocation in data path synthesis." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 8, no. 7, pp. 768-781, July 1989.

Simulated Annealing

Figure: local search on a cost function over the solution space

Statistical Mechanics

Combinatorial Optimization

State $\{r_i\}$ (configuration: a set of atomic positions)

Weight $e^{-E(\{r_i\})/(K_B T)}$: Boltzmann distribution

$E(\{r_i\})$: energy of configuration

$K_B$: Boltzmann constant

$T$: temperature

Low temperature limit?

Analogy

Physical System          Optimization Problem
State (configuration)    Solution
Energy                   Cost Function
Ground State             Optimal Solution
Rapid Quenching          Iterative Improvement
Careful Annealing        Simulated Annealing

Generic Simulated Annealing Algorithm

1. Get an initial solution S
2. Get an initial temperature T > 0
3. While not yet 'frozen' do the following:
   3.1 For $1 \le i \le L$, do the following:
       3.1.1 Pick a random neighbor S' of S
       3.1.2 Let $\Delta = \text{cost}(S') - \text{cost}(S)$
       3.1.3 If $\Delta \le 0$ (downhill move), set S = S'
       3.1.4 If $\Delta > 0$ (uphill move), set S = S' with probability $e^{-\Delta/T}$
   3.2 Set T = rT (reduce temperature)
4. Return S
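The generic algorithm maps almost line-for-line onto code. In this hedged sketch, the stopping rule for "frozen" (T at or below a threshold), the default parameter values, and the toy cost function are all assumptions. For scheduling, a solution would be a schedule, and a neighbor might be a small move such as shifting one operation to another C-step within its time frame.

```python
import math
import random


def simulated_annealing(initial, cost, neighbor, t0=10.0, r=0.9, inner=100, t_frozen=1e-3, seed=0):
    """Generic simulated annealing following steps 1 to 4 above.

    'Frozen' is assumed to mean T <= t_frozen; `inner` plays the role of L.
    """
    rng = random.Random(seed)
    s, t = initial, t0                          # steps 1 and 2
    while t > t_frozen:                         # step 3
        for _ in range(inner):                  # step 3.1
            s_new = neighbor(s, rng)            # 3.1.1 random neighbor
            delta = cost(s_new) - cost(s)       # 3.1.2
            # 3.1.3 accept downhill moves; 3.1.4 accept uphill moves with probability e^(-delta/T)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                s = s_new
        t *= r                                  # 3.2 reduce temperature
    return s                                    # step 4


# Toy problem (hypothetical): find the integer in [0, 100] closest to 42, moving by +/-1.
result = simulated_annealing(
    initial=0,
    cost=lambda x: abs(x - 42),
    neighbor=lambda x, rng: min(100, max(0, x + rng.choice((-1, 1)))),
)
print(result)
```

A fixed seed makes runs repeatable. Early on, the high temperature lets the search accept uphill moves, and as T shrinks the loop behaves like plain iterative improvement.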

Basic Ingredients for S.A.

Solution Space

Neighborhood Structure

Cost Function

Annealing Schedule

Observation

All scheduling algorithms we have discussed so far are critical path schedulers

They can only generate schedules for iteration period larger than or equal to the critical path

They only exploit concurrency within a single iteration, and only utilize the intra-iteration precedence constraints

Example

Can one do better than an iteration period of 4? Pipelining + retiming can reduce the critical path to 3, and also the # of functional units

Approaches:
Transformations followed by scheduling
Transformations integrated with scheduling

Conclusions

High Level Synthesis connects behavioral description and structural description

Scheduling, estimations, transformations

High level of abstraction, high impact on the final design