VLSI Digital Signal Processing, Chapter 6: Folding (viplab.cs.nctu.edu.tw/course/VLSI_DSP2010_Fall/VLSIDSP_CHAP6.pdf)

VLSI Digital Signal Processing Systems

Folding

Lan-Da Van (范倫達), Ph. D.

Department of Computer Science

National Chiao Tung University

Taiwan, R.O.C.

Fall, 2010

http://www.cs.nctu.tw/~ldvan/


Outline

Introduction

Folding Transformation

Register Minimization Techniques

Register Minimization in Folded Architecture

Conclusions



Introduction (1/2)

Folding is a systematic transformation for determining the control circuits of DSP architectures in which multiple algorithm operations are time-multiplexed onto a single functional unit.

It is used to synthesize DSP architectures that operate with single or multiple clocks.

It reduces the number of hardware functional units (FUs) by a factor of N at the expense of increasing the computation time by a factor of N.

Folding can lead to an architecture that uses a large number of registers, which motivates the register minimization techniques presented later.



Introduction (2/2)




Folding Transformation (1/3)

Folding is a systematic technique for designing control circuits for hardware in which several algorithm operations are time-multiplexed on a single functional unit.

Notation

U, V: nodes (operations) of the original DFG

HU, HV: nodes (functional units) of the folded DFG

W(x): x-th iteration of node W

U → V: an edge e from node U to node V

w(e): number of delays on the edge e

Folding factor N: number of operations that share one FU

Folding set: an ordered set of operations that are executed by the same FU

The position of an operation U in its folding set is the folding order u of U.

Folding sets are typically obtained from a scheduling and allocation algorithm (ref. Appendix B).

The folding sets specify the underlying folding transformation.

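PU: number of pipeline stages of HU. PU = 0 indicates that HU is not pipelined.

DF(U → V) (folding equation): number of cycles for which the result of HU must be stored before it is used by HV. The l-th iteration of U executes at time Nl + u, so its result is available at time Nl + u + PU; the operation V(l + w(e)) that consumes it executes at time N(l + w(e)) + v. Therefore

$$D_F(U \to V) = [N(l + w(e)) + v] - [Nl + P_U + u] = N w(e) - P_U + v - u$$

A negative value of DF is possible before the DFG is retimed for folding.

[Figure: an edge U(l) → V(l + w(e)) with w(e) delays in the original DFG; after N-fold folding, HU (executing at Nl + u) feeds HV (executing at N(l + w(e)) + v) through the PU pipeline stages of HU followed by DF delays.]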
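The folding equation can be evaluated mechanically. The following is a minimal Python sketch, not part of the original slides; the function name `folding_delay` and its parameter names are illustrative assumptions. It computes the difference between the time the consumer executes and the time the producer's result becomes available, taking l = 0 without loss of generality.

```python
def folding_delay(N, w, P_u, u, v):
    """Number of delays D_F(U -> V) on the folded edge from HU to HV."""
    t_available = u + P_u      # result of U(0) is ready at time u + P_U
    t_consumed = N * w + v     # V(w(e)) executes at time N*w(e) + v
    return t_consumed - t_available


# Edge with N = 4, w(e) = 0, P_U = 1, u = 3, v = 1
print(folding_delay(N=4, w=0, P_u=1, u=3, v=1))  # -3
# Same edge with one delay on it
print(folding_delay(N=4, w=1, P_u=1, u=3, v=1))  # 1
```

A negative result, as in the first call, signals that the edge cannot be folded as it stands.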



Folding Retimed Biquad Filter (1/2)

Folding factor N = 4

Folding sets S1 = {4, 2, 3, 1} and S2 = {5, 8, 6, 7}, where S1 contains all addition operations and S2 contains all multiplication operations.

Assume that addition and multiplication require 1 and 2 u.t., respectively.

1-stage pipelined adders and 2-stage pipelined multipliers are available.
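The folding order of each operation follows directly from its position in its folding set. A minimal Python sketch, assuming positions are counted from 0 (this matches the folding equations evaluated later in the slides); the dictionary names are illustrative and do not come from the slides.

```python
# Folding sets from the slide: S1 holds the additions, S2 the multiplications
folding_sets = {"adder": [4, 2, 3, 1], "multiplier": [5, 8, 6, 7]}

# Folding order u of each node = its position in its folding set (0-based)
order = {node: pos for ops in folding_sets.values() for pos, node in enumerate(ops)}
# Functional unit that executes each node
unit = {node: name for name, ops in folding_sets.items() for node in ops}

print(order)  # {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}
print(unit[1], unit[7])  # adder multiplier
```

For example, node 1 is executed by the adder in time partition 3, and node 5 by the multiplier in time partition 0.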



Folding Retimed Biquad Filter (2/2)

folding equations



Retiming (1/3)

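What happens if a folding equation DF is negative? A negative DF(U→V) would require a negative number of delays on the folded edge, which cannot be implemented, so DF(U→V) ≥ 0 must hold for every edge.

Solution: retime the original DFG (move delay elements) prior to folding.

Constraint on the retimed DFG, where wr(e) is the number of delays on edge e after retiming:

D'F(U→V) = N wr(e) – PU + v – u ≥ 0     (1)

Substituting wr(e) = w(e) + r(V) – r(U) into (1) gives

r(U) – r(V) ≤ DF(U→V)/N

Since the retiming values of the nodes are restricted to integers, this constraint can be rewritten as

r(U) – r(V) ≤ ⌊DF(U→V)/N⌋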
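The step from constraint (1) to the bound on r(U) – r(V) can be written out term by term. The derivation below uses only the symbols defined on this slide and is offered as a worked intermediate step, not as part of the original slides.

```latex
\begin{align*}
D'_F(U \to V) &= N\,w_r(e) - P_U + v - u \\
              &= N\,[\,w(e) + r(V) - r(U)\,] - P_U + v - u \\
              &= D_F(U \to V) + N\,[\,r(V) - r(U)\,] \;\ge\; 0 \\
\Longrightarrow\quad r(U) - r(V) &\le \frac{D_F(U \to V)}{N}
\end{align*}
```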



Retiming (2/3)

Example: DF(1→2) = Nw(e) – PU + v – u = 4(0) – 1 + 1 – 3 = –3

r(1) – r(2) ≤ ⌊DF(1→2)/N⌋ = ⌊–3/4⌋ = –1
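The floor in this constraint matters for negative values. A small Python sketch of the example above (variable names are illustrative): Python's `//` operator floors toward negative infinity, which is the behavior the constraint needs, whereas truncation toward zero would give a bound that is too loose.

```python
import math

N = 4
D_F = -3                    # D_F(1 -> 2) before retiming, from the example

bound = D_F // N            # floor division: rounds toward negative infinity
assert bound == math.floor(D_F / N)
print(f"r(1) - r(2) <= {bound}")   # r(1) - r(2) <= -1

truncated = int(D_F / N)    # truncation toward zero, NOT what the constraint requires
print(truncated)            # 0
```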



Retiming (3/3)

r(1)=-1, r(2)=0, r(3)=-1, r(4)=0

r(5)=-1, r(6)=-1, r(7)=-2, r(8)=-1





Lifetime Analysis

Lifetime analysis is a procedure used to compute the minimum number of registers required to implement a DSP algorithm in hardware. Two variants are used:

Linear lifetime analysis

Circular lifetime analysis

In lifetime analysis, the number of live variables at each time unit is computed, and the maximum number of live variables over all time units is determined. This maximum is the minimum number of registers required.

Forward-backward register allocation technique



Linear Lifetime Analysis

Example with variables {a, b, c}

Number of live variables: max{0, 1, 2, 2, 2, 2, 2, 2} = 2

[Figure: linear lifetime chart; labels: three iterations with N = 6, periodicity implicit]



Matrix Transpose Example (1/3)

Input matrix:

a b c
d e f
g h i

Transposed matrix:

a d g
b e h
c f i

[Figure: input sequence i h g f e d c b a → Matrix Transpose → output sequence i f c h e b g d a]




Tzlout = zero-latency output time

Tdiff = Tzlout – Tinput

Toutput = Tzlout + max{-Tdiff}
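The three definitions above are enough to build the lifetime table for the 3×3 transpose. The Python sketch below is illustrative only. It assumes that the samples arrive in the order a, b, ..., i, one per time unit starting at time 0, and that the zero-latency output order is a, d, g, b, e, h, c, f, i; both orderings are read off the input and output sequences of the example and are assumptions about the lost figure.

```python
inputs = "abcdefghi"     # assumed arrival order, one sample per time unit
outputs = "adgbehcfi"    # assumed zero-latency departure order (transposed matrix, row by row)

t_input = {v: t for t, v in enumerate(inputs)}
t_zlout = {v: t for t, v in enumerate(outputs)}
t_diff = {v: t_zlout[v] - t_input[v] for v in inputs}

# A sample cannot leave before it arrives, so shift all outputs by max{-Tdiff}
latency = max(-d for d in t_diff.values())
t_output = {v: t_zlout[v] + latency for v in inputs}

print("latency =", latency)
for v in inputs:
    print(f"{v}: Tinput={t_input[v]} Tzlout={t_zlout[v]} Tdiff={t_diff[v]:+d} "
          f"Toutput={t_output[v]} lifetime {t_input[v]}->{t_output[v]}")
```

Under these assumptions the latency is 4 time units, set by sample g, which would otherwise have to leave four cycles before it arrives.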




The minimum number of registers is 4.

[Figures: linear lifetime chart; circular lifetime chart]
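The register count can be checked with a small circular lifetime analysis. This Python sketch is an illustration under two assumptions that are not stated on the slide: a variable with lifetime Tinput → Toutput is counted as live in cycles Tinput + 1 through Toutput, and the lifetimes are those obtained from Toutput = Tzlout + 4 with samples a..i arriving at times 0..8. The function name `min_registers` is hypothetical.

```python
def min_registers(lifetimes, N):
    """Circular lifetime analysis: peak number of live variables per cycle modulo N."""
    live = [0] * N
    for t_in, t_out in lifetimes:
        # Assumed convention: live in cycles t_in + 1 .. t_out (inclusive)
        for t in range(t_in + 1, t_out + 1):
            live[t % N] += 1
    return max(live), live


# (Tinput, Toutput) for a..i in the 3x3 matrix transpose example
lifetimes = [(0, 4), (1, 7), (2, 10), (3, 5), (4, 8), (5, 11), (6, 6), (7, 9), (8, 12)]
registers, live = min_registers(lifetimes, N=9)
print(registers, live)  # 4 [4, 4, 4, 4, 4, 4, 4, 4, 4]
```

Folding the time axis modulo N accounts for variables of the previous iteration that are still alive, which a single-iteration linear count would miss.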



Procedures of Forward-Backward Register Allocation

Steps:

Step 1: Determine the minimum number of registers using lifetime analysis.

Step 2: Input each variable at the time step corresponding to the beginning of its lifetime.

Step 3: Allocate each variable in a forward manner until it is dead or it reaches the last register.

Step 4: Since the allocation is periodic, the allocation of the current iteration repeats in subsequent iterations. Thus, if a register is occupied at time step l, mark (hash out) the same register as occupied at time step l + N.

Step 5: If a variable reaches the last register and is still alive, allocate it to a register in a backward manner.

Step 6: Repeat Steps 4 and 5 as required until the allocation is complete.



Register Allocation for Matrix Transpose Example





Procedures of Register Minimization in Folded Architectures

Steps:

Step 1: Perform retiming for folding

Step 2: Write the folding equations

Step 3: Use the folding equations to construct a lifetime table

Step 4: Draw the lifetime chart and determine the required number of registers

Step 5: Perform forward-backward register allocation

Step 6: Draw the folded architecture that uses the minimum number of registers



Folding Architecture Example



Folded Architecture for Matrix Transpose Example



Biquad Filter Example (1/4)

Step 1: Retiming

Folding the original DFG is invalid because the following folding equations are negative:

DF(1→2) = –3

DF(6→4) = –4

DF(8→4) = –3

DF(7→3) = –3
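The effect of retiming on these four edges can be checked numerically. The Python sketch below is illustrative. It uses the retiming values r(1)..r(8) and the folding sets S1 = {4, 2, 3, 1}, S2 = {5, 8, 6, 7} given earlier in the slides; the original delay count w(e) = 0 on each of the four edges is an assumption inferred from the negative values listed above.

```python
N = 4
r = {1: -1, 2: 0, 3: -1, 4: 0, 5: -1, 6: -1, 7: -2, 8: -1}   # retiming values from the slides
order = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}      # positions in S1 and S2
stages = {node: (1 if node <= 4 else 2) for node in order}    # P_U: adders 1-4, multipliers 5-8


def d_f(u, v, w):
    """D_F(U -> V) = N*w(e) - P_U + v - u."""
    return N * w - stages[u] + order[v] - order[u]


results = {}
for u, v in [(1, 2), (6, 4), (8, 4), (7, 3)]:
    w = 0                                # assumed original delay count on this edge
    w_r = w + r[v] - r[u]                # w_r(e) = w(e) + r(V) - r(U)
    results[(u, v)] = (d_f(u, v, w), w_r, d_f(u, v, w_r))
    print(f"{u}->{v}: D_F = {results[(u, v)][0]}, w_r = {w_r}, D'_F = {results[(u, v)][2]}")
```

Each of the four edges gains one delay, which raises its folding equation by N = 4 and makes it non-negative.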




Step 2: Folding Equations

DF(U→V) = Nw(e) – PU + v – u

DF(1→2) = 4(1) – 1 + 1 – 3 = 1

DF(1→5) = 4(1) – 1 + 0 – 3 = 0

DF(1→6) = 4(1) – 1 + 2 – 3 = 2

DF(1→7) = 4(1) – 1 + 3 – 3 = 3

DF(1→8) = 4(2) – 1 + 1 – 3 = 5

DF(3→1) = 4(0) – 1 + 3 – 2 = 0

DF(4→2) = 4(0) – 1 + 1 – 0 = 0

DF(5→3) = 4(0) – 2 + 2 – 0 = 0

DF(6→4) = 4(1) – 2 + 0 – 2 = 0

DF(7→3) = 4(1) – 2 + 2 – 3 = 1

DF(8→4) = 4(1) – 2 + 0 – 1 = 1
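The folding equations of the retimed biquad can be evaluated in one pass. In this Python sketch the edge list (source, destination, delays) is transcribed from the right-hand sides above, the folding orders come from S1 = {4, 2, 3, 1} and S2 = {5, 8, 6, 7}, and the variable names are illustrative. Evaluating each right-hand side directly gives, for example, 4(1) – 2 + 0 – 2 = 0 for the edge 6→4.

```python
N = 4
order = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}      # folding order of each node
stages = {node: (1 if node <= 4 else 2) for node in order}    # 1-stage adders, 2-stage multipliers

# (U, V, w_r(e)) for every edge of the retimed DFG
edges = [(1, 2, 1), (1, 5, 1), (1, 6, 1), (1, 7, 1), (1, 8, 2), (3, 1, 0),
         (4, 2, 0), (5, 3, 0), (6, 4, 1), (7, 3, 1), (8, 4, 1)]

d_f = {(u, v): N * w - stages[u] + order[v] - order[u] for u, v, w in edges}

for (u, v), delays in d_f.items():
    print(f"D_F({u}->{v}) = {delays}")

assert min(d_f.values()) >= 0   # the retimed DFG yields a valid folding
```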

Step 3: Construct the lifetime table

Tinput = u + PU

Toutput = u + PU + max_V{DF(U→V)}
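The lifetime table and the resulting register count can be sketched in Python as follows. The D_F values are obtained by evaluating the right-hand sides of the folding equations in Step 2. Two assumptions are made that the slides do not state: a variable is counted as live in cycles Tinput + 1 through Toutput, and node 2 is skipped because it never appears as a source in the folding equations. Variable names are illustrative.

```python
N = 4
order = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}
stages = {node: (1 if node <= 4 else 2) for node in order}
d_f = {(1, 2): 1, (1, 5): 0, (1, 6): 2, (1, 7): 3, (1, 8): 5, (3, 1): 0,
       (4, 2): 0, (5, 3): 0, (6, 4): 0, (7, 3): 1, (8, 4): 1}

lifetime = {}
for node in sorted(order):
    fanout = [delays for (u, _), delays in d_f.items() if u == node]
    if not fanout:                       # node 2: no outgoing folded edge listed
        continue
    t_in = order[node] + stages[node]    # Tinput  = u + P_U
    lifetime[node] = (t_in, t_in + max(fanout))   # Toutput = Tinput + max D_F

live = [0] * N                           # circular lifetime analysis, period N
for t_in, t_out in lifetime.values():
    for t in range(t_in + 1, t_out + 1): # assumed convention: live in t_in+1 .. t_out
        live[t % N] += 1

print(lifetime)
print("minimum registers:", max(live))   # minimum registers: 2
```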




Step 4: Draw the Lifetime Chart

The minimum number of registers is 2.

Step 5: Register Allocation

Folding Factor = 4




Step 6: Folded Architecture



IIR Filter Example (1/4)

Step 1: Retiming

Folding the original DFG is invalid because the following folding equations are negative:

DF(3→1) = –3

DF(4→1) = –2




Step 2: Folding Equations

DF(U→V) = Nw(e) – PU + v – u

DF(1→2) = 2(0) – 1 + 1 – 0 = 0

DF(2→3) = 2(3) – 1 + 1 – 1 = 5

DF(2→4) = 2(2) – 1 + 0 – 1 = 2

DF(3→1) = 2(2) – 2 + 0 – 1 = 1

DF(4→1) = 2(1) – 2 + 0 – 0 = 0

Step 3: Construct the lifetime table

Tinput = u + PU

Toutput = u + PU + max_V{DF(U→V)}




Step 4: Draw the Lifetime Chart

The minimum number of registers is 3.

Step 5: Register Allocation

Folding Factor = 2




Step 6: Folded Architecture



Conclusions

Presented a systematic transformation for deriving time-multiplexed architectures.

Explored folding techniques to reduce the number of functional units.

Explored register minimization techniques to reduce the number of registers.



References

K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley, 1999.

S. Y. Huang, Handout of text book, 2004.
