VLSI Digital Signal Processing Systems
Folding
Lan-Da Van (范倫達), Ph. D.
Department of Computer Science
National Chiao Tung University
Taiwan, R.O.C.
Fall, 2010
http://www.cs.nctu.tw/~ldvan/
Lan-Da Van VLSI-DSP-6-2
Outline
Introduction
Folding Transformation
Register Minimization Techniques
Register Minimization in Folded Architecture
Conclusions
Introduction (1/2)
Systematically determines the control circuits in DSP
architectures by the folding transformation, in which
multiple algorithm operations are time-multiplexed onto
a single functional unit.
Used to synthesize DSP architectures that operate
with a single clock or with multiple clocks.
Used to reduce the number of hardware functional
units (FUs) by a factor of N at the expense of
increasing the computation time by a factor of N.
Folding can lead to an architecture that uses a large number of
registers; register minimization techniques are
therefore also presented.
Introduction (2/2)
Folding Transformation (1/3)
A systematic technique for designing control circuits for hardware in which several algorithm operations are time-multiplexed on a single functional unit.
Notation
U, V: nodes (operations) of the original DFG
HU, HV: nodes (functional units) of the folded DFG
W(x): x-th iteration of node W
U → V: an edge e from node U to node V
w(e): # of delays on the edge e
Folding factor N: # of operations that share one FU
Folding set: an ordered set of operations executed by the same FU
The position u of an operation U in its folding set is the folding order of U, i.e., HU executes U in time partition u
Folding sets are typically obtained from a scheduling and allocation algorithm (ref. Appendix B)
The folding sets define the underlying folding transformation
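As a minimal sketch of the notation above (Python is used here only because the notes contain no code), each folding set can be stored as an ordered list per functional unit: the list index of an operation is its folding order u, and iteration l of that operation then runs at cycle Nl + u. The function names `folding_orders` and `execution_time` and the unit labels in the example are hypothetical.

```python
from collections.abc import Hashable
from typing import Dict, List, Tuple


def folding_orders(folding_sets: Dict[str, List[Hashable]]) -> Dict[Hashable, Tuple[str, int]]:
    """Map every operation to (functional unit, folding order u).

    Each folding set is an ordered list; the index of an operation in its list
    is the time partition u in which that functional unit executes it.
    """
    orders: Dict[Hashable, Tuple[str, int]] = {}
    for fu, ops in folding_sets.items():
        for u, op in enumerate(ops):
            if op in orders:
                raise ValueError(f"operation {op!r} appears in more than one folding set")
            orders[op] = (fu, u)
    return orders


def execution_time(N: int, l: int, u: int) -> int:
    """Cycle at which iteration l of an operation with folding order u is executed."""
    return N * l + u


# Hypothetical example: two functional units, folding factor N = 2.
orders = folding_orders({"adder": ["A", "B"], "multiplier": ["C", "D"]})
print(orders["B"])                            # ('adder', 1)
print(execution_time(2, 3, orders["B"][1]))   # 7
```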
Folding Transformation (2/3)
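PU: # of pipeline stages of HU. PU = 0 indicates
that HU is not pipelined.
DF(U → V) (folding equation): # of delays on the folded edge from HU to HV, i.e., the # of cycles that the
result of HU must be stored before HV uses it. U(l) is executed by HU at time Nl + u, its result is available PU cycles later, and V(l + w(e)) is executed by HV at time N(l + w(e)) + v, so
$$D_F(U \xrightarrow{e} V) = [N(l + w(e)) + v] - [Nl + u + P_U] = N w(e) - P_U + v - u$$
A negative value of DF is possible
before the DFG is retimed for folding.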
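The folding equation is plain arithmetic, so a small Python sketch can compute it directly and also check it against the two scheduling times it is derived from; the difference does not depend on the iteration index l. The function names are hypothetical; the example values (N = 4, w(e) = 0, PU = 1, u = 3, v = 1) are those of edge 1→2 in the biquad retiming example of these notes.

```python
def folding_delay(N: int, w: int, P_u: int, u: int, v: int) -> int:
    """Folding equation D_F(U -> V) = N*w(e) - P_U + v - u.

    N   : folding factor
    w   : number of delays w(e) on the edge U -> V in the original DFG
    P_u : number of pipeline stages of the functional unit H_U
    u, v: folding orders of U and V
    """
    return N * w - P_u + v - u


def folding_delay_from_times(N: int, w: int, P_u: int, u: int, v: int, l: int = 0) -> int:
    """Same quantity from the schedule: time V(l + w) starts minus time U(l)'s result is ready."""
    produced = N * l + u + P_u      # result of U(l) leaves the P_U-stage unit H_U
    consumed = N * (l + w) + v      # V(l + w(e)) is executed on H_V
    return consumed - produced


# Edge 1 -> 2 of the biquad example before retiming.
print(folding_delay(4, 0, 1, 3, 1))   # -3
```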
Folding Transformation (3/3)
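[Figure: an edge U(l) → V(l + w(e)) with w(e) delays in the original DFG, and the corresponding N-folded edge: HU executes U at time Nl + u, its output passes through PU + DF delays, and HV executes V at time N(l + w(e)) + v.]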
Folding Retimed Biquad Filter (1/2)
Folding factor N = 4
Folding sets S1 = {4, 2, 3, 1} and S2 = {5, 8, 6, 7}, where S1
contains all the addition operations and S2 all the multiplication operations.
Assume that addition and multiplication require 1 and 2 u.t. (time units), respectively.
1-stage pipelined adders and 2-stage pipelined multipliers are available (PU = 1 for adders, PU = 2 for multipliers).
Folding Retimed Biquad Filter (2/2)
folding equations
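The folding equations of this slide are not reproduced in the text, so here is a Python sketch that computes them for the unretimed biquad. Assumption: the edge list and original delays w(e) are not given explicitly here; they were recovered from the retimed folding equations and the retiming values on the later biquad slides via w(e) = wr(e) − r(V) + r(U). Under that assumption the sketch flags exactly the four negative values listed on the "Biquad Filter Example (1/4)" slide.

```python
# Biquad example: N = 4, S1 (adders) = {4, 2, 3, 1}, S2 (multipliers) = {5, 8, 6, 7}.
N = 4
S1 = [4, 2, 3, 1]
S2 = [5, 8, 6, 7]
order = {op: u for S in (S1, S2) for u, op in enumerate(S)}     # folding order u
pipeline = {**{op: 1 for op in S1}, **{op: 2 for op in S2}}      # P_U: 1-stage adders, 2-stage multipliers

# (U, V, w(e)) of the original (unretimed) biquad DFG; assumed, see the lead-in.
edges = [(1, 2, 0), (1, 5, 1), (1, 6, 1), (1, 7, 2), (1, 8, 2),
         (3, 1, 0), (4, 2, 0), (5, 3, 0), (6, 4, 0), (7, 3, 0), (8, 4, 0)]


def folding_equations(edges, N, order, pipeline):
    """Return {(U, V): D_F(U -> V)} with D_F = N*w(e) - P_U + v - u."""
    return {(U, V): N * w - pipeline[U] + order[V] - order[U] for U, V, w in edges}


DF = folding_equations(edges, N, order, pipeline)
for (U, V), d in DF.items():
    flag = "  <-- negative, folding invalid" if d < 0 else ""
    print(f"DF({U}->{V}) = {d}{flag}")
```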
Retiming (1/3)
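What happens if a folding equation
DF(U→V) is negative? The folded edge would then need a negative number of delays, which cannot be implemented.
Solution: retime (move delay elements in) the original DFG
prior to folding.
Constraint: every retimed folding equation must be nonnegative,
$$D'_F(U \xrightarrow{e} V) = N w_r(e) - P_U + v - u \ge 0 \qquad (1)$$
Substituting $w_r(e) = w(e) + r(V) - r(U)$ into (1) gives
$$D_F(U \xrightarrow{e} V) + N\,[r(V) - r(U)] \ge 0 \quad\Longrightarrow\quad r(U) - r(V) \le \frac{D_F(U \xrightarrow{e} V)}{N}$$
Since the retiming values of the nodes are restricted to be
integers, the above inequality can be rewritten as
$$r(U) - r(V) \le \left\lfloor \frac{D_F(U \xrightarrow{e} V)}{N} \right\rfloor$$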
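The integer inequalities above form a system of difference constraints, which can be solved as a shortest-path problem. Below is a minimal Python sketch using Bellman-Ford relaxation from an implicit source connected to every node by zero-weight arcs; the function name is hypothetical, and the biquad DFG data (edges and original delays) is an assumption recovered from the later biquad slides. Any solution it returns satisfies every constraint, but it need not coincide with the retiming values listed on the following slide, because retiming solutions are not unique.

```python
def solve_difference_constraints(nodes, constraints):
    """Find integers r with r(a) - r(b) <= c for every (a, b, c) in constraints.

    Each constraint becomes an arc b -> a of weight c; shortest-path distances from a
    virtual source (0-weight arcs to every node) satisfy all constraints.
    Returns a dict r, or None if a negative cycle makes the system infeasible.
    """
    dist = {n: 0 for n in nodes}                 # distances after the source arcs
    for _ in range(len(nodes)):
        changed = False
        for a, b, c in constraints:
            if dist[b] + c < dist[a]:            # relax arc b -> a
                dist[a] = dist[b] + c
                changed = True
        if not changed:
            return dist
    return None                                  # still relaxing after |V| passes: negative cycle


# Biquad example (assumed DFG data): N = 4, folding orders and pipeline depths.
N = 4
order = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}
pipeline = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 2}
edges = [(1, 2, 0), (1, 5, 1), (1, 6, 1), (1, 7, 2), (1, 8, 2),
         (3, 1, 0), (4, 2, 0), (5, 3, 0), (6, 4, 0), (7, 3, 0), (8, 4, 0)]

# r(U) - r(V) <= floor(D_F(U -> V) / N); Python's // floors, e.g. -3 // 4 == -1.
constraints = [(U, V, (N * w - pipeline[U] + order[V] - order[U]) // N) for U, V, w in edges]
r = solve_difference_constraints(sorted(order), constraints)
print(r)
```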
Retiming (2/3)
Example (edge 1→2, w(e) = 0):
$$D_F(1 \to 2) = N w(e) - P_U + v - u = 0 - 1 + 1 - 3 = -3$$
$$r(1) - r(2) \le \lfloor D_F(1 \to 2)/N \rfloor = \lfloor -3/4 \rfloor = -1$$
Retiming (3/3)
r(1)=-1, r(2)=0, r(3)=-1, r(4)=0
r(5)=-1, r(6)=-1, r(7)=-2, r(8)=-1
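These retiming values can be applied with wr(e) = w(e) + r(V) − r(U). The Python sketch below does exactly that and checks two properties of a legal retiming: no edge ends up with a negative delay, and the total delay around a loop is unchanged. The function names are hypothetical, and the original edge list is an assumption recovered from the folding equations on the later "Biquad Filter Example" slides.

```python
def retime(edges, r):
    """Return [(U, V, w_r(e))] with w_r(e) = w(e) + r(V) - r(U)."""
    return [(U, V, w + r[V] - r[U]) for U, V, w in edges]


def total_delay(edges, cycle):
    """Sum of edge delays around a cycle given as a node list [n0, n1, ..., n0]."""
    w = {(U, V): d for U, V, d in edges}
    return sum(w[(a, b)] for a, b in zip(cycle, cycle[1:]))


r = {1: -1, 2: 0, 3: -1, 4: 0, 5: -1, 6: -1, 7: -2, 8: -1}       # retiming values from the slide
edges = [(1, 2, 0), (1, 5, 1), (1, 6, 1), (1, 7, 2), (1, 8, 2),   # assumed original biquad DFG
         (3, 1, 0), (4, 2, 0), (5, 3, 0), (6, 4, 0), (7, 3, 0), (8, 4, 0)]
retimed = retime(edges, r)
print(retimed)
assert all(w >= 0 for _, _, w in retimed)                       # legal retiming
loop = [1, 7, 3, 1]
assert total_delay(edges, loop) == total_delay(retimed, loop)   # retiming preserves loop delays
```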
Lifetime Analysis
Lifetime analysis is a procedure used to compute the
minimum number of registers required to implement a
DSP algorithm in hardware.
Linear lifetime analysis
Circular lifetime analysis
In lifetime analysis, the number of live variables at
each time unit is computed, and the maximum
number of live variables at any time unit is
determined.
Forward-backward register allocation technique
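The register count from lifetime analysis is the maximum number of simultaneously live variables. Here is a minimal Python sketch, assuming a variable with lifetime Tinput → Toutput occupies a register in cycles Tinput + 1 through Toutput; this convention is an assumption, chosen because it reproduces the register counts of the matrix transpose and biquad examples in these notes. The linear version counts one iteration on an absolute time axis, while the circular version takes time modulo N so that overlap with later iterations is included. The function name and the variable `x` are hypothetical.

```python
from collections.abc import Hashable
from typing import Dict, Optional, Tuple


def min_registers(lifetimes: Dict[Hashable, Tuple[int, int]], N: Optional[int] = None) -> int:
    """Maximum number of live variables.

    lifetimes maps each variable to (T_input, T_output); a variable is counted as live
    in cycles T_input + 1 .. T_output (assumed convention).
    N is None -> linear lifetime analysis on an absolute time axis.
    N given   -> circular lifetime analysis: time is taken modulo the period N.
    """
    live: Dict[int, int] = {}
    for t_in, t_out in lifetimes.values():
        for t in range(t_in + 1, t_out + 1):
            slot = t if N is None else t % N
            live[slot] = live.get(slot, 0) + 1
    return max(live.values(), default=0)


# Hypothetical variable x: produced at time 0, consumed at time 5.
print(min_registers({"x": (0, 5)}))        # 1 (one iteration viewed in isolation)
print(min_registers({"x": (0, 5)}, N=4))   # 2 (x overlaps with its copy from the next iteration)
```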
Linear Lifetime Analysis
Variables {a, b, c}
Maximum # of live variables: max{0, 1, 2, 2, 2, 2, 2, 2} = 2
Three iterations are shown, with N = 6
Periodicity is implicit
Matrix Transpose Example (1/3)
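Original matrix:
a d g
b e h
c f i
Transposed matrix:
a b c
d e f
g h i
[Figure: the input stream i h g f e d c b a (a arrives first) enters the Matrix Transpose block, which produces the output stream i f c h e b g d a (a leaves first).]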
Matrix Transpose Example (2/3)
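Tzlout = zero-latency output time (the time at which a variable would be output if the transpose had zero latency)
Tdiff = Tzlout – Tinput
Toutput = Tzlout + max{-Tdiff}, where the maximum is taken over all variables; this is the smallest added latency that keeps Toutput ≥ Tinput for every variable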
Matrix Transpose Example (3/3)
The minimum number of registers is 4.
[Figures: linear lifetime chart and circular lifetime chart]
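A Python sketch that reproduces this example from the formulas on the previous slide. Assumptions: as read from the figure, the elements enter in the order a, b, ..., i at times 0 to 8 and must leave in the order a, d, g, b, e, h, c, f, i; a variable is live in cycles Tinput + 1 through Toutput. The function names are hypothetical. With period N = 9 the circular count gives the 4 registers stated above.

```python
import string
from typing import Dict, Tuple


def transpose_lifetimes(n: int) -> Dict[str, Tuple[int, int]]:
    """(T_input, T_output) for an n x n matrix transpose on a serial stream (n*n <= 26).

    Element p of the input stream (p = n*q + s) would leave at position n*s + q of the
    output stream with zero latency; the common latency max{-Tdiff} is then added.
    """
    names = string.ascii_lowercase[: n * n]
    t_in = {x: p for p, x in enumerate(names)}
    t_zl = {x: (p % n) * n + p // n for p, x in enumerate(names)}   # zero-latency output time
    latency = max(t_in[x] - t_zl[x] for x in names)                   # max{-Tdiff}
    return {x: (t_in[x], t_zl[x] + latency) for x in names}


def min_registers_circular(lifetimes: Dict[str, Tuple[int, int]], N: int) -> int:
    """Circular lifetime analysis; a variable is live in cycles T_input + 1 .. T_output."""
    live = [0] * N
    for t_in, t_out in lifetimes.values():
        for t in range(t_in + 1, t_out + 1):
            live[t % N] += 1
    return max(live)


lifetimes = transpose_lifetimes(3)
print(lifetimes)
print(min_registers_circular(lifetimes, N=9))   # 4
```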
Procedure for Forward-Backward Register Allocation
Steps:
Step 1: Determine the minimum number of registers using lifetime analysis.
Step 2: Input each variable at the time step corresponding to the beginning of its lifetime.
Step 3: Allocate each variable in a forward manner (one register further per time step) until it is dead or it reaches the last register.
Step 4: Since the allocation is periodic, the allocation of the current iteration also repeats itself in subsequent iterations. Thus, hash (mark as occupied) each allocated register position with period N.
Step 5: If a variable reaches the last register and is still alive, allocate it to a register in a backward manner.
Step 6: Repeat Steps 4 and 5 as required until the allocation is completed.
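The following Python sketch is a simplified greedy variant of the steps above, not a verbatim transcription of the textbook procedure. Each variable enters the first register, moves forward one register per cycle, and, when it sits in the last register or the next register is occupied, moves backward to the nearest free earlier register; holding its current register is used only as a last resort. Occupancy is hashed modulo N as in Step 4. The function name is hypothetical; the example uses the (Tinput, Toutput) lifetimes of the matrix transpose example with 4 registers.

```python
from collections.abc import Hashable
from typing import Dict, List, Optional, Tuple


def forward_backward(lifetimes: Dict[Hashable, Tuple[int, int]], N: int, num_regs: int):
    """Allocate registers R0..R{num_regs-1}; return {variable: [(time, register), ...]}.

    A variable with lifetime (T_input, T_output) is stored in cycles T_input+1 .. T_output.
    busy[reg][t % N] hashes every allocation with period N, because the same
    allocation repeats in every iteration.
    """
    busy: List[List[Optional[Hashable]]] = [[None] * N for _ in range(num_regs)]
    alloc: Dict[Hashable, List[Tuple[int, int]]] = {}
    for name, (t_in, t_out) in sorted(lifetimes.items(), key=lambda kv: kv[1][0]):
        path: List[Tuple[int, int]] = []
        reg: Optional[int] = None
        for t in range(t_in + 1, t_out + 1):
            slot = t % N
            if reg is None:
                candidates = [0]                                      # enter at the first register
            else:
                candidates = [reg + 1] if reg + 1 < num_regs else []  # forward
                candidates += list(range(reg - 1, -1, -1))            # backward, nearest first
                candidates.append(reg)                                # hold (last resort)
            chosen = next((c for c in candidates if busy[c][slot] is None), None)
            if chosen is None:
                raise RuntimeError(f"no free register for {name!r} at time {t}")
            busy[chosen][slot] = name
            reg = chosen
            path.append((t, chosen))
        alloc[name] = path
    return alloc


# Matrix transpose example: (T_input, T_output) per variable, N = 9, 4 registers.
transpose = {"a": (0, 4), "b": (1, 7), "c": (2, 10), "d": (3, 5), "e": (4, 8),
             "f": (5, 11), "g": (6, 6), "h": (7, 9), "i": (8, 12)}
for var, path in forward_backward(transpose, N=9, num_regs=4).items():
    print(var, " -> ".join(f"R{r + 1}@t{t}" for t, r in path))
```

Because the greedy choice can fail on other inputs, the sketch raises an error instead of silently using an extra register.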
Register Allocation for Matrix Transpose Example
Procedure for Register Minimization in Folded Architectures
Steps:
Step 1: Perform retiming for folding
Step 2: Write the folding equations
Step 3: Use the folding equations to construct a lifetime table
Step 4: Draw the lifetime chart and determine the required number of registers
Step 5: Perform forward-backward register allocation
Step 6: Draw the folded architecture that uses the minimum number of registers
Folding Architecture Example
Folded Architecture for Matrix Transpose Example
Biquad Filter Example (1/4)
Retiming
Invalid folding:
DF(1→2) = -3
DF(6→4) = -4
DF(8→4) = -3
DF(7→3) = -3
Step 1: Retiming
Biquad Filter Example (2/4)
Step 2: Folding Equations
DF(U→V) = Nwr(e) – PU + v – u, where wr(e) is the retimed delay of edge e
DF(1→2) = 4(1) – 1 + 1 – 3 = 1
DF(1→5) = 4(1) – 1 + 0 – 3 = 0
DF(1→6) = 4(1) – 1 + 2 – 3 = 2
DF(1→7) = 4(1) – 1 + 3 – 3 = 3
DF(1→8) = 4(2) – 1 + 1 – 3 = 5
DF(3→1) = 4(0) – 1 + 3 – 2 = 0
DF(4→2) = 4(0) – 1 + 1 – 0 = 0
DF(5→3) = 4(0) – 2 + 2 – 0 = 0
DF(6→4) = 4(1) – 2 + 0 – 2 = 0
DF(7→3) = 4(1) – 2 + 2 – 3 = 1
DF(8→4) = 4(1) – 2 + 0 – 1 = 1
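The Step 2 arithmetic can be checked mechanically. This Python sketch (variable names hypothetical) evaluates DF(U→V) = Nwr(e) − PU + v − u for every edge, using the folding orders from S1 = {4, 2, 3, 1} and S2 = {5, 8, 6, 7}, PU = 1 for adders and 2 for multipliers, and the retimed delays that appear as the Nwr(e) terms in the equations above. All eleven values come out nonnegative; in particular DF(6→4) = 4(1) − 2 + 0 − 2 = 0.

```python
N = 4
order = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}      # S1 = {4, 2, 3, 1}, S2 = {5, 8, 6, 7}
pipeline = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 2}    # 1-stage adders, 2-stage multipliers
# (U, V, w_r(e)): retimed delays, read off the N*w_r(e) terms of the equations above
retimed = [(1, 2, 1), (1, 5, 1), (1, 6, 1), (1, 7, 1), (1, 8, 2),
           (3, 1, 0), (4, 2, 0), (5, 3, 0), (6, 4, 1), (7, 3, 1), (8, 4, 1)]

DF = {(U, V): N * w - pipeline[U] + order[V] - order[U] for U, V, w in retimed}
for (U, V), d in DF.items():
    print(f"DF({U}->{V}) = {d}")
assert all(d >= 0 for d in DF.values())   # every folded edge now has a nonnegative delay
```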
Step 3: Construct the lifetime table
$$T_{input} = u + P_U$$
$$T_{output} = u + P_U + \max_{V}\{D_F(U \to V)\}$$
Biquad Filter Example (3/4)
Step 4: Draw the Lifetime Chart
The minimum number
of registers is 2.
Step 5: Register Allocation
Folding Factor = 4
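Steps 3 and 4 can be sketched in Python (function names hypothetical): build the lifetime table from the folding equations of Step 2 with Tinput = u + PU and Toutput = Tinput + max over V of DF(U→V), then run a circular lifetime analysis with period N = 4. Assumption: a variable is live in cycles Tinput + 1 through Toutput; with that convention the sketch gives the 2 registers stated above. Node 2 has no outgoing edge among the listed folding equations, so it gets no entry.

```python
N = 4
order = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}
pipeline = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 2}
DF = {(1, 2): 1, (1, 5): 0, (1, 6): 2, (1, 7): 3, (1, 8): 5, (3, 1): 0,
      (4, 2): 0, (5, 3): 0, (6, 4): 0, (7, 3): 1, (8, 4): 1}


def lifetime_table(DF, order, pipeline):
    """{U: (T_input, T_output)}: T_input = u + P_U, T_output = T_input + max_V D_F(U -> V)."""
    longest = {}
    for (U, _V), d in DF.items():
        longest[U] = max(longest.get(U, d), d)
    return {U: (order[U] + pipeline[U], order[U] + pipeline[U] + d) for U, d in longest.items()}


def min_registers_circular(lifetimes, N):
    """Live in cycles T_input + 1 .. T_output (assumed convention), time taken modulo N."""
    live = [0] * N
    for t_in, t_out in lifetimes.values():
        for t in range(t_in + 1, t_out + 1):
            live[t % N] += 1
    return max(live)


table = lifetime_table(DF, order, pipeline)
for node, (t_in, t_out) in sorted(table.items()):
    print(f"node {node}: Tinput = {t_in}, Toutput = {t_out}")
print(min_registers_circular(table, N))   # 2
```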
Biquad Filter Example (4/4)
Step 6: Folded Architecture
IIR Filter Example (1/4)
Step 1: Retiming
Retiming
Invalid folding:
DF(3→1) = -3
DF(4→1) = -2
IIR Filter Example (2/4)
Step 2: Folding Equations
DF(U→V) = Nwr(e) – PU + v – u, where wr(e) is the retimed delay of edge e
DF(1→2) = 0
DF(2→3) = 5
DF(2→4) = 2
DF(3→1) = 1
DF(4→1) = 0
Step 3: Construct the lifetime table
$$T_{input} = u + P_U$$
$$T_{output} = u + P_U + \max_{V}\{D_F(U \to V)\}$$
IIR Filter Example (3/4)
Step 4: Draw the Lifetime Chart
The minimum number
of registers is 3.
Step 5: Register Allocation
Folding Factor = 2
IIR Filter Example (4/4)
Step 6: Folded Architecture
Conclusions
Present a systematic transformation of time-
multiplexed architectures
Explore folding techniques to reduce # of functional
units
Explore register minimization technique to reduce #
of registers
References
K. K. Parhi, VLSI Digital Signal Processing Systems:
Design and Implementation, Wiley, 1999.
S. Y. Huang, Handout of textbook, 2004.