VLSI Digital Signal Processing Systems Folding Lan-Da Van (范倫達), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C. Fall, 2015 [email protected] http://www.cs.nctu.tw/~ldvan/

VLSI Digital Signal Processing Chapter 6 Folding




Outline

Introduction

Folding Transformation

Register Minimization Techniques

Register Minimization in Folded Architecture

Conclusions


Introduction (1/2)

Folding systematically determines the control circuits in DSP architectures in which multiple algorithm operations are time-multiplexed onto a single functional unit.

It is used to synthesize DSP architectures that can be operated with single or multiple clocks.

It reduces the number of hardware functional units (FUs) by a factor of N at the expense of increasing the computation time by a factor of N.

Folding can lead to an architecture that uses a large number of registers, which motivates the register minimization techniques presented here.


Introduction (2/2)



Folding Transformation (1/3)

A systematic technique for designing control circuits for hardware in which several algorithm operations are time-multiplexed on a single functional unit.

Notation

U, V: nodes (operations) of the original DFG

HU, HV: nodes (functional units) of the folded DFG

W(x): the x-th iteration of node W

U → V: an edge e from node U to node V

w(e): number of delays on the edge e

Folding factor N: the number of operations that share one FU

Folding set: an ordered set of operations that are executed by the same FU

The position of an operation U in its folding set is the folding order of U.

Folding sets are typically obtained from a scheduling and allocation algorithm (ref. Appendix B).

The folding sets represent the underlying folding transformation.


Folding Transformation (2/3)

PU: number of pipeline stages of HU. PU = 0 indicates that HU is not pipelined.

DF(U → V): (folding equation) the number of cycles for which the result of HU must be stored before it is used by HV. With u and v denoting the folding orders of U and V:

$$D_F(U \xrightarrow{e} V) = [N(l + w(e)) + v] - [Nl + P_U + u] = N w(e) - P_U + v - u$$

A negative value of the folding equation DF is possible before the DFG is retimed for folding.
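The folding equation can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names and the sample arguments are illustrative assumptions. It evaluates DF both in closed form and as the difference between the cycle in which HV consumes the value and the cycle in which HU delivers it.

```python
def folding_delay(N, w, P_U, u, v):
    """Closed form: D_F(U -> V) = N*w(e) - P_U + v - u."""
    return N * w - P_U + v - u


def folding_delay_from_times(N, w, P_U, u, v, l=0):
    """Same quantity from the schedule of iteration l (hypothetical helper)."""
    ready = N * l + u + P_U      # cycle in which the result of U(l) leaves H_U
    used = N * (l + w) + v       # cycle in which V(l + w(e)) starts on H_V
    return used - ready


# Illustrative values: N = 4, a 1-stage unit, folding orders u = 3 and v = 1.
print(folding_delay(N=4, w=1, P_U=1, u=3, v=1))   # one delay on the edge
print(folding_delay(N=4, w=0, P_U=1, u=3, v=1))   # no delay: negative result
```

The iteration index l cancels in the difference, which is why the folded architecture can use the same number of registers in every iteration.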


Folding Transformation (3/3)

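[Figure: an edge U → V with w(e) delays in the original DFG and its N-folded counterpart. U(l) is executed by HU at time Nl + u, the result passes through the PU pipeline stages and DF(U → V) delays, and V(l + w(e)) is executed by HV at time N(l + w(e)) + v.]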


Folding Retimed Biquad Filter (1/2)

Folding factor N = 4

Folding sets S1 = {4, 2, 3, 1} and S2 = {5, 8, 6, 7}, where S1 contains all the addition operations and S2 contains all the multiplication operations.

Assume that addition and multiplication require 1 and 2 u.t., respectively.

1-stage pipelined adders and 2-stage pipelined multipliers are available.


Folding Retimed Biquad Filter (2/2)

folding equations


Retiming (1/3)

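What happens if a folding equation DF is negative? A negative number of delays cannot be implemented, so the folding is invalid.

Remedy: retime the original DFG (move delay elements) prior to folding.

Constraint, with wr(e) the number of delays on edge e after retiming:

$$D'_F(U \xrightarrow{e} V) = N w_r(e) - P_U + v - u \ge 0 \qquad (1)$$

Substituting $w_r(e) = w(e) + r(V) - r(U)$ into (1) gives $D_F(U \xrightarrow{e} V) + N\,[r(V) - r(U)] \ge 0$, that is,

$$r(U) - r(V) \le \frac{D_F(U \xrightarrow{e} V)}{N}$$

Since the retiming values of the nodes are restricted to be integers, this can be rewritten as

$$r(U) - r(V) \le \left\lfloor \frac{D_F(U \xrightarrow{e} V)}{N} \right\rfloor$$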


Retiming (2/3)

Example: DF(1→2) = Nw(e) – PU + v – u = 4(0) – 1 + 1 – 3 = –3

r(1) – r(2) <= floor{DF(1→2)/N} = floor{–3/4} = –1


Retiming (3/3)

r(1)=-1, r(2)=0, r(3)=-1, r(4)=0

r(5)=-1, r(6)=-1, r(7)=-2, r(8)=-1
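The retiming values above can be checked against the constraint r(U) – r(V) <= floor{DF(U→V)/N}. The sketch below is an illustration rather than the author's procedure: it uses the folding sets and pipeline depths of the biquad example, and it assumes w(e) = 0 on the four edges reported as invalid, a value inferred from their negative DF (for instance 0 – 1 + 1 – 3 = –3 for edge 1→2). The variable names are hypothetical.

```python
N = 4
S1 = [4, 2, 3, 1]            # adders, 1 pipeline stage
S2 = [5, 8, 6, 7]            # multipliers, 2 pipeline stages
order = {node: pos for s in (S1, S2) for pos, node in enumerate(s)}
stages = {**{n: 1 for n in S1}, **{n: 2 for n in S2}}

# Retiming values from the slide.
r = {1: -1, 2: 0, 3: -1, 4: 0, 5: -1, 6: -1, 7: -2, 8: -1}

# Edges with a negative folding equation; w(e) = 0 is an inferred assumption.
invalid_edges = {(1, 2): 0, (6, 4): 0, (8, 4): 0, (7, 3): 0}

retimed = {}
for (U, V), w in invalid_edges.items():
    d_f = N * w - stages[U] + order[V] - order[U]
    bound = d_f // N                       # Python's // is a true floor
    w_r = w + r[V] - r[U]                  # delays on the edge after retiming
    d_f_retimed = N * w_r - stages[U] + order[V] - order[U]
    retimed[(U, V)] = d_f_retimed
    print(f"{U}->{V}: D_F={d_f}, r(U)-r(V)={r[U] - r[V]} <= {bound}, "
          f"retimed D_F={d_f_retimed}")
```

Note that floor division matters here: truncating –3/4 toward zero would give 0 and a constraint that is too weak.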



Lifetime Analysis

Lifetime analysis is a procedure used to compute the minimum number of registers required to implement a DSP algorithm in hardware.

Linear lifetime analysis

Circular lifetime analysis

In lifetime analysis, the number of live variables at each time unit is computed, and the maximum number of live variables at any time unit is determined.

Forward-backward register allocation technique


Linear Lifetime Analysis

Variables: {a, b, c}

Number of live variables: max{0, 1, 2, 2, 2, 2, 2, 2} = 2

Three iterations with N = 6

Periodicity is implicit


Matrix Transpose Example (1/3)

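Input matrix:

a b c
d e f
g h i

Transposed matrix:

a d g
b e h
c f i

[Figure: the serial input stream i h g f e d c b a (a arrives first) enters the Matrix Transpose block, which produces the serial output stream i f c h e b g d a (a leaves first).]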


Matrix Transpose Example (2/3)

Tzlout = zero-latency output time

Tdiff = Tzlout – Tinput

Toutput = Tzlout + max{-Tdiff}
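The three definitions above are enough to rebuild the lifetime table of the 3×3 transposer. The following sketch assumes one sample per cycle with the first sample at time 0 and the input and output orders a, b, ..., i and a, d, g, b, e, h, c, f, i; these timing conventions and the variable names are assumptions made for illustration.

```python
# Samples in arrival order (row-wise scan of the input matrix).
inputs = "abcdefghi"
# Samples in departure order (row-wise scan of the transposed matrix).
outputs = "adgbehcfi"

t_input = {s: t for t, s in enumerate(inputs)}
t_zlout = {s: t for t, s in enumerate(outputs)}          # zero-latency output
t_diff = {s: t_zlout[s] - t_input[s] for s in inputs}

# The most negative Tdiff sets the latency needed for a causal design.
latency = max(-d for d in t_diff.values())
t_output = {s: t_zlout[s] + latency for s in inputs}

for s in inputs:
    print(f"{s}: Tinput={t_input[s]} Tzlout={t_zlout[s]} "
          f"Tdiff={t_diff[s]:+d} Toutput={t_output[s]} "
          f"lifetime {t_input[s]}->{t_output[s]}")
```

Sample g has the most negative Tdiff (it would have to leave four cycles before it arrives), so with the added latency it is consumed in the same cycle in which it is input.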


Matrix Transpose Example (3/3)

The minimum number of registers is 4.

[Figures: linear lifetime chart and circular lifetime chart for the matrix transpose example]
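The register count can be reproduced by counting live variables per cycle. The sketch below is illustrative only. It assumes the lifetimes obtained from Tinput and Toutput for the 3×3 transposer with one sample per cycle starting at time 0, a period of N = 9, and the convention that a variable occupies a register in cycles Tinput + 1 through Toutput; the function name is hypothetical.

```python
from collections import Counter

# (Tinput, Toutput) per sample, derived from Toutput = Tzlout + max{-Tdiff}.
lifetimes = {"a": (0, 4), "b": (1, 7), "c": (2, 10), "d": (3, 5), "e": (4, 8),
             "f": (5, 11), "g": (6, 6), "h": (7, 9), "i": (8, 12)}


def live_counts(lifetimes, period=None):
    """Count live variables per cycle; wrap cycles modulo `period` if given."""
    counts = Counter()
    for t_in, t_out in lifetimes.values():
        for cycle in range(t_in + 1, t_out + 1):
            counts[cycle % period if period else cycle] += 1
    return counts


linear = live_counts(lifetimes)               # one iteration in isolation
circular = live_counts(lifetimes, period=9)   # overlapping iterations, N = 9

print("linear max:", max(linear.values()))
print("circular max:", max(circular.values()))
```

The circular count is the one that matters for a periodic schedule, because the tail of one iteration overlaps the head of the next.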


Procedures of Forward-Backward Register Allocation

Steps:

Step 1: Determine the minimum number of registers using lifetime analysis.

Step 2: Input each variable at the time step corresponding to the beginning of its lifetime.

Step 3: Allocate each variable in a forward manner until it is dead or it reaches the last register.

Step 4: Since the allocation is periodic, the allocation of the current iteration repeats itself in subsequent iterations. Thus, the register positions occupied at a given time step are also marked (hashed) as occupied at every time step offset by a multiple of the period N.

Step 5: If a variable reaches the last register and is still alive, allocate it in a backward manner to a register that is free.

Step 6: Repeat Steps 4 and 5 as required until the allocation is completed.
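The steps above can be sketched as a small greedy routine. This is a simplified illustration, not the textbook's exact procedure: it assumes every variable enters at the first register, that a variable occupies a register in cycles Tinput + 1 through Toutput, and that the backward step tries the highest-numbered free register first. The function and variable names are hypothetical, and the allocation it produces for the matrix transpose example may differ from the one drawn on the slide.

```python
def forward_backward_allocate(lifetimes, num_regs, period):
    """Return {variable: [(cycle, register), ...]} with 0-based registers."""
    occupied = set()  # (cycle mod period, register) slots already taken
    schedule = {name: [] for name in lifetimes}

    def run_is_free(cycle, reg, last_cycle):
        # A run advances one register per cycle until death or the chain end.
        while cycle <= last_cycle and reg < num_regs:
            if (cycle % period, reg) in occupied:
                return False
            cycle, reg = cycle + 1, reg + 1
        return True

    def place_run(name, cycle, reg, last_cycle):
        while cycle <= last_cycle and reg < num_regs:
            occupied.add((cycle % period, reg))   # Step 4: periodic marking
            schedule[name].append((cycle, reg))
            cycle, reg = cycle + 1, reg + 1
        return cycle  # first cycle that still needs a register, if any

    pending = []  # (cycle, name) for variables still alive after the last register
    # Steps 2-3: forward allocation, in order of input time.
    for name, (t_in, t_out) in sorted(lifetimes.items(), key=lambda kv: kv[1][0]):
        if not run_is_free(t_in + 1, 0, t_out):
            raise RuntimeError(f"forward conflict for {name}")
        next_cycle = place_run(name, t_in + 1, 0, t_out)
        if next_cycle <= t_out:
            pending.append((next_cycle, name))

    # Steps 5-6: backward allocation, earliest pending variable first.
    while pending:
        pending.sort()
        cycle, name = pending.pop(0)
        t_out = lifetimes[name][1]
        for reg in range(num_regs - 1, -1, -1):
            if run_is_free(cycle, reg, t_out):
                next_cycle = place_run(name, cycle, reg, t_out)
                if next_cycle <= t_out:
                    pending.append((next_cycle, name))
                break
        else:
            raise RuntimeError(f"no free register for {name} at cycle {cycle}")
    return schedule


lifetimes = {"a": (0, 4), "b": (1, 7), "c": (2, 10), "d": (3, 5), "e": (4, 8),
             "f": (5, 11), "g": (6, 6), "h": (7, 9), "i": (8, 12)}
schedule = forward_backward_allocate(lifetimes, num_regs=4, period=9)
for name, slots in schedule.items():
    print(name, " ".join(f"t{c}:R{r + 1}" for c, r in slots))
```

Because the routine is greedy, it raises an error instead of backtracking when no free register is found; for the 3×3 transposer with 4 registers and period 9 every backward choice happens to be forced.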


Register Allocation for Matrix Transpose Example



Procedures of Register Minimization in Folded Architectures

Steps:

Step 1: Perform retiming for folding

Step 2: Write the folding equations

Step 3: Use the folding equations to construct a lifetime table

Step 4: Draw the lifetime chart and determine the required number of registers

Step 5: Perform forward-backward register allocation

Step 6: Draw the folded architecture that uses the minimum number of registers


Folding Architecture Example


Folded Architecture for Matrix Transpose Example


Biquad Filter Example (1/4)

Step 1: Retiming

Invalid folding before retiming:

DF(1→2) = -3

DF(6→4) = -4

DF(8→4) = -3

DF(7→3) = -3


Biquad Filter Example (2/4)

Step 2: Folding Equations

DF(U→V) = Nw(e) – Pu + v - u

DF(1→2) = 4(1) – 1 + 1 – 3 = 1

DF(1→5) = 4(1) – 1 + 0 – 3 = 0

DF(1→6) = 4(1) – 1 + 2 – 3 = 2

DF(1→7) = 4(1) – 1 + 3 – 3 = 3

DF(1→8) = 4(2) – 1 + 1 – 3 = 5

DF(3→1) = 4(0) – 1 + 3 – 2 = 0

DF(4→2) = 4(0) – 1 + 1 – 0 = 0

DF(5→3) = 4(0) – 2 + 2 – 0 = 0

DF(6→4) = 4(1) – 2 + 0 – 2 = 0

DF(7→3) = 4(1) – 2 + 2 – 3 = 1

DF(8→4) = 4(1) – 2 + 0 – 1 = 1

Step 3: Construct the lifetime table

Tinput = u + Pu

Toutput = u + Pu + maxv{DF(U→V) }
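Steps 2 and 3 can be reproduced mechanically. The sketch below is an illustration under stated assumptions: the edge delays are the w(e) values that appear in the folding equations of the retimed biquad, the folding orders come from S1 = {4, 2, 3, 1} and S2 = {5, 8, 6, 7}, adders have 1 pipeline stage and multipliers 2, and a variable is taken to occupy a register in cycles Tinput + 1 through Toutput. The variable names are hypothetical.

```python
from collections import Counter

N = 4
S1 = [4, 2, 3, 1]            # adders, P = 1
S2 = [5, 8, 6, 7]            # multipliers, P = 2
order = {node: pos for s in (S1, S2) for pos, node in enumerate(s)}
stages = {**{n: 1 for n in S1}, **{n: 2 for n in S2}}

# Delays w(e) on each edge of the retimed biquad.
w = {(1, 2): 1, (1, 5): 1, (1, 6): 1, (1, 7): 1, (1, 8): 2, (3, 1): 0,
     (4, 2): 0, (5, 3): 0, (6, 4): 1, (7, 3): 1, (8, 4): 1}

# Step 2: folding equations D_F(U -> V) = N*w(e) - P_U + v - u.
D_F = {(U, V): N * d - stages[U] + order[V] - order[U]
       for (U, V), d in w.items()}

# Step 3: lifetime table, for nodes that drive at least one edge.
lifetimes = {}
for U in sorted(order):
    delays = [d for (src, _), d in D_F.items() if src == U]
    if delays:
        t_in = order[U] + stages[U]
        lifetimes[U] = (t_in, t_in + max(delays))

# Circular lifetime count with period N.
live = Counter()
for t_in, t_out in lifetimes.values():
    for cycle in range(t_in + 1, t_out + 1):
        live[cycle % N] += 1
min_registers = max(live.values())

for node, (t_in, t_out) in lifetimes.items():
    print(f"node {node}: {t_in} -> {t_out}")
print("registers needed:", min_registers)
```

Only nodes 1, 7 and 8 have a nonzero lifetime, and node 1 alone stays alive for five cycles, longer than the period, so two of its values overlap in the circular chart.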


Biquad Filter Example (3/4)

Step 4: Draw the Lifetime Chart

The minimum number of registers is 2.

Step 5: Register Allocation

Folding Factor = 4


Biquad Filter Example (4/4)

Step 6: Folded Architecture


IIR Filter Example (1/4)

Step 1: Retiming

Invalid folding before retiming:

DF(3→1) = -3

DF(4→1) = -2


IIR Filter Example (2/4)

Step 2: Folding Equations

DF(U→V) = Nw(e) – Pu + v - u

DF(1→2) = 0

DF(2→3) = 5

DF(2→4) = 2

DF(3→1) = 1

DF(4→1) = 0

Step 3: Construct the lifetime table

Tinput = u + Pu

Toutput = u + Pu + maxv{DF(U→V) }


IIR Filter Example (3/4)

Step 4: Draw the Lifetime Chart

The minimum number of registers is 3.

Step 5: Register Allocation

Folding Factor = 2


IIR Filter Example (4/4)

Step 6: Folded Architecture


Conclusions

Presented a systematic transformation for designing time-multiplexed architectures

Explored folding techniques to reduce the number of functional units

Explored register minimization techniques to reduce the number of registers


References

K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley, 1999.

S. Y. Huang, Handout of textbook, 2004.