16 Tap FIR Filter

Omar F. Mousa / Chintan Daisa. Professor: Scott Wakefield.

Page 1: 16 Tap FIR Filter


Page 2: 16 Tap FIR Filter

Design Objectives

• Keep register-based storage of the 16 latest input values and the 16 impulse-response coefficients on-chip.

• Use a clocked architecture to synchronize input and output values.

• Reduce the number of multipliers and adders needed, that is, optimize area, power and cost.

• Achieve the above without compromising speed.

Page 3: 16 Tap FIR Filter

Design Objectives

• Future scalability for input data as well as coefficient bits.

• Signed or unsigned input data as well as coefficients.

• Fast MAC operation on signed or unsigned data, with future scalability.

• Synchronization of input/output data.

• Configurable output precision.

Page 4: 16 Tap FIR Filter

Design Objectives

• 16 taps of delay line.

• 8 bits of input/output bit resolution.

• Burst mode of data transfer at the input, supporting 32 elements of the desired resolution in one burst.

Main issues of concern when designing an FIR filter:

• Sharp response

• Number of taps

• Numerical precision

• Fully parallel

Page 5: 16 Tap FIR Filter

Advantages and Disadvantages

• Advantages:

– Always stable (assuming a non-recursive implementation).

– Quantization noise is not much of a problem.

– Transients have a finite duration.

• Disadvantages:

– A high-order filter is generally needed to satisfy the stated specification, so more coefficients are needed, with more storage and computation.

Page 6: 16 Tap FIR Filter

Review of discrete-time systems

Linear time-invariant (LTI) systems

Causal systems:

for all input x[k]=0, k<0 -> output y[k]=0, k<0

Impulse response:

input 1,0,0,0,... -> output h[0],h[1],h[2],h[3],...

input x[0],x[1],x[2],x[3] -> output y[0],y[1],y[2],y[3],...

[Figure: block diagram, input x[k] -> system -> output y[k]]

With the input written as u[k], the output is the convolution of the input with the impulse response:

$y[k] = \sum_i u[i]\,h[k-i] = u[k] * h[k]$
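The convolution sum on this slide can be sketched in a few lines of Python. This is only an illustration for finite sequences that start at index 0; the function name `convolve` is an assumption of the sketch, not something from the slides.

```python
def convolve(u, h):
    """Return y[k] = sum_i u[i] * h[k - i] for finite sequences u and h."""
    y = [0] * (len(u) + len(h) - 1)
    for i, ui in enumerate(u):
        for j, hj in enumerate(h):
            # the term u[i] * h[j] contributes to output index k = i + j
            y[i + j] += ui * hj
    return y


# Feeding the unit impulse 1,0,0,0 returns the impulse response itself.
print(convolve([1, 0, 0, 0], [3, 2, 1]))
```

Feeding an impulse reproduces h[0], h[1], h[2] followed by zeros, which is the definition of the impulse response given above.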

Page 7: 16 Tap FIR Filter

Overview

FIR filter equation

y[n] = x[n] * h[n]

where * denotes convolution, n is the sample index, and h[n] holds the "taps" (coefficients) of the FIR filter.

For a 16-tap FIR filter

$y[n] = a_0 x[n] + a_1 x[n-1] + a_2 x[n-2] + a_3 x[n-3] + \dots + a_{15} x[n-15]$
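As a behavioural sketch of the 16-tap equation, the following Python model keeps the 16 most recent inputs in a delay line and forms the weighted sum on every sample. The class name `Fir16` and its `step` method are hypothetical names chosen for this illustration; the slides do not define a software interface.

```python
from collections import deque


class Fir16:
    """Behavioural model of y[n] = a0*x[n] + a1*x[n-1] + ... + a15*x[n-15]."""

    TAPS = 16

    def __init__(self, coeffs):
        if len(coeffs) != self.TAPS:
            raise ValueError("expected exactly 16 coefficients")
        self.coeffs = list(coeffs)
        # delay[i] holds x[n-i]; the line starts cleared, like a reset filter
        self.delay = deque([0] * self.TAPS, maxlen=self.TAPS)

    def step(self, x):
        """Shift in one input sample and return the matching output sample."""
        self.delay.appendleft(x)  # the oldest sample drops off the far end
        return sum(a * d for a, d in zip(self.coeffs, self.delay))


fir = Fir16(range(1, 17))
print([fir.step(x) for x in [1] + [0] * 16])
```

Driving the model with a single 1 followed by zeros walks the impulse through the delay line, so the outputs are the coefficients in order and then 0.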

Page 8: 16 Tap FIR Filter

Different Filter Representations

Difference equation

$y[k] = \tfrac{1}{2}\,y[k-1] + \tfrac{1}{8}\,y[k-2] + x[k]$

Recursive computation needs y[-1] and y[-2]. For the filter to be LTI, y[-1] = 0 and y[-2] = 0.

Transfer function (assumes LTI system)

$Y(z) = \tfrac{1}{2}\,z^{-1}Y(z) + \tfrac{1}{8}\,z^{-2}Y(z) + X(z)$

$H(z) = \dfrac{Y(z)}{X(z)} = \dfrac{1}{1 - \tfrac{1}{2}z^{-1} - \tfrac{1}{8}z^{-2}}$

Block diagram representation

[Figure: x[k] enters an adder producing y[k]; two cascaded unit delays produce y[k-1] and y[k-2], which are fed back through gains 1/2 and 1/8]
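A small sketch of the recursive difference equation on this slide, y[k] = 1/2 y[k-1] + 1/8 y[k-2] + x[k], with the zero initial conditions y[-1] = y[-2] = 0 that the slide requires for LTI behaviour. Exact fractions are used so the arithmetic is not blurred by rounding; the function name is an assumption of the sketch.

```python
from fractions import Fraction


def recursive_filter(x):
    """Run y[k] = 1/2*y[k-1] + 1/8*y[k-2] + x[k] with y[-1] = y[-2] = 0."""
    y_prev1 = Fraction(0)  # y[k-1]
    y_prev2 = Fraction(0)  # y[k-2]
    out = []
    for xk in x:
        yk = Fraction(1, 2) * y_prev1 + Fraction(1, 8) * y_prev2 + xk
        out.append(yk)
        y_prev2, y_prev1 = y_prev1, yk
    return out


print(recursive_filter([1, 0, 0, 0]))
```

Unlike the FIR case, the impulse response here never becomes exactly zero, which is why the two feedback values must be known before the first output can be computed.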

Page 9: 16 Tap FIR Filter

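Discrete-Time Systems

Z-Transform:

$H(z) = \sum_i h[i]\,z^{-i}, \qquad Y(z) = \sum_i y[i]\,z^{-i}, \qquad U(z) = \sum_i u[i]\,z^{-i}$

Example with a 3-coefficient impulse response and 4 input samples:

$\begin{bmatrix} y[0]\\ y[1]\\ y[2]\\ y[3]\\ y[4]\\ y[5] \end{bmatrix} = \begin{bmatrix} h[0] & 0 & 0 & 0\\ h[1] & h[0] & 0 & 0\\ h[2] & h[1] & h[0] & 0\\ 0 & h[2] & h[1] & h[0]\\ 0 & 0 & h[2] & h[1]\\ 0 & 0 & 0 & h[2] \end{bmatrix} \begin{bmatrix} u[0]\\ u[1]\\ u[2]\\ u[3] \end{bmatrix}$

Multiplying both sides from the left by $\begin{bmatrix} 1 & z^{-1} & z^{-2} & \dots & z^{-5} \end{bmatrix}$ gives

$Y(z) = H(z)\,\begin{bmatrix} 1 & z^{-1} & z^{-2} & z^{-3} \end{bmatrix} \begin{bmatrix} u[0]\\ u[1]\\ u[2]\\ u[3] \end{bmatrix}$

$Y(z) = H(z)\,U(z)$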
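The matrix form on this slide (a banded matrix of shifted copies of h[0], h[1], h[2] applied to the input vector) can be checked numerically. The sketch below builds that matrix for any finite h and multiplies it by the input; the helper names are illustrative assumptions.

```python
def toeplitz_rows(h, n_inputs):
    """Build the (len(h)+n_inputs-1) x n_inputs matrix with entry h[r-c]."""
    n_rows = len(h) + n_inputs - 1
    return [
        [h[r - c] if 0 <= r - c < len(h) else 0 for c in range(n_inputs)]
        for r in range(n_rows)
    ]


def filter_via_matrix(h, u):
    """Compute the output samples as the matrix-vector product T(h) * u."""
    return [sum(a * b for a, b in zip(row, u)) for row in toeplitz_rows(h, len(u))]


h = [1, 2, 3]      # h[0], h[1], h[2]
u = [1, 1, 1, 1]   # u[0] .. u[3]
print(filter_via_matrix(h, u))
```

Each output sample is one row of the matrix times the input vector, which is the same sum that the product H(z)U(z) collects for each power of z.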

Page 10: 16 Tap FIR Filter

Discrete-Time Systems

'Popular' frequency responses for filter design:

low-pass (LP), high-pass (HP), band-pass (BP), band-stop, multi-band, ...

Page 11: 16 Tap FIR Filter

Digital Filter Specifications

For example, the magnitude response $|G(e^{j\omega})|$ of a digital lowpass filter may be given as indicated below.

[Figure: magnitude response specification of a digital lowpass filter]

Page 12: 16 Tap FIR Filter

Structured Streams

Hierarchical structures:

– Pipeline

– SplitJoin

– Feedback Loop

Page 13: 16 Tap FIR Filter

Different Strategies

Map filter per tile and run forever

Pros:

– No filter swapping overhead

– Reduced memory traffic

– Localized communication

– Tighter latencies

– Smaller live data set

Cons:

– Load balancing is critical

– Not good for dynamic behavior

– Requires # filters ≤ # processing elements

Page 14: 16 Tap FIR Filter

Discrete-Time Systems

'FIR filters' (finite impulse response):

$H(z) = \dfrac{B(z)}{z^N} = b_0 + b_1 z^{-1} + \dots + b_N z^{-N}$

• Moving average filters (MA)

• N poles at the origin z=0 (hence guaranteed stability)

• N zeros (zeros of B(z)), 'all zero' filters

• Corresponds to the difference equation

$y[k] = b_0\,u[k] + b_1\,u[k-1] + \dots + b_N\,u[k-N]$

• Impulse response

$h[0] = b_0,\; h[1] = b_1,\; \dots,\; h[N] = b_N,\; h[N+1] = 0,\; \dots$

Page 15: 16 Tap FIR Filter

Speeding Up FIR Filter

FIR speed-up

y(0) = c(0)x(0) + c(1)x(-1) + c(2)x(-2) + . . . + c(N-1)x(1-N);

y(1) = c(0)x(1) + c(1)x(0) + c(2)x(-1) + . . . + c(N-1)x(2-N);

y(2) = c(0)x(2) + c(1)x(1) + c(2)x(0) + . . . + c(N-1)x(3-N);

. . .

y(n) = c(0)x(n) + c(1)x(n-1) + c(2)x(n-2) + . . . + c(N-1)x(n-(N-1));

• Run the MAC at double frequency, read two 32-bit numbers

• FIR filtering: two outputs in parallel

• Two outputs = 4N reads, 2N MACs, 2 writes

Page 16: 16 Tap FIR Filter

Direct Form Realization

[Figure: direct-form signal-flow graph. The input u[k] feeds a delay line producing u[k-1], u[k-2], u[k-3], u[k-4]; each tap is multiplied by b0 ... b4 and the products are summed by a chain of adders to give y[k]]

$y[k] = b_0\,u[k] + b_1\,u[k-1] + \dots + b_N\,u[k-N]$

$T_{Critical} = T_M + T_A\,(N-1)$

$T_{Clock} \ge T_{Critical}$, where N is the number of taps

Page 17: 16 Tap FIR Filter

Retiming FIR Filter Realizations

• Select a subgraph (shaded)

• Remove a delay element on all inbound arrows

• Add a delay element on all outbound arrows

[Figure: direct-form graph with delay line u[k] ... u[k-4], multipliers b0 ... b4 and adder chain producing y[k], with the retimed subgraph shaded]

Page 18: 16 Tap FIR Filter

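Retiming

[Figure: retimed graph. Taps u[k] and u[k-1] with multipliers b0, b1 and the adders producing y[k]; taps u[k-2] and u[k-3] with multipliers b2, b3, b4 and their adders after the delay has been moved]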

Page 19: 16 Tap FIR Filter

Four Tap Direct Form Realization

[Figure: delay line u[k], u[k-1], u[k-2], u[k-3]; multipliers b0 ... b3; products combined by an adder tree producing y[k]]

$y[k] = b_0\,u[k] + b_1\,u[k-1] + b_2\,u[k-2] + b_3\,u[k-3]$

$T_{Critical} = T_M + T_A \log(N)$

$T_{Clock} \ge T_{Critical}$, where N is the number of taps
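To compare the two direct-form timing expressions, the sketch below evaluates the critical path for an adder chain and for an adder tree. It assumes the chain needs N-1 adder delays and the tree needs ceil(log2 N) adder delays after one multiplier delay; the delay values 4 and 1 are made-up units for illustration only, not figures from the design.

```python
import math


def critical_path_chain(t_mult, t_add, n_taps):
    """Direct form with a linear adder chain: one multiply, then N-1 adds."""
    return t_mult + (n_taps - 1) * t_add


def critical_path_tree(t_mult, t_add, n_taps):
    """Direct form with a balanced adder tree: one multiply, then log2(N) levels."""
    return t_mult + math.ceil(math.log2(n_taps)) * t_add


# Hypothetical unit delays: multiplier = 4, adder = 1, for a 16-tap filter.
print(critical_path_chain(4, 1, 16), critical_path_tree(4, 1, 16))
```

With these assumed delays the chain costs 19 units and the tree 8, which shows why the adder arrangement matters as the tap count grows.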

Page 20: 16 Tap FIR Filter

Transposed Direct-Form Realization

[Figure: transposed graph. The input u[k] is broadcast to multipliers b0 ... b4; the products enter a chain of adders separated by delay elements, ending at y[k]]

$y[k] = b_0\,u[k] + b_1\,u[k-1] + \dots + b_N\,u[k-N]$

$T_{Critical} = T_M + T_A$

$T_{Clock} \ge T_{Critical}$, where N is the number of taps
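The transposed structure computes the same outputs as the direct form while placing a register between every pair of adders. The following behavioural sketch models both and can be used to confirm they agree; the function names and the register list `regs` are assumptions made for the illustration.

```python
def direct_form(b, u):
    """y[k] = sum_i b[i] * u[k-i], with u[k] = 0 for k < 0."""
    return [
        sum(b[i] * u[k - i] for i in range(len(b)) if k - i >= 0)
        for k in range(len(u))
    ]


def transposed_form(b, u):
    """Broadcast u[k] to all multipliers; registers sit between the adders."""
    regs = [0] * (len(b) - 1)  # regs[i] feeds the adder of tap i
    out = []
    for uk in u:
        # only one multiply and one add lie between registers and output
        y = b[0] * uk + (regs[0] if regs else 0)
        nxt = []
        for i in range(len(regs)):
            upstream = regs[i + 1] if i + 1 < len(regs) else 0
            nxt.append(b[i + 1] * uk + upstream)
        regs = nxt
        out.append(y)
    return out


b = [1, 2, 3, 4, 5]
u = [1, 0, 2, -1, 3, 0, 0]
print(direct_form(b, u) == transposed_form(b, u))
```

Because every adder output is registered, the longest combinational path is one multiplier plus one adder regardless of the number of taps.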

Page 21: 16 Tap FIR Filter

Lattice Form Realizations

[Figure: two parallel delay-and-add paths over u[k], u[k-1], u[k-2], u[k-3], u[k-4]; one path uses coefficients b0 ... b4 and produces y[k], the other uses the same coefficients in reverse order and produces y~[k]]

Page 22: 16 Tap FIR Filter

FIR Filter Realizations

Lattice Form

[Figure: lattice structure from u[k] through four stages with coefficients k0, k1, k2, k3 (two multipliers and two adders per stage), followed by a multiplier b0, with outputs y[k] and y~[k]]

$y[k] = b_0\,u[k] + b_1\,u[k-1] + \dots + b_N\,u[k-N]$

i.e. different software/hardware, same i/o behavior

Page 23: 16 Tap FIR Filter

Efficient Direct Form Realization

Efficient direct-form realization.

[Figure: signal-flow graph from u[k] to y[k] with pre-adders on the delay line feeding multipliers b0 ... b4, followed by output adders]

Page 24: 16 Tap FIR Filter

Pin Diagram

[Figure: pin diagram of the 16-bit 16-tap FIR filter block. Data input pins x[0] ... x[15], coefficient input pins a[0] ... a[15], control inputs Reset, Coeffin, Din and Clk, supply pins Vdd and Gnd, output pins y[0] ... y[31], and the output signal Drive]

• Synthesis using Synopsys Design Compiler

• Initial target frequency: 100 MHz (typical)

Page 25: 16 Tap FIR Filter

Specifications

Input specifications

• 16-bit unsigned integers for data inputs.

• 16-bit unsigned integers for coefficients.

Output specifications

• 32-bit unsigned integer output.

Page 26: 16 Tap FIR Filter

System Components

• Memory

– Input and coefficient

• Control

– Mod-4 and mod-8 counters

– 3-to-8 decoder

– Combinational logic

• Multiplier

– Radix-8 Booth multiplier

– Multiplier register

• Adder

– 9-bit carry-save adder

– Adder register

• Output register

Page 27: 16 Tap FIR Filter

Specifications

Drive signal (output signal)

• A new output is available.

• Inputs or coefficients are to be applied only when Drive is asserted.

Coefficients

• Any coefficient change implies a new filter definition.

• Input memory is cleared, and new data is to be entered.

Page 28: 16 Tap FIR Filter

Specifications

System clock

• One clock cycle for the filter = 32 input clock pulses.

• One tap cycle = 8 input clock pulses, described as 8 phases.

• 4 such taps for each output.

System reset

• Active high

Page 29: 16 Tap FIR Filter

System Timing

[Figure: timing diagram with the following rows: mod-8 counter states, input or coefficient memory enable, multiplier propagation delay, multiplier register enable, add register enable, output register enable]

Page 30: 16 Tap FIR Filter

System Timing Strategy

• Two-phase clocking

• Generation of internal lower-frequency clocks using mod-4 and mod-8 counters

• Each state of the mod-4 counter is used for computation of one filter tap

• Output is available at the end of one cycle of the mod-4 counter

Page 31: 16 Tap FIR Filter

2-Parallel FIR Filtering Structure

[Figure: inputs x(2k) and x(2k+1) each drive subfilters H0 and H1; the four subfilter outputs are combined by two adders, with a delay D (z^-2) on one branch, to give y(2k) and y(2k+1)]

Page 32: 16 Tap FIR Filter

Hardware-Efficient 2-Parallel FIR Filter

$Y_0 = X_0 H_0 + z^{-2} X_1 H_1$

$Y_1 = X_0 H_1 + X_1 H_0 = (H_0 + H_1)(X_0 + X_1) - H_0 X_0 - H_1 X_1$

[Figure: inputs x(2k) and x(2k+1) feed three subfilters H0, H0+H1 and H1; a pre-adder forms the input of H0+H1, post-adders and a delay D (z^-2) combine the results into y(2k) and y(2k+1)]
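The identity behind this structure can be checked with a short behavioural model: split the input and the coefficients into even and odd phases, run only three half-length subfilters (H0, H1 and H0+H1), and recombine. The sketch assumes even-length input and coefficient lists; all function names are illustrative.

```python
def conv(a, b):
    """Plain convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def fast_fir_2parallel(x, h):
    """Two outputs per block using three subfilters instead of four."""
    x0, x1 = x[0::2], x[1::2]   # even and odd input phases
    h0, h1 = h[0::2], h[1::2]   # even and odd coefficient phases
    p0 = conv(x0, h0)                                   # X0*H0
    p1 = conv(x1, h1)                                   # X1*H1
    p2 = conv([a + b for a, b in zip(x0, x1)],
              [a + b for a, b in zip(h0, h1)])          # (X0+X1)*(H0+H1)
    y_odd = [c - a - b for a, b, c in zip(p0, p1, p2)]  # X0*H1 + X1*H0
    # z^-2 in the sample domain is a one-block delay on X1*H1
    y_even = [a + b for a, b in zip(p0 + [0], [0] + p1)]
    y = []
    for k, even in enumerate(y_even):
        y.append(even)
        y.append(y_odd[k] if k < len(y_odd) else 0)
    return y[:-1]  # drop the padding sample added by the interleave


x = [1, 2, 3, 4, 5, 6]
h = [1, -1, 2, 3]
print(fast_fir_2parallel(x, h) == conv(x, h))
```

The third subfilter replaces two of the original four, at the price of one pre-adder on the inputs and extra post-adders on the outputs.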

Page 33: 16 Tap FIR Filter

Savings in the New Structure

Originally:

– 2N multiplications + 2(N-1) additions for two inputs

In the new structure:

– 3*(N/2) = 1.5N multiplications

– 3(N/2 - 1) + 4 = 1.5N + 1 additions
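Plugging the tap count of this design into the two operation counts makes the trade-off concrete. The sketch below simply evaluates the expressions from the slide for an even N; the function names are assumptions.

```python
def direct_cost(n_taps):
    """(multiplications, additions) per pair of outputs, original structure."""
    return 2 * n_taps, 2 * (n_taps - 1)


def fast_cost(n_taps):
    """(multiplications, additions) per pair of outputs, new structure."""
    half = n_taps // 2
    return 3 * half, 3 * (half - 1) + 4


print(direct_cost(16), fast_cost(16))
```

For N = 16 the multiplier count drops from 32 to 24 and the adder count from 30 to 25, so the saving is concentrated in the multipliers, which are the larger blocks.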

Page 34: 16 Tap FIR Filter

Design Flow FIR 16 Tap Delay

[Figure: design flow chart]

• VHDL design entry, followed by functional verification

• Synthesis (EDIF output), followed by timing verification

• Floor planning (PDEF, SDF)

• Place & route (PDEF, parasitics), followed by physical verification

Page 35: 16 Tap FIR Filter

The FIR Filter

In this implementation of the 16-tap FIR filter, the coefficients are represented as fixed-point 16-bit two's complement numbers. It is assumed that either or both of the coefficients and data are fractional numbers.

Page 36: 16 Tap FIR Filter

FIR Filter (Critical Path)

In order to save area and improve the critical path performance, we decided to add the 12-bit sum and carry results of the multiplier during the accumulation operation. Therefore, the adder has to add three 12-bit numbers. To do that, the first stage of the adder is a 3-to-2 combiner, which is just a CSA (carry-save adder). The next stage is a CPA (carry-propagate adder) arranged in a static Manchester carry chain form. The chain is divided into four sections, each with three carry stages. Buffers are used between sections to reduce the overall delay.
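The two-stage accumulation described here (a 3-to-2 carry-save stage followed by one carry-propagate addition) can be modelled at the bit level. This is a behavioural sketch only: it uses Python integers for the 12-bit words and an ordinary `+` in place of the Manchester carry chain, and the names are assumptions.

```python
WIDTH = 12
MASK = (1 << WIDTH) - 1


def carry_save_add(a, b, c):
    """3-to-2 combiner: return (sum_word, carry_word) with no carry ripple."""
    sum_word = a ^ b ^ c                             # per-bit parity
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # per-bit majority, weight 2
    return sum_word, carry_word


def accumulate(acc, mult_sum, mult_carry):
    """Add the multiplier's sum and carry words to the accumulator."""
    s, c = carry_save_add(acc, mult_sum, mult_carry)
    return (s + c) & MASK  # the single carry-propagate addition, 12 bits kept


print(accumulate(100, 37, 58))
```

Only the final addition propagates carries across the word, which is why feeding the multiplier's unresolved sum and carry into the accumulator shortens the critical path.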

Page 37: 16 Tap FIR Filter

Survey of Multipliers

Combinational multiplier: uses n adders, eliminates registers.

Page 38: 16 Tap FIR Filter

Multiplier Design

4×4 multiplication

                        X3    X2    X1    X0      multiplicand
                        Y3    Y2    Y1    Y0      multiplier
                        X3Y0  X2Y0  X1Y0  X0Y0
                  X3Y1  X2Y1  X1Y1  X0Y1                partial
            X3Y2  X2Y2  X1Y2  X0Y2                      products (P.P.)
      X3Y3  X2Y3  X1Y3  X0Y3
Z7    Z6    Z5    Z4    Z3    Z2    Z1    Z0      result

Page 39: 16 Tap FIR Filter

Radix-2 Unsigned Multiplication

Use a single n-bit adder, three registers (P, A, B), and a testing circuit for A0.

Initialization: place the unsigned numbers in registers A and B. Set P to zero.

1. If A0 is 1, then register B, containing b(n-1) b(n-2) ... b0, is added to P; otherwise 00...00 (nothing) is added to P. The sum is placed back into P.

2. Shift register pair (P, A) one bit right. The last bit of A is shifted out (not used).
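The two steps above can be sketched as a register-level model. Python integers stand in for the registers P, A and B, and the carry out of the n-bit adder is kept implicitly in P until the shift; the function name is an assumption.

```python
def shift_add_multiply(a, b, n):
    """Multiply two unsigned n-bit numbers with n add-and-shift iterations."""
    p = 0
    for _ in range(n):
        if a & 1:          # step 1: test A0 and conditionally add B to P
            p += b
        # step 2: shift the pair (P, A) right by one bit;
        # the low bit of P moves into the top bit of A
        a = (a >> 1) | ((p & 1) << (n - 1))
        p >>= 1
    return (p << n) | a    # the 2n-bit product sits in P (high) and A (low)


print(shift_add_multiply(6, 9, 4))
```

After n iterations every original bit of A has been examined and shifted out, and the register pair holds the full double-width product.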

Page 40: 16 Tap FIR Filter

Array Multiplier

• An array multiplier is an efficient layout of a combinational multiplier.

• Array multipliers may be pipelined to decrease the clock period at the expense of latency.

Page 41: 16 Tap FIR Filter

Array Multiplier Organization

            0 1 1 0      multiplicand
          x 1 0 0 1      multiplier
            0 1 1 0
    +     0 0 0 0
          0 0 1 1 0
    +   0 0 0 0
        0 0 0 1 1 0
    + 0 1 1 0
      0 1 1 0 1 1 0      product

Skew the array for a rectangular layout.

Page 42: 16 Tap FIR Filter

Unsigned Array Multiplier

[Figure: array of adder cells. The first row forms x0y0, x1y0, x2y0 ... xny0; later rows add x0y1, x1y1 ..., then x0y2, x1y2 ..., with 0 fed to the unused carry inputs; the outputs are P(2n-1), P(2n-2) ... P0]

Page 43: 16 Tap FIR Filter

Array Multiplier Organization

[Figure: array multiplier cell built around a full adder (FA) with inputs Xi, Yi, Pin, Cin and outputs Pout, Cout; the critical path runs through N-1 partial products and M-1 columns]

$t_{mult} \approx (M-1)\,t_{carry} + (N-1)\,t_{sum} + t_{and}$

• For a small t_mult, both t_carry and t_sum matter.

• It is beneficial to make t_carry = t_sum.

• Differential logic (DCVS)

Page 44: 16 Tap FIR Filter

Architecture of Array Multiplier

[Figure: 4×4 array multiplier with inputs X3 X2 X1 X0 and Y0 ... Y3, rows of AND gates and adders with half adders (HA) at the row ends, and outputs Z7 ... Z0]

Page 45: 16 Tap FIR Filter

Advantages of Array Multiplier

Array multipliers

– Partial product generation and accumulation are merged

– Identical cells

– High-rate pipelining

[Figure: 5×5 array with operands a4 ... a0 and x4 ... x0; column k collects the partial products ai·xj with i + j = k, from a0x0 in column 0 up to a4x4 in column 8, producing p0 ... p9]

Page 46: 16 Tap FIR Filter

Array Multiplier

– Array multiplier for unsigned numbers

[Figure: 5×5 unsigned array multiplier. The first row combines a4x0 ... a0x0 with a3x1 ... a0x1 and zero carry inputs; each following row adds the partial products for x2, x3 and x4; the outputs are p9 ... p0]

Page 47: 16 Tap FIR Filter

Array Multiplier for Two's Complement

• Type I cell

– ordinary full adder

• Type II cell (input z has weight -1)

– x + y - z = 2c - s

– s = (x + y - z) mod 2

– c = [(x + y - z) + s] / 2

– a type I cell with inverted z and s: z = 1 - z', s = 1 - s'

Truth table of the type II cell:

x  y  z  |  c  s
0  0  0  |  0  0
0  0  1  |  0  1
0  1  0  |  1  1
0  1  1  |  0  0
1  0  0  |  1  1
1  0  1  |  0  0
1  1  0  |  1  0
1  1  1  |  1  1
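The claim that a type II cell is an ordinary full adder with inverted z and s can be checked exhaustively. The sketch below computes the cell both ways; the function names are illustrative.

```python
from itertools import product


def type2_cell(x, y, z):
    """Direct definition: x + y - z = 2c - s."""
    total = x + y - z
    s = total % 2           # Python's % is non-negative, so -1 % 2 == 1
    c = (total + s) // 2
    return c, s


def type2_from_full_adder(x, y, z):
    """Type I cell (full adder) fed with inverted z, with its sum output inverted."""
    z_inv = 1 - z
    t = x + y + z_inv
    c, s_inv = t // 2, t % 2
    return c, 1 - s_inv


for bits in product((0, 1), repeat=3):
    print(bits, type2_cell(*bits), type2_from_full_adder(*bits))
```

Both columns agree for all eight input combinations, so the two's complement array needs no new cell design, only two inverters around the existing full adder.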

Page 48: 16 Tap FIR Filter

Array Multiplier for Two's Complement

• Type II' cell:

– -x - y + z = -2c + s

– equivalently x + y - z = 2c - s, so it is identical to the type II cell

[Figure: type II' cell with inputs x, y, z and outputs c, s; weights -2 and -1 marked on the cell]

Page 49: 16 Tap FIR Filter

Architecture of Carry-Save Multiplier

Carry-save multiplier

• Carry propagation: diagonally downwards instead of to the left.

• Requires an additional adder (vector-merging adder).

• This final adder can be made very fast using a CLA or CSA scheme.

[Figure: 4×4 carry-save multiplier compared with a ripple-carry based multiplier]

Page 50: 16 Tap FIR Filter

Architecture of Carry-Save Multiplier

Carry-save multiplier (4×4)

[Figure: critical path through the carry-save array and the vector-merging adder]

$t_{mult} = (N-1)\,t_{carry} + t_{and} + t_{vma}$

Page 51: 16 Tap FIR Filter

Baugh-Wooley Multiplier

• Algorithm for two's-complement multiplication.

• Adjusts partial products to maximize the regularity of the multiplication array.

• Moves partial products with negative signs to the last steps; also adds the negation of partial products rather than subtracting.

Page 52: 16 Tap FIR Filter

Serial-Parallel Multiplier

• Used in serial-arithmetic operations.

• The multiplicand can be held in place by a register.

• The multiplier is shifted into the array.

Page 53: 16 Tap FIR Filter

Serial-Parallel Multiplier

Serial multiplier

[Figure: serial inputs X and Y enter through gates G1 and G2 into a full adder with carry feedback (Co to Ci) through a delay element (flip-flop); the sum S feeds N-1 stages and a serial-to-parallel register with reset. The result has M+N bits and takes M*N cycles]

Page 54: 16 Tap FIR Filter

Serial-Parallel Multiplier

[Figure: multiplier bits Y0, Y1, Y2 ... Yn-1 held in parallel while X is shifted in serially]

Page 55: 16 Tap FIR Filter

Serial-Parallel Multiplier

[Figure: 4×4 example with operands X3 X2 X1 X0 and Y0 ... Y3. Row j holds the partial products X3Yj, X2Yj, X1Yj, X0Yj for j = 0 ... 3, each row shifted one position left of the previous one; the column sums give P7 ... P0]

Page 56: 16 Tap FIR Filter

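Serial-Parallel Multiplier

$X = \sum_{i=0}^{m-1} X_i\,2^i, \qquad Y = \sum_{j=0}^{n-1} Y_j\,2^j$

$P = X \times Y = \left(\sum_{i=0}^{m-1} X_i\,2^i\right)\left(\sum_{j=0}^{n-1} Y_j\,2^j\right) = \sum_{i=0}^{m-1}\sum_{j=0}^{n-1} X_i Y_j\,2^{i+j} = \sum_{k=0}^{m+n-1} P_k\,2^k$

[Figure: one cell of the multiplier, an adder with inputs Xi, Yi and carry Ci, producing P(i+1) and carry C(i+1)]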
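The product of an m-bit and an n-bit unsigned number is the sum of all single-bit products X_i·Y_j weighted by 2^(i+j). A minimal sketch of that double sum, with an assumed function name, is:

```python
def partial_product_multiply(x, y, m, n):
    """Sum the bit products X_i * Y_j * 2^(i+j) for m-bit x and n-bit y."""
    p = 0
    for i in range(m):
        for j in range(n):
            bit_product = ((x >> i) & 1) & ((y >> j) & 1)  # one AND gate
            p += bit_product << (i + j)                    # weight 2^(i+j)
    return p


print(partial_product_multiply(6, 9, 4, 4))
```

Every multiplier architecture in this survey computes this same double sum; they differ only in how the additions are ordered and shared in hardware.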

Page 57: 16 Tap FIR Filter

The Architecture of the Booth Algorithm

The Booth multiplier

– High-performance, low-power multiplier units are necessary in many situations, such as DSP systems.

Page 58: 16 Tap FIR Filter

Carry Save Addition

[Figure: 8-bit example with inputs X7 ... X0 and Y0, Y1, Y2 ... Y7; rows of full adders (FA) pass carries downwards and a final CLA adder merges the sum and carry vectors]

Page 59: 16 Tap FIR Filter

Booth's Algorithm

Page 60: 16 Tap FIR Filter

Booth Algorithm

With $y_{-1} = 0$:

1st order (radix-2):

$XY = \sum_{i=0}^{n-1} (y_{i-1} - y_i)\,2^{i}\,x$

2nd order (radix-4):

$XY = \sum_{i=0}^{n/2-1} (y_{2i-1} + y_{2i} - 2y_{2i+1})\,2^{2i}\,x$

3rd order (radix-8):

$XY = \sum_{i=0}^{n/3-1} (y_{3i-1} + y_{3i} + 2y_{3i+1} - 4y_{3i+2})\,2^{3i}\,x$

4th order (radix-16):

$XY = \sum_{i=0}^{n/4-1} (y_{4i-1} + y_{4i} + 2y_{4i+1} + 4y_{4i+2} - 8y_{4i+3})\,2^{4i}\,x$

Page 61: 16 Tap FIR Filter

Booth Encoding

• Encode a number by taking groups of 3 bits, where each 3-bit group overlaps the next by 1 bit.

• Consider a multiplier B with (n + 1) bits:

– Pad B with 0 to match the first term.

– If B has an odd number of bits, then extend the sign: $B_n B_n B_{n-1} \dots B_0\,0$

Each group is encoded as (given on the slide for two index alignments)

$E_j = -2B_{i+1} + B_i + B_{i-1}$

Page 62: 16 Tap FIR Filter

Booth Multiplier

• Encoding scheme to reduce the number of stages in multiplication.

• Performs two bits of multiplication at once, so it requires half the stages.

• Each stage is slightly more complex than in a simple multiplier, but an adder/subtracter is almost as small and fast as an adder.

Page 63: 16 Tap FIR Filter

Booth Encoding

Two's-complement form of the multiplier:

– $y = -2^{n} y_n + 2^{n-1} y_{n-1} + 2^{n-2} y_{n-2} + \dots$

Rewrite using $2^{a} = 2^{a+1} - 2^{a}$:

– $y = 2^{n}(y_{n-1} - y_n) + 2^{n-1}(y_{n-2} - y_{n-1}) + 2^{n-2}(y_{n-3} - y_{n-2}) + \dots$

Consider the first two terms: by looking at three bits of y, we can determine whether to add x or 2x (or their negatives) to the partial product.

Page 64: 16 Tap FIR Filter

Booth Actions

y(i)  y(i-1)  y(i-2)  |  increment
 0      0       0     |   0
 0      0       1     |   x
 0      1       0     |   x
 0      1       1     |   2x
 1      0       0     |  -2x
 1      0       1     |  -x
 1      1       0     |  -x
 1      1       1     |   0
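The action table can be turned directly into a radix-4 recoder. The sketch below scans a two's complement multiplier in overlapping 3-bit groups, looks up the increment for each group, and sums the shifted increments. It assumes an even bit width n and an implicit 0 to the right of the least significant bit; the names `BOOTH_ACTION`, `booth_digits` and `booth_multiply` are illustrative.

```python
# (high bit, middle bit, low bit) of each overlapping group -> multiple of x
BOOTH_ACTION = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(y, n):
    """Recode an n-bit two's complement integer y (n even) into radix-4 digits."""
    # bits[0] is the padding 0; bits[k + 1] is bit k of y
    bits = [0] + [(y >> k) & 1 for k in range(n)]
    return [BOOTH_ACTION[(bits[i + 2], bits[i + 1], bits[i])]
            for i in range(0, n, 2)]


def booth_multiply(x, y, n):
    """Sum the selected multiples of x, each weighted by 4^k."""
    return sum(d * x * 4 ** k for k, d in enumerate(booth_digits(y, n)))


print(booth_digits(11, 6), booth_multiply(22, 11, 6))
```

For the 6-bit multiplier 11 the digits are -1, -1, +1 (least significant first), so only three partial products are needed instead of six, and 22 × 11 evaluates to 242.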

Page 65: 16 Tap FIR Filter

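Booth Multiplier

[Figure: Booth multiplier block diagram. The multiplicand bits x0 ... x8 pass through an inverter/shift stage producing x, 2x, -x and -2x; the multiplier bits y0 ... y8 drive a Booth decoder that controls the selectors; the selected partial products are reduced by a Wallace tree and summed by CLA adders]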

Page 66: 16 Tap FIR Filter

Array Multiplier Cell for Booth's Algorithm

[Figure: cell consisting of a MUX that selects one of 0, (A)i, (-A)i, (2A)i, (-2A)i under control of the select input, followed by a full adder with inputs sin and cin and outputs sout and cout]

Page 67: 16 Tap FIR Filter

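Sign Extension Reduction

Sign extension of four partial products with sign bits S0 ... S3:

S0 S0 S0 S0 S0 S0 S0 S0 - - - - - - - -
      S1 S1 S1 S1 S1 S1 - - - - - - - -
            S2 S2 S2 S2 - - - - - - - -
                  S3 S3 - - - - - - - -

$S_0(2^7 + 2^6 + \dots + 2^0) + S_1(2^7 + \dots + 2^2) + S_2(2^7 + \dots + 2^4) + S_3(2^7 + 2^6)$

$= S_0(2^7 + 2^7 - 2^0) + S_1(2^7 + 2^7 - 2^2) + S_2(2^7 + 2^7 - 2^4) + S_3(2^7 + 2^7 - 2^6)$

$\equiv -\left(S_0\,2^0 + S_1\,2^2 + S_2\,2^4 + S_3\,2^6\right) \pmod{2^8}$

so the sign-extension bits can be replaced by the single word

$1\;\overline{S_3}\;1\;\overline{S_2}\;1\;\overline{S_1}\;1\;\overline{S_0} \; + \; 1$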
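The sign-extension reduction can be verified exhaustively for the four-row, 8-column case. The sketch assumes the sign bits S0 ... S3 start at columns 0, 2, 4 and 6 and are replicated up to column 7, and that the replacement is the pattern 1 S3' 1 S2' 1 S1' 1 S0' plus 1, where ' denotes the complemented bit; both the layout and the function names are assumptions of this illustration.

```python
from itertools import product


def sign_extension_sum(s0, s1, s2, s3):
    """Add the replicated sign bits of all four rows, modulo 2^8."""
    total = 0
    for s, low in ((s0, 0), (s1, 2), (s2, 4), (s3, 6)):
        total += s * sum(1 << k for k in range(low, 8))
    return total % 256


def reduced_word(s0, s1, s2, s3):
    """Single replacement word: 1 ~S3 1 ~S2 1 ~S1 1 ~S0, plus 1."""
    inv = lambda s: 1 - s
    pattern = ((1 << 7) | (inv(s3) << 6) | (1 << 5) | (inv(s2) << 4)
               | (1 << 3) | (inv(s1) << 2) | (1 << 1) | inv(s0))
    return (pattern + 1) % 256


print(all(sign_extension_sum(*s) == reduced_word(*s)
          for s in product((0, 1), repeat=4)))
```

All sixteen combinations of sign bits agree, so the triangle of replicated sign bits can be replaced by one inverted sign bit and one constant 1 per row.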

Page 68: 16 Tap FIR Filter

Wallace Tree

• Reduces the depth of the adder chain.

• Built from carry-save adders:

– three inputs a, b, c

– produces two outputs y, z such that y + z = a + b + c

• Carry-save equations:

– y_i = parity(a_i, b_i, c_i)

– z_i = majority(a_i, b_i, c_i)

Page 69: 16 Tap FIR Filter

Wallace Tree Structure

Page 70: 16 Tap FIR Filter

7-bit Wallace Tree Addition

Page 71: 16 Tap FIR Filter

Wallace Tree Operation

• At each stage, i numbers are combined to form ceil(2i/3) sums.

• The final adder completes the summation.

• Wiring is more complex.

• A Booth-encoded Wallace tree multiplier can be built.
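The stage-by-stage reduction can be sketched with word-level carry-save adders. Each stage groups the remaining operands in threes, replaces every group by a sum word and a carry word, and passes leftovers through, so i operands become ceil(2i/3); the final two words go to an ordinary adder. The sketch assumes non-negative integers and the function names are illustrative.

```python
def csa(a, b, c):
    """Carry-save adder: parity gives the sum word, majority the carry word."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_sum(operands):
    """Reduce the operands with CSA stages; return (total, number_of_stages)."""
    rows = list(operands)
    stages = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3       # operands that form full groups
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(*rows[i:i + 3]))    # 3 words in, 2 words out
        nxt.extend(rows[full:])                # leftovers pass to the next stage
        rows = nxt
        stages += 1
    return sum(rows), stages                   # final carry-propagate addition


print(wallace_sum([1, 2, 3, 4, 5, 6]))
```

Six operands shrink to 4, then 3, then 2 words, so three carry-save stages precede the final adder, compared with a chain that would add one operand per stage.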

Page 72: 16 Tap FIR Filter

CSA vs. Wallace Tree

[Figure: six inputs (1 ... 6) summed by a linear chain of four full adders, next to the same inputs summed by four full adders arranged as a tree; both produce a carry output C and a sum output S]

Page 73: 16 Tap FIR Filter

Radix-4 Modified Booth's Algorithm

A = 0 1 0 1 1 0 (22)

X = 0 0 1 0 1 1 (11)

Y (recoded multiplier) = 0 1 0 1 0 1

Partial-product rows followed by the product, as extracted: 1 1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 1 1 0 1 1 0 1 0 0 0 1 1 1 1 0 0 1 0

Page 74: 16 Tap FIR Filter

Wallace-Tree

[Figure: inputs y0 ... y5 summed by a chain of four full adders with carries Ci-1 in and Ci out, next to the same inputs summed by a tree of four full adders, producing C and S]

Collapse the chain of FAs for y0-y5 (5 adder delays) into a Wallace tree (4 adder delays).

Page 75: 16 Tap FIR Filter

Floor Plan of Multiplier

1) Square floor plan

[Figure: square array with X (X3 X2 X1 X0) along one edge and Y (Y0 ... Y3) along the other; outputs Z0 to Z3 leave on one side and Z7 to Z4 on another]

Page 76: 16 Tap FIR Filter

Floor Plan of Multiplier

In the actual datapath

[Figure: multiplier placed in the datapath with operands x and Y, their LSB and MSB sides marked, and the labels M1, M2 or M3]

Page 77: 16 Tap FIR Filter

Floor Plan

[Figure: chip floor plan with the following blocks: Adder, Add Reg, Out Reg, Multiplier, Multiplier Reg, Control Block, Coefficient Memory, Input Memory, Routing]

Page 78: 16 Tap FIR Filter

Floor Planning

Page 79: 16 Tap FIR Filter

Results

• Number of ports: 34

• Number of nets: 157

• Number of cells: 32

• Combinational area: 24286.050781

• Non-combinational area: 14935.535156

• Total area: 39221.585938

Page 80: 16 Tap FIR Filter

Power Consumption & Area

• Cell internal power = 419.5078 uW (57%)

• Net switching power = 315.0848 uW (43%)

• Total dynamic power = 734.5925 uW (100%)

• Cell leakage power = 248.1773 nW

Page 81: 16 Tap FIR Filter

Main Module

Page 82: 16 Tap FIR Filter

Booth Multiplier

Page 83: 16 Tap FIR Filter

Core Module

Page 84: 16 Tap FIR Filter

Controller Module

Page 85: 16 Tap FIR Filter

Conclusion

• Good design experience.

• Using the parallel FIR filter realization reduced the number of multipliers and adders needed; therefore area shrank and power consumption was lowered.

• Timing strategies: using non-blocking assignments in Verilog reduced the number of states needed for implementation.

• Partitioning the design into submodules made the design more manageable and better optimized.

• Performance optimization was reached with slack time equal to +9.54.

Performance Optimization was reached Performance Optimization was reached with slack time equal to +9.54.with slack time equal to +9.54.