Transcript
Page 1: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Fast Fourier Transform:VLSI Architectures

Lecture 10

6.973 Communication System Design – Spring 2006

Massachusetts Institute of Technology

Vladimir Stojanović

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

Page 2: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Pipelined FFT architectures

� Examples

� Radix-2 � multi-path delay commutator � single-path delay feedback

� Radix-4 � single-path delay feedback � multi-path delay commutator � single-path delay commutator

6.973 Communication System Design 2

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

Figure by MIT OpenCourseWare.

(1) . R2MDC(N-16)

(2) . R25DF(N-16)

(3) . R4SDF(N-256)

(4) . R4MDC(N-256)

(5) . R4SDC(N-256)

1

BF4

2B4E8F

3X64 3X16

BF4 BF4

3X4

BF4

3X1

C2 C2 C2 C2BF2 BF2 BF2 BF2j

8 4

4

2

2 1

4848

1921921281286464

16163232

BF4BF4C4C4

1212

4848 32 321616

4488

BF4BF4C4C4

33

1212 8 844

1122

BF4BF4C4C4

332211C4C4 BF4BF4

BF4BF4BF4BF4DC6X64DC6X64 BF4BF4 BF4BF4DC6X16DC6X16 DC6X4DC6X4 DC6X1DC6X1

jBF2

8

BF2

4

BF2

1

BF2

2

Page 3: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Radix-2 Multi-path Delay Commutator

� The most classical approach for pipelineimplementation of radix-2 FFT

� Input sequence broken into two parallel datastreams flowing forward with correct “distance”between data elements entering the butterflyscheduled by proper delays

� Both butterflies and multipliers are in 50%utilization

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

6.973 Communication System Design 3

C2 C2 C2 C2BF2 BF2 BF2 BF2j

8 4

4

2

2

1

1

Figure by MIT OpenCourseWare.

Page 4: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Radix-2 Single-path Delay Feedback

� Uses registers more efficiently � Both as input and the output of the butterfly

� A single data stream goes through the multiplier at every stage

� Multiplier utilization is also 50%

6.973 Communication System Design 4

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

BF2

8

BF2

4

BF2

1

BF2

2

j

Figure by MIT OpenCourseWare.

[Wold & Despain ‘84]

Page 5: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Radix-4 Single-path Delay Feedback[Despain74]

� Utilization of multipliers 75% � By storing 3 BF4 outputs

� Radix-4 butterfly utilization only 25% � Butterfly fairly complicated

� At least 8 complex adders

6.973 Communication System Design 5

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

3X4 3X1

BF4BF4

N4

x(n+ ) N4

y(n+ )

N2

y(n+ )

3N4

y(n+ )

x(n) y(n)

N2

x(n+ )

3N4

x(n+ )

WN0

WNn

WN2n

WN3n

-j-1

-1

-1

-1j

-j

DFT4

DFT4

DFT4

DFT4

DFT4

X0x0

X12

x12

X1

x8X13

X14

x4

X2

X15x15

X3

Figure by MIT OpenCourseWare.

Figure by MIT OpenCourseWare.

Figure by MIT OpenCourseWare.

Figure by MIT OpenCourseWare.

Page 6: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Radix-4 Multi-path Delay Commutator[Swartzlander84]

� What is the utilization of � Butterflies? � Multipliers?

6.973 Communication System Design 6

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

+++

1284

BF4 BF41

2

3

C4123

C4

DFT4

DFT4

DFT4

DFT4

DFT4

DFT4

X0x0

X12

x12

X1

x8X13

X14

x4

X2

X15x15

X3

N4

x(n+ ) N4

y(n+ )

N2

y(n+ )

3N4

y(n+ )

x(n) y(n)

N2

x(n+ )

3N4

x(n+ )

WN0

WNn

WN2n

WN3n

-j-1

-1

-1

-1j

-j

Figure by MIT OpenCourseWare.Figure by MIT OpenCourseWare.

Figure by MIT OpenCourseWare.

Page 7: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Radix-4 Single-path Delay Commutator[Bi & Jones ‘89]

� Modified radix-4 algorithm � Programmable ¼ radix-4 BF � 75% utilization � Used to build one of the largest single-chip

FFTs (8192pts) [Bidet’95]

6.973 Communication System Design 7

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

DFT4

DFT4

DFT4

DFT4

DFT4

DFT4

X0x0

X12

x12

X1

x8X13

X14

x4

X2

X15x15

X3

N4

x(n+ ) N4

y(n+ )

N2

y(n+ )

3N4

y(n+ )

x(n) y(n)

N2

x(n+ )

3N4

x(n+ )

WN0

WNn

WN2n

WN3n

-j-1

-1

-1

-1j

-j

commutator butterflyelement

commutator butterflyelement

coefficient

c1 c2 c3 c4 c5 c6

input

stage 1 stage 2

Figure by MIT OpenCourseWare.

Figure by MIT OpenCourseWare.

Figure by MIT OpenCourseWare.

Page 8: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

R4SDC commutator and butterfly details

6.973 Communication System Design 8

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

inputNt Nt Nt

Nt

Nt

Nt

mt c1 c2 c3

1

11 1 1

11

123

0

0000

00 0

1

1

1

0

0

0

1

1

1

0

0

0

1

1

1

0

0

0

1

1

1

0

0

0

2

Time

Time

1 0 15

t'+16T t'

t'+28T

m1= 3 m1= 2 m1= 1 m1= 0

t'+12T

14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13 x(n)

T

input

2:1 multiplexers

Outputs fromcommutator at

stage 1

2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13

14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9

10 9 8 7 6 5 4

6 5 4 3 2 1 0

3 2 1 0

4 3 2 1

15 14 13 12 11 10 9 8 7 6 5

15 14 13 12 11 10 9 8 7 6 5

00

0

0

0

0

0

0

1

1

2

4

2

3

3

2

3

4

5

6

6

6

9

7

8

9

10

11

12

13

14

15

0

4

8

12

1

5

9

13

2

6

10

14

3

7

11

15

0

0

0

0

1

1

1

1

2

2

2

2

3

3

3

3

0

1

2

3

0

1

2

3

0

1

2

3

0

1

2

3

stage 1 stage 2

mt

re (0)Re

Im

add/sub

add/sub

D

add/sub

(0 = addition, 1 = subtraction)

add/sub

add/sub

add/sub

im (0)

im (1)re (1)

im (2)re (2)

im (3)re (3)

c4 c5 c6

10 0 0

11

023

0101

11 0

Figure by MIT OpenCourseWare.

Figure by MIT OpenCourseWare.

Figure by MIT OpenCourseWare.Figure by MIT OpenCourseWare.

Page 9: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Some conclusions

� Delay feedback approaches are always more efficient than corresponding delay-commutator approaches � In terms of memory utilization

� Since butterfly outputs share same storage with its inputs

� Pipeline architectures require FFT algorithms to be formulated in a “hardware-oriented” form � Where spatial regularity is preserved in a signal-flow

graph (SFG) � So that arithmetic operations can be tightly

scheduled for efficient hardware utilization

6.973 Communication System Design 9

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

Page 10: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Decomposition – a review

� Twiddle factor is Nth primitive root of unity � With exponent evaluated modulo N

� Most fast algorithms share same general strategy � Map one-dimensional transform int a two or multi­

dimensional representation � Exploit congruence property of coefficients to simplify

computation

� Unlike traditional step-by-step decompositionof twiddle factors � Cascading the twiddle factor decomposition leads

to new forms of FFT with high-spatial regularity

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

6.973 Communication System Design 10

Page 11: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Radix 22 approach

� Start by classical divide-and-conquer radix-2 DIF indexing � But, consider the first two steps of decomposition together

Compute directly in standard radix-2 approach

New idea is to proceed to shorter DFTs cascading the twiddle factor WN

(N/4n2+n3)k1

[Shouseng and Torkelson 1996]

6.973 Communication System Design 11

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

Page 12: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

A 16pt example� Get radix-4-like mulitplier complexity with

radix-2 butterfly structures (radix-22)

6.973 Communication System Design 12

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

x(0)x(1)x(2)x(3)x(4)x(5)x(6)x(7)x(8)x(9)

x(10)x(11)x(12)x(13)x(14)x(15)

x(0)x(1)x(2)x(3)x(4)x(5)x(6)x(7)x(8)x(9)

x(10)x(11)x(12)x(13)x(14)

x(15)

-j-j-j-j

-j

-j

-j

-j

-j-j

-j-j

W2

W4

W6

W0

W1

W2

W3

W0

W3

W6

W9

W2

W4

W6

W1

W2

W3

W3

W6

W9

N/4 DFT

(k1=0, k2=0)

N/4 DFT

(k1=0, k2=1)

N/4 DFT

(k1=1, k2=0)

N/4 DFT

(k1=1, k2=1)

X(0)X(8)X(4)X(12)X(2)X(10)X(6)X(14)X(1)X(9)X(5)X(13)X(3)X(11)X(7)X(15)

X(8)X(0)

X(4)X(12)X(2)X(10)

X(6)X(14)X(1)X(9)X(5)X(13)X(3)X(11)X(7)X(15)

BF I BF II BF II BF III BF IVBF I

W0

Figure by MIT OpenCourseWare.

Page 13: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

A 64pt radix-22 example

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

6.973 Communication System Design 13

Image removed due to copyright restrictions.

Page 14: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Radix-22 (R22SDF) architectureN=256

� Similar to R2SDF � Reduced number of multipliers

� Need two types of butterflies � One identical to that in R2SDF � The other contains the logic for trivial twiddle factor multiplication (with j)

� Synchronization control very simple due to spatial regularity � Just a log2N binary counter

6.973 Communication System Design 14

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

X(k)X

x(n)

clkW2(n) W3(n)W1(n)

8163264128

BF2I BF2II BF2I BF2II BF2I BF2II BF2I BF2II

124

01234567

X XX X X X XXt t t t

+ + +

0101

0

01

1

+

+

+

+

x

xr(n)

xi(n)

xr(n+N/2)

xi(n+N/2)

(i). BF2I

--

0

101

0

01

1

+

++

+

+

+

xr(n)

zr(n)

xi(n)

zi(n)

xr(n+N/2)

zr(n+N/2)

xi(n+N/2)

zi(n+N/2)

(ii). BF2IIt x

--

-

Figure by MIT OpenCourseWare.

Figure by MIT OpenCourseWare.

Figure by MIT OpenCourseWare.

Page 15: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Radix-22 architecture – Sync control� log2N-bit binary counter

� Synchronization controller � Address counter for twiddle factor reading in each stage

� On first N/2 cycles, 2-to-1 mux in BF1 switch to 0 � Butterfly is idle (input data directed to shift registers)

� On next N/2 cycles, muxes in BF1 switch to 1 � Butterfly computes a 2pt DFT with incoming data and data stored in the shift registers � Output Z1(n) sent to twiddle multiplier � Output Z1(n+N/2) sent back to the shift register to be “multiplied” in next N/2 cycles,

when the first half of the next frame is loaded in

� Operation of BF2 is similar, except the “distance” of butterfly input sequence is just N/4 and the trivial multiply logic

� Utilization of the multiplier is 75% � Next frame can be computed w/o pausing due to the pipelined processing in

each stage

6.973 Communication System Design

� Pipeline register can be inserted between each multiplier and BF stage to improve the performance

15

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

X(k)X

x(n)

clkW2(n) W3(n)W1(n)

8163264128

BF2I BF2II BF2I BF2II BF2I BF2II BF2I BF2II

124

01234567

X XX X X X XXt t t t

+ + +

Figure by MIT OpenCourseWare.

Page 16: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Arithmetic complexity

� R22SDF has reached minimum requirement for both multiplier and storage

� Only R4SDC better in terms of adder usage � R22SDF well suited for VLSI implementations

of pipeline FFT processors

6.973 Communication System Design 16

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

R2MDCR2SDFR4SDFR4MDCR4SDCR22SDF

multiplier # adder # memory size control2(log4 N - 1) 4 log4 N

4 log4 N 8 log4 N 8 log4 N 3 log4 N 4 log4 N

2(log4 N - 1)log4 N - 1

3(log4 N - 1)log4 N - 1log4 N - 1

3N/2 - 2N - 1N - 1

5N/2 - 42N - 2N - 1

simplesimple

simple

simple

medium

complex

Figure by MIT OpenCourseWare.

Page 17: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Memory issues� The area/power consumption in the pipeline architectures

dominated by the � FIFO register files at each stage � Complex multipliers at each (or every other stage)

� To diminish the unnecessary data moving in the FIFO needto reconstruct the storage � A known approach is to use FIFO with 2-port RAM

� With read and write addresses displaced by a constant � 2-port RAM cells 33% more area of the 1-port RAM cell

� Use two N/2 1-port RAMs � Read and write interleaved � Each active every other cycle

6.973 Communication System Design 17

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

D(n) lxN 2-port RAM D(n-N)

W-addr. R-addr.

W R

D(n) D(n-N)

R/W

a

bd

c

Addr.

E

E

E

N/2-1 RAM

N/2-1 RAM

E

Figure by MIT OpenCourseWare.Figure by MIT OpenCourseWare.

Page 18: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].6.973 Communication System Design 18

x[0]

x[1]

x[2]

x[3]

x[4]

x[5]

x[6]

x[7]

X[0]

X[4]

X[2]

X[6]

X[1]

X[5]

X[3]

X[7]

-1

-1

-1

-1 -1

-1

-1

-1 -1

-1

-1

-1

W 0

W 1

W 2

W 3 W 2

W 2

W 0

W 0 W 0

W 0

W 0

W 0

TFFT =Nr logrN Tr,PE. .

Where,N/r = No. of butterfly per stagelogrN = No. of stageTr,PE = Time to calculate one butterfly

Con

trol

Circ

uits

Coe

ffR

OM

N/r

But

terf

lies

S/P

& B

it re

vers

e

P/S

Cou

nter

Figure by MIT OpenCourseWare.

Figure by MIT OpenCourseWare.

Single stage hardware example

Fold stages onto each other

Need constant geometry signal flow graph Big price in area for parallelism (within each stage)

[Sadat2001]

Page 19: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Radix-8 Pipelined/Parallel implementation

� A 64pt FFT example for 802.11a [Excerpted from Maharatna et al 2004]

� Two dimensional structure of 8pt FFTs � The number of nontrivial complex multiplications is 49 (7x7)

� Since the first twiddle is always 1 � The number of nontrivial complex multiplications for radix-2

FFT is 66 � Radix-4 (or 22) FFTs need only 52 multiplies

� Important to note that for 8pt FFT (DIT) no need formultiplies

6.973 Communication System Design 19

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

Page 20: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

8pt DIT FFT

� The only nontrivial multiply is with 1/sqrt(2)� Easily realize using hardwired shift-and-add

6.973 Communication System Design 20

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

Figure from Maharatna, K., E. Grass, and U. Jagdhold. "A 64-point Fourier Transform Chip for High-speed Wireless LAN Application Using OFDM." Solid-State Circuits 39 (2004): 484-493. Copyright 2004 IEEE. Used with permission.

Page 21: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Block diagram of the FFT unit

� Two-stages are pipelined � Fully parallel in each stage (radix-2 8pt FFT, single clk cycle)

� Two performance bottlenecks � Large number of global wires resulting from the multiplexing of

complex data to the 8-point FFTs � Construction of the multiplier unit to attain the required speed with

minimal silicon are is not trivial

6.973 Communication System Design 21

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

Figure from Maharatna, K., E. Grass, and U. Jagdhold. "A 64-point Fourier Transform Chip for High-speed Wireless LAN Application Using OFDM." Solid-State Circuits 39 (2004): 484-493. Copyright 2004 IEEE. Used with permission.

Page 22: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Input unit� Hard wired outputs and data shifting

� To the 8pt FFT � Reduce de-muxing � Reduce global wires

� Cannot shift every clk � Multiplier cannot finish � Extend latency

� Temporary registers 1,2,3

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

6.973 Communication System Design 22

Figure from Maharatna, K., E. Grass, and U. Jagdhold. "A 64-point Fourier Transform Chip for High-speed Wireless LAN Application Using OFDM." Solid-State Circuits 39 (2004): 484-493. Copyright 2004 IEEE. Used with permission.

Page 23: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Multiplier unit� 49 multiplies

� Only nine sets unique (cos,sin)

hard-wired constant

6.973 Communication System Design 23

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

� Significantly less storage space � for coefficients

� Turn multiplies into shift&add

Figure from Maharatna, K., E. Grass, and U. Jagdhold. "A 64-point Fourier Transform Chip for High-speed Wireless LAN Application Using OFDM." Solid-State Circuits 39 (2004): 484-493. Copyright 2004 IEEE. Used with permission.

Page 24: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Multiplier unit and scheduling

� Some of the coefficients requested concurrently by different FFT outputs � Solve by adding temp registers in the input unit

� ~50% less power and area than 8 standard complex multipliers � Buffer unit similar to input unit, just w/o temporary registers

� Outputs also hardwired with distance of 8

6.973 Communication System Design 24

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

Figures from Maharatna, K., E. Grass, and U. Jagdhold. "A 64-point Fourier Transform Chip for High-speed Wireless LAN Application Using OFDM." Solid-State Circuits 39 (2004): 484-493. Copyright 2004 IEEE. Used with permission.

Page 25: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Output unit

� A mirror of input unit � Just w/o temporary reg

� Control/sync is simple� 5-bit counter

� Starts counting when in� Local counters control

� Input � Intermediate � Output units

6.973 C

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

rs

ull

iste

put f

ommunication System Design 25

Figure from Maharatna, K., E. Grass, and U. Jagdhold. "A 64-point Fourier Transform Chip for High-speed Wireless LAN Application Using OFDM." Solid-State Circuits 39 (2004): 484-493. Copyright 2004 IEEE. Used with permission.

Page 26: Fast Fourier Transform: VLSI Architectures · PDF fileFast Fourier Transform: VLSI Architectures ... Where spatial regularity is preserved in a signal-flow graph ... Synchronization

Readings� [1] H.e. Shousheng and M. Torkelson "A new approach to pipeline FFT processor," Parallel

Processing Symposium, 1996., Proceedings of IPPS '96, The 10th International no. SN -, pp. 766-770, 1996. � [3] H.e. Shousheng and M. Torkelson "Designing pipeline FFT processor for OFDM (de)modulation,"

Signals, Systems, and Electronics, 1998. ISSSE 98. 1998 URSI International Symposium on no. SN -, pp. 257-262, 1998.

� [2] E. Wold and Alvin M. Despain "Pipeline and Parallel-Pipeline FFT Processors for VLSI Implementations," IEEE Trans. Computers vol. 33, no. 5, pp. 414-426, 1984.

� [3] G. Bi and E.V. Jones "A pipelined FFT processor for word-sequential data," Acoustics, Speech, and Signal Processing [see also IEEE Transactions on Signal Processing], IEEE Transactions on vol. 37, no. 12 SN - 0096-3518, pp. 1982-1985, 1989.

� [4] K. Maharatna, E. Grass and U. Jagdhold "A 64-point Fourier transform chip for high-speed wireless LAN application using OFDM," Solid-State Circuits, IEEE Journal of vol. 39, no. 3 SN - 0018-9200, pp. 484-493, 2004.

� Interesting DIT&F algorithm � [4] C. Chiu, W. Hui, T.J. Ding and J.V. McCanny "A 64-point Fourier transform chip for video motion

compensation using phase correlation," Solid-State Circuits, IEEE Journal of vol. 31, no. 11 SN ­0018-9200, pp. 1751-1761, 1996.

� Power-performance estimation � [2] S. Hong, S. Kim, M.C. Papaefthymiou and W.E. Stark "Power-complexity analysis of pipelined

VLSI FFT architectures for low energy wireless communication applications," Circuits and Systems, 1999. 42nd Midwest Symposium on vol. 1, no. SN -, pp. 313-316 vol. 1, 1999.

� [3] K. Pagiamtzis and P.G. Gulak "Empirical performance prediction for IFFT/FFT cores for OFDM systems-on-a-chip," Circuits and Systems, 2002. MWSCAS-2002. The 2002 45th Midwest Symposium on vol. 1, no. SN -, pp. I-583-6 vol.1, 2002.

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Downloaded on [DD Month YYYY].

6.973 Communication System Design 26


Recommended