
[IEEE Signal Processing with Special Track on Biomedical Engineering (CCSP) - Kuala Lumpur, Malaysia (2005.11.14-2005.11.16)] 2005 1st International Conference on Computers, Communications,


FPGA Implementation of Low Power 64-Point Radix-4 FFT Processor for OFDM System

Ishak Suleiman
TM Research & Development Sdn. Bhd.

Idea Tower, UPM-MTDC, Technology Incubation Centre One, Lebuh Silikon, 43400 Serdang, Selangor, Malaysia

E-mail: [email protected]

Abstract - The FFT processor is a crucial block in multi-carrier systems such as OFDM (Orthogonal Frequency Division Multiplexing) based Wireless LAN (IEEE 802.11). Portable applications of these systems require a low-power FFT processor. This paper proposes a radix-4 butterfly architecture that uses a recursive technique to reduce the hardware complexity and power consumption of the multipliers. A fully pipelined architecture is proposed that sustains a constant data throughput on every clock cycle. The FFT processor has been implemented on Xilinx FPGA devices (XCV1000E-8HQ240, XC2V3000-6FF1152, XC2V6000-6FF1152 and XC2VP30-7FF1152), using 9% to 35% of the SLICEs on the chip, running at estimated maximum clock frequencies of 20.12 MHz to 32.74 MHz and with an estimated power of about 400 mW.

Keywords - FFT; OFDM; ADSL; WLAN; 4G; radix-4; low power

I. INTRODUCTION

The Fast Fourier Transform (FFT) and its inverse (IFFT) are essential in digital signal processing (DSP). In modem design for broadband data transmission, they provide a parallel mapping of data symbols between their time-domain and frequency-domain representations [1-5]. Fig. 1 shows the transformation of a signal from the time domain to the frequency domain using the FFT, and back using the IFFT. The popularity of orthogonal frequency division multiplexing (OFDM) has increased the demand for high-speed, low-power FFTs in broadband applications such as Asymmetric Digital Subscriber Line (ADSL), wireless networks (IEEE 802.11a/b WLAN and IEEE 802.16), HIPERLAN/2 and fourth-generation (4G) systems [1-5]. Among the various FFT algorithms, the Cooley-Tukey algorithm [6] is the most popular, because it reduces computational complexity and its regular structure makes it suitable for hardware implementation. To reduce the computational complexity further, the radix-4 form of the algorithm is used [6].

The FFT enables broadband data transmission, but it also demands more processing power in high-data-rate applications [1-5]. The key goal of the proposed architecture is a low-power implementation without loss of performance. A novel commutator-based FFT processor architecture is proposed for a three-stage 64-point radix-4 FFT. A FIFO-based commutator can be implemented in two ways: with shift registers (SR) or with dual-port RAM (DM). This paper proposes an SR-based implementation.
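A FIFO delay line is the storage element of such a commutator. The Python sketch below models the two options named above (SR and DM) at the port level only, as a minimal illustration under stated assumptions: the class names `ShiftRegisterFIFO` and `DualPortRamFIFO` are hypothetical, and the depth of 16 samples is an assumed value, since the paper does not give its FIFO sizes. Both models emit the input stream delayed by the FIFO depth; they differ only in how the storage is updated on each clock.

```python
class ShiftRegisterFIFO:
    """Hypothetical SR model: a register chain in which every register shifts each clock."""

    def __init__(self, depth: int):
        assert depth >= 1
        self.regs = [0] * depth  # regs[0] is the input end, regs[-1] the output end

    def clock(self, sample):
        out = self.regs[-1]
        self.regs = [sample] + self.regs[:-1]  # every register loads its neighbour
        return out


class DualPortRamFIFO:
    """Hypothetical DM model: a circular buffer with one read and one write per clock."""

    def __init__(self, depth: int):
        assert depth >= 1
        self.mem = [0] * depth
        self.addr = 0  # read-before-write at the same address gives a delay of `depth`

    def clock(self, sample):
        out = self.mem[self.addr]      # read port
        self.mem[self.addr] = sample   # write port
        self.addr = (self.addr + 1) % len(self.mem)
        return out


if __name__ == "__main__":
    DEPTH = 16  # assumed depth for illustration only; not stated in the paper
    sr, ram = ShiftRegisterFIFO(DEPTH), DualPortRamFIFO(DEPTH)
    stream = list(range(1, 21))
    out_sr = [sr.clock(s) for s in stream]
    out_ram = [ram.clock(s) for s in stream]
    print(out_sr == out_ram, out_sr[DEPTH:])  # True [1, 2, 3, 4]
```

In the shift-register model every storage element is rewritten on each clock, whereas the RAM model performs one read and one write at a moving address. The sketch does not model which of the two costs less power on a given device.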

This research was supported by Telekom Malaysia Bhd. (TM) under Project No. R03-0568, "OFDM Based Wireless LAN Processor".

1-4244-0011-2/05/$20.00 ©2005 IEEE.


Figure 1. Signal transformation using FFT/IFFT.

Figure 2. Transmitter and receiver block diagram for the OFDM PHY (figure adopted from the IEEE 802.11a standard [5], p. 24).

II. ALGORITHM

The 64-point radix-4 FFT of a finite-duration sequence x(n) is given in [6] as:

$$
\begin{aligned}
X(k) &= \sum_{n=0}^{63} x(n)\,W_{64}^{kn} \\
&= \sum_{n=0}^{15} x(n)\,W_{64}^{kn} + \sum_{n=16}^{31} x(n)\,W_{64}^{kn} + \sum_{n=32}^{47} x(n)\,W_{64}^{kn} + \sum_{n=48}^{63} x(n)\,W_{64}^{kn} \\
&= \sum_{n=0}^{15} x(n)\,W_{64}^{kn} + \sum_{n=0}^{15} x(n+16)\,W_{64}^{k(n+16)} + \sum_{n=0}^{15} x(n+32)\,W_{64}^{k(n+32)} + \sum_{n=0}^{15} x(n+48)\,W_{64}^{k(n+48)} \\
&= \sum_{n=0}^{15} \left[ x(n) + x(n+16)\,W_{64}^{16k} + x(n+32)\,W_{64}^{32k} + x(n+48)\,W_{64}^{48k} \right] W_{64}^{kn} \\
&= \sum_{n=0}^{15} \left[ x(n) + (-j)^k\,x(n+16) + (-1)^k\,x(n+32) + (j)^k\,x(n+48) \right] W_{64}^{kn} \qquad (1)
\end{aligned}
$$

where $W_{64}^{kn} = e^{-j2\pi kn/64} = \cos(2\pi kn/64) - j\sin(2\pi kn/64)$ denotes the twiddle factor, n is the time index, k is the frequency index and $j = \sqrt{-1}$. The last step of (1) uses $W_{64}^{16k} = e^{-j\pi k/2} = (-j)^k$, $W_{64}^{32k} = (-1)^k$ and $W_{64}^{48k} = (j)^k$.

The algorithm has $\log_4 64 = 3$ stages of uniform 64-point radix-4 processing, with 16 radix-4 butterfly elements in each stage. The signal flow graph of (1) is shown in Fig. 3.

Figure 3. Signal flow graph of 64-point radix-4 FFT.

In Fig. 3, the first stage processes the 64 input samples, the second stage processes the 64 outputs of the first stage, and the third (last) stage processes the outputs of the second stage in the same way; the results of the last stage are the output samples. The dotted lines mark the boundaries of each stage. The decomposition corresponds to the decimation-in-frequency (DIF) computation.

III. ARCHITECTURE

Fig. 4 shows a pipelined 64-point radix-4 processor based on the above algorithm. Each stage contains a commutator and a butterfly element, and produces four butterfly outputs on each cycle.
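To illustrate what each butterfly element computes, the Python sketch below evaluates (1) recursively. It is a floating-point reference model with hypothetical function names (`radix4_butterfly`, `fft_radix4_dif`, `dft`), not the pipelined 16-bit datapath of Fig. 4. `radix4_butterfly` forms the bracketed term of (1) for each residue m = k mod 4; then, using the standard step $W_{64}^{(4r+m)n} = W_{64}^{mn} W_{16}^{rn}$ for k = 4r + m, the twiddle-multiplied results feed four 16-point sub-transforms, which split again into 4-point ones, giving the three radix-4 stages. The result is checked against a direct evaluation of the DFT sum.

```python
import cmath
import random


def twiddle(n_pts: int, e: int) -> complex:
    """Twiddle factor W_N^e = exp(-j*2*pi*e/N)."""
    return cmath.exp(-2j * cmath.pi * e / n_pts)


def radix4_butterfly(a: complex, b: complex, c: complex, d: complex):
    """Bracketed term of (1) for k mod 4 = 0, 1, 2, 3.

    (-j)^k, (-1)^k and (j)^k repeat with period 4, so four outputs cover every k.
    """
    return (
        a + b + c + d,            # k mod 4 == 0
        a - 1j * b - c + 1j * d,  # k mod 4 == 1
        a - b + c - d,            # k mod 4 == 2
        a + 1j * b - c - 1j * d,  # k mod 4 == 3
    )


def fft_radix4_dif(x):
    """Radix-4 decimation-in-frequency FFT; len(x) must be a power of 4."""
    n_pts = len(x)
    if n_pts == 1:
        return list(x)
    assert n_pts % 4 == 0, "length must be a power of 4"
    q = n_pts // 4
    # y[m] is the input of the sub-FFT that produces X(4r + m), r = 0..q-1
    y = [[0j] * q for _ in range(4)]
    for n in range(q):
        outs = radix4_butterfly(x[n], x[n + q], x[n + 2 * q], x[n + 3 * q])
        for m in range(4):
            y[m][n] = outs[m] * twiddle(n_pts, m * n)  # W_N^{mn}
    out = [0j] * n_pts
    for m in range(4):
        # the (N/4)-point sub-FFT applies the remaining W_{N/4}^{rn} factor
        for r, value in enumerate(fft_radix4_dif(y[m])):
            out[4 * r + m] = value
    return out


def dft(x):
    """Direct evaluation of X(k) = sum_n x(n) W_N^{kn}, used only for checking."""
    n_pts = len(x)
    return [sum(x[n] * twiddle(n_pts, k * n) for n in range(n_pts)) for k in range(n_pts)]


if __name__ == "__main__":
    random.seed(0)
    x = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(64)]
    err = max(abs(a - b) for a, b in zip(fft_radix4_dif(x), dft(x)))
    print(f"max |radix-4 FFT - DFT| = {err:.1e}")
```

For N = 64 the recursion depth is log4 64 = 3, matching the three stages of Fig. 3; the hardware instead streams these stages through commutators so that one sample enters per clock.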

As shown in Fig. 4, the first and second stages of the 64-point radix-4 FFT algorithm (Fig. 3) are each built from a butterfly element and a commutator (commutator1 and commutator2, respectively). The last stage is built from the butterfly element alone. Commutator1 reorders the data from serial to parallel order, and commutator2 from parallel to serial order.

Figure 4. 64-point radix-4 FFT processor.

IV. RESULT

The 64-point radix-4 FFT processor is modeled at register-transfer level (RTL) in Verilog HDL and synthesized for Xilinx FPGA devices (XCV1000E-8HQ240, XC2V3000-6FF1152, XC2V6000-6FF1152 and XC2VP30-7FF1152). The processor is verified with 16-bit input and output data against results obtained from a MATLAB simulation of the OFDM system.

The results of synthesis and simulation are summarized in Table 1. Each target device gave different results: the XCV1000E-8HQ240, XC2V3000-6FF1152, XC2V6000-6FF1152 and XC2VP30-7FF1152 require 4,352 of 12,288, 3,304 of 14,336, 3,304 of 33,792 and 3,328 of 13,696 SLICEs, respectively. The area utilization on each device is shown in Fig. 5, Fig. 6, Fig. 7 and Fig. 8, respectively. The XC2VP30-7FF1152 gives the highest throughput, 32.74 million samples per second, and the XCV1000E-8HQ240 the lowest, 20.12 million samples per second. The estimated power consumption ranges from 359 mW to 432 mW across the devices. These small differences in power consumption are due to the different architectures of the Xilinx device families (RAM and routing utilization).
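The derived quantities in Table 1 follow from simple arithmetic. The Python sketch below takes the clock periods, SLICE counts and power estimates from Table 1 and assumes one sample per clock cycle, as the equal fmax and throughput rows indicate. It computes fmax, SLICE utilization and the 96-cycle latency in microseconds, plus one extra derived figure that the paper does not report: energy per 64-point transform at fmax, taken as estimated power times the 64 clock periods needed to stream one transform. The names `TABLE1` and `derive` are hypothetical.

```python
# Figures taken from Table 1; everything else is computed from them.
TABLE1 = {
    # device: (clock period in ns, SLICEs used, SLICEs available, estimated power in mW)
    "XCV1000E-8HQ240": (49.700, 4352, 12288, 414),
    "XC2V3000-6FF1152": (40.694, 3304, 14336, 359),
    "XC2V6000-6FF1152": (35.988, 3304, 33792, 360),
    "XC2VP30-7FF1152": (30.545, 3328, 13696, 432),
}
LATENCY_CYCLES = 96  # data latency from Table 1
FFT_SIZE = 64        # samples per transform, streamed at one sample per clock


def derive(period_ns: float, used: int, total: int, power_mw: float) -> dict:
    """Return derived metrics for one device row of Table 1."""
    fmax_mhz = 1e3 / period_ns  # 1 / period, with ns converted to MHz
    return {
        "fmax_mhz": fmax_mhz,
        "throughput_msps": fmax_mhz,  # assumption: one sample per clock cycle
        "slice_util_pct": 100.0 * used / total,
        "latency_us": LATENCY_CYCLES * period_ns / 1e3,
        # mW * ns = pJ, so divide by 1e3 to get nJ per 64-point transform
        "energy_per_fft_nj": power_mw * FFT_SIZE * period_ns / 1e3,
    }


if __name__ == "__main__":
    for device, row in TABLE1.items():
        m = derive(*row)
        print(
            f"{device:17s}  fmax {m['fmax_mhz']:6.2f} MHz  "
            f"SLICEs {m['slice_util_pct']:5.1f}%  "
            f"latency {m['latency_us']:4.2f} us  "
            f"energy/FFT {m['energy_per_fft_nj']:7.1f} nJ"
        )
```

The computed SLICE utilizations (35.4%, 23.0%, 9.8% and 24.3%) suggest that the percentages in Table 1 are truncated rather than rounded.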


TABLE 1. SUMMARIZED RESULTS

General specifications (all devices): 64-point radix-4 FFT processor (RTL level); constant data throughput on every clock cycle; data latency of 96 cycles; 16-bit complex word length.

Targeted device              | XCV1000E-8HQ240 (1)    | XC2V3000-6FF1152 (1)   | XC2V6000-6FF1152 (1)   | XC2VP30-7FF1152 (1)
Family                       | VirtexE                | Virtex-II              | Virtex-II              | Virtex-II Pro
Voltage (Vccint/Vcco/Vccaux) | 1.8 V / 3.3 V / n.a.   | 1.5 V / 3.3 V / 3.3 V  | 1.5 V / 3.3 V / 3.3 V  | 1.5 V / 2.5 V / 2.5 V
Number of SLICEs             | 4352 of 12288 (35%)    | 3304 of 14336 (23%)    | 3304 of 33792 (9%)     | 3328 of 13696 (24%)
Number of MULT18X18s         | n.a.                   | 12 of 96 (12%)         | 12 of 144 (8%)         | 12 of 136 (8%)
Maximum clock fmax (period)  | 20.12 MHz (49.700 ns)  | 24.57 MHz (40.694 ns)  | 27.787 MHz (35.988 ns) | 32.74 MHz (30.545 ns)
Data throughput (samples/s)  | 20.12 million          | 24.57 million          | 27.787 million         | 32.74 million
Estimated power at fmax      | 414 mW                 | 359 mW                 | 360 mW                 | 432 mW

(1) Available in the lab.

V. CONCLUSION

This paper has presented a novel architecture for a pipelined 64-point radix-4 FFT processor suitable for OFDM systems. Realizing the radix-4 butterfly element with a re-use technique significantly reduces hardware complexity. Table 1 shows an estimated power consumption of about 400 mW (359 mW to 432 mW), which is suitable for low-power broadband system requirements. In future work, the performance of the processor could be increased further by targeting ASICs.

ACKNOWLEDGMENT

The author would like to thank Dr. Zulkalnain Mohd. Yusof for discussion and support, and Mazlaini Yahya for reviewing this paper.

REFERENCES

[1] R. Van Nee and R. Prasad, OFDM for Wireless Multimedia Communications. Norwell, MA: Artech House, 2000.

[2] W. C. Yeh and C. W. Jen, "High-speed and low-power split-radix FFT," IEEE Trans. Signal Processing, vol. 51, no. 3, Mar. 2003.

[3] P. S. Chow, J. C. Tu and J. M. Cioffi, "Performance evaluation of a multichannel transceiver system for ADSL and VHDSL services," IEEE J. Select. Areas Commun., vol. SAC-9, no. 6, pp. 909-919, Aug. 1991.

[4] M. Yoshida, E. Ishizu, N. Yamashita and Y. Amezawa, "OFDM transmission for ISI channels using variable-length pilot symbols and pre-FFT equalizer with enhanced MRC diversity reception," in Proc. IEEE GLOBECOM '03, vol. 4, Dec. 1-5, 2003, pp. 2290-2294.

[5] IEEE 802.11a, "High Speed Physical Layer in the 5 GHz Band," 1999.

[6] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, Apr. 1965.

[7] M. Hassan, T. Arslan and J. S. Thompson, "A novel coefficient ordering based low power pipelined radix-4 FFT processor for wireless LAN applications," IEEE Trans. Consumer Electronics, vol. 49, no. 1, Feb. 2003.


Figure 5. FFT processor physical placement for the XCV1000E-8HQ240 device.

Figure 6. FFT processor physical placement for the XC2V3000-6FF1152 device.


Figure 7. FFT processor physical placement for the XC2V6000-6FF1152 device.

Figure 8. FFT processor physical placement for the XC2VP30-7FF1152 device.
