46
DS260 April 2, 2007 www.xilinx.com 1 Product Specification © 2003-2007 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners. Xilinx is providing this design, code, or information "as is." By providing the design, code, or information as one possible implementation of this feature, application, or standard, Xilinx makes no representation that this implementation is free from any claims of infringement. You are responsible for obtaining any rights you may require for your implementation. Xilinx expressly disclaims any warranty whatsoever with respect to the adequacy of the implementation, including but not limited to any warranties or representations that this implementation is free from claims of infringement and any implied warranties of merchantability or fitness for a particular purpose. Introduction The Fast Fourier Transform (FFT) is a computationally efficient algorithm for computing the Discrete Fourier Transform (DFT). The FFT core uses the Cooley-Tukey algorithm for computing the FFT. Features Drop-in module for Virtex™-II Pro, Virtex-4/XA, Virtex-5, Spartan™-3/XA, Spartan-3E/XA and Spartan-3A/3AN/3A DSP FPGAs Forward and inverse complex FFT, run-time configurable Transform sizes N = 2 m , m = 3 – 16 Data sample precision b x = 8 – 24 Phase factor precision b w = 8 – 24 Arithmetic types: - Unscaled (full-precision) fixed-point - Scaled fixed-point - Block floating-point Rounding or truncation after the butterfly On-chip memory Block RAM or Distributed RAM for data and phase- factor storage Optional run-time configurable transform point size Run-time configurable scaling schedule for scaled fixed point Bit/digit reversed output order or natural output order Four architectures offer an exchange between core size and transform time For use with Xilinx CORE Generator™ v9.1i and higher Overview The FFT core computes an N-point forward DFT or inverse DFT (IDFT) where N can be 2 m , m = 3–16. The input data is a vector of N complex values represented as dual b x -bit two’s-complement numbers, that is, b x bits for each of the real and imaginary components of the data sample, where b x is in the range 8 to 24 bits inclusive. Similarly, the phase factors b w can be 8 to 24 bits wide. All memory is on-chip using either block RAM or dis- tributed RAM. The N element output vector is repre- sented using b y bits for each of the real and imaginary components of the output data. Input data is presented in natural order, and the output data can be in either natural or bit/digit reversed order. The complex nature of data input and output is intrinsic to the FFT algo- rithm, not the implementation. Three arithmetic options are available for computing the FFT: Full-precision unscaled arithmetic Scaled fixed-point, where the user provides the scaling schedule Block-floating point (run-time adjusted scaling) The point size N, the choice of forward or inverse trans- form, and the scaling schedule. Both forward/inverse and scaling schedule can be changed frame by frame. Changing the point size resets the core. Four architecture options are available: Pipelined, Radix-4 Burst I/O, Radix-2 Burst I/O and Radix-2-Lite Burst I/O. For detailed information about each archi- tecture, see "Architecture Options" on page 3. 0 Fast Fourier Transform v4.1 DS260 April 2, 2007 0 0 Product Specification

DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

Page 1: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

IntroductionThe Fast Fourier Transform (FFT) is a computationallyefficient algorithm for computing the Discrete FourierTransform (DFT). The FFT core uses the Cooley-Tukeyalgorithm for computing the FFT.

Features• Drop-in module for Virtex™-II Pro, Virtex-4/XA,

Virtex-5, Spartan™-3/XA, Spartan-3E/XA and Spartan-3A/3AN/3A DSP FPGAs

• Forward and inverse complex FFT, run-time configurable

• Transform sizes N = 2m, m = 3 – 16

• Data sample precision bx = 8 – 24

• Phase factor precision bw = 8 – 24

• Arithmetic types:

- Unscaled (full-precision) fixed-point

- Scaled fixed-point

- Block floating-point

• Rounding or truncation after the butterfly

• On-chip memory

• Block RAM or Distributed RAM for data and phase- factor storage

• Optional run-time configurable transform point size

• Run-time configurable scaling schedule for scaled fixed point

• Bit/digit reversed output order or natural output order

• Four architectures offer an exchange between core size and transform time

• For use with Xilinx CORE Generator™ v9.1i and higher

Overview The FFT core computes an N-point forward DFT orinverse DFT (IDFT) where N can be 2m, m = 3–16. Theinput data is a vector of N complex values representedas dual bx-bit two’s-complement numbers, that is,bx bits for each of the real and imaginary components ofthe data sample, where bx is in the range 8 to 24 bitsinclusive. Similarly, the phase factors bw can be 8 to 24bits wide.

All memory is on-chip using either block RAM or dis-tributed RAM. The N element output vector is repre-sented using by bits for each of the real and imaginarycomponents of the output data. Input data is presentedin natural order, and the output data can be in eithernatural or bit/digit reversed order. The complex natureof data input and output is intrinsic to the FFT algo-rithm, not the implementation.

Three arithmetic options are available for computingthe FFT:

• Full-precision unscaled arithmetic• Scaled fixed-point, where the user provides the

scaling schedule• Block-floating point (run-time adjusted scaling)

The point size N, the choice of forward or inverse trans-form, and the scaling schedule. Both forward/inverseand scaling schedule can be changed frame by frame.Changing the point size resets the core.

Four architecture options are available: Pipelined,Radix-4 Burst I/O, Radix-2 Burst I/O and Radix-2-LiteBurst I/O. For detailed information about each archi-tecture, see "Architecture Options" on page 3.

0

Fast Fourier Transform v4.1

DS260 April 2, 2007 0 0 Product Specification

DS260 April 2, 2007 www.xilinx.com 1Product Specification

© 2003-2007 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners. Xilinx is providing this design, code, or information "as is." By providing the design, code, or information as one possible implementation of this feature, application, or standard, Xilinx makes no representation that this implementation is free from any claims of infringement. You are responsible for obtaining any rights you may require for your implementation. Xilinx expressly disclaims any warranty whatsoever with respect to the adequacy of the implementation, including but not limited to any warranties or representations that this implementation is free from claims of infringement and any implied warranties of merchantability or fitness for a particular purpose.

Page 2: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

2

Theory of OperationThe FFT is a computationally efficient algorithm for computing a Discrete Fourier Transform (DFT) ofsample sizes that are a positive integer power of 2. The DFT of asequence is defined as

Equation 1

where N is the transform size and . The inverse DFT (IDFT) is

Equation 2

Algorithm

The FFT core uses the Radix-4 and Radix-2 decomposition for computing the DFT. For burst I/O solu-tions, the decimation-in-time (DIT) method is used, while the decimation-in-frequency (DIF) method isused for the streaming solution. When using Radix-4, the N-point FFT consists of log4 (N) stages, witheach stage containing N/4 Radix-4 butterflies. Point sizes that are not a power of 4 need an extra Radix-2stage for combining data. An N-point FFT using Radix-2 has log2 (N) stages, with each stage containingN/2 Radix-2 butterflies.

The inverse FFT (IFFT) is computed by conjugating the phase factors of the corresponding forward FFT.

Finite Word Length Considerations

The burst I/O algorithms process an array of data by successive passes over the input data array. Oneach pass, the algorithm performs Radix-4 or Radix-2 butterflies, where each butterfly picks up four ortwo complex numbers, respectively, and returns four or two complex numbers to the same memory.The numbers returned to memory by the processor are potentially larger than the numbers picked upfrom memory. A strategy must be employed to accommodate this dynamic range expansion. Note thata full explanation of scaling strategies and their implications is beyond the scope of this document; formore information about this topic, see items 3 and 4 in "References" on page 45.

For a Radix-4 DIT FFT, the values computed in a butterfly stage (except the second) can experience agrowth to .

For Radix-2, the growth can be up to . This bit growth can be handled in three ways:

• Performing the calculations with no scaling and carrying all significant integer bits to the end of the computation

• Scaling at each stage using a fixed-scaling schedule

• Scaling automatically using block-floating point

All significant integer bits are retained when doing full-precision unscaled arithmetic. The width of thedata path increases to accommodate the bit growth through the butterfly. The growth of the fractionalbits created from the multiplication are truncated (or rounded) after the multiplication. The width ofthe output will be the (input width + log2(transform length) + 1). This will accommodate the worst casescenario for bit growth. For example, a 1024-pt transform with an input of 16 bits consisting of 1 integerbit and 15 fractional bits, will have an output of 27 bits with 12 integer bits and 15 fractional bits. The

( ), 0, , 1X k k N= −K

( ), 0, , 1x n n N= −K

12 /

0

( ) ( ) 0, , 1N

jnk N

n

X k x n e k Nπ−

=

= = −∑ K

1j = −

12 /

0

1( ) ( ) 0, , 1N

jnk N

k

x n X k e n NN

π−

=

= = −∑ K

1 3 2 5.242≈+

1 2 2.414+ ≈

www.xilinx.com DS260 April 2, 2007Product Specification

Page 3: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

core does not have a specific location for the binary point. The output will simply maintain the samebinary point location as the input. For the above example, a 16 bit input with 3 integer bits and 13 frac-tional bits would have an unscaled output of 27 bits with 14 integer bits and 13 fractional bits.

When using scaling, a scaling schedule is used to scale by a factor of 1, 2, 4, or 8 in each stage. If scalingis insufficient, a butterfly output may grow beyond the dynamic range and cause an overflow. As aresult of the scaling applied in the FFT implementation, the transform computed is a scaled transform.The scale factor s is defined as

Equation 3

where bi is the scaling (specified in bits) applied in stage i.

The scaling results in the final output sequence being modified by the factor 1/s. For the forward FFT,the output sequence X’ (k), k = 0,...,N - 1 computed by the core is defined in Equation 4.

Equation 4

For the inverse FFT, the output sequence is

Equation 5

If a Radix-4 algorithm scales by a factor of 4 in each stage, the factor of 1/s will be equal to the factor of1/N in the inverse FFT equation (Equation 2). For Radix-2, scaling by a factor of 2 in each stage providesthe factor of 1/N. Otherwise, additional scaling is necessary.

With block floating point, each data point in a frame is scaled by the same amount, and the scaling istracked by a block exponent. Scaling is performed only when necessary (to prevent data overflow),which is detected by the core.

As with unscaled arithmetic, for scaled and block floating point arithmetic, the core does not have aspecific location for the binary point. The location of the binary point in the output data is inheritedfrom the input data and then shifted by the scaling applied.

Architecture OptionsThe FFT core provides four architecture options to offer a trade-off between core size and transformtime.

• Pipelined, Streaming I/O. Allows continuous data processing.

• Radix-4, Burst I/O. Loads and processes data separately, using an iterative approach. It is smaller in size than the pipelined solution but has a longer transform time.

• Radix-2, Burst I/O. Uses the same iterative approach as Radix-4, but the butterfly is smaller. This means it is smaller in size than the Radix-4 solution, but the transform time is longer.

• Radix-2-Lite, Burst I/O. Based on the Radix-2 architecture, this variant uses a time-multiplexed approach to the butterfly for an even smaller butterfly, at the cost of longer transform time.

Figure 1 illustrates the trade-off of throughput versus resource use for the four architectures. As a ruleof thumb, each architecture offers a factor of 2 difference in resource from the next architecture. The

log 1

02

N

ii

b

s

=∑

=

1' 2 /

0

1 1( ) ( ) ( ) 0, , 1N

jnk N

n

X k X k x n e k Ns s

π−

=

= = = −∑ K

12 /

0

1( ) ( ) 0, , 1N

jnk N

k

x n X k e n Ns

π−

=

= = −∑ K

l 2, 2007 www.xilinx.com 3ecification

Page 4: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

4

example is for an even power of 2 point size. This does not require the Radix-4 architecture to have anadditional Radix-2 stage.

Bit and Digit Reversal

Each architecture offers the option of Natural or Reversed order of data output. Natural order is wherethe data points are output in the same order as the input data points, i.e., 0, 1, 2, 3, and so on. However,this imposes a cost on each architecture. For the block I/O architectures, this imposes a time penalty,because unloading the data cannot take place at the same time as loading input data for the next frame,so separate unload and load phases are required. In the pipelined architecture, it requires additionalRAM storage to perform the reordering.

In the Radix 2 and pipelined architectures, the Bit Reverse order is simple to calculate, by taking theindex of the data point, written in binary, and reversing the order of the digits. Hence, 0000, 0001, 0010,0011, 0100,...(0, 1, 2, 3, 4,...) becomes 0000, 1000, 0100, 1100, 0010,...(0, 8, 4, 12, 2,...).

In the case of Radix 4, the reversal applies to digits and, therefore, is called Digit Reversal. A digit inRadix 4 is two bits. Hence, 0000, 0001, 0010, 0011, 0100,...(0, 1, 2, 3, 4,...) becomes 0000, 0100, 1000, 1100,0001,...(0, 4, 8, 12, 1,...), as the pairs of digits are reversed. Where the transform size requires an oddnumber of index bits, the odd digit in the least significant place is moved to the most significant place,so 00000, 00001, 00010, 00011, 00100,... (0, 1, 2, 3, 4,...) becomes 00000, 10000, 00100, 10100, 01000,...(0, 16,4, 20, 8,...)

Note: The core outputs a data point index along with the data, so this section is for information only.

Pipelined, Streaming I/O

The Pipelined, Streaming I/O solution pipelines several Radix-2 butterfly processing engines to offercontinuous data processing. Each processing engine has its own memory banks to store the input andintermediate data (Figure 2). The core has the ability to simultaneously perform transform calculations

Figure Top x-ref 1

Figure 1: Resource versus Throughput for Architecture Options

www.xilinx.com DS260 April 2, 2007Product Specification

Page 5: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

on the current frame of data, load input data for the next frame of data, and unload the results of theprevious frame of data. The user can continuously stream in input data and, after the calculationlatency, can continuously unload the results. If preferred, this design can also calculate one frame byitself or frames with gaps in between.

This architecture supports unscaled full-precision and scaled fixed point arithmetic methods. In thescaled fixed point mode, the data is scaled after every pair of Radix-2 stages.

The unloaded output data can either be in bit reversed order or in natural order. By choosing the outputdata in natural order, additional memory resource will be utilized.

This architecture covers point sizes from 8 to 65536. The user has flexibility to select the number ofstages to use block RAM for data and phase factor storage. The remaining stages will use distributedmemory.

Radix-4, Burst I/O

With the Radix-4, Burst I/O solution, the FFT core uses one Radix-4 butterfly processing engine(Figure 3). It loads and/or unloads data separately from calculating the transform. Data I/O and pro-cessing are not simultaneous. When the FFT is started, the data is loaded. After a full frame has beenloaded, the core computes the FFT. When the computation has finished, the data can be unloaded, butcannot be loaded or unloaded during the calculation process. The data loading and unloading pro-cesses can be overlapped if the data is unloaded in digit reversed order.

Figure Top x-ref 2

Figure 2: Pipelined, Streaming I/O

Memory

Memory Memory

Memory Memory Memory

Radix-2Butterfly

Radix-2Butterfly

Radix-2Butterfly

Radix-2Butterfly

Radix-2Butterfly

Radix-2Butterfly

Group 0 Group 1

Stage 0 Stage 1 Stage 2 Stage 3

OutputShuffling

Output Data

Input Data

l 2, 2007 www.xilinx.com 5ecification

Page 6: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

6

This architecture has lower resource usage than the Pipelined Streaming I/O architecture but a longertransform time, and covers point sizes from 64 to 65536. All three arithmetic types are supported:unscaled, scaled, and block floating point. Data and phase factors can be stored in Block RAM or in Dis-tributed RAM (for point sizes less than or equal to 1024).

Radix-2, Burst I/O

The Radix-2 Burst I/O architecture uses one Radix-2 butterfly processing engine (Figure 4) and hasburst I/O (like Radix-4 Burst I/O). After a frame of data is loaded, the input data stream must halt untilthe transform calculation is completed. Then, the data can be unloaded. As with the Radix-4, Burst I/Oarchitecture, data can be simultaneously loaded and unloaded if the results are presented inbit-reversed order. This solution supports point sizes N = 8 – 65536 and uses a minimum of block mem-ories. All three arithmetic types are supported (unscaled, scaled, and block floating point). Both thedata memories and phase factor memories can be in either block memory or distributed memory (forpoint sizes less than or equal to 1024).

Figure Top x-ref 3

Figure 3: Radix-4, Burst I/O

-

-

-

- -j

ROM forTwiddles

RADIX-4DRAGONFLY

DataRAM 0

DataRAM 1

DataRAM 2

DataRAM 3

switc

h

switc

h

Input Data

Output Data

www.xilinx.com DS260 April 2, 2007Product Specification

Page 7: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

Radix-2-Lite, Burst I/O

This architecture differs from the Radix-2 Burst I/O in that the butterfly processing engine uses oneshared adder/subtractor, hence, reducing resources at the expense of an additional delay per butterflycalculation. Again, as with the Radix-4 and Radix-2 Burst I/O architectures, data can be simultaneouslyloaded and unloaded if the results are presented in bit-reversed order. This solution supports pointsizes N = 8 – 65536 and uses a minimum of block memories. See Figure 5.

Figure Top x-ref 4

Figure 4: Radix-2, Burst I/O

Figure Top x-ref 5

Figure 5: Radix-2-Lite, Burst I/O

-

ROM forTwiddles

DataRAM 0

DataRAM 1

switc

h

switc

h

Input Data

Output Data

RADIX-2BUTTERFLY

Generate oneoutput each cycle

Sine one cycle,cosine the next

Multiply real one cycle,imaginary the next

Store data insingle RAM

ds260_05_102306

Input Data

Output Data

ROM forTwiddles

DataDPM 0

DataDPM 1

RADIX-2BUTTERFLY

-

l 2, 2007 www.xilinx.com 7ecification

Page 8: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

8

Core Symbol and Port DefinitionsFigure 6 shows the Core Schematic Symbol and Table 1 lists the core pinout for single channel configu-ration.

Figure Top x-ref 6

Figure 6: Core Schematic Symbol (Single Channel)

Table 1: Core Pinout (Single Channel)

Port Name Port Width Direction Description

XN_RE bxn InputInput data bus: Real component (bxn = 8 - 24) in two’s complement format

XN_IM bxn InputInput data bus. Imaginary component (bxn = 8 - 24) in two’s complement format

START 1 Input

FFT start signal (Active High): START is asserted to begin the data loading and transform calculation (for the burst I/O architectures). For streaming I/O, START will begin data loading, which proceeds directly to transform calculation and then data unloading.

UNLOAD 1 Input

Result unloading (Active High): For the burst I/O architectures, UNLOAD will start the unloading of the results in normal order. The UNLOAD port is not necessary for the Pipelined, Streaming I/O architecture or for bit/digit reversed unloading.

NFFT 5 Input

Point size of the transform: NFFT can be the size of the transform or any smaller point size. For example, a 1024-point FFT can compute point sizes 1024, 512, 256, and so on. The value of NFFT is log2 (point size). This port is only used with run-time configurable transform length.

XK_RE

XK_IM

XN_INDEXXK_INDEX

RFDBUSY

DV

EDONE

DONE

BLK_EXP

OVFLO

XN_RE

XN_IM

STARTUNLOAD

NFFT

NFFT_WE

FWD_INV

FWD_INV_WE

SCALE_SCH

SCALE_SCH_WE

SCLR

CE

CLK

www.xilinx.com DS260 April 2, 2007Product Specification

Page 9: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

NFFT_WE 1 Input

Write enable for NFFT (Active High): Asserting NFFT_WE will automatically cause the FFT core to stop all processes and to initialize the state of the core to the new point size on the NFFT port. This port is only used with run-time configurable transform length.

FWD_INV 1 Input

Control signal that indicates if a forward FFT or an inverse FFT is performed. When FWD_INV=1, a forward transform is computed. If FWD_INV=0, an inverse transform is performed.

FWD_INV_WE 1 Input Write enable for FWD_INV (Active High).

SCALE_SCH

for PIpelined Streaming I/O and Radix-4 Burst I/O architectures

or 2 x NFFT

for Radix-2 Minimum Resources

where NFFT is log2 (point size) or the number of stages

Input

Scaling schedule: For Burst I/O architectures, the scaling schedule is specified with two bits for each stage, starting at the two LSBs. The scaling can be specified as 3, 2, 1, or 0, which represents the number of bits to be shifted. An example scaling schedule for N =1024, Radix-4 Burst I/O is [1 0 2 3 2]. For N=128, Radix-2 or Radix-2-Lite, one possible scaling schedule is [1 1 1 1 0 1 2].

For Pipelined Streaming I/O architecture, the scaling schedule is specified with two bits for every pair of Radix-2 stages, starting at the two LSBs. For example, a scaling schedule for N=256 could be [2 2 2 3]. When N is not a power of 4, the maximum bit growth for the last stage is one bit. For instance, [0 2 2 2 2] or [1 2 2 2 2] are valid scaling schedules for N=512, but [2 2 2 2 2] is invalid. The two MSBs of SCALE_SCH can only be 00 or 01.

This port is only available with scaled arithmetic (not unscaled or block-floating point).

SCALE_SCH_WE 1 InputWrite enable for SCALE_SCH (Active High): This port is available only with scaled arithmetic.

SCLR 1 InputMaster synchronous reset (Active High): Optional port.

CE 1 Input Clock enable (Active High): Optional port.

CLK 1 Input Clock

XK_RE bxk Output

Output data bus: Real component in two’s complement format. (For scaled arithmetic and block floating point arithmetic, bxk=bxn. For unscaled arithmetic, bxk=bxn+NFFT+1)

XK_IM bxk Output

Output data bus: Imaginary component in two’s complement format. (For scaled arithmetic and block floating point arithmetic, bxk=bxn. For unscaled arithmetic, bxk=bxn+NFFT+1)

XN_INDEX log2 (point size) Output Index of input data.

XK_INDEX log2 (point size) Output Index of output data.

Table 1: Core Pinout (Single Channel) (Continued)

Port Name Port Width Direction Description

22

NFFTceil ⎛ ⎞× ⎜ ⎟⎝ ⎠

l 2, 2007 www.xilinx.com 9ecification

Page 10: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

10

Multichannel Pinout

Up to 12 channels are supported by this core. Table 2 shows how the pinout above must be adapted formultichannel operation

RFD 1 OutputReady for data (Active High): RFD is High during the load operation.

BUSY 1 OutputCore activity indicator (Active High): This signal will go High while the core is computing the transform.

DV 1 OutputData valid (Active High): This signal is High when valid data is presented at the output.

EDONE 1 OutputEarly done strobe (Active High): EDONE goes High one clock cycle immediately prior to DONE going active.

DONE 1 OutputFFT complete strobe (Active High): DONE will transition High for one clock cycle when the transform calculation has completed.

BLK_EXP 5 OutputBlock exponent: The number of bits scaled for every point in the data frame. Available only when block-floating point is used.

OVFLO 1 Output

Arithmetic overflow indicator (Active High): OVFLO will be High during result unloading if any value in the data frame overflowed. The OVFLO signal is reset at the beginning of a new frame of data. This port is optional and only available with scaled arithmetic.

Table 2: Single to Multichannel Pinout Conversion

Single Channel Multichannel

CLK CLK

CE CE

SCLR SCLR

NFFT NFFT

NFFT_WE NFFT_WE

FWD_INV FWD_INV

FWD_INV_WE FWD_INV_WE

START START

UNLOAD UNLOAD

XN_RE XN0_RE,..,XN11_RE

XN_IM XN0_IM,..,XN11_IM

SCALE_SCH SCALE_SCH0,..,SCALE_SCH11

SCALE_SCH_WE SCALE_SCH0_WE,..,SCALE_SCH11_WE

RFD RFD

XN_INDEX XN_INDEX

BUSY BUSY

Table 1: Core Pinout (Single Channel) (Continued)

Port Name Port Width Direction Description

www.xilinx.com DS260 April 2, 2007Product Specification

Page 11: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

Graphical User Interface The FFT core graphical user interface (GUI) provides several screens with fields to set the parametervalues for the particular instantiation required. Here follows a description of each GUI field.• Component Name: The name of the core component to be instantiated. The name must begin with

a letter and be composed of the following characters: a to z, 0 to 9, and “_”.

• Number of channels: Select the number of channels from 1 to 12. This option is only available for the Radix-2-Lite Burst I/O architecture.

• Transform Length: Select the desired point size. All powers of two from 8 to 65536 are available.

• Implementation Options: Select an implementation option, as described in "Architecture Options" on page 3.

- Pipelined, Streaming I/O, and Radix-2 support point sizes 8 to 65536.

- Radix-4 Burst I/O architecture supports point sizes 64 to 65536.

• Transform Length Option: Select the transform length to be run-time configurable or not. The core uses fewer logic resources and has a faster maximum clock speed when the transform length is not run-time configurable.

• Precision Options: Input data width and phase factor data width can be 8 - 24 bits.

• Optional Pins: Clock Enable (CE), Synchronous Clear (SCLR), and Overflow (OVFLO) are optional pins. If no option is selected, some logic resources are saved.

• Scaling Options:

- Unscaled- Scaled- Block Floating Point. Note that Block Floating Point is unavailable with the Pipelined Streaming

I/O architecture.

EDONE EDONE

DONE DONE

DV DV

XK_INDEX XK_INDEX

XK_RE XK0_RE,..,XK11_RE

XK_IM XK0_IM,..,XK11_IM

BLK_EXP BLK_EXP0,..,BLK_EXP11

OVFLO OVFLO0,..,OVFLO11

Table 2: Single to Multichannel Pinout Conversion (Continued)

Single Channel Multichannel

l 2, 2007 www.xilinx.com 11ecification

Page 12: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

12

• Rounding Modes: At the output of the butterfly, the LSBs in the datapath need to be trimmed. These bits can be truncated or rounded using convergent rounding, an unbiased rounding scheme. When the fractional part of a number is equal to exactly one-half, convergent rounding rounds down if the number is odd, and rounds up if the number is even. Convergent rounding can be used to avoid the DC bias that would be introduced by truncation.

• Output Ordering: Output data selections are either Bit/Digit Reversed Order or Natural Order. The Radix-2 based architectures (Pipelined Streaming I/O, Radix-2 Burst I/O, and Radix-2-Lite Burst I/O) offer bit-reversed ordering, and the Radix-4 based architecture (Radix-4 Burst I/O) offers digit-reversed ordering. For Pipelined Streaming I/O, selecting Natural Order causes an increase in memory used by the core. For Burst I/O architectures, selecting natural order output increases the overall transform time because a separate unloading phase is required.

• Memory Options:

- For Pipelined Streaming I/O solution, the data can be partially stored in Block RAM and partially in Distributed RAM. The user can select the number of pipelined stages, counting from the input side, that use Block RAM for data and phase factor storage. The default displayed on the GUI will offer a good balance between both.

- For Burst I/O architectures, either Block RAM or Distributed RAM can be used for data and phase factor storage. Data and phase factor storage can be in distributed RAM for all point sizes 1024 and under.

• Optimize Options:

- In Virtex-4, Virtex-5 and Spartan-3A DSP FPGAs, the complex multiplications and the butterfly additions/subtractions can be computed in XtremeDSP™ slices. Selecting Optimize For Speed Using XtremeDSP Slices allows a faster maximum clock speed at the cost of using more XtremeDSP slices. This option is only available when the CORE Generator target architecture is Virtex-4 , Virtex-5, or Spartan-3A DSP.

- If Complex Multiplication is selected, the complex multipliers are built out of four real multipliers instead of three, allowing the entire complex multiplication to be calculated within the XtremeDSP slices, resulting in faster clock speeds. Select this option for the largest increase in clock speed with a minimal increase in the number of extra XtremeDSP slices used. This option is only available for Virtex-4 and Spartan-3A DSP. In Virtex-5 it is always selected.

- If Butterfly Arithmetic is selected, the additions and subtractions of the butterflies are computed using XtremeDSP slices. This option is only available in Virtex-4 and Spartan-3A DSP if the output width is less than or equal to 30. In Virtex-5, this feature is available for all output widths.

• Information:

- Implementation: This area displays the currently selected architecture. This is useful to see the result of automatic architecture selection.

- Transform Size: When the transform length is run-time configurable, the core has the ability to reprogram the point size while the core is running; that is, the core can support the selected point size and any smaller point size. This area displays the supported point sizes based on the Transform Length, Transform Length Option, and the Implementation Option selected.

- Output Data Width. The output data width equals the input data width for scaled arithmetic and block floating point arithmetic. With unscaled arithmetic, the output data width equals (input data width+ log2(point size) + 1).

- Resource Estimates: Based on the options selected, this area displays the XtremeDSP slice count and block RAM numbers. The resource numbers are just an estimate. For exact resource usage, a MAP report should be consulted.

www.xilinx.com DS260 April 2, 2007Product Specification

Page 13: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

XCO Parameters

Table 3 defines valid entries for the xco parameters. Note that parameters are not case sensitive. Defaultvalues are displayed in bold.

Table 3: XCO Parameters

XCO Parameter Valid Values

component_nameName must begin with a letter and be composed of the following characters: a to z, 0 to 9, and “_”.

channels 1 - 12 (default value is 1)

transform_length8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536

implementation_options

automatically_selectpipelined_streaming_ioradix4_burst_ioradix2_burst_ioradix2_lite_burst_io

target_clock_frequency 0 - 550 (default is 250)

target_data_throughput 0 - 550 (default is 50)

run_time_configurable_transform_lengthfalsetrue

input_width 8 - 24 (default value is 16)

phase_factor_width 8 - 24 (default value is 16)

scaling_optionsscaledunscaledblock_floating_point

rounding_modestruncationconvergent_rounding

cefalsetrue

sclrfalsetrue

ovflofalsetrue

output_orderingbit_reversed_ordernatural_order

memory_options_datablock_ramdistributed_ram

memory_options_phase_factorsblock_ramdistributed_ram

number_of_stages_using_block_ram_for_data_and_phase_factors

0 - 12 (default value depends on transform length)

l 2, 2007 www.xilinx.com 13ecification

Page 14: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

14

Simulation Models

When the core is generated using the CORE Generator tool, a UNISIM-based model is created. The FFTcore does not have a VHDL or Verilog functional behavioral model. For this reason, the core overridesthe CORE Generator Project Options and always delivers a Structural model type.

Control Signals and Timing

Synchronous Clear

Asserting the Synchronous Clear (SCLR) pin results in resetting all output pins, internal counters, andstate variables to their initial values. All pending load processes, transform calculations, and unloadprocesses stop and are reinitialized. However, internal frame buffers retain their contents. NFFT will beset to the largest FFT point size permitted (the Transform Length value set in the GUI). The scalingschedule will be set to 1/N. For the Radix-4 Burst I/O and Pipelined Streaming I/O architectures witha non-power-of-four point size, the last stage will have a scaling of 1, and the rest will have a scaling of2. See Table 4.

Transform Size

The transform point size can be set through the NFFT port if the run-time configurable transformlength option is selected. Valid settings and the corresponding transform sizes are provided in Table 5.If the NFFT value entered is too large, the core sets itself to the largest available point size (selected inthe GUI). If the value is too small, the core sets itself to the smallest available point size: 64 for theRadix-4 Burst I/O architecture and 8 for the other architectures.

NFFT values are read in on the rising clock edge when NFFT_WE is High. A new transform sizere-times all current processes within the core, so every time a transform size is latched in, regardless ofwhether or not the new point size differs from the current point size, the core is internally reset. (Note

optimize_for_speed_using_xtreme_dsp_slicesfalsetrue

fast_complex_multfalsetrue (for Virtex-5 the default is true)

fast_butterflyfalsetrue

Table 4: Synchronous Clear Reset Values

Signal Initial / Reset Value

NFFT maximum point size = N

FWD_INV Forward = 1

SCALE_SCH

1/N[10 10... 10] for Radix-4 or Pipelined architecture when N is a power of 4.[01 10... 10] for Radix-4 or Pipelined architecture when N is not a power of 4.[01 01... 01] for Radix-2 or Radix-2-Lite

Table 3: XCO Parameters (Continued)

XCO Parameter Valid Values

www.xilinx.com DS260 April 2, 2007Product Specification

Page 15: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

that FWD_INV and SCALE_SCH are not reset.) Holding NFFT_WE High continues to reset the core onevery clock cycle.

Transform Time

The transform time (in cycles) varies as a function of many parameters and is likely to change as thecore is revised. Handshaking signals are provided to facilitate timely transfer of data to and from thecore. A transform time (in cycles) calculator is provided with this core. For details see Calculator forTransform Cycles.

Forward/Inverse and Scaling Schedule

The transform type (forward or inverse) and the scaling schedule can be set frame-by-frame withoutinterrupting frame processing. The transform type can be set using the FWD_INV pin. SettingFWD_INV to 0 produces an inverse FFT, and setting FWD_INV to 1 creates the forward transform.

The scaling performed during successive stages can be set via the SCALE_SCH pin. For the Radix-4Burst I/O and Radix-2 architectures, the value of the SCALE_SCH bus is used as pairs of bits [... N4,N3, N2, N1, N0]: each pair representing the scaling value for the corresponding stage. There arelog4(point size) stages for Radix-4, and log2(point size) stages for Radix-2. In each stage, the data can beshifted by 0, 1, 2, or 3 bits, which corresponds to SCALE_SCH values of 00, 01, 10, and 11. Stages arecomputed starting with stage 0 as the two LSBs. For example, for Radix-4, when N = 1024, [01 10 00 1110] translates to a right shift by 2 for stage 0, shift by 3 for stage 1, no shift for stage 3, a shift of 2 in stage3, and a shift of 1 for stage 4 (there are log4(1024) = 5 Radix-4 stages). This scaling schedule will scale bya total of 8 bits which gives a scaling factor of 1/256. The conservative schedule SCALE_SCH = [10 1010 10 11] will completely avoid overflows in the Radix-4 architecture. For the Radix-2 and Radix-2-Litearchitectures, the conservative scaling schedule of [01 01 01 01 01 01 01 01 01 10] will prevent overflowfor N = 1024 (there are log2(1024) = 10 Radix-2 stages).

Table 5: Valid NFFT Settings

NFFT[4:0] Transform size (N)

00011 8

00100 16

00101 32

00110 64

00111 128

01000 256

01001 512

01010 1024

01011 2048

01100 4096

01101 8192

01110 16384

01111 32768

10000 65536

l 2, 2007 www.xilinx.com 15ecification

Page 16: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

16

For the pipelined streaming architecture, consider every pair of adjacent Radix-2 stages as a group.That is, group 0 contains stage 0 and 1, group 1 contains stage 2 and 3, and so forth. The value of theSCALE_SCH bus is also used as pairs of bits [... N4, N3, N2, N1, N0]. Each pair represents the scalingvalue for the corresponding group of two stages. In each group, the data can be shifted by 0, 1, 2, or 3bits which corresponds to SCALE_SCH values of 00, 01, 10, and 11. Groups are computed starting withgroup 0 as the two LSBs. For example, when N = 1024, [10 10 00 01 11] translates to a right shift by 3 forgroup 0 (stages 0 and 1), shift by 1 for group 1 (stages 2 and 3), no shift for group 3 (stages 4 and 5), ashift of 2 in group 3 (stages 6 and 7), and a shift of 2 for group 4 (stages 8 and 9). The conservative sched-ule SCALE_SCH = [10 10 10 10 11] will completely avoid overflows in the Pipelined Streaming I/Oarchitecture. Note that when the point size is not a power of 4, the last group only contains one stage,and the maximum bit growth for the last group is one bit. Therefore, the two MSBs of the scaling sched-ule can only be 00 or 01. A conservative scaling schedule for N=512 is SCALE_SCH=[01 10 10 10 11].

The user is allowed great flexibility to set the transform type (Forward/Inverse) and the scaling sched-ule. The FWD_INV and SCALE_SCH values are latched into temporary registers whenever the corre-sponding WE pins are High. FWD_INV_WE and SCALE_SCH_WE can be asserted at any time beforethe frame of data is loaded in. The core will read these temporary registers at XN_RE/XN_IM(0). Theseare the values that will be used for that frame of data. There is no way to alter those values once thetransform calculation phase has started. Any WE assertions after XN_RE/XN_IM(0) affect the framethat follows.

Both the scaling schedule and the transform type are registered internally, so there is no need to holdthese values on the pins. Also, if the scaling and transform type are constant through multiple frames,(that is, no new values are latched in) registered values will apply for successive frames. The scalingschedule and transform type are not reset when NFFT_WE is asserted.

The initial value and reset value of FWD_INV is forward = 1. The scaling schedule is set to 1/N. Thattranslates to [10 10 10 10... 10] for the Radix-4 and Pipelined Streaming architectures, and [01 01...01] forthe Radix-2 architecture. The core will read in (2*number of stages) bits for the scaling schedule. So,when the point size decreases, the leftover MSBs will be ignored. However, all bits will be latched intothe core on SCALE_SCH_WE and will be used in later transforms if the point size increases.

Overflow

The Overflow (OVFLO) signal (used only with fixed-point scaling) will be High during unloading ifany point in the data frame overflowed. For the Burst I/O architectures, the OVFLO signal will go Highas soon as an overflow occurs during the computation and remain High during the entire time theframe is unloading. For the Pipelined Streaming I/O architecture, the OVFLO signal will go High dur-ing unloading as soon as an overflow is detected in that frame.

Block Exponent

The Block Exponent (BLK_EXP) signal (used only with the block floating point option) contains theblock exponent. This signal will be valid during the unloading of the data frame. The value present onthe port represents the total number of bits the data was scaled during the transform. For example, ifBLK_EXP has a value of 00101 = 5, this means the output data (XK_RE, XK_IM) was scaled by 5 bits(shifted right by 5 bits), or in other words, was divided by 32, to fully utilize the available dynamicrange of the output data path without overflowing.

www.xilinx.com DS260 April 2, 2007Product Specification

Page 17: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

Calculator for Transform Cycles

When the FFT LogiCORE is generated, the CORE Generator creates a file in the project directory calledxfft_v4_1_timing_calculator_<instance_name>.vhd, where <instance_name> is thename entered in the Component Name field in the FFT LogiCORE GUI. When this file is compiled andsimulated in a simulator, such as ModelSim®, it reports the number of cycles for a transform of the gen-erated core, at every allowed transform length, i.e., based on the values of the parameters in the GUI.The transform time is simply this figure divided by the system clock. For example, if the transformcycles figure is 256 and the core is to be run at 100 MHz, the transform time will be 256/100M = 2.56 μs.The number of cycles reported is the minimum number of cycles between START pulses. For Burst I/Oarchitectures, this transform time is equal to the latency, but not for the pipelined architecture. This cal-culator is for information only. It is recommended that the handshake signals be used to control transferdata to and from the core.

The transform cycle calculator depends upon functions in the library XilinxCoreLib. This library mustbe mapped in the simulator for the calculator to compile. Here is an example of the commands requiredto compile and simulate the file for ModelSim, for an instance of the core called ’r2_fft’:

vlib work

vcom -work work xfft_v4_1_timing_calculator_r2_fft.vhd

vsim work.xfft_v4_1_timing_calculator_r2_fft

run -all

Timing for Pipelined Streaming I/O

Asserting START starts the data loading phase, which will immediately flow into the transform calcu-lation phase and then the data unloading phase. Pulsing START once will allow the transform calcula-tion for a single frame. Pulsing START every N clock cycles will allow continuous data processing.Alternatively, holding START High will also allow continuous data processing (Figure 7). START isignored except when the core can begin loading a new frame, i.e., when no data is being loaded, or thelast value in the data frame is being loaded. If no NFFT_WE, FWD_INV_WE, or SCALE_SCH_WEwere asserted before the initial START, then the defaults will be used. This architecture can also supportnon-continuous data streams (Figure 8). Simply assert START at any time to begin data loading. Afterthe data frame is loaded, the core will proceed to calculate the transform and then output the results.Note that Figure 8 is intended to show the timing of entire frames. It does not show the small skewsbetween signals which occur at the start and end of frames.

Input data (XN_RE, XN_IM) corresponding to a certain XN_INDEX should arrive three clock cycleslater than the XN_INDEX it matches (Figure 9). In this way, XN_INDEX can be used to address externalmemory or a frame buffer storing the input data. RFD will remain High with XN_INDEX during theloading phase when it is valid to input data.

BUSY will go High while the core is calculating the transform. DONE will go High when calculation iscomplete. EDONE will go High one cycle before that, i.e., during the last cycle of the calculation phase.The cycle in which DONE goes High, the core begins unloading. During the unloading phase, whilevalid output results are present on XK_RE/XK_IM, DV (Data Valid) will be High. During unloading,XK_INDEX will correspond to the XK_RE/XK_IM being presented.

l 2, 2007 www.xilinx.com 17ecification

Page 18: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

18

Figure Top x-ref 7

Figure 7: Timing for Continuous Streaming Data

ce dvclk

sclr

nfft

nfft_

we

fwd_

inv_

we

fwd_

inv

scal

e_sc

h

scal

e_sc

h_w

e

star

t

xn_r

e

xn_i

m rfd

busy

edon

e

done

xk_r

e

xk_i

m

xk_i

ndex

ovflo

xn_i

ndex

xip2

22

xk(0

)xk

(N-1

)xk

(0)

xk(N

-1)

xk(0

)xk

(N-1

)xk

(0)

xk(N

-1)

xk(0

)

xk(0

)

N-3

N-2

N-1

0000

N-1

00

xn(0

)

xn(0

)

0001

0203

N-1

00

N c

ycle

s N c

ycle

s

www.xilinx.com DS260 April 2, 2007Product Specification

Page 19: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

Figure Top x-ref 8

Figure 8: Timing for Non-Continuous Data Stream

Figure Top x-ref 9

Figure 9: Beginning of Data Frame

start

xn_im

xn_index

busy processing Frame A

load data Frame A load data Frame B

load data Frame A load data Frame B

processing Frame B

unload Frame A unload Frame B

unload Frame A unload Frame B

0 . . . N-10 . . . N-1

0 . . . N-10 . . . N-1

dv

xn_re

xn_im

xn_index

xn_re

rfd

xip223

Note: All transitions are synchronous with the rising edge of the clock.

ce

dv

clk

sclr

nfft_we

fwd_inv_we

fwd_inv

scale_sch_we

start

xn_re

xn_im

rfd

busy

edone

done

xn_index 01 02 03 04 05 06 0700

xip224

xn_re(0) xn_re(1)

xn_im(2)

xn_re(2) xn_re(3) xn_re(4)

xn_im(3) xn_im(4)xn_im(0) xn_im(1)

nfft

0 or 1

scale_sch scaling

pt size

l 2, 2007 www.xilinx.com 19ecification

Page 20: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

20

Timing for Radix-4 Burst I/O, Radix-2 Burst I/O, and Radix-2-Lite Burst I/O

The START signal begins the data loading phase, which leads directly to the calculation phase. Start isignored except when the core can begin loading a new frame, i.e., when the core is idle or in its last cycleof calculation (bit-reversed output) or unloading (natural order output).

Input data (XN_RE, XN_IM) corresponding to a certain XN_INDEX should arrive three clock cycleslater than the XN_INDEX it matches (Figure 10). In this way, XN_INDEX can be used to address exter-nal memory or a frame buffer storing the input data. RFD will remain High with XN_INDEX duringthe loading phase when it is valid to input data.

BUSY will go High while the core is calculating the transform. DONE will go High when calculation iscomplete. EDONE will go High one cycle before that, i.e., during the last cycle of the calculation phase.

After START is asserted and the data is loaded and processed, two options are available to unload data:

• If Natural Output Ordering was selected: To output the data in natural order, UNLOAD should be asserted (Figure 11). Note that Figure 11 is intended to show the timing of entire frames. It does not show the small skews between signals which occur at the start and end of frames and does not show the length of each phase of the transform to scale. The processing time may be much longer than the time required to input or output a frame. UNLOAD can be asserted any time from when EDONE goes High. UNLOAD is ignored except when the core can begin unloading. In addition to using pulses, START and UNLOAD can be tied High (Figure 12). In this case, the core will continuously load, process, and unload data.

• If Bit/Digit Reversed Output Ordering was selected: To output data in bit/digit reversed order, the user should assert START again (Figure 13). While the next frame of data is loaded, the results will be presented in bit/digit reversed order at the same time (Figure 12). START can be asserted any time from when EDONE goes High. If START is tied High, the core will continuously load/unload then process, load/unload then process, and so on.

DV remains High during data unloading in both cases.

There is a latency of k CLK cycles after triggering an unload with UNLOAD or START before the outputdata XK_RE/XK_IM is presented. This latency varies as a function of several core parameters, but theoutput data is qualified by DV(Data valid) and XK_INDEX, so should be considered as a handshake.

www.xilinx.com DS260 April 2, 2007Product Specification

Page 21: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

Figure Top x-ref 10

Figure 10: Unload Output Results in Natural Order

xip226

ce

dv

rfd

01 0200

xk_im(0) xk_im(1) xk_im(2)

xk_re(0) xk_re(1) xk_re(2)

clk

sclr

nfft

nfft_we

fwd_inv_we

fwd_inv

scale_sch

scale_sch_we

start

xn_re

xn_im

unload

busy

edone

done

xk_re

xk_im

xk_index

blk_exp

xn_index

00

00

blk exp

l 2, 2007 www.xilinx.com 21ecification

Page 22: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

22

Figure Top x-ref 11

Figure 11: Timing for Burst I/O Solutions with Natural Order Output

ce

star

t

xn_i

m

unlo

ad

xn_i

ndex

busy

proc

essi

ng F

ram

e A

unlo

ad F

ram

e A

unlo

ad F

ram

e B

unlo

ad F

ram

e A

unlo

ad F

ram

e B

proc

essi

ng F

ram

e B

load

Fra

me

A

load

Fra

me

A

load

Fra

me

B

load

Fra

me

B

dv

xk_r

e

xk_i

m

xn_i

ndex

xn_r

e

rfd

Not

e: A

ll tr

ansi

tions

are

syn

chro

nous

with

the

risi

ng e

dge

of th

e cl

ock.

xip2

25

0

.

. .

N-1

0

.

. .

N-1

0

.

. .

N-1

0

.

. .

N-1

www.xilinx.com DS260 April 2, 2007Product Specification

Page 23: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

Figure Top x-ref 12

Figure 12: Unloading Results in Bit/Digit Reversed Order

xip2

28

ce dvclk

sclr

nfft_

we

fwd_

inv_

we

fwd_

inv

scal

e_sc

h_w

e

star

t

xn_r

e

xn_i

m rfd

busy

edon

e

done

xn_i

ndex

unlo

adnfft

scal

e_sc

h

xk_r

e

xk_i

m

xk_i

ndex

xk_r

exk

_re

xk_r

e

00

xn_r

e(0)

xn_r

e(1)

xn_r

e(2)

xn_r

e(3)

xn_r

e(4)

0 or

1

scal

ing

08xn_r

e(5)

xn_r

e(6)

xn_i

m(3

)xn

_im

(4)

xn_i

m(5

)xn

_im

(6)

09

xk_i

mxk

_im

xk_i

m

1020

0100

0203

0405

0607

digi

t-re

vers

ed o

rder

xn_i

m(0

)xn

_im

(1)

xn_i

m(2

)

l 2, 2007 www.xilinx.com 23ecification

Page 24: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

24

Figure Top x-ref 13

Figure 13: Unload Results in Bit/Digit Reversed Order

xk(0

)xk

xk(0

)

00

busyclk

star

t

rfd dv

edon

e

done

xk_r

exk

(0)

xk

xkxk

xkxk

xkxk

xkxkxk

(0)

xk_i

m

xk_i

ndex

xn_i

ndex

1000

0000

0102

0307

N-1

0102

0304

0506

0708

00

xn(N

-4)

xn(N

-3)

xn(N

-2)

xn(N

-1)

xn_r

exn

(0)

xn(0

)

xn(N

-4)

xn(0

)xn

_im

xn(0

)x(

N-3

)xn

(N-2

)xn

(N-1

)

Inpu

t of d

ata

fram

e B

Dig

it-re

vers

ed o

utpu

t of p

revi

ousl

y en

tere

d fra

me

AD

igit-

reve

rsed

out

put

of d

ata

fram

e B

Inpu

t of d

ata

fram

e C

xip2

27

www.xilinx.com DS260 April 2, 2007Product Specification

Page 25: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

Performance and Resource UsageThe following tables list the resource usage and transform time for a selected set of parameters. Thiscore does not use placement constraints, hence, allowing Place and Route (PAR) full flexibility. The slicecount, block RAM count, and XtremeDSP slice/embedded 18-bit x18-bit multiplier count is listed. Theslice count can vary depending on the options used when running MAP. The maximum clock fre-quency is listed next to the transform time. For Pipelined Streaming I/O, the transform time is the num-ber of clock cycles or the number of microseconds necessary to process one frame of data after theinitial startup latency. For Radix-4 and Radix-2 Burst I/O architectures, a data load + transform time isquoted; this is the time necessary to load the input data and then calculate the FFT, and does notinclude time to unload the results. For each FFT architecture and chip family, a second table is includedwith resource usage numbers for some commonly used parameters. The following architectures arerepresented:

• Virtex-5 Family

• Virtex-4 Family

• Spartan-3E Family

• Virtex-II Pro Family

The maximum clock frequency for each test was determined iteratively. For the determination of max-imum frequency, the core was generated with double registers on each input and output. The registersdirectly connected to the core run on the core clock, whereas the outer registers run off a separate clock.This ensures that all paths in the core are included in the timing constraint without artificially distort-ing the design to fit the chip. The slowest speed grade is used for each family. The parameters used formap and par are as follows:

map -pr b -ol high

par -pl high -rl high

The slice count can typically be reduced from the figures shown by the use of the -c argument to MAP(packing factor); however, this will typically reduce the maximum clock frequency achievable too.

All Virtex and Spartan cases were run using the lowest speed grade.

Virtex-5 Family

Table 6 through Table 13 include performance and resource usage numbers for Virtex-5 FPGAs. All theFFTs use scaled fixed-point arithmetic with truncation after the butterfly. The point size is not run-timeconfigurable, and none of the optional pins (CE, SCLR, OVFLO) are used. The input data and phasefactor widths are 16 bits unless otherwise specified. (The input data width and phase factor width areset to the same value, but that is not a restriction of the FFT core.) The maximum amount of block RAMstorage is used, but some resource numbers are listed using the minimum amount of block RAM so thatthe full range is shown. The output ordering is assumed to be bit/digit reversed except where naturalorder is explicitly stated. Some numbers are shown with both Optimize for Speed options selected:Complex Multiplication and Butterfly Arithmetic.

l 2, 2007 www.xilinx.com 25ecification

Page 26: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

26

Table 6: Virtex-5 Family Pipelined Streaming I/O: Performance and Resource Utilization

Transform Length

Optimize for Speed

LUT6-FF pairs

Block RAMs

XtremeDSP Slices

Max Clock Frequency (MHz)

Transform Time

DeviceClock Cycles

Time (µs)

256 yes 2027 3 30 432 256 0.59 5vsx35t

256 no 2676 3 12 418 256 0.61 5vsx35t

1024 yes 2609 4 40 411 1024 2.49 5vsx35t

1024 no 3446 4 16 403 1024 2.54 5vsx35t

8192 yes 3684 13 58 347 8192 23.6 5vsx35t

8192 no 4829 13 24 361 8192 22.7 5vsx35t

Table 7: Virtex-5 Family Pipelined Streaming I/O: Resource Utilization

Transform Length

Input Data andPhase Factor

Width

Number of Stages using Block Ram

Output Ordering

LUT6-FF Pairs Block RAMsXtremeDSP

Slices

256 24 3 bit reversed 3760 4 24

256 16 0 bit reversed 2826 0 12

256 16 3 natural 2680 3 12

1024 24 5 bit reversed 4810 7 32

1024 16 1 bit reversed 4118 1 16

1024 16 5 natural 3399 6 16

8192 24 8 bit reversed 6682 20 48

8192 16 4 bit reversed 5343 10 24

8192 16 8 natural 4735 27 24

Table 8: Virtex-5 Family Radix-4 Burst I/O: Performance and Resource Utilization

Transform Length

Optimize for Speed

LUT6-FF Pairs

Block RAMs

XtremeDSP Slices

Max Clock Frequency

(MHz)

Data Load + Transform Time

DeviceClock Cycles

Time (µs)

256 yes 1616 4 20 390 592 1.52 5vsx35t

256 no 1636 4 12 383 576 1.50 5vsx35t

1024 yes 1718 4 20 383 2404 6.28 5vsx35t

1024 no 1719 4 12 383 2384 6.22 5vsx35t

8192 yes 1992 11 20 383 22668 59.20 5vsx35t

8192 no 2010 11 12 375 22640 60.37 5vsx35t

www.xilinx.com DS260 April 2, 2007Product Specification

Page 27: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

Table 9: Virtex-5 Family Radix-4 Burst I/O: Resource Utilization

Transform Length

Input Data andPhase Factor

Width

Data and Phase Factor

Memory

Output Ordering

LUT6-FF Pairs Block RAMsXtremeDSP

Slices

256 24 block RAM digit reversed 2419 7 24

256 16distributed

RAMdigit reversed 1857 0 12

256 16 block RAM natural 1669 4 12

1024 24 block RAM digit reversed 2543 7 24

1024 16distributed

RAMdigit reversed 3337 0 12

1024 16 block RAM natural 1765 4 12

8192 24 block RAM digit reversed 2771 17 24

8192 16 block RAM natural 2067 11 12

Table 10: Virtex-5 Family Radix-2 Burst I/O: Performance and Resource Utilization

Transform Length

Optimize for Speed

LUT6-FF Pairs

Block RAMs

XtremeDSP Slices

Max Clock Frequency

(MHz)

Data Load + Transform Time

DeviceClock Cycles

Time (µs)

256 yes 806 2 6 418 1408 3.36 5vsx35t

256 no 809 2 4 446 1392 3.12 5vsx35t

1024 yes 865 2 6 439 6304 14.36 5vsx35t

1024 no 871 2 4 432 6284 14.55 5vsx35t

8192 yes 980 9 6 375 61648 164.39 5vsx35t

8192 no 988 9 4 361 61622 170.70 5vsx35t

Table 11: Virtex-5 Family Radix-2 Burst I/O: Resource Utilization

Transform Length

Input Data and Phase Factor

Width

Data and Phase Factor

Memory

Output Ordering

LUT6-FF Pairs Block RAMsXtremeDSP

Slices

256 24 block RAM bit reversed 1149 3 8

256 16distributed

RAMbit reversed 1035 0 4

256 16 block RAM natural 834 2 4

1024 24 block RAM bit reversed 1221 3 8

1024 16distributed

RAMbit reversed 2123 0 4

1024 16 block RAM natural 900 2 4

8192 24 block RAM bit reversed 1308 13 8

8192 16 block RAM natural 1003 9 4

l 2, 2007 www.xilinx.com 27ecification

Page 28: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

28

Table 12: Virtex-5 Family Radix-2-Lite Burst I/O: Performance and Resource Utilization

Transform Length

Optimize for Speed

LUT6-FF Pairs

Block RAMs

XtremeDSP Slices

Max Clock Frequency

(MHz)

Data Load + Transform Time

DeviceClock Cycles

Time (µs)

256 yes 692 2 3 446 2330 5.22 5vsx35t

256 no 699 2 2 432 2328 5.39 5vsx35t

1024 yes 739 2 3 446 11290 25.31 5vsx35t

1024 no 751 2 2 432 11288 26.13 5vsx35t

8192 yes 844 9 3 418 114714 274.43 5vsx35t

8192 no 852 9 2 418 114712 274.43 5vsx35t

Table 13: Virtex-5 Family Radix-2-Lite Burst I/O: Resource Utilization

Transform Length

Input Data and Phase Factor

Width

Data and Phase Factor

Memory

Output Ordering

Slices Block RAMsXtremeDSP

Slices

256 24 block RAM bit reversed 944 2 4

256 16distributed

RAMbit reversed 1048 0 2

256 16 block RAM natural 690 2 2

1024 24 block RAM bit reversed 1023 3 4

1024 16distributed

RAMbit reversed 2033 0 2

1024 16 block RAM natural 759 2 2

8192 24 block RAM bit reversed 1090 13 4

8192 16 block RAM natural 863 9 2

Table 14: Virtex-5 Family Radix-2-Lite Burst I/O: Multichannel Performance and Resource Utilization

Transform Length

ChannelsOptimize for

SpeedLUT6-FF

PairsBlock RAMs

XtremeDSP Slices

Max Clock Frequency

(MHz)Device

256 1 yes 696 2 3 418 5vsx95t

256 2 yes 1067 2 6 391 5vsx95t

256 1 no 705 2 2 412 5vsx95t

256 2 no 1082 2 4 426 5vsx95t

1024 4 yes 745 2 3 384 5vsx95t

1024 12 yes 1116 3 6 390 5vsx95t

1024 4 no 753 2 2 411 5vsx95t

1024 12 no 1134 3 4 390 5vsx95t

8192 1 yes 848 9 3 397 5vsx95t

www.xilinx.com DS260 April 2, 2007Product Specification

Page 29: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

Virtex-4 Family

Table 15 through Table 22 include performance and resource usage numbers for Virtex-4 FPGAs. Allthe FFTs use scaled fixed-point arithmetic with truncation after the butterfly. The point size is notrun-time configurable, and none of the optional pins (CE, SCLR, OVFLO) are used. The input data andphase factor widths are 16 bits unless otherwise specified. (The input data width and phase factorwidth are set to be the same value, but that is not a restriction of the FFT core.) The maximum amountof block RAM storage is used, but some resource numbers are listed using the minimum amount ofblock RAM so that the full range is shown. The output ordering is assumed to be bit/digit reversedexcept where natural order is explicitly stated. Some numbers are shown with both Optimize for Speedoptions selected: Complex Multiplication and Butterfly Arithmetic.

8192 2 yes 1217 17 6 375 5vsx95t

8192 1 no 856 9 2 383 5vsx95t

8192 2 no 1232 17 4 375 5vsx95t

Table 15: Virtex-4 Family Pipelined Streaming I/O: Performance and Resource Utilization

Transform Length

Optimize for Speed

SlicesBlock RAMs

XtremeDSP Slices

Max Clock Frequency

(MHz)

Transform Time

DeviceClock Cycles

Time (µs)

256 yes 1131 4 34 354 256 0.70 4vsx25

256 no 1535 4 12 347 256 0.88 4vsx25

1024 yes 1489 7 46 354 1024 3.16 4vsx25

1024 no 2014 7 16 347 1024 3.42 4vsx25

8192 yes 2244 24 66 341 8192 24.4 4vsx25

8192 no 2927 24 24 329 8192 26.95 4vsx25

Table 16: Virtex-4 Family Pipelined Streaming I/O: Resource Utilization

Transform Length

Input Data andPhase Factor

Width

Number of Stages using Block Ram

Output Ordering

Slices Block RAMsXtremeDSP

Slices

256 24 3 bit reversed 3065 7 36

256 16 0 bit reversed 2249 0 12

256 16 3 natural 1595 5 12

1024 24 5 bit reversed 4025 12 48

1024 16 1 bit reversed 3760 1 16

1024 16 5 natural 2081 11 16

Table 14: Virtex-5 Family Radix-2-Lite Burst I/O: Multichannel Performance and Resource Utilization

Transform Length

ChannelsOptimize for

SpeedLUT6-FF

PairsBlock RAMs

XtremeDSP Slices

Max Clock Frequency

(MHz)Device

l 2, 2007 www.xilinx.com 29ecification

Page 30: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

30

8192 24 8 bit reversed 5708 37 72

8192 16 4 bit reversed 4468 19 24

8192 16 8 natural 3130 53 24

Table 17: Virtex-4 Family Radix-4 Burst I/O: Performance and Resource Utilization

Transform Length

Optimize for Speed

SlicesBlock RAMs

XtremeDSP Slices

Max Clock Frequency

(MHz)

Data Load + Transform Time

DeviceClock Cycles

Time (µs)

256 yes 1110 7 28 366 592 1.62 4vsx25

256 no 1423 7 9 285 592 2.08 4vsx25

1024 yes 1176 7 28 385 2404 6.24 4vsx25

1024 no 1505 7 9 285 2404 8.44 4vsx25

8192 yes 1396 22 28 372 22668 60.94 4vsx25

8192 no 1722 22 9 291 22668 77.90 4vsx25

Table 18: Virtex-4 Family Radix-4 Burst I/O: Resource Utilization

Transform Length

Input Data andPhase Factor

Width

Data and Phase Factor

Memory

Output Ordering

Slices Block RAMsXtremeDSP

Slices

256 24 block RAM digit reversed 2394 11 36

256 16distributed

RAMdigit reversed 2336 0 9

256 16 block RAM natural 1440 7 9

1024 24 block RAM digit reversed 2506 11 36

1024 16distributed

RAMdigit reversed 5215 0 9

1024 16 block RAM natural 1528 7 9

8192 24 block RAM digit reversed 2688 33 36

8192 16 block RAM natural 1742 22 9

Table 16: Virtex-4 Family Pipelined Streaming I/O: Resource Utilization (Continued)

Transform Length

Input Data andPhase Factor

Width

Number of Stages using Block Ram

Output Ordering

Slices Block RAMsXtremeDSP

Slices

www.xilinx.com DS260 April 2, 2007Product Specification

Page 31: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

Table 19: Virtex-4 Family Radix-2 Burst I/O: Performance and Resource Utilization

Transform Length

Optimize for Speed

SlicesBlock RAMs

XtremeDSPSlices

Max Clock Frequency

(MHz)

Data Load + Transform Time

DeviceClock Cycles

Time (µs)

256 yes 492 3 8 366 1408 3.85 4vsx25

256 no 625 3 3 360 1424 3.96 4vsx25

1024 yes 549 3 8 372 6304 16.94 4vsx25

1024 no 684 3 3 372 6324 17.00 4vsx25

8192 yes 627 18 8 354 61648 174.15 4vsx25

8192 no 759 18 3 329 61674 187.46 4vsx25

Table 20: Virtex-4 Family Radix-2 Burst I/O: Resource Utilization

Transform Length

Input Data and Phase Factor

Width

Data and Phase Factor

Memory

Output Ordering

Slices Block RAMsXtremeDSP

Slices

256 24 block RAM bit reversed 996 5 12

256 16distributed

RAMbit reversed 1415 0 3

256 16 block RAM natural 637 3 3

1024 24 block RAM bit reversed 1065 5 12

1024 16distributed

RAMbit reversed 3903 0 3

1024 16 block RAM natural 699 3 3

8192 24 block RAM bit reversed 1135 25 12

8192 16 block RAM natural 777 18 3

Table 21: Virtex-4 Family Radix-2-Lite Burst I/O: Performance and Resource Utilization

Transform Length

Optimize for Speed

SlicesBlock RAMs

XtremeDSP Slices

Max Clock Frequency

(MHz)

Data Load + Transform Time

DeviceClock Cycles

Time (µs)

256 yes 402 2 5 397 2330 5.86 4vsx25

256 no 406 2 2 360 2328 6.47 4vsx25

1024 yes 424 3 5 397 11290 28.44 4vsx25

1024 no 428 3 2 366 11288 30.84 4vsx25

8192 yes 486 18 5 379 114714 304.67 4vsx25

8192 no 488 18 2 366 114712 313.42 4vsx25

l 2, 2007 www.xilinx.com 31ecification

Page 32: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

32

Spartan-3E Family

Table 24 through Table 29 include performance and resource usage numbers for Spartan-3E. All theFFTs use scaled fixed-point arithmetic with truncation after the butterfly. The point size is not run-timeconfigurable, and none of the optional pins (CE, SCLR, OVFLO) are used. The input data and phasefactor widths are 16 bits unless otherwise specified. (The input data width and phase factor width areset to be the same value, but that is not a restriction of the FFT core.) The maximum amount of blockRAM storage is used, but some resource numbers are listed using the minimum amount of block RAMso that the full range is shown. The output ordering is assumed to be bit/digit reversed except wherenatural order is explicitly stated.

Table 22: Virtex-4 Family Radix-2-Lite Burst I/O: Resource Utilization

Transform Length

Input Data and Phase Factor

Width

Data and Phase Factor

Memory

Output Ordering

Slices Block RAMsXtremeDSP

Slices

256 24 block RAM bit reversed 583 3 8

256 16distributed

RAMbit reversed 1207 0 2

256 16 block RAM natural 408 2 2

1024 24 block RAM bit reversed 623 5 8

1024 16distributed

RAMbit reversed 3530 0 2

1024 16 block RAM natural 430 3 2

8192 24 block RAM bit reversed 674 27 8

8192 16 block RAM natural 492 18 2

Table 23: Virtex-4 Family Radix-2-Lite Burst I/O: Multichannel Performance and Resource Utilization

Transform Length

ChannelsOptimize for

SpeedSlices

Block RAMs

XtremeDSP Slices

Max Clock Frequency

(MHz)Device

256 1 yes 402 2 4 385 4vsx55

256 2 yes 610 3 8 400 4vsx55

256 1 no 406 2 2 348 4vsx55

256 2 no 618 3 4 354 4vsx55

1024 1 yes 424 3 4 391 4vsx55

1024 2 yes 635 5 8 391 4vsx55

1024 1 no 428 3 2 372 4vsx55

1024 2 no 643 5 4 352 4vsx55

8192 1 yes 487 18 4 381 4vsx55

8192 2 yes 686 34 8 355 4vsx55

8192 1 no 488 18 2 372 4vsx55

8192 2 no 694 34 4 347 4vsx55

www.xilinx.com DS260 April 2, 2007Product Specification

Page 33: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

Table 24: Spartan-3E Family Pipelined Streaming I/O: Performance and Resource Utilization

Transform Length

Optimize for speed

SlicesBlock RAMs

18x18 Mults

Max Clock Frequency

(MHz)

Transform Time

DeviceClock Cycles

Time (µs)

256 no 1988 4 12 189 256 1.35 xc3s1600e

1024 no 2630 7 16 181 1024 5.66 sx3s1600e

8192 no 3741 24 24 174 8192 47.08 xc3s1600e

Table 25: Spartan-3E Family Pipelined Streaming I/O: Resource Utilization

Transform Length

Input Data and Phase Factor

Width

Number of Stages using Block Ram

Output Ordering

Slices Block RAMs 18x18 Mults

256 24 3 bit reversed 3574 7 36

256 16 0 bit reversed 2645 0 12

256 16 3 natural 2048 5 12

1024 16 1 bit reversed 4315 1 16

1024 16 5 natural 2697 11 16

8192 16 4 bit reversed 5226 18 24

Table 26: Spartan-3E Family Radix-4 Burst I/O: Performance and Resource Utilization

Transform Length

Optimize for speed

SlicesBlock RAMs

18x18 Mults

Max Clock Frequency

(MHz)

Data Load + Transform Time

DeviceClock Cycles

Time (µs)

256 no 1514 7 9 181 576 3.18 xc3s500e

1024 no 1577 7 9 181 2384 13.17 xc3s500e

8192 no 1787 22 9 166 22640 136.38 xc3s1600e

Table 27: Spartan-3E Family Radix-4 Burst I/O: Resource Utilization

Transform Length

Input Data and Phase Factor

Width

Data and Phase Factor

Memory

Output Ordering

Slices Block RAMs 18x18 Mults

256 24 block RAM digit reversed 2714 11 36

256 16distributed

RAMdigit reversed 2417 0 9

256 16 block RAM natural 1534 7 9

1024 24 block RAM digit reversed 2819 11 36

1024 16distributed

RAMdigit reversed 5308 0 9

l 2, 2007 www.xilinx.com 33ecification

Page 34: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

34

1024 16 block RAM natural 1598 7 9

8192 24 block RAM digit reversed 2994 33 36

8192 16 block RAM natural 1815 22 9

Table 28: Spartan-3E Family Radix-2 Burst I/O: Performance and Resource Utilization

Transform Length

Slices Block RAMs 18x18 MultsMax Clock Frequency

(MHz)

Data Load + Transform TimeDevice

Clock Cycles Time (µs)

256 642 3 3 197 1392 7.07 xc3s500e

1024 697 3 3 197 6284 31.90 xc3s500e

8192 774 18 3 189 61622 326.04 sx3s500e

Table 29: Spartan-3E Family Radix-2 Burst I/O: Resource Utilization

Transform Length

Input Data and Phase Factor

Width

Data and Phase Factor

Memory

Output Ordering

Slices Block RAMs 18x18 Mults

256 24 block RAM bit reversed 1057 5 12

256 16distributed

RAMbit reversed 1424 0 3

256 16 block RAM natural 650 3 3

1024 24 block RAM bit reversed 1122 5 12

1024 16distributed

RAMbit reversed 3912 0 3

1024 16 block RAM natural 706 3 3

8192 24 block RAM bit reversed 1270 25 12

8192 16 block RAM natural 784 18 3

Table 30: Spartan-3E Family Radix-2-Lite Burst I/O: Performance and Resource Utilization

Transform Length

Slices Block RAMs 18x18 MultsMax Clock Frequency

(MHz)

Data Load + Transform TimeDevice

Clock Cycles Time (µs)

256 492 2 2 197 2327 11.81 xc3s250e

1024 514 3 2 189 11287 59.72 xc3s250e

8192 577 18 2 189 114711 606.94 sx3s500e

Table 27: Spartan-3E Family Radix-4 Burst I/O: Resource Utilization (Continued)

Transform Length

Input Data and Phase Factor

Width

Data and Phase Factor

Memory

Output Ordering

Slices Block RAMs 18x18 Mults

www.xilinx.com DS260 April 2, 2007Product Specification

Page 35: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

Virtex-II Pro Family

Table 32 through Table 40 include performance and resource usage numbers for Virtex-II Pro. All theFFTs use scaled fixed-point arithmetic with truncation after the butterfly. The point size is not run-timeconfigurable, and none of the optional pins (CE, SCLR, OVFLO) are used. The input data and phasefactor widths are 16 bits unless otherwise specified. (The input data width and phase factor width areset to be the same value, but that is not a restriction of the FFT core.) The maximum amount of blockRAM storage is used, but some resource numbers are listed using the minimum amount of block RAMso that the full range is shown. The output ordering is assumed to be bit/digit reversed except wherenatural order is explicitly stated.

Table 31: Spartan-3E Family Radix-2-Lite Burst I/O: Resource Utilization

Transform Length

Input Data and Phase Factor

Width

Data and Phase Factor

Memory

Output Ordering

Slices Block RAMs 18x18 Mults

256 24 block RAM bit reversed 778 3 8

256 16distributed

RAMbit reversed 1294 0 2

256 16 block RAM natural 494 2 2

1024 24 block RAM bit reversed 815 5 8

1024 16distributed

RAMbit reversed 3729 0 2

1024 16 block RAM natural 514 3 2

8192 24 block RAM bit reversed 865 27 8

8192 16 block RAM natural 579 18 2

Table 32: Virtex-II Pro Family Pipelined Streaming I/O: Performance and Resource Utilization

Transform Length

Optimize for

speedSlices

Block RAMs

18x18 Mults

Max Clock Frequency

(MHz)

Transform Time

DeviceClock Cycles

Time (µs)

256 no 1993 4 12 225 256 1.14 xc2vp20

1024 no 2620 7 16 235 1024 4.36 xc2vp20

8192 no 3724 24 24 207 8192 39.57 xc2vp20

Table 33: Virtex-II Pro Family Pipelined Streaming I/O: Resource Utilization

Transform Length

Input Data and Phase Factor

Width

Number of Stages using Block Ram

Output Ordering

Slices Block RAMs 18x18 Mults

256 24 3 bit reversed 3567 7 36

256 16 0 bit reversed 2655 0 12

256 16 3 natural 2053 5 12

l 2, 2007 www.xilinx.com 35ecification

Page 36: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

36

1024 16 5 natural 2687 11 16

1024 16 1 bit reversed 4375 1 16

8192 16 4 bit reversed 5231 18 24

Table 34: Virtex-II Pro Family Radix-4 Burst I/O: Performance and Resource Utilization

Transform Length

Optimize for

speedSlices

Block RAMs

18x18 Mults

Max Clock Frequency

(MHz)

Data Load + Transform TimeDevice

Clock Cycles Time (µs)

256 no 1519 7 9 225 572 2.54 xc2vp4

1024 no 1583 7 9 216 2379 11.01 xc2vp4

8192 no 1799 22 9 225 22633 100.59 xc2vp4

Table 35: Virtex-II Pro Family Radix-4 Burst I/O: Resource Utilization:

Transform Length

Input Data and Phase Factor

Width

Data and Phase Factor Memory

Output Ordering Slices Block RAMs 18x18 Mults

256 24 block RAM digit reversed 2718 11 36

256 16 distributed RAM digit reversed 2379 0 9

256 16 block RAM natural 1534 7 9

1024 24 block RAM digit reversed 2820 11 36

1024 16 distributed RAM digit reversed 5291 0 9

1024 16 block RAM natural 1604 7 9

8192 24 block RAM digit reversed 2994 33 36

8192 16 block RAM natural 1824 22 9

Table 36: Virtex-II Pro Family Radix-2 Burst I/O: Performance and Resource Utilization

Transform Length

SlicesBlock RAMs

18x18 Mults

Max Clock Frequency

(MHz)

Data Load + Transform TimeDevice

Clock Cycles Time (µs)

256 643 3 3 253 1384 5.47 xc2vp4

1024 697 3 3 253 6274 24.80 xc2vp4

8192 774 18 3 244 61609 252.50 xc2vp4

Table 33: Virtex-II Pro Family Pipelined Streaming I/O: Resource Utilization

Transform Length

Input Data and Phase Factor

Width

Number of Stages using Block Ram

Output Ordering

Slices Block RAMs 18x18 Mults

www.xilinx.com DS260 April 2, 2007Product Specification

Page 37: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

Table 37: Virtex-II Pro Family Radix-2 Burst I/O: Resource Utilization

Transform Length

Input Data and Phase Factor

Width

Data and Phase Factor Memory

Output Ordering Slices Block RAMs 18x18 Mults

256 24 block RAM bit reversed 1058 5 12

256 16 distributed RAM bit reversed 1412 0 3

256 16 block RAM natural 649 3 3

1024 24 block RAM bit reversed 1122 5 12

1024 16 distributed RAM bit reversed 4161 0 3

1024 16 block RAM natural 705 3 3

8192 24 block RAM bit reversed 1271 25 12

8192 16 block RAM natural 783 18 3

Table 38: Virtex-II Pro Family Radix-2-Lite Burst I/O: Performance and Resource Utilization

Transform Length

SlicesBlock RAMs

18x18 Mults

Max Clock Frequency

(MHz)

Data Load + Transform TimeDevice

Clock Cycles Time (µs)

256 492 2 2 244 2326 9.53 xc2vp4

1024 515 3 2 253 11286 44.61 xc2vp4

8192 577 18 2 253 114710 453.40 xc2vp4

Table 39: Virtex-II Pro Family Radix-2-Lite Burst I/O: Resource Utilization

Transform Length

Input Data and Phase Factor

Width

Data and Phase Factor Memory

Output Ordering Slices Block RAMs 18x18 Mults

256 24 block RAM bit reversed 778 3 8

256 16 distributed RAM bit reversed 1285 0 2

256 16 block RAM natural 494 2 2

1024 24 block RAM bit reversed 817 5 8

1024 16 distributed RAM bit reversed 4117 0 2

1024 16 block RAM natural 516 3 2

8192 24 block RAM bit reversed 865 27 8

8192 16 block RAM natural 581 18 2

l 2, 2007 www.xilinx.com 37ecification

Page 38: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

38

Dynamic Range Characteristics

The dynamic range characteristics are shown by performing slot noise tests. First, a frame of complexGaussian noise data samples is created. An FFT is taken to acquire the spectrum of the data. To createthe slot, a range of frequencies in the spectra is set to zero. To create the input slot noise data frame, theinverse FFT is taken, then the data is quantized to use the full input dynamic range. Because of thequantization, if a perfect FFT is done on the frame, the noise floor on the bottom of the slot will be non-zero. The Input Data figures, which basically represent the dynamic range of the input format, displaythis.

This slot noise input data frame is fed to the FFT core to see how shallow the slot becomes due to thefinite precision arithmetic. The depth of the slot shows the dynamic range of the FFT.

Figures 14 through 23 show the effect of input data width on the dynamic range. All FFTs have thesame bit width for both data and phase factors. Block floating point arithmetic is used with roundingafter the butterfly. The figures show the input data slot and the output data slot for bit widths of 24, 20,16, 12, and 8.

Table 40: Virtex-II Pro Family Radix-2-Lite Burst I/O: Multichannel Performance and Resource Utilization

Transform Length

ChannelsOptimize for

SpeedSlices

Block RAMs

XtremeDSP Slices

Max Clock Frequency

(MHz)Device

256 1 no 492 2 2 253 xc2vp100

256 2 no 815 3 4 253 xc2vp100

1024 1 no 515 3 2 244 xc2vp100

1024 2 no 843 5 4 244 xc2vp100

8192 1 no 577 18 2 235 xc2vp100

8192 2 no 915 34 4 235 xc2vp100

Figure Top x-ref 14

Figure 14: Input Data: 24 Bits

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT BinNumber

dB

www.xilinx.com DS260 April 2, 2007Product Specification

Page 39: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

Figure Top x-ref 15

Figure 15: FFT Core Results: 24 Bits

Figure Top x-ref 16

Figure 16: Input Data: 20 Bits

Figure Top x-ref 17

Figure 17: FFT Core Results: 20 Bits

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT Bin Number

dB

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT BinNumber

dB

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT Bin Number

dB

l 2, 2007 www.xilinx.com 39ecification

Page 40: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

40

Figure Top x-ref 18

Figure 18: Input Data: 16 Bits

Figure Top x-ref 19

Figure 19: FFT Core Results: 16 Bits

Figure Top x-ref 20

Figure 20: Input Data: 12 Bits

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT BinNumber

dB

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT Bin Number

dB

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT BinNumber

dB

www.xilinx.com DS260 April 2, 2007Product Specification

Page 41: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

There are several options available that also affect the dynamic range. Consider the arithmetic typeused.

Figure Top x-ref 21

Figure 21: FFT Core Results: 12 Bits

Figure Top x-ref 22

Figure 22: Input Data: 8 Bits

Figure Top x-ref 23

Figure 23: FFT Core Results: 8 Bits

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT Bin Number

dB

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT BinNumber

dB

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT Bin Number

dB

l 2, 2007 www.xilinx.com 41ecification

Page 42: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

42

Figures 24, 25, and 26 display the results of using unscaled, scaled (scaling of 1/1024), and block float-ing point. All three FFTs are 1024 point, Radix-4 transforms with 16-bit input, 16-bit phase factors, andconvergent rounding.

Figure Top x-ref 24

Figure 24: Full-Precision Unscaled Arithmetic

Figure Top x-ref 25

Figure 25: Scaled (scaling of 1/N) Arithmetic

Figure Top x-ref 26

Figure 26: Block Floating Point Arithmetic

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT Bin Number

dB

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT Bin Number

dB

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT Bin Number

dB

www.xilinx.com DS260 April 2, 2007Product Specification

Page 43: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

After the butterfly computation, the LSBs of the data path can be truncated or rounded. The effects ofthese options are shown below in Figures 27 and 28. Both transforms are 1024 points with 16-bit dataand phase factors using block floating point arithmetic.

For illustration purposes, the effect of point size on dynamic range is displayed Figures 29 through 31.The FFTs in these figures use 16-bit input and phase factors along with convergent rounding and blockfloating point arithmetic.

Figure Top x-ref 27

Figure 27: Convergent Rounding

Figure Top x-ref 28

Figure 28: Truncation

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT Bin Number

dB

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT Bin Number

dB

l 2, 2007 www.xilinx.com 43ecification

Page 44: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

44

Figure Top x-ref 29

Figure 29: 64-point Transform

Figure Top x-ref 30

Figure 30: 2048-point Transform

Figure Top x-ref 31

Figure 31: 8192-point Transform

10 20 30 40 50 60

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT Bin Number

dB

200 400 600 800 1000 1200 1400 1600 1800 2000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT Bin Number

dB

1000 2000 3000 4000 5000 6000 7000 8000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT Bin Number

dB

www.xilinx.com DS260 April 2, 2007Product Specification

Page 45: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

DS260 ApriProduct Sp

All of the preceding dynamic range plots show the result of a Radix-4 architecture. Figures 32 andFigure 33 show two plots for the Radix-2 architecture. Both use 16-bit input and phase factors alongwith convergent rounding and block floating point.

References1. J. W. Cooley and J. W. Tukey, An Algorithm for the Machine Computation of Complex Fourier Series,

Mathematics of Computation, Vol. 19, pp. 297-301, April 1965.2. J. G. Proakis and D. G. Manolakis, Digital Signal Processing Principles, Algorithms and Applications

Second Edition, Maxwell Macmillan International, New York, 1992.3. W. R. Knight and R. Kaiser, A Simple Fixed-Point Error Bound for the Fast Fourier Transform, IEEE

Trans. Acoustics, Speech and Signal Proc., Vol. 27, No. 6, pp. 615-620, December 1979.4. L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Prentice-Hall Inc.,

Englewood Cliffs, New Jersey, 1975.

Figure Top x-ref 32

Figure 32: 64-point Radix-2 Transform

Figure Top x-ref 33

Figure 33: 1024-point Radix-2 Transform

10 20 30 40 50 60

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT Bin Number

dB

100 200 300 400 500 600 700 800 900 1000

-140

-130

-120

-110

-100

-90

-80

-70

-60

-50

-40

-30

-20

-10

0

10

FFT Bin Number

dB

l 2, 2007 www.xilinx.com 45ecification

Page 46: DS260, Fast Fourier Transform v4 - Raman Research Institutedsp_ral/spectrometer_10p4/func_des/xfft.pdf · Fast Fourier Transform v4.1 2 DS260 April 2, 2007 Product Specification Theory

Fast Fourier Transform v4.1

46

Ordering InformationThe FFT core may be downloaded from the Xilinx IP Center for use with the Xilinx CORE Generatorv9.1i and higher. The Xilinx CORE Generator is bundled with all ISE™ Foundation Software packagesat no additional charge. Information about additional Xilinx LogiCORE modules is available on the Xil-inx IP Center.

To order Xilinx software, contact your local Xilinx sales representative.

Revision History

Date Version Revision

03/28/03 1.0 Xilinx release in new template.

07/14/03 2.0 Modified Figures 8 through 14, inclusive.

12/11/03 2.1 Updated to v2.1 release.

05/21/04 3.0 Updated to v3.0 release.

11/11/04 3.1Updated document to support core v3.1 release - updated performance and resource utilization tables for Virtex-II and Virtex-II Pro. Also added performance and resource utilization tables for Virtex-4.

8/31/05 3.2Updated documentation for v3.2 core release; updated performance and resource utilization tables; updated for ISE v7.1i SP4.

1/11/06 3.3 Corrected table 9, XtremeDSP Slices, row 3.

11/30/06 3.4 Updated to v4.0 release.

02/15/07 4.1 Updated to v4.1 release.

04/02/07 Added support for Spartan-3A DSP devices.

www.xilinx.com DS260 April 2, 2007Product Specification