
Features

• Highly parameterizable drop-in module for Virtex™, Virtex-E, Virtex-II, Virtex-II Pro, Virtex-4, Virtex-5, Spartan™-II, Spartan-IIE, Spartan-3, Spartan-3A/3AN/3A DSP, and Spartan-3E FPGAs

• High-performance finite impulse response (FIR), polyphase decimator, polyphase interpolator, half-band, half-band decimator, half-band interpolator, Hilbert transform and interpolated filter implementations

• Multiply-Accumulate (MAC) and Distributed Arithmetic (DA) architectures available

• Support for up to 256 sets of coefficients, with 2 to 1024 coefficients per set

• Signed or unsigned input data with 1- to 32-bit precision

• Signed or unsigned filter coefficients with 1- to 32-bit precision

• Up to 74-bit accumulator width (48-bit limit on DSP-enabled families)

• Support for up to 64 channels

• Interpolation and decimation factors of up to 64 generally and up to 1024 for single channel filters.

• Coefficient symmetry exploitation extended for MAC implementations on DSP capable families

• DA-based filters support both serial and parallel implementation

• MAC implementations use single or multiple MAC engines to achieve specified filter performance

• Data-flow-style core interface and control

• On-line coefficient reload capability

• User-selectable output rounding available in DSP-enabled families

• Incorporates Xilinx Smart-IP™ technology for maximum performance

• Use with Xilinx CORE Generator™ software v9.2i or later

General Description

The Xilinx LogiCORE™ IP FIR Compiler core provides a common interface for users to generate highly parameterizable, area-efficient, high-performance FIR filters utilizing either Multiply-Accumulate (MAC) or Distributed Arithmetic (DA) architectures. A wide range of filter types can be implemented in the Xilinx CORE Generator: single-rate, half-band, Hilbert transform and interpolated filters, in addition to multi-rate filters such as polyphase decimators and interpolators and half-band decimators and interpolators. Structure in the coefficient set is exploited to produce area-efficient FPGA implementations. Sufficient arithmetic precision is employed in the internal data-path to avoid the possibility of overflow.

The conventional single-rate FIR version of the core computes the convolution sum defined in Equation 1, where N is the number of filter coefficients.

The conventional tapped delay line realization of this inner-product calculation is shown in Figure 1.

Although the figure is a useful conceptualization of the computation performed by the core, the actual FPGA realization is quite different. Where a MAC realization is selected, one or more time-shared multiply-accumulate (MAC) functional units are used to service the N sum-of-product calculations in the filter. The core automatically determines the minimum number of MAC engines required to meet the user-specified throughput. Where a distributed arithmetic (DA) realization [1] [2] is selected, no explicit multipliers are employed in the design; only look-up tables (LUTs), shift registers, and a scaling accumulator are required.
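As a reference point for the convolution sum of Equation 1, the following Python sketch models the tapped delay line of Figure 1 in software. It is a behavioral illustration only, not the core's implementation; the function name `fir_direct` and the assumption that samples before the first input are zero are choices made for this example.

```python
def fir_direct(coeffs, samples):
    """Behavioral model of y(k) = sum_{n=0}^{N-1} a(n) * x(k - n).

    Assumes x(k) = 0 for k < 0 (the delay line starts cleared).
    """
    delay_line = [0] * len(coeffs)  # holds x(k), x(k-1), ..., x(k-N+1)
    outputs = []
    for x in samples:
        # Shift the new sample in at the head of the delay line.
        delay_line = [x] + delay_line[:-1]
        # Inner product of coefficients and sample history.
        outputs.append(sum(a * d for a, d in zip(coeffs, delay_line)))
    return outputs


if __name__ == "__main__":
    # An impulse input reproduces the coefficient sequence at the output.
    print(fir_direct([3, -1, 2], [1, 0, 0, 0, 0]))
```

Applying a unit impulse is a convenient check, because the output then steps through the coefficients a(0) to a(N-1) in order.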

FIR Compiler v3.2

DS534 October 10, 2007 Product Specification

$$y(k) = \sum_{n=0}^{N-1} a(n)\,x(k-n), \qquad k = 0, 1, \ldots \qquad \text{(Equation 1)}$$


© 2006-2007 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, the Brand Window, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.


Feature Support Matrix

Note that there are distinct implementation structures utilized within the FIR Compiler, with the choice being determined largely by device family and desired architecture. Feature support is not uniform across these structures, as indicated in Table 1 and Table 2. Distributed Arithmetic FIR filter implementations are currently available in all families except the Virtex-5 family. For MAC-based FIR filter implementations, two structures are available with the choice being dependent on the project device family. Older families that do not have DSP slices or Embedded Multipliers use an adder tree based structure, while those that have DSP slices (currently Virtex-4 and Virtex-5 families and the Spartan-3A DSP parts) and Embedded Multipliers available (Spartan-3 and Virtex-II families) implement the filter using a cascaded adder chain structure. The cascaded adder chain structure is particularly suited to families with the DSP slice as this exploits the capabilities of these advanced FPGA families. While the interface and operation of these two structures are broadly similar, any differences are indicated in this document. Support for the various features of the FIR Compiler core across different filter architectures and device families is summarized in Table 1.

Note: Customers should note from Table 1 the improved feature support for MAC-based filters in families with Embedded Multipliers. This has been achieved by using a different architecture than in previous versions. Hence, the latency of the core will also be different and customers should verify that the new latency meets their requirements.


Figure 1: Conventional Tapped Delay Line FIR Filter Representation

Table 1: Feature Support Matrix

| Feature | Distributed Arithmetic | Multiply-Accumulate (Virtex-5 FPGAs) | Multiply-Accumulate (Virtex-4, Spartan-3, Virtex-II FPGAs) | Multiply-Accumulate (other families) |
|---|---|---|---|---|
| Number of coefficients | 2–1024 | 2–1024 | 2–1024 | 2–1024 |
| Coefficient width (Note 1) | 1–32 | 2–25 | 2–18 | 2–32 |
| Data width (Notes 1, 2) | 1–32 | 2–25 | 2–18 | 2–32 |
| Number of channels | 1–8 | 1–64 | 1–64 | 1–64 |
| Maximum rate change, single channel | 8 | 1024 | 1024 | 64 |
| Maximum rate change, multiple channels | 1 | 64 | 64 | 64 |
| Fractional rate support | | | | |
| Coefficient reload, offline | | | | |
| Coefficient reload, online (glitch-free) | | | | |


Table 2 shows the classes of filters that are supported for the FIR Compiler core.

The supported filter configurations are described in separate sections within this document.

Notable Limitations

In conjunction with Table 1 and Table 2, it is important to note some further limitations inherent in the core.

When implementing MAC-based filters in families without DSP slices or Embedded Multiplier capability:

• Symmetry is not exploited in configurations requiring more than one multiply-accumulate engine.

• Symmetry is not exploited for interpolating filter implementations.

For more recent device families, the following significant limitations apply for MAC-based cores:

• Symmetry is not exploited in configurations requiring multiple columns of DSP slices.

• Fractional Rate filters do not currently exploit coefficient symmetry.

When selecting the Distributed Arithmetic-based core architecture, the limitations are as follows:

• Symmetry is not exploited for multi-rate filters.

• DA-based cores are not available for Virtex-5 devices.

Table 1: Feature Support Matrix (Continued)

| Feature | Distributed Arithmetic | Multiply-Accumulate (Virtex-5 FPGAs) | Multiply-Accumulate (Virtex-4, Spartan-3, Virtex-II FPGAs) | Multiply-Accumulate (other families) |
|---|---|---|---|---|
| Coefficient sets | 1 | 1–256 | 1–256 | 1–256 |
| Max accumulator width | 74 | 48 | 48 | 74 |

Notes:
1. Maximum Coefficient Width reduces by one in DSP Slice and Embedded Multiplier families when the Coefficients are signed. Similarly for Maximum Data Width when the Data values are signed.
2. The allowable range for the Data Width field in the GUI may reduce further in Virtex-5 devices to ensure that the accumulator width does not exceed maximum.

Table 2: Filter Configuration Support Matrix

| Filter Configuration | Distributed Arithmetic | Multiply-Accumulate (families with DSP slices or Embedded Multipliers) | Multiply-Accumulate (other families) |
|---|---|---|---|
| Conventional single-rate FIR | | | |
| Half-band FIR | | | |
| Hilbert transform [5] | | | |
| Interpolated FIR [4] [6] | | | |
| Polyphase decimator | | | |
| Polyphase interpolator | | | |
| Half-band decimator | | | |
| Half-band interpolator | | | |

Filter Interface Pins

Figure 2 shows the schematic symbol for the interface pins to the FIR Compiler module.

Filter input data is supplied on the DIN port (N bits wide) and filter output samples are presented on the DOUT port (R bits wide). The output width R is the sum of the data bit width N, the coefficient bit width K, and the bit growth due to the number of coefficients. The CLK signal is the system clock for the core, where the clock rate may be greater than or equal to the input signal sample frequency. The ND, RDY, and RFD signals are filter interface/control signals that permit a simple and efficient data-flow style interface for supplying input samples and reading output samples from the filter. These core interface signals are discussed in detail in the "Interface, Control, and Timing" section.

For Hilbert transform filter implementations, a pair of In-Phase/Quadrature data outputs is provided. The In-Phase data output is N bits wide, as it is a delayed version of the input data, while the Quadrature data output is R bits wide, calculated as described previously.

For multiple channel implementations, a pair of indicator signals is provided to specify the currently active input and output channels. These indicator signals are C bits wide, where C is the required bitwidth to represent the maximum channel value. Where multiple coefficient sets are specified in the COE file, a filter selection input is available to select the active filter set, and this is F bits wide. F is the required bitwidth to represent the maximum filter set value.

Coefficient reloading, when supported, can be achieved by driving the coefficient reload interface, which consists of a load start indicator, a write enable, and a coefficient data bus (K bits wide for most filter types). Where reloading is required with multiple filter sets, the filter set to be reloaded can be specified using the COEF_FILT_SEL port, which is again F bits wide.

Resetting of the core is achieved by driving the SCLR pin, while a clock enable pin is available only for MAC-based FIR filter implementations on those device families that include DSP slices or Embedded Multipliers.
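The port widths described above can be estimated with a few lines of Python. This is a rough sketch under stated assumptions: the bit growth term is taken as ceil(log2(number of coefficients)), which is a common worst-case bound and not necessarily the exact figure the core reports (the core may size DOUT more tightly from the actual coefficient values). The helper names `clog2` and `port_widths` are hypothetical.

```python
def clog2(count):
    """Bits needed to represent the values 0 .. count-1 (0 when count == 1)."""
    return (count - 1).bit_length()


def port_widths(data_width, coef_width, num_taps, num_channels=1, num_filter_sets=1):
    """Estimate FIR Compiler port widths from the filter parameters.

    Assumption: bit growth = ceil(log2(num_taps)), a worst-case bound.
    """
    return {
        "DIN": data_width,                                   # N
        "COEF_DIN": coef_width,                              # K (most filter types)
        "DOUT": data_width + coef_width + clog2(num_taps),   # R, estimated
        "CHAN_IN/CHAN_OUT": clog2(num_channels),             # C
        "FILT_SEL/COEF_FILT_SEL": clog2(num_filter_sets),    # F
    }


if __name__ == "__main__":
    # 16-bit data, 18-bit coefficients, 64 taps, 4 channels, 8 coefficient sets.
    for port, width in port_widths(16, 18, 64, 4, 8).items():
        print(f"{port}: {width} bits")
```

The value actually generated for a given configuration should be read from the CORE Generator GUI; the sketch is only useful for early interface planning.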


Figure 2: FIR Filter Core Pinout

[Figure 2 pins. Inputs: DIN [N-1:0], ND, FILT_SEL [F-1:0], COEF_LD, COEF_WE, COEF_DIN [K-1:0], COEF_FILT_SEL [F-1:0], CLK, CE, SCLR. Outputs: DOUT [R-1:0], DOUT_I [N-1:0], DOUT_Q [R-1:0], RDY, RFD, CHAN_IN [C-1:0], CHAN_OUT [C-1:0].]

Table 3 contains more information about the FIR filter port names and port functional definitions.

Table 3: FIR Core Signal Pinout

| Name | Direction | Description |
|---|---|---|
| SCLR | Input | SYNCHRONOUS CLEAR: Synchronous reset (active High). Asserting SCLR synchronously with CLK resets the filter internal state machines. It does NOT reset the filter data memory contents (regressor vector). SCLR resets the counters that control the channel indicator output signals. SCLR is an optional pin. |
| CLK | Input | CLOCK: Core clock (active rising edge). Always present. |
| CE | Input | CLOCK ENABLE: Core clock enable (active High). Available for MAC-based FIR implementations in devices with DSP slices or Embedded Multipliers only. |
| DIN [N-1:0] | Input | DATA IN: N-bit wide filter input sample. Always present. Note that for multi-channel implementations this input is time-shared across all channels. Separate channel inputs are not provided. |
| ND | Input | NEW DATA (active High): When this signal is asserted, the data sample presented on the DIN port is accepted into the filter core. ND should not be asserted while RFD is Low; any samples presented when RFD is Low are ignored by the core. |
| FILT_SEL [F-1:0] | Input | FILTER SELECT: Filter selection input signal, F bits wide where F = ceil(log2(filter sets)). Only present when using multiple filter sets. |
| COEF_LD | Input | COEFFICIENT LOAD: Indicates the beginning of a new coefficient reload cycle. |
| COEF_WE | Input | COEFFICIENT RELOAD WRITE ENABLE: Write enable for loading of coefficients into the filter, allowing a host to halt loading until it is ready to transmit on the interface. |
| COEF_DIN [K-1:0] | Input | COEFFICIENT RELOAD DATA IN: Input data bus for reloading coefficients. K is the core coefficient width for most filter types and coefficient width + 2 for interpolating filters where the symmetric coefficient structure is exploited. |
| COEF_FILT_SEL [F-1:0] | Input | COEFFICIENT RELOAD FILTER SELECT: Filter selection input signal for reloading coefficients, F bits wide where F = ceil(log2(filter sets)). Only present when using multiple filter sets and reloadable coefficients. |
| DOUT [R-1:0] | Output | DATA OUT: R-bit wide output sample bus. R depends on the filter parameters (data precision, coefficient precision, number of taps, and coefficient optimization selection) and is always supplied as a full-precision output port to avoid any potential for overflow. |
| RDY | Output | READY: Filter output ready flag (active High). Indicates that a new filter output sample is available on the DOUT port. |


Single-Rate FIR Filter

The basic FIR Filter core is a single-rate (input sample rate = output sample rate) finite impulse response filter. This is the simplest of filter types and is the default at the start of parametrization in the CORE Generator tool.

Half-Band FIR Filter

The general frequency response for a half-band filter is shown in Figure 3.

Table 3: FIR Core Signal Pinout (Continued)

| Name | Direction | Description |
|---|---|---|
| RFD | Output | READY FOR DATA: Indicator to signal that the core is ready to accept a new data sample. Active High. |
| CHAN_IN [C-1:0] | Output | INPUT CHANNEL SELECT: Standard binary count generated by the core that indicates the current filter input channel number. |
| CHAN_OUT [C-1:0] | Output | OUTPUT CHANNEL SELECT: Standard binary count generated by the core that indicates the current filter output channel number. |
| DOUT_I [N-1:0] | Output | DATA OUT IN-PHASE: Hilbert transform only. In-phase (I) data output component. A Hilbert transform accepts real valued input data and produces a complex result. This port is the real or in-phase component of the result. Since this output port is an access point to the center of the filter memory buffer, it carries the same precision as the input sample data stream, that is, N bits. |
| DOUT_Q [R-1:0] | Output | DATA OUT QUADRATURE: Hilbert transform only. Quadrature (Q) data output component. A Hilbert transform accepts real valued input data and produces a complex result. This port is the imaginary or quadrature component of the result. |


Figure 3: Half-Band Filter—Magnitude Frequency Response

[Figure 3 axes: magnitude |H(e^{jΩ})| versus frequency Ω from 0 to π. The passband ends at Ωp with ripple bounds 1+δp and 1−δp, the stopband starts at Ωs with ripple bounds δs and −δs, and the magnitude is 0.5 at Ω = π/2.]

The magnitude frequency response is symmetrical about the quarter sample frequency, π/2 radians. The sample rate is normalized to 2π radians/sec. The passband and stopband frequencies are positioned such that $\Omega_p = \pi - \Omega_s$.

The passband and stopband ripple, $\delta_p$ and $\delta_s$ respectively, are equal ($\delta_p = \delta_s$). These properties are reflected in the filter impulse response. It can be shown [5] that approximately half of the filter coefficients are zero for an odd number of taps. This is illustrated in Figure 4 for an 11-tap half-band filter.

The interleaved zero values in the coefficient data can be exploited to produce an efficient realization like that shown in Figure 5.

This same structure can be utilized to generate an efficient FPGA implementation for either a MAC or DA architecture. The half-band filter selection in the compiler is intended for this purpose. This filter is available in the Coefficient Structure field of the user interface. The user must supply the complete list of filter coefficients, including the 0 value samples, when using the half-band filter. The filter coefficient file format is discussed in greater detail in the Filter Coefficient Data section.
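Before selecting the half-band structure it can help to confirm that a coefficient list really has the interleaved-zero pattern. The sketch below is one way to check this in Python; the function name `is_half_band` and the example coefficient values are made up for illustration and do not come from the data sheet.

```python
def is_half_band(coeffs, tol=0.0):
    """Return True if an odd-length coefficient list has half-band zeros.

    The pattern checked: every coefficient at an even, non-zero distance
    from the center tap must be zero (within `tol`).
    """
    n_taps = len(coeffs)
    if n_taps % 2 == 0:
        return False  # the zero pattern described here needs an odd tap count
    center = n_taps // 2
    return all(
        abs(coeffs[i]) <= tol
        for i in range(n_taps)
        if i != center and (i - center) % 2 == 0
    )


if __name__ == "__main__":
    # Illustrative 11-tap set: zeros at indices 1, 3, 7 and 9.
    taps = [0.02, 0.0, -0.08, 0.0, 0.31, 0.5, 0.31, 0.0, -0.08, 0.0, 0.02]
    print(is_half_band(taps))
    print(sum(1 for a in taps if a != 0.0), "of", len(taps), "taps need a multiply")
```

For the 11-tap example, only 7 of the 11 taps are non-zero, which matches the "approximately half" observation above.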

Figure 4: Half-Band Filter Impulse Response (coefficient value versus coefficient index, 0 to 10)

Figure 5: Half-Band Filter Impulse Response (tapped delay line using only coefficients a0, a2, a4, a5, a6, a8, and a10)

Hilbert Transform

Hilbert transformers [5] are used in a variety of ways in digital communication systems.

An ideal Hilbert transform provides a phase shift of 90 degrees for positive frequencies and –90 degrees for negative frequencies. It can be shown [5] that the impulse response corresponding to this frequency domain characteristic is odd-symmetric and has interleaved zeros as shown in Figure 6. Both the alternating zero-valued coefficients and the negative symmetry can be utilized to produce an efficient hardware realization.

A Hilbert transformer accepts a real-valued signal and produces a complex (I,Q) output signal. The quadrature (Q) component of the output signal is produced by a FIR filter with an impulse response like that shown in Figure 6. The in-phase (I) component is the input signal delayed by an appropriate amount to compensate for the phase delay of the FIR process employed for generating the Q output. This is easily and efficiently achieved by accessing the center tap of the sample history delay of the Q channel FIR filter as shown in Figure 7. In this figure, x(n) is the real-valued input signal and yI(n) and yQ(n) are the in-phase and quadrature outputs, respectively.

Figure 8 shows the architecture for a Hilbert transformer that exploits both the zero-valued and the negative symmetry characteristics of the impulse response.
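The relationship between the I and Q outputs can be illustrated with a small behavioral model: Q is the FIR output, and I is simply the center tap of the same delay line. The Python sketch below assumes a delay line that starts cleared; the function name `hilbert_iq` and the 7-tap coefficient values are hypothetical, chosen only to show the odd-symmetric, interleaved-zero pattern.

```python
def hilbert_iq(coeffs, samples):
    """Behavioral model of a FIR Hilbert transformer with I and Q outputs.

    Q: FIR filter output using the odd-symmetric coefficients.
    I: input delayed by (N-1)/2 samples, read from the center of the
       same delay line, so no extra storage is needed.
    """
    n_taps = len(coeffs)
    center = (n_taps - 1) // 2
    delay_line = [0] * n_taps
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]
        y_q = sum(a * d for a, d in zip(coeffs, delay_line))
        y_i = delay_line[center]
        outputs.append((y_i, y_q))
    return outputs


if __name__ == "__main__":
    # Illustrative odd-symmetric taps with interleaved zeros (values made up).
    taps = [1, 0, 3, 0, -3, 0, -1]
    for y_i, y_q in hilbert_iq(taps, [1, 0, 0, 0, 0, 0, 0]):
        print(y_i, y_q)
```

With an impulse at the input, the Q column replays the coefficient list while the I column shows the impulse arriving three samples late, which is the center-tap delay for seven taps.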

Figure 6: Impulse Response of a Hilbert Transformer

Figure 7: FIR Filter Realization of a Hilbert Transformer

Figure 8: Hilbert Transformer Exploiting Zero-Valued Filter Coefficients and Negative Symmetry

The DA equivalent of this architecture can be used for realizing a Hilbert transformer in all supported families, while the MAC-based FIR filter architecture currently only supports Hilbert transform implementations for families that include DSP slices.

Interpolated FIR Filter

An interpolated FIR (IFIR) Filter [4] has a similar architecture to a conventional FIR filter, but with the unit delay operator replaced by k-1 units of delay. k is referred to as the zero-packing factor. An N-tap IFIR filter is shown in Figure 9.

This architecture is functionally equivalent to inserting k-1 zeros between the coefficients of a prototype filter coefficient set.
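The zero-packing operation described above (k-1 zeros inserted between adjacent prototype coefficients) is easy to show in Python. This is an illustration of the coefficient-domain equivalence only; the function name `zero_pack` is hypothetical and the core does not require the user to perform this step.

```python
def zero_pack(prototype, k):
    """Insert k-1 zeros between adjacent coefficients of a prototype filter."""
    if k < 1:
        raise ValueError("zero-packing factor k must be at least 1")
    packed = []
    last = len(prototype) - 1
    for index, coeff in enumerate(prototype):
        packed.append(coeff)
        if index < last:
            packed.extend([0] * (k - 1))  # no trailing zeros after the last tap
    return packed


if __name__ == "__main__":
    print(zero_pack([1, 2, 3], 3))
```

An N-tap prototype expands to (N-1)k + 1 coefficients, of which only N are non-zero, which is why the area of an IFIR implementation depends only weakly on k.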

Interpolated filters are useful for realizing efficient implementations of both narrow-band and wide-band filters. A filter system based on an IFIR approach requires not only the IFIR but also an image rejection filter. References [4] and [6] provide the details of how these systems are realized, and how to design the IFIR and the image rejection filters.

The IFIR filter implementation takes advantage of the k-1 zeros in the impulse response to realize an area-efficient FPGA implementation. The FPGA area required by an IFIR filter is not a strong function of the zero-packing factor.

The interpolated FIR should not be confused with an interpolation filter. Interpolated filters are single-rate systems employed to produce efficient realizations of narrow-band filters and, with some minor enhancements, wide-band filters can be accommodated. There is no inherent rate change when using an interpolated filter; the input rate is the same as the output rate.

Interpolated filters are supported for the DA FIR filter architecture in all families up to Virtex-4 devices, while support is limited to device families which include DSP slices or Embedded Multipliers for the MAC-based FIR architecture.


Figure 9: Interpolated FIR (IFIR). The Zero-Packing Factor is k.

[Figure 9 structure: tapped delay line from x(n) to y(n) in which each delay element is z^-D, with D = k-1, and the taps are a(0), a(1), ..., a(N-1).]


Polyphase Decimator

The polyphase decimation filter option implements the computationally efficient M-to-1 polyphase decimating filter shown in Figure 10.

A set of N prototype filter coefficients $a_0, a_1, \ldots, a_{N-1}$ are mapped to the M polyphase sub-filters $h_0(n), h_1(n), \ldots, h_{M-1}(n)$ according to Equation 2.

The polyphase segments are accessed by delivering the input samples x(n) to their inputs via an input commutator which starts at the segment index $i = M-1$ and decrements to index 0. After the commutator has executed one cycle and delivered M input samples to the filter, a single output is taken as the summation of the outputs from the polyphase segments. The output sample rate is $f_s' = f_s / M$, where $f_s$ is the sample rate of the input data stream $x(n),\ n = 0, 1, 2, \ldots$.

We observe that each of the polyphase segments is operating at the low output sample rate $f_s'$ (compared to the high input sample rate $f_s$) and a total of N operations are performed per output point.
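The coefficient mapping and commutator order described above can be checked against a plain "filter, then discard samples" reference. The Python sketch below is a behavioral model under two assumptions made for this example: samples before the first input are zero, and one output is produced after each complete commutator cycle of M inputs. The function names are hypothetical.

```python
def fir(coeffs, samples):
    """Reference full-rate FIR: y(k) = sum_n a(n) x(k - n), zero initial state."""
    return [
        sum(a * samples[k - n] for n, a in enumerate(coeffs) if k - n >= 0)
        for k in range(len(samples))
    ]


def polyphase_decimate(coeffs, samples, m):
    """M-to-1 polyphase decimator using sub-filters h_i(r) = a(i + M*r)."""
    subfilters = [coeffs[i::m] for i in range(m)]
    outputs = []
    for j in range(len(samples) // m):  # one output per commutator cycle
        acc = 0
        for i, h in enumerate(subfilters):
            # Within a cycle the commutator feeds segment M-1 first and
            # segment 0 last, so segment i holds sample x(j*M + M-1-i).
            for r, coeff in enumerate(h):
                idx = (j - r) * m + (m - 1 - i)
                if idx >= 0:
                    acc += coeff * samples[idx]
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7]
    x = list(range(1, 13))
    print(polyphase_decimate(a, x, 3))
    print(fir(a, x)[2::3])  # same values: full-rate filter, keep every 3rd output
```

Both print statements give the same sequence, but the polyphase form never computes the outputs that would be thrown away, so each segment runs at the low output rate.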

Polyphase Interpolator

The polyphase interpolation filter option implements the computationally efficient 1-to-P interpolationfilter shown in Figure 11.

Figure 10: M-to-1 Polyphase Decimator

$$h_i(n) = a(i + Mr), \qquad i = 0, 1, \ldots, M-1, \qquad r = 0, 1, \ldots, N - M + i \qquad \text{(Equation 2)}$$

Figure 11: 1-to-P Polyphase Interpolator


A set of N prototype filter coefficients $a_0, a_1, \ldots, a_{N-1}$ are mapped to the P polyphase subfilters $h_0(n), h_1(n), \ldots, h_{P-1}(n)$ according to Equation 2, as in the decimation case.

Each new input sample engages all of the polyphase segments in parallel. For each input sample x(n) delivered to the filter, P output samples, one from each segment, are delivered to the filter output port as indicated by the commutator in Figure 11.

The output sample rate is $f_s' = f_s P$, where $f_s$ is the sample rate of the input data stream $x(n),\ n = 0, 1, 2, \ldots$. We observe each of the polyphase segments operating at the low input sample rate $f_s$ (compared to the high output sample rate $f_s'$) and a total of N operations performed per output point.
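The interpolator can be modeled the same way and compared with the textbook "insert zeros, then filter" reference. The Python sketch below assumes a zero initial state and an output commutator that visits segment 0 first; these choices and the function names are assumptions made for this example.

```python
def fir(coeffs, samples):
    """Reference full-rate FIR: y(k) = sum_n a(n) x(k - n), zero initial state."""
    return [
        sum(a * samples[k - n] for n, a in enumerate(coeffs) if k - n >= 0)
        for k in range(len(samples))
    ]


def polyphase_interpolate(coeffs, samples, p):
    """1-to-P polyphase interpolator using sub-filters h_i(r) = a(i + P*r)."""
    subfilters = [coeffs[i::p] for i in range(p)]
    outputs = []
    for m in range(len(samples)):
        # Every segment processes the same input history; the output
        # commutator reads segment 0, 1, ..., P-1 for each input sample.
        for h in subfilters:
            outputs.append(
                sum(c * samples[m - r] for r, c in enumerate(h) if m - r >= 0)
            )
    return outputs


def zero_stuff(samples, p):
    """Insert P-1 zeros after every sample (reference upsampler)."""
    stuffed = []
    for x in samples:
        stuffed.append(x)
        stuffed.extend([0] * (p - 1))
    return stuffed


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7]
    x = [1, -2, 3, 4]
    print(polyphase_interpolate(a, x, 3))
    print(fir(a, zero_stuff(x, 3)))  # same values via the reference path
```

The polyphase form avoids multiplying by the inserted zeros, so every segment runs at the low input rate.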

Half-Band Decimator

The half-band decimator is a polyphase filter with an embedded 2-to-1 downsampling of the input signal. The structure is shown in Figure 12.

The filter is very similar to the polyphase decimator described in the "Polyphase Decimator" section with the decimation factor set to M=2. However, there is a subtle difference in the implementation that makes the half-band decimator a more area-efficient 2-to-1 down-sampling filter when the frequency response reflects a true half-band characteristic.

The frequency and time response of a half-band filter are shown in Figure 3 and Figure 4 respectively. Observe the alternating zero-valued coefficients in the impulse response. Figure 13 details a 7-tap half-band polyphase filter when the coefficients are allocated to the two polyphase segments $h_0(n)$ and $h_1(n)$ shown in Figure 12. Figure 13 (a) is the filter impulse response; note that $a_1 = a_5 = 0$. Figure 13 (b) provides a detailed illustration of the polyphase subfilters and shows how the filter coefficients are allocated to the two polyphase arms. In the bottom arm, $h_1(n)$, the only nonzero coefficient is the center value of the impulse response, $a_3$. Figure 13 (c) shows the optimized architecture when the redundant multipliers and adders are removed. The final structure has a reduced computation workload in contrast to a more general 2:1 down-sampling filter. The number of multiply-accumulate (MAC) operations required to compute an output sample has been lowered by a factor of approximately two. In this figure note that the high density of zero-valued filter coefficients is exploited in the FPGA realization to produce a minimal area implementation.

Figure 12: Half-Band Decimation Filter


Half-Band Interpolator

Just as the half-band decimator is an optimized version of the more general polyphase decimation filter, the half-band interpolator is a special case of a polyphase interpolator. The half-band interpolator is shown in Figure 14.

The coefficient set for a true half-band interpolator is identical to that of a half-band decimator with the same specifications. The large number of zero entries in the impulse response is exploited in exactly the same manner as with the half-band decimator to produce hardware-optimized half-band interpolators. The process is presented in Figure 15. Figure 15(a) is the impulse response, Figure 15(b) shows the polyphase partition, and Figure 15(c) is the optimized architecture that has taken full advantage of the 0 entries in the coefficient data. Note that the high density of zero-valued filter coefficients is exploited in the FPGA realization to produce a minimal area implementation.

Figure 13: 7-Tap Half-Band Decimation Filter. Panels: (a) Impulse Response, with a1 = 0 and a5 = 0; (b) Polyphase Partition.

Figure 14: Half-Band Interpolation Filter

Small Non-Zero Odd Terms in a Half-Band Filter Impulse Response

Certain filter design software can result in small non-zero values for the odd terms in the half-band filter impulse response. In this situation, it can be useful to force these values to 0 and re-evaluate the frequency response to assess if it is still acceptable for the intended application. If the odd terms are not identically zero, the hardware optimizations described previously are not possible. If the small nonzero value terms cannot be ignored, the general polyphase decimator or interpolator described in the "Polyphase Decimator" and "Polyphase Interpolator" sections, using a rate change of two, are more appropriate.
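The clean-up step described above can be sketched in Python as follows. The helper names, the tolerance handling, and the 7-tap coefficient values are all hypothetical; the sketch simply zeroes the terms that a half-band structure expects to be zero and then evaluates the magnitude response at one frequency so the effect can be compared before and after.

```python
import cmath
import math


def force_half_band_zeros(coeffs, tol):
    """Zero the terms a half-band filter expects to be zero.

    Raises ValueError if any such term exceeds `tol`, since it then
    cannot reasonably be treated as design-tool round-off.
    """
    center = len(coeffs) // 2
    cleaned = list(coeffs)
    for i in range(len(coeffs)):
        if i != center and (i - center) % 2 == 0:
            if abs(cleaned[i]) > tol:
                raise ValueError(f"coefficient {i} is too large to force to zero")
            cleaned[i] = 0.0
    return cleaned


def magnitude(coeffs, omega):
    """|H(e^{j*omega})| for the FIR filter with the given coefficients."""
    return abs(sum(a * cmath.exp(-1j * omega * n) for n, a in enumerate(coeffs)))


if __name__ == "__main__":
    # Illustrative 7-tap set with small residues at indices 1 and 5.
    raw = [-0.03, 1e-5, 0.28, 0.5, 0.28, -2e-5, -0.03]
    cleaned = force_half_band_zeros(raw, tol=1e-4)
    for name, taps in (("raw", raw), ("cleaned", cleaned)):
        print(name, magnitude(taps, math.pi / 2))
```

In practice the comparison would be made over the whole passband and stopband rather than at a single frequency.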

Figure 15: 7-Tap Half-Band Interpolation Filter. Panels: (a) Impulse Response, with a1 = 0 and a5 = 0; (b) Polyphase Partition; (c) Reduced Complexity (Hardware Optimized) Realization. In panels (b) and (c), the first output is taken from port 0, then port 1.


Filter Realization: Multiply-Accumulate

A simplified view of a MAC-based FIR utilizing a single MAC engine is shown in Figure 16. The single implementation is extensible to multi-MAC implementations for use in achieving higher performance filter specifications (larger numbers of coefficients, higher sample rates, more channels, etc.).

The number of multipliers required to implement a filter is determined by calculating the number of multiplies required to perform the computation (taking into account symmetrical and halfband coefficient structures, and sample rate changes) and then dividing by the number of clocks available to process each input sample. The available clock cycles value is always rounded down and the number of multipliers rounded up to the nearest integer. If there is a non-zero remainder, some of the MAC engines calculate fewer coefficients than others, and the coefficients are padded with zeros to accommodate the excess cycles. Note that the output samples reflect the padding of the coefficient vector; therefore, the response to an applied impulse contains a certain number of zero outputs before the first coefficient of the specified impulse response appears at the output. The core automatically generates an implementation that meets the user-defined performance requirements based on the system clock rate, the sample rate, the number of taps and channels, and the rate change. The core inserts one or more multipliers to meet the overall throughput requirements. The single MAC implementation structure is similar for all device families, although hardware multipliers and DSP slices are used where available.
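The rounding rules in the paragraph above can be written out as a short Python sketch. It is an estimate only: the caller is assumed to have already reduced the multiply count for symmetry, half-band structure, and rate changes, and the function name `mac_engines` is hypothetical. The core's own resource estimate in the CORE Generator GUI remains the authority.

```python
def mac_engines(num_multiplies, clock_rate_hz, sample_rate_hz):
    """Estimate the MAC engine count for a single-channel filter.

    num_multiplies: multiplies needed per input sample, after any
                    reduction for symmetry or zero-valued coefficients.
    Returns (cycles_per_sample, engines, padded_coefficients).
    """
    cycles_per_sample = clock_rate_hz // sample_rate_hz  # rounded down
    if cycles_per_sample < 1:
        raise ValueError("clock rate must be at least the sample rate")
    engines = -(-num_multiplies // cycles_per_sample)  # rounded up
    # Excess cycles are filled by padding the coefficient vector with zeros.
    padded = engines * cycles_per_sample - num_multiplies
    return cycles_per_sample, engines, padded


if __name__ == "__main__":
    cycles, engines, padded = mac_engines(100, 120_000_000, 10_000_000)
    print(f"{cycles} cycles per sample, {engines} MAC engines, {padded} zero-padded taps")
```

For 100 multiplies per sample at a 120 MHz clock and a 10 MHz sample rate, 12 cycles are available, so 9 engines are needed and 8 coefficient slots are padded with zeros.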

Figure 17 illustrates a multi-MAC-based FIR implementation for older device families that do not include DSP slices or Embedded Multipliers, which requires four multipliers. Filter implementations in these device families use an adder tree based structure in what is known as direct form implementation, where a series of delay elements forms a data regression vector which is then processed by one or more multipliers and the results of these calculations are then summed in an accumulator. The multiplication can either be fully serial across all coefficients (if sufficient cycles are available), semi-parallel (where one unit is not sufficient to calculate all tap multiplications in the available cycles) or fully parallel (where only one cycle is available to process all multiplications).

For more recent device families, an alternative structure is used which takes advantage of the advanced features of the DSP slice (or DSP48) to provide a cascaded addition, with a correspondingly cascaded data regression vector, commonly referred to as direct form implementation with pipelining or, occasionally, a systolic implementation. Pipeline registers are available in the DSP slice to efficiently implement this structure, and DSP slices are organized in columns with high speed dedicated routing provided to connect the cascaded data regressor vector and the cascaded accumulation of sum-of-product outputs.

Figure 16: Single MAC Engine Block Diagram

Figure 18 illustrates a FIR implementation for families that include DSP slices or Embedded Multipliers which requires four multipliers. Note that for families that include DSP slices, this implementation structure takes advantage of the capabilities of the Xilinx DSP slice; however, this also restricts the output width to 48 bits. Further information on implementing filters efficiently with the DSP slice structures can be found in the XtremeDSP handbook [7].

Note: Embedded Multiplier block register implementation varies across families.

Figure 17: Multiple MAC Engine Implementation (Device Families Without DSP Slices or Embedded Multipliers)

Figure 18: Multiple MAC Engine Implementation (Device Families With DSP Slices or Embedded Multipliers)


Filter Realization: Distributed Arithmetic

A simplified view of a DA FIR is shown in Figure 19.

In its most obvious and direct form, DA-based computations are bit-serial in nature: the serial distributed arithmetic (SDA) FIR. Extensions to the basic algorithm remove this potential throughput limitation [2]. The advantage of a distributed arithmetic approach is its efficiency of mechanization. The basic operations required are a sequence of table look-ups, additions, subtractions and shifts of the input data sequence. All of these functions efficiently map to FPGAs.

Input samples are presented to the input parallel-to-serial shift register (PSC) at the input signal sample rate. As the new sample is serialized, the bit-wide output is presented to a bit-serial shift register or time-skew buffer (TSB). The TSB stores the input sample history in a bit-serial format and is used in forming the required inner-product computation. The TSB is itself constructed using a cascade of shorter bit-serial shift registers. The nodes in the cascade connection of TSBs are used as address inputs to a look-up table. This LUT stores all possible partial products [2] over the filter coefficient space.
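The look-up-table idea can be demonstrated with a short bit-serial model in Python. This is a textbook-style sketch of serial distributed arithmetic for two's complement inputs, not a description of the core's internal circuit; the function names are hypothetical, and a single LUT over all taps is used for clarity (practical for small tap counts only, since the table has 2^N entries).

```python
def build_lut(coeffs):
    """LUT of all partial sums: entry `addr` adds coeffs[i] for each set bit i."""
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << len(coeffs))
    ]


def da_inner_product(coeffs, samples, bits):
    """Bit-serial inner product of coeffs and `bits`-wide signed samples."""
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    acc = 0
    for b in range(bits):  # one look-up per input bit position
        addr = 0
        for i, x in enumerate(samples):
            # Bit b of each sample (two's complement) forms the LUT address.
            addr |= (((x & mask) >> b) & 1) << i
        partial = lut[addr] * (1 << b)
        if b == bits - 1:
            acc -= partial  # the sign bit carries negative weight
        else:
            acc += partial
    return acc


if __name__ == "__main__":
    a = [3, -5, 7, 2]
    x = [-8, 7, -1, 4]  # 4-bit signed samples
    print(da_inner_product(a, x, bits=4))
    print(sum(c * s for c, s in zip(a, x)))  # direct multiply-accumulate
```

No multiplications by sample values occur in the loop: each bit position costs one table look-up and one scaled add or subtract, with the subtraction on the final (sign) bit.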

Several observations provide valuable insight into the operation of a DA FIR filter. In a conventional multiply-accumulate (MAC)-based FIR realization, the sample throughput is coupled to the filter length. With a DA architecture, the system sample rate is related to the bit precision of the input data samples. Each bit of an input sample must be indexed and processed in turn before a new output sample is available. For B-bit precision input samples, B clock cycles are required to form a new output sample for a non-symmetrical filter, and B+1 clock cycles are needed for a symmetrical filter. The data bits are indexed at the bit-clock rate. The bit-clock frequency is greater than the filter sample rate (fs) and is equal to B·fs for a non-symmetrical filter and (B+1)·fs for a symmetrical filter. In a conventional instruction-set (processor) approach to the problem, the required multiply-accumulate operations are implemented using a time-shared or scheduled MAC unit. The filter sample throughput is inversely proportional to the number of filter taps. As the filter length is increased, the system sample rate is proportionately decreased. This is not the case with DA-based architectures. The filter sample rate is decoupled from the filter length. The trade-off introduced here is one of silicon area (FPGA logic resources) for time. As the filter length is increased in a DA FIR filter, more logic resources are consumed, but throughput is maintained.

Figure 19: Serial Distributed Arithmetic FIR Filter

Figure 20 provides a comparison between a DA FIR architecture and a conventional scheduled MAC-based approach. The clock rate is assumed to be 120 MHz for both filter architectures. Several values of input sample precision for the DA FIR are presented. The dependency of the DA filter throughput on the sample precision is apparent from the plots. For 8-bit precision input samples, the DA FIR maintains a higher throughput for filter lengths greater than 8 taps. When the sample precision is increased to 16 bits, the crossover point is 16 taps.

Figure 21 provides a similar comparison but for a dual-MAC architecture.
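The crossover points quoted above follow from a simple throughput model, sketched here in Python. The model is an idealization assumed for illustration (non-symmetric filter, one tap per MAC engine per clock, no control overhead), and the function names are hypothetical rather than part of any Xilinx tool.

```python
def mac_sample_rate(fclk_mhz, taps, mac_engines=1):
    """Scheduled MAC filter: every output needs `taps` multiply-accumulates,
    shared across `mac_engines` engines."""
    return fclk_mhz * mac_engines / taps


def da_sample_rate(fclk_mhz, bits, symmetric=False):
    """Serial DA filter: B clocks per output, or B+1 for a symmetric filter."""
    cycles = bits + 1 if symmetric else bits
    return fclk_mhz / cycles


def crossover_taps(bits, mac_engines=1):
    """Filter length at which both rates are equal; the DA filter is faster
    for any longer filter."""
    return bits * mac_engines


for bits in (8, 12, 16):
    n = crossover_taps(bits)
    print(bits, n, mac_sample_rate(120, n), da_sample_rate(120, bits))
```

With a 120 MHz clock the single-MAC and DA rates meet at 8 taps for B=8 and at 16 taps for B=16. Under the same assumptions, a dual-MAC filter moves the crossover to twice the sample precision.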

Increasing the Speed of Multiplication–Parallel Distributed Arithmetic

Figure 20: Throughput (Sample Rate) Comparison of Single-MAC-Based FIR and DA FIR as a Function of Filter Length. B is the DA FIR Input Sample Precision. The Clock Rate is 120 MHz.
[Plot: sample rate (MHz) versus filter length (0 to 250 taps); curves for single MAC and for DA FIR with B=8, B=12 and B=16.]

Figure 21: Throughput (Sample Rate) Comparison of Dual-MAC-Based FIR and DA FIR as a Function of Filter Length. B is the DA FIR Input Sample Precision. The Clock Rate is 120 MHz.
[Plot: sample rate (MHz) versus filter length (0 to 250 taps); curves for dual MAC and for DA FIR with B=8, B=12 and B=16.]

In its most obvious and direct form, DA-based computations are bit-serial in nature; each bit of the samples must be indexed in turn before a new output sample becomes available (SDA FIR). When the input samples are represented with B bits of precision, B clock cycles are required to complete an inner-product calculation (for a non-symmetrical impulse response). Additional speed can be obtained in several ways. One approach is to partition the input words into M subwords and process these subwords in parallel. This method requires M times as many memory look-up tables and so comes at a cost of increased storage requirements. Maximum speed is achieved by factoring the input variables into single-bit subwords. The resulting structure is a fully parallel DA (PDA) FIR filter. With this factoring, a new output sample is computed on each clock cycle. PDA FIR filters provide exceptionally high performance. The Xilinx filter core provides support for parallel DA FIR implementations. Filters can be designed that process several bits in a clock period, through to a completely parallel architecture that processes all the bits of the input data during a single clock period. For example, consider a non-symmetrical filter with 12-bit precision input samples. Using a serial DA filter, new output samples are available every 12 clock periods. If the data samples are processed 2 bits at a time (2-BAAT), a new output sample is ready every 12/2 = 6 clock cycles. With 3-, 4-, 6- and 12-BAAT implementations, a new result is available every 4, 3, 2 and 1 clock cycles, respectively.

Another way to view the problem is in terms of the number of clock cycles L needed to produce a filter output sample. This is how the degree of computation parallelism is presented to the user on the filter design GUI. For example, consider a filter core with a master system clock (not necessarily the filter sample rate) of 150 MHz. Also assume that the input sample precision is 12 bits and that the impulse response is not symmetrical. For this set of parameters, the valid values of L (as presented on the core GUI) are 12, 6, 4, 3, 2 and 1. The corresponding filter sample rate (or throughput) for each value of L is 150/12 = 12.5, 150/6 = 25, 150/4 = 37.5, 150/3 = 50, 150/2 = 75 and 150/1 = 150 MHz, respectively. If the filter employs a symmetrical impulse response, the valid values of L are different, because of the hardware architecture that is employed to exploit the coefficient symmetry and produce the most compact (in terms of FPGA logic resources) realization. For a filter with 12-bit precision input samples and a symmetrical impulse response, the valid values of L are 13, 7, 5, 4, 3, 2, and 1. Again using a filter core master clock frequency of 150 MHz, the sample rate for each value of L is 11.538, 21.429, 30, 37.5, 50, 75, and 150 MHz, respectively.

The higher the degree of filter parallelism (fewer clock cycles per output sample, or smaller L), the greater the FPGA logic resources required to implement the design. Specifying the number of clock cycles per output sample is an extremely powerful mechanism that allows the designer to trade off silicon area in return for filter throughput.
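The two lists of valid L values quoted above can be reproduced by taking the distinct values of ceil(W/k), where W is B for a non-symmetric filter and B+1 for a symmetric one, and k is the number of bits processed per clock. The Python sketch below uses that rule; it is an observation that matches the 12-bit examples in this section, not a documented statement of how the GUI derives its list, and the function names are hypothetical.

```python
def valid_cycles(bits, symmetric=False):
    """Distinct clock-cycle counts L obtained by processing k bits at a time,
    for k = 1 .. W, where W = bits (+1 for a symmetric filter)."""
    width = bits + 1 if symmetric else bits
    # -(-width // k) is integer ceil(width / k).
    return sorted({-(-width // k) for k in range(1, width + 1)}, reverse=True)


def sample_rates(fclk_mhz, bits, symmetric=False):
    """Pair each valid L with the resulting sample rate fclk / L in MHz."""
    return [(L, fclk_mhz / L) for L in valid_cycles(bits, symmetric)]


for L, rate in sample_rates(150, 12):
    print(f"non-symmetric L={L:2d}: {rate:.3f} MHz")
for L, rate in sample_rates(150, 12, symmetric=True):
    print(f"symmetric     L={L:2d}: {rate:.3f} MHz")
```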

DA Filter Throughput

The signal sample rate for a DA type filter is a function of the core bit clock frequency, fclk Hz, the input data sample precision B, the number of channels, the number of clock cycles (L) per output sample, and the coefficient symmetry. For a single-channel non-symmetrical FIR filter using L=B clock cycles per output sample, the filter sample frequency, or sample throughput, is fclk/B Hz. If the filter is symmetrical, the sample rate is fclk/(B+1) Hz. If the number of clock cycles per output sample is changed to L=1, the sample throughput is fclk Hz. For L=2, the throughput is fclk/2 Hz.

As a specific example, consider a filter with a core clock frequency equal to 100 MHz, 10-bit input samples, L=10 and a non-symmetrical coefficient set. The filter sample rate is 100/10 = 10 MHz. Observe that this figure is independent of the number of filter taps. If a symmetrical realization had been generated, the sample throughput would be 100/11 = 9.0909 MHz. For L=1, the sample rate would be 100 MHz (non-symmetrical FIR). If the input sample precision is changed to 8 bits, with L=8, the filter sample rate for a non-symmetrical filter would be 100/8 = 12.5 MHz.




Exploiting Filter Symmetry

The impulse response for many filters possesses significant symmetry. This symmetry can generally be exploited to minimize arithmetic requirements and produce area-efficient filter realizations.

Figure 22 shows the impulse response for a 9-tap symmetric FIR filter.

Instead of implementing this filter using the architecture shown in Figure 1, the more efficient signal flow-graph in Figure 23 can be used. In general, the former approach requires N multiplications and (N-1) additions. In contrast, the architecture in Figure 23 requires only ⌈N/2⌉ multiplications (five for the 9-tap example) and approximately N additions. This significant reduction in the computation workload can be exploited to generate efficient filter hardware implementations.

Coefficient symmetry for an even number of terms can be exploited as shown in Figure 24.

Figure 22: Symmetric FIR - Odd Number of Terms
[Impulse response with coefficients a0 to a8, where a5 = a3, a6 = a2, a7 = a1 and a8 = a0.]

Figure 23: Exploiting Coefficient Symmetry - Odd Number of Filter Taps

Figure 24: Exploiting Coefficient Symmetry - Even Number of Filter Taps


The impulse response for a negative, or odd, symmetric filter is shown in Figure 25.

This symmetry is easily exploited in a manner similar to that shown in Figure 23 and Figure 24. In this case, the middle layer of adders is replaced by subtracters, as illustrated in Figure 26.

Again, as highlighted previously, the symmetry properties can be utilized to produce an efficient hardware realization. The example considered here illustrates a filter with an even number of terms; the filter structure for an odd number of terms is a simple extension of the same principle.
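The folded structures of Figure 23, Figure 24 and Figure 26 can be modeled in software to confirm that pre-adding (or pre-subtracting) mirrored samples gives the same output as the direct form while using about half the multiplications. The sketch below is illustrative only: the function names are hypothetical, samples before the start of the input are assumed to be zero, and a negative-symmetric filter with an odd number of taps is assumed to have a zero centre coefficient.

```python
def fir_direct(coeffs, x):
    """Direct-form FIR: N multiplications per output sample."""
    out = []
    for i in range(len(x)):
        out.append(sum(c * x[i - k] for k, c in enumerate(coeffs) if i - k >= 0))
    return out


def fir_folded(half, n_taps, x, negative=False):
    """Folded FIR: combine the two samples that share a coefficient, then
    multiply once. `half` holds the first ceil(n_taps / 2) coefficients."""
    def sample(i):
        return x[i] if i >= 0 else 0

    sign = -1 if negative else 1
    out = []
    for i in range(len(x)):
        acc = 0
        for k in range(n_taps // 2):
            mirrored = n_taps - 1 - k
            acc += half[k] * (sample(i - k) + sign * sample(i - mirrored))
        if n_taps % 2:                       # unpaired centre tap (odd length)
            acc += half[n_taps // 2] * sample(i - n_taps // 2)
        out.append(acc)
    return out


x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8]
half = [1, 2, 3, 4, 5]

odd_sym = half + half[-2::-1]                 # 9 taps: a0..a4, a3..a0
even_neg = half + [-c for c in half[::-1]]    # 10 taps: a5 = -a4, ..., a9 = -a0

print(fir_folded(half, 9, x) == fir_direct(odd_sym, x))
print(fir_folded(half, 10, x, negative=True) == fir_direct(even_neg, x))
```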

The FIR Compiler interface allows the filter symmetry to be specified by the user. When the impulse response does exhibit symmetry, the filter logic requirements can be significantly reduced in comparison to an implementation that does not exploit the impulse response structure. For example, a 100-tap non-symmetric filter with 12-bit data samples and 12-bit coefficients consumes 519 Virtex logic slices [3] in a DA architecture implementation. In contrast, a 100-tap symmetric filter is realized with 354 slices. This represents approximately a 30 percent savings in area. The advantage for MAC-based filters is a reduction of around 50% in the multiply-accumulate modules that are required to implement the filter, although fabric usage might increase due to the additional pre-adder stages required to add data samples, and there might be a small increase in control logic and delays.

Figure 25: Negative Symmetric Impulse Response
[Impulse response with coefficients a0 to a9, where a5 = -a4, a6 = -a3, a7 = -a2, a8 = -a1 and a9 = -a0.]

Figure 26: FIR Architecture Exploiting Negative Symmetry

Filter coefficient symmetry can be inferred by the core GUI from the coefficient definition file, which is the default setting. Note that this inferred value can be overridden by the user (by selecting a Non-Symmetric structure). When the structure is inferred, the inferred setting is displayed in the Summary page and in the ToolTip for the Coefficient Structure field. If the user sets the coefficient symmetry type to “Inferred” and then specifies a filter configuration that cannot support exploitation of symmetry, then the GUI automatically implements a Non-Symmetric structure for that configuration; if the user has explicitly specified “Symmetric” rather than “Inferred,” then the GUI disables any options which would not allow symmetry to be exploited. The GUI Tool Tips provide feedback to users on why a particular feature is not available. Note that only the first 2048 entries in the coefficient definition file are checked by the inference algorithm.

Coefficient Padding

When implementing a filter with symmetric coefficients, users must be aware that the core reorganizes the filter coefficients if required to exploit symmetry, and this might alter the filter response. This is only necessary if the core is configured such that not all processing cycles are utilized. For example, when the core has 4 cycles to process each sample for a 30-tap symmetric response filter, the core pads the coefficient storage out as illustrated in Figure 27.
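The arithmetic behind the 30-tap example can be sketched as follows. The sketch assumes that the number of MAC engines is the number of unique coefficients divided by the cycles per sample, rounded up, and that unused storage is filled with zeroes at the front; both assumptions are chosen to match Figure 27 rather than taken from a documented rule. The function name `pad_symmetric` is hypothetical.

```python
def pad_symmetric(unique, cycles):
    """Lay out the unique half of an even-length symmetric response over
    MAC engines that each process `cycles` coefficients per sample."""
    macs = -(-len(unique) // cycles)            # ceil(len / cycles)
    pad = macs * cycles - len(unique)           # empty slots to fill
    stored = [0] * pad + list(unique)           # zeroes are prepended
    rows = [stored[i * cycles:(i + 1) * cycles] for i in range(macs)]
    response = stored + stored[::-1]            # mirrored by the symmetry
    return macs, pad, rows, response


# 30-tap symmetric filter: 15 unique coefficients, 4 cycles per sample.
unique = list(range(1, 16))
macs, pad, rows, response = pad_symmetric(unique, 4)
print(macs, pad, len(response))
for row in rows:
    print(row)
```

For this case the sketch reports 4 engines, 1 padding zero and a 32-tap resultant response that begins and ends with a zero, which is the layout shown in Figure 27 (the first row printed corresponds to the engine labeled MAC3 in the figure).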

Figure 27: Filter Padding to Facilitate Symmetric Structure Exploitation

MAC3: 0 a b c
MAC2: d e f g
MAC1: h i j k
MAC0: l m n p

Resultant impulse response: 0 a b c d e f g h i j k l m n p p n m l k j i h g f e d c b a 0

The appended zeroes after the non-zero coefficients do not affect the filter response, but the prepended zero coefficients do alter the phase response of the filter implementation when compared to the ideal coefficients. There are two ways to avoid this issue. First, and most simply, the user can force the Coefficient Structure to be Non-Symmetric; this avoids prepending zero coefficients to the coefficient vector, and only appended zeroes are used to pad out the filter response to the required number of cycles. Second, and more efficiently, the user can increase the number of taps implemented by the filter at little or no cost in resource usage. In the previous example, the filter could process 32 taps in the same time, with the same hardware resources and with the same cycle latency as the 30-tap implementation, and the phase response of the 32-tap filter would be unaltered.

The core exploits symmetry in interpolating filters by taking advantage of the “symmetric pairs” technique. This produces phases of symmetric coefficient values by combining sums and differences of the coefficients from a pair of matched phases. This technique is illustrated in Figure 28.
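The symmetric pair technique can be checked numerically for the interpolate-by-2 case. In the sketch below, a 16-tap symmetric response is split into its two polyphase arms; neither arm is symmetric on its own, but their sum is symmetric and their difference is negative symmetric, and the original arms can be recovered from the two. The coefficient values and the function name are arbitrary choices for illustration, and the final division by 2 stands in for however the core scales the recombined outputs.

```python
def symmetric_pairs(h):
    """Split a symmetric response into two phases, then form the
    sum and difference phases used by the symmetric pair technique."""
    p0, p1 = h[0::2], h[1::2]
    total = [a + b for a, b in zip(p0, p1)]     # a+b, c+d, ...
    diff = [b - a for a, b in zip(p0, p1)]      # b-a, d-c, ...
    return p0, p1, total, diff


first_half = [1, 4, 9, 16, 25, 36, 49, 64]      # a .. h
h = first_half + first_half[::-1]               # 16-tap symmetric response
p0, p1, total, diff = symmetric_pairs(h)

print(p0)      # a c e g h f d b : not symmetric
print(p1)      # b d f h g e c a : mirror image of p0
print(total)   # symmetric
print(diff)    # negative symmetric

# The original phases are recoverable from the sum and difference.
rec0 = [(s - d) // 2 for s, d in zip(total, diff)]
rec1 = [(s + d) // 2 for s, d in zip(total, diff)]
print(rec0 == p0, rec1 == p1)
```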

Figure 28: Symmetric Pair Technique

Interpolate by 2 (coefficient phases):
a c e g h f d b
b d f h g e c a

Interpolate by 2 using symmetric pairs:
a+b c+d e+f g+h h+g f+e d+c b+a (even symmetry)
b-a d-c f-e h-g g-h e-f c-d a-b (negative symmetry)

This technique requires re-organization of the coefficients. Generally, when the filter phase arms are fully populated with coefficients, this is transparent to the user and the filter response is not changed. However, as in the general symmetric filter case, if the combination of rate and number of filter taps results in a phase arm which is not fully populated with coefficients, the reorganization of the filter coefficients results in a change in the phase response of the filter. The impulse response is shifted by a number of output samples as a result. In the 14-tap, interpolate-by-4 case, padding a zero coefficient to the front of the coefficient response would be required to align the phases such that symmetry can be exploited, resulting in a smaller implementation, but this results in a different phase response for the filter. The methods to avoid this change in response, if such a change cannot be accommodated in the user’s application system, are also similar to the general symmetry case: the user can either force a non-symmetric structure implementation or make use of the extra coefficients which can be supported in the structure. This situation is illustrated for several example cases in Figure 29 and is extensible to larger filters.

Figure 29: Filter Padding to Facilitate Symmetric Pairing
[Panels: 17 taps, interpolate by 3; 14 taps, interpolate by 4; 21 taps, interpolate by 3 (no padding); 16 taps, interpolate by 4 (no padding). Each panel marks the phases that form symmetric pairs and the phases with even symmetry.]



Bit Growth Calculation

Bit growth of the original sample width occurs as a result of the many multiplications and additions which form the filter’s basic function. Therefore, the accumulator result width is significantly larger than the original input sample width. Limiting the accumulator width is desirable to save resources, both in the filter output path (such as output buffer memory, if present) and in any subsequent blocks in the signal processing chain. The worst case bit growth can be obtained by adding the coefficient width to the base 2 logarithm of the number of non-zero multiplications required (rounded up); however, this does not take into account the actual coefficient values. Taking the base 2 logarithm of the sum of the magnitudes of all filter coefficients reveals the true maximum bit growth for a fixed coefficient filter, and this can be used to limit the required accumulator width.

For MAC implementations on families equipped with DSP slices or Embedded Multipliers, FIR Compiler automatically calculates the bit growth based on the actual coefficient values for filter implementations that do not use the coefficient reload option. For reloadable filters, or MAC-based filters in families without DSP slices and Embedded Multipliers, or any DA-based filter, the worst case bit growth is used.

Although users might also wish to take into account the expected statistical magnitude profile of the input data samples in calculating the maximum bit growth, that feature is not available in the current version of the core. Implementing such a feature produces a risk of accumulator overflow, which is not currently accommodated. Contact your local Xilinx representative if you have an urgent requirement for such a feature.

Note that there is a 48-bit limitation on the accumulator width for DSP slice families, due to the width limits of the basic DSP slice primitive. For Virtex-4 and Spartan-3A DSP devices, the limitations on data and coefficient bitwidths ensure that the accumulator width can never exceed this limit for any number of taps. However, in Virtex-5 devices, the 25-bit option for data or coefficient bitwidth could produce a situation where the bit growth on large filters would cause the accumulator bitwidth to exceed the 48-bit limit. To prevent such an occurrence, the core limits the data sample bitwidth such that the 48-bit limit cannot be exceeded. For fixed coefficient filters, it is expected that this situation will not arise often, because the bit growth is calculated using actual coefficient values. However, for reloadable filters in Virtex-5 devices, this scenario can occur more readily (for example, a 128-tap reloadable filter with 25-bit coefficients could support only a 16-bit data sample width). As mentioned above, the option to allow accumulator overflow is not available in the current version of the core.
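The two bit growth rules described in this section can be written out as a short Python sketch. The function names are hypothetical, coefficients are assumed to be integers, and the exact rounding that the core applies at power-of-two boundaries is not stated in this document, so the results should be read as an approximation of the rules rather than a bit-exact model.

```python
def clog2(n):
    """Integer ceil(log2(n)) for n >= 1."""
    return (n - 1).bit_length()


def worst_case_growth(coef_width, nonzero_taps):
    """Coefficient width plus log2 of the number of non-zero multiplications."""
    return coef_width + clog2(nonzero_taps)


def actual_growth(coeffs):
    """Growth based on the sum of the coefficient magnitudes."""
    return clog2(sum(abs(c) for c in coeffs))


def max_data_width(coef_width, nonzero_taps, accumulator_limit=48):
    """Widest data sample that keeps worst-case growth inside the limit."""
    return accumulator_limit - worst_case_growth(coef_width, nonzero_taps)


# Reloadable 128-tap filter with 25-bit coefficients (example from the text).
print(worst_case_growth(25, 128), max_data_width(25, 128))

# Fixed 5-tap filter with 4-bit signed coefficients.
coeffs = [1, -3, 7, -3, 1]
print(worst_case_growth(4, len(coeffs)), actual_growth(coeffs))
```

For the reloadable example the worst-case growth is 25 + 7 = 32 bits, leaving 48 - 32 = 16 bits for the data samples, in agreement with the figure quoted above. For the small fixed filter, using the actual coefficient values reduces the growth from 7 bits to 4.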

Output Rounding

As mentioned in the Bit Growth Calculation section, it is desirable to limit the output sample width of the filter to minimize resource utilization in downstream blocks in a signal processing chain. For MAC implementations on families equipped with DSP slices or Embedded Multipliers, FIR Compiler includes features to limit the output sample width and round the result to the nearest integer. Several rounding modes are provided to allow the user to select their preferred trade-off between resource utilization, rounding precision, and rounding bias:

• Full Precision

• Truncation (removal of LSBs)

• Non-symmetric rounding (towards positive or negative)

• Symmetric rounding (towards zero or infinity)

• Convergent rounding (towards odd or even)




In the following descriptions, the variable x is the fractional number to be rounded, with n representing the output width (that is, the integer bits of the accumulator result) and m representing the truncated LSBs (that is, the difference between the accumulator width and the output width). In Figure 30 through Figure 32, the direction of inflexion on the red midpoint markers indicates the direction of rounding.

Full Precision

In Full Precision mode, no output sample bitwidth reduction is performed (n = accumulator width, m = 0). This is the default option and is also the only option for DA-based filters and MAC-based filters on families without DSP slices.

Truncation

In Truncation mode, the m LSBs are removed from the accumulator result to reduce it to the specified output width; the effect is the same as the MATLAB function floor(x). This has the advantage that it can be implemented simply with zero resource cost, but has the disadvantage of being biased towards the negative by 0.5.

Non-Symmetric Rounding to Positive

In this rounding mode, a binary value corresponding to 0.5 is added to the accumulator result and the m LSBs are removed; this is equivalent to the MATLAB function floor(x+0.5). In most filter configurations the addition can be done with little or no resource cost in hardware using the DSP slice features. It has the disadvantage of being biased towards the positive by $2^{-(m+1)}$.

Non-Symmetric Rounding to Negative

In a modification of the above technique, a binary value corresponding to 0.499 (one LSB below 0.5) is added to the accumulator result and the m LSBs are removed; this is equivalent to the MATLAB function ceil(x-0.5). The resource usage advantage is the same, but the bias in this case is towards the negative by $2^{-(m+1)}$.

Symmetric Rounding to Highest Magnitude

The bias incurred during non-symmetric rounding occurs because rounding decisions at the midpoints always go in the same direction. In symmetric rounding, the decision on which direction to round is based on the sign of the number. For rounding towards highest magnitude, a binary value corresponding to 0.499 is added to the accumulator result, and the inverse of the accumulator sign bit is added as a carry-in before removal of the m LSBs. Because there are generally as many positive as negative numbers, the result should not be biased in either direction. This rounding mode is commonly used in general applications, mainly because it is equivalent to the MATLAB function round(x).

Figure 30: Non-Symmetric Rounding (a) to positive (b) to negative



Symmetric Rounding to Zero

The implementation difference between this mode and round to highest magnitude is that the sign bit is used directly as the carry-in. There is no direct MATLAB equivalent of this operation. One minor advantage of rounding toward zero is that it will not cause overflow situations.
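The truncation, non-symmetric and symmetric modes described above reduce to adding a constant (and, for the symmetric modes, a sign-dependent carry-in) before dropping the m LSBs. The Python sketch below models this on integer accumulator values, where `acc` represents x scaled by 2^m. It is a behavioral illustration with hypothetical function names, assumes m is at least 1, and interprets the "0.499" constant as one LSB below 0.5.

```python
import math


def truncate(acc, m):
    """Drop the m LSBs: floor(x)."""
    return acc >> m


def round_to_positive(acc, m):
    """Add 0.5, then drop the m LSBs: floor(x + 0.5)."""
    return (acc + (1 << (m - 1))) >> m


def round_to_negative(acc, m):
    """Add one LSB less than 0.5, then drop the m LSBs: ceil(x - 0.5)."""
    return (acc + (1 << (m - 1)) - 1) >> m


def round_to_highest_magnitude(acc, m):
    """Carry-in is the inverse of the sign bit: midpoints move away from zero."""
    carry_in = 0 if acc < 0 else 1
    return (acc + (1 << (m - 1)) - 1 + carry_in) >> m


def round_to_zero(acc, m):
    """Carry-in is the sign bit itself: midpoints move toward zero."""
    carry_in = 1 if acc < 0 else 0
    return (acc + (1 << (m - 1)) - 1 + carry_in) >> m


m = 2                                   # two fractional bits, x = acc / 4
for acc in (-6, -5, 5, 6):              # x = -1.5, -1.25, 1.25, 1.5
    x = acc / (1 << m)
    assert truncate(acc, m) == math.floor(x)
    print(x, truncate(acc, m), round_to_positive(acc, m),
          round_to_negative(acc, m), round_to_highest_magnitude(acc, m),
          round_to_zero(acc, m))
```

Only the midpoints (x = ±1.5 in this example) distinguish the four rounding modes; every other input rounds to the nearest integer in all of them.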

Approximation of Symmetric Rounding

One important point to note about symmetric rounding mode is that, to achieve the correct result, the sign of the accumulator must be known before the addition of the rounding constant to generate the correct carry-in. This requires an additional processing cycle to be available. When the additional cycle is not available and the user wishes to maintain full accuracy, a separate rounding unit must be used (FIR Compiler automatically calculates whether or not this is required).

An alternative technique is available to users who wish to employ symmetric rounding but do not have a spare cycle available, if they are willing to accept some inaccuracies. The rounding constant can be added on the initial loading of the accumulator, and the sign bit can be checked on the penultimate accumulation cycle and added on the final accumulation. This will normally achieve the same result, but there is a small risk that the accumulated result will change sign between the penultimate and final accumulation cycles, which will cause the midpoint decision to go in the wrong direction occasionally.

It is important to note that while some implementations of this approximation technique rearrange the calculation order of coefficients and data such that the smallest coefficient is used last, the FIR Compiler does not perform any rearrangement of coefficients and data. This is significant for symmetric filters, as the centre coefficient is the final coefficient calculated. For non-symmetric filters, the final coefficient is often very small and would be unlikely to affect the sign of the final result. It is also important to note that the risk of the sign changing between the penultimate and final accumulation cycles increases as the level of parallelism employed in the core increases. This is because the contribution added to the accumulation on each cycle increases as the number of cycles per output decreases. Therefore, it is important that users consider carefully the coefficient structure and level of parallelism they intend to use before deciding whether to employ approximation of symmetric rounding.

Convergent Rounding

Convergent rounding chooses the rounding direction for midpoints as either toward odd or even numbers, rather than toward positive or negative. This can be advantageous as the balance of rounding direction decisions for midpoints is based on the probability of occurrence of odd or even numbers, which will generally be equal in most scenarios, even when the mean of the input signal moves away from zero. The function is achieved by adding a rounding constant, as in other modes, but then checking for a particular pattern on the LSBs to detect a midpoint and forcing the LSB to be either zero (for round to even) or one (for round to odd) when a midpoint occurs.
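A behavioral sketch of convergent rounding is given below. The text above does not say which rounding constant is paired with each direction; this sketch assumes 0.5 for round to even and one LSB below 0.5 for round to odd, so that forcing the output LSB at a midpoint always lands on the nearer of the two candidates. The function name and the midpoint test on the original LSBs are likewise illustrative choices, not a description of the DSP slice pattern detector.

```python
def convergent_round(acc, m, to_even=True):
    """Round acc / 2**m to the nearest integer; midpoints go to the
    nearest even (or odd) integer. Assumes m >= 1."""
    half = 1 << (m - 1)
    mask = (1 << m) - 1
    midpoint = (acc & mask) == half          # fractional part is exactly 0.5
    if to_even:
        result = (acc + half) >> m
        return result & ~1 if midpoint else result   # force LSB to zero
    result = (acc + half - 1) >> m
    return result | 1 if midpoint else result        # force LSB to one


m = 2                                        # x = acc / 4
for acc in (-10, -6, 5, 6, 7, 10):           # -2.5, -1.5, 1.25, 1.5, 1.75, 2.5
    print(acc / 4, convergent_round(acc, m), convergent_round(acc, m, to_even=False))
```

In this example 1.5 and 2.5 both round to 2 in round-to-even mode, and to 1 and 3 respectively in round-to-odd mode, while non-midpoint values are unaffected by the choice.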

Figure 31: Symmetric Rounding (a) to highest magnitude (b) to zero



Resource Implications of Rounding

The implications with regard to resource utilization of selecting a particular rounding mode should be considered by users. Generally, the FIR Compiler IP core attempts to integrate rounding functions with existing functions, which usually means the accumulator portion of the circuit. However, this is not always possible. In certain combinations of rounding mode, filter type and device family, an additional DSP slice must be used to implement the rounding function. The most important factor to consider is the inherent hardware support for each mode in each of the device families, but filter type and configuration also play a role. Convergent rounding requires pattern detection support and, therefore, this mode is only available in Virtex-5 devices; all other rounding modes are available in all DSP slice enabled families.

Table 4 indicates the combinations of filter type and rounding type for which no extra DSP slice is likely to be required. Where all three DSP slice enabled device families are likely to support that combination of rounding mode and filter type without an additional DSP slice, a tick mark is entered; where none of the three is likely to support the combination without the additional DSP slice, a check mark is entered; where there is a list of families provided, the list refers to those families which support the combination without an extra DSP slice. The device families are abbreviated to: “V4” for Virtex-4; “V5” for Virtex-5; and “S3D” for Spartan-3A DSP. Support for symmetric rounding assumes that either there is a spare cycle available, or approximation is allowed. If this is not the case, an additional DSP slice will always be required for symmetric rounding modes, regardless of filter type or family.

It is important to note that the table is indicative only, and certain combinations for which hardware support is indicated will actually require the extra DSP, and vice versa. Notable exceptions to the table include parallel multi-channel decimation with symmetric rounding (approximated), which requires an additional DSP slice.

Figure 32: Convergent Rounding (a) to even (b) to odd

Table 4: Indicative Table of Hardware Support for Rounding Modes for Particular Filter Types

Column headings: Filter Type; Non-Symmetric; Symmetric (Infinity); Symmetric (Zero); Convergent

Single Rate, Interpolated, Hilbert V4,V5 V5 V5
Half-Band V4,V5 V5 V5
Interpolating without Symmetry V4,V5 V5 V5
Interpolate by 2, odd Symmetry V4,V5 V5 V5
Interpolating with Symmetry (others)



Table 4: Indicative Table of Hardware Support for Rounding Modes for Particular Filter Types (Continued)

Column headings: Filter Type; Non-Symmetric; Symmetric (Infinity); Symmetric (Zero); Convergent

Interpolating Half-Band V4,V5 V5
Decimating, Single Channel V4,V5 V5 V5
Decimating, Multi-Channel V4,V5 V5 V5
Decimating Half-Band V4,V5 V5 V5
Fractional Interpolation V4,V5 V5 V5
Fractional Decimation, Single Channel V4,V5 V5 V5
Fractional Decimation, Multi-Channel V4,V5 V5 V5

Multiple-Channel Filters

The FIR Compiler core provides support for processing multiple input sample streams using the same implementation. Each input stream is filtered using the same filter configuration (rate change, sample rate, etc.) and the currently selected filter coefficient set.

In many applications the same filter must be applied to several data streams. A common example is the simple digital down converter shown in Figure 33. Here a complex base-band signal $x(n) = x_I(n) + j\,x_Q(n)$ is applied to a matched filter M(z). The in-phase and quadrature components are processed by the same filter.

One candidate solution to this problem is to employ two separate filters. This, however, can be wasteful of logic resources. A more efficient design can be realized using a filter architecture that shares logic resources between multiple sample streams. Several filter classes supported by the filter core provide in-built support for multi-channel processing and can accommodate multiple independent data streams (up to 64 for MAC implementations and up to eight for DA implementations of single rate filters). As more channels are processed by a filter core, the sample throughput is commensurately reduced. For example, if the sample rate (not the core bit clock CLK) for a single-channel filter is fs, a two-channel version of the same filter processes two sample streams, each with a sample rate of fs/2. A three-channel version of the filter processes three data streams and supports a sample rate of fs/3 for each of the streams.

Figure 33: Digital Down Converter
[Block diagram labels: input v(n), DDS, xI(n), xQ(n), two M(z) filters, outputs I and Q. DDS = Direct Digital Synthesizer.]




A multi-channel filter implementation is very efficient in logic resource utilization. A filter with two or more channels can be realized using a similar amount of logic resources as a single-channel version of the same filter, with a proportionate increase in data memory requirements. The tradeoff that needs to be addressed when using multi-channel filters is one of sample rate versus logic requirements. As the number of channels is increased, the logic area remains approximately constant, but the sample rate for an individual input stream decreases. The number of channels supported by a filter core is specified in the filter customization GUI.

Note the following limitations on multi-channel support:

• MAC implementations support up to 64 channels.

• DA implementations of single rate filters support up to 8 channels only.

• DA implementations of multi-rate filters (polyphase decimator, polyphase interpolator, half-band decimator, and half-band interpolator) provide support for single-channel operation only.

Fixed Fractional Rate Re-Sampling Filters

MAC-based FIR filters that implement re-sampling of a data stream at a fixed fractional rate P/Q, where P and Q are integers up to 64, are available for the device families that include DSP slices or Embedded Multipliers. In Figure 34, the operation of an interpolation filter with interpolation rate P = 5 is contrasted conceptually with the operation of a fixed fractional rate filter with rate P/Q = 5/3.

The normal (integer rate) interpolator passes the input sample to all P phases and then produces an output from each of the phase arms of the polyphase filter structure. In the fractional rate version, the output is taken from a phase arm which varies according to a stepping sequence with step size Q.
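The phase stepping described above can be illustrated with a small software model. In this sketch, output m is taken from phase arm (m·Q) mod P, and a new input sample is consumed whenever the stepping wraps around. The model is conceptual: the function names are hypothetical, phase arm p is assumed to hold coefficients h[p], h[p+P], h[p+2P] and so on, and the result is compared against a reference that zero-stuffs by P, filters, and keeps every Q-th output.

```python
def resample_polyphase(x, h, p, q):
    """Fixed fractional rate P/Q re-sampling by stepping through the
    P phase arms with step size Q."""
    n_out = -(-len(x) * p // q)              # ceil(len(x) * p / q)
    out = []
    for m in range(n_out):
        position = m * q
        phase, newest = position % p, position // p
        arm = h[phase::p]                    # coefficients of this phase arm
        out.append(sum(c * x[newest - t]
                       for t, c in enumerate(arm) if newest - t >= 0))
    return out


def resample_reference(x, h, p, q):
    """Reference: insert P-1 zeroes after each sample, filter, keep every Q-th."""
    up = []
    for v in x:
        up.append(v)
        up.extend([0] * (p - 1))
    y = [sum(h[j] * up[k - j] for j in range(len(h)) if k - j >= 0)
         for k in range(len(up))]
    return y[::q]


P, Q = 5, 3
h = list(range(1, 21))                       # 20 taps: 5 phase arms of 4 taps
x = [3, -1, 4, 1, -5, 9]

print([(m * Q) % P for m in range(P)])       # phase sequence: [0, 3, 1, 4, 2]
print(resample_polyphase(x, h, P, Q) == resample_reference(x, h, P, Q))
```

Only one phase arm is evaluated per output sample, so the fractional rate filter performs roughly 1/Q of the arithmetic of the integer rate interpolator followed by a decimator.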

A similar method for implementing fractional rate decimators is conceptually illustrated in Figure 35. The integer decimation rate for the left-hand diagram is Q = 5, while the fractional rate illustrated on the right is P/Q = 3/5.

Figure 34: Interpolation Filters for Integer and Fractional Rates (panels: Normal Interpolator, Fractional Interpolator)


www.xilinx.com

FIR Compiler v3.2

30

The integer rate decimator passes the input samples in sequence to each of the Q phase arms in turn, with the data being shifted through the filter, and the output is generated from the summation of the outputs from each phase arm of the polyphase filter. For the fractional rate implementation, the filter passes the input samples to phases in a stepping sequence based on a step size of P with zero samples being placed into the skipped phases. The summation across the various phase arms remains the same but is based on fewer actual calculations.

The implementation details differ somewhat from these conceptual illustrations, but the resulting behavior of the filter is the same.
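The stepping sequence for the fractional interpolator can be illustrated with a few lines of Python. This is a conceptual sketch of standard rational-rate polyphase indexing, not a description of the core's internal scheduling; the function name `fractional_interp_schedule` is hypothetical. Output m is taken from phase arm (m × Q) mod P while the filter holds input sample floor(m × Q / P).

```python
def fractional_interp_schedule(p, q, n_outputs):
    """Return (input_index, phase_arm) for each output of a P/Q resampler."""
    schedule = []
    for m in range(n_outputs):
        position = m * q  # position on the conceptual rate-P upsampled grid
        schedule.append((position // p, position % p))
    return schedule


# P/Q = 5/3, as in the Figure 34 discussion: the phase arm advances by 3 modulo 5.
for input_index, phase in fractional_interp_schedule(5, 3, 5):
    print(input_index, phase)
```

With Q = 1 the schedule visits every phase arm in order for each input sample, which is the normal integer-rate interpolator.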

Note: Symmetry is not currently exploited when using the fractional rate structures.

Coefficient Reload

An interface for loading new coefficient data is available for DA FIR implementations in all families and for MAC-based FIR implementations on device families that include DSP slices or Embedded Multipliers.

Coefficient Reload for DA FIR implementations

The DA FIR implementation provides a facility for loading new coefficient data, although it is limited in that the filter operation must be halted (the filter ceases to process input samples) while the new coefficient values are loaded and some internal data structures are subsequently initialized. The coefficient reload time is a function of the filter length and type.

A high-level view of the reloadable DA FIR architecture is shown in Figure 36. Observe that the DA LUT build engine, in addition to resources to store the new coefficient vector (coefficient buffer), is integrated with the FIR filter engine.

Figure 35: Decimation Filters for Integer and Fractional Rates (panels: Normal Decimator, Fractional Decimator)




The signals that support the reload operation are COEF_DIN, COEF_LD and COEF_WE. The COEF_DIN port is used to supply the new vector of coefficients to the core. COEF_LD is asserted to initiate a load operation and COEF_WE is a write enable signal for the internal coefficient buffer.

When a coefficient load operation is initiated, the new vector of coefficients is first written to an internal buffer (the coefficient buffer). After the load operation has completed, the DA LUT build-engine is automatically started. The build-engine uses the values in the coefficient buffer to re-initialize the DA LUT.

COEF_LD is asserted to start the procedure. The new vector of coefficients is then written to the internal memory buffer synchronously with the core master clock CLK. COEF_WE can be used to control the flow of coefficient data from the external coefficient source (for example, a microprocessor) to the core. COEF_WE performs a clock-enable function for the load process.

Asserting COEF_LD forces RFD to the inactive state (Low), indicating that the core cannot accept any new input samples. Note that during the reload operation the filter inner-product engine is suspended. Once the new coefficients have been loaded and the DA LUT build engine has constructed the new partial-product lookup tables, RFD is asserted indicating the core is ready to accept new input samples and resume normal operation. The filter sample history buffer (regressor vector) is cleared when a new coefficient vector is loaded.

Asserting COEF_LD also forces RDY to the inactive state (Low). COEF_LD can be reasserted again at any point during an update procedure (even once the DA LUT build-engine is running) to start a new coefficient configuration.

The number of clock cycles required to load a coefficient vector is a function of several variables, including the filter length and filter type. Table 5 presents the reload time (in clock cycles) for each filter class for the DA filter architecture.

Figure 36: High-Level View of DA FIR with Reloadable Coefficients (blocks: DA FIR Filter, Coefficient Buffer Mem, DA LUT Build Engine, Block Memory; ports: ND, RFD, RDY, DIN, DOUT, COEF_LD, COEF_WE, COEF_DIN)



Table 5: Coefficient Reload Times as a Function of Filter Type for DA Architectures

| Filter Type | Latency L (Note 1) |
|---|---|
| Single-Rate FIR (Notes 2, 3) | $L = \left(\left\lfloor \frac{N+3}{4} \right\rfloor \times 64\right) + 18$ |
| Halfband | $L = \left(\left\lfloor \frac{\left\lfloor \frac{N+1}{2} \right\rfloor + 4}{4} \right\rfloor \times 64\right) + 18$ |
| Hilbert Transform | $L = \left(\left\lfloor \frac{\left\lfloor \frac{N+1}{2} \right\rfloor + 3}{4} \right\rfloor \times 64\right) + 18$ |
| Interpolated | $L = \left(\left\lfloor \frac{N+3}{4} \right\rfloor \times 64\right) + 18$ |
| Interpolation, Decimation (Note 4) | $L = (64 \times S) + 18$ |
| Decimating Halfband, Interpolating Halfband | $L = \left(\left\lfloor \frac{\left\lfloor \frac{N+1}{2} \right\rfloor + 3}{4} \right\rfloor \times 64\right) + 82$ |

For the Interpolation and Decimation filter types, S is determined as follows:

$Y = N - 4R \times \left\lfloor \frac{N}{4R} \right\rfloor$

• if $Y = 0$, then $S = \frac{N}{4}$

• if $0 < Y < R$, then $S = \left(\left\lfloor \frac{N}{4R} \right\rfloor \times R\right) + Y$

• if $Y \geq R$ and $Y \neq N$, then $S = \left(1 + \left\lfloor \frac{N}{4R} \right\rfloor\right) \times R$

• if $Y = N$, then $S = R$

Notes:

1. Latency equations calculate the number of cycles between the last coefficient written into block memory and RFD being asserted.

2. $\lfloor x \rfloor$ is the symbol for rounding $x$ down to the nearest integer (for example, $\lfloor 3.2 \rfloor = 3$).

3. $N$ is the effective number of taps:
   a. for Non-symmetric and Negative Symmetric filters, $N = \text{Number of Taps}$
   b. for Symmetric filters, $N = \frac{\text{Number of Taps} + 1}{2}$

4. $R$ is the Sample Rate Change ($S$ and $Y$ are temporary variables).
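The Table 5 relationships are easy to evaluate in a script when budgeting for filter downtime during a DA reload. The sketch below is one reading of the printed equations, so treat the results as estimates: the function names and filter-type strings are hypothetical, `n` is the effective number of taps from Note 3, and `r` is the sample rate change.

```python
def _rate_change_term(n, r):
    """Temporary variable S for the interpolation and decimation filter types."""
    k = n // (4 * r)
    y = n - 4 * r * k
    if y == 0:
        return n // 4
    if y < r:
        return k * r + y
    if y != n:
        return (1 + k) * r
    return r


def da_reload_latency(filter_type, n, r=1):
    """Cycles from the last coefficient write until RFD is asserted (Table 5)."""
    if filter_type in ("single-rate", "interpolated"):
        return ((n + 3) // 4) * 64 + 18
    if filter_type == "halfband":
        return (((n + 1) // 2 + 4) // 4) * 64 + 18
    if filter_type == "hilbert":
        return (((n + 1) // 2 + 3) // 4) * 64 + 18
    if filter_type in ("interpolation", "decimation"):
        return 64 * _rate_change_term(n, r) + 18
    if filter_type in ("decimating halfband", "interpolating halfband"):
        return (((n + 1) // 2 + 3) // 4) * 64 + 82
    raise ValueError("unknown filter type: " + filter_type)


print(da_reload_latency("single-rate", 16))      # 16 effective taps
print(da_reload_latency("decimation", 9, r=2))   # 9 effective taps, rate change 2
```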




An example timing diagram for DA-based filter reload operation is shown in Figure 37.

Coefficient Reload for MAC-Based FIR Implementations

When a coefficient load operation is initiated for a MAC-based FIR implementation (available for families with DSP slices and Embedded Multipliers), the new vector of coefficients is written directly into the coefficient memory. The coefficient memory is split into two pages and the new vector is written into the inactive page. The active page is swapped after the last coefficient is written into the core.

The core operation is not disrupted during coefficient reload and the data buffer is not cleared following a reload. Sample processing proceeds without interruption. The timing for coefficient reload interface signals is illustrated in Figure 38.

Figure 37: Coefficient Reload Timing

Figure 38: Coefficient Reload Timing for Multiply-Accumulate Filters (traces include COEF_WE and COEF_DIN; filter set A is loaded with A0 to A4, then filter set B with B0 to B2 before the reload is restarted; input samples Ai to Di and outputs Ao to Do)



The number of clock cycles required to reload a coefficient vector is simply equal to the length of the reloaded coefficient vector plus one cycle. The host driving the reload port can load the coefficients over a period of as many samples as required by its application, subject to a minimum requirement equal to the length of the reloaded coefficient vector plus one cycle. The additional cycle is required for the active page to be swapped. To minimize the reload time, it is only necessary to load the first half of the coefficient vector for symmetric coefficient sets, and only non-zero coefficients for halfband or Hilbert coefficient sets.

The timing diagram indicates reloading of multiple filter sets. The COEF_FILTER_SEL port value is sampled when the COEF_LD signal is pulsed to indicate the start of a reload operation, and that is the filter set which is reloaded. The switch to the reloaded coefficients occurs for each filter set individually. In Figure 38, filter A is reloaded with five new coefficient values. The data samples continue to be processed with the current filter set until the reload is completed (samples Ai, Bi, and Ci leading to outputs Ao, Bo, and Co), after which data samples are processed using the new coefficient set (presuming, of course, that the selected filter set has not changed during that time). After filter set A has been reloaded, the user initiates a reload of filter set B. After loading three of the five coefficients, COEF_LD is pulsed once more; this aborts the current reload procedure and signals the start of a new reload procedure, again to filter set B. Note that the level on COEF_WE is irrelevant during the COEF_LD pulse, as it is ignored along with any data on the COEF_DIN port for that clock cycle. The new reload procedure can proceed to completion as indicated previously.
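The two-page reload behavior, including the abort-and-restart case, can be modeled in a few lines. This is a behavioral sketch only, under stated simplifications: the class and method names are hypothetical, the page swap happens immediately on the last write (the core needs one further cycle), and coefficient re-ordering is not modeled.

```python
class PagedCoefficientStore:
    """Toy model of the paged MAC coefficient memory (illustrative names)."""

    def __init__(self, n_sets, length):
        self.length = length
        self.active = [[0] * length for _ in range(n_sets)]
        self._pending = []    # inactive page being filled
        self._target = None   # filter set latched when COEF_LD is pulsed

    def pulse_coef_ld(self, filter_sel):
        # A COEF_LD pulse always starts a fresh load, discarding any partial one.
        self._target = filter_sel
        self._pending = []

    def write(self, value):
        # One COEF_WE-qualified write of COEF_DIN.
        if self._target is None:
            return
        self._pending.append(value)
        if len(self._pending) == self.length:
            self.active[self._target] = self._pending  # swap the active page
            self._pending = []
            self._target = None


def minimum_reload_cycles(vector_length):
    """Minimum reload time stated above: vector length plus one swap cycle."""
    return vector_length + 1


store = PagedCoefficientStore(n_sets=2, length=5)
store.pulse_coef_ld(1)
for value in (1, 2, 3):
    store.write(value)        # partial load of set B
store.pulse_coef_ld(1)        # abort and restart, as in Figure 38
for value in (10, 20, 30, 40, 50):
    store.write(value)
print(store.active[1])
```

Until the final write arrives, `store.active[1]` still holds the old coefficients, which mirrors the statement that samples continue to be processed with the current filter set.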

To minimize the resources required to implement the coefficient reload feature, it is necessary for users to re-order the coefficients that are to be reloaded to correctly pass each coefficient to its correct storage location in the filter structure. This re-ordering is illustrated in Table 6 and Table 7 for some simpler cases, and the patterns can be extended to larger filter lengths and rates. Users should particularly note the special case of reloading coefficients for interpolating symmetric filter implementations, as the coefficients to be loaded must first be converted to the combined format used in the symmetric pair technique, and then reordered as required. As the ordering (and in the latter case combination) of reload coefficients can be a complicated matter for even experienced users, the CORE Generator GUI has been configured to output an informational text file, “<instance_name>_reload_order.txt”, which lists the indices of the coefficients in the order they should be reloaded into the filter via the reload port. In the case of interpolating symmetric filters, the combination of coefficients is also defined as a sum or difference of 2 indices. This text file is delivered to the project area selected by the user and can be an extremely useful reference to how the filter coefficients are arranged in the coefficient buffers for each MAC element of the filter. It is strongly recommended that users refer to the reload order text file to determine the required reload ordering for their filter.

Contact your Xilinx representative if you need any assistance or guidance in implementing the reload coefficient ordering for your specific filter implementation.




Table 6: Filter Coefficient Reload Re-Ordering Examples (1)

| Load Order | Non-Symmetric Single Rate, 16 Coefficients, Clock freq. 4 MHz, Sample freq. 1 MHz: Coefficient No. | Non-Symmetric Single Rate, Sample freq. 1 MHz: Coefficient No. | Symmetric Single Rate, Sample freq. 1 MHz: Coefficient No. | Half Band Single Rate, Sample freq. 1 MHz: Coefficient No. |
|---|---|---|---|---|
| 1 | 13 | 15 | 8 | 5 |
| 2 | 14 | 16 | 7 | 7 |
| 3 | 15 | 13 | 6 | 1 |
| 4 | 16 | 14 | 5 | 3 |
| 5 | 9 | 11 | 4 | 8 |
| 6 | 10 | 12 | 3 | |
| 7 | 11 | 9 | 2 | |
| 8 | 12 | 10 | 1 | |
| 9 | 5 | 7 | | |
| 10 | 6 | 8 | | |
| 11 | 7 | 5 | | |
| 12 | 8 | 6 | | |
| 13 | 1 | 3 | | |
| 14 | 2 | 4 | | |
| 15 | 3 | 1 | | |
| 16 | 4 | 2 | | |
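The non-symmetric and symmetric single-rate columns of Table 6 follow a simple pattern: the coefficient indices are split into equal blocks, and the blocks are loaded last-to-first with ascending order inside each block. The sketch below reproduces those three columns only. The function name and the `block` parameter are hypothetical, the pattern is inferred from the table rather than taken from a stated rule, and the generated reload order text file remains the authoritative source for any real filter.

```python
def block_reversed_order(n_coeffs, block):
    """1-based coefficient indices: blocks loaded last-to-first, ascending inside."""
    if n_coeffs % block != 0:
        raise ValueError("n_coeffs must be a multiple of block")
    order = []
    for start in range(n_coeffs - block, -1, -block):
        order.extend(range(start + 1, start + block + 1))
    return order


print(block_reversed_order(16, 4))  # first column of Table 6
print(block_reversed_order(16, 2))  # second column of Table 6
print(block_reversed_order(8, 1))   # symmetric column (first half of the set)
```

The half-band column does not follow this pattern and is not covered here.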

Table 7: Filter Coefficient Reload Re-Ordering Examples (2)

| Load Order | Non-symmetric Decimate by 2, 16 Coefficients, Clock freq. 4 MHz, Sample freq. 1 MHz: Coefficient No. | Non-symmetric Interpolate by 2, 16 Coefficients: Coefficient No. | Half Band Decimate by 2, 15 Coefficients: Coefficient No. | Half Band Interpolate by 2, 15 Coefficients: Coefficient No. |
|---|---|---|---|---|
| 1 | 10 | 13 | 1 | 1 |
| 2 | 12 | 15 | 3 | 3 |
| 3 | 14 | 14 | 5 | 5 |
| 4 | 16 | 16 | 7 | 7 |
| 5 | 9 | 9 | 8 | 8 |
| 6 | 11 | 11 | | |
| 7 | 13 | 10 | | |
| 8 | 15 | 12 | | |
| 9 | 2 | 5 | | |
| 10 | 4 | 7 | | |
| 11 | 6 | 6 | | |
| 12 | 8 | 8 | | |
| 13 | 1 | 1 | | |
| 14 | 3 | 3 | | |
| 15 | 5 | 2 | | |
| 16 | 7 | 4 | | |

CORE Generator GUI & Parameters

A filter core is customized using a configuration wizard or graphical user interface (GUI). The informational screens in the left-hand tabbed panel are shown in Figure 39 through Figure 41. The interactive GUI screens are shown in Figure 42 through Figure 45. Note that the left-hand panel can be removed by dragging the centre bar fully to the left, or stretched to the full GUI window size by dragging fully to the right. The entire GUI window can be enlarged to facilitate easy viewing of the presented information (this is of most benefit with the frequency response window).

Users should note the Tool Tips which appear when they hover the mouse over each parameter. These briefly describe each parameter as a minimum, but also provide feedback when their values or ranges are affected by other parameter selections the user has made (for example, the Coefficient Structure Tool Tip displays the inferred structure when the user selects Inferred from the drop-down list).




Tab 1: Core Symbol

The first tab in the left-hand panel displays the core symbol (see Figure 39).


Figure 39: Core Symbol Tab



Tab 2: Filter Frequency Response Screen

The filter frequency response (magnitude only) is displayed in the second tab in the left-hand panel of the GUI (see Figure 40) and is the default tab on CORE Generator start-up. The left-hand panel as a whole can be adjusted to fit the whole GUI window if desired, as shown below, in which case the core parameter window disappears, or can be adjusted to suit, subject to a minimum width for the parameter window.

The frequency response of the currently selected coefficient set is plotted against normalized frequency. Where the COE file has been specified with integers (decimal, binary or hex), there is only a single plot based on the provided values, which have already been quantized by the user. Where the COE file has been specified with real values (to a minimum of one decimal place), an Ideal plot is displayed based on the provided values alongside a Quantized plot based on a set of coefficient values quantized according to the specified coefficient bitwidth. Where the Quantization option is set to “Normalize and Quantize,” the coefficients are first scaled to take full advantage of the available dynamic range, then quantized according to the specified coefficient bitwidth. Then the quantized coefficients are summed to determine the resulting gain factor over the provided real coefficient set, and the resulting scale factor is used to correct the filter response of the quantized coefficients such that the gain is factored out. The scale factor is reported in the legend text of the frequency response plot.
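The scale, quantize and gain-factor steps described above can be sketched as follows. This is an illustration under assumptions, not the GUI's algorithm: it assumes signed coefficients, scales the largest magnitude to the largest positive code, rounds to nearest, and measures gain against a reference fixed-point format with `frac_bits` fractional bits. All names are hypothetical.

```python
import math


def normalize_and_quantize(coeffs, width):
    """Scale so the largest |coefficient| uses the full signed range, then round."""
    max_code = (1 << (width - 1)) - 1
    peak = max(abs(c) for c in coeffs)
    scale = max_code / peak
    codes = [math.floor(c * scale + 0.5) for c in coeffs]
    return codes, scale


def gain_factor(coeffs, codes, frac_bits):
    """Ratio of the summed quantized set to the summed real set.

    Only meaningful when the real coefficients do not sum to zero.
    """
    return (sum(codes) / 2 ** frac_bits) / sum(coeffs)


codes, scale = normalize_and_quantize([0.25, 0.5, 0.25], width=8)
print(codes, scale, gain_factor([0.25, 0.5, 0.25], codes, frac_bits=7))
```

Dividing the quantized response by the gain factor gives a curve that can be compared directly with the ideal response.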


Figure 40: Frequency Response Tab




Important Note: While an appreciable improvement in performance can be achieved by making use of the full dynamic range of the coefficient bitwidth, it is not always the case. The user must compensate for any additional gain elsewhere in their application system. It is often desirable to amalgamate gains inherent in a signal processing chain and compensate or adjust for these gains either at the front end (e.g., in an Automatic Gain Control circuit) or the back end (e.g., in a Constellation Decoder unit) of the chain. If the user has no facility to compensate for the additional gain, Quantize Only should be chosen.

Note the Passband and Stopband filter response analysis boxes beneath the plot. These boxes take the user specified ranges for passband and stopband and provide useful feedback on the limits of the frequency response. The passband maximum, minimum and ripple values are provided (in dB), while the maximum value only is provided for the stopband. The user can specify any range for the passband, allowing closer analysis of any region of the response, e.g., examination of the transition region can be done to more accurately examine the filter roll-off.
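The kind of feedback given by the analysis boxes can be reproduced offline. The sketch below evaluates the magnitude response of a coefficient set on a frequency grid and reports maximum, minimum and ripple in dB over a chosen band. Function names and the grid size are assumptions, frequencies are expressed as a fraction of pi radians per sample, and the GUI's own sampling of the response may differ.

```python
import cmath
import math


def magnitude_db(coeffs, w_norm):
    """Magnitude response in dB at normalized frequency w_norm (units of pi)."""
    w = w_norm * math.pi
    h = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(coeffs))
    mag = abs(h)
    return 20 * math.log10(mag) if mag > 0 else float("-inf")


def band_stats(coeffs, lo, hi, points=256):
    """Return (max_db, min_db, ripple_db) over the band [lo, hi]."""
    freqs = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    values = [magnitude_db(coeffs, f) for f in freqs]
    return max(values), min(values), max(values) - min(values)


# Four-tap moving average, examined over a narrow passband.
print(band_stats([0.25, 0.25, 0.25, 0.25], 0.0, 0.1))
```

For a stopband, only the first element of the returned tuple corresponds to the value the GUI reports.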



Tab 3: Resource Estimation Screen

The third tab displays the Resource Estimation information (Figure 41), which is only available currently for MAC-based FIR filters in device families that include DSP slices or Embedded Multipliers.

The Resource Estimation screen displays information about the usage of critical and limited FPGA resources. The number of DSP slices/Multipliers is displayed along with a count of the number of block RAM elements required to implement the design. Usage of general slice logic is not currently estimated.

It should be noted that the results presented in the Resource Estimation are estimates only using equations which model the expected core implementation structure. The Resource Utilization option within CORE Generator should be used after generating the core to get a more accurate report on all resource usage. It is not guaranteed that the resource estimates given in the GUI will match the results of a mapped core implementation.


Figure 41: Filter Configuration - Resource Estimation Tab




Filter Specification Screen

The options available on the Filter Specification Screen (Figure 42) are used to define the basic configuration and performance of the filter. These are described below.

• Component Name: The user-defined filter component instance name.

• Coefficients File: Coefficient file name. This is the file of filter coefficients. The file has a COE extension and the file format is described in "Filter Coefficient Data" on page 60. The file can be selected through the dialog box activated by the “Browse” button.

• Show Coefficients: Selecting this tab displays the filter coefficient data in a pop-up window.

• Number of Coefficient Sets: The number of sets of filter coefficients to be implemented. The value specified must divide without remainder into the number of coefficients derived from the COE file.

• Number of Coefficients (per set): The number of filter coefficients per filter set. This value is automatically derived from the COE file contents and the specified number of coefficient sets.

• Filter Type: Four filter types are supported: Single-rate FIR, Interpolated FIR, Interpolating FIR, and Decimating FIR.

• Rate Change Type: This field is applicable to Interpolation and Decimation filter types for Fractional Rate Change implementations. For the interpolation filter, it defines the up-sampling factor.

Figure 42: Filter Specification Screen

• Interpolation Rate Value: This field is applicable to all Interpolation filter types and Decimation filter types for Fractional Rate Change implementations. The value provided in this field defines the up-sampling factor, or P for Fixed Fractional Rate (P/Q) resampling filter implementations.

• Decimation Rate Value: This field is applicable to all Decimation filter types and Interpolation filter types for Fractional Rate Change implementations. The value provided in this field defines the down-sampling factor, or Q for Fixed Fractional Rate (P/Q) resampling filter implementations.

• Zero Packing Factor: This field is applicable to the interpolated filter only. The zero packing factor specifies the number of 0s inserted between the coefficient data supplied by the user in the COE (filter coefficient file). A zero packing factor of k inserts k-1 0s between the supplied coefficient values.

• Number of Channels: The number of channels processed by the filter.

• Input Sampling Frequency: This field can be an integer or real value. The upper limit is set based on the clock frequency and filter parameters such as Interpolation Rate and number of channels.

• Clock Frequency: This field can be an integer or real value. The limits are set based on the sample frequency, interpolation rate and number of channels, and the value provided is used along with these other parameters to determine the number of available clock cycles for data sample processing, which directly affects the level of parallelism in the core implementation. Note that this field influences architecture choices only; the specified clock rate may not be achievable by the final implementation.
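Two of the rules in the list above are simple enough to check in a script before preparing a COE file: the zero packing rule (a factor of k inserts k-1 zeros between supplied coefficients) and the requirement that the number of coefficient sets divides the total coefficient count without remainder. The function names below are hypothetical and the sketch is independent of the GUI.

```python
def zero_pack(coeffs, k):
    """Insert k-1 zeros between consecutive coefficients (zero packing factor k)."""
    packed = []
    for i, c in enumerate(coeffs):
        packed.append(c)
        if i < len(coeffs) - 1:
            packed.extend([0] * (k - 1))
    return packed


def coefficients_per_set(total_coeffs, n_sets):
    """Coefficients per set; the set count must divide the total exactly."""
    if total_coeffs % n_sets != 0:
        raise ValueError("number of sets must divide the coefficient count")
    return total_coeffs // n_sets


print(zero_pack([1, 2, 3], 2))
print(coefficients_per_set(32, 2))
```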

Implementation Options Screen

The following describes the Implementation Options Screen (Figure 43).


Figure 43: Filter Configuration - Input Data, Coefficient Options, and COE File Screen




• Filter Architecture: Two filter architectures are supported: Multiply-Accumulate and Distributed Arithmetic.

• Use Reloadable Coefficients: When the Reloadable option is selected, a coefficient reload interface is provided on the core.

• Coefficient Structure: Five coefficient structures are supported: Non-symmetric; Symmetric; Negative Symmetric; Half-band; Hilbert transform. The structure can also be inferred from the coefficient file directly (default setting), or specified directly. Note the inference algorithm only analyses the first 2048 coefficients. Only valid structure options, based on analysis of the provided coefficient file, are available for the user to specify directly.

• Coefficient Type: The coefficient data can be specified as either signed or unsigned. When the signed option is selected, conventional two’s complement representation is assumed.

• Coefficient Width: The bit precision of the filter coefficients. This field can be used with real value COE files (specified to a minimum of one decimal place) and the filter response graph to explore the possibilities for more efficient implementation by limiting coefficient bitwidth to the minimum required to meet the user’s target specification for the filter.

• Quantization: Specifies the quantization method to be used when real coefficient values (specified to a minimum of one decimal place) are defined in the COE file. Available options are “Quantize Only” or “Maximize Dynamic Range.” The “Quantize Only” option will simply round the provided real values to the nearest quantum using a simple rounding towards zero algorithm. The “Maximize Dynamic Range” option will scale all coefficients such that the maximum coefficient is equal to the maximum representable number in the specified bitwidth, thus maximizing the dynamic range of the filter (note that with the current implementation, overflow is not possible, as the accumulator width is automatically set to accommodate maximum bitgrowth within the filter.)

• Fractional Bits: This field reports back the fractional bitwidth used when quantizing the coefficient values provided. Its value is equal to the Coefficient Width value minus the required integer bitwidth. The integer bitwidth value is static and is automatically determined by calculating the integer bitwidth required to represent the maximum value contained in the provided coefficient sets. Note that fractional bitwidth may be a negative integer; this indicates that very large coefficient values have been provided but only the MSBs will be used in the filter. This value is also reported on the Summary Page.

• Input Data Type: The filter input data can be specified as either signed or unsigned. The signed option employs conventional two’s complement arithmetic.

• Input Data Width: The precision (in bits) of the filter input data samples.

• Output Rounding Mode: Specifies the type of rounding to be applied to the output of the filter.

• Output Width: When using Full Precision, this field is disabled and indicates the output precision (in bits) of the filter output data samples, including bit growth; when using any other Rounding Mode, this field allows the user to specify the desired output sample width.

• Allow Rounding Approximation: When using either of the two Symmetric rounding modes, a spare cycle is normally required to allow determination of the sign of the final accumulated result; however it is possible to approximate symmetric rounding without this spare cycle by checking the sign of the penultimate accumulation value. This checkbox allows the user to specify whether or not such approximation is permitted.

• Registered Output: The filter output bus can be registered or unregistered. When the registered output option is selected, the filter output bus DOUT is maintained at the core output between successive assertions of RDY. In the unregistered mode, the output sample is valid only when RDY is active. At other times, the port changes on successive clock cycles.

• Filter Response Analysis: Parameters in this etch-box affect the filter response analysis fields of the Frequency Response Tab.

• Passband Range: Two fields are available to specify the passband range, the left-most being the minimum value and the right-most the maximum value. The values are specified in the same units as on the graph x-axis (for example, normalized to pi radians/sec).

• Stopband Range: Two fields are available to specify the stopband range, the left-most being the minimum value and the right-most the maximum value. The values are specified in the same units as on the graph x-axis (for example, normalized to pi radians/sec).

• Set to Display: This selects which of multiple coefficient sets (if applicable) is displayed in the Frequency Response Graph.
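The Fractional Bits relationship in the list above (Coefficient Width minus the required integer bitwidth) can be illustrated with a short calculation. The way the integer bitwidth is derived here is an assumption made for the sketch (magnitude bits needed for the largest absolute coefficient, plus a sign bit for signed data); the GUI's exact rule is not stated in this document, and the function names are hypothetical.

```python
import math


def integer_bits(max_abs, signed=True):
    """Assumed integer bitwidth for the largest coefficient magnitude."""
    magnitude = max(0, math.floor(math.log2(max_abs)) + 1) if max_abs > 0 else 0
    return magnitude + (1 if signed else 0)


def fractional_bits(coeffs, width, signed=True):
    """Coefficient Width minus the integer bitwidth; may be negative."""
    return width - integer_bits(max(abs(c) for c in coeffs), signed)


print(fractional_bits([0.25, 0.5], width=16))   # small coefficients
print(fractional_bits([1000.0], width=8))       # large coefficient, narrow width
```

The second call returns a negative value, which corresponds to the case described above where only the MSBs of very large coefficients are used.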

Detailed Implementation Options Screen

The Detailed Implementation Options screen (Figure 44) is described in this section. Be aware that using the available control pins can require a moderate increase in resources and can lead to a reduction in maximum achievable clock frequencies. These options should only be used if required. Halting of the core’s operation can be achieved either with CE (which freezes all core operations) or by holding ND Low (which allows samples currently being processed to be completed) and pausing the input data stream until resumption of normal core operation is desired.


Figure 44: Filter Configuration - Control, Implementation, and DSP48 Column Options Screen




• Optimization Goal: Specifies if the core is required to operate at maximum possible speed (“Speed” option) or minimum area (“Area” option). The “Area” option is the recommended default and will normally achieve the best speed and area for the design; however, in certain configurations, the “Speed” setting may be required to improve performance at the expense of overall resource usage (this setting normally adds pipeline registers in critical paths).

• SCLR: Specifies if the core will have a reset pin. This pin can be used with any other pin combination.

• CE: Specifies if the core will have a clock enable pin. This pin can be used with any other pin combination. It can also be used in place of ND as a means to halt core operation, which can lead to significant reductions in resource usage for parallel symmetric filter implementation structures.

• ND: Specifies if the core will have a New Data pin. This pin can be used with any other pin combination. If the ND pin is not present, samples are assumed to be present on the input data bus at specific cycle times according to the designated sample rate, and the input is sampled at those times. This is indicated by the core by RFD pulsing high during those cycles.

• Memory Options: The memory type for MAC implementations can either be user-selected or chosen automatically to suit the best implementation options. Several new options have been added in v3.0 of the core (described below). This option is disabled for DA-based architecture and is limited to Data and Coefficient Buffers for families which do not have DSP slices or Embedded Multipliers available, with no Automatic selection facility. Note that a choice of “Distributed” may result in shift register implementation where appropriate to the filter structure. Forcing the RAM selection to be either Block or Distributed should be used with caution, as inappropriate use can lead to inefficient resource usage - the default Automatic mode is recommended for most users.

• Data Buffer Type: Specifies the type of RAM to be used to store data within a MAC element. Users can select either “Block” or “Distributed” RAM options, or select “Automatic” to allow the core to choose the memory type appropriately.

• Coefficient Buffer Type: Specifies the type of RAM to be used to store coefficients within a MAC element. Users can select either “Block” or “Distributed” RAM options, or select “Automatic” to allow the core to choose the memory type appropriately.

• Input Buffer Type: Specifies the type of RAM to be used to implement the data input buffer, where present. Users can select either “Block” or “Distributed” RAM options, or select “Automatic” to allow the core to choose the memory type appropriately.

• Output Buffer Type: Specifies the type of RAM to be used to implement the data output buffer, where present. Users can select either “Block” or “Distributed” RAM options, or select “Automatic” to allow the core to choose the memory type appropriately.

• Preference for Other Storage: Specifies the type of RAM to be used to implement general storage in the datapath. Users can select either “Block” or “Distributed” RAM options, or select “Automatic” to allow the core to choose the memory type appropriately. Since this covers several different types of storage, it is recommended that users only specify this type of memory directly if they really need to steer the core away from using a particular memory resource (e.g., if they are short of Block RAMs in their overall design).

• Multi-Column Support: For device families with DSP slices, implementations of large high speed filters might require chaining of DSP slice elements across multiple columns. Where applicable (the feature is only enabled for multi-column devices), the user can select the method of folding of the filter structure across the multiple columns, which can be “Automatic” (based on the selected device for the project) or “Custom” (user selects length of first and subsequent columns).

• First Column Length: The first column length may be different from other columns, to allow users to configure a core which can be placed efficiently alongside existing blocks. In “Automatic” mode, this is set to the full column length of the chosen device.

• Column Wrap Length: The length of subsequent columns is defined by this field, to allow users to restrict the core’s column length to a smaller section of the chosen device to allow it to co-exist in the same device as other design blocks. In “Automatic” mode, this is set to the full column length of the chosen device. In “Custom” mode, this must be at least as long as the first column.

• Inter-Column Pipe Length: Pipeline stages are required to connect between the columns, with the level of pipelining required being dependent upon the required system clock rate, the chosen device and other system-level parameters - choice of this parameter is always left for the user to specify.

Note: Symmetric coefficient structures are not exploited in multi-column implementations. For multi-channel implementations with symmetric coefficients, it can often be more efficient to split the channels across two smaller filter applications than to amalgamate all channels into a single, larger filter that has to span multiple columns.
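The effect of the First Column Length and Column Wrap Length settings can be visualized by distributing a chain of DSP slices across columns. The sketch below is only a counting exercise with hypothetical names; it does not model placement, the inter-column pipeline stages, or the rule that the wrap length must be at least as long as the first column in “Custom” mode.

```python
def fold_columns(n_slices, first_len, wrap_len):
    """Split a chain of n_slices DSP slices into per-column counts."""
    if n_slices <= 0 or first_len <= 0 or wrap_len <= 0:
        raise ValueError("all arguments must be positive")
    columns = [min(n_slices, first_len)]
    remaining = n_slices - columns[0]
    while remaining > 0:
        take = min(remaining, wrap_len)
        columns.append(take)
        remaining -= take
    return columns


print(fold_columns(20, 6, 8))  # 20 slices, first column of 6, then columns of 8
```

The number of column boundaries, `len(columns) - 1`, is the number of places where inter-column pipelining applies.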

Summary Screen

The information available on the Summary Screen (Figure 45) is described below.

• Summary: The final page provides summary information about the core parameters selected, including: information on the actual number of calculated coefficients including padding; the inferred or specified coefficient structure; the additional gain incurred as data passes through the filter due to maximizing the coefficient dynamic range during quantization; the specified output width along with the full precision width for comparison; the calculated cycle-latency value; and the latency delta from the previous major revision of the core.

Figure 45: Filter Configuration - Summary Screen

Interface, Control, and Timing

All of the filter classes employ a data-flow style interface for supplying input samples to the core and for reading the filter output port. ND (New Data), RFD (Ready For Data) and RDY (Ready) are used to co-ordinate I/O operations. In addition, for multi-channel filters, CHAN_IN and CHAN_OUT indicate the active input and output stream respectively, and for multiple coefficient sets the current set to be used is specified using FILT_SEL. Generally these flow control signals are compulsory; however, for MAC-based FIR filter implementations on device families with DSP slices or Embedded Multipliers, ND is optional and a Clock Enable (CE) pin is provided to allow core processing operations to be halted.

Handshake Control Signals

ND is an active High input signal which, when asserted, indicates to the core that there is a valid input sample on the DIN port. ND is internally qualified with the active High output status signal RFD. When both RFD and ND are asserted, the DIN port is sampled on the rising clock edge. The active High output signal RDY indicates that a valid output sum-of-products is available on the DOUT port. For MAC implementations in device families with DSP slices, ND is optional, in which case the filter always takes data from the input port on the first cycle that RFD is asserted, or continuously for parallel filter structures. For parallel symmetric filters, use of CE without ND can lead to an appreciably more efficient implementation.

The handshake signals provide a simple and efficient interface to control the flow of sample data and results. Similar to a clock enable signal, the ND signal is used to enable the input of samples into the filter. The difference between ND and a clock enable is that the ND signal starts the processing operation that continues to completion. By not asserting the ND signal further, processing is halted, whereas a clock enable provides an immediate start and stop of the processing operation. A clock enable pin is provided for MAC-based implementations on devices with DSP slices or Embedded Multipliers, and its use is compatible with the ND function: it can be used with or without ND being present.

RFD provides a status signal for upstream data flow control and, when asserted, it indicates that the core can accept more input samples. The RDY signal is often used as a clock enable for the next stage of processing or as the ND signal when filters are cascaded.
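The input side of the handshake can be illustrated with a small cycle-by-cycle model. This is only a behavioral sketch in Python, not a description of the core's internals: the function name `simulate_handshake` and the parameter `busy_cycles` (the number of clocks for which RFD stays Low after a sample is accepted) are assumptions made for the example.

```python
def simulate_handshake(nd, din, busy_cycles):
    """Model the ND/RFD input handshake one clock at a time.

    nd, din     -- per-clock values driven by the upstream source
    busy_cycles -- hypothetical number of clocks RFD stays Low after a
                   sample is accepted (stands in for the RFD latency)
    Returns (accepted_samples, rfd_trace).
    """
    accepted = []
    rfd_trace = []
    busy = 0  # clocks remaining before the core can accept data again
    for nd_i, din_i in zip(nd, din):
        rfd = busy == 0
        rfd_trace.append(int(rfd))
        if rfd and nd_i:
            # DIN is sampled only when both RFD and ND are asserted
            accepted.append(din_i)
            busy = busy_cycles
        elif busy > 0:
            busy -= 1
    return accepted, rfd_trace


samples, rfd = simulate_handshake([1] * 6, [10, 11, 12, 13, 14, 15], busy_cycles=2)
print(samples, rfd)
```

Samples presented while RFD is Low are ignored by the model, which is why an upstream source would normally hold DIN until RFD is asserted again.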

Resetting the Core

SCLR (Synchronous Clear) is an active High input port which, when asserted, forces the internal control logic to the initialized condition. No internal data is cleared from the memory during the reset process. Following a reset operation, the sum-of-product results remain dependent on the prior input samples until the filter data memory is completely flushed.

Input/Output Channel Decoding

When configured for multiple-channel operation, two channel indicator status output ports are provided: CHAN_IN and CHAN_OUT. The CHAN_IN port identifies the input channel number; CHAN_OUT provides the mapping between the current sample on the filter output port DOUT and the sample stream number. These signals are often used as select controls for multiplexing input streams or de-multiplexing the time division multiplexed result bus. The CHAN_OUT signal is valid when RDY is asserted and changes after the falling edge of RDY.
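As an illustration of using CHAN_OUT as a de-multiplexing select, the following Python sketch separates a time division multiplexed result bus into per-channel streams. The function name `demux_tdm` and the per-clock list representation of the signals are assumptions for the example only.

```python
from collections import defaultdict


def demux_tdm(rdy, chan_out, dout):
    """Split a TDM output bus into one list per channel.

    rdy, chan_out, dout -- per-clock values of the RDY, CHAN_OUT and DOUT ports.
    DOUT is taken only on clocks where RDY is asserted, because CHAN_OUT is
    valid only while RDY is asserted.
    """
    streams = defaultdict(list)
    for valid, chan, value in zip(rdy, chan_out, dout):
        if valid:
            streams[chan].append(value)
    return dict(streams)


print(demux_tdm([0, 1, 1, 1, 0, 1], [0, 0, 1, 2, 0, 0], [99, 5, 6, 7, 99, 8]))
```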

When configured for multiple-coefficient operation, an additional input port FILTER_SEL is provided. The FILTER_SEL port identifies which set of coefficients will be used to process the current set of data. This port is latched when the input data DIN is sampled. The value on this port is used to address the portion of the coefficient buffer containing the desired coefficients.

Nomenclature

In the timing diagrams supplied in this section, the notation x(n) and y(n) is used to denote the filter input and output samples respectively. In some diagrams, for space reasons, the variable name (x or y) has been omitted and the diagram is annotated only with the index value n.

MAC-Based FIR Filter Timing

Timing for Families with DSP Slices or Embedded Multipliers

Figure 46 illustrates the timing for a single-rate, single-channel, N-tap MAC-based filter implemented in device families that include DSP slices or Embedded Multipliers. ND is asserted while valid input is available on the DIN port. At the rising edge of the clock, the data is sampled and processing begins. RFD is deasserted to reflect that the MAC-based FIR core is processing the data and is unable to accept further input samples for a number of clocks equal to the RFD latency. The RFD latency and the latency of the filter are functions of the number of taps, filter type, number of channels, and symmetry. After a number of clock cycles equal to the in-built filter latency, RDY is asserted and the valid filter output is presented on the DOUT port. In this example, the DOUT value is held in the optional output register. In this configuration, core operation can be halted by holding ND Low for the required idle period. However, the core continues to process any input data sampled so far and to produce outputs based on those input samples.

[Timing diagram. Signals shown: Clock, RFD, ND, DIN, RDY, DOUT.]

Figure 46: Timing Diagram for a Single Channel Filter Using ND, With Registered Output

Figure 47 illustrates the timing for the same filter without the ND port. When RFD is asserted, the input data is sampled at the rising edge of the clock and processing begins. RFD is deasserted as normal to indicate that the core cannot accept further input samples for a number of clocks equal to the RFD latency. When processing has completed, RFD is asserted once more for a single cycle and the next input data is processed. Note that in this configuration, the system or circuit that is driving the input data must continue to feed data to the filter at the specified input rate, otherwise invalid data will be sampled. Similarly, data samples should be held until the RFD signal is asserted, otherwise that sample will be missed. After a number of clock cycles equal to the in-built filter latency, RDY is asserted and the valid filter output is presented on the DOUT port. In this example, the DOUT value is held in the optional output register. If halting of core operation is required in this configuration, a clock enable pin is required on the core to halt all core operation. This is fundamentally different from halting the filter using ND: the clock enable halts all core operation and no outputs will be produced during the period for which CE is deasserted. Core outputs will continue only after CE is asserted once more.

Figure 48 illustrates the timing for a multi-channel filter. The core accepts inputs for each channel sequentially (Time Division Multiplexed or TDM format). Outputs are also presented in TDM format. A channel indicator is provided to track the currently active input and output channel.

[Timing diagram. Signals shown: Clock, RFD, DIN, RDY, DOUT. The interval between input and output is annotated as the cycle latency.]

Figure 47: Timing Diagram for a Single Channel Filter Without ND Port, With Registered Output

[Timing diagram. Signals shown: Clock, RFD, ND, CHAN_IN, DIN, RDY, CHAN_OUT, DOUT. Inputs DIc0, DIc1, DIc2 and outputs DOc0, DOc1, DOc2 are shown in channel order.]

Figure 48: Timing Diagram for a 3-Channel Filter with ND Port and Registered Output

Figure 49 illustrates the timing for a multi-channel filter which also has multiple filter sets. The filter interface operation is as described previously for multi-channel mode, but in this case there is a switch to an alternative filter set during the third data input cycle shown in the diagram. The filter set switch-over can occur on any data input cycle, and the filter immediately moves to that set of coefficients to process that data sample (and all subsequent data samples while the filter select port value remains the same).

Changing the filter select port setting arbitrarily between channels is permitted and supported, although this is not a common requirement. More common would be association of a particular coefficient set with a particular channel, as illustrated in Figure 50.

[Timing diagram. Signals shown: Clock, RFD, CHAN_IN, FILTER_SEL, DIN, RDY, CHAN_OUT, DOUT. Outputs in order: DOc0, Set 0; DOc1, Set 0; DOc2, Set 1; DOc0, Set 1; DOc1, Set 1.]

Figure 49: Timing Diagram for a Multi-channel Filter with Multiple Filter Sets

[Timing diagram. Signals shown: Clock, RFD, CHAN_IN, FILTER_SEL, DIN, RDY, CHAN_OUT, DOUT. Outputs in order: DOc0, Set 0; DOc1, Set 1; DOc2, Set 2; DOc0, Set 0; DOc1, Set 1.]

Figure 50: Timing Diagram for Filter with Channel Tied to Coefficient Set


Multi-rate filters involve an increase or decrease in rate from input to output. Figure 51 shows a multi-channel (two channels) decimation filter with a rate decrease of two. Input data is taken in TDM format with two input samples for each channel being required before an output can be produced. Output data is also presented in TDM format at the lower rate.

Figure 52 shows a multi-channel (two channels) interpolation filter with a rate increase of two. Note once again that input data is taken in TDM format. Output data is then presented in TDM format at the higher rate.

[Timing diagram. Signals shown: Clock, RFD, ND, CHAN_IN, DIN, RDY, CHAN_OUT, DOUT.]

Figure 51: Timing Diagram for Multi-Channel Decimation Filter

[Timing diagram. Signals shown: Clock, RFD, ND, CHAN_IN, DIN, RDY, CHAN_OUT, DOUT.]

Figure 52: Timing Diagram for Multi-Channel Interpolation Filter

Timing for Older Families

Figure 53 illustrates the timing for a single-rate, single-channel, N-tap MAC filter. ND is asserted while valid input is available on the DIN port. At the rising edge of the clock, the data is sampled and processing begins. RFD is deasserted to reflect that the MAC-based FIR core is processing the data and is unable to accept further input samples for a number of clocks equal to the RFD latency. The RFD latency and the latency of the filter are functions of the number of taps, filter type, number of channels, and symmetry. After a number of clock cycles equal to the filter length and the in-built filter latency, RDY is asserted and the valid filter output is presented on the DOUT port. In this example, the DOUT value is held in the optional output register.

Notes:

1. The DIN port has an input buffer when in a multi-channel configuration, which allows multiple input samples to be burst into the filter.

2. For multi-channel, polyphase decimation filters, some clock cycles are needed for storing the data; therefore, the maximum theoretical throughput rate cannot be supported. Because of latencies through the filter, five extra clocks are needed to process the data. For example, for a 30-tap, non-symmetric, polyphase decimating filter, 35 clocks are needed to process each set of M input samples, rather than the theoretical minimum of 30 clocks.


Figure 53: Single-Rate, Single-Channel Filter Timing




Distributed Arithmetic Filter Timing

Single-Channel and Multi-Channel DA FIR Filters

The timing for a single-channel filter, with L clock cycles per output sample and a registered output port, is shown in Figure 54. The ND input signal is used for loading a new input sample into the filter. It is effectively used internally as a clock enable, and the actual sample load operation occurs on the rising edge of the clock (CLK). When the core is ready to accept a new input sample, the RFD signal is asserted. When a new output sample is available, RDY is asserted for a single clock period. When the registered output option is selected, the output sample remains valid between successive assertions of RDY.

Figure 55 shows the timing for a single-channel filter with an unregistered output port. The input timing is the same as for the registered output example, but now the filter result is valid for only a single clock period and is framed by RDY.

[Timing diagram. Signals shown: CLK, ND, RFD, DIN, RDY, DOUT. Annotations: the clock edge on which the input sample is loaded; the interval between input and output depends on the filter latency, which is reported in the filter GUI.]

Figure 54: Single-Channel FIR Filter Timing. L-Clock Cycles Per Output Sample, Registered Output

[Timing diagram. Signals shown: CLK, ND, RFD, DIN, RDY, DOUT. The interval between input and output depends on the filter latency, which is reported in the filter GUI.]

Figure 55: Single-Channel FIR Filter Timing. L-Clock Cycles Per Output Sample, Unregistered Output

In the two previous examples, the host system supplied input samples at the highest frequency possible (every L clock ticks). This does not have to be the case. Data samples can be supplied at a lower rate without disturbing the operation of the filter, as shown in Figure 56.

In this example, despite the filter being designed to specify L clock cycles per output sample, new data (input samples) is supplied to the filter every L+2 clock periods. Observe that RFD is still asserted on the Lth clock cycle of a data sample epoch, but the host system supplies a new input sample only two clock cycles later. RFD remains active until the new input sample has been accepted by the filter core. This occurs synchronously with the positive-going edge of the clock and with ND acting as an active High clock enable.

As a specific example of the filter interface timing, consider a non-symmetric single-channel FIR filter with 10-bit precision input samples and a full serial realization (L=10). The timing diagram is shown in Figure 57. Ten clock cycles are needed to process each new input sample.

A symmetrical filter with B-bit precision input samples requires, in general, B+1 clock periods for a full serial (SDA) implementation. Figure 58 shows the timing for a single-channel symmetrical FIR employing 10-bit input samples. In this case, eleven clock cycles (L=11) are required to process each new piece of data.
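The two full serial examples above can be turned into a quick sample-rate estimate. The Python sketch below assumes, following those examples, that a non-symmetric full serial filter needs B clocks per sample and a symmetric one needs B+1; the function names and the 100 MHz clock are hypothetical and chosen only for illustration.

```python
def sda_clocks_per_sample(input_bits, symmetric):
    """Clocks per output sample (L) for a full serial DA filter.

    Assumes L = B for non-symmetric and L = B + 1 for symmetric coefficient
    sets, as in the 10-bit examples in the text.
    """
    return input_bits + 1 if symmetric else input_bits


def max_sample_rate_hz(clock_hz, clocks_per_sample):
    """Highest input sample rate sustained when one sample takes L clocks."""
    return clock_hz / clocks_per_sample


for symmetric in (False, True):
    L = sda_clocks_per_sample(10, symmetric)
    # 100 MHz is an arbitrary example clock rate, not a device specification
    print(symmetric, L, max_sample_rate_hz(100e6, L))
```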

[Timing diagram. Signals shown: CLK, ND, RFD, DIN, RDY, DOUT. The interval between input and output depends on the filter latency, which is reported in the filter GUI.]

Figure 56: L-Clock Cycles Per Output Sample, Registered Output

[Timing diagram. Signals shown: CLK, ND, RFD, DIN, RDY, DOUT, with clock cycles numbered 0 to 9 for each input sample.]

Figure 57: Full Serial Implementation, 10-bit Input Samples, Registered Output


The previous two figures illustrate the timing for full serial or SDA filter implementations with symmetrical and non-symmetrical coefficient data. The CORE Generator filter core supports various types of parallel filter realizations. The greater the degree of filter parallelism employed, the higher the filter sample rate. Filter parallelism is specified in terms of the number of clock cycles (L) required to compute an output sample. This value is accessed via the filter core GUI when the Multi clock cycles per output sample is selected in the Implementation Option field. The associated drop-down menu indicates valid options for L. The valid options for L depend on the filter parameters, symmetrical/non-symmetrical coefficient data and precision of the input samples. For example, for an input sample precision B=10 and using a non-symmetrical impulse response, the valid values for L are {1, 2, 3, 4, 5, 10}. For B=10 and a symmetrical impulse response, L={1, 2, 3, 4, 6, 11}.

Figure 59, Figure 60, and Figure 61 illustrate the timing diagrams for a filter with B=10 bit precision input samples, registered output, with L=2, 4, and 6, respectively.

[Timing diagram. Signals shown: CLK, ND, RFD, DIN, RDY, DOUT, with clock cycles numbered 0 to 10 for each input sample.]

Figure 58: Full Serial Implementation, 10-Bit Input Samples, Symmetrical Impulse Response, Registered Output

[Timing diagram. Signals shown: CLK, ND, RFD, DIN, RDY, DOUT, with inputs x(n) to x(n+11) and outputs y(n) to y(n+10).]

Figure 59: PDA FIR With B=10-Bit Input Samples, L=2 Clock Cycles Per Output Sample

Figure 62 illustrates the filter timing for a fully parallel DA (PDA) FIR filter. Observe that after the initial start-up latency a new output sample is available on every clock edge. The number of clock cycles in the start-up latency period is a function of the filter parameters. This value is reported in the filter design GUI in addition to the associated VHO (or VEO) file. See "Interface, Control, and Timing" on page 47.

The figure shows ND valid on every clock edge, so a new input sample is delivered to the filter on each clock edge. ND can be removed for an arbitrary number of clock cycles to temporarily suspend the filter operation. No internal state information is lost when this is done, and the filter resumes normal operation when ND is reapplied (placed in the active state again).

Figure 63 and Figure 64 demonstrate the timing for a multi-channel filter. Multi-channel filters provide two additional output ports, SEL_I and SEL_O, that indicate the active input and output channel respectively. Figure 63 illustrates a filter with an unregistered output. With a fully parallel implementation, a new output sample is available on each clock edge (after the start-up latency), independent of the filter length or the bit precision of the input data samples.

[Timing diagram. Signals shown: CLK, ND, RFD, DIN, RDY, DOUT. A new input x(n) and, after an interval that depends on the filter latency, a new output y(n) appear on every clock.]

Figure 62: Fully Parallel Implementation, Single-Channel Filter


Figure 64 shows the Multi-channel FIR filter timing for registered output samples.

Figure 65 demonstrates the timing for a polyphase decimator with M=4, B=8, and eight clock cycles per output point (Clock Cycles/Output Sample=8). As previously stated, for all of the multi-rate filter structures, the number of clock cycles per output point specification (Clock Cycles/Output Sample) refers to the individual filter segments that comprise the filter, and is not directly associated with the filter output port DOUT. The filter is always able to accept input samples, as indicated by RFD=1. New output samples become available after M (in this case four) input samples have been delivered to the filter. New output samples are produced in response to each new block of four input values. Delivering the final value in each M-tuple begins a new inner product calculation. The resulting output sample becomes available a number of clock cycles (k) after the final sample in the M-tuple is delivered. The exact value of k is a function of the filter parameterization. It is tightly coupled to the input sample bit precision, the value specified for the Clock Cycles/Output Sample parameter, and to the number of internal pipeline stages and the data buffering depth in the filter. It is always recommended to use the output control signal RDY to coordinate all processes that are data sinks for the filter output port DOUT.

[Timing diagram. Signals shown: CLK, ND, RFD, SEL_I, DIN, RDY, SEL_O, DOUT, for channels CHAN 0 to CHAN N-1. The interval between input and output depends on the filter parameters.]

Figure 63: Multi-Channel FIR Filter Timing (direct output)

[Timing diagram. Signals shown: CLK, ND, RFD, SEL_I, DIN, RDY, SEL_O, DOUT, for channels CHAN 0 to CHAN N-1. The interval between input and output depends on the filter parameters.]

Figure 64: Multi-Channel FIR Filter Timing (registered output)

Figure 66 illustrates the timing for a 4-to-1 polyphase decimator with similar parameters to the filter considered in Figure 65, but in this case the number of Clock Cycles/Output Sample is L=4. Observe that even though the input sample precision (B=8) is the same as in the filter demonstrated in Figure 65, samples can be presented to the filter every four clock cycles, in contrast to every eight clock periods in the previous example. The filter supports double the input sample rate and, therefore, twice the bandwidth, of the filter with L=8.
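The block-of-M behavior described above (one output per M-tuple of inputs, computed by M subfilters) can be checked numerically. The following Python sketch is a plain software model, not the core's DA architecture; the names `fir` and `polyphase_decimate` and the test coefficients are assumptions made for the example.

```python
def fir(h, x):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k], with x[n] = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def polyphase_decimate(h, x, m):
    """M-to-1 decimator built from M subfilters e_p[j] = h[j*m + p]."""
    subfilters = [h[p::m] for p in range(m)]
    out = []
    # One output per complete block of m inputs; the block ends at x[last].
    for block in range(len(x) // m):
        last = block * m + m - 1
        acc = 0
        for p, e in enumerate(subfilters):
            for j, coeff in enumerate(e):
                idx = last - p - j * m  # sample that multiplies h[j*m + p]
                if idx >= 0:
                    acc += coeff * x[idx]
        out.append(acc)
    return out


h = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # arbitrary example coefficients
x = list(range(1, 17))               # 16 input samples, i.e. four blocks of M=4
print(polyphase_decimate(h, x, 4) == fir(h, x)[3::4])
```

The comparison shows that the polyphase form produces the same values as filtering at the input rate and keeping every fourth result, while only evaluating the inner product once per block.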

Polyphase Decimator DA FIR Filter Timing: Burst Input Mode

Internal buffering in the polyphase decimator allows the user to burst samples into the DIN port. This is illustrated in Figure 66 for a down-sampling factor M=4, 12-bit input samples, and L=12. This figure shows the timing for the filter starting from rest; that is, no data has been previously applied to the input port. Notice in this case that a total of 8 samples can be written to the filter before the device removes RFD.

After the filter has moved out of this start-up state, input samples must obey the timing diagram shown in Figure 67. Only four samples can be supplied in each data burst.

[Timing diagram. Signals shown: CLK, ND, RFD, DIN, RDY, DOUT, with clock cycles numbered 0 to 7 for each of input samples 0 to 11. Annotations: first input sample delivered to filter; first output available; the interval between them depends on the filter parameters.]

Figure 65: 8-Bit Precision Input Samples, Down-Sampling Factor M=4, L=8.

[Timing diagram. Signals shown: CLK, ND, RFD, DIN, RDY, DOUT, with clock cycles numbered 0 to 3 for each of input samples 0 to 11. Annotations: first input sample delivered to filter; first output available; the interval between them depends on the filter parameters.]

Figure 66: 8-Bit Precision Input Samples, Down-Sampling Factor M=4, L=4.


As with the Clock Cycles/Output Sample parameter for the single-rate filters, this parameter can be used with all the multi-rate filters to trade off performance against silicon area. Figure 68 shows the polyphase decimator timing with 12-bit precision input samples, down-sampling factor M=4, L=12, and burst input data operation. This diagram shows timing after the filter has moved out of the start-up timing.

Polyphase Interpolator DA FIR Filter Timing

Figure 69 shows the timing for a polyphase interpolator that supports a sample rate change of P=4, eight-bit precision input samples (B=8) and eight clock cycles per output point. Again, as with the polyphase decimator, the number of clock cycles specified per output point is associated with the individual subfilters in the polyphase structure. In this example, each subfilter produces a new output sample every eight clock cycles. The four polyphase segments are actually operating concurrently so, in fact, internal to the filter, four new output samples are available every eight clock cycles. When the new block of output samples is available, the samples are sequenced to the filter output port DOUT using an internal multiplexor. The multiplexor select signal is referenced to the filter master clock signal CLK. As shown in Figure 69, the vector of P output samples is validated by the core output control signal RDY.
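The idea that P subfilters each produce one output per input sample, and that the P results are then sequenced to the output, can be modeled in a few lines. This Python sketch is a numerical illustration only; `polyphase_interpolate`, `zero_stuff_and_filter` and the example data are assumed names and values, not part of the core.

```python
def polyphase_interpolate(h, x, p_factor):
    """1-to-P interpolator built from P subfilters e_p[j] = h[j*P + p]."""
    subfilters = [h[p::p_factor] for p in range(p_factor)]
    out = []
    for n in range(len(x)):
        # Each subfilter yields one output for input sample n; the P results
        # are emitted in order, like a multiplexer stepping through segments.
        for e in subfilters:
            out.append(sum(c * x[n - j] for j, c in enumerate(e) if n - j >= 0))
    return out


def zero_stuff_and_filter(h, x, p_factor):
    """Reference: insert P-1 zeros after each sample, then apply the full FIR."""
    up = []
    for sample in x:
        up.append(sample)
        up.extend([0] * (p_factor - 1))
    return [sum(h[k] * up[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(up))]


h = list(range(1, 10))   # arbitrary example coefficients
x = [3, -1, 4, 1, -5]
print(polyphase_interpolate(h, x, 4) == zero_stuff_and_filter(h, x, 4))
```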

[Timing diagram. Signals shown: CLK, ND, RFD, DIN, RDY, DOUT, with inputs x(0) to x(7) and output y(0).]

Figure 67: Polyphase Decimator Timing, Filter Out of Start-up State

[Timing diagram. Signals shown: CLK, ND, RFD, DIN, RDY, DOUT, with inputs x(n) to x(n+7) and new outputs marked.]

Figure 68: Polyphase Decimator Timing, 12-bit Precision Samples

Figure 70 shows the timing for an interpolator with similar parameters to the previous example, but in this case a value of L=4 has been used. This means that each polyphase segment produces a new output sample every four clock cycles. In addition, all four outputs become available (internally) in parallel. Observe that after the initial startup latency a new interpolant is available at the filter output port DOUT on each successive rising edge of the clock.

[Timing diagram. Signals shown: CLK, ND, RFD, DIN, RDY, DOUT, with clock cycles numbered 0 to 7 for each input sample. The interval between input and output depends on the filter latency, which is a function of the filter parameters.]

Figure 69: Polyphase Interpolator Timing. 8-Bit Precision Input Samples, Up-Sampling Factor P=4, L=8.

[Timing diagram. Signals shown: CLK, ND, RFD, DIN, RDY, DOUT, with clock cycles numbered 0 to 3 for each input sample. The interval between input and output depends on the filter latency, which is a function of the filter parameters.]

Figure 70: Polyphase Interpolator Timing. 8-Bit Precision Input Samples, Up-Sampling Factor P=4, L=4.

Filter Coefficient Data

The filter coefficients are supplied to the filter compiler using a coefficient file with a COE extension. This is an ASCII text file with a single-line header that defines the radix of the number representation used for the coefficient data, followed by the coefficient values themselves. This is shown in Figure 71 for an N-tap filter.

radix=coefficient_radix;
coefdata=
a(0),
a(1),
a(2),
…
a(N-1);

Figure 71: Filter Coefficient File Format

The filter coefficients can be supplied as integers in either base-10, base-16 or base-2 representation. This corresponds to coefficient_radix=10, coefficient_radix=16 and coefficient_radix=2 respectively. Alternatively, the coefficients can be entered as real numbers (specified to a minimum of one decimal place) in base-10 only. Note that if the user enters signed negative-symmetric hexadecimal coefficients, each value should be sign-extended to the boundary of the most significant nibble or hex character. This ensures that coefficient structure inference can be performed correctly (this includes Hilbert transform filter types, which are also negative-symmetric).
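A short Python sketch of the sign-extension rule follows. It assumes coefficients are stored in two's complement with a known bit width; the helper name `to_sign_extended_hex` is hypothetical and not part of any Xilinx tool.

```python
def to_sign_extended_hex(value, width_bits):
    """Format a signed coefficient as hex, sign-extended to a whole nibble.

    value      -- signed integer coefficient
    width_bits -- coefficient width in bits (two's complement assumed)
    """
    low, high = -(1 << (width_bits - 1)), (1 << (width_bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError("value does not fit in the given coefficient width")
    nibbles = (width_bits + 3) // 4           # round up to a hex-digit boundary
    mask = (1 << (4 * nibbles)) - 1
    # Masking a negative Python int yields its two's complement bit pattern
    return format(value & mask, "0{}X".format(nibbles))


print(to_sign_extended_hex(-200, 10))  # F38, not the bare 10-bit pattern 338
print(to_sign_extended_hex(200, 10))   # 0C8
```

For a 10-bit coefficient the bare two's complement pattern of -200 is 338 hex; extending the sign bit up to the 12-bit nibble boundary gives F38, so the negated partner of 0C8 is recognizable as such.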

The coefficient values can also be placed on a single line as shown in Figure 72.

The coefficient file formats for each of the filter classes supported by the core are discussed in the sections below.
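For checking a coefficient file before handing it to the tool, a minimal reader can be sketched in Python. This is an illustration based only on the format described above (a radix statement followed by a coefdata statement, each terminated by a semicolon); the function name `parse_coe` is an assumption, and comments or other keywords that a real COE file might contain are not handled.

```python
import re


def parse_coe(text):
    """Parse 'radix=...; coefdata=...;' text into (radix, coefficients).

    Base-16 and base-2 values are returned as the raw unsigned integers
    written in the file; interpreting them as two's complement is left
    to the caller.
    """
    fields = {}
    for statement in text.split(";"):
        statement = statement.strip()
        if not statement:
            continue
        key, _, value = statement.partition("=")
        fields[key.strip().lower()] = value.strip()
    radix = int(fields["radix"])
    # Coefficients may be separated by commas, spaces or newlines
    tokens = [t for t in re.split(r"[,\s]+", fields["coefdata"]) if t]
    if radix == 10 and any("." in t for t in tokens):
        return radix, [float(t) for t in tokens]   # real-valued base-10 data
    return radix, [int(t, radix) for t in tokens]


print(parse_coe("radix=10;\ncoefdata=20,-256,200,255,255,200,-256,20;"))
```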

Single-Rate FIR

The coefficient file for the single-rate FIR filter is straightforward and consists of a one-line header followed by the filter coefficient data. For example, the filter coefficient file for an 8-tap filter using a base-10 representation for the coefficient values is shown in Figure 73.

Irrespective of the filter possessing positive or negative symmetry, the coefficient file should contain the complete set of coefficient values. The filter coefficient file for the non-symmetric impulse response shown in Figure 74 is presented in Figure 75.

radix=coefficient_radix;
coefdata=a(0),a(1),a(2),…,a(N-1);

Figure 72: Filter Coefficient File Format - Coefficient Data on a Single Line

radix=10;
coefdata=20,-256,200,255,255,200,-256,20;

Figure 73: Filter Coefficient File - 8-Tap Filter, Base-10 Coefficient Values

[Impulse response plot. Tap values: 255, 200, -180, 80, 220, 180, 100, -48, 40.]

Figure 74: Nonsymmetric Impulse Response

radix=10;
coefdata=255,200,-180,80,220,180,100,-48,40;

Figure 75: Coefficient File for the Non-symmetric Impulse Response

The coefficient file for the negative-symmetric filter characterized by the impulse response in Figure 76 is shown in Figure 77.
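The three coefficient structures used in these examples can be told apart with a simple comparison against the reversed coefficient list. The Python sketch below is only an illustration of that idea; `classify_symmetry` is a hypothetical helper, and the core's own structure inference may apply further rules.

```python
def classify_symmetry(coeffs):
    """Return 'symmetric', 'negative-symmetric' or 'non-symmetric'."""
    reversed_coeffs = coeffs[::-1]
    if coeffs == reversed_coeffs:
        return "symmetric"
    if coeffs == [-c for c in reversed_coeffs]:
        return "negative-symmetric"
    return "non-symmetric"


print(classify_symmetry([20, -256, 200, 255, 255, 200, -256, 20]))
print(classify_symmetry([255, 200, -180, 80, 220, 180, 100, -48, 40]))
print(classify_symmetry([30, -40, 80, -100, -200, 200, 100, -80, 40, -30]))
```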

Half-Band Filter

As previously described, every second filter coefficient for a half-band filter with an odd number of terms is zero. When specifying the filter coefficient data for this filter class, the zero value entries must be included in the coefficient file. For example, the filter coefficient file that specifies the filter impulse response in Figure 78 is shown in Figure 79.

The filter coefficient set is parsed by the filter compiler. If either the alternating zero entries are absent or the coefficient set is not even-symmetric, this is flagged as an error and the filter is not generated. A dialog box is presented to indicate the nature of the problem under these circumstances.

Technically, the zero-valued entries for a half-band filter can occur at the filter impulse response extremities as shown in Figure 80. However, observe that these values do not contribute to the result.

[Impulse response plot. Tap values: 30, -40, 80, -100, -200, 200, 100, -80, 40, -30.]

Figure 76: Negative-Symmetric Impulse Response

radix=10;
coefdata=30,-40,80,-100,-200,200,100,-80,40,-30;

Figure 77: Coefficient File for the Negative-Symmetric Impulse Response

[Impulse response plot. Tap values: 220, 0, -375, 0, 1283, 2047, 1283, 0, -375, 0, 220.]

Figure 78: 11-Tap Half-Band Filter Impulse Response

radix=10;
coefdata=220,0,-375,0,1283,2047,1283,0,-375,0,220;

Figure 79: Coefficient File for the Half-Band Filter Impulse Response


This condition is detected when the filter is specified. If the number of taps is such that the zero-valued coefficients form the first and last entry of the impulse response, the filter length is reported as an invalid value. The number of taps N for a half-band filter must obey N = 3 + 4n, where n = 0, 1, 2, 3, …. For example, a half-band filter can have 11, 15, 19 and 23 taps, but not 9, 13, 17 or 21 taps.
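The half-band rules stated above (valid length, even symmetry, alternating zeros) can be checked before generating the core. The following Python sketch mirrors those three checks; the function name `check_half_band` and its messages are assumptions for illustration and do not reproduce the tool's actual dialog text.

```python
def check_half_band(coeffs):
    """Check a coefficient list against the half-band rules in the text."""
    n = len(coeffs)
    if n < 3 or (n - 3) % 4 != 0:
        return "invalid length: N must equal 3 + 4n"
    if coeffs != coeffs[::-1]:
        return "coefficient set is not even-symmetric"
    centre = n // 2
    # Every second coefficient counted from the centre tap must be zero
    for i in range(n):
        if i != centre and (i - centre) % 2 == 0 and coeffs[i] != 0:
            return "alternating zero entries are absent"
    return "ok"


print(check_half_band([220, 0, -375, 0, 1283, 2047, 1283, 0, -375, 0, 220]))
print(check_half_band([0, -375, 0, 1283, 2047, 1283, 0, -375, 0]))
```

The first call uses the 11-tap coefficient set of Figure 79; the second uses a 9-tap set with zeros at the extremities, which fails the length rule.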

Hilbert Transform

The impulse response for an 11-tap approximation to a Hilbert transformer is shown in Figure 81. The odd-symmetry and zero-valued coefficients are both exploited to generate an efficient FPGA realization. The coefficient data file for the Hilbert transform must contain the zero-valued entries. For example, the COE file corresponding to Figure 81 is shown in Figure 82.

In practice, some optimization methods used for designing a Hilbert transform can lead to the presence of small even-numbered coefficients. If the Hilbert Transform filter class is used in the filter compiler, these terms must be forced to zero by the user.

Just like the half-band filter, the zero-valued entries for a Hilbert transformer can occur at the filter impulse response extremities. However, these values do not contribute to the result.

This condition is detected when the filter is specified. If the number of taps is such that the zero-valued coefficients form the first and last entry of the impulse response, the filter length is reported as an invalid value. The number of taps N for a Hilbert transformer must obey N = 3 + 4n, where n = 0, 1, 2, 3, …. For example, a Hilbert transform filter can have 11, 15, 19 and 23 taps, but not 9, 13, 17 or 21 taps.

[Impulse response plot. Tap values: 0, -375, 0, 1283, 2047, 1283, 0, -375, 0.]

Figure 80: 9-Tap Half-band Filter Impulse Response

[Impulse response plot. Tap values: -819, 0, -1365, 0, -4096, 0, 4096, 0, 1365, 0, 819.]

Figure 81: Hilbert Transform - Impulse Response

radix=10;
coefdata=-819,0,-1365,0,-4096,0,4096,0,1365,0,819;

Figure 82: Coefficient File for the Hilbert Transformer with the Impulse Response Shown in Figure 81

Interpolated Filter

A previous section explained that an IFIR filter is similar to a conventional FIR, but with the unit delay operator replaced by k-1 units of delay. k is referred to as the zero-packing factor. One way to realize this substitution is by the insertion of k-1 zeros between the coefficient values of a prototype filter. When specifying an IFIR architecture, the full set of prototype coefficients is supplied in the coefficient file, without the zeros implied by the zero-packing factor. The zero-packing factor is defined through the filter user interface. For example, consider the filter coefficient data in the COE file shown in Figure 83.

If a zero-packing factor of k=2 is specified, the equivalent filter impulse response is shown in Figure 84.

If the zero-packing factor is changed to k=3, the impulse response is as shown in Figure 85.

These examples use a symmetrical prototype impulse response; this is not a restriction of the filter core. The prototype filter coefficient set can be symmetrical, non-symmetrical, or negative-symmetric.

radix=10;
coefdata=-200,1200,2047,1200,-200;

Figure 83: Prototype Coefficient Data for IFIR Example

Figure 84: Equivalent IFIR Impulse Response for the Coefficient Data Shown in Figure 83 with a Zero-Packing Factor k=2

Figure 85: Equivalent IFIR Impulse Response for the Coefficient Data Shown in Figure 83 with a Zero-Packing Factor k=3
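The zero-packing operation described for the IFIR filter can be reproduced offline to preview the equivalent impulse response. This is a minimal sketch under the assumption that k-1 zeros are inserted between adjacent prototype coefficients and none are added at the ends; the function name `zero_pack` is hypothetical and is not part of the core.

```python
def zero_pack(prototype, k):
    """Insert k-1 zeros between adjacent prototype coefficients."""
    if k < 1:
        raise ValueError("zero-packing factor k must be at least 1")
    packed = []
    for i, coeff in enumerate(prototype):
        packed.append(coeff)
        if i < len(prototype) - 1:
            packed.extend([0] * (k - 1))  # no zeros after the last coefficient
    return packed


# Prototype coefficients from the COE file of Figure 83.
prototype = [-200, 1200, 2047, 1200, -200]
print(zero_pack(prototype, 2))  # [-200, 0, 1200, 0, 2047, 0, 1200, 0, -200]
print(zero_pack(prototype, 3))  # [-200, 0, 0, 1200, 0, 0, 2047, 0, 0, 1200, 0, 0, -200]
```

The equivalent response has (N - 1) * k + 1 entries for an N-coefficient prototype, while only the N prototype values are placed in the COE file.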


Multiple Coefficient Sets

For multiple coefficient filters, a single COE file is used to specify the coefficient sets. Each coefficient set should be appended to the previous set of coefficients.

For example, if a 10-tap symmetric filter with two coefficient sets was being designed and coefficient set #0 was:

coefdata = -1, -2, -3, 4, 5, 5, 4, -3, -2, -1;

and coefficient set #1 was:

coefdata = -9, -10, -11, 12, 13, 13, 12, -11, -10, -9;

then the COE file for the entire filter would be:

radix = 10;

coefdata = -1, -2, -3, 4, 5, 5, 4, -3, -2, -1, -9, -10, -11, 12, 13, 13, 12, -11, -10, -9;

All coefficient sets in a multiple set implementation must exhibit the same symmetry for symmetry to be used to reduce resource usage. For example, if even one set of a multi-set has a non-symmetric coefficient structure, then all sets are implemented using that structure. All coefficient sets must also be of the same vector length. If one coefficient set has fewer coefficients, then either a non-symmetric structure with appended zero padding must be used, or a symmetric structure with equal prepended and appended zero padding can be used, although in this case the padding will change the phase response of the filter.
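The appending rule for multiple coefficient sets can be scripted when the sets come from a design tool. The sketch below is an illustration only: the helper names are hypothetical, the output mirrors the radix-10 layout of the example above, and the symmetry test simply compares each set with its reverse.

```python
def build_multiset_coe(coefficient_sets):
    """Concatenate equal-length integer coefficient sets into radix-10 COE text."""
    if not coefficient_sets:
        raise ValueError("at least one coefficient set is required")
    length = len(coefficient_sets[0])
    for index, coeffs in enumerate(coefficient_sets):
        if len(coeffs) != length:
            raise ValueError(
                f"set {index} has {len(coeffs)} coefficients, expected {length}"
            )
    flat = [c for coeffs in coefficient_sets for c in coeffs]
    return "radix = 10;\ncoefdata = " + ", ".join(str(c) for c in flat) + ";\n"


def all_sets_symmetric(coefficient_sets):
    """True only if every set is symmetric about its centre."""
    return all(list(s) == list(reversed(s)) for s in coefficient_sets)


set0 = [-1, -2, -3, 4, 5, 5, 4, -3, -2, -1]
set1 = [-9, -10, -11, 12, 13, 13, 12, -11, -10, -9]
print(build_multiset_coe([set0, set1]))
print(all_sets_symmetric([set0, set1]))
```

Rejecting unequal lengths, rather than padding silently, leaves the choice between appended padding and symmetric padding to the user, since the two options affect the phase response differently.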

Coefficient Specification Using Real Numbers

As indicated previously, the user can specify the coefficient values as real numbers (specified to a minimum of one decimal place), with the radix set to 10. The coefficients are then quantized by the core to produce the binary coefficient values used in the filter, based on the user's specified coefficient bitwidth. This allows the user to supply floating-point values derived from a chosen filter design tool and explore the trade-off between performance and resource usage by altering the coefficient bitwidth and observing the change in the quantized frequency response in comparison to the ideal response.

Real number coefficients are detected by the presence of a decimal point (to access the Quantization features with a COE file specified purely in integers, add ".0" to all the coefficients). The basic quantization function is selected by setting the Quantization field to "Quantize_Only."

The user can also choose to scale the coefficients to utilize the full dynamic range provided by the coefficient bitwidth, by selecting the "Maximize Dynamic Range" option. If selected, this results in the filter coefficients being scaled up by a common factor such that the largest coefficient (usually the center tap) is equal to the maximum representable value using the chosen bitwidth, then quantized. For example, if a coefficient bitwidth of 10 is used, the maximum positive integer value is 511 and the minimum negative integer value is -512 (assuming signed coefficients). If the coefficients in the COE file range between -12.34 and +13.88, then the required integer bitwidth is 5 bits (including the sign), therefore 5 bits are available for representing fractional values; the maximum value of 511 then represents 511/(2^5) = 15.96875, and the minimum value of -512 represents -16. All coefficients will be scaled by the factor 15.96875/13.88 = 1.1504863 (= +1.2176 dB) prior to quantization. The overall scale factor is calculated as the ratio of the sum of the scaled and quantized coefficients to the sum of the original (ideal) coefficients, and presented (in dB) as part of the legend text on the filter response graph. The filter response plot for the quantized coefficients is scaled down by the scale factor for easy comparison against the ideal coefficients. The scale factor is also presented in the GUI Summary page.
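The arithmetic in the "Maximize Dynamic Range" example can be reproduced with a short script. This is a sketch of the calculation as described in the text, not the core's actual implementation: the function name is hypothetical, signed coefficients are assumed, the integer bit count is taken as floor(log2(peak)) + 2, and rounding to nearest is an assumption. It does not model the overall scale factor reported by the GUI.

```python
import math


def maximize_dynamic_range(coeffs, bitwidth):
    """Scale real coefficients so the peak maps to the largest signed code."""
    peak = max(abs(c) for c in coeffs)
    # Sign bit plus enough magnitude bits to hold the peak (assumed rule).
    int_bits = max(1, math.floor(math.log2(peak)) + 2)
    frac_bits = bitwidth - int_bits
    max_code = 2 ** (bitwidth - 1) - 1          # e.g. 511 for 10 bits
    max_value = max_code / 2 ** frac_bits       # e.g. 511 / 2^5 = 15.96875
    scale = max_value / peak
    quantized = [round(c * scale * 2 ** frac_bits) for c in coeffs]
    return {
        "int_bits": int_bits,
        "frac_bits": frac_bits,
        "max_value": max_value,
        "scale": scale,
        "gain_db": 20 * math.log10(scale),
        "quantized": quantized,
    }


result = maximize_dynamic_range([-12.34, 1.0, 13.88], bitwidth=10)
print(result["int_bits"], result["frac_bits"])      # 5 5
print(round(result["scale"], 7))                    # 1.1504863
print(round(result["gain_db"], 4))                  # 1.2176
```

With the values from the text (range -12.34 to +13.88, 10-bit coefficients) the sketch yields 5 integer bits, 5 fractional bits, and a scale factor of about 1.1504863, matching the worked example.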

Important Note: While an appreciable improvement in performance can be achieved by making use of the full dynamic range of the coefficient bitwidth, this is not always the case, and the user must satisfy themselves that any changes are acceptable via the frequency response plot. The user must also compensate appropriately for any additional gain introduced by coefficient scaling elsewhere in their application system. It is often desirable to amalgamate gains inherent in a signal processing chain and compensate or adjust for these gains either at the front end (e.g., in an Automatic Gain Control circuit) or the back end (e.g., in a Constellation Decoder unit) of the chain. If the user has no facility to compensate for the additional gain, Quantize Only should be chosen.

The integer values used in the filter implementation can be determined by examining the main core MIF file (<corename>.mif), which is generated in the CORE Generator project directory. The MIF file is always in binary format.

Resource Utilization Tables

This section provides indicative resource utilization figures for example filters in various families, using both MAC- and DA-based architectures. To be concise, codes are used in these tables to indicate particular configuration options; these are detailed below.

Control Structure Options

ND indicates flow control based on the use of the New Data input pin to validate input samples. This control mode allows halting of data sample processing by holding ND Low. The core continues to process and generate outputs based on samples received thus far, but halts after all completed output samples have been generated. This mode of control may or may not use a Clock Enable input.

CE indicates control based on Clock Enable only, with no ND input pin to validate input data samples. A new sample is taken from the input port automatically as soon as the RFD pin goes High. If CE is deasserted, the core halts all processing immediately and freezes the state of the filter; no new output data samples are generated.

Rounding Style Options

The following rounding option codes are used in the resource utilization tables. Note that these options are only applicable to MAC-based filter implementations in Virtex-4, Virtex-5, and Spartan-3A DSP device families. Only a limited number of rounding examples are provided. See Table 4 in the section on Rounding Modes for a breakdown of the filter types and families that require an additional DSP slice for rounding.

Table 8: Rounding Style Options in Resource Utilization Tables

Table Entry Rounding Style

None Full Precision; no reduction in output sample width

a Truncation to input data sample width + 2

b Non-Symmetric Rounding Down, reducing to input data sample width + 2

c Non-Symmetric Rounding Up, reducing to input data sample width + 2

d Symmetric Rounding to Zero, reducing to input data sample width + 2

e Approximated Symmetric Rounding to Zero, reducing to input data sample width + 2

f Symmetric Rounding to Infinity, reducing to input data sample width + 2

g Approximated Symmetric Rounding to Infinity, reducing to input data sample width + 2

h Convergent Rounding to Even, reducing to input data sample width + 2

j Convergent Rounding to Odd, reducing to input data sample width + 2
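The exact (non-approximated) rounding styles in Table 8 can be modelled in integer arithmetic to compare their behavior on tie values. The sketch below is a software illustration only, not the DSP slice implementation: the function names are hypothetical, `n` is the number of LSBs removed, and the mapping of style names to behaviors assumes the conventional meanings (rounding "down" and "up" send ties toward negative and positive infinity, symmetric styles send ties toward or away from zero, convergent styles send ties to the nearest even or odd value).

```python
def truncate(x, n):                      # style a
    return x >> n


def round_half_down(x, n):               # style b: ties toward -infinity
    return (x + (1 << (n - 1)) - 1) >> n


def round_half_up(x, n):                 # style c: ties toward +infinity
    return (x + (1 << (n - 1))) >> n


def symmetric_to_zero(x, n):             # style d: ties toward zero
    return round_half_down(x, n) if x >= 0 else round_half_up(x, n)


def symmetric_to_infinity(x, n):         # style f: ties away from zero
    return round_half_up(x, n) if x >= 0 else round_half_down(x, n)


def convergent(x, n, to_even=True):      # styles h (even) and j (odd)
    half = 1 << (n - 1)
    y = (x + half) >> n
    is_tie = (x & ((1 << n) - 1)) == half
    # On a tie, step down by one if y has the wrong parity.
    if is_tie and ((y & 1) == 1) == to_even:
        y -= 1
    return y


# Removing one LSB from 5 and -5 corresponds to rounding 2.5 and -2.5.
for name, fn in [("truncate", truncate), ("half_down", round_half_down),
                 ("half_up", round_half_up), ("to_zero", symmetric_to_zero),
                 ("to_infinity", symmetric_to_infinity)]:
    print(name, fn(5, 1), fn(-5, 1))
print("even", convergent(5, 1), convergent(-5, 1))
print("odd", convergent(5, 1, to_even=False), convergent(-5, 1, to_even=False))
```

All of the rounding styles agree on non-tie inputs; they differ only when the discarded bits are exactly one half of the retained LSB, which is where any DC bias of a rounding scheme originates.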




Resource Utilization for MAC-Based FIR Filters (Virtex-4)

Table 9 provides characterization data for several filter implementations in a Virtex-4 FPGA. Generally, the overall filter performance approaches or matches the DSP slice clock rating for the given device speed grade (such as 400 MHz in -10), with the exception of some fully parallel symmetric filters, which can suffer from routing congestion. Note that the Speed Optimization Goal setting is sometimes required to achieve full clock rate. The ND style control structure can be disadvantageous for fully parallel filter implementations in terms of both resource utilization and clock rate performance, and this is particularly so with multi-channel decimating filters. The CE structure, if suitable, can be used in preference, with much improved results for both of these factors; an example of the differences is provided in the table below.

Table 9: MAC-Based FIR Resource Utilization in Virtex-4 FPGAs

Filter Type | Rate | # Coefficients | Symmetric | Half-band | Reloadable | Rounding Style | Channels | Clocks/Sample/Channel | Input Width | Coefficient Width | Control Structure | Area/Speed | DSP48 | Block RAM | Slices | Clock Fmax (MHz)

Single Rate 1 366 1 366 18 18 ND A 1 1 79 400

Single Rate 1 4 4 1 18 18 ND A 4 0 77 400

Single Rate 1 20 1 5 18 18 ND A 5 0 119 400

Single Rate 1 20 3 5 18 18 ND A 5 0 155 400

Single Rate 1 27 1 1 18 18 ND A 27 0 62 400

Single Rate 1 21 2 1 17 18 ND A 11 0 463 379

Single Rate 1 21 2 1 17 18 ND S 11 0 616 400

Decimation 6 34 1 3 16 16 ND A 1 0 131 399

Decimation 2 69 1 18 16 16 ND A 1 2 116 310

Decimation 2 69 1 18 16 16 ND S 1 2 143 400

Single Rate 1 19 6 1 16 16 ND A 10 0 683 392

Single Rate 1 19 6 1 16 16 ND S 10 0 815 400

Single Rate 1 32 1 32 16 16 ND A 1 0 85 400

Single Rate 1 32 1 4 16 16 ND A 9 0 151 400

Single Rate 1 32 1 1 16 16 ND A 32 0 64 400

Single Rate 1 32 1 32 16 16 ND A 1 0 97 399

Single Rate 1 32 1 4 16 16 ND A 5 0 203 392

Single Rate 1 32 1 1 16 16 ND A 16 0 324 337

Single Rate 1 32 1 1 16 16 ND S 16 0 474 392

Single Rate 1 32 3 4 16 16 ND A 9 0 167 400

Single Rate 1 32 3 1 16 16 ND A 32 0 318 400



Single Rate 1 32 3 4 16 16 ND A 5 0 228 399

Single Rate 1 32 3 1 16 16 ND A 16 0 939 385

Single Rate 1 32 3 1 16 16 ND S 16 0 1149 399

Single Rate 1 31 3 4 16 16 ND A 3 0 198 399

Interpolation 5 32 1 20 16 16 ND A 3 0 96 400



Interpolation 5 61 3 5 16 16 ND S 8 2 440 400




Interpolation 5/3 64 3 10 16 16 ND A 4 0 187 399

Decimation 5 32 1 4 16 16 ND A 3 0 116 400

Decimation 5 32 3 4 16 16 ND A 3 2 178 392

Decimation 5 32 3 4 16 16 ND S 3 2 198 400

Decimation 5 64 3 1 16 16 ND A 8 0 738 324

Decimation 5 64 3 1 16 16 CE A 8 0 390 390

Decimation 5 64 3 1 16 16 ND S 8 0 850 338

Decimation 5 64 3 1 16 16 CE S 8 0 506 400

Decimation 5 64 3 4 16 16 ND A 3 2 253 324

Decimation 5 64 3 4 16 16 ND S 3 2 320 400

Decimation 5 64 3 13 16 16 ND A 1 2 220 330

Decimation 5 64 3 13 16 16 ND S 1 2 264 400

Decimation 2 31 1 3 16 16 ND A 5 0 277 399

Decimation 3/5 64 3 10 16 16 ND A 3 4 162 400

Interpolation 16 288 16 16 18 18 CE A 18 19 669 399




Single Rate 1 32 f 1 33 16 16 ND A 1 0 77 400

Single Rate 1 32 f 1 32 16 16 ND A 2 0 78 400

Single Rate 1 32 g 1 32 16 16 ND A 1 0 74 400

Single Rate 1 32 b 1 4 16 16 ND A 9 0 144 400

Notes:

1. Clock rates determined using a -10 speed grade.
2. Clocks per sample per channel uses the input sample rate as the basis for all filter types.
3. Clock frequency does not take clock jitter into account and should be derated by an amount appropriate to the clock source jitter specification.

Resource Utilization for MAC-Based FIR Filters (Virtex-5)

Table 10 provides characterization data for the same filter implementations in a Virtex-5 FPGA. Generally the overall filter performance is within 10% of the DSP slice clock rating for the given device speed grade (e.g., 450 MHz in -1), and often reaches this clock rate (although the Speed setting may be required to achieve this in some cases). Some fully parallel cases can be slower due to routing congestion. Note that Block RAM counts quoted are for 18k blocks, which will often be amalgamated into pairs for mapping to 36k locations where possible; customers should bear this in mind when comparing these values with map results for their particular configuration.

Table 10: MAC-Based FIR Resource Utilization in Virtex-5 FPGAs

Filter Type | Rate | # Coefficients | Symmetric | Half-band | Reloadable | Rounding Style | Channels | Clocks/Sample/Channel | Input Width | Coefficient Width | Control Structure | Area/Speed | DSP48 | Block RAM | LUT-FF pairs | Clock Fmax (MHz)

Single Rate 1 366 1 366 18 18 ND A 1 1 135 411

Single Rate 1 4 4 1 18 18 ND A 4 0 145 450

Single Rate 1 20 1 5 18 18 ND A 5 0 211 424

Single Rate 1 20 3 5 18 18 ND A 5 0 225 450

Single Rate 1 27 1 1 18 18 ND A 27 0 114 450



Single Rate 1 21 2 1 17 18 ND A 11 0 682 450

Decimation 6 34 1 3 16 16 ND A 1 0 194 450

Decimation 2 69 1 18 16 16 ND A 1 0 234 412

Single Rate 1 19 6 1 16 16 ND A 10 0 765 406

Single Rate 1 32 1 32 16 16 ND A 1 0 125 406

Single Rate 1 32 1 4 16 16 ND A 9 0 277 450

Single Rate 1 32 1 1 16 16 ND A 32 0 115 450

Single Rate 1 32 1 32 16 16 ND A 1 0 174 406

Single Rate 1 32 1 4 16 16 ND A 5 0 390 450

Single Rate 1 32 1 1 16 16 ND A 16 0 638 420

Single Rate 1 32 1 1 16 16 ND S 16 0 910 450

Single Rate 1 32 3 4 16 16 ND A 9 0 306 450

Single Rate 1 32 3 1 16 16 ND A 32 0 681 450

Single Rate 1 32 3 4 16 16 ND A 5 0 425 450

Single Rate 1 32 3 1 16 16 ND A 16 0 1131 447

Single Rate 1 31 3 4 16 16 ND A 3 0 299 450







Decimation 5 32 1 4 16 16 ND A 3 0 163 450

Decimation 5 32 3 4 16 16 ND A 3 0 355 430

Decimation 5 64 3 1 16 16 ND A 8 0 896 427

Decimation 5 64 3 4 16 16 ND A 3 0 517 406

Decimation 5 64 3 13 16 16 ND A 1 1 344 379

Decimation 5 64 3 13 16 16 ND S 1 1 429 406

Decimation 2 31 1 3 16 16 ND A 5 0 479 414


Decimation 3/5 64 3 10 16 16 ND A 3 0 396 420

Single Rate 1 32 f 1 33 16 16 ND A 1 0 109 412

Single Rate 1 32 f 1 32 16 16 ND A 2 0 107 412

Single Rate 1 32 e 1 32 16 16 ND A 1 0 104 411

Single Rate 1 32 h 1 4 16 16 ND A 9 0 258 427

Notes:

1. Clock rates determined using a -1 speed grade.
2. Clocks per sample per channel uses the input sample rate as the basis for all filter types.
3. Clock frequency does not take clock jitter into account and should be derated by an amount appropriate to the clock source jitter specification.

Resource Utilization for MAC-Based FIR Filters (Spartan-3A DSP)

Table 11 provides characterization data for the same filter implementations in a Spartan-3A DSP FPGA. Generally the overall filter performance is within 10% of the DSP slice clock rating for the given device speed grade (e.g., 250 MHz in -4), and often reaches this clock rate (although the Speed setting may be required to achieve this in some cases). Some fully parallel cases can be slower due to routing congestion.


Table 11: MAC-Based FIR Resource Utilization in Spartan-3A DSP FPGAs

Filter Type | Rate | # Coefficients | Symmetric | Half-band | Reloadable | Rounding Style | Channels | Clocks/Sample/Channel | Input Width | Coefficient Width | Control Structure | Area/Speed | DSP48 | Block RAM | Slices | Clock Fmax (MHz)

Single Rate 1 366 1 366 18 18 ND A 1 1 81 245

Single Rate 1 4 4 1 18 18 ND A 4 0 78 250

Single Rate 1 20 1 5 18 18 ND A 5 0 120 250



Single Rate 1 20 3 5 18 18 ND A 5 0 156 250

Single Rate 1 27 1 1 18 18 ND A 27 0 194 250

Single Rate 1 21 2 1 17 18 ND A 11 0 356 249

Decimation 6 34 1 3 16 16 ND A 1 0 121 250

Decimation 2 69 1 18 16 16 ND A 1 2 111 249

Single Rate 1 19 6 1 16 16 ND A 10 0 490 250

Single Rate 1 32 1 32 16 16 ND A 1 0 90 250

Single Rate 1 32 1 4 16 16 ND A 9 0 159 250

Single Rate 1 32 1 1 16 16 ND A 32 0 172 250

Single Rate 1 32 1 32 16 16 ND A 1 0 91 250

Single Rate 1 32 1 4 16 16 ND A 5 0 168 250

Single Rate 1 32 1 1 16 16 ND A 16 0 181 250

Single Rate 1 32 3 4 16 16 ND A 9 0 168 249

Single Rate 1 32 3 1 16 16 ND A 32 0 459 250

Single Rate 1 32 3 4 16 16 ND A 5 0 192 250

Single Rate 1 32 3 1 16 16 ND A 16 0 653 250

Single Rate 1 31 3 4 16 16 ND A 3 0 179 249








Interpolation 5/3 64 3 10 16 16 ND S 4 0 202 243

Decimation 5 32 1 4 16 16 ND A 3 0 113 250

Decimation 5 32 3 4 16 16 ND A 3 2 180 250

Decimation 5 64 3 1 16 16 ND A 8 0 580 209

Decimation 5 64 3 4 16 16 ND A 3 2 236 250

Decimation 5 64 3 13 16 16 ND A 1 2 212 250

Decimation 2 31 1 3 16 16 ND A 5 0 241 250

Decimation 3/5 64 3 10 16 16 ND A 3 4 166 249

Decimation 6 31 2 1 16 16 CE A 4 0 212 250

Single Rate 1 32 f 1 33 16 16 ND A 2 0 76 250

Single Rate 1 32 f 1 32 16 16 ND A 2 0 76 249

Single Rate 1 32 g 1 32 16 16 ND A 2 0 76 249

Single Rate 1 32 b 1 4 16 16 ND A 9 0 150 250

Notes:

1. Clock rates determined using a -4 speed grade.
2. Clocks per sample per channel uses the input sample rate as the basis for all filter types.
3. Clock frequency does not take clock jitter into account and should be derated by an amount appropriate to the clock source jitter specification.

Resource Utilization for MAC-Based FIR Filters (Spartan-3A)

Table 12 provides characterization data for the same filter implementations in a Spartan-3A FPGA. Most cases present data for both the area and speed optimization goal settings. For families with Embedded Multipliers, the speed setting results in the cascaded adder structure being pipelined, giving a higher maximum clock frequency at the cost of an increased slice count. As with families that have DSP slices, some fully parallel cases can be slower due to routing congestion.



Table 12: MAC-Based FIR Resource Utilization in Spartan-3A FPGAs

Filter Type | Rate | # Coefficients | Symmetric | Half-band | Reloadable | Rounding Style | Channels | Clocks/Sample/Channel | Input Width | Coefficient Width | Control Structure | MULT18 | Block RAM | Slices (Area/Speed) | Clock Fmax (MHz) (Area/Speed)

Single Rate 1 366 1 366 18 18 ND 1 1 106/213 145/221

Single Rate 1 4 4 1 18 18 ND 4 0 173/336 133/227

Single Rate 1 20 1 5 18 18 ND 4 0 241/487 134/215

Single Rate 1 20 3 5 18 18 ND 4 0 276/522 129/217

Single Rate 1 27 1 1 18 18 ND 27 0 943/1661 118/214

Single Rate 1 21 2 1 17 18 ND 11 0 718/1202 127/216

Decimation 6 34 1 3 16 16 ND 1 0 158/273 143/229

Decimation 2 69 1 18 16 16 ND 1 2 143/272 142/229

Single Rate 1 19 6 1 16 16 ND 10 0 818/1246 127/216

Single Rate 1 32 1 32 16 16 ND 1 0 110/209 140/222

Single Rate 1 32 1 4 16 16 ND 8 0 369/700 114/209

Single Rate 1 32 1 1 16 16 ND 32 0 1078/1881 127/222

Single Rate 1 32 1 32 16 16 ND 1 0 126/238 127/222

Single Rate 1 32 1 4 16 16 ND 4 0 327/620 127/216

Single Rate 1 32 1 1 16 16 ND 16 0 708/1291 121/216

Single Rate 1 32 3 4 16 16 ND 8 0 384/716 127/216

Single Rate 1 32 3 1 16 16 ND 32 0 1250/2125 121/212

Single Rate 1 32 3 4 16 16 ND 4 0 349/643 127/216

Single Rate 1 32 3 1 16 16 ND 16 0 1180/1823 127/209

Single Rate 1 31 3 4 16 16 ND 3 0 303/546 121/174

Interpolation 5 32 1 20 16 16 ND 2 0 172/367 134/209


Interpolation 5 61 3 5 16 16 ND 7 2 632 127

Interpolation 5 61 3 20 16 16 ND 2 2 430 121


Interpolation 5/3 64 3 10 16 16 ND 3 0 286/513 127/216

Decimation 5 32 1 4 16 16 ND 2 0 188/385 134/222

Decimation 5 32 3 4 16 16 ND 2 2 300/521 126/173

Decimation 5 64 3 1 16 16 ND 7 0 896/1319 127/174




Decimation 5 64 3 4 16 16 ND 2 2 376/646 127/167

Decimation 5 64 3 13 16 16 ND 1 2 296/445 121/167

Decimation 2 31 1 3 16 16 ND 5 0 430/742 125/172

Decimation 3/5 64 3 10 16 16 ND 2 4 287/489 125/172

Interpolation 16 288 8 24 16 16 CE 18 18 968 121


Decimation 6 31 2 1 16 16 CE 3 0 384 108


Single Rate 1 32 f 1 33 16 16 ND 1 0 134 159

Single Rate 1 32 f 1 32 16 16 ND 1 0 133 162

Single Rate 1 32 g 1 32 16 16 ND 1 0 133 162

Single Rate 1 32 b 1 4 16 16 ND 8 0 321 145

Notes:

1. Clock rates determined using a -4 speed grade.
2. Clocks per sample per channel uses the input sample rate as the basis for all filter types.
3. Clock frequency does not take clock jitter into account and should be derated by an amount appropriate to the clock source jitter specification.



Resource Utilization for DA-Based FIR Filters

The logic utilization for a filter is a function of the filter length, coefficient precision, coefficient symmetry, and input data precision. Table 13 through Table 17 provide logic resource requirements for a number of serial (SDA) filter configurations, while Table 18 shows resources required by parallel (PDA) filters with several different levels of parallelism. Table 13 shows the Virtex logic slice utilization for several FIR filter configurations: 10-Bit Filter Coefficients, Filter Coefficient Optimization Off, Single-Channel, Signed Input, Signed Coefficients, and Unregistered Output.

Table 14 shows the Virtex logic slice utilization for the following FIR filter configurations: 12-Bit Filter Coefficients, Filter Coefficient Optimization Off, Single-Channel, Signed Input, Signed Coefficients, and Unregistered Output.

Table 13: Virtex Logic Slice Utilization - FIR Filter Configurations

Filter Length   Symmetry        Input Sample Precision
                                4-bit   8-bit   12-bit  16-bit  32-bit

4               Symmetric       31      34      41      43      66
4               Non-symmetric   29      33      36      43      67
8               Symmetric       36      38      44      49      72
32              Symmetric       103     108     113     117     157
32              Non-symmetric   141     146     151     154     196
80              Symmetric       247     251     255     261     332
80              Non-symmetric   363     369     373     376     454
128             Symmetric       370     377     380     358     493
128             Non-symmetric   532     536     537     543     646
256             Symmetric       731     747     740     749     940

Table 14: Virtex Logic Slice Utilization - Additional FIR Filter Configurations

4               Symmetric       34      35      41      47      69
8               Symmetric       36      41      45      52      75
32              Symmetric       111     114     118     125     166
32              Non-symmetric   160     161     168     173     214
80              Symmetric       268     273     277     279     353
80              Non-symmetric   408     414     413     424     498
128             Symmetric       402     415     417     421     521
128             Non-symmetric   595     601     599     607     718
256             Symmetric       797     806     819     810     1003

Table 15 shows the Virtex logic slice utilization for several half-band filter configurations, including 14-Bit Filter Coefficients, Filter Coefficient Optimization Off, Single-Channel, Signed Input, Signed Coefficients, and Unregistered Output.

Table 15: Virtex Logic Slice Utilization for Half-Band Filter Configurations

7               Symmetric       38      42      47      53      77
31              Symmetric       84      96      100     104     147
79              Symmetric       171     194     203     206     274

Table 16 shows the Virtex logic slice utilization for several Hilbert transformer configurations, including 14-Bit Filter Coefficients, Filter Coefficient Optimization Off, Single-Channel, Signed Input, Signed Coefficients, and Unregistered Output.

Table 16: Virtex Logic Slice Utilization for Hilbert Transformer Configurations

Filter Length   Symmetry        Input Sample Precision

7               Odd             41      49      57      66      99
31              Odd             75      88      96      104     157
79              Odd             158     187     198     204     289

Table 17 shows the Virtex logic slice utilization for several interpolated filter configurations, including 16-Bit Filter Coefficients, Filter Coefficient Optimization Off, Single-Channel, Signed Input, Signed Coefficients, and Unregistered Output. The zero-packing factor is 4.

Table 17: Virtex Logic Slice Utilization for Several Interpolated Filter Configurations

Filter Length   Symmetry        Input Sample Precision

8               Symmetric       44      54      63      69      107
32              Symmetric       146     170     198     201     303
32              Non-symmetric   189     214     239     264     366
80              Symmetric       359     410     474     477     705
80              Non-symmetric   488     550     609     668     897

Table 18 shows the Virtex logic slice utilization for several PDA FIR filter configurations, including 12-Bit Filter Coefficients and Input Data, 60 Taps, Filter Coefficient Optimization Off, Single-Channel, Signed Input, Signed Coefficients, Unregistered Output, and Non-symmetrical Impulse Response. The filter master clock frequency is 150 MHz.

Table 18: Virtex Logic Slice Utilization for Several PDA FIR Filter Configurations

Number of Clock Cycles per Output Sample   Slice Count   Filter Sample Rate (MHz) (Note 1)

1                                          3072          150
2                                          1571          75
3                                          994           50
4                                          802           37.5
6                                          551           25
12                                         268           12.5

Note:

1. The filter sample rate is not at all dependent on the number of filter taps.

References

1. A. Peled and B. Liu, A New Hardware Realization of Digital Filters, IEEE Trans. on Acoust., Speech, Signal Processing, vol. ASSP-22, pp. 456-462, Dec. 1974.
2. S. A. White, Applications of Distributed Arithmetic to Digital Signal Processing, IEEE ASSP Magazine, Vol. 6(3), pp. 4-19, July 1989.
3. Xilinx Inc., Xilinx Product Guide, Xilinx Inc., San Jose, California, 1999.
4. P.P. Vaidyanathan, Multi-Rate Systems and Filter Banks, Prentice Hall, Englewood Cliffs, New Jersey, 1993.
5. M. E. Frerking, Digital Signal Processing in Communication Systems, Van Nostrand Reinhold, New York, 1994.
6. C. H. Dick, Implementing Area Optimized Narrow-Band FIR Filters Using Xilinx FPGAs, SPIE International Symposium on Voice, Video and Data Communications: Configurable Computing: Technology and Applications Stream, Boston, Massachusetts USA, pp. 227-238, Nov 1-6, 1998. Also available at: www.xilinx.com/products/logicore/coredocs.htm
7. Xilinx Inc., XtremeDSP Design Manual, Xilinx Inc., San Jose, California, 2004.

Support

Xilinx provides technical support for this LogiCORE IP when used as described in the product documentation. Xilinx cannot guarantee timing, functionality, or support of product if implemented in devices that are not defined in the documentation, if customized beyond that allowed in the product documentation, or if changes are made to any section of the design labeled DO NOT MODIFY.




Ordering Information

This core can be downloaded from the Xilinx IP Center for use with the Xilinx CORE Generator software v9.2i and higher. The CORE Generator software is bundled with the Xilinx Foundation™ series software packages, at no additional charge.

To order software, visit the Xilinx Online Store or contact your local Xilinx sales representative.

Information on additional Xilinx LogiCORE IP modules is available on the Xilinx IP Center.

Revision History

The following table shows the revision history for this document.

Date       Version   Revision

01/18/06   1.0       Initial release.
09/28/06   2.0       Updated for v2.0 core, including Virtex-5 family support and additional features.
02/15/07   3.0       Updated for v3.0 core.
04/02/07   3.1       Added support for Spartan-3A DSP devices.
08/08/07   3.2       Added Spartan-3A DSP resource tables, Bit Growth, and Rounding Mode sections.
10/10/07   3.3       Added full feature support for Virtex and Spartan families with Embedded Multipliers.

Notice of Disclaimer

Xilinx is providing this design, code, or information (collectively, the "Information") to you "AS-IS" with no warranty of any kind, express or implied. Xilinx makes no representation that the Information, or any particular implementation thereof, is free from any claims of infringement. You are responsible for obtaining any rights you may require for any implementation based on the Information. XILINX EXPRESSLY DISCLAIMS ANY WARRANTY WHATSOEVER WITH RESPECT TO THE ADEQUACY OF THE INFORMATION OR ANY IMPLEMENTATION BASED THEREON, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OR REPRESENTATIONS THAT THIS IMPLEMENTATION IS FREE FROM CLAIMS OF INFRINGEMENT AND ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Except as stated herein, none of the Information may be copied, reproduced, distributed, republished, downloaded, displayed, posted, or transmitted in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without the prior written consent of Xilinx.

