
FIR Filter

User Guide

06/2014

Capital Microelectronics, Inc.

China

User Guide of FIR Filter

http://www.capital-micro.com

Contents

1 Introduction
2 FIR Filter IP Overview
  2.1 FIR Filter Theory Introduction
  2.2 FIR Filter IP Diagram
  2.3 Pin Description
  2.4 Parameter Description
  2.5 Computation Detail Introduction
    2.5.1 Single-Rate
    2.5.2 Decimator
    2.5.3 Interpolator
3 FIR Filter IP Usage
  3.1 Coefficient File Format
  3.2 Coefficient File Generation
  3.3 Maximum Input Data Rate
  3.4 FIR IP Wizard Usage Guideline
  3.5 Output Data Width
  3.6 Interface Timing
  3.7 Resource Usage and Performance Analysis
4 Generate File Directory Structure
  4.1 File Directory Structure for M5 Family Device
  4.2 File Directory Structure for M7 Family Device
5 Revision History



1 Introduction

This document describes how to use the FIR (finite impulse response) Filter IP. The FIR Filter IP supports the following features:

  Multiply-accumulate (MAC) architecture
  Filter order 4-2048
  Single or multiple (up to 16) MAC engines used to achieve the specified filter performance
  Only signed input data and coefficients (complement code) are supported; output data is also in complement code
  Input data precision 9-16 bit
  Filter coefficient precision 9-16 bit
  Support for different input/output data throughput:
    Single-rate filter
    Poly-phase interpolator (interpolation rate value 2-16)
    Poly-phase decimator (decimation rate value 2-16)
  The IP runs at clock rates up to 100 MHz
  Supported devices: CME M5 and M7

Basic requirements:

To optimize IP performance:

1. All IP inputs should be registered.
2. All IP outputs should be registered.
3. Add pipeline stages to get better performance.



2 FIR Filter IP Overview

2.1 FIR Filter Theory Introduction

The conventional single-rate FIR version of the core computes the convolution sum defined in the

following equation, where N is the number of filter coefficients.
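The equation itself is missing from this copy of the guide. The standard convolution sum for an N-tap FIR filter, which the sentence above appears to describe, is given here as a reference; the symbol names x, h and y are assumptions and are not taken from the original.

```latex
y(k) = \sum_{n=0}^{N-1} h(n)\, x(k-n), \qquad k = 0, 1, 2, \ldots
```

Here x(k) is the input sample at time k, h(n) is the n-th filter coefficient and y(k) is the filter output.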

Figure 2-1 illustrates the conventional tapped delay line realization of this inner-product calculation. However, this architecture consumes a large number of LUTs and registers when the filter order is high, so the CME FIR filter IP is based on a time-shared MAC architecture instead. In this architecture, the data to be filtered and the coefficients are stored in RAMs, and one or more time-shared multiply-accumulate (MAC) functional units service the N sum-of-product calculations in the filter. The core automatically determines the minimum number of MAC engines required to meet the user-specified throughput.
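The time-shared MAC idea can be illustrated with a small behavioral model. The following Python sketch is not the IP's implementation; the function name `fir_time_shared` and the way taps are split evenly across engines are assumptions made for illustration. It shows that several engines, each accumulating a slice of the N products, give the same result as the direct convolution sum once their partial sums are added.

```python
def fir_time_shared(samples, coeffs, n_mac):
    """Behavioral model: n_mac engines share the N products of an N-tap FIR.

    Returns (outputs, taps_per_mac). taps_per_mac is the number of
    multiply-accumulate steps each engine performs per output sample.
    """
    n = len(coeffs)
    taps_per_mac = -(-n // n_mac)      # ceiling division
    history = [0] * n                  # history[i] holds x(k - i)
    outputs = []
    for x in samples:
        history = [x] + history[:-1]   # shift the newest sample in
        partial_sums = []
        for m in range(n_mac):
            acc = 0
            first = m * taps_per_mac
            last = min(first + taps_per_mac, n)
            for i in range(first, last):
                acc += coeffs[i] * history[i]   # one MAC step
            partial_sums.append(acc)
        outputs.append(sum(partial_sums))       # adder combines the engines
    return outputs, taps_per_mac


# Impulse response of a 5-tap filter computed with 2 engines.
impulse_response, steps = fir_time_shared([1, 0, 0, 0, 0, 0], [1, 2, 3, 4, 5], 2)
```

In this model each output costs `taps_per_mac` accumulation steps per engine, which is why adding engines raises the sample rate the filter can sustain.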

Figure 2-1 Conventional Tapped Delay Line FIR Filter Representation

2.2 FIR Filter IP Diagram

[Block diagram: Control, Coefficient RAM Array, Data RAM Array, Multiplier Array, Pipeline Adder and Wizard Initialization blocks; ports din, din_en, dout, dout_en]

Figure 2-2 FIR Filter IP Block Diagram

The FIR filter IP core includes 4 sub-modules, which are illustrated in figure 2-2 and described below.

Control Module

  This module controls access to the coefficients and data and sends them to the multiplier array. It then feeds the multiplier results into the pipeline adder and outputs the filter result.

Multiplier Array

  It consists of multiple MACs (hard IP). The number of MACs used by the filter is set by the user.

Pipeline Adder

  Multiple adders compute the sum of all multiplier results. A pipelined architecture is used to improve timing.

Coefficient RAM Array & Data RAM Array

  Multiple RAMs are used to store the coefficients and data. The number of RAMs is equal to the number of MACs. In each RAM, the first part stores user data and the second part stores coefficients. The data RAM has 2 ports: one for writing new data and the other for reading old data for multiplication. The coefficients are initialized when the FIR filter IP is generated and are read-only while the IP is running.

2.3 Pin Description

Table 2-1 FIR Filter IP Interface

Interface        Name            Direction   Width                 Description
System           clk             Input       1                     Clock input
System           rst_n           Input       1                     Reset input, active low
User interface   din (signed)    Input       DIN_WIDTH (Note 1)    Data input
User interface   din_en          Input       1                     Data input enable
User interface   dout (signed)   Output      DOUT_WIDTH            Data output
User interface   dout_en         Output      1                     Data output enable

Note 1: DIN_WIDTH is configured by the user. The greatest value the IP supports is 16.

2.4 Parameter Description

Table 2-2 FIR Filter IP Parameter

Parameter             Valid value   Description
DATA_WIDTH            9-16          Default is 16. Only signed data is accepted.
COEFFICIENT_WIDTH     9-16          Default is 16. Only signed data is accepted.
FILTER_ORDER          4-2048        Order of the filter, equal to the number of coefficients.
MAC_AMOUNT            1-16          The number of MACs the FIR filter uses.
RAM_AMOUNT            1-16          The number of RAM groups the FIR filter uses. One RAM group contains two EMB5K. RAM_AMOUNT is equal to MAC_AMOUNT.
DATA_L_WIDTH          8             The lower 8 bits of data are stored in the lower part of a RAM group. Set to 8 by internal logic.
COEF_L_WIDTH          8             The lower 8 bits of a coefficient are stored in the lower part of a RAM group. Set to 8 by internal logic.
LOG2_RAM_AMOUNT       0-4           Log2(RAM_AMOUNT), used to configure the bit width of internal logic.
EACH_RAM_DEPTH        1-256         The number of data words and coefficients each RAM group holds.
LOG2_EACH_RAM_DEPTH   0-8           Log2(EACH_RAM_DEPTH), used to configure the bit width of internal logic.
FILTER_TYPE           0,1,2         0: Single-rate filter; 1: Decimation filter; 2: Interpolation filter.
DECI_RATE             2-16          Decimation rate.
INTER_INTERVAL        2-256         Interpolation interval.
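The relationships between the derived parameters can be sketched as follows. This Python example is only an illustration: the guide states that RAM_AMOUNT equals MAC_AMOUNT, but the assumptions that EACH_RAM_DEPTH is FILTER_ORDER divided by RAM_AMOUNT (rounded up) and that the LOG2 parameters are rounded up to whole bits are inferred from the valid ranges and from section 2.5.1, not stated by the guide. The function name `derive_parameters` is hypothetical.

```python
def derive_parameters(filter_order, mac_amount):
    """Derive the internal RAM parameters from FILTER_ORDER and MAC_AMOUNT."""
    if not 4 <= filter_order <= 2048:
        raise ValueError("FILTER_ORDER must be in 4-2048")
    if not 1 <= mac_amount <= 16:
        raise ValueError("MAC_AMOUNT must be in 1-16")

    ram_amount = mac_amount                           # stated in Table 2-2
    each_ram_depth = -(-filter_order // ram_amount)   # assumption: ceiling division
    if each_ram_depth > 256:
        raise ValueError("EACH_RAM_DEPTH would exceed 256; use more MACs")

    return {
        "RAM_AMOUNT": ram_amount,
        # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1
        "LOG2_RAM_AMOUNT": (ram_amount - 1).bit_length(),
        "EACH_RAM_DEPTH": each_ram_depth,
        "LOG2_EACH_RAM_DEPTH": (each_ram_depth - 1).bit_length(),
    }


params = derive_parameters(filter_order=200, mac_amount=8)
```

In practice these values are set by the IP wizard, so the sketch is mainly useful for checking that a chosen filter order and MAC count fall inside the documented ranges.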

2.5 Computation Detail Introduction

The FIR filter IP supports single-rate, decimating and interpolating operation. For a single-rate filter, the output data rate fout is equal to the input data rate fin. For an interpolator, the output data rate is an integer multiple of the input data rate (fout = M·fin). For a decimator, the input data rate is an integer multiple of the output data rate (fin = M·fout). M must be an integer; rate ratios with a fractional part are not supported.

2.5.1 Single-Rate

Figure 2-3 illustrates the data storage structure and computation flow of a single-rate filter using 8 MACs and 8 RAMs. In this example, the filter order is 8M and each RAM stores M data words (here M denotes the depth of each RAM, not the rate ratio of section 2.5). Input data first fills Data0 of RAM0, then Data1, Data2, ..., Data(M-1). Once RAM0 is full, RAM1 is filled in the same way. After Data(8M-1) of RAM7 has been written, the next input is written to Data0 of RAM0 again.

In a single-rate application, the output data rate is equal to the input data rate. As shown in figure 2-3, to get output data 1, the data vector (data0, data1, ..., data(8M-1)) is fed into the multiply-accumulator. To get output data 2, the data vector (data1, data2, ..., data(8M-1), new data0) is fed into the multiply-accumulator. To get output data 3, the data vector (data2, data3, ..., data(8M-1), new data0, new data1) is fed into the multiply-accumulator. Output data 4, 5, 6, ... are obtained in the same way.



[Diagram: RAM0 holds Data0 to Data(M-1), RAM1 holds Data(M) to Data(2M-1), ..., RAM7 holds Data(7M) to Data(8M-1); the starting point of output data 1 is marked]

Figure 2-3 Data Store Structure and Computation Flow (Single-Rate, 8 MAC)
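The circular write pattern and sliding data window described for the single-rate case can be modeled in a few lines of Python. This is a behavioral sketch under stated assumptions, not the IP's RTL: the 8 RAMs are flattened into one list, the memory starts zeroed, and the newest sample is paired with the first coefficient (the guide does not specify the pairing). The name `single_rate_fir` is hypothetical.

```python
def single_rate_fir(samples, coeffs):
    """One output per input sample, using a circular data buffer."""
    n = len(coeffs)
    ram = [0] * n          # flat model of the data RAMs, addresses 0 .. n-1
    wr = 0                 # write pointer; wraps from n-1 back to 0
    outputs = []
    for x in samples:
        ram[wr] = x        # the newest sample overwrites the oldest one
        acc = 0
        for i in range(n):
            # Assumption: coeffs[0] multiplies the newest sample.
            acc += coeffs[i] * ram[(wr - i) % n]
        outputs.append(acc)
        wr = (wr + 1) % n
    return outputs


# An impulse at the input reproduces the coefficient sequence.
response = single_rate_fir([1, 0, 0, 0], [3, 2, 1])
```

The window advances by exactly one sample per output, which matches the sequence of data vectors listed for output data 1, 2 and 3.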

2.5.2 Decimator

Figure 2-4 illustrates the data storage structure and computation flow of a decimating filter using 8 MACs and 8 RAMs. The data storage structure is the same as in the single-rate application. In this example the decimation rate is 3, so the output data rate is one third of the input data rate.

As shown in figure 2-4, to get output data 1, the data vector (data0, data1, ..., data(8M-1)) is fed into the multiply-accumulator. To get output data 2, the data vector (data3, data4, ..., data(8M-1), new data0, new data1, new data2) is fed into the multiply-accumulator. To get output data 3, the data vector (data6, data7, ..., data(8M-1), new data0, new data1, ..., new data5) is fed into the multiply-accumulator. Output data 4, 5, 6, ... are obtained in the same way.

[Diagram: RAM0 holds Data0 to Data(M-1), RAM1 holds Data(M) to Data(2M-1), ..., RAM7 holds Data(7M) to Data(8M-1)]

Figure 2-4 Data Store Structure and Computation Flow (Decimator, 8 MAC)
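The decimating flow differs from the single-rate flow only in how often an output is computed. The Python sketch below models this under assumptions: the RAMs are flattened into one zero-initialized list, the newest sample pairs with the first coefficient, and an output is produced after every `rate`-th input (the exact output phase of the IP is not given in the guide). The name `decimating_fir` is hypothetical.

```python
def decimating_fir(samples, coeffs, rate):
    """Store every input sample, but compute one output per `rate` inputs."""
    n = len(coeffs)
    ram = [0] * n
    outputs = []
    for k, x in enumerate(samples):
        wr = k % n             # circular write address
        ram[wr] = x
        if (k + 1) % rate == 0:
            # The data window has advanced by `rate` samples since the
            # previous output, as in the data vectors described above.
            outputs.append(sum(coeffs[i] * ram[(wr - i) % n] for i in range(n)))
    return outputs


# A 3-tap moving sum decimated by 3 adds up disjoint groups of 3 samples.
result = decimating_fir([1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 1, 1], 3)
```

Because only one window in every `rate` is evaluated, the MAC engines have `rate` times as many clock cycles available per output.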

2.5.3 Interpolator

Figure 2-5 illustrates the data storage structure and computation flow of an interpolating filter using 8 MACs and 8 RAMs. In this example, the interpolation ratio is 3, which means the output data rate is 3 times the input data rate.

As figure 2-5 shows, the data storage structure of the interpolator differs from that of the single-rate filter and the decimator. Data0, data1, data2, ... come from the external pins of the IP, and the control module inserts two zeros between adjacent input data.

To get output data 1, the data vector (data0, 0, 0, data1, ..., data(7M/3), ..., 0) is fed into the multiply-accumulator. To get output data 2, the data vector (0, 0, data1, 0, 0, data2, ..., data(7M/3), ..., 0, new data0) is fed into the multiply-accumulator. To get output data 3, the data vector (0, data1, 0, 0, data2, ..., data(7M/3), ..., 0, new data0, 0) is fed into the multiply-accumulator. Output data 4, 5, 6, ... are obtained in the same way.

[Diagram: each RAM stores input data separated by two zeros: RAM0 holds Data0, 0, 0, Data1, ..., RAM1 holds Data(M/3), 0, 0, Data(M/3+1), ..., RAM7 holds Data(7M/3), 0, 0, Data(7M/3+1), ...]

Figure 2-5 Data Store Structure and Computation Flow (Interpolator, 8 MAC)
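The zero-insertion scheme described for the interpolator can be modeled as follows. This Python sketch is a behavioral illustration under assumptions: the RAMs are flattened into one zero-initialized list, the newest stored value pairs with the first coefficient, and an output is computed after every stored value (real sample or inserted zero). The name `interpolating_fir` is hypothetical.

```python
def interpolating_fir(samples, coeffs, rate):
    """Insert rate-1 zeros after each input sample and filter the result."""
    n = len(coeffs)
    ram = [0] * n
    wr = 0
    outputs = []
    for x in samples:
        # The control module stores the sample followed by rate-1 zeros.
        for value in [x] + [0] * (rate - 1):
            ram[wr] = value
            outputs.append(sum(coeffs[i] * ram[(wr - i) % n] for i in range(n)))
            wr = (wr + 1) % n
    return outputs


# Two input samples with an interpolation ratio of 3 give six outputs.
result = interpolating_fir([1, 2], [1, 2, 3], 3)
```

With a ratio of 3, each input sample produces three outputs, so the output sequence is three times as long as the input sequence.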



3 FIR Filter IP Usage

This chapter explains how to use the FIR filter IP core. The theory and methods of filter design are outside the scope of this document. However, a brief introduction to FDATool in MATLAB is given so that users can generate a set of coefficients that satisfies their system requirements and convert it to the data format required by the FIR filter IP.

3.1 Coefficient File Format

To use the FIR Filter IP, users must import a .dat coefficient file whose data is arranged in a specific format. The requirements are as follows:

1. Decimal complement code. If COEFFICIENT_WIDTH is 16, numbers from 32768 to 65535 represent negative values and numbers from 0 to 32767 represent non-negative values. If COEFFICIENT_WIDTH is 12, numbers from 2048 to 4095 represent negative values and numbers from 0 to 2047 represent non-negative values. The same rule applies to other COEFFICIENT_WIDTH values.

2. Each line contains exactly one number. Users must not add blank spaces or other symbols after the number.

3. The filter order must be equal to the number of values in the coefficient file.

Caution: If the coefficient file does not meet the requirements above, the FIR filter IP may fail to work or may produce incorrect results.
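The decimal complement code rule can be checked with a short Python sketch. The function names below are hypothetical and are not part of any CME tool; the example only illustrates the number mapping and the one-number-per-line layout described above.

```python
def encode_coefficient(value, width):
    """Map a signed integer to its decimal complement code."""
    lowest = -(1 << (width - 1))
    highest = (1 << (width - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError(f"{value} does not fit in {width} signed bits")
    # Masking a negative Python int yields its two's complement pattern.
    return value & ((1 << width) - 1)


def decode_coefficient(code, width):
    """Map a decimal complement code back to a signed integer."""
    if not 0 <= code < (1 << width):
        raise ValueError(f"{code} is not a valid {width}-bit code")
    if code >= (1 << (width - 1)):
        return code - (1 << width)   # upper half of the range is negative
    return code


def format_coefficient_file(values, width):
    """Return the file text: one code per line, nothing after the number."""
    return "".join(f"{encode_coefficient(v, width)}\n" for v in values)


text = format_coefficient_file([1, -1, 2047, -2048], width=12)
```

For example, with a width of 16 the value -1 is written as 65535, and with a width of 12 the code 4095 decodes to -1.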

3.2 Coefficient File Generation

For filter design, a MATLAB toolbox called FDATool is widely used. Its graphical user interface is shown in figure 3-1.

Figure 3-1 FDATool GUI

1. Select the filter type (lowpass, highpass, bandpass, etc.);

2. Select the design method;

3. Select the frequency specification (Fs, Fpass, Fstop);

4. Select the magnitude specification (Apass, Astop);

5. Press the "Design Filter" button; the magnitude response is then displayed;

6. Use File->Output to export the coefficients to the MATLAB workspace (variable coef_num). In this example, the order of the filter is 100. The first 12 coefficients of coef_num are shown below:

Figure 3-2 coef_num value in MATLAB

7. The FIR IP's coefficient initialization file accepts only decimal complement code, so users should convert the coefficients from signed fractional values to decimal complement code. The following MATLAB code performs this conversion and stores the result in user_input.dat (the file name is defined by the user).

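close all;
clc;
delete('*.dat');          % caution: removes every .dat file in the current folder

FILTER_ORDER = 100;       % must equal the number of coefficients in coef_num
COEF_WIDTH   = 16;        % must match COEFFICIENT_WIDTH of the IP

% Scale the fractional coefficients to signed integers.
coef_num_t     = transpose(coef_num);
coef_num_t_int = round(2^(COEF_WIDTH-1)*coef_num_t);
% Saturate so that a coefficient of +1.0 does not overflow the signed range.
coef_num_t_int = min(max(coef_num_t_int, -2^(COEF_WIDTH-1)), 2^(COEF_WIDTH-1)-1);

% Convert negative values to decimal complement code.
for i = 1:FILTER_ORDER
    if coef_num_t_int(i,1) < 0
        coef_num_t_int(i,1) = 2^COEF_WIDTH + coef_num_t_int(i,1);
    end
end

% Write one number per line with no trailing characters.
fid = fopen('user_input.dat','wt');
for i = 1:FILTER_ORDER
    p = coef_num_t_int(i,1);
    fprintf(fid,'%d\n',p);
end
fclose(fid);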

3.3 Maximum Input Data Rate

The maximum input data rate (f_INPUT) is determined by the filter order (N_ORDER), the number of MACs (N_MAC), the clock rate (f_CLOCK) and the decimation/interpolation rate (K), as follows:

For single-rate:   f_INPUT = f_CLOCK * N_MAC / N_ORDER

For decimator:     f_INPUT = f_CLOCK * N_MAC * K / N_ORDER

For interpolator:  f_INPUT = f_CLOCK * N_MAC / (N_ORDER * K)
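The three formulas can be evaluated with a small helper. The Python sketch below simply restates them; the function name `max_input_rate` and the string values used to select the filter type are assumptions made for illustration.

```python
def max_input_rate(f_clock, n_mac, n_order, filter_type="single", k=1):
    """Return the maximum input data rate in the same unit as f_clock."""
    if filter_type == "single":
        return f_clock * n_mac / n_order
    if filter_type == "decimator":
        return f_clock * n_mac * k / n_order
    if filter_type == "interpolator":
        return f_clock * n_mac / (n_order * k)
    raise ValueError(f"unknown filter type: {filter_type}")


# 100 MHz clock, 4 MACs, order-100 single-rate filter: 4 MHz input rate,
# i.e. 25 clock cycles per input sample.
rate_hz = max_input_rate(100e6, n_mac=4, n_order=100)
```

Dividing the clock rate by the result gives the number of clock cycles available per input sample, which is a convenient way to choose the MAC count for a required sample rate.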

3.4 FIR IP Wizard Usage Guideline

Figure 3-3 Coefficient RAM Organization

Figure 3-3 illustrates the parameters that users should set when using the FIR IP. Each parameter is described below.

System clock frequency: Working frequency of the FIR IP (the clock connected to the "clk" pin)

Input sampling frequency: Sampling frequency of the data to be filtered (the signal connected to the "din_en" pin)

Filter order: Order of the filter (the number of coefficients)

Filter type: Three filter types are supported (single-rate FIR, interpolation FIR, decimation FIR)

Decimation rate value: The value provided in this field defines the down-sampling factor

Interpolation rate value: The value provided in this field defines the up-sampling factor

Data width: The bit precision of the data to be filtered

Coefficient width: The bit precision of the filter coefficients

Coefficient file: Name of the file that holds the filter coefficients. The file has a .dat extension and must follow the format described in section 3.1

MAC count: The number of MACs the FIR IP uses

EMB5K count: The number of EMB5K the FIR IP uses

3.5 Output Data Width

In the CME FIR Filter IP, the output data width is fixed at 40 bits, which uses the full MAC bit width to provide a full-precision output. In a user design, the output signal is normally required to have the same power as the input signal, so the output must be truncated according to the following rule.

1) Remove the (COEFFICIENT_WIDTH-1) LSBs. In Verilog:

assign dout_temp = dout[39:(COEFFICIENT_WIDTH-1)];

2) For dout_temp obtained in step 1, keep the leftmost bit (MSB) as the sign bit, then keep its (DATA_WIDTH-1) LSBs. dout_temp is (41-COEFFICIENT_WIDTH) bits wide, so its MSB is bit (40-COEFFICIENT_WIDTH). In Verilog:

assign dout_result = {dout_temp[40-COEFFICIENT_WIDTH], dout_temp[(DATA_WIDTH-2):0]};

dout_result is the final output signal; it is DATA_WIDTH bits wide and its power is equal to that of din.
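The two-step truncation can be checked against a software model before it is written in RTL. The Python sketch below is an illustration under assumptions: it keeps the sign bit of the 40-bit result plus the (DATA_WIDTH-1) LSBs that remain after dropping (COEFFICIENT_WIDTH-1) bits, so the result is DATA_WIDTH bits wide. It performs no saturation, so results that do not fit simply wrap. The name `truncate_output` is hypothetical.

```python
def truncate_output(dout, data_width=16, coef_width=16):
    """Reduce a signed 40-bit filter result to a signed data_width-bit value."""
    raw = dout & ((1 << 40) - 1)                  # view dout as a 40-bit pattern
    sign = raw >> 39                              # MSB of the full-precision result
    temp = raw >> (coef_width - 1)                # step 1: drop the LSBs
    low = temp & ((1 << (data_width - 1)) - 1)    # step 2: keep DATA_WIDTH-1 LSBs
    code = (sign << (data_width - 1)) | low       # sign bit followed by kept bits
    return code - (1 << data_width) if sign else code


# A value scaled up by 2**15 (a 16-bit coefficient close to 1.0) is recovered.
positive = truncate_output(12345 << 15)
negative = truncate_output(-12345 << 15)
```

Dropping (COEFFICIENT_WIDTH-1) bits undoes the scaling applied when the fractional coefficients were converted to integers, which is why the output ends up at the same scale as the input.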

3.6 Interface Timing

Figure 3-4 illustrates the input and output data timing. When data (din, usually from an ADC) enters the FIR filter IP, din_en must be asserted. After processing, the IP asserts dout_en to indicate that output data is available.

Note: din_en must be asserted for exactly one clock cycle per input sample; otherwise the IP core generates incorrect output.

Figure 3-4 Input/Output Data Timing

3.7 Resource Usage and Performance Analysis

The resource usage of the FIR IP depends entirely on the parameters the user sets, so it is not practical to list the resource usage for every parameter combination. To help users estimate the resource usage for their own parameters, table 3-1 lists a few typical configurations.



Table 3-1 Resource Usage and Performance

Filter Type    Filter Order   Clock/Sample   LUTs   REGs   EMB5K   MAC   Performance
Single-Rate    100            25             480    360    8       4     110M
Single-Rate    100            50             340    310    4       2     120M
Single-Rate    200            20             1820   760    20      10    100M
Single-Rate    200            50             470    360    8       4     110M
Decimator      150            15             1820   770    20      10    100M
Decimator      150            30             730    490    10      5     110M
Decimator      240            20             2280   850    24      12    100M
Decimator      240            60             610    370    8       4     120M
Interpolator   80             40             1230   620    16      8     110M
Interpolator   80             120            360    320    4       2     120M
Interpolator   160            40             1820   780    20      10    100M
Interpolator   160            80             740    500    10      5     110M

Note: The above results were measured on CME M5. Because M5 and M7 share the same architecture, the resource usage of the FIR IP on M7 is similar to that on M5.



4 Generate File Directory Structure

4.1 File Directory Structure for M5 Family Device

The files generated by the FIR filter IP wizard for M5 include source files (src), simulation files (sim), documentation (doc) and example design files (example). The detailed directory structure is shown below.

[Directory tree: the project contains src (fir_inst.v, name defined by user), outputs and ip_core/fir_v1; fir_v1 contains src, sim (with M5 and src_vp sub-directories), doc and example. Legend: directory, source RTL code, simulation related files, documentation. The files are listed in Table 4-1.]

Figure 4-1 IP wizard generated file directory structure (M5)

Table 4-1 Generated File Directory structure (M5)

Directory / File                         Description
/src
  fir_inst.v                             FIR instance file generated by Primace wizard
/ip_core/fir_v1/src                      Directory for project source
  cme_ip_emb_m5.v                        CME M5 EMB module
  control.v                              Control module (Encrypted)
  fir_top.v                              FIR IP top module (Encrypted)
  mac_18x18_m5.v                         CME M5 MAC IP
  multiplier_m5.v                        CME M5 data multiply coefficient module (Encrypted)
  ram_array.v                            Coefficient and data EMB module (Encrypted)
/ip_core/fir_v1/sim
  /M5/fir_top_tb.v                       FIR top test bench
  /M5/fir_top_tb.do                      Do script for ModelSim to run simulation
  /M5/fir_top_tb_modelsim.f              File list for ModelSim simulation
  CME3_sim.v                             M5 simulation library
  /src_vp/*.vp                           ModelSim encrypted source code
/ip_core/fir_v1/doc
  CME_FIR_user_guide_EN01.pdf            FIR IP user guide
/ip_core/fir_v1/example
  CME_FIR_example_user_guide_EN01.pdf    FIR IP example user guide
  fir_example.zip                        Example project

4.2 File Directory Structure for M7 Family Device

The files generated by the FIR filter IP wizard for M7 include source files (src), simulation files (sim), documentation (doc) and example design files (example). The detailed directory structure is shown below.

[Directory tree: the project contains src (fir_inst.v, name defined by user), outputs and ip_core/fir_v1; fir_v1 contains src, sim (with M7 and src_vp sub-directories), doc and example. Legend: directory, source RTL code, simulation related files, documentation. The files are listed in Table 4-2.]

Figure 4-2 IP wizard generated file directory structure (M7)



Table 4-2 Generated File Directory structure (M7)

Directory / File                         Description
/src
  fir_inst.v                             FIR instance file generated by Primace wizard
/ip_core/fir_v1/src                      Directory for project source
  cme_ip_emb_m7.v                        CME M7 EMB module
  control.v                              Control module (Encrypted)
  fir_top.v                              FIR IP top module (Encrypted)
  mac_18x18_m7.v                         CME M7 MAC IP
  multiplier_m7.v                        CME M7 data multiply coefficient module (Encrypted)
  ram_array.v                            Coefficient and data EMB module (Encrypted)
/ip_core/fir_v1/sim
  /M7/fir_top_tb.v                       FIR top test bench
  /M7/fir_top_tb.do                      Do script for ModelSim to run simulation
  /M7/fir_top_tb_modelsim.f              File list for ModelSim simulation
  m7s_sim.v                              M7 simulation library
  /src_vp/*.vp                           ModelSim encrypted source code
/ip_core/fir_v1/doc
  CME_FIR_user_guide_EN01.pdf            FIR IP user guide
/ip_core/fir_v1/example
  CME_FIR_example_user_guide_EN01.pdf    FIR IP example user guide

Note: For M7, the usage of the FIR Filter IP is the same as for M5, so users can refer to fir_example.zip and generate the IP by following the flow in the M5 example project.



5 Revision History

Revision Date Comments

1.0 2014-06-25 Initial release
