13
1 Exact and Approximate Algorithms for the Filter Design Optimization Problem Levent Aksoy, Member, IEEE, Paulo Flores, Senior Member, IEEE, and Jos´ e Monteiro, Senior Member, IEEE Abstract—The filter design optimization (FDO) problem is defined as finding a set of filter coefficients which yields a filter design with minimum complexity, satisfying the filter constraints. It has received a tremendous interest due to the widespread application of filters. Assuming that the coefficient multiplications in the filter design are realized under a shift-adds architecture, the complexity is generally defined in terms of the total number of adders and subtractors. In this article, we present an exact FDO algorithm that can guarantee the minimum design complexity under the minimum quantization value, but can only be applied to filters with a small number of coefficients. We also introduce an approximate algorithm that can handle filters with a large number of coefficients using less computational resources than the exact FDO algorithm and find better solutions than existing FDO heuristics. We describe how these algorithms can be modified to handle a delay constraint in the shift-adds designs of the multiplier blocks and to target different filter constraints and filter forms. Experimental results show the effectiveness of the proposed algorithms with respect to prominent FDO algorithms and explore the impact of design parameters, such as the filter length, quantization value, and filter form, on the complexity and performance of filter designs. Index Terms—Finite impulse response filters, direct and trans- posed forms, filter design optimization problem, multiplierless design, depth-first and local search methods, delay reduction. EDICS—DSP-FILT Filter design and structures, DIS-HDSP DSP algorithm implementation in hardware and software. I. I NTRODUCTION Digital filtering is a ubiquitous operation in digital signal processing (DSP) applications and is realized using infinite impulse response (IIR) or finite impulse response (FIR) filters. Although an FIR filter requires a larger number of coefficients than an equivalent IIR filter, it is preferred to the IIR filter due to its stability and phase linearity properties [1]. The computation of the output of an N -tap FIR filter is given by y(n)= N1 i=0 h i · x(n i) (1) where N is the filter length, h i is the i th filter coefficient, and x(n i) is the i th previous filter input. The straightforward realization of Eq. 1 is depicted in Fig. 1a which is known as the direct form. Alternatively, the realization of Eq. 1 in the transposed form is shown in Fig. 1b. This work was partially supported by national funds through FCT, Fundac ¸˜ ao para a Ciˆ encia e a Tecnologia, under project PEst-OE/EEI/LA0021/2013. L. Aksoy is with INESC-ID, Rua Alves Redol, 9 1000-029 Lisbon Portugal, e-mail: [email protected]. P. Flores and J. Monteiro are with INESC-ID/Instituto Superior T´ ecnico, University of Lisbon, Rua Alves Redol, 9 1000-029 Lisbon Portugal, e-mail: {pff, jcm}@inesc-id.pt. Fig. 1. Different forms of an N -tap FIR filter: (a) direct; (b) transposed. The complexity of the FIR filter design is dominated by the multiplication of filter coefficients by the time-shifted versions of the filter input, i.e., the constant array-vector multiplication (CAVM) block in the direct form of Fig. 1a or by the multiplication of filter coefficients by the filter input, i.e., the multiple constant multiplications (MCM) block in the transposed form of Fig. 1b. Since filter coefficients are fixed and determined beforehand and the realization of a multiplier in hardware is expensive in terms of area, delay, and power dissipation, these CAVM and MCM operations are generally implemented under a shift-adds architecture using only shifts, adders, and subtractors [2]. Note that shifts by a constant value can be implemented using only wires which represent no hardware cost. Thus, a well-known optimization problem [3] is defined as: given a set of constants, find the minimum number of adders/subtractors that realize the constant multiplications. Note that this is an NP-complete problem even in the case of a single constant multiplication [4]. In the last two decades, many efficient algorithms were proposed for the multiplierless design of the MCM block, targeting not only the minimization of the number of operations, but also the optimization of gate- level area, delay, throughput, and power dissipation of the MCM design [3], [5]–[14]. The recently proposed algorithm of [15] guarantees the least number of operations in the CAVM design and incorporates efficient techniques to reduce the gate- level area and delay of the CAVM design. On the other hand, the FDO problem [16] is defined as: given the filter specifications fspec, defined as a five-tuple (filter length N , passband w p and stopband w s frequencies, and passband δ p and stopband δ s ripples), find a set of

Exact and Approximate Algorithms for the Filter Design ...algos.inesc-id.pt/~levent/publications/tsp14.pdf · 1 Exact and Approximate Algorithms for the Filter Design Optimization

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Exact and Approximate Algorithms for the Filter Design ...algos.inesc-id.pt/~levent/publications/tsp14.pdf · 1 Exact and Approximate Algorithms for the Filter Design Optimization

1

Exact and Approximate Algorithms for the FilterDesign Optimization Problem

Levent Aksoy, Member, IEEE, Paulo Flores, Senior Member, IEEE, and Jose Monteiro, Senior Member, IEEE

Abstract—The filter design optimization (FDO) problem isdefined as finding a set of filter coefficients which yields a filterdesign with minimum complexity, satisfying the filter constraints.It has received a tremendous interest due to the widespreadapplication of filters. Assuming that the coefficient multiplicationsin the filter design are realized under a shift-adds architecture,the complexity is generally defined in terms of the total number ofadders and subtractors. In this article, we present an exact FDOalgorithm that can guarantee the minimum design complexityunder the minimum quantization value, but can only be appliedto filters with a small number of coefficients. We also introducean approximate algorithm that can handle filters with a largenumber of coefficients using less computational resources thanthe exact FDO algorithm and find better solutions than existingFDO heuristics. We describe how these algorithms can bemodified to handle a delay constraint in the shift-adds designs ofthe multiplier blocks and to target different filter constraints andfilter forms. Experimental results show the effectiveness of theproposed algorithms with respect to prominent FDO algorithmsand explore the impact of design parameters, such as the filterlength, quantization value, and filter form, on the complexity andperformance of filter designs.

Index Terms—Finite impulse response filters, direct and trans-posed forms, filter design optimization problem, multiplierlessdesign, depth-first and local search methods, delay reduction.

EDICS—DSP-FILT Filter design and structures, DIS-HDSPDSP algorithm implementation in hardware and software.

I. INTRODUCTION

Digital filtering is a ubiquitous operation in digital signalprocessing (DSP) applications and is realized using infiniteimpulse response (IIR) or finite impulse response (FIR) filters.Although an FIR filter requires a larger number of coefficientsthan an equivalent IIR filter, it is preferred to the IIR filterdue to its stability and phase linearity properties [1]. Thecomputation of the output of an N -tap FIR filter is given by

y(n) =N−1∑i=0

hi · x(n− i) (1)

where N is the filter length, hi is the ith filter coefficient, andx(n − i) is the ith previous filter input. The straightforwardrealization of Eq. 1 is depicted in Fig. 1a which is known asthe direct form. Alternatively, the realization of Eq. 1 in thetransposed form is shown in Fig. 1b.

This work was partially supported by national funds through FCT, Fundacaopara a Ciencia e a Tecnologia, under project PEst-OE/EEI/LA0021/2013.

L. Aksoy is with INESC-ID, Rua Alves Redol, 9 1000-029 Lisbon Portugal,e-mail: [email protected].

P. Flores and J. Monteiro are with INESC-ID/Instituto Superior Tecnico,University of Lisbon, Rua Alves Redol, 9 1000-029 Lisbon Portugal, e-mail:{pff, jcm}@inesc-id.pt.

��� �

� �� �

� � � � ��

� � � � ����

� � �

� � �� � �

�� ���

���

���

�� �

� �� �

� �� �

� � � � �� �

� � �� � �

�� ���

���

���

�� �

� � � � ����

� � � �

� �� �� �� �� !! � ����

Fig. 1. Different forms of an N -tap FIR filter: (a) direct; (b) transposed.

The complexity of the FIR filter design is dominated bythe multiplication of filter coefficients by the time-shiftedversions of the filter input, i.e., the constant array-vectormultiplication (CAVM) block in the direct form of Fig. 1a orby the multiplication of filter coefficients by the filter input,i.e., the multiple constant multiplications (MCM) block in thetransposed form of Fig. 1b. Since filter coefficients are fixedand determined beforehand and the realization of a multiplierin hardware is expensive in terms of area, delay, and powerdissipation, these CAVM and MCM operations are generallyimplemented under a shift-adds architecture using only shifts,adders, and subtractors [2]. Note that shifts by a constantvalue can be implemented using only wires which represent nohardware cost. Thus, a well-known optimization problem [3] isdefined as: given a set of constants, find the minimum numberof adders/subtractors that realize the constant multiplications.Note that this is an NP-complete problem even in the case ofa single constant multiplication [4]. In the last two decades,many efficient algorithms were proposed for the multiplierlessdesign of the MCM block, targeting not only the minimizationof the number of operations, but also the optimization of gate-level area, delay, throughput, and power dissipation of theMCM design [3], [5]–[14]. The recently proposed algorithmof [15] guarantees the least number of operations in the CAVMdesign and incorporates efficient techniques to reduce the gate-level area and delay of the CAVM design.

On the other hand, the FDO problem [16] is defined as:given the filter specifications fspec, defined as a five-tuple(filter length N , passband wp and stopband ws frequencies,and passband δp and stopband δs ripples), find a set of

Page 2: Exact and Approximate Algorithms for the Filter Design ...algos.inesc-id.pt/~levent/publications/tsp14.pdf · 1 Exact and Approximate Algorithms for the Filter Design Optimization

2

filter coefficients that yields a filter design with the minimumnumber of adders/subtractors and satisfies the filter constraints.Many efficient FDO algorithms were proposed, consideringdifferent filter constraints, targeting different filter forms, usingdifferent search methods during the exploration of possiblefilter coefficients, and applying different techniques to reducethe filter design complexity [17]–[29]. However, none of thesealgorithms can guarantee that their solutions (a set of filtercoefficients) lead to a filter design with the minimum numberof adders/subtractors. This is due to two main facts: i) theydo not explore the whole search space; and/or ii) they are notequipped with the exact techniques that can find the minimumnumber of operations for the constant multiplications.

In this article, we present the recently proposed exact FDOalgorithm [30], called SIREN, that can find a set of fixed-pointfilter coefficients, satisfying the filter constraints and leading toa filter design with the minimum number of adders/subtractorsunder the minimum quantization value. SIREN is equippedwith a depth-first search (DFS) method to explore the searchspace exhaustively, the exact algorithm of [9] to find theminimum number of operations in the MCM block of thetransposed form, and efficient search pruning and branchingtechniques to speed up the search process. Since the size ofthe search space of the FDO problem grows exponentiallywith the filter length N [29], SIREN can only handle filterswith a small number of coefficients. As observed from theexperimental results, it can find solutions to the symmetricfilters including less than 40 coefficients in a reasonable time.

Hence, in this article, we propose an approximate algorithm,called NAIAD, that can cope with the FDO problems whichSIREN cannot handle and obtain solutions close to the mini-mum. NAIAD initially finds possible sets of filter coefficientssatisfying the filter constraints. Then, a local search method isapplied to each set of filter coefficients to explore the feasiblesolutions around its neighborhood, aiming to reduce the totalnumber of adders/subtractors in the filter design. As observedfrom the experimental results, NAIAD can handle symmetricfilters including more than 100 coefficients up to 325.

Note that the direct form filters occupy less area andconsume less power, but have higher delay than the trans-posed form filters [31]. Hence, in this article, we present themodifications made to SIREN and NAIAD to target both thetransposed and direct filter forms. Note also that the number ofadder-steps in the multiplier block of a filter, i.e., the numberof adders/subtractors in series, has a significant impact on thedelay of the filter design [31]. Hence, we describe how thesealgorithms can be modified to handle a delay constraint in themultiplierless design of the CAVM and MCM blocks of thedirect and transposed filter forms. Moreover, we show howthese algorithms can target different filter constraints.

The rest of the article is organized as follows. Section IIpresents the background concepts and related work. The exactand approximate FDO algorithms are introduced in Section IIIand experimental results are given in Section IV. Finally,Section V concludes the article.

II. BACKGROUND

This section gives the background concepts and presents anoverview on the methods proposed for the shift-adds designof the MCM and CAVM blocks and the FDO problem.

A. Linear ProgrammingLinear programming (LP) is a technique to minimize or

maximize a linear cost function subject to a set of linearconstraints. An LP problem is given as follows1:

minimize : f = cT · xsubject to : A · x ≥ b (2)

lb ≤ x ≤ ub

In Eq. 2, cj in c is a cost value associated with each variablexj , 1 ≤ j ≤ n, in the cost function f , and A · x ≥ b denotesa set of m linear constraints. Also, lb and ub respectivelyconsist of the lower and upper bounds of variables.

The variables are assumed to be real numbers in an LP prob-lem, for which there exist polynomial-time algorithms [32],[33]. However, if all or some variables are restricted to inte-gers, as in pure integer LP (ILP) or mixed ILP (MILP) prob-lems, respectively, these LP problems become NP-complete,for which there is no polynomial-time algorithm [34].

B. Multiplierless Design of the CAVM and MCM BlocksThe CAVM block of the direct form filter realizes a linear

transform in the form of y = h0x0+h1x1+ . . .+hN−1xN−1,where xi stands for the time-shifted version of the filter inputwith 0 ≤ i ≤ N−1 (Fig. 1a). Also, the MCM block of thetransposed form filter implements the constant multiplicationsin the form of y0=h0x, y1=h1x, . . . , yN−1=hN−1x, where xdenotes the filter input (Fig. 1b).

For their shift-adds designs, the digit-based recoding (DBR)technique [35] first defines the constants under a number rep-resentation, e.g., binary or canonical signed digit (CSD)2 [5].Second, for the nonzero digits in the representations of con-stants, it shifts the variables according to the digit positionsand adds/subtracts the shifted variables with respect to the digitvalues. As an example, consider h0 = 21 and h1 = 53 andsuppose that the CSD representation is used. The decomposi-tion of the linear transform y = 21x0 + 53x1 is as follows:

y = 21x0 + 53x1 = (10101)CSDx0 + (1010101)CSDx1

= x0≪4 + x0≪2 + x0 + x1≪6− x1≪4 + x1≪2 + x1

where 6 operations are required for this CAVM block as shownin Fig. 2a. Also, the decompositions of constant multiplica-tions y0 = 21x and y1 = 53x in an MCM block are as follows:

y0 = 21x = (10101)CSDx = x≪4 + x≪2 + x

y1 = 53x = (1010101)CSDx = x≪6− x≪4 + x≪2 + x

which lead to a design with 5 operations as shown in Fig. 2b.

1The minimization objective can be easily converted to a maximizationobjective by negating the cost function. Less-than-or-equal and equality con-straints are accommodated by the equivalences, A ·x ≤ b ⇔ −A ·x ≥ −band A · x = b ⇔ (A · x ≥ b) ∧ (A · x ≤ b), respectively.

2An integer can be written in CSD using n digits as∑n−1

i=0 di2i, where

di ∈ {1, 0, 1} and 1 denotes −1 with 0 ≤ i ≤ n− 1. Under CSD, nonzerodigits are not adjacent and the minimum number of nonzero digits is used.

Page 3: Exact and Approximate Algorithms for the Filter Design ...algos.inesc-id.pt/~levent/publications/tsp14.pdf · 1 Exact and Approximate Algorithms for the Filter Design Optimization

3

�� �

���

���

��

��

��� ������

� �

���

���

��� ������

� � �

��� � ��

Fig. 2. Multiplierless realization of constant multiplications using the DBRtechnique [35]: (a) 21x0+53x1; (b) 21x and 53x.

In the following two subsections, prominent algorithms de-signed for the optimization of the number of adders/subtractorsin the MCM and CAVM blocks are described in detail. Theircommon aim is to maximize the sharing of partial products.

1) Multiplierless Design of the MCM Operation: The meth-ods proposed for the shift-adds design of an MCM block aregenerally grouped in two categories as the common subexpres-sion elimination (CSE) algorithms [5]–[7] and the graph-based(GB) techniques [3], [8], [9]. The CSE methods initially definethe constants under a particular number representation. Then,they consider possible subexpressions, that can be extractedfrom the nonzero digits in the constant representations, andchoose the “best” subexpression, generally the most common,to be shared among the constant multiplications. Their maindrawback is their dependency on a number representation.The GB methods are not restricted to any particular numberrepresentation and aim to find intermediate subexpressions thatenable the realization of constant multiplications with the min-imum number of operations. They consider a larger numberof realizations of a constant and obtain better solutions thanthe CSE methods, but require more computational resourcesdue to a larger search space.

For our MCM example in Fig. 2b, the exact CSE algo-rithm [7] obtains a minimum solution with 4 operations byfinding the most common subexpression 5x = (101)CSDx(Fig. 3a) when constants are defined under CSD. The exact GBalgorithm of [9] obtains a minimum solution with 3 operationsby finding the intermediate subexpression 3x (Fig. 3b).

The minimum adder-steps of a shift-adds design of a singleconstant multiplication, hx, is computed as ⌈log2S(h)⌉, whereS(h) is the number of nonzero digits in the CSD representationof h. Given an MCM instance with N constants, its minimumadder-steps is determined as MASMCM=maxi{⌈log2S(hi)⌉}with 0≤i≤N−1 [11]. Given a delay constraint dc withdc≥MASMCM , the algorithms of [7], [11], [12] can findthe smallest number of operations that realize the constantmultiplications without violating dc. For our example, the min-imum adder-steps of both 21x and 53x is 2. The approximatealgorithm [9] modified to handle a delay constraint finds asolution with 4 operations when dc is 2 (Fig. 3c). With respect

�� � �� � �� �

���

��

�� ���

� �

� �

��� �

��� ���

�� ��

� �

���

���

��

���

��

Fig. 3. Multiplierless realization of 21x and 53x: (a) exact CSE algo-rithm [7]; (b) exact GB algorithm [9]; (c) approximate GB algorithm [9]modified to handle a delay constraint.

� �� � � �� �

�� � �� �

� �

��

��

Fig. 4. Shift-adds design of 21x0+53x1: (a) ECHO-A [15]; (b) ECHO-D [15].

to the solution of the exact GB algorithm [9] in Fig. 3b, itssolution has one more operation, but one less adder-step.

2) Multiplierless Design of the CAVM Operation: Thealgorithm of [15], called ECHO, consists of two main parts.In its first part, the shift-adds realizations of constants inthe CAVM operation are found using an MCM algorithm.In its second part, the constants in the linear transform arereplaced with their realizations in the MCM solution and thecommon subexpressions are extracted iteratively using a setof transformations. ECHO has two variations, ECHO-A andECHO-D, that target the optimization of area and delay ofthe CAVM operation, respectively. ECHO-A uses the exactMCM algorithm [9] and considers some area optimizations.ECHO-D is equipped with the approximate MCM algorithm [9]modified to handle a delay constraint and considers some delayoptimizations. Both algorithms ensure to obtain a solution withm+nzc−1 operations, where m is the number of operationsfound by the MCM algorithm in the first part and nzc is thenumber of nonzero constants of the CAVM block.

For our CAVM example in Fig. 2a, ECHO-A finds a solutionwith 4 operations and 4 adder-steps (Fig. 4a) that was obtainedbased on the MCM solution in Fig. 3b. Also, the solution ofECHO-D includes 5 operations and 3 adder-steps (Fig. 4b) thatwas obtained based on the MCM solution in Fig. 3c. Thisexample shows the direct impact of the MCM solution on thenumber of operations and adder-steps of the CAVM design.

Page 4: Exact and Approximate Algorithms for the Filter Design ...algos.inesc-id.pt/~levent/publications/tsp14.pdf · 1 Exact and Approximate Algorithms for the Filter Design Optimization

4

Fig. 5. Zero-phase frequency response of a low-pass FIR filter.

C. Filter Design Optimization

The zero-phase frequency response of a symmetric FIR filteris given as3:

G(w) =

⌊M⌋∑i=0

dihicos(w(M − i))

where M = (N − 1)/2 and di = 2 − Ki,M with Ki,M isthe Kronecker delta4, hi ∈ R with −1 ≤ hi ≤ 1, and w ∈ Ris the angular frequency. Considering a low-pass FIR filter asillustrated in Fig. 5 and assuming that the desired pass-bandand stop-band gains are equal to 1 and 0, respectively, thefilter must satisfy the following constraints [22]:

1− δp ≤ G(w) ≤ 1 + δp, w ∈ [0, wp]−δs ≤ G(w) ≤ δs, w ∈ [ws, π]

(3)

The pass-band gain is not relevant for many DSP appli-cations and can be compensated in the filter design. Thus, ascaling factor (s) can be added into the filter constraints as acontinuous variable as follows [25], [26]:

s(1− δp) ≤ G(w) ≤ s(1 + δp), w ∈ [0, wp]s(−δs) ≤ G(w) ≤ s(δs), w ∈ [ws, π]

sl ≤ s ≤ su(4)

where sl and su are respectively the lower and upper boundsof s. Furthermore, in some DSP applications, it is desirable tominimize the peak weighted ripple [18], the normalized peakripple (NPR) [28], [29], or the NPR magnitude [27].

A straightforward filter design technique (SFDT) followstwo steps: i) given fspec, the filter coefficients, that re-spect the filter constraints, are found using a filter designmethod, such as windowing [37], McClellan-Parks-Rabineralgorithm [38], or linear programming [39]; ii) the multiplierblock of the FIR filter is realized using the minimum numberof adders/subtractors as described in Section II-B.

Since there exist many possible sets of coefficients satisfy-ing the filter constraints, FDO algorithms incorporate sophis-ticated techniques such as local search [18], [24], [27] andexhaustive search methods, including branch-and-bound [22],[23], [25], DFS [28], [29], and MILP [17], [19], [21], [24],

3The frequency response of an asymmetric filter can be found in [36].4The Ka,b function is 1 when a is equal to b. Otherwise, it is 0.

[26] techniques. The local search algorithms can be appliedto filters with a large number of coefficients, but the optimalsolution cannot be ensured, since the entire search space isnot explored. The exhaustive search techniques can only berun on filters with a small number of coefficients due to theexponential growth of the search space. However, their runtimecomplexity can be reduced when the number of possible valuesof coefficients is limited [22], [28].

To reduce the complexity of the filter design, the algorithmsof [17]–[24] search for coefficients with the fewest nonzerodigits, since a coefficient represented with a few nonzero digitsrequires a small number of operations. The algorithms of [22],[24] find the common partial products using one of algorithmsdescribed in Section II-B after a solution is obtained. However,since the sharing of partial products is not considered duringthe search of coefficients, these methods may yield filters witha large number of operations as shown in Section IV-A.

The methods of [25]–[29] search for coefficients that exploitthe partial product sharing. In [25], all possible values of coef-ficients are explored using a branch-and-bound algorithm anda CSE heuristic is used to share the common subexpressionsamong the coefficient multiplications. The algorithm of [26]uses an MILP method to consider all possible coefficientssatisfying the filter constraints and finds coefficients thatinclude the most common nonzero digits. The method of [27]finds the coefficients that include the most 101 and 101 digitpatterns (subexpressions) and satisfy the filter constraints.The algorithm of [28] uses a subexpression basis set that isdynamically expanded as coefficients are synthesized at eachdepth of its search tree. The technique of [29] is based on thealgorithm of [28], but explores the entire search space undera given quantization value. It is able to be aware of whetheran optimum solution is obtained.

III. EXACT AND APPROXIMATE FDO ALGORITHMS

The following two subsections present SIREN and NAIAD,targeting the filter constraints given for a symmetric filterin Eq. 4, the transposed form of the FIR filter, and theoptimization of the number of operations without a delayconstraint. The last subsection describes the modificationsrequired to target different filter constraints, the direct form ofthe FIR filter, and the optimization of the number of operationsunder a delay constraint in the multiplier blocks of filter forms.

A. SIREN: An Exact FDO Algorithm

SIREN was developed to find a set of filter coefficientsyielding a minimum number of adders/subtractors in thefilter design and satisfying the filter constraints. Its pseudo-code is given in Algorithm 1. It takes the five-tuple fspecdenoting the filter specifications as input and returns a set offixed-point coefficients sol. In Algorithm 1, Q stands for thequantization value used to convert floating-point numbers tointegers. SIREN will be described in detail using a symmetricFIR filter with fspec (8, 0.2π, 0.7π, 0.01, 0.01) as an example.

First, to restrict the search space, we find the lower andupper bounds of coefficients and scale factor s using the Com-puteBounds function. To find the lower bounds of coefficients,

Page 5: Exact and Approximate Algorithms for the Filter Design ...algos.inesc-id.pt/~levent/publications/tsp14.pdf · 1 Exact and Approximate Algorithms for the Filter Design Optimization

5

Algorithm 1 The SIREN algorithmSIREN(fspec)

1: Q = 0, sol = { }2: (hl, hu, sl, su) = ComputeBounds(fspec)3: O = OrderCoefs(hl, hu)4: repeat5: Q = Q+ 1, Hl = ⌈(hl · 2Q)⌉, Hu = ⌊(hu · 2Q)⌋6: if CheckValidity(Hl, Hu) then7: sol = DFS(fspec, O, Q, Hl, Hu, sl, su)8: until sol = ∅9: return sol

the following LP problem is solved for each coefficient:

minimize : f = hi

subject to : s(1− δp) ≤ G(w) ≤ s(1 + δp), w ∈ [0, wp]− s(δs) ≤ G(w) ≤ s(δs), w ∈ [ws, π]

hl ≤ h ≤ hu

sl ≤ s ≤ su

where hl and hu denote the lower and upper bounds of allfilter coefficients which were initially assigned to -1 and 1,respectively and the lower and upper bounds of s, sl and su,were initially set to 0.01 and 100, respectively. The value ofhi in the LP solution corresponds to its lower bound hl

i andis stored in hl. In a similar way, the upper bound of eachcoefficient hu

i is found when the cost function is f=−hi andis stored in hu. Thus, the sets hl and hu consist of the floating-point lower and upper bounds of coefficients, respectively. Thevalues of sl and su are found similarly. For symmetric filters,the number of LP problems to be solved is 2⌊M⌋+4. Recallthat an LP problem can be solved in polynomial time [33].

For our example, the floating-point lower and upper boundsof filter coefficients are computed as hl = {hl

0, hl1, h

l2, h

l3} =

{-0.0966, -0.0915, 0.0015, 0.0039} and hu = {hu0 , h

u1 , h

u2 , h

u3}

= {-0.0003, -0.0002, 0.4144, 1}, respectively. Also, sl and su

are 0.01 and 2.53, respectively.Second, the OrderCoefs function finds an ordering of co-

efficients to be used in its DFS method while constructingthe search tree (described ahead). We sort the coefficients inascending order according to their hu

i − hli values and store

their indices i in this order in O. The reason behind findingsuch an ordering is that if coefficients with narrower upper andlower bound intervals are placed in lower depths of the searchtree, fewer decisions are made and conflicts occur earlier.Thus, the runtime of SIREN can be reduced significantly, stillexploring all possible values of coefficients. For our example,the ordering of coefficients is O = {1, 0, 2, 3}.

Third, starting with the quantization value Q equal to 1,the floating-point lower (upper) bound of each coefficientis multiplied by 2Q, rounded to the smallest following (thelargest previous) integer, and is stored in H l (Hu). The validityof these sets H l and Hu is tested by the CheckValidity functionby simply checking each coefficient if H l

i is less than or equalto Hu

i . If they are not valid, this function returns zero. In thiscase, Q is increased by one, H l and Hu are updated, and theCheckValidity function is applied again. Otherwise, the DFSmethod, that explores all possible values of each coefficientin between H l

i and Hui , is applied to find a set of filter

coefficients which respects the filter constraints and yields theminimum design complexity, or to prove that there exists no

such a set of filter coefficients. If the former condition occurs,sol is returned. If the latter condition occurs, Q is increased byone, H l and Hu are updated, and the DFS method is appliedagain. Hence, SIREN ensures that its solution is a set of fixed-point filter coefficients obtained using the smallest Q value.

Note that Q is an important parameter in the filter design.When Q increases, the bitwidths (sizes) of coefficients in-crease. Thus, such coefficients lead to larger sizes of registersand structural adders in the register-add block of the transposedform (Fig. 1b). Also, most probably, they lead to a largenumber of operations in the multiplier blocks of both forms(Fig. 1). Similar to Q, the solution quality of an FDO algorithmis evaluated by the effective wordlength (EWL) of a set ofcoefficients [22], [28], [29], computed as max{⌈log2|hi|⌉}with 0≤i≤N−1 when fixed-point coefficients are considered.

In the DFS method of SIREN, the search tree is constructedbased on the ordering of coefficients O, where a vertex atdepth d, Vd, denotes the filter coefficient whose index is thedth element of O, i.e., hO(d). An edge at depth d of the searchtree, i.e., a fanout of Vd, stands for an assignment to the vertexVd from [V l

d ,V ud ] where V l

d (V ud ) denotes the lower (upper)

bound of Vd. Note that the values of the vertex at depth d areassigned incrementally starting from V l

d to V ud .

When d is 1, the DFS method assigns H lO(1) and Hu

O(1) toV l1 and V u

1 , respectively and sets the value of the vertex V1

to V l1 . At any depth greater than 1, d>1, although the lower

and upper bounds of a vertex can be taken from H l and Hu,respectively, tighter lower and upper bounds can be computed,since the values of d−1 coefficients are determined and fixed.The lower bound of the vertex Vd is computed by solving thefollowing LP problem, where the non-determined coefficientsand s are the continuous variables of the LP problem.

minimize : f = hO(d)

sbj. to : s(1− δp) ≤ G(w)/2Q ≤ s(1 + δp), w ∈ [0, wp]− s(δs) ≤ G(w)/2Q ≤ s(δs), w ∈ [ws, π]

H li ≤ hi ≤ Hu

i , i ∈ [O(d), O(⌊M⌋+1)]sl ≤ s ≤ su

hO(1) . . . hO(d−1) : determined

In this LP problem, the lower and upper bounds of allnon-determined coefficients are taken from H l and Hu, re-spectively. The upper bound of Vd is computed when thecost function is changed to f=−hO(d). If there exist feasiblesolutions for both LP problems, this lower (upper) bound isrounded to the smallest following (the largest previous) integerand assigned to V l

d (V ud ). If V u

d ≥ V ld , they are determined to

be the lower and upper bounds of Vd. Whenever there is nofeasible lower or upper bound for Vd or V u

d < V ld , the search is

backtracked chronologically to the previous vertex until thereis a value to be assigned among its lower and upper bounds.

When the values of all coefficients are determined, i.e., theleaf at the final depth of the search tree is reached (whend is ⌊M⌋+1 for symmetric filters), the implementation costof the transposed form filter is computed as TA=MA+SA,where TA is the total number of operations in the filter, andMA and SA are the number of operations in the MCM blockand the number of structural adders in the register-add block,respectively (Fig. 1b). While MA is found using the exact

Page 6: Exact and Approximate Algorithms for the Filter Design ...algos.inesc-id.pt/~levent/publications/tsp14.pdf · 1 Exact and Approximate Algorithms for the Filter Design Optimization

6

��� �� �

��� �� � ��� �� �

�� � ��� �� �

�� � �

�� ��

�� ��

� ����

��

��

��� �

��� �

��� �

�� �

�� � � ��� � �� � � � � �

���� � � ��� � ����

� �� � �

� �� � � � �� � �� ����

�������

Fig. 6. Search tree formed by the DFS method.

MCM method [9], SA is computed based on the nonzerocoefficients. No adder is needed for a coefficient equal to 0 inthe register-add block. This coefficient set is stored in sol ifits TA value is smaller than that of the best one found so farwhich was set to infinity in the beginning of the DFS method.

To prune the search tree, the TA value is estimated whendepth d is greater than 2M/3 for symmetric filters. This valuewas chosen to be close to the bottom of the search tree notto waste an effort for computing an estimate that usually doesnot yield a backtrack. To compute this estimation, the lowerbound on MA is found using the determined coefficients [40].The lower bound on SA is found after all non-determinedcoefficients are set to a value. To do so, the upper and lowerbound interval of each non-determined coefficient is checkedif 0 is included. If so, this non-determined coefficient is set to0. Otherwise, it is assumed to be a constant different from 0.

The DFS method terminates when all possible values ofcoefficients have been explored. If sol is empty, it guaranteesthat there is no set of filter coefficients which can be selectedfrom their quantized lower and upper bounds respecting thefilter constraints. Otherwise, sol consists of fixed-point coef-ficients that lead to a filter with minimum design complexity,satisfying the filter constraints.

For our example, when Q is 5, the quantized lower andupper bounds of filter coefficients are H l = {-3, -2, 1, 1} andHu = {-1, -1, 13, 32}, respectively. Note that no solution wasfound with Q<5. The search tree constructed by the DFSmethod when Q is 5 is shown in Fig. 6, where a and bin [a b] given next to each vertex stand respectively for itslower and upper bounds which are dynamically computed ascoefficients are fixed. In this figure, the actual traverse of theDFS method on filter coefficients can be followed from top tobottom and from left to right. Also, Conflict denotes that givendetermined coefficients, there exists no feasible lower/upperbound for the current depth vertex. Pruned indicates that theset of determined coefficients cannot lead to a better solutionthan the best one found so far. Success presents that the set ofcoefficients leads to a solution satisfying the filter constraints.

Observe that as the values of coefficients are determined, theintervals between the lower and upper bounds of coefficientsare reduced when compared to those in the original H l andHu. If the DFS method was not equipped with techniques,that order the filter coefficients, determine the lower andupper bounds of coefficients dynamically, and prune the search

space, in the worst case, it would consider∏⌊M⌋

i=0 (Hui −H l

i+1)possible sets of coefficients for a symmetric filter, i.e., thenumber of leafs at the final depth of the search tree, in eachiteration of SIREN. For our example, this value is 2496 whenQ is 5. However, the DFS method ensures the minimumsolution with only 1 leaf at the final depth and 10 branches.

The performance of SIREN depends heavily on the mini-mum quantization Q value, the filter length N , and the exactMCM algorithm [9]. The Q value has an impact on the numberof runs of the DFS method, the lower and upper bounds ofcoefficients, the number of branches in the search tree, andthe sizes of coefficients which affect the performance of theexact MCM algorithm [9]. The N value has an effect on theperformance of the exact MCM algorithm and on the depth ofthe search tree. The performance of the exact MCM algorithmis related to the number and size of coefficients [9]. To increasethe performance of SIREN, a parallel version of the DFSmethod can be developed. Thus, the whole search space canbe divided into many small parts and they can be explored ina reasonable time simultaneously. However, the FDO problemis NP-complete [29], and hence, heuristics are indispensablefor filters with a large number of coefficients.

B. NAIAD: An Approximate FDO Algorithm

NAIAD was developed based on two observations: i) givenfilter specifications, finding a set of floating-point coeffi-cients, that satisfies the filter constraints, takes a polynomialtime; ii) given a set of coefficients, finding a multiplierlessdesign of coefficient multiplications including a number ofadders/subtractors very close to the minimum can be done ina reasonable time [3], [9]. Hence, NAIAD consists of twomain parts: i) exploring sets of coefficients that satisfy thefilter constraints and finding the ones with the smallest EWLvalue; ii) exploring the search area in the neighborhood ofeach solution obtained in the first part and finding the onethat leads to the minimum design complexity. In following,these two parts are described in detail using a symmetric FIRfilter with fspec (11, 0.2π, 0.5π, 0.05, 0.05) as an example.

1) Exploring Coefficients Satisfying Filter Constraints: Toexplore possible sets of coefficients, which satisfy the filterconstraints, in a systematic way, the variable ϵ is includedinto the left and right sides of filter constraints of Eq. 4. Tofind its lower bound, ϵl, the following LP problem is solved:

minimize : f = ϵsubject to : s(1−δp)−ϵ ≤ G(w) ≤ s(1+δp)+ϵ, w ∈ [0, wp]

s(−δs)−ϵ ≤ G(w) ≤ s(δs)+ϵ, w ∈ [ws, π]hl ≤ h ≤ hu

sl ≤ s ≤ su

where the filter coefficients h, the scale factor s, and ϵ are thecontinuous variables. The initial values of hl, hu, sl, and su

are -1, 1, 0.01, and 100, respectively. The upper bound of ϵ, ϵu,respecting the filter constraints of Eq. 4, is naturally equal to 0.We find the lower and upper bounds of each coefficient, hl

i andhui , when the cost function is f=hi and f=−hi, respectively,

and similarly, the lower and upper bounds of s.For our example, ϵl, sl and su are -0.086, 0.01, and

3.36, respectively. The lower and upper bounds of coefficients

Page 7: Exact and Approximate Algorithms for the Filter Design ...algos.inesc-id.pt/~levent/publications/tsp14.pdf · 1 Exact and Approximate Algorithms for the Filter Design Optimization

7

are hl={hl0, h

l1, h

l2, h

l3, h

l4, h

l5}= {-0.1694, -0.1945, -0.1174,

0.0009, 0.0024, 0.0029} and hu={hu0 , h

u1 , h

u2 , h

u3 , h

u4 , h

u5}=

{0.0215, -0.0001, 0.1177, 0.4842, 0.9149, 1}, respectively.Then, in an iterative loop, ϵl is increased in steps of |ϵl/r|

to 0, where r is a predetermined integer search parameter thatdenotes the number of samples to be taken from the searchspace. A set of floating-point filter coefficients is obtained bysolving the following LP problem.

s(1−δp)− ϵ ≤ G(w) ≤ s(1+δp) + ϵ, w ∈ [0, wp]s(−δs)− ϵ ≤ G(w) ≤ s(δs) + ϵ, w ∈ [ws, π]

hl ≤ h ≤ hu

sl ≤ s ≤ su

ϵl ≤ ϵ ≤ 0

(5)

Starting with a quantization value Q equal to 1, the foundcoefficients are converted to integers as round(hi2

Q) and arechecked if these fixed-point coefficients still satisfy the originalfilter constraints of Eq. 4 when they are converted to floating-point numbers as round(hi2

Q)/2Q. If not, Q is increasedby 1. Thus, the coefficients are converted to integers usingthe minimum Q considering the quantization error. The EWLvalues of these sets of fixed-point coefficients are computedand the ones with the minimum EWL value are stored in a set,called ISPset, that will include initial search points (ISPs).

Note that having a coefficient equal to 0 can significantlyreduce the filter design complexity (Fig. 1). Hence, for eachϵl value in the iterative loop, the LP problem of Eq. 5 is alsosolved when each coefficient hi with hl

i≤0 and hui ≥0 is set to

0 by simply assigning 0 to its lower and upper bounds in theLP problem. For our example, they are h0 and h2. Similarly,the sets of coefficients are quantized to integers using theminimum Q value, their EWL values are computed, and theones with the minimum EWL value are stored in ISPset.

Then, the cost of each element of ISPset is computed interms of TA as described in the previous subsection. To findthe MA value, rather than the exact MCM algorithm [9], anefficient MCM heuristic [3] is used. Among all these ISPs,the minimum value of TA is found and is denoted as the bestcost value found so far (BC). An ISP with the BC value isdetermined as the best solution (BS).

Note that r has a significant impact on the solution quality.While a small r value leads to a few possible solutions witha high EWL value, a large r value yields a large number ofpossible solutions with a small EWL value. In NAIAD, it wasset to 5000 which was found empirically. This means that thenumber of LP problems need to be solved in this first part is5001(b+1), where b denotes the number of filter coefficientsof which 0 is in between their lower and upper bounds.

For our example, ISPset has 3 elements, {-1, -2, 0, 5, 13,16}, {-1, -1, 0, 4, 9, 11}, and {-1, -2, -1, 5, 12, 15}, withTA values equal to 10, 10, and 13, respectively and an EWLvalue equal to 4. Thus, BS is the first element of ISPset andBC is 10. Note that the first and second elements of ISPset,whose h2 values equal to 0, have the smallest TA value.

2) Exploring Search Area Around Coefficients: A localsearch method (LSM) is applied to each element of ISPset(a set of coefficients denoted as ISP with its implementationcost value cost and quantization value Q). Its pseudo-code is

Algorithm 2 The local search method of NAIADLSM(fspec, ISP , cost, Q)

1: bc = cost, bs = ISP2: loop3: repeat4: w2repeat = 05: O = GenerateAnOrdering(⌊M⌋)6: for i = 1 to ⌊M⌋+ 1 do7: (hl

O(i), huO(i)) = FindLUB(O(i), fspec, ISP , Q)

8: for c = ⌈hlO(i)⌉ to ⌊hu

O(i)⌋ do9: if c = ISPO(i) then

10: NSP = ISP , NSPO(i) = c11: impcost = ComputeImpCost(NISP )12: if impcost < bc then13: w2repeat = 114: ISPO(i) = c, bs = ISP , bc = impcost15: until w2repeat = 016: if Terminating conditions are not met then17: NSP = ChangeCoefs(fspec, ISP , Q)18: if NSP = ISP then19: ISP = NSP20: impcost = ComputeImpCost(ISP )21: if impcost ≤ bc then22: bs = ISP, bc = impcost23: else24: return bs, bc25: else26: return bs, bc

given in Algorithm 2, where bc and bs are respectively thebest cost value in terms of TA and the best solution includingthe set of coefficients with bc. The bc and bs are initially set tocost and ISP , respectively. The LSM function aims to explorethe search area around ISP and to reduce the implementationcost of the filter design. To do that this function iterativelytakes a coefficient, changes its value between its lower andupper bounds, finds the implementation cost of the new filterdesign every time, and keeps the one with the minimum cost.

Since the traversing order of filter coefficients affects thesolution of the LSM function, in its iterative loop (lines 3-15), we determine an ordering of coefficients randomly usingthe GenerateAnOrdering function. Then, for a coefficient inthe given order, hO(i) with 1≤i≤⌊M⌋+1, using the FindLUBfunction, we compute its lower and upper bounds when thevalues of all coefficients except the O(i)th coefficient aredetermined as given in ISP . The following LP problem isgenerated to find the lower bound of hO(i), hl

O(i), where onlythe O(i)th coefficient and s are continuous variables.

minimize : f = hi

subject to : s(1−δp) ≤ G(w)/2Q ≤ s(1+δp), w ∈ [0, wp]s(−δs) ≤ G(w)/2Q ≤ s(δs), w ∈ [ws, π]

sl ≤ s ≤ su

hO(1), . . . , hO(i−1), hO(i+1), . . . , hO(⌊M⌋+1) : determined

The upper bound of hO(i), huO(i), is found by changing the

cost function to f=−hO(i). Then, for each possible fixed-point value of the O(i)th coefficient in between its lowerand upper bounds, i.e., c, other than its value in ISP , weassign it to the O(i)th coefficient of a new search pointNSP which was initially assigned to ISP . We find theimplementation cost of NSP in terms of TA, impcost, usingthe ComputeImpCost function as described in the previous

Page 8: Exact and Approximate Algorithms for the Filter Design ...algos.inesc-id.pt/~levent/publications/tsp14.pdf · 1 Exact and Approximate Algorithms for the Filter Design Optimization

8

{h0, h1, h2, h3, h4, h5} = {-1, -2, -1, 5, 12, 15} TA = 13Considered coefficient: h5, ⌈hl

5⌉ = 14, ⌊hu5⌋ = 16

{h0, h1, h2, h3, h4, h5} = {-1, -2, -1, 5, 12, 16 } TA = 12Considered coefficient: h2, ⌈hl

2⌉ = −1, ⌊hu2⌋ = 0

{h0, h1, h2, h3, h4, h5} = {-1, -2, 0 , 5, 12, 16} TA = 10Considered coefficient: h3, ⌈hl

3⌉ = 5, ⌊hu3⌋ = 6

{h0, h1, h2, h3, h4, h5} = {-1, -2, 0, 6 , 12, 16} TA = 9Fig. 7. Changing the values of filter coefficients in the LSM function. Themodified coefficients are shown in a box.

subsection. If impcost is less than the best one found so farbc, then the w2repeat variable is set to 1 and ISP , bs, and bcare updated. After all coefficients of ISP are traversed, if thew2repeat variable is 1, we iterate this procedure once more,but with a different ordering of coefficients. Otherwise, it isdecided that a local minima is reached. To escape from thislocal point, the ChangeCoefs function is applied, where -2, -1,0, 1, or 2 (determined randomly) is added to the value of eachcoefficient in ISP . Note that the ChangeCoefs function canchange the values of more than one coefficient simultaneously.If the new search point NSP satisfies the filter constraints,its implementation cost is computed and it is entered into theiterative loop again. Otherwise, the search is terminated. Thelocal search algorithm has also two terminating conditions:i) the number of iterations of the iterative loop in the infiniteloop (lines 2-26) is 30; and ii) the total number of runs of theMCM algorithm [3] is 40N .

If the LSM function returns a solution with an implemen-tation cost value bc better than BC, then BS and BC arereplaced with its outputs bs and bc, respectively.

For our example, suppose that the third element of ISPset,i.e., {-1, -2, -1, 5, 12, 15} with a TA value 13, is given to thelocal search method. Fig. 7 shows how a solution with a betterTA value is obtained by changing the values of coefficients.At the end of the assignments given in Fig. 7, a solution witha TA value equal to 9, i.e., {-1, -2, 0, 6, 12, 16}, is found.We note that this is the minimum solution that NAIAD couldfind. On this example, SIREN also finds a solution with TAand EWL values equal to 9 and 4, respectively.

C. Further Modifications in SIREN and NAIAD

To realize the MCM block of the transposed form with theminimum number of adder-steps, in SIREN and NAIAD, werespectively used the modified versions of the approximatealgorithms of [9] and [3] that can handle the delay constraint.Whenever a set of fixed-point filter coefficients is determinedin SIREN and NAIAD, the minimum adder-steps of coeffi-cients is computed as given in Section II-B1 and it is given tothe algorithms of [9] and [3] as a delay constraint.

In order to target the direct form of the FIR filter, in SIRENand NAIAD, ECHO-A [15] is used to compute the smallestnumber of operations in the CAVM block and ECHO-D [15]is used for the design of the CAVM block with a small numberof adder-steps. Note that in direct form filters, the total numberof operations in the filter, i.e., TA, is determined by the solutionof ECHO-A or ECHO-D on the set of filter coefficients.

The proposed methods can target different filter constraints.For example, when the lower and upper bounds of s, sl and su,

in Eq. 4 are set to 1, the filter constraints of Eq. 3 are aimed.Setting sl and su respectively to 0.7 and 1.4 correspondsto the ±3dB gain tolerance in the filter design [30]. Theproposed algorithms can also target asymmetric filters takinginto account the related filter constraints [36].

The proposed algorithms can target the optimization of thegate-level area of the filter design rather than minimizingthe number of operations. In this case, whenever a set ofcoefficients is found, an algorithm [10], [15], that can find theshift-adds design of the multiplier block of the filter occupyingminimum area, should be used. In the transposed form filter,the size of registers and adders in the register-add block shouldalso be considered.

IV. EXPERIMENTAL RESULTS

This section is divided in two subsections. In the first sub-section, we explore the effectiveness of SIREN and NAIAD,comparing their results with those of prominent FDO algo-rithms and a straightforward filter design technique (SFDT).In the second subsection, we explore the impact of filter designparameters, such as filter length, quantization value, and filterform, on the filter design complexity, presenting the gate-levelresults of filter designs obtained based on the solutions ofSIREN, NAIAD, and the algorithm of [29]. Note that SIRENand NAIAD were written in MATLAB, used lp solve 5.5.2.0as an LP solver, and were run on a PC with Intel Xeon at2.33GHz under Linux. The filter designs were described inVHDL and synthesized using the Synopsys Design Compilerwith the UMCLogic 180nm Generic II library when thebitwidth of the filter input bwi was 16. In the synthesis script,relaxed timing constraints were used in order to provide morefreedom to the synthesis tool to optimize area. The func-tionality of filters was verified on 10,000 randomly generatedinput signals in simulation, from which the switching activityinformation, that was used by the synthesis tool to computethe power dissipation, was obtained. Note also that we did notutilize any truncation method [41], which is generally usedto reduce the complexity of the filter design sacrificing theaccuracy of the result, neither on the filter output nor on anyregister/adder. Thus, the sizes of filter input and coefficientshave a direct impact on the complexity of the filter design.

Unless stated otherwise, it should be accepted that the re-sults of SIREN and NAIAD were found when they targeted theconstraints in Eq. 4, the transposed form, and the minimizationof the number of operations without a delay constraint.

A. Comparisons on FDO Algorithms

Table I shows the specifications of 10 symmetric FIR filterswhich are commonly used in evaluation of FDO algorithms.Table II presents the results of SIREN, NAIAD, and otheralgorithms whose results were taken from [22], [24], [29] asreported. In this table, BST and TT denote respectively theCPU time required to find the best solution and the total CPUtime. For each filter, the FDO methods were sorted accordingto their results on i) EWL, ii) TA, iii) TT, and iv) BST indescending order. Note that the CPU time limit for SIRENand NAIAD was 2 days and 4 hours, respectively.

Page 9: Exact and Approximate Algorithms for the Filter Design ...algos.inesc-id.pt/~levent/publications/tsp14.pdf · 1 Exact and Approximate Algorithms for the Filter Design Optimization

9

TABLE ISPECIFICATIONS OF SYMMETRIC FIR FILTERS.

Filter Type N wp ws δp δs

X1 Low-pass 15 0.2π 0.8π 0.0001 0.0001G1 Low-pass 16 0.2π 0.5π 0.01 0.01Y1 Low-pass 30 0.3π 0.5π 0.00316 0.00316Y2 Low-pass 38 0.3π 0.5π 0.001 0.001A Low-pass 59 0.125π 0.225π 0.01 0.001S2 Low-pass 60 0.042π 0.14π 0.012 0.001L2 Low-pass 63 0.2π 0.28π 0.028 0.001B Low-pass 105 0.2π 0.24π 0.01 0.01L1 High-pass 121 0.8π 0.74π 0.0057 0.0001C Low-pass 325 0.125π 0.14π 0.005 0.005

TABLE IISUMMARY OF FDO ALGORITHMS ON FIR FILTERS OF TABLE I.

Filter Method EWL MA SA TA BST TT

X1

[27] 13 7 8 15 – –NAIAD 10 5 8 13 1m3s 1m3sSIREN 10 5 8 13 <1s 2s

[29] 10 5 8 13 – <1s

G1

[26] 7 2 13 15 – –NAIAD 6 3 15 18 50s 50s

[29] 6 2 15 17 – <1sSIREN 6 2 15 17 <1s <1s

Y1

[28] 10 6 23 29 – 21m30sNAIAD 9 7 23 30 5m55s 6m3s

[29] 9 7 23 30 – 6sSIREN 9 6 23 29 2m17s 7m56s

Y2

[25] 12 – – 39 – –NAIAD 11 9 29 38 15m21s 19m18s

[29] 10 10 37 47 – 11sSIREN 10 9 29 38 3m52s 4m29s

A

[22] 10 18 58 76 3h2m 4h14mNAIAD 10 16 56 72 43m34s 44m13s

[29] 10 14 54 68 – 2d2hSIREN 10 16 52 68 14h57m 2d

S2

[22] 11 27 59 86 23m 27m[29] 10 17 59 76 – 16h42m

NAIAD 10 15 57 72 49m19s 1h10sSIREN 9 14 57 71 16h4m 2d

L2

NAIAD 11 17 60 77 55m1s 1h4m[22] 10 18 62 80 26m 54m

SIREN 10 16 60 76 1d23h 2d[29] 10 17 56 73 – 16h28m

B[24] 9 10 99 109 <1s 1m7s[22] 8 11 100 111 9m 1d

NAIAD 8 13 96 109 2h2m 2h2m

L1[19] 15 59 120 179 – –

NAIAD 15 40 116 156 2h18m 3h48m[22] 14 47 120 167 32m 56m

C[20] 11 43 322 365 20h46m 1d[22] 10 22 306 328 13h47m 1d

NAIAD 10 25 286 311 2h51m 4h

On filters including less than 40 coefficients, X1, G1,Y1, and Y2, SIREN finds a solution with the minimumnumber of operations and minimum EWL value using littlecomputational resources. NAIAD obtains solutions very closeto the minimum in terms of EWL and TA. However, it requiresmore CPU time than SIREN, especially on filter Y2, for which,many ISPs were considered. This experiment indicates that thepreviously proposed FDO algorithms obtain solutions with thesame values of EWL and TA as those of SIREN or very closeto them. On filters including around 60 coefficients, A, S2, andL2, although SIREN cannot ensure the minimum TA valueunder the minimum quantization value due to its CPU timelimit, it can find a better solution than other FDO algorithms onfilter S2. Also, NAIAD can complete the search in a reasonabletime and obtain better solutions than the algorithm of [22],which was also applied to filters including more than 100

0 0.2 0.4 0.6 0.8 1−140

−120

−100

−80

−60

−40

−20

0

20

Mag

nitu

de (

dB)

Normalized Frequency

SIRENNAIAD

Fig. 8. Zero-phase frequency responses of the filter A of Table I.

coefficients, except on filter L2 where NAIAD finds a solutionwith a higher EWL value. SIREN was not applied to filtersincluding more than 100 coefficients, B, L1, and C. Instead,NAIAD finds solutions with equal or less number of operationsthan other FDO methods and obtains a solution with a higherEWL value only on filter L1.

Fig. 8 presents the zero-phase frequency responses of filterA of Table I based on the coefficients determined by SIRENand NAIAD. Observe from Fig. 8 and Table II that whilethe solutions of both algorithms satisfy the filter constraints,SIREN needs fewer operations than NAIAD.

To further analyze the proposed algorithms, we used ran-domly generated 21 symmetric low-pass FIR filters whose Nvalues range between 20 and 40. Fig. 9 presents the results ofSIREN, NAIAD, and SFDT in terms of EWL, TA, and CPUtime in seconds. In SFDT, given fspec, the filter coefficientswere computed using the firgr function of MATLAB and werequantized to integers with the minimum Q value determinedincrementally. Then, the minimum number of operations in theMCM block of the filter was found by the exact algorithm [12].

Observe from Fig. 9a that SIREN finds a set of coefficientswith the smallest EWL value and the solutions of NAIADhave EWL values equal or very close to those of SIREN. Onthese instances, SFDT obtains solutions with the largest EWLvalues. Observe from Fig. 9b that SIREN finds a solution withthe smallest number of operations, except on filters with 25,28, 31, and 34 coefficients. Recall that SIREN guarantees asolution with the minimum number of operations under theminimum Q value. Hence, on these instances using a higherQ value than the minimum, NAIAD can find a solution withfewer operations than SIREN, but with a higher EWL value.Note that NAIAD obtains solutions in terms of TA very closeto SIREN, i.e., 0.95 more adders/subtractors on average. Onthese instances, SFDT yields sets of coefficients requiring thelargest number of operations. Observe from Fig. 9c that as Nincreases, the run-time of SIREN increases dramatically andthe run-time of NAIAD increases slightly. On these instances,SFDT uses a little computational effort to find a solution.

Moreover, Fig. 10 presents the results of NAIAD,FIRGAM [22], and SFDT on randomly generated 20 low-passsymmetric FIR filters, where the N value ranges between 41and 60. In FIRGAM, the objective was to minimize the number

Page 10: Exact and Approximate Algorithms for the Filter Design ...algos.inesc-id.pt/~levent/publications/tsp14.pdf · 1 Exact and Approximate Algorithms for the Filter Design Optimization

10

20 25 30 35 405

6

7

8

9

10

11

12

13

14

Filter length (N)

Effe

ctiv

e w

ordl

engt

h (E

WL)

SFDTNAIADSIREN

20 25 30 35 40

20

25

30

35

40

45

50

55

Filter length (N)

Tot

al n

umbe

r of

ope

ratio

ns (

TA

)

SFDTNAIADSIREN

20 25 30 35 40

100

102

104

Filter length (N)

CP

U ti

me

(s)

SFDTNAIADSIREN

(a) (b) (c)Fig. 9. Results of SIREN, NAIAD, and SFDT on randomly generated filters: (a) EWL; (b) TA; (c) CPU time in log scale.

40 45 50 55 606

7

8

9

10

11

12

13

14

Filter length (N)

Effe

ctiv

e w

ordl

engt

h (E

WL)

SFDTFIRGAMNAIAD

40 45 50 55 6045

50

55

60

65

70

75

80

85

Filter length (N)

Tot

al n

umbe

r of

ope

ratio

ns (

TA

)SFDTFIRGAMNAIAD

40 45 50 55 60

10−1

100

101

102

103

104

Filter length (N)

CP

U ti

me

(s)

SFDTFIRGAMNAIAD

(a) (b) (c)Fig. 10. Results of NAIAD, FIRGAM [22], and SFDT on randomly generated filters: (a) EWL; (b) TA; (c) CPU time in log scale.

100 120 140 160 180 200 220

80

100

120

140

160

180

200

Filter length (N)

Tot

al n

umbe

r of

ope

ratio

ns (

TA

)

POTxNAIAD

Fig. 11. TA values found by our version of POTx and NAIAD.

of nonzero digits in the coefficients. FIRGAM was run on aPC with Intel Core i5-2410M at 2.3GHz under Windows 7.

Observe from Fig. 10 that while the solutions of SFDT havehigher EWL values than NAIAD and FIRGAM, the solutionsof NAIAD and FIRGAM have 8.75 and 8.5 EWL values onaverage, respectively. NAIAD finds a solution requiring lessnumber of total operations than both FIRGAM and SFDT. Wenote that while the difference of average TA values betweenthe solutions of FIRGAM and NAIAD is 4.45, this valuebetween the solutions of SFDT and NAIAD is 11.5. However,SFDT obtains a solution using the least CPU time and NAIADrequires less CPU time than FIRGAM on average.

To explore the effectiveness of the local search methodof NAIAD, we developed the local search method of theFDO algorithm POTx [24]. Given a set of fixed-point filtercoefficients, it aims to reduce the total number of power-of-two (POT) terms in coefficients, still satisfying the filter

100 120 140 160 180 200 2200

2000

4000

6000

8000

10000

Filter length (N)

CP

U ti

me

(s)

Local search of POTxLocal search of NAIAD

Fig. 12. CPU time of the local search methods in POTx and NAIAD.

constraints. Thus, in our version of POTx, we first find thepossible sets of fixed-point coefficients with the minimumEWL value as done in the first part of NAIAD. Then, weapply the local search method of POTx to only one set ofcoefficients. Finally, we apply the MCM method [3] to thefinal set of coefficients to maximize the sharing of partialproducts in the MCM block of the filter as done in [24].In this experiment, we used randomly generated 75 low-passsymmetric FIR filters whose filter lengths range between 103and 214. Fig. 11 and 12 present the total number of operationsin the filter designs obtained by our version of POTx andNAIAD and the CPU time of the local search methods inPOTx and NAIAD, respectively.

Observe from Fig. 11 that NAIAD always yields FIR filterdesigns with the same or less number of operations than ourversion of POTx. On average, it leads to filters includingless than 15.3 operations than our version of POTx. Observe

Page 11: Exact and Approximate Algorithms for the Filter Design ...algos.inesc-id.pt/~levent/publications/tsp14.pdf · 1 Exact and Approximate Algorithms for the Filter Design Optimization

11

TABLE IIISUMMARY OF RESULTS OF SIREN ON FILTER Y1 OF TABLE I.

N Q MA SA TA CA NCA A D P28 9 9 21 30 18.2 11.4 29.6 6.4 3.229 9 7 24 31 18.2 11.6 29.7 6.1 3.130 9 6 23 29 17.5 12.1 29.6 5.7 3.231 9 7 24 31 18.2 12.0 30.2 6.0 3.132 8 7 29 36 21.1 12.5 33.5 6.3 3.5

TABLE IVSUMMARY OF RESULTS OF SIREN ON FILTER Y2 OF TABLE I.

N Q MA SA TA CA NCA A D P34 11 10 27 37 23.2 15.1 38.3 6.8 4.235 11 12 28 40 25.0 15.0 40.0 7.4 4.436 11 10 27 37 23.3 15.3 38.6 7.1 3.937 10 12 36 48 30.0 15.6 45.6 7.4 4.638 10 9 29 38 23.5 16.1 39.6 6.6 4.439 10 11 32 43 26.4 16.3 42.7 7.1 4.4

from Fig. 12 that the local search method of POTx generallyrequires less CPU time than the local search method ofNAIAD. These two observations are due to the fact that thelocal search method of NAIAD may be applied to more thanone possible set of fixed-point filter coefficients and targetsthe optimization of the number of operations.

B. Explorations on Filter Design Parameters

To explore the impact of the filter length N on the filtercomplexity, we consider the filters Y1 and Y2 of Table I.Table III presents the results of SIREN on filter Y1 when Nis in between 28 and 32. Table IV shows the results of SIRENon filter Y2 when N is in between 34 and 39. Note that theminimum value of N , i.e., 28 for filter Y1 and 34 for filter Y2,was found by decrementing N by 1 at each time. These tablesalso present the gate-level results of filter designs, where CA,NCA, and A denote the combinational area, non-combinationalarea, and total area, all in mm2, respectively, D stands for thecritical path delay in ns, and P is the total dynamic powerdissipation in mW . Note that the EWL value of each set ofcoefficients is the same as the Q value and the number ofcoefficients equal to 0 can be computed as N−SA−1.

Observe from Tables III and IV that as N increases, theQ value decreases, reducing the bitwidth of filter coefficients,which can also yield reductions on the size of structural addersand on the number of operations in the MCM block (althoughthat also depends on the filter coefficients). However, thenumber of registers and structural adders is increased in thiscase. Hence, due to this tradeoff and because the filter designcomplexity in terms of TA is also related to the explored searchspace (the search space of the FDO problem differs underdifferent N values), there is no clear evidence that a smallerN value always leads to a filter design requiring the smallestnumber of operations or occupying the smallest area. Thisobservation is also true for delay and power dissipation of thefilter design. Thus, it is useful to design a filter with differentN values and choose the one that fits best in an application.

To explore the impact of the quantization value Q on thefilter complexity, Table V presents the results of SIREN on thefilter G1 of Table I when Q is 6 and 7 and on the filter Y1 ofTable I when Q is 9 and 10. Note that the EWL value of eachset of coefficients is the same as the Q value and the numberof coefficients equal to 0 can be computed as N−SA−1.

TABLE VSUMMARY OF RESULTS OF SIREN ON FILTERS G1 AND Y1 OF TABLE I.Filter Q MA SA TA CA NCA A D P

G1 6 2 15 17 9.1 5.6 14.7 4.9 1.57 2 13 15 8.1 5.9 14.0 5.4 1.5

Y1 9 6 23 29 17.5 12.1 29.6 5.7 3.210 6 23 29 18.1 12.8 30.9 6.9 3.3

The results on filter G1 clearly indicate that a filter withcoefficients having a smaller EWL value does not alwaysyield a filter design occupying smaller area. In this case, itis because of fewer structural adders in the register-add block,that reduces the CA value, even though as EWL increases,the size of registers increases, increasing the NCA values.However, the results on filter Y1, where the total number ofoperations is the same in filters with Q values 9 and 10, showthat an increase in Q increases the bitwidths of coefficients,increasing the size of adders/subtractors and registers. This canbe observed from the CA and NCA values. In this case, thedelay and power dissipation of the filter design also increase.

To explore the impact of the filter form, the number ofadder-steps in the multiplier block of the filter, and the solutionquality of an FDO algorithm on the filter design complexity,Table VI presents the solutions of the algorithm of [29],NAIAD, and SIREN on the filters A and S2 of Table I. Notethat the solutions of the algorithm [29] (sets of fixed-pointfilter coefficients) were taken from [29] and the FIR filterswere designed after the shift-adds realization of the multiplierblock is found using the algorithms which were also used inSIREN. In this table, AS denotes the number of adder-steps inthe CAVM and MCM block of the direct and transposed formFIR filter, respectively. For each FDO algorithm, the CAVMand MCM blocks were designed under two objectives. Underthe oper objective, the FDO algorithms target the optimizationof the number of operations without a restriction on thenumber of adder-steps. Under the step objective, they targetthe optimization of the number of operations considering adelay constraint as described in Sections II-B1 and II-B2.

First, consider the results of FDO algorithms on the directand transposed forms. Observe that while the total number ofoperations in both forms is equal or very close to each other,the number of adder-steps in the CAVM block of the directform is larger than that of the MCM block of the transposedform. The direct form filters occupy significantly less areathan the transposed form, which is primarily because of twofacts: i) while the size of registers is equal to the bitwidth ofthe filter input bwi in the direct form, it is increasing up tothe bitwidth of the filter output in the transposed form. Thisfact can be observed from the NCA results; ii) the CAVMalgorithm [15] can exploit the common subexpressions in Eq. 1due to the symmetric coefficients using ⌊N/2⌋ adders with asize of bwi+1, which is 17 in our experiment. This fact can beobserved from the CA results. The transposed form filters haveless delay compared to the direct form filters because of feweradder-steps in the multiplier block. But, they consume morepower which is primarily related to the area of the design.

Second, consider the results of filters obtained with andwithout a restriction on the number of adder-steps in themultiplier blocks. Observe that reducing the number of adder-steps generally decreases the delay of the design, where the

Page 12: Exact and Approximate Algorithms for the Filter Design ...algos.inesc-id.pt/~levent/publications/tsp14.pdf · 1 Exact and Approximate Algorithms for the Filter Design Optimization

12

TABLE VISUMMARY OF RESULTS OF FDO ALGORITHMS ON FILTERS A AND S2 OF TABLE I.

Filter Algorithm Objective EWL Direct Form Transposed FormTA AS CA NCA A D P TA AS CA NCA A D P

A

NAIAD oper 10 72 12 36.1 15.8 51.9 8.7 2.3 72 3 45.4 26.7 72.1 7.2 7.3step 72 8 36.8 15.8 52.6 8.3 2.4 72 3 45.4 26.7 72.1 7.2 7.3

[29] oper 10 68 11 34.5 15.8 50.3 9.2 2.3 68 4 43.1 26.6 69.7 8.0 7.1step 69 8 34.8 15.8 50.6 8.4 2.3 69 3 43.4 26.6 70.0 7.5 7.5

SIREN oper 10 68 12 35.5 15.8 51.3 9.0 2.2 68 4 43.3 26.1 69.4 7.8 6.9step 69 8 35.8 15.8 51.6 8.8 2.2 69 3 45.0 26.2 71.2 7.3 7.8

S2

[29] oper 10 76 12 38.7 16.0 54.7 9.2 2.4 76 3 48.8 27.5 76.3 8.1 7.4step 76 9 38.7 16.0 54.7 8.7 2.3 76 3 48.8 27.5 76.3 8.1 7.4

NAIAD oper 10 72 12 36.0 16.0 52.0 8.7 2.4 72 4 46.5 27.5 74.0 7.8 7.6step 73 9 37.0 16.0 53.0 8.6 2.4 73 3 46.7 27.5 74.2 7.8 7.3

SIREN oper 9 71 12 36.0 16.0 52.0 9.2 2.3 71 4 44.5 26.4 70.9 7.3 7.9step 72 8 36.2 16.0 52.2 8.3 2.3 73 2 45.9 26.4 72.3 6.7 7.7

maximum gain is obtained as 9.7% on the direct form filter S2obtained by SIREN. However, a restriction on the number ofadder-steps may increase the total number of operations, andconsequently, the area of the design. Note that the reductionof the number of adder-steps increases the clock frequencyof the filter design and can also reduce the complexity of thepipelined realization of the filter as shown in [13].

Third, consider the results of FDO algorithms. Observethat although both the algorithm of [29] and SIREN obtaina solution with the same TA and EWL values on filter A,the filter design obtained by the solution of SIREN occupieslarger area than that realized based on the solution of thealgorithm of [29], expect on the transposed form under theoper objective. This example indicates that there may existmany solutions to the FDO problem with the same TA andEWL values, but yielding filter designs with different gate-level area. Also, the solution of NAIAD on filter A leads toa filter design occupying the largest area, since its solutionincludes the largest number of operations. In turn, the solutionsof SIREN on filter S2 lead to the least complex filter designs,since its solutions have less TA and EWL values than thoseof the algorithm of [29] and have less TA values than thoseof NAIAD. Also, since the solutions of NAIAD include lessnumber of operations than those of the algorithm of [29] onfilter S2, filters designed based on the solutions of NAIADhave less complexity than those obtained by the solutions ofthe algorithm of [29].

V. CONCLUSIONS

This article addressed the problem of optimizing the numberof operations in the FIR filter design while satisfying the filterconstraints, generally known as the FDO problem. It presentedexact and approximate FDO algorithms, all of which areequipped with efficient methods to find the fewest operationsin the shift-adds design of the coefficient multiplications.Moreover, it showed how these algorithms can be modified totarget different filter constraints and filter forms and to handlea delay constraint in the multiplier blocks of filters. It wasobserved that the exact FDO algorithm can handle filters witha small number of coefficients, on which approximate FDOalgorithms can find solutions very close to the minimum. Itwas also shown that heuristic methods are indispensable forfilters with a large number of coefficients, on which the pro-posed approximate algorithm can find better solutions in termsof the number of operations than prominent FDO algorithms.

It was indicated that the total number of operations, EWLvalue, filter length, quantization value, and filter form havea significant impact on the gate-level area, delay, and powerdissipation results of filter designs.

VI. ACKNOWLEDGMENT

The authors would like to thank Prof. Arda Yurdakul forproviding the source code of their algorithm FIRGAM [22].

REFERENCES

[1] L. Wanhammar, DSP Integrated Circuits. Academic Press, 1999.[2] H. Nguyen and A. Chatterjee, “Number-Splitting with Shift-and-Add

Decomposition for Power and Hardware Optimization in Linear DSPSynthesis,” IEEE Tran. on VLSI, vol. 8, no. 4, pp. 419–424, 2000.

[3] Y. Voronenko and M. Puschel, “Multiplierless Multiple Constant Mul-tiplication,” ACM Tran. on Algorithms, vol. 3, no. 2, 2007.

[4] P. Cappello and K. Steiglitz, “Some Complexity Issues in Digital SignalProcessing,” IEEE Tran. on Acoustics, Speech, and Signal Processing,vol. 32, no. 5, pp. 1037–1041, 1984.

[5] R. Hartley, “Subexpression Sharing in Filters Using Canonic SignedDigit Multipliers,” IEEE Tran. on Circuits and Systems II, vol. 43, no. 10,pp. 677–688, 1996.

[6] I.-C. Park and H.-J. Kang, “Digital Filter Synthesis Based on MinimalSigned Digit Representation,” in Proc. of Design Automation Confer-ence, 2001, pp. 468–473.

[7] L. Aksoy, E. Costa, P. Flores, and J. Monteiro, “Exact and ApproximateAlgorithms for the Optimization of Area and Delay in Multiple ConstantMultiplications,” IEEE Tran. on Computer-Aided Design of IntegratedCircuits, vol. 27, no. 6, pp. 1013–1026, 2008.

[8] A. Dempster and M. Macleod, “Use of Minimum-Adder MultiplierBlocks in FIR Digital Filters,” IEEE Tran. on Circuits and Systems II,vol. 42, no. 9, pp. 569–577, 1995.

[9] L. Aksoy, E. Gunes, and P. Flores, “Search Algorithms for the MultipleConstant Multiplications Problem: Exact and Approximate,” ElsevierJournal on Microprocessors and Microsystems, vol. 34, no. 5, pp. 151–162, 2010.

[10] K. Johansson, O. Gustafsson, and L. Wanhammar, “A Detailed Com-plexity Model for Multiple Constant Multiplication and an Algorithm toMinimize the Complexity,” in Proc. of IEEE European Conference onCircuit Theory and Design, 2005, pp. 465–468.

[11] H.-J. Kang and I.-C. Park, “FIR Filter Synthesis Algorithms for Mini-mizing the Delay and the Number of Adders,” IEEE Tran. on Circuitsand Systems II, vol. 48, no. 8, pp. 770–777, 2001.

[12] L. Aksoy, E. Costa, P. Flores, and J. Monteiro, “Finding the OptimalTradeoff Between Area and Delay in Multiple Constant Multiplications,”Elsevier Journal on Microprocessors and Microsystems, vol. 35, no. 8,pp. 729–741, 2011.

[13] M. Kumm, P. Zipf, M. Faust, and C.-H. Chang, “Pipelined Adder GraphOptimization for High Speed Multiple Constant Multiplication,” in Proc.of IEEE International Symposium on Circuits and Systems, 2012, pp.49–52.

[14] K. Johansson, O. Gustafsson, L. DeBrunner, and L. Wanhammar,“Minimum Adder Depth Multiple Constant Multiplication Algorithmfor Low Power FIR Filters,” in Proc. of IEEE International Symposiumon Circuits and Systems, 2011, pp. 1439–1442.

Page 13: Exact and Approximate Algorithms for the Filter Design ...algos.inesc-id.pt/~levent/publications/tsp14.pdf · 1 Exact and Approximate Algorithms for the Filter Design Optimization

13

[15] L. Aksoy, P. Flores, and J. Monteiro, “ECHO: A Novel Method forthe Multiplierless Design of Constant Array Vector Multiplication,” inProc. of IEEE International Symposium on Circuits and Systems, 2014,pp. 1456–1459.

[16] B. Y. Kong and I.-C. Park, “FIR Filter Synthesis Based on InterleavedProcessing of Coefficient Generation and Multiplier-Block Synthesis,”IEEE Trans. on Computer-Aided Design of Integrated Circuits andSystems, vol. 31, no. 8, pp. 1169–1179, 2012.

[17] Y. C. Lim and S. R. Parker, “FIR Filter Design Over a Discrete Powers-of-Two Coefficient Space,” IEEE Tran. on Acoustics, Speech, and SignalProcessing, vol. 31, no. 3, pp. 583–591, 1983.

[18] H. Samueli, “An Improved Search Algorithm for the Design of Mul-tiplierless FIR Filters with Power-of-Two Coefficients,” IEEE Tran. onCircuits and Systems, vol. 36, no. 7, pp. 1044–1047, 1989.

[19] C.-Y. Yao and C.-J. Chien, “A Partial MILP Algorithm for the Designof Linear Phase FIR Filters with SPT Coefficients,” IEICE Tran. onFundamentals of Electronics, Communications and Computer Sciences,vol. E85-A, no. 10, pp. 2302–2310, 2002.

[20] C.-L. Chen and A. N. Willson Jr., “A Trellis Search Algorithm for theDesign of FIR Filters with Signed-Powers-of-Two Coefficients,” IEEETran. on Circuits and Systems II, vol. 46, no. 1, pp. 29–39, 1999.

[21] O. Gustafsson, H. Johansson, and L. Wanhammar, “An MILP Approachfor the Design of Linear-Phase FIR Filters with Minimum Numberof Signed-Power-of-Two Terms,” in Proc. of European Confrence onCircuit Theory and Design, 2001, pp. 217–220.

[22] M. Aktan, A. Yurdakul, and G. Dundar, “An Algorithm for the Designof Low-Power Hardware-Efficient FIR Filters,” IEEE Tran. on Circuitsand Systems, vol. 55, no. 6, pp. 1536–1545, 2008.

[23] N. Takahashi and K. Suyama, “Design of CSD Coefficient FIR Fil-ters Based on Branch and Bound Method,” in Proc. of InternationalSymposium Communications and Information Technologies, 2010, pp.575–578.

[24] A. Shahein, Q. Zhang, N. Lotze, and Y. Manoli, “A Novel Hybrid Mono-tonic Local Search Algorithm for FIR Filter Coefficients Optimization,”IEEE Tran. Circuits and Systems I: Regular Papers, vol. 59, no. 3, pp.616–627, 2012.

[25] J. Yli-Kaakinen and T. Saramaki, “A Systematic Algorithm for theDesign of Multiplierless FIR Filters,” in Proc. of IEEE InternationalSymposium on Circuits and Systems, 2001, pp. 185–188.

[26] O. Gustafsson and L. Wanhammar, “Design of Linear-Phase FIR FiltersCombining Subexpression Sharing with MILP,” in Proc. of MidwestSymposium on Circuits and Systems, 2002, pp. 9–12.

[27] F. Xu, C. H. Chang, and C. C. Jong, “Design of Low-ComplexityFIR Filters Based on Signed-Powers-of-Two Coefficients with ReusableCommon Subexpressions,” IEEE Tran. on Computer-Aided Design ofIntegrated Circuits, vol. 26, no. 10, pp. 1898–1907, 2007.

[28] Y. J. Yu and Y. C. Lim, “Optimization of Linear Phase FIR Filters inDynamically Expanding Subexpression Space,” Circuits, Systems, andSignal Processing, vol. 29, no. 1, pp. 65–80, 2010.

[29] D. Shi and Y. J. Yu, “Design of Linear Phase FIR Filters with HighProbability of Achieving Minimum Number of Adders,” IEEE Tran. onCircuits and Systems, vol. 58, no. 1, pp. 126–136, 2011.

[30] L. Aksoy, P. Flores, and J. Monteiro, “SIREN: A Depth-First SearchMethod for the Filter Design Optimization Problem,” in Proc. of GreatLakes Symposium on VLSI, 2013, pp. 179–184.

[31] ——, “A Tutorial on Multiplierless Design of FIR Filters: Algorithmsand Architectures,” Circuits, Systems and Signal Processing, vol. 33,no. 6, pp. 1689–1719, 2014.

[32] G. Dantzig, Linear Programming and Extensions. Princeton UniversityPress, 1963.

[33] N. Karmarkar, “A New Polynomial-Time Algorithm for Linear Program-ming,” Combinatorica, vol. 4, no. 4, pp. 373–395, 1984.

[34] R. Vanderbei, Linear Programming: Foundations and Extensions.Springer, 2001.

[35] M. Ercegovac and T. Lang, Digital Arithmetic. Morgan Kaufmann,2003.

[36] Y. C. Lim, “Design of Discrete-Coefficient-Value Linear Phase FIRFilters with Optimum Normalized Peak Ripple Magnitude,” IEEE Tran.on Circuits and Systems, vol. 37, no. 12, pp. 1480–1486, 1990.

[37] T. Parks and C. Burrus, Digital Filter Design. John Wiley & Sons,1987.

[38] J. McClellan, T. Parks, and L. Rabiner, “A Computer Program forDesigning Optimum FIR Linear Phase Digital Filters,” IEEE Tran. onAudio Electroacoustics, vol. 21, no. 6, pp. 506–526, 1973.

[39] L. Rabiner, “Linear Program Design of FIR Digital Filters,” IEEE Tran.on Audio Electroacoustics, vol. 20, no. 4, pp. 280–288, 1972.

[40] O. Gustafsson, “Lower Bounds for Constant Multiplication Problems,”IEEE Tran. on Circuits and Systems II, vol. 54, no. 11, pp. 974–978,2007.

[41] M. Faust and C.-H. Chang, “Low Error Bit Width Reduction forStructural Adders of FIR Filters,” in Proc. of IEEE European Conferenceon Circuit Theory and Design, 2011, pp. 713–716.

Levent Aksoy received the M.S. and Ph.D. degreesin electronics and communication engineering andelectronics engineering from Istanbul Technical Uni-versity (ITU), Istanbul, Turkey, in 2003 and 2009,respectively. He was a Research Assistant with theFaculty of Electrical and Electronics Engineering,ITU from 2001 to 2009. Since November 2009, hehas been with the Instituto de Engenharia de Sis-temas e Computadores (INESC-ID), Lisbon, wherehe is currently a post-doctoral researcher in Algo-rithms for Optimization and Simulation (ALGOS)

group. His research interests include optimization algorithms, high-levelsynthesis, and DSP system design.

Paulo Flores received the five-year engineering de-gree, M.Sc., and Ph.D. degrees in electrical and com-puter engineering from Instituto Superior Tecnico(IST), Technical University of Lisbon, Portugal, in1989, 1993, and 2001, respectively. Since 1990,he has been teaching at Instituto Superior Tecnico,University of Lisbon, where he is currently an As-sistant Professor in the Department of Electrical andComputer Engineering. He has also been with theInstituto de Engenharia de Sistemas e Computadores(INESC-ID), Lisbon, since 1988, where he is cur-

rently a Senior Researcher in Algorithms for Optimization and Simulation(ALGOS) group. His research interests are computer architecture and CAD forVLSI circuits in the area of embedded systems, test and verification of digitalsystems, and computer algorithms, with particular emphasis on optimizationof hardware/software problems using satisfiability (SAT) models. He hascontributed to more than 50 papers to journals and international conferences.Dr. Flores is a senior member of the IEEE Circuit and Systems Society.

Jose Monteiro received a five-year engineering de-gree and the M.Sc. degree in electrical and computerengineering from Instituto Superior Tecnico (IST),Technical University of Lisbon, Portugal, in 1989and 1992, respectively, and the Ph.D. degree inelectrical engineering and computer science from theMassachusetts Institute of Technology, Cambridge,MA, in 1996. Since 1996, he has been with InstitutoSuperior Tecnico, University of Lisbon, where he iscurrently an Associate Professor in the Departmentof Computer Science and Engineering. He is cur-

rently Director of the Instituto de Engenharia de Sistemas e Computadores(INESC-ID), Lisbon. His main interests are computer architecture and CADfor VLSI circuits, with emphasis on synthesis, power analysis, low-power, anddesign validation. Dr. Monteiro received the Best Paper Award from the IEEETRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS in1995.