C-Testable modified-Booth multipliers

JOURNAL OF ELECTRONIC TESTING: Theory and Applications 8, 241-260 (1996) © 1996 Kluwer Academic Publishers. Manufactured in The Netherlands.


DIMITRIS GIZOPOULOS Inst. of Inform. & Telecomm., NCSR "Demokritos", 153 10 Ag. Paraskevi, Athens, Greece

dgizop@iit.nrcps.ariadne-t.gr

DIMITRIS NIKOLOS Dept. of Computer Engg. & Inform., University of Patras, 265 00, Patras, Greece

[email protected]

ANTONIS PASCHALIS Inst. of Inform. & Telecomm., NCSR "Demokritos", 153 10 Ag. Paraskevi, Athens, Greece

[email protected]

CONSTANTIN HALATSIS Dept. of Inform., University of Athens, TYPA Buildings, 157 71, Athens, Greece

halatsis@di.uoa.gr

Received August 7, 1995; Revised February 2, 1996

Editor: M. Sami

Abstract. In this paper the testability of modified-Booth array multipliers for standard-cell-based design environments is examined for the first time. In such environments the structure of the cells may be unknown, so the Cell Fault Model (CFM) is adopted. Two C-testable designs are proposed. The first is a design for an Nx × Ny-bit modified-Booth multiplier that uses ripple carry addition at the last stage of the multiplication. It requires the addition of only one extra primary input and 38 test vectors with respect to CFM. The second C-testable design uses carry lookahead addition at the last stage, which is the case in practical implementations of modified-Booth multipliers. A C-testable design using carry lookahead addition is proposed in the open literature for the first time. This second design requires the addition of 4 extra primary inputs. One-level and two-level carry lookahead adders are considered; the C-testable design requires 61 test vectors for the former and 73 test vectors for the latter. The hardware and delay overheads imposed by both C-testable designs are very small and decrease as the size of the multiplier increases.

Keywords: design for testability, C-testability, cell fault model, iterative logic arrays, Booth multipliers, carry lookahead adders

1. Introduction

Technology advances in IC fabrication allow the integration of large numbers of devices in a small silicon area and the accommodation of very complex designs and large arrays of cells in a single chip. Applying to such a design a test pattern generation strategy based on a gate-level fault model would require flattening the circuit. The CPU time for deriving the test patterns would be excessive, and the resulting test set would be too large to store in tester memory and to apply to the circuit under test.

Exploiting the special structure of the circuit under test can speed up test pattern generation and application. Regular structures consisting of identical logic cells (Iterative Logic Arrays, ILAs) can be implemented effectively in VLSI and can also be designed to be easily testable. The derivation of test sets for ILAs was studied in [1-8]. Kautz [1] proposed the cell fault model (CFM), which has been used extensively in testing ILAs. According to this fault model only one cell can be faulty at a time, and any fault can occur inside the faulty cell as long as the cell remains memoryless. Under this fault model each cell of the ILA must receive all its possible input combinations, so that its function is fully verified, and faulty cell outputs must be propagated to primary outputs. The internal structure of the exhaustively tested cells need not be known, which makes the testing strategy independent of the specific implementation. Friedman [2] introduced the C-testability property: an ILA is C-testable if it can be tested with a constant number of test vectors, irrespective of its size. C-testability of ILAs has been explored extensively [3-8].
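As a minimal illustration of the two properties just described (exhaustive testing of every cell, and a test-set size that does not depend on the array length), the Python sketch below uses a textbook example that is not taken from this paper: an n-cell ripple-carry adder, for which 8 vectors suffice for any n. Six of the eight full-adder input combinations leave the carry unchanged, so they can be applied to every cell at once; the other two, (1, 1, 0) and (0, 0, 1), alternate from cell to cell. The function names are illustrative only. The sketch checks controllability (every cell sees all 8 combinations) and does not model fault propagation.

```python
from __future__ import annotations


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry_out) of a 1-bit full adder."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def ripple_c_test_set(n: int) -> list[tuple[list[int], list[int], int]]:
    """Return 8 vectors (A bits, B bits, carry-in) for an n-cell ripple-carry adder."""
    vectors = []
    # These six combinations satisfy carry_out == carry_in, so every cell
    # receives the same combination when all cells get the same (a, b).
    for a, b, c in [(0, 0, 0), (0, 1, 0), (1, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]:
        vectors.append(([a] * n, [b] * n, c))
    # (1, 1, 0) produces carry 1 and (0, 0, 1) produces carry 0, so they alternate.
    for start in (0, 1):
        bits = [1 if (i + start) % 2 == 0 else 0 for i in range(n)]
        vectors.append((bits, bits[:], start))
    return vectors


def combinations_per_cell(n: int) -> list[set[tuple[int, int, int]]]:
    """Simulate the chain and record the (a, b, carry_in) seen by each cell."""
    seen = [set() for _ in range(n)]
    for a_bits, b_bits, carry in ripple_c_test_set(n):
        for i in range(n):
            seen[i].add((a_bits[i], b_bits[i], carry))
            _, carry = full_adder(a_bits[i], b_bits[i], carry)
    return seen


for n in (1, 2, 8, 64):
    assert len(ripple_c_test_set(n)) == 8
    assert all(len(s) == 8 for s in combinations_per_cell(n))
```

The number of vectors stays at 8 whatever n is, which is exactly what the C-testability property asks for.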

Among the regular structures realizing arithmetic functions, array multipliers form a very important class of ILAs implemented in VLSI circuits. The design of easily testable array multipliers has been discussed in [9-12]. The C-testable N × N carry-save and carry-propagate parallel multipliers given, among other multiplier designs, in [9] require 16 and 20 test vectors, respectively. The N × N carry-propagate parallel multiplier proposed in [10] is C-testable with 16 test vectors. The N × N carry-save array multiplier given in [11] requires 3N + 60 test vectors. CFM was used in all three designs, and therefore each cell in these multipliers is tested exhaustively. In [12] three C-testable carry-save array multiplier designs are given. The first design requires 9 test vectors with respect to the CFM; the second requires 6 test vectors with respect to all single stuck-at faults for a specific gate-level implementation of the full adder cell; the third is given for a DCVS implementation and is C-testable with 6 test vectors, which detect all detectable stuck-at, stuck-on and stuck-open faults in the multiplier. These designs differ in the fault model under consideration, the length of the test set, whether or not they modify the function of the basic array cell, and the overhead introduced by each design in terms of extra hardware, additional circuit delay and extra primary inputs and outputs.

The modified-Booth algorithm [13, 14] reduces the number of rows that are added to form the final product by a factor of 2. Array multipliers based on the modified-Booth algorithm with 2-bit recoding are therefore faster and require significantly less area than straightforwardly implemented array multipliers [15]; for these reasons they are preferred in most computer systems.
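The halving of the number of rows can be seen with a small integer model. The Python sketch below assumes the standard radix-4 digit formula d_j = -2·Y2j+1 + Y2j + Y2j-1 with Y-1 = 0 (its five possible values correspond to the actions add 0, ±X and ±2X listed later in Table 1); an Ny-bit 2's complement multiplier yields ceil(Ny/2) digits, so only that many shifted multiples of X are added. Function names are illustrative, and this is an arithmetic model, not the multiplier's gate structure.

```python
from __future__ import annotations


def bit(v: int, k: int) -> int:
    """Bit k of two's complement integer v; bit(v, -1) is defined as 0.

    Python's >> sign-extends negative integers, so bits above the operand
    width repeat the sign bit, as a 2's complement operand would.
    """
    return 0 if k < 0 else (v >> k) & 1


def booth_digits(y: int, ny: int) -> list[int]:
    """Radix-4 (modified-Booth) digits of the ny-bit two's complement number y."""
    rows = (ny + 1) // 2  # ceil(ny / 2) rows instead of ny
    return [-2 * bit(y, 2 * j + 1) + bit(y, 2 * j) + bit(y, 2 * j - 1) for j in range(rows)]


def booth_multiply(x: int, y: int, ny: int) -> int:
    """Add one shifted multiple d_j * x * 4**j per recoded digit."""
    return sum(d * x * 4 ** j for j, d in enumerate(booth_digits(y, ny)))


print(booth_digits(-7, 4))        # [1, -2], since -7 = 1 - 2*4
print(booth_multiply(13, -7, 4))  # -91
```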

The testability of multipliers based on the modified-Booth algorithm, in the restrictive case where the structure of most or all of their cells is known, was examined in [16-18]. In [16] a test set for a multiplier-accumulator based on the modified-Booth algorithm was given, requiring the addition of 2 extra primary inputs. The fault model used in [16] is an extension of the stuck-at fault model towards switch-level faults in pass transistor logic. The cells were implemented using the Piramid silicon compiler [19]. The multiplier was also modified to be C-testable with 18 test vectors; this C-testable design requires extra hardware and 3 extra primary inputs. In [17] the specific cell implementations of another silicon compiler, namely Cathedral [20], were used. The fault model used in [17] includes node stuck-at, transistor stuck-open and stuck-closed faults for the specific cell implementations, which include symmetric but also non-symmetric CMOS gates. Three extra primary inputs are needed to make the multiplier C-testable with a test set of 50 test patterns. In [18] a C-testable modified-Booth multiplier design implemented in DCVS logic was given. For the specific DCVS implementations of the multiplier cells, the multiplier is C-testable with 31 test patterns, detecting all detectable single stuck-at, stuck-on and stuck-open faults, when 6 extra primary inputs, 2 extra primary outputs and some extra logic are added. In [16-18] the hardware and delay overheads were not explicitly calculated.

A large number of design environments are based on libraries of standard cells. In such a design environment, where the structure of the standard cells is unknown, the best testing strategy is to use the CFM and test each cell exhaustively.

A testing strategy based on the CFM was applied to a Booth multiplier in [21]. That multiplier uses the "sign-propagate" method [15] for sign extension. It is C-testable with 80 test vectors with respect to CFM, requiring the addition of only 2 extra primary inputs. All single stuck-at faults are detectable with only 31 test vectors. The hardware and delay overheads are both very small.

The first contribution of this paper is that the testability of Nx × Ny modified-Booth multipliers that use the "sign-generate" method [15] for sign extension is examined with respect to CFM for the first time. The "sign-generate" method can be faster and requires less hardware than the "sign-propagate" method. First, we propose modifications to the internal structure of two important cells of the multiplier in order to achieve full testability under CFM (exhaustive testing of each cell). Then, we propose modifications to the overall structure of the multiplier to make the design C-testable under the CFM. In this C-testable design we assume that ripple carry addition is used at the last stage of the multiplication to form the final product. The design is C-testable with 38 test vectors and requires only 1 extra primary input.

In practice, implementations of array multipliers based on the modified-Booth algorithm usually employ a carry lookahead (CLA) adder at the last stage of the multiplication to further speed up the process. The second contribution of this paper is a C-testable modified-Booth multiplier design that considers 4-bit carry lookahead adders at the last stage of the multiplication. The design requires the addition of 4 extra primary inputs to the original design. Two implementations of the 4-bit CLA adder are considered: the classical one-level CLA adder and the two-level CLA adder. The proposed design is C-testable with 61 test vectors when the former is used and with 73 test vectors when the latter is used. We remark that the C-testable designs proposed in [16-18, 21] consider only the use of ripple carry adders at the last stage of the multiplication.

In both our C-testable designs the hardware overhead is negligible and decreases as the size of the multiplier increases. As will be shown, circuit speed, an important factor, is practically unaffected in both C-testable designs.

The paper is organized as follows. Section 2 gives a brief review of the structure of the Booth multiplier. Section 3 presents the modifications of the multiplier cells that achieve full testability under CFM. Section 4 presents the first C-testable design, with ripple carry addition. Section 5 deals with the second C-testable design, with carry lookahead addition. Section 6 concludes the paper.

2. Preliminaries

An Nx × Ny-bit modified-Booth multiplier consists of three parts [15] (see Fig. 1 for a 12 × 12 multiplier). The first part performs the 2-bit recoding function. It consists of $\lceil N_y/2\rceil$ cells, named r-cells, shown at the left side of Fig. 1. The second part, which performs the generation and addition of the partial sums, consists of $(N_x - 1)(\lceil N_y/2\rceil - 2) + 1$ full adders, $N_x + \lceil N_y/2\rceil - 3$ half adders and $(N_x + 1)\lceil N_y/2\rceil$ partial sum forming cells, named ps-cells. At the bottom of the second part there is a final result forming adder, which is the third part of the multiplier.

The r-cell of row j receives a 2-bit group (Y2j, Y2j+1), j = 0, ..., $\lceil N_y/2\rceil - 1$, of the operand to be recoded, along with the most significant bit of the previous group, Y2j-1 (Y-1 = 0), and produces a set of signals that determine the partial sum produced by the ps-cells of row j. This partial sum is added by the adders of row j to the partial product produced by row j - 1 to form the next partial product, which is fed to row j + 1. The signals that an r-cell produces for the calculation of the partial sums are named xP1, xP2, xM1, xM2, with the meanings "add X", "add 2X", "subtract X" and "subtract 2X", respectively. (Signal x0 [15] denotes that zero must be added to the partial product, which is equivalent to xP1 = xP2 = xM1 = xM2 = 0; therefore x0 is not required.) The logic functions of signals xP1, xP2, xM1, xM2 are:

$xP1 = \overline{Y_{2j+1}}\,(Y_{2j} \oplus Y_{2j-1})$, $xP2 = \overline{Y_{2j+1}}\,Y_{2j}\,Y_{2j-1}$, $xM1 = Y_{2j+1}\,(Y_{2j} \oplus Y_{2j-1})$, $xM2 = Y_{2j+1}\,\overline{Y_{2j}}\,\overline{Y_{2j-1}}$.

These signals from the r-cell of row j are driven to all the ps-cells of row j ($ps_{0,j}$ to $ps_{N_x,j}$) to calculate the bits that will be added to the previous partial product to form the next partial product. To achieve this, each ps-cell ($ps_{i,j}$) receives the above 4 signals along with bits Xi and Xi-1 of the non-recoded operand and produces a single-bit output (we use the notation $ps_{i,j}$ for both the cell and its output):

$ps_{i,j} = xP1\cdot X_i + xP2\cdot X_{i-1} + xM1\cdot\overline{X_i} + xM2\cdot\overline{X_{i-1}}$, with $X_{-1} = 0$.

When a subtraction must take place (xM1 = 1 or xM2 = 1), the 2's complement of X (or 2X) must be calculated, since the modified-Booth algorithm multiplies 2's complement numbers. The above formula calculates the 1's complement of X (or 2X), since it just inverts each bit. Therefore, a 1 must be added to form the correct 2's complement representation. This addition is performed by the "add 1" logic, which consists of the $\lceil N_y/2\rceil$ OR gates shown in Fig. 1; these gates receive signals xM1, xM2 as inputs.
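To show how the r-cell and ps-cell equations above, together with the "add 1" OR gate, produce the multiple d_j·X of each row, the following Python sketch evaluates them bit by bit and sums the rows. It is an illustrative model with hypothetical function names. It assumes that X is sign-extended to Nx + 1 bits for the leftmost ps-cell, and that each (Nx + 1)-bit row word is sign-extended numerically rather than through the "sign-generate" constants of Fig. 1.

```python
from __future__ import annotations


def bit(v: int, k: int) -> int:
    """Bit k of two's complement integer v, with bit -1 defined as 0 (X-1 = Y-1 = 0)."""
    return 0 if k < 0 else (v >> k) & 1


def r_cell(y2j1: int, y2j: int, y2jm1: int) -> tuple[int, int, int, int]:
    """Return (xP1, xP2, xM1, xM2) using the r-cell equations of Section 2."""
    xp1 = (1 - y2j1) & (y2j ^ y2jm1)
    xp2 = (1 - y2j1) & y2j & y2jm1
    xm1 = y2j1 & (y2j ^ y2jm1)
    xm2 = y2j1 & (1 - y2j) & (1 - y2jm1)
    return xp1, xp2, xm1, xm2


def ps_cell(sig: tuple[int, int, int, int], xi: int, xim1: int) -> int:
    """ps_{i,j} = xP1*Xi + xP2*Xi-1 + xM1*not(Xi) + xM2*not(Xi-1)."""
    xp1, xp2, xm1, xm2 = sig
    return (xp1 & xi) | (xp2 & xim1) | (xm1 & (1 - xi)) | (xm2 & (1 - xim1))


def array_product(x: int, y: int, nx: int, ny: int) -> int:
    """Sum rows of ps-cell outputs plus the "add 1" bit, row j weighted by 4**j."""
    total = 0
    for j in range((ny + 1) // 2):
        sig = r_cell(bit(y, 2 * j + 1), bit(y, 2 * j), bit(y, 2 * j - 1))
        # ps-cells ps_{0,j} .. ps_{nx,j} form an (nx+1)-bit word.
        word = sum(ps_cell(sig, bit(x, i), bit(x, i - 1)) << i for i in range(nx + 1))
        if word >> nx:  # interpret the word as (nx+1)-bit two's complement
            word -= 1 << (nx + 1)
        add1 = sig[2] | sig[3]  # "add 1" OR gate completes the 2's complement
        total += (word + add1) * 4 ** j
    return total


print(array_product(5, -3, 4, 4))  # -15
```

For a subtraction the ps-cells deliver the bitwise complement of X or 2X, and the OR gate supplies the missing 1, exactly as the text describes.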




Table 1. Truth table of the r-cell and r'-cell.

Inputs              | r-cell outputs     | r'-cell outputs  |
Y2j+1  Y2j  Y2j-1   | xP1  xP2  xM1  xM2 | Sign  One  Two   | Action
0      0    0       | 0    0    0    0   | 0     0    0     | add 0
0      0    1       | 1    0    0    0   | 0     1    1     | add X
0      1    0       | 1    0    0    0   | 0     1    0     | add X
0      1    1       | 0    1    0    0   | 0     0    1     | add 2X
1      0    0       | 0    0    0    1   | 1     0    1     | subtract 2X
1      0    1       | 0    0    1    0   | 1     1    0     | subtract X
1      1    0       | 0    0    1    0   | 1     1    1     | subtract X
1      1    1       | 0    0    0    0   | 1     0    0     | add 0

The sign extension required in the addition of the partial sums is performed using the "sign-generate" method [15]: a number of 1s are added at the appropriate positions, as shown in Fig. 1.

3. Proposed Modifications for the r-Cells and ps-Cells

The result of the recoding operation, realized by the r-cells, is a group of 4 signals, xP1, xP2, xM1, xM2. At most one of these 4 signals can have the value 1, while all the others have the value 0, as shown in Table 1. Since for each row j these 4 signals (the outputs of the r-cell of row j) are inputs to the ps-cells of the same row, the ps-cells cannot receive all the possible combinations of their 6 inputs and therefore cannot be fully tested under CFM. To overcome this problem and obtain full testability under the CFM, we propose modifications to the functions of the r-cells and the ps-cells.

The function of the proposed r-cell, which we denote r'-cell, is to encode the 5 different recoded digits (+0, +X, -X, +2X and -2X) using only 3 bits, named Sign, One and Two. Sign denotes the sign of the recoded digit, while One and Two denote its magnitude. When both One and Two are 1, One takes precedence over Two. The truth table of the r'-cell is given in Table 1. The proposed ps-cell, which we denote ps'-cell, receives bits Sign, One and Two produced by the r'-cell along with bits Xi, Xi-1, and produces its single output $ps'_{i,j}$. Since all 8 possible combinations of the outputs of the r'-cell appear, the ps'-cells can receive all the possible combinations of their 5 inputs. The logic functions for signals Sign, One, Two and $ps'_{i,j}$ are:

$Sign = Y_{2j+1}$, $One = Y_{2j} \oplus Y_{2j-1}$, $Two = Y_{2j+1} \oplus Y_{2j-1}$,

$ps'_{i,j} = \overline{Sign}\cdot One\cdot X_i + \overline{Sign}\cdot\overline{One}\cdot Two\cdot X_{i-1} + Sign\cdot One\cdot\overline{X_i} + Sign\cdot\overline{One}\cdot Two\cdot\overline{X_{i-1}}$.

A similar encoding of the recoded digits using 3 bits was given in [22]. Only 5 of the 8 combinations of the r-cell outputs appear in that encoding, so it is not suitable for achieving full testability under CFM.
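The input-coverage argument of this section can be checked mechanically. The Python sketch below (function names are illustrative) enumerates the 8 values of (Y2j+1, Y2j, Y2j-1), counts the distinct outputs of the r-cell and the r'-cell, and from them the number of input combinations each kind of ps-cell can receive. It also checks that the ps'-cell computes the same function as the ps-cell, and that Sign · (One + Two) equals the OR of xM1 and xM2.

```python
from itertools import product


def r_cell(y2j1, y2j, y2jm1):
    """(xP1, xP2, xM1, xM2) of the original r-cell."""
    return ((1 - y2j1) & (y2j ^ y2jm1), (1 - y2j1) & y2j & y2jm1,
            y2j1 & (y2j ^ y2jm1), y2j1 & (1 - y2j) & (1 - y2jm1))


def ps_cell(sig, xi, xim1):
    xp1, xp2, xm1, xm2 = sig
    return (xp1 & xi) | (xp2 & xim1) | (xm1 & (1 - xi)) | (xm2 & (1 - xim1))


def r_prime_cell(y2j1, y2j, y2jm1):
    """(Sign, One, Two) of the proposed r'-cell."""
    return y2j1, y2j ^ y2jm1, y2j1 ^ y2jm1


def ps_prime_cell(sign, one, two, xi, xim1):
    ns, no = 1 - sign, 1 - one
    return ((ns & one & xi) | (ns & no & two & xim1)
            | (sign & one & (1 - xi)) | (sign & no & two & (1 - xim1)))


ys = list(product((0, 1), repeat=3))
r_outputs = {r_cell(*y) for y in ys}
r_prime_outputs = {r_prime_cell(*y) for y in ys}
# Reachable ps-cell inputs = reachable r-cell outputs x 4 values of (Xi, Xi-1).
print(len(r_outputs) * 4, "of", 2 ** 6)        # 20 of 64
print(len(r_prime_outputs) * 4, "of", 2 ** 5)  # 32 of 32

for y, (xi, xim1) in product(ys, product((0, 1), repeat=2)):
    sign, one, two = r_prime_cell(*y)
    assert ps_cell(r_cell(*y), xi, xim1) == ps_prime_cell(sign, one, two, xi, xim1)
    _, _, xm1, xm2 = r_cell(*y)
    assert (xm1 | xm2) == (sign & (one | two))  # "add 1" logic is unchanged
```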

The overall structure of the multiplier when r'-cells and ps'-cells are employed instead of r-cells and ps-cells is the same as the original one given in Fig. 1, with two exceptions: the r-cell of each row drives the ps-cells of the same row with 3 (instead of 4) signals, and the OR gates which implement the "add 1" logic at the right of each row are replaced by a complex gate implementing the function Sign · (One + Two), which detects a subtraction.

The test set for the multiplier that will be derived later is valid whether r-cells/ps-cells or r'-cells/ps'-cells are used, and it is also valid for any specific implementation of the cells. Full testability under the CFM is achieved when r'-cells/ps'-cells are used. When the original designs are employed, each cell is tested exhaustively except the ps-cells and the OR gates, which receive all the combinations of their inputs that can appear during normal mode.

4. Booth Multipliers with Ripple Carry Adders at the Last Stage

We propose a C-testable design for the modified-Booth multiplier with ripple carry addition at the last stage. The fault model considered is CFM. The following discussion is valid whether r-cells/ps-cells or r'-cells/ps'-cells are used. The figures show r-cells/ps-cells, but r'-cells/ps'-cells could be used in the same way.


4.1. C-Testable Design

The structure of our C-testable design of the 12 × 12-bit modified-Booth multiplier is shown in Fig. 2. During test, the ps-cells of the same row (Fig. 1) receive the same inputs, as will be explained in the sequel. Hence, for ease of presentation, they have been replaced by a single ps-cell in Fig. 2.

The half adders in the array (a row and a diagonal) are replaced by full adders. This modification simplifies the application of the required test vectors to any full adder, irrespective of the multiplier size. The full adders of the array are classified into 4 classes, namely 00-class, 01-class, 10-class and 11-class, according to whether they belong to an even or odd numbered row and diagonal; the first bit denotes the row and the second the diagonal (0 for even, 1 for odd). Rows are numbered 0, 1, ..., $\lceil N_y/2\rceil - 1$ from top to bottom, and diagonals are numbered 0, 1, ..., Nx - 1 from right to left. In order to control, during testing, the inputs of the full adders of the top row and leftmost diagonal, which in normal operation are constantly connected to logic 0 or logic 1 (see Fig. 1), we introduce some new signals. These signals are denoted E00,1, E01,1, E01,2, E10,1, E10,2, E11,1, E11,2 in Fig. 2. The first two bits of the subscript of these signals denote the class of the adder. The digit after the comma denotes whether the signal drives the middle (= 1) or the right (= 2) inputs of the full adders.

Controlling the above signals does not require the addition of extra primary inputs, since they are time-multiplexed with existing primary X inputs. The time-multiplexing operation is controlled by only 1 extra primary input, termed T1, as shown in Fig. 2. The extra hardware which accomplishes the time-multiplexing operation consists of the 2-to-1 multiplexers and the 2-input AND and NAND gates shown in Fig. 2. When the circuit operates in test mode (T1 = 1), signals E00,1, E01,1, E01,2, E10,1, E10,2, E11,1, E11,2 drive the inputs of the full adders of the top row and leftmost diagonal. When the circuit operates in normal mode (T1 = 0), the inputs of these full adders receive their proper constant logic values, as can easily be verified from Fig. 2.
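The text does not spell out the gate-level wiring of the AND and NAND gates, so the following Python sketch is only one plausible, hypothetical reading: an adder input that is constant 0 in normal mode could be driven by AND(T1, E), one that is constant 1 by NAND(T1, E) (which delivers the complement of E in test mode), and a 2-to-1 multiplexer selects between a normal-mode value and a test value. The function names are hypothetical. The sketch checks the two properties the text requires: the proper constant when T1 = 0, and full controllability when T1 = 1.

```python
def and_gated(t1: int, e: int) -> int:
    """Hypothetical driver of an adder input that is tied to 0 in normal mode."""
    return t1 & e


def nand_gated(t1: int, e: int) -> int:
    """Hypothetical driver of an adder input that is tied to 1 in normal mode.

    In test mode it delivers NOT e, so test patterns would be inverted accordingly.
    """
    return 1 - (t1 & e)


def mux2(t1: int, normal: int, test: int) -> int:
    """2-to-1 multiplexer controlled by T1: passes the test value when T1 = 1."""
    return test if t1 else normal


for e in (0, 1):
    assert and_gated(0, e) == 0 and nand_gated(0, e) == 1  # normal mode: constants
# Test mode: both logic values can be forced on the adder input.
assert {and_gated(1, e) for e in (0, 1)} == {0, 1}
assert {nand_gated(1, e) for e in (0, 1)} == {0, 1}
assert all(mux2(0, n, t) == n and mux2(1, n, t) == t for n in (0, 1) for t in (0, 1))
```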

To simplify the application of the test vectors to the cells of the multiplier even further, we design the multiplier so that during test mode (T1 = 1) all the ps-cells of the same row give the same value at their single output. Since all the ps-cells of the same row receive the same control signals from the r-cell of the row, all that is needed is to feed all these ps-cells with the same pair of Xi, Xi-1 values. This can be accomplished using Nx + 1 pairs of 2-to-1 multiplexers, one for every diagonal of the multiplier. The outputs of the multiplexers drive all the Xi, Xi-1 inputs of the ps-cells of the same diagonal. During normal mode each ps-cell receives its corresponding Xi, Xi-1 values. During test mode the same pair of Xi, Xi-1 values is driven to all the ps-cells using two signals, denoted Xt, Xt-1. These two signals are also time-multiplexed with existing primary inputs of the multiplicand (X inputs), similarly to the E signals. Thus, an abstraction is used for convenience while applying the test vectors to the cells of the multiplier. This abstraction has already been made in Fig. 2, where a cell denoted rps-cell appears. This is a composite cell: it includes the r-cell of the corresponding row along with the Nx + 1 ps-cells of the row. The function of the composite rps-cell is the same whether r-cells/ps-cells or r'-cells/ps'-cells are used. Instead of Nx + 1 pairs of 2-to-1 multiplexers, just one pair is shown in Fig. 2.

The time-multiplexing operation introduces 9 signals, namely Xt, Xt-1, E00,1, E01,1, E01,2, E10,1, E10,2, E11,1 and E11,2. Since these signals are taken from normal primary inputs by time-multiplexing, the above C-testable design can be employed only when the multiplicand size satisfies Nx ≥ 9. This is not a serious limitation, since modified-Booth multipliers are very efficient for large operands. The primary inputs X1, X0 have been chosen to represent, during test mode, the signals Xt, Xt-1, respectively. The rest of the extra signals, namely E00,1, E01,1, E01,2, E10,1, E10,2, E11,1 and E11,2, are represented during test mode by primary inputs X8, ..., X2, respectively. For the time-multiplexing operation we have chosen to use only primary inputs of the multiplicand operand (X) and not of the multiplier operand (Y), in order to minimally affect the speed of the circuit: time-multiplexing with the Y operand would add extra hardware to the critical path of the circuit and thus increase the circuit delay.
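The assignment of test-mode roles to multiplicand inputs can be written down as a small lookup, which also makes the need for at least 9 X inputs explicit. The Python sketch below turns the X and E columns of the test tables into values on the primary inputs X0, X1, ...; the function name is hypothetical, and setting don't-cares and the inputs X9 and above to 0 is an arbitrary choice made only for this sketch.

```python
from __future__ import annotations

# Test-mode roles of multiplicand inputs (Section 4.1): Xt = X1, Xt-1 = X0,
# and E00,1, E01,1, E01,2, E10,1, E10,2, E11,1, E11,2 = X8, ..., X2.
TEST_ROLES = {
    "Xt": 1, "Xt-1": 0,
    "E00,1": 8, "E01,1": 7, "E01,2": 6, "E10,1": 5,
    "E10,2": 4, "E11,1": 3, "E11,2": 2,
}


def x_inputs_for_test(nx: int, x_col: str, e_col: str) -> list[int]:
    """Primary inputs X0..X(nx-1) for one test vector.

    x_col is the 'X' column of the test tables (Xt Xt-1) and e_col the 'E'
    column (E00,1 ... E11,2). Spaces are ignored; 'd' (don't care) and the
    inputs X9 and above, which have no test-mode role, are set to 0.
    """
    if nx < len(TEST_ROLES):
        raise ValueError("time-multiplexing needs at least 9 multiplicand inputs")
    values = (x_col + e_col).replace(" ", "")
    if len(values) != len(TEST_ROLES):
        raise ValueError("expected 2 X bits and 7 E bits")
    xs = [0] * nx
    for name, v in zip(TEST_ROLES, values):
        xs[TEST_ROLES[name]] = 0 if v == "d" else int(v)
    return xs


# Vector t7 of Table 2 (X = 01, E = 0 11 01 00) for a 12-bit multiplicand:
print(x_inputs_for_test(12, "01", "0 11 01 00"))
# [1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0]
```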

In the test set that will be constructed, each composite rps-cell will receive all the combinations of its 5 inputs. During the application of these combinations, and when r'-cells/ps'-cells are employed, each of these cells will receive every possible input combination (full testability with respect to CFM). When the original design of the r-cells/ps-cells is used, each r-cell receives all possible input combinations, but the ps-cells receive only the combinations that can appear in normal mode. The same holds for the logic which realizes the "add 1" operation: when r'-cells/ps'-cells are used, the "add 1" logic is a complex gate which will receive all possible combinations of its inputs; when the original r-cells/ps-cells are used, it is an OR gate which will receive the combinations that appear in normal mode. The adders of the multiplier (all the full adders of the array and the half and full adders of the ripple carry chain) will receive all the different combinations of their inputs. Finally, the extra 2-to-1 multiplexers, AND and NAND gates will also be exhaustively tested.

[Fig. 2. C-testable 12 × 12 bits modified-Booth multiplier with ripple carry adder at the last stage. Labels include the multiplier inputs y, the test signals Xt, Xt-1, T1 and E, the 2-to-1 multiplexers, and the product outputs p0 to p23.]

4.2. Testing the Adders

As already mentioned, each adder will receive all possible input combinations. Table 2 gives the test vectors that should be applied to the multiplier so that each full adder of the array receives all possible input combinations. In Table 2 we denote by Vfa_k and Vrps_k the input combination of a full adder and of an rps-cell, respectively, where k is the arithmetic value of the binary inputs of the full adder or rps-cell. For example, the input combination 110 of a full adder is denoted Vfa_6, since its arithmetic value is 6. A binary pattern within parentheses with an asterisk superscript in the Y column of Table 2 means that the pattern is repeated as the size of the Y operand increases.

Concerning the ripple carry adder shown at the bottom of Fig. 2, one can easily verify that during the application of test vectors t1 to t10 of Table 2 and test vectors t11 to t32, explained in the next subsection, the two half adders receive their 4 possible input combinations and all the full adders of the chain receive their 8 possible input combinations.

4.3. Testing the rps-Cells

Our target is to apply all 32 input combinations Vrps_0 to Vrps_31 to all the rps-cells. Vrps_i, i = 0, ..., 31, denotes the rps-cell input combination whose 5 inputs have arithmetic value i. The order of the inputs is Y2j+1, Y2j, Y2j-1, Xi, Xi-1. It can be verified from Table 2 that during the application of test vectors t1 to t10 all rps-cells receive ten of their combinations, namely Vrps_2, Vrps_3, Vrps_5, Vrps_9, Vrps_10, Vrps_11, Vrps_15, Vrps_19, Vrps_20 and Vrps_25. The remaining combinations are applied to all the rps-cells by test vectors t11 to t32 of Table 3. The E signals are not don't cares in every vector of Table 3 because they are also needed to complete the testing of the half and full adders of the ripple carry adder.

Table 2. Test vectors for the full adders of the array.

Test | Y         | X  | E          | T1 | Input combination to rps-cells                       | Input combination to full adders
t1   | (10)*     | 01 | 0 00 00 00 | 1  | Vrps_9 to all                                        | Vfa_0 to all
t2   | (01)*     | 00 | 1 11 11 11 | 1  | Vrps_20 to all                                       | Vfa_7 to all
t3   | 00(10)*   | 10 | 0 01 01 01 | 1  | Vrps_2 to row 0; Vrps_10 to the rest                 | Vfa_5 to all
t4   | 10(0)*    | 10 | 1 10 10 10 | 1  | Vrps_10 to row 0; Vrps_2 to the rest                 | Vfa_2 to all
t5   | 10(1000)* | 11 | 0 01 10 10 | 1  | Vrps_11 to rows 0, 1, 3, ...; Vrps_3 to rows 2, 4, ... | Vfa_6 to rows 1, 3, ...; Vfa_1 to rows 0, 2, ...
t6   | 00(0010)* | 11 | 1 10 01 01 | 1  | Vrps_3 to rows 0, 1, 3, ...; Vrps_11 to rows 2, 4, ... | Vfa_1 to rows 1, 3, ...; Vfa_6 to rows 0, 2, ...
t7   | (0011)*   | 01 | 0 11 01 00 | 1  | Vrps_5 to rows 0, 2, ...; Vrps_25 to rows 1, 3, ...  | Vfa_4 to 11-class; Vfa_3 to 01-class
t8   | (1100)*   | 01 | 1 00 10 11 | 1  | Vrps_25 to rows 0, 2, ...; Vrps_5 to rows 1, 3, ...  | Vfa_4 to 01-class; Vfa_3 to 11-class
t9   | (0110)*   | 11 | 1 00 00 01 | 1  | Vrps_19 to rows 0, 2, ...; Vrps_15 to rows 1, 3, ... | Vfa_4 to 10-class; Vfa_3 to 00-class
t10  | (1001)*   | 11 | 0 11 11 10 | 1  | Vrps_15 to rows 0, 2, ...; Vrps_19 to rows 1, 3, ... | Vfa_4 to 00-class; Vfa_3 to 10-class

Y = Y0 ... YNy-1; X = Xt Xt-1 = X1 X0; E = E00,1 E01,1 E01,2 E10,1 E10,2 E11,1 E11,2 = X8 ... X2; (pattern)* means repetition of pattern.

Table 3. Test vectors for the rps-cells.

Test | Y         | X  | E          | T1 | Input combination to even-row rps-cells | Input combination to odd-row rps-cells
t11  | (0)*      | 00 | 1 00 00 00 | 1  | Vrps_0  | Vrps_0
t12  | (0)*      | 01 | d dd dd dd | 1  | Vrps_1  | Vrps_1
t13  | (10)*     | 00 | d dd dd dd | 1  | Vrps_8  | Vrps_8
t14  | (01)*     | 01 | d dd dd dd | 1  | Vrps_21 | Vrps_21
...
t19  | (11)*     | 10 | d dd dd dd | 1  | Vrps_30 | Vrps_30
...
t21  | (0011)*   | 00 | d dd dd dd | 1  | Vrps_4  | Vrps_24
t22  | (1100)*   | 00 | d dd dd dd | 1  | Vrps_24 | Vrps_4
t23  | (0100)*   | 10 | 0 00 00 00 | 1  | Vrps_18 | Vrps_6
t24  | 10(0100)* | 10 | 0 11 11 11 | 1  | Vrps_6  | Vrps_18
t25  | (0011)*   | 11 | d dd dd dd | 1  | Vrps_7  | Vrps_27
t26  | (1100)*   | 11 | 0 00 00 00 | 1  | Vrps_27 | Vrps_7
t27  | (1001)*   | 00 | d dd dd dd | 1  | Vrps_12 | Vrps_16
t28  | (0110)*   | 00 | 0 00 00 00 | 1  | Vrps_16 | Vrps_12
t29  | (1001)*   | 01 | d dd dd dd | 1  | Vrps_13 | Vrps_17
t30  | (0110)*   | 01 | d dd dd dd | 1  | Vrps_17 | Vrps_13
...
t32  | (1110)*   | 10 | d dd dd dd | 1  | Vrps_26 | Vrps_14

d = don't care.
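The rps-cell columns of Tables 2 and 3 follow mechanically from the Y pattern and the X pair. The Python sketch below expands the "(pattern)*" notation (leftmost character is Y0, as in the note under Table 2), forms (Y2j+1, Y2j, Y2j-1, Xt, Xt-1) for each row, and reads it as the index k of Vrps_k. Function names are illustrative, and the sketch assumes an even Ny for simplicity.

```python
from __future__ import annotations


def expand(pattern: str, n: int) -> list[int]:
    """Expand table notation such as '10(1000)*' into bits Y0, Y1, ..., Y(n-1).

    The leftmost character is Y0; the part in parentheses repeats as needed.
    """
    prefix, _, rest = pattern.replace(" ", "").partition("(")
    repeated = rest.split(")")[0]
    if len(prefix) < n and not repeated:
        raise ValueError("pattern too short and nothing to repeat")
    bits = prefix
    while len(bits) < n:
        bits += repeated
    return [int(b) for b in bits[:n]]


def rps_index(y: list[int], row: int, xt: int, xt1: int) -> int:
    """k of Vrps_k: value of (Y2j+1, Y2j, Y2j-1, Xi, Xi-1) read as a binary number."""
    y_prev = y[2 * row - 1] if row > 0 else 0  # Y-1 = 0 in row 0
    k = 0
    for b in (y[2 * row + 1], y[2 * row], y_prev, xt, xt1):
        k = 2 * k + b
    return k


def rps_per_row(pattern: str, x_col: str, ny: int) -> list[int]:
    """Vrps index received by every row (ny even) for one test vector."""
    y = expand(pattern, ny)
    return [rps_index(y, j, int(x_col[0]), int(x_col[1])) for j in range(ny // 2)]


print(rps_per_row("10(1000)*", "11", 12))  # t5: [11, 11, 3, 11, 3, 11]
print(rps_per_row("(1110)*", "10", 12))    # t32: [26, 14, 26, 14, 26, 14]
```

Because Y-1 = 0, row 0 can differ from the other rows; for t2, for instance, the sketch gives Vrps_16 in row 0 and Vrps_20 elsewhere. This is consistent with the separate treatment of the row-0 r-cell at the end of Section 4.3.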

When the original r-cells/ps-cells are used, the r-cells receive every possible input combination, but the ps-cells receive only the combinations that appear in normal mode. When r'-cells/ps'-cells are used, full testability under CFM is achieved: each r'-cell and ps'-cell receives every possible input combination during the application of test vectors t1 to t32.

We note here that the r-cell (r'-cell) of row 0 always receives a 0 at its Y2j-1 input, and thus it can be simplified to a cell with inputs Y2j+1 (= Y1) and Y2j (= Y0). This r-cell (r'-cell) receives all 4 possible input combinations during the application of test vectors t1 to t32, and is therefore fully tested under CFM.

4.4. Testing the "add 1" Logic

During the application of test vectors t1 to t32, when r'-cells/ps'-cells are used, the "add 1" logic, which consists of a complex gate, is fully tested under CFM, receiving every possible combination of its inputs. When the original r-cells/ps-cells are used, the "add 1" logic consists of simple OR gates, which receive the input combinations that appear in normal mode (00, 01, 10).

4.5. Testing the Extra Hardware

The extra hardware required to make the multiplier C-testable should also be fully tested under CFM. During the application of test vectors t1 to t32, each 2-to-1 multiplexer receives 4 of its 8 input combinations, namely those with T1 = 1. The 4 remaining input combinations of the multiplexers are applied using test vectors t33 to t38, given in Table 4. Additionally, all extra AND and NAND gates are fully tested, receiving their exhaustive test set.

Table 4. Test vectors to complete the testing of the extra hardware.

Test | Y        | X0 | X1 | X2, X4, ... | X3, X5, ... | T1
t33  | 1001(d)* | 0  | 1  | (1)*        | (0)*        | 0
t34  | 1001(d)* | 1  | 0  | (1)*        | (0)*        | 0
t35  | 1001(d)* | 0  | 1  | (0)*        | (1)*        | 0
t36  | 1001(d)* | 1  | 0  | (0)*        | (1)*        | 0
t37  | 1001(d)* | 0  | 0  | (d)*        | (d)*        | 0
t38  | 1001(d)* | 1  | 1  | (d)*        | (d)*        | 0
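The normal-mode half of the multiplexer test can be checked against Table 4. The Python sketch below expands vectors t33 to t36 into values on X0, X1, X2, ... and collects the (T1, test input, normal input) combinations seen by a multiplexer. It assumes, as described in Section 4.1, that the test input of an Xt (Xt-1) multiplexer is X1 (X0), and it considers only multiplexers whose normal input is some X_k with k ≥ 2. Vectors t37 and t38, which set X0 = X1, are not modelled, nor is the T1 = 1 half covered by t1 to t32. Names are illustrative.

```python
from __future__ import annotations

from itertools import product

# Table 4, vectors t33-t36: (X0, X1, value on X2, X4, ..., value on X3, X5, ...); T1 = 0.
TABLE4 = [(0, 1, 1, 0), (1, 0, 1, 0), (0, 1, 0, 1), (1, 0, 0, 1)]


def primary_x(x0: int, x1: int, even: int, odd: int, nx: int) -> list[int]:
    """Expand one Table 4 row into X0..X(nx-1)."""
    return [x0, x1] + [even if k % 2 == 0 else odd for k in range(2, nx)]


def mux_inputs_seen(test_src: int, normal_src: int, nx: int) -> set[tuple[int, int, int]]:
    """(T1, test input, normal input) combinations applied to one multiplexer.

    Assumption: the test input is X1 (Xt mux) or X0 (Xt-1 mux) and the
    normal input is a multiplicand bit X_k with k >= 2.
    """
    seen = set()
    for x0, x1, even, odd in TABLE4:
        xs = primary_x(x0, x1, even, odd, nx)
        seen.add((0, xs[test_src], xs[normal_src]))
    return seen


nx = 12
all_t1_zero = {(0, a, b) for a, b in product((0, 1), repeat=2)}
for test_src in (0, 1):
    for normal_src in range(2, nx):
        assert mux_inputs_seen(test_src, normal_src, nx) == all_t1_zero
```

Alternating the values on even- and odd-indexed inputs is what lets four vectors cover all four T1 = 0 combinations for every such multiplexer at once.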

4.6. Fault Propagation

In order to cover all the faults of the adopted model, CFM, we have applied all possible input combinations to all the cells of the multiplier. What remains to be shown is that the effect of a fault inside the faulty cell can be propagated to the outputs of the multiplier, so that the fault is detected.

The following proposition holds and can easily be verified: if only one of the three inputs of a full adder changes its logic value, then at least the sum output of the adder changes its logic value as well.
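Since the sum output is the XOR of the three inputs, the proposition can be confirmed by exhaustive enumeration. The short Python sketch below (names are illustrative) does exactly that. The carry output, by contrast, may stay unchanged, for example when (0, 0, 0) becomes (1, 0, 0), which is why the proposition is stated for the sum output only.

```python
from __future__ import annotations

from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry_out) of a 1-bit full adder."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def sum_flips_on_single_change() -> bool:
    """True if flipping any one input always changes the sum output."""
    for inputs in product((0, 1), repeat=3):
        s, _ = full_adder(*inputs)
        for k in range(3):
            flipped = list(inputs)
            flipped[k] ^= 1
            if full_adder(*flipped)[0] == s:
                return False
    return True


print(sum_flips_on_single_change())  # True
```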

A. rps-Cells. Since an rps-cell represents one r-cell and Nx + 1 ps-cells, any of these cells can be the faulty one. The following discussion is given for r-cells/ps-cells and is also valid for r'-cells/ps'-cells. If the r-cell is the faulty one, the fault is exercised on the outputs of all Nx + 1 ps-cells. This means that all the full adders of the row to which the faulty r-cell belongs will receive one of their inputs inverted. Therefore, using the proposition above, it is easy to verify that the fault is propagated to a primary output, at least through the rightmost full adder of the row and the corresponding full adder of the ripple chain, without any fault masking. If the faulty cell is one of the Nx + 1 ps-cells of the row, the fault is exercised on its single ps output. In this case the fault is propagated to at least one primary output through the column of full adders below the one that receives the faulty ps output, and again no fault masking can occur.

B. Full Adders of the Array. If the single faulty cell is a full adder of the array, then its sum and/or carry-out lines are inverted with respect to the fault-free situation. If at least the sum output is inverted, the fault is propagated to a primary output through the column of adders below the faulty one. If only the carry-out line of the faulty adder is inverted, the fault is propagated to a primary output through the column of adders below and to the left of the faulty one.

C. Half and Full Adders of the Ripple Carry Chain. This is the easiest case, since a fault inside one of the adders that form the ripple carry chain is directly observed at its sum line, which is also a primary output, or at the sum line of the adder to the left of the faulty one.

D. "Add 1" Logic. The "add 1" logic consists of the OR gates when the original design of the r-cells and ps-cells is used, and of the complex gate implementing the function Sign · (One + Two) when r'-cells and ps'-cells are used. Since the output of this logic feeds an input of an adder of the ripple chain, any fault in this logic is detected at the sum output (a primary output) of the corresponding adder.

E. Multiplexers. A fault inside the multiplexers for signals E00,1, E10,1 is propagated to a primary output, since the outputs of these multiplexers feed adder inputs. For the multiplexers of signals Xt, Xt-1, the test set consisting of vectors t1 to t38 is constructed so that a faulty multiplexer output is propagated to the output of the corresponding ps-cell. Since the output of the ps-cell feeds an adder input, the fault is propagated through a chain of adders to a primary output.

F. AND and NAND Gates. The outputs of the extra AND and NAND gates are connected to adder inputs. Therefore, again using the proposition given above, any fault in such a gate is propagated to a primary output through a chain of adders.

4.7. Evaluation of the Design

The hardware and delay overheads imposed by our C-testable design are calculated using the information provided by VLSI Technology [23]. For each standard cell in [23] an equivalent number of 2-input NAND gates is given. The full adder is equivalent to 10 gates, the half adder to 5 gates, the multiplexer to 3 gates and the 2-input AND and OR gates to 2 gates. For the example realizations of the r-cell, ps-cell, r'-cell and ps'-cell of Fig. 3, the gate equivalents are 13, 12, 8 and 13, respectively.

Fig. 3. Realizations of (a) r-cell, (b) ps-cell, (c) r'-cell and (d) ps'-cell.

The total number of gate equivalents for the original design of the multiplier is

$$H = 22N_x\lceil N_y/2\rceil - 5N_x + 22\lceil N_y/2\rceil + 10N_y - 5$$

(the ripple adder consists of 2 half adders and $N_x + N_y - 3$ full adders).

The extra hardware required in our C-testable design consists of the modification of $N_x + \lceil N_y/2\rceil - 3$ half adders to full adders, $2N_x + \lceil N_y/2\rceil - 4$ multiplexers, three 2-input NAND gates and four 2-input AND gates. When the original design of the r-cells and ps-cells is used, the extra hardware is

$$EH_1 = 11N_x + 8\lceil N_y/2\rceil - 16.$$

When r'-cells/ps'-cells are used, the modification of r-cells to r'-cells, the modification of ps-cells to ps'-cells and the modification of the $\lceil N_y/2\rceil$ OR gates to the complex gate which detects a subtraction are taken into consideration, and the extra hardware is

$$EH_2 = 11N_x + 6\lceil N_y/2\rceil + N_x\lceil N_y/2\rceil - 16.$$

These give the hardware overheads $HOv_1 = EH_1/H$ when r-cells/ps-cells are used, and $HOv_2 = EH_2/H$ when r'-cells/ps'-cells are used. For a 32 × 32 multiplier HOv1 = 3.94% and HOv2 = 8.02%, and for a 64 × 64 multiplier HOv1 = 2.05% and HOv2 = 6.35%.
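The gate-equivalent bookkeeping can be reproduced with a few lines of Python. This is a sketch that assumes the unit costs quoted above (with a 2-input NAND counting as one gate equivalent); the helper names are hypothetical. It first rebuilds EH1 from the listed extra components, confirming the closed form, and then evaluates both overheads for the sizes discussed in the text.

```python
import math

# Gate-equivalent unit costs quoted in the text (a 2-input NAND counts as 1).
FA, HA, MUX, NAND2, AND2 = 10, 5, 3, 1, 2


def rows(ny: int) -> int:
    """Number of Booth-recoded rows, ceil(Ny/2)."""
    return math.ceil(ny / 2)


def h_ripple(nx: int, ny: int) -> int:
    """Gate equivalents H of the original multiplier (ripple-carry final adder)."""
    r = rows(ny)
    return 22 * nx * r - 5 * nx + 22 * r + 10 * ny - 5


def eh1_from_components(nx: int, ny: int) -> int:
    """EH1 rebuilt from the listed extra components."""
    r = rows(ny)
    upgraded = nx + r - 3                 # half adders converted to full adders
    muxes = 2 * nx + r - 4
    return upgraded * (FA - HA) + muxes * MUX + 3 * NAND2 + 4 * AND2


def eh(nx: int, ny: int, primed: bool) -> int:
    """Closed-form extra hardware: EH1 (r/ps-cells) or EH2 (r'/ps'-cells)."""
    r = rows(ny)
    return 11 * nx + 6 * r + nx * r - 16 if primed else 11 * nx + 8 * r - 16


def overhead_pct(nx: int, ny: int, primed: bool) -> float:
    return 100.0 * eh(nx, ny, primed) / h_ripple(nx, ny)


for nx, ny in [(32, 32), (64, 64), (32, 64), (64, 32)]:
    assert eh1_from_components(nx, ny) == eh(nx, ny, primed=False)
    print(f"{nx}x{ny}: HOv1={overhead_pct(nx, ny, False):.2f}%  "
          f"HOv2={overhead_pct(nx, ny, True):.2f}%")
```

The printed values agree with the percentages quoted in this section; the only visible difference is the 32 × 64 HOv1, which evaluates to about 2.497% (printed as 2.50%) and is reported in the text as 2.49%.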

Note also that whenever Nx ≠ Ny we should choose Nx < Ny, because in this case the multiplier is faster (the recoded operand is the larger one) and the hardware overhead to make it C-testable is smaller than in the case Nx > Ny. For example, the hardware overhead for a C-testable 32 × 64 multiplier is HOv1 = 2.49% and HOv2 = 6.55%, and for a 64 × 32 multiplier it is HOv1 = 3.57% and HOv2 = 7.90%.

Let us now consider the delay overhead imposed by our design. The slowest path in the original design of the multiplier passes through the r-cell of row 1, the ps-cell at position (1, Nx − 4), the half adder at the same position, the $\lceil N_y/2\rceil - 2$ full adders below the half adder and the leftmost Nx + 1 full adders of the ripple carry chain. It can be verified from Fig. 3 that the propagation delays through a chain of an r-cell and a ps-cell, or of an r'-cell and a ps'-cell, are the same. Therefore, the use of r'-cells and ps'-cells does not add delay to the slowest path of the multiplier. The modifications required by our design add to this path only the delay resulting from the conversion of the half adder at position (1, Nx − 4) into a full adder. The multiplexers used for signals Xt and Xt−1 do not add delay because they produce their outputs faster than the r-cell outputs that feed the ps-cells. In the original design of the multiplier, the half adder at position (1, Nx − 4) receives a constant value of 1, and thus, in a fully optimized design, it can be replaced by an inverter producing the sum output. The mean propagation delay ((rising + falling)/2) of an inverter, as given in [23], is 0.6 nsec. In the C-testable design this inverter is replaced by a full adder, whose mean propagation delay is 1.4 nsec. Therefore, for any multiplier size a fixed delay of 0.8 nsec is added to the slowest path. For the 16 × 16 multiplier, implemented in 1 µm using the COMPASS Design Automation Framework [23], the original design has a worst-case delay of 51.66 nsec. The 0.8 nsec increase due to the C-testable design gives a delay overhead of 1.55%, which decreases further for larger multipliers.
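The delay-overhead figure follows from a short calculation. Here $t_{FA}$ and $t_{INV}$ are labels introduced only for illustration, standing for the mean full-adder and inverter delays quoted from [23], and $T_{16\times16}$ is the worst-case delay of the original 16 × 16 design:

$$\Delta t = t_{FA} - t_{INV} = 1.4 - 0.6 = 0.8\ \text{nsec}, \qquad \frac{\Delta t}{T_{16\times16}} = \frac{0.8}{51.66} \approx 1.55\%$$

Because $\Delta t$ is fixed while the worst-case delay grows with the operand sizes, the ratio shrinks for larger multipliers.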

It has been made clear from the analysis of the C-testable design that full testability with respect to CFM is achieved, and that any combinational cell fault is detected, since each cell receives all its input combinations. Therefore, if only single stuck-at faults are considered, which form a subset of the covered set of faults, 100% single stuck-at fault coverage is guaranteed when any non-redundant realization of the basic cells is used. Since an implementation-independent testing strategy is proposed and no specific cell realizations are considered, the proposed test set does not cover sequential faults. However, if a specific realization is considered for the multiplier cells, it is possible, following the method presented in [24] and using only the test vectors given here, to construct a test set which detects all CMOS stuck-open faults.

A comparison of the proposed C-testable design with the designs given in [16, 17] is provided in Table 5. The proposed design has the significant characteristic that it is the only one which can be used under any design environment (full custom, standard cells, etc.), because no specific realization of the multiplier cells has to be used. Another important advantage of the proposed design is that it needs 2 fewer extra primary inputs (1 compared to 3), which is critical in chips with pin-count restrictions. The test set size in all designs is sufficiently small, but full CFM testability is achieved only in the proposed design. The proposed 38 test vectors form a very efficient test set, considering that at least 32 test vectors are needed to test the ps-cells alone under CFM. With the addition of only 6 test vectors, full CFM testability is achieved for the entire multiplier. Finally, the delay overhead imposed in all three designs is small and consists of the transformation of a half adder into a full adder in the slowest path of the multiplier.

Table 5. Comparison of designs.

| Criterion | [16] | [17] | Proposed |
| --- | --- | --- | --- |
| Implementation independent? | No | No | Yes |
| 100% single stuck-at? | Yes | Yes | Yes |
| 100% CFM? | No | No | Yes |
| Extra inputs | 3 | 3 | 1 |
| Test set size | 18 | 50 | 38 |
| Delay overhead | HA → FA | HA → FA | HA → FA |

5. Booth Multipliers with CLA Adders at the Last Stage

The need to speed up the multiplication process in practical implementations of Booth multipliers is the reason for using carry lookahead (CLA) adders at the last stage of the multiplication. This is the motivation for the C-testable design presented in this section. We assume that 4-bit CLA adders are used in the final product-forming adder and that the carry lines of these adders are connected in a ripple manner. The last Nx + 1 bits of the final product are produced by these 4-bit CLA adders, shown in Fig. 4, which also depicts the modifications required for the design to be C-testable. We use CLA adders only for the leftmost bits of the product, because the inputs of these CLA adders are available almost simultaneously, so they can speed up the calculation compared to the ripple carry case (see Fig. 4). Two realizations of CLA adders will be considered.

The first of the two CLA adder realizations is the classical one-level carry lookahead adder, which can be found, for example, in [25]. The fault model used in this section is the CFM for all the cells of the multiplier except the CLA adder cells, which are tested for all single stuck-at faults.

The second realization of the CLA adder is the two-level CLA adder shown in Fig. 5. Each block in Fig. 5 is a simple or complex gate that can be found in every CMOS standard-cell library (the function of each gate is given in the next subsection). When this two-level CLA adder is used, each simple or complex gate of Fig. 5 receives, during testing, every input combination that can appear in normal mode. Exhaustive testing of the 4-bit CLA adders, in either realization, would require the application of all $2^9 = 512$ combinations of their inputs, and the size of the overall test set would increase prohibitively.


Fig. 5. Two-level 4-bit carry lookahead adder (carry input Cin; sum outputs S0 to S3 and carry output Cout).

To justify the need for CLA addition in the last stage of the multiplication, we have implemented, in 1 µm using the COMPASS Design Automation Framework of VLSI Technology, three 16 × 16 bit Booth multipliers, using ripple carry addition and both realizations of CLA adders at the last stage. The worst-case delays of the three implementations were 51.66 nsec, 47.55 nsec and 43.61 nsec, respectively. The use of CLA addition thus gives a 7.95% and a 15.6% speed-up over ripple carry addition.

5.1. Test Sets for the CLA Adders

First, we derive the test sets for a 4-bit one-level and a 4-bit two-level CLA adder under the described fault models.

A. One-Level CLA Adder. The input patterns given in Table 6 were verified to detect every single stuck-at fault in the one-level CLA adder [25]. Fault simulation was performed using the fsim fault simulator [26]. Input patterns Vcla1 to Vcla15 of Table 6 have the property Cout = Cin; since the carry output of each CLA adder drives the carry input of the next, every adder then receives the same Cin, and these patterns can be applied simultaneously to all the CLA adders. Input patterns Vcla16 to Vcla19 are applied to the CLA adders in two steps each, first to the even-numbered CLA adders and then to the odd-numbered CLA adders (the rightmost CLA adder is CLA adder 0).

B. Two-Level CLA Adder. The logic functions implemented by the simple or complex gates of the two-level CLA adder given in Fig. 5 are:

Table 6. Test set for the 4-bit one-level CLA adder.

| Test vector | Inputs | Test vector | Inputs |
| --- | --- | --- | --- |
| Vcla1 | 001100110 | Vcla11 | 001011000 |
| Vcla2 | 010101010 | Vcla12 | 000010110 |
| Vcla3 | 010101011 | Vcla13 | 010010110 |
| Vcla4 | 110011001 | Vcla14 | 010011000 |
| Vcla5 | 001000100 | Vcla15 | 010100110 |
| Vcla6 | 011101111 | Vcla16 | 001010101 |
| Vcla7 | 101010111 | Vcla17 | 100010101 |
| Vcla8 | 101011001 | Vcla18 | 101000101 |
| Vcla9 | 000100110 | Vcla19 | 101010001 |
| Vcla10 | 001010110 | | |

Inputs = A3 B3 A2 B2 A1 B1 A0 B0 Cin.
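Whether a pattern can be applied to all CLA adders at once depends only on the arithmetic function of a 4-bit adder, so the Cout = Cin property can be checked independently of the adder's realization. The Python sketch below decodes each pattern of Table 6 using the bit order of the table footnote; the names `TABLE6`, `decode` and `carry_out` are illustrative, and the script does not perform any fault simulation.

```python
# Table 6 patterns, bit order A3 B3 A2 B2 A1 B1 A0 B0 Cin (as in the footnote).
TABLE6 = [
    "001100110", "010101010", "010101011", "110011001", "001000100",
    "011101111", "101010111", "101011001", "000100110", "001010110",
    "001011000", "000010110", "010010110", "010011000", "010100110",
    "001010101", "100010101", "101000101", "101010001",
]


def decode(pattern: str) -> tuple[int, int, int]:
    """Split a 9-bit pattern into operands A, B (4 bits each) and Cin."""
    bits = [int(ch) for ch in pattern]
    a = (bits[0] << 3) | (bits[2] << 2) | (bits[4] << 1) | bits[6]
    b = (bits[1] << 3) | (bits[3] << 2) | (bits[5] << 1) | bits[7]
    return a, b, bits[8]


def carry_out(pattern: str) -> int:
    """Carry out of the 4-bit addition A + B + Cin."""
    a, b, cin = decode(pattern)
    return (a + b + cin) >> 4


for i, p in enumerate(TABLE6, start=1):
    print(f"Vcla{i}: Cout == Cin -> {carry_out(p) == int(p[-1])}")
```

Run as is, it reports True for Vcla1 to Vcla15 and False for Vcla16 to Vcla19, matching the split between simultaneously applied and two-step patterns described above.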

First-level propagate-generate signals:

$$P_i = A_i \oplus B_i, \qquad G_i = A_i \cdot B_i, \qquad i = 0, \ldots, 3$$

Group propagate-generate signals:

$$GP_i = P_{2i} \cdot P_{2i+1}, \qquad GG_i = G_{2i} \cdot P_{2i+1} + G_{2i+1}, \qquad i = 0, 1$$

Group carry signals:

$$GC_0 = C_{in} \cdot GP_0 + GG_0, \qquad GC_1 = C_{in} \cdot GP_0 \cdot GP_1 + GG_0 \cdot GP_1 + GG_1$$

Carry signals:

$$C_0 = C_{in} \cdot P_0 + G_0, \qquad C_2 = GC_0 \cdot P_2 + G_2$$


Table 7. Test set for the 4-bit two-level CLA adder.

| Test vector | Inputs | Test vector | Inputs |
| --- | --- | --- | --- |
| Vcla'1 | 000000000 | Vcla'14 | 101000000 |
| Vcla'2 | 010101010 | Vcla'15 | 001010110 |
| Vcla'3 | 101010100 | Vcla'16 | 110010101 |
| Vcla'4 | 111111111 | Vcla'17 | 101011001 |
| Vcla'5 | 001100110 | Vcla'18 | 101000101 |
| Vcla'6 | 110011001 | Vcla'19 | 101010001 |
| Vcla'7 | 011101111 | Vcla'20 | 001010101 |
| Vcla'8 | 010101011 | Vcla'21 | 001000101 |
| Vcla'9 | 001000100 | Vcla'22 | 110010100 |
| Vcla'10 | 000010110 | Vcla'23 | 110011000 |
| Vcla'11 | 001011000 | Vcla'24 | 101100100 |
| Vcla'12 | 010011000 | Vcla'25 | 100011001 |
| Vcla'13 | 000010100 | Vcla'26 | 110010001 |

Inputs = A3 B3 A2 B2 A1 B1 A0 B0 Cin.

Sum signals:

$$S_i = P_i \oplus C_{i-1}, \qquad i = 0, \ldots, 3 \qquad (C_{-1} = C_{in})$$
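To make the two-level equations concrete, the sketch below evaluates them directly for a 4-bit adder and compares the result with integer addition over all 2^9 = 512 input combinations. It assumes that C1 = GC0 and that the adder's carry output is Cout = C3 = GC1, which the equations imply but do not state explicitly; the function names are illustrative. It also asserts that a propagate signal and the corresponding generate signal are never both 1, the property of the propagate and generate logic discussed below.

```python
from itertools import product


def cla2(a: int, b: int, cin: int) -> tuple[int, int]:
    """Two-level 4-bit CLA built from the equations above.
    Returns (4-bit sum, carry out); assumes C1 = GC0 and Cout = C3 = GC1."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    P = [A[i] ^ B[i] for i in range(4)]                    # first-level propagate
    G = [A[i] & B[i] for i in range(4)]                    # first-level generate
    GP = [P[2 * i] & P[2 * i + 1] for i in range(2)]       # group propagate
    GG = [(G[2 * i] & P[2 * i + 1]) | G[2 * i + 1] for i in range(2)]
    # Propagate and generate are never 1 together, at either level.
    assert not any(p and g for p, g in zip(P + GP, G + GG))
    GC0 = (cin & GP[0]) | GG[0]
    GC1 = (cin & GP[0] & GP[1]) | (GG[0] & GP[1]) | GG[1]
    C = [(cin & P[0]) | G[0], GC0, (GC0 & P[2]) | G[2], GC1]   # C0..C3
    carries_in = [cin, C[0], C[1], C[2]]                   # C_{i-1}, C_{-1} = Cin
    s = sum((P[i] ^ carries_in[i]) << i for i in range(4))
    return s, C[3]


def matches_adder() -> bool:
    """Compare against integer addition for all 512 input combinations."""
    for a, b, cin in product(range(16), range(16), range(2)):
        total = a + b + cin
        if cla2(a, b, cin) != (total & 0xF, total >> 4):
            return False
    return True


print(matches_adder())  # True
```

This only confirms the functional correctness of the equations; the stuck-at coverage of Table 7 was established by the authors with fault simulation.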

When the input patterns of Table 7 are applied to every two-level CLA adder, each of the modules of this adder (Fig. 5) receives all the different input combinations that can appear in normal mode. The missing combinations are due to the nature of the propagate and generate logic: in both normal and test mode, the propagate and generate signals of either the first level (Pi, Gi) or the second level (GPi, GGi) can never be equal to 1 simultaneously. For straightforward two-level AND-OR implementations of the functions, the missing combinations do not impact testability, and all stuck-at faults are detected, as was verified using fsim. Additionally, the effect of a fault in one complex gate is propagated to the outputs of the CLA adder to which it belongs. Input patterns Vcla'1 to Vcla'17 are applied simultaneously to all the two-level CLA adders, since they have the property Cout = Cin, whereas input patterns Vcla'18 to Vcla'26 are applied in two steps each, since they have the property Cout ≠ Cin.

5.2. C-Testable Design

The C-testable design is shown in Fig. 4. Three extra primary inputs besides T1 are required, denoted T2, T3 and T4, and they serve the following purposes. During the normal mode of operation, some inputs of full adders of the leftmost diagonal and the top row of the multiplier are constantly set to logic 1. During the application of some of the input patterns Vcla1 to Vcla19 (one-level CLA) or Vcla'1 to Vcla'26 (two-level CLA), these inputs must be set to logic 0. This is accomplished using the extra input T2. The other two extra primary inputs, T3 and T4, are used to invert (through XOR gates) bit Nx + 1 of the partial sums of rows $\lceil N_y/2\rceil - 2$ and $\lceil N_y/2\rceil - 3$, respectively, and make the test set size of the multiplier independent of its operand lengths.

Table 8. Test set for the one-level CLA adders (Part 1).

| Test | Y | X | E | T | Comment |
| --- | --- | --- | --- | --- | --- |
| t39 | (0)*10 | 10 | 1 00 11 00 | 1000 | Vcla1 to all CLAs |
| t40 | 00(10)* | 10 | 0 01 01 01 | 1000 | Vcla2 to all CLAs |
| t41 | (0)*0101 | 01 | 1 10 10 10 | 1000 | Vcla3 to all CLAs |
| t42 | (0)*01 | 00 | 0 00 00 01 | 1000 | Vcla4 to all CLAs |
| t43 | (0)* | 00 | 0 10 00 10 | 1000 | Vcla5 to all CLAs |
| t44 | (01)* | 00 | 0 11 00 11 | 1000 | Vcla6 to all CLAs |

T = T1 T2 T3 T4.

The test vectors given in Tables 8 and 9 apply the input patterns of Table 6 to all CLA adders when the one-level CLA adder is used. The test vectors of Table 8 are applied with T1 = 1 and, therefore, the abstraction of the rps-cells is valid in this case (see Fig. 4). Table 8 contains the values of signals Xt, Xt−1, E00,1, E01,1, E01,2, E10,1, E10,2, E11,1 and E11,2 and is applicable for all values Nx > 9. The test vectors of Table 9 are applied with T1 = 0 and, therefore, the rps-cell abstraction does not apply in this case. The test vectors of Table 9 are given for all the different forms of Nx (Nx = 4k − i, i = 0, 1, 2, 3, k ≥ 3). When "any" is stated in the Nx column, the test vector is given for the case Nx = 4k, and for the cases Nx = 4k − i (i = 1, 2, 3) the test vector is the same, provided that 1, 2 or 3 bits, respectively, are truncated from the most significant (right) end of the X operand.
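The Y and X entries of Tables 9, 11 and 12 use a compact notation in which a parenthesized group followed by * is repeated to fill the operand. The sketch below expands such strings under the assumption that the starred group is repeated exactly as often as needed to reach the requested length; the `expand` helper is hypothetical. Applied to the X entries of t46 in Table 9, it shows that the Nx = 4k − 1 and Nx = 4k − 3 rows equal the Nx = 4k row with its rightmost (most significant) bits removed, consistent with X being written X0 ... XNx−1.

```python
import re

# prefix (group)* suffix, e.g. "0(0010)*001" or "(0)*101011"
PATTERN = re.compile(r"^([01]*)\(([01]+)\)\*([01]*)$")


def expand(pattern: str, n: int) -> str:
    """Expand a starred test-vector string to exactly n bits (hypothetical helper)."""
    m = PATTERN.match(pattern)
    if m is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    prefix, group, suffix = m.groups()
    free = n - len(prefix) - len(suffix)
    if free < 0 or free % len(group) != 0:
        raise ValueError(f"{pattern!r} cannot be expanded to {n} bits")
    return prefix + group * (free // len(group)) + suffix


# X operand of t46 for Nx = 12 (k = 3), written X0 ... X11 from left to right.
x_4k = expand("0(0010)*001", 12)
x_4k_minus_1 = expand("0(0010)*00", 11)
x_4k_minus_3 = expand("0(0010)*", 9)
print(x_4k, x_4k[:-1] == x_4k_minus_1, x_4k[:-3] == x_4k_minus_3)
```

Leading-zero Y entries such as (0)*101011 fall out of the same rule, with an empty prefix and the group "0" padding the operand on the left.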

For two-level CLA adders, the test vectors given in Tables 10, 11 and 12 apply the input patterns of Table 7 to all the CLA adders of the multiplier. The test vectors of Table 10 are applied with T1 = 1 (the rps-cell abstraction is valid). The test vectors of Tables 11 and 12 are applied with T1 = 0 (the rps-cell abstraction is not valid) and are given for all the different values of Nx. Again, "any" in the Nx column has the same meaning as in the one-level case.

Table 9. Test set for the one-level CLA adders (Part 2).

| Test | Nx | Y | X | T | Comment |
| --- | --- | --- | --- | --- | --- |
| t45 | any | (0)*011010 | 1(1011)*101 | 0100 | Vcla7 to all CLAs |
| t46 | 4k | (0)*101011 | 0(0010)*001 | 0111 | Vcla8 to all CLAs |
| | 4k − 1 | (0)*101011 | 0(0010)*00 | 0111 | |
| | 4k − 2 | (0)*101011 | 0(0010)*0 | 0111 | |
| | 4k − 3 | (0)*101011 | 0(0010)* | 0110 | |
| t47 | 4k | (0)*010011 | 0(0110)*011 | 0100 | Vcla9 to all CLAs |
| | 4k − 1 | (0)*010011 | 0(0110)*01 | 0100 | |
| | 4k − 2 | (0)*010011 | 0(0110)*0 | 0110 | |
| | 4k − 3 | (0)*010011 | 0(0110)*0111 | 0111 | |
| t48 | any | (0)*01111011 | 1(1110)*111 | 0100 | Vcla10 to all CLAs |
| t49 | any | (0)*1001 | 0(0010)*000 | 0100 | Vcla11 to all CLAs |
| t50 | 4k | (0)*1001 | 0(0100)*010 | 0100 | Vcla12 to all CLAs |
| | 4k − 1 | (0)*1001 | 0(0100)*01 | 0100 | |
| | 4k − 2 | (0)*1001 | 0(0100)*0 | 0110 | |
| | 4k − 3 | (0)*1001 | 0(0100)* | 0100 | |
| t51 | 4k | (0)*101010 | 0(0011)*001 | 0100 | Vcla13 to all CLAs |
| | 4k − 1 | (0)*101010 | 0(0011)*01 | 0100 | |
| | 4k − 2 | (0)*101010 | 0(0011)*01100 | 0110 | |
| | 4k − 3 | (0)*101010 | 0(0011)*0110 | 0100 | |
| t52 | 4k | (0)*010011 | 0(0011)*001 | 0100 | Vcla14 to all CLAs |
| | 4k − 1 | (0)*010011 | 0(0011)*00 | 0100 | |
| | 4k − 2 | (0)*010011 | 0(0011)*0 | 0111 | |
| | 4k − 3 | (0)*010011 | 0(0011)* | 0100 | |
| t53 | 4k | (0)*10001101 | 0(1000)*100 | 0100 | Vcla15 to all CLAs |
| | 4k − 1 | (0)*10001101 | 0(1000)*10 | 0101 | |
| | 4k − 2 | (0)*10001101 | 0(1000)*1 | 0110 | |
| | 4k − 3 | (0)*10001101 | 0(1000)* | 0111 | |
| t54 | 4k | (0)*1010000011 | 0(01000000)*010 | 0100 | Vcla16 to even CLAs |
| | 4k − 1 | (0)*1010000011 | 0(01100000)*01 | 0100 | |
| | 4k − 2 | (0)*1010000011 | 0(11000000)*1 | 0100 | |
| | 4k − 3 | (0)*1010000011 | 0(01100000)* | 0100 | |
| t55 | 4k | (0)*1000001011 | 0(01000000)*010 | 0100 | Vcla16 to odd CLAs |
| | 4k − 1 | (0)*1000001011 | 0(10000000)*10 | 0100 | |
| | 4k − 2 | (0)*011111 | 0(10000000)*1 | 0110 | |
| | 4k − 3 | (0)*0111110100 | 1(10111111)* | 0100 | |
| t56 | 4k | (0)*111100 | 0(11011111)*011 | 0100 | Vcla17 to even CLAs |
| | 4k − 1 | (0)*111100 | 0(11011111)*01 | 0100 | |
| | 4k − 2 | (0)*111100 | 0(11011111)*0 | 0100 | |
| | 4k − 3 | (0)*111100 | 0(11011111)* | 0101 | |
| t57 | any | (0)*111100 | 1(11111101)*111 | 0100 | Vcla17 to odd CLAs |
| t58 | any | (0)*111100 | 1(10111111)*111 | 0100 | Vcla18 to even CLAs |
| t59 | any | (0)*111100 | 1(11111011)*111 | 0100 | Vcla18 to odd CLAs |
| t60 | 4k | (0)*111100 | 1(01111111)*011 | 0100 | Vcla19 to even CLAs |
| | 4k − 1 | (0)*111100 | 1(01111111)*01 | 0100 | |
| | 4k − 2 | (0)*111100 | 1(01111111)*0 | 0101 | |
| | 4k − 3 | (0)*111100 | 1(01111111)* | 0101 | |
| t61 | any | (0)*111100 | 1(11110111)*111 | 0100 | Vcla19 to odd CLAs |

X = X0 ... XNx−1.


Table 10. Test set for the two-level CLA adders (Part 1).

| Test | Y | X | E | T | Comment |
| --- | --- | --- | --- | --- | --- |
| t39 | (0)* | 00 | 0 00 00 00 | 1000 | Vcla'1 to all CLAs |
| t40 | 00(10)* | 10 | 0 01 01 01 | 1000 | Vcla'2 to all CLAs |
| t41 | (0)*10 | 11 | 0 00 00 00 | 1000 | Vcla'3 to all CLAs |
| t42 | (10)* | 11 | 1 11 11 11 | 1000 | Vcla'4 to all CLAs |
| t43 | (0)*10 | 10 | 1 00 11 00 | 1000 | Vcla'5 to all CLAs |
| t44 | (0)*01 | 00 | 0 00 00 01 | 1000 | Vcla'6 to all CLAs |
| t45 | (01)* | 00 | 0 11 00 11 | 1000 | Vcla'7 to all CLAs |
| t46 | (0)*0101 | 01 | 1 10 10 10 | 1000 | Vcla'8 to all CLAs |
| t47 | (0)* | 00 | 0 10 00 10 | 1000 | Vcla'9 to all CLAs |

X = Xt Xt−1 = X1 X0.

If the length of the X operand (Nx) is not a multiple of 4, then the leftmost CLA adder is a 3-bit CLA adder, a 2-bit CLA adder or a single full adder, and it receives the test set of Table 6 or Table 7 with the leftmost 2, 4 or 6 bits, respectively, truncated. In all these cases it was verified that this adder is tested for the specified fault model, for either CLA adder realization. The extra hardware required in the C-testable design of this section is very simple and can easily be verified to be fully tested.

When the one-level CLA adder is employed, test vectors t39 to t61 of Tables 8 and 9, together with the 38 test vectors t1 to t38 given in Tables 2, 3 and 4, constitute a C-test set of 61 test vectors for the modified-Booth multiplier. In the case of the two-level CLA adder, test vectors t39 to t73 of Tables 10 to 12, together with t1 to t38, form a C-test set of 73 test vectors for the multiplier. In both cases test vectors t1 to t38 are applied to the multiplier with T2 = T3 = T4 = 0.

If the faulty cell is one of the cells common to both designs, then the results of the discussion on fault propagation to a primary output for the first C-testable design are also valid for the second one. If the faulty cell is one of the CLA adders and the fault results in at least one faulty sum output, then the fault is detected, since the sum outputs of the CLA adders are primary outputs of the multiplier. If the fault results only in a faulty carry-out output, then the fault is propagated to the least significant sum output of the next CLA adder and is detected.

Table 11. Test set for the two-level CLA adders (Part 2).

| Test | Nx | Y | X | T | Comment |
| --- | --- | --- | --- | --- | --- |
| t48 | 4k | (0)*1001 | 0(0100)*010 | 0100 | Vcla'10 to all CLAs |
| | 4k − 1 | (0)*1001 | 0(0100)*01 | 0100 | |
| | 4k − 2 | (0)*1001 | 0(0100)*0 | 0110 | |
| | 4k − 3 | (0)*1001 | 0(0100)* | 0100 | |
| t49 | any | (0)*1001 | 0(0010)*000 | 0100 | Vcla'11 to all CLAs |
| t50 | 4k | (0)*010011 | 0(0011)*001 | 0100 | Vcla'12 to all CLAs |
| | 4k − 1 | (0)*010011 | 0(0011)*00 | 0100 | |
| | 4k − 2 | (0)*010011 | 0(0011)*0 | 0111 | |
| | 4k − 3 | (0)*010011 | 0(0011)* | 0100 | |
| t51 | any | (0)*10 | 0(1100)*110 | 0100 | Vcla'13 to all CLAs |
| t52 | any | (0)*10 | 0(0011)*001 | 0100 | Vcla'14 to all CLAs |
| t53 | any | (0)*01111011 | 1(1110)*111 | 0100 | Vcla'15 to all CLAs |
| t54 | 4k | (0)*000011 | 0(1000)*100 | 0100 | Vcla'16 to all CLAs |
| | 4k − 1 | (0)*000011 | 0(1000)*10 | 0100 | |
| | 4k − 2 | (0)*000011 | 0(1000)*1 | 0111 | |
| | 4k − 3 | (0)*000011 | 0(1000)* | 0111 | |
| t55 | 4k | (0)*000011 | 0(0010)*001 | 0101 | Vcla'17 to all CLAs |
| | 4k − 1 | (0)*000011 | 0(0010)*00 | 0111 | |
| | 4k − 2 | (0)*000011 | 0(0010)*0 | 0101 | |
| | 4k − 3 | (0)*000011 | 0(0010)* | 0110 | |

X = X0 ... XNx−1.

Table 12. Test set for the two-level CLA adders (Part 3).

| Test | Nx | Y | X | T | Comment |
| --- | --- | --- | --- | --- | --- |
| t56 | any | (0)*111100 | 1(10111111)*111 | 0100 | Vcla'18 to even CLAs |
| t57 | any | (0)*111100 | 1(11111011)*111 | 0100 | Vcla'18 to odd CLAs |
| t58 | 4k | (0)*111100 | 1(01111111)*011 | 0100 | Vcla'19 to even CLAs |
| | 4k − 1 | (0)*111100 | 1(01111111)*01 | 0100 | |
| | 4k − 2 | (0)*111100 | 1(01111111)*0 | 0101 | |
| | 4k − 3 | (0)*111100 | 1(01111111)* | 0101 | |
| t59 | any | (0)*111100 | 1(11110111)*111 | 0100 | Vcla'19 to odd CLAs |
| t60 | 4k | (0)*1010000011 | 0(01000000)*010 | 0100 | Vcla'20 to even CLAs |
| | 4k − 1 | (0)*1010000011 | 0(01100000)*01 | 0100 | |
| | 4k − 2 | (0)*1010000011 | 0(11000000)*1 | 0100 | |
| | 4k − 3 | (0)*1010000011 | 0(01100000)* | 0100 | |
| t61 | 4k | (0)*1000001011 | 0(01000000)*010 | 0100 | Vcla'20 to odd CLAs |
| | 4k − 1 | (0)*1000001011 | 0(10000000)*10 | 0100 | |
| | 4k − 2 | (0)*011111 | 0(10000000)*1 | 0110 | |
| | 4k − 3 | (0)*0111110100 | 1(10111111)* | 0100 | |
| t62 | 4k | (0)*10000011 | 0(01000000)*010 | 0100 | Vcla'21 to even CLAs |
| | 4k − 1 | (0)*01111110 | 1(01111111)*10 | 0100 | |
| | 4k − 2 | (0)*01111110 | 1(01111111)*0 | 0100 | |
| | 4k − 3 | (0)*10000011 | 0(01100000)* | 0100 | |
| t63 | 4k | (0)*1011 | 0(01010100)*010 | 0110 | Vcla'21 to odd CLAs |
| | 4k − 1 | (0)*1011 | 0(01010100)*01 | 0110 | |
| | 4k − 2 | (0)*1011 | 0(01010100)*0 | 0110 | |
| | 4k − 3 | (0)*1011 | 0(01010100)* | 0111 | |
| t64 | any | (0)*111100 | 1(11111101)*111 | 0100 | Vcla'22 to even CLAs |
| t65 | 4k | (0)*1011 | 0(10000000)*100 | 0100 | Vcla'22 to odd CLAs |
| | 4k − 1 | (0)*1011 | 0(10000000)*10 | 0100 | |
| | 4k − 2 | (0)*100011 | 0(10000000)*1 | 0110 | |
| | 4k − 3 | (0)*100011 | 0(10000000)* | 0110 | |
| t66 | any | (0)*101011 | 1(00001000)*000 | 0100 | Vcla'23 to even CLAs |
| t67 | 4k | (0)*101011 | 0(10000000)*100 | 0100 | Vcla'23 to odd CLAs |
| | 4k − 1 | (0)*101011 | 0(10000000)*10 | 0100 | |
| | 4k − 2 | (0)*101011 | 0(10000000)*1 | 0111 | |
| | 4k − 3 | (0)*101011 | 0(10000000)* | 0111 | |
| t68 | any | (0)*111100 | 0(11111011)*111 | 0100 | Vcla'24 to even CLAs |
| t69 | 4k | (0)*1100 | 0(10000000)*100 | 0100 | Vcla'24 to odd CLAs |
| | 4k − 1 | (0)*1100 | 0(10000000)*10 | 0100 | |
| | 4k − 2 | (0)*1100 | 0(10000000)*1 | 0110 | |
| | 4k − 3 | (0)*1100 | 0(10000000)* | 0110 | |
| t70 | 4k | (0)*1011 | 0(00100000)*100 | 0100 | Vcla'25 to even CLAs |
| | 4k − 1 | (0)*1011 | 0(00100000)*00 | 0111 | |
| | 4k − 2 | (0)*1011 | 0(00100000)*1 | 0110 | |
| | 4k − 3 | (0)*1011 | 0(00100000)* | 0110 | |
| t71 | any | (0)*110100 | 0(11111101)*111 | 0100 | Vcla'25 to odd CLAs |
| t72 | 4k | (0)*10001011 | 0(10000000)*100 | 0100 | Vcla'26 to even CLAs |
| | 4k − 1 | (0)*10001011 | 0(10000000)*10 | 0100 | |
| | 4k − 2 | (0)*10001011 | 0(10000000)*0 | 0110 | |
| | 4k − 3 | (0)*10001011 | 0(10000000)* | 0101 | |
| t73 | any | (0)*1100 | 0(11110111)*010 | 0110 | Vcla'26 to odd CLAs |

X = X0 ... XNx−1.


5.3. Evaluation of the Design

If only single stuck-at faults are considered, which form only a subset of the covered set of faults, 100% coverage is obtained when either one-level or two-level CLA adders are used.

The one-level carry lookahead adder given in [25] is equal to 77 gate equivalents, and the two-level carry lookahead adder given in Fig. 5 is equal to 71 gate equivalents. The number of CLA adders required for an Nx × Ny multiplier is $\lceil N_x/4\rceil$. The total number of gate equivalents for the original design of the multiplier which uses CLA addition is (Q denotes the number of gate equivalents of the 4-bit CLA adder):

$$H = 22N_x\lceil N_y/2\rceil - 15N_x + 10N_y + 22\lceil N_y/2\rceil + Q\lceil N_x/4\rceil - 5$$

(The final adder consists, from right to left, of 2 half adders, $N_y - 3$ full adders and $\lceil N_x/4\rceil$ 4-bit CLA adders.)

The extra hardware in the C-testable design is equal to

$$EH_1 = 11N_x + 8\lceil N_y/2\rceil - 2$$

gate equivalents when r-cells/ps-cells are used, and

$$EH_2 = 11N_x + 6\lceil N_y/2\rceil + N_x\lceil N_y/2\rceil - 2$$

when r'-cells/ps'-cells are used. The above formulas give $HOv_1 = EH_1/H$ when r-cells/ps-cells are used, and $HOv_2 = EH_2/H$ when r'-cells/ps'-cells are used. For a 32 × 32 multiplier with one-level CLA adders (Q = 77), HOv1 = 3.96% and HOv2 = 7.94%; with two-level CLA adders (Q = 71), HOv1 = 3.98% and HOv2 = 7.97%. For a 64 × 64 multiplier with one-level CLA adders, HOv1 = 2.05% and HOv2 = 6.30%; with two-level CLA adders, HOv1 = 2.06% and HOv2 = 6.32%. In this case too, the hardware overhead decreases as the sizes of the two multiplier operands increase.
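The square-multiplier figures above can be reproduced with the following sketch of the CLA-based formulas. It assumes $\lceil N_x/4\rceil$ CLA adders of Q gate equivalents each and uses the same gate-equivalent counting as the ripple-carry case; the function names are illustrative.

```python
import math


def h_cla(nx: int, ny: int, q: int) -> int:
    """Gate equivalents of the original multiplier with a CLA-based final adder.
    q: gate equivalents of one 4-bit CLA adder (77 one-level, 71 two-level)."""
    r = math.ceil(ny / 2)
    return 22 * nx * r - 15 * nx + 10 * ny + 22 * r + q * math.ceil(nx / 4) - 5


def overheads_cla(nx: int, ny: int, q: int) -> tuple[float, float]:
    """Return (HOv1, HOv2) in percent."""
    r = math.ceil(ny / 2)
    eh1 = 11 * nx + 8 * r - 2                 # r-cells / ps-cells
    eh2 = 11 * nx + 6 * r + nx * r - 2        # r'-cells / ps'-cells
    h = h_cla(nx, ny, q)
    return 100.0 * eh1 / h, 100.0 * eh2 / h


for n in (32, 64):
    for q in (77, 71):
        hov1, hov2 = overheads_cla(n, n, q)
        print(f"{n}x{n}, Q={q}: HOv1={hov1:.2f}%  HOv2={hov2:.2f}%")
```

Each printed pair matches the corresponding percentage pair quoted for the 32 × 32 and 64 × 64 multipliers.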

Again, whenever Nx ≠ Ny we should choose Nx < Ny, since the multiplier is then faster and the hardware overhead is smaller than in the case Nx > Ny. For example, for a 32 × 64 multiplier with one-level CLA adders, HOv1 = 2.46% and HOv2 = 6.36%. For the 64 × 32 multiplier, HOv1 = 3.63% and HOv2 = 7.97%.

The results of the discussion concerning the delay overhead of the C-testable design with a ripple carry adder are also valid for the C-testable design with CLA adders. The delay overhead imposed by the design is negligible.

6. Conclusion

Various C-testable Booth multipliers have already been proposed, but they are restricted to specific silicon-compiler-based design environments [16, 17]. In this paper we have, for the first time, presented C-testable Booth multipliers for standard-cell-based design environments, where the specific design of the cells is unknown. Thus, the Cell Fault Model has been adopted. Specifically, two C-testable designs for the modified-Booth multiplier with 2-bit recoding have been presented. The first assumes the use of ripple carry adders at the last stage of the multiplication. This design requires the addition of 1 extra primary input and is C-testable with only 38 test vectors. The second C-testable design assumes that CLA adders are used at the last stage of the multiplication, which is the case in most practical implementations of modified-Booth multipliers. This second C-testable design requires the addition of 4 extra primary inputs. Two popular realizations of the CLA adder are explored, namely the one-level and the two-level CLA adders. The C-test set size is 61 and 73 test vectors for the one-level and the two-level CLA adder realizations, respectively. In both C-testable designs, the hardware and delay overheads are very small and decrease as the sizes of the multiplier operands increase.

References

1. W.H. Kautz, "Testing for Faults in Cellular Logic Arrays," Proc. 8th Annu. Symp. Switching and Automata Theory, 1967, pp. 161-174.

2. A.D. Friedman, "Easily Testable Iterative Systems," IEEE Trans. on Computers, Vol. C-22, pp. 1061-1064, Dec. 1973.

3. R. Parthasarathy and S.M. Reddy, "A Testable Design of Iterative Logic Arrays," IEEE Trans. on Computers, Vol. C-30, pp. 833-841, Nov. 1981.

4. S.C. Seth, "Fault Diagnosis of Combinational Cellular Arrays," Proc. 7th Annu. Conf. Circuits Syst. Theory, Oct. 1969, pp. 272-283.

5. P.R. Menon and A.D. Friedman, "Fault Detection in Iterative Logic Arrays," IEEE Trans. on Computers, Vol. C-20, No. 5, pp. 524-535, May 1971.

6. C.H. Sung and C.L. Coates, "Tessellation Aspects of Combinational Cellular Array Testing," IEEE Trans. on Computers, Vol. C-23, No. 4, pp. 363-368, Apr. 1974.

7. H. Elhuni, A. Vergis, and L. Kinney, "C-Testability of Two Dimensional Iterative Arrays," IEEE Trans. on CAD, Vol. CAD-5, No. 4, pp. 573-581, Oct. 1986.

8. A. Chatterjee and J. Abraham, "Test Generation for Iterative Logic Arrays Based on an N-Cube of Cell States Model," IEEE Trans. on Computers, Vol. C-40, No. 10, pp. 1133-1148, Oct. 1991.

9. J.P. Shen and F.J. Ferguson, "The Design of Easily Testable VLSI Array Multipliers," IEEE Trans. on Computers, Vol. C-33, No. 6, pp. 554-560, June 1984.

10. A. Chatterjee and J.A. Abraham, "Test Generation for Arithmetic Units by Graph Labeling," in Proc. FTCS 17, July 1987, pp. 284-289.

11. S.J. Hong, "An Easily Testable Parallel Multiplier," in Proc. FTCS 18, June 1988, pp. 214-219.

12. A.R. Takach and N.K. Jha, "Easily Testable Gate Level and DCVS Multipliers," IEEE Trans. on CAD, Vol. 10, No. 7, pp. 932-942, July 1991.

13. A.D. Booth, "A Signed Binary Multiplication Technique," Q. J. Mech. Appl. Math., 4, pp. 260-264, Apr. 1951.

14. L.P. Rubinfield, "A Proof of the Modified Booth Algorithm for Multiplication," IEEE Trans. on Computers, Vol. C-24, pp. 1014-1015, Oct. 1975.

15. M. Annaratone, Digital CMOS Circuit Design, Kluwer Academic Publishers, Boston, 1986.

16. R. Stans, "The Testability of a Modified Booth Multiplier," in Proc. of 1st European Test Conf., Apr. 1989, pp. 286-293.

17. J. van Sas, C. Nowe, D. Pollet, F. Catthoor, R. Vanoostende, and H. De Man, "Design of a C-testable Booth Multiplier Using a Realistic Fault Model," Journal of Electronic Testing: Theory and Applications, Vol. 5, No. 1, pp. 29-41, Feb. 1994.

18. W.A.J. Waller and S.M. Aziz, "A C-testable Parallel Multiplier Using Differential Cascode Voltage Switch (DCVS) Logic," IFIP Trans. A, Vol. A-42, pp. 133-142, 1994.

19. J. Huisken, H. Janseen, E. Lippens, O. McArdle, R. Segers, R. Zegers, A. Delaruelle, and J. van Meerbergen, "Design of DSP Systems Using the Piramid Library and Design Tools," MCNC Logic Synthesis Workshop, 1988.

20. H. De Man, F. Catthoor, G. Goossens, J. Vanhoof, J. van Meerbergen, and J. Huisken, "Architecture Driven Synthesis Techniques for Mapping Digital Signal Processing Algorithms into Silicon," Proceedings of the IEEE, Vol. 78, Feb. 1990, pp. 319-336.

21. D. Gizopoulos, D. Nikolos, A. Paschalis, and R. Kostarakis, "C-testable Multipliers Based on the Modified Booth Algorithm," in Proc. 3rd Asian Test Symp., Nov. 1994, pp. 163-168.

22. X. Huang, B. Wei, H. Chen, and Y. Mao, "High-Performance VLSI Multiplier with a New Redundant Binary Coding," Journal of VLSI Signal Processing, Vol. 3, pp. 283-291, 1991.

23. VLSI Technology, Inc., "1.0 Micron CMOS VSC370 Portable Library," Rev. 2, Apr. 1991.

24. D. Gizopoulos, D. Nikolos, and A. Paschalis, "Testing Combinational Iterative Logic Arrays for Realistic Faults," in Proc. of 13th VLSI Test Symp., May 1995, pp. 35-40.

25. M.M. Mano, Digital Design, Prentice-Hall Inc., 1984.

26. H.K. Lee and D.S. Ha, "An Efficient Forward Fault Simulation Based on the Parallel Pattern Single Fault Propagation," in Proc. of International Test Conf., Oct. 1991, pp. 946-955.

Dimitris Gizopoulos was born in Serres, Greece, in 1969. He graduated from the Department of Computer Engineering and Informatics, University of Patras, Greece, in November 1992. He is currently completing his work towards the Ph.D. degree in the Department of Informatics, University of Athens, under a research assistantship from the Institute of Informatics and Telecommunications, NCSR "Demokritos". He has been working at the Institute of Informatics and Telecommunications of NCSR "Demokritos" since 1992 as a researcher in R. & D. projects in the area of VLSI design and testing, and is co-author of more than 10 technical publications. His research interests include VLSI design and testing, computer architecture and fault-tolerant computing.

Dimitris Nikolos was born in Arta, Greece. He received the B.Sc. degree in Physics in 1979, the M.Sc. degree in Electronics in 1981 and the Ph.D. degree in Computer Science in 1985, all from the University of Athens, Greece. From November 1981 to June 1985 he worked in the Computer Department of NCSR "Demokritos" under a scholarship from the Hellenic Atomic Energy Commission. Since June 1985 he has cooperated with the Computers Laboratory of the University of Athens and the Institute of Informatics and Telecommunications of NCSR "Demokritos". In April 1989 he joined the Department of Computer Engineering and Informatics, University of Patras, as a Lecturer, where he is now Associate Professor. His main research interests are fault-tolerant computing, computer architecture, VLSI circuit design, test and design for testability,

Antonis Paschalis was born in Athens, Greece, in 1960. He received the B.Sc. degree in physics in 1983, and the M.Sc. and Ph.D. degrees in computer science in 1986 and 1987, respectively, all from the University of Athens, Greece. From 1983 to 1987 he worked on his Ph.D. degree at the Institute of Informatics and Telecommunications of NCSR "Demokritos" under a research assistantship. In 1988 he joined the research personnel of the Institute of Telecommunications and Informatics of NCSR "Demokritos", where he is the head of the VLSI design and testing group and the project leader of various R. & D. projects in this area. He is author or co-author of more than 40 publications in the area of Testing (Design for Testability, Built-In Self Test, Automatic Test Generation, Self-Checking and Conformance Testing). Dr Paschalis is a member of the Editorial Board of the Journal of Electronic Testing: Theory and Applications and has served the test community by participating in several Program Committees and in the European Test Technology Technical Committee Support Group.

Constantin Halatsis has been a Full Professor at the Department of Informatics of the University of Athens since 1987. He was also the Director of the Institute of Informatics and Telecommunications of NCSR Demokritos from 1988 to 1993. His research interests include Computer Architecture, Artificial Intelligence, Logic Programming, Constraints-based Programming, Computer Networks, Multimedia/Hypermedia Systems and Parallel Computing. He has been the UoA project leader in various projects funded by the European Union (EDS, APPLAUSE, PARACHUTE). He has been an invited speaker and a member of organizing committees of various international conferences, a referee for scientific journals and conferences, and an evaluator/reviewer of various EU-funded projects. He is author/co-author of more than 90 technical publications in refereed scientific journals and international conference proceedings.
