
Concurrent testing in high level synthesis


A.A. Ismaeel *, R. Bhatnagar, R. Mathew

Department of Electrical and Computer Engineering, College of Engineering and Petroleum, Kuwait University, P.O. Box 5969, Safat 13060, Kuwait

Received 8 February 2000; received in revised form 21 March 2000

Abstract

A new methodology to incorporate concurrent testing in high level synthesis is presented. Optimization techniques in VLSI designs tend to reduce the idle time of resources, in which case the proposed methodology is found extremely useful. The objective is to test each functional unit (FU) of a circuit under test (CUT) at least once in a time frame called a pass. Testing is performed continuously by repeating the pass. We carry out the testing by shifting out selected variables of the CUT to an external testing unit for verification. An additional pin is employed to shift out the variables. Testing time is reduced by minimizing the number of variables that need to be shifted out, which is achieved by an FU allocation technique. The FU allocation takes a given scheduled data flow graph as input. The proposed testing methodology and FU allocation technique are presented, along with results of applying the technique to different benchmark examples. © 2000 Elsevier Science Ltd. All rights reserved.

1. Introduction

High level synthesis compiles the behavioral description of a digital system into a data path. The behavioral description explains the sequence in which a set of operations is executed. The data path is composed of resources such as functional units (FUs), registers and multiplexers. Recent developments in VLSI circuits have significantly increased the logic density on chips. One of the important aspects affected by these complex designs is testability.

Several design for testability (DFT) techniques have been devised. These techniques can be off-line or on-line. In off-line testing, normal operation of the circuit under test (CUT) is interrupted for testing. This is a serious problem for systems that run continuously, such as production control systems, air traffic control systems, communication control systems, etc. On the other hand, concurrently testable circuits can be tested without affecting the normal operation of the CUT. Concurrent testing can raise an alarm sooner after a fault develops in the system. It anticipates the performance degradation of a system, thus improving quality and reliability [1]. Concurrent testing detects not only permanent faults but also transient and intermittent faults [2,3]. It yields reduced system maintenance requirements and enhanced diagnostic capabilities.

Synthesis of concurrently testable circuits is gaining importance as a branch of high level synthesis. Two major constraints for testing are the time required to detect a fault and the area overhead. Concurrent testing can reduce the time requirement; however, this depends on the chip complexity and the extra circuitry added for testing. The area overhead is caused by the on-chip circuitry that provides test patterns and analyzes the output response. This area overhead also adds to the complexity of the system. Our goal is to minimize the detection time of faults with a minimum impact on area.

The behavioral description is written in VHDL or any other procedural language. The description explains the sequence in which a particular operation is executed. These sequences form control steps. The cost of a system mainly depends on the hardware resources used, such as adders, multipliers, logical operators, etc. Sharing the resources in their idle time can reduce the system cost. Scheduling algorithms are available that utilize the idle time to share the resources by scheduling an operation in feasible control steps. The schedule obtained is called a scheduled data flow graph (SDFG). A data path can be generated from an SDFG by binding operations and variables to FUs and registers, respectively. These bindings are achieved by FU allocation and register allocation. The FUs and registers are interconnected through multiplexers in the data path. This interconnection is achieved by multiplexer allocation.

Microelectronics Reliability 40 (2000) 2095–2106

www.elsevier.com/locate/microrel

* Corresponding author. E-mail address: [email protected] (A.A. Ismaeel).

0026-2714/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved.

PII: S0026-2714(00)00028-7

Recently, researchers have worked on the concept of concurrent testing in high level synthesis. Singh et al., Flottes et al., Harris et al. and Swaminathan et al. have presented approaches that utilize idle time [4–7]. However, area minimization in VLSI designs demands minimization of resources and, hence, an even distribution of operations of the same type over the possible control steps. This leads to reduced idle time (i.e., a tight schedule), and hence the above-mentioned approaches lose their applicability.

Saluja et al. addressed a concurrent comparative built-in self-test approach for testing combinational circuits [2,3]. In this approach, concurrent testing was achieved by modifying the off-line test resources and observing the normal inputs and outputs of the CUT. Sharma et al. and Sun et al. proposed conventional duplication of the CUT, comparing the response of the CUT with that of its duplicate [8,9]. However, all these approaches have been applied to the structural description (post-synthesis). Post-synthesis application results in more hardware test resources and area overhead.

The approaches discussed above have the disadvantage of either poor applicability to a tight schedule or being post-synthesis. We also presented an FU allocation approach earlier, in which we assumed that idle time exists [10]. In this paper, we propose a method that overcomes the above-mentioned disadvantages. The present approach is more generic and does not rely on the existence of idle time. Also, it is applied to a given SDFG (pre-synthesis).

The operations are performed sequentially according to their occurrence in the control steps (first through last) of an SDFG. We call the duration between the first and last control step a cycle. We assume that there is no functional pipelining. Thus, each cycle begins with a fresh set of inputs in the first control step. The cycle completes when all the operations in the last control step have been executed. Then the next cycle begins.

The testing is carried out by capturing selected input and output variables of the FUs of the CUT. We employ special registers called concurrent testing registers (CTRs) to capture and hold these inputs and outputs. These registers do not affect the normal operation of the circuit. Note that these registers are different from the normal registers that hold data for the normal operation of the circuit. The contents of the CTRs (i.e., inputs and outputs) are shifted out sequentially to an external testing unit. The shifting is serial, through an additional shift pin. Verification is done in the testing unit and an error is reported if any discrepancy is found.

We test the FUs in every pass. The pass is repeated indefinitely, and is defined as follows:

pass = max(NC, Ntest),

where NC is the total number of control steps in the SDFG, and Ntest is the number of control steps needed to test each FU exactly once.

The pass is repeated at the beginning of the cycle that starts after the previous pass. Note that the duration of a cycle is NC. The following two possibilities exist:

(a) NC ≥ Ntest: The pass is equal to NC, i.e., one cycle long. A new pass starts at the beginning of each cycle. Each FU may be tested more than once in a pass. Our objective is to maximize the number of times each FU can be tested.

(b) NC < Ntest: The pass is equal to Ntest and is more than one cycle long. A new pass starts at the beginning of the cycle that follows the previous pass. Each FU is tested exactly once in a pass. Our objective is to reduce the pass time.

In both cases, the desired objectives are achieved by minimizing the number of variables that need to be shifted out to test the FUs. Effectively, this minimizes the testing time for each FU.
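The two cases can be illustrated with a short Python sketch. It is not part of the original paper: the function names `pass_length` and `pass_period` are hypothetical, the Ntest values are made up, and rounding the repeat period up to a whole number of cycles is only our reading of the statement that a new pass starts at the beginning of the cycle following the previous pass.

```python
import math


def pass_length(nc: int, ntest: int) -> int:
    """Length of one pass in control steps: pass = max(NC, Ntest)."""
    return max(nc, ntest)


def pass_period(nc: int, ntest: int) -> int:
    """Control steps between the starts of two consecutive passes.

    Assumption (our reading of the text): a new pass can only start at a
    cycle boundary, so the period is the pass length rounded up to a
    whole number of cycles of NC control steps.
    """
    return math.ceil(pass_length(nc, ntest) / nc) * nc


# Case (a): NC >= Ntest, the pass is one cycle long.
case_a = (pass_length(6, 4), pass_period(6, 4))
# Case (b): NC < Ntest, the pass spans more than one cycle.
case_b = (pass_length(6, 8), pass_period(6, 8))
print(case_a, case_b)
```

With NC = 6, a smaller Ntest leaves the pass at one cycle, while a larger Ntest stretches the pass and, under the stated assumption, delays the next pass to the following cycle boundary.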

Fig. 1 illustrates how a data path can be synthesized using the proposed methodology. The methodology is shown in two phases: FU allocation and shift sequence generation. The behavioral description of a circuit in terms of an SDFG is assumed to be provided. We apply an FU allocation technique to bind operations to FUs in such a way that the number of variables that need to be shifted out is minimized. The shift sequence generation, which follows FU allocation, selects a minimum number of variables, which are captured to test the FUs at least once. We also obtain the order and the instants at which the variables are shifted out. This is needed for synchronizing the external testing unit with the CUT. We then apply a register allocation technique to bind the variables to registers, and an interconnect allocation scheme to optimize the number of multiplexer inputs. These multiplexers are required at the inputs of each FU. Any standard algorithm can be used for register allocation and interconnect allocation; such algorithms are presented in Ref. [11]. Finally, we obtain the concurrently testable data path.

Section 2 describes the different strategies used for concurrent testing. Section 3 describes the proposed concurrent testing methodology. Sections 4 and 5 present FU allocation and shift sequence generation. Section 6 shows experimental results, and Section 7 presents the conclusions.

2. Strategies for concurrent testing

Different strategies for concurrent testing are illustrated in Fig. 2. TPGR denotes a test pattern generator register, CTC a concurrent test circuit, MISR a multiple input signature register, PLA a programmable logic array, and ILA an iterative logic array. The strategies are briefly discussed below.

Fig. 2(a) illustrates the idle time concept [12]. When the CUT is idle, test patterns are applied and the output response is observed. Fig. 2(b) illustrates a full duplication of the hardware resources that are under test [8]. The output responses are collected and compared. In Fig. 2(c), CUT duplication is eliminated and an equality comparator is introduced. When the normal input vector and the test vector (from the TPGR) match, the equality comparator signals the MISR to collect the output response [3]. In this approach, the test resources are considerably reduced but the verification is off-line. Fig. 2(d) is a modification of Fig. 2(b), where a PLA is utilized instead of the golden CUT [8]. This is also a full duplication and requires more hardware test resources. In Fig. 2(e), the CUT is partitioned into identical cells, called an ILA. The inputs and outputs of each cell are mutually compared for verification [13]. This approach is more useful at the structural level.

3. Concurrent testing methodology

We make the following assumptions throughout this paper. There is no functional pipelining in the CUT. Operations of different types are not bound to the same FU. Only one external pin is available to shift out the variables from the CUT. Registers, multiplexers and paths are fault-free. All registers are of the same size and require the same shift time. Shifting time is given in terms of the number of control steps. The execution time of each operation is one control step.

Our main objective is to test the circuits concurrently, without affecting the normal operation of the CUT. The testing is carried out by capturing selected input and output variables of the FUs at the instants of their generation in the pass. The captured variables of the CUT are shifted serially to an external testing unit, where verification is performed. We employ serial shifting of the variables of the CUT in order to reduce the chip pin count overhead.

Fig. 1. Outline of concurrent testable data path synthesis system.

Fig. 2. Different strategies for the concurrent testing.

Fig. 3 shows the outline of our proposed testing methodology. The input and output variables of the FUs are strobed with a signal STROBE and captured in the CTRs. The CTRs are arranged in the form of a memory. The number of CTRs needed, n, is equal to the number of variables that need to be shifted out in a pass. An algorithm presented in Section 5 obtains this count. This algorithm also obtains the order in which the variables are shifted out. The variables are strobed into the CTRs sequentially in the same order in which they are shifted out. A MOD n counter, clocked by STROBE, generates the sequential addresses for strobing variables into the CTRs.

The contents of the CTRs are shifted out sequentially (1 through n) to the testing unit. Another MOD n counter is utilized to address the CTRs sequentially for shifting. The contents of an addressed CTR are fed to a shift register when a signal SHIFT becomes active. The shift register shifts out the contents of the CTR on a pin SHIFT OUT. To synchronize the activity of the counter with the shift register, the signal SHIFT is used as the clock to the counter.

Note that the two addresses, generated by the two MOD n counters, must be fed to the CTRs through multiplexers. For the sake of simplicity, these details are not shown in the figure. A RESET signal resets the counters to address the first CTR at the beginning.
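The addressing scheme described above can be mimicked with a small behavioral model in Python. This is only a sketch under stated assumptions, not the authors' hardware: the class name `CTRBank` and its methods are hypothetical, each CTR is modeled as a single Python value rather than a bit vector, the bit-serial shifting on the SHIFT OUT pin is not modeled, and the "access, then advance the counter" ordering is assumed.

```python
class CTRBank:
    """Behavioral model of n CTRs addressed by two MOD n counters."""

    def __init__(self, n: int):
        self.n = n
        self.regs = [None] * n
        self.strobe_addr = 0  # MOD n counter clocked by STROBE
        self.shift_addr = 0   # MOD n counter clocked by SHIFT

    def reset(self) -> None:
        """RESET: both counters address the first CTR."""
        self.strobe_addr = 0
        self.shift_addr = 0

    def strobe(self, value) -> None:
        """STROBE: capture a variable in the addressed CTR, then advance."""
        self.regs[self.strobe_addr] = value
        self.strobe_addr = (self.strobe_addr + 1) % self.n

    def shift(self):
        """SHIFT: hand the addressed CTR to the shift register, then advance."""
        value = self.regs[self.shift_addr]
        self.shift_addr = (self.shift_addr + 1) % self.n
        return value


bank = CTRBank(4)
bank.reset()
# Placeholder names stand in for the captured register contents.
for v in ["c", "d", "v2", "v8"]:
    bank.strobe(v)
shifted = [bank.shift() for _ in range(4)]
print(shifted)
```

Because both counters wrap modulo n, the variables leave the bank in the same order in which they were strobed, and both counters are back at the first CTR when the next pass begins.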

4. FU allocation

The FU allocation binds operations to FUs. An algorithm for FU allocation that requires an SDFG as input is presented below. The algorithm binds the operations to FUs in such a way that all the FUs can be tested by shifting out a minimum number of variables. This is possible if some variables are shared by different FUs, so that they have to be shifted out just once. Thus, we bind candidate operations (the operations under consideration for binding in a control step) that share the same variables to different FUs.

In the algorithm, a count matrix is generated. The rows represent the candidate operations of a particular type in a control step and the columns represent the FUs of that type. The elements of the matrix represent the maximum number of common variables between the respective operation and FU. As mentioned above, an operation that has a larger number of common variables with an FU should not be bound to that FU. Hence, we find the highest element in the matrix and bind the corresponding operation to the FU that has the least number of variables in common with that operation. However, if more than one row (and hence operation) contains the highest element, preference is given to the row that contains the lowest element among those rows. This way we maximize the number of common variables between the various FUs.

Fig. 3. Proposed concurrent testing methodology.

Procedure FU_allocation( )
Input: SDFG;
Output: FU allocation;
/*
Let t be a type of operation, Ntype be the total number of types of operations, Nfu[t] be the number of FUs of type t, List[t] be the candidate operations of type t in a control step, NList[t] be the number of candidate operations in a control step, row_max be the maximum element of a row in a matrix, min_count be the count of a specified element in a row.
*/
Begin
  Ntype = count of operation types in the SDFG;
  Nfu[t] = maximum number of concurrent operations in a control step;
  for (all control steps) do
    for (all operation types) do
      Generate List[t];
      NList[t] = count of operations in List[t];
      if (first control step)
        Bind all operations in List[t] to NList[t] number of FUs;
      else
        Generate the count matrix;
        while (there is a candidate operation in List[t])
          Compute row_max for each row in the count matrix;
          Select the row(s) that have the highest row_max;
          if (more than one row is selected) /* Deselect all but one row */
            Find the lowest element in all the selected rows;
            Deselect the rows that do not have the lowest element;
            Compute min_count of the lowest element in all selected rows;
            Keep only one row selected that has the minimum min_count, and deselect all other selected rows;
          end if
          Select a column having the lowest value in the selected row;
          Bind the operation represented by the selected row to the FU represented by the selected column;
          Delete the selected row and column from the count matrix;
          Delete the candidate operation from List[t];
        end while
      end if
    end for
  end for
  Return the binding of operations to FUs;
End FU_allocation
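The inner while loop of the procedure, i.e. the binding of the candidate operations of one type in one control step, can be sketched in Python as follows. This is an illustrative reading rather than the authors' implementation. The names `count_matrix` and `bind_step` are hypothetical, a matrix element is assumed to be the largest number of variables a candidate shares with any single operation already bound to the FU, and ties that the procedure leaves open are broken by list order. The operands given to operations 3 and 4 are invented for the demonstration, since the SDFG of Fig. 4 is not reproduced here.

```python
def count_matrix(candidates, fu_ops, op_vars):
    """Rows: candidate operations; columns: FUs of the same type.

    Assumption: an element is the largest number of variables the
    candidate shares with any single operation already bound to the FU.
    """
    return {
        op: {
            fu: max((len(op_vars[op] & op_vars[b]) for b in bound), default=0)
            for fu, bound in fu_ops.items()
        }
        for op in candidates
    }


def bind_step(candidates, fu_ops, op_vars):
    """Bind the candidate operations of one control step to distinct FUs."""
    count = count_matrix(candidates, fu_ops, op_vars)
    rows, cols = list(candidates), list(fu_ops)
    binding = {}
    while rows:
        row_max = {op: max(count[op][fu] for fu in cols) for op in rows}
        top = max(row_max.values())
        selected = [op for op in rows if row_max[op] == top]
        if len(selected) > 1:
            low = min(count[op][fu] for op in selected for fu in cols)
            selected = [op for op in selected
                        if any(count[op][fu] == low for fu in cols)]
            # min_count: how often the lowest element occurs in a row
            selected.sort(key=lambda op: sum(count[op][fu] == low for fu in cols))
        op = selected[0]
        fu = min(cols, key=lambda f: count[op][f])  # least shared variables
        binding[op] = fu
        rows.remove(op)   # delete the selected row ...
        cols.remove(fu)   # ... and the selected column
    return binding


op_vars = {
    "op1": {"a", "b", "v1"},
    "op2": {"c", "d", "v2"},
    "op3": {"v1", "e", "v3"},  # hypothetical operands
    "op4": {"v2", "f", "v4"},  # hypothetical operands
}
fu_ops = {"FU1": ["op1"], "FU2": ["op2"]}  # binding after the first control step
binding = bind_step(["op3", "op4"], fu_ops, op_vars)
print(binding)
```

With these made-up operands, each candidate shares one variable with one FU and none with the other, so each is bound to the FU with which it shares nothing.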

4.1. Example 1

Fig. 4 shows a small SDFG example that is used to illustrate most aspects of the algorithms presented in this paper. The SDFG has six control steps and a single type of operation (+). Nfu[t], given by the maximum number of concurrent operations in a control step, is two. In the first control step, the candidate operations are +1 and +2. They are bound to FU1(+) and FU2(+), respectively. The variables bound to FU1(+) are a, b and v1. The variables bound to FU2(+) are c, d and v2. In the second control step, the candidate operations are +3 and +4. The count matrix is shown in Fig. 5.

Since the maximum for each row is 1, we temporarily select both rows. The lowest element in these rows is 0. The count of 0 in each row is represented by min_count and is the same for both rows. Hence, we can select either row. Let us select the row representing +4. The column representing FU1(+) is selected as it has the lowest value in that row. Thus, we bind +4 to FU1(+) and, similarly, +3 to FU2(+). Proceeding in a similar way, we get the following result:

FU1(+) = +1, +4, +5, +8, +9, +11;

FU2(+) = +2, +3, +6, +7, +10, +12.

5. Shift sequence generation

After binding the operations to FUs, our objective is to obtain the operations and the corresponding variables that may be shifted out in order to test the FUs. To achieve this, we must select from each FU the operations that have the maximum number of common variables. Note that the selection of these operations is not possible during FU allocation, because FU allocation maximizes the number of common variables only among FUs of the same type. In this algorithm, however, we select operations (and hence variables) from FUs of different types. Thus, we achieve an overall minimization of the variables to be shifted out.

We divide the overall algorithm into three parts. The first part, sequence( ), selects a set of operations (and the corresponding variables), one from each FU, such that the maximum number of variables are common among them. The second part, optimal_sequence( ), utilizes the first part repeatedly to select more than one operation from each FU, such that each FU can be tested more than once in a pass, if possible. The variables corresponding to these operations are also obtained. This part also gives the number of CTRs required in the synthesis. The third part, order( ), puts these variables in order and determines the instants at which each of them can be shifted out.

Fig. 4. The SDFG for Example 1.

Fig. 5. Count matrix for Example 1.

5.1. Algorithm to select a set of operations (one from each FU)

Given a list of operations, this procedure selects one operation from each FU, such that the number of variables common to them is maximum. The corresponding variables are also obtained. The list of operations is provided by optimal_sequence( ). As will be seen later, the list of operations passed to sequence( ) may contain a dummy operation, the variables of which are the variables selected by all the previous calls to sequence( ). It is passed with the objective of selecting operations that have the maximum number of common variables with the previously selected ones. It is not bound to any FU.

In the algorithm, we assume a fictitious operation, which is initially NULL. It corresponds to the set of operations selected by the current call to sequence( ). Its variables are the variables of the selected operations.

We generate a similarity matrix. Both rows and columns represent operations (including the fictitious operation, but excluding the selected operations). The elements of the matrix represent the number of common variables between the operations represented by the corresponding rows and columns. The diagonal elements are taken as zero. Also, the elements corresponding to operations bound to the same FU are taken as zero. An operation pair corresponding to a zero element is never selected, which prevents selecting two operations from the same FU. The matrix is symmetric about the diagonal and hence, in actual storage, we can store either the upper or the lower triangular matrix.
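The construction of the similarity matrix can be sketched in Python as follows. The function name `similarity_matrix` and the dictionary layout are hypothetical, and the full symmetric matrix is stored for readability rather than one triangle. The operands assigned to operation 8 are invented for the illustration; only the variables of operations 1 and 2 and the FU bindings are taken from Example 1.

```python
def similarity_matrix(op_vars, op_fu):
    """Number of common variables for every pair of operations.

    Diagonal entries and pairs bound to the same FU are forced to zero,
    so such pairs can never be selected together.
    """
    ops = list(op_vars)
    sim = {r: {c: 0 for c in ops} for r in ops}
    for r in ops:
        for c in ops:
            same_fu = op_fu[r] is not None and op_fu[r] == op_fu[c]
            if r != c and not same_fu:
                sim[r][c] = len(op_vars[r] & op_vars[c])
    return sim


op_vars = {
    "op1": {"a", "b", "v1"},
    "op2": {"c", "d", "v2"},
    "op8": {"c", "v2", "v8"},  # hypothetical operands
    "F": set(),                # fictitious operation, initially empty
}
# The fictitious operation is not bound to any FU.
op_fu = {"op1": "FU1", "op2": "FU2", "op8": "FU1", "F": None}

sim = similarity_matrix(op_vars, op_fu)
row_sum = {op: sum(row.values()) for op, row in sim.items()}
overall_sum = sum(row_sum.values())
print(sim["op2"]["op8"], row_sum, overall_sum)
```

The quantities `row_sum` and `overall_sum` computed at the end correspond to the values that the procedure Sequence( ) uses to choose rows and to decide when to stop.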

There may exist some operations, bound to the same FU, that do not have any variable in common with the other operations. Thus, all the elements in the respective rows and columns would be zero. In this case, we arbitrarily select any one operation from the FU and ignore the other operations.

Procedure Sequence( )
Input: SDFG, FU allocation, a list of operations (including the dummy operation) from which the selection is to be made;
Output: Selected operations (one from each FU), selected variables;
/*
Let max_element be the highest element in a specified matrix, row_sum be the sum of a row of a matrix, overall_sum be the sum of all the elements in the matrix.
*/
Begin
  Fictitious operation = {∅};
  Append the fictitious operation to the list of operations;
  Generate a similarity matrix from the modified list of operations;
  Identify the FUs that have a single operation, and select these operations;
  Select the dummy operation also;
  do
    Append the selected operations to the fictitious operation;
    Update the variables of the fictitious operation and remove duplicate variables, if any;
    Delete the selected operations, and the other operations bound to the same FU, from the list of operations;
    Update the similarity matrix;
    Compute max_element in the similarity matrix;
    Compute row_sum for each row;
    Compute overall_sum;
    if (overall_sum is not zero) then
      Select the row corresponding to the fictitious operation;
      if (the selected row does not have the highest row_sum)
        Select the row having the highest row_sum;
      end if
      if (none of the elements in the selected row has the value max_element)
        if (a row having the highest row_sum has an element equal to max_element)
          Select that row and deselect the previously selected row;
        end if
      end if
      Select the columns having the highest value in the selected row;
      if (there exist columns that correspond to operations bound to the same FU)
        Keep selected the one column among them that has the largest row_sum value and deselect the others;
      end if
      Select the operations corresponding to the selected row and columns;
    end if
  while (overall_sum is not zero);
  while (the similarity matrix has an operation other than the fictitious operation)
    Select the operation;
    Append the selected operation to the fictitious operation;
    Update the variables of the fictitious operation and remove duplicates;
    Delete the rows and columns corresponding to the selected operation and the other operations bound to the same FU;
  end while
  Delete the dummy operation and its variables from the fictitious operation and its variables;
  Return the operations and variables corresponding to the fictitious operation;
End Sequence

5.1.1. Example 2

We extend Example 1 to illustrate the procedure sequence( ). Let us assume that the input list of operations is {+1, +2, +3, +4, +5, +6, +7, +8, +9, +10, +11, +12}. Let the fictitious operation be denoted by F. The similarity matrix generated is shown in Fig. 6.

The highest element in the matrix, max_element, is 2. The FU allocation is that obtained in Example 1. There is no FU that has a single operation. Row_sum is computed for each row. The overall sum is 24. We select the row corresponding to +2 as it has the highest row_sum. This row has elements equal to max_element in the columns corresponding to +8 and +9. We select operation +8, because it has a higher row_sum than +9. Thus, the operations selected from the two FUs are +2 and +8. Deleting these operations and the other operations bound to the same FUs leaves us only with F. Thus, the procedure returns +2 and +8. The variables corresponding to these operations, with the duplicates removed, are c, d, v2 and v8. Note that we need to shift out only these four variables to test both FUs once.

5.2. Algorithm to select a set of operations

The procedure optimal_sequence( ) selects a set of operations and the corresponding variables (without any duplication) from the given SDFG and FU allocation. This algorithm also returns the number of CTRs required, which is the same as the number of variables that need to be shifted out to test the FUs.

The procedure repeatedly calls sequence( ) to select the best possible operations, from a given set of operations, for each FU. Initially, sequence( ) is called with all the operations. In subsequent calls, the previously selected operations are removed from the list passed to sequence( ). We assume a dummy operation, the variables of which are the variables selected by all the previous calls to sequence( ). The dummy operation is passed to the successive calls to sequence( ) with the objective of selecting operations that have a maximum number of common variables with the previously selected ones.

Since our objective is to test each FU at least once in a pass, we have to select at least one operation from each FU. Thus, the first time, we unconditionally accept all the operations selected by sequence( ). However, if the pass is NC, we check whether it is possible to shift out any additional variables. This depends on the shift time required by each register. If it is possible to shift out additional variables, we find a new set of operations by calling sequence( ). These operations will be different from the ones previously selected. Depending on the number of variables that need to be shifted out for each of these operations and the number of additional variables that can be shifted out, we select additional operations.

Fig. 6. Similarity matrix for Example 2.

Procedure Optimal_Sequence( )
Input: SDFG, FU allocation;
Output: Selected operations (at least one from each FU), Selected variables, Number of CTRs;
/*
  Let ST be the shift time in terms of the number of control steps for each register,
  NC be the number of control steps, operation_list be a list of operations,
  var_additional be the number of additional variables that can be shifted out,
  selected_operations be a set of the selected operations, var_sequence be a list of
  variables corresponding to the selected operations, var_count be the count of
  variables in var_sequence, common_var be the number of variables an operation has
  in common with var_sequence, var_operation be the count of variables in an
  operation, NCTR be the number of CTRs.
*/
Begin
  Initialize var_additional to a large positive value;
  selected_operations = {∅};
  var_sequence = {∅};
  Generate an operation_list consisting of all the operations;
  while (var_additional > 0)
    Create a dummy operation with the variables in var_sequence;
    Append the dummy operation to operation_list;
    Call Sequence( ) with the operations in the operation_list;
    Delete the dummy operation and the operations selected by Sequence( ) from the operation_list;
    if (number of variables obtained by Sequence( ) ≤ var_additional) then
      Append the operations selected by Sequence( ) to selected_operations;
      Append the variables selected by Sequence( ) to var_sequence;
      Remove duplicate variables from var_sequence;
      Compute var_count;
      var_additional = (NC − var_count × ST) / ST;
    else
      for (each operation selected by Sequence( ))
        Compute common_var;
      end for
      Sort the operations in descending order of their common_var value;
      for (each operation in the sorted list)
        if (var_operation − common_var ≤ var_additional)
          Append the operation to selected_operations;
          Append the (non-common) variables corresponding to the operation to var_sequence;
          Update var_count;
          var_additional = (NC − var_count × ST) / ST;
        else
          Break the for loop;
        end if
      end for
      var_additional = 0;
    end if
  end while
  NCTR = var_count;
  Selected variables = var_sequence;
  Return NCTR, selected_operations, selected variables;
End Optimal_Sequence
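As a rough illustration, the selection loop of Optimal_Sequence( ) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' C implementation. The `sequence` argument is a hypothetical stand-in for sequence( ) and is scripted here to replay the two calls described in Example 3; the operation names (`op2`, `op8`, `op9`, `op12`) and their input variables are illustrative; integer division is assumed for var_additional; and the `remaining` guard on the loop is an addition that only stops the sketch when no operations are left.

```python
from typing import Callable, Dict, List, Tuple


def optimal_sequence(
    operations: Dict[str, List[str]],
    sequence: Callable[[Dict[str, List[str]]], List[str]],
    nc: int,
    st: int = 1,
) -> Tuple[int, List[str], List[str]]:
    """Return (NCTR, selected operations, selected variables).

    operations maps an operation name to its variables (inputs, then output).
    sequence is a hypothetical stand-in for Sequence( ): it receives the
    remaining operations plus a "dummy" entry and returns the chosen names.
    """
    remaining = dict(operations)
    selected_operations: List[str] = []
    var_sequence: List[str] = []
    var_additional = float("inf")  # "a large positive value"

    def budget() -> int:
        # var_additional = (NC - var_count * ST) / ST (integer division assumed)
        return (nc - len(var_sequence) * st) // st

    while var_additional > 0 and remaining:
        candidates = dict(remaining)
        candidates["dummy"] = list(var_sequence)  # variables selected so far
        chosen = [op for op in sequence(candidates) if op != "dummy"]
        if not chosen:
            break
        for op in chosen:
            del remaining[op]

        # Variables of the chosen operations that are not yet in var_sequence.
        new_vars: List[str] = []
        for op in chosen:
            for var in operations[op]:
                if var not in var_sequence and var not in new_vars:
                    new_vars.append(var)

        if len(new_vars) <= var_additional:
            selected_operations.extend(chosen)
            var_sequence.extend(new_vars)
            var_additional = budget()
        else:
            def common_var(op: str) -> int:
                return len(set(operations[op]) & set(var_sequence))

            # Prefer operations sharing the most variables with var_sequence.
            for op in sorted(chosen, key=common_var, reverse=True):
                extra = [v for v in operations[op] if v not in var_sequence]
                if len(extra) <= var_additional:
                    selected_operations.append(op)
                    var_sequence.extend(extra)
                    var_additional = budget()
                else:
                    break
            var_additional = 0

    return len(var_sequence), selected_operations, var_sequence


# Illustrative data only: the input variables of op8, op9 and op12 are assumed.
example_ops = {
    "op2": ["c", "d", "v2"],
    "op8": ["c", "v2", "v8"],
    "op9": ["d", "v8", "v9"],
    "op12": ["v2", "v8", "v12"],
}
scripted = iter([["op2", "op8"], ["op9", "op12"]])  # replayed Sequence( ) results
nctr, ops, variables = optimal_sequence(example_ops, lambda _c: next(scripted), nc=6)
```

With NC = 6 and ST = 1 the scripted run stops after the second call, because var_additional has dropped to zero.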

5.2.1. Example 3

Let us consider Example 1 to illustrate optimal_sequence( ). We obtained the FU allocation in Example 1. As illustrated in Example 2, the first call to sequence( ) gives us the list of variables c, d, v2 and v8. Thus, var_count is 4. Assuming a shift time of one control step, var_additional will be 2. The next call to sequence( ) returns the variables v9 and v12, which correspond to operations 9 and 12. This implies that operation 9 from FU1 and operation 12 from FU2 have the maximum number of variables in common with the previously obtained variables c, d, v2 and v8. The new set of variables is appended to the previous ones. Thus, we obtain var_count as 6 and var_additional as 0. Hence, we finally select operations 2, 8, 9 and 12, and the corresponding variables c, d, v2, v8, v9 and v12. The number of CTRs required is 6.
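The two values of var_additional quoted in this example follow from the update rule in Optimal_Sequence( ). The value NC = 6 used here is inferred from the quoted numbers (var_count = 4 and ST = 1 giving var_additional = 2) rather than stated in this example:

```latex
\[
\text{var\_additional} \;=\; \frac{NC - \text{var\_count}\times ST}{ST}
\]
\[
\text{after the first call: } \frac{6 - 4\times 1}{1} = 2,
\qquad
\text{after the second call: } \frac{6 - 6\times 1}{1} = 0 .
\]
```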

5.3. Algorithm to find out instants and order of shifting variables

The procedure order( ) determines the order and

instants (in terms of control steps) in which the variables

must be shifted out. The variables are shifted out in the

same order as they are assigned and strobed in the

CTRs. Our objective is to order the variables in such a

way that there is no discontinuity in the shifting. Thus,

before the contents of a CTR are shifted out completely,

the next CTR in sequence should have captured a

variable to be shifted out in that pass.

It may be noted that the pass time may be greater than NC. Thus, an instant that exceeds NC refers to the next cycle in the same pass. For example, suppose there are six control steps in an SDFG and 10 variables must be shifted out to test all the FUs once. Assuming ST = 1, the pass time would be 10 control steps. An instant (step) 8 would mean the second control step in the next cycle. However, the next pass would start at the beginning of the third cycle, which will actually be referred to as step 1 of the second pass.
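The bookkeeping in this example can be sketched as follows. This is a minimal sketch: the function names are hypothetical, and rounding the pass up to whole cycles is an assumption drawn from the statement that the next pass starts at the beginning of the third cycle.

```python
import math
from typing import Tuple


def pass_time(var_count: int, st: int) -> int:
    """Control steps needed to shift out var_count variables, ST steps each."""
    return var_count * st


def locate_instant(instant: int, nc: int) -> Tuple[int, int]:
    """Map an instant within a pass to (cycle, control step), both 1-based."""
    cycle = (instant - 1) // nc + 1
    step = (instant - 1) % nc + 1
    return cycle, step


def cycles_per_pass(var_count: int, st: int, nc: int) -> int:
    """Whole cycles occupied by one pass (assumed to round up)."""
    return math.ceil(pass_time(var_count, st) / nc)


# Values from the example in the text: NC = 6, 10 variables, ST = 1.
example_pass_time = pass_time(10, 1)
example_location = locate_instant(8, 6)
example_cycles = cycles_per_pass(10, 1, 6)
```

Here `example_location` is cycle 2, step 2, and the pass occupies two cycles, so the following pass begins with the third cycle.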

Once a variable is captured in a CTR, it can be shifted out at any time in the pass. Hence, to avoid the discontinuity, we sometimes delay the shifting of all the variables considered before. The gap created at the beginning can be filled with the variables of the previous pass, if needed. However, this might result in a negligible loss of time in the first pass only, since there is no previous pass.

In the algorithm, we have used a variable first_step. It indicates the control step at which the first variable starts shifting out in a pass. The variables before this control step belong to the previous pass. By knowing first_step, and the order in which variables are shifted out, we find out the instant at which a variable will be shifted out. The variables in the CTRs are assigned in the same order as they are shifted out.

Procedure Order( )
Input: SDFG;
Output: Order in which variables are shifted out, first step from where shifting the variables of a new pass starts, CTR assignment;
/*
  Let ST be the shift time for each register, first_step be the control step from
  which the variables of a pass start shifting out, shift_step be a temporary
  variable to hold the value of the control step where a new variable can be
  shifted, var_step be the control step where a variable is generated, order_list
  be the list of variables in the order they will be shifted out, NCTR be the
  number of CTRs.
*/
Begin
  order_list = {∅};
  Call optimal_sequence( ) to select the operations and obtain NCTR;
  Arrange the selected operations in the sequence in which they appear in the SDFG (left to right, then top to bottom);
  first_step = control step of the first operation in the arranged list;
  shift_step = first_step;
  for (each operation in the arranged list)
    for (each variable in the operation)
      if (the variable does not exist in the order_list)
        Append the variable to the order_list;
        var_step = control step of the operation;
        if (the variable is an output of the operation under consideration)
          Increment var_step by one;
        end if
        if (var_step > shift_step)
          first_step = first_step + var_step − shift_step; /* shift the first_step */
        end if
        shift_step = shift_step + ST;
      end if
    end for
  end for
  Assign the variables in the order_list to NCTR number of CTRs;
  Return order_list, first_step, CTR assignment;
End Order
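The ordering loop of Order( ) might be sketched in Python as below. This is a hedged sketch, not the authors' code. The operation records (control steps and input variables of operations 2, 8, 9 and 12) are assumptions chosen to be consistent with Example 4, where v8 becomes available in control step 5. One deliberate deviation is flagged in the comments: when a delay is added to first_step, the sketch also advances shift_step by the same amount, an assumption made so that the sketch reproduces the first_step of 2 reported in Example 4.

```python
from typing import Dict, List, Tuple


def order(operations: List[Dict], st: int = 1) -> Tuple[List[str], int]:
    """Return (order_list, first_step).

    operations must already be arranged in SDFG order; each entry is a dict
    with keys "step" (control step), "inputs" (list) and "output" (name).
    """
    order_list: List[str] = []
    first_step = operations[0]["step"]
    shift_step = first_step
    for op in operations:
        for var in list(op["inputs"]) + [op["output"]]:
            if var in order_list:
                continue
            order_list.append(var)
            var_step = op["step"]
            if var == op["output"]:
                var_step += 1  # an output exists one step after its operation
            if var_step > shift_step:
                delay = var_step - shift_step
                first_step += delay  # shift the first_step
                shift_step += delay  # assumption: the delayed slot moves too
            shift_step += st
    return order_list, first_step


# Assumed schedule for the four operations selected in Example 3.
selected = [
    {"step": 1, "inputs": ["c", "d"], "output": "v2"},    # operation 2
    {"step": 4, "inputs": ["c", "v2"], "output": "v8"},   # operation 8
    {"step": 5, "inputs": ["d", "v8"], "output": "v9"},   # operation 9
    {"step": 6, "inputs": ["v2", "v8"], "output": "v12"}, # operation 12
]
order_list, first_step = order(selected)
```

Under these assumptions the i-th variable of `order_list` starts shifting at instant `first_step + i * ST`, counting i from zero.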

5.3.1. Example 4

Let us consider Example 1 to illustrate the procedure order( ). As illustrated in Example 3, optimal_sequence( ) selects the variables c, d, v2, v8, v9 and v12. Fig. 7 shows these variables in the control steps in which they are generated. The pass is six control steps. If first_step is taken as one, the variables c, d and v2 will be shifted out in control steps 1–3, respectively. However, since v8 is available only in control step 5, there would be a discontinuity in the shift sequence at control step 4. This unordered sequence is shown in the figure. Procedure order( ) shifts first_step to the second control step. Thus, the variables c, d, v2, v8, v9 and v12 will be shifted out in control steps 2 through 7, respectively. Control step 7 for the variable v12 corresponds to the first control step of the next pass. It may be noticed that the gap created in the first control step of a pass is utilized by v12 of the previous pass. The variables v12, c, d, v2, v8 and v9 are assigned to CTR #1 through 6, respectively.
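The shift instants and CTR numbering of this example can be reproduced with a few lines of Python. The wrap-around rule used here (number the CTRs by the slot each variable occupies within the pass) is inferred from this example and is not spelled out in Procedure Order( ), so it should be read as an assumption; the function names are hypothetical.

```python
from typing import Dict, List


def shift_instants(order_list: List[str], first_step: int, st: int = 1) -> Dict[str, int]:
    """Instant at which each variable starts shifting, counted from the pass start."""
    return {var: first_step + i * st for i, var in enumerate(order_list)}


def ctr_assignment(order_list: List[str], first_step: int, st: int = 1) -> Dict[int, str]:
    """Assumed rule: CTR numbers follow the wrapped slot within the pass."""
    pass_time = len(order_list) * st
    instants = shift_instants(order_list, first_step, st)
    wrapped = sorted(order_list, key=lambda var: (instants[var] - 1) % pass_time)
    return {ctr: var for ctr, var in enumerate(wrapped, start=1)}


variables = ["c", "d", "v2", "v8", "v9", "v12"]
instants = shift_instants(variables, first_step=2)
ctrs = ctr_assignment(variables, first_step=2)
```

Instant 7 wraps to the first slot of the pass, which is why v12 ends up in CTR #1.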

6. Experimental results

We have implemented the proposed methodology in the C language. Table 1 shows the experimental results for Example 1. We have also applied our synthesis system to different benchmark examples. The examples considered are the fifth-order digital elliptic wave filter (EWF) and the differential equation from Ref. [14], and Tseng from Ref. [15].

6.1. Fifth-order digital elliptic wave filter

The SDFG of the fifth-order EWF is adopted from Ref. [14] and depicted in Fig. 8. The EWF consists of 26 additions and eight multiplication operations scheduled over 14 control steps. We consider the shift time as one control step. Our FU allocation is shown below:

FU1(+) = +1, +7, +8, +11, +16, +18, +23, +26;
FU2(+) = +2, +3, +5, +10, +13, +14, +17, +19, +21, +24;
FU3(+) = +4, +6, +9, +12, +15, +20, +22, +25;
FU4(*) = *1, *2, *3, *4, *5, *6, *7, *8.

Fig. 7. Generation and ordering of shift sequence.


Table 2 provides a comparison with PHITS-NS [14]. The FU allocation of PHITS-NS is shown below:

FU1(+) = +1, +3, +5, +7, +8, +9, +11, +12, +14, +17, +19, +22, +24;
FU2(+) = +2, +4, +6, +10, +13, +15, +16, +20, +23, +26;
FU3(+) = +18, +21, +25;
FU4(*) = *1, *2, *3, *4, *5, *6, *7, *8.

Each FU is tested twice using the present approach. We require 14 variables to be shifted out in a pass time of NC (14 control steps). In order to test each FU twice, the allocation of PHITS-NS requires 16 variables to be shifted out, and hence takes more testing time.

6.2. Differential equation example

The SDFG for the differential equation (diffeq) example is shown in Fig. 9. The experimental results for the diffeq example are summarized in Table 3. We consider the shift time as one control step. Our FU allocation is shown below:

FU1(+) = +1, +2; FU2(−) = −1, −2;
FU3(<) = <1; FU4(*) = *1, *4, *5; FU5(*) = *2, *3, *6.

Table 3 provides the comparison of results with Lyra and Aryl, taken from Ref. [14]. The FU allocation of Lyra and Aryl is shown below:

FU1(+) = +1, +2; FU2(−) = −1, −2;
FU3(<) = <1; FU4(*) = *1, *4, *6; FU5(*) = *2, *3, *5.

Table 1
The experimental results for Example 1

FU allocation: FU1 = 1, 4, 5, 8, 9, 11; FU2 = 2, 3, 6, 7, 10, 12
Operations selected for testing: 2, 8, 9, 12
Variables in the shift sequence: c, d, v2, v8, v9, v12
Number of CTRs: 6
Number of times each FU is tested in a pass: FU1: 2; FU2: 2

Fig. 8. SDFG for the fifth-order EWF.

Table 2
Comparison of results for the EWF example

Operations selected for testing
  Proposed method: �5, �6, �7, �1, �2, �9, �10, �11
  PHITS-NS: �8, �10, �18, �1, �2, �9, �15
Variables in the shift sequence
  Proposed method: x9, f, b, T26, x1, t33, T39, g, e, con, x2, x3, d, x4
  PHITS-NS: b, x2, d, e, x9, IN, x11, x14, con, x4, x, x12, x3
Number of CTRs
  Proposed method: 14
  PHITS-NS: 13 (a)
Number of times each FU is tested in a pass
  Proposed method: FU1(+): 2, FU2(+): 2, FU3(+): 2, FU4(*): 2
  PHITS-NS: FU1(+): 2, FU2(+): 2, FU3(+): 1 (a), FU4(*): 2

(a) Testing FU3(+) twice adds three more variables, resulting in a total of 16 variables.


Each FU is tested once. We require 10 variables to be shifted out, whereas the allocation of Lyra and Aryl requires 11 variables to be shifted out.

6.3. Tseng example

This is the straight-line code example adopted from Avra [15], and modified as the Tseng example in Ref. [14]. The SDFG is shown in Fig. 10. The operations are allocated to three adders, one multiplier, one subtractor, one logical OR and one logical AND FU. We consider the shift time as one control step. Our FU allocation is shown below:

FU1(+) = +1, +4; FU2(+) = +2;
FU3(+) = +3; FU4(−) = −1;
FU5(|) = |1; FU6(&) = &1; FU7(*) = *1.

The comparison of experimental results is summarized in Table 4. The FU allocation of PHITS-NS [14] is shown below:

FU1(+) = +1, +2; FU2(+) = +3;
FU3(+) = +4; FU4(−) = −1;
FU5(|) = |1; FU6(&) = &1; FU7(*) = *1.

The FU allocation of Facet, taken from Ref. [14], is shown below:

FU1(+) = +1, +3; FU2(+) = +2;
FU3(+) = +4; FU4(−) = −1;
FU5(|) = |1; FU6(&) = &1; FU7(*) = *1.

Each FU is tested once. We require 12 variables to be

shifted out in a pass time of 12 control steps. Thus, each

pass is three cycles long. The allocations of PHITS-NS

Fig. 9. SDFG for the differential equation example.

Table 3
Comparison of results for the diffeq example

Operations selected for testing
  Proposed method: *1, +1, <1, −1, *6
  Lyra and Aryl: *1, +1, <1, −1, *2
Variables in the shift sequence
  Proposed method: u, dx, v1, x, x', A, ctrl, v3, v5, v7
  Lyra and Aryl: u, dx, v1, x, x', A, ctrl, v3, v5, #3, v2
Number of CTRs
  Proposed method: 10
  Lyra and Aryl: 11
Number of times each FU is tested in a pass
  Proposed method: FU1(+): 1, FU2(−): 1, FU3(<): 1, FU4(*): 1, FU5(*): 1
  Lyra and Aryl: FU1(+): 1, FU2(−): 1, FU3(<): 1, FU4(*): 1, FU5(*): 1

Fig. 10. SDFG for the Tseng example.

Table 4
Comparison of results for the Tseng example

Operations selected for testing
  Proposed method: +1, *1, −1, +2, +3, |1, &1
  PHITS-NS: +1, *1, −1, +3, +4, |1, &1
  Facet: +1, *1, −1, +2, +4, |1, &1
Variables in the shift sequence
  Proposed method: v1, v2, v3, v6, v7, v4, v5, v9, v8, v2', v11, v1'
  PHITS-NS: v1, v2, v3, v6, v7, v4, v5, v9, v8, v2', v11, v1', v10
  Facet: v1, v2, v3, v6, v7, v4, v5, v9, v8, v2', v11, v1', v10
Number of CTRs
  Proposed method: 12
  PHITS-NS: 13
  Facet: 13
Number of times each FU is tested in a pass
  Proposed method: FU1(+): 1, FU2(+): 1, FU3(+): 1, FU4(−): 1, FU5(|): 1, FU6(&): 1, FU7(*): 1
  PHITS-NS: FU1(+): 1, FU2(+): 1, FU3(+): 1, FU4(−): 1, FU5(|): 1, FU6(&): 1, FU7(*): 1
  Facet: FU1(+): 1, FU2(+): 1, FU3(+): 1, FU4(−): 1, FU5(|): 1, FU6(&): 1, FU7(*): 1


and Facet require 13 variables to be shifted out. Note also that these approaches require four cycles, as the pass time is 13 control steps.
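The cycle counts quoted for this example are consistent with rounding the pass time up to a whole number of cycles. The value NC = 4 used below is an inference from the statement that 12 control steps span three cycles; it is not stated explicitly in this section. With ST = 1:

```latex
\[
\text{cycles per pass} \;=\; \left\lceil \frac{\text{var\_count}\times ST}{NC} \right\rceil ,
\qquad
\left\lceil \frac{12 \times 1}{4} \right\rceil = 3 ,
\qquad
\left\lceil \frac{13 \times 1}{4} \right\rceil = 4 .
\]
```

One extra variable in the shift sequence therefore costs a full additional cycle per pass in this example.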

7. Conclusion

This paper presents a new approach to achieving concurrent testability. The testing methodology and synthesis schemes are presented. The methodology utilizes an external-testing unit to which the input and output variables of the CUT are shifted out. An FU allocation technique is presented that allows each FU to be tested in a shorter time. This is made possible by minimizing the number of variables required to be shifted out from the CUT. Algorithms are presented for obtaining the order in which variables are shifted out and stored in the CTRs. This information can be utilized for synchronizing the testing unit with the CUT. The proposed approach is applied to benchmark examples and the results are compared with some approaches available in the literature.

Acknowledgements

This work is sponsored by Kuwait University Research Grant EE-081.

References

[1] Chen CH, Yuen JT. Concurrent test scheduling in built-in self-test environment. IEEE Design Test Comp 1992, p. 256–9.
[2] Saluja KK, Sharma R, Kime CR. Concurrent comparative testing using BIST resources. International Conference on Computer Aided Design 1987. p. 336–9.
[3] Saluja KK, Sharma R, Kime CR. A concurrent testing technique for digital circuits. IEEE Trans Comp Aided Design 1988;7(12):1250–60.
[4] Singh R, Townssend M, Knight JP. Concurrent testing of digital ASICs synthesized from data-flow graphs. Sixth Workshop on New Directions for Testing, Canada, 1992. p. 87–95.
[5] Flottes ML, Hammad D, Rouzeyre B. Automatic synthesis of BISTed data paths from high level specification. European Design and Test Conference 1994. p. 591–8.
[6] Harris IG, Orailoglu A. SYNCBIST: synthesis for concurrent built-in self-testability. International Conference on Computer Design 1994. p. 101–4.
[7] Swaminathan G, Aylor JH, Johnson BW. Concurrent testing of VLSI circuits using conservative logic. IEEE Trans Comp 1990, p. 60–5.
[8] Sharma R, Saluja KK. An implementation and analysis of a concurrent built-in self-test technique. International Symposium on Fault-Tolerant Computing 1988. p. 164–9.
[9] Sun X, Serra M. Merging concurrent checking and off-line BIST. Proceeding International Test Conference 1992. p. 958–67.
[10] Ismaeel AA, Bhatnagar R, Mathew R. Modification of scheduled data flow graph for on-line testability. Microelectron Reliab 1999;39:1473–84.
[11] Ismaeel AA, Dhodhi MK, Mathew R. Assignment and allocation of highly testable data paths under scan optimization. Integrat VLSI J 1996;21(3):191–207.
[12] Singh R, Knight J. Concurrent testing in high level synthesis. International Conference on Computer Design, 1994. p. 96–103.
[13] Abramovici M, Breuer MA, Friedman AD. Digital systems testing and testable design. New York: Computer Science Press, 1990.
[14] Lee TC, Wolf WH, Jha NK. Behavioral synthesis of highly testable data paths under the non-scan and partial scan environments. Proceeding 30th Design Automation Conference 1993. p. 292–7.
[15] Avra L. Allocation and assignment in high level synthesis for self-testable data paths. Proceeding IEEE International Test Conference 1991. p. 463–72.
