Concurrent testing in high level synthesis
A.A. Ismaeel *, R. Bhatnagar, R. Mathew
Department of Electrical and Computer Engineering, College of Engineering and Petroleum, Kuwait University, P.O. Box 5969, Safat
13060, Kuwait
Received 8 February 2000; received in revised form 21 March 2000
Abstract
A new methodology to incorporate concurrent testing in high level synthesis is presented. Optimization techniques in
VLSI designs tend to reduce the idle time of resources in which case the proposed methodology is found extremely
useful. The objective is to test each functional unit (FU) of a circuit under test (CUT) at least once in a time frame called
pass. Testing is performed continuously by repeating the pass. We carry out the testing by shifting out selective
variables of the CUT to an external-testing unit for verification. An additional pin is employed to shift out the variables.
Testing time is reduced by minimizing the number of variables needed to be shifted out, which is achieved by an FU
allocation technique. The FU allocation utilizes a given scheduled data flow graph as input. The proposed testing
methodology and the FU allocation technique are presented. Results of implementing the technique on different
benchmark examples are presented. © 2000 Elsevier Science Ltd. All rights reserved.
1. Introduction
High level synthesis compiles the behavioral de-
scription of a digital system into a data path. The be-
havioral description explains the sequence in which a set
of operations is executed. The data path is composed of
resources like functional units (FUs), registers and multiplexers. Recent developments in VLSI circuits have
significantly increased the logic density on chips. One
of the important aspects affected by these complex designs is testability.
Several design for testability (DFT) techniques have
been devised. These techniques can be off-line or on-line.
In off-line testing, normal operation of the circuit
under test (CUT) is interrupted for testing. This is a
serious problem for systems that run continuously, such as
production control systems, air traffic control systems, communication control systems, etc. On the other
hand, concurrently testable circuits can be tested without
affecting the normal operation of the CUT. Concurrent
testing can raise an alarm sooner after a fault develops
in the system. It anticipates the performance degradation
of a system, thus improving quality and reliability
[1]. Concurrent testing detects not only permanent faults
but also transient and intermittent faults [2,3]. It
yields reduced system maintenance requirements and
enhanced diagnostic capabilities.
Synthesis of concurrently testable circuits is gaining
importance as a diversified field of high level synthesis.
Two major constraints for testing are time requirement
to detect a fault, and area overhead. Concurrent testing
can reduce the time requirement. However, it depends
on the chip complexity and extra circuitry added for
testing. The area overhead is caused by the on-chip
circuitry to provide test patterns and analyze output
response. This area overhead also adds to the com-
plexity of the system. Our goal is to minimize the de-
tection time of the faults and have a minimum impact on
the area.
The behavioral description is written in VHDL or
any other procedural language. The description explains
the sequence in which a particular operation is executed.
These sequences form control steps. The cost of a system
mainly depends on the hardware resources used, such
as adders, multipliers, logical operators, etc. Sharing the
resources in their idle time can reduce the system cost.
Microelectronics Reliability 40 (2000) 2095–2106
www.elsevier.com/locate/microrel
* Corresponding author.
E-mail address: [email protected] (A.A. Ismaeel).
0026-2714/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved.
PII: S0026-2714(00)00028-7
Scheduling algorithms are available that utilize the idle
time to share the resources by scheduling an operation in
feasible control steps. The schedule obtained is called a
scheduled data flow graph (SDFG). A data path can be
generated from an SDFG by binding operations and
variables to FUs and registers, respectively. These are
achieved by FU allocation and register allocation. The
FUs and registers are interconnected through multi-
plexers in the data path. This interconnection is achieved
by a multiplexer allocation.
Recently, researchers have worked on the concurrent
testing concept in high level synthesis. Singh et al.,
Flottes et al., Harris et al. and Swaminathan et al. have
presented approaches that utilize idle time [4–7]. However, area minimization in VLSI designs demands
minimization of resources and, hence, an even distribution of the operations of the same type over the possible
control steps. This situation leads to a reduced idle time
(i.e., a tight schedule) and hence, the above-mentioned
approaches lose their applicability.
Saluja et al. addressed a concurrent comparative
built-in self-test approach for testing combinational
circuits [2,3]. In this approach, concurrent testing
was achieved by modifying the off-line test resources and
observing the normal inputs and outputs of the CUT.
Sharma et al. and Sun et al. have proposed conventional
duplication of the CUT, comparing the response of the
CUT with that of its duplicate [8,9]. However, all these approaches have been applied to the structural description
(post-synthesis). Post-synthesis application results in more
hardware test resources and area overhead.
The approaches discussed above have the disadvan-
tage of either having a poor applicability for a tight
schedule or being post-synthesis. We also presented an
FU allocation approach earlier, where we assumed that
the existence of idle time is feasible [10]. In this paper, we
propose a method that overcomes the above-mentioned
disadvantages. The present approach is more generic
and does not count on the existence of the idle time.
Also, it is applied on a given SDFG (pre-synthesis).
The operations are performed sequentially according
to their occurrences in the control steps (first through
the last) in an SDFG. We call the duration between the
first and last control step a cycle. We assume that there is
no functional pipelining. Thus, each cycle begins with a
fresh set of inputs in the first control step. The cycle
completes when all the operations in the last control step
are executed. Then the next cycle begins.
The testing is carried out by capturing selective
input and output variables of FUs of the CUT. We
employ special registers called concurrent testing registers (CTRs) to capture and hold these inputs and outputs. These registers do not affect the normal operation
of the circuit. It may be noted that these registers are
different from the normal registers that are used to hold
data for the normal operation of the circuit. The contents of CTRs (i.e., inputs and outputs) are shifted out
sequentially to an external-testing unit. The shifting is
serial through an additional shift pin. The verification is
done in the testing unit and an error is reported if any
discrepancy is found.
We test the FUs in every pass. The pass is repeated
indefinitely, and is defined as follows:
pass = max(NC, Ntest),
where NC is the total number of control steps in the
SDFG, and Ntest is the number of control steps needed
to test each FU exactly once.
The pass is repeated at the beginning of a cycle that
starts after the previous pass. Note that the duration of a
cycle is NC. The following two possibilities exist:
(a) NC ≥ Ntest: Pass is equal to NC, i.e., one cycle
long. A new pass starts at the beginning of each cycle. Each FU may be tested more than once in a
pass. Our objective is to maximize the number of
times each FU can be tested.
(b) NC < Ntest: Pass is equal to Ntest and is more
than one cycle long. A new pass starts at the beginning of a cycle that follows the previous pass. Each
FU is tested exactly once in a pass. Our objective is
to reduce the pass time.
In both cases, the desired objectives are
achieved by minimizing the number of variables needed
to be shifted out to test the FUs. Effectively, this
minimizes the testing time for each FU.
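The two cases can be checked with a short calculation. The following is a minimal Python sketch, not part of the original method description: the function names `pass_length` and `pass_period` are hypothetical, and `pass_period` encodes the assumption (taken from the statement that a new pass starts at the beginning of a cycle) that the repeat interval is the pass length rounded up to a whole number of cycles.

```python
import math

def pass_length(nc, ntest):
    """pass = max(NC, Ntest), both given in control steps."""
    return max(nc, ntest)

def pass_period(nc, ntest):
    """Control steps between the starts of two consecutive passes.

    Assumption: a pass may only start on a cycle boundary, so the
    pass length is rounded up to a multiple of the cycle length NC.
    """
    return math.ceil(pass_length(nc, ntest) / nc) * nc

# Case (a): NC >= Ntest, the pass is one cycle long.
print(pass_length(6, 4), pass_period(6, 4))
# Case (b): NC < Ntest, the pass spans more than one cycle.
print(pass_length(6, 10), pass_period(6, 10))
```

In case (b) with NC = 6 and Ntest = 10, the pass is 10 control steps long but, under the stated assumption, the next pass begins only after two full cycles.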
Fig. 1 illustrates how a data path can be synthesized
by using the proposed methodology. The methodology
is shown in two phases: FU allocation and shift se-
quence generation. The behavioral description of a
circuit in terms of an SDFG is assumed to be provided.
We apply an FU allocation technique to bind operations to FUs in such a way that the number of variables needed to be shifted out is minimized. The shift
sequence generation, which follows FU allocation, selects
a minimum number of variables, which are captured to
test FUs at least once. Also, we obtain the order and
instants at which the variables are shifted out. This is
needed for synchronizing the external testing unit with
the CUT. We then follow a register allocation tech-
nique to bind the variables to registers, and an inter-
connect allocation scheme to optimize the number of
multiplexer inputs. These multiplexers are required at
the inputs of each FU. Any standard available algo-
rithm can be used for register allocation and inter-
connect allocation. These algorithms were presented in
Ref. [11]. Finally, we can obtain the concurrent test-
able data path.
Section 2 describes the different strategies used for
concurrent testing. Section 3 describes the proposed
concurrent testing methodology. Sections 4 and 5 present FU allocation and shift sequence generation. In
Section 6, we show experimental results, and in Section 7
the conclusions.
2. Strategies for concurrent testing
Different strategies for concurrent testing are illustrated in Fig. 2. TPGR denotes the test pattern
generator register, CTC the concurrent test circuit, MISR the multiple input signature register,
PLA the programmable logic array, and ILA
the iterative logic array. The strategies are briefly
discussed below.
Fig. 2(a) illustrates the idle time concept [12]. When
the CUT is idle, we apply test patterns and observe the
output response. Fig. 2(b) illustrates a full duplication of
the hardware resources that are under test [8]. The
output response is collected and compared. In Fig. 2(c),
CUT duplication is eliminated and an equality com-
parator is introduced. When the normal input vector
and the test vector (from TPGR) match, the equality
comparator signals MISR to collect the output response
[3]. In this approach, the test resources are considerably
reduced but the verification is off-line. Fig. 2(d) is a
modification of Fig. 2(b), where a PLA is utilized instead
of the golden CUT [8]. This is also a full duplication and
requires more hardware test resources. In Fig. 2(e), the
CUT is partitioned into different identical cells called
ILA. The inputs and outputs of each cell are mutually
compared for the verification [13]. This approach is
more useful at structural level.
3. Concurrent testing methodology
We have made the following assumptions throughout
this paper. There is no functional pipelining in the CUT.
Multi-type operations are not bound to the same FU.
Only one external pin is available to shift out the vari-
ables from the CUT. Registers, multiplexers and paths
are fault-free. All registers are of the same size and re-
quire the same shift time. Shifting time is given in terms
of number of control steps. The execution time of each
operation is one control step.
Our main objective is to test the circuits concurrently,
without affecting the normal operation of the CUT. The
testing is carried out by capturing selective input and
output variables of the FUs at the instants of their generation in the pass. The captured variables of the CUT are
shifted serially to an external-testing unit, where verification is made. We employ serial shifting of variables of
the CUT in order to reduce chip pin count overhead.
Fig. 1. Outline of concurrent testable data path synthesis system.
Fig. 2. Different strategies for the concurrent testing.
Fig. 3 shows the outline of our proposed testing
methodology. The input and output variables of the FUs
are strobed with a signal STROBE and captured in the
CTRs. The CTRs are arranged in a memory form. The
number of CTRs needed, n, is equal to the number of
variables needed to be shifted out in a pass. An algorithm
presented in Section 5 obtains this count. This algorithm
also obtains the order in which variables are shifted out.
The variables are strobed in the CTRs sequentially in the
same order in which they are shifted out. A MOD n
counter, clocked by STROBE, generates the sequential
addresses for strobing variables into CTRs.
The contents of CTRs are shifted out sequentially (1
through n) to the testing unit. Another MOD n counter
is utilized to address the CTRs sequentially for shifting.
The contents of an addressed CTR are fed to a shift
register when a signal SHIFT becomes active. The shift
register shifts out the contents of the CTR on a pin
SHIFT OUT. To synchronize the activity of counter
with the shift register, the signal SHIFT is used as a
clock to the counter.
Note that the two address sources, generated by the two
MOD n counters, must be fed to the CTRs through
multiplexers. For the sake of simplicity, these details are
not shown in the figure. A RESET signal resets the
counters to address the first CTR at the beginning.
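The capture and shift-out addressing described in this section can be modelled in a few lines of Python. This is only a behavioural sketch under stated assumptions: the class name `CtrBank`, the register width, and the MSB-first bit order are hypothetical and are not specified in the text. The address multiplexers are not modelled.

```python
class CtrBank:
    """Toy model of n CTRs addressed by two MOD n counters."""

    def __init__(self, n):
        self.regs = [0] * n
        self.strobe_addr = 0  # counter clocked by STROBE
        self.shift_addr = 0   # counter clocked by SHIFT

    def reset(self):
        """RESET: both counters address the first CTR."""
        self.strobe_addr = 0
        self.shift_addr = 0

    def strobe(self, value):
        """Capture one FU input or output variable in the addressed CTR."""
        self.regs[self.strobe_addr] = value
        self.strobe_addr = (self.strobe_addr + 1) % len(self.regs)

    def shift(self, width=4):
        """Load the addressed CTR into the shift register and return the
        bits that would appear on SHIFT OUT (MSB first is an assumption)."""
        value = self.regs[self.shift_addr]
        self.shift_addr = (self.shift_addr + 1) % len(self.regs)
        return [(value >> i) & 1 for i in reversed(range(width))]

bank = CtrBank(3)
bank.strobe(5)
bank.strobe(9)
print(bank.shift())  # bits of the first captured variable
print(bank.shift())  # bits of the second captured variable
```

Because both counters wrap modulo n, variables leave the CTRs in the same order in which they were strobed, which is the property the external testing unit relies on for synchronization.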
4. FU allocation
The FU allocation binds operations to FUs. An al-
gorithm for the FU allocation is presented below that
requires an SDFG as the input. The algorithm binds the
operations to FUs in such a way that by shifting out a
minimum number of variables all the FUs can be tested.
This is possible if there are some variables shared by
different FUs, so that they have to be shifted out just
once. Thus, we bind the candidate operations (the operations under consideration for binding in a control
step) sharing the same variables to different FUs.
In the algorithm, a count matrix is generated. The
rows represent the candidate operations of a particular
type in a control step and the columns represent the FUs
of that type. The elements of the matrix represent the
maximum number of common variables between the
respective operation and the FU. As mentioned above,
an operation that has a larger number of common variables with an FU should not be bound to that FU.
Hence, we find the highest element in the matrix and
bind the corresponding operation to an FU that has the
least number of variables common with that operation.
However, if more than one row (and hence operation)
corresponds to the highest element, preference is given
to the one that has the lowest element in those rows.
This way we maximize the number of common variables
between various FUs.
Procedure FU_allocation( )
Input: SDFG;
Output: FU allocation;
/*
  Let t be a type of operation, Ntype be the total number of types of
  operations, Nfu[t] be the number of FUs of type t, List[t] be the
  candidate operations of type t in a control step, NList[t] be the
  number of candidate operations in a control step, row_max be the
  maximum element of a row in a matrix, min_count be the count of a
  specified element in a row.
*/
Begin
  Ntype = count of operation types in the SDFG;
  Nfu[t] = maximum number of concurrent operations in a control step;
  for (all control steps) do
    for (all operation types) do
      Generate List[t];
      NList[t] = count of operations in List[t];
      if (first control step)
        Bind all operations in the List[t] to NList[t] number of FUs;
      else
        Generate the count matrix;
        while (there is a candidate operation in List[t])
          Compute row_max for each row in the count matrix;
          Select the row(s) that has highest row_max;
          if (more than one row is selected) /* Deselect all but one row */
            Find the lowest element in all the selected rows;
            Deselect the rows that do not have the lowest element;
            Compute min_count of the lowest element in all selected rows;
            Keep only one row selected that has the minimum min_count, and deselect all other selected rows;
          end if
          Select a column having a lowest value in the selected row;
          Bind the operation represented by the selected row to the FU represented by the selected column;
          Delete the selected row and column from the count matrix;
          Delete the candidate operation from List[t];
        end while
      end if
    end for
  end for
  Return the operations binding to FUs;
End FU_allocation
Fig. 3. Proposed concurrent testing methodology.
4.1. Example 1
Fig. 4 shows a small SDFG example that has been
used to illustrate most of the aspects of the algorithms
presented in this paper. The SDFG has six control steps
and a single type of operation (+). Nfu[t], given by the
maximum number of concurrent operations in a control
step, is two. In the first control step, the candidate operations are +1 and +2. They are bound to FU1(+) and
FU2(+), respectively. The variables bound to FU1(+) are
a, b and v1. The variables bound to FU2(+) are c, d and
v2. In the second control step, the candidate operations
are +3 and +4. The count matrix would be as shown in
Fig. 5.
Since the maximum for each row is 1, we temporarily
select both rows. The lowest element in these rows is 0.
The count of 0 in each row is represented by min_count
and is the same for both rows. Hence, we can select any
row. Let us select the row representing +4. The column
representing FU1(+) will be selected as it has the lowest
value in that row. Thus, we bind +4 to FU1(+), and
similarly, +3 to FU2(+). Proceeding in a similar way, we
get the following result:
FU1(+) = +1, +4, +5, +8, +9, +11;
FU2(+) = +2, +3, +6, +7, +10, +12.
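The row and column selection rule of the FU allocation can be sketched in Python for a single operation type. This is an illustrative sketch and not the authors' implementation. The function name `allocate_step`, the operation names `op1` to `op4`, and the inputs `e` and `f` are hypothetical, since the SDFG figure is not reproduced here. The count matrix entry is assumed to be the number of variables an operation shares with the variables already bound to an FU, and ties are broken by list order.

```python
def allocate_step(ops, fu_vars):
    """Bind the candidate operations of one control step to FUs.

    ops: dict mapping operation name -> set of its variables.
    fu_vars: list of sets, the variables bound to each FU so far
             (updated in place). Assumes len(ops) <= len(fu_vars).
    Returns a dict mapping operation name -> FU index.
    """
    free_fus = list(range(len(fu_vars)))
    # Count matrix: common variables between each operation and each FU.
    count = {o: {f: len(v & fu_vars[f]) for f in free_fus}
             for o, v in ops.items()}
    pending = list(ops)
    binding = {}
    while pending:
        rows = {o: [count[o][f] for f in free_fus] for o in pending}
        top = max(max(r) for r in rows.values())
        selected = [o for o in pending if max(rows[o]) == top]
        if len(selected) > 1:
            low = min(min(rows[o]) for o in selected)
            selected = [o for o in selected if min(rows[o]) == low]
            # Prefer the row with the fewest occurrences of the lowest element.
            selected.sort(key=lambda o: rows[o].count(low))
        op = selected[0]
        fu = min(free_fus, key=lambda f: count[op][f])
        binding[op] = fu
        free_fus.remove(fu)   # delete the selected column
        pending.remove(op)    # delete the selected row
    for op, fu in binding.items():
        fu_vars[fu] |= ops[op]
    return binding

fu_vars = [set(), set()]
step1 = allocate_step({"op1": {"a", "b", "v1"}, "op2": {"c", "d", "v2"}}, fu_vars)
step2 = allocate_step({"op3": {"v1", "e", "v3"}, "op4": {"v2", "f", "v4"}}, fu_vars)
print(step1, step2)
```

With these hypothetical inputs, each second-step operation shares one variable with one FU and none with the other, so each is bound to the FU with which it shares nothing, mirroring the choice made in Example 1.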
5. Shift sequence generation
After binding the operations to FUs, our objective is
to obtain the operations and the corresponding variables
that may be shifted out in order to test the FUs. To
achieve this, we must select the operations from each FU
that have maximum number of common variables. It
must be noted that selection of these operations is not
possible in the FU allocation because FU allocation
maximizes the number of common variables only among
FUs of the same type. However, in this algorithm, we
select operations (and hence variables) from FUs of
different types. Thus, we achieve an overall minimization
of variables to be shifted out.
We divide the overall algorithm into three parts. The
first part, sequence( ), selects a set of operations (and
the corresponding variables), one from each FU, such
that the maximum number of variables are common
among them. The second part, optimal_sequence( ),
utilizes the first part repeatedly to select more than one
operation from each FU, such that each FU can be
tested more than once in a pass, if possible. The variables
corresponding to these operations are also obtained.
This part also gives the number of CTRs required in the
synthesis. The third part, order( ), puts these variables
in an order and determines the instants when each one of
them can be shifted out.
Fig. 4. The SDFG for Example 1.
Fig. 5. Count matrix for Example 1.
5.1. Algorithm to select a set of operations (one from each
FU)
Given a list of operations, this procedure selects one
operation from each FU, such that the common number
of variables between them is maximum. The corre-
sponding variables are also obtained. The list of opera-
tions is provided by the optimal_sequence( ). It may be
noted later that the list of operations passed to se-
quence( ) may contain a dummy operation, the vari-
ables of which are assumed to be the variables selected
by all the previous calls to sequence( ). It is passed with
the objective of selecting operations that have maximum
number of common variables with the previously se-
lected ones. It is not bound to any FU.
In the algorithm, we have assumed a fictitious operation, which is initially NULL. It corresponds to the set
of operations selected by the current call to sequence( ).
Its variables are given by the variables of the selected
operations.
We generate a similarity matrix. Both rows and
columns represent operations (including the fictitious
operation, but excluding the selected operations). The
elements of the matrix represent the number of common
variables between operations represented by the corresponding rows and columns. The diagonal elements are
taken as zero. Also, the elements corresponding to the
operations bound to the same FU are considered as
zero. An operation pair corresponding to a zero element
is never selected, thereby prohibiting the selection of two operations from an FU. The matrix is symmetric about its
diagonal and hence, in actual storage, we can store either
the upper triangular or the lower triangular part.
There may exist some operations, bound to the same
FU, that do not have any variable common with the
other operations. Thus, all the elements corresponding
to the respective rows and columns would be zero. In
this case, we arbitrarily select any operation from the
FU and ignore the other operations.
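The construction of the similarity matrix can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the paper's code: the function name `similarity_matrix`, the dictionary layout, and the four operations with their variables are hypothetical. An operation that is not bound to any FU (such as the fictitious or dummy operation) is given the FU value `None`.

```python
from itertools import combinations

def similarity_matrix(op_vars, op_fu):
    """Number of common variables between every pair of operations.

    op_vars: dict operation -> set of variables.
    op_fu:   dict operation -> FU identifier, or None if unbound.
    Diagonal entries and pairs bound to the same FU are zero.
    """
    ops = list(op_vars)
    sim = {a: {b: 0 for b in ops} for a in ops}
    for a, b in combinations(ops, 2):
        same_fu = op_fu.get(a) is not None and op_fu.get(a) == op_fu.get(b)
        if not same_fu:
            # The matrix is symmetric, so fill both halves together.
            sim[a][b] = sim[b][a] = len(op_vars[a] & op_vars[b])
    return sim

op_vars = {
    "op1": {"a", "b", "v1"},
    "op2": {"c", "d", "v2"},
    "op3": {"v1", "e", "v3"},
    "op4": {"v2", "f", "v4"},
}
op_fu = {"op1": "FU1", "op2": "FU2", "op3": "FU2", "op4": "FU1"}
sim = similarity_matrix(op_vars, op_fu)
row_sum = {op: sum(row.values()) for op, row in sim.items()}
print(sim["op1"]["op3"], row_sum)
```

The row sums computed at the end correspond to the `row_sum` quantity that the selection procedure uses to choose between candidate rows.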
Procedure Sequence( )
Input: SDFG, FU allocation, a list of operations (including dummy
       operation) from which selection is to be made;
Output: Selected operations (one from each FU), selected variables;
/*
  Let max_element be the highest element in a specified matrix,
  row_sum be the sum of a row of a matrix, overall_sum be the sum of
  all the elements in the matrix
*/
Begin
  Fictitious operation = {∅};
  Append the fictitious operation to the list of operations;
  Generate a similarity matrix from the modified list of operations;
  Identify the FUs that have single operation, and select these operations;
  Select the dummy operation also;
  do
    Append selected operations to fictitious operations;
    Update variables of the fictitious operation and remove duplicate variables, if any;
    Delete the selected operations and the other operations bound to the same FU, from the list of operations;
    Update the similarity matrix;
    Compute the max_element in the similarity matrix;
    Compute row_sum for each row;
    Compute overall_sum;
    if (overall_sum is not zero) then
      Select the row corresponding to the fictitious operation;
      if (the selected row doesn't have highest row_sum)
        Select the row having highest row_sum;
      end if
      if (none of the elements has a value max_element in the selected row)
        if (a row having the highest row_sum has an element equal to max_element)
          Select the row and deselect the previously selected row;
        end if
      end if
      Select the columns having highest value in the selected row;
      if (there exist columns that correspond to the operations bound to the same FU)
        Keep one column selected from them that has the largest row_sum value and deselect others;
      end if
      Select the operations corresponding to the selected row and columns;
    end if
  while (overall_sum is not zero);
  while (the similarity matrix has an operation other than the fictitious)
    Select the operation;
    Append selected operations to fictitious operations;
    Update variable of fictitious operation and remove duplicates;
    Delete rows and columns corresponding to the selected operation and the other operations bound to the same FU;
  end while
  Delete dummy operation and its variables from the fictitious operation, and its variables.
  Return the operation and variables corresponding to the fictitious operation;
End Sequence
5.1.1. Example 2
We extend Example 1 to illustrate the procedure sequence( ). Let us assume that the input list of operations
is {+1, +2, +3, +4, +5, +6, +7, +8, +9, +10, +11, +12}. Let
the fictitious operation be denoted by F. The similarity
matrix will be generated as shown in Fig. 6.
The highest element in the matrix, max_element, is 2.
The FU allocation is shown in Example 1. There is no
FU that has a single operation. Row_sum for each row
is computed. The overall sum is 24. We select the row corresponding to +2 as it has the highest row_sum. This row
has elements equal to max_element in the columns
corresponding to +8 and +9. We select operation +8,
because it has a higher row_sum than +9. Thus, the operations selected from the two FUs are +2 and +8. Deleting
these operations and the other operations bound to the
same FU leaves us only with F. Thus, the procedure
returns +2 and +8. The variables corresponding to these
operations, with the duplicates removed, are c, d, v2 and
v8. It may be noted that we need to shift out only these
four variables to test both the FUs once.
5.2. Algorithm to select a set of operations
The procedure optimal_sequence( ) selects a set of
operations and the corresponding variables (without any
duplication) from the given SDFG and the FU alloca-
tion. This algorithm also returns the number of CTRs
required. It is same as the number of variables needed to
be shifted out to test the FUs.
The procedure repeatedly calls sequence( ) to select
the best possible operations, from a given set of opera-
tions, for each FU. Initially, the sequence( ) is called
with all the operations. Next time onwards, the se-
quence( ) is called by removing operations previously
selected. We have assumed a dummy operation, the
variables of which are the variables selected by all the
previous calls to sequence( ). The dummy operation is
passed to the successive calls to sequence( ) with the
objective of selecting operations that have a maximum
number of common variables with the previously se-
lected ones.
Since our objective is to test each FU at least once in
a pass, we have to select at least one operation from each
FU. Thus, the first time, we select all the operations selected
by sequence( ) unconditionally. However, if pass is NC,
we check if there is a possibility to shift out any additional variable. This will depend on the shift time required by each register. If there is a possibility to shift
out any additional variable, we find a new set of operations by calling sequence( ). These operations will be
different from the ones previously selected. Depending
on the number of variables needed to be shifted out for
each of these operations and the number of additional
variables that can be shifted out, we select additional
operations.
Procedure Optimal_Sequence( )
Input: SDFG, FU allocation;
Output: Selected operations (more than one from each FU), selected
        variables, number of CTRs;
/*
  Let ST be the shift time in terms of the number of control steps for
  each register, NC be the number of control steps, operation_list be
  a list of operations, var_additional be the number of additional
  variables that can be shifted out, selected_operations be a set of
  the selected operations, var_sequence be a list of variables
  corresponding to the selected operations, var_count be the count of
  variables in var_sequence, common_var be the number of variables an
  operation has common with the var_sequence, var_operation be the
  count of variables in an operation, NCTR be the number of CTRs.
*/
Begin
  Initialize var_additional to a large positive value;
  selected_operations = {∅};
  var_sequence = {∅};
  Generate an operation_list consisting of all the operations;
  while (var_additional > 0)
    Create a dummy operation with the variable var_sequence.
    Append dummy operation to operation_list;
    Call Sequence( ) with the operations in the operation_list;
    Delete the dummy operation and operations selected by Sequence( ) from the operation_list;
    if (number of variables obtained by Sequence( ) ≤ var_additional) then
      Append the operations selected by Sequence( ) to selected_operations;
      Append the variables selected by Sequence( ) to var_sequence;
      Remove duplicate variables from the var_sequence;
      Compute var_count;
      var_additional = (NC − var_count × ST) / ST;
    else
      for (each operation selected by the Sequence( ))
        Compute common_var;
      end for
      Sort the operations in the descending order according to their common_var value;
      for (each operation in the sorted list)
        if (var_operation − common_var ≤ var_additional)
          Append the operation to selected_operations;
          Append the (non-common) variables corresponding to the operation, to var_sequence;
          Update var_count;
          var_additional = (NC − var_count × ST) / ST;
        else
          Break the for loop;
        end if
      end for
      var_additional = 0;
    end if
  end while
  NCTR = var_count;
  Selected variables = var_sequence;
  Return NCTR, selected_operations, selected variables;
End Optimal_Sequence
Fig. 6. Similarity matrix for Example 2.
5.2.1. Example 3
Let us consider Example 1 for illustrating optimal_sequence( ). We obtained the FU allocation in
Example 1. As illustrated in Example 2, the first call to
sequence( ) gives us a list of variables as c, d, v2 and v8.
Thus, var_count is 4. Assuming a shift time of 1 control
step, var_additional will be 2. The next call to sequence( ) returns variables v9 and v12. These variables
correspond to operations +9 and +12. This implies that
+9 from FU1 and +12 from FU2 have the maximum
number of variables that are common with the previously obtained variables c, d, v2 and v8. The new set of
variables is appended to the previous ones. Thus, we
obtain var_count as 6 and var_additional as 0. Hence,
finally we select operations +2, +8, +9 and +12, and the
corresponding variables c, d, v2, v8, v9, v12. The number
of CTRs required is 6.
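The arithmetic behind var_additional in this example can be reproduced with a one-line function. This is a small sketch: the function name `var_additional` mirrors the pseudocode variable, and the use of integer (floor) division is an assumption, since the listing does not say how a fractional result is handled.

```python
def var_additional(nc, var_count, st):
    """Additional variables that still fit in a pass of NC control steps.

    Implements var_additional = (NC - var_count * ST) / ST with floor
    division (an assumption for the case where ST does not divide evenly).
    """
    return (nc - var_count * st) // st

# Values from Example 3: NC = 6 control steps, ST = 1.
print(var_additional(6, 4, 1))  # after the first call to sequence( )
print(var_additional(6, 6, 1))  # after v9 and v12 are appended
```

With four variables selected, two more can be shifted out within the six control steps, and once six variables are selected no room is left, so the while loop of optimal_sequence( ) terminates.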
5.3. Algorithm to find out instants and order of shifting variables
The procedure order( ) determines the order and
instants (in terms of control steps) in which the variables
must be shifted out. The variables are shifted out in the
same order as they are assigned and strobed in the
CTRs. Our objective is to order the variables in such a
way that there is no discontinuity in the shifting. Thus,
before the contents of a CTR are shifted out completely,
the next CTR in sequence should have captured a
variable to be shifted out in that pass.
It may be noted that the pass time may be greater
than NC. Thus, an instant that exceeds NC refers to the
next cycle in the same pass. For example, suppose there are six
control steps in an SDFG and 10 variables need to be
shifted out to test all the FUs once. Assuming
ST = 1, the pass time would be 10 control steps. An
instant (step) 8 would mean the second control step in
the next cycle. However, the next pass would start at the
beginning of the third cycle, which actually will be referred to as step 1 of the second pass.
Once a variable is captured in a CTR, it can be
shifted out any time in the pass. Hence, to avoid the
discontinuity, we sometimes delay the shifting of all the
variables considered before. The gap created at the beginning can be filled with the variables of the previous
pass, if needed. However, this might result in a
negligible time wastage in the first pass only, since there
will be no previous pass.
In the algorithm, we have used a variable first_step. It
indicates the control step at which the first variable
starts shifting out in a pass. The variables before this
control step belong to the previous pass. By knowing
first_step, and the order in which variables are shifted
out, we find out the instant at which a variable will be
shifted out. The variables in the CTRs are assigned in
the same order as they are shifted out.
Procedure Order( )
Input: SDFG;
Output: Order in which variables are shifted out, first_step
        from where shifting the variables of a new
        pass starts, CTR assignment;
/*
  Let ST be the shift time for each register, first_step
  be the control step from which the variables of a
  pass start shifting out, shift_step be a temporary
  variable to hold the value of control step where a
  new variable can be shifted, var_step be the control
  step where a variable is generated, order_list be the
  list of variables in the order they will be shifted out,
  NCTR be the number of CTRs.
*/
Begin
  order_list = {∅};
  Call optimal_sequence( ) to select the operations
    and obtain NCTR;
  Arrange the selected operations in a sequence in
    which they appear in the SDFG (left to right, then
    top to bottom);
  first_step = control step of the first operation in the
    arranged list;
  shift_step = first_step;
  for (each operation in the arranged list)
    for (each variable in the operation)
      if (the variable does not exist in the order_list)
        Append the variable to the order_list;
        var_step = control step of the operation;
        if (the variable is an output of the operation
            under consideration)
          Increment var_step by one;
        end if
        if (var_step > shift_step)
          first_step = first_step + var_step - shift_step;
          /* shift the first_step */
        end if
        shift_step = shift_step + ST;
      end if
    end for
  end for
  Assign the variables in the order_list to NCTR
    number of CTRs;
  Return order_list, first_step, CTR assignment;
End Order
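The core loop of the procedure can be sketched in Python as shown below. This is a hedged illustration rather than the authors' C implementation. The data layout (a list of (control step, [(variable, is_output)]) pairs) and the function name order are hypothetical, and optimal_sequence( ) and the CTR assignment are left out. The sketch also moves shift_step forward together with first_step whenever a variable arrives late; that extra update is an assumption, chosen because it reproduces first_step = 2 for the data of Example 4. In the sample data, only the availability of v8 in control step 5 is taken from the text; the control steps given for the operations producing v9 and v12 are assumed.

```python
def order(operations, st=1):
    """Return (order_list, first_step) for operations given in SDFG order.

    operations -- list of (control_step, [(variable, is_output), ...])
    st         -- shift time per variable, in control steps (ST)
    """
    order_list = []
    first_step = operations[0][0]
    shift_step = first_step
    for step, variables in operations:
        for name, is_output in variables:
            if name in order_list:
                continue                      # each variable is shifted out once per pass
            order_list.append(name)
            # An output becomes available one control step after its operation.
            var_step = step + 1 if is_output else step
            if var_step > shift_step:
                delay = var_step - shift_step
                first_step += delay           # delay the start of the whole shift sequence
                shift_step += delay           # assumption: the current slot moves with it
            shift_step += st
    return order_list, first_step

# Hypothetical encoding of Example 4: operations 2, 8, 9 and 12.
example_ops = [
    (1, [("c", False), ("d", False), ("v2", True)]),
    (4, [("v8", True)]),    # v8 is available in control step 5 (from the text)
    (5, [("v9", True)]),    # assumed control step
    (6, [("v12", True)]),   # assumed control step
]
order_list, first_step = order(example_ops)
print(order_list, first_step)
```

With these inputs, the sketch yields the order c, d, v2, v8, v9, v12 and first_step = 2.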
5.3.1. Example 4
Let us consider Example 1 to illustrate the procedure
Order( ). As illustrated in Example 3,
optimal_sequence( ) selects the variables c, d, v2, v8, v9 and
v12. Fig. 7 shows these variables in the control steps in
which they are generated. The pass is six control steps. If
first_step is taken as one, the variables c, d and v2 will be
shifted out in control steps 1–3, respectively. However,
since v8 is available only in control step 5, there would
be a discontinuity in the shift sequence at control step 4.
This unordered sequence is shown in the figure.
Procedure Order( ) shifts first_step to the second control step.
Thus, the variables c, d, v2, v8, v9 and v12 will be shifted
out in control steps 2 through 7, respectively. Control
step 7 for the variable v12 means the first control
step of the next pass. It may be noticed that the gap
created in the first control step of a pass is utilized by v12 of
the previous pass. The variables v12, c, d, v2, v8 and v9
are assigned to CTR # 1 through 6, respectively.
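The wrap-around in this example can be checked with a short Python sketch. The function names shift_schedule and ctr_numbers are hypothetical, and numbering the CTRs by the position of each variable within the pass is an assumption that happens to match the assignment stated above; the paper's own CTR assignment step may differ in detail.

```python
def shift_schedule(order_list, first_step, pass_time, st=1):
    """Map each variable to (shift instant, control step within the pass)."""
    schedule = {}
    for i, name in enumerate(order_list):
        instant = first_step + i * st
        in_pass = (instant - 1) % pass_time + 1   # instants past the pass wrap to the next pass
        schedule[name] = (instant, in_pass)
    return schedule

def ctr_numbers(schedule):
    """Number the CTRs by step within the pass (assumed rule)."""
    ranked = sorted(schedule, key=lambda name: schedule[name][1])
    return {name: i + 1 for i, name in enumerate(ranked)}

# Values taken from Example 4: first_step = 2, pass of six control steps, ST = 1.
schedule = shift_schedule(["c", "d", "v2", "v8", "v9", "v12"],
                          first_step=2, pass_time=6)
ctrs = ctr_numbers(schedule)
print(schedule)
print(ctrs)
```

Here v12 receives instant 7, which wraps to step 1 of the following pass, so it takes CTR 1 and the remaining variables take CTRs 2 through 6.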
6. Experimental results
We have implemented the proposed methodology
in the C language. Table 1 shows the experimental
results for Example 1. We have also applied our
synthesis system to different benchmark examples. The
examples considered are the fifth-order digital elliptic
wave filter (EWF) and the differential equation from Ref.
[14], and Tseng from Ref. [15].
6.1. Fifth-order digital elliptic wave filter
The SDFG of the fifth-order EWF is adopted from
Ref. [14] and depicted in Fig. 8. The EWF consists of 26
additions and eight multiplication operations scheduled
over 14 control steps. We consider the shift time as one
control step. Our FU allocation is shown below:
FU1(+) = +1, +7, +8, +11, +16, +18, +23, +26;
FU2(+) = +2, +3, +5, +10, +13, +14, +17, +19, +21, +24;
FU3(+) = +4, +6, +9, +12, +15, +20, +22, +25;
FU4(*) = *1, *2, *3, *4, *5, *6, *7, *8.
Fig. 7. Generation and ordering of shift sequence.
Table 2 provides a comparison with PHITS-NS [14].
The FU allocation of PHITS-NS is shown below:
FU1(+) = +1, +3, +5, +7, +8, +9, +11, +12, +14, +17, +19, +22, +24;
FU2(+) = +2, +4, +6, +10, +13, +15, +16, +20, +23, +26;
FU3(+) = +18, +21, +25;
FU4(*) = *1, *2, *3, *4, *5, *6, *7, *8.
Each FU is tested twice using the present approach. We
require 14 variables to be shifted out in a pass time of
NC (14 control steps). In order to test each FU twice,
the allocation of PHITS-NS requires 16 variables to be
shifted out, and hence takes more testing time.
6.2. Differential equation example
The SDFG for the differential equation (diffeq)
example is shown in Fig. 9. The experimental results for the
diffeq are summarized in Table 3. We consider the shift
time as one control step. Our FU allocation is shown
below:
FU1(+) = +1, +2; FU2(−) = −1, −2;
FU3(<) = <1; FU4(*) = *1, *4, *5; FU5(*) = *2, *3, *6.
Table 3 provides the comparison of results with Lyra
and Aryl taken from Ref. [14]. The FU allocation of
Lyra and Aryl is shown below:
FU1(+) = +1, +2; FU2(−) = −1, −2;
FU3(<) = <1; FU4(*) = *1, *4, *6; FU5(*) = *2, *3, *5.
Table 1
The experimental results for Example 1
Resource type: Results obtained
  FU allocation: FU1 = 1, 4, 5, 8, 9, 11; FU2 = 2, 3, 6, 7, 10, 12
  Operations selected for testing: 2, 8, 9, 12
  Variables in the shift sequence: c, d, v2, v8, v9, v12
  Number of CTRs: 6
  Number of times each FU is tested in a pass: FU1: 2, FU2: 2
Fig. 8. SDFG for the fifth-order EWF.
Table 2
Comparison of results for the EWF example
Operations selected for testing
  Proposed method: +5, +6, +7, *1, *2, +9, +10, +11
  PHITS-NS: +8, +10, +18, *1, *2, +9, +15
Variables in the shift sequence
  Proposed method: x9, f, b, T26, x1, t33, T39, g, e, con, x2, x3, d, x4
  PHITS-NS: b, x2, d, e, x9, IN, x11, x14, con, x4, x, x12, x3
Number of CTRs
  Proposed method: 14
  PHITS-NS: 13 (a)
Number of times each FU is tested in a pass
  Proposed method: FU1(+):2 FU2(+):2 FU3(+):2 FU4(*):2
  PHITS-NS: FU1(+):2 FU2(+):2 FU3(+):1 (a) FU4(*):2
(a) To test FU3(+) twice will add three more variables, resulting in a total of 16 variables.
Each FU is tested once. We require 10 variables to be
shifted out, whereas the allocation of Lyra and Aryl
requires 11 variables to be shifted out.
6.3. Tseng example
This is the straight-line code example adopted from
Avra [15], and modified as the Tseng example in Ref. [14].
The SDFG is shown in Fig. 10. The operations are
allocated to three adders, one multiplier, one subtractor,
one logical OR and one logical AND FU. We consider
the shift time as one control step. Our FU allocation is
shown below:
FU1(+) = +1, +4; FU2(+) = +2;
FU3(+) = +3; FU4(−) = −1;
FU5(|) = |1; FU6(&) = &1; FU7(*) = *1.
The comparisons of experimental results are
summarized in Table 4. The FU allocation of
PHITS-NS [14] is shown below:
FU1(+) = +1, +2; FU2(+) = +3;
FU3(+) = +4; FU4(−) = −1;
FU5(|) = |1; FU6(&) = &1; FU7(*) = *1.
The FU allocation of Facet taken from Ref. [14] is
shown below:
FU1(+) = +1, +3; FU2(+) = +2;
FU3(+) = +4; FU4(−) = −1;
FU5(|) = |1; FU6(&) = &1; FU7(*) = *1.
Each FU is tested once. We require 12 variables to be
shifted out in a pass time of 12 control steps. Thus, each
pass is three cycles long. The allocations of PHITS-NS
and Facet require 13 variables to be shifted out. It
must also be noted that these approaches will require four
cycles, as the pass time is 13 control steps.
Fig. 9. SDFG for the differential equation example.
Table 3
Comparison of results for the diffeq example
Operations selected for testing
  Proposed method: *1, +1, <1, −1, *6
  Lyra and Aryl: *1, +1, <1, −1, *2
Variables in the shift sequence
  Proposed method: u, dx, v1, x, x', A, ctrl, v3, v5, v7
  Lyra and Aryl: u, dx, v1, x, x', A, ctrl, v3, v5, #3, v2
Number of CTRs
  Proposed method: 10
  Lyra and Aryl: 11
Number of times each FU is tested in a pass
  Proposed method: FU1(+):1 FU2(−):1 FU3(<):1 FU4(*):1 FU5(*):1
  Lyra and Aryl: FU1(+):1 FU2(−):1 FU3(<):1 FU4(*):1 FU5(*):1
Fig. 10. SDFG for the Tseng example.
Table 4
Comparison of results for the Tseng example
Operations selected for testing
  Proposed method: +1, *1, −1, +2, +3, |1, &1
  PHITS-NS: +1, *1, −1, +3, +4, |1, &1
  Facet: +1, *1, −1, +2, +4, |1, &1
Variables in the shift sequence
  Proposed method: v1, v2, v3, v6, v7, v4, v5, v9, v8, v2', v11, v1'
  PHITS-NS: v1, v2, v3, v6, v7, v4, v5, v9, v8, v2', v11, v1', v10
  Facet: v1, v2, v3, v6, v7, v4, v5, v9, v8, v2', v11, v1', v10
Number of CTRs
  Proposed method: 12
  PHITS-NS: 13
  Facet: 13
Number of times each FU is tested in a pass
  Proposed method: FU1(+):1 FU2(+):1 FU3(+):1 FU4(−):1 FU5(|):1 FU6(&):1 FU7(*):1
  PHITS-NS: FU1(+):1 FU2(+):1 FU3(+):1 FU4(−):1 FU5(|):1 FU6(&):1 FU7(*):1
  Facet: FU1(+):1 FU2(+):1 FU3(+):1 FU4(−):1 FU5(|):1 FU6(&):1 FU7(*):1
7. Conclusion
This paper presents a new approach to achieving
concurrent testability. The testing methodology and
synthesis schemes are presented. The methodology
utilizes an external-testing unit to which the input and
output variables of the CUT are shifted out. An FU
allocation technique is presented that allows each FU
to be tested in a shorter time. This is made possible
by minimizing the number of variables required to be
shifted out from the CUT. Algorithms are presented for
obtaining the order in which variables are shifted out
and stored in the CTRs. This information can be utilized
for synchronizing the testing unit with the CUT. The
proposed approach is applied to benchmark examples and
the results are compared with some approaches available
in the literature.
Acknowledgements
This work was sponsored by Kuwait University Research
Grant EE-081.
References
[1] Chen CH, Yuen JT. Concurrent test scheduling in built-in self-test environment. IEEE Design Test Comp 1992, p. 256–9.
[2] Saluja KK, Sharma R, Kime CR. Concurrent comparative testing using BIST resources. International Conference on Computer Aided Design 1987. p. 336–9.
[3] Saluja KK, Sharma R, Kime CR. A concurrent testing technique for digital circuits. IEEE Trans Comp Aided Design 1988;7(12):1250–60.
[4] Singh R, Townssend M, Knight JP. Concurrent testing of digital ASICs synthesized from data-flow graphs. Sixth Workshop on New Directions for Testing, Canada, 1992. p. 87–95.
[5] Flottes ML, Hammad D, Rouzeyre B. Automatic synthesis of BISTed data paths from high level specification. European Design and Test Conference 1994. p. 591–8.
[6] Harris IG, Orailoglu A. SYNCBIST: synthesis for concurrent built-in self-testability. International Conference on Computer Design 1994. p. 101–4.
[7] Swaminathan G, Aylor JH, Johnson BW. Concurrent testing of VLSI circuits using conservative logic. IEEE Trans Comp 1990, p. 60–5.
[8] Sharma R, Saluja KK. An implementation and analysis of a concurrent built-in self-test technique. International Symposium on Fault-Tolerant Computing 1988. p. 164–9.
[9] Sun X, Serra M. Merging concurrent checking and off-line BIST. Proceeding International Test Conference 1992. p. 958–67.
[10] Ismaeel AA, Bhatnagar R, Mathew R. Modification of scheduled data flow graph for on-line testability. Microelectron Reliab 1999;39:1473–84.
[11] Ismaeel AA, Dhodhi MK, Mathew R. Assignment and allocation of highly testable data paths under scan optimization. Integrat VLSI J 1996;21(3):191–207.
[12] Singh R, Knight J. Concurrent testing in high level synthesis. International Conference on Computer Design, 1994. p. 96–103.
[13] Abramovici M, Breuer MA, Friedman AD. Digital systems testing and testable design. New York: Computer Science Press, 1990.
[14] Lee TC, Wolf WH, Jha NK. Behavioral synthesis of highly testable data paths under the non-scan and partial scan environments. Proceeding 30th Design Automation Conference 1993. p. 292–7.
[15] Avra L. Allocation and assignment in high level synthesis for self-testable data paths. Proceeding IEEE International Test Conference 1991. p. 463–72.
2106 A.A. Ismaeel et al. / Microelectronics Reliability 40 (2000) 2095–2106