Automatic Test Program Generation Using Executing-Trace-Based Constraint Extraction for Embedded Processors

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1

Ying Zhang, Member, IEEE, Huawei Li, Senior Member, IEEE, and Xiaowei Li, Senior Member, IEEE

Abstract— Software-based self-testing (SBST) has been a promising method for processor testing, but the complexity of state-of-the-art processors still poses great challenges for SBST. This paper utilizes the executing trace collected while executing training programs on the processor under test to simplify the extraction of mappings and functional constraints for the ports of inner components, which facilitates structural test generation with constraints at the gate level, and automatic test instruction generation (ATIG) even for hidden control logic (HCL). In addition, for sequential HCL, we present a test routine generation technique based on an extended finite state machine, so that structural patterns for combinational subcircuits in the sequential HCL can be mapped into the test routines to form a test program. Experimental results demonstrate that the proposed ATIG method can achieve good structural fault coverage with compact test programs on modern processors.

Index Terms— Constraint extraction, instruction testing, processor self-testing, software-based self-testing (SBST), test program generation.

I. INTRODUCTION

With technology shrinking into the nanometer scale, hardware defects are becoming more common, seriously threatening product lifetime [1]. It is thus desirable for microprocessor manufacturers to provide an effective online self-testing method that satisfies customers with high reliability demands. Software-based self-testing (SBST) was proposed for this purpose: it executes a test program on an embedded processor to test the processor itself and other resources on a system-on-chip device under the functional model [2]. Recently, SBST methodologies have been intensively studied, but few of them have been shown to scale to complex control logic with a large problem space. Take hidden control logics (HCLs), for example: since they are not visible to assembly programmers and are triggered only under special conditions, the existing methods have not yet explored them sufficiently.

Manuscript received October 9, 2011; revised April 13, 2012; accepted June 15, 2012. This work was supported in part by the National Natural Science Foundation of China under Grant 61176040 and Grant 60921002, the National Basic Research Program of China (973) under Grant 2011CB302501, and the Swedish Foundation for Strategic Research under Project RIT08-0056 on Fault-Tolerant and Secure Automotive Embedded Systems.

Y. Zhang is with the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China, and also with the Embedded Systems Laboratory, Linköping University, Linköping SE-581 83, Sweden.

H. Li and X. Li are with the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2012.2208130

The mainstream SBST methods combine functional constraint extraction and gate-level constrained automatic test pattern generation (CATPG) to develop test programs. At first, SBST was proposed with manual constraint extraction, and random test patterns were imposed on the components of the processor [2], called SBST with random pattern (RSBST). Later, a scalable SBST [3] was presented for more complex processors, which applied statistical regression to extract instruction-level constraints for CATPG [4]. Arguing against the efficiency of constrained test generation, learning methods [5] were proposed in a simulation-based method. However, the gate-level CATPG has not been implemented on HCLs so far.

Due to the high complexity of gate-level implementations, many researchers shifted their attention to the register-transfer (RT) level for generating targeted instruction sequences. The deterministic test at RT level [6] has been successful on a series of normal functional components. For example, the data input configuration [6] for the register bank compresses test patterns greatly with little fault coverage loss, and it is thus widely applied in designing SBST programs. In [7], a systematic SBST method was proposed that enhances existing SBST programs to comprehensively test the pipeline logic. The forwarding unit, a combinational HCL, is modeled in [7] as a set of verification cases that are sensitized by inserting several empty instructions among instructions with data dependency. However, there is usually no guarantee that verification tests at RT level can achieve a certain fault coverage level for high-quality manufacturing test [8]. To further improve test performance, multiple-level information can be used [9], [10]. To achieve fault coverage that is comparable to that of the full-scan method, a hybrid SBST method combines the deterministic SBST methodologies with verification-based self-testing programs [11], while another hybrid SBST method combines a deterministic test program that explores different levels of information with abundant random instructions [9]. Besides, a bounded model checker provides another way for RT-level SBST [12], but it still requires further improvement for the time-out problem.

So far, complex sequential HCLs have never been targeted, and they pose a great challenge to SBST because neither sequential ATPG nor functional test methods can deal with them effectively.

1063–8210/$31.00 © 2012 IEEE



This paper utilizes the executing trace collected while executing training programs on the processor under test (PUT) to extract constraints and deal with HCL in the PUT. An automatic test program generation method is proposed with executing-trace-based constraint extraction for embedded processors, which makes the following contributions. First, the executing trace generated by an instruction-level simulator greatly helps the identification of mappings between trace signals and ports of inner components in processors. Second, according to the mappings, automatic test instruction generation (ATIG) with CATPG can detect faults on combinational control logic that are beyond the checking of verification rules. Third, for sequential control units, structural patterns for combinational subcircuits in the units obtained by ATPG are mapped into functional routines developed from the extended finite state machine (EFSM). This significantly expands the sequential depth for pattern generation, and hence provides an effective way to detect faults that need many time frames to activate or propagate.

This paper is organized as follows. Section II covers the necessary background of test generation considering functional constraints and HCL in processors. In Section III, we discuss the executing-trace-based constraint extraction method, which applies the instruction-level simulator to generate the executing trace, and then uses a decision tree algorithm with directed random checking to simplify mapping extraction. Section IV presents the automatic test program generation methods for both combinational and sequential HCL using the mappings. Experimental results are given in Section V. Finally, we conclude this paper in Section VI.

II. BACKGROUND OF FUNCTIONAL CONSTRAINT EXTRACTION AND HCL

A. Functional Constraint Extraction

With the powerful capability of online and offline fault detection without the need of an expensive high-speed automatic test equipment (ATE), SBST has been a promising method for processor testing. First, the test code (instructions) and test data are downloaded slowly from a low-speed ATE or system memory, as shown in Fig. 1 [13]. Second, the processor executes the instructions, and tests the component under test (CUT) at speed. Third, test responses are written back to the memory for observation.

One of the keys to SBST is to extract functional constraints, as shown in Fig. 1. Because instructions instead of scan chains are exploited to impose test patterns on the CUT, test patterns must be generated under functional constraints, or else they could not be mapped to valid instructions. Because instruction-level constraints can be easily obtained through mapping the value range of instructions onto the component ports, the constraint extraction can be simplified to extracting mappings between the instructions and the component ports.

Considering functional constraints, there are four classic methods that can generate SBST programs. First, statistical regression [3] is applied to obtain mappings between settable fields on instruction templates and component ports, while the value ranges of each part in the instruction templates are imposed on the component ports as instruction-level constraints according to the mappings. However, regression is not suitable for discrete logic. Second, Boolean and/or arithmetic learning [5] is used to build learning models between simulation I/O data and component signals, and then the models are used to justify ATPG-generated patterns. However, some CUT ports are often unrelated to I/O data, so that their learning models do not exist. So far, neither method has been applied to HCLs.

Fig. 1. Framework of SBST [13].

Third, the deterministic test at RT level [6] takes advantage of the inherent regularity of functional components, and carefully designs test programs to achieve considerably high fault coverage on functional components with compact patterns. For HCL, however, there are some structural faults that are beyond the checking of verification rules but still cause chip failure. Fourth, a bounded model checker is used in SBST [12], where the processor is modeled as several conjunctive normal forms for k time frames, test patterns are expressed in linear temporal logic, and a satisfiability solver is applied to solve the problems. However, a bounded model checker is efficient only if k is small [14], while state-of-the-art processors often need a large k. It thus easily results in time-out problems.

B. HCL

In state-of-the-art processors, HCLs are often used to avoid pipeline stalls and enhance overall performance, and they occupy an increasingly large area. Because they are often invisible to the assembly programmer and only triggered by special conditions, it is difficult to test them.

As a combinational HCL, the forwarding unit is used to obtain the results of preceding instructions early and avoid pipeline stalls caused by data dependency. The functions of the unit are described using the miniMIPS processor, which contains five pipeline stages: prefetching stage (PF), instruction fetching stage (EI), decoding stage (DI), executing stage (EX), and memory stage (MEM). As shown in Fig. 2(a), add requires the following results: $1 of sub and $2 of lw, but they have not been written back into $1 and $2 yet, so add has to wait. The forwarding unit can forward the value of $1 from the arithmetic logic unit (ALU) (i.e., at stage EX) and the value of $2 from the memory unit (i.e., at stage MEM) to add, as shown in Fig. 2(a), so that add is no longer stalled.

[Fig. 2(a), pipeline contents in one cycle. PF: lw $30 $2 0x0004; EI: jr $12; DI: add $3 $1 $2; EX: sub $1 $11 $13; MEM: lw $2 $30 0x0004.]
[Fig. 2(b), pipeline contents (PF, EI, DI, EX, MEM) per cycle. N: I5, I4, I3, I2, I1. N+1: I49, I5, I4, I3, I2. N+2: I50, I49, I5, I4, I3. N+3: I51, I50, I49, I5, I4. N+4: I6, clear, clear, clear, I5.]

Fig. 2. Principles of (a) forwarding and (b) branch prediction.

It is difficult to test the forwarding unit, because it is only sensitized by an instruction group with data dependency, under the condition that the required data have not been written back into the register bank. In addition, the forwarding unit often contains many comparators whose structures are not regular. Without implementation details, verification-based RT-level methods cannot test them effectively and would lose some fault coverage.

As a sequential HCL, the predictor unit is used to offer the next address for branch instructions. When a branch instruction occurs, the pipeline would otherwise have to stall for several cycles until the branch condition is obtained. The unit can thus greatly enhance the pipeline performance. As shown in Fig. 2(b), I5 is a branch instruction, and the predictor offers I49 as its next instruction. If the prediction is right, the pipeline is never stalled; otherwise, I49, I50, and I51 are cleared and never affect the final result. Generally, the predictor unit has three major functions.

1) If the current instruction address of stage PF matches a record in the prediction table, the branch address of the record is set as the next address of stage PF.

2) If a new branch instruction in stage DI does not have a record in the prediction table, a new record is generated and the oldest record is erased.

3) If the prediction is right in stage EX, nothing happens; otherwise, the pipeline is cleared to delete the wrong instructions.
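The three functions listed above can be illustrated with a small behavioral sketch. It is not the miniMIPS implementation: the class name Predictor, the table capacity, and the method names are hypothetical and chosen only to mirror the three items.

```python
from collections import OrderedDict


class Predictor:
    """Toy branch-prediction table (illustrative names and capacity)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        # Maps branch address -> predicted target, oldest record first.
        self.table = OrderedDict()

    def predict_pf(self, pc, fallthrough):
        """Function 1: in PF, use the recorded target if pc has a record."""
        return self.table.get(pc, fallthrough)

    def update_di(self, pc, target):
        """Function 2: in DI, record a new branch and erase the oldest record."""
        if pc in self.table:
            return
        if len(self.table) >= self.capacity:
            self.table.popitem(last=False)  # drop the oldest record
        self.table[pc] = target

    def check_ex(self, predicted, actual):
        """Function 3: in EX, return True when the pipeline must be cleared."""
        return predicted != actual


predictor = Predictor(capacity=2)
predictor.update_di(0x10, 0x100)
next_pc = predictor.predict_pf(0x10, fallthrough=0x14)
must_clear = predictor.check_ex(next_pc, actual=0x14)  # branch was not taken
```

The internal table is exactly the kind of inner state that makes the unit sequential: the response to an address in PF depends on which branches were seen earlier.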

Because the predictor unit is sequential, and its inner structure is very complex, it is a great challenge to test it effectively.

III. EXECUTING-TRACE-BASED CONSTRAINT EXTRACTION

The existing SBST methods extract mappings just from the instructions themselves. To the best of our knowledge, we are the first to obtain information beyond instructions during simulation for mapping extraction. As shown in Fig. 3(a), because the run-time information is so rich, the component signals (i.e., ports of components) are often included directly within it. Therefore, the powerful algorithms that predict the relationships between instructions and component signals in the existing methods are no longer required, which greatly reduces the complexity of mapping extraction.

Fig. 3. Simplifying the mapping extraction using executing trace. (a) Extending information for extracting mappings. (b) Architecture-level emulator for miniMIPS.

A. Extending Instruction Information With the Executing Trace from the Simulator

Since the instruction-level simulator is widely used in high-level design, it can be easily modified to collect the run-time information of programs, which is called the executing trace of programs. In the executing trace, the instruction in each pipeline stage can be extended to an expanded instruction (EIR) [15], which can provide direct access to the I/O ports of the inner components in a processor

$$\mathrm{EIR} = (I, D_1, D_2, R, F, A, op). \tag{1}$$

As shown in (1), an EIR contains seven elements: the instruction itself I, two operands D1 and D2, result R, flag F, address A, and instruction type op. Because every inner component often completes a micro-operation that uses some elements of EIRs and changes the others, a one-to-one mapping is thus formed between their ports and the elements of EIRs

$$ET = (\{\mathrm{EIR}(\mathrm{stage}_i)\},\ IS) \tag{2}$$

where $i = 1, \ldots,$ number of pipeline stages, and $\{\mathrm{EIR}(\mathrm{stage}_i)\}$ is the set of EIRs in all pipeline stages of one cycle.

As shown in (2), an executing trace for one cycle, denoted as ET, contains not only the EIRs in all pipeline stages but also the current processor inner state IS at this cycle, such as the records in the predictor unit. As a result, even if a component is mainly related to inner states instead of the instructions themselves, its I/O mappings can also be obtained from the executing trace. In conclusion, because plenty of processor run-time information can be observed, the mappings can be easily obtained for inner components.
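As a minimal sketch of (1) and (2), the following Python data structures hold one EIR per pipeline stage plus the inner state, and flatten a one-cycle trace into named attributes. The 32-bit field width, the attribute naming scheme (e.g., EX.I[17]), and the method name attributes() are assumptions made for illustration; the paper's simulator is written in C++ and its actual layout is not given here.

```python
from dataclasses import dataclass, field
from typing import Dict

STAGES = ("PF", "EI", "DI", "EX", "MEM")  # miniMIPS pipeline stages


@dataclass
class EIR:
    """Expanded instruction, following the seven elements of (1)."""
    I: int = 0        # instruction word
    D1: int = 0       # first operand
    D2: int = 0       # second operand
    R: int = 0        # result
    F: int = 0        # flag
    A: int = 0        # address
    op: str = "nop"   # instruction type


@dataclass
class ExecutingTrace:
    """One-cycle trace ET = ({EIR(stage_i)}, IS), following (2)."""
    eirs: Dict[str, EIR]
    inner_state: Dict[str, int] = field(default_factory=dict)

    def attributes(self, width=32):
        """Flatten the trace into named condition attributes (assumed layout)."""
        attrs = {}
        for stage in STAGES:
            eir = self.eirs[stage]
            attrs[f"{stage}.op"] = eir.op
            for name in ("I", "D1", "D2", "R", "F", "A"):
                value = getattr(eir, name)
                for bit in range(width):
                    attrs[f"{stage}.{name}[{bit}]"] = (value >> bit) & 1
        for name, value in self.inner_state.items():
            attrs[f"IS.{name}"] = value
        return attrs


trace = ExecutingTrace({stage: EIR() for stage in STAGES})
trace.eirs["EX"] = EIR(I=1 << 17, op="addi")
flat = trace.attributes()
```

Flattening every element into individually named bits is what later allows a port of the CUT to be compared directly against single bits of the trace.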

In Fig. 3(b), a simple simulator of the miniMIPS processor, written in C++, is used as a case study to show the process of executing-trace generation, which is described as follows.

1) The EIR is defined as a common class, EIR, whose elements are I, D1, D2, R, F, A, and op as in (1). The functions of the class are the operations of the EIR during each pipeline stage [i.e., EIR(pipeline_stage)]. For instance, the EIR in stage PF would generate element A with current address Ac and next address Anext, which is the function of EIR(PF).

2) As a combinational HCL, the forwarding unit without inner states is defined as a function Forward(). In contrast, the predictor unit with inner states, such as the records for branches, is defined as another class Predict, whose inner states are its elements, while the three functions mentioned in Section II are defined as its functions, Predict(PF), Predict(DI), and Predict(EX) in Fig. 3(b).

3) The processor is defined as the top class, which contains five EIRs for the instructions in its five pipeline stages and the class Predict as its elements, while it also includes the main function process() as well as the function Forward().

During the execution of process(), because the simulator works sequentially but the hardware works concurrently, the simulator has to execute in the reverse pipeline order from EIR(MEM) to EIR(PF), as shown in Fig. 3(b). In this way, an instruction in an earlier stage can use the results of the instructions ahead of it in the pipeline, which maintains data dependency. Meanwhile, the three functions of the class Predict as well as the function Forward() are invoked during the execution of process(). Later, the elements in every EIR and Predict are dumped as the executing trace. Finally, the EIRs are advanced into the next pipeline stages.

Based on the executing trace generated by the simulator, mappings for HCL can be extracted easily. Take the forwarding unit as an example. For the output port data1 of the forwarding unit, the previous methods must predict the results of instructions in pipeline stages EX and MEM through statistical regression or learning methods. Furthermore, they have to decide which value would be assigned to data1. The whole process is of great complexity. However, since the proposed simulator can execute the related operations and generate the executing trace at high level, the required information is easily obtained without the necessity to consider the complex interactions. As shown in Fig. 3(b), the simulator executes the functions of these EIRs (lw, sub, and add) in the reverse pipeline order. Later, the function Forward() is invoked, which analyzes the element I of these EIRs and decides whether a data dependency exists. If it does, the simulator assigns R of the EIR sub or lw to D1 of the EIR add; otherwise, D1 of the EIR add remains the same. Finally, a one-to-one mapping between data1 and D1 of EIR3 in stage DI can be obtained easily by using the algorithm in Section III-B. In this way, the executing trace generated by an instruction-level simulator greatly helps the identification of mappings between trace signals and ports of inner components in the PUT.

Fig. 4. (a) Decision tree with directed random checking. (b) Directional random checking for over-fitting problem.
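The forwarding decision described for the lw, sub, and add example can be sketched at the simulator level as follows. This is an illustrative stand-in, not the paper's C++ Forward(): the Instr record, its field names, and the register-file list are hypothetical, and the sketch gives priority to the producer in EX over the one in MEM on the assumption that the younger producer holds the most recent value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    """Simplified stand-in for an EIR: only what forwarding needs."""
    op: str
    dest: Optional[int]
    src1: Optional[int] = None
    src2: Optional[int] = None
    R: int = 0  # result (for lw, the loaded value)


def forward(di, ex, mem, regs):
    """Return (D1, D2) for the instruction in DI."""
    def pick(src):
        if src is None:
            return 0
        # Assumption: the producer in EX is younger than the one in MEM,
        # so its result takes priority when both write the same register.
        for producer in (ex, mem):
            if producer is not None and producer.dest == src:
                return producer.R
        return regs[src]  # no dependency: D stays as read from the bank
    return pick(di.src1), pick(di.src2)


regs = [0] * 32
regs[1], regs[2] = 111, 222                  # stale values in the register bank
sub = Instr("sub", dest=1, src1=11, src2=13, R=7)   # in stage EX
lw = Instr("lw", dest=2, src1=30, R=9)              # in stage MEM
add = Instr("add", dest=3, src1=1, src2=2)          # in stage DI
d1, d2 = forward(add, sub, lw, regs)
```

Because the simulator computes D1 of add directly, the port data1 appears verbatim in the trace, and no regression or learning model is needed to predict it.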

B. Extracting Mappings Through Decision Tree With Directed Random Checking

1) Decision Tree With Directed Random Checking: In this paper, a decision tree algorithm with directed random checking is proposed to automatically extract mappings between the executing trace and abundant component ports, as shown in Fig. 4(a). Since the decision tree (i.e., the mappings) may fit the training programs too closely, overfitting problems [16] can arise in building the decision tree, which result in wrong mappings. It is thus necessary to check the correctness of the mappings through directed random checking.

The training program is generated through constrained random simulation, which is often used in the verification of processors. In this method, some instruction sequences that can toggle the CUT are generated as the models of random programs. For each model, the register numbers, operands, or the addresses of its instructions are randomized to toggle the ports of the CUT until no extra port is toggled. The random programs used in verification are usually too long to be used in testing, but they can still be efficient for mapping extraction.



When the training program is executed, a training list is generated, including the signal list SL of the CUT ports from a signal simulator tool (such as ModelSim), and the list of executing traces TL generated by the instruction-level simulator.

Next, the signals in SL are visited one by one. Each time, an unvisited signal Si is selected as the target attribute AT, while the bits of the executing trace are the condition attributes AC. In the training list, AC and AT are combined together as an initial sample <AC, AT> according to the cycle number. Later, the algorithm Gen_Tree() is used to generate a decision tree for AT based on the current sample.

In function Gen_Tree(), if the sample is empty or all AT's are the same, then this is a leaf node; otherwise, the best attribute AB is chosen from the condition attributes AC (i.e., a bit in the executing trace) according to the lists of AT and AC in the current sample. Later, AB is used as the current root node, and the algorithm continues to build subtrees for AT. Specifically, a value VB of AB is chosen, the records with value VB are extracted as a new sample', and Gen_Tree() is invoked on the new sample' recursively to build the subtrees. The decision tree is complete once all values of AB have been dealt with. Finally, the root node is returned.
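The recursion described above can be sketched as a small ID3-style routine. The text does not state how the best attribute AB is scored, so information gain is used here purely as an assumption; the tuple-based tree encoding, the helper classify(), and the toy sample are likewise illustrative and not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a non-empty list of target values."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def gen_tree(sample, attributes):
    """sample: list of (conditions dict, target) pairs.
    Returns a leaf value, or a node (best_attribute, {value: subtree})."""
    if not sample:
        return None                        # empty sample: leaf
    targets = [t for _, t in sample]
    if len(set(targets)) == 1:
        return targets[0]                  # all AT's are the same: leaf
    if not attributes:
        return Counter(targets).most_common(1)[0][0]  # majority leaf

    def gain(attr):
        # Assumed scoring rule: information gain of splitting on attr.
        remainder = 0.0
        for value in {c[attr] for c, _ in sample}:
            subset = [t for c, t in sample if c[attr] == value]
            remainder += len(subset) / len(sample) * entropy(subset)
        return entropy(targets) - remainder

    best = max(attributes, key=gain)       # AB
    rest = [a for a in attributes if a != best]
    children = {}
    for value in {c[best] for c, _ in sample}:   # each VB of AB
        subset = [(c, t) for c, t in sample if c[best] == value]
        children[value] = gen_tree(subset, rest)
    return (best, children)


def classify(tree, conditions):
    """Follow the tree for one record; returns None on an unseen value."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children.get(conditions[attr])
    return tree


# Toy sample (hypothetical): the target follows I12 for addi, I17 for add.
sample = []
for op in ("addi", "add", "jal"):
    for i12 in (0, 1):
        for i17 in (0, 1):
            target = i12 if op == "addi" else (i17 if op == "add" else 0)
            sample.append(({"op": op, "I12": i12, "I17": i17}, target))
tree = gen_tree(sample, ["op", "I12", "I17"])
```

Each root-to-leaf path of the resulting tree is one candidate mapping rule for the port, which is what directed random checking later examines.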

For example, the port EX_adr[1] of the forwarding unit is set as AT, which is combined with T as the initial sample based on the cycle number. At first, op of EIR4 in stage EX is chosen as AB, while the sample is segmented according to VB of op. When VB is addi, I17 of EIR4 in stage EX is chosen for the new sample'. Meanwhile, sample' is segmented according to VB of I17 again. Later, two leaf nodes are obtained. The process continues, and finally the decision tree is shown in Fig. 4(b).

The overfitting problem in this process is mainly caused by the insufficiency of the training program or by noise. In this paper, directed random checking is used to check these trees automatically. Once the tree is available, the related condition attributes are set according to each path in the tree, and the other condition attributes (i.e., other bits of the executing trace) are randomized as a part of the test program. Later, the mapping of AT is checked with the test program. If the mapping is right, the algorithm goes on to extract the mapping for other ports; otherwise, the test program is inserted into the training program, and the algorithm extracts the mapping for the port again.

With directed random checking, overfitting problems can be greatly alleviated. Taking the port EX_adr[1] as an example again, according to the first path, op and I17 are set as addi and "1", respectively, while the other bits of EIR4 are randomized to construct a test program. Since I17 is just the same as I12 of EIR4 in stage EX when op is add, such "noise" results in an overfitting problem, and the algorithm generates a wrong mapping. However, after I17 is set as "1" and the other bits of EIR4 are randomly set several times, the noise is eliminated. As a result, one error instance is detected when checking the path. Finally, the test program is inserted into the training program, where EX_adr[1] is actually related to I12 of EIR4 instead of I17.
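A minimal sketch of the checking step is given below, under stated assumptions: the oracle callable stands in for simulating the generated test program and reading the real port value, and the attribute names, trial count, and seed are hypothetical. One path of the tree is pinned, all remaining condition attributes are randomized, and every disagreement is returned as a counterexample to be added to the training sample.

```python
import random


def directed_random_check(path, predicted, attributes, oracle, trials=32, seed=0):
    """Check one root-to-leaf path of a mapping tree.

    path:       attribute -> value fixed by the path
    predicted:  leaf value the tree assigns to AT on that path
    attributes: attribute -> list of legal values
    oracle:     callable(conditions) -> actual signal value; a stand-in
                for running the test program in simulation
    """
    rng = random.Random(seed)
    counterexamples = []
    for _ in range(trials):
        # Randomize every condition attribute, then pin those on the path.
        conditions = {a: rng.choice(values) for a, values in attributes.items()}
        conditions.update(path)
        actual = oracle(conditions)
        if actual != predicted:
            counterexamples.append((conditions, actual))
    return counterexamples


attributes = {"op": ["addi", "add"], "I12": [0, 1], "I17": [0, 1]}


def ex_adr1(conditions):
    """Hypothetical ground truth: the port follows I12, never I17."""
    return conditions["I12"]


# Overfitted path: the tree learned I17 because it coincided with I12.
wrong = directed_random_check({"op": "addi", "I17": 1}, 1, attributes, ex_adr1)
# Correct path: pinning I12 leaves nothing for randomization to break.
right = directed_random_check({"op": "addi", "I12": 1}, 1, attributes, ex_adr1)
```

Randomizing the unpinned bits breaks the accidental correlation between I17 and I12, so the wrong path produces counterexamples while the correct one does not.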

2) Time Complexity and Mapping Classification: The theoretical time complexity of building the decision tree [Gen_Tree() in Fig. 4(a)] for one port of the CUT mainly depends on the depth of the tree. For level s of the decision tree, its time complexity $F(s)$ is shown in (3), where $F(s-1)$ is the time complexity of building the first (s−1)-level decision tree, $\{1, \ldots, t_{s-1}\}$ stands for the node set at the (s−1)th level of the decision tree, $e_i$ ($i = 1, \ldots, t_{s-1}$) is the trace count of the sample under the ith node, and N is the bit width of an executing trace

$$F(s) = F(s-1) + \sum_{i=1}^{t_{s-1}} \left( e_i + e_i \times (N - (s-1)) + e_i^2 \right). \tag{3}$$

TABLE I
EXECUTING-TRACE GENERATION AND MAPPING EXTRACTION FOR THE FORWARDING UNIT

Item                           Amount
Training instruction count     11 485
N: trace bit width             675
E: trace count                 12 770
Signal count                   380
Trace-generating time (s)      8.391
Mapping extraction time (s)    440.219

After building the first (s−1)-level tree, the algorithm checks, for each node, whether the sample is empty or all signals in it are the same. If neither holds, the best attribute is selected by comparing the list for the target attribute AT with the lists for the other [N − (s−1)] AC's in the sample. Finally, based on the best attribute, the sample is sorted using the bubble sort algorithm to generate new samples. In the worst case, it takes $\sum_{i=1}^{t_{s-1}} \left( e_i + e_i \times (N - (s-1)) + e_i^2 \right)$ additional steps to build the s-level decision tree. As the depth s of the decision tree increases, $t_{s-1}$ increases exponentially. The depth of the decision tree is therefore the chief factor in the time complexity.

Because the executing trace is extended to cover all running information, the signals of the ports often have one-to-one maps to certain bits in the subsample after separating the initial sample based on the instruction types. The depth of the decision tree for a one-to-one map is no more than two. Therefore, the actual time complexity F is shown in (4), where E is the trace count of the initial sample, and t is the number of instruction types. The algorithm first sorts the samples according to the instruction type, and then checks whether the list for AT in the subsample for each instruction type is constant, empty, or directly equal to one of the other (N − 1) AC's

$$F = E^2 + \sum_{i=1}^{t} \left( e_i \times (1 + N - 1) \right) = E^2 + \sum_{i=1}^{t} N \times e_i. \tag{4}$$

Due to the low time complexity, the algorithm can rapidly extract the mappings for pipeline processors. As shown in Table I, the training program that toggles the forwarding unit in miniMIPS contains 11 485 instructions. Because the forwarding unit relates only to stages DI, EX, and MEM, the algorithm uses 675 bits of each executing trace in building the decision tree. The CPU times are also shown in Table I. As shown in Fig. 4(a), the total executing time contains the time for executing-trace generation and mapping extraction, while the time for mapping extraction includes the time for building the decision tree [i.e., Gen_Tree()] and directed random checking.
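To make (3) and (4) concrete, the short sketch below evaluates them numerically. The function names are hypothetical. It also assumes that every trace belongs to exactly one instruction type, so that the subsample sizes sum to E and (4) reduces to E^2 + N × E regardless of how the traces are split.

```python
def level_increment(node_sizes, trace_bits, level):
    """Summation term of (3): cost added when building level `level`."""
    return sum(
        e + e * (trace_bits - (level - 1)) + e * e
        for e in node_sizes
    )


def one_to_one_cost(subsample_sizes, trace_bits):
    """Evaluate (4): F = E^2 + sum_i N * e_i, with E = sum of the e_i."""
    total = sum(subsample_sizes)
    return total ** 2 + sum(trace_bits * e for e in subsample_sizes)


# Table I values for the forwarding unit: E = 12 770 traces, N = 675 bits.
# The split into two instruction types below is made up for illustration.
cost_single = one_to_one_cost([12770], 675)
cost_split = one_to_one_cost([10000, 2770], 675)
```

With the Table I values, both calls give 171 692 650 elementary steps in this cost model (163 072 900 from the sort term and 8 619 750 from the comparisons), which illustrates that the sorting term E^2 dominates when the tree depth is bounded by two.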



[Fig. 5 bar chart. x-axis: inner component (alu, pf, ei, di, ex, mem, renvoi, banc, sys, bus, pred); y-axis: signal number (0 to 400); series: unrelated, state-related, instruction-related, fixed-value.]

Fig. 5. Signal types for inner components in miniMIPS.

Moreover, the method is scalable to more complex processors. First, for most modern processors, the designers usually develop a cycle-accurate simulator. This simulator can be easily modified to output all executing traces. With the executing trace, the depth of the decision tree can be reduced. Second, training instructions are only used to toggle the ports of the components instead of the internal signals of the CUT, which would not result in huge random programs. Third, current parallel decision tree algorithms have the capacity to handle terabytes of data, so they are able to deal with mapping extraction on complex processors.

After analyzing the mappings, component signals (i.e., ports) can be classified into four types: fixed-value signals, instruction-related signals, state-related signals, and unrelated signals. Fixed-value signals have a fixed value based on the instruction type of the EIR. The instruction-related signals and the state-related signals relate to certain bits of an EIR and a processor inner state, respectively. Other signals are unrelated signals, such as external signals.

The fixed-value signals, also called control signals (CSs), are very important because they are used as a guide to automatically translate test patterns into instructions (this is introduced in Section IV). Fig. 5 shows the type distribution of these signals on the inner components of the miniMIPS processor, where all components have CSs except for the pf unit. For the forwarding unit, there are 17 CSs that can indicate the instruction types in stages DI, EX, and MEM. For the predictor unit, there is one CS that indicates whether the instruction in stage DI is a jump instruction.

IV. ATIG WITH THE MAPPINGS

With the accurate mappings between the executing trace and the ports of the CUT, ATIG is proposed to translate valid test patterns into test programs automatically, even for HCL. As shown in Fig. 6, the constraints caused by the instruction set architecture are first extracted based on the mappings. Under these constraints, valid test patterns can be generated through CATPG.

In the following, ATIG is implemented on three different types of components to generate test programs. For combinational functional units, CATPG is imposed on the whole units to generate test patterns [15]. According to the mappings, CSs are identified at first, and then the types of test instructions are decided depending on their values, while the remaining bits of the test patterns are mapped into the other parts of the test instructions. For combinational HCLs, because they are controlled by several instructions, their test patterns are mapped into instruction groups. For sequential HCLs, test programs are generated in an indirect way. At first, functional routines are derived from the EFSM of the units to cover the functional paths of the whole units. Meanwhile, CATPG is imposed on the combinational subcircuits of the units instead of the whole units to obtain structural test patterns. Based on the mappings between the ports of the internal subcircuits and the executing trace, structural patterns from the subcircuits are used to initialize the test instructions of the functional routines and enhance fault coverage.

Fig. 6. Framework of ATIG.

Finally, complete executable test programs are constructed by inserting controlling and observing instructions before and after the generated test instructions. Note that all test responses are stored into the memory for observation. For arithmetic/logic operation instructions, their results and flags are stored into the memory directly. For jump/branch instructions, an indirect observing method [4], [17] is used, which puts a store instruction at the target address of the jump/branch instruction to make an observing point. Once the jump/branch instruction arrives at that address successfully, that store instruction will store a value at a certain address of the memory. In this way, the test response of the jump/branch instructions can also be observed.

A. ATIG for Combinational Functional Units

There are two steps to automatically generate the test program for combinational functional units. First, instruction-level constraints are extracted and imposed on the CUT based on the mappings. One type of instruction-level constraint is the space of valid CSs, because the values of CSs have to be within



[Figure 7: an 8-bit ALU with virtual circuits (VC) on its ports, operands D1 (0x15) and D2 (0x33), result R (0x48), and OP (add) driving the CSs. Pattern: 100 00010101 00110011. Mapping (op to CSs): add 100, sub 110. Program: lw R1 R5 0x00 (0x15 in R5+0x00); lw R2 R5 0x04 (0x33 in R5+0x04); add R3 R1 R2; st R3 R5 0x08 (0x48 to R5+0x08).]

Fig. 7. ATIG on ALU.

the space that valid instruction types can set them in. For certain types of EIRs, the elements may be restricted to some numeric range, which is another type of instruction-level constraint. For instance, the operands of the signed arithmetic instruction add cannot be 0x80, so the related signals are also forbidden to take on that value. All of these constraints can be translated into virtual circuits on the ports of the CUT. In this way, the test patterns are valid and can occur during normal operation.

Second, based on the mappings, test patterns from CATPG are easily translated into a test program. For example, virtual circuits have been imposed on the 8-bit ALU component in Fig. 7, and a test pattern is then generated for the constrained ALU. The bit pattern “100”, which relates to the CSs, is used as a guide to generate the test instruction. According to the mappings, instruction add can set the value of the CSs to “100”, so add is used to apply the pattern. For the remaining signals, which relate to the two operands of the instruction, two load instructions are inserted before the test instruction to initialize the two operands, and their values are placed at the right positions as shown in Fig. 7. Finally, a store instruction is inserted after the test instruction to store the value of its destination register into the memory for observation.
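The translation of the Fig. 7 pattern into a program can be sketched as follows. The function name, the dictionary layout, and the fixed register choice (R1, R2, R3, base R5) are illustrative assumptions; only the CS codes, the pattern, and the instruction order come from Fig. 7.

```python
# CS value -> instruction type, as listed in the mapping panel of Fig. 7.
CS_TO_OP = {"100": "add", "110": "sub"}

def pattern_to_program(pattern, base_reg="R5"):
    """Translate a 'CS D1 D2' bit pattern into (program, data memory image)."""
    cs, d1, d2 = pattern.split()
    op = CS_TO_OP[cs]  # a KeyError would mean the pattern violates the CS constraint
    # Operand values are placed in data memory so that the loads can fetch them.
    data = {0x00: int(d1, 2), 0x04: int(d2, 2)}
    program = [
        f"lw R1 {base_reg} 0x00",   # controlling instruction: first operand
        f"lw R2 {base_reg} 0x04",   # controlling instruction: second operand
        f"{op} R3 R1 R2",           # test instruction selected by the CSs
        f"st R3 {base_reg} 0x08",   # observing instruction: store the result
    ]
    return program, data

program, data = pattern_to_program("100 00010101 00110011")
```

For the pattern of Fig. 7 this yields the add instruction with operands 0x15 and 0x33 in memory, matching the program shown in the figure.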

B. ATIG for Combinational HCL

With the accurate mappings, ATIG can be directly implemented on combinational HCL. Compared with ATIG on the functional components, test patterns are translated into a test instruction group instead of a single instruction. There are three steps to apply ATIG to combinational HCL. First, instruction-level constraints are extracted from the mappings to generate structural patterns. Second, structural patterns are translated into an executing trace according to the mappings; meanwhile, unassigned parts of the executing trace are initialized properly. Third, after inserting controlling and observing instructions, test programs are generated whose test responses are stored into memory for observation.

For example, the forwarding unit is shown in Fig. 8(a). Data from stages MEM, EX, and DI are assigned to the two operands of the instruction in stage DI through the unit, and the process is controlled by the register numbers adr1, adr2, and adr as well as the CSs.

First, structural patterns are generated by CATPG, where functional constraints can be extracted automatically according to the mappings. On one hand, the values of CSs depend on the tables in Fig. 8(b), which are modeled as virtual circuits in the gray blocks of Fig. 8(a). On the other hand, the value ranges of the EIR elements are also limited for some instruction types; for instance, the EIR elements D1 and D2 cannot be 0x80000000 when the instruction type is add. As a result, the ports read_data1 and read_data2 cannot be 0x80000000 for the instruction add, and the corresponding virtual circuit is shown in the black blocks of Fig. 8(a). Structural patterns are then generated by CATPG, as shown in Fig. 8(c).

Second, the CSs within these patterns are used to search for matching instruction types in the tables of Fig. 8(b). For example, if the ports MEM_level, MEM_ecr, and MEM_adr[5] are “11”, “0”, and “0”, respectively, the matched instruction type is MTHI, and it is used as the EIR in stage MEM. In addition, instruction-related signals are mapped to other elements of the EIR. Meanwhile, some register numbers and data are properly inserted into the unassigned parts of the trace. In Fig. 8(d), register 29 is set as the first source register (I25 ∼ I21) of MTHI, and its content must be 0x00000000. Because the function of MTHI is to move the content of the source register to register HI, the content of HI has been set to 0x00000000 already.
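The table search in this step can be sketched as a simple lookup. The rows below are the ones legible in the MEM-stage table of Fig. 8(b); rows elided in the figure are not filled in, and the function name is hypothetical. Note that several instruction types can share one CS value, so the lookup returns a list of candidates; how one candidate is picked is not specified here.

```python
# (MEM_level, MEM_ecr, MEM_adr[5], instruction type), from the MEM table of Fig. 8(b).
MEM_TABLE = [
    ("10", "1", "0", "ADD"),
    ("10", "1", "0", "ANDI"),
    ("11", "0", "0", "MTHI"),
    ("11", "0", "0", "MTLO"),
]

def match_types(table, level, ecr, adr5):
    """Return every instruction type whose CS values equal the pattern bits."""
    return [op for (lv, ec, a5, op) in table if (lv, ec, a5) == (level, ecr, adr5)]

candidates = match_types(MEM_TABLE, "11", "0", "0")
```

With the CS values from the example, MTHI is among the candidates, in agreement with the instruction chosen for stage MEM in Fig. 8(d).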

Third, three lw instructions are used to initialize the source registers 29, 21, and 3. Meanwhile, one sw instruction is used to store the final result held in the destination register 6. After loading the required data into the data memory, the test program is executable, as shown in Fig. 8(e). All steps are implemented automatically, and it takes only around 0.016 s to generate the test program for the forwarding unit.

Because structural patterns from CATPG cover all detectable faults under normal operations, ATIG usually has better fault coverage than RT-level SBST methods. The following example demonstrates the advantage of the proposed ATIG method. With the structural pattern in Fig. 8(c), the test program in Fig. 8(e) can detect the stuck-at-zero fault in the comparator control1, as marked in the subgraph of Fig. 8(a), without exhausting the combinations of instruction types and register numbers. Note that when the fault happens, the unit would mistakenly judge that the destination register (I20 ∼ I16) of ADDI is equal to the first source register (I25 ∼ I21) of AND, which causes a data dependency. To “solve” the data dependency, data 0x00000000 instead of data 0xffffffff is set as the first operand of AND. Finally, the result in register 6 is equal to 0x00000000, and the fault is detected. However, since ADDI and AND do not have a data dependency in the fault-free situation, verification-rule checking would not consider this case. Moreover, the number of combinations of instruction types and register numbers is huge, so it is impossible to exhaust all such combinations to check the fault-induced data dependency. Therefore, RT-level SBST methods would not cover these nonfunctional cases, causing some fault-coverage loss.

C. ATIG for Sequential HCL

Because sequential control logic in state-of-the-art processors often has a complex structure, it is even more difficult to apply SBST to sequential HCL than to combinational HCL.



[Figure 8 panels: (a) forwarding unit with two MUXes selecting data1 and data2 from stages DI, EX, and MEM under comparators control1 and control2, with a stuck-at-0 fault marked on a 5-bit comparator of EX_adr[4:0] and adr1[4:0]; (b) tables mapping CS values (level, ecr, adr[5]) to instruction types in stages DI, EX, and MEM, for example MEM: 10 1 0 ADD, 10 1 0 ANDI, 11 0 0 MTHI, 11 0 0 MTLO; (c) test pattern from constrained ATPG: MEM_level 11, MEM_adr 000000, MEM_ecr 0; EX_level 10, EX_adr 000001, EX_ecr 1; DI_level 10, DI_adr 000110, DI_ecr 1; (d) executing trace with MTHI (I25~I21 = 29), ADDI (I25~I21 = 21, I20~I16 = 2), and AND (I25~I21 = 3, I20~I16 = 3, I15~I11 = 6); (e) test program: lw $29 $30 0x0004; lw $21 $30 0x0008; lw $3 $30 0x000C; MTHI $29; ADDI $2 $21 0x0000; AND $6 $3 $3; sw $30 $6 0x8000.]

Fig. 8. ATIG for (a) forwarding unit, (b) mapping for CSs in DI, EX, and MEM, (c) test pattern from constrained ATPG, (d) executing trace for test, and (e) test program.

In this section, functional routines developed from EFSM are filled with structural patterns for subcircuits in sequential control logic to achieve high fault coverage.

Sequential ATPG for sequential control logic usually cannot achieve good coverage. For example, the predictor unit, which provides the next address for jump instructions, has only two major output ports, yet it contains ten 32-bit comparators, two 32-bit adders, and one 3-cell record table, as shown in Fig. 9(a). Due to the low observability, it is very difficult to test the unit. As shown in Table II, sequential ATPG for the predictor unit itself achieves only 61.62% fault coverage with 1045 cycles of patterns, while the CPU time exceeds 12 h. Moreover, many patterns are nonfunctional, that is, they would not occur during normal operations.

1) Functional Routines Developed from EFSM: The weak and nonfunctional test patterns from sequential ATPG greatly limit the fault coverage of SBST on sequential control logic. Fortunately, the EFSM that is used in functional test provides an alternative. The EFSM was proposed in [18] to generate functional patterns for sequential circuits, and it is defined as a 7-tuple {S, I, O, D, F, U, T}, where

S    set of symbolic states;
I    set of input symbols;
O    set of output symbols;
D    n-dimensional space D1 × · · · × Dn;
F    set of enabling functions fi such that fi : D → {0, 1};
U    set of update transformations ui such that ui : D → D;
T    transition relation such that T : S × F × I → S × U × O.

For a standard hardware description language (HDL) description, the EFSM can be generated automatically by building a statement tree [18]. After stabilizing the EFSM, functional test patterns are generated by traversing all transitions in the EFSM. However, many general HDL descriptions may not have specific states. In this case, every decision node is considered as a state, and the EFSM is built up through graph splitting [19], [20]. In these methods, the functional test has to cover all paths of the EFSM. However, many paths never contribute to the final results, so testing them contributes little to structural fault coverage.
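The 7-tuple can be made concrete with a small data structure. The sketch below is an illustration only: the class and field names are assumptions, and the two-transition example (a counter that wraps from 3 back to 1) is a hypothetical machine, loosely modeled on the `next` pointer updates of the predictor table rather than on the full EFSM of the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Data = Dict[str, int]  # one point in the data space D

@dataclass
class Transition:
    name: str
    src: str                          # source state in S
    dst: str                          # destination state in S
    inp: Optional[str]                # input symbol in I
    enable: Callable[[Data], bool]    # enabling function in F
    update: Callable[[Data], Data]    # update transformation in U
    out: Optional[str] = None         # output symbol in O

@dataclass
class EFSM:
    states: List[str]
    transitions: List[Transition]     # the transition relation T
    state: str
    data: Data = field(default_factory=dict)

    def step(self, inp: Optional[str]) -> Optional[Transition]:
        """Fire the first enabled transition for `inp`; return it (or None)."""
        for t in self.transitions:
            if t.src == self.state and t.inp == inp and t.enable(self.data):
                self.data = t.update(self.data)
                self.state = t.dst
                return t
        return None

inc = Transition("inc", "S", "S", "tick",
                 lambda d: d["next"] != 3, lambda d: {**d, "next": d["next"] + 1})
wrap = Transition("wrap", "S", "S", "tick",
                  lambda d: d["next"] == 3, lambda d: {**d, "next": 1})
machine = EFSM(["S"], [inc, wrap], "S", {"next": 1})
fired = [machine.step("tick").name for _ in range(3)]
```

Traversing all transitions, as a functional test would, corresponds here to driving `step` until every `Transition` has fired at least once.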

Since only the output signals of the EFSM (signals in set O) are observable in testing, the paths of the EFSM that do not contribute to changes of the output signals are useless in testing. Therefore, all the signals in set O are selected and called chosen variables for testing the EFSM. With respect to the chosen variables, a transition or a variable is contributive if it can affect the chosen variables; otherwise, it is noncontributive.

To compact functional patterns, the noncontributive edges (i.e., the noncontributive transitions) are sliced away [21] with respect to the chosen variables, and only the contributive edges are used in the functional test. First, any edge that outputs or affects the chosen variables directly is defined as a critical edge, and functional pattern generation starts only from these critical edges. Then, the edges that affect the variables in the functions (F) of critical edges, directly or indirectly, are called related edges. Related edges and critical edges together construct the critical paths in the EFSM for functional tests.
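One way to compute critical and related edges is a fixed-point iteration over read/write sets. This is a minimal sketch, assuming each edge is summarized by the variables its enabling function reads and the variables it writes or outputs; the edge names and the four-edge example are hypothetical, and the slicing technique of [21] may differ in detail.

```python
def contributive_edges(edges, chosen):
    """Split edges into critical and related sets.

    edges:  dict name -> (reads, writes), both sets of variable names
    chosen: set of chosen variables (the observable outputs)
    """
    # Critical edges write or output a chosen variable directly.
    critical = {e for e, (_, writes) in edges.items() if writes & chosen}
    # Variables read by the enabling functions of the critical edges.
    needed = set().union(*(edges[e][0] for e in critical))
    related = set()
    changed = True
    while changed:  # iterate until no new related edge is found
        changed = False
        for e, (reads, writes) in edges.items():
            if e not in critical and e not in related and writes & needed:
                related.add(e)
                needed |= reads  # indirect influence through this edge
                changed = True
    return critical, related

edges = {
    "a": ({"x"}, {"out"}),  # outputs the chosen variable
    "b": ({"y"}, {"x"}),    # affects the enabling function of a
    "c": (set(), {"y"}),    # affects a indirectly, through b
    "d": ({"x"}, {"z"}),    # never influences the output
}
critical, related = contributive_edges(edges, {"out"})
```

Edges that end up in neither set, such as `d` here, are the noncontributive ones that would be sliced away.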

For the predictor unit, the whole EFSM can be segmented into several independent sub-EFSMs to test the functional paths circularly. According to the functions described in Section II-B, the EFSM of the predictor unit is segmented into PF_EFSM, DI_EFSM, and EX_EFSM, as shown in Fig. 9(b), while the shared variables among these



[Figure 9 panels: (a) predictor framework with the three-cell record table pred_table (T), whose records hold the fields E, adr, L, and bra, together with comparators, two adders, and the signals PF_adr, DI_adr, EX_adr, ex_bra, confirm, PF_bra, bad_bra, and bad_pred; (b) EFSM of the predictor unit with states S1 to S9 and edges e1 to e25, segmented into DI_EFSM, PF_EFSM, and EX_EFSM; (c) state graph with states L0 and L1 and transitions T0 to T7, guarded by the signal confirm.]

Fig. 9. Functional test generation based on EFSM for the predictor unit. (a) Predictor framework. (b) EFSM for predictor unit. (c) Transmission for EX_EFSM.

sub-EFSMs compose a shared table, called pred_table (T). To guarantee that the process is valid, the shared table has to include all contributive variables shared by the three sub-EFSMs. If a sub-EFSM has output signals in the original EFSM, these signals can be the chosen variables of the sub-EFSM. If there are no output signals in the sub-EFSM, its chosen variables are the variables it shares in the shared table. In this way, functional test patterns are generated from each small sub-EFSM instead of the whole EFSM; it is thus easy to generate a compact loop that covers all critical paths in the EFSM.

Because DI_EFSM never transports signals out of the predictor, the variables in the shared table are set as its chosen variables. The edge e6, which inserts a new record into the next cell of pred_table, is a critical edge, while e4, e7, and e8 are the related edges of e6. Only 2 functional paths have to be covered in the functional routine, while there are in total 24 functional

TABLE II

SEQUENTIAL ATPG FOR PREDICTOR UNIT

| Item | Value |
|---|---|
| Total fault | 11 080 |
| Detected fault | 6324 |
| Fault coverage | 61.62% |
| Total test generation time (h:m:s) | 12:16:46 |
| Sequential pattern count | 185 |
| Total cycle count of sequential patterns | 1045 |

paths in DI_EFSM. According to the description of the edges (given in the Appendix), the record for the current address of the conditional or unconditional jump in DI (DI_adr) must first be erased from pred_table. Then, a program loop is used to insert the record of DI_adr into every cell of pred_table. Taking the conditional jump bgez as an example, the functional routine for testing DI_EFSM is given in Fig. 10(a), where Ac and Anext are the current and next address of the EIR, respectively.
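The erase-then-insert loop can be illustrated with a simplified software model of the record table. This sketch is an assumption-laden abstraction: it matches records on the address only, uses 0-based cell indices, and ignores the E and L handling of the real unit; the class and method names are hypothetical.

```python
class PredTable:
    """Simplified 3-cell record table with a round-robin insertion pointer."""

    def __init__(self, cells=3):
        self.cells = [None] * cells  # each cell: dict(adr, L, bra) or None
        self.next = 0                # 0-based here; the paper counts cells 1..3

    def lookup(self, adr):
        for i, rec in enumerate(self.cells):
            if rec is not None and rec["adr"] == adr:
                return i
        return None

    def erase(self, adr):
        i = self.lookup(adr)
        if i is not None:
            self.cells[i] = None

    def visit_di(self, adr):
        """On a miss, insert a record (like edge e6) and advance next (e7/e8)."""
        if self.lookup(adr) is not None:
            return None  # record already present, nothing inserted
        i = self.next
        self.cells[i] = {"adr": adr, "L": 0, "bra": adr + 4}
        self.next = (self.next + 1) % len(self.cells)
        return i

table = PredTable()
filled = []
for _ in range(3):           # one loop iteration per cell of the table
    table.erase(0x100)       # the record must be erased before each insertion
    filled.append(table.visit_di(0x100))
```

Because the insertion pointer advances on every miss, erasing the record before each pass makes the same jump address land in a different cell each time, so three passes exercise all three cells.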

For PF_EFSM, the next address for stage PF (PF_bra) is included in set O of the EFSM, so it is set as the chosen variable. The edge e14, which outputs the chosen variable, is a critical edge, and e10–e12 are the related edges. Only three functional paths out of the total eight functional paths need to be covered by the routine for PF_EFSM. These paths are used to locate a record for the current address of the instruction in stage PF (PF_adr). In the routine for testing DI_EFSM [Fig. 10(a)], these edges can be sensitized after the record of DI_adr for bgez is inserted into each cell of pred_table. Therefore, the related edges in PF_EFSM are also tested if bgez enters stage PF in the next cycle.

For EX_EFSM, the correct address after a bad prediction (bad_bra) is included in set O of the EFSM, so it is also set as the chosen variable. The edge e24, which outputs the chosen variable, is set as a critical edge, while e16–e18 and e20–e22 are the related edges. Only 9 functional paths out of the total 32 functional paths need to be covered by the routine for EX_EFSM. As shown in Fig. 10(a), when bgez enters stage EX, the related edges e16–e18 are also tested. The other related edges e20–e22 are mainly controlled by variable L, which denotes whether the jump was taken the last time. As shown in Fig. 9(c), the complete state graph is extracted for L, where signal confirm denotes whether the jump will be taken this time.

To test EX_EFSM completely, two routines are proposed to traverse all the transitions in Fig. 9(c). The first routine uses the conditional jump bgez as the test instruction, as shown in Fig. 10(a). With the cooperation of instructions sltu and bgtz to provide suitable confirm signals, the routine traverses the transitions from T0 to T5. The second routine deals with unconditional jumps and is not shown for simplicity.

Moreover, the routines can propagate the results of the inner circuits to the address bus through all possible data paths in the unit. On one hand, the results of the two adders and the content (adr, bra) in pred_table are propagated to the address bus directly. For example, the result of add2 is stored into bra and transmitted out through port bad_bra when traversing T4. Later, the result stored in bra is transmitted out through port



[Figure 10 panels: (a) executing trace of the functional routine for EX_EFSM, built from lw, jr, bne, jalr, bgez, sltu, and bgtz with the placeholders PA, PB, L1, and L2 in the address fields Ac and Anext; (c) and (d) the same trace specialized for pattern 1 and pattern 2 from constrained ATPG; (b) mapping for the inner circuits, reproduced below (the Out column is "--" for every unit).]

| Unit | A | B |
|---|---|---|
| cmp1 | PF.adr | T(3).adr |
| cmp2 | PF.adr | T(2).adr |
| cmp3 | EX.adr | T(3).adr |
| cmp4 | EX.adr | T(1).adr |
| cmp5 | EX.adr | T(2).adr |
| cmp6 | DI.adr | T(3).adr |
| cmp7 | DI.adr | T(1).adr |
| cmp8 | DI.adr | T(2).adr |
| cmp9 | PF.adr | T(1).adr |
| add0 | DI.adr | 0x04 |
| add1 | T_EX | 0x04 |

Fig. 10. Mapping structural patterns into functional routines. (a) Trace for EX_EFSM. (b) Mapping for inner circuits. (c) Trace of pattern 1. (d) Trace of pattern 2.

PF_bra when traversing T5. Since all data paths that propagate the results of add2 are sensitized by the routine, the observability of the subcircuit add2 is enhanced. On the other hand, during the execution of the routines, the results of the comparators can also be observed indirectly through the address bus if they activate wrong data paths and affect the target address of the jump/branch instruction. The faults on the address bus are also observed through the indirect method [4], [17], just as for the branch/jump instruction. Note that the faults in the predictor of miniMIPS can affect the program execution, because the executing unit of miniMIPS just checks the records in the predictor instead of the real address of the next instruction. For example, if there is a fault on signal PF_bra that changes the next address, the predictor cannot detect it just by checking its table and signals at stage EX. So the next address would be changed and the program would go to a wrong address for execution. In more complicated predictors, the observing method that applies performance monitoring hardware [22] can be used to observe the test response for the predictor unit. Therefore, all subcircuits in the predictor unit can be sensitized and observed in the proposed routines.

2) Structural Patterns Mapping Into Functional Test Routine: To guarantee good structural fault coverage, functional test routine generation based on EFSM is not enough. In this section, structural patterns for subcircuits in the predictor unit, which are combinational circuits, are mapped into the routines to achieve high fault coverage.

Just as with the gate-level SBST on the inner components, structural test for the whole predictor unit is still feasible if structural patterns for its subcircuits are mapped into these routines. First of all, the executing trace is used to obtain the mappings for the subcircuits within the unit, while the record that is equal to EX_adr, called T_EX, is inserted into the executing trace. The mappings between the signals of the executing trace and the two input ports (A, B) of the subcircuits are shown in Fig. 10(b). According to the mappings, these ports can be controlled by initializing the related elements of the executing trace. Meanwhile, structural patterns, denoted as (PA, PB) on ports (A, B), are obtained by CATPG on these subcircuits. They fall into two types: PA and PB of pattern type 1 are the same, and those of pattern type 2 are different. As shown in Fig. 10(c) and (d), these patterns are mapped into the earlier functional routines developed from the EFSM. Note that most of the subcircuit ports are mapped to the address signals in pipeline stages. Finally, the functional routines are combined with structural patterns automatically, and it takes only 0.032 s to generate an executable test program for the predictor unit. Because the functional routines can propagate the results of the inner circuits to the address bus through all possible data paths in the unit, the responses of these structural patterns can also be functionally observed.
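The step from a structural pattern (PA, PB) to trace assignments can be sketched with the port mapping of Fig. 10(b). Only three comparator rows are copied here, and the function names and the example address value are hypothetical; how the assigned signals are then realized by instructions is left to the functional routine.

```python
# Subcircuit -> (trace signal on port A, trace signal on port B), from Fig. 10(b).
PORT_MAP = {
    "cmp4": ("EX.adr", "T(1).adr"),
    "cmp7": ("DI.adr", "T(1).adr"),
    "cmp9": ("PF.adr", "T(1).adr"),
}

def pattern_type(pa, pb):
    """Type 1 patterns drive both ports with the same value, type 2 do not."""
    return 1 if pa == pb else 2

def to_trace(unit, pa, pb):
    """Return the trace elements that must be initialized to apply (PA, PB)."""
    sig_a, sig_b = PORT_MAP[unit]
    return {sig_a: pa, sig_b: pb}

assignment = to_trace("cmp4", 0x00000400, 0x00000400)
kind = pattern_type(0x00000400, 0x00000400)
```

A type 1 pattern on a comparator therefore asks for a table record whose address equals the stage address, while a type 2 pattern asks for a mismatch.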

V. EXPERIMENTAL RESULTS

The experiments are conducted on three processors: Parwan, miniMIPS, and OpenRISC [23]. In this paper, the experimental results for the latter two complex processors are shown in detail, while one can refer to [15] for the detailed data of Parwan. These processors are synthesized with a 180-nm technology library, and the gate counts of the processors are calculated by dividing the total area by the area of a two-input NAND gate. For the commercial fault-simulation tool, which deletes redundant faults automatically, the fault coverage is the number of detected faults divided by the number of detectable faults. For processor miniMIPS [24], the old version contains a predictor unit, but the previous works used only the new version without the predictor unit. As a result, the fault coverage and test cost for the predictor unit are listed separately.

In the following, the proposed method (denoted as ATIG) is applied to the major components of these three processors, where the data input configuration technique [6] is also applied in our method for the register bank. In addition, some auxiliary interruption routines are inserted into the final test program of ATIG for testing the exception units. To analyze fault coverage and test cost, the systematic SBST method [7], the hybrid SBST method [9], [11], which uses random programs to enhance fault coverage, and the full-scan ATPG method are used for comparison.

A. Fault Coverage Achieved by Different Methods

Because structural patterns from CATPG are used in ATIG, it can achieve the highest fault coverage among the existing SBST methods. For processor miniMIPS, the fault coverage for its components achieved by different methods is shown in Table III. On the component ALU, the fault coverage of ATIG is 98.67%, where there are 300 undetectable faults in functional mode. Because the executing trace used in ATIG can obtain mappings effectively, the fault coverage on the ALU is close to that of the full-scan method. On the forwarding unit, the fault coverage of ATIG is 99%. In comparison with the systematic SBST method that uses verification rules, ATIG improves the fault coverage of the forwarding unit by 5%, which contributes a 0.2% total fault coverage enhancement for the whole processor of the new version. As mentioned before, data mining on the executing trace can obtain the mappings for components sensitized by several instructions,



TABLE III

FAULT COVERAGE ON MINIMIPS ACHIEVED BY DIFFERENT METHODS

| Version | Component | Systematic SBST [7] | Hybrid SBST [9] | Full scan | ATIG |
|---|---|---|---|---|---|
| New version | ALU | 97.85 | – | 99.94 | 98.67 |
| New version | Forwarding unit | 93.64 | – | 100 | 99.00 |
| New version | PPS_PF | 86.32 | – | 99.69 | 98.32 |
| New version | PPS_EI | 90.86 | – | 99.75 | 99.71 |
| New version | PPS_DI | 90.24 | – | 99.91 | 95.28* |
| New version | PPS_EX | 84.12 | – | 99.93 | 97.62 |
| New version | PPS_MEM | 81.87 | –+ | 99.81 | 83.41 |
| New version | Register banc | 99.98 | – | 95.84# | 99.90* |
| New version | Syscop | 87.90 | – | 97.74 | 93.60* |
| New version | Bus control | 93.95 | – | 97.01 | 92.62 |
| New version | Total | 95.08 | 98.13 | 98.42 | 97.52 |
| Old version | Predictor unit | – | – | 98.31 | 96.01 |
| Old version | Total | – | – | 98.41 | 97.31 |

Note:
* Data input configuration [6] or auxiliary interruption routines are added to enhance the fault coverage of these units.
+ PPS_MEM is excluded in the experiment of the hybrid method.
# There are some hard-to-detect faults in the register banc, and some ad hoc DFTs are required to further enhance the fault coverage of the scan test.

which helps the structural test generation to cover the faults beyond verification rule checking. Compared to the systematic SBST method, ATIG and the hybrid SBST method can achieve better fault coverage on the new version (without the predictor unit), very close to that of the full-scan method. Note that the unit PPS_MEM is excluded by the hybrid SBST method. If this unit is also excluded when applying ATIG, the fault coverage improves to 98%, which is very similar to that of the hybrid SBST method.

In addition, ATIG achieves 97.31% total fault coverage on the old version that includes the predictor unit, which is only about 1% less than the full-scan method using combinational ATPG. Specifically, ATIG achieves a fault coverage of 96% on the predictor unit, which contributes nearly 5% total fault coverage enhancement for the old version. The main reason is that the proposed method expands the sequential depth for pattern generation with effective routines based on EFSM, whereas sequential ATPG spends great effort on searching within limited time frames. As will be shown later in Table V, the test program for the predictor unit takes 13 000 clock cycles. On one hand, it is no longer necessary to sensitize a fault through many time frames before the current time frame. The structural patterns of subcircuits are mapped into the executing trace, and the instructions can impose the patterns on the component ports of the PUT. On the other hand, test responses can be directly observed through the functional routines, while sequential ATPG has to search several time frames after the current time frame. It is thus desirable to fill the functional routines developed from EFSM with structural patterns of

TABLE IV

FAULT COVERAGE ON OPENRISC ACHIEVED BY DIFFERENT METHODS

| Component | Systematic SBST [7] | Hybrid SBST [11] | Full scan | ATIG |
|---|---|---|---|---|
| ALU | 87.23 | 91.90 | 99.96 | 99.62 |
| MAC | 97.29 | 99.00 | 99.07 | 98.04 |
| GPR | 98.70 | 99.90 | 96.59# | 99.80* |
| System unit | 62.41 | 72.6 | 99.67 | 90.48 |
| Load/store unit | 90.76 | 94.2 | 100 | 90.44 |
| Exception | 61.11 | 55.4 | 97.21 | 81.22* |
| PC generation | – | 79.1 | 98.94 | 93.58 |
| EI | – | 23.5 | 97.50 | 85.34 |
| Control unit | 87.20 | 93.2 | 96.89 | 91.93 |
| Writeback multiplexer | – | 82.1 | 98.14 | 93.60 |
| Operand multiplexer | – | 99.0 | 97.57 | 98.50 |
| Total | 90.03 | 92.30 | 98.08 | 95.55 |

Note:
* Data input configuration [6] or auxiliary interruption routines are added to enhance the fault coverage of these units.
# There are some hard-to-detect faults in the GPR, and some ad hoc DFTs are required to further enhance the fault coverage of the scan test.

subcircuits for SBST on sequential components.

The fault coverage for processor OpenRISC, achieved by different methods, is shown in Table IV. Because test patterns are directly generated from the gate-level netlist, ATIG can still test a component effectively even if it has irregular structures that RT-level methods cannot cover. For example, the ALU in OpenRISC integrates the customized instructions cust5 and SFxx, whose structures are not regular, so the systematic and hybrid SBST methods achieve a fault coverage of only 87.23% and 91.90%, respectively. By mapping structural patterns into test programs, ATIG achieves 99.62% fault coverage, which is nearly equal to that of the full-scan method. In OpenRISC, the system unit manages the reading and writing of the special-purpose registers, whose values are also inner states. It can also be treated as combinational control logic. After extracting the mappings between inner states and processor input ports, structural patterns are mapped into both instructions and the processor input signals. In this way, ATIG achieves 90.48% fault coverage on this unit, which contributes a 1.8% total fault coverage enhancement of the whole PUT in comparison with the systematic SBST method. In conclusion, ATIG, which translates structural patterns into test programs, can test pipeline processors effectively. In particular, the proposed method can achieve good fault coverage for the HCLs, which make a visible contribution to the total fault coverage of the PUTs.

B. Test Overhead of Different Methods

With accurate mappings, efficient structural patterns can be translated into SBST programs directly, so ATIG can significantly reduce test volume and test time. For processor miniMIPS, Table V compares the overhead of different methods, where the test programs for the new version



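TABLE V

OVERHEAD OF DIFFERENT METHODS FOR MINIMIPS

| Version | Method | Test volume (word) | Test time (clock) |
|---|---|---|---|
| New version | Systematic SBST [7] | 1565 | 7162 |
| New version | Hybrid SBST [9] | 1.1 × 10^6 [9] | – |
| New version | ATIG | 1815 | 10 200 |
| Old version | Full-scan | 19 290 | 587 516 |
| Old version | ATIG, predictor unit | 1435 | 13 000 |
| Old version | ATIG, total | 3195 | 23 200 |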

are considered for comparison. Because the test programs of ATIG need operands from memory, the test volume is 1815 words (1175 instructions with 640 words of data), which is around 16% [(1815 − 1565) ÷ 1565] more than that of the systematic SBST, while the test time is around 42% longer. To achieve similar fault coverage, ATIG requires only about 1000 instructions, but the hybrid SBST uses 1 million instructions. ATIG is thus much more efficient than the hybrid SBST with random programs. In brief, ATIG increases the test volume slightly compared with the systematic SBST, yet achieves the fault coverage of the hybrid SBST. The application of the full-scan method requires 587 516 test cycles and 19 290 words of memory on ATE, as well as extra area overhead for DFT. In addition, the use of high-speed ATE to apply full-scan test patterns obviously brings much higher test cost than SBST methods.
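The two overhead percentages quoted above follow directly from the Table V entries for the new version. The short check below recomputes them; the helper name is hypothetical.

```python
def relative_increase(new, base):
    """Fractional increase of `new` over `base`."""
    return (new - base) / base

# Table V, new version: ATIG versus systematic SBST [7].
volume_increase = relative_increase(1815, 1565)   # test volume in words
time_increase = relative_increase(10200, 7162)    # test time in clock cycles

volume_pct = round(volume_increase * 100)
time_pct = round(time_increase * 100)
```

Rounded to whole percentages, the results are the 16% and 42% stated in the text.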

As shown in Table V, the test programs of ATIG for the old version have a test volume of 3195 words (2465 instructions with 730 words of data), and it takes 23 200 cycles to apply the test programs. For the predictor unit, because many jump instructions are involved in the program, the test time is somewhat long. However, since the test program is executed at speed, and modern processors run at clock rates of thousands of megahertz, the program can finish within 1 ms.
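The run-time claim can be checked by dividing the cycle count by a clock frequency. The two frequencies below are assumptions chosen for illustration (the text only says thousands of megahertz); the slower one is included to show the margin.

```python
def run_time_seconds(cycles, clock_hz):
    """At-speed execution time of a test program."""
    return cycles / clock_hz

CYCLES_OLD_VERSION = 23_200  # Table V, ATIG total for the old version

t_1ghz = run_time_seconds(CYCLES_OLD_VERSION, 1e9)      # assumed 1 GHz clock
t_100mhz = run_time_seconds(CYCLES_OLD_VERSION, 100e6)  # assumed 100 MHz clock
```

Under these assumptions the program takes about 23.2 µs at 1 GHz and 232 µs at 100 MHz, both below the 1 ms bound.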

As shown in Table VI, ATIG also reduces test cost greatly for processor OpenRISC. The test volume of ATIG is only 3013 words (1973 words of instructions and 1040 words of data), which is even less than that of the systematic SBST method. Because OpenRISC adopts a delay slot to avoid wrong jumps, the ATIG program runs much faster, and its test time is only 6405 cycles. As mentioned above, the application of the full-scan method also brings great time cost for OpenRISC. Specifically, it requires 804 099 cycles of high-speed ATE with 25 056 words of test volume. In conclusion, with accurate mappings, ATIG can generate the most efficient SBST programs for pipeline processors.

C. Overall Performance

Implementing constrained structural ATPG on the CUT for both functional units and HCL in ATIG is an effective way to obtain compact test programs with good fault coverage. On the one hand, the structural test patterns can guarantee

TABLE VI

OVERHEAD OF DIFFERENT METHODS FOR OPENRISC

| Method | Test volume (word) | Test time (clock) |
|---|---|---|
| Systematic SBST [7] | 3116 | 56 716 |
| Hybrid SBST [11] | 74 262 | 646 185 |
| Full scan | 25 056 | 804 099 |
| ATIG | 3013 | 6405 |

TABLE VII

OVERALL PERFORMANCE OF PARWAN, MINIMIPS, AND OPENRISC

| Processor | Method | Gate num | Fault coverage | Test volume (word) | Test time (cycle) |
|---|---|---|---|---|---|
| Parwan | ATIG | 1272 | 94.8% | 517/4 | 9413 |
| Parwan | RSBST [2] | 1272 | 92.1% | 1214/4 | 121 105 |
| miniMIPS | ATIG | 32 765# | 97.52% (98%+) | 1815 | 10 200 |
| miniMIPS | Hybrid SBST [9] | 32 765# | 98.1%+ | 1.1 × 10^6 | – |
| OpenRISC | ATIG | 35 157 | 95.5% | 3013 | 6405 |
| OpenRISC | Hybrid SBST [11] | 35 157 | 92.3% | 74 262 | 646 185 |

Note:
+ PPS_MEM is excluded in the fault-coverage calculation.
# The old miniMIPS contains 32 765 gates, but the new one with the predictor unit contains 36 769 gates.

| E | I | F | U | O |
|---|---|---|---|---|
| e1 | DI_adr | 1L & T(1).adr==DI_adr | I1=1 | -- |
| e2 | DI_adr | 2L & T(2).adr==DI_adr | I1=2 | -- |
| e3 | DI_adr | 3L & T(3).adr==DI_adr | I1=3 | -- |
| e4 | DI_adr | else | -- | -- |
| e5 | -- | I1!=0 | -- | -- |
| e6 | -- | I1==0 | T(next)={1, DI_adr, 0, DI_adr+4}, add=1 | -- |
| e7 | -- | add==1 & next!=3 | next=next+1 | -- |
| e8 | -- | add==1 & next==3 | next=1 | -- |
| e9 | -- | add!=1 | -- | -- |
| e10 | PF_adr | 1L & T(1).adr==PF_adr | I2=1 | -- |
| e11 | PF_adr | 2L & T(2).adr==PF_adr | I2=2 | -- |
| e12 | PF_adr | 3L & T(3).adr==PF_adr | I2=3 | -- |
| e13 | -- | else | -- | -- |
| e14 | -- | I2!=0 | -- | T(I2).bra |
| e15 | -- | I2==0 | -- | -- |
| e16 | EX_adr | 1L & T(1).adr==EX_adr | I3=1 | -- |
| e17 | EX_adr | 2L & T(2).adr==EX_adr | I3=2 | -- |
| e18 | EX_adr | 3L & T(3).adr==EX_adr | I3=3 | -- |
| e19 | -- | else | -- | -- |
| e20 | -- | C1 && confirm==1 & T(I3).L!=confirm | T(I3).{L,bra}={1, ex_bra}, bad=1 | -- |
| e21 | -- | C1 && T(I3).L==1 & T(I3).L!=confirm | T(I3).{L,bra}={0, T(I3).bra+4}, bad=1 | -- |
| e22 | -- | C1 & ex_bra!=T(I3).bra & T(I3).L==confirm==1 | T(I3).bra=ex_bra, bad=1 | -- |
| e23 | -- | else | -- | -- |
| e24 | -- | bad==1 | -- | out{bad_adr, bad_pred} |
| e25 | -- | bad!=1 | -- | -- |

C1: I2==0 && EX_uncleared==1
1L: T(1).L==1; 2L: T(2).L==1; 3L: T(3).L==1

Fig. 11. Edge description for the EFSM of the predictor unit.

good fault coverage on the CUT with compact vectors. On the other hand, large random test programs are no longer required to enhance fault coverage for HCL, as they are in the existing methods. Thus the size of the test program is greatly reduced.



The summary comparisons for the three processors are given in Table VII, where the gate counts of the three processors are 1272, 32 765, and 35 157, respectively. ATIG with structural patterns achieves equal or even better performance than the hybrid SBST with 70 K ∼ 1 M instructions, yet ATIG requires only 1 K ∼ 2 K instructions. For Parwan, ATIG achieves 94.8% fault coverage with a 517-B test volume and a test time of 9413 cycles. Compared to RSBST [2], it improves fault coverage by 2.7 percentage points (94.8% versus 92.1%), cuts test volume by 57%, and needs one-thirteenth of the test time of RSBST [15]. For miniMIPS, ATIG with 1 K instructions achieves a fault coverage of 97.52%, which is comparable to that of the hybrid SBST with 1 million instructions. For OpenRISC, ATIG achieves 95.5% fault coverage with a test volume of 3013 words, so it performs much better than the hybrid method, which needs about 24× the test volume. In conclusion, ATIG can test processors effectively and efficiently.
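The ratios quoted in this comparison can be rechecked from the Table VII figures. The short Python sketch below is only an illustration of that arithmetic: the dictionary layout and the helper name `compare` are our own assumptions, and only the numbers are taken from Table VII.

```python
# Figures copied from Table VII; the data layout and helper name are
# illustrative assumptions, not part of the ATIG flow itself.
TABLE_VII = {
    "Parwan": {
        "ATIG": {"coverage": 94.8, "volume_words": 517 / 4, "cycles": 9413},
        "RSBST": {"coverage": 92.1, "volume_words": 1214 / 4, "cycles": 121105},
    },
    "OpenRISC": {
        "ATIG": {"coverage": 95.5, "volume_words": 3013, "cycles": 6405},
        "Hybrid SBST": {"coverage": 92.3, "volume_words": 74262, "cycles": 646185},
    },
}


def compare(processor, baseline):
    """Compare ATIG against a baseline method for one processor."""
    atig = TABLE_VII[processor]["ATIG"]
    base = TABLE_VII[processor][baseline]
    return {
        # Difference in fault coverage, in percentage points.
        "coverage_gain_points": round(atig["coverage"] - base["coverage"], 1),
        # Share of the baseline test volume that ATIG saves, in percent.
        "volume_reduction_pct": round(
            100 * (1 - atig["volume_words"] / base["volume_words"]), 1
        ),
        # How many times larger the baseline test volume is.
        "volume_ratio": round(base["volume_words"] / atig["volume_words"], 1),
        # How many times longer the baseline test time is.
        "time_ratio": round(base["cycles"] / atig["cycles"], 1),
    }


if __name__ == "__main__":
    print(compare("Parwan", "RSBST"))
    print(compare("OpenRISC", "Hybrid SBST"))
```

For Parwan this gives a 2.7-point coverage gain, a 57.4% volume reduction, and a test-time ratio of 12.9, in line with the 57% and one-thirteenth figures in the text. For OpenRISC the volume ratio is 24.6, in line with the 24× figure.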

VI. CONCLUSION

This paper greatly simplified the identification of mappings between trace signals and ports of inner components with an executing-trace-based constraint extraction method for state-of-the-art processors. Based on the mappings, ATIG can be applied to functional components or to combinational control logic to generate test programs with structural patterns. For sequential control logic, the proposed method mapped structural patterns for subcircuits in the units into the functional routines developed from the EFSM. It greatly expanded the sequential depth for pattern generation, and hence provided an effective way to detect faults that need many time frames to activate or propagate. Experimental results showed that the proposed method can achieve fault coverage comparable to or even better than that of the hybrid SBST methods with 70 K ∼ 1 M instructions, yet the proposed method required only 1 K ∼ 2 K instructions.

APPENDIX

Fig. 11 shows the edge description for the EFSM of the predictor unit.
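As a reading aid, the edge description can be treated as a list of guarded transitions. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it reads column F as an enabling predicate, U as an update, and O as an output, and it encodes only edges e7 to e9, which advance the `next` pointer. The names `Edge`, `POINTER_EDGES`, and `fire` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, int]


@dataclass
class Edge:
    name: str
    guard: Callable[[State], bool]   # column F (assumed: enabling predicate)
    update: Callable[[State], None]  # column U (assumed: variable update)
    output: Optional[str] = None     # column O (assumed: output), unused here


# Edges e7-e9 of Fig. 11: the pointer `next` cycles through 1, 2, 3 when add == 1.
POINTER_EDGES: List[Edge] = [
    Edge(
        "e7",
        lambda s: s["add"] == 1 and s["next"] != 3,
        lambda s: s.update(next=s["next"] + 1),
    ),
    Edge(
        "e8",
        lambda s: s["add"] == 1 and s["next"] == 3,
        lambda s: s.update(next=1),
    ),
    Edge("e9", lambda s: s["add"] != 1, lambda s: None),
]


def fire(edges: List[Edge], state: State) -> str:
    """Fire the first edge whose guard holds and return its name."""
    for edge in edges:
        if edge.guard(state):
            edge.update(state)
            return edge.name
    raise ValueError("no enabled edge for this state")


if __name__ == "__main__":
    state = {"add": 1, "next": 1}
    for _ in range(3):
        print(fire(POINTER_EDGES, state), state["next"])
```

Fig. 11 does not show where `add` is cleared, so the sketch leaves it unchanged between steps. The other edge groups could be encoded in the same way once the field layout of the table entries T(k) is fixed.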

REFERENCES

[1] K. Constantinides and T. Austin, “Using introspective software-based testing for post-silicon debug and repair,” in Proc. 47th ACM/IEEE DAC Conf., Anaheim, CA, Jun. 2010, pp. 537–542.

[2] L. Chen and S. Dey, “Software-based self-testing methodology for processor cores,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 20, no. 3, pp. 369–380, Mar. 2001.

[3] R. S. Tupuri and J. A. Abraham, “A novel functional test generation method for processors using commercial ATPG,” in Proc. Int. Test Conf., Nov. 1997, pp. 743–752.

[4] L. Chen, S. Ravi, A. Raghunathan, and S. Dey, “A scalable software-based self-test methodology for programmable processors,” in Proc. Des. Autom. Conf., San Diego, CA, Jun. 2003, pp. 548–553.

[5] H.-P. C. Wen, C.-L. Wang, and K.-T. Cheng, “Simulation-based functional test generation for embedded processors,” IEEE Trans. Comput., vol. 55, no. 11, pp. 1335–1343, Nov. 2006.

[6] N. Kranitis, A. Paschalis, D. Gizopoulos, and G. Xenoulis, “Software-based self-testing of embedded processors,” IEEE Trans. Comput., vol. 54, no. 4, pp. 461–475, Apr. 2005.

[7] D. Gizopoulos, M. Psarakis, M. Hatzimihail, M. Maniatakos, A. Paschalis, A. Raghunathan, and S. Ravi, “Systematic software-based self-test for pipelined processors,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 11, pp. 1441–1453, Nov. 2008.

[8] C. H.-P. Wen, L.-C. Wang, K. T. Cheng, W. T. Liu, and J. J. Chen, “Simulation-based target test generation techniques for improving the robustness of a software-based self-testing methodology,” in Proc. IEEE Int. Conf., Austin, TX, Nov. 2005, pp. 936–945.

[9] T.-H. Lu, C.-H. Chen, and K.-J. Lee, “Effective hybrid test program development for software-based self-testing of pipeline processor cores,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 3, pp. 516–520, Mar. 2011.

[10] C.-H. Chen, C.-K. Wei, T.-H. Lu, and H.-W. Gao, “Software-based self-testing with multiple-level abstractions for soft processor cores,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 5, pp. 505–516, May 2007.

[11] N. Kranitis, A. Merentitis, and D. Gizopoulos, “Hybrid-SBST methodology for efficient testing of processor cores,” IEEE Des. Test. Comput., vol. 25, no. 1, pp. 64–75, Jan.–Feb. 2008.

[12] S. Gurumurthy, S. Vasudevan, and J. A. Abraham, “Automated mapping of pre-computed module-level test sequences to processor instructions,” in Proc. ITC Test Conf., Austin, TX, Nov. 2005, pp. 294–303.

[13] M. Psarakis, D. Gizopoulos, E. Sanchez, and M. S. Reorda, “Microprocessor software-based self-testing,” IEEE Des. Test. Comput., vol. 27, no. 3, pp. 4–18, Jun. 2010.

[14] A. Biere, A. Cimatti, E. M. Clarke, O. Strichman, and Y. Zhu, “Bounded model checking,” Adv. Comput., vol. 58, no. 2, pp. 117–148, 2003.

[15] Y. Zhang, H. Li, and X. Li, “Software-based self-testing of processors using expanded instructions,” in Proc. 19th Asian Test Symp., Shanghai, China, Jan. 2011, pp. 415–420.

[16] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Upper Saddle River, NJ: Pearson Education, 2006.

[17] ECE260C: Advanced Topics in VLSI Design. (2001) [Online]. Available: http://mesdat.ucsd.edu/∼lichen/260c/parwan/

[18] K. T. Cheng and A. S. Krishnakumar, “Automatic functional test generation using the extended finite state machine model,” in Proc. 30th Int. Des. Autom. Conf., 1993, pp. 86–91.

[19] M. U. Uyar and A. Y. Duale, “Modeling VHDL specifications as consistent EFSM,” in Proc. MILCOM Conf., vol. 2. Monterey, CA, Nov. 1997, pp. 740–744.

[20] A. Y. Duale and M. U. Uyar, “A method enabling feasible conformance test sequence generation for EFSM models,” IEEE Trans. Comput., vol. 53, no. 5, pp. 614–627, May 2004.

[21] M. K. Ganai and A. Gupta, “Accelerating high-level bounded model checking,” in Proc. IEEE Comput. Aided Des. Int. Conf., San Jose, CA, Nov. 2006, pp. 794–801.

[22] M. Hatzimihail, M. Psarakis, D. Gizopoulos, and A. Paschalis, “A methodology for detecting performance faults in microprocessors via performance monitoring hardware,” in Proc. IEEE Int. Test Conf., Santa Clara, CA, Oct. 2007, pp. 1–10.

[23] OpenCores. (2012) [Online]. Available: http://opencores.org/project,or1200_hp

[24] miniMIPS Overview. (2012) [Online]. Available: http://opencores.org/project,minimips

Ying Zhang (M’11) received the B.S. degree in computer science from Harbin Engineering University, Harbin, China, in 2006, and the Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 2011.

He was a Post-Doctoral Researcher with the Embedded Systems Laboratory, Linköping University, Linköping, Sweden. His current research interests include signal integrity, reliable designs of network-on-chip, and software-based self-testing.



Huawei Li (M’00–SM’09) received the B.S. degree in computer science from Xiangtan University, Xiangtan, China, in 1996, and the M.S. and Ph.D. degrees from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 1999 and 2001, respectively.

She is currently a Professor with the Institute of Computing Technology, Chinese Academy of Sciences. She was a Visiting Professor with the ECE Department, University of California Santa Barbara, Santa Barbara, from 2008 to 2009. She has co-authored more than 100 papers in academic journals and international conferences. Her current research interests include VLSI/SoC design verification and test generation, delay testing, and dependable computing.

Dr. Li served as Program Chair of the IEEE Workshop on RTL and High Level Testing in 2003, and of the IEEE Asian Test Symposium (ATS) in 2007. She has served as the Secretary General of the China Computer Federation Technical Committee on Fault Tolerant Computing since 2008. In addition, she serves on the Technical Program Committee of several IEEE conferences, including ITC, ASP-DAC, DFT, ATS, and VLSI-DAT.

Xiaowei Li (SM’04) received the B.Eng. and M.Eng. degrees in computer science from the Hefei University of Technology, Hefei, China, in 1985 and 1988, respectively, and the Ph.D. degree in computer science from the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), Beijing, China, in 1991.

He was an Assistant Professor from 1991 to 2000 and has been an Associate Professor since 1993 with the Department of Computer Science, Peking University, Beijing, China. He joined the ICT, CAS, as a Professor in 2000. He is currently the Deputy (executive) Director of the State Key Laboratory of Computer Architecture (ICT, CAS). He has co-authored more than 200 papers in academic journals and international conferences, and holds 34 patents and 35 software copyrights. His current research interests include VLSI testing, design verification, and dependable computing.

Dr. Li has served as an IEEE Asian Pacific Regional Test Technology Technical Council Vice Chair since 2004. He has served as the Chair of China Computer Federation Technical Committee on Fault Tolerant Computing since 2008. He has served as the Steering Committee Chair of IEEE Asian Test Symposium (ATS) since 2011. In addition, he serves on the Technical Program Committee of several IEEE and ACM conferences, including VTS, DATE, ASP-DAC, and PRDC. He also serves as an Editorial Board Member of JCST, JETTA, and JOLPE.
