
[IEEE Test Symposium (EWDTS) - Sevastopol, Ukraine (2011.09.9-2011.09.12)] 2011 9th East-West Design & Test Symposium (EWDTS) - Optimization of microprogram control unit with code


Optimization of Microprogram Control Unit with Code Sharing

A. Barkalov, L. Titarenko, L. Smolinski
Institute of Computer Engineering and Electronics

Podgórna 50, Zielona Góra, Poland E-mail {A.Barkalov, L.Titarenko}@iie.uz.zgora.pl,[email protected]

Abstract

A hardware reduction method is proposed for compositional microprogram control units with code sharing implemented on PAL-based CPLD chips. The method exploits the wide fan-in of PAL macrocells, which allows more than one source to be used for the codes of operational linear chains. An example of applying the proposed method is given.

1. Introduction

Control units are essential parts of digital systems [1]. Complex programmable logic devices (CPLD) are now widely used for implementing the logic circuits of control units [2, 3]. They are built from programmable array logic (PAL) macrocells, which have a wide fan-in but a limited number of terms per macrocell. To reduce the amount of hardware in the logic circuit of a control unit, both the peculiarities of the CPLD and the features of the control algorithm to be implemented should be taken into account. If a control algorithm is represented by a linear graph-scheme of algorithm (GSA), then the model of a compositional microprogram control unit (CMCU) with code sharing [4] can be used for its interpretation. In this case the codes of the classes of pseudoequivalent operational linear chains (OLC) [4] can be taken from more than one source, owing to the wide fan-in of PAL macrocells. In this article we propose a method of CMCU logic circuit optimization based on the use of two sources of codes.

2. CMCU with code sharing

Let a GSA $\Gamma$ be represented by a set of vertices $B$ and a set of arcs $E$. Let $B = \{b_0, b_E\} \cup E_1 \cup E_2$, where $b_0$ is the initial vertex, $b_E$ is the final vertex, $E_1$ is the set of operator vertices, $|E_1| = M$, and $E_2$ is the set of conditional vertices. A vertex $b_q \in E_1$ contains a microinstruction $Y(b_q) \subseteq Y$, where $Y = \{y_1, \ldots, y_N\}$ is the set of data-path microoperations [1]. Each vertex $b_q \in E_2$ contains a single element of the set of logical conditions $X = \{x_1, \ldots, x_L\}$.

Let us form a set of operational linear chains (OLC) $C = \{\alpha_1, \ldots, \alpha_G\}$ for the GSA $\Gamma$, where each OLC $\alpha_g \in C$ is a sequence of operator vertices and each pair of its adjacent components corresponds to some arc of the GSA. Each OLC $\alpha_g \in C$ has only one output $O_g$ and an arbitrary number of inputs. Formal definitions of an OLC, its inputs and its output can be found in [4].

Let us call a GSA $\Gamma$ a linear GSA (LGSA) if the following condition holds:

$$\frac{M}{G} \geq 2. \tag{1}$$

Each vertex $b_q \in E_1$ corresponds to a microinstruction $MI_q$ kept in the control memory (CM) of the CMCU and has an address $A_q$. A microinstruction can be addressed using

$$R = \lceil \log_2 M \rceil \tag{2}$$

978-1-4577-1958-5/11/$26.00 ©2011 IEEE


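bits represented by variables $T_r \in T = \{T_1, \ldots, T_R\}$.

Let an OLC $\alpha_g \in C$ include $F_g$ components and let $Q = \max(F_1, \ldots, F_G)$. Let a binary code $K(\alpha_g)$ correspond to the OLC $\alpha_g \in C$ and let it have $R_G$ bits, where

$$R_G = \lceil \log_2 G \rceil. \tag{3}$$

Let a binary code $K(b_q)$ correspond to each component of the OLC $\alpha_g \in C$ and let it have $R_Q$ bits, where

$$R_Q = \lceil \log_2 Q \rceil. \tag{4}$$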

Let the components be encoded in such a way that the following condition holds:

$$K(b_{g_{i+1}}) = K(b_{g_i}) + 1, \tag{5}$$

where $i = 1, \ldots, F_g - 1$ and $g = 1, \ldots, G$. Let us use variables $\tau_r \in \tau$ for encoding of the OLC and variables $T_r \in T$ for encoding of the components, where $|\tau| = R_G$ and $|T| = R_Q$. Let the following condition hold:

$$R_G + R_Q = R. \tag{6}$$

In this case a linear GSA $\Gamma$ can be interpreted using the model of CMCU $U_1$ with code sharing (Fig. 1).

Figure 1. Structural diagram of CMCU U1
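Conditions (2)-(4) and (6) can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `code_widths` is hypothetical, and the sample values $M = 24$, $G = 8$, $Q = 4$ are those of the example in Section 4.

```python
def ceil_log2(n: int) -> int:
    """Number of bits needed to encode n distinct values (ceil(log2 n) for n >= 1)."""
    return (n - 1).bit_length()


def code_widths(M: int, G: int, Q: int) -> dict:
    """Bit widths from formulas (2)-(4) and the code-sharing condition (6)."""
    R = ceil_log2(M)      # address width, formula (2)
    R_G = ceil_log2(G)    # OLC code width, formula (3)
    R_Q = ceil_log2(Q)    # component code width, formula (4)
    return {"R": R, "R_G": R_G, "R_Q": R_Q, "sharing_ok": R_G + R_Q == R}


# Values of the example GSA from Section 4
print(code_widths(M=24, G=8, Q=4))
```

When `sharing_ok` is false, the concatenated address would be wider than the $R$ bits given by (2), so condition (6) does not hold for that GSA.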

In CMCU $U_1$, the block of microinstruction addressing (BMA) implements the systems of input memory functions for the counter CT and the register RG:

$$\Phi = \Phi(\tau, X), \qquad \Psi = \Psi(\tau, X). \tag{7}$$

Let us point out that in the case of CMCU $U_1$ the address of a microinstruction is represented as

$$A(b_q) = K(\alpha_g) * K(b_q), \tag{8}$$

where $b_q$ is a component of the OLC $\alpha_g \in C$ and "*" is the sign of concatenation. The CMCU $U_1$ operates in the following order.

If $Start = 1$, then the initial address (all zeros) is loaded into RG and CT. At the same time the flip-flop TF is set, which causes $Fetch = 1$, and microinstructions can then be read out of the control memory. Each cell of the CM keeps microoperations $y_n \in Y$ and the special variables $y_0$ and $y_E$. If $y_0 = 1$, then the current content of CT is incremented; otherwise both CT and RG are loaded from the BMA. The first case corresponds to a transition from any OLC component except its output. The second case corresponds to a transition from an OLC output. If $y_E = 1$, then the flip-flop TF is reset, the signal $Fetch = 0$ and the operation of the CMCU is terminated. This corresponds to a transition from a vertex $b_q \in E_1$, where $\langle b_q, b_E \rangle \in E$. The pulse Clock is used for timing of the CMCU.
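The operating order described above can be mimicked by a small behavioural model. This is a minimal sketch under stated assumptions, not the authors' circuit: the function `run_cmcu`, the dictionary layout of the control memory and the toy two-chain program are all hypothetical, and the BMA is replaced by an ordinary Python function.

```python
def run_cmcu(cm, bma, x, max_steps=100):
    """Trace the addresses visited by a toy CMCU with code sharing.

    cm  : dict mapping (rg, ct) -> {"Y": set, "y0": bool, "yE": bool}
    bma : function (rg, x) -> (next_rg, next_ct), standing in for the BMA
    x   : dict with the values of the logical conditions
    """
    rg, ct = 0, 0                  # Start = 1: all-zero address in RG and CT
    trace = []
    for _ in range(max_steps):     # Fetch = 1 while TF is set
        word = cm[(rg, ct)]
        trace.append((rg, ct))
        if word["yE"]:             # TF is reset, Fetch = 0, operation ends
            break
        if word["y0"]:             # inside an OLC: CT is incremented
            ct += 1
        else:                      # OLC output: RG and CT are loaded from BMA
            rg, ct = bma(rg, x)
    return trace


# Hypothetical program: chain 0 has two components, chain 1 has three.
CM = {
    (0, 0): {"Y": {"y1"}, "y0": True, "yE": False},
    (0, 1): {"Y": {"y2"}, "y0": False, "yE": False},   # output of chain 0
    (1, 0): {"Y": {"y3"}, "y0": True, "yE": False},
    (1, 1): {"Y": {"y1", "y3"}, "y0": True, "yE": False},
    (1, 2): {"Y": {"y2"}, "y0": False, "yE": True},    # leads to the final vertex
}


def toy_bma(rg, x):
    # Only chain 0 has an outgoing transition in this toy program.
    return (1, 0) if x["x1"] else (1, 1)


print(run_cmcu(CM, toy_bma, {"x1": 1}))
print(run_cmcu(CM, toy_bma, {"x1": 0}))
```

With `x1 = 1` the model enters chain 1 at its first component, and with `x1 = 0` at its second, which illustrates that an OLC may have several inputs but only one output.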

Let us point out that OLC $\alpha_i, \alpha_j \in C$ are pseudoequivalent OLC (POLC) [4] if their outputs are connected with the input of the same vertex of the GSA $\Gamma$. The amount of hardware in the logic circuit of the BMA can be decreased by introducing a special block, named a code transformer (TC) [4], which transforms the OLC codes into the codes of the classes of pseudoequivalent OLC. But the TC consumes some resources of the chip in use. In this article we propose to use free cells of the CM for representing the codes of the classes of POLC (the first source of codes). At the same time, the register RG can be used as the second source of the codes of POLC.
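The definition of pseudoequivalence translates into a simple grouping step. The following sketch assumes that, for every OLC, the vertex fed by its output is already known; the function name `pseudoequivalent_classes`, the chain labels `a1` to `a7` and the vertex labels `c1` to `c4` are made up for illustration.

```python
from collections import defaultdict


def pseudoequivalent_classes(output_successor):
    """Group chains whose outputs feed the same GSA vertex.

    output_successor: dict mapping a chain name to the vertex its output enters.
    Returns the classes as a sorted list of sorted lists of chain names.
    """
    classes = defaultdict(list)
    for chain, vertex in output_successor.items():
        classes[vertex].append(chain)
    return sorted(sorted(members) for members in classes.values())


# Hypothetical successors: three chains share vertex c2, two share c3.
toy = {"a1": "c1", "a2": "c2", "a3": "c2", "a4": "c2",
       "a5": "c3", "a6": "c3", "a7": "c4"}
print(pseudoequivalent_classes(toy))
```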

3. Main idea of proposed method

Let $C_1 \subseteq C$ be the set of OLC such that $\alpha_g \in C_1$ if its output is not connected with the final vertex $b_E$. Let $\Pi_C = \{B_1, \ldots, B_I\}$ be a partition of the set $C_1$ into the classes of POLC. Let us encode the OLC $\alpha_g \in C_1$ in such a way that the majority of classes $B_i \in \Pi_C$ are represented by a single generalized interval of the $R_G$-dimensional Boolean space. The well-known algorithm ESPRESSO [5] can be used for this encoding. Let $\Pi_C = \Pi_A \cup \Pi_B$, where $B_i \in \Pi_A$ if the class is represented by a single interval, otherwise $B_i \in \Pi_B$. If the condition

$$\Pi_B = \emptyset \tag{9}$$

holds, then the register RG represents all codes $K(B_i)$ for the classes $B_i \in \Pi_C$. In this case the BMA implements $H_0$ terms, where $H_0$ is the number of transitions of the Mealy finite state machine (FSM) used for microinstruction addressing [4]. It is the minimal possible number of transitions.

Let us point out that the logic circuits of BMA, RG, CT and TF are implemented using PAL macrocells. To implement the CM, external PROM chips should be used, each having $t$ outputs, where $t \in \{1, 2, 4, 8, 16\}$ [2, 3]. Let one-hot encoding of microoperations [6] be used in CMCU $U_1$. In this case each word of the CM includes $N + 2$ bits. The number 2 is added to $N$ to take into account the additional variables $y_0$ and $y_E$ (Fig. 1). If each PROM has $t$ outputs and not less than $M$ words, then $K_0$ chips are enough to implement the CM, where

$$K_0 = \left\lceil \frac{N + 2}{t} \right\rceil. \tag{10}$$

Obviously, there are $R_0$ free outputs of the PROM chips, where

$$R_0 = K_0 \cdot t - N - 2. \tag{11}$$

These outputs can be used for encoding of the classes $B_i \in \Pi_B$ by codes having $R_B$ bits, where

$$R_B = \lceil \log_2 (|\Pi_B| + 1) \rceil. \tag{12}$$

The number 1 is added to $|\Pi_B|$ to indicate the situation when $B_i \notin \Pi_B$. If the condition

$$R_0 \geq R_B \tag{13}$$

holds, then all classes $B_i \in \Pi_B$ are represented by the CM. Otherwise, only $I_{CM}$ classes can be represented by the control memory, where

$$I_{CM} = 2^{R_0}. \tag{14}$$

The rest of the classes $B_i \in \Pi_B$ should be moved into $\Pi_A$ and represented by RG. In both cases we propose to use the CMCU $U_2$ (Fig. 2) for interpretation of an LGSA $\Gamma$.

Figure 2. Structural diagram of CMCU U2
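The resource estimates (10)-(14) are easy to evaluate for a given design. The sketch below is illustrative only: the function name `cm_resources` and its parameter names are hypothetical, and the sample values $N = 13$, $t = 4$, $|\Pi_B| = 1$ are those used later in Section 4.

```python
def cm_resources(N: int, t: int, n_b: int) -> dict:
    """Evaluate formulas (10)-(14) for N microoperations, PROM chips with
    t outputs each, and n_b classes that are not single intervals."""
    word_bits = N + 2                 # N one-hot microoperation bits plus y0 and yE
    K0 = -(-word_bits // t)           # ceiling division, formula (10)
    R0 = K0 * t - word_bits           # free PROM outputs, formula (11)
    RB = n_b.bit_length()             # equals ceil(log2(n_b + 1)), formula (12)
    return {"K0": K0, "R0": R0, "RB": RB,
            "fits": R0 >= RB,         # condition (13)
            "I_CM": 2 ** R0}          # formula (14)


print(cm_resources(N=13, t=4, n_b=1))
```

For these values four PROM chips leave one free output, which is just enough to hold a one-bit class code.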

In CMCU $U_2$, the block BMA implements the functions

$$\Phi = \Phi(\tau, Z, X), \qquad \Psi = \Psi(\tau, Z, X), \tag{15}$$

where the variables $z_r \in Z$ are used for encoding of the classes $B_i \in \Pi_B$, $|Z| = R_0$. The control memory CM implements the functions $Z$, $Y$, $y_0$ and $y_E$, which depend on the variables from the sets $\tau$ and $T$. The principles of operation of $U_1$ and $U_2$ are practically identical. The following design method for CMCU $U_2$ is proposed in this article:

1. Construction of the sets $C$, $C_1$ and $\Pi_C$ for the LGSA $\Gamma$.

2. Encoding of the OLC $\alpha_g \in C$ and their components.

3. Construction of the sets $\Pi_A$ and $\Pi_B$.

4. Encoding of the classes $B_i \in \Pi_B$.

5. Construction of the CMCU transition table.

6. Specification of the control memory.

7. Implementation of the CMCU logic circuit.

4. Example of proposed method

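Let the sets $C = \{\alpha_1, \ldots, \alpha_8\}$, $C_1 = \{\alpha_1, \ldots, \alpha_7\}$ and $\Pi_C = \{B_1, \ldots, B_4\}$ be formed for a GSA $\Gamma_1$, where $B_1 = \{\alpha_1\}$, $B_2 = \{\alpha_2, \alpha_3, \alpha_4\}$, $B_3 = \{\alpha_5, \alpha_6\}$, $B_4 = \{\alpha_7\}$, $\alpha_1 = \langle b_1, b_2, b_3 \rangle$, $\alpha_2 = \langle b_4, b_5, b_6, b_7 \rangle$, $\alpha_3 = \langle b_8, b_9, b_{10} \rangle$, $\alpha_4 = \langle b_{11}, \ldots, b_{14} \rangle$, $\alpha_5 = \langle b_{15}, b_{16} \rangle$, $\alpha_6 = \langle b_{17}, b_{18}, b_{19} \rangle$, $\alpha_7 = \langle b_{20}, b_{21} \rangle$, $\alpha_8 = \langle b_{22}, b_{23}, b_{24} \rangle$. As we can see, $G = 8$, $R_G = 3$, $Q = 4$, $R_Q = 2$, $M = 24$, $R = 5$. Therefore, condition (6) holds and code sharing can be applied. Let us point out that condition (1) holds as well, so the GSA $\Gamma_1$ is an LGSA.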

Let us encode the OLC $\alpha_g \in C$ in the way shown in Fig. 3 using the variables $\tau_r \in \tau$, where $|\tau| = 3$.

Figure 3. Codes of OLC for GSA $\Gamma_1$

The components of each OLC $\alpha_g \in C$ are encoded in a trivial way [4]: the first component has the code 00, the second has 01, and so on, to satisfy condition (5). Recall that the variables $T_1$, $T_2$ are used for the component encoding.

From Fig. 3 it can be found that the classes $B_1$, $B_3$ and $B_4$ are represented by single intervals of the Boolean space. These intervals are treated as the codes $K(B_i)$, namely $K(B_1) = 000$, $K(B_3) = 10*$, $K(B_4) = 111$. The class $B_2 \in \Pi_C$ is represented by the intervals 00* and 01*. This analysis shows that $\Pi_A = \{B_1, B_3, B_4\}$, $\Pi_B = \{B_2\}$ and $R_B = 1$.
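Whether a class falls into $\Pi_A$ or $\Pi_B$ can be tested mechanically once the OLC codes are fixed. The sketch below is a simplified check, not the ESPRESSO-based encoding itself: it accepts a class only if its codes fill a cube exactly, so unused codes that could serve as don't-cares are ignored. The function name `single_interval` is hypothetical, and the code sets are assumptions inferred from the intervals quoted above, since Fig. 3 is not reproduced here.

```python
def single_interval(codes):
    """Return the cube (e.g. '10*') if the codes fill exactly one interval
    of the Boolean space, otherwise None."""
    codes = set(codes)
    # A position keeps its bit if all codes agree there, else it becomes '*'.
    cube = "".join(col[0] if len(set(col)) == 1 else "*"
                   for col in zip(*sorted(codes)))
    # The cube is a valid single interval only if every point of it is used.
    return cube if len(codes) == 2 ** cube.count("*") else None


# Assumed code sets, consistent with K(B1) = 000, K(B3) = 10*, K(B4) = 111
assumed_classes = {
    "B1": {"000"},
    "B2": {"001", "010", "011"},
    "B3": {"100", "101"},
    "B4": {"111"},
}
for name, codes in assumed_classes.items():
    print(name, single_interval(codes))
```

Under these assumptions only $B_2$ fails the check, which matches the split into $\Pi_A$ and $\Pi_B$ given in the text.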

Let $N = 13$ for the GSA $\Gamma_1$ and let us use PROM chips with $t = 4$ for implementation of the CM. In this case $K_0 = 4$, $R_0 = 1$ and condition (13) holds. The variable $z_1 \in Z$ is used for encoding of the class $B_2$. Let $K(B_2) = 1$, whereas $z_1 = 0$ indicates the situation $B_i \notin \Pi_B$.

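Let the transitions for the classes $B_i \in \Pi_C$ be represented by the following system of generalized formulae of transitions (GFT) [4]:

$$\begin{aligned}
B_1 &\to x_1 b_4 \vee \bar{x}_1 x_2 b_6 \vee \bar{x}_1 \bar{x}_2 x_3 b_8 \vee \bar{x}_1 \bar{x}_2 \bar{x}_3 b_{14};\\
B_2 &\to x_2 b_{15} \vee \bar{x}_2 x_3 b_{17} \vee \bar{x}_2 \bar{x}_3 x_4 b_{20} \vee \bar{x}_2 \bar{x}_3 \bar{x}_4 \ldots;\\
B_3 &\to x_5 b_{19} \vee \bar{x}_5 b_{11};\\
B_4 &\to x_1 b_{17} \vee \bar{x}_1 b_{22}.
\end{aligned} \tag{16}$$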

This system is the base for construction of the transition table, which has the following columns: $B_i$, $K_A(B_i)$, $K_B(B_i)$, $b_q$, $A(b_q)$, $X_h$, $\Psi_h$, $\Phi_h$, $h$. Here $K_A(B_i)$ is the code $K(B_i)$, where $B_i \in \Pi_A$; $K_B(B_i)$ is the code for $B_i \in \Pi_B$; $X_h$ is the input signal taken from the system of GFT; $\Psi_h$ is the set of input memory functions loading the code $K(\alpha_g)$ into RG; $\Phi_h$ is the set of input memory functions loading the code $K(b_q)$ into CT; $h = 1, \ldots, H_0$ is the number of the transition. In our case $\Psi = \{D_1, D_2, D_3\}$, $\Phi = \{D_4, D_5\}$. Let us find some addresses of microinstructions. For example, the vertex $b_{20}$ is the first component of the OLC $\alpha_7 \in C$. Therefore, $A(b_{20}) = 11100$. By analogy we can find that $A(b_{11}) = 01000$, $A(b_{15}) = 10000$, $A(b_{17}) = 10100$ and $A(b_{19}) = 10110$.
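Formula (8) amounts to concatenating two bit strings, which the sketch below makes explicit. The function name `address` and the 0-based `position` argument are assumptions made for illustration; the chain codes 111, 010, 100 and 101 are the ones implied by the addresses given in the text.

```python
def address(chain_code: str, position: int, r_q: int) -> str:
    """Microinstruction address per formula (8): the OLC code followed by the
    component code, where position 0 denotes the first component of the chain."""
    if not 0 <= position < 2 ** r_q:
        raise ValueError("component position does not fit into r_q bits")
    return chain_code + format(position, f"0{r_q}b")


R_Q = 2
print(address("111", 0, R_Q))   # first component of the chain coded 111
print(address("010", 0, R_Q))   # first component of the chain coded 010
print(address("101", 2, R_Q))   # third component of the chain coded 101
```

Because consecutive components differ only in the low-order $R_Q$ bits, incrementing CT moves along a chain without touching the register RG.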

Let the symbol $U_i(\Gamma_j)$ stand for interpretation of the GSA $\Gamma_j$ using the model $U_i$ $(i = 1, 2)$. The part of the table of transitions for the classes $B_2, B_3 \in \Pi_C$ is shown in Table 1.

Table 1. Table of CT for FSM $U_2(\Gamma_1)$


This table is used for construction of system (15). For example, the following equations can be derived from Table 1: $D_2 = z_1 \bar{x}_2 \bar{x}_3 \vee \tau_1 \bar{\tau}_2 \bar{z}_1 \bar{x}_5$ (after minimization); $D_4 = \tau_1 \bar{\tau}_2 \bar{z}_1 x_5$. The control memory of CMCU $U_2(\Gamma_1)$ is specified using well-known methods [4]. This step is omitted in our article, as well as the step of implementation of the logic circuit. Let us point out that $H_0 = 12$, which is determined by the total number of terms in system (16). In the case of CMCU $U_1(\Gamma_1)$ we have $H_0 = 20$. It means that application of the proposed method decreases this value by a factor of $20/12 \approx 1.67$. We can expect that the numbers of PAL macrocells in the logic circuits of BMA have the same ratio [4].

6. References

[1] Baranov S. Logic Synthesis for Control Automata. Kluwer Academic Publishers, 1994. 312 pp.

[2] http://www.altera.com.

[3] http://www.xilinx.com.

[4] Barkalov A., Titarenko L. Logic Synthesis for Compositional Microprogram Control Units. Berlin: Springer, 2008. 272 pp.

[5] Maxfield C. The Design Warrior's Guide to FPGAs. Amsterdam: Elsevier, 2004. 541 pp.