fDEC II - apps.dtic.mil · store-through algorithm is used, main memory is simultaneously updated. However, cache2 continues to have the urnodi fled version MV. If, before this inconsistency

IIfDEC 3111981

"!" Center for Information Systems ResearchMassachusetts Institute of Technology

:, Alfred P. Sloan School o' Management=l "•50 Memorial Drivei-" Cambridge, Massachusetts, 021 39

',, i' \617 253-1000

008

SECURITY CLASSIFICATION OFr THIS PAGE (Shon 0010 Ente,*j__________________

BEFORE COMPLETINGO FORM

A Study of the Multicache-Consiste;-icy Problemin Multi-Processor Computer Systems a. PEnFORMN~O *oR. REPORT NUMBER

___ ___ ___ ___ ___ __ ___ ___ ___ ___ ___ __ MOIO-8109-06 -11,A - /:

7. AUTH4OR(s) 6. CONTRACT O11 GRANT NUM11910(a)

Tarek K. Abdel-Hamid "'N00039- 78-G-0 160Stuart E. Madnick_______________

S. PERFORMING ORGANIZATION NAME AND ADDRESS 10. PRIGRAM ELEMENT. PROJECT. TASKSloan Scolo aaeetAR A a WORK UNIT NUMBERS

Massachusetts Institute of TechnologyCambridge, Massachusetts 02139 ___

It. CONTROLI.ING OFFICE NAME ANDOADDRESS IS. REPORT DATE

* September 198113. NUMSER of PAGES

____________________________________ 14014. MONITORING AGE04CY NAME A AOORESS(AI different from Controlling Office) It. SECURITY CLASS. (of 0..,. reperfl)

Unclassified

IS.. CECI. DAUSsiFicATION/OOWNGRAOINGSCM I ULE[I IS. DISTRIBUTION STATEMENT (of this Report)

Approved for public release; distribution unlimnited

17. DIST1411UTION STATEMENT (of the abstract entered In Block 20. if different from Rhpowl)

III. SUPPLEMENTARY NOTES

19. KEY WORDS (Continue on rev.ere side It necessary ant' Identif by I block nsimbet)

Bus, Cache, Multicache-consistency, Multiprocessors, Perfoirmance Evaluation,Simulation

'- 20. ABSTRACT (Continue on revcere side If necessary and Identify' by block number)

j;- This paper is a report on an ongoing research project at the M.I.T.Sloan School of Management to study the multicache-consistency problem int multi-processor computer systems.

The nature of the consistency problem in multicache memory systems isbriefly discussed, together with an explanation of the three conmmonapproaches proposed in the literature to handle it. A new solution to theproblem, called the tCornmon-Cache / Pended Transaction Bus" (CC/PTB)- -

continued on reverse sidO,* . ~DD 1473 EDITION OF I NOV 65 IS OUSOL~tET 7- .'/

SN0102-014-4601S9CURI*Fy CLASSIFICATIOM OF THIS5 PAGE (WShen Del. Ent

1 rA.t.-IJ'41Y CLASIAPICATU)N Of T01 PAGE(Wh" 1)&14 "191

\' 20. Abstract"----approach, is developed and discussed. The "CC/PTB" approach attempts

to minimize performance degradation by eliminating the overhead ofmaintaining cache-consistency. Its two distinctive features are:Firstly, the conventional private caiche per procpssor organization isreplaced by one where a pool of cache-modules it commonly shared byall processors. Secondly, the Pended Transaction Bus (PTB) is usedas the interconnection protocol that connects the processors and thecache-modules.

The performance of the CC/PTB approach is evaluated using ahighly detailed simulation model with favorable results.

I/

Jurt1.-t I • •

r -- ode

'Dis

; ' :2 $1SECURITY CtLASSIFICATI*OM OF TMIS rAGE[MORheN DataI U14140)

S. I

ABSTPUA=

This paper is a report on an ongoing research projwet at the

M.I.T. Sloan School of Management to study the multicache-consistency

problem in multi-processor computer systems.

The nature of the consistency problem in multicache memory

systems is briefly discussed, together with an explanation of the

three common approaches proposed in the literature to handle it. A

new solution to the problem, called the Qxrumon-Cache / Pended

Transaction Bus" (OC/PTB) approach, is developed andI discussed. The

"(X/PTB" approach attempts to minimize performance degradation by

eliminating the overhead of maintaining cache-consisitency. Its two

distinctive features are: Firstly, the conventional private cache per

processor organization is replaced by one where a pool of

cache-modules is comonly shared by all processors. Secondly, the

Pended'Transactkonj Bus (PTM) is used as the interconnection protocol

that connects the processors and the cache-modules.

The performance of the CC/PTB approach is evaluated using a

highly detailed simulation model with favorable results.

~\

ThBLE OF COTENTS

I. THE MULTICACHE-CcNSISTENCY PROBLEM: AN INTRODUCTION .... 1

II. INFOPLEX: A MULTI-PROCESSOR C14JPUTER SYSTEM ........... 7

II I. THREE COMMON APPRONCHES FOR SOLVING THECACHE-CONSISTENCY PRCELE4 ............................. 1lIII.1 The "Broadcasting" Approach .................... 13111.2 The "Store-Controller" Approach ................. 091111.3 The "Multics" Approach ....................... 19III.4 Conclusion ...................................... 23

IV. THE "CCMMON-CACHE/PENDED TRANSACTION BUS" APPROACH .... 25IV.1 Introduction ....................IV.2 The Fended Transaction Bus (PTS) ............... 29IV.3 An Application: The INFOPLEX Storage Hierarchy .. 31

V. EVALUATING THE PERFORMANCE OF THE"CMt4ON-CACHE/FENDED TRANSACTION BUS" APPROACH ........ 38

V.1 Introduction ...................... *...... , .... 38V.2 The Evaluation Tbol ............................. 39V.3 The Simulation Experiment ...................... 42V.4 The Hardware Parameters ......................... 44V.5 Analysis of the Simulation Results ......... ..... 46V6 Conclusion ....

VI. CctCLUSION ......................

B IBLIOGRAPHY.................,........•.6

APPENDIX (I) :The wOPT" Program...............2APPENDIX (II) •The "STVR* Prograr ......................... 81APPENDIX (IIlI) The "CC/PTB/C" Program .................... 14APPENDIX (IV) : The "CC/PTB/P" Program ................... 123

- - T i

II. THE MLLTICA1HE-QONSISTEU"Y PR(OL4: AN INTRODUrICN

( A "consistency problem" refers, in general, to a situation

where two or more entries representing the same "fact" in a data

base differ (i.e., are inconsistent). This, of course, can only

occur when redundancy exists. In this paper we will be concerned

with the "consistency problem" that arises in cache-based memory

systems.

A cache memory system (Kaplan and Winder, IS73) represents a

type of memory hierarchy that ttempts to bridge the CPU-main memory

speed gap by the use of a small, high speed random access memory

whose cost per bit is higher than that of main memory, but whose

total cost is relatively small because of the small size.

Conceptually, this configuration has analogies with paging systems

(Matick, 1577). The implementations, however, are far apart because

of speed oonsicerations. In contrast to a pý.tging system, a cache is

manageo by hardware algoritimns, provides a smaller ratio of memory

access times (e.g., 10:1 rather than 3000:1), and deals with smaller

2

tblocks of data (64 bytes for example rather than 409).

In a cache system, all data are referenced by their main memory

address. At any given time, a certain subset of the contents of

main meory is contained in the cache level. If a processor then

requests a data item in this subset, the request is serviced at the

cache level.

A cache-based system works "well" for two basic reasons:

First, executing programs tend to re-use instructions and data; and

second, programs tend to use instructions and data near recently

usea instructions and data. The first property means that once

information is fetched from main memory to cache, subsequent

accesses to it are at cache speed. The. seornd property mans that

if a request to main memory is satisfied by bringing into a cache a

block of information larger than is immediately needed, the

additional information is likely to be needed soon, and its presence

in the cache will save one or more references to main memory.iIn this paper we will refer to the block of information that

constitutes the minimum amount of data which may be transmitted

between the cache and main memory and which is also the allocation

unit in the cache as the "cache line." All bytes of a cache line

are, therefore, simultaneously all present or all absent from the

cachee. A directory is usually used to record the main memory

aadresses of all lines in the cache.

It is easy to demonstrate how inconsistency can develop in

3

cache-basea systems oue to the existence of redundancy. Considerfirst the simple case of a single-cache organization (Figure l(a)).

The CPU can only acmss words that are in the cache. If a word

that is needed for processing is not already in the cache, it will

first have to be transferred from main memory to the cache. oe

the word is in the cache it beco ms accessable by the CU and

processing can take place. If the CPU then updates (i.e., mo••i ies)

the word, inconsistency between the copy in the cache and the copy

in main memory could develop. This depends on the store algorit-u

used.

If a store through algorithn (in which the cache and main

inemory are upxatea simultaneously) is used, inconsistency will not

arise. The price paid for that is a decrease in processing speed as

store operations beome limited by the speed of main memory. When

this price is too high, store-behind or store-replacement algorithms

may be used. In both cases main memory is not updated immediately,

and as a result inconsistency arises. The modified word in the

cacle will, for "some" interval of time, be different fram its

unmodified version in main memory.

The more interesting case, however, is that of multicache

systems. Consider the two-cachie organization of Figire l(b). What

is important to emphasize here, is that the store-through algorithm

is no longer sufficient to avoid inconsistency. Assume, for

example, a word whose main memory address is (A), and whose current

value is (V) is present in both cAchel and cache2. CPUl then

------ - ----

4

CPU

ICache

Main Memory

(a) Single-Cache Organization

CPU 1 CPU)

i. -

LCache Cache2

Main Memory

(b) Two-Cache Organization

Figure (1)

5

modifies the value of the word in its cachel to (V.), and assuming a

store-through algorithm is used, main memory is simultaneously

updated. However, cache2 continues to have the urnodi fled version

MV. If, before this inconsistency is resolved by either updatings,

replacing, or invalidating cache2.s copy, CPU2 attempts to access

the word (A), it will get what is now an invalid value (V) fram its

cache2.

It is time now to adopt what we be'ieve is a more useful

definition of what a "consistency problem" is. We claim that

inconsistency per se is not necessarily a problem. Reconsider the

case of a single-cache organization. We have already explained how

inconsistency can arise for "same" interval of time when the store

through algorithm is not used. During that interval of tine the

cache will contain the modified version of the word, while main

memory will not. If, during this interval, the CPU needs to

re-access the word for processing, what will happen? It will check

the cache, find the word in it, and, therefore, access it i.e., no

transfer from main memory will be needed. Thus, although

inconsistency exists, no problem arises because the CPU will always

access the updated version of the word from the cache.

With this in mind, we now adopt the following definition of a

"consistency problem" (Censier and Feautrier, 1978):

"A consistency problem exists in a cache-based system if the

value acessed by a CPU is not the value given by the latest

store operation (by any CPU) to .he same address."

S....JI

6

=t is obvious, from the above discussion, that there will be no

consistency problem for single-cache organizations. These,

unfortunately, are not very attractive for high performance systuse.

In the next part of this paper we pcesent a brief discussion of

INFOEtX, a highly parallel multi-processor computer system that

utilizes a multicache organization. In such an organizaticn the

consistency problem is a significant one. The IN L.M( organization

will constitute the context within which we shall evaLuate the

different apzoaches for handling the multicache-onsM.stency problem.

7

II. INFOPLEX: A MULTI-PRCESSOR CcMPumr SYSTE4

II

A research project is currently underway at the MIT Sloan School

of Management aimed at investigating the architecture of a new data

base computer, called INFOPLEX, which is particularly suitable for

large-scale information management (Madnick, 1579). The specific

objectives of the project include providing substantial performance

improvements over conventional architectures, supporting very large

complex data bases, and providing extremely high reliability.

To p;ovide a high performance, highly reliable, and large capacity

storage system, INFOPLEX makes use of an autcmatically managed memory

hierarchy. It is this aspect of the project that will be of relevance

in our present discussion.

As a simplistic illustration, we show in Figure 2 three levels

(only) of the memory hierarchy. As can be seen, this is a multicache

organization. The proposed number of caches (m) is relatively large

compared to present day systems (e.g., m = 32). Flor the INFOPLEM

-------------------------

8

Cachel ..... .. Cachen Level (1)

SLC Local Bus (i)

emorydule.............d•ule Level (2

Level (3)bModule. .. .........Mdule

m .___ SLCm I Local Bus (2)

.Memory Memory Level (3) !

Module .. . .Module

Local Bus (3)

SLC : Storage Level ControllerMRP • Memory Request Processor

Figure (2)

9

objectives such a large number of caches is essential. It will, for

example, help attain the high performance impcovements sought (up to a

1000 fold increase in throughput over conventional architectures). In

addition, it allows for th•e implementation of such features as dynamic

reconfiguration and automatic recovery which are aimed at improving the

reliability of the system.

Two important "boxes" in the design of Figure 2 are the Storage

Level Controller (SIC) and the Memory Request Processor (MP). The

function of the SiC is to couple the local bus of a storage level to

the global bus that connects all storage levels. In essence, the SWC

serves as a gateway between levels. For example, the SW of level (1)

accepts requests to the lower storage levels from the caches and

forwards them to the SWC of level (2). When the responses to these

recq.sts are ready, the level (1) SLC accepts them and sends them back

to the appropriate caches.

The Memory Request Processor (MRP) performs such functions as:

implementing the storage management algorithms (e.g., directing the

transfer of information across a storage level); handling all the

cowmunication protocols that are peculiar to thle particular storage

modules (devices) at a storage level; and mapping virtual addresses

into their real equivalents. (Note that an MRP is not needed at the

cache level.)

The INIFOPLEM organization described (briefly) above will

constitute the context within which we shall study the different

appoaches for handling the multicache-consistency problem. For

10

further information on INFFOPLEX the interested reader can consult the

following references: (Maamick, 1975; Madnick, 1979; Lain, 1979; Lam-

and Madnick, 197S; and Hsu, 1S80).

tI

| 11

III. THREE COMMON APPROACHES FOR SOLVING THE

CACHE-CONISIENCY PRCBLEM

In Part (i) we explained how a consistency problem could arise in

multicache memory systems. We saw, for example, that in the two-cache

organization of Figure l(b) a word (A) that existed in both cachel and

cache2 could be nodified to (V.) in cachel and in main memory but not

in cache2, and thus giving rise to an inconsistent state. This

example, although rather simple, is quite adequate to demonstrate the

motivations behind the basic strategies that have been used to handle

the consistency problem in multicache systems. There are two such

strategies. First, we could restrict the "encacheability" of data

items, such that only those data items that cannot cause

inconsistencies are allowed to move into the cache level. tr example,

words that can only be READ would be encacheable. On the other hand,

all data items that could potentially cause inconsistencies are

prohibited from moving into the cache level, and thus all accesses to

them ar3 done through main memory. Word (A) of the above example

would, therefore, fall in this category, and so it (under this

7A j

12

strategy) would have been prohibited from moving into either cachel or

cache2. Thus accesses to word (A) by both CPUl1 and CTU2 would have

been made to its single copy in main memory, and no inconsistency would

have resUlted. The price paid, however, is that accesses to word (A)

are now done at main memory speed and not at the faster cache speed.

The second basic strategy that has been employed doesn.t put any

such restrictions on moving data items into the cache level. The idea 9here is to "invalidate" a cache line when there is a risk that its

contents have been modified elsewhere in the system. When a cache line

is invalidated (by setting, for example, a flag in the cache directory)

it is considered not in the cache. Referring again to the above

example, when the value of word (A) is modified in cachel (and in main

memwry) to (V.), word (A) in cache2 is invalidated. Thus if CPU2

happens to request word (A) at a later time, it will have to access it

from main irumory since the invalidated version (V) in its cache is

oonsidereU not to exist. CPU2 will, therefore, access the valid value

In the remainder of this section we will present more specific

approaches to handle the consistency problem in cache-based systems.

In particular, three approaches that are proposed in the literature

will be discussed, namely, the "Broadcasting," the "Store-Qontroller,"

and the "Multics" approaches. The "Broadcasting" and

"Store-Oontroller" approaches are based on the second strategy

discussed above. They, though, implement it differently. The

"Multics" approach, on the other hand, is based on the first strategy.

LAU'&',. --- - ----- ~ -

13

III.l. The "Broadcasting" A&h2roach

The idea here (as mentioned in the second strategy above) is to

invalidate a cache line when its contents is modified in another cache

in the system. When a cache line is modified its address is

broadcasted throughout the system so that other caches sharing the line

would invalidate their now outdated version of it.

Every cacie is connected to an auxiliary data path over which all

other caches send the addresses of lines to be modified. Each cache

constantly monitors this path and executes a searching algorithm on all

addresses thus received. In case of a "hit," the affected line is

invalidated.

When a CPU needs to read (or write) a wcord that doesn.t exist in

its own cache, the word will be seized from main memory, To ensure

consistency main memory must always be kept "up-to-date." This (in

general) can be guaranteed only if a store-through algorithm (in which

the cache and main memory are updated simultaneously) is used. As was

argued before, such a restriction is not without its cost: A decrease

in processing speed as store operations become limited by the speed of

main memory.

Another major drawback of this approach is that the invalidation

data path must acimdate a very high traffic. The mean write rate for

most pcocessor architectures lies in the range between 10 and 30 per

cent (Censier and Feautrier, 1978), and thus if the number of

processors is higher than two, the productive traffic between a cache

14

MW its aL=ojcatW processor may be lower than the parasitic traffic

between the cache and all other caches. This explains why this

approach has been confined to systems with at most two caches (Censier

and feautrier, 1978).

111.2. The "Store-Controller" Approach: )

The basic idea here is the sam as in the "Broadcasting" approach

i.e. to invalidate a cache line when its contents have been modified

elsewhere in the system. It is in the implementation that the two

approaches differ. Here, a "Store-Controller" SC (see Figure 3(a)) is

used at the cache level to keep track of every line in every cache.

The store-controller "knows," not only which lines are in which cache,

but also which caches share any single line. When, therefore, a line

that is shared by two or more caches is updated (i.e., modified) in one

of them we do not now need to broadcast invalidation requests to all

caches. We, instead, use the information in the store-controller to

send invalidation requests to only those caches that are sharing the

updated line, if any. In other words, the motivation behind using the

store-controller is to filter out all unnecessary invalidation

requests.

We will present, in a flowcharted form, an example implementation

of this approach which is largely based on Tang.s proposals (Tang,

1976). In this implementation, If a processor wants to write and the

line is not found in its cache, then the line is always brought to the

cacie so that the processor can always write to cache. The storealgorithm used is the "store-replacement" algorithm. This means that

when a line is modified in a cache, main memory is n concutrently

updated. It is updated later when thM line has to be replaced in the

cache or when other caches need to have the line.

Figure 3(a) shows how the store-controller fits conceptually into

the cache organization. Figures 3(b) and 3(c) show possible layouts

for the directories of both the cache and the stor.•-controller. The

"status" column of the cache directory shown In Figure 3(b) %eeds some

explanation. The status of a line can be one of three things:

1. Private: For a line which has been modified (with respect to

main memory) or is going to be modified. A private line exists in

only one cache.

2. Non-Private: For a line that exists in one or more caches and

which has not been modified with respect to ,main memory.

3. Invalid- Fir a line that has been modified elsewhere and thus

becomes outdated. An invalid line is oonsidered "not in the

cache."

In Figures 4 and 5 the READ and WRITE operations are illustrated

respectively.

The boo major drawbacks of implementing this approach in a highly

parallel multi-processor computer system such as INFCPLMC where the

Snrnber of caches is relatively large are:

16

CpýPU2 CPU3

ache irecor kacheIDi rec to ~ Zache virec tor4

Store Controll r

Main Memory

(a)

Ma.n Memory Addres Cache Line Location Status

(b) Cache Directory

. --- MndI fI rdM,,in Mem. A^t i,'.'acho [,I ch,2 Flag

One bit per cache is Set if li eset as follows: is a pri-vate li e

1 if line in cac]e inva0 if not __

(c) Central Directory

Figure (3)

AREAD instruction~

to CPU

Line in Cache Check •Line not in Cache

-. "Cche Dirac,

READ itV READ it Another CacheCont aPr -Checks its^3 a Private DDirec.

LineLEND - .. .. .

Change Status ofline in CacheDirec. from Privat Ito Non-Private No

Unset "ModifiedFlag" in Central 1

Directory.•

Transfer Line toLMain Memory

Transfer Line from1Main Memory to

Cache

Figure (4 ). I updatc Directories

11EAD Li ne

ENDL, A&,,, _z 9 &.i4 -----------.------------ -----. _-

18

A WRITE instto CPU

Line in Cach Check Line not in Cache

NonFPrivatte -As a Private ecC One or more

-~nt , Dir"•4 Caches contain• "" Pivae?• ILine " -j.... 2-Lne aM a Non-

ChanSqc STranser Lin Invalidateio Caivthe DiRj. to tol Min Memo y Line in all

__2 Invalidate ...- pda"t °° :: e;S~et "Modi- LieinCc irectorie

fied Flag" -END I

UpdateUDirectories

Other Caches Contain

Cent. DI

Invalidate Transfer Line froLine in Main Memory to

Line other Caches Cacheb not

inotherCaches Update Cent- Update Cache Direc

ral Drec.UpaeCceDrral Direc. to indicate Priva eLine

Set "Modified Flain Centr. Dirc

[ITE totCaCache .• IWRITE to Cache

Figure 5

19

1. The size of the centrP1 directory beazmes too large, which

means increased time in pcocessing it and increased costs in

bui ldi ng it.

2. The store-controller could beazme a bottleneck in the system

as the traffic between itself and the caches beaomes very large.

111,3. The "4JLTICS" Approach

This approach, which is used in Honeywell.s Multics computer

system (Greenberg, 1978), has two important features. Firstly, the

Multics cache is a "store through" cache. This means that the cache

and main memory are updated simultaneously. The second feature is that

the system address space is divided into segments, each of which has

associated with it names and per-user access rights which govern the

ability of each potential user of the segment to read and/or write its

contents (i.e., words).

Every segment is known by the system to be either "writable" or

"non-writable." The rnn-writable class of segments, that is those to

which no users have write access, is an imrportant and statistically

significant one in MJLTICS. All procedures, including all parts of the

operating system, utilities, libraries, translators, and so forth, fall

into this category. These segmer.ts are "encacheable" by all

processors. This means that their contents (or words) are allowed to

"migrate" to any or all caches. Notice that when such words find their

way into a cache (or more than one cache) there is no possibility for a

20

consistency problem to arise (for such words) since no processor can

write or moxify them.

oFr the other class of segments, those which are writable, the

situation is slightly more oomplicated. For a segment falling in this

category, there are tnree possible states:

I. One or more processes (users) are accessing the segment but

none of them has a write access to it. The segment is encacheable

by all processors but no consistency problem will arise.

2. One or more processes are accessing the segment, with at least

one of them having write access. The segment beoames

non-encacheable i .e., its words cannot migrate to any of the

caches. The consistency problem will not arise here also since

there will only be one copy (i.e., the one in main memory).

3. Only one process that has write access is accessing the

segment. The segment is encacheable only to that process. And it

is only in this third state that the consistency problem could

possibly arise. Consider the following scenario:

Assume a certain segment S1 is addressable by only one

process PROCl which is currently running for the first timn

on processor CPUI. Assume also that PROCI has write access

to SI. Thus SI is encacheable by CPU1. As long as CP•l runs

PROCl, words of SI may be drawn into CPUI.s cache and be

modified by CFUl with no problem. During this period, no

other processor can address SI, for by assumption it is

aduressable only by PRCCI, which is uniquely associated with

21

CPU. during the interval in question. Thus, it is impossible

for other processors to draw words of S1 into their caches as

long as PROC1 is associated with CPUI. Until CPUl leaves

PROCI, there is thus no danger of words of S1 in CPUI.s cache

becoming outdated, as no other processor can address Si.

Similarly, there cannot be words of Si in any other jprocessor.s cache, for by assumption, CPUl was the first and

only processor to run PROCl. Thus, there is no danger that

modifications to words of Si1 made by CPU1 can invalidate

copies in other processors. caches, since such copies camnot

exist. Potential difficulty arises when CPU1 has left PROC(,

and scme other processor attempts to run PROCl. The first

time this happens, there is no problem. Since all words in

main memory are accurate, by virtue of the store-through

cache, another processor, say CPU2, :annot have inaccurate

data, ir main memory is accurate, and we have just shown how

CPU2.s cache may not contain inaccurate data. However, while

PROC1 runs on CPU2, CPU2 may modify words of Si in its own

cache and in main memory. Still there is no problem. Main

memory is accurate, as is CPU2.s cache. This can go on like

this as long as processors which have never ran PROCI (sinceIPROCI started running) run it. However, the first time some

processor which has already ran PROCI since then attempts to

run it again, the scheme appears to break down. Assum CpUl

attempts to run PROC1 for the second time. There may be

words of Sl in CJl1.s cache from the previous time CpIu. ranPROCL. Some of these words may have been rmrdi fied by PROCi

while it ran on CPU2. Thus, these words are accurate in main

22

memory anu in CPU2.s cache, but are inaccurate in CPUl.s

cache, for CPU2 had no way of knowing or acting upon the fact

that they were in CPUI.s cache.

The MULTICS solution to "its" oonsistency problem is simple:

clear the cache of a processor upon entering a process if it was not

the last processor to run that process. This is performed by the

MULTICS process dispatcher, with a special processor instruction that

accxmplishes this task. This ensures that no words of any per-process

writable segment will be found in a processor.s cache if there was any

possibility that any of those segments may have been modified by other

processors. The operating system maintains in the control block

describing each process the identity of the last processor to have run

this process thus this check is easy to make when a processor is

dispatched into a process.

Fran an INFOPLEX-type-system view point, there are three major

drawbacks to the MULTICS approach, all of which are performance

related:

1. The class of segments that are non-encacheable can be of

significant size, and thus dampening the performance gains sought

by the cache organization. Note that in MULTICS there is the

significant class of segments which we termed "non-writable" aO.i

which are encacheable. In data base computers, like INFOPLEX,

this category which contains things like utilities, libraries, and

translators will probably be much smaller, and thus decreasing the

portion of encacheable segments. (There could, however, be

- - - -

23

special applications where this cdoesn.t apply e.g., READ-only data

base applications.) In addition, the LMULTICS approach doesn.t

discriminate between two processes who although both have write

access to a segment, one actually exercises the "right" and

modifies the segment, while the other doesn.t. In both cases the

segment will become non-encacheable if there are other processes

that are also reading it. This means that in both cases the

system.s speed will be slowed down to the speed of main memory.

2. Using a store-through algorithm has its own performance

disadvantages. As was stated earlier, it limits the speed of

store operations to that of main memory, and thus defeating the

very purpose of using a cache.

3. The procedure of clearing up the cache is also a wasteful one.

Note that a cache is always cleared if its processor wasn.t the

last one to run the process. All access requests to the cleared

cache contents that would have otherwise been serviced by the

cache, must now wait for transfers from main meemory. This will,

obviously, slow down the system.

111.4. Conclusion

We have analyzed the three common approaches that have been

proposed in the literature to handle the consistency problem in

multicache systems. We have considered each approach in the context of

the INFOPLEX framework presented earlier. Within this context we were

able to identify some major drawbacks in each of the approaches.

24

In the next section we propose a new approach to handle the

cacke-consistency problem in multi-processor architectures. In Part

(V) we will evaluate the performance of this proposed approach.

I.

I

25

IV. THE "'NO-C.PN E / PENDI )TRANSACTICN BUS" APPROt]I

IV. 1 Introduction

We argued in the beginning of Part (I) that in a single-cache

memory system (Figure 6(a)), where there is only one access path

between each level, no cache-consistency problem will arise. Cnce two

or more caches are used, however, the potential for the problem

develops.

The "traditional" approach in employing caches in multi-processor

sytems has been to basically replicate the structure of Figure 6(a)

for eacti of the processors, as shown in Figure 6(b), and then solve any

problems that arise. A problem that arises, of course, is the

cache-consistency problem, and the three basic approaches that have

been developed to handle it are those of Part (III).

Implementing any of the three approaches will obviously constitute

scme processing overheatd. M an example, consider theS*1I•

26

-CI'U/Cache Bum

Cache

Cache/Main Memory Bus

Main Memory

(a)

I

CI~t~i(CPU

tCPU/Cache Bus

Cache

Cache?/Main M.m y "-us

Memory Memoryain MerModule Module Module

(b) Private Caches (PC)

CPUI CU U

CPU/Cache Bus

Cache Cache Cah I omo ahCaIhe Commn Cache

m- Lory Memorr - - .... .-- -. - -- -

[........... Module Modulehg Main Mamory

4o) Comon Cachem (CC)

Figure (6)

27

"Store-(bntroller" approach and refer in particular to the flow-chart

of Figure 5. When a CPU needs to rndi fy a line that exists in its

caclh, and the line happens to be a "non-private" line, then the status

of the line is first changed to "private" and the "modified flag" is

set. Next, the Store-(bntroller.s directory is clhcked, and if the

line is found not to be shared by other caches it is modified. If,

however, it happens to be shared, then messages are sent to the

appropriate caches to invalidate the line, the central directory is

updated, and finally the line is modified. How much overhead did we

incur to maintain consistency? Well, compare the above steps with

those needed for a uniprocessor system where the cache-consistency

problem does not arise. In such a system, when the CPU needs to modify

a line that exists in its cache, it simply proceeds and modifies it.

None of the above checks, updates and messages are needed.

The concern over the processing overhead needed to maintain

consistency is a legitimate one. Such overhead does undoubtedly dampen

the performance gains (e.g., system throughput) which are sought by

introducing caches in the first place. Attempting to minimize this

overhead, therefore, seems an attractive direction for research work.

In this research endeavor, we are basically proposing an

architecture that eliminates the processing overhead associated with

handling the multicache-consistency problem. The architecture is

depicted in Figure 6(c). The important distinction between this

organization and that of Figure 6(b) is that, here, each of the cache

modules is accessed by, and thus services, all of the processors.

Thus, just as main memory is common to and shared by all processors,

le

28

the cache level in our proposed architecture is also coo_.n to and

shareu by all processors. In such a scheme there is no need to store

more than a single copy of any data item at the cache level. This

eliminates redundancy. And with no redundancy at the cache level, no

inconsistencies can obviously arise, which in turn eliminates the need

for mechanisms to maintain cache-consistency and the overhead

associated with such mechanisms.

It is important to note that this architecture preserves the basic

intent behind cache-based systems, namely, using a high speed memory

level between the CPU and main memory in order to bridge the speed gap

between the boo. It, however, introduces the concern as to whether the

CPU/cache bus (see Figure 6(c)) can haidle the needed high volume of

traffic. Comparing the Private Cache (PC) architecture of Figure 6(b)

against the Common Cache (CC) architecture of 6(c), we note that the

CPU/cache bus in PC handles the transaction load generated by a single

processor where as in CC it handles the load generated by all the CPUs.

We can, therefore, expect that the load on the CPU/cache bus in our

proposed (CC) archi tecture to be close to N-times that of the PC

architecture, where N is the number of processors. Of course, the load

will be scxewhat less than exactly N-times because in PC some overhead

traffic will be generated by the mechanism used to handle the

cache-consistency problem, and which will not be need-d in (CC).

Thus, to re-state, the Conamm Cache approach eliminates the

cache-consistency problem (i .e., by eliminating private caches) but

there is a fundamental concern that the CPU/cache bus will develop into

a bottleneck that degjrades the system.s performance. In the next

29

section we introduce the Pended Transaction Bus Protocol, which we

believe provides the basis for a viable solution to the problem.

IV.2 The Pended Transaction Bus (FT8)

The degree of bus utilization of a precessor is a function of the

physical characteristics of the bus, and the protocol used on it. The

physical characteristics of the bus which include the length, voltage

levels, impedence, termination, capacitance, noise immunity, and

overall reflection characteristics, affect its operating speed.

Ultimately, any bus is limited by the speed of electricity along a

conductor (0.6 to 0.9 nanoseoornds per foot typically, depending on wire

characteristics). Careful electrical analysis and physical layout can

optinize these parameters to achieve reasonable electrical speed.

Given an electrical bandwidth of the bus, as formed by the bus

wires and the driving logic, the actual data bandwidth becomes a

function of the protocol used on it. The traditionally high bus

utilizations of multi-microprocessor architectures that employ the

single bus as their interconnection scheme is primarily a result of the

bus protocol used, and which is called the "master-slave" pcotocol. In

such a protocol, the CPU (master) asserts a request on the bus, and the

memory (slave) that receives it does the appropriate action (e.g., a

MEAD), arx] then responds (with the requested data). The bus is viewed

as "busy" during this entire time. A large portion of this time is

actually spent waiting for the slave zo complete the requested action.

- .i-.-----'~~1~---

m .Sa

JU

The electrical and logical time to tranasit the actual request and

return the renly are a relatively small portion of the total bus usage

time. The period of time between the request and the acknowledge is a

wait interval, and nm useful work is done with the bus during this

time. Bus utilization could be significantly reduced if we released

the bus during the wait interval. To do this we split the transaction

into two parts, a request part and a reply part. The master requests

the bus, and upon being granted it, sends the request to the slave at

maximum speed. The slave acknctvledgles reception of the request, and

starts to work on it. The processor then releases the bus and waits

hi for the results. When the slave completes its task, it asks for the

bus, and upon receiving it sends, the reply back to the originating

master at maximum speed. The time between the request and reply can be

used by other masters and slaves to transfer other messages over the

bus. Because the slave stores the inamnpleted request from its master,

it is called a pended transaction, and the bus protocol, developed at

M.I.T., is called a Pended Transaction Bus (PlrB) protooDl (Toong, et

al, 1 SO) .

In the above discussions, it was presumed that the slaves were

always able to accept the master requests when presented. This would

imply that each master was using different slaves to guarantee such

separation. Given the shared nature of the cache data, it is likely

that two (or more) masters may make requests to the same cache-memory

slave. The simplest scheme to resolve this contention problem is for a

busy slave to refuse a new request and make the requesting master retry

at a later time when the slave beoomes free. This, however, is

wasteful of bus bandwidth, since the time spent to send a request out

31

the first time, only to be refused, is not useful. Furthermore, the

slave may still be busy when the master tries again later.

Goodrich (Goodrich II, 1980) has proposed putting queues on the

inputs of the slaves t buffer requests. That is, any requests that

come while the slave is busy would be placed in a first-in-first-out

queue, and these requests would be serviced in order when the slave ia

able to handle them. Such a sclhme would reduce the bus load to what

is actually needed for tranrldssion, without any extra cycles. In

addition, it would also improve the slave response time as seen by the

master over the simple scheme described before, since the slave would

have the request in its queue, and would service it as fast as it

could, not just when the master is finally successful in transmitting

it. Note that a queue overflow need not be fatal in this. system. It

can be treated like a refusal in the pLevious scheme, and would simply

require the master to retransmit the request later. If the quete size

is sufficiently large, such refusals would be rare. Finally, for best

slave throughput, there should also be a queue on the output of each

-"slave.

IV.3 An Application: TVe INFOPRMD( Storage Hierarchy

In the above sections we have introduced an approach to the

cache-consistency problem in highly parallel multi-processor computeri ~systems, which we will call the "Common-Cache / Pended Transaction Bus"

appyoach (CC/PrB). Its too distinctive features are: Firstly, the Ithe 'Qxun~nCace / endd TrnsationBus

32

conventional one to one relationship between processors and caches

i.e., where each CPU has its own private cache, is replaced by an

N-to-M relationship, where a pool of cache-irodules is commonly shared

by all processors. Secondly, we propose the use of the Pended

Transaction Bus (PrB) as the interconnection scheme that connects the

processors and the cache-modules.

In this section we describe how the CC/PTB architecture can be

incorporatec into the storage hierarchy of the INFOL data base )computer. Schematically the proposed architecture would look like

Figure 7(b). We will, henceforth, refer to this architecture as

INFOPt•E/C (for Common Cache) while referring to the original Private

Cache ocganization (shown in Figure 7(a)) as the IOPLEX/PC

archi tecture.

The key INFOPLEX storage hierarchy operations are the READ and

WRITE. in INFWLEX/PC two strategies are reeded to implement these

operations, a strategy for the cache level and another for all other

levels. The reason for this is manifested in Figure 7(a), where it can

be seen that the organization of the cache level is different fran the

other levels of the hierarchy. In IN LEX/CC, on the other hand, the

cache level organization is very similar to that of the lower storage

levels. As a result, the strategy used to implement the READ and WRITE

operations could be the samew for all levels of the hierarchy with very

minor provisions to account for the few differences that do still

distinguish the cache level. (For example, the absense of an MRP at

the cache level.)

L•I~A a•j

33

CPUt CPU$

Level

SSIC

Busr--mory

L Level

(a) Private Cache (PC) Architecture

CPUI CPUS

IBSGlobal

Bus

SL A iUSin Memory Level

(b) Coeuon Cache (CC) Architecture

LSUSI: Local BUS at level '1) - cache level

LBUS2: Local bUS at level (M) - min mem"ry level

SC: Storage-Corntrollet I needed only If 'Storage-Controller* approach

Is Impleterted

SLC: Storage Level Controller - It couplet a stor&;e level to theGlobal bus (I.e. serves as a gateway beNen levels)

MAP: Remry Roqjost Processor -perform., addr•ess empplag function

Li2ure_7

34

We will attempt now to explain briefly how the basic RAD and

WRrTE operations will be implemented in the INFOPLEVO(/C architecture.

To a large extent this will be based on the work of Lain (Lam, 75.),

where a much more detailed and complete discussion is presented.

All READ and WRITE operations are performed in the highest storage

level L(l) i.e., the cache level. If a referenced data item is not in

L(l), it is brought up to L(l) fran a lower storage level via a

READ-'rHROUQH operation. The effect of an update to a data item in L(l)

is later propagated down to the lower storage levels via a nuTber of

SIURE-BEHIND operations.

When a READ request is issued by a processor, the cache level is

checked to see if the requested data is in it. If the data is found in

a cache-module, it is retrieved and returned to the processor. If,

however, the requested data is not found in the cache level, a

READ-TOUGH request is queued to be sent to the next lower level L(2)

via the Storage Level Controller (SLC). As mentioned in Part (II) the

SWC serves as a gateway between the storage levels of the hierarchy.

At a storage level, a READ-JMUROGH request is handled by the

Memory RL quest Processor (MRP). An MRP perfotrms the address mapping

function. It contains a directory of all the data maintained in the

storage level. Using this directory, the MRP can determine if the

requested data is in one of the storage devices at that level. If the

data is not in the storage level, the RFAD-FHRCI•QI request is queued to

be sent to the next lower storage level via the Storage Level

Controller (S). .

LAI

35

If the data is found in a storage level L(i), the MLP maps the

main memory address of the requested data item into its real address in

L(i). This real address is used by the appropriate storage device to

retrieve the block containing the data and then passes it to the SIC.

The SLC would then broadcast the block to all upper storage levels by

dividing it into fixed size packets. Each upper storage level has a

buffer to receive these packets. A storage level only collects those

packets that assemble into a sub-block of an appropriate size (peculiar

to the storage level) that contains the requested data. This sub-block

is then stored in a storage device. At L(l), the sub-block (i.e., the

cache line in this case) containing the requested data is stored, and

the data is finally sent to the processor that initiated the request.

In a WRITE operation, the data item is written into a

cache-module, and the processor is notified of the completion of the

WRITE operation. We shall assume that the data item to be written is

already in L(l). (This can be realized by reading the data item into

L(l) before the WRITE operation.) A STORE-BEHIND operation is next

"generated by the cache-module and sent to the next lower storage level.

SINFOPLFX uses a two-level STORE-BEHIND strategy. This strategy ensures

that in a hierarchy with N levels, an updated block will not be

considered for eviction from a storage level L(i), until its "parent"

blocks at levels L(i+l) and L(i+2) are updated. This scheme will

ensure that at least two copies of the updated data exist in the

storage hierarchy at any time. The motivation behind using such a

strategy is two-fold. Firstly, the reliability of the system is

enhanced because at least two copies of newly written data are always

maintained until the data is "securely" copied at the lowest level of

4I

36

the hierarchy. Furthermore, the STORE-BEHIND strategy allows for the

updating of lower storage levels to be carried out at slack periods of

system operation, thus enhancing performance.

In the above discussions we didn.t show how a specific

cache-rwodule can be correctly selected to handle a READ or a WRITE

operation of a particular data item. In all but the cache level this1.function is performed by the Memory Request Processor (MRP), which maps

main memory addresses into their physical equivalents at a particular

level. What we need, therefore, is to augment the

processors/common-cache interface to translate main memory addresses to

physical addresses in the cache-modules.

There are two possible places to perform t-he translation

operation: at the processor interface and at the cache interface. At

the processor interface, the address gets translated before it reaches

the bus. This requires that each processor be "informed" about all

current lines at the cache level. This information would be used to

translate all the main memory addresses of these lines to their

equivalents at the cache level. An advantage of this is that the

translation can be performed while the procesor is arbitrating for the

bus, and if the translation operation is fast enough, it can be done

without any access time penalty.

In the second possible scheme the address gets translated at the

cache-module interface. What would be needed here is a mechanism

incorporated in the recognition circuitry of each cache-module that

would translate the main memory address put on he bus, and that would

37

accordingly decide IF the desired data item is present in thL cache

level and if so WHERE i.e. in which cache-module -nd in which location

in it. Both these should be done as fast as possi I1h. Tag directory

schemes incorj:orating set associative mapping are suggested by Mattick

(Mattick, 1977).

In Part (V) we will evaluate the performance of both these schemes

when implemented at the cache level of the INFOPLEX storage hierarchy.

4 • -' )

S.. . .. ....

38

V. EVALUATING THE PERFORMANCE CO THE "Ca44ON-CACHE / PENDED

TRANSACTION BUS" APPRC"

V.1. Introduction

To evaluate the performance of our CC/PTB approach we used the

INFOPLEX multi-level storage system as our test case. Very slight

modifications to the existing design were needed to incorporate

"CC/PTB" into the INEFLEX storage stucture. For example, instead of

using the four level memory hierarchy studied previously by Lam (Lam,

197%) just two levels, the cache level and main memory were sufficient

for our purposes.

The "CC/PrB" approach was o~mpared against two benchnarks.

Firstly, we evaluated the performance of the "Store-Controller"

approach as a representative of the traditional approaches. Selecting

the "Store-Controller" approach to evaluate our scheme against cannot,

however, provide an effective answer to the question of how "optimal"

either approach is. What is needed is an "absolute" benchmark against

39

which both approaches could be judged. For this purpose we evaluated

the performance of a traditional private cache architecture assuming no

overhead for handling the cache-consistency problem. This, then,

constitutes the "performance ceiling" that no cache-consistency

handling mechanism (for INFOPLED) could possibly exceed. It is also a

valuable reference point in that it tells us how close we are to an

"optimal" solution.

V.2. The Evaluation Tool

The evaluation was perforred by producing a simulation model in

GPSS. We developed four separate GPMS programs:

1. Program "OPr" ignores the cache-consistency problem, and thus

incorporates no overhead for handling it. It provides us with a

ceiling an performance.

2. Program "SMCR" incorporates the "Store--Controller" approach.

3. Program "OC/PIWC" incorporates ..he "CC/PTB" approach with the

translation operation performed at the cache-interface.

and 4. Program "OC/PTB/P" incorporates the "CC/PrB" approach with the

translation operation performed at the processor interface.

The GPSS code for each of the above four programs is presented in

a separate Appendix (Appendices I through IV). All four programs are

highly detailed simulators. A widely used index that characterizes the

degree of detail of a simulation model is its resclution in time, which

Lh i-.J. 77I'77J ~ TZj7

40

is defined as the shortest interval of simulated time between two

consecutive events being considered (Ferrari, 1978). In our four

simulations, the resolution in time is 10 nanoseconds. Such detailed

simulators are likely to be more accurate and to have broader field of

application than less detailed ones. However, they are certainly more

expensive to design, implement, test, document, and use.

In addition to deciding on the degree of detail of the models,,

another basic decision had to be made concerning the workload that

would drive the simulations. We decided to perform our measurements in

an operating environment that is not at all uncommon in present-day

computer systems, namely, running at capacity. Under such conditions,

there will always be at least one transaction in the system.s input

queue(s) waiting bo be serviced. In such a case, the performance

characteristics that are of interest to us, such as throughput per unit

time, beame insensitive to the distribution of job arrivals.

Before concluding this section we would like to emphasize some of

the structural differences between the four nrodels, as well as some of

their common characteristics. The basic structural differences are

highl ighteo in the diagrams of Figure 8. In Figure 8(a) a

two-stoL-.ge-lcvel version of the INFOPLhC storage hierarchy proposed in

(Lam, 110b) is shuwn. To support the "Store-Controller" approach an

SC "box" (dotted in Figure 8(a)) is added to the architecture. In

Figure 8(b) the architecture we proposed in Part IV to support the two

versions of the "(OZ/PTB" approach is incorporated into the INFOPLEX

storage system. Notice that in the CC/PrB architecture we deliberately

maintained the same number of caches (5) as in Figure 8(a). It is

a.

41

CPVI CPU%

Cache I Cache$5 Cache

SC'klobal I CIus ...

PlainL7 "ems ry

SL LBLevel

(a) Private Cache (PC) Architecture

CPUl CPes

sic ILBUSl

al obalBus

(b) Conan Cache (CC) Architecture

LBUSI: Local BUS at level (1) - cache level

LOUS2: Local BUS at level (2) - main memory level

SC: Storage-Controller - needed only If "Storage.Controller" approachis implemented

SLC: Storage Level Controller - It couples a stormse level to the

Global bus (I.e. serves as a gateway bet.een levels)

MRP: Pe'ory Request Processor - perform address napping function

Figure (8)

42

important b realize that while a 1:1 relationship between the CPUs and

the caches is inherent in the Private Cache (PC) architectures, such is

not the case for the Common Cache (CC) architectures. We, however,

chose to maintain the same number of caches in both architectures (and

four simulation models) to neutralize it as a factor that might affect

performance.

Finally, there is a set of ommion characteristics that are shared

by all four models, these are:

Degree of Multiprogramming of a CPU = 10Bus Width = 8 bytesSize of Transaction without data = 8 bytesSize of Transaction with data:

at Level 1 = 8 bytesat Level 2 = 64 bytes

Size of Data Buffers = 10 64-byte transactions

Finally, the Pended Transaction Bus (PTB) protocol will be used in

all three architectures (and four models). That is, we are committed

to the PTB protocol in INFK)'LEX as a result of our experimental work

(at M.I.T.), which demostrated its performance advantage.

V.3. The Simulation Fxperiment:

.,r criteric- for measuring performance in this experiment, and

which will also serve as our dependent variable, will be the total

system throughput as measured by the total number of transactions

processed per unit of time. As for the independent variable, there

43

were two candidates: (1) the hit ratio, which is the percentage of

time that a referenced data item is found in the cache level; and (2)

the transaction mix i.e., the percentage of READ requests versus WRITE

requests.

The latter was chosen because we felt it is in a sense, a more

independent variable. What we mean is that the transaction mix is

largely a function of the use of the system and as such we have very

little control over it. The hit ratio, on the other hand, is

system-dependent. It can he affected by manipulating such system

parameters as the total size of the cache level and the size of the

individual cache blocks.

Thus, the hit ratio will be held constant, throughout the

experiment, at a value of 0. 90 for READ requests and 1.00 for WRITE

requests. (That is, we are assuming that a WRITE of a data item is

always preceded by a READ to it.) The transaction mix i.e., the

perceritage of READs, will be allowed to vary in the range from 70 % to

90 %. This is the range in which the mean READ rate lies for most

processor architectures (Censier and Feautrier, 1078).

Both the hit ratio and the transaction mix are variables that

affect the performance of the system but which are independent of the

mecnanisms used to handle the cache-consistency problem. There are,

however, variables that are peculiar to the particular mechanism used,

and which influence the system.s performance significantly.

In the "Store-4Yontroller" approach the degree of sharing between

LAW",----- ~ 7T~L7JY;i.z:zi-n ~-

44

the caches is by far the most important such variable. We evaluated

the "Store-Controller" approach under two operational modes, a

"Pessimistic" mode and an "optimistic" mode. Under the optimistic mode

no sharing between caches takes place, yielding an upper bound on the

performance of this approach. Under the pessimistic mode a high degree

of sharing will be introduced (50 % of the cache blocks will be shared

by more than one cache-module). This will then provide us with a

oznservative lower bound on the performance of the "Store-Conttoller"

approach.

With respect to the "CC/PrB" approach we ..11 also follow the

above strategy, and evaluate it under both "pessimistic" and

"optimistic" conditions. Under the optimistic condition we will assume

the load on the cache-modules to be uniformly distributed (i .e., each

of the 5 cache-modules carries 20 % of the load). And for the

pessimistic case we will assume the load to be linearly distributed

between 10 % at the least loaded cache-module and 30 % at the most

loaded.

V.4. The Hardware Parameters

:t is necessary to determine the speeds of the different hardware

components in the IDOPL'M storage hierarchy. In particular, what we

sought were the best possible I.85 projections for these speeds, since

the first hardware prototype of the INFOPLEX data base computer was not

expected before then.

i -

F.....745

The forecasts shown below constitute a realistic, but ambitious, Jscenario for lS85. In other words, they incorporate the fastest

possible owuonents that we envision as being available (and

appropriate) for building an INFPLFEX in 1985. (Only a subset of the

parameters are shown.) The bus speed (b) is 10 nanoscooris, the cache

READ/WRITE spkxtd (c) is 20 nanoseconds, and the rcmaining parameters

are multiples of the latter, as shown. In other words, we used the

cache speed as a logical building block to "build" the forecasts of the 4

other hardware components (other than the bus). For example, if the

caclhe REAO/WRr E speed is 20 nanoseconds the READ/WRITE speed of main )memory would be 10 x c = 200 nanosecoonds. The parameter values are:

Bus speed (b) = 10 nanoseoondsCache READ/WRITE speed (c) = 20

Directory LIookup (2c) - 40Directory Update (4c) M 80Storage Level Controller (SW.) sp~Ed (2c) = 40Main Memory READ/WRITE speed (10c) = 200

Three other scenarios will be tested. The bus speed, in all

three, will remain at 10 nanoseconds. The cache READ/WRITE speed,

however, will take the increasing values of 40, 60, and 80 nanoseconds.

And finally, the remaining parameters will maintain their relative

values in terms of n, the cache READ/WRITE speed. For example, when

the cache RED/WRITE speed (c) becomes 40 nanoseconds the READ/WRITE

speed of main memory will be 10 x c - 400 nanoseconds.

Selecting several "good" 1585 scenarios reflects the fact that

different options, in building an INFr3PLEX, will be available to

aXaccomoxdate the cost/performance tradeoffs. rund because the bulk of the

- .J ZJ

46

system.s cost lies largely in the storage omxnents, the variations in

the forecasts between the different scenarios involved mainly those

o4 4.qnents.

V.5 Analysis of the Simulation Results:

ASt mentioned previously, our dependent variable in this experiment

is system throughput. It is also :he criterion we use to evaluate the

perforrk-ince of the cacho-consistency handling mechanisms.

The throughput of any computer system is bounded by one of two

factors, nawely, Lottlenecks in the system or the transaction load on

it. In section V.2 we mentioned that our models will operate in a

maximum load closed-loop environment, where new transactions are

continuously generated to replace serviced ones. In such an

environment, bottlenecks will definitely arise, limiting the throughput

of the system.

Thus, in the process of interpreting the simulation results, we

wish Wo identiLy and analyze system bottlenecks. In such an analysis,

one neeu. tu cotisider four major factors that directly influence the

evollution of a particular system component into bexooming the system.s

bottleneck. The four factors are:

1. Architecture: Oonsider for example the (a) PC and (b) CC

architectures of Figure 8. In PC the local bus of level 1 (the

47

cacne level) handles only the communications between the five

caches and main memory. In the "CC/PrB" architecture the cache

level bus must handle, in addition, the communications between all

the processors and the cache-modules. The chances for the local

bus of level 1 (LBUSI) to become the system bottleneck are,

therefore, much higher in the CC architecture than they are in PC.

2. Algorithms and Protocols: The set of algorithms supported by

an archi tecture and the protocols and meclanisms used to implement

them widoubtedly influence the utilization patterns of the

different architectural components. Consider for example the

central role of the Store-Controller "box" in implementing the

algorithmi-s that support the READ and WRITE operations in the

"Store-Controller" approach. Such a role will inevitably lead to

high levels of utilization of the Store-Controller.

3. Workload: We mentioned in section V.3 that we intend to

evaluate the "CC/PTB" approach using two different load

distributions on the cache-modules, a uliform distribution and a

linear one. In the latter case, the cache-module carrying the

highest load (i .e., 30 % of total load) could clearly develop into

a system bottleneck.

4. Hardware Components. Characteristics: The characteristic of

significance here is speed. For an example we refer again to

Figure 8. All communications between -level 1 and level 2 in the

INFOPLEX storage hierarchy go through the Storage Level

Controllers (SLCs) of both levels as well as through the Global

Bus. The time needed by the Global Bus to process any of the

communication messages (i .e., the time it takes to transmit the

message) will usually be less than that needed by the SWC to

-*------- -- *-------- ---

48

process the same message. This means that the SIC will always

saturate before the Global Buu can develop into a bottleneck.

As part of the standard GPSS output, the utilizations of all

hardware components (that are modelled as GPSS "facilities") are

printed. This allows us to precisely identify the system

bottleneck(s). An example is shown in Figure 9. In this case the

bottlenck is LBUSI (the local bus at level (1)) with a utilization of

approximately 100 %.

In Figure 10 our simulation results for the four scenarios are

presented. In the discussion that follows, we will identify the

different scenarios by their cache speed/bus speed ratios (n) (i.e., n

= 2, 4, 6, or 8) as it is a convenient parameter that completely

characterizes each.

For each technology scenario (i .e., n value) seven curves are

plotted. The single solid (-) curve portrays the performance of the

"CWr" model, in which no mechanisms for handling the cache-consistency

problem (and, therefore, no overheads) are i ncrporated. Thus the.

throughput of the "OFr" model, as is demonstrated in the figure,

provides a ceiling for all the other models. The "Store-Controller"

model.s results are depicted by the two dash-and-dot (-.-) curves (Sl)

and (S2). Curve (SI), which is always the dominating curve, is for the

ca.se where there is no data sharing between the caches, and (S2) is

when 50 % data sharing is introduced. And finally, there are two

curves for each of the two implementations of the "TC/PB" approach.

The two dashed (-...) curves (P1) and (P2) belong to the "CC/_TP"

-f-- - .---- ,--------'

49

I

FACILITY AVERAGE NUMBER AVERAGE SEIZING PREEMPTINGUTILIZATION ENTRIES TIME/TRAN TRANS. NO. TRANS. NO.G.JS .474 1199 3.9c9LBSI P. .999 6514 1.535 59LBUS2 .702 1702 4.126 72

0P11 .396 679 5.846 37ORPi2 .405 695 5.833.411 703 5.849ORP14 .369 629 5.879

RP 15 .398 681 5.856KRP1 .479 1199 4.000I•P2 .485 1214 4.000

R4P2 .2S7 719 3.997 52DRP21 .198 496 4.000

Figure (9)

5,9

CAL t Iu ) (In0

,•. .,,. \, -

J- ill

Cx3L

cc oI.- I X\

\.." -a,

..-.-. (~COD•

- I!

I.--- •*. , 0

0N.. ON~..c

N.• •." . . \\

CD

m. Cie

N1. \.\

I. -C*• --"-, I- *"

IN,-

N N.

-0

9x C6.

464 u c U

•- ',- - -C.)D

-CF g rON

- - ) I" . . .,.)

CD UD

R- z

a~ c to NW N %0 C -3 OD -D

Figjure (1.0)

51

implementation, while the two dotted (...) ones (CI) and (C2) are for

"OC/PTB/C." The two dominating curves in both implementations, namely

the (P1) and (Cl) curves, are for the case where the load is uniformly

distributed among all cache-modules. The two other curves (P2) and

(C2) depict the performances under the linearly distributed load.

As Figure 10 demonstrates, the "CC/PrB" architecture, in both its

implementations, cnmpletely dominates the "Store-Controller" in three

out of the four technology scenarios. Qnly in the first scenario (n

2) does the "Store-Controller" show a performance advantage and only

for high READ rates. The remaining part of this section will be

devoted to an analysis of the different factors affecting the

performance patterns demonstrated in Figure 10. Of particular help in

conducting this analysis is the information of Figure 11 depicting the

system bottlenecks in all the cases tested.

There are two basic patterns that deserve separate analysis. The

distinctive pattern of n = 2, and the pattern ommon among the three

other scenarios, n = 4, 6, and 8. To analyze the latter we will

arbitrarily pick the case of n = 6 as our analysis ve.dcle.

V.5.1. Case of n = 2

To many, the most surprising aspect of these results will be the

almost identical shapes of the "OPT" curve, and the "Store-Controller"

curve (Sl). To understand why this is so, we first note from Figure 11

that in both models LBUS2 (i.e., the local bus of level 2) is the

bottleneck. Now, the difference between the "Store-Controller" model

52

n 2 n=4 n=6 n=8

% READS % READS % READS % READSProgram I ! II 1L8 ,

70 80 185 90 70 80 85190 70180 18590 70180 85 9o

LBUS2 SLC

OPT . " / ... .... .. _ _,_

"SCLBUS2

STCR

LBUS2 SC

LBUSJ2 LUSU SLC

CC/PTB/Cin--LBUS1 SLC C1 * SLC SLC

LBU 2 LBUSI SLC

CC/PTF3/P

LBU,42 LBUSI SLC

*C is the Cache-Module carrying 30 percent of the load

Figure (11)

53

anm the "OPT" model lies in the overhead necessary to handle the

cache-consistency problem. All this overhead (in the

"Store-Controller" model) is in the form of additional operations which

all take place at the cache level. Thus, while removinq this overhead

(in the "(0pr" model) will necessarily decrease the load on the cache

level, it will not decrease the load on level 2, and in particular on

LBUS2. Thus, since LBUS2 is the bottleneck in both cases, and since

the load on it (by an "average" transaction) remains the same, the

performances in both are very similar. Flor the same reasons, the other

five models, namely, (S2), (P1), (P2) , (Cl), and (C2), have

performances close to that of "OPT" for % READs ,= 80 (i .e., they all

have LBUS2 as the system bottleneck).

The fact that LBUS2 is the bottleneck is itself, by the way, an

interesting finding. With a hit ratio of 0.90 for READ requests and

1.00 for MITE requests most of the "action" is clearly done at the

cache level. It would, therefore, seem that LBUSl must be saturated

before LBUS2. The answer goes back to the parameter values of section

V.2. Notice that the transaction size for level 2 (64 bytes) is eight

times larger than that for level 1 (8 bytes). This means that a single

transmission on LBUS2 will be approximately eight times longer in time

than a single transmission on LBUSl. Thus, although the absolute

number of transmissions on LBUS2 is less than that on LBUSl, the

utilization of LBUS2 in this case is higher.

Another interesting observation re]ates to curve (S2) of the

"Store Contruller." Curve (02) i3 always below curve (SI) because in

(S2) the "Store Controller" (SC) is the system bottleneck with a

Lh f ...--..... . .. •- ' - ..... --- -- =.,r---•

54

saturation point lower than that of LBES2. The reason this happens is

that the high degree of data sharing between the caches (and which is

"orchestrated" by the Store-Controller) means a higher utilization of

the Store-Controller by the "average" READA/rFE request. Notice also

that the bi curves (Sl) and (S2) diverge as the % of READs increases.

This is merely a reflection of the pattern by which the READ and WRITE

operations utilize the two respective bottlenecks, LBUS2 and SC.

(Straightforward analytic calculations would demonstrate that the

effective capacity of LBUS2 increases faster than does that of SC as

the % of READs increases.)

Next let us turn our attention to the "OC/PTB" results. Notice

first the deflections in curves (PI), (P2), (Cl), and (C2). This

happens because at approximately 80 % READs LBUSI (and not LBUS2)

beomes the system bottleneck in the four cases. However, even though

LBUSI is the bottleneck for both "CC/PrB/P" and "CC/PTB/C", "(C/PrB/P"

clearly domxinates. This simply is because an "average" READ/WRITE

request utilizes LBUSI less often in "OC/PWB/P" than it does in

"OC/FTB/C." For example, in CC/PTB/P when a requested data item is

found by the processor not to be in the cache level, a request is sent

to main memory via LBUSL. In CC/PTB/C, on the other hand, a CPU

request must first go to a cache-module (through LBUSl), only to be

found unavailable, and then forwarded by the cache-module to main

memory through LBUS1 again.

Notice finally the negligible effect that the load distribution on

the cache-modules has on performance (i.e., curves (Cl) and (C2) are

similar, as well as curves (P1) and (P2)). The reasons for this are

- -- ,J----

55

completely analogous to the ones explaining the close resemblance

between the "OPT" curve and curve (Sl). In other words, changing the

load distribution does not affect the utilization of LBUSl. As long as

no new bottleieck develops because of the change in the load

distribution, LBUSI will remain to be the bottleneck, and thus maintain

approximately the same throughput. The utilization of the cache-module

that carries the highest load under the linear load distribution for

both "(C/PrB/P" and "CC/PrB/C" turns out never to exceed 60 %, which is

far below the 100 % mark that has to be approached before it would

replace LBUS1 as the system bottleneck. This, in a sense, is very

comforting to know. It shows that the behavior of the models is, to a

large extent, insensitive to the shape of the load distribution on the

cache-modules that we used.

V.5.2. Case of n - 6

Most of the ideas of the above discussion are applicable to the

case of n=6. For example, "OPT," "OC/PTI/P".s (P1) and (P2), and

"CC/PtB/C".s (Cl) all have almost identical performances because they

all have the same bottleneck, namely, the Storage Level Controller

(SW).

Notice, on the other hand, that because the bus is now relatively

faster in omqparison to the other system oxnponents, and in particular

to the Store-Controller (SC), LBUS2 ceases to be the bottleneck in the

two "Store-O~ntroller" models. Instead, SC is now the bottleneck, and

the degradation in performance is quite evident. However, notice that

even though SC is the bottleneck for both (SI) and (S2), the

- - -..- -

56

throughputs are quite different. The reason for this is that an

"average" READ/WRITE request in (S2) (where there is 50 % data sharing)

requires more "services" from the "Store-Controller" than does an

"average" READ/WRITE request in (Si). (When data sharing exists

between the cache-modules, the SC has, for example, to invalidate

redundant copies when a data item is modified.)

The only remaining result that deserves some explanation, is the

deflection exhibi ted in curve (C2) of "OC/PTB/C" at % READs = 80. The

reason for this beha,-ior is that somewhere between % READs = 80 and

% READs = 85 the most heavily loaded cach(-module (i.e., the one

carrying 30 % of the load) replaces SrC as the system.s bottleneck.

(See Figure 11). Notice that this does not happen in "CC/PTB/P.s"

curve (P2) even though the same linear load distribution is used. The

reason for this is because, even though, the cache-modules in both

cases are subjected to the same load distribution, they are not

subjected to the same load. This, of course, is because the READ/WRITE

operations in the "OQ/PTMP" implementation use the cache-modules less

often.

V. 6. Conclusion

The above results clearly indicate that no one approach dominates

over all four teclmology scenarios. Technology, therefore, must remain

as an element of some uncertainty.

57

It is important to realize, though, that what is really important

in our technology forecasts is not the absolute values of the different

speeds, but rather the relative values for the different hardware

components. And the most important such value is the relative speed of

the bur, vis-a-vis the processing components that use it. Our results

clearly demonstrate that the faster the bus is relative to everything

else, the more appealling the "CC/PTB" approach beaomes. More

specifically, when the bus is four times as fast as the cache or faster

(i.e., n ¢= 4), the "OC/PTB" approach provides a 20 to 60 % performance

advantage over the "Store-Controller" approach.

But, what perhaps is the most interesting finding, is the fact

that for three out of the four technology scenarios (with r€ (= 4) the

performance of our "(C/PTB" scheme is very close to that of "CPT."

Notice that in the above statements no attempt was made to single

out any of the two different implementation schemes of the "CC/PTB

approach. Qie of the interesting findings in the simulation results is

that the performances of both schemes are very close indeed. Our own

intuition was that the "(C/PIWP" ivplementation would display a

performance advaitage. It was clear, that by utilizing "fast"

processors that would overlap address translation with arbitrating for

the bus, the utilizations of the bus and the cache-kodules would

decrease. Although the utilizations were indeed lower, this did not

materialize into the higher performance we anticipated. The reason:the execution of the INFOPLEX storage hierarchy operations and storage

mechanisms was such that the storage level controller (SW), whose

utilization is independent of the "CC/PTB" scheme used, would, in most

- - -1

58

cases, be the first ooqponent to saturate. The only exception to this

is when, under the linear load distribution case, the most heavily

utilized cache-module developed into the system.s bottleneck. In such

a case the higher performance potential of the "CC/PTB/P" scheme was

indeed realized.

! -I

b

59

VI. CCNCLWICN

This paper is a report on an ongoing research effort at the M.I.T.

Sloan School of Management to study the multicache-consistency problem

in highly parallel milti-processor computer systems.

There are three basic approaches proposed in the literature to

handle the multicache-consistency problem: The "Broadcasting"

approach, the "Store-Qontroller" approach, and the "Multics" approach.

However, serious drawbacks c,.. be identified in each. A new approach

called the "CC/PTB" was, therefore, developed. It attempts to minimize

performance degradation by minimizing the overhead of maintaining

cache-consistency.

The "OC/PTB" approach was implemented in the INFLEX sborage

hierarchy end evaluated using simulation modeling. The results are

very favorable.

I

60

This work opens up many areas for further investigation. No

mention was made in this paper, for example, of the replaoement

algorithms at the cache level. It would be interesting to find out the

value of implementing a "sophisticated" algorithm such as the "Least

Recently Used" (LRU) algorithm (or versions of it) as opposed to a

naive algorithm e.g., random replacement that requires much less

hardware overhead.

We have assumed, as is common in the literature, that all WRITE

operations for a particular data item are preceded by READ operations.

Relaxing this constraint, and developing efficient algorithms to

exploit both this relaxation and the architecture of the storage

hierarchy is definately worth investigating.

And finally, the "CX/PB" architecture should be exploited in the

development of algorithms that would improve the reliability of the

data storage hierarchy. The automatic data repair algoritTms, for

example, are particularly interesting and prcmising.

61

BIBLIOGRAPHY

1. Censier, Lucien M. and Feautrier, Paul. "A NLw Solution toColherence Problems in Multicache Systems," IEEE Trans. onComputers, Vol. C-27, No. 12, (Dec., 1.8,1112-1118

2. Ferrari, D. Computer Systems Performance Evaluation. 41glewxodCliffs, New Jersey: Prentice-lall, Inc., 1798.

3. Goodrich II, E. R. "A Distributed Operating System for a CentralShared Bus Multiprocessor System." Unpublished MasterThesis, M.I.T., 1960.

4. Greenberg, B. An internal Honeywell memorandum, 1978.

5. Kaplan, K. R. and Winder, R. 0. "Cache-based Computer Systems,"Computer, (March, 173), 31-36.

6. Lam, C. Y. "Simulation Studies of the D[H-II Data StorageHierarchy System." M.I.T. Sloan School Internal Report No.M010-7M06-04, 197%.

7. Lam, C. Y. "Data Storage Hierarchy Systems for Data BaseComputers." kiub.lfished Ph.D. Thesis, M.I.T., 1979b.

8. Madnick, S. E. "INFCLEX - Hierarchical Decxposition of a LargeInformation Management System Using a MicroprocessorComplex," AFIPS Proc., Vol. 44, 1973, 581-587.

9. Madnick, S. E. "The INFOPLF( Database Computer: Concepts andDirections," Proc. IEEE Computer Conference, Feb. 26, 1979,168-176.

10. Matick, R. Comuputer Storage Systems & Technol.g. New York: JohnWiley & Sons, 1c.77.

11. Tang, C. K. "Cache System Design in the Tightly CoupledMultiprocessor System," AFIPS Proc., Vol. 45, IV76, 746-753.

12. Toong, H. D.; StzcmTen, S. 0.; and Goodrich I, E. R. "AGeneral multi-Micropcocessor Interconnection Mchani sm forNbn-Numeric Processing," Proc. of the Fifth Workshop onComputer Architecture for Non-Numeric Processing, 1980,115-123.

62

APPENDIX (I) The "OPT" Program

61

UJII'-

00

Uw

v 0e

0 n.O t ULIJto

>0

a. a. (ý . .J.

It I -JJ01 u is s U,

- or0 uI < . 9 " i

l- .C --.---C K Ii

v ZLL IL) J U. -

0 ý~I~-JIt .. I U (At t 3In a3 0 -Z'J 0' 0- *n *4It ~ LA. **L* In * O *4

a.U~f WV>W WW CL a C) U000~00O a A - L 0 . lý

m- *LU0 1000z0 xR A -. M 0 wCCC s 1 Iai ar a .4 aL x I08 0

C) 0 0 M W 1, -C) M* * . * * A** aa *O6 **U-@O*811;wulw14

CL ~ ~ ~ ~ 4_ _ -1 X- r - -C-. )WUJ- 4I' )V

64A;

00fe 00..

cc0

otc t I.- .

0 W L>- 00t .;U z -

LAQ~J II l U.1 in

0 - CY(T I-jD 0w 0 1 - wJ Iit-ý 0 1--.

0 j VZ A0 -UM 70 -.z V)c 0l z aw a aS0

0. zs w w: ý

Ou01 0 , l t DD u-. I.4- -j 0)-~

2 . 0 z 04 A. L

0- 0. X 000J~t W (J(

U -I it Il n n -PC 4-U1s "11 4.I3 I

0.00.0~~ uulw.- .

! -t 4. .. I. -C0 000 4- k . 9

It -A Iti

IL0: I. a .bw .www zzz

-j tonI z V1 Vsi.J.J..M.J..jV.IX u(cJx.

BP I,

4 L

w

•C%

I.-

00000

In In 0S00000 0 ' 00

-A* * -In #An I'll

C) 040M in '.

I - it %- 0 0

414In in4 -04A1

.- A #A In

z .. .'" 1 .. :. .. .. , - . , . .. , ,

0 .C1C if -0 0 CA

U-u _0 0 -0

00

InA m AI)-

: 4 . 1. 0 10 100

3o to w.. -tot A cc1 A(A V) -.-- (At n 4 A ; 4 ,

CL innni ~ vIn 'S) -ji '

3 4 0 00 0 I S xS -'

I3 4y a- - 0 0 0 cc0 LL it4* 0-'MI . 4 .- **.* it ** **44

it IVU J 4y or W V W M WD El *3 0 C. 0 111 jt

ft 3 LL *j IA14 In IA t4 V)L As 0 i 4.1 Is 44

: t1 w * 4, s-

66

0I

am

w

z

a.

IO

0

2-Si ' - .- 1 9 4 . x0an" IV Aal4 A4 0ilI(.t A-z&UU a ý &.L &I. IV na oAWL

;ozzz z z z .U IA A .. =;2:zzz z0- i Aa A(

4/ ; ;/ I M ?a41 14 : )= ): 3: = ): ) * * I 1D : ):

- *t #A 440 4- 1- 1- #A w- v) 401440 #A 0- U) 4/4 4- "4 w an4

:1 4 I..I& . w wI&wI wwL WI : w %L4.L J.I4...4 4J44&4/4,4A LIU.U4L La. I.I

a n o" Il44/i i '4~n IiII I init"I mIIi/U inZ* Or 41In In I 1) n n4/l4n .0** -f -11*. 411RfU4j -de -9

al II LaCt 4 C4 a 4 4,

- ~ ~ ~ ~ ~ c 'A .4 g 44 -ac 61 4r tat44444SJ4W 4.44W 4 1.- -9 OrWLJ W.j 03 0- 4o .5m ~ olo ~ x , in 444 Si- 44 .4 Ic x co o.. Ip Or & at. J

4 ~ ~ ~ ~ ~ ~ ,QL IL X 441 9 g144x4fl444x4fitfl4111xmx44441x 3194111:t444I dc fr~ni4fm

67

II

Lu

th1AI

0r

ui-

NNN NN

-,Jý 9:0

01I 0ý

VI-h#Ai

04 (N N (N (N (d CV

10 in it 1 )C I

to ( t.) 0 0l v 00t- Z1) D- 43 n~ D D0zzzzzzz 90. ua+u

md(I3 ~ V 4k 0 R& *0 S v 4k

j * * u" 0 4 * 0 *L

W z z z I- I" z ~03 .9 It & w o4 IA.SO U

>) > > >* cI. 4. 0 I0 j 0i 4

CL wo o d - I 0 0n .9 a 4 0 l 4 - 0L >*nc

44 44) I-3 02x1 LL I- I- W 041 .

(N(N(NrA(N(NLN 1 0

W Fo-40 ~ S I0 3 4110

68

0

fA

0

zo w

> -4 -9U

0

•~ n,

I- Iou

m W

I-a 44 i

o 4I U w. in U.

x ITU

III'- ZU.

I w-J 2- I-Z U

wo o mn on. If LJ. . .1! In

I I-. 11mft 3 C) I9 -A I.- . • ll::l

1~~~~. 0• C)• "."1 1 'a 1i) I•1 I. cc, +I itIO -

ir A w w V " + 0 I.

4 u

am in "41 •I (Al U' ,U IllIlI • Z Z • • I . I UJ -• L J t * i

ow a ac a <

----------------------------------------------- -... .. .. ....-. .. - ......I

69

0

w U

U.

D- z c

w w

0 C%

C, 0 Cw a W

OL U.0- 0. I- Uz ;t1 w Ie A4 in

0. &- 0 CLa

t

I- * 4I.- C;

U - I- IS U Lx.W

0t wAM101 -4Wr uw W 4.1 At w t o 49 (~ 4 ;W

IA. 'm I

CA * r'i4 4ý

CDC

lz if 4A44 4

uj ~ ~ 39t

xr * - :Z LU:9 9I N ou to 4f OLOU I.

a LOO If 4 44

4 0 * *cc 0 I Cc in

4 CA

IX owl

It )I. w U

70

ftt0+

LL

IiLos

otA

_ i

L. . us 0 C. UPdc mC It 00nc 0 ( I.- La ftc.)

It Li u -- ,

us )- It,- b,, 0> u -a- ft 00• 0 A .

uO LO. s i " 0 f 4-u ; ft 0 a:4 I W. m at

ft-Rf 0i at -4 0.0. 0 C9v~ c 0 3 1.- 4011 m Ep

• -. u 2 =-= 0....,20

"j ~> 2 .4 4 > z LL

z -4 Z) I).

t AtoI- in It6 sZ 6-Cw0k. inIx 4 s- CL c xi 4c I 4u m~ GUM L

3-

4 3

, 0 011 - 0.3cc It z cc f'A #A0 0 4l 41

46 f* * x at 41 41 4*I 41 4-C :* 6

. I; it .) Ir Us m t4 3 6i

41 w - r : :9 dx i.- x _m 0 a 0 8 - "'0 r) u -- n I.. "14A: I. -uLt II

0 It 0 0 o0 .d 1,u IF c II

cy 17 0r 04/ Mi0 t A 6 6 I c 3z0ki z0uU.. 0 .. . Us o w a

I.U U s - L) ý - s 49I i 4 .- u-

71

0

ccz

0 U 0 r

IA f 0 W CA . 0 -I- !. Il>l Z >

ft. I. z @.-f

w Oftw u4AO , 0a v. w 0 - ef C

CL 0In cc 5- I- CL IT

ft~~ftIO~ 0 (

33 mI- - 5-~IXUJ co Vl I-

tht

vi m 4 *

to -x w - 0X gozI

ct 0 l 0 to 0 a C4 0 4. I f 4 40 I I 0 44 44. 0 0

IIA 0' 0 at 0 or W 0 w 0 ('f 0 0 IACVt 0I I~Al fA 0

> u -C U u -9 w 0 - - 0 *U0 !,- tth

Li 4d 0j us Is. of 4 4 1.-IC 0

0 0 *l04 5

U. * ft 2 n 4 0l IL i 0 U t 0

72

4t

U. U

U.wc-Cc

In o KW

o w w1.

L.e LL ; Wi.-> Z4 LL V(a. a I

_- P4 L .0 A

o: w w l i idw U.1 Li -.- 0 0 Mi Z.

I- 00 i. i.3cn 0n 0 I-M

o1 #A 0 0 0iM'K xw~ # 0 0 a. 0 46 I 3 -

In~U 71 Li.--. 111 4 In >In. 4( w O~r 0 0< M

Isr Xr U- C3 M- 0 u -(0 6 -1-40W 2U It 4 Ifl < . x xn

cc wu 0 0 ar InLiaO.

Lu III- In In 4wI- W tiCl

re uir 0111 a 0 0 -0 (I In 9 : r 0 & 0 1 Ac r Uz in (III -i -4 n -P : Z : Wf I.-70 u f 0 re - c pI0, . 0i .t I C :C . - j u . It n >. 34

ILV 4A II 01 It 0 iCL 0I/iu C . . . 3. 0 a 0 -U~

II A . .. 0 j #A l a IaA M ;A L 3 0 u #AVo

73

ww

a.a

cr. ui

o no o

I wj Lta ( f C

4ý 'a -C Iz04 - t

o L ALA ";;.1L x .0 co 0 W ( t

IA L 004 -

u 31. 1A - 1. x3Mi U) cc w U -w i

c w z -AV c .

CLw 0 tL 11 IA IzxC

fL A IA1

U -K -) 9 U6

L) 4 a:l IA A a ct i

w 0L I.- 0 -

or a w 2,) a o;00%A a y 0 f- 0 I4

Z- *L 4r > -9 Sr I- VZ ul x z O r

k- 7.L u It > uO C 1/I 0 w 0 1. 0 - u -A u I I U W A

fl4 04 4 .4 i

rl w

-9 in1

I.;-- -- w* 4-- r

74

IIccw

0. -

ana

w 0

I- * . 4 444

co com I -

to a I m

a il w a4 t W "cI a I

4ý X

a Z Ia irS -2~ ~~ ~~ rIa aa' a ,

0 a' a' alcl a' 0 D a a' a9 1or~~V tv IV & Zvcr

a u t .9 tit n I aini a > a

$A 4 -C cl IA- 4 31 b a

ul aw %na Ow 1w ot 4w w a of in in w a, alw 1LL -1 'A nW

75

o 4

0I.-

0

00u Sct-T44

w w w

71 0>

-A Ii C

0 In n

4.1 0

S.- .aCL4 O

aii w W, IL IL- u LL Z M. Z r---- z- -- in -

76

-. 04-

z-

-J4i

II!U' (4 >

n S..4n

W Of fn 0f> IA -9 -9 _

ai u u 0 0

-2rxr K cr

0- 0 , Ii t 4 I, II 0 03.9 Jr - IT -0 I O ) ' i

d, 44 . . 0r .- A a a.r0, - 44 u u 0 441 Qf I l

4 ckQ at w ckc 0v l -4 4 it ul, 6 -4 4KaA - rI 4 1 r

E) 0 .4

ki Wi A PA PA.IA

%L VI InI 4 4 444I

77

IL

II

600

#A n

Ix zIn C

CL L

cc 0IA LLI IAwu U I

Iz ICl 0 0 0 t iU f h0 ain

CL .9 0 - 4gf Ia In .A1

w -j U

* ..

78

0l

0

0s C4 I

> C , C4

oo 1>

o vcIl I II

(L) I l ID if

00 1

oo 0 i n PC

(AI "1 1AI I In 04 4 I I

In I

InW In I

': N N aV4 C4 C3 (. (

I n xn I nI l

I Y 0 n CLM"z -It :: IXI

4 I *-~ I - * *m IzM"

4.9 I* * u0 cc

In In IE A

L3 I

LU

:i

U, u U U Q 4A In U UU UUUU

in dc ( 4j 4A 4n 4t WW4 4 4 L

0 U

a I W z w

*z (L

IA U IJ

h w 2 LL LU

In W 4A Ii In I

U) 9w ku #A u an 1A =1 U 4An~ I ~ I 46 0.

79

0

-.4>

#A In .

01 a

0 0l0A f00 o

* 0

I a CK j cc 994

IA IA IAA-C-

z0 -j 11 -J 0 c

oj W 0 -C &I : 0g 0i * 9 40 09 0 0 A

04 4O M *f *9 4A 2

AL" 0 *C *f

08

InI

00c

1I w

Fn P It

I. z

o w mi1 w o

'o A U 1I nU49 Ij w

K z I V

81

II[I

I

APPENDIX (II): The "STCR" Program

.9 1[ _ _ _

82

I- t I

In A

> ) t oI 0 0

Z It N N11ol at toco U I

17 00

In tnC 0

W ~ I u ..IJW

4 N, c. "I t

ot WI O - - U,

)~ .1 .11.J.J l um ~ ~ ~ I ;;I CLU tb U U I % Zb > 0-l

tI- of ti u W I* I -it - IN l'- l') IA- 4 0 0 0

V) cr .9 it u. IIIIL oC A1--

It.J. 3 J

**YIL0Im ;b3 2' .0 *N ell NK U x 0 0 a St W40-Ill L C3 0 el IMPU 0 R Ill -4uw0 0 0CIc A1-- 1 41 2 I

I".- n .oa ntU Ill a ~ *.1 -.1i if * - : N

ofb U.1 I t 0 u 2. if w is 49 -4 ~ ~ JS-L

0 w 54. N 1- 1.-'a = a w )* -. 1- m N * 5- 4

M M4 t'l W VW Z LU 7 2 a 4-nfl4 #A U >2.0 0 00 0M - 11,U&/ 000 x/S a. a- ,*~J0nhV~I

1- - --- 5-.- a a IL I- --. in-in- -U

83

CIO0o

JI-

>. Z WthIU1-- 0

cc :If u z 9x z j

o 0 0 **O-0c0An-9 WDu Cz( 0 t- _ .j W _j

0a a. zL -K>- u- ii h, O u~ cc 1-1- 44 0 4 - j a) o 0 3: 44 so P- LU - U'2 0 C a o u 0 5 U- U4 or~o~ aIf wc

oj M 7I C.)0 001 0. 0j 0L 10 IW - 39 wC) U -C 0W - w QI W Q t~- x f 1x 0 W01 -u IA

M U t L M. I. Ru UW Z> -04 -- t.-W- LL4 ~ ~ t z~- 3) 4-LJ 21 Il IZ0 0r 44 I0 I

-. 0: z R- W U.1 00 22 u ~ ~ 0 ~ 1 L j inJU 4, xaa 0 o UI

I. CL LL W 9 -9a.w zox -Z W~ 0 a CI or'I.- 'J I.- z x W4W) (3 LL w 0 I-I-Ix0W I'- Wcc 0 0 1-01-1--1 - CI0 ID

cc 0 2 LL WLL 00 -01. I I.-L ~ - z 1A 0 x a o . Air :) T.-u I- I.IaUJ fz 1,- -w 4- 0 4a- *11te-2 -A 1-0U)-I + U3 0 SII

z O- ii man~- >l-IN4wv 0 -AW3O OWMa.0 XII-w 00V-

LL L -0W0N 0a-a A3 3 3 x .M z 031119110 00uu

U.1 C)J4 M N w UaI" L) 0- 0 11 11t 0r 91m0ua vr w It 0 .crt -- 0s .

U) - U)V) V (Jtr ) -J L) L 11 Z t Q0 A l U Z UJ V1UJ.

- --.- -U -ý- a-a--- -- IL I.- x~ x x A * aI- I-, In I---II .- I M z 444 7 wJ I& w rn-U U

a -- - - - - - - - - - - -. 38 is 39 CL* * 9% .6 0 4

0 (D - -0

84

C-1

00

9900000 0

4 00

o in W) -0 o W u

~ 00000

o tow 0 At .

U 4~ 000000

OL PA 0 0 0 0 Iflcc

000 LA0 IA v.414I.4*4)

In00000 5 -) : Irc 0 00

000 miw I . 4 0 -C-

.t 4 LA a 4 -9 . 0. N l :1 M t**A a lo 4n UA !A .-

I- U. 4L

0 g. 0 4 0 40 ft .. .. S - 4 J 4 0

44 lo A I&IWx 0 IA * I3 14 ' 0 u 4 4 * 0 4 4

UIku 4 4 t 4 I())V t S 0 t 4 t1

85

w t

1. 'N IC) 0 00 4mVI

C4In C)i o o4

00 44c q0N

'N N L)m

40 40Z Lr z0Ozz zz

0 OR A 41 4 IAW-o 0r z "I . - - - L L- ->-, = oza I

IA atp41zzz z zz z- zztzzz41 'nN lp U .t.4 It .U L( otL.L .wU 4L t 0LL 4 U wI

W * -. : 0oc0000.>- v 0c i -r 4 U4S U )-------------------w w s u

0l WiW -4l -K4 f 1- 4. 4-d4- 4 9.

o 0 90 ex *IL - -C-C . l-; I-. ý ntI

"1 444 * t* 4 -9 4444 4. 4 .4 a4 * -4- 9*r ui 4 * g0 4 '

0 0 *. 01 * 4 * * ll

L.) s o' . LL . $. t * 0 4

4- ~ ~ -C 6 in 0 0 inI xit W iMA14d O e 010 0a- -a # -

86

I

0,I

I-'

6-'A

- t It 10 t,- .1 ' . 11 *It ft, II I [" . f 'l•• [)I nJ L. -It ,- I I -J .I IJ . J 1 , .11 . • u . ,.ts

'5A

-11 D )";3 :3 L3 D' :) Z) =1i0'): 1

al- 91 i Ill M' In Il it il it) M [ailIlW a l i Di IiilIl I n W r,4 " 'A 1 to Wi %ol' 'to in In I^ top ,1t in w, vi 4.- wp 'At " n40._ A4 4#

4 ; e -` )D

z ZZZZZ zfz zzzzzzzI-1 L I i L W0LII l 11 I .. 11.U 6W 11 A & L LU LL L&

uI 1

- - r F vn~ m i -- ltI--0 - --`t -"--I""F

of ' OC, Ellfc '44 < l C 4r 4 4 -9 4c' I 1 -C on -it~' 9 -C -C 010- g ~ 10410 6- a. Cc .6 4 o .1 Ix 4o ca In w3 CD 41 CK WIF 1-M DW Com ww LP iIit- f1A , MAaa~aA u A LtFal lfsl0Ia~iifil#V~il

le9v i 9 1 9c c c a cc a Cc 41q~~~1 Ul i0~ta tiiiA liti,'iI AA lJ~~l~~l~IUU~'

87

'A

aUahIr

4A.

ce*

aj

wC

a0I.- zl L W W

X nw * W u - )q

cw UW zW 0 . c- j - j-a Au w* m i UZ4o a. I . w o w

0 N*. : Z. w 9- I- w- a-w 4Q I. - a -- " . , 7W O W o kt

0W 02

U)14A.J -9 -9 co uA -K 0UIIA4Ida

j Wt 4 ( 4mu0W 41 0

88

10

I...

XJ U.a. u -

th W. 0utzz U.

I- 00 U.0

I: I 0 w. w* WW-a0

c U. re m ao9 aL XJ- X coz1

U 31 -U~ v U;i 9

W 0.h w a 3 w $ 0 -In .9uC U Ac I-o . - 0.

cc4 t

I VIA

A IA

3- %0 K -4 0 .u NIU 4 u C axGC*

4L w- I- cc >.44 I* .

4L 4 A t- zI u~' 4, Il ' .. * u I. L u

w -a-a Z Z-a 0W a a > -C a z I LU CC IA a* a

w - 4A .. J ar M- (A I I *c j 0W 0 -a a' in z -K~ Il 4 a at zI 0 Ca a -I W0 .4Z l_ l- * U WUA~ * M . L0 a 0

WM I ILA - - '04a a -0 ~ q -.. ~ . . aU~f I -. 4IW 4/0a * I 4I O t * IA *Z1

1-~~ w 1* 4 ~ 0.4 1I . . W -4a. a - U ~ - a * U

51 O~i a

89

00a

c? w V) )

CU -

> u

.4 W > >

a! 0 0-- #'

(A CID co atIA 64A 0A Ift wa mi

-i kx C& 4 u NS S t

0 x

m In Nl

CL )- -I- .Ia 6- U, : C- IaLU -U4 c UZ

qw 6 n0. 6*c)' 0CN. Ut %ft r; 0 UJ VW I-- W

- , 2lr C, it~ (4 UZL4 ci.

0 a'o CUOhahlJ ~UC3tfl(~O0I-C ~ "o .rz4mC4n~'Jcr :X Ir L - IUL

90

43

0.

0 U.UrU

w U .

%L to

-.. I IL

Ixu

44~~ WO 1.- .- U -

cc of uWE' U Or 0.

cc1 a IT cle Ell uI a I- 4 II. IV C) in

#A -

uI II I I I II I III II III U

I .(LI . -. I II in- it6- In/ill) L

1N )I9II :j W W th C) .) N. WIn IA

C4f14.1 4 U(V ' 0. cf U4 x C 4

it) V) U. dIAII

(N C4 0) 7

C4 a) to*

* IA "IA In IA N. -U AU

C4 a C4 NN 4 N LU * 44 U

W 4 4 ) '1 (1 a I/ 4 .. - 4 CC IX -CC-U-4 U -U

- -4 .. M U in -0. m t:

>A * 4 ix CIE4I.

LU LUJ w lA z IkL 0 It Ll LL 4 - * U w LU t4 m- ZL LLL U.Z z.

I or UO ZO) cf Z UO O. z Ur 0 LLUD !-cC! 4r -4 Or U 4 w cf U Ur IIT u V c n

U U ) 4 -.U C U 49 U * 4 .'. U. I,_WZ u :(O u)

z 4 * (N a ge L_ 14 I, a Nt -w1-.

LU j 4 * 7 31 1. 4j a 3W w gx vTN U W 7 ) U

II.~CI ea0e n. * 4 4Kw Lu7 IL77 "I z2~~Z L

(A1

0

wwC,.0..

wa

0 0

-t U

4 . Isl a

to M 4u

D4 3Nn inW l

0~ 00az r dn Ix0 ~o t in 0 112

m ;3 mý -C; 4if)- II x, - I-Uco MD 04 u t t mu 1-Cc - M-. -0 - U x ; M oV!MO -ti mC4O x4 A

,-- 1 M. CY ) - *t .I u dCx X-

ft3 v! Q Zf

4j wU w' W c'4I 41 ** 4j 4." 44 4 4A 4DO-* o ( 0 '-ID zN- UN "'UI- 0 - 0 - IN 4r 44 Or znJ * nc' cJr C r z - ('w w N-1.>C - treIcN2 ft Z 2' " Et mI Wa o w ;' tt 4

Z - 5 > c c o&OU -Ca

- 9C 1 0uew )i )un a i 1 w e 4 ) IN. * 4l zr # rft o. -. 4 x Wc- .4 o. il 4 A 9 all-W W W lii 04 4 -3W W 4 1

in U. OA D0 .. 7 IA U. in LtX u IA

a/L IS III C4

Ll~

92

0

us

Qocww

aUUI-

aU

00

w w

I-

at ~w- IA 1

U 0 r QUOUs -K Z > 0

o,ý w I. ) ns ot

- u. 1J -

Z- XA W-; 8 aa

0 1 0 I --

a.: II. ui u w w

It In to 1- (A to co to

wO- Ill -0 0 0 -us -111

V; -s-a to (4 toIA

L! 8. ! a

x X 1-- .'LU-U •"W - 'A L" of* m~ mn - rt

t~~i in . ,0 - • >••t n•. , .i '

w.... V! .. '

4c.IIt Is W R III W Us x w4- I- i ir t n r4 m r)i

m r; 1;r -• m 141 Itr•{J m illr~)U -'JlU ~ "l.oWW.- 0 - - U -t - l us t Yb1

0 41 ii Un O-- in ;AI-3, Z) O a 1

> II Is~ IA*Iloe 41 wr of onI~r f

It .U 54 1- w4 i IbV . w UsnI, # Us IL Ix W >4 Z 4 U. l Z .6 US fl . 0

UL in Is :3 Or .Ow - or 00 CI 4 00 0 0' or IT.-wa. z lvusI. oor- -4 Us 1- - U- U 4 W0 ý -C 0 U .4O'U-U U -9 In 1- UI -.1 U U -

0811 UI a 11 a C t 0 a 4 W:, Z Or0 ; f 04.Ir 0U5a .4 m0,Z W U &I wu 8L-d.9 C

i-nLIL WOi

a T w .p I-. Lt> L w gZ-( z U.1 DO w zw~~~..aa.

in in 1A 6Ln

1A 40f5 50

93

0

i'

z

U'J IL

z 2 4o mui us

Ix IL

> 0l -W 4IICoU.Wm cc 0- ce 0 19 m

c) - -Z 0 t0 U -C..- a0 450-IaA zz- wU z LW.E 4 Z q-t LL X c c04

55 I to 0 I-I-I-. UiW it U' I- U'

-C 0 V CL V) U. 4 0 c

0- in0

IL c d U I 0 vc U. -Kfn (M** 0** k0* *0

94

al

Iu

CA

0'd-

0 Ix

xi

0 ~3.

44 0 l-3

vU n

m w Nm 0 4 toWrt 0 -. 31

2IInA Ix w a- 0 .UCLu. 0 x 09 2cc )Itx

- 4A C4 411 JA 3 4i x C-

qA IT 0n q 09 W o6I- In IA r): );D I AO nui

X it at c4 W cl4 Inm% l- c n 0P(A in (A V n ( & (A- II nU . C ji n 1 I 1 0 Oc

-J lk Il k&) .x3Lj L W uri Li 7f. &

IV v- . . W 'U- 4f t* 3 . " I .1 T"ty P . j 4.

I- -4 011 04 1 4 .4 - in *P.4. 1 ilioi lJIaI %I kit -(A 9fllf ;IfO *lj. X 6M 410 f-4 it In 04U A t j 9- A aI

0- It -- il &.->-0 t1 0- O.?4 1 1 6 I 4"-Int.g AJ 0- 0- 4 So A L14 q

o~t in **

ZIA V. w 4 ~ a5.j~~W o-I IW wo

WW 4. (5)ft-ftJ.'46"

0

I-wU).

LU .U

= u l ,OL -,. w I

I. 0 L I 0

w 00 ot-. w I

o -A W W4us

of Lw-~ 2 I.-

A-U 0lUJ a3 MU LA*Lh - U) In In

1-W)W~ 0 3 ~lI

A-x IAI- mU -

th. I, I

-cl IC

-

* ~~ ~ i In * n~n It n . n f n n *

n. * I 4 - 4 LI -In In 5- - - 0 .#A

C IAco 0.0 I 4 . * 0 * 0 0 IA . £ 0 I

ICC 4 l' U .- £.. cc 4 0

1--- ! 4-C4A m 4 0 1. 0 IAInc

p.r a . 20 i s . . U 4 "- 0A

wr i p

:: Vf 21. -4 IA IV .4.4 4

4 1.-

4)W Zi I . In. A 0 U.0 lf l'I I vA ) I. C

U.~ ~~~~~ I" b 111.')* 0 *4 IfJ~) b A 0 * A*1U

LA 0 In 4 aLwin cl 4 1i4A1

.0 4 o z ZW W 4 2cci g 4 0 4 .. V - 4 C 4l U4 a .l L

96

In

ar

Inan

0

44i I I> to x A

In I

Ito I^a

o st _j 4 I go f

ft ft 0 w N wt f

t~j In 'Im In1CV - 4 A 4 A -

a x w a x

to I 0 U Iii UJ a I" rv k u I- C 1 c 0 o i r 0 , v 0

00

w 44 L L 0 44

4Z U, 1 3 4A L fIOa- - >- LL 6. 4z I c c

i n4 I-II I. IA1

((-. > e I f44: v4 U MA& R'

CI- An &L. LA / 4A =1 0

IL 044 04 4 I f t IF

97

0

thi

C, II

> 4z cc

in l I >

-K O

o co 0 0 c I 0

IA IL

I IC

I~~ 0 0

0-n~ : 0 04A 1 1 1 ~I-

43 0 0 00 0 0

aq 0

or z A Z I IA 1.0 0

98

IWO

UL

CL-0

'- 0ftc0 i0- m o

- LJ0a Cel4o c IMa t ,

o Co "x J-Ito -KC 0 0jt41I '

> I -41

.Utw u U- I-W C u L

U'j u CAazu L. lJz#a M ui z 0 -C Wý- OL"-(3z 4A0 MtW U C--OW 1Z I

CWcr. .- W W w :; m > a c ua

01 uw 4 0 u u _j44 .- .. -j> a

IC 0.ujr U LP A - M IA -K - I U Z IA 0. J L 4 C f1-#AI- WUJ$AA 9 - I 2 _j1. 9 4 J. j 4 _ 1_ Q

V) L - c I

Ut,w LLIm

#A > mI, IL 0 INi 2II . - 0 l C

99

0

x

at>z

- Ic

oo aoo4t

Cc u

4 4c n 04

in it L t c9=

0 i S to 0 -.w -u @. ,U 1

- 9 w us 0. I .- w .9 w4- u n zzu I U Z. ", 9 x U U

0 0 1- 4rr rl-w.O hAtn1 0 .. VOW-- 41704m -0- . 4 I" A lI W N- 4 -" i 2.4 "I a

1- A 1- 0 0 111> .

wl 0. - 0 wC -jO 0. l

2~ IM 2 Ij1 w 4A! l

#Aa 2'ob *W

7 100

fA

k"

It

4 C

fto

oIn 0

4 1( 3. 1 * 1 11

ftw~~ ~ ~ u- 6 CVaV.M

:. (Ia n+ 9-" :- ! " I #

IAI *i w~ w U.

In n L - z n9'u * z '

#A f tA I..- In u eg u ) IS 40- Ij 99 'A u ~ W P

$A I-. *j L. .9 4 z A - j IA -9 ) a w CI 4 Ii 10I

in I- .).LIJIV A~. Ul 1 -b * .101 * j' f C ~~f. ~ O ~ t .. in 614 *ft E a

- $- w 4 f

IL 41 41 0 4 U

101

.41

At

00c

w IA I

ou to to

It 1 3

co 03 N Pj 1 10-2 x Ai Nl PA IAp

Uj N wh 4 Q 1 ? aI

at A3 - t I I

w 24 l U U; a " e 0 c M4 ,

te v% 0 A1 1 3 C C Aq ýI

40 1

I~~~~ xP U ~Pa3 @3 0 a 03 0 a S 3 a a 0 t u 1 A wP

24 -1 24 a ;4 1- LU1 al I 4 *aJ C uI aU w lux E t I A U P I %- 00 a 5 A Q'

o. A I4 04 IL W ft 0ý 4 w 4 2 4 P

I 0 k- 0- MA In U34 C ma I 0 ('4 t' 4 0 - - 0-

n in V% n IA n 0 0 a'Inq9 w

66 2 4D f P.

102

wAIn aof4.1t

LAUlU

ca

'- 0

I' -

N4 a0A: ýU

ift

to : a * ca

0 0 KlU U 0 Kln l o njU

LI Q 4 a IuI Iu Iii a l 4 ,U lI 1 1 911 I .U lf-3 9 q au U. 0-UK - P- -

am~~~~~r UtK a 0 ~ .K CA *0 . f

I-l ma 45l V il to Ut 4it ;.. 0) ft UN a22 0

K ~103*

0

o

I dc

- 0 U

-I - Io*t

0 0xx Z1 Lc 0 I

In I. m cL I W IW aKW1 -

Z > IX ft I1 4cý x 0w131 u q US-uuA I 1ý cwz -W mI ww. oz -dg U1O q W I

I.- j 1'3 0U IIn ~ I inn -

-I * I. Z w us- I-K I W4A - 0- 4h 0 z WU 0.--

vv in Z)a. 4A u A ".2 0 1 4- ~ 4A

104

i

APPFNNDTX (III): The "CC/PTB/C" Program

;-I

I ,,

105

0

I.-

;r I i

in I

InI n c2

0 0miu C)(IO

0 -j (l 0)

tIn Li)il i

it iniiO-O>. 1 LJ L.cc u uj u 0

CO ~ i 11 i

qr ~ ~ uj iU 11 - Y 40 11,1

11 u U 'A I .. toIn- .4

InW .90inI Do; inr.) l'- l'- * * l Wif >

lz Wt 44 4 An L. ,0

C) jI C CD 1 0 E L l ID~ 6 ) Ix 4U - In cc ui 00 00 o 0 4) el

W II eLx DIn.-'-LZ 0 0.-I Cc MA -. i0 4i -j CL .J 49 49 .11 4

u W LL 0OUj- 00 44 -j O g ( a I- 1.. -4 0 M40 ~ ~ ~ ~ ~ ~ ~ ~ ~C U) ago 00 000 046

-4W~*.t I- w)4z 4q to !If CA IA 4 U I.1-1. -P

0j 0 aUU .4 CL I.-- -0 - I * ) W W 4L 11 _j a4 i A

43 4 In00~0O 4 4 0 n

wWW~~ : :

106

(SS

.49

w 0.

an - 4'COI

0 cm a I.

la t o a -9 wa: .4 90 -. uI.

z0 a > 2L4 1- L -0 Wa.- c Li - V Cc W DOx z

U C U 1 0 us~ a ~ -M 31

CL Op 4 0L X X2- ý tIJ Z- 2 00 -W 4 O Wa tI.-

U OM 0 0 > M. -J i WI-`-Ius - . 0.1- wmý

0.I 0i 1- W W -J 0 tA 0. 1-

2 . C ENOUW >C 4WUI

o i -. 111 .7 1-1..-OW 1!-U!o C01 fto 0s --- rUl Wat aWUI..-!

0.~~~~ 040000t)ZW4 2

0 m m 0.000

'I 0b.- ~ UOOI Intt1 Irr.-I

0 I( 000

cA c I.

0 1- ".4.

-AJ-6- w-an 'C oo0 uc' .. lc4K --aaa w I

at Ut . * -III - -- -I- lt~* I I .- IA .- :U ID a co*

'C4* L'4 * >~1- 0 1- 00 LLtL U

> 0. IA 4

21 01-C * * ac *Z

*m a XI W 4 0A~~ 4A z n#1-tt~t t-t-a * .t~J4. 4 a 4.4 * 4 6 W IC *1-

1- - - - - -1 1 1 1 1 1--- 6 4 4 0 4 4 U t 4 A4 Jl4 B Z *

107

13I

a C

U) 03 a

00

cc ~00000 0I

z .C4 m inl

cc00n00 0

in 4-, 0 th i

U~~flV) in

ih 3 W0000011 00

m 3c I 0 .0 Ct0# C w6

,- 00 wo 0.0 4n nu 000~ 0n U n L

in3 m in en 00

*. ~ ~ a 0 0 * -400 0-4)Wf 0*0 No a0~ . -*00

Is a 0 .. 0 0- *-0 1- 0

9 0 L2 4w k 0 0 0

a, i 0 0 0 im

u) M ý E z C; S N~ ) S N w * *03

0 @A IA $A f 0 W L0

4 - 3 0 004

on 4- 0 #A 0) 0

4t 380

:

00i 000 000 0

U' a 0k0 4 WO 00 ) 0 )

108

00

a.

00

0 .00

2 0

ofo

-0

oo 00in A-

mL ac 3n c wvwr inuanr

C4e4II ' '5-"5- C rIA-

ortAtA A hI0 4#i)toi'6 1I f A4'4 l 0 8 J tftn~ S -Im ut#A 4. i I %

v :17 Z,1; :1 Zo Z, C, ' 0 0 to Ion A . 0

IV o f 0 4 #A %a O U)h4A 40 0i nt0 41I.A&AAOw

of @A 41 D w 4 41 f i 1, Id I wwwmwmwwLwwm w &LILilmLmL L o. i L U o.

ý1 Ni 41. 0 ~ 0 : wwwww w WWWWWWWW~WWW

9 1 a 4_ i j"_j_ i j_ j j-1 j_ -0- j j- 1- j jjW

109

Fcc

0

z

z

-01

-J-4f A2 I

L&I L LU0 zz

9-.hT N0C oot""0( A% %4A0 V )00a4

:)nnZ")f-

IA. A& U. L6 00 6L.U L LI

J1.W u UlLLIW. 0. - .1 A . U. A . -. -.. -J -4 U. U. U

w ~ IUU K.4flV 4E4 *44

a 4lac c 4 1 4 4

41 '1 u 4i 4

o >- D .4

; 4 t*44 m 4y 4l C4 p

lip 9 4' w2 4c 4 4

-J~ol 4 WW 4WW.WWWWW . ~ U

C~~~~~~~0U~~ ox ,da sm ba .4 U

a o * I* 4 4

110

w wIL UmuQ+u- ,co oSSZ0 4zI #

0.ww

-4 j Hj

I l W z * U UzI 4 9- 4oI-l fw1 31 J- a 3- 44 - ý 3 1 m .a0aA g

4A0 e -4 4 11a0 týWI JU

040I-

09 10>W t UW

ILI

Ic C41 04v

-4 M

a U

0 R 4 V4

us a "13. c

-j w hi

at 4* MAW§ 1CC4 e ' 1

cc wo

Qn #A* It. ft,

46 4

I-i

0

oo

4c 3

LA UA

S.-j: Z-M a

z 0 tA W0CV

0AQC In C, -- -a m 1 U

al N W; mm InI Ir i C00

w b. w ItofatI0 - t- w w. 0i L i 9 1

o N .. x &w. 'I u Pot cr 3# a0N 40I W 4. 1 O--19 0

in Z C i.,i #A S~ 0 u c u5-~ -. u Q -S~~ * a aS N m - uyg ( 091-1 a -9 44 U- -j

* ~ ~ O s IA 1- -4 a. I.. - - I . .- I

a a 4-W 3 a cc Ns. 2 .3 u zz- z &I

113

44aat

Ua

Cc4

Z I UA

cs Cci

tos

Vo -" 4 a cPUa ~ A --

kn -a c a u, . o

fA it

4 A. W aU 0 0La to 0

114

I-

IA f 9#A a1

a ' fto0 I

CL-tIE v-C sIa!-JJC 0"4

4u

awWA- iw .d4z

-4 L, " n- i jt

VAooa mo 4 4444

IL h

-i

ot I-%

.140

I-I

tcQ % a I- k

'a w 4A Z

0 x v a 0 x 1IA 0

3 m C; C; Ala

tn 6U n0 C, A Q 0 . : a

.K .1 OE MI K

u1 .4 IA It( I

O4 02-51 IAt* -4 I

u 4A (I c r -4 a .94C r i n I $-Ia~~~t a 4-. w0 01 - - 2 * * -

'WA V414 0 IS 2 w

hi U400

116

w A

IxI

a afa ;aI : W

aa-

'A IL0 0- OWOCIa

IA 'cma la* P

go, 0C CP C)

41 IL t- o- u UIA .1Z

'A 0 initd -W -ILP

WPC I u "

117

Iz

ICI 4NMa.

0

'-4 0

oc I

o I

-J It I

2CIA

OW I C

44 I

2~# 0 0

-z 0 I 4

0L 0 0

z 0 0

a. 0 0 0

~ it

I C'4 0 h -

I 1 0 a ,0

z w o cc'i cw o4-ca I I 0 - #A 4 iUA~ cc w 4

118

ICIO

q>

•4

i-

m I-," I-

wl- , . - - € 0x 0 0l toi 0

So C

ft - x

- -. IA I- IA • .

4A #A x 0U 39I

-

- 4 . C -A• u A >

ol m

-, . .. . •----

1- * 3 -4 -

4 1 X z-.--

z -C..I I .. I A L 2

* ~ I ZJ W 4 A0

* 0 4 #A 9L

JA - 4 0 9 UK- -L . a w' at flu 0~ a A- - * ~ * . -

U IA A 0 -

uU C4 ol 4 l

4f I- 41

119

IL

th4a

cc

- IA

442

- -z

to 0 x34 0 U) u

-tLi 0 0L w

# -

0z LU L

0 0

a. of IX & -Z-uy-9 de 4 I A -9 a. IA lLCeU - c a. ( ctU K a 0 1- 1. 0 0 Ia 2 ao oI- 1 0 2l 4

-a a * a - > zu * I.- - - I - * - . a0 : " ~ * a # - 0' 1.0 a -" -6- z km * w - a .4b *- cc 4 -aA W 0W a 4 I0 *

120

10U434c

I-of

#A

0 IA

Cz >

0 aD a a

C4 IA I Ia3 ao a I

z In I a c 000 a - ! xa

IA a 9

o - 6 IA 0 6

Ir I IA cr a a

z I a a0000W0 a a 0 0

.9 4 a a w - a -C aC a 1 wI L

a zin a 2 4A = A n A a

aLall"-I

'ILAcc0t

a.

uI>I-e

03c

01

03g

I-K

,w 4

- , 9

U) 4 0 0 0 g 0-0j* aai 0 0 W

v w -C a wU

CA z c wL A 44 dl CAA

u .19

*A 0 It K * kIii U)Sa

122

In1iti

w

S.- I I u

4 I I

0- 1. Io.0 tl a,' S

Uw- In U 0 I *

4 I

4 I0

I-) 'P

APPENDIX (IV): The "CC/PTB/P" Program

- -l w- -- -

12~4

I 40.

at cc x x j-.1 Iw j-> A c n. . C

2 -

41 cc -4j n

l- 0. .. 0 ; f4 0 '

wr W w-i-Jam

0A- I in I.

A 11 I t , -m-I .ini f .%

*v .01 to t t) a16 ~~i ., in;-0

Ix oil' N 'N I0 I0,10m

:- W : IAI ut 1 l06 w.- - ? oU b.- 11n 4,t ..1wU *w

4 ~ * - 4X U n K jt, * * * * t0* ro a 4 *4in* * 4*0401. 1A. V% OSC .. W IIUJ a U WE .4 It I

u.tfN--4 CX at n IO In 4 4b D* 0 0 a.114CX 1- 1. .L 4, IL I. 4 42 or0 4

OU) com 00 U L) 99 in 0 ll 0 VI 411-in~41 0 0I1 .~ I N aA 0LS N11 0 S 0 * in 0n 0 41 41 N 1 1 1

cl0

MA

44

2..

0 -1 I

z-do U.1 2 I. - i

MJI 0 z5D 0 WI I- U. D0

4C U 4 - ),- W I2 ) ..j m 0 .z

- UI Z M toal -C WO 0l'- OL ha. - I.- -U

ta cl) h. Mi 0 0 1 1 at IT0r 0 X %L.M LL - A* I, nl4

1. m ;: w w 3 In

U c. c -. M .-s.'~X hslh haS II l-t WU a.J 0 U i,- 5- OWO' g -t 0.5-aU be

U -a. IS > s-A -J.0- - .

U. c 1 r U Wt Ofl-JO .9-.-7

'L0. > SUI . w . w3 -103I3u ot

0Z2w 0 N000

O 000o~ o- --

Nl 0 0 0200000 0 th 000

40.S0 W-U 000

7. .. 9S. * ; ~ w M ~ * 4 4 4 4 4 4 4 S-0 4 4 . 4ox -4-d W-0 )(W x x 4i IL a 0 CA 11 -2 a, *4 "1 4

.1w -wW iWw L s us 6- In. CL ~CIL K x zz

o oi A#At o% #45 4 A44#A t t o 44A a- It 4 O4 efll w #: 4 4 tox Cb KP cx) K( It .- SP-M h 4 4 06 IL CL. 0 21 Its

* L %AU * 4 4I/S P ~ ~~ USC 4 4 44

W -9a. 4 i6 4x -W4. 4 : 4 .iwha w 4

4 of .4 14 455 4J 0 Vp - 1

IL zz z zz z z 5-.--I 000 M4 J 0160.. 4 * 02

U - - - - -- - - - -

ca 41- 3-.1-S .1441.

126

£0

C)V

w #U4 00 0

IAI.1 AN, IN

0 00 0

#A VLA inA 0 6 1

ccW h hIAt %An 0 A2 N'sN,

---- IC

-4-4-(4-r # I-. C) ff[

'n vI W; 0 11111 40 "A w Ir V;

IA U U .-L V 00 v0Ic 1 A0'AtU A C AO

(4 .z C4 6%

4 4 - a C- 44 44 0 0m e -e*0e o * Cc U 4 4 0 0 0 l

I~(I-. a 0- n. o. 49 4 1K ,- -9..-., 0 f A* 0 .0 ~.4- W 0 0'! J! * IiIUYS In 4 (4 0 * IU* * Uc cc4

IL N ~ m t C4 a NI~I~ N l I.. I.- I.- 4- 6A. * uski . IL #AU %AT $A 6 W In a Uo *l Vte IC /~n *4 I/4 19 w '-9 A 4 4 *

IA tI (4 (4 (4 * 4 .. 4 ~ 4 4 ~ * *x a N 1093 l . ~ * * (Cal 0~ C C . I .

I. qe$- t

- ~ -- , --. In

44C

ft 00Go-

040C

S00

0 co

0 00

C4C lx fm m moboý4 f t4

-06 . .1t n 4 01,0A00111110A04V)t k" .U L ,L , wýwLwwwT0 .w&

It D

0 - 4- 00 0 -444 9---

t3 : z zzz1Uz z LJzz zztAUz Vbz bU:)-. I.I U.4 9 * 99 9 W9 * 9 OA 9.w9 .w wU L.L L LWU

00 419 1

4 j i . 4 .. . 9 .. 1 4 j 994 A j j-jý 4-1-

* :e. 4 * 4. 14o 1.0 9mrmm roftwf'm wrirom wowoo 0. 0 414 .M 0,100 0 . 41401-t .OC0* a * dc * -. 9 " -J .J-... -- .J. . J..~ ...... " 0

141 -o . 4 9 k4 4 !Ir cc : tr el -ha9 ycra rI ~ w r i~-. 4 - I. 9 . . 41 a A :) .;-'0 ý 1

> ~~~ ~~ 1A 9 04 41 9#Au 9 4 W WW W W W W W W WWh W W W

lAW~ 9b 4 f 0 I t 'X 00 m0atn W0w 0 0 b wu 4 0 I* 44 41 go cc 9 a 4k a a a Alamonama4 *

fttl ~ ~ 1 14 w at cc at Is cc a *rr ftr ffflz rtI m ~ WiKf Vl

128

IA

N 1 4(4N NN- ~C4C4C

0 cvv;;-0---r-Av

4.I4- 4 k 0, 0 % f A ýf o ' L U

o n # A 4 t I * %

MDnn

w tmmU sC w r nt VC0 Il001_ -l- 1jTwt # nt Aitf ot

.4 49 .44444.44a.U 2 $A 0*0

N:~ .i ., 0 * J6 L 0 4 49 #

'na .4 0 a,0 , I-4, 'd.1 * 0C .4 J. ~J .J1 .4.4 .. 1 4 44 " 0 A - A-

Q6 * ý.>>,*w mu .0 : : ae;b .II 1- A mm1 > ap 1 , A- I-4 woowm I*j 4 4444I D0 t * ol 0 0 m444a4on I'444K44 0A An -K -C ZaU w

V V44 V Ili * 44 4 4 4444 N0 .4 NN4~S NA 2 lu 0, 11 bU* a.

9 ot ag * C s m

Mori,5

x 4

34 It l

I. s 03

Ci I. fI Cw C 0 It 1 cW i 9q L - ý1ý .ý 0M 6

Z 2A S c1 l -. Actý p>ý . h#

0 LLI c 0Cea

u a - *: I a I a ;4

- .41. 4 1 41

U' an 4 5

130

ImIw0

IIb a a

w zL

anw W r a

m CW4-I

U) 0 0 fl A2 l

44 -C ý4 a 41c0 L 2 11 1 11 - t%j ' I a is #A If ff 0. In . r i isI . if ft is * .1

l1 It a F *@ar0 . - Z * wf 0 * 9 "S(4 a IL 0 v@(~ -

IL 1 IL 9 442U' (" * *

9 * a ** *a-au OA a-1 1&a.; a-M 0 u a IN. a aa

It 41 SU01 V ~' S S WV ODA * a ~ tU 4 *i. a a * 4 tI ~~ . 2U

a I aa~t.4J a a a wO~J a a w0 -77

0

w r

00c

X fA

39 inwtft ZZ

o _j-- - o 1

, - ? 0 ftcc 4 44 c0xwC

- t * 4. 'i-t .) -" O 4

Im C4U in * - ajow - -at0 I

w ~ ilxz - ). itl- I.- * m * Ww m .0O- 2 Z0 LL w LZuZ uzt ZZ" L

2 *- 2m *Z 4

wV) qt A a0-~- MA 4 4 A 4 4 M W M ~ 4

4'. ZL* In 4 Zpo I. zu z- Afttu,* ~ a aw 0 Z4~-4f~

IL0 I. a- a - l4 4) cc 0z l4 ul '

* LLM

132

03

U.1

I.-VIIn

in4f2 n> *:

>0o>c

H4JII u

mLAcc r ) z 0a4mV 0t

LA 0. - a)Wi

IIam a n in a

. rr 1r

Iu I I II s; W 0 W ~ mIn (I a I Ill a

IC Pf C It fit C1a ( -- r * 4 .1 .rI jMc lPCi jo IX -n IT0 .

-IT 2_ i- 2 o-I ccLAW6 Hj w 9 l- I4 wA j -c w LA

IL ma 0 n Z .i IL ma iLL n * maL.c of an ta t5 an aa. ,- Zn I 0 1 l a 0 0s. -3 an0

CLofN4 rr.-.-r a wI ar N -. ~~ -a- LW Ntr V It 6 4 4 1ot v..- V Inil Pj v I C3 in S >n-~ Il Ji PAI. t Cj IZ S I-i uK 0 KLh(I ~ ~ ~ ~ ~ ~ .r.igif 11. l a a *M.4t11 A a h-A * 1.- I) m I- SL a4--4t

a a S SJ

C)0 -MIA 2U"LALDOCWat4WAS-L-WOccLA

41 0 r0

F- T--- -_ _

133

0

w

-0

lz

I0a

z0w

w IGoc

oo in

4 PC

1.4 W 0.-Jm a 0

W#A

0 #A In 40 " Z

44 -

M- 0 39

+ co

.1 + tol 0 4in

14 T z I0

2 * w ~ z IU * -44

W4 I1' w 0 a. 0 Lh Z A00 0 4 u 0 1-n 2

C4 w wt 4 4 4 - 4 ow-

I/u 4 4 m As

z~ Wo0

4 0

134

0.

a>

wwco th-

m i

to nh

co i

- i-:

4 A

#A4

Sa 0 LinW in IF in 410

VI 0 a CC)

0; IV ti, it

k~ t 0 ý - -: n 41 3 ru 7

VIVI~C 0 L2 31V 0 I-II

4 on~CVA 0A 4I44 a~ VI0 444 0V 0 4-I% in* * .3 3 4 4.

LoI..-

In 4

> co

4 x

- IID

o~~4 0 x t f

oD 0o1

#A IA - CC W

x 34 *s I

kn -Z -9 1 AI IIA z 3c I- x I L

10 Cc ,

IAn~ ~ I i I l M

4 ~ ~~~ -5, 0A 0 r~U In Iau- A I

0A u uU -j IA -A IA U U a

-- ~~~~~~ -; 4 -- & I i I 0 - (4 . .

u, .4 CL 4 S.

302 0 1

116

4A

o

4

z

tht~

al .

to

in u 0 1 oI. x -

,o ad • : - a: , .*

S . t •1 t* l

a v U - 0

r a a

I Z I -1 |

'It

,~~6 t -•- ,, .

it ~ ~ ~ ~ f t nI-c l 0 P r o

on~~~ a VUi A o

a 0 U.11i

V i l a I

41 in ~-JL4LJ0 I

LII,

LLI UI41I

toi

Sx(a . wI.

4Aa#A0#

K- 0auC

o, 44NI . #

2

ao 0 L0 % L M0 U4R

UU

a2I 0 00 1-

o b. 'C IA4 C -a 00 - Ga ar a ait.

I!,au mO a a, a u -aL a6- Lu f c 9 - a a a r o c A . 0w a

.4 ia an ao3- c s 0 "a aA 0 aW a - a

660 0 0 0 0 5

I A

0

C10

18

a

iit2( P A0 - 0o0

0

x •

i 4 k 4l w

I'D I'

II

!0 4A 0 (A 0C4 A u

Iw

x 0 i U X

wW to w i

0 [I .0 4 w

>Uw I - -r Il

it IV I - 'T 0 0 4 .. i ( V l r It ) A t It 46 . 12 j (.6 wL -a u- en III - . 4 .

%A 0 &A IA- 0 4 S'A Cl W C3, 3 0 'A IA Il I f 4 C

usI I A w - IA '-

'* u W IA in In in mi*~~& to to 0 -- *

IL 0

139

0.

0.

U, C

0

x j In 0

oo 0 a I to

x in th It

2I z I -j CI I

I I I aý LaCI- o I w x afa a 4

I/A I I

co cc I tC4 c C a x (Ia 0 m - a m Ic

"1 0 X l IA xA #A 4A 0 Wa Cc

1 IA ft 0 I

> IA>

LU - IZ 12 0 0 I Zw~~~ 1 01000u

aj wA 4 .4 1 r w W

u I'

02Ui Ui n 6

Lo N

140

a.

w

I-

I- ,

I,-

w2

U,

U.!

In

.4

U

ww

-J€

•'" ' ", • : . _ - ; • -I . - _, •.. .. . . . . • . 2 . . . 2Z i . . . -. " i . . .. . .. , , . .. , •.• • .•. :.. .. ..•: • .• I. .A,,..-" :

Documents

fDEC II - apps.dtic.mil · store-through algorithm is used, main memory is simultaneously updated. However, cache2 continues to have the urnodi fled version MV. If, before this inconsistency