Center for Information Systems Research
Massachusetts Institute of Technology
Alfred P. Sloan School of Management
50 Memorial Drive
Cambridge, Massachusetts 02139
(617) 253-1000
REPORT DOCUMENTATION PAGE

Title: A Study of the Multicache-Consistency Problem in Multi-Processor Computer Systems
Performing Org. Report Number: MOIO-8109-06
Authors: Tarek K. Abdel-Hamid and Stuart E. Madnick
Contract or Grant Number: N00039-78-G-0160
Performing Organization: Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Report Date: September 1981
Number of Pages: 140
Security Classification: Unclassified
Distribution Statement: Approved for public release; distribution unlimited.
Key Words: Bus, Cache, Multicache-consistency, Multiprocessors, Performance Evaluation, Simulation
ABSTRACT

This paper is a report on an ongoing research project at the M.I.T. Sloan School of Management to study the multicache-consistency problem in multi-processor computer systems.

The nature of the consistency problem in multicache memory systems is briefly discussed, together with an explanation of the three common approaches proposed in the literature to handle it. A new solution to the problem, called the "Common-Cache / Pended Transaction Bus" (CC/PTB) approach, is developed and discussed. The "CC/PTB" approach attempts to minimize performance degradation by eliminating the overhead of maintaining cache-consistency. Its two distinctive features are: Firstly, the conventional private cache per processor organization is replaced by one where a pool of cache-modules is commonly shared by all processors. Secondly, the Pended Transaction Bus (PTB) is used as the interconnection protocol that connects the processors and the cache-modules.

The performance of the CC/PTB approach is evaluated using a highly detailed simulation model with favorable results.
TABLE OF CONTENTS

I. THE MULTICACHE-CONSISTENCY PROBLEM: AN INTRODUCTION
II. INFOPLEX: A MULTI-PROCESSOR COMPUTER SYSTEM
III. THREE COMMON APPROACHES FOR SOLVING THE CACHE-CONSISTENCY PROBLEM
    III.1 The "Broadcasting" Approach
    III.2 The "Store-Controller" Approach
    III.3 The "Multics" Approach
    III.4 Conclusion
IV. THE "COMMON-CACHE/PENDED TRANSACTION BUS" APPROACH
    IV.1 Introduction
    IV.2 The Pended Transaction Bus (PTB)
    IV.3 An Application: The INFOPLEX Storage Hierarchy
V. EVALUATING THE PERFORMANCE OF THE "COMMON-CACHE/PENDED TRANSACTION BUS" APPROACH
    V.1 Introduction
    V.2 The Evaluation Tool
    V.3 The Simulation Experiment
    V.4 The Hardware Parameters
    V.5 Analysis of the Simulation Results
    V.6 Conclusion
VI. CONCLUSION
BIBLIOGRAPHY
APPENDIX (I): The "OPT" Program
APPENDIX (II): The "STVR" Program
APPENDIX (III): The "CC/PTB/C" Program
APPENDIX (IV): The "CC/PTB/P" Program
I. THE MULTICACHE-CONSISTENCY PROBLEM: AN INTRODUCTION
A "consistency problem" refers, in general, to a situation where two or more entries representing the same "fact" in a data base differ (i.e., are inconsistent). This, of course, can only occur when redundancy exists. In this paper we will be concerned with the "consistency problem" that arises in cache-based memory systems.
A cache memory system (Kaplan and Winder, 1973) represents a type of memory hierarchy that attempts to bridge the CPU-main memory speed gap by the use of a small, high speed random access memory whose cost per bit is higher than that of main memory, but whose total cost is relatively small because of the small size. Conceptually, this configuration has analogies with paging systems (Matick, 1977). The implementations, however, are far apart because of speed considerations. In contrast to a paging system, a cache is managed by hardware algorithms, provides a smaller ratio of memory access times (e.g., 10:1 rather than 3000:1), and deals with smaller blocks of data (64 bytes, for example, rather than 4096).
In a cache system, all data are referenced by their main memory
address. At any given time, a certain subset of the contents of
main memory is contained in the cache level. If a processor then
requests a data item in this subset, the request is serviced at the
cache level.
A cache-based system works "well" for two basic reasons: First, executing programs tend to re-use instructions and data; and second, programs tend to use instructions and data near recently used instructions and data. The first property means that once information is fetched from main memory to cache, subsequent accesses to it are at cache speed. The second property means that if a request to main memory is satisfied by bringing into a cache a block of information larger than is immediately needed, the additional information is likely to be needed soon, and its presence in the cache will save one or more references to main memory.

In this paper we will refer to the block of information that constitutes the minimum amount of data which may be transmitted between the cache and main memory, and which is also the allocation unit in the cache, as the "cache line." All bytes of a cache line are, therefore, simultaneously all present in or all absent from the cache. A directory is usually used to record the main memory addresses of all lines in the cache.
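As an illustrative aside (not part of the original report), the cache directory just described can be sketched in a few lines of Python. The `Cache` class, the 64-byte line size, and the method names are our own assumptions:

```python
# Minimal sketch of a cache whose directory maps main-memory line
# addresses to resident lines. A 64-byte line size is assumed, as in
# the paper's example; all names here are illustrative.
LINE_SIZE = 64

class Cache:
    def __init__(self):
        self.directory = {}          # line address -> list of bytes

    def lookup(self, addr):
        """Return the line holding addr, or None on a miss."""
        return self.directory.get(addr - addr % LINE_SIZE)

    def fill(self, addr, line_bytes):
        """Bring a whole line in from main memory: all bytes of a
        cache line are present or absent together."""
        assert len(line_bytes) == LINE_SIZE
        self.directory[addr - addr % LINE_SIZE] = list(line_bytes)
```

A reference to any byte within a resident line is then a "hit," serviced at cache speed.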
It is easy to demonstrate how inconsistency can develop in cache-based systems due to the existence of redundancy. Consider first the simple case of a single-cache organization (Figure 1(a)). The CPU can only access words that are in the cache. If a word that is needed for processing is not already in the cache, it will first have to be transferred from main memory to the cache. Once the word is in the cache it becomes accessible by the CPU and processing can take place. If the CPU then updates (i.e., modifies) the word, inconsistency between the copy in the cache and the copy in main memory could develop. This depends on the store algorithm used.
If a store-through algorithm (in which the cache and main memory are updated simultaneously) is used, inconsistency will not arise. The price paid for that is a decrease in processing speed, as store operations become limited by the speed of main memory. When this price is too high, store-behind or store-replacement algorithms may be used. In both cases main memory is not updated immediately, and as a result inconsistency arises. The modified word in the cache will, for "some" interval of time, be different from its unmodified version in main memory.
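The difference between the store algorithms can be sketched as follows (an illustrative fragment of ours, not from the paper; all names are assumed):

```python
# Sketch contrasting store-through with store-behind/store-replacement
# for a single cache. Caches and main memory are modeled as dicts.
def store(cache, main_memory, addr, value, store_through=True):
    cache[addr] = value
    if store_through:
        # cache and main memory updated simultaneously: no inconsistency
        main_memory[addr] = value
    # otherwise main memory is updated later, so for "some" interval it
    # holds the unmodified version of the word

cache, mem = {}, {0: "V"}
store(cache, mem, 0, "V2", store_through=True)
assert cache[0] == mem[0]        # consistent
store(cache, mem, 0, "V3", store_through=False)
assert cache[0] != mem[0]        # inconsistent, but harmless with one CPU
```

With a single CPU the inconsistency is harmless, because every access checks the cache first and therefore always sees the most recent value.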
Figure (1). (a) Single-Cache Organization: a CPU connected to a cache, which is connected to main memory. (b) Two-Cache Organization: CPU1 and CPU2, each with its own cache, sharing main memory.

The more interesting case, however, is that of multicache systems. Consider the two-cache organization of Figure 1(b). What is important to emphasize here is that the store-through algorithm is no longer sufficient to avoid inconsistency. Assume, for example, that a word whose main memory address is (A), and whose current value is (V), is present in both cache1 and cache2. CPU1 then modifies the value of the word in its cache1 to (V'), and, assuming a store-through algorithm is used, main memory is simultaneously updated. However, cache2 continues to have the unmodified version (V). If, before this inconsistency is resolved by either updating, replacing, or invalidating cache2's copy, CPU2 attempts to access the word (A), it will get what is now an invalid value (V) from its cache2.
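The two-cache scenario above can be reproduced in a few lines (our illustrative sketch; the modified value V' is written as the string "V'"):

```python
# Store-through is not sufficient with two caches: main memory stays
# current, but cache2 still serves the stale value V.
mem = {"A": "V"}
cache1 = {"A": "V"}
cache2 = {"A": "V"}

# CPU1 writes V' through cache1 to main memory:
cache1["A"] = "V'"
mem["A"] = "V'"

# CPU2 reads (A): it hits in cache2 and gets the now-invalid value.
value_seen_by_cpu2 = cache2["A"]
assert value_seen_by_cpu2 == "V"   # stale: the latest store wrote V'
```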
It is time now to adopt what we believe is a more useful definition of what a "consistency problem" is. We claim that inconsistency per se is not necessarily a problem. Reconsider the case of a single-cache organization. We have already explained how inconsistency can arise for "some" interval of time when the store-through algorithm is not used. During that interval of time the cache will contain the modified version of the word, while main memory will not. If, during this interval, the CPU needs to re-access the word for processing, what will happen? It will check the cache, find the word in it, and, therefore, access it; i.e., no transfer from main memory will be needed. Thus, although inconsistency exists, no problem arises because the CPU will always access the updated version of the word from the cache.

With this in mind, we now adopt the following definition of a "consistency problem" (Censier and Feautrier, 1978):

"A consistency problem exists in a cache-based system if the value accessed by a CPU is not the value given by the latest store operation (by any CPU) to the same address."
It is obvious, from the above discussion, that there will be no consistency problem for single-cache organizations. These, unfortunately, are not very attractive for high performance systems. In the next part of this paper we present a brief discussion of INFOPLEX, a highly parallel multi-processor computer system that utilizes a multicache organization. In such an organization the consistency problem is a significant one. The INFOPLEX organization will constitute the context within which we shall evaluate the different approaches for handling the multicache-consistency problem.
II. INFOPLEX: A MULTI-PROCESSOR COMPUTER SYSTEM
A research project is currently underway at the MIT Sloan School of Management aimed at investigating the architecture of a new data base computer, called INFOPLEX, which is particularly suitable for large-scale information management (Madnick, 1979). The specific objectives of the project include providing substantial performance improvements over conventional architectures, supporting very large complex data bases, and providing extremely high reliability.

To provide a high performance, highly reliable, and large capacity storage system, INFOPLEX makes use of an automatically managed memory hierarchy. It is this aspect of the project that will be of relevance in our present discussion.
As a simplistic illustration, we show in Figure 2 three levels (only) of the memory hierarchy. As can be seen, this is a multicache organization. The proposed number of caches (m) is relatively large compared to present day systems (e.g., m = 32). For the INFOPLEX objectives such a large number of caches is essential. It will, for example, help attain the high performance improvements sought (up to a 1000-fold increase in throughput over conventional architectures). In addition, it allows for the implementation of such features as dynamic reconfiguration and automatic recovery, which are aimed at improving the reliability of the system.

Figure (2). Three levels of the INFOPLEX memory hierarchy: caches at level (1) and memory modules at levels (2) and (3), each level with its own local bus, Storage Level Controller (SLC), and (below the cache level) Memory Request Processor (MRP).
Two important "boxes" in the design of Figure 2 are the Storage Level Controller (SLC) and the Memory Request Processor (MRP). The function of the SLC is to couple the local bus of a storage level to the global bus that connects all storage levels. In essence, the SLC serves as a gateway between levels. For example, the SLC of level (1) accepts requests to the lower storage levels from the caches and forwards them to the SLC of level (2). When the responses to these requests are ready, the level (1) SLC accepts them and sends them back to the appropriate caches.
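The SLC's gateway role might be sketched as follows. This is a speculative software model of ours; the real SLC is hardware, and the class and method names are invented:

```python
# Illustrative model of an SLC as a gateway: requests from a level's
# caches are forwarded down the hierarchy, and responses are routed
# back to the cache that issued them.
from collections import deque

class SLC:
    def __init__(self, lower=None):
        self.lower = lower           # SLC of the next storage level, if any
        self.pending = deque()       # requests awaiting responses

    def request(self, cache_id, line_addr):
        """Accept a request from a cache and forward it downward."""
        self.pending.append((cache_id, line_addr))
        if self.lower:
            self.lower.request(cache_id, line_addr)

    def respond(self, line_addr, data):
        """Match a response to its pending request and route it back."""
        cache_id, _ = next(p for p in self.pending if p[1] == line_addr)
        self.pending.remove((cache_id, line_addr))
        return cache_id, data        # deliver data to the issuing cache
```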
The Memory Request Processor (MRP) performs such functions as: implementing the storage management algorithms (e.g., directing the transfer of information across a storage level); handling all the communication protocols that are peculiar to the particular storage modules (devices) at a storage level; and mapping virtual addresses into their real equivalents. (Note that an MRP is not needed at the cache level.)
The INFOPLEX organization described (briefly) above will constitute the context within which we shall study the different approaches for handling the multicache-consistency problem. For further information on INFOPLEX the interested reader can consult the following references: (Madnick, 1975; Madnick, 1979; Lam, 1979; Lam and Madnick, 1979; and Hsu, 1980).
III. THREE COMMON APPROACHES FOR SOLVING THE CACHE-CONSISTENCY PROBLEM
In Part (I) we explained how a consistency problem could arise in multicache memory systems. We saw, for example, that in the two-cache organization of Figure 1(b) a word (A) that existed in both cache1 and cache2 could be modified to (V') in cache1 and in main memory but not in cache2, thus giving rise to an inconsistent state. This example, although rather simple, is quite adequate to demonstrate the motivations behind the basic strategies that have been used to handle the consistency problem in multicache systems. There are two such strategies. First, we could restrict the "encacheability" of data items, such that only those data items that cannot cause inconsistencies are allowed to move into the cache level. For example, words that can only be READ would be encacheable. On the other hand, all data items that could potentially cause inconsistencies are prohibited from moving into the cache level, and thus all accesses to them are done through main memory. Word (A) of the above example would, therefore, fall in this category, and so it (under this strategy) would have been prohibited from moving into either cache1 or cache2. Thus accesses to word (A) by both CPU1 and CPU2 would have been made to its single copy in main memory, and no inconsistency would have resulted. The price paid, however, is that accesses to word (A) are now done at main memory speed and not at the faster cache speed.
The second basic strategy that has been employed doesn't put any such restrictions on moving data items into the cache level. The idea here is to "invalidate" a cache line when there is a risk that its contents have been modified elsewhere in the system. When a cache line is invalidated (by setting, for example, a flag in the cache directory) it is considered not in the cache. Referring again to the above example, when the value of word (A) is modified in cache1 (and in main memory) to (V'), word (A) in cache2 is invalidated. Thus if CPU2 happens to request word (A) at a later time, it will have to access it from main memory, since the invalidated version (V) in its cache is considered not to exist. CPU2 will, therefore, access the valid value (V').
In the remainder of this section we will present more specific approaches to handle the consistency problem in cache-based systems. In particular, three approaches that are proposed in the literature will be discussed, namely, the "Broadcasting," the "Store-Controller," and the "Multics" approaches. The "Broadcasting" and "Store-Controller" approaches are based on the second strategy discussed above; they, though, implement it differently. The "Multics" approach, on the other hand, is based on the first strategy.
III.1. The "Broadcasting" Approach
The idea here (as mentioned in the second strategy above) is to invalidate a cache line when its contents are modified in another cache in the system. When a cache line is modified, its address is broadcast throughout the system so that other caches sharing the line will invalidate their now outdated version of it.

Every cache is connected to an auxiliary data path over which all other caches send the addresses of lines to be modified. Each cache constantly monitors this path and executes a searching algorithm on all addresses thus received. In case of a "hit," the affected line is invalidated.
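A software analogue of this broadcast-and-monitor scheme can be sketched as follows. This is our illustration only; the actual mechanism is a hardware data path, and all names here are invented:

```python
# Sketch of the broadcasting approach: every write's line address is
# sent to all other caches, which invalidate their copy on a hit.
class BroadcastCache:
    def __init__(self, peers):
        self.lines = {}
        self.peers = peers           # all other caches on the data path

    def write(self, addr, value, main_memory):
        self.lines[addr] = value
        main_memory[addr] = value            # store-through is required
        for peer in self.peers:
            peer.lines.pop(addr, None)       # invalidate on a hit

mem = {}
c1, c2 = BroadcastCache([]), BroadcastCache([])
c1.peers, c2.peers = [c2], [c1]
c2.lines["A"] = "V"                  # c2 holds a copy of line (A)
c1.write("A", "V'", mem)             # c1 writes; c2's copy is invalidated
assert "A" not in c2.lines
```

Note that every write generates traffic to every other cache, which is precisely the drawback discussed next.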
When a CPU needs to read (or write) a word that doesn't exist in its own cache, the word will be fetched from main memory. To ensure consistency, main memory must always be kept "up-to-date." This (in general) can be guaranteed only if a store-through algorithm (in which the cache and main memory are updated simultaneously) is used. As was argued before, such a restriction is not without its cost: a decrease in processing speed as store operations become limited by the speed of main memory.
Another major drawback of this approach is that the invalidation data path must accommodate a very high traffic. The mean write rate for most processor architectures lies in the range between 10 and 30 per cent (Censier and Feautrier, 1978), and thus if the number of processors is higher than two, the productive traffic between a cache and its associated processor may be lower than the parasitic traffic between the cache and all other caches. This explains why this approach has been confined to systems with at most two caches (Censier and Feautrier, 1978).
III.2. The "Store-Controller" Approach
The basic idea here is the same as in the "Broadcasting" approach, i.e., to invalidate a cache line when its contents have been modified elsewhere in the system. It is in the implementation that the two approaches differ. Here, a "Store-Controller" SC (see Figure 3(a)) is used at the cache level to keep track of every line in every cache. The store-controller "knows" not only which lines are in which cache, but also which caches share any single line. When, therefore, a line that is shared by two or more caches is updated (i.e., modified) in one of them, we do not need to broadcast invalidation requests to all caches. We, instead, use the information in the store-controller to send invalidation requests to only those caches that are sharing the updated line, if any. In other words, the motivation behind using the store-controller is to filter out all unnecessary invalidation requests.

We will present, in a flowcharted form, an example implementation of this approach which is largely based on Tang's proposals (Tang, 1976). In this implementation, if a processor wants to write and the line is not found in its cache, then the line is always brought to the cache, so that the processor can always write to cache. The store algorithm used is the "store-replacement" algorithm. This means that when a line is modified in a cache, main memory is not concurrently updated. It is updated later, when the line has to be replaced in the cache or when other caches need to have the line.
Figure 3(a) shows how the store-controller fits conceptually into the cache organization. Figures 3(b) and 3(c) show possible layouts for the directories of both the cache and the store-controller. The "status" column of the cache directory shown in Figure 3(b) needs some explanation. The status of a line can be one of three things:

1. Private: For a line which has been modified (with respect to main memory) or is going to be modified. A private line exists in only one cache.

2. Non-Private: For a line that exists in one or more caches and which has not been modified with respect to main memory.

3. Invalid: For a line that has been modified elsewhere and thus becomes outdated. An invalid line is considered "not in the cache."

The READ and WRITE operations are illustrated in Figures 4 and 5, respectively.
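A rough software model of the central directory's selective invalidation may make the filtering idea concrete. This is our sketch of the idea behind Tang's proposals, not his design; the class and method names are assumed:

```python
# Sketch of a central directory with per-cache presence bits and a
# modified flag: invalidations go only to the caches sharing a line.
class SimpleCache:
    def __init__(self):
        self.status = {}             # line -> "private"/"non-private"/"invalid"

    def invalidate(self, line):
        self.status[line] = "invalid"    # considered "not in the cache"

class StoreController:
    def __init__(self, caches):
        self.caches = caches         # cache id -> SimpleCache
        self.present = {}            # line -> set of cache ids holding it
        self.modified = set()        # lines held "private" (modified)

    def note_fill(self, cache_id, line):
        self.present.setdefault(line, set()).add(cache_id)

    def write(self, cache_id, line):
        # selective, not broadcast: only actual sharers are invalidated
        sharers = self.present.get(line, set()) - {cache_id}
        for cid in sharers:
            self.caches[cid].invalidate(line)
            self.present[line].discard(cid)
        self.modified.add(line)      # the line is now private to the writer
```

If a line has no sharers, a write generates no invalidation traffic at all, which is exactly the filtering the store-controller is meant to provide.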
Figure (3). (a) The store-controller in the cache organization: CPUs, each with a cache and cache directory, connected through the store-controller to main memory. (b) The cache directory layout: main memory address, cache line location, and status. (c) The central directory layout: main memory address, one presence bit per cache (set if the line is in that cache), and a modified flag (set if the line is private).

Figure (4). Flowchart of the READ operation.

Figure (5). Flowchart of the WRITE operation.

The two major drawbacks of implementing this approach in a highly parallel multi-processor computer system such as INFOPLEX, where the number of caches is relatively large, are:

1. The size of the central directory becomes too large, which means increased time in processing it and increased costs in building it.

2. The store-controller could become a bottleneck in the system as the traffic between itself and the caches becomes very large.
III.3. The "Multics" Approach
This approach, which is used in Honeywell's Multics computer system (Greenberg, 1978), has two important features. Firstly, the Multics cache is a "store-through" cache. This means that the cache and main memory are updated simultaneously. The second feature is that the system address space is divided into segments, each of which has associated with it names and per-user access rights which govern the ability of each potential user of the segment to read and/or write its contents (i.e., words).

Every segment is known by the system to be either "writable" or "non-writable." The non-writable class of segments, that is, those to which no users have write access, is an important and statistically significant one in Multics. All procedures, including all parts of the operating system, utilities, libraries, translators, and so forth, fall into this category. These segments are "encacheable" by all processors. This means that their contents (or words) are allowed to "migrate" to any or all caches. Notice that when such words find their way into a cache (or more than one cache) there is no possibility for a consistency problem to arise (for such words) since no processor can write or modify them.
For the other class of segments, those which are writable, the situation is slightly more complicated. For a segment falling in this category, there are three possible states:

1. One or more processes (users) are accessing the segment but none of them has write access to it. The segment is encacheable by all processors, but no consistency problem will arise.

2. One or more processes are accessing the segment, with at least one of them having write access. The segment becomes non-encacheable, i.e., its words cannot migrate to any of the caches. The consistency problem will not arise here either, since there will only be one copy (i.e., the one in main memory).

3. Only one process that has write access is accessing the segment. The segment is encacheable only to that process. And it is only in this third state that the consistency problem could possibly arise. Consider the following scenario:
Assume a certain segment S1 is addressable by only one process PROC1, which is currently running for the first time on processor CPU1. Assume also that PROC1 has write access to S1. Thus S1 is encacheable by CPU1. As long as CPU1 runs PROC1, words of S1 may be drawn into CPU1's cache and be modified by CPU1 with no problem. During this period, no other processor can address S1, for by assumption it is addressable only by PROC1, which is uniquely associated with CPU1 during the interval in question. Thus, it is impossible for other processors to draw words of S1 into their caches as long as PROC1 is associated with CPU1. Until CPU1 leaves PROC1, there is thus no danger of words of S1 in CPU1's cache becoming outdated, as no other processor can address S1. Similarly, there cannot be words of S1 in any other processor's cache, for by assumption, CPU1 was the first and only processor to run PROC1. Thus, there is no danger that modifications to words of S1 made by CPU1 can invalidate copies in other processors' caches, since such copies cannot exist. Potential difficulty arises when CPU1 has left PROC1, and some other processor attempts to run PROC1. The first time this happens, there is no problem. Since all words in main memory are accurate, by virtue of the store-through cache, another processor, say CPU2, cannot have inaccurate data: main memory is accurate, and we have just shown why CPU2's cache cannot contain inaccurate data. However, while PROC1 runs on CPU2, CPU2 may modify words of S1 in its own cache and in main memory. Still there is no problem. Main memory is accurate, as is CPU2's cache. This can go on as long as processors which have never run PROC1 (since PROC1 started running) run it. However, the first time some processor which has already run PROC1 attempts to run it again, the scheme appears to break down. Assume CPU1 attempts to run PROC1 for the second time. There may be words of S1 in CPU1's cache from the previous time CPU1 ran PROC1. Some of these words may have been modified by PROC1 while it ran on CPU2. Thus, these words are accurate in main memory and in CPU2's cache, but are inaccurate in CPU1's cache, for CPU2 had no way of knowing or acting upon the fact that they were in CPU1's cache.
The Multics solution to "its" consistency problem is simple: clear the cache of a processor upon entering a process if it was not the last processor to run that process. This is performed by the Multics process dispatcher, with a special processor instruction that accomplishes this task. This ensures that no words of any per-process writable segment will be found in a processor's cache if there was any possibility that any of those segments may have been modified by other processors. The operating system maintains, in the control block describing each process, the identity of the last processor to have run this process; thus this check is easy to make when a processor is dispatched into a process.
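The dispatcher rule can be sketched as follows. This is our illustrative model only; the real mechanism is a special processor instruction issued by the Multics dispatcher, and the names here are invented:

```python
# Sketch of the Multics rule: clear a processor's cache before running
# a process unless this processor was the last one to run it.
class Process:
    def __init__(self):
        self.last_processor = None   # kept in the process control block

class Processor:
    def __init__(self):
        self.cache = {}

def dispatch(process, processor):
    if (process.last_processor is not None
            and process.last_processor is not processor):
        processor.cache.clear()      # discard possibly outdated lines
    process.last_processor = processor
```

In the scenario above, when CPU1 runs PROC1 for the second time after CPU2 has run it, CPU1's cache is cleared, so the stale words of S1 cannot be read.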
From an INFOPLEX-type-system viewpoint, there are three major drawbacks to the Multics approach, all of which are performance related:

1. The class of segments that are non-encacheable can be of significant size, thus dampening the performance gains sought by the cache organization. Note that in Multics there is the significant class of segments which we termed "non-writable" and which are encacheable. In data base computers, like INFOPLEX, this category, which contains things like utilities, libraries, and translators, will probably be much smaller, thus decreasing the portion of encacheable segments. (There could, however, be special applications where this doesn't apply, e.g., READ-only data base applications.) In addition, the Multics approach doesn't discriminate between two processes which both have write access to a segment but of which only one actually exercises the "right" and modifies the segment. In both cases the segment will become non-encacheable if there are other processes that are also reading it. This means that in both cases the system's speed will be slowed down to the speed of main memory.

2. Using a store-through algorithm has its own performance disadvantages. As was stated earlier, it limits the speed of store operations to that of main memory, thus defeating the very purpose of using a cache.

3. The procedure of clearing the cache is also a wasteful one. Note that a cache is always cleared if its processor wasn't the last one to run the process. All access requests to the cleared cache contents that would otherwise have been serviced by the cache must now wait for transfers from main memory. This will, obviously, slow down the system.
III.4. Conclusion
We have analyzed the three common approaches that have been proposed in the literature to handle the consistency problem in multicache systems. We have considered each approach in the context of the INFOPLEX framework presented earlier. Within this context we were able to identify some major drawbacks in each of the approaches.

In the next section we propose a new approach to handle the cache-consistency problem in multi-processor architectures. In Part (V) we will evaluate the performance of this proposed approach.
IV. THE "COMMON-CACHE / PENDED TRANSACTION BUS" APPROACH

IV.1 Introduction
We argued in the beginning of Part (I) that in a single-cache memory system (Figure 6(a)), where there is only one access path between each level, no cache-consistency problem will arise. Once two or more caches are used, however, the potential for the problem develops.

The "traditional" approach in employing caches in multi-processor systems has been to basically replicate the structure of Figure 6(a) for each of the processors, as shown in Figure 6(b), and then solve any problems that arise. A problem that arises, of course, is the cache-consistency problem, and the three basic approaches that have been developed to handle it are those of Part (III).
Implementing any of the three approaches will obviously constitute
scme processing overheatd. M an example, consider theS*1I•
[Figure (6): (a) a single-cache organization (CPU, CPU/Cache Bus, Cache, Cache/Main Memory Bus, Main Memory); (b) Private Caches (PC): one cache per CPU; (c) Common Cache (CC): a pool of cache-modules shared by all CPUs.]
"Store-Controller" approach and refer in particular to the flow-chart
of Figure 5. When a CPU needs to modify a line that exists in its
cache, and the line happens to be a "non-private" line, then the status
of the line is first changed to "private" and the "modified flag" is
set. Next, the Store-Controller's directory is checked, and if the
line is found not to be shared by other caches it is modified. If,
however, it happens to be shared, then messages are sent to the
appropriate caches to invalidate the line, the central directory is
updated, and finally the line is modified. How much overhead did we
incur to maintain consistency? Well, compare the above steps with
those needed for a uniprocessor system where the cache-consistency
problem does not arise. In such a system, when the CPU needs to modify
a line that exists in its cache, it simply proceeds and modifies it.
None of the above checks, updates and messages are needed.
The concern over the processing overhead needed to maintain
consistency is a legitimate one. Such overhead does undoubtedly dampen
the performance gains (e.g., system throughput) which are sought by
introducing caches in the first place. Attempting to minimize this
overhead, therefore, seems an attractive direction for research work.
In this research endeavor, we are basically proposing an
architecture that eliminates the processing overhead associated with
handling the multicache-consistency problem. The architecture is
depicted in Figure 6(c). The important distinction between this
organization and that of Figure 6(b) is that, here, each of the cache
modules is accessed by, and thus services, all of the processors.
Thus, just as main memory is common to and shared by all processors,
the cache level in our proposed architecture is also common to and
shared by all processors. In such a scheme there is no need to store
more than a single copy of any data item at the cache level. This
eliminates redundancy. And with no redundancy at the cache level, no
inconsistencies can obviously arise, which in turn eliminates the need
for mechanisms to maintain cache-consistency and the overhead
associated with such mechanisms.
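The no-redundancy argument above can be sketched in a few lines (a toy model, not the INFOPLEX design; all names here are ours): because every processor reads and writes the single copy held at the common cache level, a stale copy can never exist.

```python
# Sketch (illustrative only): with a single shared cache level, every
# processor reads and writes the same copy, so no stale copies can exist.

class CommonCache:
    """One copy per address at the cache level: no redundancy."""
    def __init__(self):
        self.lines = {}          # address -> value (the single copy)

    def write(self, addr, value):
        self.lines[addr] = value # the only cache copy is updated in place

    def read(self, addr):
        return self.lines.get(addr)

cache = CommonCache()
cache.write(0x100, "old")            # CPU 1 writes
cache.write(0x100, "new")            # CPU 2 overwrites the same (only) copy
assert cache.read(0x100) == "new"    # CPU 3 can never see a stale value
```

Contrast this with private caches, where two dictionaries could hold different values for the same address until an invalidation message arrives.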
It is important to note that this architecture preserves the basic
intent behind cache-based systems, namely, using a high speed memory
level between the CPU and main memory in order to bridge the speed gap
between the two. It, however, introduces the concern as to whether the
CPU/cache bus (see Figure 6(c)) can handle the needed high volume of
traffic. Comparing the Private Cache (PC) architecture of Figure 6(b)
against the Common Cache (CC) architecture of 6(c), we note that the
CPU/cache bus in PC handles the transaction load generated by a single
processor, whereas in CC it handles the load generated by all the CPUs.
We can, therefore, expect the load on the CPU/cache bus in our
proposed (CC) architecture to be close to N times that of the PC
architecture, where N is the number of processors. Of course, the load
will be somewhat less than exactly N times because in PC some overhead
traffic will be generated by the mechanism used to handle the
cache-consistency problem, and which will not be needed in (CC).
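The "close to N times" estimate is simple arithmetic; a sketch with illustrative numbers (the reference rate and the 10% consistency-overhead fraction are our assumptions, not figures from the report):

```python
# Back-of-the-envelope estimate (assumed numbers): the CC CPU/cache bus
# carries the traffic of all N processors, but none of the consistency
# overhead traffic that each PC bus would have carried.

def cc_bus_load(n_cpus, per_cpu_refs, pc_overhead_fraction):
    """Return (CC bus load, ratio of CC load to one PC bus load)."""
    pc_load = per_cpu_refs * (1 + pc_overhead_fraction)   # one private-cache bus
    cc_load = n_cpus * per_cpu_refs                       # shared bus, no overhead
    return cc_load, cc_load / pc_load

load, ratio = cc_bus_load(n_cpus=5, per_cpu_refs=1_000_000,
                          pc_overhead_fraction=0.10)
assert load == 5_000_000
assert ratio < 5          # somewhat less than exactly N times
```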
Thus, to re-state, the Common Cache approach eliminates the
cache-consistency problem (i.e., by eliminating private caches), but
there is a fundamental concern that the CPU/cache bus will develop into
a bottleneck that degrades the system's performance. In the next
section we introduce the Pended Transaction Bus Protocol, which we
believe provides the basis for a viable solution to the problem.
IV.2 The Pended Transaction Bus (PTB)
The degree of bus utilization of a processor is a function of the
physical characteristics of the bus and the protocol used on it. The
physical characteristics of the bus, which include the length, voltage
levels, impedance, termination, capacitance, noise immunity, and
overall reflection characteristics, affect its operating speed.
Ultimately, any bus is limited by the speed of electricity along a
conductor (0.6 to 0.9 nanoseconds per foot typically, depending on wire
characteristics). Careful electrical analysis and physical layout can
optimize these parameters to achieve reasonable electrical speed.
Given an electrical bandwidth of the bus, as formed by the bus
wires and the driving logic, the actual data bandwidth becomes a
function of the protocol used on it. The traditionally high bus
utilizations of multi-microprocessor architectures that employ the
single bus as their interconnection scheme are primarily a result of the
bus protocol used, which is called the "master-slave" protocol. In
such a protocol, the CPU (master) asserts a request on the bus, and the
memory (slave) that receives it performs the appropriate action (e.g., a
READ), and then responds (with the requested data). The bus is viewed
as "busy" during this entire time. A large portion of this time is
actually spent waiting for the slave to complete the requested action.
The electrical and logical time to transmit the actual request and
return the reply are a relatively small portion of the total bus usage
time. The period of time between the request and the acknowledge is a
wait interval, and no useful work is done with the bus during this
time. Bus utilization could be significantly reduced if we released
the bus during the wait interval. To do this we split the transaction
into two parts, a request part and a reply part. The master requests
the bus, and upon being granted it, sends the request to the slave at
maximum speed. The slave acknowledges reception of the request, and
starts to work on it. The processor then releases the bus and waits
for the results. When the slave completes its task, it asks for the
bus, and upon receiving it sends the reply back to the originating
master at maximum speed. The time between the request and reply can be
used by other masters and slaves to transfer other messages over the
bus. Because the slave stores the uncompleted request from its master,
it is called a pended transaction, and the bus protocol, developed at
M.I.T., is called a Pended Transaction Bus (PTB) protocol (Toong, et
al., 1980).
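The bus-occupancy argument can be illustrated with assumed timings (the numbers below are ours, chosen for illustration, not measurements from the report): under master-slave the bus is held for the entire request-wait-reply interval, while under the pended-transaction protocol it is held only while bits are actually moving.

```python
# Illustrative timing comparison (assumed numbers). Master-slave holds the
# bus through the slave's work; pended-transaction releases it in between.

REQUEST_XFER = 10    # ns to transmit the request
SLAVE_WORK   = 200   # ns the slave needs (e.g., a main-memory READ)
REPLY_XFER   = 10    # ns to transmit the reply

master_slave_busy = REQUEST_XFER + SLAVE_WORK + REPLY_XFER  # bus held throughout
pended_busy       = REQUEST_XFER + REPLY_XFER               # bus free while slave works

assert master_slave_busy == 220
assert pended_busy == 20
# Over 90% of the master-slave bus time was spent just waiting on the slave
assert SLAVE_WORK / master_slave_busy > 0.9
```

With these numbers the split transaction cuts bus occupancy per access by a factor of eleven, which is exactly the capacity that other masters and slaves can use in the interval.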
In the above discussions, it was presumed that the slaves were
always able to accept the master requests when presented. This would
imply that each master was using different slaves to guarantee such
separation. Given the shared nature of the cache data, it is likely
that two (or more) masters may make requests to the same cache-memory
slave. The simplest scheme to resolve this contention problem is for a
busy slave to refuse a new request and make the requesting master retry
at a later time when the slave becomes free. This, however, is
wasteful of bus bandwidth, since the time spent to send a request out
the first time, only to be refused, is not useful. Furthermore, the
slave may still be busy when the master tries again later.
Goodrich (Goodrich II, 1980) has proposed putting queues on the
inputs of the slaves to buffer requests. That is, any requests that
come while the slave is busy would be placed in a first-in-first-out
queue, and these requests would be serviced in order when the slave is
able to handle them. Such a scheme would reduce the bus load to what
is actually needed for transmission, without any extra cycles. In
addition, it would also improve the slave response time as seen by the
master over the simple scheme described before, since the slave would
have the request in its queue, and would service it as fast as it
could, not just when the master is finally successful in transmitting
it. Note that a queue overflow need not be fatal in this system. It
can be treated like a refusal in the previous scheme, and would simply
require the master to retransmit the request later. If the queue size
is sufficiently large, such refusals would be rare. Finally, for best
slave throughput, there should also be a queue on the output of each
slave.
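A minimal sketch of the queued-slave idea follows (a hypothetical interface of our own, not Goodrich's actual design): a busy slave buffers requests in FIFO order, and a full queue is a non-fatal refusal that simply makes the master retry later.

```python
# Sketch of input queueing on a slave (hypothetical interface): requests
# are buffered FIFO; overflow is treated as a refusal, not an error.

from collections import deque

class QueuedSlave:
    def __init__(self, capacity):
        self.inq = deque()
        self.capacity = capacity

    def offer(self, request):
        """Return True if accepted, False if the master must retry later."""
        if len(self.inq) >= self.capacity:
            return False             # overflow == non-fatal refusal
        self.inq.append(request)
        return True

    def service_next(self):
        return self.inq.popleft() if self.inq else None  # FIFO order

slave = QueuedSlave(capacity=2)
assert slave.offer("r1") and slave.offer("r2")
assert not slave.offer("r3")         # queue full: retry later
assert slave.service_next() == "r1"  # served in arrival order
assert slave.offer("r3")             # the retry now succeeds
```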
IV.3 An Application: The INFOPLEX Storage Hierarchy
In the above sections we have introduced an approach to the
cache-consistency problem in highly parallel multi-processor computer
systems, which we will call the "Common-Cache / Pended Transaction Bus"
approach (CC/PTB). Its two distinctive features are: Firstly, the
conventional one-to-one relationship between processors and caches,
i.e., where each CPU has its own private cache, is replaced by an
N-to-M relationship, where a pool of cache-modules is commonly shared
by all processors. Secondly, we propose the use of the Pended
Transaction Bus (PTB) as the interconnection scheme that connects the
processors and the cache-modules.
In this section we describe how the CC/PTB architecture can be
incorporated into the storage hierarchy of the INFOPLEX data base
computer. Schematically the proposed architecture would look like
Figure 7(b). We will, henceforth, refer to this architecture as
INFOPLEX/CC (for Common Cache) while referring to the original Private
Cache organization (shown in Figure 7(a)) as the INFOPLEX/PC
architecture.
The key INFOPLEX storage hierarchy operations are the READ and
WRITE. In INFOPLEX/PC two strategies are needed to implement these
operations, a strategy for the cache level and another for all other
levels. The reason for this is manifested in Figure 7(a), where it can
be seen that the organization of the cache level is different from the
other levels of the hierarchy. In INFOPLEX/CC, on the other hand, the
cache level organization is very similar to that of the lower storage
levels. As a result, the strategy used to implement the READ and WRITE
operations could be the same for all levels of the hierarchy, with very
minor provisions to account for the few differences that do still
distinguish the cache level (for example, the absence of an MRP at
the cache level).
[Figure (7): (a) Private Cache (PC) Architecture; (b) Common Cache (CC) Architecture.
LBUS1: Local bus at level (1), the cache level
LBUS2: Local bus at level (2), the main memory level
SC: Storage-Controller; needed only if the "Storage-Controller" approach is implemented
SLC: Storage Level Controller; couples a storage level to the Global Bus (i.e., serves as a gateway between levels)
MRP: Memory Request Processor; performs the address mapping function]
We will attempt now to explain briefly how the basic READ and
WRITE operations will be implemented in the INFOPLEX/CC architecture.
To a large extent this will be based on the work of Lam (Lam, 1975),
where a much more detailed and complete discussion is presented.
All READ and WRITE operations are performed in the highest storage
level L(1), i.e., the cache level. If a referenced data item is not in
L(1), it is brought up to L(1) from a lower storage level via a
READ-THROUGH operation. The effect of an update to a data item in L(1)
is later propagated down to the lower storage levels via a number of
STORE-BEHIND operations.
When a READ request is issued by a processor, the cache level is
checked to see if the requested data is in it. If the data is found in
a cache-module, it is retrieved and returned to the processor. If,
however, the requested data is not found in the cache level, a
READ-THROUGH request is queued to be sent to the next lower level L(2)
via the Storage Level Controller (SLC). As mentioned in Part (II), the
SLC serves as a gateway between the storage levels of the hierarchy.
At a storage level, a READ-THROUGH request is handled by the
Memory Request Processor (MRP). An MRP performs the address mapping
function. It contains a directory of all the data maintained in the
storage level. Using this directory, the MRP can determine if the
requested data is in one of the storage devices at that level. If the
data is not in the storage level, the READ-THROUGH request is queued to
be sent to the next lower storage level via the Storage Level
Controller (SLC).
If the data is found in a storage level L(i), the MRP maps the
main memory address of the requested data item into its real address in
L(i). This real address is used by the appropriate storage device to
retrieve the block containing the data, which is then passed to the SLC.
The SLC would then broadcast the block to all upper storage levels by
dividing it into fixed size packets. Each upper storage level has a
buffer to receive these packets. A storage level only collects those
packets that assemble into a sub-block of an appropriate size (peculiar
to the storage level) that contains the requested data. This sub-block
is then stored in a storage device. At L(l), the sub-block (i.e., the
cache line in this case) containing the requested data is stored, and
the data is finally sent to the processor that initiated the request.
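The READ path described above can be sketched as follows (a deliberate simplification: we install the fetched line only at L(1), whereas INFOPLEX broadcasts sub-blocks to all upper levels; function and variable names are ours):

```python
# Simplified READ / READ-THROUGH sketch (our names, not INFOPLEX's):
# check the cache level first; on a miss, search lower levels, install
# the line at L(1), and answer the processor.

def read(addr, levels):
    """levels[0] is the cache level L(1); each level maps addr -> data."""
    for i, level in enumerate(levels):
        if addr in level:
            data = level[addr]
            levels[0][addr] = data   # READ-THROUGH installs the line at L(1)
            return data, i + 1       # the data, and the level that served it
    raise KeyError(addr)

cache, main_memory = {}, {0x40: "payload"}
data, found_at = read(0x40, [cache, main_memory])
assert (data, found_at) == ("payload", 2)   # miss: served by L(2)
data, found_at = read(0x40, [cache, main_memory])
assert found_at == 1                        # now a hit at the cache level
```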
In a WRITE operation, the data item is written into a
cache-module, and the processor is notified of the completion of the
WRITE operation. We shall assume that the data item to be written is
already in L(l). (This can be realized by reading the data item into
L(l) before the WRITE operation.) A STORE-BEHIND operation is next
generated by the cache-module and sent to the next lower storage level.
INFOPLEX uses a two-level STORE-BEHIND strategy. This strategy ensures
that in a hierarchy with N levels, an updated block will not be
considered for eviction from a storage level L(i), until its "parent"
blocks at levels L(i+l) and L(i+2) are updated. This scheme will
ensure that at least two copies of the updated data exist in the
storage hierarchy at any time. The motivation behind using such a
strategy is two-fold. Firstly, the reliability of the system is
enhanced because at least two copies of newly written data are always
maintained until the data is "securely" copied at the lowest level of
the hierarchy. Furthermore, the STORE-BEHIND strategy allows for the
updating of lower storage levels to be carried out at slack periods of
system operation, thus enhancing performance.
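The two-level STORE-BEHIND eviction rule can be expressed as a small predicate (our formulation of the invariant, not the actual INFOPLEX logic): an updated block at L(i) may not be evicted until its parent copies at L(i+1) and L(i+2) have been updated, which keeps at least two copies of new data in the hierarchy.

```python
# Sketch of the two-level STORE-BEHIND eviction guard (our formulation):
# a dirty block at level i is evictable only once its parents at levels
# i+1 and i+2 (those that exist) hold the update.

def evictable(i, updated_levels, n_levels):
    """May the updated block at level i (1-based) be evicted yet?"""
    parents = [j for j in (i + 1, i + 2) if j <= n_levels]
    return all(j in updated_levels for j in parents)

N = 4  # a four-level hierarchy, as in the original INFOPLEX design
assert not evictable(1, updated_levels={2}, n_levels=N)   # L(3) not yet updated
assert evictable(1, updated_levels={2, 3}, n_levels=N)    # two copies exist below
assert evictable(3, updated_levels={4}, n_levels=N)       # only L(4) exists below
```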
In the above discussions we did not show how a specific
cache-module can be correctly selected to handle a READ or a WRITE
operation on a particular data item. In all but the cache level this
function is performed by the Memory Request Processor (MRP), which maps
main memory addresses into their physical equivalents at a particular
level. What we need, therefore, is to augment the
processors/common-cache interface to translate main memory addresses to
physical addresses in the cache-modules.
There are two possible places to perform the translation
operation: at the processor interface and at the cache interface. At
the processor interface, the address gets translated before it reaches
the bus. This requires that each processor be "informed" about all
current lines at the cache level. This information would be used to
translate all the main memory addresses of these lines to their
equivalents at the cache level. An advantage of this is that the
translation can be performed while the processor is arbitrating for the
bus, and if the translation operation is fast enough, it can be done
without any access time penalty.
In the second possible scheme the address gets translated at the
cache-module interface. What would be needed here is a mechanism,
incorporated in the recognition circuitry of each cache-module, that
would translate the main memory address put on the bus, and that would
accordingly decide IF the desired data item is present in the cache
level and, if so, WHERE, i.e., in which cache-module and in which
location in it. Both of these should be done as fast as possible.
Tag directory schemes incorporating set associative mapping are
suggested by Mattick (Mattick, 1977).
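A minimal set-associative tag-directory sketch follows (our simplification; the line size, set count, and directory layout are illustrative assumptions, not Mattick's design): the main memory address selects a set, and a tag match within that set answers both IF the line is cached and WHERE.

```python
# Set-associative tag-directory lookup (illustrative sizes): the address
# splits into offset | set index | tag; a tag hit yields (module, slot).

LINE_BITS, SET_BITS = 6, 4          # 64-byte lines, 16 sets (assumed)

def lookup(directory, addr):
    """directory[set_index] maps tag -> (cache_module, slot); None == miss."""
    set_index = (addr >> LINE_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (LINE_BITS + SET_BITS)
    return directory[set_index].get(tag)

directory = [dict() for _ in range(1 << SET_BITS)]

# Install one line: address 0x12345 lands in set 13 with tag 0x48.
addr = 0x12345
set_index = (addr >> LINE_BITS) & ((1 << SET_BITS) - 1)
tag = addr >> (LINE_BITS + SET_BITS)
directory[set_index][tag] = ("cache-module-3", 17)

assert lookup(directory, addr) == ("cache-module-3", 17)  # IF cached, and WHERE
assert lookup(directory, 0x99999) is None                 # a miss
```

The same lookup could sit behind either interface; placed at the processor side it overlaps with bus arbitration, placed at the cache-module side it runs in the recognition circuitry.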
In Part (V) we will evaluate the performance of both these schemes
when implemented at the cache level of the INFOPLEX storage hierarchy.
V. EVALUATING THE PERFORMANCE OF THE "COMMON-CACHE / PENDED
TRANSACTION BUS" APPROACH
V.1. Introduction
To evaluate the performance of our CC/PTB approach we used the
INFOPLEX multi-level storage system as our test case. Very slight
modifications to the existing design were needed to incorporate
"CC/PTB" into the INFOPLEX storage structure. For example, instead of
using the four-level memory hierarchy studied previously by Lam (Lam,
1975), just two levels, the cache level and main memory, were sufficient
for our purposes.
The "CC/PTB" approach was compared against two benchmarks.
Firstly, we evaluated the performance of the "Store-Controller"
approach as a representative of the traditional approaches. Selecting
the "Store-Controller" approach to evaluate our scheme against cannot,
however, provide an effective answer to the question of how "optimal"
either approach is. What is needed is an "absolute" benchmark against
which both approaches could be judged. For this purpose we evaluated
the performance of a traditional private cache architecture assuming no
overhead for handling the cache-consistency problem. This, then,
constitutes the "performance ceiling" that no cache-consistency
handling mechanism (for INFOPLEX) could possibly exceed. It is also a
valuable reference point in that it tells us how close we are to an
"optimal" solution.
V.2. The Evaluation Tool
The evaluation was performed by producing a simulation model in
GPSS. We developed four separate GPSS programs:
1. Program "OPT" ignores the cache-consistency problem, and thus
incorporates no overhead for handling it. It provides us with a
ceiling on performance.
2. Program "STCR" incorporates the "Store-Controller" approach.
3. Program "CC/PTB/C" incorporates the "CC/PTB" approach with the
translation operation performed at the cache interface.
4. Program "CC/PTB/P" incorporates the "CC/PTB" approach with the
translation operation performed at the processor interface.
The GPSS code for each of the above four programs is presented in
a separate Appendix (Appendices I through IV). All four programs are
highly detailed simulators. A widely used index that characterizes the
degree of detail of a simulation model is its resolution in time, which
is defined as the shortest interval of simulated time between two
consecutive events being considered (Ferrari, 1978). In our four
simulations, the resolution in time is 10 nanoseconds. Such detailed
simulators are likely to be more accurate and to have broader field of
application than less detailed ones. However, they are certainly more
expensive to design, implement, test, document, and use.
In addition to deciding on the degree of detail of the models,
another basic decision had to be made concerning the workload that
would drive the simulations. We decided to perform our measurements in
an operating environment that is not at all uncommon in present-day
computer systems, namely, running at capacity. Under such conditions,
there will always be at least one transaction in the system's input
queue(s) waiting to be serviced. In such a case, the performance
characteristics that are of interest to us, such as throughput per unit
time, become insensitive to the distribution of job arrivals.
Before concluding this section we would like to emphasize some of
the structural differences between the four models, as well as some of
their common characteristics. The basic structural differences are
highlighted in the diagrams of Figure 8. In Figure 8(a) a
two-storage-level version of the INFOPLEX storage hierarchy proposed in
(Lam, 1975) is shown. To support the "Store-Controller" approach an
SC "box" (dotted in Figure 8(a)) is added to the architecture. In
Figure 8(b) the architecture we proposed in Part IV to support the two
versions of the "CC/PTB" approach is incorporated into the INFOPLEX
storage system. Notice that in the CC/PTB architecture we deliberately
maintained the same number of caches (5) as in Figure 8(a). It is
[Figure (8): (a) Private Cache (PC) Architecture; (b) Common Cache (CC) Architecture.
LBUS1: Local bus at level (1), the cache level
LBUS2: Local bus at level (2), the main memory level
SC: Storage-Controller; needed only if the "Storage-Controller" approach is implemented
SLC: Storage Level Controller; couples a storage level to the Global Bus (i.e., serves as a gateway between levels)
MRP: Memory Request Processor; performs the address mapping function]
important to realize that while a 1:1 relationship between the CPUs and
the caches is inherent in the Private Cache (PC) architectures, such is
not the case for the Common Cache (CC) architectures. We, however,
chose to maintain the same number of caches in both architectures (and
all four simulation models) to neutralize it as a factor that might affect
performance.
Finally, there is a set of common characteristics that are shared
by all four models; these are:
Degree of Multiprogramming of a CPU = 10
Bus Width = 8 bytes
Size of Transaction without data = 8 bytes
Size of Transaction with data:
    at Level 1 = 8 bytes
    at Level 2 = 64 bytes
Size of Data Buffers = 10 64-byte transactions
Finally, the Pended Transaction Bus (PTB) protocol will be used in
all three architectures (and four models). That is, we are committed
to the PTB protocol in INFOPLEX as a result of our experimental work
(at M.I.T.), which demonstrated its performance advantage.
V.3. The Simulation Experiment:
Our criterion for measuring performance in this experiment, and
which will also serve as our dependent variable, will be the total
system throughput as measured by the total number of transactions
processed per unit of time. As for the independent variable, there
were two candidates: (1) the hit ratio, which is the percentage of
time that a referenced data item is found in the cache level; and (2)
the transaction mix i.e., the percentage of READ requests versus WRITE
requests.
The latter was chosen because we felt it is, in a sense, a more
independent variable. What we mean is that the transaction mix is
largely a function of the use of the system, and as such we have very
little control over it. The hit ratio, on the other hand, is
system-dependent. It can be affected by manipulating such system
parameters as the total size of the cache level and the size of the
individual cache blocks.
Thus, the hit ratio will be held constant, throughout the
experiment, at a value of 0.90 for READ requests and 1.00 for WRITE
requests. (That is, we are assuming that a WRITE of a data item is
always preceded by a READ to it.) The transaction mix, i.e., the
percentage of READs, will be allowed to vary in the range from 70% to
90%. This is the range in which the mean READ rate lies for most
processor architectures (Censier and Feautrier, 1978).
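The workload assumptions above can be sketched as a simple generator (illustrative only; the actual GPSS models drive transactions differently, and the function name is ours):

```python
# Sketch of the simulated workload assumptions: the READ percentage is the
# independent variable (70-90%), with hit ratios fixed at 0.90 for READs
# and 1.00 for WRITEs (every WRITE was preceded by a READ of the same item).

import random

def generate_workload(n, read_fraction, seed=0):
    rng = random.Random(seed)            # seeded for reproducibility
    ops = []
    for _ in range(n):
        if rng.random() < read_fraction:
            ops.append(("READ", rng.random() < 0.90))   # 90% cache hits
        else:
            ops.append(("WRITE", True))                 # WRITEs always hit
    return ops

ops = generate_workload(10_000, read_fraction=0.80)
reads = [hit for op, hit in ops if op == "READ"]
writes = [hit for op, hit in ops if op == "WRITE"]
assert all(writes)                           # WRITE hit ratio is 1.00
assert 0.75 < len(reads) / len(ops) < 0.85   # mix near the 80% target
assert 0.85 < sum(reads) / len(reads) < 0.95 # READ hit ratio near 0.90
```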
Both the hit ratio and the transaction mix are variables that
affect the performance of the system but which are independent of the
mechanisms used to handle the cache-consistency problem. There are,
however, variables that are peculiar to the particular mechanism used,
and which influence the system's performance significantly.
In the "Store-Controller" approach the degree of sharing between
the caches is by far the most important such variable. We evaluated
the "Store-Controller" approach under two operational modes, a
"pessimistic" mode and an "optimistic" mode. Under the optimistic mode
no sharing between caches takes place, yielding an upper bound on the
performance of this approach. Under the pessimistic mode a high degree
of sharing will be introduced (50% of the cache blocks will be shared
by more than one cache-module). This will then provide us with a
conservative lower bound on the performance of the "Store-Controller"
approach.
With respect to the "CC/PTB" approach we will also follow the
above strategy, and evaluate it under both "pessimistic" and
"optimistic" conditions. Under the optimistic condition we will assume
the load on the cache-modules to be uniformly distributed (i.e., each
of the 5 cache-modules carries 20% of the load). And for the
pessimistic case we will assume the load to be linearly distributed
between 10% at the least loaded cache-module and 30% at the most
loaded.
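The two load assumptions can be written out explicitly (a small sketch; for 5 modules the linear 10% to 30% spread works out to 10, 15, 20, 25, 30 percent):

```python
# The two assumed load distributions over the 5 cache-modules: uniform
# (optimistic) versus linear from 10% to 30% (pessimistic).

def linear_loads(n_modules, low=0.10, high=0.30):
    step = (high - low) / (n_modules - 1)
    return [round(low + i * step, 2) for i in range(n_modules)]

uniform = [0.20] * 5
linear = linear_loads(5)

assert linear == [0.10, 0.15, 0.20, 0.25, 0.30]
assert abs(sum(linear) - 1.0) < 1e-9 and abs(sum(uniform) - 1.0) < 1e-9
assert max(linear) == 0.30    # the most loaded module: a bottleneck candidate
```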
V.4. The Hardware Parameters
It is necessary to determine the speeds of the different hardware
components in the INFOPLEX storage hierarchy. In particular, what we
sought were the best possible 1985 projections for these speeds, since
the first hardware prototype of the INFOPLEX data base computer was not
expected before then.
The forecasts shown below constitute a realistic, but ambitious,
scenario for 1985. In other words, they incorporate the fastest
possible components that we envision as being available (and
appropriate) for building an INFOPLEX in 1985. (Only a subset of the
parameters are shown.) The bus speed (b) is 10 nanoseconds, the cache
READ/WRITE speed (c) is 20 nanoseconds, and the remaining parameters
are multiples of the latter, as shown. In other words, we used the
cache speed as a logical building block to "build" the forecasts of the
other hardware components (other than the bus). For example, if the
cache READ/WRITE speed is 20 nanoseconds, the READ/WRITE speed of main
memory would be 10 x c = 200 nanoseconds. The parameter values are:
Bus speed (b) = 10 nanoseconds
Cache READ/WRITE speed (c) = 20
Directory Lookup (2c) = 40
Directory Update (4c) = 80
Storage Level Controller (SLC) speed (2c) = 40
Main Memory READ/WRITE speed (10c) = 200
Three other scenarios will be tested. The bus speed, in all
three, will remain at 10 nanoseconds. The cache READ/WRITE speed,
however, will take the increasing values of 40, 60, and 80 nanoseconds.
And finally, the remaining parameters will maintain their relative
values in terms of c, the cache READ/WRITE speed. For example, when
the cache READ/WRITE speed (c) becomes 40 nanoseconds, the READ/WRITE
speed of main memory will be 10 x c = 400 nanoseconds.
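The scenario parameterization can be captured in a few lines (the function name is ours; the multiples are the ones listed above):

```python
# All component speeds (except the fixed 10 ns bus) scale as fixed
# multiples of the cache READ/WRITE speed c, as in the parameter list above.

def scenario(c):
    """Hardware speeds in nanoseconds for a given cache READ/WRITE speed c."""
    return {
        "bus": 10,                 # fixed in all four scenarios
        "cache_rw": c,
        "directory_lookup": 2 * c,
        "directory_update": 4 * c,
        "slc": 2 * c,
        "main_memory_rw": 10 * c,
    }

base = scenario(20)                # the 1985 base scenario (c = 20 ns)
assert base["main_memory_rw"] == 200 and base["directory_update"] == 80

# The three slower scenarios keep the same ratios:
for c in (40, 60, 80):
    s = scenario(c)
    assert s["bus"] == 10 and s["main_memory_rw"] == 10 * c
```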
Selecting several "good" 1985 scenarios reflects the fact that
different options, in building an INFOPLEX, will be available to
accommodate the cost/performance tradeoffs. And because the bulk of the
system's cost lies largely in the storage components, the variations in
the forecasts between the different scenarios involved mainly those
components.
V.5 Analysis of the Simulation Results:
As mentioned previously, our dependent variable in this experiment
is system throughput. It is also the criterion we use to evaluate the
performance of the cache-consistency handling mechanisms.
The throughput of any computer system is bounded by one of two
factors, namely, bottlenecks in the system or the transaction load on
it. In section V.2 we mentioned that our models will operate in a
maximum-load closed-loop environment, where new transactions are
continuously generated to replace serviced ones. In such an
environment, bottlenecks will definitely arise, limiting the throughput
of the system.
Thus, in the process of interpreting the simulation results, we
wish to identify and analyze system bottlenecks. In such an analysis,
one needs to consider four major factors that directly influence the
evolution of a particular system component into becoming the system's
bottleneck. The four factors are:
1. Architecture: Consider for example the (a) PC and (b) CC
architectures of Figure 8. In PC the local bus of level 1 (the
cache level) handles only the communications between the five
caches and main memory. In the "CC/PTB" architecture the cache
level bus must handle, in addition, the communications between all
the processors and the cache-modules. The chances for the local
bus of level 1 (LBUS1) to become the system bottleneck are,
therefore, much higher in the CC architecture than they are in PC.
2. Algorithms and Protocols: The set of algorithms supported by
an architecture, and the protocols and mechanisms used to implement
them, undoubtedly influence the utilization patterns of the
different architectural components. Consider for example the
central role of the Store-Controller "box" in implementing the
algorithms that support the READ and WRITE operations in the
"Store-Controller" approach. Such a role will inevitably lead to
high levels of utilization of the Store-Controller.
3. Workload: We mentioned in section V.3 that we intend to
evaluate the "CC/PTB" approach using two different load
distributions on the cache-modules, a uniform distribution and a
linear one. In the latter case, the cache-module carrying the
highest load (i.e., 30% of total load) could clearly develop into
a system bottleneck.
4. Hardware Components' Characteristics: The characteristic of
significance here is speed. For an example we refer again to
Figure 8. All communications between level 1 and level 2 in the
INFOPLEX storage hierarchy go through the Storage Level
Controllers (SLCs) of both levels as well as through the Global
Bus. The time needed by the Global Bus to process any of the
communication messages (i.e., the time it takes to transmit the
message) will usually be less than that needed by the SLC to
process the same message. This means that the SLC will always
saturate before the Global Bus can develop into a bottleneck.
As part of the standard GPSS output, the utilizations of all
hardware components (that are modelled as GPSS "facilities") are
printed. This allows us to precisely identify the system
bottleneck(s). An example is shown in Figure 9. In this case the
bottleneck is LBUS1 (the local bus at level (1)) with a utilization of
approximately 100 %.
In Figure 10 our simulation results for the four scenarios are
presented. In the discussion that follows, we will identify the
different scenarios by their cache speed/bus speed ratios (n) (i.e., n
= 2, 4, 6, or 8) as it is a convenient parameter that completely
characterizes each.
For each technology scenario (i.e., n value) seven curves are
plotted. The single solid (-) curve portrays the performance of the
"OPT" model, in which no mechanisms for handling the cache-consistency
problem (and, therefore, no overheads) are incorporated. Thus the
throughput of the "OPT" model, as is demonstrated in the figure,
provides a ceiling for all the other models. The "Store-Controller"
model's results are depicted by the two dash-and-dot (-.-) curves (S1)
and (S2). Curve (S1), which is always the dominating curve, is for the
case where there is no data sharing between the caches, and (S2) is
when 50% data sharing is introduced. And finally, there are two
curves for each of the two implementations of the "CC/PTB" approach.
The two dashed curves (P1) and (P2) belong to the "CC/PTB/P"
[Figure (9): Sample GPSS facility-statistics output (average utilization, entries, and average time per transaction for each facility). In this run LBUS1 is the bottleneck at .999 utilization (6514 entries), versus .474 for the Global Bus and .702 for LBUS2; the MRP utilizations range from roughly .20 to .49.]
[Figure (10): Simulation results: system throughput versus transaction mix (% READs), plotted for each of the four technology scenarios.]
implementation, while the two dotted (...) ones (C1) and (C2) are for
"CC/PTB/C." The two dominating curves in both implementations, namely
the (P1) and (C1) curves, are for the case where the load is uniformly
distributed among all cache-modules. The two other curves (P2) and
(C2) depict the performances under the linearly distributed load.
As Figure 10 demonstrates, the "CC/PTB" architecture, in both its
implementations, completely dominates the "Store-Controller" in three
out of the four technology scenarios. Only in the first scenario (n =
2) does the "Store-Controller" show a performance advantage, and only
for high READ rates. The remaining part of this section will be
devoted to an analysis of the different factors affecting the
performance patterns demonstrated in Figure 10. Of particular help in
conducting this analysis is the information in Figure 11, which depicts
the system bottlenecks in all the cases tested.
There are two basic patterns that deserve separate analysis: the
distinctive pattern of n = 2, and the pattern common among the three
other scenarios, n = 4, 6, and 8. To analyze the latter we will
arbitrarily pick the case of n = 6 as our analysis vehicle.
V.5.1. Case of n = 2
To many, the most surprising aspect of these results will be the
almost identical shapes of the "OPT" curve and the "Store-Controller"
curve (S1). To understand why this is so, we first note from Figure 11
that in both models LBUS2 (i.e., the local bus of level 2) is the
bottleneck. Now, the difference between the "Store-Controller" model
Figure (11). The system bottleneck (LBUS1, LBUS2, SC, SLC, or C*) for
each model ("OPT", "STCR", "CC/PTB/C", "CC/PTB/P") under each
technology scenario (n = 2, 4, 6, 8) and each % of READs (70, 80, 85,
90). *C is the cache-module carrying 30 percent of the load.
and the "OPT" model lies in the overhead necessary to handle the
cache-consistency problem. All this overhead (in the
"Store-Controller" model) is in the form of additional operations which
all take place at the cache level. Thus, while removing this overhead
(in the "OPT" model) will necessarily decrease the load on the cache
level, it will not decrease the load on level 2, and in particular on
LBUS2. Thus, since LBUS2 is the bottleneck in both cases, and since
the load on it (by an "average" transaction) remains the same, the
performances in both are very similar. For the same reasons, the other
five models, namely, (S2), (P1), (P2), (C1), and (C2), have
performances close to that of "OPT" for % READs <= 80 (i.e., they all
have LBUS2 as the system bottleneck).
The fact that LBUS2 is the bottleneck is itself, by the way, an
interesting finding. With a hit ratio of 0.90 for READ requests and
1.00 for WRITE requests, most of the "action" is clearly done at the
cache level. It would, therefore, seem that LBUS1 must be saturated
before LBUS2. The explanation goes back to the parameter values of section
V.2. Notice that the transaction size for level 2 (64 bytes) is eight
times larger than that for level 1 (8 bytes). This means that a single
transmission on LBUS2 will be approximately eight times longer in time
than a single transmission on LBUS1. Thus, although the absolute
number of transmissions on LBUS2 is less than that on LBUS1, the
utilization of LBUS2 in this case is higher.
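The arithmetic behind this observation can be sketched as follows; the transmission rates and the common bus speed below are illustrative assumptions, not the parameters of section V.2:

```python
# Illustrative check of the bus-utilization argument (all numbers are
# assumed for the example, not the report's parameters).
def bus_utilization(transmissions_per_sec, bytes_per_transaction, bytes_per_sec):
    """Fraction of time the bus is busy."""
    time_per_transaction = bytes_per_transaction / bytes_per_sec
    return transmissions_per_sec * time_per_transaction

bus_speed = 8_000_000  # bytes/sec, assumed equal for both buses

# LBUS1 carries many small (8-byte) transactions; LBUS2 carries fewer,
# but each 64-byte transaction holds the bus eight times longer.
u1 = bus_utilization(100_000, bytes_per_transaction=8, bytes_per_sec=bus_speed)
u2 = bus_utilization(20_000, bytes_per_transaction=64, bytes_per_sec=bus_speed)

print(round(u1, 3))  # 0.1  -- LBUS1: more transmissions, each short
print(round(u2, 3))  # 0.16 -- LBUS2: fewer transmissions, higher utilization
```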
Another interesting observation relates to curve (S2) of the
"Store-Controller." Curve (S2) is always below curve (S1) because in
(S2) the "Store-Controller" (SC) is the system bottleneck with a
saturation point lower than that of LBUS2. The reason this happens is
that the high degree of data sharing between the caches (which is
"orchestrated" by the Store-Controller) means a higher utilization of
the Store-Controller by the "average" READ/WRITE request. Notice also
that the two curves (S1) and (S2) diverge as the % of READs increases.
This is merely a reflection of the pattern by which the READ and WRITE
operations utilize the two respective bottlenecks, LBUS2 and SC.
(Straightforward analytic calculations would demonstrate that the
effective capacity of LBUS2 increases faster than does that of SC as
the % of READs increases.)
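A calculation of that flavor can be sketched as follows; the per-request service demands are invented numbers chosen only to illustrate the mechanism, not values from the model:

```python
# Effective capacity of a facility = 1 / (average service demand that one
# READ/WRITE request places on it). All demand values below are assumptions.
def effective_capacity(read_frac, read_demand, write_demand):
    avg_demand = read_frac * read_demand + (1 - read_frac) * write_demand
    return 1.0 / avg_demand

# Assume WRITEs load LBUS2 far more heavily than READs do, while the
# Store-Controller (SC) is used almost equally by both operation types.
for f in (0.70, 0.80, 0.90):
    cap_lbus2 = effective_capacity(f, read_demand=1.0, write_demand=8.0)
    cap_sc = effective_capacity(f, read_demand=2.0, write_demand=3.0)
    print(f, round(cap_lbus2, 3), round(cap_sc, 3))
# As the READ fraction rises, LBUS2's capacity grows much faster than SC's,
# so the two bottlenecked curves diverge.
```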
Next let us turn our attention to the "CC/PTB" results. Notice
first the deflections in curves (P1), (P2), (C1), and (C2). This
happens because at approximately 80 % READs LBUS1 (and not LBUS2)
becomes the system bottleneck in the four cases. However, even though
LBUS1 is the bottleneck for both "CC/PTB/P" and "CC/PTB/C", "CC/PTB/P"
clearly dominates. This is simply because an "average" READ/WRITE
request utilizes LBUS1 less often in "CC/PTB/P" than it does in
"CC/PTB/C." For example, in CC/PTB/P when a requested data item is
found by the processor not to be in the cache level, a request is sent
to main memory via LBUS1. In CC/PTB/C, on the other hand, a CPU
request must first go to a cache-module (through LBUS1), only to be
found unavailable, and then forwarded by the cache-module to main
memory through LBUS1 again.
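Counting LBUS1 crossings per CPU request makes the difference concrete; the hit ratio and the one-crossing-per-hit accounting here are our simplifying assumptions, not the report's exact bookkeeping:

```python
def lbus1_uses(hit_ratio, forward_through_cache):
    """Expected LBUS1 crossings generated by one CPU request."""
    miss = 1 - hit_ratio
    if forward_through_cache:
        # CC/PTB/C: every request crosses LBUS1 to reach a cache-module,
        # and a miss crosses it a second time when the cache-module
        # forwards the request to main memory.
        return 1 + miss
    # CC/PTB/P: the processor detects the miss itself and sends the
    # request straight to main memory, so one crossing either way.
    return 1.0

print(round(lbus1_uses(0.90, forward_through_cache=False), 3))  # 1.0
print(round(lbus1_uses(0.90, forward_through_cache=True), 3))   # 1.1
```

Even a 10 % miss rate thus gives "CC/PTB/C" a measurably heavier LBUS1 load per request.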
Notice finally the negligible effect that the load distribution on
the cache-modules has on performance (i.e., curves (C1) and (C2) are
similar, as are curves (P1) and (P2)). The reasons for this are
completely analogous to the ones explaining the close resemblance
between the "OPT" curve and curve (S1). In other words, changing the
load distribution does not affect the utilization of LBUS1. As long as
no new bottleneck develops because of the change in the load
distribution, LBUS1 will remain the bottleneck, and the system will
thus maintain approximately the same throughput. The utilization of the
cache-module that carries the highest load under the linear load
distribution, for both "CC/PTB/P" and "CC/PTB/C", turns out never to
exceed 60 %, which is far below the 100 % mark that has to be
approached before it would replace LBUS1 as the system bottleneck.
This, in a sense, is very comforting to know. It shows that the
behavior of the models is, to a large extent, insensitive to the shape
of the load distribution on the cache-modules that we used.
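The bottleneck logic above can be restated as a small calculation; the service demands are illustrative assumptions, chosen so that the busiest cache-module stays below LBUS1, as observed in the runs:

```python
# Throughput is capped by the facility with the largest service demand
# per request: that facility saturates first, at a rate of 1 / demand.
def bottleneck(demands):
    name = max(demands, key=demands.get)
    return name, 1.0 / demands[name]

# Shifting load among the cache-modules changes only the modules' demands,
# not LBUS1's, so the throughput ceiling is unchanged (numbers assumed).
uniform = {"LBUS1": 0.80, "busiest cache-module": 0.25}
linear = {"LBUS1": 0.80, "busiest cache-module": 0.48}  # 30 % of cache load

print(bottleneck(uniform))  # ('LBUS1', 1.25)
print(bottleneck(linear))   # ('LBUS1', 1.25) -- same ceiling either way
```

Only if the busiest module's demand grew past LBUS1's would the bottleneck, and hence the throughput, change.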
V.5.2. Case of n = 6
Most of the ideas of the above discussion are applicable to the
case of n = 6. For example, "OPT," "CC/PTB/P"'s (P1) and (P2), and
"CC/PTB/C"'s (C1) all have almost identical performances because they
all have the same bottleneck, namely, the Storage Level Controller
(SLC).
Notice, on the other hand, that because the bus is now relatively
faster in comparison to the other system components, and in particular
to the Store-Controller (SC), LBUS2 ceases to be the bottleneck in the
two "Store-Controller" models. Instead, SC is now the bottleneck, and
the degradation in performance is quite evident. However, notice that
even though SC is the bottleneck for both (S1) and (S2), the
throughputs are quite different. The reason for this is that an
"average" READ/WRITE request in (S2) (where there is 50 % data sharing)
requires more "services" from the "Store-Controller" than does an
"average" READ/WRITE request in (S1). (When data sharing exists
between the cache-modules, the SC has, for example, to invalidate
redundant copies when a data item is modified.)
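The effect of sharing on the SC's saturation point can be sketched in one line; the base and invalidation costs below are invented values, and only the structure of the calculation matters:

```python
# Average Store-Controller (SC) demand per READ/WRITE request: sharing
# adds invalidation work on each WRITE to a shared item (costs assumed).
def sc_demand(read_frac, sharing_frac, base_cost=1.0, invalidate_cost=2.0):
    write_frac = 1 - read_frac
    return base_cost + write_frac * sharing_frac * invalidate_cost

print(round(sc_demand(0.80, 0.0), 3))  # 1.0 -- (S1): no sharing
print(round(sc_demand(0.80, 0.5), 3))  # 1.2 -- (S2): SC saturates sooner
```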
The only remaining result that deserves some explanation is the
deflection exhibited in curve (C2) of "CC/PTB/C" at % READs = 80. The
reason for this behavior is that somewhere between % READs = 80 and
% READs = 85 the most heavily loaded cache-module (i.e., the one
carrying 30 % of the load) replaces the SLC as the system's bottleneck
(see Figure 11). Notice that this does not happen in "CC/PTB/P"'s
curve (P2), even though the same linear load distribution is used. The
reason is that, even though the cache-modules in both cases are
subjected to the same load distribution, they are not subjected to the
same load. This, of course, is because the READ/WRITE operations in
the "CC/PTB/P" implementation use the cache-modules less often.
V.6. Conclusion
The above results clearly indicate that no one approach dominates
over all four technology scenarios. Technology, therefore, must remain
an element of some uncertainty.
It is important to realize, though, that what is really important
in our technology forecasts is not the absolute values of the different
speeds, but rather the relative values for the different hardware
components. And the most important such value is the relative speed of
the bus vis-a-vis the processing components that use it. Our results
clearly demonstrate that the faster the bus is relative to everything
else, the more appealing the "CC/PTB" approach becomes. More
specifically, when the bus is four times as fast as the cache or faster
(i.e., n >= 4), the "CC/PTB" approach provides a 20 to 60 % performance
advantage over the "Store-Controller" approach.
But what perhaps is the most interesting finding is the fact
that for three out of the four technology scenarios (those with n >= 4)
the performance of our "CC/PTB" scheme is very close to that of "OPT."
Notice that in the above statements no attempt was made to single
out either of the two implementation schemes of the "CC/PTB"
approach. One of the interesting findings in the simulation results is
that the performances of both schemes are very close indeed. Our own
intuition was that the "CC/PTB/P" implementation would display a
performance advantage. It was clear that by utilizing "fast"
processors that would overlap address translation with arbitrating for
the bus, the utilizations of the bus and the cache-modules would
decrease. Although the utilizations were indeed lower, this did not
materialize into the higher performance we anticipated. The reason:
the execution of the INFOPLEX storage hierarchy operations and storage
mechanisms was such that the storage level controller (SLC), whose
utilization is independent of the "CC/PTB" scheme used, would, in most
cases, be the first component to saturate. The only exception to this
is when, under the linear load distribution case, the most heavily
utilized cache-module developed into the system's bottleneck. In such
a case the higher performance potential of the "CC/PTB/P" scheme was
indeed realized.
VI. CONCLUSION
This paper is a report on an ongoing research effort at the M.I.T.
Sloan School of Management to study the multicache-consistency problem
in highly parallel multi-processor computer systems.
There are three basic approaches proposed in the literature to
handle the multicache-consistency problem: the "Broadcasting"
approach, the "Store-Controller" approach, and the "Multics" approach.
However, serious drawbacks can be identified in each. A new approach,
called the "CC/PTB," was therefore developed. It attempts to minimize
performance degradation by minimizing the overhead of maintaining
cache-consistency.
The "CC/PTB" approach was implemented in the INFOPLEX storage
hierarchy and evaluated using simulation modeling. The results are
very favorable.
This work opens up many areas for further investigation. No
mention was made in this paper, for example, of the replacement
algorithms at the cache level. It would be interesting to find out the
value of implementing a "sophisticated" algorithm such as the "Least
Recently Used" (LRU) algorithm (or versions of it) as opposed to a
naive algorithm (e.g., random replacement) that requires much less
hardware overhead.
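Such a comparison could be prototyped in software before committing to hardware; the sketch below contrasts the two policies on a synthetic reference string (the trace shape, cache size, and seed are arbitrary assumptions, purely for illustration):

```python
import random
from collections import OrderedDict

def hit_count(trace, capacity, policy):
    """Count cache hits under an 'lru' or 'random' replacement policy."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(block)       # refresh recency on a hit
        else:
            if len(cache) >= capacity:
                if policy == "lru":
                    cache.popitem(last=False)  # evict least recently used
                else:
                    cache.pop(random.choice(list(cache)))  # evict at random
            cache[block] = True
    return hits

random.seed(1)
# 80 % of references go to four "hot" blocks, the rest to a 20-block range.
trace = [random.randrange(4) if random.random() < 0.8 else random.randrange(20)
         for _ in range(10_000)]
print(hit_count(trace, 4, "lru"), hit_count(trace, 4, "random"))
```

On traces with strong temporal locality such as this one, LRU typically retains the hot blocks better than random eviction does.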
We have assumed, as is common in the literature, that all WRITE
operations for a particular data item are preceded by READ operations.
Relaxing this constraint, and developing efficient algorithms to
exploit both this relaxation and the architecture of the storage
hierarchy, is definitely worth investigating.
And finally, the "CC/PTB" architecture should be exploited in the
development of algorithms that would improve the reliability of the
data storage hierarchy. The automatic data repair algorithms, for
example, are particularly interesting and promising.
BIBLIOGRAPHY
1. Censier, Lucien M. and Feautrier, Paul. "A New Solution to
   Coherence Problems in Multicache Systems," IEEE Trans. on
   Computers, Vol. C-27, No. 12, (Dec., 1978), 1112-1118.

2. Ferrari, D. Computer Systems Performance Evaluation. Englewood
   Cliffs, New Jersey: Prentice-Hall, Inc., 1978.

3. Goodrich II, E. R. "A Distributed Operating System for a Central
   Shared Bus Multiprocessor System." Unpublished Master's
   Thesis, M.I.T., 1980.

4. Greenberg, B. An internal Honeywell memorandum, 1978.

5. Kaplan, K. R. and Winder, R. O. "Cache-based Computer Systems,"
   Computer, (March, 1973), 31-36.

6. Lam, C. Y. "Simulation Studies of the DSH-II Data Storage
   Hierarchy System." M.I.T. Sloan School Internal Report No.
   M010-7906-04, 1979a.

7. Lam, C. Y. "Data Storage Hierarchy Systems for Data Base
   Computers." Unpublished Ph.D. Thesis, M.I.T., 1979b.

8. Madnick, S. E. "INFOPLEX - Hierarchical Decomposition of a Large
   Information Management System Using a Microprocessor
   Complex," AFIPS Proc., Vol. 44, 1975, 581-587.

9. Madnick, S. E. "The INFOPLEX Database Computer: Concepts and
   Directions," Proc. IEEE Computer Conference, Feb. 26, 1979,
   168-176.

10. Matick, R. Computer Storage Systems & Technology. New York: John
    Wiley & Sons, 1977.

11. Tang, C. K. "Cache System Design in the Tightly Coupled
    Multiprocessor System," AFIPS Proc., Vol. 45, 1976, 749-753.

12. Toong, H. D.; Strommen, S. O.; and Goodrich II, E. R. "A
    General Multi-Microprocessor Interconnection Mechanism for
    Non-Numeric Processing," Proc. of the Fifth Workshop on
    Computer Architecture for Non-Numeric Processing, 1980,
    115-123.
APPENDIX (I): The "OPT" Program
(Simulation program listing; not legible in the scanned original.)
APPENDIX (II): The "STCR" Program
(Simulation program listing; not legible in the scanned original.)
APPENDIX (III): The "CC/PTB/C" Program
(Simulation program listing; not legible in the scanned original.)
U IA A 0 -
uU C4 ol 4 l
4f I- 41
119
IL
th4a
cc
- IA
442
- -z
to 0 x34 0 U) u
-tLi 0 0L w
# -
0z LU L
0 0
a. of IX & -Z-uy-9 de 4 I A -9 a. IA lLCeU - c a. ( ctU K a 0 1- 1. 0 0 Ia 2 ao oI- 1 0 2l 4
-a a * a - > zu * I.- - - I - * - . a0 : " ~ * a # - 0' 1.0 a -" -6- z km * w - a .4b *- cc 4 -aA W 0W a 4 I0 *
120
10U434c
I-of
#A
0 IA
Cz >
0 aD a a
C4 IA I Ia3 ao a I
z In I a c 000 a - ! xa
IA a 9
o - 6 IA 0 6
Ir I IA cr a a
z I a a0000W0 a a 0 0
.9 4 a a w - a -C aC a 1 wI L
a zin a 2 4A = A n A a
aLall"-I
'ILAcc0t
a.
uI>I-e
03c
01
03g
I-K
,w 4
- , 9
U) 4 0 0 0 g 0-0j* aai 0 0 W
v w -C a wU
CA z c wL A 44 dl CAA
u .19
*A 0 It K * kIii U)Sa
122
In1iti
w
S.- I I u
4 I I
0- 1. Io.0 tl a,' S
Uw- In U 0 I *
4 I
4 I0
I-) 'P
APPENDIX (IV): The "CC/PTB/P" Program

[Program listing; the text of this appendix is illegible in the source scan.]