8/17/2019 Chapter 05 Computer Organization and Design, Fifth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design) 5th Edition
COMPUTER ORGANIZATION AND DESIGN
The Hardware/Software Interface
5th Edition
Chapter 5
Large and Fast: Exploiting Memory Hierarchy
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 2
Principle of Locality
Programs access a small proportion of their address space at any time
Temporal locality
  Items accessed recently are likely to be accessed again soon
  e.g., instructions in a loop, induction variables
Spatial locality
  Items near those accessed recently are likely to be accessed soon
  e.g., sequential instruction access, array data
§5.1 Introduction
Taking Advantage of Locality
Memory hierarchy
Store everything on disk
Copy recently accessed (and nearby) items from disk to smaller DRAM memory
  Main memory
Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory
  Cache memory attached to CPU
Memory Hierarchy Levels
Block (aka line): unit of copying
  May be multiple words
If accessed data is present in upper level
  Hit: access satisfied by upper level
    Hit ratio: hits/accesses
If accessed data is absent
  Miss: block copied from lower level
    Time taken: miss penalty
    Miss ratio: misses/accesses = 1 – hit ratio
  Then accessed data supplied from upper level
Memory Technology
Static RAM (SRAM)
  0.5ns – 2.5ns, $2000 – $5000 per GB
Dynamic RAM (DRAM)
  50ns – 70ns, $20 – $75 per GB
Magnetic disk
  5ms – 20ms, $0.20 – $2 per GB
Ideal memory
  Access time of SRAM
  Capacity and cost/GB of disk
§5.2 Memory Technologies
DRAM Technology
Data stored as a charge in a capacitor
  Single transistor used to access the charge
  Must periodically be refreshed
    Read contents and write back
    Performed on a DRAM "row"
Advanced DRAM Organization
Bits in a DRAM are organized as a rectangular array
  DRAM accesses an entire row
  Burst mode: supply successive words from a row with reduced latency
Double data rate (DDR) DRAM
  Transfer on rising and falling clock edges
Quad data rate (QDR) DRAM
  Separate DDR inputs and outputs
DRAM Generations
Year    Capacity    $/GB
…       …Kbit       $1,500,000
…       512Mbit     $250
2007    1Gbit       $50
DRAM Performance Factors
Row buffer
  Allows several words to be read and refreshed in parallel
Synchronous DRAM
  Allows for consecutive accesses in bursts without needing to send each address
  Improves bandwidth
DRAM banking
  Allows simultaneous access to multiple DRAMs
  Improves bandwidth
Increasing Memory Bandwidth
4-word wide memory
  Miss penalty = 1 + 15 + 1 = 17 bus cycles
  Bandwidth = 16 bytes / 17 cycles = 0.94 B/cycle
4-bank interleaved memory
  Miss penalty = 1 + 15 + 4×1 = 20 bus cycles
  Bandwidth = 16 bytes / 20 cycles = 0.8 B/cycle
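The bus organizations can be compared with a small C sketch. It assumes the bus timing used for the cache block read example in this chapter (1 bus cycle to send the address, 15 bus cycles per DRAM access, 1 bus cycle per word transferred) and a 4-word, 16-byte block. The names `mem_org`, `miss_penalty` and `bandwidth` are illustrative, not from the slides.

```c
/* Assumed bus timing: 1 cycle to send the address, 15 cycles per
   DRAM access, 1 cycle per word transferred over the bus. */
#define ADDR_CYCLES  1
#define DRAM_CYCLES  15
#define XFER_CYCLES  1
#define BLOCK_WORDS  4
#define BLOCK_BYTES  16

enum mem_org { ONE_WORD_WIDE, FOUR_WORD_WIDE, FOUR_BANK_INTERLEAVED };

/* Bus cycles needed to fetch one 4-word block on a cache miss. */
int miss_penalty(enum mem_org org)
{
    switch (org) {
    case ONE_WORD_WIDE:          /* 4 sequential DRAM accesses, 4 transfers */
        return ADDR_CYCLES + BLOCK_WORDS * DRAM_CYCLES + BLOCK_WORDS * XFER_CYCLES;
    case FOUR_WORD_WIDE:         /* one wide access, one wide transfer */
        return ADDR_CYCLES + DRAM_CYCLES + XFER_CYCLES;
    case FOUR_BANK_INTERLEAVED:  /* banks accessed in parallel, words sent one at a time */
        return ADDR_CYCLES + DRAM_CYCLES + BLOCK_WORDS * XFER_CYCLES;
    }
    return -1;                   /* not reached for valid inputs */
}

/* Bytes delivered per bus cycle during a block fetch. */
double bandwidth(enum mem_org org)
{
    return (double)BLOCK_BYTES / miss_penalty(org);
}
```

Under these assumptions `miss_penalty` returns 65, 17 and 20 bus cycles for the 1-word-wide, 4-word-wide and 4-bank interleaved organizations.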
Chapter 6 — Storage and Other I/O Topics — 11
Flash Storage
Nonvolatile semiconductor storage
  100× – 1000× faster than disk
  Smaller, lower power, more robust
  But more $/GB (between disk and DRAM)
§6.4 Flash Storage
Flash Types
NOR flash: bit cell like a NOR gate
  Random read/write access
  Used for instruction memory in embedded systems
NAND flash: bit cell like a NAND gate
  Denser (bits/area), but block-at-a-time access
  Cheaper per GB
  Used for USB keys, media storage, …
Flash bits wear out after 1000s of accesses
  Not suitable for direct RAM or disk replacement
  Wear leveling: remap data to less used blocks
Disk Storage
Nonvolatile, rotating magnetic storage
§6.3 Disk Storage
Disk Sectors and Access
Each sector records
  Sector ID
  Data (512 bytes, 4096 bytes proposed)
  Error correcting code (ECC)
    Used to hide defects and recording errors
  Synchronization fields and gaps
Access to a sector involves
  Queuing delay if other accesses are pending
  Seek: move the heads
  Rotational latency
  Data transfer
  Controller overhead
Disk Access Example
Given
  512B sector, 15,000rpm, 4ms average seek time, 100MB/s transfer rate, 0.2ms controller overhead, idle disk
Average read time
  4ms seek time
  + ½ / (15,000/60) = 2ms rotational latency
  + 512 / 100MB/s = 0.005ms transfer time
  + 0.2ms controller delay
  = 6.2ms
If actual average seek time is 1ms
  Average read time = 3.2ms
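The same arithmetic as a C sketch. It assumes MB means 10^6 bytes and, as in the slide, that the average rotational latency is half a revolution; `disk_read_ms` is a hypothetical helper name.

```c
/* Average time in milliseconds to read one sector from an idle disk:
   seek + half a revolution + transfer + controller overhead.
   Assumes 1 MB = 10^6 bytes. */
double disk_read_ms(double seek_ms, double rpm, double sector_bytes,
                    double transfer_mb_per_s, double controller_ms)
{
    double rotational_ms = 0.5 / (rpm / 60.0) * 1000.0;   /* half a revolution */
    double transfer_ms = sector_bytes / (transfer_mb_per_s * 1e6) * 1000.0;
    return seek_ms + rotational_ms + transfer_ms + controller_ms;
}
```

`disk_read_ms(4.0, 15000.0, 512.0, 100.0, 0.2)` returns about 6.205 ms, and a 1.0 ms seek gives about 3.205 ms. The transfer time (about 0.005 ms) is negligible next to seek and rotation.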
Disk Performance Issues
Manufacturers quote average seek time
  Based on all possible seeks
  Locality and OS scheduling lead to smaller actual average seek times
Smart disk controllers allocate physical sectors on disk
  Present logical sector interface to host
  SCSI, ATA, SATA
Disk drives include caches
  Prefetch sectors in anticipation of access
  Avoid seek and rotational delay
Cache Memory
Cache memory
  The level of the memory hierarchy closest to the CPU
Given accesses X1, …, Xn–1, Xn
  How do we know if the data is present?
  Where do we look?
§5.3 The Basics of Caches
Direct Mapped Cache
Location determined by address
Direct mapped: only one choice
  (Block address) modulo (#Blocks in cache)
#Blocks is a power of 2
Use low-order address bits
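When the number of blocks is a power of 2, the modulo and selecting the low-order address bits are the same operation. A minimal C sketch; the function names are hypothetical.

```c
#include <stdint.h>

/* Cache index for a direct-mapped cache:
   (block address) modulo (#blocks in cache). */
uint32_t dm_index_mod(uint32_t block_addr, uint32_t num_blocks)
{
    return block_addr % num_blocks;
}

/* Same result when num_blocks is a power of 2: keep only the
   low-order log2(num_blocks) bits, so no division is needed. */
uint32_t dm_index_mask(uint32_t block_addr, uint32_t num_blocks)
{
    return block_addr & (num_blocks - 1);
}
```

For example, with 8 blocks both functions map block address 22 (binary 10110) to index 6 (binary 110).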
Tags and Valid Bits
How do we know which particular block is stored in a cache location?
  Store block address as well as the data
  Actually, only need the high-order bits
  Called the tag
What if there is no data in a location?
  Valid bit: 1 = present, 0 = not present
  Initially 0
Cache Example
Cache Example
Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Miss      110
Cache Example
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
26         11 010       Miss      010
Cache Example
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Hit       110
26         11 010       Hit       010
Cache Example
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
16         10 000       Miss      000
3          00 011       Miss      011
16         10 000       Hit       000
Cache Example
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
18         10 010       Miss      010
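The access sequence in these tables can be replayed with a small direct-mapped cache model. This sketch assumes 8 one-word blocks indexed by word address, as in the tables. `struct line` and `dm_access` are illustrative names.

```c
#include <stdbool.h>

#define NUM_BLOCKS 8   /* 8 one-word blocks, as in the example */

struct line {
    bool valid;        /* initially false: no data in this entry */
    unsigned tag;      /* high-order bits of the word address */
};

/* Accesses word address addr and returns true on a hit. On a miss the
   block is fetched: the entry gets the new tag and is marked valid. */
bool dm_access(struct line cache[NUM_BLOCKS], unsigned addr)
{
    unsigned index = addr % NUM_BLOCKS;   /* low-order 3 bits */
    unsigned tag = addr / NUM_BLOCKS;     /* remaining high-order bits */

    if (cache[index].valid && cache[index].tag == tag)
        return true;
    cache[index].valid = true;
    cache[index].tag = tag;
    return false;
}
```

Starting from an all-invalid cache, the sequence 22, 26, 22, 26, 16, 3, 16, 18 gives miss, miss, hit, hit, miss, miss, hit, miss. The final access to 18 replaces tag 11 at index 010 with tag 10.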
Address Subdivision
Example: Larger Block Size
64 blocks, 16 bytes/block
  To what block number does address 1200 map?
Block address = ⌊1200/16⌋ = 75
Block number = 75 modulo 64 = 11
Tag        Index     Offset
31 … 10    9 … 4     3 … 0
22 bits    6 bits    4 bits
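The field extraction for this example can be done with shifts and masks. This sketch assumes 32-bit byte addresses. The macro and function names are illustrative.

```c
#include <stdint.h>

/* 16-byte blocks -> 4 offset bits; 64 blocks -> 6 index bits;
   the remaining 22 bits of a 32-bit address form the tag. */
#define OFFSET_BITS 4
#define INDEX_BITS  6

uint32_t addr_offset(uint32_t addr)
{
    return addr & ((1u << OFFSET_BITS) - 1);
}

uint32_t addr_index(uint32_t addr)
{
    return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
}

uint32_t addr_tag(uint32_t addr)
{
    return addr >> (OFFSET_BITS + INDEX_BITS);
}
```

For address 1200 the sketch yields offset 0, index 11 and tag 1, consistent with block number 11 above.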
Block Size Considerations
Larger blocks should reduce miss rate
  Due to spatial locality
But in a fixed-sized cache
  Larger blocks ⇒ fewer of them
    More competition ⇒ increased miss rate
  Larger blocks ⇒ pollution
Larger miss penalty
  Can override benefit of reduced miss rate
  Early restart and critical-word-first can help
Cache Misses
On cache hit, CPU proceeds normally
On cache miss
  Stall the CPU pipeline
  Fetch block from next level of hierarchy
  Instruction cache miss
    Restart instruction fetch
  Data cache miss
    Complete data access
Write
Write
Write Allocation
What should happen on a write miss?
Alternatives for write-through
  Allocate on miss: fetch the block
  Write around: don't fetch the block
    Since programs often write a whole block before reading it (e.g., initialization)
For write-back
  Usually fetch the block
Example: Intrinsity FastMATH
Embedded MIPS processor
  12-stage pipeline
  Instruction and data access on each cycle
Split cache: separate I-cache and D-cache
  Each 16KB: 256 blocks × 16 words/block
  D-cache: write-through or write-back
SPEC2000 miss rates
  I-cache: 0.4%
  D-cache: 11.4%
  Weighted average: 3.2%
Example: Intrinsity FastMATH
Main Memory Supporting Caches
Use DRAMs for main memory
  Fixed width (e.g., 1 word)
  Connected by fixed-width clocked bus
    Bus clock is typically slower than CPU clock
Example cache block read
  1 bus cycle for address transfer
  15 bus cycles per DRAM access
  1 bus cycle per data transfer
For 4-word block, 1-word-wide DRAM
  Miss penalty = 1 + 4×15 + 4×1 = 65 bus cycles
  Bandwidth = 16 bytes / 65 cycles = 0.25 B/cycle
Measuring Cache Performance
Components of CPU time
  Program execution cycles
    Includes cache hit time
  Memory stall cycles
    Mainly from cache misses
With simplifying assumptions:
$$
\text{Memory stall cycles}
= \frac{\text{Memory accesses}}{\text{Program}} \times \text{Miss rate} \times \text{Miss penalty}
= \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty}
$$
§5.4 Measuring and Improving Cache Performance
Cache Performance Example
Given
  I-cache miss rate = 2%
  D-cache miss rate = 4%
  Miss penalty = 100 cycles
  Base CPI (ideal cache) = 2
  Loads & stores are 36% of instructions
Miss cycles per instruction
  I-cache: 0.02 × 100 = 2
  D-cache: 0.36 × 0.04 × 100 = 1.44
Actual CPI = 2 + 2 + 1.44 = 5.44
  Ideal CPU is 5.44/2 = 2.72 times faster
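The example reduces to one formula, sketched in C below. It assumes the same miss penalty for both caches and exactly one D-cache access per load or store. `actual_cpi` is a hypothetical name.

```c
/* CPI including miss stalls for split I- and D-caches: every
   instruction is fetched through the I-cache, and a fraction mem_frac
   of instructions (loads and stores) also access the D-cache. */
double actual_cpi(double base_cpi, double icache_miss_rate,
                  double dcache_miss_rate, double mem_frac,
                  double miss_penalty)
{
    double i_stalls = icache_miss_rate * miss_penalty;
    double d_stalls = mem_frac * dcache_miss_rate * miss_penalty;
    return base_cpi + i_stalls + d_stalls;
}
```

`actual_cpi(2.0, 0.02, 0.04, 0.36, 100.0)` evaluates to 5.44, up to floating-point rounding.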
Average Access Time
Hit time is also important for performance
Average memory access time (AMAT)
  AMAT = Hit time + Miss rate × Miss penalty
Example
  CPU with 1ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%
  AMAT = 1 + 0.05 × 20 = 2ns
    2 cycles per instruction
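A one-function C sketch of the AMAT formula. The name is hypothetical, and all times must be in the same unit; here that unit is clock cycles.

```c
/* Average memory access time:
   AMAT = Hit time + Miss rate x Miss penalty. */
double amat(double hit_time, double miss_rate, double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}
```

`amat(1.0, 0.05, 20.0)` gives 2 cycles. With the 1ns clock, that is 2ns.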
Performance Summary
When CPU performance increases
  Miss penalty becomes more significant
Decreasing base CPI
  Greater proportion of time spent on memory stalls
Increasing clock rate
  Memory stalls account for more CPU cycles
Can't neglect cache behavior when evaluating system performance
Associative Caches
Fully associative
  Allow a given block to go in any cache entry
  Requires all entries to be searched at once
  Comparator per entry (expensive)
n-way set associative
  Each set contains n entries
  Block number determines which set
    (Block number) modulo (#Sets in cache)
  Search all entries in a given set at once
  n comparators (less expensive)
Associative Cache Example
Spectrum of Associativity
For a cache with 8 entries
Associativity Example
Compare 4-block caches
  Direct mapped, 2-way set associative, fully associative
  Block access sequence: 0,
Associativity Example
2-way set associative
Block    Cache   Hit/miss  Cache content after access
address  index             Set 0            Set 1
0        0       miss      Mem[0]
8        0       miss      Mem[0]  Mem[8]
0        0       hit       Mem[0]  Mem[8]
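The effect of associativity on these accesses can be checked with a small simulator. This sketch models a 4-block cache with a configurable number of ways. It uses LRU replacement that prefers an invalid entry, and it is driven only by the accesses visible in the table (0, 8, 0). `struct cache`, `cache_init` and `cache_access` are illustrative names.

```c
#include <stdbool.h>

#define NUM_BLOCKS 4          /* total cache blocks, as in the example */

/* ways = 1: direct mapped; 2: 2-way set associative; 4: fully associative.
   Set s occupies slots [s * ways, s * ways + ways). */
struct cache {
    int ways;
    int block[NUM_BLOCKS];     /* block address held, -1 if invalid */
    long last_use[NUM_BLOCKS]; /* time of most recent access, for LRU */
    long now;
};

void cache_init(struct cache *c, int ways)
{
    c->ways = ways;
    c->now = 0;
    for (int i = 0; i < NUM_BLOCKS; ++i) {
        c->block[i] = -1;
        c->last_use[i] = 0;
    }
}

/* Returns true on a hit; on a miss the block replaces the victim entry. */
bool cache_access(struct cache *c, int block_addr)
{
    int num_sets = NUM_BLOCKS / c->ways;
    int first = (block_addr % num_sets) * c->ways;  /* (block number) mod #sets */
    int victim = first;

    c->now++;
    for (int i = first; i < first + c->ways; ++i) {  /* search the whole set */
        if (c->block[i] == block_addr) {
            c->last_use[i] = c->now;
            return true;
        }
        /* keep an invalid victim; otherwise move to an invalid or older entry */
        if (c->block[victim] != -1 &&
            (c->block[i] == -1 || c->last_use[i] < c->last_use[victim]))
            victim = i;
    }
    c->block[victim] = block_addr;
    c->last_use[victim] = c->now;
    return false;
}
```

For 0, 8, 0 the direct-mapped configuration (`ways` = 1) misses three times. The 2-way and fully associative configurations (`ways` = 2 and 4) miss only twice, because blocks 0 and 8 map to the same set but can coexist once a set holds two entries.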
How Much Associativity
Increased associativity decreases miss rate
  But with diminishing returns
Simulation of a system with 64KB D-cache, 16-word blocks, SPEC2000
  1-way: 10.3%
  2-way: …
  …-way: …
Set Associative Cache Organization
Replacement Policy
Direct mapped: no choice
Set associative
  Prefer non-valid entry, if there is one
  Otherwise, choose among entries in the set
Least-recently used (LRU)
  Choose the one unused for the longest time
    Simple for 2-way, manageable for 4-way, too hard beyond that
Random
  Gives approximately the same performance as LRU for high associativity
Multilevel Caches
Primary cache attached to CPU
  Small, but fast
Level-2 cache services misses from primary cache
  Larger, slower, but still faster than main memory
Main memory services L-2 cache misses
Some high-end systems include L-3 cache
Multilevel Cache Example
Given
  CPU base CPI = 1, clock rate = 4GHz
  Miss rate/instruction = 2%
  Main memory access time = 100ns
With just primary cache
  Miss penalty = 100ns/0.25ns = 400 cycles
  Effective CPI = 1 + 0.02 × 400 = 9
Example (cont.)
Now add L-2 cache
  Access time = 5ns
  Global miss rate to main memory = 0.5%
Primary miss with L-2 hit
  Penalty = 5ns/0.25ns = 20 cycles
Primary miss with L-2 miss
  Extra penalty = 400 cycles (the main memory access)
CPI = 1 + 0.02 × 20 + 0.005 × 400 = 3.4
Performance ratio = 9/3.4 = 2.6
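Both CPI figures follow from the same accounting, sketched in C below with hypothetical function names. The sketch assumes that a primary miss which also misses in L-2 pays the L-2 access plus the full main memory access, which is how the CPI formula above counts it.

```c
/* Converts a latency in ns to clock cycles at the given clock rate. */
double ns_to_cycles(double ns, double clock_ghz)
{
    return ns * clock_ghz;
}

/* Primary cache only: every primary miss goes to main memory. */
double cpi_l1_only(double base_cpi, double l1_miss_rate, double mem_cycles)
{
    return base_cpi + l1_miss_rate * mem_cycles;
}

/* Primary + L-2: every primary miss pays the L-2 access; the misses that
   also miss in L-2 (global miss rate) additionally pay for main memory. */
double cpi_l1_l2(double base_cpi, double l1_miss_rate, double l2_cycles,
                 double global_miss_rate, double mem_cycles)
{
    return base_cpi + l1_miss_rate * l2_cycles + global_miss_rate * mem_cycles;
}
```

At 4GHz, `ns_to_cycles(100, 4)` is 400 and `ns_to_cycles(5, 4)` is 20. `cpi_l1_only(1, 0.02, 400)` gives 9 and `cpi_l1_l2(1, 0.02, 20, 0.005, 400)` gives 3.4, a ratio of about 2.6.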
Multilevel Cache Considerations
Primary cache
  Focus on minimal hit time
L-2 cache
  Focus on low miss rate to avoid main memory access
  Hit time has less overall impact
Results
  L-1 cache usually smaller than a single cache
  L-1 block size smaller than L-2 block size
Interactions with Advanced CPUs
Out-of-order CPUs can execute instructions during cache miss
  Pending store stays in load/store unit
  Dependent instructions wait in reservation stations
    Independent instructions continue
Effect of miss depends on program data flow
  Much harder to analyse
  Use system simulation
Interactions with Software
Misses depend on memory access patterns
  Algorithm behavior
  Compiler optimization for memory access
Software Optimization via Blocking
Goal: maximize accesses to data before it is replaced
Consider inner loops of DGEMM:
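void dgemm (int n, double* A, double* B, double* C)
{
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
    {
      double cij = C[i+j*n];          /* cij = C[i][j] */
      for( int k = 0; k < n; k++ )
        cij += A[i+k*n] * B[k+j*n];   /* cij += A[i][k]*B[k][j] */
      C[i+j*n] = cij;                 /* C[i][j] = cij */
    }
}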
DGEMM Access Pattern
[Figure: access pattern of the C, A, and B arrays; legend: older accesses, new accesses]
Cache Blocked DGEMM
#define BLOCKSIZE 32
void do_block (int n, int si, int sj, int sk, double *A, double *B, double *C)
{
  for (int i = si; i < si+BLOCKSIZE; ++i)
    for (int j = sj; j < sj+BLOCKSIZE; ++j)
    {
      double cij = C[i+j*n];                  /* cij = C[i][j] */
      for( int k = sk; k < sk+BLOCKSIZE; k++ )
        cij += A[i+k*n] * B[k+j*n];           /* cij+=A[i][k]*B[k][j] */
      C[i+j*n] = cij;                         /* C[i][j] = cij */
    }
}
void dgemm (int n, double* A, double* B, double* C)
{
  /* assumes n is a multiple of BLOCKSIZE */
  for ( int sj = 0; sj < n; sj += BLOCKSIZE )
    for ( int si = 0; si < n; si += BLOCKSIZE )
      for ( int sk = 0; sk < n; sk += BLOCKSIZE )
        do_block(n, si, sj, sk, A, B, C);
}
Blocked DGEMM Access Pattern
[Figure: access pattern, unoptimized vs. blocked]
Dependability
Fault: failure of a component
  May or may not lead to system failure
[Figure: two states, Service accomplishment (service delivered as specified) and Service interruption (deviation from specified service); a Failure moves the system from the first to the second, a Restoration moves it back]
§5.5 Dependable Memory Hierarchy
Dependability Measures
Reliability: mean time to failure (MTTF)
Service interruption: mean time to repair (MTTR)
Mean time between failures
  MTBF = MTTF + MTTR
Availability = MTTF / (MTTF + MTTR)
Improving Availability
  Increase MTTF: fault avoidance, fault tolerance, fault forecasting
  Reduce MTTR: improved tools and processes for diagnosis and repair
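The two formulas as a C sketch. The function names and the sample numbers in the usage note are hypothetical.

```c
/* MTTF and MTTR must be given in the same time unit (e.g. hours). */
double mtbf(double mttf, double mttr)
{
    return mttf + mttr;
}

/* Fraction of time the service is delivered as specified. */
double availability(double mttf, double mttr)
{
    return mttf / (mttf + mttr);
}
```

For instance, a component with an MTTF of 999 hours and an MTTR of 1 hour has an MTBF of 1000 hours and an availability of 0.999.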
The Hamming SEC Code
Hamming distance
  Number of bits that are different between two bit patterns
Minimum distance = 2 provides single bit error detection
  E.g. parity code
Minimum distance = 3 provides single error correction, 2 bit error detection
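Hamming distance is the number of 1 bits in the XOR of two patterns. The sketch below also appends an even parity bit to show why a parity code has minimum distance 2. Both function names are hypothetical.

```c
#include <stdint.h>

/* Number of bit positions in which a and b differ: XOR leaves a 1
   exactly where they differ, and the loop counts those 1s. */
int hamming_distance(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    int count = 0;
    while (diff) {
        diff &= diff - 1;   /* clear the lowest set bit */
        ++count;
    }
    return count;
}

/* Appends an even parity bit to an 8-bit value, giving a 9-bit code
   word with an even number of 1s. Any two distinct code words then
   differ in at least 2 bits, so one flipped bit is always detected. */
uint32_t with_parity(uint8_t data)
{
    uint32_t p = (uint32_t)hamming_distance(data, 0) & 1u;  /* 1 if odd # of 1s */
    return ((uint32_t)data << 1) | p;
}
```

Data words 0x9A and 0x9B differ in 1 bit. After `with_parity` they differ in 2 bits.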
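Encoding SEC
To calculate Hamming code:
  Number bits from 1 on the left
  All bit positions that are a power of 2 are parity bits
  Each parity bit checks certain data bits: the parity bit at position 2^k covers every position whose binary index has bit k set (e.g., bit 1 covers positions 1, 3, 5, 7, …)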
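The encoding procedure can be sketched in C for an 8-bit data word, which needs a 12-bit code word. The bit layout, the use of even parity and the function names are assumptions for illustration. The sketch relies on the coverage rule above: parity bit 2^k must make the count of 1s even over the positions whose index has bit k set.

```c
/* Bit positions are numbered 1..12 from the left; position p is kept in
   bit (12 - p) of the result so the code word reads left to right. */
static int is_power_of_two(unsigned p)
{
    return p != 0 && (p & (p - 1)) == 0;
}

unsigned hamming_encode8(unsigned data)
{
    unsigned code = 0;
    unsigned xor_of_positions = 0;  /* XOR of positions holding a 1 */
    int d = 7;                      /* next data bit, most significant first */

    /* Data bits go to the non-power-of-2 positions 3,5,6,7,9,10,11,12. */
    for (unsigned pos = 1; pos <= 12; ++pos) {
        if (is_power_of_two(pos))
            continue;               /* parity slot, filled in below */
        if ((data >> d--) & 1u) {
            code |= 1u << (12 - pos);
            xor_of_positions ^= pos;
        }
    }

    /* Bit k of the XOR is the parity of the data bits whose position has
       bit k set, which is exactly the even-parity value for bit 2^k. */
    for (unsigned k = 0; k < 4; ++k)
        if ((xor_of_positions >> k) & 1u)
            code |= 1u << (12 - (1u << k));
    return code;
}
```

Encoding the data word 10011010 (`0x9A`) gives 011100101010 (`0x72A`).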
Decoding SEC
Value of parity bits indicates which bits are in error
  Use numbering from encoding procedure
  E.g.
    Parity bits = 0000 indicates no error
    Parity bits = 1010 indicates bit 10 was flipped
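Decoding uses the same numbering. In this C sketch, the 12-bit code word stores position p (counted from 1 on the left) in bit 12 − p. Each recomputed parity check k covers the positions whose index has bit k set, so the four check bits together equal the XOR of the positions of all 1 bits. The function names are hypothetical.

```c
/* Syndrome of a 12-bit SEC code word: 0 means no error; otherwise it is
   the position of the single flipped bit. */
unsigned hamming_syndrome12(unsigned code)
{
    unsigned s = 0;
    for (unsigned pos = 1; pos <= 12; ++pos)
        if ((code >> (12 - pos)) & 1u)
            s ^= pos;
    return s;
}

/* Flips back the bit named by the syndrome, if it is a valid position. */
unsigned hamming_correct12(unsigned code)
{
    unsigned s = hamming_syndrome12(code);
    if (s >= 1 && s <= 12)
        code ^= 1u << (12 - s);
    return code;
}
```

For the code word 011100101010 the syndrome is 0. Flipping bit 10 gives 011100101110, whose syndrome is 10 (binary 1010), and `hamming_correct12` flips that bit back.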
SEC/DED Code
Add an additional parity bit for the whole word (pn)
Make Hamming distance = 4
Decoding:
  Let H = SEC parity check bits, pn = parity check over the whole word
    H = 0, pn even: no error
    H ≠ 0, pn odd: correctable single bit error
    H = 0, pn odd: error in pn bit
    H ≠ 0, pn even: double error occurred
Note: ECC DRAM uses SEC/DED with 8 bits protecting each 64 bits
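A C sketch of this decoding table. It assumes a 12-bit SEC code word that stores position p (counted from 1 on the left) in bit 12 − p, plus a separate pn bit chosen so that the whole 13-bit word has even parity. The enum and function names are hypothetical.

```c
enum secded_result { SECDED_OK, SECDED_SINGLE, SECDED_PN, SECDED_DOUBLE };

enum secded_result secded_classify(unsigned code12, unsigned pn)
{
    unsigned h = 0;             /* SEC parity check bits (syndrome) */
    unsigned overall = pn & 1u; /* parity over all 13 bits: 1 = odd */

    for (unsigned pos = 1; pos <= 12; ++pos) {
        if ((code12 >> (12 - pos)) & 1u) {
            h ^= pos;
            overall ^= 1u;
        }
    }
    if (h == 0)
        return overall ? SECDED_PN : SECDED_OK;      /* only pn wrong, or no error */
    return overall ? SECDED_SINGLE : SECDED_DOUBLE;  /* odd: 1 error; even: 2 errors */
}
```

Take code word 0x72A with pn = 0. Flipping one code bit reports a correctable single error, flipping pn alone reports an error in pn, and flipping two code bits reports a double error.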
Virtual Machines
Host computer emulates guest operating system and machine resources
  Improved isolation of multiple guests
  Avoids security and reliability problems
  Aids sharing of resources
Virtualization has some performance impact
  Feasible with modern high-performance computers
Examples
  IBM VM/370 (1970s technology!)
  VMware
  Microsoft Virtual PC
§5.6 Virtual Machines
Virtual Machine Monitor
Maps virtual resources to physical resources
  Memory, I/O devices, CPUs
Guest code runs on native machine in user mode
  Traps to VMM on privileged instructions and access to protected resources
Guest OS may be different from host OS
VMM handles real I/O devices
  Emulates generic virtual I/O devices for guest
Example: Timer Virtualization
In native machine, on timer interrupt
  OS suspends current process, handles interrupt, selects and resumes next process
With Virtual Machine Monitor
  VMM suspends current VM, handles interrupt, selects and resumes next VM
If a VM requires timer interrupts
  VMM emulates a virtual timer
  Emulates interrupt for VM when physical timer interrupt occurs
Instruction Set Support
User and System modes
Privileged instructions only available in system mode
  Trap to system if executed in user mode
All physical resources only accessible using privileged instructions
  Including page tables, interrupt controls, I/O registers
Renaissance of virtualization support
  Current ISAs (e.g.,
6irtal Memory
- Use main memory as a "cache" for secondary (disk) storage
  - Managed jointly by CPU hardware and the operating system (OS)
- Programs share main memory
  - Each gets a private virtual address space holding its frequently used code and data
  - Protected from other programs
- CPU and OS translate virtual addresses to physical addresses
  - VM "block" is called a page
  - VM translation "miss" is called a page fault
Address Translation
- Fixed-size pages (e.g., 4K)
Page Fault Penalty
- On page fault, the page must be fetched from disk
  - Takes millions of clock cycles
  - Handled by OS code
- Try to minimize page fault rate
  - Fully associative placement
  - Smart replacement algorithms
Page Tables
- Stores placement information
  - Array of page table entries, indexed by virtual page number
  - Page table register in CPU points to page table in physical memory
- If page is present in memory
  - PTE stores the physical page number
  - Plus other status bits (referenced, dirty, …)
- If page is not present
  - PTE can refer to location in swap space on disk
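A minimal sketch of a page-table lookup, assuming 4 KiB pages (12 offset bits) and a Python list standing in for the page table that the page table register points to. The `PTE` fields mirror the slide (present bit, physical page number, swap location, referenced and dirty bits), but the layout and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_BITS = 12  # assumed 4 KiB pages


@dataclass
class PTE:
    valid: bool = False               # page present in physical memory?
    ppn: Optional[int] = None         # physical page number, if present
    swap_block: Optional[int] = None  # location in swap space, if not present
    referenced: bool = False
    dirty: bool = False


class PageFault(Exception):
    pass


def translate(page_table, vaddr, is_write=False):
    """page_table plays the role of the table the page table register points to."""
    vpn = vaddr >> PAGE_BITS                 # virtual page number indexes the table
    offset = vaddr & ((1 << PAGE_BITS) - 1)  # page offset passes through unchanged
    pte = page_table[vpn]
    if not pte.valid:
        raise PageFault(vpn)                 # OS would use pte.swap_block to find it
    pte.referenced = True
    if is_write:
        pte.dirty = True
    return (pte.ppn << PAGE_BITS) | offset
```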
Translation Using a Page Table
Mapping Pages to Storage
Replacement and Writes
- To reduce page fault rate, prefer least-recently used (LRU) replacement
  - Reference bit (aka use bit) in PTE set to 1 on access to page
  - Periodically cleared to 0 by OS
  - A page with reference bit = 0 has not been used recently
- Disk writes take millions of cycles
  - Block at once, not individual locations
  - Write through is impractical
  - Use write-back
  - Dirty bit in PTE set when page is written
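The reference-bit approximation of LRU and the write-back rule can be sketched as below. Assumptions: frames are plain Python objects, the OS calls the hypothetical `clear_reference_bits` on some periodic schedule, and `choose_victim` simply takes the first page whose reference bit is 0 (a real OS typically scans more cleverly, e.g. with a clock hand).

```python
from dataclasses import dataclass


@dataclass
class Frame:
    vpn: int
    ref: bool = False    # reference (use) bit: set to 1 by hardware on access
    dirty: bool = False  # dirty bit: set by hardware when the page is written


def clear_reference_bits(frames):
    # Run periodically by the OS: every reference bit goes back to 0.
    for f in frames:
        f.ref = False


def choose_victim(frames):
    # A page whose reference bit is 0 has not been used recently.
    for i, f in enumerate(frames):
        if not f.ref:
            return i
    return 0  # every page was used recently; fall back to the first frame


def evict(frames, i, disk_writes):
    # Write-back: only a dirty page is copied to disk, as a whole page.
    victim = frames.pop(i)
    if victim.dirty:
        disk_writes.append(victim.vpn)
    return victim.vpn
```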
Fast Translation Using a TLB
- Address translation would appear to require extra memory references
  - One to access the PTE
  - Then the actual memory access
- But access to page tables has good locality
  - So use a fast cache of PTEs within the CPU
  - Called a Translation Look-aside Buffer (TLB)
  - Typical: 16–512 PTEs, 0.5–1 cycle for hit, 10–100 cycles for miss, 0.01%–1% miss rate
  - Misses could be handled by hardware or software
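A minimal sketch of a small, fully associative TLB in front of a page table, assuming 4 KiB pages and a dict that maps virtual page numbers to physical page numbers (a missing key stands in for a page fault). LRU replacement inside the TLB is an assumption made for simplicity; the class and field names are hypothetical.

```python
from collections import OrderedDict

PAGE_BITS = 12  # assumed 4 KiB pages


class TLB:
    def __init__(self, entries=16):
        self.entries = entries
        self.cache = OrderedDict()  # vpn -> ppn, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr, page_table):
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)   # hit: no page-table access needed
        else:
            self.misses += 1
            ppn = page_table[vpn]         # miss: the extra memory reference for the PTE
            if len(self.cache) >= self.entries:
                self.cache.popitem(last=False)  # evict the least recently used entry
            self.cache[vpn] = ppn
        return (self.cache[vpn] << PAGE_BITS) | offset
```

With `entries=1`, two accesses to the same page give one miss then one hit, and an access to another page evicts the first translation.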
Fast Translation Using a TLB
TLB Misses
- If page is in memory
  - Load the PTE from memory and retry
  - Could be handled in hardware
    - Can get complex for more complicated page table structures
  - Or in software
    - Raise a special exception, with optimized handler
- If page is not in memory (page fault)
  - OS handles fetching the page and updating the page table
  - Then restart the faulting instruction
TLB Miss Handler
- TLB miss indicates
  - Page present, but PTE not in TLB
  - Page not present
- Must recognize TLB miss before destination register overwritten
  - Raise exception
- Handler copies PTE from memory to TLB
  - Then restarts instruction
  - If page not present, page fault will occur
Page Fault Handler
- Use faulting virtual address to find PTE
- Locate page on disk
- Choose page to replace
  - If dirty, write to disk first
- Read page into memory and update page table
- Make process runnable again
  - Restart from faulting instruction
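The TLB miss and page fault handling steps can be tied together in one toy model. Everything here is hypothetical: the TLB is a dict, "disk" is a dict of swap blocks, the eviction policy always picks frame 0, and "restarting the instruction" is just retrying the lookup after the handler returns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PTE:
    disk_block: int            # where the page lives in swap space
    valid: bool = False        # page present in memory?
    ppn: Optional[int] = None  # physical frame, if present
    dirty: bool = False


class Machine:
    def __init__(self, num_frames, page_table, disk):
        self.page_table = page_table       # vpn -> PTE
        self.disk = disk                   # disk_block -> page contents
        self.frames = [None] * num_frames  # frame -> [vpn, contents]
        self.free = list(range(num_frames))
        self.tlb = {}                      # vpn -> ppn
        self.log = []

    def tlb_miss_handler(self, vpn):
        pte = self.page_table[vpn]
        if pte.valid:                      # page present, PTE just not in the TLB
            self.tlb[vpn] = pte.ppn        # copy PTE from memory to TLB
            self.log.append(("tlb_refill", vpn))
        else:                              # page not present: page fault
            self.page_fault_handler(vpn)

    def page_fault_handler(self, vpn):
        pte = self.page_table[vpn]         # found via the faulting virtual address
        frame = self.free.pop() if self.free else self.evict()
        self.frames[frame] = [vpn, self.disk[pte.disk_block]]  # read page in
        pte.valid, pte.ppn = True, frame   # update the page table
        self.tlb[vpn] = frame
        self.log.append(("page_fault", vpn))

    def evict(self):
        frame = 0                          # hypothetical policy: always frame 0
        old_vpn, contents = self.frames[frame]
        old = self.page_table[old_vpn]
        if old.dirty:                      # if dirty, write to disk first
            self.disk[old.disk_block] = contents
            self.log.append(("write_back", old_vpn))
        old.valid, old.ppn, old.dirty = False, None, False
        self.tlb.pop(old_vpn, None)        # drop the stale translation
        return frame

    def load(self, vpn):
        if vpn not in self.tlb:
            self.tlb_miss_handler(vpn)
        return self.frames[self.tlb[vpn]][1]  # "restart": now hits in the TLB

    def store(self, vpn, value):
        if vpn not in self.tlb:
            self.tlb_miss_handler(vpn)
        self.frames[self.tlb[vpn]][1] = value
        self.page_table[vpn].dirty = True  # dirty bit set when page is written
```

With a single physical frame, touching page 0, dirtying it, then touching page 1 forces a write-back of page 0 before page 1 is read in.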
TLB and Cache Interaction
- If cache tag uses physical address
  - Need to translate before cache lookup
- Alternative: use virtual address tag
  - Complications due to aliasing
    - Different virtual addresses for shared physical address
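A small worked example of the aliasing problem, assuming 4 KiB pages and a 16 KiB direct-mapped cache with 64-byte blocks (256 sets) that is indexed with virtual addresses. The mapping of virtual pages 0x1 and 0x3 to physical page 0x7 is made up for illustration, and the function names are hypothetical.

```python
PAGE_BITS = 12  # assumed 4 KiB pages


def set_index(addr, block_bytes, num_sets):
    # Set index = block address modulo the number of sets.
    return (addr // block_bytes) % num_sets


def physical_address(vaddr, ppn):
    # Translation keeps the page offset and replaces the virtual page number.
    return (ppn << PAGE_BITS) | (vaddr & ((1 << PAGE_BITS) - 1))


def index_uses_translated_bits(block_bytes, num_sets):
    # If the index and offset bits reach above the page offset, two virtual
    # aliases of one physical page can select different sets.
    return block_bytes * num_sets > (1 << PAGE_BITS)


# Hypothetical mapping: virtual pages 0x1 and 0x3 both map to physical page 0x7.
va1, va2, shared_ppn = 0x1040, 0x3040, 0x7
```

Both virtual addresses name physical byte 0x7040, yet they select sets 65 and 193, so the same data could sit in the cache twice. When the cache size per way does not exceed the page size, the index comes entirely from the page offset and this cannot happen.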
Memory Protection
- Different tasks can share parts of their virtual address spaces
  - But need to protect against errant access
  - Requires OS assistance
- Hardware support for OS protection
  - Privileged supervisor mode (aka kernel mode)
  - Privileged instructions
  - Page tables and other state information only accessible in supervisor mode
  - System call exception (e.g., syscall in MIPS)
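A sketch of how per-page permission bits let the hardware enforce protection, assuming hypothetical `user` and `writable` flags in each PTE (real ISAs define their own PTE formats). A page shared read-only between tasks is mapped user-accessible but not writable; kernel pages are reachable only in supervisor mode.

```python
from dataclasses import dataclass


@dataclass
class PTE:
    ppn: int
    writable: bool = False
    user: bool = False  # False: accessible only in supervisor (kernel) mode


class ProtectionFault(Exception):
    pass


def check_access(pte, mode, is_write):
    # Hypothetical permission check performed on every translation.
    if mode == "user" and not pte.user:
        raise ProtectionFault("supervisor-only page")
    if is_write and not pte.writable:
        raise ProtectionFault("read-only page")
    return pte.ppn
```

Because user code cannot execute the privileged instructions that modify page tables, it cannot simply flip these bits; it must ask the OS through a system call.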
The Memory Hierarchy
- Common principles apply at all levels of the memory hierarchy
  - Based on notions of caching
- At each level in the hierarchy
  - Block placement
  - Finding a block
  - Replacement on a miss
  - Write policy
5.8 A Common Framework for Memory Hierarchies
The BIG Picture
Block Placement
- Determined by associativity
  - Direct mapped (1-way associative)
    - One choice for placement
  - n-way set associative
    - n choices within a set
  - Fully associative
    - Any location
- Higher associativity reduces miss rate
  - Increases complexity, cost, and access time
Finding a Block
- Hardware caches
  - Reduce comparisons to reduce cost
- Virtual memory
  - Full table lookup makes full associativity feasible
  - Benefit in reduced miss rate

Associativity         | Location method                                | Tag comparisons
Direct mapped         | Index                                          | 1
n-way set associative | Set index, then search entries within the set | n
Fully associative     | Search all entries                             | #entries
Fully associative     | Full lookup table                              | 0
Replacement
- Choice of entry to replace on a miss
  - Least recently used (LRU)
    - Complex and costly hardware for high associativity
  - Random
    - Close to LRU, easier to implement
- Virtual memory
  - LRU approximation with hardware support
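The placement, lookup, and replacement rules above fit in one small simulator. This sketch assumes block addresses as input and LRU replacement within each set; `ways=1` gives a direct-mapped cache and `ways=num_blocks` a fully associative one. The class and function names are hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """ways=1 is direct mapped; ways=num_blocks is fully associative."""

    def __init__(self, num_blocks, ways):
        assert num_blocks % ways == 0
        self.ways = ways
        self.num_sets = num_blocks // ways
        # One LRU-ordered group of tags per set (least recently used first).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, block_addr):
        index = block_addr % self.num_sets  # placement: the one set it may use
        tag = block_addr // self.num_sets
        s = self.sets[index]
        if tag in s:                        # finding: compare with up to n tags
            s.move_to_end(tag)              # mark as most recently used
            return True
        self.misses += 1
        if len(s) == self.ways:
            s.popitem(last=False)           # replacement: evict the LRU block
        s[tag] = None
        return False


def misses_for(trace, num_blocks, ways):
    cache = SetAssociativeCache(num_blocks, ways)
    for block in trace:
        cache.access(block)
    return cache.misses
```

For the block-address trace 0, 8, 0, 6, 8 in a 4-block cache, it counts 5 misses direct mapped, 4 with 2-way set associativity, and 3 fully associative, which illustrates how higher associativity reduces the miss rate.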
Write Policy
- Write-through
  - Update both upper and lower levels
  - Simplifies replacement, but may require write buffer
- Write-back
  - Update upper level only
  - Update lower level when block is replaced
  - Need to keep more state
- Virtual memory
  - Only write-back is feasible, given disk write latency
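A toy comparison of how many writes reach the lower level under each policy, assuming a direct-mapped cache, write allocate for both policies, and a trace of `(op, block)` pairs. Dirty blocks still in the cache at the end of the trace are not flushed, so the write-back count is a lower bound. The function name is hypothetical.

```python
def memory_writes(trace, num_blocks, policy):
    tags = [None] * num_blocks
    dirty = [False] * num_blocks
    writes = 0
    for op, block in trace:
        i = block % num_blocks
        if tags[i] != block:             # miss: replace the block in slot i
            if policy == "write-back" and dirty[i]:
                writes += 1              # write back the dirty victim
            tags[i], dirty[i] = block, False
        if op == "w":
            if policy == "write-through":
                writes += 1              # update the lower level on every write
            else:
                dirty[i] = True          # update the upper level only
    return writes
```

Four writes to block 0 followed by a read of block 4 (which maps to the same slot) cost four lower-level writes with write-through but a single write-back.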
Sources of Misses
- Compulsory misses (aka cold start misses)
  - First access to a block
- Capacity misses
  - Due to finite cache size
  - A replaced block is later accessed again
- Conflict misses (aka collision misses)
  - In a non-fully associative cache
  - Due to competition for entries in a set
  - Would not occur in a fully associative cache of the same total size
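One common way to attribute misses to the three categories is to replay the trace against a fully associative LRU cache of the same total size: a first access is compulsory, a miss the fully associative cache would also take is capacity, and the rest are conflict. The sketch below follows that method; LRU replacement and the function name are assumptions.

```python
from collections import OrderedDict


def classify_misses(trace, num_blocks, ways):
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # the real cache, LRU per set
    fully = OrderedDict()  # shadow fully associative LRU cache of the same size
    seen = set()
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for block in trace:
        first_access = block not in seen
        seen.add(block)
        # Would a fully associative cache of the same total size have hit?
        fa_hit = block in fully
        if fa_hit:
            fully.move_to_end(block)
        else:
            if len(fully) == num_blocks:
                fully.popitem(last=False)
            fully[block] = None
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)
            continue  # hit in the real cache: nothing to classify
        if len(s) == ways:
            s.popitem(last=False)
        s[block] = None
        if first_access:
            counts["compulsory"] += 1
        elif not fa_hit:
            counts["capacity"] += 1
        else:
            counts["conflict"] += 1
    return counts
```

For the trace 0, 8, 0, 6, 8 in a 4-block direct-mapped cache this reports 3 compulsory and 2 conflict misses: a fully associative cache of 4 blocks would have hit on the repeated 0 and 8.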
Cache Design Trade-offs
Design change          | Effect on miss rate        | Negative performance effect
Increase cache size    | Decrease capacity misses   | May increase access time
Increase associativity | Decrease conflict misses   | May increase access time
Increase block size    | Decrease compulsory misses | Increases miss penalty. For very large block size, may increase miss rate due to pollution.
Cache Control
- Example cache characteristics
  - Direct-mapped, write-back, write allocate
  - Block size: 4 words (16 bytes)
  - Cache size: 16 KB (1024 blocks)
  - 32-bit byte addresses
  - Valid bit and dirty bit per block
  - Blocking cache
    - CPU waits until access is complete
5.9 Using a Finite State Machine to Control a Simple Cache
Tag (bits 31–14, 18 bits) | Index (bits 13–4, 10 bits) | Offset (bits 3–0, 4 bits)
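The field widths follow from the cache parameters: 16-byte blocks need 4 offset bits, 1024 blocks need 10 index bits, and the remaining 32 - 14 = 18 bits form the tag. A minimal sketch of the split (the function name is hypothetical):

```python
OFFSET_BITS = 4                           # 16-byte (4-word) blocks
INDEX_BITS = 10                           # 16 KB / 16 B = 1024 blocks
TAG_BITS = 32 - INDEX_BITS - OFFSET_BITS  # 18


def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)                 # bits 3..0
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)  # bits 13..4
    tag = addr >> (OFFSET_BITS + INDEX_BITS)                 # bits 31..14
    return tag, index, offset
```

For example, address 0x12345678 splits into tag 0x48D1, index 0x167, and offset 0x8.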
Interface Signals
- CPU ↔ Cache: Read/Write, Valid, Address (32 bits), Write Data (32 bits), Read Data (32 bits), Ready
- Cache ↔ Memory: Read/Write, Valid, Address (32 bits), Write Data (128 bits), Read Data (128 bits), Ready
  - Multiple cycles per access
Finite State Machines
- Use an FSM to sequence control steps
- Set of states, transition on each clock edge
  - State values are binary encoded
  - Current state stored in a register
  - Next state = f_n(current state, current inputs)
- Control output signals = f_o(current state)
Cache Controller FSM
Could partition into separate states to reduce clock cycle time
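A sketch of the simple cache controller as a next-state function f_n and an output function f_o, assuming the four states Idle, Compare Tag, Write-Back, and Allocate for this example cache. The input and output signal names are hypothetical stand-ins for the controller's real signals, and the state register is just a Python variable updated once per simulated clock edge.

```python
IDLE, COMPARE_TAG, WRITE_BACK, ALLOCATE = "Idle", "Compare Tag", "Write-Back", "Allocate"


def next_state(state, inputs):
    """f_n: next state from the current state and current inputs.

    inputs keys (hypothetical names): cpu_request_valid, hit,
    old_block_dirty, memory_ready.
    """
    if state == IDLE:
        return COMPARE_TAG if inputs["cpu_request_valid"] else IDLE
    if state == COMPARE_TAG:
        if inputs["hit"]:
            return IDLE  # hit: access complete
        return WRITE_BACK if inputs["old_block_dirty"] else ALLOCATE
    if state == WRITE_BACK:
        return ALLOCATE if inputs["memory_ready"] else WRITE_BACK
    if state == ALLOCATE:
        return COMPARE_TAG if inputs["memory_ready"] else ALLOCATE
    raise ValueError(state)


def outputs(state):
    """f_o: control outputs depend only on the current state."""
    return {
        IDLE: set(),
        COMPARE_TAG: {"compare_tags"},
        WRITE_BACK: {"memory_write"},
        ALLOCATE: {"memory_read"},
    }[state]


def run(input_sequence, state=IDLE):
    """Apply one transition per clock edge and record the visited states."""
    trace = [state]
    for inputs in input_sequence:
        state = next_state(state, inputs)
        trace.append(state)
    return trace
```

A miss on a dirty block walks Idle → Compare Tag → Write-Back → Allocate → Compare Tag → Idle, staying in Write-Back or Allocate for as many cycles as memory takes to become ready.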
Cache Coherence Problem
- Suppose two CPU cores share a physical address space
  - Write-through caches
5.10 Parallelism and Memory Hierarchies: Cache Coherence
Time step | Event               | CPU A's cache | CPU B's cache | Memory
0         |                     |               |               | 0
1         | CPU A reads X       | 0             |               | 0
2         | CPU B reads X       | 0             | 0             | 0
3         | CPU A writes 1 to X | 1             | 0             | 1
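The table can be replayed with two private write-through caches and no coherence mechanism. In this minimal sketch (names hypothetical), CPU B reads the stale value 0 after CPU A has written 1.

```python
def run_example():
    memory = {"X": 0}
    cache_a, cache_b = {}, {}

    def read(cache, addr):
        if addr not in cache:   # miss: fetch from memory
            cache[addr] = memory[addr]
        return cache[addr]      # hit: use the local copy

    def write(cache, addr, value):
        cache[addr] = value     # update own cache ...
        memory[addr] = value    # ... and memory (write-through), but not
                                # the other CPU's cache

    read(cache_a, "X")          # time step 1
    read(cache_b, "X")          # time step 2
    write(cache_a, "X", 1)      # time step 3
    # Final contents, plus what CPU B sees if it reads X again.
    return cache_a["X"], cache_b["X"], memory["X"], read(cache_b, "X")
```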
Coherence Defined
- Informally: Reads return most recently written value
- Formally:
  - P writes X; P reads X (no intervening writes) ⇒ read returns written value
  - P1 writes X; P2 reads X (sufficiently later) ⇒ read returns written value
    - cf. CPU B reading X after step 3 in example
  - P1 writes X, P2 writes X ⇒ all processors see writes in the same order
    - End up with the same final value for X
Cache Coherence Protocols
- Operations performed by caches in multiprocessors to ensure coherence
  - Migration of data to local caches
    - Reduces bandwidth for shared memory
  - Replication of read-shared data
    - Reduces contention for access
- Snooping protocols
  - Each cache monitors bus reads/writes
- Directory-based protocols
  - Caches and memory record sharing status of blocks in a directory
Invalidating Snooping Protocols
- Cache gets exclusive access to a block when it is to be written
  - Broadcasts an invalidate message on the bus
  - Subsequent read in another cache misses
    - Owning cache supplies updated value

CPU activity        | Bus activity     | CPU A's cache | CPU B's cache | Memory
(start)             |                  |               |               | 0
CPU A reads X       | Cache miss for X | 0             |               | 0
CPU B reads X       | Cache miss for X | 0             | 0             | 0
CPU A writes 1 to X | Invalidate for X | 1             |               | 0
CPU B reads X       | Cache miss for X | 1             | 1             | 1
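A sketch of a write-invalidate snooping protocol that reproduces the table above. Simplifying assumptions: write-back caches, only two line states (shared `S` and modified `M`), a write miss is handled by simply broadcasting an invalidate, and memory is updated when the owning cache supplies a modified block. Class and method names are hypothetical.

```python
class Bus:
    """Shared bus that every cache snoops; also holds main memory."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []
        self.activity = []  # bus activity log, like the table's second column


class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # addr -> [value, state]; "S" (shared) or "M" (modified)
        bus.caches.append(self)

    def others(self):
        return [c for c in self.bus.caches if c is not self]

    def read(self, addr):
        if addr in self.lines:
            return self.lines[addr][0]          # read hit
        self.bus.activity.append(f"Cache miss for {addr}")
        value = self.bus.memory[addr]
        for other in self.others():             # every cache snoops the miss
            line = other.lines.get(addr)
            if line is not None and line[1] == "M":
                value = line[0]                 # owning cache supplies the value
                self.bus.memory[addr] = value   # memory updated as well
                line[1] = "S"
        self.lines[addr] = [value, "S"]
        return value

    def write(self, addr, value):
        line = self.lines.get(addr)
        if line is None or line[1] != "M":      # need exclusive access first
            self.bus.activity.append(f"Invalidate for {addr}")
            for other in self.others():
                other.lines.pop(addr, None)     # other copies become invalid
        self.lines[addr] = [value, "M"]         # write-back: memory not updated yet
```

After CPU A's write, CPU B's copy is gone and memory still holds 0; B's next read misses, A supplies 1, and all three locations end up holding 1, matching the last row of the table.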
Memory Consistency
- When are writes seen by other processors?
  - "Seen" means a read returns the written value
  - Can't be instantaneous
- Assumptions
  - A write completes only when all processors have seen it
  - A processor does not reorder writes with other accesses
- Consequence
  - P writes X then writes Y ⇒ all processors that see new Y also see new X
  - Processors can reorder reads, but not writes
Multilevel On-Chip Caches
The ARM Cortex-A8 and Intel Core i7 Memory Hierarchies
Supporting Multiple Issue
- Both have multi-banked caches that allow multiple accesses per cycle assuming no bank conflicts
- Core i7 cache optimizations
  - Return requested word first
  - Non-blocking cache
    - Hit under miss
    - Miss under miss
  - Data prefetching
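A sketch of why multi-banked caches care about bank conflicts, assuming sequential interleaving in which consecutive 64-byte blocks map to consecutive banks of a 4-bank cache. These parameters and function names are illustrative assumptions, not the actual Cortex-A8 or Core i7 organization.

```python
def bank_of(addr, block_bytes=64, num_banks=4):
    # Sequential interleaving: consecutive blocks go to consecutive banks.
    return (addr // block_bytes) % num_banks


def conflict_free(addresses, block_bytes=64, num_banks=4):
    # Accesses issued in the same cycle can proceed together only if
    # each one targets a different bank.
    banks = [bank_of(a, block_bytes, num_banks) for a in addresses]
    return len(banks) == len(set(banks))
```

Under these assumptions, addresses 0 and 64 fall in banks 0 and 1 and can be served in the same cycle, while 0 and 256 both fall in bank 0 and must be serialized.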
DGEMM
- Combine cache blocking and subword parallelism
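A Python transliteration of the cache-blocking idea (the subword-parallel part is omitted). It assumes column-major n × n matrices stored in flat lists and a block size of 32; the `min()` bounds are an addition so that n need not be a multiple of the block size. It shows the loop structure only; interpreted Python will not exhibit the cache speedup.

```python
BLOCKSIZE = 32


def do_block(n, si, sj, sk, A, B, C):
    # Multiply one BLOCKSIZE x BLOCKSIZE tile; min() handles edge tiles.
    for i in range(si, min(si + BLOCKSIZE, n)):
        for j in range(sj, min(sj + BLOCKSIZE, n)):
            cij = C[i + j * n]  # C[i][j] in column-major order
            for k in range(sk, min(sk + BLOCKSIZE, n)):
                cij += A[i + k * n] * B[k + j * n]
            C[i + j * n] = cij


def dgemm_blocked(n, A, B, C):
    # C += A * B, working on one tile of each matrix at a time so that
    # the tiles can stay resident in the cache while they are reused.
    for sj in range(0, n, BLOCKSIZE):
        for si in range(0, n, BLOCKSIZE):
            for sk in range(0, n, BLOCKSIZE):
                do_block(n, si, sj, sk, A, B, C)
```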
5.14 Going Faster: Cache Blocking and Matrix Multiply
Pitfalls
- Byte vs. word addressing
  - Example: 32-byte direct-mapped cache, 4-byte blocks
    - Byte 36 maps to block 1
    - Word 36 maps to block 4
- Ignoring memory system effects when writing or generating code
  - Example: iterating over rows vs. columns of arrays
  - Large strides result in poor locality
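Both pitfalls can be checked with a little arithmetic. The sketch below assumes a direct-mapped cache model; `block_index` reproduces the byte 36 / word 36 example, and the miss counter compares row-wise and column-wise traversal of a 64 × 64 row-major array of 8-byte elements in a hypothetical 4 KiB cache with 64-byte blocks. All names are illustrative.

```python
def block_index(byte_addr, block_bytes, num_blocks):
    # Direct mapped: block address modulo the number of blocks.
    return (byte_addr // block_bytes) % num_blocks


def count_misses(addresses, block_bytes, num_blocks):
    tags = [None] * num_blocks
    misses = 0
    for a in addresses:
        block = a // block_bytes
        i = block % num_blocks
        if tags[i] != block:
            misses += 1
            tags[i] = block
    return misses


def traversal(n, elem_bytes, by_rows):
    # Byte addresses of an n x n row-major array starting at address 0.
    for outer in range(n):
        for inner in range(n):
            i, j = (outer, inner) if by_rows else (inner, outer)
            yield (i * n + j) * elem_bytes
```

Row-wise traversal misses once per 64-byte block (512 misses), while column-wise traversal, with a stride of 512 bytes, misses on every one of the 4096 accesses in this configuration.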
5.15 Fallacies and Pitfalls
Pitfalls
- In multiprocessor with shared L2 or L3 cache
  - Less associativity than cores results in conflict misses
  - More cores ⇒ need to increase associativity
- Using AMAT to evaluate performance of out-of-order processors
  - Ignores effect of non-blocked accesses
  - Instead, evaluate performance by simulation
Pitfalls
- Extending address range using segments
  - E.g., Intel
- Fast memories are small, large memories are slow
  - We really want fast, large memories
  - Caching gives this illusion
- Principle of locality
  - Programs use a small part of their memory space frequently
- Memory hierarchy
  - L1 cache ↔ L2 cache ↔ … ↔ DRAM memory ↔ disk
- Memory system design is critical for multiprocessors
Concluding Remarks