Chapter 05 Computer Organization and Design, Fifth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design) 5th Edition


  • 8/17/2019 Chapter 05 Computer Organization and Design, Fifth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design) 5th Edition

    COMPUTER ORGANIZATION AND DESIGN
    The Hardware/Software Interface
    5th Edition

    Chapter 5
    Large and Fast: Exploiting Memory Hierarchy


    Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 2

    Principle of Locality
    §5.1 Introduction
    Programs access a small proportion of their address space at any time
    Temporal locality
      Items accessed recently are likely to be accessed again soon
      e.g., instructions in a loop, induction variables
    Spatial locality
      Items near those accessed recently are likely to be accessed soon
      e.g., sequential instruction access, array data

    Taking Advantage of Locality
    Memory hierarchy
    Store everything on disk
    Copy recently accessed (and nearby) items from disk to smaller DRAM memory
      Main memory
    Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory
      Cache memory attached to CPU

    Memory Hierarchy Levels
    Block (aka line): unit of copying
      May be multiple words
    If accessed data is present in upper level
      Hit: access satisfied by upper level
        Hit ratio: hits/accesses
    If accessed data is absent
      Miss: block copied from lower level
        Time taken: miss penalty
        Miss ratio: misses/accesses = 1 – hit ratio
      Then accessed data supplied from upper level

    Memory Technology
    §5.2 Memory Technologies
    Static RAM (SRAM)
      0.5ns – 2.5ns, $2000 – $5000 per GB
    Dynamic RAM (DRAM)
      50ns – 70ns, $20 – $75 per GB
    Magnetic disk
      5ms – 20ms, $0.20 – $2 per GB
    Ideal memory
      Access time of SRAM
      Capacity and cost/GB of disk

    DRAM Technology
    Data stored as a charge in a capacitor
      Single transistor used to access the charge
      Must periodically be refreshed
        Read contents and write back
        Performed on a DRAM "row"

    Advanced DRAM Organization
    Bits in a DRAM are organized as a rectangular array
      DRAM accesses an entire row
      Burst mode: supply successive words from a row with reduced latency
    Double data rate (DDR) DRAM
      Transfer on rising and falling clock edges
    Quad data rate (QDR) DRAM
      Separate DDR inputs and outputs

    DRAM Generations
    Year    Capacity    $/GB
    19…     …Kbit       $1500000
    …       512Mbit     $250
    2007    1Gbit       $50

    DRAM Performance Factors
    Row buffer
      Allows several words to be read and refreshed in parallel
    Synchronous DRAM
      Allows for consecutive accesses in bursts without needing to send each address
      Improves bandwidth
    DRAM banking
      Allows simultaneous access to multiple DRAMs
      Improves bandwidth

    Increasing Memory Bandwidth
    4-word wide memory
      Miss penalty = 1 + 15 + 1 = 17 bus cycles
      Bandwidth = 16 bytes / 17 cycles = 0.94 B/cycle
    4-bank interleaved memory
      Miss penalty = 1 + 15 + 4×1 = 20 bus cycles
      Bandwidth = 16 bytes / 20 cycles = 0.8 B/cycle

    Chapter 6 — Storage and Other I/O Topics

    Flash Storage
    §6.4 Flash Storage
    Nonvolatile semiconductor storage
      100× – 1000× faster than disk
      Smaller, lower power, more robust
      But more $/GB (between disk and DRAM)

    Flash Types
    NOR flash: bit cell like a NOR gate
      Random read/write access
      Used for instruction memory in embedded systems
    NAND flash: bit cell like a NAND gate
      Denser (bits/area), but block-at-a-time access
      Cheaper per GB
      Used for USB keys, media storage, …
    Flash bits wear out after 1000s of accesses
      Not suitable for direct RAM or disk replacement
      Wear leveling: remap data to less used blocks

    Disk Storage
    §6.3 Disk Storage
    Nonvolatile, rotating magnetic storage

    Disk Sectors and Access
    Each sector records
      Sector ID
      Data (512 bytes, 4096 bytes proposed)
      Error correcting code (ECC)
        Used to hide defects and recording errors
      Synchronization fields and gaps
    Access to a sector involves
      Queuing delay if other accesses are pending
      Seek: move the heads
      Rotational latency
      Data transfer
      Controller overhead

    Disk Access Example
    Given
      512B sector, 15,000rpm, 4ms average seek time, 100MB/s transfer rate, 0.2ms controller overhead, idle disk
    Average read time
      4ms seek time
      + ½ / (15,000/60) = 2ms rotational latency
      + 512 / 100MB/s = 0.005ms transfer time
      + 0.2ms controller delay
      = 6.2ms
    If actual average seek time is 1ms
      Average read time = 3.2ms
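The read-time arithmetic in the Disk Access Example can be sketched as a small C function. This is an illustration only: the function name `disk_read_time_ms` and its parameter names are hypothetical, and the transfer rate is assumed to mean 10^6 bytes per second per MB/s.

```c
/* Average time, in milliseconds, to read one sector from an idle disk
   (no queuing delay). All names here are illustrative. */
double disk_read_time_ms(double seek_ms, double rpm, double sector_bytes,
                         double transfer_mb_per_s, double controller_ms) {
    /* On average the platter must turn half a revolution. */
    double rotation_ms = 0.5 / (rpm / 60.0) * 1000.0;
    /* Assumption: 1 MB/s = 10^6 bytes per second. */
    double transfer_ms = sector_bytes / (transfer_mb_per_s * 1e6) * 1000.0;
    return seek_ms + rotation_ms + transfer_ms + controller_ms;
}
```

Calling `disk_read_time_ms(4.0, 15000.0, 512.0, 100.0, 0.2)` reproduces the roughly 6.2ms figure; passing a 1ms seek time gives roughly 3.2ms, which shows how strongly the seek term dominates.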

    Disk Performance Issues
    Manufacturers quote average seek time
      Based on all possible seeks
      Locality and OS scheduling lead to smaller actual average seek times
    Smart disk controllers allocate physical sectors on disk
      Present logical sector interface to host
      SCSI, ATA, SATA
    Disk drives include caches
      Prefetch sectors in anticipation of access
      Avoid seek and rotational delay

    Cache Memory
    §5.3 The Basics of Caches
    Cache memory
      The level of the memory hierarchy closest to the CPU
    Given accesses X1, …, Xn–1, Xn
      How do we know if the data is present?
      Where do we look?

    Direct Mapped Cache
    Location determined by address
    Direct mapped: only one choice
      (Block address) modulo (#Blocks in cache)
    #Blocks is a power of 2
    Use low-order address bits

    Tags and Valid Bits
    How do we know which particular block is stored in a cache location?
      Store block address as well as the data
      Actually, only need the high-order bits
      Called the tag
    What if there is no data in a location?
      Valid bit: 1 = present, 0 = not present
      Initially 0

    Cache Example

    Word addr   Binary addr   Hit/miss   Cache block
    22          10 110        Miss       110

    Index   V   Tag   Data
    000     N
    001     N
    010     N
    011     N
    100     N
    101     N
    110     Y   10    Mem[10110]
    111     N

    Cache Example

    Word addr   Binary addr   Hit/miss   Cache block
    26          11 010        Miss       010

    Index   V   Tag   Data
    000     N
    001     N
    010     Y   11    Mem[11010]
    011     N
    100     N
    101     N
    110     Y   10    Mem[10110]
    111     N

    Cache Example

    Word addr   Binary addr   Hit/miss   Cache block
    22          10 110        Hit        110
    26          11 010        Hit        010

    Index   V   Tag   Data
    000     N
    001     N
    010     Y   11    Mem[11010]
    011     N
    100     N
    101     N
    110     Y   10    Mem[10110]
    111     N

    Cache Example

    Word addr   Binary addr   Hit/miss   Cache block
    16          10 000        Miss       000
    3           00 011        Miss       011
    16          10 000        Hit        000

    Index   V   Tag   Data
    000     Y   10    Mem[10000]
    001     N
    010     Y   11    Mem[11010]
    011     Y   00    Mem[00011]
    100     N
    101     N
    110     Y   10    Mem[10110]
    111     N

    Cache Example

    Word addr   Binary addr   Hit/miss   Cache block
    18          10 010        Miss       010

    Index   V   Tag   Data
    000     Y   10    Mem[10000]
    001     N
    010     Y   10    Mem[10010]
    011     Y   00    Mem[00011]
    100     N
    101     N
    110     Y   10    Mem[10110]
    111     N
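The Cache Example sequence can be replayed with a minimal direct-mapped cache model. The sketch below assumes 8 one-word blocks, as the 3-bit index in the tables implies; the names `cache_line`, `cache_access` and `count_hits` are hypothetical, and only tags and valid bits are modeled (no data).

```c
#include <stdbool.h>

#define NUM_BLOCKS 8   /* 8 one-word blocks, so the index is 3 bits */

struct cache_line {
    bool valid;        /* false until the entry is first filled */
    unsigned tag;      /* high-order bits of the block address */
};

/* Look up one word address; returns true on a hit.
   On a miss the entry is overwritten with the new block. */
bool cache_access(struct cache_line cache[NUM_BLOCKS], unsigned word_addr) {
    unsigned index = word_addr % NUM_BLOCKS;  /* low-order 3 bits */
    unsigned tag = word_addr / NUM_BLOCKS;    /* remaining high-order bits */
    if (cache[index].valid && cache[index].tag == tag)
        return true;
    cache[index].valid = true;
    cache[index].tag = tag;
    return false;
}

/* Replays a sequence of word addresses on an initially empty cache
   and returns the number of hits. */
int count_hits(const unsigned *addrs, int n) {
    struct cache_line cache[NUM_BLOCKS] = {{false, 0}};
    int hits = 0;
    for (int i = 0; i < n; i++)
        if (cache_access(cache, addrs[i]))
            hits++;
    return hits;
}
```

Replaying the addresses 22, 26, 22, 26, 16, 3, 16, 18 gives three hits (the second accesses to 22, 26 and 16). The final access to 18 misses because index 010 still holds tag 11 from address 26.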

    Address Division

    Example: Larger Block Size
    64 blocks, 16 bytes/block
      To what block number does address 1200 map?
    Block address = 1200/16 = 75
    Block number = 75 modulo 64 = 11

    Field    Bits     Width
    Tag      31–10    22 bits
    Index    9–4      6 bits
    Offset   3–0      4 bits

    Block Size Considerations
    Larger blocks should reduce miss rate
      Due to spatial locality
    But in a fixed-sized cache
      Larger blocks ⇒ fewer of them
        More competition ⇒ increased miss rate
      Larger blocks ⇒ pollution
    Larger miss penalty
      Can override benefit of reduced miss rate
      Early restart and critical-word-first can help

    Cache Misses
    On cache hit, CPU proceeds normally
    On cache miss
      Stall the CPU pipeline
      Fetch block from next level of hierarchy
      Instruction cache miss
        Restart instruction fetch
      Data cache miss
        Complete data access

    Write

    Write Allocation
    What should happen on a write miss?
    Alternatives for write-through
      Allocate on miss: fetch the block
      Write around: don't fetch the block
        Since programs often write a whole block before reading it (e.g., initialization)
    For write-back
      Usually fetch the block

    Example: Intrinsity FastMATH
    Embedded MIPS processor
      12-stage pipeline
      Instruction and data access on each cycle
    Split cache: separate I-cache and D-cache
      Each 16KB: 256 blocks × 16 words/block
      D-cache: write-through or write-back
    SPEC2000 miss rates
      I-cache: 0.4%
      D-cache: 11.4%
      Weighted average: 3.2%

    Example: Intrinsity FastMATH

    Main Memory Supporting Caches
    Use DRAMs for main memory
      Fixed width (e.g., 1 word)
      Connected by fixed-width clocked bus
        Bus clock is typically slower than CPU clock
    Example cache block read
      1 bus cycle for address transfer
      15 bus cycles per DRAM access
      1 bus cycle per data transfer
    For 4-word block, 1-word-wide DRAM
      Miss penalty = 1 + 4×15 + 4×1 = 65 bus cycles
      Bandwidth = 16 bytes / 65 cycles = 0.25 B/cycle
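The miss-penalty arithmetic for this organization, and for the 4-word-wide and 4-bank interleaved organizations under Increasing Memory Bandwidth, can be sketched as follows. The function names are hypothetical, and a word is assumed to be 4 bytes, consistent with the 16-byte, 4-word block in the example.

```c
/* Bus-cycle costs taken from the example: 1 cycle to send the address,
   15 cycles per DRAM access, 1 cycle per data transfer. */
enum { ADDR_CYCLES = 1, DRAM_CYCLES = 15, XFER_CYCLES = 1 };

/* 1-word-wide DRAM: every word pays a full DRAM access and a transfer. */
int miss_penalty_narrow(int words) {
    return ADDR_CYCLES + words * DRAM_CYCLES + words * XFER_CYCLES;
}

/* Memory and bus as wide as the block: one access, one transfer. */
int miss_penalty_wide(void) {
    return ADDR_CYCLES + DRAM_CYCLES + XFER_CYCLES;
}

/* One bank per word: the DRAM accesses overlap, the transfers do not. */
int miss_penalty_interleaved(int words) {
    return ADDR_CYCLES + DRAM_CYCLES + words * XFER_CYCLES;
}

/* Bytes delivered per bus cycle, assuming 4-byte words. */
double bandwidth(int words, int penalty_cycles) {
    return (double)(words * 4) / penalty_cycles;
}
```

For a 4-word block these return 65, 17 and 20 bus cycles. Interleaving recovers most of the benefit of a wide memory without widening the bus.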

    "#

    Measuring Cache Performance
    §5.4 Measuring and Improving Cache Performance
    Components of CPU time
      Program execution cycles
        Includes cache hit time
      Memory stall cycles
        Mainly from cache misses
    With simplifying assumptions:

    \[
    \text{Memory stall cycles}
      = \frac{\text{Memory accesses}}{\text{Program}} \times \text{Miss rate} \times \text{Miss penalty}
      = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty}
    \]

    Cache Performance Example
    Given
      I-cache miss rate = 2%
      D-cache miss rate = 4%
      Miss penalty = 100 cycles
      Base CPI (ideal cache) = 2
      Load & stores are 36% of instructions
    Miss cycles per instruction
      I-cache: 0.02 × 100 = 2
      D-cache: 0.36 × 0.04 × 100 = 1.44
    Actual CPI = 2 + 2 + 1.44 = 5.44
      Ideal CPU is 5.44/2 = 2.72 times faster
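The CPI calculation above can be written as a short C sketch. The function name `effective_cpi` and its parameter names are illustrative, and the model assumes, as the example does, that every miss stalls the pipeline for the full penalty.

```c
/* Cycles per instruction once cache-miss stalls are added to the base CPI.
   Every instruction is fetched, so the I-cache term applies to all of them;
   only loads and stores access the D-cache. */
double effective_cpi(double base_cpi, double icache_miss_rate,
                     double dcache_miss_rate, double mem_instr_fraction,
                     double miss_penalty_cycles) {
    double icache_stalls = icache_miss_rate * miss_penalty_cycles;
    double dcache_stalls =
        mem_instr_fraction * dcache_miss_rate * miss_penalty_cycles;
    return base_cpi + icache_stalls + dcache_stalls;
}
```

With the example's inputs, `effective_cpi(2.0, 0.02, 0.04, 0.36, 100.0)` evaluates to about 5.44, and dividing by the base CPI of 2 gives the 2.72 speedup of the ideal CPU.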

    Average Access Time
    Hit time is also important for performance
    Average memory access time (AMAT)
      AMAT = Hit time + Miss rate × Miss penalty
    Example
      CPU with 1ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%
      AMAT = 1 + 0.05 × 20 = 2ns
        2 cycles per instruction
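As a one-line illustration of the AMAT formula, a hypothetical helper `amat` might look like this; hit time and miss penalty must be given in the same unit (cycles or nanoseconds).

```c
/* Average memory access time, in the same unit as hit_time and
   miss_penalty. miss_rate is a fraction between 0 and 1. */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}
```

For the example, `amat(1.0, 0.05, 20.0)` gives 2 cycles, which is 2ns at a 1ns clock.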

    Performance Summary
    When CPU performance increases
      Miss penalty becomes more significant
    Decreasing base CPI
      Greater proportion of time spent on memory stalls
    Increasing clock rate
      Memory stalls account for more CPU cycles
    Can't neglect cache behavior when evaluating system performance

    Associative Caches
    Fully associative
      Allow a given block to go in any cache entry
      Requires all entries to be searched at once
      Comparator per entry (expensive)
    n-way set associative
      Each set contains n entries
      Block number determines which set
        (Block number) modulo (#Sets in cache)
      Search all entries in a given set at once
      n comparators (less expensive)

    Associative Cache Example

    Spectrum of Associativity
    For a cache with 8 entries

    Associativity Example
    Compare 4-block caches
      Direct mapped, 2-way set associative, fully associative
      Block access sequence: 0, 8, 0, …

    Associativity Example
    2-way set associative

    Block address   Cache index   Hit/miss   Cache content after access (Set 0)
    0               0             miss       Mem[0]
    8               0             miss       Mem[0], Mem[8]
    0               0             hit        Mem[0], Mem[8]
    …

    How Much Associativity
    Increased associativity decreases miss rate
      But with diminishing returns
    Simulation of a system with 64KB D-cache, 16-word blocks, SPEC2000
      1-way: 10.3%
      2-way: …
      …

    Set Associative Cache Organization

    Replacement Policy
    Direct mapped: no choice
    Set associative
      Prefer non-valid entry, if there is one
      Otherwise, choose among entries in the set
    Least-recently used (LRU)
      Choose the one unused for the longest time
        Simple for 2-way, manageable for 4-way, too hard beyond that
    Random
      Gives approximately the same performance as LRU for high associativity
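The replacement rules above (prefer a non-valid entry, otherwise evict the least recently used one) can be sketched for a 4-block, 2-way set-associative cache. This is an illustrative model, not a hardware description: all names are hypothetical, and time stamps stand in for whatever LRU bookkeeping a real cache would use.

```c
#include <stdbool.h>

#define NUM_SETS 2     /* 4 blocks arranged as 2 sets of 2 ways */
#define NUM_WAYS 2

struct way {
    bool valid;
    unsigned block;    /* block address held in this entry */
    unsigned last_use; /* time stamp of the most recent access */
};

/* Returns the way holding `block`, or -1 if it is not in the set. */
static int find_way(const struct way *set, unsigned block) {
    for (int w = 0; w < NUM_WAYS; w++)
        if (set[w].valid && set[w].block == block)
            return w;
    return -1;
}

/* Prefer a non-valid entry; otherwise pick the least recently used. */
static int choose_victim(const struct way *set) {
    int victim = 0;
    for (int w = 0; w < NUM_WAYS; w++)
        if (!set[w].valid)
            return w;
    for (int w = 1; w < NUM_WAYS; w++)
        if (set[w].last_use < set[victim].last_use)
            victim = w;
    return victim;
}

/* Replays block addresses on an initially empty cache and returns the
   number of misses. */
int count_misses_lru(const unsigned *blocks, int n) {
    struct way cache[NUM_SETS][NUM_WAYS] = {{{false, 0, 0}}};
    int misses = 0;
    for (int t = 0; t < n; t++) {
        struct way *set = cache[blocks[t] % NUM_SETS];
        int w = find_way(set, blocks[t]);
        if (w < 0) {                      /* miss: fill the chosen entry */
            misses++;
            w = choose_victim(set);
            set[w].valid = true;
            set[w].block = blocks[t];
        }
        set[w].last_use = (unsigned)t;    /* hit or fill counts as a use */
    }
    return misses;
}
```

As a usage example, take the assumed access sequence 0, 8, 0, 6, 8. Its first three accesses follow the Associativity Example; the last two are chosen here for illustration. It yields four misses: block 6 evicts block 8, the least recently used entry, and the final access to 8 then evicts block 0.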

    Multilevel Caches
    Primary cache attached to CPU
      Small, but fast
    Level-2 cache services misses from primary cache
      Larger, slower, but still faster than main memory
    Main memory services L-2 cache misses
    Some high-end systems include L-3 cache

    Multilevel Cache Example
    Given
      CPU base CPI = 1, clock rate = 4GHz
      Miss rate/instruction = 2%
      Main memory access time = 100ns
    With just primary cache
      Miss penalty = 100ns/0.25ns = 400 cycles
      Effective CPI = 1 + 0.02 × 400 = 9

    Example (cont.)
    Now add L-2 cache
      Access time = 5ns
      Global miss rate to main memory = 0.5%
    Primary miss with L-2 hit
      Penalty = 5ns/0.25ns = 20 cycles
    Primary miss with L-2 miss
      Extra penalty = 400 cycles (the 100ns main memory access)
    CPI = 1 + 0.02 × 20 + 0.005 × 400 = 3.4
    Performance ratio = 9/3.4 = 2.6
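The two CPI figures in the multilevel example can be reproduced with a short sketch. The function names are hypothetical, and the model follows the example's assumptions: every primary miss pays the L-2 access time, and the fraction of instructions that also miss in L-2 pays the main memory access on top.

```c
/* Miss penalty in clock cycles for a memory with the given access time.
   At f GHz one cycle lasts 1/f ns, so cycles = ns * f. */
double penalty_cycles(double access_time_ns, double clock_ghz) {
    return access_time_ns * clock_ghz;
}

/* CPI with only a primary cache. */
double cpi_one_level(double base_cpi, double l1_miss_rate,
                     double memory_ns, double clock_ghz) {
    return base_cpi + l1_miss_rate * penalty_cycles(memory_ns, clock_ghz);
}

/* CPI with an L-2 cache behind the primary cache. global_miss_rate is the
   fraction of instructions that miss in both levels. */
double cpi_two_level(double base_cpi, double l1_miss_rate, double l2_ns,
                     double global_miss_rate, double memory_ns,
                     double clock_ghz) {
    return base_cpi
         + l1_miss_rate * penalty_cycles(l2_ns, clock_ghz)
         + global_miss_rate * penalty_cycles(memory_ns, clock_ghz);
}
```

With a 4GHz clock, `cpi_one_level(1.0, 0.02, 100.0, 4.0)` gives 9 and `cpi_two_level(1.0, 0.02, 5.0, 0.005, 100.0, 4.0)` gives 3.4, a ratio of about 2.6.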

    Multilevel Cache Considerations
    Primary cache
      Focus on minimal hit time
    L-2 cache
      Focus on low miss rate to avoid main memory access
      Hit time has less overall impact
    Results
      L-1 cache usually smaller than a single cache
      L-1 block size smaller than L-2 block size

    Interactions with Advanced CPUs
    Out-of-order CPUs can execute instructions during cache miss
      Pending store stays in load/store unit
      Dependent instructions wait in reservation stations
        Independent instructions continue
    Effect of miss depends on program data flow
      Much harder to analyse
      Use system simulation

    Interactions with Software
    Misses depend on memory access patterns
      Algorithm behavior
      Compiler optimization for memory access

    Software Optimization via Blocking
    Goal: maximize accesses to data before it is replaced
    Consider inner loops of DGEMM:

      for (int j = 0; j < n; ++j)
      {
        double cij = C[i+j*n];            /* cij = C[i][j] */
        for (int k = 0; k < n; k++)
          cij += A[i+k*n] * B[k+j*n];     /* cij += A[i][k] * B[k][j] */
        C[i+j*n] = cij;                   /* C[i][j] = cij */
      }

    DGEMM Access Pattern
    C, A, and B arrays
    Figure legend: older accesses, new accesses

    Cache Blocked DGEMM

      #define BLOCKSIZE 32  /* value as in the textbook listing; n must be a multiple of it */

      /* Update one BLOCKSIZE x BLOCKSIZE block of C (column-major storage) */
      void do_block (int n, int si, int sj, int sk, double *A, double *B, double *C)
      {
        for (int i = si; i < si+BLOCKSIZE; ++i)
          for (int j = sj; j < sj+BLOCKSIZE; ++j)
          {
            double cij = C[i+j*n];            /* cij = C[i][j] */
            for (int k = sk; k < sk+BLOCKSIZE; k++)
              cij += A[i+k*n] * B[k+j*n];     /* cij += A[i][k] * B[k][j] */
            C[i+j*n] = cij;                   /* C[i][j] = cij */
          }
      }

      /* Walk the matrices block by block so each block is reused while cached */
      void dgemm (int n, double* A, double* B, double* C)
      {
        for (int sj = 0; sj < n; sj += BLOCKSIZE)
          for (int si = 0; si < n; si += BLOCKSIZE)
            for (int sk = 0; sk < n; sk += BLOCKSIZE)
              do_block(n, si, sj, sk, A, B, C);
      }

    Blocked DGEMM Access Pattern
    Figure panels: Unoptimized, Blocked

    Dependability
    §5.5 Dependable Memory Hierarchy
    Fault: failure of a component
      May or may not lead to system failure
    Service accomplishment: service delivered as specified
    Service interruption: deviation from specified service
    A failure moves the system from service accomplishment to service interruption; restoration moves it back

    Dependability Measures
    Reliability: mean time to failure (MTTF)
    Service interruption: mean time to repair (MTTR)
    Mean time between failures
      MTBF = MTTF + MTTR
    Availability = MTTF / (MTTF + MTTR)
    Improving availability
      Increase MTTF: fault avoidance, fault tolerance, fault forecasting
      Reduce MTTR: improved tools and processes for diagnosis and repair
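The two dependability formulas translate directly into code. The helper names below are hypothetical, and the figures in the usage note are made-up inputs chosen only to show the scale of the result.

```c
/* Mean time between failures, in the same unit as the inputs. */
double mtbf(double mttf, double mttr) {
    return mttf + mttr;
}

/* Fraction of time the service is delivered as specified. */
double availability(double mttf, double mttr) {
    return mttf / (mttf + mttr);
}
```

For an assumed MTTF of 1,000,000 hours and an MTTR of 24 hours, `availability` returns roughly 0.99998. Halving the MTTR improves availability about as much as doubling the MTTF, which is why both directions appear in the list above.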

    The Hamming SEC Code
    Hamming distance
      Number of bits that are different between two bit patterns
    Minimum distance = 2 provides single bit error detection
      e.g., parity code
    Minimum distance = 3 provides single error correction, 2 bit error detection
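Hamming distance and the distance-2 property of a parity code can be illustrated with a few lines of C. The function names are hypothetical, and placing the parity bit in the least significant position is an arbitrary choice made for this sketch.

```c
/* Number of bit positions in which two patterns differ. */
int hamming_distance(unsigned a, unsigned b) {
    unsigned diff = a ^ b;   /* 1 wherever the patterns disagree */
    int count = 0;
    while (diff) {
        count += (int)(diff & 1u);
        diff >>= 1;
    }
    return count;
}

/* Appends an even-parity bit as the new least significant bit, so every
   code word contains an even number of 1s. */
unsigned add_even_parity(unsigned data) {
    unsigned parity = (unsigned)hamming_distance(data, 0u) & 1u;
    return (data << 1) | parity;
}
```

The data words 4 and 5 differ in a single bit, but their parity-coded forms differ in two. A single flipped bit therefore never turns one valid code word into another, which is what makes the error detectable.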

    Encoding SEC
    To calculate Hamming code:
      Number bits from 1 on the left
      All bit positions that are a power of 2 are parity bits
      Each parity bit checks certain data bits

    Decoding SEC
    Value of parity bits indicates which bits are in error
      Use numbering from encoding procedure
      e.g.
        Parity bits = 0000 indicates no error
        Parity bits = 1010 indicates bit 10 was flipped
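The encoding and decoding procedure can be sketched for an 8-bit data word, which needs parity bits at positions 1, 2, 4 and 8 of a 12-bit code word. The sketch assumes even parity and stores one bit per array element so that indices match the 1-based bit numbering; all names are hypothetical.

```c
#define CODE_BITS 12   /* 8 data bits + parity bits at positions 1, 2, 4, 8 */

static int is_power_of_two(int x) { return (x & (x - 1)) == 0; }

/* Parity of the group checked by parity bit p: all positions whose
   number has bit p set. Returns 0 when the group is even. */
static int group_parity(const int code[CODE_BITS + 1], int p) {
    int parity = 0;
    for (int pos = 1; pos <= CODE_BITS; pos++)
        if (pos & p)
            parity ^= code[pos];
    return parity;
}

/* Fills code[1..12] from an 8-bit value; code[0] is unused so that
   array indices match the bit numbering of the encoding procedure. */
void sec_encode(unsigned char data, int code[CODE_BITS + 1]) {
    int next = 7;                         /* most significant data bit first */
    for (int pos = 1; pos <= CODE_BITS; pos++)
        code[pos] = is_power_of_two(pos) ? 0 : (data >> next--) & 1;
    for (int p = 1; p <= CODE_BITS; p <<= 1)
        code[p] = group_parity(code, p);  /* makes each group even */
}

/* Returns 0 if every parity group checks, otherwise the number of the
   bit position that was flipped (valid for single-bit errors only). */
int sec_syndrome(const int code[CODE_BITS + 1]) {
    int syndrome = 0;
    for (int p = 1; p <= CODE_BITS; p <<= 1)
        if (group_parity(code, p))
            syndrome |= p;
    return syndrome;
}
```

After `sec_encode`, flipping `code[10]` makes the groups for parity bits 8 and 2 fail, so `sec_syndrome` returns binary 1010 = 10, matching the decoding example above. Inverting that position corrects the word.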

  • 8/17/2019 Chapter 05 Computer Organization and Design, Fifth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design) 5th Edition

    63/105

    EC4%EC Code

     +dd an additional parity bit for the whole word

    'pn( %ake Hamming distance > )ecoding

    Ket H SE& parity bits H een, pn een, no error 

    H odd, pn odd, correctable single bit error 

    H een, pn odd, error in pn bit

    H odd, pn een, double error occurred Dote E&& )*+% uses SE&/)E& with < bits

    protecting each => bits

    Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — '3
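The four decoding cases above can be sketched as a small classifier. This assumes that "H even" means the SEC checks all pass (zero syndrome) and that pn is an even-parity bit over the whole word, stored last; the function name and return labels are hypothetical.

```python
def secded_status(bits) -> str:
    """Classify a word made of a Hamming codeword followed by the pn bit."""
    hamming_part = bits[:-1]
    syndrome = 0
    for pos, bit in enumerate(hamming_part, start=1):
        if bit:
            syndrome ^= pos
    overall_odd = sum(bits) % 2 == 1     # pn check fails when parity is odd

    if syndrome == 0 and not overall_odd:
        return "no error"
    if syndrome != 0 and overall_odd:
        return "single-bit error (correctable)"
    if syndrome == 0 and overall_odd:
        return "error in pn bit"
    return "double error (detected, not correctable)"


good = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0] + [0]   # codeword + pn


def flipped(word, *positions):
    """Copy of word with the given 1-based positions inverted."""
    out = list(word)
    for p in positions:
        out[p - 1] ^= 1
    return out


for label, word in [("clean", good), ("bit 10", flipped(good, 10)),
                    ("pn", flipped(good, 13)), ("bits 3+10", flipped(good, 3, 10))]:
    print(label, "->", secded_status(word))
```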

    §5.6 Virtual Machines

    Host computer emulates guest operating system and machine resources
        Improved isolation of multiple guests
        Avoids security and reliability problems
        Aids sharing of resources
    Virtualization has some performance impact
        Feasible with modern high-performance computers
    Examples
        IBM VM/370 (1970s technology!)
        VMware
        Microsoft Virtual PC

    Virtual Machine Monitor

    Maps virtual resources to physical resources
        Memory, I/O devices, CPUs
    Guest code runs on native machine in user mode
        Traps to VMM on privileged instructions and access to protected resources
    Guest OS may be different from host OS
    VMM handles real I/O devices
        Emulates generic virtual I/O devices for guest

    Example: Timer Virtualization

    In native machine, on timer interrupt
        OS suspends current process, handles interrupt, selects and resumes next process
    With Virtual Machine Monitor
        VMM suspends current VM, handles interrupt, selects and resumes next VM
    If a VM requires timer interrupts
        VMM emulates a virtual timer
        Emulates interrupt for VM when physical timer interrupt occurs

    Instruction Set Support

    User and System modes
    Privileged instructions only available in system mode
        Trap to system if executed in user mode
    All physical resources only accessible using privileged instructions
        Including page tables, interrupt controls, I/O registers
    Renaissance of virtualization support
        Current ISAs (e.g.,

    Virtual Memory

    Use main memory as a "cache" for secondary (disk) storage
        Managed jointly by CPU hardware and the operating system (OS)
    Programs share main memory
        Each gets a private virtual address space holding its frequently used code and data
        Protected from other programs
    CPU and OS translate virtual addresses to physical addresses
        VM "block" is called a page
        VM translation "miss" is called a page fault

    Address Translation

    Fixed-size pages (e.g., 4K)

    Page Fault Penalty

    On page fault, the page must be fetched from disk
        Takes millions of clock cycles
        Handled by OS code
    Try to minimize page fault rate
        Fully associative placement
        Smart replacement algorithms

    Page Tables

    Stores placement information
        Array of page table entries, indexed by virtual page number
        Page table register in CPU points to page table in physical memory
    If page is present in memory
        PTE stores the physical page number
        Plus other status bits (referenced, dirty, …)
    If page is not present
        PTE can refer to location in swap space on disk
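A minimal sketch of translation through a page table, assuming 4 KiB pages and a flat table modeled as a Python dict from virtual page number to physical page number. The names and the `PageFault` exception are illustrative, and status bits are left out.

```python
PAGE_OFFSET_BITS = 12                  # 4 KiB pages (assumed for the example)
PAGE_SIZE = 1 << PAGE_OFFSET_BITS


class PageFault(Exception):
    """Raised when the page table has no resident mapping for the page."""


def translate(virtual_address: int, page_table: dict) -> int:
    """Split the address, look up the page number, reattach the offset."""
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & (PAGE_SIZE - 1)
    ppn = page_table.get(vpn)
    if ppn is None:                    # page not present in memory
        raise PageFault(vpn)
    return (ppn << PAGE_OFFSET_BITS) | offset


page_table = {0x12: 0x7}               # virtual page 0x12 -> physical page 0x7
print(hex(translate(0x12345, page_table)))
```

The page offset passes through unchanged; only the page number is remapped.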

    Translation Using a Page Table

    Mapping Pages to Storage

    Replacement and Writes

    To reduce page fault rate, prefer least-recently used (LRU) replacement
        Reference bit (aka use bit) in PTE set to 1 on access to page
        Periodically cleared to 0 by OS
        A page with reference bit = 0 has not been used recently
    Disk writes take millions of cycles
        Block at once, not individual locations
        Write through is impractical
        Use write-back
        Dirty bit in PTE set when page is written
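The reference-bit scheme above can be sketched as follows. This is an illustrative model, not an OS implementation: the class name is hypothetical, and the fallback when every bit is set (take the first page) is an assumption.

```python
class ReferenceBits:
    """Tracks one reference (use) bit per resident page."""

    def __init__(self, pages):
        self.bits = {page: 0 for page in pages}

    def access(self, page):
        self.bits[page] = 1            # hardware sets the bit on access

    def clear_all(self):
        for page in self.bits:         # OS clears the bits periodically
            self.bits[page] = 0

    def pick_victim(self):
        """Prefer a page whose bit is 0, i.e. not used recently."""
        for page, bit in self.bits.items():
            if bit == 0:
                return page
        return next(iter(self.bits))   # assumption: fall back to first page


tracker = ReferenceBits(["A", "B", "C"])
tracker.access("A")
tracker.access("B")
print(tracker.pick_victim())
```

This only approximates LRU: it separates recently used pages from the rest without ordering them.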

    Fast Translation Using a TLB

    Address translation would appear to require extra memory references
        One to access the PTE
        Then the actual memory access
    But access to page tables has good locality
        So use a fast cache of PTEs within the CPU
        Called a Translation Look-aside Buffer (TLB)
        Typical: 16 to 512 PTEs, 0.5 to 1 cycle for hit, 10 to 100 cycles for miss, 0.01% to 1% miss rate
        Misses could be handled by hardware or software
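A minimal sketch of a TLB as a small cache of page table entries. The LRU eviction, the class name and the dict-based page table are assumptions for illustration; real TLBs differ in organization and replacement policy.

```python
from collections import OrderedDict


class TLB:
    """Small fully associative cache of VPN -> PPN translations."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()   # oldest entry first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn: int, page_table: dict) -> int:
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)      # mark as most recently used
            return self.entries[vpn]
        self.misses += 1
        ppn = page_table[vpn]                  # extra memory reference on a miss
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        self.entries[vpn] = ppn
        return ppn


page_table = {0: 5, 1: 6, 2: 7}
tlb = TLB(capacity=2)
results = [tlb.lookup(vpn, page_table) for vpn in [0, 1, 0, 2, 1]]
print(results, tlb.hits, tlb.misses)
```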


    TLB Misses

    If page is in memory
        Load the PTE from memory and retry
        Could be handled in hardware
            Can get complex for more complicated page table structures
        Or in software
            Raise a special exception, with optimized handler
    If page is not in memory (page fault)
        OS handles fetching the page and updating the page table
        Then restart the faulting instruction

    TLB Miss Handler

    TLB miss indicates
        Page present, but PTE not in TLB
        Page not present
    Must recognize TLB miss before destination register overwritten
        Raise exception
    Handler copies PTE from memory to TLB
        Then restarts instruction
        If page not present, page fault will occur

    Page Fault Handler

    Use faulting virtual address to find PTE
    Locate page on disk
    Choose page to replace
        If dirty, write to disk first
    Read page into memory and update page table
    Make process runnable again
        Restart from faulting instruction
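The handler steps above can be sketched as a toy simulation. Everything here is an assumption for illustration: page table entries are dicts with `frame` and `dirty` fields, memory and disk are dicts, and the victim is simply the lowest-numbered occupied frame rather than an LRU approximation.

```python
def handle_page_fault(vpn, page_table, memory, disk, num_frames):
    """Bring page vpn into memory and return the frame it now occupies."""
    pte = page_table[vpn]                      # find the PTE
    data = disk[vpn]                           # locate the page on disk

    in_use = {e["frame"]: v for v, e in page_table.items()
              if e["frame"] is not None}
    free = [f for f in range(num_frames) if f not in in_use]
    if free:
        frame = free[0]
    else:
        frame = min(in_use)                    # assumption: trivial victim choice
        victim_vpn = in_use[frame]
        victim = page_table[victim_vpn]
        if victim["dirty"]:
            disk[victim_vpn] = memory[frame]   # write dirty page back first
        victim["frame"] = None
        victim["dirty"] = False

    memory[frame] = data                       # read page into memory
    pte["frame"] = frame                       # update the page table
    pte["dirty"] = False
    return frame


page_table = {0: {"frame": 0, "dirty": True},
              1: {"frame": None, "dirty": False}}
memory = {0: "page0-modified"}
disk = {0: "page0-old", 1: "page1"}
frame = handle_page_fault(1, page_table, memory, disk, num_frames=1)
print(frame, memory, disk)
```

After the handler returns, the OS would mark the process runnable so that the faulting instruction is retried.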

    TLB and Cache Interaction

    If cache tag uses physical address
        Need to translate before cache lookup
    Alternative: use virtual address tag
        Complications due to aliasing
            Different virtual addresses for shared physical address

    Memory Protection

    Different tasks can share parts of their virtual address spaces
        But need to protect against errant access
        Requires OS assistance
    Hardware support for OS protection
        Privileged supervisor mode (aka kernel mode)
        Privileged instructions
        Page tables and other state information only accessible in supervisor mode
        System call exception (e.g., syscall in MIPS)

    §5.8 A Common Framework for Memory Hierarchies

    The Memory Hierarchy

    Common principles apply at all levels of the memory hierarchy
        Based on notions of caching
    At each level in the hierarchy
        Block placement
        Finding a block
        Replacement on a miss
        Write policy

    Block Placement

    Determined by associativity
        Direct mapped (1-way associative)
            One choice for placement
        n-way set associative
            n choices within a set
        Fully associative
            Any location
    Higher associativity reduces miss rate
        Increases complexity, cost, and access time

    Finding a Block

    Hardware caches
        Reduce comparisons to reduce cost
    Virtual memory
        Full table lookup makes full associativity feasible
        Benefit in reduced miss rate

    Associativity          Location method                                Tag comparisons
    Direct mapped          Index                                          1
    n-way set associative  Set index, then search entries within the set  n
    Fully associative      Search all entries                             #entries
                           Full lookup table                              0
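The placement and lookup rules can be sketched with one helper that lists the cache slots a block may occupy. The function name and the slot numbering (set index times ways, plus way) are assumptions for the example; the number of candidates equals the number of tag comparisons for the set-associative rows of the table.

```python
def candidate_slots(block_address: int, num_blocks: int, ways: int):
    """Slots where a block may live in a cache of num_blocks entries."""
    num_sets = num_blocks // ways
    set_index = block_address % num_sets
    return [set_index * ways + way for way in range(ways)]


# Block address 12 in an 8-block cache under three organizations.
print("direct mapped:    ", candidate_slots(12, 8, ways=1))
print("2-way associative:", candidate_slots(12, 8, ways=2))
print("fully associative:", candidate_slots(12, 8, ways=8))
```

Direct mapped and fully associative are the two extremes of the same scheme: `ways=1` gives one set per block, while `ways=num_blocks` gives a single set.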

    Replacement

    Choice of entry to replace on a miss
        Least recently used (LRU)
            Complex and costly hardware for high associativity
        Random
            Close to LRU, easier to implement
    Virtual memory
        LRU approximation with hardware support

    Write Policy

    Write-through
        Update both upper and lower levels
        Simplifies replacement, but may require write buffer
    Write-back
        Update upper level only
        Update lower level when block is replaced
        Need to keep more state
    Virtual memory
        Only write-back is feasible, given disk write latency

    Sources of Misses

    Compulsory misses (aka cold start misses)
        First access to a block
    Capacity misses
        Due to finite cache size
        A replaced block is later accessed again
    Conflict misses (aka collision misses)
        In a non-fully associative cache
        Due to competition for entries in a set
        Would not occur in a fully associative cache of the same total size
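One way to make the three categories concrete is to run a direct-mapped cache next to a fully associative LRU cache of the same size and compare outcomes. The sketch below assumes that setup and works on block addresses; the function name and the traces are illustrative.

```python
from collections import OrderedDict


def classify_misses(trace, num_blocks):
    """Count hits and the three miss types for a direct-mapped cache."""
    seen = set()                   # blocks referenced at least once
    direct = {}                    # index -> block held by the direct-mapped cache
    full = OrderedDict()           # fully associative LRU reference cache
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in trace:
        full_hit = block in full
        if full_hit:
            full.move_to_end(block)
        else:
            if len(full) >= num_blocks:
                full.popitem(last=False)
            full[block] = True

        index = block % num_blocks
        if direct.get(index) == block:
            counts["hit"] += 1
        else:
            direct[index] = block
            if block not in seen:
                counts["compulsory"] += 1   # first access to the block
            elif full_hit:
                counts["conflict"] += 1     # fully associative would have hit
            else:
                counts["capacity"] += 1     # even full associativity misses
        seen.add(block)
    return counts


print(classify_misses([0, 8, 0, 8], num_blocks=8))
print(classify_misses([0, 1, 2, 0], num_blocks=2))
```

In the first trace, blocks 0 and 8 fight over one index although the cache is mostly empty, so the repeat accesses are conflict misses. In the second, the working set is larger than the cache.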

    Cache Design Trade-offs

    Design change           Effect on miss rate         Negative performance effect
    Increase cache size     Decrease capacity misses    May increase access time
    Increase associativity  Decrease conflict misses    May increase access time
    Increase block size     Decrease compulsory misses  Increases miss penalty. For very large block size,
                                                        may increase miss rate due to pollution.

    §5.9 Using a Finite State Machine to Control a Simple Cache

    Cache Control

    Example cache characteristics
        Direct-mapped, write-back, write allocate
        Block size: 4 words (16 bytes)
        Cache size: 16 KB (1024 blocks)
        32-bit byte addresses
        Valid bit and dirty bit per block
        Blocking cache
            CPU waits until access is complete
    Address fields
        Tag: bits 31 to 14 (18 bits)
        Index: bits 13 to 4 (10 bits)
        Offset: bits 3 to 0 (4 bits)
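For the example cache (16-byte blocks, 1024 blocks, 32-bit byte addresses), the field widths follow from the sizes: 4 offset bits, 10 index bits, and the remaining 18 bits of tag. A minimal sketch of the split, with illustrative names:

```python
OFFSET_BITS = 4        # 16-byte blocks
INDEX_BITS = 10        # 1024 blocks
TAG_BITS = 32 - INDEX_BITS - OFFSET_BITS


def split_address(address: int):
    """Return (tag, index, offset) for a 32-bit byte address."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_address(0x12345678)
print(hex(tag), hex(index), hex(offset))
```

Reassembling `(tag << 14) | (index << 4) | offset` gives back the original address, a quick check that the fields do not overlap.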

    Interface Signals

    CPU to cache
        Read/Write, Valid, Address (32 bits), Write Data (32 bits), Read Data (32 bits), Ready
    Cache to memory
        Read/Write, Valid, Address (32 bits), Write Data (128 bits), Read Data (128 bits), Ready
        Multiple cycles per access

    Finite State Machines

    Use an FSM to sequence control steps
    Set of states, transition on each clock edge
        State values are binary encoded
        Current state stored in a register
        Next state = fn(current state, current inputs)
    Control output signals = fo(current state)

    Cache Controller FSM

    Could partition into separate states to reduce clock cycle time
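The next-state function of such a controller can be sketched as below. The state diagram itself is not reproduced in this text, so the four states (Idle, Compare Tag, Write-Back, Allocate) and their transitions are an assumption based on the usual textbook controller for a blocking write-back cache; the input names are hypothetical.

```python
def next_state(state, valid_request=False, hit=False, dirty=False,
               memory_ready=False):
    """Next state = fn(current state, current inputs)."""
    if state == "IDLE":
        return "COMPARE_TAG" if valid_request else "IDLE"
    if state == "COMPARE_TAG":
        if hit:
            return "IDLE"                      # access completes
        return "WRITE_BACK" if dirty else "ALLOCATE"
    if state == "WRITE_BACK":                  # old dirty block goes to memory
        return "ALLOCATE" if memory_ready else "WRITE_BACK"
    if state == "ALLOCATE":                    # new block is read from memory
        return "COMPARE_TAG" if memory_ready else "ALLOCATE"
    raise ValueError(f"unknown state: {state}")


# A miss on a dirty block, with memory ready on the first cycle each time.
state = "IDLE"
path = [state]
for inputs in [dict(valid_request=True), dict(hit=False, dirty=True),
               dict(memory_ready=True), dict(memory_ready=True), dict(hit=True)]:
    state = next_state(state, **inputs)
    path.append(state)
print(" -> ".join(path))
```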

    §5.10 Parallelism and Memory Hierarchies: Cache Coherence

    Cache Coherence Problem

    Suppose two CPU cores share a physical address space
        Write-through caches

    Time step  Event                CPU A's cache  CPU B's cache  Memory
    0                                                             0
    1          CPU A reads X        0                             0
    2          CPU B reads X        0              0              0
    3          CPU A writes 1 to X  1              0              1

    Coherence Defined

    Informally: Reads return most recently written value
    Formally:
        P writes X; P reads X (no intervening writes)
            ⇒ read returns written value
        P1 writes X; P2 reads X (sufficiently later)
            ⇒ read returns written value
            c.f. CPU B reading X after step 3 in example
        P1 writes X, P2 writes X
            ⇒ all processors see writes in the same order
            End up with the same final value for X

    Cache Coherence Protocols

    Operations performed by caches in multiprocessors to ensure coherence
        Migration of data to local caches
            Reduces bandwidth for shared memory
        Replication of read-shared data
            Reduces contention for access
    Snooping protocols
        Each cache monitors bus reads/writes
    Directory-based protocols
        Caches and memory record sharing status of blocks in a directory

    Invalidating Snooping Protocols

    Cache gets exclusive access to a block when it is to be written
        Broadcasts an invalidate message on the bus
        Subsequent read in another cache misses
            Owning cache supplies updated value

    CPU activity         Bus activity      CPU A's cache  CPU B's cache  Memory
                                                                         0
    CPU A reads X        Cache miss for X  0                             0
    CPU B reads X        Cache miss for X  0              0              0
    CPU A writes 1 to X  Invalidate for X  1                             0
    CPU B reads X        Cache miss for X  1              1              1
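The table above can be replayed with a toy write-invalidate model. This is a sketch under simplifying assumptions: one value per address, write-back caches, and a bus modeled as a shared list of caches. The class names are hypothetical and this is not a full protocol.

```python
class Bus:
    def __init__(self, memory):
        self.memory = memory           # address -> value
        self.caches = []


class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                # address -> cached value
        self.dirty = set()             # addresses modified locally
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:     # cache miss goes on the bus
            for other in self.bus.caches:
                if other is not self and addr in other.dirty:
                    # Owning cache supplies the value; memory is updated too.
                    self.bus.memory[addr] = other.lines[addr]
                    other.dirty.discard(addr)
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        for other in self.bus.caches:  # broadcast invalidate
            if other is not self:
                other.lines.pop(addr, None)
                other.dirty.discard(addr)
        self.lines[addr] = value
        self.dirty.add(addr)


bus = Bus(memory={"X": 0})
cpu_a, cpu_b = SnoopingCache(bus), SnoopingCache(bus)
cpu_a.read("X")
cpu_b.read("X")
cpu_a.write("X", 1)
after_write = (cpu_a.lines.get("X"), cpu_b.lines.get("X"), bus.memory["X"])
value_seen_by_b = cpu_b.read("X")
print(after_write, value_seen_by_b, bus.memory["X"])
```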

    Memory Consistency

    When are writes seen by other processors
        "Seen" means a read returns the written value
        Can't be instantaneous
    Assumptions
        A write completes only when all processors have seen it
        A processor does not reorder writes with other accesses
    Consequence
        P writes X then writes Y
            ⇒ all processors that see new Y also see new X
        Processors can reorder reads, but not writes

    The ARM Cortex-A8 and Intel Core i7 Memory Hierarchies

    Multilevel On-Chip Caches


    Supporting Multiple Issue

    Both have multi-banked caches that allow multiple accesses per cycle assuming no bank conflicts
    Core i7 cache optimizations
        Return requested word first
        Non-blocking cache
            Hit under miss
            Miss under miss
        Data prefetching

    §5.14 Going Faster: Cache Blocking and Matrix Multiply

    DGEMM

    Combine cache blocking and subword parallelism
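A minimal sketch of the cache-blocking half of that idea, in plain Python for readability. It shows only the loop restructuring; subword (SIMD) parallelism is not represented, and the function name and the block size parameter are assumptions for the example.

```python
def blocked_matmul(a, b, n, block):
    """C = A * B for n x n matrices, computed one block-sized tile at a time."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for kk in range(0, n, block):
                # Work on one tile so its data stays in the cache while reused.
                for i in range(ii, min(ii + block, n)):
                    for j in range(jj, min(jj + block, n)):
                        total = c[i][j]
                        for k in range(kk, min(kk + block, n)):
                            total += a[i][k] * b[k][j]
                        c[i][j] = total
    return c


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(blocked_matmul(a, b, n=2, block=1))
```

The result does not depend on the block size; only the order of memory accesses changes, which is where the locality benefit comes from.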

    §5.15 Fallacies and Pitfalls

    Pitfalls

    Byte vs. word addressing
        Example: 32-byte direct-mapped cache, 4-byte blocks
            Byte 36 maps to block 1
            Word 36 maps to block 4
    Ignoring memory system effects when writing or generating code
        Example: iterating over rows vs. columns of arrays
        Large strides result in poor locality
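The byte vs. word example can be checked with a few lines of arithmetic. The sketch assumes 4-byte words, which is what makes word 36 equal to byte address 144; the function names are illustrative.

```python
CACHE_BYTES = 32
BLOCK_BYTES = 4
WORD_BYTES = 4                           # assumed word size
NUM_BLOCKS = CACHE_BYTES // BLOCK_BYTES  # 8 blocks


def block_for_byte_address(byte_address: int) -> int:
    """Cache block index for a byte address in a direct-mapped cache."""
    return (byte_address // BLOCK_BYTES) % NUM_BLOCKS


def block_for_word_address(word_address: int) -> int:
    """Same mapping, after converting the word address to a byte address."""
    return block_for_byte_address(word_address * WORD_BYTES)


print(block_for_byte_address(36), block_for_word_address(36))
```

The same number lands in different blocks depending on whether it is read as a byte address or a word address, which is the pitfall.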

    Pitfalls

    In multiprocessor with shared L2 or L3 cache
        Less associativity than cores results in conflict misses
        More cores ⇒ need to increase associativity
    Using AMAT to evaluate performance of out-of-order processors
        Ignores effect of non-blocked accesses
        Instead, evaluate performance by simulation

    Pitfalls

    Extending address range using segments
        E.g., Intel

    Concluding Remarks

    Fast memories are small, large memories are slow
        We really want fast, large memories
        Caching gives this illusion
    Principle of locality
        Programs use a small part of their memory space frequently
    Memory hierarchy
        L1 cache ↔ L2 cache ↔ … ↔ DRAM memory ↔ disk
    Memory system design is critical for multiprocessors