Historically, The Limiting Factor in a Computer’s


  • 8/14/2019 Historically, The Limiting Factor in a Computers

    1/29


    EE 4504 Section 3 3

    Terminology

    - Capacity: the amount of information that can be contained in a memory unit, usually expressed in words or bytes
    - Word: the natural unit of organization in the memory, typically the number of bits used to represent a number
    - Addressable unit: the fundamental data element size that can be addressed in the memory, typically either the word size or individual bytes
    - Unit of transfer: the number of data elements transferred at a time, usually bits in main memory and blocks in secondary memory
    - Transfer rate: the rate at which data is transferred to/from the memory device
    - Access time:
        - For RAM, the time to address the unit and perform the transfer
        - For non-random access memory, the time to position the R/W head over the desired location
    - Memory cycle time: access time plus any other time required before a second access can be started
    - Access technique: how memory contents are accessed
        - Random access:
            - Each location has a unique physical address
            - Locations can be accessed in any order and all access times are the same
            - What we term RAM is more aptly called read/write memory, since this access technique also applies to ROMs
            - Example: main memory


        - Sequential access:
            - Data does not have a unique address
            - Must read all data items in sequence until the desired item is found
            - Access times are highly variable
            - Example: tape drive units
        - Direct access:
            - Data items have unique addresses
            - Access is done using a combination of moving to a general memory area followed by a sequential access to reach the desired data item
            - Example: disk drives
        - Associative access:
            - A variation of random access memory
            - Data items are accessed based on their contents rather than their actual location
            - Search all data items in parallel for a match to a given search pattern
            - All memory locations are searched in parallel without regard to the size of the memory
            - Extremely fast for large memory sizes
            - Cost per bit is 5-10 times that of a normal RAM cell
            - Example: some cache memory units


    Memory Hierarchy

    - Major design objective of any memory system: to provide adequate storage capacity
        - At an acceptable level of performance
        - At a reasonable cost
    - Four interrelated ways to meet this goal
        - Use a hierarchy of storage devices
        - Develop automatic space allocation methods for efficient use of the memory
        - Through the use of virtual memory techniques, free the user from memory management tasks
        - Design the memory and its related interconnection structure so that the processor can operate at or near its maximum rate
    - Basis of the memory hierarchy
        - Registers internal to the CPU for temporary data storage (small in number but very fast)
        - Storage external to the CPU for data and programs (relatively large and fast)
        - External permanent storage (much larger and much slower)
    - Characteristics of the memory hierarchy
        - Consists of distinct levels of memory components
        - Each level is characterized by its size, access time, and cost per bit
        - Each successive level in the hierarchy consists of modules of larger capacity, slower access time, and lower cost per bit
    - Goal of the memory hierarchy
        - Try to match the processor speed with the rate of information transfer from the lowest element in the hierarchy


    The memory hierarchy (figure), listed from the top level down:

    - Registers in the CPU
    - Cache
    - Main memory
    - Disk cache
    - Magnetic disk
    - Magnetic tape, optical disk

    Typical memory parameters

    Memory type     Technology          Size          Access time
    Cache           Semiconductor RAM   128-512 KB    10 ns
    Main memory     Semiconductor RAM   4-128 MB      50 ns
    Magnetic disk   Hard disk           Gigabyte      10 ms, 10 MB/sec
    Optical disk    CD-ROM              Gigabyte      300 ms, 600 KB/sec
    Magnetic tape   Tape                100s of MB    Sec-min., 10 MB/min


    The memory hierarchy works because of locality of reference

    - Memory references made by the processor, for both instructions and data, tend to cluster together
        - Instruction loops, subroutines
        - Data arrays, tables
    - Keep these clusters in high-speed memory to reduce the average delay in accessing data
    - Over time, the clusters being referenced will change -- memory management must deal with this

    Example: two-level memory system

    - Level 1 access time of 1 us
    - Level 2 access time of 10 us
    - Average access time = H(1) + (1-H)(10) us, where H is the fraction of references found in level 1 (the hit ratio)

    Figure 4.2 2-level memory performance
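    The two-level expression above can be evaluated for a few hit ratios. This is a minimal Python sketch, not part of the original slides: the function name `average_access_time` and the sample hit ratios are illustrative assumptions, and the default times are the 1 us and 10 us of the example.

```python
def average_access_time(hit_ratio, t1_us=1.0, t2_us=10.0):
    """Average access time (us) for the simple two-level model.

    hit_ratio is H, the fraction of references satisfied by level 1.
    The remaining (1 - H) references are charged the level 2 time.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * t1_us + (1.0 - hit_ratio) * t2_us


if __name__ == "__main__":
    # Illustrative hit ratios only; they are not taken from the slides.
    for h in (0.0, 0.5, 0.9, 0.95, 1.0):
        print(f"H = {h:.2f} -> {average_access_time(h):.2f} us")
```

    The average only approaches the level 1 time when H is close to 1, which is why locality of reference matters so much.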


    Main Memory

    - Core memory
        - Used in generations 2 and 3
        - Magnetic cores (toroids) used to store a logical 0 or 1 state by magnetizing them in one of two directions (hysteresis loop)
        - 1 core = 1 bit of storage
        - Required addressing and sensing wires ran through each core
        - Destructive readout
        - Obsolete: replaced in the 1970s by semiconductor memory
    - Semiconductor memory
        - Typically random access
        - RAM: actually read-write memory
            - Dynamic RAM
                - Storage cell is essentially a transistor acting as a capacitor
                - Capacitor charge dissipates over time, causing a 1 to flip to a 0
                - Cells must be refreshed periodically to avoid this
                - Very high packaging density
            - Static RAM: basically an array of flip-flop storage cells
                - Uses 5-10x more transistors than a similar dynamic cell, so packaging density is 10x lower
                - Faster than a dynamic cell


    - Read Only Memories (ROM)
        - Permanent data storage
        - ROMs
            - Data is wired in during fabrication at a chip manufacturer's plant
            - Purchased in lots of 10k or more
        - PROMs
            - Programmable ROM
            - Data can be written once by the user employing a PROM programmer
            - Useful for small production runs
        - EPROM
            - Erasable PROM
            - Programming is similar to a PROM
            - Can be erased by exposing to UV light
        - EEPROMs
            - Electrically erasable PROMs
            - Can be written to many times while remaining in a system
            - Does not have to be erased first
            - Individual bytes can be programmed
            - Writes require several hundred usec per byte
            - Used in systems for development, personalization, and other tasks requiring unique information to be stored
        - Flash memory
            - Similar to EEPROM in using electrical erase
            - Fast erasures, block erasures
            - Higher density than EEPROM


    - Organization
        - Each memory chip contains a number of 1-bit cells
            - 1, 4, and 16 million cell chips are common
        - Cells can be arranged as a single bit column (e.g., 4Mx1) or in multiple bits per address location (e.g., 1Mx4)
        - To reduce pin count, address lines can be multiplexed with data and/or as high and low halves
            - Trade-off is in slower operation
        - Typical control lines
            - W* (write), OE* (output enable) for write and read operations
            - CS* (chip select) derived from external address decoding logic
            - RAS*, CAS* (row and column address selects) used when the address is applied to the chip in 2 halves

    Figure 4.8 256Kx8 memory from 256Kx1 chips

    Figure 4.9 1Mx8 memory from 256Kx1 chips

    - Improvements to DRAM
        - Basic DRAM design has not changed much since its development in the 70s
        - Cache was introduced to improve performance
            - Limited to no gain in performance after a certain amount of cache is implemented
        - Enhanced DRAM
            - Add a fast 1-line SRAM cache to the DRAM chip
            - Consecutive reads to the same line are served from this cache and are thus faster than the DRAM itself
            - Tests indicate these chips can perform as well as traditional DRAM-cache combinations
        - Cache DRAM
            - Use a larger SRAM cache on the chip as a true multi-line cache
            - Use it as a serial data stream buffer for block data transfers


    Error Correction

    - Semiconductor memories are subject to errors
        - Hard (permanent) errors
            - Environmental abuse
            - Manufacturing defects
            - Wear
        - Soft (transient) errors
            - Power supply problems
            - Alpha particles
            - Problematic as feature sizes shrink
    - Memory systems include logic to detect and/or correct errors
        - Width of memory word is increased
        - Additional bits are parity bits
        - Number of parity bits required depends on the level of detection and correction needed

    Figure 4.10 Basic error detection and correction circuitry

    - General error detection and correction
        - A single error is a bit flip -- multiple bit flips can occur in a word
        - 2^M valid data words
        - 2^(M+K) codeword combinations in the memory
        - Distribute the 2^M valid data words among the 2^(M+K) codeword combinations such that the distance between valid words is sufficient to distinguish the error

    (Figure: two valid codewords separated by a chain of invalid codewords, with 1 bit flip between each codeword. 7 bit flips would map the upper valid codeword into the lower one, so the code can detect up to 6 errors and correct up to 3 errors.)
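    The "detect 6, correct 3" figures follow from the standard rule that a code with minimum distance d can detect up to d - 1 bit flips and correct up to floor((d - 1) / 2). The Python sketch below illustrates that rule; the function names are hypothetical and chosen only for this example.

```python
def hamming_distance(a, b):
    """Number of bit positions in which the integers a and b differ."""
    return bin(a ^ b).count("1")


def detect_correct_capability(min_distance):
    """Return (detectable, correctable) error counts for a minimum distance.

    Detection: up to d - 1 flips cannot turn one valid codeword into another.
    Correction: up to (d - 1) // 2 flips still leave the word closer to the
    original codeword than to any other valid codeword.
    """
    return min_distance - 1, (min_distance - 1) // 2


if __name__ == "__main__":
    # Two 7-bit words that differ in every position are 7 flips apart.
    d = hamming_distance(0b0000000, 0b1111111)
    print(d, detect_correct_capability(d))
```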

    - Single error detection and correction
        - For each valid codeword, there will be 2^K - 1 invalid codewords
        - 2^K - 1 must be large enough to identify which of the M+K bit positions is in error
        - Therefore 2^K - 1 >= M + K
            - 8-bit data: 4 check bits
            - 32-bit data: 6 check bits
        - Arrange bits as shown in Figure 4.12
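    The check-bit counts quoted above can be reproduced by searching for the smallest K with 2^K - 1 >= M + K (the usual statement of the single-error-correcting condition). This is a small Python sketch; `check_bits_needed` is a name made up for this example.

```python
def check_bits_needed(data_bits):
    """Smallest K such that 2**K - 1 >= data_bits + K.

    The 2**K - 1 non-zero syndrome values must be able to point at any one
    of the data_bits + K positions of the stored word.
    """
    k = 1
    while 2 ** k - 1 < data_bits + k:
        k += 1
    return k


if __name__ == "__main__":
    for m in (8, 16, 32, 64):
        print(f"{m}-bit data -> {check_bits_needed(m)} check bits")
```

    For the two word sizes on the slide this gives 4 check bits for 8-bit data and 6 for 32-bit data.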


    - If there are 2^n words in the main memory, then there will be M = 2^n / K blocks in the memory (K words per block)
    - M will be much greater than the number of lines, C, in the cache
    - Every line of data in the cache must be tagged in some way to identify which main memory block it is
    - The line of data and its tag are stored in the cache
    - Factors in the cache design
        - Mapping function between main memory and the cache
        - Line replacement algorithm
        - Write policy
        - Block size
        - Number and type of caches

    - Mapping functions -- since M >> C, how are blocks mapped to specific lines in the cache?
    - Direct mapping
        - Each main memory block is assigned to a specific line in the cache:
              i = j modulo C
          where i is the cache line number assigned to main memory block j
        - If M = 64, C = 4:
              Line 0 can hold blocks 0, 4, 8, 12, ...
              Line 1 can hold blocks 1, 5, 9, 13, ...
              Line 2 can hold blocks 2, 6, 10, 14, ...
              Line 3 can hold blocks 3, 7, 11, 15, ...
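    The line assignment i = j modulo C can be checked with a few lines of Python. This is only an illustrative sketch; `blocks_for_line` is a hypothetical helper, with M = 64 and C = 4 taken from the example.

```python
def blocks_for_line(line, num_blocks=64, num_lines=4):
    """Main memory blocks j that direct mapping assigns to a cache line.

    Block j goes to line i = j % num_lines.
    """
    return [j for j in range(num_blocks) if j % num_lines == line]


if __name__ == "__main__":
    for line in range(4):
        print(f"Line {line}: {blocks_for_line(line)[:4]} ...")
```

    With these sizes, each line is shared by 16 different blocks, only one of which can be resident at a time.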


        - Direct mapping cache treats a main memory address as 3 distinct fields
            - Tag identifier
            - Line number identifier
            - Word identifier (offset)
        - Word identifier specifies the specific word (or addressable unit) in a cache line that is to be read
        - Line identifier specifies the physical line in cache that will hold the referenced address
        - The tag is stored in the cache along with the data words of the line
        - For every memory reference that the CPU makes, the specific line that would hold the reference (if it has already been copied into the cache) is determined
        - The tag held in that line is checked to see if the correct block is in the cache

    Figure 4.18 Direct mapping cache organization

    Example:

    - Memory size of 1 MB (20 address bits), addressable to the individual byte
    - Cache size of 1 K lines, each holding 8 bytes
        - Word id = 3 bits
        - Line id = 10 bits
        - Tag id = 7 bits
    - Where in the cache is the byte at main memory location $ABCDE stored?
        - $ABCDE = 1010101 1110011011 110
        - Cache line $39B, word offset $6, tag $55
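    The field split in this example can be reproduced with shifts and masks. The following Python sketch assumes the 3-bit word and 10-bit line fields of the example; the function name `split_direct` is introduced here for illustration only.

```python
def split_direct(address, word_bits=3, line_bits=10):
    """Split an address into (tag, line, word) for a direct mapped cache.

    The low word_bits select the byte within the line, the next line_bits
    select the cache line, and whatever remains on the left is the tag.
    """
    word = address & ((1 << word_bits) - 1)
    line = (address >> word_bits) & ((1 << line_bits) - 1)
    tag = address >> (word_bits + line_bits)
    return tag, line, word


if __name__ == "__main__":
    tag, line, word = split_direct(0xABCDE)
    print(f"tag ${tag:X}, line ${line:X}, word ${word:X}")
```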

    - Advantages of direct mapping
        - Easy to implement
        - Relatively inexpensive to implement
        - Easy to determine where a main memory reference can be found in cache
    - Disadvantage
        - Each main memory block is mapped to a specific cache line
        - Through locality of reference, it is possible to repeatedly reference blocks that map to the same line number
        - These blocks will be constantly swapped in and out of cache, causing the hit ratio to be low


    - Associative mapping
        - Let a block be stored in any cache line that is not in use
            - Overcomes direct mapping's main weakness
        - Must examine each line in the cache to find the right memory block
            - Examine the line tag id for each line
            - Slow process for large caches!
        - Line numbers (ids) have no meaning in the cache
            - Parse the main memory address into 2 fields (tag and word offset) rather than 3 as in direct mapping
        - Implement cache in 2 parts
            - The lines themselves in SRAM
            - The tag storage in associative memory
        - Perform an associative search over all tags to find the desired line (if it's in the cache)

    Figure 4.20 Associative cache organization

    Our memory example again ...

    - Word id = 3 bits
    - Tag id = 17 bits
    - Where in the cache is the byte at main memory location $ABCDE stored?
        - $ABCDE = 10101011110011011 110
        - Cache line unknown, word offset $6, tag $1579B
    - Advantages
        - Fast
        - Flexible
    - Disadvantage
        - Implementation cost
        - Example above has an 8 KB cache and requires 1024 x 17 = 17,408 bits of associative memory for the tags!
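    For the fully associative case only two fields remain, and the tag storage cost is simply lines times tag bits. The Python sketch below works both numbers for this example; the names `split_associative` and `tag_store_bits` are assumptions made for illustration. Shifting $ABCDE right by 3 bits gives the 17-bit tag $1579B.

```python
def split_associative(address, word_bits=3):
    """Split an address into (tag, word) for a fully associative cache."""
    return address >> word_bits, address & ((1 << word_bits) - 1)


def tag_store_bits(num_lines, tag_bits):
    """Bits of associative memory needed to hold one tag per cache line."""
    return num_lines * tag_bits


if __name__ == "__main__":
    tag, word = split_associative(0xABCDE)
    print(f"tag ${tag:X}, word ${word:X}")
    print(tag_store_bits(1024, 17), "bits of tag storage")
```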

    - Set associative mapping
        - Compromise between direct and fully associative mappings that builds on the strengths of both
        - Divide cache into a number of sets (v), each set holding a number of lines (k)
        - A main memory block can be stored in any one of the k lines in a set such that
              set number = j modulo v
        - If a set can hold X lines, the cache is referred to as an X-way set associative cache
        - Most cache systems today that use set associative mapping are 2- or 4-way set associative

    Figure 4.22 Set associative cache organization (caption in the text is misleading -- not 2-way!)

    Our memory example again

    - Assume the 1024 lines are 4-way set associative
        - 1024/4 = 256 sets
        - Word id = 3 bits
        - Set id = 8 bits
        - Tag id = 9 bits
    - Where in the cache is the byte at main memory location $ABCDE stored?
        - $ABCDE = 101010111 10011011 110
        - Cache set $9B, word offset $6, tag $157
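    The three mapping schemes differ only in how many address bits go to the set (or line) field. The Python sketch below derives the field widths from the cache geometry, treating direct mapping as 1-way and fully associative as a single set. The function names and the power-of-two assumption are choices made for this illustration, not something stated on the slides.

```python
def field_widths(address_bits, num_lines, line_bytes, ways):
    """Return (tag_bits, set_bits, word_bits) for a cache geometry.

    Assumes num_lines, line_bytes and ways are powers of two.
    ways == 1 is direct mapping; ways == num_lines is fully associative.
    """
    num_sets = num_lines // ways
    word_bits = line_bytes.bit_length() - 1   # log2 of a power of two
    set_bits = num_sets.bit_length() - 1
    tag_bits = address_bits - set_bits - word_bits
    return tag_bits, set_bits, word_bits


def split_address(address, set_bits, word_bits):
    """Split an address into (tag, set index, word offset)."""
    word = address & ((1 << word_bits) - 1)
    set_index = (address >> word_bits) & ((1 << set_bits) - 1)
    tag = address >> (set_bits + word_bits)
    return tag, set_index, word


if __name__ == "__main__":
    tag_bits, set_bits, word_bits = field_widths(20, 1024, 8, ways=4)
    tag, set_index, word = split_address(0xABCDE, set_bits, word_bits)
    print(f"tag ${tag:X}, set ${set_index:X}, word ${word:X}")
```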


    - Line replacement algorithms
        - When an associative cache or a set associative cache set is full, which line should be replaced by the new line that is to be read from memory?
            - Not a problem for direct mapping since each block has a predetermined line it must use
        - Least recently used
        - First in first out
        - Least frequently used
        - Random
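    As one illustration of these policies, least recently used replacement for a single cache set can be modeled in software with an ordered dictionary. This is a sketch only: the class name `LRUSet` is hypothetical, and real caches implement LRU with a few status bits per line rather than a data structure like this.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set holding at most `ways` tags, with LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag):
        """Return True on a hit; on a miss, load the tag and return False."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used
        self.lines[tag] = None
        return False


if __name__ == "__main__":
    cache_set = LRUSet(ways=2)
    for tag in (1, 2, 1, 3, 2):
        print(tag, "hit" if cache_set.access(tag) else "miss")
```

    In the sample trace, the reference to tag 3 evicts tag 2 rather than tag 1, because tag 1 was touched more recently.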

    - Write policy
        - When a line is to be replaced, must update the original copy of the line in main memory if any addressable unit in the line has been changed
        - Write through
            - Anytime a word in cache is changed, it is also changed in main memory
            - Both copies always agree
            - Generates lots of memory writes to main memory
        - Write back
            - During a write, only change the contents of the cache
            - Update main memory only when the cache line is to be replaced
            - Causes cache coherency problems -- different values for the contents of an address are in the cache and the main memory
            - Complex circuitry to avoid this problem
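    The difference in memory traffic between the two policies can be shown with a deliberately simplified count. The Python sketch below considers a single cached line that receives some number of CPU stores and is then replaced; the function name and the one-write-per-replacement simplification are assumptions made for illustration.

```python
def main_memory_writes(stores_to_line, policy):
    """Main memory writes for one cached line that is stored to and replaced.

    write-through: every store is also sent to main memory.
    write-back: a modified line is written once, when it is replaced.
    """
    if policy == "write-through":
        return stores_to_line
    if policy == "write-back":
        return 1 if stores_to_line > 0 else 0
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    for policy in ("write-through", "write-back"):
        print(policy, main_memory_writes(100, policy))
```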


    - Block / line sizes
        - How much data should be transferred from main memory to the cache in a single memory reference?
        - Complex relationship between block size and hit ratio as well as the operation of the system bus itself
        - As block size increases,
            - Locality of reference predicts that the additional information transferred will likely be used and thus increases the hit ratio (good)
            - Number of blocks in cache goes down, limiting the total number of blocks in the cache (bad)
            - As the block size gets big, the probability of referencing all the data in it goes down (hit ratio goes down) (bad)
        - Size of 4-8 addressable units seems about right for current systems

    - Number of caches
        - Single vs. 2-level
            - Modern CPU chips have on-board cache (L1)
                - 80486 -- 8 KB
                - Pentium -- 16 KB
                - PowerPC -- up to 64 KB
            - L1 provides best performance gains
            - Secondary, off-chip cache (L2) provides higher speed access to main memory
            - L2 is generally 512 KB or less -- more than this is not cost-effective


        - Unified vs. split cache
            - Unified cache stores data and instructions in 1 cache
                - Only 1 cache to design and operate
                - Cache is flexible and can balance allocation of space to instructions or data to best fit the execution of the program -- higher hit ratio
            - Split cache uses 2 caches -- 1 for instructions and 1 for data
                - Must build and manage 2 caches
                - Static allocation of cache sizes
                - Can outperform unified cache in systems that support parallel execution and pipelining (reduces cache contention)

    Pentium and PowerPC Cache

    - In the last 10 years, we have seen the introduction of cache into microprocessor chips
    - On-board cache (L1) is supplemented with external fast SRAM cache (L2)
        - While off-chip, it can provide zero wait state performance compared to a relatively slow main memory access
    - Intel family
        - 386 had no internal cache
        - 486 had 8 KB unified cache
        - Pentium has 16 KB split cache: 8 KB data and 8 KB instruction
        - Pentium supports 256 or 512 KB external L2 cache that is 2-way set associative
    - PowerPC
        - 601 had one 32 KB cache
        - 603/604/620 have split caches of size 16/32/64 KB


    MESI Cache Coherency Protocol

    - MESI protocol provides cache coherency in both the Pentium and the PowerPC
    - Stands for
        - Modified
        - Exclusive
        - Shared
        - Invalid
    - Implemented with an additional 2-bit field for each cache line
    - Becomes interesting in the interactions of the L1 and the L2 caches -- each tracks the local MESI status as a line moves from main memory to L2 and then to L1
    - PowerPC adds an additional state, A, for allocated
        - A line is marked as A while its data is being swapped out

    Operation of 2-Level Memory

    - Recall the goal of the memory system:
        - Provide an average access time to all memory locations that is approximately the same as that of the fastest memory component
        - Provide a system memory with an average cost approximately equal to the cost/bit of the cheapest memory component
    - Simplistic approach:
          Ts = H1 x T1 + H2 x (T1 + T2 + Tb21)
          H2 = 1 - H1
        - H1 is the fraction of accesses found in level 1 (the hit ratio)
        - T1, T2 are the access times to levels 1 and 2
        - Tb21 is the block transfer time from level 2 to level 1
    - Can be generalized to 3 or more levels
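    The expression for Ts is easy to evaluate numerically. The Python sketch below implements it as written; the function name and the sample timings in the demo are made up for illustration and do not come from the slides.

```python
def two_level_access_time(h1, t1, t2, tb21):
    """Ts = H1*T1 + H2*(T1 + T2 + Tb21), with H2 = 1 - H1.

    A miss pays for the failed level 1 access, the level 2 access,
    and the transfer of the block from level 2 into level 1.
    """
    h2 = 1.0 - h1
    return h1 * t1 + h2 * (t1 + t2 + tb21)


if __name__ == "__main__":
    # Hypothetical numbers in ns: T1 = 10, T2 = 50, Tb21 = 40.
    for h1 in (0.80, 0.90, 0.99):
        print(f"H1 = {h1:.2f} -> Ts = {two_level_access_time(h1, 10, 50, 40):.1f} ns")
```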


    External Memory

    - Magnetic disks
        - The disk is a metal or plastic platter coated with magnetizable material
        - Data is recorded onto and later read from the disk using a conducting coil, the head
        - Data is organized into concentric rings, called tracks, on the platter
            - Tracks are separated by gaps
        - Disk rotates at a constant speed -- constant angular velocity
            - Number of data bits per track is a constant
            - Data density is higher on the inner tracks
        - Logical data transfer unit is the sector
            - Sectors are identified on each track during the formatting process

    Figures 5.1 and 5.2 Disk organization

    - Disk characteristics
        - Single vs. multiple platters per drive (each platter has its own read/write head)
        - Fixed vs. movable head
            - Fixed head has a head per track
            - Movable head uses one head per platter
        - Removable vs. nonremovable platters
            - Removable platter can be removed from the disk drive for storage or transfer to another machine
        - Data accessing times
            - Seek time -- position the head over the correct track
            - Rotational latency -- wait for the desired sector to come under the head
            - Access time -- seek time plus rotational latency
            - Block transfer time -- time to read the block (sector) off of the disk and transfer it to main memory
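    The access time definition can be turned into a quick estimate. The Python sketch below assumes that the desired sector is, on average, half a revolution away, which is the usual approximation for a randomly placed sector. The function names, the seek time, and the rotation speeds are illustrative assumptions.

```python
def average_rotational_latency_ms(rpm):
    """Average rotational latency in ms, assumed to be half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution


def access_time_ms(seek_ms, rpm):
    """Access time = seek time + rotational latency (both in ms)."""
    return seek_ms + average_rotational_latency_ms(rpm)


if __name__ == "__main__":
    # Hypothetical drive: 10 ms average seek.
    for rpm in (3600, 6000):
        print(f"{rpm} rpm -> {access_time_ms(10.0, rpm):.2f} ms")
```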

    - RAID Technology
        - Disk drive performance has not kept pace with improvements in other parts of the system
            - Limited in many cases by the electro-mechanical transport means
        - Capacity of a high performance disk drive can be duplicated by operating many (much cheaper) disks in parallel with simultaneous access
        - Data is distributed across all disks
        - With many parallel disks operating as if they were a single unit, redundancy techniques can be used to guard against data loss in the unit (due to aggregate failure rate being higher)
        - RAID developed at Berkeley -- Redundant Array of Independent Disks
            - Six levels: 0 through 5


        - RAID 0
            - No redundancy techniques are used
            - Data is distributed over all disks in the array
            - Data is divided into strips for actual storage
                - Similar in operation to interleaved memory data storage
            - Can be used to support high data transfer rates by having block transfer size be in multiples of the strip
            - Can support low response time by having the block transfer size equal a strip -- support multiple strip transfers in parallel
        - RAID 1
            - All disks are mirrored -- duplicated
                - Data is stored on a disk and its mirror
                - Read from either the disk or its mirror
                - Write must be done to both the disk and mirror
            - Fault recovery is easy -- use the data on the mirror
            - System is expensive!

        - RAID 2
            - All disks are used for every access -- disks are synchronized together
            - Data strips are small (byte)
            - Error correcting code computed across all disks and stored on additional disks
            - Uses fewer disks than RAID 1 but still expensive
                - Number of additional disks is proportional to log of number of data disks
        - RAID 3
            - Like RAID 2, but only a single redundant disk is used -- the parity drive
            - Parity bit is computed for the set of individual bits in the same position on all disks
            - If a drive fails, parity information on the redundant disk can be used to calculate the data from the failed disk on the fly
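    The parity scheme described for RAID 3 amounts to a bitwise XOR across the disks, and the same XOR recovers a lost disk. The Python sketch below models each disk's contribution as a short byte string; the function names and the sample data are assumptions made for illustration.

```python
from functools import reduce


def parity_strip(strips):
    """Bytewise XOR of equal-length strips, one strip per disk."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def rebuild_missing(surviving_strips, parity):
    """Recover the strip of a failed disk from the survivors and the parity."""
    return parity_strip(list(surviving_strips) + [parity])


if __name__ == "__main__":
    data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]   # three data disks
    parity = parity_strip(data)
    # Pretend the second disk failed and rebuild its contents.
    recovered = rebuild_missing([data[0], data[2]], parity)
    print(parity.hex(), recovered.hex())
```

    XOR works here because XORing the parity with all surviving strips cancels them out and leaves only the missing strip.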


        - RAID 4
            - Access is to individual strips rather than to all disks at once (RAID 3)
            - Bit-by-bit parity is calculated across corresponding strips on each disk
            - Parity bits stored in the redundant disk
            - Write penalty
                - For every write to a strip, the parity strip must also be recalculated and written
                - Thus 1 logical write equals 2 physical disk writes (the data strip and the parity strip)
                - The parity drive is always written to and can thus be a bottleneck
        - RAID 5
            - Parity information is distributed on data disks in a round-robin scheme
            - No dedicated parity disk needed
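    Two details of RAID 4 and RAID 5 can be sketched briefly in Python. The first function updates a parity strip from the old data, new data, and old parity, which is why a small write touches both a data disk and the parity. The second shows one possible round-robin placement of parity for RAID 5. Both function names are hypothetical, and the rotation rule (stripe number modulo number of disks) is an assumption; real arrays use several different layouts.

```python
def updated_parity(old_parity, old_data, new_data):
    """New parity strip after overwriting one data strip.

    XORing out the old data and XORing in the new data avoids reading
    the other data disks.
    """
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


def parity_disk(stripe, num_disks):
    """Disk holding the parity strip for a stripe (assumed rotation rule)."""
    return stripe % num_disks


if __name__ == "__main__":
    old_data, other = b"\x0f", b"\x33"
    old_parity = bytes(a ^ b for a, b in zip(old_data, other))
    print(updated_parity(old_parity, old_data, b"\xff").hex())
    print([parity_disk(s, 4) for s in range(8)])
```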

    - Optical disks
        - Advent of CDs in the early 1980s revolutionized the audio and computer industries
        - Basic operation
            - CD is operated using constant linear velocity
            - Essentially one long track spiraled onto the disk
            - Track passes under the disk's head at a constant rate -- requires the disk to change rotational speed based on what part of the track you are on
            - To write to the disk, a laser is used to burn pits into the track -- write once!
            - During reads, a low power laser illuminates the track and its pits
                - In the track, pits reflect light differently than no pits, thus allowing you to store 1s and 0s


            - Master disk is made using the laser
                - Master is used to press copies in a mass production mechanical style
                - Cheaper than production of information on magnetic disks
        - Storage capacity roughly 775 MB, or about 550 3.5-inch disks
        - Transfer rate standard is 176 KB/second
        - Only economical for production of large quantities of disks
        - Disks are removable and thus archival
        - Slower than magnetic disks

        - WORMs -- Write Once, Read Many disks
            - User can produce CD-ROMs in limited quantities
            - Specially prepared disk is written to using a medium power laser
            - Can be read many times just like a normal CD-ROM
            - Permits archival storage of user information, distribution of large amounts of information by a user
        - Erasable optical disk
            - Combines laser and magnetic technology to permit information storage
            - Laser heats an area that can then have its magnetic field orientation changed to alter information storage
            - State of the magnetic field can be detected using polarized light during reads


    - Magnetic tape
        - The first kind of secondary memory
        - Still widely used
            - Very cheap
            - Very slow
        - Sequential access
            - Data is organized as records with physical air gaps between records
            - One word is stored across the width of the tape and read using multiple read/write heads

    Summary

    - Goal of the memory hierarchy is to produce a memory system that has an average access time roughly equal to that of the L1 memory and an average cost per bit roughly equal to that of the lowest level in the hierarchy
    - Range of performance spans 10 orders of magnitude!
    - Components / levels discussed
        - Cache
        - Main memory
        - Secondary memory