8/14/2019 Historically, The Limiting Factor in a Computers
EE 4504 Section 3 3
Terminology
Capacity: the amount of information that can be contained in a memory unit -- usually in terms of words or bytes
Word: the natural unit of organization in the memory, typically the number of bits used to represent a number
Addressable unit: the fundamental data element size that can be addressed in the memory -- typically either the word size or individual bytes
Unit of transfer: the number of data elements transferred at a time -- usually bits in main memory and blocks in secondary memory
Transfer rate: rate at which data is transferred to/from the memory device
EE 4504 Section 3 4
Access time:
  For RAM, the time to address the unit and perform the transfer
  For non-random access memory, the time to position the R/W head over the desired location
Memory cycle time: access time plus any other time required before a second access can be started
Access technique: how memory contents are accessed
Random access:
  Each location has a unique physical address
  Locations can be accessed in any order, and all access times are the same
  What we term RAM is more aptly called read/write memory, since this access technique applies to ROMs as well
  Example: main memory
EE 4504 Section 3 5
Sequential access:
  Data does not have a unique address
  Must read all data items in sequence until the desired item is found
  Access times are highly variable
  Example: tape drive units
Direct access:
  Data items have unique addresses
  Access is done by moving to a general memory area, followed by a sequential access to reach the desired data item
  Example: disk drives
EE 4504 Section 3 6
Associative access:
  A variation of random access memory
  Data items are accessed based on their contents rather than their actual location
  All memory locations are searched in parallel for a match to a given search pattern, without regard to the size of the memory
  Extremely fast for large memory sizes
  Cost per bit is 5-10 times that of a normal RAM cell
  Example: some cache memory units
EE 4504 Section 3 7
Memory Hierarchy
Major design objective of any memory system:
  To provide adequate storage capacity
  At an acceptable level of performance
  At a reasonable cost
Four interrelated ways to meet this goal:
  Use a hierarchy of storage devices
  Develop automatic space allocation methods for efficient use of the memory
  Through the use of virtual memory techniques, free the user from memory management tasks
  Design the memory and its related interconnection structure so that the processor can operate at or near its maximum rate
EE 4504 Section 3 8
Basis of the memory hierarchy:
  Registers internal to the CPU for temporary data storage (small in number but very fast)
  External storage for data and programs (relatively large and fast)
  External permanent storage (much larger and much slower)
Characteristics of the memory hierarchy:
  Consists of distinct levels of memory components
  Each level characterized by its size, access time, and cost per bit
  Each increasing level in the hierarchy consists of modules of larger capacity, slower access time, and lower cost/bit
Goal of the memory hierarchy:
  Try to match the processor speed with the rate of information transfer from the lowest element in the hierarchy
EE 4504 Section 3 9
The memory hierarchy (fastest/smallest to slowest/largest):
  Registers in the CPU
  Cache
  Main memory
  Disk cache
  Magnetic disk
  Magnetic tape / Optical disk
EE 4504 Section 3 10
Typical memory parameters

Memory Type     Technology          Size         Access Time
Cache           Semiconductor RAM   128-512 KB   10 ns
Main memory     Semiconductor RAM   4-128 MB     50 ns
Magnetic disk   Hard disk           Gigabytes    10 ms, 10 MB/sec
Optical disk    CD-ROM              Gigabytes    300 ms, 600 KB/sec
Magnetic tape   Tape                100s of MB   sec-min, 10 MB/min
EE 4504 Section 3 11
The memory hierarchy works because of locality of reference:
  Memory references made by the processor, for both instructions and data, tend to cluster together
    Instruction loops, subroutines
    Data arrays, tables
  Keep these clusters in high-speed memory to reduce the average delay in accessing data
  Over time, the clusters being referenced will change -- memory management must deal with this
EE 4504 Section 3 12
Example: two-level memory system
  Level 1 access time of 1 us
  Level 2 access time of 10 us
  Average access time = H(1) + (1-H)(10) us, where H is the hit ratio -- the fraction of accesses satisfied by level 1
Figure 4.2 2-level memory performance
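As a quick sketch (not from the slides), the average-access-time formula can be evaluated over a range of hit ratios; times are in microseconds, matching the example above:

```python
def avg_access_time(hit_ratio, t1=1.0, t2=10.0):
    """Two-level average access time (us): hits are served in t1,
    misses in t2, per the simple model on this slide."""
    return hit_ratio * t1 + (1.0 - hit_ratio) * t2

# As H approaches 1, the average approaches the level-1 time
for h in (0.5, 0.9, 0.95, 0.99):
    print(f"H = {h:.2f} -> {avg_access_time(h):.2f} us")
```

At H = 0.5 the average is 5.5 us; at H = 0.99 it is already 1.09 us, which is why locality of reference makes the hierarchy effective.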
EE 4504 Section 3 13
Main Memory
Core memory
  Used in generations 2 and 3
  Magnetic cores (toroids) used to store a logical 0 or 1 state by setting their magnetization (hysteresis loop)
  1 core = 1 bit of storage
  Addressing and sensing wires ran through each core
  Destructive readout
  Obsolete -- replaced in the 1970s by semiconductor memory
EE 4504 Section 3 14
Semiconductor memory
  Typically random access
  RAM: actually read-write memory
  Dynamic RAM
    Storage cell is essentially a transistor acting as a capacitor
    Capacitor charge dissipates over time, causing a 1 to flip to a 0
    Cells must be refreshed periodically to avoid this
    Very high packaging density
  Static RAM: basically an array of flip-flop storage cells
    Uses 5-10x more transistors than a similar dynamic cell, so packaging density is 10x lower
    Faster than a dynamic cell
EE 4504 Section 3 15
Read-Only Memories (ROM)
  Permanent data storage
  ROMs
    Data is wired in during fabrication at the chip manufacturer's plant
    Purchased in lots of 10k or more
  PROMs
    Programmable ROM
    Data can be written once by the user employing a PROM programmer
    Useful for small production runs
  EPROM
    Erasable PROM
    Programming is similar to a PROM
    Can be erased by exposure to UV light
EE 4504 Section 3 16
EEPROMs
  Electrically erasable PROMs
  Can be written to many times while remaining in a system
  Does not have to be erased first
  Individual bytes can be programmed
  Writes require several hundred microseconds per byte
  Used in systems for development, personalization, and other tasks requiring unique stored information
Flash memory
  Similar to EEPROM in using electrical erase
  Fast erasures, block erasures
  Higher density than EEPROM
EE 4504 Section 3 17
Organization
  Each memory chip contains a number of 1-bit cells
    1-, 4-, and 16-million-cell chips are common
  Cells can be arranged as a single bit column (e.g., 4Mx1) or in multiple bits per address location (e.g., 1Mx4)
  To reduce pin count, address lines can be multiplexed with data and/or split into high and low halves
    Trade-off is slower operation
  Typical control lines
    W* (write), OE* (output enable) for write and read operations
    CS* (chip select), derived from external address decoding logic
    RAS*, CAS* (row and column address strobes), used when the address is applied to the chip in 2 halves
EE 4504 Section 3 18
Figure 4.8 256Kx8 memory from 256Kx1 chips
EE 4504 Section 3 19
Figure 4.9 1Mx8 memory from 256Kx1 chips
EE 4504 Section 3 20
Improvements to DRAM
  Basic DRAM design has not changed much since its development in the 1970s
  Cache was introduced to improve performance
    Limited to no gain in performance after a certain amount of cache is implemented
  Enhanced DRAM
    Adds a fast 1-line SRAM cache to the DRAM chip
    Consecutive reads to the same line come from this cache and are thus faster than the DRAM itself
    Tests indicate these chips can perform as well as traditional DRAM-cache combinations
  Cache DRAM
    Uses a larger SRAM cache on the chip as a true multi-line cache
    Can also use it as a serial data stream buffer for block data transfers
EE 4504 Section 3 21
Error Correction
Semiconductor memories are subject to errors
  Hard (permanent) errors
    Environmental abuse
    Manufacturing defects
    Wear
  Soft (transient) errors
    Power supply problems
    Alpha particles -- problematic as feature sizes shrink
Memory systems include logic to detect and/or correct errors
  Width of memory word is increased
  Additional bits are parity bits
  Number of parity bits required depends on the level of detection and correction needed
EE 4504 Section 3 22
Figure 4.10 Basic error detection and correction circuitry
EE 4504 Section 3 23
General error detection and correction
  A single error is a bit flip -- multiple bit flips can occur in a word
  2^M valid data words
  2^(M+K) codeword combinations in the memory
  Distribute the 2^M valid data words among the 2^(M+K) codeword combinations such that the distance between valid words is sufficient to distinguish the error

[Figure: two valid codewords with a chain of invalid codewords between them, each one bit flip apart; 7 bit flips would map the upper valid codeword into the lower one, so such a code can detect up to 6 errors and correct up to 3]
EE 4504 Section 3 24
Single error detection and correction
  For each valid codeword, there will be 2^K - 1 invalid codewords
  2^K - 1 must be large enough to identify which of the M+K bit positions is in error
  Therefore 2^K - 1 >= M + K
    8-bit data requires 4 check bits
    32-bit data requires 6 check bits
  Arrange bits as shown in Figure 4.12
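The condition 2^K - 1 >= M + K can be checked mechanically; a small sketch (Python chosen for illustration) that finds the minimum number of check bits for a given data width:

```python
def min_check_bits(m):
    """Smallest K satisfying 2**K - 1 >= m + K, the single-error-
    correcting condition from the slide (m = data bits)."""
    k = 1
    while 2**k - 1 < m + k:
        k += 1
    return k

for m in (8, 16, 32, 64):
    print(f"{m}-bit data -> {min_check_bits(m)} check bits")
```

This reproduces the slide's figures: 4 check bits for 8-bit data, 6 for 32-bit data.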
EE 4504 Section 3 27
If there are 2^n words in the main memory, then there will be M = 2^n / K blocks in the memory, where K is the block size in words
  M will be much greater than the number of lines, C, in the cache
Every line of data in the cache must be tagged in some way to identify which main memory block it holds
  The line of data and its tag are stored in the cache
Factors in the cache design
  Mapping function between main memory and the cache
  Line replacement algorithm
  Write policy
  Block size
  Number and type of caches
EE 4504 Section 3 28
Mapping functions -- since M >> C, how are blocks mapped to specific lines in the cache?
Direct mapping
  Each main memory block is assigned to a specific line in the cache:
    i = j modulo C
  where i is the cache line number assigned to main memory block j
  If M = 64, C = 4:
    Line 0 can hold blocks 0, 4, 8, 12, ...
    Line 1 can hold blocks 1, 5, 9, 13, ...
    Line 2 can hold blocks 2, 6, 10, 14, ...
    Line 3 can hold blocks 3, 7, 11, 15, ...
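The mapping above is just a modulo operation; a minimal sketch reproducing the M = 64, C = 4 example:

```python
def direct_map_line(block, num_lines):
    """Cache line for a main-memory block under direct mapping: i = j mod C."""
    return block % num_lines

# The M = 64, C = 4 example from the slide
for line in range(4):
    blocks = [b for b in range(64) if direct_map_line(b, 4) == line]
    print(f"Line {line} can hold blocks {blocks[:4]} ...")
```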
EE 4504 Section 3 29
Direct mapping cache treats a main memory address as 3 distinct fields
  Tag identifier
  Line number identifier
  Word identifier (offset)
Word identifier specifies the specific word (or addressable unit) in a cache line that is to be read
Line identifier specifies the physical line in cache that will hold the referenced address
The tag is stored in the cache along with the data words of the line
For every memory reference that the CPU makes, the specific line that would hold the reference (if it has already been copied into the cache) is determined
The tag held in that line is then checked to see if the correct block is in the cache
EE 4504 Section 3 30
Figure 4.18 Direct mapping cache organization
EE 4504 Section 3 31
Example:
  Memory size of 1 MB (20 address bits), addressable to the individual byte
  Cache size of 1K lines, each holding 8 bytes
    Word id = 3 bits
    Line id = 10 bits
    Tag id = 7 bits
  Where is the byte at main memory location $ABCDE stored?
    $ABCDE = 1010101 1110011011 110
    Cache line $39B, word offset $6, tag $55
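The field extraction can be sketched with shifts and masks, using the 7/10/3-bit split from the example above:

```python
def direct_map_fields(addr, word_bits=3, line_bits=10):
    """Split a 20-bit address into (tag, line, word) fields for the
    slide's direct-mapped cache: 1K lines of 8 bytes each."""
    word = addr & ((1 << word_bits) - 1)
    line = (addr >> word_bits) & ((1 << line_bits) - 1)
    tag = addr >> (word_bits + line_bits)
    return tag, line, word

tag, line, word = direct_map_fields(0xABCDE)
print(f"tag=${tag:X}, line=${line:X}, word=${word:X}")  # tag=$55, line=$39B, word=$6
```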
EE 4504 Section 3 32
Advantages of direct mapping
  Easy to implement
  Relatively inexpensive to implement
  Easy to determine where a main memory reference can be found in cache
Disadvantage
  Each main memory block is mapped to a specific cache line
  Through locality of reference, it is possible to repeatedly reference blocks that map to the same line number
  These blocks will be constantly swapped in and out of cache, causing the hit ratio to be low
EE 4504 Section 3 33
Associative mapping
  Let a block be stored in any cache line that is not in use
    Overcomes direct mapping's main weakness
  Must examine each line in the cache to find the right memory block
    Examine the line tag id for each line
    Slow process for large caches!
  Line numbers (ids) have no meaning in the cache
    Parse the main memory address into 2 fields (tag and word offset) rather than 3 as in direct mapping
  Implement cache in 2 parts
    The lines themselves in SRAM
    The tag storage in associative memory
  Perform an associative search over all tags to find the desired line (if it's in the cache)
EE 4504 Section 3 34
Figure 4.20 Associative cache organization
EE 4504 Section 3 35
Our memory example again ...
  Word id = 3 bits
  Tag id = 17 bits
  Where is the byte at main memory location $ABCDE stored?
    $ABCDE = 10101011110011011 110
    Cache line unknown, word offset $6, tag $1579B
Advantages
  Fast
  Flexible
Disadvantage
  Implementation cost
  Example above has an 8 KB cache and requires 1024 x 17 = 17,408 bits of associative memory for the tags!
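The fully associative split is just tag and word offset; a sketch, where shifting the example address right by the 3 word bits yields the 17-bit tag:

```python
def assoc_fields(addr, word_bits=3):
    """Split a 20-bit address into (tag, word) fields for the fully
    associative version of the slide's cache (8-byte lines)."""
    word = addr & ((1 << word_bits) - 1)
    tag = addr >> word_bits
    return tag, word

tag, word = assoc_fields(0xABCDE)
print(f"tag=${tag:X}, word=${word:X}")  # tag=$1579B, word=$6
```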
EE 4504 Section 3 36
Set associative mapping
  Compromise between direct and fully associative mappings that builds on the strengths of both
  Divide the cache into a number of sets (v), each set holding a number of lines (k)
  A main memory block can be stored in any one of the k lines in a set such that
    set number = j modulo v
  If a set can hold X lines, the cache is referred to as an X-way set associative cache
  Most cache systems today that use set associative mapping are 2- or 4-way set associative
EE 4504 Section 3 37
Figure 4.22 Set associative cache organization (caption in the text is misleading -- not 2-way!)
EE 4504 Section 3 38
Our memory example again
  Assume the 1024 lines are 4-way set associative
    1024/4 = 256 sets
  Word id = 3 bits
  Set id = 8 bits
  Tag id = 9 bits
  Where is the byte at main memory location $ABCDE stored?
    $ABCDE = 101010111 10011011 110
    Cache set $9B, word offset $6, tag $157
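And the 4-way set associative split (3-bit word, 8-bit set, 9-bit tag) as a sketch:

```python
def set_assoc_fields(addr, word_bits=3, set_bits=8):
    """Split a 20-bit address into (tag, set, word) fields for the
    4-way set associative version of the slide's cache (256 sets)."""
    word = addr & ((1 << word_bits) - 1)
    set_id = (addr >> word_bits) & ((1 << set_bits) - 1)
    tag = addr >> (word_bits + set_bits)
    return tag, set_id, word

tag, set_id, word = set_assoc_fields(0xABCDE)
print(f"tag=${tag:X}, set=${set_id:X}, word=${word:X}")  # tag=$157, set=$9B, word=$6
```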
EE 4504 Section 3 39
Line replacement algorithms
  When an associative cache or a set associative cache set is full, which line should be replaced by the new line that is to be read from memory?
    Not a problem for direct mapping, since each block has a predetermined line it must use
  Candidates:
    Least recently used (LRU)
    First in, first out (FIFO)
    Least frequently used (LFU)
    Random
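A minimal sketch of one of these policies, LRU, for a single cache set (illustrative only; tags stand in for whole cache lines):

```python
from collections import OrderedDict

class LRUSet:
    """One k-way cache set with least-recently-used replacement."""
    def __init__(self, k):
        self.k = k
        self.lines = OrderedDict()  # tag -> line data, oldest first

    def access(self, tag):
        """Return True on hit, False on miss (filling the line)."""
        if tag in self.lines:            # hit: mark most recently used
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.k:    # miss with full set: evict LRU line
            self.lines.popitem(last=False)
        self.lines[tag] = None
        return False

s = LRUSet(2)
print([s.access(t) for t in (1, 2, 1, 3, 2)])  # [False, False, True, False, False]
```

The third access hits because tag 1 is still resident; accessing tag 3 then evicts tag 2, the least recently used line.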
EE 4504 Section 3 40
Write policy
  When a line is to be replaced, the original copy of the line in main memory must be updated if any addressable unit in the line has been changed
  Write through
    Any time a word in cache is changed, it is also changed in main memory
    Both copies always agree
    Generates lots of memory writes to main memory
  Write back
    During a write, only change the contents of the cache
    Update main memory only when the cache line is to be replaced
    Causes cache coherency problems -- different values for the contents of an address exist in the cache and the main memory
    Complex circuitry is needed to avoid this problem
EE 4504 Section 3 41
Block / line sizes
  How much data should be transferred from main memory to the cache in a single memory reference?
  Complex relationship between block size and hit ratio, as well as the operation of the system bus itself
  As block size increases:
    Locality of reference predicts that the additional information transferred will likely be used, thus increasing the hit ratio (good)
    Number of blocks that fit in the cache goes down, limiting the total number of blocks in the cache (bad)
    As the block size gets big, the probability of referencing all the data in it goes down (hit ratio goes down) (bad)
  Size of 4-8 addressable units seems about right for current systems
EE 4504 Section 3 42
Number of caches
  Single vs. 2-level
    Modern CPU chips have on-board cache (L1)
      80486 -- 8 KB
      Pentium -- 16 KB
      PowerPC -- up to 64 KB
    L1 provides best performance gains
    Secondary, off-chip cache (L2) provides higher-speed access to main memory
      L2 is generally 512 KB or less -- more than this is not cost-effective
EE 4504 Section 3 43
Unified vs. split cache
  Unified cache stores data and instructions in 1 cache
    Only 1 cache to design and operate
    Cache is flexible and can balance the allocation of space between instructions and data to best fit the execution of the program -- higher hit ratio
  Split cache uses 2 caches -- 1 for instructions and 1 for data
    Must build and manage 2 caches
    Static allocation of cache sizes
    Can outperform a unified cache in systems that support parallel execution and pipelining (reduces cache contention)
EE 4504 Section 3 44
Pentium and PowerPC Cache
In the last 10 years, we have seen the introduction of cache into microprocessor chips
  On-board cache (L1) is supplemented with external fast SRAM cache (L2)
  While off-chip, L2 can provide zero-wait-state performance compared to a relatively slow main memory access
Intel family
  386 had no internal cache
  486 had an 8 KB unified cache
  Pentium has a 16 KB split cache: 8 KB data and 8 KB instruction
  Pentium supports 256 or 512 KB of external L2 cache that is 2-way set associative
PowerPC
  601 had a single 32 KB cache
  603/604/620 have split caches of size 16/32/64 KB
EE 4504 Section 3 45
MESI Cache Coherency Protocol
The MESI protocol provides cache coherency in both the Pentium and the PowerPC
  Stands for Modified, Exclusive, Shared, Invalid
  Implemented with an additional 2-bit field for each cache line
  Becomes interesting in the interactions of the L1 and L2 caches -- each tracks the local MESI status as a line moves from main memory to L2 and then to L1
  PowerPC adds an additional state, A, for "allocated"
    A line is marked as A while its data is being swapped out
EE 4504 Section 3 46
Operation of 2-Level Memory
Recall the goal of the memory system:
  Provide an average access time to all memory locations that is approximately the same as that of the fastest memory component
  Provide a system memory with an average cost approximately equal to the cost/bit of the cheapest memory component
Simplistic approach:
  Ts = H1 x T1 + H2 x (T1 + T2 + Tb21)
  H2 = 1 - H1
  T1, T2 are the access times to levels 1 and 2
  Tb21 is the block transfer time from level 2 to level 1
  Can be generalized to 3 or more levels
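A sketch of the Ts formula above; the timings used are illustrative assumptions, not from the slides:

```python
def two_level_access_time(h1, t1, t2, tb21):
    """Ts = H1*T1 + (1-H1)*(T1 + T2 + Tb21): a hit is served from
    level 1; a miss pays the level-2 access plus the block transfer
    back into level 1."""
    return h1 * t1 + (1.0 - h1) * (t1 + t2 + tb21)

# Assumed example timings: t1 = 10 ns, t2 = 50 ns, block transfer 100 ns
for h1 in (0.80, 0.95, 0.99):
    print(f"H1 = {h1:.2f} -> Ts = {two_level_access_time(h1, 10, 50, 100):.1f} ns")
```

As H1 approaches 1, Ts approaches T1, which is exactly the stated goal of the memory system.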
EE 4504 Section 3 47
External Memory
Magnetic disks
  The disk is a metal or plastic platter coated with magnetizable material
  Data is recorded onto and later read from the disk using a conducting coil, the head
  Data is organized into concentric rings, called tracks, on the platter
    Tracks are separated by gaps
  Disk rotates at a constant speed -- constant angular velocity
    Number of data bits per track is a constant
    Data density is higher on the inner tracks
  Logical data transfer unit is the sector
    Sectors are identified on each track during the formatting process
EE 4504 Section 3 48
Figures 5.1 and 5.2 Disk organization
EE 4504 Section 3 49
Disk characteristics
  Single vs. multiple platters per drive (each platter has its own read/write head)
  Fixed vs. movable head
    Fixed head has a head per track
    Movable head uses one head per platter
  Removable vs. nonremovable platters
    Removable platter can be removed from the disk drive for storage or transfer to another machine
  Data accessing times
    Seek time -- position the head over the correct track
    Rotational latency -- wait for the desired sector to come under the head
    Access time -- seek time plus rotational latency
    Block transfer time -- time to read the block (sector) off of the disk and transfer it to main memory
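These timing components combine into a quick estimate of the time to read one sector; the drive parameters below (9 ms seek, 7200 rpm, 512-byte sectors, 10 MB/s) are illustrative assumptions, not from the slides:

```python
def disk_access_ms(seek_ms, rpm, sector_bytes, transfer_mb_s):
    """Average time (ms) to read one sector: seek time, plus average
    rotational latency (half a revolution), plus the sector transfer."""
    rot_latency = 0.5 * (60_000.0 / rpm)                   # ms per half revolution
    transfer = sector_bytes / (transfer_mb_s * 1e6) * 1e3  # ms to move the sector
    return seek_ms + rot_latency + transfer

print(f"{disk_access_ms(9.0, 7200, 512, 10):.2f} ms")
```

Note that seek and rotational latency dominate: the transfer itself is only about 0.05 ms here, which is why disk access is measured in milliseconds while semiconductor memory is measured in nanoseconds.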
EE 4504 Section 3 50
RAID Technology
  Disk drive performance has not kept pace with improvements in other parts of the system
    Limited in many cases by the electromechanical transport means
  Capacity of a high-performance disk drive can be duplicated by operating many (much cheaper) disks in parallel with simultaneous access
    Data is distributed across all disks
  With many parallel disks operating as if they were a single unit, redundancy techniques can be used to guard against data loss in the unit (due to the aggregate failure rate being higher)
  RAID developed at Berkeley -- Redundant Array of Independent Disks
    Six levels: 0 -- 5
EE 4504 Section 3 51
RAID 0
  No redundancy techniques are used
  Data is distributed over all disks in the array
  Data is divided into strips for actual storage
    Similar in operation to interleaved memory data storage
  Can be used to support high data transfer rates by having the block transfer size be a multiple of the strip
  Can support low response time by having the block transfer size equal a strip -- supports multiple strip transfers in parallel
RAID 1
  All disks are mirrored -- duplicated
    Data is stored on a disk and its mirror
    Read from either the disk or its mirror
    Write must be done to both the disk and mirror
  Fault recovery is easy -- use the data on the mirror
  System is expensive!
EE 4504 Section 3 52
RAID 2
  All disks are used for every access -- disks are synchronized together
  Data strips are small (byte)
  Error-correcting code is computed across all disks and stored on additional disks
  Uses fewer disks than RAID 1 but still expensive
    Number of additional disks is proportional to the log of the number of data disks
RAID 3
  Like RAID 2, but only a single redundant disk is used -- the parity drive
  Parity bit is computed for the set of individual bits in the same position on all disks
  If a drive fails, parity information on the redundant disk can be used to calculate the data from the failed disk on the fly
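The on-the-fly reconstruction works because parity is a byte-wise XOR across the data strips; a minimal sketch:

```python
from functools import reduce

def parity(strips):
    """RAID 3-style parity strip: byte-wise XOR across the data strips."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))

data = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]  # strips on three data drives
p = parity(data)

# If one drive fails, XOR-ing the surviving strips with the parity
# strip regenerates the lost data.
lost = data[1]
recovered = parity([data[0], data[2], p])
print(recovered == lost)  # True
```

Since p = d0 ^ d1 ^ d2, XOR-ing any n-1 strips with p cancels them out and leaves the missing strip.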
EE 4504 Section 3 53
RAID 4
  Access is to individual strips rather than to all disks at once (RAID 3)
  Bit-by-bit parity is calculated across corresponding strips on each disk
  Parity strips are stored on the redundant disk
  Write penalty
    For every write to a strip, the parity strip must also be recalculated and written
    Thus 1 logical write equals 2 physical disk accesses
    The parity drive is always written to and can thus be a bottleneck
RAID 5
  Parity information is distributed across the data disks in a round-robin scheme
  No separate parity disk is needed
EE 4504 Section 3 54
Optical disks
  Advent of CDs in the early 1980s revolutionized the audio and computer industries
  Basic operation
    CD is operated using constant linear velocity
    Essentially one long track spiraled onto the disk
    Track passes under the disk's head at a constant rate -- requires the disk to change rotational speed based on what part of the track you are on
    To write to the disk, a laser is used to burn pits into the track -- write once!
    During reads, a low-power laser illuminates the track and its pits
      In the track, pits reflect light differently than non-pits, thus allowing you to store 1s and 0s
EE 4504 Section 3 55
Master disk is made using the laser
  Master is used to press copies in a mass-production mechanical style
  Cheaper than production of information on magnetic disks
Storage capacity roughly 775 MB, or about 550 3.5-inch floppy disks
Transfer rate standard is 176 KB/second
Only economical for production of large quantities of disks
Disks are removable and thus archival
Slower than magnetic disks
EE 4504 Section 3 56
WORMs -- Write Once, Read Many disks
  User can produce CD-ROMs in limited quantities
  Specially prepared disk is written to using a medium-power laser
  Can be read many times, just like a normal CD-ROM
  Permits archival storage of user information and distribution of large amounts of information by a user
Erasable optical disk
  Combines laser and magnetic technology to permit information storage
  Laser heats an area that can then have its magnetic field orientation changed to alter the stored information
  State of the field can be detected using polarized light during reads
EE 4504 Section 3 57
Magnetic Tape
  The first kind of secondary memory
  Still widely used
    Very cheap
    Very slow
  Sequential access
  Data is organized as records, with physical air gaps between records
  One word is stored across the width of the tape and read using multiple read/write heads
EE 4504 Section 3 58
Summary
Goal of the memory hierarchy is to produce a memory system that has an average access time of roughly the L1 memory and an average cost per bit roughly equal to that of the lowest level in the hierarchy
Range of performance spans 10 orders of magnitude!
Components / levels discussed:
  Cache
  Main memory
  Secondary memory