MICROPROCESSOR MEMORY ORGANIZATION

3.1 Introduction
3.2 Main memory
3.3 Microprocessor on-chip memory management unit and cache


A memory unit is an integral part of any microcomputer, and its primary purpose is to hold instructions and data.

A memory system can be divided into three groups:

1. Microprocessor memory
2. Primary or main memory
3. Secondary memory

3

Microprocessor memory is a set of microprocessor registers used to hold temporary results.

Main memory is the storage area in which all programs are executed; it includes ROM and RAM.

Secondary memory consists of electromechanical devices such as hard disks; it also serves as the backing store for virtual memory.

The microcomputer cannot execute programs stored in secondary memory directly. To execute these programs, a program called the operating system must first transfer them to main memory.

4

          Microprocessor memory    Main memory    Secondary memory
Speed     Fastest                  Slower         Slowest
Size      Smallest                 Larger         Largest

5

8-bit microprocessors: The memory is divided into a number of 8-bit units called memory words. An 8-bit unit of data is termed a byte. Therefore, for an 8-bit microprocessor, memory word and memory byte mean the same thing.

6

16-bit microprocessors: The memory is divided into words, each containing 2 bytes (16 bits). A memory word is identified in the memory by an address.

For example, the Pentium microprocessor uses 32-bit addresses for accessing memory words.

This provides a maximum of 2^32 = 4,294,967,296 memory addresses (4 GB), ranging from 00000000 to FFFFFFFF in hexadecimal.

7

Intel Pentium microprocessors: The memory is divided into segments. Each segment holds 2^16 bytes = 64 KB and is addressed by 16 bits.

8

Intel Pentium microprocessors (1MB):

9

High-order bits: segment number
Low-order bits: address within the segment

1 MB memory: 2^20 / 2^16 = 2^4 = 16 segments

For example, the computer uses 24 address pins to address 2^24 bytes = 16 MB of memory directly, with addresses from 000000 to FFFFFF in hexadecimal.

10

Number of segments = size of memory / size of one segment (2^16 bytes)
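The segment arithmetic above can be checked with a short sketch. The function names are hypothetical, and the split assumes the simple model used on these slides, in which the high-order address bits give the segment number and the low-order 16 bits give the address within the segment.

```python
def segment_count(memory_bytes: int, segment_bytes: int = 2**16) -> int:
    """Number of fixed-size segments that fit in a memory of the given size."""
    return memory_bytes // segment_bytes


def split_address(addr: int, offset_bits: int = 16) -> tuple[int, int]:
    """Split an address into (segment number, address within the segment)."""
    return addr >> offset_bits, addr & ((1 << offset_bits) - 1)


print(segment_count(2**20))  # 1 MB memory with 64 KB segments
print(segment_count(2**24))  # 16 MB memory with 64 KB segments
segment, offset = split_address(0xA1234)
print(hex(segment), hex(offset))
```

For a 1 MB memory this yields 16 segments, matching 2^20 / 2^16 = 2^4.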

An important characteristic of a memory is whether it is volatile or nonvolatile.

The contents of a volatile memory are lost if the power is turned off.

On the other hand, a nonvolatile memory retains its contents after power is switched off. ROM is a typical example of nonvolatile memory. RAM is a volatile memory.

11

ROMs can only be read, and they are nonvolatile.

CMOS technology is used to fabricate ROMs. ROMs are divided into mask ROM, erasable PROM (EPROM), and EAROM (electrically alterable ROM), also called EEPROM or E2PROM (electrically erasable PROM).

12

13

Mask ROMs are programmed by a masking operation performed on a chip during the manufacturing process. The contents of mask ROMs are permanent and cannot be changed by the user.

EPROMs can be programmed, and their contents can also be altered by using special equipment, called an EPROM programmer.

When designing a microcomputer for a particular application, permanent programs are stored in ROMs. Control memories used to microprogram the control unit are ROMs.

14

EPROMs can be erased and reprogrammed. The chip must be removed from the microcomputer system for programming. This memory is erased by exposing the chip to ultraviolet light.

Typical erase times vary between 10 and 20 min.

15

EAROMs can be programmed without removing the memory from the ROM’s sockets.

These memories are also called read-mostly memories (RMMs), because they have much slower write times than read times. Therefore, these memories are usually suited for operations in which mostly reading rather than writing will be performed.

Another type of nonvolatile memory, called flash memory, is designed using a combination of EPROM and E2PROM technologies.

Flash memory can be reprogrammed electrically while embedded on the board. Flash memory is used, for example, in cellular phones and digital cameras.

16

There are two types of RAM: static RAM (SRAM), and dynamic RAM (DRAM).

17

SRAM                              DRAM
Stores data in flip-flops.        Stores data in capacitors.
Does not need to be refreshed.    Holds data for only a few milliseconds, so it must be refreshed.
Has lower density.                Has higher density.

DRAMs are inexpensive, occupy less space, and dissipate less power than SRAMs.

Two enhanced versions of DRAM are EDO DRAM (extended data output DRAM) and SDRAM (synchronous DRAM).

The EDO DRAM provides fast access by allowing the DRAM controller to output the next address at the same time the current data is being read.

An SDRAM contains multiple DRAMs (typically, four) internally. SDRAMs utilize the multiplexed addressing of conventional DRAMs.

18

We consider the instruction fetch, memory READ, and memory WRITE timing diagrams

19

20

READ timing

1. The microprocessor performs the instruction fetch cycle as before to READ the op-code.
2. The microprocessor interprets the op-code as a memory READ operation.
3. When the clock pin signal goes HIGH, the microprocessor places the contents of the memory address register on the address pins A0-A15 of the chip.
4. At the same time, the microprocessor raises the READ pin signal to HIGH.
5. The logic external to the microprocessor gets the contents of the location in the main ROM/RAM addressed by the memory address register and places it on the data bus.
6. Finally, the microprocessor gets this data from the data bus via pins D0-D7 and stores it in an internal register.

21

22

WRITE timing

1. When the clock pin signal goes HIGH, the microprocessor places the contents of the memory address register on the address pins A0-A15 of the chip.
2. At the same time, the microprocessor raises the WRITE pin signal to HIGH.
3. The microprocessor places the data to be stored from the contents of an internal register onto data pins D0-D7.
4. The logic external to the microprocessor stores the data from the register into a RAM location addressed by the memory address register.

23

DRAM Organization

DRAMs are typically used when memory requirements are 16K words or larger. DRAM is addressed via row and column addressing.

24

DRAM Organization

A 1-Mb (one megabit) DRAM requiring 20 address bits is addressed using 10 address lines and two control lines, RAS (row address strobe) and CAS (column address strobe).

To provide a 20-bit address into the DRAM, a LOW is applied to RAS and 10 bits of the address are latched. The other 10 bits of the address are applied next and CAS is then held LOW.
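The two-step addressing described above can be sketched as a split of the 20-bit address into two 10-bit halves that share the same 10 pins. Which half is sent as the row is an assumption here (the high-order half goes first, with RAS), and the function name is hypothetical.

```python
ROW_BITS = 10  # latched while RAS is LOW
COL_BITS = 10  # latched while CAS is LOW


def multiplex(addr: int) -> tuple[int, int]:
    """Return the (row, column) halves of a 20-bit DRAM address.

    Assumption: the high-order 10 bits form the row address.
    """
    if not 0 <= addr < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address does not fit in 20 bits")
    row = addr >> COL_BITS
    col = addr & ((1 << COL_BITS) - 1)
    return row, col


row, col = multiplex(0x00400)
print(row, col)
```

Because every address pin carries one row bit and one column bit, 10 pins are enough for 2^20 locations.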

25

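The addressing capability of the DRAM can be increased by a factor of 4 by adding one more address line, because each multiplexed line supplies one extra row bit and one extra column bit.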

External logic is required to generate the RAS and CAS signals and to output the current address bits to the DRAM.

26

2^20 × 4 = 2^20 × 2^2 = 2^22

DRAM controller chips take care of the refreshing and timing requirements needed by DRAMs. DRAMs typically require a 4-ms refresh time. The controller sends a wait signal to the microprocessor if the microprocessor tries to access memory during a refresh cycle.

27

Memory array design means interconnecting several memory chips.

A microprocessor with 16 address pins can directly address a maximum of 2^16 = 65,536 (64K) bytes of memory.

28

The control line M/IO goes LOW if the microprocessor executes an I/O instruction; it is held HIGH if the microprocessor executes a memory instruction.

29

30

Chip Select

M/IO

31

Disable

To connect a microprocessor to ROM/RAM chips, two address-decoding techniques are commonly used: linear decoding and full decoding.

32

Linear decoding

Suppose we have a 4K SRAM chip array comprised of the four 1K SRAM chips of Figure 3.7 (see Figure 3.8).

33

linear decoding

34

35

Linear decoding: advantage
It does not require decoding hardware.

Linear decoding: disadvantages
1. If two or more of lines A10-A13 are low at the same time, more than one SRAM chip is selected, and this causes a bus conflict.

Solution: software must be written such that it never reads from or writes to any address in which more than one of bits A10-A13 is low.

36

Linear decoding: disadvantages (cont.)
2. It wastes a large amount of address space. For example, whenever the address value is B800 or 3800, SRAM chip I is selected (this situation is also called memory foldback).

Solution: to resolve the problems with linear decoding, we use fully decoded memory addressing.

The system of Figure 3.8 can be expanded up to a total capacity of 6K using A14 and A15 as chip selects for two more 1K SRAM chips.
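Both disadvantages of linear decoding can be reproduced with a small sketch. The wiring is an assumption, since Figure 3.8 is not reproduced here: chips I to IV are taken to be selected by a LOW on A10 to A13 respectively, which is consistent with the B800/3800 example. The names are hypothetical.

```python
# Assumed wiring: one active-low address line per chip (not taken from Figure 3.8).
CHIP_SELECT_LINES = {"I": 10, "II": 11, "III": 12, "IV": 13}


def selected_chips(addr: int) -> list[str]:
    """Return every chip whose select line is LOW for this 16-bit address."""
    return [name for name, bit in CHIP_SELECT_LINES.items()
            if not (addr >> bit) & 1]


print(selected_chips(0x3800))  # foldback: same chip as 0xB800
print(selected_chips(0xB800))
print(selected_chips(0x3000))  # two lines LOW at once: bus conflict
```

Addresses 3800 and B800 differ only in A15, which no chip examines, so both reach chip I. Address 3000 pulls A10 and A11 LOW together and selects two chips.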

37

Full decoding

Full decoding uses a decoder. In Figure 3.9, the decoder output selects one of the four 1K SRAM chips, depending on the values of A12, A11, and A10 (Table 3.3).

38

39

Note that the decoder output will be enabled only when E3 = E2 = 0 and E1 = 1.

Using a 3-to-8 decoder, when any one of the high-order bits A15, A14, or A13 is 1, the decoder will be disabled, and thus none of the SRAM chips will be selected.

40

41

Typical 32-bit microprocessors such as the Pentium contain on-chip memory management unit hardware and on-chip cache memory. These topics are discussed next.

42

Because access to a hard disk is slow, system throughput will be reduced to unacceptable levels. An obvious solution is to use a large and fast locally accessed semiconductor memory. Unfortunately, the storage cost per bit for this solution is very high.

A combination of both off-board disk (secondary memory) and on-board semiconductor main memory must be designed into a system.

43

Memory management unit (MMU): a device, located between the microprocessor and memory, to control accesses, perform address mappings, and act as an interface between the logical (programmer’s memory) and physical (microprocessor’s directly addressable memory) address spaces.

44

MMU address translation: The MMU translates logical program addresses to physical memory addresses. Note that in assembly language programming, addresses are referred to by symbolic names.

These addresses in a program are called logical addresses because they indicate the logical positions of instructions and data.

45

MMU address translation: The MMU can perform address translation in one of two ways:

1. By using the substitution technique [Figure 3.10(a)].
2. By adding an offset to each logical address to obtain the corresponding physical address [Figure 3.10(b)].

46

47

MMU address translation: Address translation using the substitution technique is faster than translation using the offset method. However, the offset method has the advantage of mapping a logical address to any physical address as determined by the offset value.
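The two translation techniques can be contrasted in a minimal sketch. Figure 3.10 is not reproduced here, so the details are assumptions: substitution is modeled as replacing the high-order bits of the logical address through a lookup table, the 8-bit low-order field and all names are hypothetical.

```python
LOW_BITS = 8  # hypothetical width of the untranslated low-order field


def translate_substitution(logical: int, table: dict[int, int]) -> int:
    """Replace the high-order bits with the table entry; keep the low bits."""
    upper = logical >> LOW_BITS
    lower = logical & ((1 << LOW_BITS) - 1)
    return (table[upper] << LOW_BITS) | lower


def translate_offset(logical: int, offset: int) -> int:
    """Add an offset to the whole logical address."""
    return logical + offset


print(hex(translate_substitution(0x0234, {0x02: 0x7A})))
print(hex(translate_offset(0x0234, 0x1005)))
```

Substitution needs no addition, only a lookup and a bit splice, which fits the claim that it is faster. The offset method needs an adder but can land on any physical address, not only on boundaries of the substituted field.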

48

MMU address translation: Memory is usually divided into small manageable units: pages and segments. Paging divides the memory into equal-sized pages; segmentation divides the memory into variable-sized segments.

It is relatively easy to implement the address translation table if the logical and main memory spaces are divided into pages.

49

MMU address translation (mapping): There are three ways to map logical addresses to physical addresses: paging, segmentation, and combined paging-segmentation.

50

The paging method

The virtual memory system is managed by both hardware and software. The hardware included in the memory management unit handles address translation. The memory management software in the operating system performs all functions, including page replacement policies, to provide efficient memory utilization.

51

The segmentation method

An MMU utilizes the segment selector to obtain a descriptor from a table in memory containing several descriptors. A descriptor contains the physical base address for a segment, the segment’s privilege level, and some control bits.

52

The segmentation method

When the MMU obtains a logical address from the microprocessor, it first determines whether the segment is already in physical memory. If it is, the MMU adds an offset component to the segment base component of the address obtained from the segment descriptor table to provide the physical address. The MMU then generates the physical address on the address bus for selecting the memory.
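The segmentation steps above can be sketched as follows. The descriptor layout is simplified and hypothetical: a single `present` flag stands in for the control bits, and the exception name is invented for illustration. What happens when the segment is absent is not covered by the text, so the sketch only signals the condition.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    base: int        # physical base address of the segment
    privilege: int   # privilege level (not checked in this sketch)
    present: bool    # hypothetical control bit: segment is in physical memory


class SegmentNotPresent(Exception):
    """Raised when the selected segment is not in physical memory."""


def translate(selector: int, offset: int, table: dict[int, Descriptor]) -> int:
    """Look up the descriptor, check presence, then add base and offset."""
    descriptor = table[selector]
    if not descriptor.present:
        raise SegmentNotPresent(selector)
    return descriptor.base + offset


table = {1: Descriptor(base=0x20000, privilege=0, present=True),
         2: Descriptor(base=0x80000, privilege=3, present=False)}
print(hex(translate(1, 0x0123, table)))
```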

53

The paged-segmentation method

Each segment contains a number of pages. The logical address is divided into three components: segment, page, and word.

A page component of n bits can provide up to 2^n pages.

A segment can be assigned one or more pages, up to a maximum of 2^n pages; therefore, a segment size depends on the number of pages assigned to it.
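The three-component logical address can be illustrated by splitting a number into bit fields. The field widths below (4-bit segment, 8-bit page, 8-bit word) are hypothetical values chosen only for the example.

```python
SEG_BITS, PAGE_BITS, WORD_BITS = 4, 8, 8  # hypothetical field widths


def split_logical(addr: int) -> tuple[int, int, int]:
    """Split a logical address into (segment, page, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)
    page = (addr >> WORD_BITS) & ((1 << PAGE_BITS) - 1)
    segment = addr >> (WORD_BITS + PAGE_BITS)
    return segment, page, word


MAX_PAGES_PER_SEGMENT = 2 ** PAGE_BITS  # a page field of n bits gives 2^n pages

segment, page, word = split_logical(0x3A7F1)
print(hex(segment), hex(page), hex(word), MAX_PAGES_PER_SEGMENT)
```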

54

Virtual memory

The key idea behind virtual memory is to allow a user program to address more locations than those available in physical memory.

An address generated by a user program is called a virtual address.

55

The performance of a microprocessor system can be improved significantly by introducing a small, expensive, but fast memory between the microprocessor and main memory.

56

57

A cache memory is very small in size, and its access time is less than that of the main memory by a factor of 5. Typically, the access times of the cache and main memories are 100 and 500 ns, respectively.

A cache hit means that the reference is found in the cache.

A cache miss means that the reference is not found in the cache.

58

The relationship between the cache and main memory blocks is established using mapping techniques. Three widely used mapping techniques are direct mapping, fully associative mapping, and set-associative mapping.

59

Direct mapping

Direct mapping uses a RAM for the cache. The microprocessor’s 12-bit address is divided into two fields, an index field and a tag field. Because the cache address is 8 bits wide (2^8 = 256), the low-order 8 bits of the microprocessor’s address form the index field, and the remaining 4 bits constitute the tag field.

In general, if the main memory address field is m bits wide and the cache memory address is n bits wide, the index field will then require n bits and the tag field will be (m - n) bits wide.

60

61

Direct mapping

The microprocessor first accesses the cache. If there is a hit, the microprocessor accepts the 16-bit word from the cache. In case of a miss, the microprocessor reads the desired 16-bit word from the main memory, and this 16-bit word is then written to the cache.

A cache memory may contain instructions only (instruction cache), data only (data cache), or both instructions and data (unified cache).

62

63

Numerical example for Direct mapping

Example: The content of index address 00 of the cache is tag = 0 and data = 013F. Suppose that a microprocessor wants to access the memory address 100. The index address 00 is used to access the cache. Memory address tag 1 is compared with cache tag 0. This does not produce a match. Therefore, the main memory is accessed and the data 2714 is transferred into the microprocessor. The cache word at index address 00 is then replaced by a tag of 1 and data of 2714.
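The numerical example can be replayed with a minimal direct-mapped cache model. It assumes the addresses and data in the example are hexadecimal (12-bit address: 4-bit tag, 8-bit index); the class and attribute names are hypothetical, and the main memory is a plain dictionary holding only the two words involved.

```python
INDEX_BITS = 8  # low-order 8 bits of the 12-bit address form the index


class DirectMappedCache:
    def __init__(self, memory: dict[int, int]):
        self.memory = memory
        self.lines: dict[int, tuple[int, int]] = {}  # index -> (tag, data)

    def read(self, addr: int) -> tuple[int, bool]:
        """Return (data, hit). On a miss, fetch from memory and replace the line."""
        index = addr & ((1 << INDEX_BITS) - 1)
        tag = addr >> INDEX_BITS
        line = self.lines.get(index)
        if line is not None and line[0] == tag:
            return line[1], True
        data = self.memory[addr]
        self.lines[index] = (tag, data)
        return data, False


memory = {0x000: 0x013F, 0x100: 0x2714}
cache = DirectMappedCache(memory)
cache.lines[0x00] = (0x0, 0x013F)  # starting state from the example

data, hit = cache.read(0x100)
print(hex(data), hit, cache.lines[0x00])
```

Reading 0x000 and 0x100 alternately in this model misses every time, because both addresses share index 00 and keep evicting each other.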

64

One of the main drawbacks of direct mapping is that numerous misses may occur if two or more words with addresses that have the same index but different tags are accessed several times.

65

Fully associative mapping

This is the fastest and most expensive cache memory. Each element in associative memory contains a main memory address and its content (data).

66

Fully associative mapping

When the microprocessor generates a main memory address, it is compared associatively (simultaneously) with all addresses in the associative memory. If there is a match, the corresponding data word is read from the associative cache memory and sent to the microprocessor. If a miss occurs, the main memory is accessed and the address and its corresponding data are written to the associative cache memory.

67

Fully associative mapping

68

Fully associative mapping

Each word in the cache is a 12-bit address along with its 16-bit contents (data). When the microprocessor wants to access memory, the 12-bit address is placed in an address register and the associative cache memory is searched for a matching address. Suppose that the content of the microprocessor address register is 445. Because there is a match, the microprocessor reads the corresponding data 0FA1 into an internal data register.
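A software stand-in for the associative search is sketched below. A dictionary lookup replaces the simultaneous hardware comparison against every stored address; capacity limits and replacement are left out, and the names and the extra memory word are hypothetical.

```python
class AssociativeCache:
    def __init__(self, memory: dict[int, int]):
        self.memory = memory
        self.entries: dict[int, int] = {}  # full 12-bit address -> 16-bit data

    def read(self, addr: int) -> tuple[int, bool]:
        """Return (data, hit). On a miss, store the address with its data."""
        if addr in self.entries:  # models the parallel address comparison
            return self.entries[addr], True
        data = self.memory[addr]
        self.entries[addr] = data
        return data, False


memory = {0x445: 0x0FA1, 0x446: 0x1234}  # 0x446 is an invented extra word
cache = AssociativeCache(memory)
cache.entries[0x445] = 0x0FA1            # entry already cached, as in the example

data, hit = cache.read(0x445)
print(hex(data), hit)
```

Because the whole address is stored with each word, any main memory word can occupy any cache entry, which removes the index conflicts of direct mapping.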

69

Set-associative mapping

Set-associative mapping is a combination of direct and associative mapping. Each cache word stores two or more main memory words using the same index address. Each main memory word consists of a tag and its data word. An index with two or more tags and data words forms a set.

70

Set-associative mapping

When the microprocessor generates a memory request, the index of the main memory address is used as the cache address. The tag field of the main memory address is then compared associatively (simultaneously) with all tags stored under the index. If a match occurs, the desired data word is read. If a match does not occur, the data word, along with its tag, is read from main memory and written into the cache.

71

Set-associative mapping.

72

Set-associative mapping

The size of a set is defined by the number of tag and data items in a cache word. A set size of 2 is used in this example. Each index address contains two data words and their associated tags. Each tag includes 4 bits, and each data word contains 16 bits. Therefore, the word length = 2 x (4 + 16) = 40 bits. An index address of 8 bits can represent 256 words. Hence, the size of the cache memory is 256 x 40. It can store 512 main memory words.
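The size arithmetic and the lookup can both be checked with a short sketch. The replacement rule is an assumption: the text does not say which entry is displaced when a set is full, so the oldest entry is dropped here. All names and the memory contents are hypothetical.

```python
INDEX_BITS, SET_SIZE, TAG_BITS, DATA_BITS = 8, 2, 4, 16

WORD_LENGTH = SET_SIZE * (TAG_BITS + DATA_BITS)  # bits per cache word
CACHE_WORDS = 2 ** INDEX_BITS                    # one cache word per index
CAPACITY = SET_SIZE * CACHE_WORDS                # main memory words held


class SetAssociativeCache:
    def __init__(self, memory: dict[int, int]):
        self.memory = memory
        self.sets: dict[int, list[tuple[int, int]]] = {}  # index -> [(tag, data)]

    def read(self, addr: int) -> tuple[int, bool]:
        """Return (data, hit); compare the tag with every tag under the index."""
        index = addr & ((1 << INDEX_BITS) - 1)
        tag = addr >> INDEX_BITS
        entries = self.sets.setdefault(index, [])
        for stored_tag, data in entries:
            if stored_tag == tag:
                return data, True
        data = self.memory[addr]
        if len(entries) == SET_SIZE:
            entries.pop(0)  # assumed policy: drop the oldest entry
        entries.append((tag, data))
        return data, False


print(WORD_LENGTH, CACHE_WORDS, CAPACITY)
cache = SetAssociativeCache({0x000: 1, 0x100: 2, 0x200: 3})
print([cache.read(a)[1] for a in (0x000, 0x100, 0x000, 0x100)])
```

With a set size of 2, addresses 000 and 100 share index 00 without evicting each other, so the alternating access pattern that defeats direct mapping now hits after the first two reads.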

73

Writing into the cache: There are two ways of writing into cache: the write-back and write-through methods.

74

The write-back method

Whenever the microprocessor writes something into a cache word, a “dirty” bit is set for that cache word. When a dirty word is to be replaced with a new word, the dirty word is first copied into the main memory before it is overwritten by the incoming new word.

The advantage of this method is that it avoids unnecessary writing into main memory.

75

The write-through method

Whenever the microprocessor alters a cache address, the same alteration is made in the main memory copy of the altered cache address.

This policy is easily implemented and ensures that the contents of the main memory are always valid. This feature is desirable in a multiprocessor system, in which the main memory is shared by several processors.
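The difference between the two write policies shows up when main memory writes are counted. The sketch below is a simplified direct-mapped model with an 8-bit index; it assumes every write allocates a cache line, and the class name and the `memory_writes` counter are hypothetical.

```python
class WritePolicyCache:
    def __init__(self, memory: dict[int, int], write_back: bool):
        self.memory = memory
        self.write_back = write_back
        self.lines: dict[int, list] = {}  # index -> [tag, data, dirty]
        self.memory_writes = 0

    def _store(self, addr: int, data: int) -> None:
        self.memory[addr] = data
        self.memory_writes += 1

    def write(self, addr: int, data: int) -> None:
        index, tag = addr & 0xFF, addr >> 8
        line = self.lines.get(index)
        if line is not None and line[0] != tag and line[2]:
            # Write-back: copy the displaced dirty word to memory first.
            self._store((line[0] << 8) | index, line[1])
        self.lines[index] = [tag, data, self.write_back]  # dirty only if write-back
        if not self.write_back:
            self._store(addr, data)  # write-through: update memory on every write


back = WritePolicyCache({}, write_back=True)
through = WritePolicyCache({}, write_back=False)
for value in (1, 2, 3):
    back.write(0x010, value)
    through.write(0x010, value)
print(back.memory_writes, through.memory_writes)
```

Three writes to the same word cost three memory writes under write-through and none under write-back until the dirty word is displaced, at which point main memory finally receives the latest value.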

76

A valid bit is used to ensure proper utilization of the cache.

It is an extra bit contained in the tag directory. When the power is turned on, the valid bit corresponding to each cache block entry of the tag directory is reset to zero. This is done to indicate that the cache block holds invalid data.

When a block of data is transferred from the main memory to a cache block, the valid bit corresponding to this cache block is set to 1.
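The role of the valid bit can be seen in a small tag-directory sketch. The directory layout, the function names, and the choice of zero as the power-on tag value are hypothetical; real tag contents after power-on are arbitrary.

```python
def power_on(blocks: int) -> list[dict[str, int]]:
    """Tag directory after power-on: every valid bit is reset to zero."""
    return [{"valid": 0, "tag": 0} for _ in range(blocks)]


def fill(directory: list[dict[str, int]], index: int, tag: int) -> None:
    """Record a block transferred from main memory and set its valid bit."""
    directory[index] = {"valid": 1, "tag": tag}


def is_hit(directory: list[dict[str, int]], index: int, tag: int) -> bool:
    """A hit requires a matching tag and a valid bit equal to 1."""
    entry = directory[index]
    return entry["valid"] == 1 and entry["tag"] == tag


directory = power_on(256)
before = is_hit(directory, 0, 0)  # tags match by accident, entry is invalid
fill(directory, 0, 0)
after = is_hit(directory, 0, 0)
print(before, after)
```

Without the valid bit, the first lookup would report a false hit on whatever pattern the tag directory happened to contain at power-on.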

77

Finally, microprocessors such as the Intel Pentium II support two levels of cache, L1 (level 1) and L2 (level 2) cache memories.

The L1 cache (smaller in size) is contained inside the processor chip, while the L2 cache (larger in size) is interfaced external to the microprocessor.

78

The L1 cache normally provides separate instruction and data caches. The processor can access the L1 cache directly, and the L2 cache normally supplies instructions and data to the L1 cache.

The L2 cache is usually accessed by the microprocessor only if L1 misses occur. This two-level cache memory enhances microprocessor performance.

79
