8/3/2019 Memory and i
MEMORY AND I/O INTERFACING
MEMORY
Memory is an important part of embedded systems. The cost and performance of an embedded
system depend heavily on the kinds of memory devices it uses. In this section we discuss
memory classification, memory technologies and memory management.
(1) Memory Classification
Memory devices can be classified based on the following characteristics:
(a) Accessibility
(b) Persistence of Storage
(c) Storage Density & Cost
(d) Storage Media
(e) Power Consumption
Accessibility
Memory devices can provide random access, serial access or block access. In a random
access memory, each word can be accessed directly by specifying its address; RAM, SDRAM
and NOR flash are examples of random access memories. In a serial access memory, all the
words before the desired word must be accessed first; I2C PROM and SPI PROM are examples
of serial access memories. In a block access memory, the storage is subdivided into small
blocks (generally of the order of a kilobyte): each block can be accessed randomly, and
each word within a block is accessed serially. Hard disks and NAND flash employ this
mechanism. The word access time of a random access memory is independent of the word's
location, which is desirable for high-speed applications that access memory frequently.
Persistence of Storage
Memory devices provide either volatile or non-volatile storage. A non-volatile memory
preserves its contents even after power is shut down, whereas a volatile memory loses its
contents. Non-volatile storage is needed for application code and reusable data, while
volatile memory can be used for all temporary storage. RAM and SDRAM are examples of
volatile memories; hard disks, flash (NOR & NAND) memories, SD-MMC and ROM are examples
of non-volatile storage.
Storage Media
Memory devices may employ electronic storage (transistors or electron states), magnetic
storage or optical storage. RAM and SDRAM are examples of electronic storage. Hard
disks are examples of magnetic storage. CDs (Compact Discs) are examples of optical storage.
Older computers also employed magnetic storage (magnetic media are still common in some
consumer electronics products).
Storage Density & Cost
Storage density (the number of bits that can be stored per unit area) is generally a good
measure of cost. Dense memories (like SDRAM) are much cheaper than their less dense
counterparts (like SRAM).
Power Consumption
Low power consumption is highly desirable in battery-powered embedded systems. Such
systems generally employ memory devices that can operate at low (and ultra-low) voltage
levels. Mobile SDRAMs are examples of low-power memories.
(2) Memory Technologies
RAM
RAM stands for Random Access Memory. RAMs are the simplest and most common form of data
storage, and they are volatile. The figure below shows the typical data, address and
control signals on a RAM. The number of words a RAM can store grows as a power of two
with the number of address lines available. This severely restricts the storage capacity
of RAMs (a byte-addressable 32 GB RAM would require 35 address lines) because designing
circuit boards with more signal lines directly adds to their complexity and cost.
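The power-of-two relationship between address lines and capacity can be sketched as
follows (a hedged illustration; the function name is ours, not from any library):

```python
import math

def address_lines(capacity_bytes: int) -> int:
    """Number of address lines needed to byte-address a memory of the given size."""
    return math.ceil(math.log2(capacity_bytes))

# A byte-addressable 32 GB memory needs 35 lines, since 2**35 = 32 Gi.
print(address_lines(32 * 2**30))   # -> 35
# A modest 64 KB memory already needs 16 lines.
print(address_lines(64 * 2**10))   # -> 16
```

Each extra address line doubles the addressable capacity, which is why serial and
block-access devices (which need far fewer pins) dominate at large capacities.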
DPRAM (Dual Port RAM)
DPRAMs are static RAMs with two I/O ports. The two ports access the same memory locations
- hence DPRAMs are generally used to implement shared memories in dual-processor systems.
The operations performed on a single port are identical to those on any RAM. There are
some common problems associated with the use of DPRAMs:
(a) Possible data corruption when both ports try to access the same memory location -
most DPRAM devices provide interlocked memory accesses to avoid this problem.
(b) Data coherency problems when a cache scheme is used by a processor accessing the
DPRAM - this happens because any data modifications made in the DPRAM by one processor
are unknown to the cache controller of the other processor. To avoid such issues, shared
memories are not mapped to cacheable address space. If the processor's cache
configuration is not flexible enough to define the shared memory space as non-cacheable,
the cache needs to be flushed before performing any reads from this memory space.
Dynamic RAM
Dynamic RAMs use a different storage technique. A static RAM cell uses four (or more)
transistors, whereas a dynamic RAM cell uses only one transistor and relies on capacitive
storage. Since a capacitor loses its charge, these memories must be refreshed
periodically, which makes DRAMs more complex (extra control logic is needed) and more
power consuming. However, DRAMs have a very high storage density compared to static RAMs
and are much cheaper. DRAMs are generally accessed in terms of rows, columns and pages,
which significantly reduces the number of address lines (another advantage over SRAM).
Generally an SDRAM controller (which manages the SDRAM commands and address translation)
is needed to access an SDRAM; most modern processors come with an on-chip SDRAM
controller.
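The row/column access just described is what saves address pins: a linear address is
split in two halves that are presented over the same pins in sequence. A small sketch
(the function and bit widths are illustrative; real controllers also issue RAS/CAS
commands and handle refresh):

```python
def split_row_col(addr: int, col_bits: int):
    """Split a linear address into (row, column) as a DRAM controller would."""
    return addr >> col_bits, addr & ((1 << col_bits) - 1)

# A 24-bit address space served over only 12 multiplexed address pins:
row, col = split_row_col(0xABC123, col_bits=12)
print(hex(row), hex(col))   # -> 0xabc 0x123
```

The row is strobed first (RAS), then the column (CAS), so 24 address bits reach the
device over 12 pins - at the cost of a two-phase access.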
OTP-EPROM, UV-EPROM and EEPROM
EPROMs (Erasable Programmable Read Only Memories) are non-volatile memories. Contents of
a ROM can be randomly accessed - but conventionally the word RAM refers only to volatile
random access memories. The voltage required for writing to an EPROM is much higher than
its operating voltage. Hence you cannot write to an EPROM in-circuit (which is what makes
it "read only" in normal operation). You need special programming stations (which have a
write mechanism) to write to EPROMs.
OTP-EPROMs are one-time programmable: their contents cannot be changed once written.
UV-EPROMs are UV-erasable EPROMs: exposing the memory cells to UV light erases the
existing contents, after which the device can be re-programmed. EEPROMs are electrically
erasable EPROMs: these can be erased electrically (generally on the same programming
station used to write them). The number of write cycles (times you can erase and
re-write) for UV-EPROMs and EEPROMs is fairly limited. Erasable PROMs use either FLOTOX
(Floating gate Tunnel Oxide) or FAMOS (Floating gate Avalanche MOS) technology.
Flash (NOR)
Flash (or NOR flash, to be more accurate) is quite similar to EEPROM in usage and can be
considered a class of EEPROM (since it is electrically erasable). However, there are a
few differences. Firstly, flash devices are in-circuit programmable. Secondly, they are
much cheaper than conventional EEPROMs. These days NOR flash is widely used for storing
boot code.
NAND FLASH
These memories are denser and cheaper than NOR flash. However, they are block accessible
and cannot be used for code execution. They are mostly used for data storage (being
cheaper than NOR flash), although some systems use them for storing boot code (with
external hardware or with built-in NAND boot logic in the processor).
SD-MMC
SD-MMC cards provide a cheap means of mass storage. These memory cards can provide
storage capacities of the order of gigabytes. They are very compact and can be used in
portable systems: most modern hand-held devices requiring mass storage (e.g. still and
video cameras) use memory cards.
Hard Disc
Hard discs are magnetic memory devices. They are bulky and require further bulky hardware
(a disk drive) for reading. These memories are generally used for mass storage, hence
they do not appear in smaller, portable systems. However, they are used in embedded
systems which require bulk storage without tight size constraints.
(3) Memory Management
Cache Memory
The size and the speed (access time) of a memory are inversely related: increasing the
size means a reduction in speed. In fact, most memories are made up of smaller memory
blocks (generally 4 KB) in order to improve speed. The cost of a memory also depends
heavily on its speed. For good performance it is desirable that code and data reside in
high-speed memory; however, using high-speed memory for all the code and data in a
reasonably large system may be practically impossible. Even in a smaller system, using
high-speed memory as the only storage device can raise the system cost exponentially.
Most systems employ a hierarchical memory organization: a small, fast (and expensive)
memory device stores frequently used code and data, while less frequently used data is
stored in a large, slow (cheap) memory device. A complex system can have multiple levels
of memory hierarchy, graded by speed and cost.
A cache controller is hardware (generally built into the processor) which dynamically
moves the code and data currently in use from a higher-level (slower) memory to the
lowest-level (cache) memory. The incoming data or code replaces old code or data (not
currently in use) in the cache memory. This movement is hidden from the user.
Cache memories are based on the principle of locality in space and time. There are
different types of cache organization and replacement mechanisms.
Software Overlays
Why Overlays
Low-cost microprocessors generally do not have a built-in cache controller, but on these
devices it may still be desirable to keep the code (or data) currently in use in internal
memory and replace it with a new code section when it is no longer needed. This can be
done using software overlays. Either code or data overlays can be used; in this section
we discuss only code overlays (a similar analogy holds for data overlays).
Overlay Basics
(a) Each code section mapped to an overlay has a run space and a live space. The live
space is the location in external (higher-level) memory where the code section resides
when it is not running. The run space is the location in internal (lower-level) memory
where the code resides during execution.
(b) The overlay manager is a piece of software which dynamically moves code sections from
live space to run space (whenever a function from a given overlay section is called).
(c) The linker and loader tools generate overlay symbols for the code sections mapped to
overlays. The overlay symbols are supplemented by information about the run space and
live space of each overlay. The overlay manager uses this information to move the
overlays dynamically.
(d) You can have multiple overlays in a system. The sections of a given overlay have
different live spaces but the same run space.
Implementing overlays
(a) First, make sure that your code generation tools (linker and loader) provide the
minimum support (in terms of overlay symbols) needed for overlays.
(b) Second, identify mutually exclusive code sections in your application. Mutually
exclusive means that only one of these code sections can be in use at any given time.
Also make sure that the switching time between these sections (i.e. the average time
after which the processor will require code from a different section) is quite long;
otherwise software overlays will degrade performance rather than improve it.
(c) Make sure you have enough run space to accommodate the largest overlay section.
(d) While implementing code overlays, you can still choose to keep some code sections
(which are unlikely to improve performance if used as overlays) out of the overlays
(these sections have the same live space and run space).
Data overlays are analogous to code overlays, but they are rarely used.
Virtual Memory
The virtual memory mechanism allows users to store their data on a hard disk while still
using it as if it were available in RAM. The application accesses data in a virtual
address space (which is mapped to RAM), whereas the data physically resides on the hard
disk (and is moved to RAM for access).
Paging Mechanism
In virtual mode, memory is divided into pages, usually 4096 bytes long (the page size).
These pages may reside in any available RAM location that can be addressed in virtual
mode. The high-order bits of the memory address register are an index into page-mapping
tables at specific starting locations in memory; the table entries contain the starting
real addresses of the corresponding pages. The low-order bits of the address register are
an offset of 0 up to 4,095 (0 to page size - 1) into the page ultimately referenced by
resolving the table references.
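The address arithmetic described above can be sketched as follows. The page table
contents here are hypothetical; in a real system the table lives in memory and is walked
by hardware or the operating system.

```python
PAGE_SIZE = 4096                           # page size from the text
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 low-order offset bits

# Hypothetical single-level page table: virtual page number -> physical frame
page_table = {0: 7, 1: 3, 2: 9}

def translate(vaddr: int) -> int:
    vpn = vaddr >> OFFSET_BITS          # high-order bits index the page table
    offset = vaddr & (PAGE_SIZE - 1)    # low-order bits: 0 .. 4095
    return (page_table[vpn] << OFFSET_BITS) | offset

print(hex(translate(0x1ABC)))   # vpn 1 maps to frame 3 -> 0x3abc
```

The offset passes through unchanged; only the page number is translated, which is what
lets a contiguous virtual range map onto scattered physical frames.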
The distinct advantages of the virtual memory mechanism are:
(a) A user can access (in virtual space) more RAM than physically exists in the system.
(b) In a multi-tasking application, each task can have its own independent virtual
address space (called a discrete address space).
(c) Applications can treat data as if it were stored in contiguous memory (in the virtual
address space), whereas it may occupy discontiguous locations in actual memory.
Cache vs Virtual Memory
Cache memory and virtual memory are quite similar in concept and provide similar
benefits. However, the schemes differ significantly in implementation:
* Cache control is implemented fully in hardware; virtual memory management is done by
software (the operating system) with some minimal hardware support.
* With a cache in use, the user still addresses the actual physical memory (and the cache
is hidden from the user). With virtual memory it is the reverse: the user addresses
virtual memory, and the actual physical memory is hidden from the user.
Cache memory
The cache is a small amount of high-speed memory, usually with a memory cycle time
comparable to the time required by the CPU to fetch one instruction. The cache is usually
filled from main memory when instructions or data are fetched into the CPU. Often the
main memory supplies a wider data word to the cache than the CPU requires, to fill the
cache more rapidly. The amount of information replaced at one time in the cache is called
the line size of the cache. This is normally the width of the data bus between the cache
memory and the main memory. A wide line size means that several instruction or data words
are loaded into the cache at one time, providing a kind of prefetching for instructions
or data. Since the cache is small, its effectiveness relies on the following properties
of most programs:
* Spatial locality - most programs are highly sequential; the next instruction usually
comes from the next memory location.
* Data is usually structured, and data in these structures is normally stored in
contiguous memory locations.
* Short loops are a common program structure, especially in the innermost sets of nested
loops. This means that the same small set of instructions is used over and over.
* Generally, several operations are performed on the same data values, or variables.
When a cache is used, there must be some way for the memory controller to determine
whether the value currently being addressed is available in the cache. There are several
ways this can be accomplished. One possibility is to store both the address and the value
from main memory in the cache, with the address stored in a type of memory called
associative memory or, more descriptively, content addressable memory.
An associative memory, or content addressable memory, has the property that when a value
is presented to it, the address of the value is returned if the value is stored in the
memory; otherwise, an indication that the value is not present is returned. All of the
comparisons are done simultaneously, so the search is performed very quickly. This type
of memory is very expensive, because each memory location must have both a comparator and
a storage element. A cache memory can be implemented with a block of associative memory
together with a block of "ordinary" memory: the associative memory holds the addresses of
the data stored in the cache, and the ordinary memory contains the data at those
addresses. Such a cache memory might be configured as shown in the figure.
Figure: A cache implemented with associative memory
If the address is not found in the associative memory, then the value is obtained from
main memory.
Associative memory is very expensive, because a comparator is required for every word in
the memory to perform all the comparisons in parallel. A cheaper way to implement a cache
memory, without using expensive associative memory, is direct mapping. Here, part of the
memory address (usually the low-order bits) is used to address a word in the cache; this
part of the address is called the index. The remaining high-order bits, called the tag,
are stored in the cache memory along with the data.
For example, if a processor has an 18-bit memory address, a cache of 1 K words of 2 bytes
(16 bits) each, and the processor can address single bytes or 2-byte words, the memory
address field and cache might be organized as in the figure.
Figure: A direct mapped cache configuration
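The 18-bit address split in this example - 1 byte-select bit, a 10-bit index (1 K words)
and a 7-bit tag - can be sketched as a small simulation (a hedged illustration; the data
strings merely stand in for cached words):

```python
def split_address(addr: int):
    """Split an 18-bit byte address into (tag, index, byte-select)."""
    byte = addr & 0x1                # bit 0 selects the byte within a word
    index = (addr >> 1) & 0x3FF      # 10 index bits address 1 K cache words
    tag = addr >> 11                 # remaining 7 high-order bits are the tag
    return tag, index, byte

cache = {}   # index -> (tag, data): one line per index, i.e. direct mapped

def lookup(addr: int):
    tag, index, _ = split_address(addr)
    hit = index in cache and cache[index][0] == tag
    if not hit:
        cache[index] = (tag, f"mem[{addr & ~1:#x}]")   # fill from main memory
    return hit, cache[index][1]

lookup(0x2ABC)   # miss: line is filled
lookup(0x2ABD)   # hit: same word, other byte
lookup(0x3ABC)   # miss: same index, different tag, so the line is replaced
```

The last call shows the direct-mapped weakness discussed below: two addresses with the
same index evict each other even when the rest of the cache is empty.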
This was, in fact, the way the cache was organized in the PDP-11/60. In the 11/60,
however, four other bits are used to ensure that the data in the cache is valid. Three of
these are parity bits: one for each byte and one for the tag. The parity bits are used to
check that a single-bit error has not occurred in the data while in the cache. A fourth
bit, called the valid bit, indicates whether or not a given location in the cache is
valid. In the PDP-11/60, as in many other processors, the cache is not updated if memory
is altered by a device other than the CPU (for example, when a disk stores new data in
memory). When such a memory operation occurs at a location whose value is stored in the
cache, the valid bit is reset to show that the data is "stale" and does not correspond to
the data in main memory. The valid bit is also reset when power is first applied to the
processor or when the processor recovers from a power failure, because the data found in
the cache at that time will be invalid.
In the PDP-11/60, the data path from memory to cache was the same size (16 bits) as from
cache to the CPU. (In the PDP-11/70, a faster machine, the data path from the CPU to
cache was 16 bits, while from memory to cache it was 32 bits, which means that the cache
had effectively prefetched the next instruction approximately half of the time.) The
amount of information (instructions or data) stored with each tag in the cache is called
the line size of the cache. (It is usually the same size as the data path from main
memory to the cache.) A large line size allows the prefetching of a number of
instructions or data words. All items in a line of the cache are replaced simultaneously,
however, resulting in a larger block of data being replaced on each cache miss.
The MIPS R2000/R3000 had a built-in cache controller which could control a cache of up to
64 K bytes. For a similar 2 K word (8 K byte) cache, the MIPS processor would typically
have a cache configuration as shown in the figure. (Generally, the MIPS cache would be
larger - 64 K bytes would be typical, with line sizes of 1, 2 or 4 words.)
Figure: One possible MIPS cache organization
A characteristic of the direct mapped cache is that a particular memory address can be
mapped into only one cache location. Many memory addresses map to the same cache location
(in fact, all addresses with the same index field map to the same cache location).
Whenever a "cache miss" occurs, the cache line is replaced by a new line of information
from main memory at an address with the same index but a different tag.
Note that if the program "jumps around" in memory, this cache organization will likely
not be effective because the index range is limited. Also, if both instructions and data
are stored in the cache, they may well map into the same area of the cache and cause each
other to be replaced very often. This could happen, for example, if the code for a matrix
operation and the matrix data itself happened to have the same index values.
A more interesting configuration for a cache is the set associative cache, which uses a
set associative mapping. In this cache organization, a given memory location can be
mapped to more than one cache location: each index corresponds to two or more data words,
each with a corresponding tag. A set associative cache with n tag and data fields is
called an "n-way set associative cache". Usually n = 2^k, for k = 1, 2 or 3, is chosen
(k = 0 corresponds to direct mapping). Such n-way set associative caches allow
interesting tradeoffs: for a given total amount of memory, cache performance can be
improved by increasing the number of "ways" or by increasing the line size. An example of
a 2-way set associative cache is shown in the figure, which shows a cache containing a
total of 2 K lines, or 1 K sets, each set being 2-way associative. (The sets correspond
to the rows in the figure.)
Figure: A set-associative cache organization
In a 2-way set associative cache, if one data word at a particular index is empty on a
read miss, it is filled. If both data words are filled, then one must be overwritten by
the new data. Similarly, in an n-way set associative cache, if all n data and tag fields
in a set are filled, then one value in the set must be overwritten, or replaced, by the
new tag and data values. Note that an entire line must be replaced each time. The most
common replacement algorithms are:
Random - the location for the value to be replaced is chosen at random from the n cache
locations at that index position. In a 2-way set associative cache, this can be
accomplished with a single modulo-2 random variable obtained, say, from an internal
clock.
First in, first out (FIFO) - here the first value stored in the cache, at each index
position, is the value to be replaced. For a 2-way set associative cache, this
replacement strategy can be implemented by setting a pointer to the previously loaded
word each time a new word is stored in the cache; this pointer need only be a single bit.
(For set sizes > 2, this algorithm can be implemented with a counter value stored for
each "line", or index, in the cache, and the cache filled in a "round robin" fashion.)
Least recently used (LRU) - here the value which was actually used least recently is
replaced. In general, it is more likely that the most recently used value will be the one
required in the near future. For a 2-way set associative cache, this is readily
implemented by adding a single "USED" bit to each cache location: when a value is
accessed, the USED bit for the other word in the set is set while the bit for the
accessed word is reset. The value to be replaced is then the value with its USED bit set.
For an n-way set associative cache, this strategy can be implemented by storing a
modulo-n counter with each data word. (It is an interesting exercise to determine exactly
what must be done in this case; the required circuitry may become somewhat complex for
large n.)
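The USED-bit mechanism for one 2-way set can be sketched as follows (an illustrative
model; tags stand in for full address tags, and data is omitted):

```python
# Sketch of LRU in a single 2-way set using one USED bit per way.
class TwoWaySet:
    def __init__(self):
        self.ways = [None, None]       # each way holds just a tag here
        self.used = [False, False]     # USED bit set => that way is the LRU

    def access(self, tag):
        if tag in self.ways:                   # hit
            w = self.ways.index(tag)
        else:                                  # miss: pick a victim way
            if None in self.ways:
                w = self.ways.index(None)      # fill an empty way first
            else:
                w = self.used.index(True)      # replace the way with USED set
            self.ways[w] = tag
        self.used[w] = False                   # this way is now recently used
        self.used[1 - w] = True                # the other way becomes the LRU
        return w

s = TwoWaySet()
s.access("A"); s.access("B")   # set is now full
s.access("A")                  # hit: B becomes the LRU way
s.access("C")                  # miss: replaces B, not the recently used A
```

With n > 2 ways a single bit no longer suffices, which is the point of the modulo-n
counter mentioned above.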
Cache memories normally allow one of two things to happen when data is written into a
memory location whose value is stored in the cache:
Write through cache -- both the cache and main memory are updated at the same time.
This may slow down the execution of instructions which write data to memory, because
of the relatively longer write time to main memory. Buffering memory writes can help
speed up memory writes if they are relatively infrequent, however.
Write back cache - here only the cache is updated directly by the CPU; the cache memory
controller marks the value so that it can be written back into memory when the word is
removed from the cache. This method is used because a memory location may be altered
several times while it is in the cache, without the value ever needing to be written to
main memory. It is often implemented using an "ALTERED" bit in the cache. The ALTERED bit
is set whenever a cache value is written into by the processor, and only values whose
ALTERED bit is set must be written back into main memory. The value should be written
back immediately before it is replaced in the cache.
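The ALTERED-bit (write-back) behaviour can be sketched as follows (a minimal single-line
model; addresses and values are illustrative):

```python
# Write-back policy for one cache line, driven by an ALTERED (dirty) bit.
main_memory = {0x100: 1}
cache_line = {"addr": 0x100, "value": 1, "altered": False}

def cpu_write(value):
    cache_line["value"] = value
    cache_line["altered"] = True       # only the cache is updated

def evict():
    if cache_line["altered"]:          # write back only if the line was altered
        main_memory[cache_line["addr"]] = cache_line["value"]
        cache_line["altered"] = False

cpu_write(2)
cpu_write(3)          # several writes, still no main-memory traffic
evict()               # a single write-back delivers the final value
```

A write-through cache would instead update `main_memory` inside `cpu_write`, trading
extra memory traffic for a simpler consistency story.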
The MIPS R2000/3000 processors used the write-through approach, with a buffer for the
memory writes. (This was also the approach taken by the VAX-11/780 processor.) In
practice, memory writes are less frequent than memory reads: typically, for each memory
write, an instruction must be fetched from main memory, and usually two operands are
fetched as well. We might therefore expect about three times as many read operations as
write operations; in fact, there are often many more memory read operations than memory
write operations.
The figure shows the behaviour (actually the miss ratio, which is equal to 1 minus the
hit ratio) of cache memories for various combinations of total cache capacity and line
size. The results are from simulations of the behaviour of several "typical" program
mixes. Several interesting things can be seen from these figures: the miss ratio drops
consistently with cache size. Note also that increasing the line size is not always
effective in increasing the throughput of the processor, even though it decreases the
miss ratio, because of the additional time required to transfer large lines of data from
the main memory to the cache.
Figure: Cache memory performance for various line sizes
It is interesting to plot the same data using log-log coordinates; note that in this case
the graph is (very) roughly linear. The figure shows this plot.
Figure: Log-log plot of cache performance for various line sizes
The way size, or degree of associativity, of a cache also affects its performance: the
same reference determined that, for a fixed cache size, there was a roughly constant
ratio between the performance of caches with a given set associativity and direct-mapped
caches, independent of cache size. This relation is shown in the figure. (Of course, the
performance of the set associative caches improved with associativity.)
Figure: Cache performance adjustment for associativity (relative to direct mapping)
MEMORY MANAGEMENT UNIT
Modern MMUs typically divide the virtual address space (the range of addresses used by
the processor) into pages, each having a size which is a power of 2 - usually a few
kilobytes, but they may be much larger. The bottom n bits of the address (the offset
within a page) are left unchanged; the upper address bits form the (virtual) page number.
The MMU normally translates virtual page numbers to physical page numbers via an
associative cache called a Translation Lookaside Buffer (TLB). When the TLB lacks a
translation, a slower mechanism involving hardware-specific data structures or software
assistance is used. The entries found in such data structures are typically called page
table entries (PTEs), and the data structure itself is typically called a page table. The
physical page number is combined with the page offset to give the complete physical
address.
A PTE or TLB entry may also include information about whether the page has been written
to (the dirty bit), when it was last used (the accessed bit, for a least-recently-used
page replacement algorithm), what kinds of processes (user mode, supervisor mode) may
read and write it, and whether it should be cached.
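A PTE carrying such flag bits can be sketched as follows. The bit layout here is purely
hypothetical for illustration - real PTE layouts are architecture specific:

```python
# Hypothetical PTE layout: frame number in the high bits plus five flag bits.
VALID, DIRTY, ACCESSED, USER, WRITABLE = 1, 2, 4, 8, 16

def decode_pte(pte: int):
    """Unpack a packed page table entry into its fields."""
    return {
        "frame": pte >> 5,                  # physical frame number
        "valid": bool(pte & VALID),
        "dirty": bool(pte & DIRTY),         # page has been written to
        "accessed": bool(pte & ACCESSED),   # page was recently used (for LRU)
        "user": bool(pte & USER),           # user-mode access allowed
        "writable": bool(pte & WRITABLE),
    }

pte = (42 << 5) | VALID | ACCESSED | WRITABLE
decode_pte(pte)   # frame 42: valid, accessed, writable, not dirty
```

The OS reads the dirty bit at eviction time to decide whether the victim page must be
written back to disk, exactly as the write-back cache uses its ALTERED bit.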
Sometimes, a TLB entry or PTE prohibits access to a virtual page, perhaps because no
physical random access memory has been allocated to that virtual page. In this case the
MMU signals a page fault to the CPU. The operating system (OS) then handles the
situation, perhaps by trying to find a spare frame of RAM and setting up a new PTE to map
it to the requested virtual address. If no RAM is free, it may be necessary to choose an
existing page (known as a victim), using some replacement algorithm, and save it to disk
(this is called "paging"). With some MMUs there can also be a shortage of PTEs or TLB
entries, in which case the OS will have to free one for the new mapping.
In some cases a "page fault" may indicate a software bug. A key benefit of an MMU is memory
protection: an OS can use it to protect against errant programs, by disallowing access to memory
that a particular program should not have access to. Typically, an OS assigns each program its
own virtual address space.
An MMU also reduces the problem of fragmentation of memory. After blocks of memory have
been allocated and freed, the free memory may become fragmented (discontinuous), so that
the largest contiguous block of free memory may be much smaller than the total amount.
With virtual memory, a contiguous range of virtual addresses can be mapped to several
non-contiguous blocks of physical memory.
In some early microprocessor designs, memory management was performed by a separate
integrated circuit, such as the VLSI VI475, the Motorola 68851 used with the Motorola
68020 CPU in the Macintosh II, or the Z8015 used with the Zilog Z80 family of processors.
Later microprocessors, such as the Motorola 68030 and the Zilog Z280, placed the MMU
together with the CPU on the same integrated circuit, as did the Intel 80286 and later
x86 microprocessors.
While this article concentrates on modern MMUs, commonly based on pages, early systems used
a similar concept for base-limit addressing, that further developed into segmentation. Those are
occasionally also present on modern architectures. The x86 architecture provided segmentation
rather than paging in the 80286, and provides both paging and segmentation in the 80386 and
later processors (although the use of segmentation is not available in 64-bit operation).
Interrupts
We just discussed how CALL and JUMP instructions can break the linear code flow in an
application. Another event which can cause a change in program flow is an
"INTERRUPT". Interrupts are signals (hardware or software) which cause the processor to
stop the normal program flow and execute instructions from a certain pre-defined
location (known as the Interrupt Vector Address). Interrupts can be triggered by a hardware
event (e.g. the state of an external CPU pin) or a software event (e.g. an illegal operation such
as a divide by zero). A CPU can have multiple interrupt channels, and each of these channels has
its own unique interrupt vector address. When an interrupt occurs, the program sequencer starts
processing instructions from the Interrupt Vector Address (of the associated interrupt channel).
Similar to a CALL instruction, the Return Address (the address of the instruction which would
have been fetched in the absence of the interrupt event) is saved in one of the processor registers
(some CPUs also save the current system state along with the return address). An RTI (Return
From Interrupt) instruction (similar to RTS) brings the program flow back to the Return Address.
The code stored at the Interrupt Vector Address is called the Interrupt Service Routine (ISR).
An RTI instruction generally forms the last instruction of the ISR.
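The vectoring mechanism described above can be sketched in C as a table of function pointers indexed by interrupt channel. All names here (install_isr, simulate_interrupt, the table layout) are invented for illustration; on a real CPU the dispatch and return-address saving happen in hardware, and the RTI is modelled below simply by the function return.

```c
#include <stddef.h>

#define NUM_CHANNELS 8

typedef void (*isr_t)(void);

isr_t vector_table[NUM_CHANNELS];   /* one entry per interrupt channel */
int timer_ticks = 0;

void timer_isr(void) { timer_ticks++; }   /* body of an example ISR */

/* Store the ISR entry point at the channel's vector slot. */
void install_isr(int channel, isr_t handler) {
    vector_table[channel] = handler;
}

/* What the hardware does when channel 'ch' fires: save the return
 * address (implicit here, via the call stack), then execute from the
 * vector address; the ISR's final RTI is modelled by returning. */
void simulate_interrupt(int ch) {
    if (ch >= 0 && ch < NUM_CHANNELS && vector_table[ch] != NULL)
        vector_table[ch]();
}
```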
Interrupt Controller: the hardware inside the processor which is responsible for managing the
interrupt operations.
Enabling Interrupts: Interrupts (on most processors) can be enabled or disabled by the
programmer using a (Global) Interrupt Enable bit. Interrupt Controllers also provide an option
for enabling or disabling each individual interrupt (at a local level).
Interrupt Masking: An Interrupt Mask is a control word (generally stored in an Interrupt Mask
Register) which can be used to temporarily disable an interrupt (on a particular channel). The
Interrupt Mask contains control bits (mask bits) for each interrupt channel. If a mask bit is set,
the interrupt for the corresponding interrupt channel is temporarily masked (and it remains
masked until the mask bit is cleared).
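As a minimal sketch of this per-channel masking, the IMR can be modelled as a plain variable manipulated with bit operations. On real hardware it would be a volatile register at a fixed address; the names below are illustrative.

```c
#include <stdint.h>

uint32_t imr = 0;   /* stand-in for the Interrupt Mask Register */

void mask_irq(int ch)   { imr |=  (1u << ch); }   /* set mask bit: disable channel */
void unmask_irq(int ch) { imr &= ~(1u << ch); }   /* clear mask bit: re-enable     */
int  irq_masked(int ch) { return (int)((imr >> ch) & 1u); }
```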
Interrupt Priority: Interrupt channels are associated with different priority levels. If two
interrupts are acknowledged by the Interrupt Controller at the same time, then the higher priority
interrupt is processed first. The interrupt priority scheme helps ensure that more important
(interrupt) events get processed before less critical ones. Critical events (e.g.
system power failure) are assigned the highest priority.
Interrupt Mapping: Some Interrupt Controllers also provide the flexibility of mapping the interrupt
sources (events that generate interrupts) to any of the available interrupt channels. This scheme has
two major advantages. Firstly, in a system, (generally) not all the interrupt sources are active at
a time. A fixed mapping (from source to channel) means that many of the interrupt channels will
be un-utilized. With a flexible mapping, however, it is possible to provide fewer interrupt
channels (and active sources can be mapped to these channels). This reduces the hardware
complexity of the Interrupt Controller, and hence its cost. The Interrupt Controller can also provide
a provision for mapping multiple sources to a single interrupt channel. In the ISR (for a particular
interrupt), the interrupt source (out of the many sources mapped to this channel) can be identified
by reading the interrupt status register (this register has the corresponding bit set if an interrupt
event has occurred). Secondly, the interrupt sources can be assigned to interrupt channels with
different priorities, based on the system requirements.
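The status-register demultiplexing just described can be sketched as a loop over the pending bits. The register and the acknowledge behaviour are simulated with ordinary variables here; the names are made up for illustration.

```c
#include <stdint.h>

uint32_t int_status = 0;   /* bit n set => source n is pending */
int serviced[32];          /* count of how often each source was handled */

/* ISR for a channel shared by several sources: read the status
 * register, service each pending source, and acknowledge it. */
void shared_channel_isr(void) {
    for (int src = 0; src < 32; src++) {
        if (int_status & (1u << src)) {
            serviced[src]++;              /* source-specific work would go here */
            int_status &= ~(1u << src);   /* acknowledge this source */
        }
    }
}
```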
Interrupts can be categorized into: maskable interrupt, non-maskable interrupt (NMI), inter-
processor interrupt (IPI), software interrupt, and spurious interrupt.
Maskable interrupt (IRQ) is a hardware interrupt that may be ignored by
setting a bit in an interrupt mask register's (IMR) bit-mask.
Non-maskable interrupt (NMI) is a hardware interrupt that lacks an associated
bit-mask, so that it can never be ignored. NMIs are often used for timers,
especially watchdog timers.
Inter-processor interrupt (IPI) is a special case of interrupt that is generated
by one processor to interrupt another processor in a multiprocessor system.
Software interrupt is an interrupt generated within a processor by
executing an instruction. Software interrupts are often used to implement
system calls because they implement a subroutine call with a CPU ring level
change.
Spurious interrupt is a hardware interrupt that is unwanted. They are
typically generated by system conditions such as electrical interference on an
interrupt line or through incorrectly designed hardware.
Processors typically have an internal interrupt mask which allows software to ignore all external
hardware interrupts while it is set. Setting this mask may offer faster access than accessing an
interrupt mask register (IMR) in a PIC, or disabling interrupts in the device itself. In some cases,
such as the x86 architecture, disabling and enabling interrupts on the processor itself acts as a
memory barrier; however, it may actually be slower.
An interrupt that leaves the machine in a well-defined state is called a precise interrupt. Such
an interrupt has four properties: The Program Counter (PC) is saved in a known place.
All instructions before the one pointed to by the PC have fully executed.
No instruction beyond the one pointed to by the PC has been executed (there
is no prohibition on executing instructions beyond that point, it is just that any changes they
make to registers or memory must be undone before the interrupt happens).
The execution state of the instruction pointed to by the PC is known.
An interrupt that does not meet these requirements is called an imprecise interrupt.
Modern MMUs typically divide the virtual address space (the range of addresses used by the
processor) into pages, each having a size which is a power of 2, usually a few kilobytes, but they
may be much larger. The bottom n bits of the address (the offset within a page) are left
unchanged. The upper address bits are the (virtual) page number. The MMU normally translates
virtual page numbers to physical page numbers via an associative cache called a Translation
Lookaside Buffer (TLB). When the TLB lacks a translation, a slower mechanism involving
hardware-specific data structures or software assistance is used. The data found in such data
structures are typically called page table entries (PTEs), and the data structure itself is typically
called a page table. The physical page number is combined with the page offset to give the
complete physical address.
A PTE or TLB entry may also include information about whether the page has been written to
(the dirty bit), when it was last used (the accessed bit, for a least recently used page replacement
algorithm), what kind of processes (user mode, supervisor mode) may read and write it, and
whether it should be cached.
Sometimes, a TLB entry or PTE prohibits access to a virtual page, perhaps because no physical
random access memory has been allocated to that virtual page. In this case the MMU signals a
page fault to the CPU. The operating system (OS) then handles the situation, perhaps by trying to
find a spare frame of RAM and setting up a new PTE to map it to the requested virtual address. If no
RAM is free, it may be necessary to choose an existing page (known as a victim), using some
replacement algorithm, and save it to disk (this is called "paging"). With some MMUs, there can
also be a shortage of PTEs or TLB entries, in which case the OS will have to free one for the new
mapping.
In some cases a "page fault" may indicate a software bug. A key benefit of an MMU is memory
protection: an OS can use it to protect against errant programs, by disallowing access to memory
that a particular program should not have access to. Typically, an OS assigns each program its
own virtual address space.
DMA
DMA (Direct Memory Access) provides an efficient way of transferring data between "a peripheral
and memory" or between "two memory regions". The DMA engine is a processing engine which can
perform data transfer operations (to or from the memory). In the absence of a DMA engine, the CPU
needs to handle these data operations itself, and the overall system performance is heavily reduced.
DMA is specifically useful in systems which involve huge data transfers (in the absence of DMA, the
CPU will be busy doing these transfers most of the time and will not be available for other
processing).
DMA Parameters: DMA transfers involve a Source and a Destination; the DMA engine transfers
the data from Source to Destination. The DMA engine requires the source and destination addresses
along with the Transfer Count in order to perform the data transfers. The (Source or Destination)
Address can be a physical address (in the case of a memory) or logical (in the case of a peripheral).
The Transfer Count specifies the number of words which need to be transferred. As mentioned
before, a data transfer can be either from a Peripheral to Memory (generally called Receive
DMA), from Memory to a Peripheral (generally called Transmit DMA), or from one Memory
region to another (generally called Memory DMA).
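A minimal sketch of the parameter set for one channel is shown below. The struct layout and the function name are invented for illustration, and the memory-to-memory "transfer" is simulated in software so the idea stays runnable; real hardware would perform it autonomously, in parallel with the CPU.

```c
#include <stdint.h>
#include <string.h>

/* The three essential DMA parameters: source, destination, count. */
typedef struct {
    const uint8_t *src;     /* source address                 */
    uint8_t       *dst;     /* destination address            */
    uint32_t       count;   /* number of (byte) words to move */
} dma_channel_t;

/* Simulated "kick off the channel": the hardware would do this copy
 * on its own once the parameters are programmed. */
void dma_start(dma_channel_t *ch) {
    memcpy(ch->dst, ch->src, ch->count);
}
```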
Some DMA engines support additional parameters like Word-Size and Address-Increment in
addition to the Start Address and Transfer Count. Word-Size specifies the size of each transfer.
Address-Increment specifies the offset from the current address (in memory) which the next transfer
should use. This provides a way of transferring data from non-contiguous memory locations.
DMA Channels: A DMA engine can support multiple DMA channels. This means that at a given
time, multiple DMA transfers can be in flight (though physically only one transfer may be in
progress, logically the DMA engine can handle many channels in parallel). This feature makes the
life of the software programmer very easy (as he does not have to wait for the current DMA
operation to finish before programming the next one). Each DMA channel has control registers
where the DMA parameters can be specified. Each DMA channel also has an interrupt associated
with it (on most processors) which (optionally) triggers after completion of the DMA transfer. Inside
the ISR, the programmer can take specific action (e.g. do some processing on the data which has
just been received through DMA, or program a new DMA transfer).
Chained DMA: Certain DMA controllers support an option for specifying the DMA parameters in a
buffer (or array) in memory rather than writing them directly to the DMA control registers (this
mostly applies to the second DMA operation - the parameters for the first DMA operation are still
specified in the control registers). This buffer is called a DMA Transfer Control Block (TCB).
The DMA controller takes the address of the DMA TCB as one of its parameters (in addition to the
control parameters for the first DMA transfer) and loads the DMA parameters (for the second DMA
operation) automatically from memory (after the first DMA operation is over). The TCB also
contains an entry for the "Next TCB Address", which provides an easy way of chaining multiple
DMA operations in an automatic fashion (rather than having to program each one after completion
of the previous). The DMA chaining can be stopped by specifying a ZERO address in the Next TCB
Address field.
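The TCB chain described above is, in effect, a linked list walked by the controller. The sketch below simulates that walk in software; the field names are illustrative, and a NULL next pointer plays the role of the ZERO "Next TCB Address" that stops the chain.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* One Transfer Control Block: parameters for one transfer plus the
 * address of the next TCB in the chain. */
typedef struct tcb {
    const uint8_t *src;
    uint8_t       *dst;
    uint32_t       count;
    struct tcb    *next;    /* "Next TCB Address"; NULL terminates */
} tcb_t;

/* Simulated controller: perform each transfer, then reload the next
 * TCB from memory, until the chain ends. Returns the number of
 * transfers performed. */
int dma_run_chain(const tcb_t *t) {
    int transfers = 0;
    while (t != NULL) {
        memcpy(t->dst, t->src, t->count);
        t = t->next;
        transfers++;
    }
    return transfers;
}
```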
Multi-dimensional DMA: combined with Address-Increment, this gives many options, for example
transferring a rectangular sub-block of a two-dimensional array in a single DMA operation.
The simplest way to use DMA is to select a processor with an internal DMA controller. This
eliminates the need for external bus buffers and ensures that the timing is handled
correctly. Also, an internal DMA controller can transfer data to on-chip memory and
peripherals, which is something that an external DMA controller cannot do. Because the
handshake is handled on-chip, the overhead of entering and exiting DMA mode is often
much faster than when an external controller is used.
If an external DMA controller or processor is used, be sure that the hardware handles the
transition between transfers correctly. To avoid the problem of bus contention, ensure that
bus requests are inhibited if the bus is not free. This prevents the DMA controller from
requesting the bus before the processor has reacquired it after a transfer.
So you see, DMA is not as mysterious as it sometimes seems. DMA transfers can provide
real advantages when the system is properly designed.
Figure 1: A DMA controller shares the processor's memory
Hardware interrupts were introduced as a way to avoid wasting the processor's valuable time in
polling loops, waiting for external events. They may be implemented in hardware as a distinct
system with control lines, or they may be integrated into the memory subsystem.
If implemented in hardware, an interrupt controller circuit such as the IBM PC's Programmable
Interrupt Controller (PIC) may be connected between the interrupting device and the processor's
interrupt pin to multiplex several sources of interrupt onto the one or two CPU lines typically
available. If implemented as part of the memory controller, interrupts are mapped into the
system's memory address space.
SERIAL PROTOCOLS
I2C Bus
The physical I2C bus
This is just two wires, called SCL and SDA. SCL is the clock line. It is used to synchronize all data
transfers over the I2C bus. SDA is the data line. The SCL & SDA lines are connected to all devices
on the I2C bus. There needs to be a third wire which is just the ground or 0 volts. There may also be
a 5-volt wire if power is being distributed to the devices. Both SCL and SDA lines are "open drain"
drivers. What this means is that the chip can drive its output low, but it cannot drive it high. For the
line to be able to go high you must provide pull-up resistors to the 5v supply. There should be a
resistor from the SCL line to the 5v line and another from the SDA line to the 5v line. You only need
one set of pull-up resistors for the whole I2C bus, not for each device, as illustrated below:
The value of the resistors is not critical. I have seen anything from 1k8 (1800 ohms) to 47k (47000
ohms) used. 1k8, 4k7 and 10k are common values, but anything in this range should work OK. I
recommend 1k8 as this gives you the best performance. If the resistors are missing, the SCL and
SDA lines will always be low - nearly 0 volts - and the I2C bus will not work.
Masters and Slaves
The devices on the I2C bus are either masters or slaves. The master is always the device that drives
the SCL clock line. The slaves are the devices that respond to the master. A slave cannot initiate a
http://en.wikipedia.org/wiki/Polling_(computer_science)http://en.wikipedia.org/wiki/Programmable_Interrupt_Controllerhttp://en.wikipedia.org/wiki/Programmable_Interrupt_Controllerhttp://en.wikipedia.org/wiki/Memory_controllerhttp://en.wikipedia.org/wiki/Address_spacehttp://en.wikipedia.org/wiki/Polling_(computer_science)http://en.wikipedia.org/wiki/Programmable_Interrupt_Controllerhttp://en.wikipedia.org/wiki/Programmable_Interrupt_Controllerhttp://en.wikipedia.org/wiki/Memory_controllerhttp://en.wikipedia.org/wiki/Address_space8/3/2019 Memory and i
20/52
transfer over the I2C bus, only a master can do that. There can be, and usually are, multiple slaves
on the I2C bus, however there is normally only one master. It is possible to have multiple masters,
but it is unusual and not covered here. On your robot, the master will be your controller and the
slaves will be our modules such as the SRF08 or CMPS03. Slaves will never initiate a transfer. Both
master and slave can transfer data over the I2C bus, but that transfer is always controlled by the
master.
The I2C Physical Protocol
When the master (your controller) wishes to talk to a slave (our CMPS03 for example) it begins by
issuing a start sequence on the I2C bus. A start sequence is one of two special sequences defined
for the I2C bus, the other being the stop sequence. The start sequence and stop sequence are
special in that these are the only places where the SDA (data line) is allowed to change while the
SCL (clock line) is high. When data is being transferred, SDA must remain stable and not change
whilst SCL is high. The start and stop sequences mark the beginning and end of a transaction with
the slave device.
Data is transferred in sequences of 8 bits. The bits are placed on the SDA line starting with the MSB
(Most Significant Bit). The SCL line is then pulsed high, then low. Remember that the chip cannot
really drive the line high, it simply "lets go" of it and the resistor actually pulls it high. For every 8 bits
transferred, the device receiving the data sends back an acknowledge bit, so there are actually 9
SCL clock pulses to transfer each 8 bit byte of data. If the receiving device sends back a low ACK
bit, then it has received the data and is ready to accept another byte. If it sends back a high then it is
indicating it cannot accept any further data and the master should terminate the transfer by sending
a stop sequence.
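The byte transfer just described (MSB first, one SCL pulse per bit, a 9th pulse for the ACK) can be sketched as bit-banged GPIO code. Here set_sda and pulse_scl are stand-ins for real pin operations; they simply record the bit sequence so the shifting logic can be checked, and the actual ACK sampling is omitted.

```c
#include <stdint.h>

uint8_t sent_bits[9];   /* SDA level recorded at each SCL pulse */
int     scl_pulses = 0;

void set_sda(uint8_t bit) { if (scl_pulses < 9) sent_bits[scl_pulses] = bit; }
void pulse_scl(void)      { scl_pulses++; }   /* would toggle SCL high, then low */

/* Shift one byte out, MSB first; data must be stable while SCL is high. */
void i2c_write_byte(uint8_t byte) {
    scl_pulses = 0;
    for (int i = 7; i >= 0; i--) {
        set_sda((byte >> i) & 1u);   /* place next bit on SDA */
        pulse_scl();                 /* clock it out          */
    }
    set_sda(1);    /* release SDA so the slave can drive the ACK bit */
    pulse_scl();   /* 9th clock: ACK would be sampled here */
}
```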
How fast?
The standard clock (SCL) speed for I2C is up to 100KHz. Philips do define faster speeds: Fast mode,
which is up to 400KHz, and High Speed mode, which is up to 3.4MHz. All of our modules are
designed to work at up to 100KHz. We have tested our modules up to 1MHz but this needs a small
delay of a few uS between each byte transferred. In practical robots, we have never had any need to
use high SCL speeds. Keep SCL at or below 100KHz and then forget about it.
I2C Device Addressing
All I2C addresses are either 7 bits or 10 bits. The use of 10 bit addresses is rare and is not covered
here. All of our modules and the common chips you will use will have 7 bit addresses. This means
that you can have up to 128 devices on the I2C bus, since a 7bit number can be from 0 to 127.
When sending out the 7 bit address, we still always send 8 bits. The extra bit is used to inform the
slave if the master is writing to it or reading from it. If the bit is zero, the master is writing to the slave.
If the bit is 1 the master is reading from the slave. The 7 bit address is placed in the upper 7 bits of
the byte and the Read/Write (R/W) bit is in the LSB (Least Significant Bit).
The placement of the 7 bit address in the upper 7 bits of the byte is a source of confusion for the
newcomer. It means that to write to address 21, you must actually send out 42, which is 21 moved
over by 1 bit. It is probably easier to think of the I2C bus addresses as 8 bit addresses, with even
addresses as write only, and the odd addresses as the read address for the same device. To take
our CMPS03 for example, this is at address 0xC0 ($C0). You would use 0xC0 to write to the
CMPS03 and 0xC1 to read from it. So the read/write bit just makes it an odd/even address.
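The shift-by-one relationship above is a one-liner in code. A small sketch (function names are illustrative):

```c
#include <stdint.h>

/* The 7-bit address occupies the upper 7 bits of the address byte;
 * bit 0 is the R/W flag. So the write address is addr<<1 (even) and
 * the read address is (addr<<1)|1 (odd). */
uint8_t i2c_write_addr(uint8_t addr7) { return (uint8_t)(addr7 << 1); }
uint8_t i2c_read_addr (uint8_t addr7) { return (uint8_t)((addr7 << 1) | 1u); }
```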
The I2C Software Protocol
The first thing that will happen is that the master will send out a start sequence. This will alert all the
slave devices on the bus that a transaction is starting and they should listen in case it is for them.
Next the master will send out the device address. The slave that matches this address will continue
with the transaction, any others will ignore the rest of this transaction and wait for the next. Having
addressed the slave device the master must now send out the internal location or register number
inside the slave that it wishes to write to or read from. This number is obviously dependent on what
the slave actually is and how many internal registers it has. Some very simple devices do not have
any, but most do, including all of our modules. Our CMPS03 has 16 locations numbered 0-15. The
SRF08 has 36. Having sent the I2C address and the internal register address the master can now
send the data byte (or bytes, it doesn't have to be just one). The master can continue to send data
bytes to the slave and these will normally be placed in the following registers because the slave will
automatically increment the internal register address after each byte. When the master has finished
writing all data to the slave, it sends a stop sequence which completes the transaction. So to write to
a slave device:
1. Send a start sequence
2. Send the I2C address of the slave with the R/W bit low (even address)
3. Send the internal register number you want to write to
4. Send the data byte
5. [Optionally, send any further data bytes]
6. Send the stop sequence.
As an example, you have an SRF08 at the factory default address of 0xE0. To start the SRF08
ranging you would write 0x51 to the command register at 0x00 like this:
1. Send a start sequence
2. Send 0xE0 ( I2C address of the SRF08 with the R/W bit low (even address)
3. Send 0x00 (Internal address of the command register)
4. Send 0x51 (The command to start the SRF08 ranging)
5. Send the stop sequence.
Reading from the Slave
This is a little more complicated - but not too much more. Before reading data from the slave device,
you must tell it which of its internal addresses you want to read. So a read of the slave actually starts
off by writing to it. This is the same as when you want to write to it: You send the start sequence, the
I2C address of the slave with the R/W bit low (even address) and the internal register number you
want to write to. Now you send another start sequence (sometimes called a restart) and the I2C
address again - this time with the read bit set. You then read as many data bytes as you wish and
terminate the transaction with a stop sequence. So to read the compass bearing as a byte from the
CMPS03 module:
1. Send a start sequence
2. Send 0xC0 ( I2C address of the CMPS03 with the R/W bit low (even address)
3. Send 0x01 (Internal address of the bearing register)
4. Send a start sequence again (repeated start)
5. Send 0xC1 ( I2C address of the CMPS03 with the R/W bit high (odd address)
6. Read data byte from CMPS03
7. Send the stop sequence.
The bit sequence will look like this:
Wait a moment
That's almost it for simple I2C communications, but there is one more complication. When the
master is reading from the slave, it's the slave that places the data on the SDA line, but it's the master
that controls the clock. What if the slave is not ready to send the data? With devices such as
EEPROMs this is not a problem, but when the slave device is actually a microprocessor with other
things to do, it can be a problem. The microprocessor on the slave device will need to go to an
interrupt routine, save its working registers, find out what address the master wants to read from, get
the data and place it in its transmission register. This can take many uS to happen, meanwhile the
master is blissfully sending out clock pulses on the SCL line that the slave cannot respond to. The
I2C protocol provides a solution to this: the slave is allowed to hold the SCL line low! This is called
clock stretching. When the slave gets the read command from the master, it holds the clock line low.
The microprocessor then gets the requested data, places it in the transmission register and releases
the clock line, allowing the pull-up resistor to finally pull it high. From the master's point of view, it will
issue the first clock pulse of the read by making SCL high and then check to see if it really has gone
high. If it's still low then it's the slave that's holding it low, and the master should wait until it goes high
before continuing. Luckily, the hardware I2C ports on most microprocessors will handle this
automatically.
CAN BUS
Controller Area Network (CAN) is a multicast shared serial bus standard, originally
developed in the 1980s by Robert Bosch GmbH, for connecting electronic control
units (ECUs). CAN was specifically designed to be robust in electromagnetically
noisy environments and can utilize a differential balanced line like RS-485. It can be
even more robust against noise if twisted pair wire is used. Although initially
created for automotive purposes (as a vehicle bus), nowadays it is used in many
embedded control applications (e.g., industrial) that may be subject to noise.
Bit rates up to 1 Mbit/s are possible at network lengths below 40 m. Decreasing the
bit rate allows longer network distances (e.g. 125 kbit/s at 500 m).
The CAN data link layer protocol is standardized in ISO 11898-1 (2003). This
standard describes mainly the data link layer, composed of the Logical Link
Control (LLC) sublayer and the Media Access Control (MAC) sublayer, and some
aspects of the physical layer of the ISO/OSI Reference Model. All the other protocol
layers are left to the network designer's choice.
CAN transmits data through a binary model of "dominant" bits and "recessive" bits,
where dominant is a logical 0 and recessive is a logical 1. If one node transmits a
dominant bit and another node transmits a recessive bit then the dominant bit
"wins" (a logical AND between the two).
So, if you are transmitting a recessive bit, and someone sends a dominant bit, you
see a dominant bit, and you know there was a collision. (All other collisions are
invisible.) The way this works is that a dominant bit is asserted by creating a
voltage across the wires while a recessive bit is simply not asserted on the bus. If
anyone sets a voltage difference, everyone sees it; hence, dominant.
Commonly, when used with a differential bus, a Carrier Sense Multiple
Access/Bitwise Arbitration (CSMA/BA) scheme is implemented: if two or more
devices start transmitting at the same time, there is a priority-based arbitration
scheme to decide which one will be granted permission to continue transmitting.
During arbitration, each transmitting node monitors the bus state and compares the
received bit with the transmitted bit. If a dominant bit is received when a recessive
bit is transmitted then the node stops transmitting (i.e., it lost arbitration).
8/3/2019 Memory and i
24/52
Arbitration is performed during the transmission of the identifier field. Each node
starting to transmit at the same time sends an ID with dominant as binary 0,
starting from the high bit. As soon as their ID is a larger number (lower priority)
they'll be sending 1 (recessive) and see 0 (dominant), so they back off. At the end
of ID transmission, all nodes bar one have backed off, and the highest priority
message gets through unimpeded.
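The arbitration walk-through above can be simulated directly: each bus bit is the wired-AND of all still-transmitting nodes, and a node that sends recessive but sees dominant backs off. The function below is a software model (not driver code) for up to 32 contending nodes.

```c
#include <stdint.h>

/* Simulate bitwise arbitration over 'id_bits' identifier bits.
 * Returns the winning identifier: the one with the most dominant
 * (0) bits early on, i.e. the numerically lowest ID. */
uint32_t can_arbitrate(const uint32_t *ids, int n, int id_bits) {
    int alive[32];
    for (int i = 0; i < n; i++) alive[i] = 1;
    for (int b = id_bits - 1; b >= 0; b--) {     /* high bit first */
        uint32_t bus = 1;                        /* recessive unless driven */
        for (int i = 0; i < n; i++)
            if (alive[i])
                bus &= (ids[i] >> b) & 1u;       /* wired-AND of all nodes */
        for (int i = 0; i < n; i++)              /* sent recessive, saw
                                                    dominant: lost, back off */
            if (alive[i] && ((ids[i] >> b) & 1u) != bus)
                alive[i] = 0;
    }
    for (int i = 0; i < n; i++)
        if (alive[i]) return ids[i];
    return 0;
}
```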
Data transmission
Frames: all frames (aka messages) begin with a start-of-frame (SOF) bit that, obviously, denotes
the start of the frame transmission.
CAN has four frame types:
Data frame: a frame containing node data for transmission
Remote frame: a frame requesting the transmission of a specific identifier
Error frame: a frame transmitted by any node detecting an error
Overload frame: a frame to inject a delay between data and/or remote frames
Data frame
The data frame is the only frame for actual data transmission. There are two message
formats:
Base frame format: with 11 identifier bits
Extended frame format: with 29 identifier bits
The CAN standard requires that an implementation must accept the base frame format and may
accept the extended frame format, but must tolerate the extended frame format.
USB Protocols
Unlike RS-232 and similar serial interfaces where the format of data being sent is not defined,
USB is made up of several layers of protocols. While this sounds complicated, don't give up
now. Once you understand what is going on, you really only have to worry about the higher level
layers. In fact most USB controller I.C.s will take care of the lower layer, thus making it almost
invisible to the end designer.
Each USB transaction consists of a
o Token Packet (header defining what it expects to follow), an
o Optional Data Packet (containing the payload), and a
o Status Packet (used to acknowledge transactions and to provide a
means of error correction)
As we have already discussed, USB is a host centric bus. The host initiates all transactions. The
first packet, also called a token is generated by the host to describe what is to follow and whether
the data transaction will be a read or write, and what the device's address and designated endpoint
are. The next packet is generally a data packet carrying the payload and is followed by a
handshaking packet, reporting if the data or token was received successfully, or if the endpoint is
stalled or not available to accept data.
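The token/data/handshake sequence can be illustrated with a toy host-side routine. Everything here (the `ToyDevice` class, the method names, the retry count) is invented for illustration; real controllers implement this exchange in silicon:

```python
# Toy model of USB's token -> data -> handshake transaction pattern.
# All names are invented for this sketch; this is not real USB stack code.

def out_transaction(device, payload, max_retries=3):
    """Host-side OUT transaction: send token, then data, read the handshake."""
    for _ in range(max_retries):
        device.token("OUT")           # token packet: what follows and for whom
        hs = device.data(payload)     # data packet carrying the payload
        if hs == "ACK":               # handshake reports success...
            return True
    return False                      # ...or the endpoint kept NAKing

class ToyDevice:
    def __init__(self, busy_for=0):
        self.busy_for = busy_for      # NAK this many attempts, then accept
        self.buffer = None
        self.expecting = None
    def token(self, pid):
        self.expecting = pid
    def data(self, payload):
        assert self.expecting == "OUT"
        if self.busy_for > 0:
            self.busy_for -= 1
            return "NAK"              # temporarily unable to receive
        self.buffer = payload
        return "ACK"
```

A device that NAKs once still succeeds on retry: `out_transaction(ToyDevice(busy_for=1), b"hi")` returns `True`.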
Common USB Packet Fields
Data on the USB bus is transmitted LSB (least significant bit) first. USB packets consist of the following fields:
o Sync
All packets must start with a sync field. The sync field is 8 bits long at low and
full speed or 32 bits long for high speed and is used to synchronise the clock of
the receiver with that of the transmitter. The last two bits indicate where the PID
field starts.
o PID
PID stands for Packet ID. This field is used to identify the type of packet that is
being sent. The following table shows the possible values.
Group       PID Value   Packet Identifier

Token       0001        OUT Token
            1001        IN Token
            0101        SOF Token
            1101        SETUP Token

Data        0011        DATA0
            1011        DATA1
            0111        DATA2
            1111        MDATA

Handshake   0010        ACK
            1010        NAK
            1110        STALL
            0110        NYET (No Response Yet)

Special     1100        PREamble
            1100        ERR
            1000        Split
            0100        Ping
There are 4 bits to the PID; however, to ensure it is received correctly, the 4 bits
are complemented and repeated, making an 8 bit PID in total. The resulting
format is shown below.

PID0 | PID1 | PID2 | PID3 | nPID0 | nPID1 | nPID2 | nPID3

o ADDR
The address field specifies which device the packet is designated for. Being 7 bits
in length allows 127 devices to be supported. Address 0 is not valid, as any
device which has not yet been assigned an address must respond to packets sent to
address zero.
o ENDP
The endpoint field is made up of 4 bits, allowing 16 possible endpoints. Low-speed
devices, however, can only have 2 additional endpoints on top of the default
pipe (4 endpoints max).
o CRC
Cyclic Redundancy Checks are performed on the data within the packet payload.
All token packets have a 5 bit CRC while data packets have a 16 bit CRC.
o EOP
End of packet. Signalled by a Single Ended Zero (SE0) for approximately 2 bit
times followed by a J for 1 bit time.
USB Packet Types
USB has four different packet types. Token packets indicate the type of transaction to follow,
data packets contain the payload, handshake packets are used for acknowledging data or
reporting errors and start of frame packets indicate the start of a new frame.
o Token Packets
There are three types of token packets,
In - Informs the USB device that the host wishes to read
information.
Out - Informs the USB device that the host wishes to send
information.
Setup - Used to begin control transfers.
Token Packets must conform to the following format,
Sync | PID | ADDR | ENDP | CRC5 | EOP
o Data Packets
There are two types of data packets each capable of transmitting up to 1024 bytes
of data.
Data0
Data1
High Speed mode defines another two data PIDs, DATA2 and MDATA.
Data packets have the following format,
Sync | PID | Data | CRC16 | EOP

Maximum data payload size for low-speed devices is 8 bytes.
Maximum data payload size for full-speed devices is 1023 bytes.
Maximum data payload size for high-speed devices is 1024 bytes.
Data must be sent in multiples of bytes.
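As a quick worked example of the payload limits above, one can estimate how many data packets a transfer needs at each bus speed. The helper is hypothetical, and it idealizes things: a real endpoint's wMaxPacketSize may be smaller than these bus-speed maxima:

```python
import math

# Maximum data-packet payload by bus speed (from the list above).
MAX_PAYLOAD = {"low": 8, "full": 1023, "high": 1024}

def packets_needed(nbytes, speed):
    """Number of data packets a transfer of nbytes requires at a given speed."""
    return max(1, math.ceil(nbytes / MAX_PAYLOAD[speed]))
```

For example, moving 64 bytes over a low-speed link takes 8 data packets, while a high-speed link carries 2048 bytes in just 2.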
o Handshake Packets
There are three types of handshake packets, which consist simply of the PID:
ACK - Acknowledgment that the packet has been successfully
received.
NAK - Reports that the device temporarily cannot send or
receive data. Also used during interrupt transactions to inform
the host there is no data to send.
STALL - The device finds itself in a state where it requires
intervention from the host.
Handshake Packets have the following format,
Sync PID EOP
o Start of Frame Packets
The SOF packet, consisting of an 11-bit frame number, is sent by the host every
1 ms ± 500 ns on a full speed bus, or every 125 µs ± 0.0625 µs on a high speed
bus.
Sync | PID | Frame Number | CRC5 | EOP

USB Functions
When we think of a USB device, we think of a USB peripheral, but a USB device could mean a
USB transceiver device used at the host or peripheral, a USB Hub or Host Controller IC device,
or a USB peripheral device. The standard therefore makes references to USB functions which
can be seen as USB devices which provide a capability or function such as a Printer, Zip Drive,
Scanner, Modem or other peripheral.
So by now we should know the sort of things which make up a USB packet. No? You've
forgotten how many bits make up a PID field already? Well don't be too alarmed. Fortunately
most USB functions handle the low level USB protocols up to the transaction layer (which we
will cover next chapter) in silicon. The reason why we cover this information is most USB
function controllers will report errors such as PID Encoding Error. Without briefly covering this,
one could ask what is a PID Encoding Error? If you suggested that the last four bits of the PID
didn't match the inverse of the first four bits then you would be right.
Most functions will have a series of buffers, typically 8 bytes long. Each buffer will belong to an
endpoint - EP0 IN, EP0 OUT etc. Say for example, the host sends a device descriptor request.
The function hardware will read the setup packet and determine from the address field whether
the packet is for itself, and if so will copy the payload of the following data packet to the
appropriate endpoint buffer dictated by the value in the endpoint field of the setup token. It will
then send a handshake packet to acknowledge the reception of the byte and generate an internal
interrupt within the semiconductor/micro-controller for the appropriate endpoint signifying it has
received a packet. This is typically all done in hardware.
The software now gets an interrupt, and should read the contents of the endpoint buffer and parse
the device descriptor request.
PCI LOCAL BUS
The PCI (Peripheral Component Interconnect) is a high performance Bus for interconnecting
chips, expansion boards, and memory cards. It originated at Intel in the early 1990s as a
standard method of interconnecting chips on a board. It was later adopted as an industry
standard administered by the PCI Special Interest Group, or PCI-SIG.

(VL-Bus stands for VESA Local Bus, a local bus architecture created by VESA, the Video
Electronics Standards Association. It was popular in early-1990s computers, typically used
for VGA cards that drove the graphics of the computer display.)
The basic form of the PCI presents a fusion of sorts between ISA and VL-Bus. It provides
direct access to system memory for connected devices, but uses a bridge to connect to
the frontside bus and therefore to the CPU. Basically, this means that it is capable of even
higher performance than VL-Bus while eliminating the potential for interference with the
CPU. PCI can connect more devices than VL-Bus, up to five external components. Each of
the five connectors for an external component can be replaced with two fixed devices on the
motherboard. Also, you can have more than one PCI bus on the same computer, although
this is rarely done. The PCI bridge chip regulates the speed of the PCI bus independently of
the CPU's speed. This provides a higher degree of reliability and ensures that PCI hardware
manufacturers know exactly what to design for.
PCI originally operated at 33 MHz using a 32-bit-wide path. Revisions to the standard
include increasing the speed from 33 MHz to 66 MHz and doubling the bit count to 64.
Currently, PCI-X provides for 64-bit transfers at a speed of 133 MHz, for a transfer rate of
roughly 1 GBps (gigabyte per second).
PCI cards use 47 pins to connect. The PCI bus is able to work with so few pins because of
hardware multiplexing, which means that the device sends more than one signal over a
single pin. The connectors at the end of the card are connected to the motherboard slot and
are called gold fingers.
PERIPHERALS
Peripherals (of a processor) are its means of communicating with the external world.
(1) Peripheral Classification
Peripherals can be classified based on the following characteristics:
Simplex, Duplex & Semi-Duplex

Simplex communication involves unidirectional data transfers. Duplex communication involves bi-
directional data transfers. Full-duplex interfaces have independent channels for transmission and
reception. Semi-duplex communication also involves bi-directional data transfers; however, at a given
time, data transfer is only possible in one direction. Semi-duplex interfaces use the same
communication channel for both transmission and reception.
Serial Vs Parallel
Serial peripherals communicate over a single data line. The data at the Tx end needs to be converted
from parallel to serial before transmission, and the data at the Rx end needs to be converted from serial
to parallel after reception. Serial peripherals imply fewer signal lines on the external interface and thus
reduced hardware (circuit board) complexity and cost. However, the data rate on serial interfaces is
fairly limited (as compared to a parallel interface). At the same clock rate, a parallel interface can
transfer N times the data of a serial interface (where N is the number of data lines).
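The N-times claim is simple arithmetic, sketched here under idealized assumptions (one bit per line per clock, no framing or protocol overhead; the function name is ours):

```python
# Idealized raw throughput: one bit per data line per clock cycle.
# Serial interfaces have 1 data line; a parallel interface has N.

def throughput_bps(clock_hz, data_lines=1):
    """Raw bits per second moved at a given clock with N data lines."""
    return clock_hz * data_lines
```

At a 1 MHz clock, an 8-line parallel bus moves 8 Mbit/s against the serial line's 1 Mbit/s.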
Synchronous Vs Asynchronous
Synchronous transfers are synchronized by a reference clock on the interface. This clock signal is
generally provided by one of the communicating devices on the interface, called the master device.
However, the clock can also come from an external source. Asynchronous transfers do not use a
shared clock; instead, both ends agree beforehand on the timing of each bit (e.g., a baud rate, as in
UART).
Data Throughput
Interfaces can also be classified based on the data throughput they offer. Generally, parallel interfaces
provide much higher data throughput and are used for application data (data that needs to be processed
by the application). Serial interfaces offer lower data throughput, and are generally used to transfer
intermittent control data.
(2) Common Serial Peripherals
(a) UART (Universal Asynchronous Receiver Transmitter)
UART is one of the oldest and simplest serial interfaces. Generally, UART is used to transfer data
between different PCBs (Printed Circuit Boards). These PCBs can be either in the same system or across
different systems. In its simplest configuration, UART consists of a two-pin interface. One pin is used for
transmission, and the other for reception.
The data on the UART is transferred word by word. A word consists of a start bit, data bits (5 to 8), an
optional parity bit, and stop bits (1, 1.5 or 2). The individual bits of the data word are transferred one by
one on the serial bus.
Start Bit: The Tx line of a UART transmitter is high during periods of inactivity (when no communication
is taking place). When the transmitter wants to initiate a data transmission, it sends one START bit (drives
the Tx line low) for one bit duration.
Data Bits: The number of data bits can be configured to any value between 5 and 8. UART employs LSB-first
transmission.
Parity Bit: One parity bit can optionally be transmitted along with each data word. The parity can be
configured either as odd or as even.
Stop Bit: After each word transmission, the transmitter transmits stop bits (drives the Tx line high). The
number of stop bits can be configured as 1, 1.5 or 2.
Asynchronous Transmission: UART data transfers are asynchronous. The transmitter transmits each bit
(of the word being transmitted) for a fixed duration (defined by the baud rate). The receiver polls the value
of the transmit line (of the transmitter). In order to receive the data correctly, the receiver needs to be aware
of the duration for which each bit is transmitted (this is defined by the baud rate).
Baud Rate: Baud is a measurement of transmission speed in asynchronous communication. It is defined
as the number of distinct symbol changes made to the transmission medium per second. Since the UART
signal has only two levels (high and low), the baud rate here is also equal to the bit rate.
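The framing rules above (start bit, LSB-first data, optional parity, stop bits) can be sketched as follows. The helper names are ours, and the 10-bits-per-byte throughput figure assumes the common 8N1 configuration (8 data bits, no parity, 1 stop bit):

```python
# Sketch of UART framing: idle-high line, low start bit, data LSB first,
# optional parity, high stop bit(s). Helper names are illustrative only.

def uart_frame(byte, parity=None, stop_bits=1):
    """Return the bit sequence for one word, in the order it hits the wire."""
    bits = [0]                                   # start bit (line driven low)
    data = [(byte >> i) & 1 for i in range(8)]   # LSB-first data bits
    bits += data
    if parity == "even":
        bits.append(sum(data) % 2)               # make the count of 1s even
    elif parity == "odd":
        bits.append(1 - sum(data) % 2)           # make the count of 1s odd
    bits += [1] * stop_bits                      # stop bit(s), line back high
    return bits

def bytes_per_second(baud, bits_per_frame=10):
    """With 8N1 each byte costs 10 bit times, so throughput is baud / 10."""
    return baud // bits_per_frame
```

For example, 0x55 framed as 8N1 produces the alternating pattern [0, 1, 0, 1, 0, 1, 0, 1, 0, 1], and a 115200-baud link moves at most 11520 bytes per second.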
RS-232 and DB-9
UART can be used to transfer data directly across any two devices. However the most common usage of
UART involves transfer of data from a PC (or other host computer) to a remote board (other slave
device). Under such scenarios (where distance between two devices is more than a few inches), physical
interface between Tx and Rx devices is defined by RS-232 specifications. Signals at each end are
terminated to a 9-pin (DB-9) connector.
Debugging UART Interface
Following steps could be helpful while debugging communication problems on a UART interface
(a) UART loop-back: Run the internal loop-back tests on both Rx and Tx (most UART devices provide
this functionality). This will ensure that each device is functional (not damaged)
(b) Check the Configuration: If the communication between two devices is failing, there could be a
configuration mismatch between Tx and Rx. Cross-check the configuration at both sides and ensure that it
is identical.
(c) Check the Serial Cable: Generally two UARTs are connected through a serial cable (which has 9-pin
connectors on both sides). The cable should be a cross-over (Tx on one side connects to Rx on other side).
A faulty (damaged or wrongly crossed) serial cable can also cause erratic behavior. Make sure that the
cable is not damaged.
(d) Probe the Tx signal: If UART communication still remains erratic (after checks a, b and c), the last
resort would be to probe the UART signals using a scope.
Limitation: Both the sender and receiver should agree to a predefined configuration (baud rate, parity
settings, number of data and stop bits). A mismatch in the configuration at the two ends (transmitter and
receiver) will cause communication failure (data corruption). Data rates are very slow. Also, if more
devices are involved in communication, the number of external pins needed on the device increases
proportionally.
(b) SPI
Serial Peripheral Interface (SPI) provides an easy way to communicate across various (SPI compatible)
devices in a system. SPI involves synchronous data transfers. Examples of SPI-compatible peripherals are
microprocessors, data converters and LCD displays. Communication on the SPI bus occurs with a master
and slave relationship. Generally, a microprocessor acts as the SPI bus master, and peripheral devices
(such as data converters or displays) act as slave devices. At times, there could be multiple
microprocessors (or CPUs) on a given SPI bus. In such cases, a HOST processor will act as the SPI
master, and other processors will act as SPI slaves. Multi-master configurations (though rarely used) are
also possible.
SPI is a four-wire interface. The four signals on the SPI bus are:
* CLK: The clock signal is used for synchronizing data transfers. It is an output from the master and an
input to the slave.
* MISO: Stands for Master In, Slave Out. As the name suggests, it is an output from the slave and an
input to the master. This signal is used for transferring data from the slave device to the master device.
* MOSI: Stands for Master Out, Slave In. This signal is an output from the master and an input to the
slave. It is used for transferring data from the master device to the slave device.
* SSEL: Slave Select is an output from the master and an input to the slave. This signal needs to be
asserted (by the master) for any transfers to be recognized by the slave. In a multi-slave configuration,
the master device can have multiple slave select signals (one for each slave), and only the currently
selected slave (corresponding SSEL signal asserted) will acknowledge the data transfers.
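The full-duplex, shift-register nature of SPI (data moving on MOSI and MISO during the same clocks) can be modelled in a few lines. This is a mode-0, MSB-first sketch with invented names, not driver code:

```python
# Toy model of one SPI byte exchange (mode 0, MSB first). Master and slave
# shift registers trade one bit per clock, so the bus is inherently
# full duplex: a byte goes each way in the same 8 clocks.

def spi_exchange(master_byte, slave_byte):
    """Clock 8 bits; return (byte received by master, byte received by slave)."""
    miso_in, mosi_in = 0, 0
    for _ in range(8):
        mosi = (master_byte >> 7) & 1             # master drives MOSI with its MSB
        miso = (slave_byte >> 7) & 1              # slave drives MISO with its MSB
        master_byte = (master_byte << 1) & 0xFF   # both registers shift left
        slave_byte = (slave_byte << 1) & 0xFF
        miso_in = ((miso_in << 1) | miso) & 0xFF  # master samples MISO
        mosi_in = ((mosi_in << 1) | mosi) & 0xFF  # slave samples MOSI
    return miso_in, mosi_in
```

After 8 clocks the two bytes have simply swapped sides: `spi_exchange(0xA5, 0x3C)` returns `(0x3C, 0xA5)`.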
Multiple Slave Scenario
Under the SPI protocol, one master device can be connected to multiple slave devices through multiple
SSEL lines. The master asserts SSEL for only the device with which it wants to communicate. Selecting
multiple slaves at a time can damage the MISO pin (since multiple slaves will try to drive this line).
Multi-mas