8/3/2019 Memory and i
MEMORY AND I/O INTERFACING
MEMORY
Memory is an important part of embedded systems. The cost and performance of an embedded
system depend heavily on the kinds of memory devices it uses. In this section we discuss
memory classification, memory technologies and memory management.
(1) Memory Classification
Memory devices can be classified based on the following characteristics:
(a) Accessibility
(b) Persistence of Storage
(c) Storage Density & Cost
(d) Storage Media
(e) Power Consumption
Accessibility
Memory devices can provide random access, serial access or block access. In a random
access memory, each word can be accessed directly by specifying its address; RAM, SDRAM
and NOR flash are examples of random access memories. In a serial access memory, all the
words before the desired word must be accessed first; I2C PROM and SPI PROM are examples
of serial access memories. In a block access memory, the storage is subdivided into small
blocks (generally of the order of a kilobyte): each block can be accessed randomly, and
each word within a block is accessed serially. Hard disks and NAND flash employ this
mechanism. The word access time of a random access memory is independent of the word's
location, which is desirable for high-speed applications that access memory frequently.
Persistence of Storage
Memory devices provide either volatile or non-volatile storage. A non-volatile memory
preserves its contents even after power is shut down, whereas a volatile memory loses its
contents. Non-volatile storage is needed for application code and reusable data, while
volatile memory can be used for all temporary storage. RAM and SDRAM are examples of
volatile memories; hard disks, flash (NOR & NAND) memories, SD-MMC and ROM are examples
of non-volatile storage.
Storage Media
Memory devices may employ electronic storage (transistors or electron states), magnetic
storage or optical storage. RAM and SDRAM are examples of electronic storage. Hard
disks are examples of magnetic storage. CDs (Compact Discs) are examples of optical storage.
Older computers also employed magnetic storage (magnetic media are still common in some
consumer electronics products).
Storage Density & Cost
Storage density (the number of bits that can be stored per unit area) is generally a good
measure of cost. Dense memories (like SDRAM) are much cheaper than their less dense
counterparts (like SRAM).
Power Consumption
Low power consumption is highly desirable in battery-powered embedded systems. Such
systems generally employ memory devices that can operate at low (and ultra-low) voltage
levels. Mobile SDRAMs are examples of low-power memories.
(2) Memory Technologies
RAM
RAM stands for Random Access Memory. RAMs are the simplest and most common form of data
storage, and they are volatile. The figure below shows the typical data, address and
control signals on a RAM. The number of words a RAM can store grows as a power of two
with the number of address lines available. This severely restricts the storage capacity
of RAMs (a byte-addressable 32 GB RAM would require 35 address lines) because designing
circuit boards with more signal lines directly adds to their complexity and cost.
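The power-of-two relationship between address lines and capacity can be sketched as
follows (a hedged illustration; the function name is ours, not from any library):

```python
import math

def address_lines(capacity_bytes: int) -> int:
    """Number of address lines needed to byte-address a memory of the given size."""
    return math.ceil(math.log2(capacity_bytes))

# A byte-addressable 32 GB memory needs 35 lines, since 2**35 = 32 Gi.
print(address_lines(32 * 2**30))   # -> 35
# A modest 64 KB memory already needs 16 lines.
print(address_lines(64 * 2**10))   # -> 16
```

Each extra address line doubles the addressable capacity, which is why serial and
block-access devices (which need far fewer pins) dominate at large capacities.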
DPRAM (Dual Port RAM)
DPRAMs are static RAMs with two I/O ports. The two ports access the same memory locations
- hence DPRAMs are generally used to implement shared memories in dual-processor systems.
The operations performed on a single port are identical to those on any RAM. There are
some common problems associated with the use of DPRAMs:
(a) Possible data corruption when both ports try to access the same memory location -
most DPRAM devices provide interlocked memory accesses to avoid this problem.
(b) Data coherency problems when a cache scheme is used by a processor accessing the
DPRAM - this happens because any data modifications made in the DPRAM by one processor
are unknown to the cache controller of the other processor. To avoid such issues, shared
memories are not mapped to cacheable address space. If the processor's cache
configuration is not flexible enough to define the shared memory space as non-cacheable,
the cache needs to be flushed before performing any reads from this memory space.
Dynamic RAM
Dynamic RAMs use a different storage technique. A static RAM cell uses four (or more)
transistors, whereas a dynamic RAM cell uses only one transistor and relies on capacitive
storage. Since a capacitor loses its charge, these memories must be refreshed
periodically, which makes DRAMs more complex (extra control logic is needed) and more
power consuming. However, DRAMs have a very high storage density compared to static RAMs
and are much cheaper. DRAMs are generally accessed in terms of rows, columns and pages,
which significantly reduces the number of address lines (another advantage over SRAM).
Generally an SDRAM controller (which manages the SDRAM commands and address translation)
is needed to access an SDRAM; most modern processors come with an on-chip SDRAM
controller.
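The row/column access just described is what saves address pins: a linear address is
split in two halves that are presented over the same pins in sequence. A small sketch
(the function and bit widths are illustrative; real controllers also issue RAS/CAS
commands and handle refresh):

```python
def split_row_col(addr: int, col_bits: int):
    """Split a linear address into (row, column) as a DRAM controller would."""
    return addr >> col_bits, addr & ((1 << col_bits) - 1)

# A 24-bit address space served over only 12 multiplexed address pins:
row, col = split_row_col(0xABC123, col_bits=12)
print(hex(row), hex(col))   # -> 0xabc 0x123
```

The row is strobed first (RAS), then the column (CAS), so 24 address bits reach the
device over 12 pins - at the cost of a two-phase access.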
OTP-EPROM, UV-EPROM and EEPROM
EPROMs (Erasable Programmable Read Only Memories) are non-volatile memories. Contents of
a ROM can be randomly accessed - but conventionally the word RAM refers only to volatile
random access memories. The voltage required for writing to an EPROM is much higher than
its operating voltage. Hence you cannot write to an EPROM in-circuit (which is what makes
it "read only" in normal operation). You need special programming stations (which have a
write mechanism) to write to EPROMs.
OTP-EPROMs are one-time programmable: their contents cannot be changed once written.
UV-EPROMs are UV-erasable EPROMs: exposing the memory cells to UV light erases the
existing contents, after which the device can be re-programmed. EEPROMs are electrically
erasable EPROMs: these can be erased electrically (generally on the same programming
station used to write them). The number of write cycles (times you can erase and
re-write) for UV-EPROMs and EEPROMs is fairly limited. Erasable PROMs use either FLOTOX
(Floating gate Tunnel Oxide) or FAMOS (Floating gate Avalanche MOS) technology.
Flash (NOR)
Flash (or NOR flash, to be more accurate) is quite similar to EEPROM in usage and can be
considered a class of EEPROM (since it is electrically erasable). However, there are a
few differences. Firstly, flash devices are in-circuit programmable. Secondly, they are
much cheaper than conventional EEPROMs. These days NOR flash is widely used for storing
boot code.
NAND FLASH
These memories are denser and cheaper than NOR flash. However, they are block accessible
and cannot be used for code execution. They are mostly used for data storage (being
cheaper than NOR flash), although some systems use them for storing boot code (with
external hardware or with built-in NAND boot logic in the processor).
SD-MMC
SD-MMC cards provide a cheap means of mass storage. These memory cards can provide
storage capacities of the order of gigabytes. They are very compact and can be used in
portable systems: most modern hand-held devices requiring mass storage (e.g. still and
video cameras) use memory cards.
Hard Disc
Hard discs are magnetic memory devices. They are bulky and require further bulky hardware
(a disk drive) for reading. These memories are generally used for mass storage, hence
they do not appear in smaller, portable systems. However, they are used in embedded
systems which require bulk storage without tight size constraints.
(3) Memory Management
Cache Memory
The size and the speed (access time) of a memory are inversely related: increasing the
size means a reduction in speed. In fact, most memories are made up of smaller memory
blocks (generally 4 KB) in order to improve speed. The cost of a memory also depends
heavily on its speed. For good performance it is desirable that code and data reside in
high-speed memory; however, using high-speed memory for all the code and data in a
reasonably large system may be practically impossible. Even in a smaller system, using
high-speed memory as the only storage device can raise the system cost exponentially.
Most systems employ a hierarchical memory organization: a small, fast (and expensive)
memory device stores frequently used code and data, while less frequently used data is
stored in a large, slow (cheap) memory device. A complex system can have multiple levels
of memory hierarchy, graded by speed and cost.
A cache controller is hardware (generally built into the processor) which dynamically
moves the code and data currently in use from a higher-level (slower) memory to the
lowest-level (cache) memory. The incoming data or code replaces old code or data (not
currently in use) in the cache memory. This movement is hidden from the user.
Cache memories are based on the principle of locality in space and time. There are
different types of cache organization and replacement mechanisms.
Software Overlays
Why Overlays
Low-cost microprocessors generally do not have a built-in cache controller, but on these
devices it may still be desirable to keep the code (or data) currently in use in internal
memory and replace it with a new code section when it is no longer needed. This can be
done using software overlays. Either code or data overlays can be used; in this section
we discuss only code overlays (a similar analogy holds for data overlays).
Overlay Basics
(a) Each code section mapped to an overlay has a run space and a live space. The live
space is the location in external (higher-level) memory where the code section resides
when it is not running. The run space is the location in internal (lower-level) memory
where the code resides during execution.
(b) The overlay manager is a piece of software which dynamically moves code sections from
live space to run space (whenever a function from a given overlay section is called).
(c) The linker and loader tools generate overlay symbols for the code sections mapped to
overlays. The overlay symbols are supplemented by information about the run space and
live space of each overlay. The overlay manager uses this information to move the
overlays dynamically.
(d) You can have multiple overlays in a system. The sections of a given overlay have
different live spaces but the same run space.
Implementing overlays
(a) First, make sure that your code generation tools (linker and loader) provide the
minimum support (in terms of overlay symbols) needed for overlays.
(b) Second, identify mutually exclusive code sections in your application. Mutually
exclusive means that only one of these code sections can be in use at any given time.
Also make sure that the switching time between these sections (i.e. the average time
after which the processor will require code from a different section) is quite long;
otherwise software overlays will degrade performance rather than improve it.
(c) Make sure you have enough run space to accommodate the largest overlay section.
(d) While implementing code overlays, you can still choose to keep some code sections
(which are unlikely to improve performance if used as overlays) out of the overlays
(these sections have the same live space and run space).
Data overlays are analogous to code overlays, but they are rarely used.
Virtual Memory
The virtual memory mechanism allows users to store their data on a hard disk while still
using it as if it were available in RAM. The application accesses data in a virtual
address space (which is mapped to RAM), whereas the data physically resides on the hard
disk (and is moved to RAM for access).
Paging Mechanism
In virtual mode, memory is divided into pages, usually 4096 bytes long (the page size).
These pages may reside in any available RAM location that can be addressed in virtual
mode. The high-order bits of the memory address register are an index into page-mapping
tables at specific starting locations in memory; the table entries contain the starting
real addresses of the corresponding pages. The low-order bits of the address register are
an offset of 0 up to 4,095 (0 to page size - 1) into the page ultimately referenced by
resolving the table references.
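The address arithmetic described above can be sketched as follows. The page table
contents here are hypothetical; in a real system the table lives in memory and is walked
by hardware or the operating system.

```python
PAGE_SIZE = 4096                           # page size from the text
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 low-order offset bits

# Hypothetical single-level page table: virtual page number -> physical frame
page_table = {0: 7, 1: 3, 2: 9}

def translate(vaddr: int) -> int:
    vpn = vaddr >> OFFSET_BITS          # high-order bits index the page table
    offset = vaddr & (PAGE_SIZE - 1)    # low-order bits: 0 .. 4095
    return (page_table[vpn] << OFFSET_BITS) | offset

print(hex(translate(0x1ABC)))   # vpn 1 maps to frame 3 -> 0x3abc
```

The offset passes through unchanged; only the page number is translated, which is what
lets a contiguous virtual range map onto scattered physical frames.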
The distinct advantages of the virtual memory mechanism are:
(a) A user can access (in virtual space) more RAM than physically exists in the system.
(b) In a multi-tasking application, each task can have its own independent virtual
address space (called a discrete address space).
(c) Applications can treat data as if it were stored in contiguous memory (in the virtual
address space), whereas it may occupy discontiguous locations in actual memory.
Cache vs Virtual Memory
Cache memory and virtual memory are quite similar in concept and provide similar
benefits. However, the schemes differ significantly in implementation:
* Cache control is implemented fully in hardware; virtual memory management is done by
software (the operating system) with some minimal hardware support.
* With a cache in use, the user still addresses the actual physical memory (and the cache
is hidden from the user). With virtual memory it is the reverse: the user addresses
virtual memory, and the actual physical memory is hidden from the user.
Cache memory
The cache is a small amount of high-speed memory, usually with a memory cycle time
comparable to the time required by the CPU to fetch one instruction. The cache is usually
filled from main memory when instructions or data are fetched into the CPU. Often the
main memory supplies a wider data word to the cache than the CPU requires, to fill the
cache more rapidly. The amount of information replaced at one time in the cache is called
the line size of the cache. This is normally the width of the data bus between the cache
memory and the main memory. A wide line size means that several instruction or data words
are loaded into the cache at one time, providing a kind of prefetching for instructions
or data. Since the cache is small, its effectiveness relies on the following properties
of most programs:
* Spatial locality - most programs are highly sequential; the next instruction usually
comes from the next memory location.
* Data is usually structured, and data in these structures is normally stored in
contiguous memory locations.
* Short loops are a common program structure, especially in the innermost sets of nested
loops. This means that the same small set of instructions is used over and over.
* Generally, several operations are performed on the same data values, or variables.
When a cache is used, there must be some way for the memory controller to determine
whether the value currently being addressed is available in the cache. There are several
ways this can be accomplished. One possibility is to store both the address and the value
from main memory in the cache, with the address stored in a type of memory called
associative memory or, more descriptively, content addressable memory.
An associative memory, or content addressable memory, has the property that when a value
is presented to it, the address of the value is returned if the value is stored in the
memory; otherwise, an indication that the value is not present is returned. All of the
comparisons are done simultaneously, so the search is performed very quickly. This type
of memory is very expensive, because each memory location must have both a comparator and
a storage element. A cache memory can be implemented with a block of associative memory
together with a block of "ordinary" memory: the associative memory holds the addresses of
the data stored in the cache, and the ordinary memory contains the data at those
addresses. Such a cache memory might be configured as shown in the figure.
Figure: A cache implemented with associative memory
If the address is not found in the associative memory, then the value is obtained from
main memory.
Associative memory is very expensive, because a comparator is required for every word in
the memory to perform all the comparisons in parallel. A cheaper way to implement a cache
memory, without using expensive associative memory, is direct mapping. Here, part of the
memory address (usually the low-order bits) is used to address a word in the cache; this
part of the address is called the index. The remaining high-order bits, called the tag,
are stored in the cache memory along with the data.
For example, if a processor has an 18-bit memory address, a cache of 1 K words of 2 bytes
(16 bits) each, and the processor can address single bytes or 2-byte words, the memory
address field and cache might be organized as in the figure.
Figure: A direct mapped cache configuration
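The 18-bit address split in this example - 1 byte-select bit, a 10-bit index (1 K words)
and a 7-bit tag - can be sketched as a small simulation (a hedged illustration; the data
strings merely stand in for cached words):

```python
def split_address(addr: int):
    """Split an 18-bit byte address into (tag, index, byte-select)."""
    byte = addr & 0x1                # bit 0 selects the byte within a word
    index = (addr >> 1) & 0x3FF      # 10 index bits address 1 K cache words
    tag = addr >> 11                 # remaining 7 high-order bits are the tag
    return tag, index, byte

cache = {}   # index -> (tag, data): one line per index, i.e. direct mapped

def lookup(addr: int):
    tag, index, _ = split_address(addr)
    hit = index in cache and cache[index][0] == tag
    if not hit:
        cache[index] = (tag, f"mem[{addr & ~1:#x}]")   # fill from main memory
    return hit, cache[index][1]

lookup(0x2ABC)   # miss: line is filled
lookup(0x2ABD)   # hit: same word, other byte
lookup(0x3ABC)   # miss: same index, different tag, so the line is replaced
```

The last call shows the direct-mapped weakness discussed below: two addresses with the
same index evict each other even when the rest of the cache is empty.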
This was, in fact, the way the cache was organized in the PDP-11/60. In the 11/60,
however, four other bits are used to ensure that the data in the cache is valid. Three of
these are parity bits: one for each byte and one for the tag. The parity bits are used to
check that a single-bit error has not occurred in the data while in the cache. A fourth
bit, called the valid bit, indicates whether or not a given location in the cache is
valid. In the PDP-11/60, as in many other processors, the cache is not updated if memory
is altered by a device other than the CPU (for example, when a disk stores new data in
memory). When such a memory operation occurs at a location whose value is stored in the
cache, the valid bit is reset to show that the data is "stale" and does not correspond to
the data in main memory. The valid bit is also reset when power is first applied to the
processor or when the processor recovers from a power failure, because the data found in
the cache at that time will be invalid.
In the PDP-11/60, the data path from memory to cache was the same size (16 bits) as from
cache to the CPU. (In the PDP-11/70, a faster machine, the data path from the CPU to
cache was 16 bits, while from memory to cache it was 32 bits, which means that the cache
had effectively prefetched the next instruction approximately half of the time.) The
amount of information (instructions or data) stored with each tag in the cache is called
the line size of the cache. (It is usually the same size as the data path from main
memory to the cache.) A large line size allows the prefetching of a number of
instructions or data words. All items in a line of the cache are replaced simultaneously,
however, resulting in a larger block of data being replaced on each cache miss.
The MIPS R2000/R3000 had a built-in cache controller which could control a cache of up to
64 K bytes. For a similar 2 K word (8 K byte) cache, the MIPS processor would typically
have a cache configuration as shown in the figure. (Generally, the MIPS cache would be
larger - 64 K bytes would be typical, with line sizes of 1, 2 or 4 words.)
Figure: One possible MIPS cache organization
A characteristic of the direct mapped cache is that a particular memory address can be
mapped into only one cache location. Many memory addresses map to the same cache location
(in fact, all addresses with the same index field map to the same cache location).
Whenever a "cache miss" occurs, the cache line is replaced by a new line of information
from main memory at an address with the same index but a different tag.
Note that if the program "jumps around" in memory, this cache organization will likely
not be effective because the index range is limited. Also, if both instructions and data
are stored in the cache, they may well map into the same area of the cache and cause each
other to be replaced very often. This could happen, for example, if the code for a matrix
operation and the matrix data itself happened to have the same index values.
A more interesting configuration for a cache is the set associative cache, which uses a
set associative mapping. In this cache organization, a given memory location can be
mapped to more than one cache location: each index corresponds to two or more data words,
each with a corresponding tag. A set associative cache with n tag and data fields is
called an "n-way set associative cache". Usually n = 2^k, for k = 1, 2 or 3, is chosen
(k = 0 corresponds to direct mapping). Such n-way set associative caches allow
interesting tradeoffs: for a given total amount of memory, cache performance can be
improved by increasing the number of "ways" or by increasing the line size. An example of
a 2-way set associative cache is shown in the figure, which shows a cache containing a
total of 2 K lines, or 1 K sets, each set being 2-way associative. (The sets correspond
to the rows in the figure.)
Figure: A set-associative cache organization
In a 2-way set associative cache, if one data word at a particular index is empty on a
read miss, it is filled. If both data words are filled, then one must be overwritten by
the new data. Similarly, in an n-way set associative cache, if all n data and tag fields
in a set are filled, then one value in the set must be overwritten, or replaced, by the
new tag and data values. Note that an entire line must be replaced each time. The most
common replacement algorithms are:
Random - the location for the value to be replaced is chosen at random from the n cache
locations at that index position. In a 2-way set associative cache, this can be
accomplished with a single modulo-2 random variable obtained, say, from an internal
clock.
First in, first out (FIFO) - here the first value stored in the cache, at each index
position, is the value to be replaced. For a 2-way set associative cache, this
replacement strategy can be implemented by setting a pointer to the previously loaded
word each time a new word is stored in the cache; this pointer need only be a single bit.
(For set sizes > 2, this algorithm can be implemented with a counter value stored for
each "line", or index, in the cache, and the cache filled in a "round robin" fashion.)
Least recently used (LRU) - here the value which was actually used least recently is
replaced. In general, it is more likely that the most recently used value will be the one
required in the near future. For a 2-way set associative cache, this is readily
implemented by adding a single "USED" bit to each cache location: when a value is
accessed, the USED bit for the other word in the set is set while the bit for the
accessed word is reset. The value to be replaced is then the value with its USED bit set.
For an n-way set associative cache, this strategy can be implemented by storing a
modulo-n counter with each data word. (It is an interesting exercise to determine exactly
what must be done in this case; the required circuitry may become somewhat complex for
large n.)
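The USED-bit mechanism for one 2-way set can be sketched as follows (an illustrative
model; tags stand in for full address tags, and data is omitted):

```python
# Sketch of LRU in a single 2-way set using one USED bit per way.
class TwoWaySet:
    def __init__(self):
        self.ways = [None, None]       # each way holds just a tag here
        self.used = [False, False]     # USED bit set => that way is the LRU

    def access(self, tag):
        if tag in self.ways:                   # hit
            w = self.ways.index(tag)
        else:                                  # miss: pick a victim way
            if None in self.ways:
                w = self.ways.index(None)      # fill an empty way first
            else:
                w = self.used.index(True)      # replace the way with USED set
            self.ways[w] = tag
        self.used[w] = False                   # this way is now recently used
        self.used[1 - w] = True                # the other way becomes the LRU
        return w

s = TwoWaySet()
s.access("A"); s.access("B")   # set is now full
s.access("A")                  # hit: B becomes the LRU way
s.access("C")                  # miss: replaces B, not the recently used A
```

With n > 2 ways a single bit no longer suffices, which is the point of the modulo-n
counter mentioned above.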
Cache memories normally allow one of two things to happen when data is written into a
memory location whose value is stored in the cache:
Write through cache -- both the cache and main memory are updated at the same time.
This may slow down the execution of instructions which write data to memory, because
of the relatively longer write time to main memory. Buffering memory writes can help
speed up memory writes if they are relatively infrequent, however.
Write back cache - here only the cache is updated directly by the CPU; the cache memory
controller marks the value so that it can be written back into memory when the word is
removed from the cache. This method is used because a memory location may be altered
several times while it is in the cache, without the value ever needing to be written to
main memory. It is often implemented using an "ALTERED" bit in the cache. The ALTERED bit
is set whenever a cache value is written into by the processor, and only values whose
ALTERED bit is set must be written back into main memory. The value should be written
back immediately before it is replaced in the cache.
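The ALTERED-bit (write-back) behaviour can be sketched as follows (a minimal single-line
model; addresses and values are illustrative):

```python
# Write-back policy for one cache line, driven by an ALTERED (dirty) bit.
main_memory = {0x100: 1}
cache_line = {"addr": 0x100, "value": 1, "altered": False}

def cpu_write(value):
    cache_line["value"] = value
    cache_line["altered"] = True       # only the cache is updated

def evict():
    if cache_line["altered"]:          # write back only if the line was altered
        main_memory[cache_line["addr"]] = cache_line["value"]
        cache_line["altered"] = False

cpu_write(2)
cpu_write(3)          # several writes, still no main-memory traffic
evict()               # a single write-back delivers the final value
```

A write-through cache would instead update `main_memory` inside `cpu_write`, trading
extra memory traffic for a simpler consistency story.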
The MIPS R2000/3000 processors used the write-through approach, with a buffer for the
memory writes. (This was also the approach taken by the VAX-11/780 processor.) In
practice, memory writes are less frequent than memory reads: typically, for each memory
write, an instruction must be fetched from main memory, and usually two operands are
fetched as well. We might therefore expect about three times as many read operations as
write operations; in fact, there are often many more memory read operations than memory
write operations.
The figure shows the behaviour (actually the miss ratio, which is equal to 1 minus the
hit ratio) of cache memories for various combinations of total cache capacity and line
size. The results are from simulations of the behaviour of several "typical" program
mixes. Several interesting things can be seen from these figures: the miss ratio drops
consistently with cache size. Note also that increasing the line size is not always
effective in increasing the throughput of the processor, even though it decreases the
miss ratio, because of the additional time required to transfer large lines of data from
the main memory to the cache.
Figure: Cache memory performance for various line sizes
It is interesting to plot the same data using log-log coordinates; note that in this case
the graph is (very) roughly linear. The figure shows this plot.
Figure: Log-log plot of cache performance for various line sizes
The way size, or degree of associativity, of a cache also affects its performance: the
same reference determined that, for a fixed cache size, there was a roughly constant
ratio between the performance of caches with a given set associativity and direct-mapped
caches, independent of cache size. This relation is shown in the figure. (Of course, the
performance of the set associative caches improved with associativity.)
Figure: Cache performance adjustment for associativity (relative to direct mapping)
MEMORY MANAGEMENT UNIT
Modern MMUs typically divide the virtual address space (the range of addresses used by
the processor) into pages, each having a size which is a power of 2 - usually a few
kilobytes, but they may be much larger. The bottom n bits of the address (the offset
within a page) are left unchanged; the upper address bits form the (virtual) page number.
The MMU normally translates virtual page numbers to physical page numbers via an
associative cache called a Translation Lookaside Buffer (TLB). When the TLB lacks a
translation, a slower mechanism involving hardware-specific data structures or software
assistance is used. The entries found in such data structures are typically called page
table entries (PTEs), and the data structure itself is typically called a page table. The
physical page number is combined with the page offset to give the complete physical
address.
A PTE or TLB entry may also include information about whether the page has been written
to (the dirty bit), when it was last used (the accessed bit, for a least-recently-used
page replacement algorithm), what kinds of processes (user mode, supervisor mode) may
read and write it, and whether it should be cached.
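A PTE carrying such flag bits can be sketched as follows. The bit layout here is purely
hypothetical for illustration - real PTE layouts are architecture specific:

```python
# Hypothetical PTE layout: frame number in the high bits plus five flag bits.
VALID, DIRTY, ACCESSED, USER, WRITABLE = 1, 2, 4, 8, 16

def decode_pte(pte: int):
    """Unpack a packed page table entry into its fields."""
    return {
        "frame": pte >> 5,                  # physical frame number
        "valid": bool(pte & VALID),
        "dirty": bool(pte & DIRTY),         # page has been written to
        "accessed": bool(pte & ACCESSED),   # page was recently used (for LRU)
        "user": bool(pte & USER),           # user-mode access allowed
        "writable": bool(pte & WRITABLE),
    }

pte = (42 << 5) | VALID | ACCESSED | WRITABLE
decode_pte(pte)   # frame 42: valid, accessed, writable, not dirty
```

The OS reads the dirty bit at eviction time to decide whether the victim page must be
written back to disk, exactly as the write-back cache uses its ALTERED bit.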
Sometimes, a TLB entry or PTE prohibits access to a virtual page, perhaps because no
physical random access memory has been allocated to that virtual page. In this case the
MMU signals a page fault to the CPU. The operating system (OS) then handles the
situation, perhaps by trying to find a spare frame of RAM and setting up a new PTE to map
it to the requested virtual address. If no RAM is free, it may be necessary to choose an
existing page (known as a victim), using some replacement algorithm, and save it to disk
(this is called "paging"). With some MMUs there can also be a shortage of PTEs or TLB
entries, in which case the OS will have to free one for the new mapping.
In some cases a "page fault" may indicate a software bug. A key benefit of an MMU is memory
protection: an OS can use it to protect against errant programs, by disallowing access to memory
that a particular program should not have access to. Typically, an OS assigns each program its
own virtual address space.
An MMU also reduces the problem of fragmentation of memory. After blocks of memory have
been allocated and freed, the free memory may become fragmented (discontinuous), so that
the largest contiguous block of free memory may be much smaller than the total amount.
With virtual memory, a contiguous range of virtual addresses can be mapped to several
non-contiguous blocks of physical memory.
In some early microprocessor designs, memory management was performed by a separate
integrated circuit, such as the VLSI VI475, the Motorola 68851 used with the Motorola
68020 CPU in the Macintosh II, or the Z8015 used with the Zilog Z80 family of processors.
Later microprocessors, such as the Motorola 68030 and the Zilog Z280, placed the MMU
together with the CPU on the same integrated circuit, as did the Intel 80286 and later
x86 microprocessors.
While this article concentrates on modern MMUs, commonly based on pages, early systems used
a similar concept for base-limit addressing, that further developed into segmentation. Those are
occasionally also present on modern architectures. The x86 architecture provided segmentation
rather than paging in the 80286, and provides both paging and segmentation in the 80386 and
later processors (although the use of segmentation is not available in 64-bit operation).
Interrupts
We just discussed how CALL and JUMP instructions can break the linear code flow in an
application. Another event which can cause a change in program flow is an
"INTERRUPT". Interrupts are signals (hardware or software) which cause the processor to
stop the normal program flow and execute instructions from a certain pre-defined
location (known as the Interrupt Vector Address). Interrupts can be triggered by a hardware
event (e.g. the state of an external CPU pin) or a software event (e.g. an illegal operation such
as a divide by zero). A CPU can have multiple interrupt channels, and each of these channels has
its own unique interrupt vector address. When an interrupt occurs, the program sequencer starts
processing instructions from the Interrupt Vector Address (of the associated interrupt channel).
Similar to a CALL instruction, the Return Address (the address of the instruction which would
have been fetched in the absence of the interrupt event) is saved in one of the processor registers
(some CPUs also save the current system state along with the return address). An RTI (Return
From Interrupt) instruction (similar to RTS) brings the program flow back to the Return Address.
The code stored at the Interrupt Vector Address is called the Interrupt Service Routine (ISR).
An RTI instruction generally forms the last instruction of the ISR.
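The vectoring mechanism described above can be sketched in C as a table of function pointers indexed by interrupt channel. All names here (install_isr, simulate_interrupt, the table layout) are invented for illustration; on a real CPU the dispatch and return-address saving happen in hardware, and the RTI is modelled below simply by the function return.

```c
#include <stddef.h>

#define NUM_CHANNELS 8

typedef void (*isr_t)(void);

isr_t vector_table[NUM_CHANNELS];   /* one entry per interrupt channel */
int timer_ticks = 0;

void timer_isr(void) { timer_ticks++; }   /* body of an example ISR */

/* Store the ISR entry point at the channel's vector slot. */
void install_isr(int channel, isr_t handler) {
    vector_table[channel] = handler;
}

/* What the hardware does when channel 'ch' fires: save the return
 * address (implicit here, via the call stack), then execute from the
 * vector address; the ISR's final RTI is modelled by returning. */
void simulate_interrupt(int ch) {
    if (ch >= 0 && ch < NUM_CHANNELS && vector_table[ch] != NULL)
        vector_table[ch]();
}
```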
Interrupt Controller: the hardware inside the processor which is responsible for managing the
interrupt operations.
Enabling Interrupts: Interrupts (on most processors) can be enabled or disabled by the
programmer using a (Global) Interrupt Enable bit. Interrupt Controllers also provide an option
for enabling or disabling each individual interrupt (at a local level).
Interrupt Masking: An Interrupt Mask is a control word (generally stored in an Interrupt Mask
Register) which can be used to temporarily disable an interrupt (on a particular channel). The
Interrupt Mask contains control bits (mask bits) for each interrupt channel. If a mask bit is set,
the interrupt for the corresponding interrupt channel is temporarily masked (and it remains
masked until the mask bit is cleared).
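As a minimal sketch of this per-channel masking, the IMR can be modelled as a plain variable manipulated with bit operations. On real hardware it would be a volatile register at a fixed address; the names below are illustrative.

```c
#include <stdint.h>

uint32_t imr = 0;   /* stand-in for the Interrupt Mask Register */

void mask_irq(int ch)   { imr |=  (1u << ch); }   /* set mask bit: disable channel */
void unmask_irq(int ch) { imr &= ~(1u << ch); }   /* clear mask bit: re-enable     */
int  irq_masked(int ch) { return (int)((imr >> ch) & 1u); }
```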
Interrupt Priority: Interrupt channels are associated with different priority levels. If two
interrupts are acknowledged by the Interrupt Controller at the same time, then the higher priority
interrupt is processed first. The interrupt priority scheme helps ensure that more important
(interrupt) events get processed before less critical ones. Critical events (e.g.
system power failure) are assigned the highest priority.
Interrupt Mapping: Some Interrupt Controllers also provide the flexibility of mapping the interrupt
sources (events that generate interrupts) to any of the available interrupt channels. This scheme has
two major advantages. Firstly, in a system, (generally) not all the interrupt sources are active at
a time. A fixed mapping (from source to channel) means that many of the interrupt channels will
be un-utilized. With a flexible mapping, however, it is possible to provide fewer interrupt
channels (and active sources can be mapped to these channels). This reduces the hardware
complexity of the Interrupt Controller, and hence its cost. The Interrupt Controller can also provide
a provision for mapping multiple sources to a single interrupt channel. In the ISR (for a particular
interrupt), the interrupt source (out of the many sources mapped to this channel) can be identified
by reading the interrupt status register (this register has the corresponding bit set if an interrupt
event has occurred). Secondly, the interrupt sources can be assigned to interrupt channels with
different priorities, based on the system requirements.
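The status-register demultiplexing just described can be sketched as a loop over the pending bits. The register and the acknowledge behaviour are simulated with ordinary variables here; the names are made up for illustration.

```c
#include <stdint.h>

uint32_t int_status = 0;   /* bit n set => source n is pending */
int serviced[32];          /* count of how often each source was handled */

/* ISR for a channel shared by several sources: read the status
 * register, service each pending source, and acknowledge it. */
void shared_channel_isr(void) {
    for (int src = 0; src < 32; src++) {
        if (int_status & (1u << src)) {
            serviced[src]++;              /* source-specific work would go here */
            int_status &= ~(1u << src);   /* acknowledge this source */
        }
    }
}
```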
Interrupts can be categorized into: maskable interrupt, non-maskable interrupt (NMI), inter-
processor interrupt (IPI), software interrupt, and spurious interrupt.
Maskable interrupt (IRQ) is a hardware interrupt that may be ignored by
setting a bit in an interrupt mask register's (IMR) bit-mask.
Non-maskable interrupt (NMI) is a hardware interrupt that lacks an associated
bit-mask, so that it can never be ignored. NMIs are often used for timers,
especially watchdog timers.
Inter-processor interrupt (IPI) is a special case of interrupt that is generated
by one processor to interrupt another processor in a multiprocessor system.
Software interrupt is an interrupt generated within a processor by
executing an instruction. Software interrupts are often used to implement
system calls because they implement a subroutine call with a CPU ring level
change.
Spurious interrupt is a hardware interrupt that is unwanted. They are
typically generated by system conditions such as electrical interference on an
interrupt line or through incorrectly designed hardware.
Processors typically have an internal interrupt mask which allows software to ignore all external
hardware interrupts while it is set. Setting this mask may offer faster access than accessing an
interrupt mask register (IMR) in a PIC, or disabling interrupts in the device itself. In some cases,
such as the x86 architecture, disabling and enabling interrupts on the processor itself acts as a
memory barrier; however, it may actually be slower.
An interrupt that leaves the machine in a well-defined state is called a precise interrupt. Such
an interrupt has four properties: The Program Counter (PC) is saved in a known place.
All instructions before the one pointed to by the PC have fully executed.
No instruction beyond the one pointed to by the PC has been executed (there
is no prohibition on executing instructions beyond that point, it is just that any changes they
make to registers or memory must be undone before the interrupt happens).
The execution state of the instruction pointed to by the PC is known.
An interrupt that does not meet these requirements is called an imprecise interrupt.
Modern MMUs typically divide the virtual address space (the range of addresses used by the
processor) into pages, each having a size which is a power of 2, usually a few kilobytes, but they
may be much larger. The bottom n bits of the address (the offset within a page) are left
unchanged. The upper address bits are the (virtual) page number. The MMU normally translates
virtual page numbers to physical page numbers via an associative cache called a Translation
Lookaside Buffer (TLB). When the TLB lacks a translation, a slower mechanism involving
hardware-specific data structures or software assistance is used. The data found in such data
structures are typically called page table entries (PTEs), and the data structure itself is typically
called a page table. The physical page number is combined with the page offset to give the
complete physical address.
A PTE or TLB entry may also include information about whether the page has been written to
(the dirty bit), when it was last used (the accessed bit, for a least recently used page replacement
algorithm), what kind of processes (user mode, supervisor mode) may read and write it, and
whether it should be cached.
Sometimes, a TLB entry or PTE prohibits access to a virtual page, perhaps because no physical
random access memory has been allocated to that virtual page. In this case the MMU signals a
page fault to the CPU. The operating system (OS) then handles the situation, perhaps by trying to
find a spare frame of RAM and setting up a new PTE to map it to the requested virtual address. If no
RAM is free, it may be necessary to choose an existing page (known as a victim), using some
replacement algorithm, and save it to disk (this is called "paging"). With some MMUs, there can
also be a shortage of PTEs or TLB entries, in which case the OS will have to free one for the new
mapping.
In some cases a "page fault" may indicate a software bug. A key benefit of an MMU is memory
protection: an OS can use it to protect against errant programs, by disallowing access to memory
that a particular program should not have access to. Typically, an OS assigns each program its
own virtual address space.
DMA
DMA (Direct Memory Access) provides an efficient way of transferring data between "a peripheral
and memory" or between "two memory regions". The DMA engine is a processing engine which can
perform data transfer operations (to or from the memory). In the absence of a DMA engine, the CPU
needs to handle these data operations itself, and the overall system performance is heavily reduced.
DMA is specifically useful in systems which involve huge data transfers (in the absence of DMA, the
CPU will be busy doing these transfers most of the time and will not be available for other
processing).
DMA Parameters: DMA transfers involve a Source and a Destination; the DMA engine transfers
the data from Source to Destination. The DMA engine requires the source and destination addresses
along with the Transfer Count in order to perform the data transfers. The (Source or Destination)
Address can be a physical address (in the case of a memory) or logical (in the case of a peripheral).
The Transfer Count specifies the number of words which need to be transferred. As mentioned
before, a data transfer can be either from a Peripheral to Memory (generally called Receive
DMA), from Memory to a Peripheral (generally called Transmit DMA), or from one Memory
region to another (generally called Memory DMA).
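A minimal sketch of the parameter set for one channel is shown below. The struct layout and the function name are invented for illustration, and the memory-to-memory "transfer" is simulated in software so the idea stays runnable; real hardware would perform it autonomously, in parallel with the CPU.

```c
#include <stdint.h>
#include <string.h>

/* The three essential DMA parameters: source, destination, count. */
typedef struct {
    const uint8_t *src;     /* source address                 */
    uint8_t       *dst;     /* destination address            */
    uint32_t       count;   /* number of (byte) words to move */
} dma_channel_t;

/* Simulated "kick off the channel": the hardware would do this copy
 * on its own once the parameters are programmed. */
void dma_start(dma_channel_t *ch) {
    memcpy(ch->dst, ch->src, ch->count);
}
```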
Some DMA engines support additional parameters like Word-Size and Address-Increment in
addition to the Start Address and Transfer Count. Word-Size specifies the size of each transfer.
Address-Increment specifies the offset from the current address (in memory) which the next transfer
should use. This provides a way of transferring data from non-contiguous memory locations.
DMA Channels: A DMA engine can support multiple DMA channels. This means that at a given
time, multiple DMA transfers can be in flight (though physically only one transfer may be in
progress, logically the DMA engine can handle many channels in parallel). This feature makes the
life of the software programmer very easy (as he does not have to wait for the current DMA
operation to finish before programming the next one). Each DMA channel has control registers
where the DMA parameters can be specified. Each DMA channel also has an interrupt associated
with it (on most processors) which (optionally) triggers after completion of the DMA transfer. Inside
the ISR, the programmer can take specific action (e.g. do some processing on the data which has
just been received through DMA, or program a new DMA transfer).
Chained DMA: Certain DMA controllers support an option for specifying the DMA parameters in a
buffer (or array) in memory rather than writing them directly to the DMA control registers (this
mostly applies to the second DMA operation - the parameters for the first DMA operation are still
specified in the control registers). This buffer is called a DMA Transfer Control Block (TCB).
The DMA controller takes the address of the DMA TCB as one of its parameters (in addition to the
control parameters for the first DMA transfer) and loads the DMA parameters (for the second DMA
operation) automatically from memory (after the first DMA operation is over). The TCB also
contains an entry for the "Next TCB Address", which provides an easy way of chaining multiple
DMA operations in an automatic fashion (rather than having to program each one after completion
of the previous). The DMA chaining can be stopped by specifying a ZERO address in the Next TCB
Address field.
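The TCB chain described above is, in effect, a linked list walked by the controller. The sketch below simulates that walk in software; the field names are illustrative, and a NULL next pointer plays the role of the ZERO "Next TCB Address" that stops the chain.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* One Transfer Control Block: parameters for one transfer plus the
 * address of the next TCB in the chain. */
typedef struct tcb {
    const uint8_t *src;
    uint8_t       *dst;
    uint32_t       count;
    struct tcb    *next;    /* "Next TCB Address"; NULL terminates */
} tcb_t;

/* Simulated controller: perform each transfer, then reload the next
 * TCB from memory, until the chain ends. Returns the number of
 * transfers performed. */
int dma_run_chain(const tcb_t *t) {
    int transfers = 0;
    while (t != NULL) {
        memcpy(t->dst, t->src, t->count);
        t = t->next;
        transfers++;
    }
    return transfers;
}
```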
Multi-dimensional DMA: combined with Address-Increment, this gives many options, for example
transferring a rectangular sub-block of a two-dimensional array in a single DMA operation.
The simplest way to use DMA is to select a processor with an internal DMA controller. This
eliminates the need for external bus buffers and ensures that the timing is handled
correctly. Also, an internal DMA controller can transfer data to on-chip memory and
peripherals, which is something that an external DMA controller cannot do. Because the
handshake is handled on-chip, the overhead of entering and exiting DMA mode is often
much faster than when an external controller is used.
If an external DMA controller or processor is used, be sure that the hardware handles the
transition between transfers correctly. To avoid the problem of bus contention, ensure that
bus requests are inhibited if the bus is not free. This prevents the DMA controller from
requesting the bus before the processor has reacquired it after a transfer.
So you see, DMA is not as mysterious as it sometimes seems. DMA transfers can provide
real advantages when the system is properly designed.
Figure 1: A DMA controller shares the processor's memory
Hardware interrupts were introduced as a way to avoid wasting the processor's valuable time in
polling loops, waiting for external events. They may be implemented in hardware as a distinct
system with control lines, or they may be integrated into the memory subsystem.
If implemented in hardware, an interrupt controller circuit such as the IBM PC's Programmable
Interrupt Controller (PIC) may be connected between the interrupting device and the processor's
interrupt pin to multiplex several sources of interrupt onto the one or two CPU lines typically
available. If implemented as part of the memory controller, interrupts are mapped into the
system's memory address space.
SERIAL PROTOCOLS
I2C Bus
The physical I2C bus
This is just two wires, called SCL and SDA. SCL is the clock line. It is used to synchronize all data
transfers over the I2C bus. SDA is the data line. The SCL & SDA lines are connected to all devices
on the I2C bus. There needs to be a third wire which is just the ground or 0 volts. There may also be
a 5-volt wire if power is being distributed to the devices. Both SCL and SDA lines are "open drain"
drivers. What this means is that the chip can drive its output low, but it cannot drive it high. For the
line to be able to go high you must provide pull-up resistors to the 5v supply. There should be a
resistor from the SCL line to the 5v line and another from the SDA line to the 5v line. You only need
one set of pull-up resistors for the whole I2C bus, not for each device, as illustrated below:
The value of the resistors is not critical. I have seen anything from 1k8 (1800 ohms) to 47k (47000
ohms) used. 1k8, 4k7 and 10k are common values, but anything in this range should work OK. I
recommend 1k8 as this gives you the best performance. If the resistors are missing, the SCL and
SDA lines will always be low - nearly 0 volts - and the I2C bus will not work.
Masters and Slaves
The devices on the I2C bus are either masters or slaves. The master is always the device that drives
the SCL clock line. The slaves are the devices that respond to the master. A slave cannot initiate a
http://en.wikipedia.org/wiki/Polling_(computer_science)http://en.wikipedia.org/wiki/Programmable_Interrupt_Controllerhttp://en.wikipedia.org/wiki/Programmable_Interrupt_Controllerhttp://en.wikipedia.org/wiki/Memory_controllerhttp://en.wikipedia.org/wiki/Address_spacehttp://en.wikipedia.org/wiki/Polling_(computer_science)http://en.wikipedia.org/wiki/Programmable_Interrupt_Controllerhttp://en.wikipedia.org/wiki/Programmable_Interrupt_Controllerhttp://en.wikipedia.org/wiki/Memory_controllerhttp://en.wikipedia.org/wiki/Address_space8/3/2019 Memory and i
20/52
transfer over the I2C bus, only a master can do that. There can be, and usually are, multiple slaves
on the I2C bus, however there is normally only one master. It is possible to have multiple masters,
but it is unusual and not covered here. On your robot, the master will be your controller and the
slaves will be our modules such as the SRF08 or CMPS03. Slaves will never initiate a transfer. Both
master and slave can transfer data over the I2C bus, but that transfer is always controlled by the
master.
The I2C Physical Protocol
When the master (your controller) wishes to talk to a slave (our CMPS03 for example) it begins by
issuing a start sequence on the I2C bus. A start sequence is one of two special sequences defined
for the I2C bus, the other being the stop sequence. The start sequence and stop sequence are
special in that these are the only places where the SDA (data line) is allowed to change while the
SCL (clock line) is high. When data is being transferred, SDA must remain stable and not change
whilst SCL is high. The start and stop sequences mark the beginning and end of a transaction with
the slave device.
Data is transferred in sequences of 8 bits. The bits are placed on the SDA line starting with the MSB
(Most Significant Bit). The SCL line is then pulsed high, then low. Remember that the chip cannot
really drive the line high, it simply "lets go" of it and the resistor actually pulls it high. For every 8 bits
transferred, the device receiving the data sends back an acknowledge bit, so there are actually 9
SCL clock pulses to transfer each 8 bit byte of data. If the receiving device sends back a low ACK
bit, then it has received the data and is ready to accept another byte. If it sends back a high then it is
indicating it cannot accept any further data and the master should terminate the transfer by sending
a stop sequence.
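The byte transfer just described (MSB first, one SCL pulse per bit, a 9th pulse for the ACK) can be sketched as bit-banged GPIO code. Here set_sda and pulse_scl are stand-ins for real pin operations; they simply record the bit sequence so the shifting logic can be checked, and the actual ACK sampling is omitted.

```c
#include <stdint.h>

uint8_t sent_bits[9];   /* SDA level recorded at each SCL pulse */
int     scl_pulses = 0;

void set_sda(uint8_t bit) { if (scl_pulses < 9) sent_bits[scl_pulses] = bit; }
void pulse_scl(void)      { scl_pulses++; }   /* would toggle SCL high, then low */

/* Shift one byte out, MSB first; data must be stable while SCL is high. */
void i2c_write_byte(uint8_t byte) {
    scl_pulses = 0;
    for (int i = 7; i >= 0; i--) {
        set_sda((byte >> i) & 1u);   /* place next bit on SDA */
        pulse_scl();                 /* clock it out          */
    }
    set_sda(1);    /* release SDA so the slave can drive the ACK bit */
    pulse_scl();   /* 9th clock: ACK would be sampled here */
}
```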
How fast?
The standard clock (SCL) speed for I2C is up to 100KHz. Philips do define faster speeds: Fast mode,
which is up to 400KHz, and High Speed mode, which is up to 3.4MHz. All of our modules are
designed to work at up to 100KHz. We have tested our modules up to 1MHz but this needs a small
delay of a few uS between each byte transferred. In practical robots, we have never had any need to
use high SCL speeds. Keep SCL at or below 100KHz and then forget about it.
I2C Device Addressing
All I2C addresses are either 7 bits or 10 bits. The use of 10 bit addresses is rare and is not covered
here. All of our modules and the common chips you will use will have 7 bit addresses. This means
that you can have up to 128 devices on the I2C bus, since a 7bit number can be from 0 to 127.
When sending out the 7 bit address, we still always send 8 bits. The extra bit is used to inform the
slave if the master is writing to it or reading from it. If the bit is zero, the master is writing to the slave.
If the bit is 1 the master is reading from the slave. The 7 bit address is placed in the upper 7 bits of
the byte and the Read/Write (R/W) bit is in the LSB (Least Significant Bit).
The placement of the 7 bit address in the upper 7 bits of the byte is a source of confusion for the
newcomer. It means that to write to address 21, you must actually send out 42, which is 21 moved
over by 1 bit. It is probably easier to think of the I2C bus addresses as 8 bit addresses, with even
addresses as write only, and the odd addresses as the read address for the same device. To take
our CMPS03 for example, this is at address 0xC0 ($C0). You would use 0xC0 to write to the
CMPS03 and 0xC1 to read from it. So the read/write bit just makes it an odd/even address.
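The shift-by-one relationship above is a one-liner in code. A small sketch (function names are illustrative):

```c
#include <stdint.h>

/* The 7-bit address occupies the upper 7 bits of the address byte;
 * bit 0 is the R/W flag. So the write address is addr<<1 (even) and
 * the read address is (addr<<1)|1 (odd). */
uint8_t i2c_write_addr(uint8_t addr7) { return (uint8_t)(addr7 << 1); }
uint8_t i2c_read_addr (uint8_t addr7) { return (uint8_t)((addr7 << 1) | 1u); }
```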
The I2C Software Protocol
The first thing that will happen is that the master will send out a start sequence. This will alert all the
slave devices on the bus that a transaction is starting and they should listen in case it is for them.
Next the master will send out the device address. The slave that matches this address will continue
with the transaction, any others will ignore the rest of this transaction and wait for the next. Having
addressed the slave device the master must now send out the internal location or register number
inside the slave that it wishes to write to or read from. This number is obviously dependent on what
the slave actually is and how many internal registers it has. Some very simple devices do not have
any, but most do, including all of our modules. Our CMPS03 has 16 locations numbered 0-15. The
SRF08 has 36. Having sent the I2C address and the internal register address the master can now
send the data byte (or bytes, it doesn't have to be just one). The master can continue to send data
bytes to the slave and these will normally be placed in the following registers because the slave will
automatically increment the internal register address after each byte. When the master has finished
writing all data to the slave, it sends a stop sequence which completes the transaction. So to write to
a slave device:
1. Send a start sequence
2. Send the I2C address of the slave with the R/W bit low (even address)
3. Send the internal register number you want to write to
4. Send the data byte
5. [Optionally, send any further data bytes]
6. Send the stop sequence.
As an example, you have an SRF08 at the factory default address of 0xE0. To start the SRF08
ranging you would write 0x51 to the command register at 0x00 like this:
1. Send a start sequence
2. Send 0xE0 ( I2C address of the SRF08 with the R/W bit low (even address)
3. Send 0x00 (Internal address of the command register)
4. Send 0x51 (The command to start the SRF08 ranging)
5. Send the stop sequence.
Reading from the Slave
This is a little more complicated - but not too much more. Before reading data from the slave device,
you must tell it which of its internal addresses you want to read. So a read of the slave actually starts
off by writing to it. This is the same as when you want to write to it: You send the start sequence, the
I2C address of the slave with the R/W bit low (even address) and the internal register number you
want to write to. Now you send another start sequence (sometimes called a restart) and the I2C
address again - this time with the read bit set. You then read as many data bytes as you wish and
terminate the transaction with a stop sequence. So to read the compass bearing as a byte from the
CMPS03 module:
1. Send a start sequence
2. Send 0xC0 ( I2C address of the CMPS03 with the R/W bit low (even address)
3. Send 0x01 (Internal address of the bearing register)
4. Send a start sequence again (repeated start)
5. Send 0xC1 ( I2C address of the CMPS03 with the R/W bit high (odd address)
6. Read data byte from CMPS03
7. Send the stop sequence.
The bit sequence will look like this:
Wait a moment
That's almost it for simple I2C communications, but there is one more complication. When the
master is reading from the slave, it's the slave that places the data on the SDA line, but it's the master
that controls the clock. What if the slave is not ready to send the data? With devices such as
EEPROMs this is not a problem, but when the slave device is actually a microprocessor with other
things to do, it can be a problem. The microprocessor on the slave device will need to go to an
interrupt routine, save its working registers, find out what address the master wants to read from, get
the data and place it in its transmission register. This can take many uS to happen, meanwhile the
master is blissfully sending out clock pulses on the SCL line that the slave cannot respond to. The
I2C protocol provides a solution to this: the slave is allowed to hold the SCL line low! This is called
clock stretching. When the slave gets the read command from the master, it holds the clock line low.
The microprocessor then gets the requested data, places it in the transmission register and releases
the clock line, allowing the pull-up resistor to finally pull it high. From the master's point of view, it will
issue the first clock pulse of the read by making SCL high and then check to see if it really has gone
high. If it's still low then it's the slave that's holding it low, and the master should wait until it goes high
before continuing. Luckily, the hardware I2C ports on most microprocessors will handle this
automatically.
CAN BUS
Controller Area Network (CAN) is a multicast shared serial bus standard, originally
developed in the 1980s by Robert Bosch GmbH, for connecting electronic control
units (ECUs). CAN was specifically designed to be robust in electromagnetically
noisy environments and can utilize a differential balanced line like RS-485. It can be
even more robust against noise if twisted pair wire is used. Although initially
created for automotive purposes (as a vehicle bus), nowadays it is used in many
embedded control applications (e.g., industrial) that may be subject to noise.
Bit rates up to 1 Mbit/s are possible at network lengths below 40 m. Decreasing the
bit rate allows longer network distances (e.g. 125 kbit/s at 500 m).
The CAN data link layer protocol is standardized in ISO 11898-1 (2003). This
standard describes mainly the data link layer, composed of the Logical Link
Control (LLC) sublayer and the Media Access Control (MAC) sublayer, and some
aspects of the physical layer of the ISO/OSI Reference Model. All the other protocol
layers are left to the network designer's choice.
CAN transmits data through a binary model of "dominant" bits and "recessive" bits,
where dominant is a logical 0 and recessive is a logical 1. If one node transmits a
dominant bit and another node transmits a recessive bit then the dominant bit
"wins" (a logical AND between the two).
So, if you are transmitting a recessive bit, and someone sends a dominant bit, you
see a dominant bit, and you know there was a collision. (All other collisions are
invisible.) The way this works is that a dominant bit is asserted by creating a
voltage across the wires while a recessive bit is simply not asserted on the bus. If
anyone sets a voltage difference, everyone sees it; hence, dominant.
Commonly, when used with a differential bus, a Carrier Sense Multiple
Access/Bitwise Arbitration (CSMA/BA) scheme is implemented: if two or more
devices start transmitting at the same time, there is a priority-based arbitration
scheme to decide which one will be granted permission to continue transmitting.
During arbitration, each transmitting node monitors the bus state and compares the
received bit with the transmitted bit. If a dominant bit is received when a recessive
bit is transmitted then the node stops transmitting (i.e., it lost arbitration).
8/3/2019 Memory and i
24/52
Arbitration is performed during the transmission of the identifier field. Each node
starting to transmit at the same time sends an ID with dominant as binary 0,
starting from the high bit. As soon as their ID is a larger number (lower priority)
they'll be sending 1 (recessive) and see 0 (dominant), so they back off. At the end
of ID transmission, all nodes bar one have backed off, and the highest priority
message gets through unimpeded.
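The arbitration walk-through above can be simulated directly: each bus bit is the wired-AND of all still-transmitting nodes, and a node that sends recessive but sees dominant backs off. The function below is a software model (not driver code) for up to 32 contending nodes.

```c
#include <stdint.h>

/* Simulate bitwise arbitration over 'id_bits' identifier bits.
 * Returns the winning identifier: the one with the most dominant
 * (0) bits early on, i.e. the numerically lowest ID. */
uint32_t can_arbitrate(const uint32_t *ids, int n, int id_bits) {
    int alive[32];
    for (int i = 0; i < n; i++) alive[i] = 1;
    for (int b = id_bits - 1; b >= 0; b--) {     /* high bit first */
        uint32_t bus = 1;                        /* recessive unless driven */
        for (int i = 0; i < n; i++)
            if (alive[i])
                bus &= (ids[i] >> b) & 1u;       /* wired-AND of all nodes */
        for (int i = 0; i < n; i++)              /* sent recessive, saw
                                                    dominant: lost, back off */
            if (alive[i] && ((ids[i] >> b) & 1u) != bus)
                alive[i] = 0;
    }
    for (int i = 0; i < n; i++)
        if (alive[i]) return ids[i];
    return 0;
}
```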
Data transmission
Frames: all frames (aka messages) begin with a start-of-frame (SOF) bit that, obviously, denotes
the start of the frame transmission.
CAN has four frame types:
Data frame: a frame containing node data for transmission
Remote frame: a frame requesting the transmission of a specific identifier
Error frame: a frame transmitted by any node detecting an error
Overload frame: a frame to inject a delay between data and/or remote frames
Data frame
The data frame is the only frame for actual data transmission. There are two message
formats:
Base frame format: with 11 identifier bits
Extended frame format: with 29 identifier bits
The CAN standard requires that an implementation must accept the base frame format and may
accept the extended frame format, but must tolerate the extended frame format.
USB Protocols
Unlike RS-232 and similar serial interfaces where the format of data being sent is not defined,
USB is made up of several layers of protocols. While this sounds complicated, don't give up
now. Once you understand what is going on, you really only have to worry about the higher level
layers. In fact most USB controller I.C.s will take care of the lower layer, thus making it almost
invisible to the end designer.
Each USB transaction consists of a
o Token Packet (header defining what it expects to follow), an
o Optional Data Packet (containing the payload), and a
o Status Packet (used to acknowledge transactions and to provide a
means of error correction)
As we have already discussed, USB is a host centric bus. The host initiates all transactions. The
first packet, also called a token is generated by the host to describe what is to follow and whether
the data transaction will be a read or write, and what the device's address and designated endpoint
are. The next packet is generally a data packet carrying the payload and is followed by a
handshaking packet, reporting if the data or token was received successfully, or if the endpoint is
stalled or not available to accept data.
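The token/data/handshake sequence can be illustrated with a toy host-side routine. Everything here (the `ToyDevice` class, the method names, the retry count) is invented for illustration; real controllers implement this exchange in silicon:

```python
# Toy model of USB's token -> data -> handshake transaction pattern.
# All names are invented for this sketch; this is not real USB stack code.

def out_transaction(device, payload, max_retries=3):
    """Host-side OUT transaction: send token, then data, read the handshake."""
    for _ in range(max_retries):
        device.token("OUT")           # token packet: what follows and for whom
        hs = device.data(payload)     # data packet carrying the payload
        if hs == "ACK":               # handshake reports success...
            return True
    return False                      # ...or the endpoint kept NAKing

class ToyDevice:
    def __init__(self, busy_for=0):
        self.busy_for = busy_for      # NAK this many attempts, then accept
        self.buffer = None
        self.expecting = None
    def token(self, pid):
        self.expecting = pid
    def data(self, payload):
        assert self.expecting == "OUT"
        if self.busy_for > 0:
            self.busy_for -= 1
            return "NAK"              # temporarily unable to receive
        self.buffer = payload
        return "ACK"
```

A device that NAKs once still succeeds on retry: `out_transaction(ToyDevice(busy_for=1), b"hi")` returns `True`.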
Common USB Packet Fields
Data on the USB bus is transmitted LSB (least significant bit) first. USB packets consist of the following fields:
o Sync
All packets must start with a sync field. The sync field is 8 bits long at low and
full speed or 32 bits long for high speed and is used to synchronise the clock of
the receiver with that of the transmitter. The last two bits indicate where the PID
field starts.
o PID
PID stands for Packet ID. This field is used to identify the type of packet that is
being sent. The following table shows the possible values.
Group       PID Value   Packet Identifier

Token       0001        OUT Token
            1001        IN Token
            0101        SOF Token
            1101        SETUP Token

Data        0011        DATA0
            1011        DATA1
            0111        DATA2
            1111        MDATA

Handshake   0010        ACK
            1010        NAK
            1110        STALL
            0110        NYET (No Response Yet)

Special     1100        PREamble
            1100        ERR
            1000        Split
            0100        Ping
There are 4 bits to the PID; however, to ensure it is received correctly, the 4 bits
are complemented and repeated, making an 8 bit PID in total. The resulting
format is shown below.

PID0 | PID1 | PID2 | PID3 | nPID0 | nPID1 | nPID2 | nPID3

o ADDR
The address field specifies which device the packet is designated for. Being 7 bits
in length allows 127 devices to be supported. Address 0 is not valid, as any
device which has not yet been assigned an address must respond to packets sent to
address zero.
o ENDP
The endpoint field is made up of 4 bits, allowing 16 possible endpoints. Low-speed
devices, however, can only have 2 additional endpoints on top of the default
pipe (4 endpoints max).
o CRC
Cyclic Redundancy Checks are performed on the data within the packet payload.
All token packets have a 5 bit CRC while data packets have a 16 bit CRC.
o EOP
End of packet. Signalled by a Single Ended Zero (SE0) for approximately 2 bit
times followed by a J for 1 bit time.
USB Packet Types
USB has four different packet types. Token packets indicate the type of transaction to follow,
data packets contain the payload, handshake packets are used for acknowledging data or
reporting errors and start of frame packets indicate the start of a new frame.
o Token Packets
There are three types of token packets,
In - Informs the USB device that the host wishes to read
information.
Out - Informs the USB device that the host wishes to send
information.
Setup - Used to begin control transfers.
Token Packets must conform to the following format,
Sync | PID | ADDR | ENDP | CRC5 | EOP
o Data Packets
There are two types of data packets each capable of transmitting up to 1024 bytes
of data.
Data0
Data1
High Speed mode defines another two data PIDs, DATA2 and MDATA.
Data packets have the following format,
Sync | PID | Data | CRC16 | EOP

Maximum data payload size for low-speed devices is 8 bytes.
Maximum data payload size for full-speed devices is 1023 bytes.
Maximum data payload size for high-speed devices is 1024 bytes.
Data must be sent in multiples of bytes.
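As a quick worked example of the payload limits above, one can estimate how many data packets a transfer needs at each bus speed. The helper is hypothetical, and it idealizes things: a real endpoint's wMaxPacketSize may be smaller than these bus-speed maxima:

```python
import math

# Maximum data-packet payload by bus speed (from the list above).
MAX_PAYLOAD = {"low": 8, "full": 1023, "high": 1024}

def packets_needed(nbytes, speed):
    """Number of data packets a transfer of nbytes requires at a given speed."""
    return max(1, math.ceil(nbytes / MAX_PAYLOAD[speed]))
```

For example, moving 64 bytes over a low-speed link takes 8 data packets, while a high-speed link carries 2048 bytes in just 2.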
o Handshake Packets
There are three types of handshake packets, which consist simply of the PID:
ACK - Acknowledgment that the packet has been successfully
received.
NAK - Reports that the device temporarily cannot send or
receive data. Also used during interrupt transactions to inform
the host there is no data to send.
STALL - The device finds itself in a state where it requires
intervention from the host.
Handshake Packets have the following format,
Sync PID EOP
o Start of Frame Packets
The SOF packet, consisting of an 11-bit frame number, is sent by the host every
1 ms ± 500 ns on a full speed bus, or every 125 µs ± 0.0625 µs on a high speed
bus.
Sync | PID | Frame Number | CRC5 | EOP

USB Functions
When we think of a USB device, we think of a USB peripheral, but a USB device could mean a
USB transceiver device used at the host or peripheral, a USB Hub or Host Controller IC device,
or a USB peripheral device. The standard therefore makes references to USB functions which
can be seen as USB devices which provide a capability or function such as a Printer, Zip Drive,
Scanner, Modem or other peripheral.
So by now we should know the sort of things which make up a USB packet. No? You've
forgotten how many bits make up a PID field already? Well don't be too alarmed. Fortunately
most USB functions handle the low level USB protocols up to the transaction layer (which we
will cover next chapter) in silicon. The reason why we cover this information is most USB
function controllers will report errors such as PID Encoding Error. Without briefly covering this,
one could ask what is a PID Encoding Error? If you suggested that the last four bits of the PID
didn't match the inverse of the first four bits then you would be right.
Most functions will have a series of buffers, typically 8 bytes long. Each buffer will belong to an
endpoint - EP0 IN, EP0 OUT etc. Say for example, the host sends a device descriptor request.
The function hardware will read the setup packet and determine from the address field whether
the packet is for itself, and if so will copy the payload of the following data packet to the
appropriate endpoint buffer dictated by the value in the endpoint field of the setup token. It will
then send a handshake packet to acknowledge the reception of the byte and generate an internal
interrupt within the semiconductor/micro-controller for the appropriate endpoint signifying it has
received a packet. This is typically all done in hardware.
The software now gets an interrupt, and should read the contents of the endpoint buffer and parse
the device descriptor request.
PCI LOCAL BUS
The PCI (Peripheral Component Interconnect) is a high performance Bus for interconnecting
chips, expansion boards, and memory cards. It originated at Intel in the early 1990s as a
standard method of interconnecting chips on a board. It was later adopted as an industry
standard administered by the PCI Special Interest Group, or PCI-SIG.

(VL-Bus stands for VESA Local Bus, a local bus architecture created by VESA, the Video
Electronics Standards Association. It was popular in early-1990s computers, typically used
for VGA cards that drove the graphics of the computer display.)
The basic form of the PCI presents a fusion of sorts between ISA and VL-Bus. It provides
direct access to system memory for connected devices, but uses a bridge to connect to
the frontside bus and therefore to the CPU. Basically, this means that it is capable of even
higher performance than VL-Bus while eliminating the potential for interference with the
CPU. PCI can connect more devices than VL-Bus, up to five external components. Each of
the five connectors for an external component can be replaced with two fixed devices on the
motherboard. Also, you can have more than one PCI bus on the same computer, although
this is rarely done. The PCI bridge chip regulates the speed of the PCI bus independently of
the CPU's speed. This provides a higher degree of reliability and ensures that PCI hardware
manufacturers know exactly what to design for.
PCI originally operated at 33 MHz using a 32-bit-wide path. Revisions to the standard
include increasing the speed from 33 MHz to 66 MHz and doubling the bit count to 64.
Currently, PCI-X provides for 64-bit transfers at a speed of 133 MHz, for a transfer rate of
roughly 1 GBps (gigabyte per second).
PCI cards use 47 pins to connect. The PCI bus is able to work with so few pins because of
hardware multiplexing, which means that the device sends more than one signal over a
single pin. The connectors at the end of the card are connected to the motherboard slot and
are called gold fingers.
PERIPHERALS
Peripherals (of a processor) are its means of communicating with the external world.
(1) Peripheral Classification
Peripherals can be classified based on the following characteristics:
Simplex, Duplex & Semi-Duplex

Simplex communication involves unidirectional data transfers. Duplex communication involves bi-
directional data transfers. Full-duplex interfaces have independent channels for transmission and
reception. Semi-duplex communication also involves bi-directional data transfers; however, at a given
time, data transfer is only possible in one direction. Semi-duplex interfaces use the same
communication channel for both transmission and reception.
Serial Vs Parallel
Serial peripherals communicate over a single data line. The data at the Tx end needs to be converted
from parallel to serial before transmission, and the data at the Rx end needs to be converted from serial
to parallel after reception. Serial peripherals imply fewer signal lines on the external interface and thus
reduced hardware (circuit board) complexity and cost. However, the data rate on serial interfaces is
fairly limited (as compared to a parallel interface). At the same clock rate, a parallel interface can
transfer N times the data of a serial interface (where N is the number of data lines).
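The N-times claim is simple arithmetic, sketched here under idealized assumptions (one bit per line per clock, no framing or protocol overhead; the function name is ours):

```python
# Idealized raw throughput: one bit per data line per clock cycle.
# Serial interfaces have 1 data line; a parallel interface has N.

def throughput_bps(clock_hz, data_lines=1):
    """Raw bits per second moved at a given clock with N data lines."""
    return clock_hz * data_lines
```

At a 1 MHz clock, an 8-line parallel bus moves 8 Mbit/s against the serial line's 1 Mbit/s.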
Synchronous Vs Asynchronous
Synchronous transfers are synchronized by a reference clock on the interface. This clock signal is
generally provided by one of the communicating devices on the interface, called the master device.
However, the clock can also come from an external source. Asynchronous transfers do not use a
shared clock; instead, both ends agree beforehand on the timing of each bit (e.g., a baud rate, as in
UART).
Data Throughput
Interfaces can also be classified based on the data throughput they offer. Generally, parallel interfaces
provide much higher data throughput and are used for application data (data that needs to be processed
by the application). Serial interfaces offer lower data throughput, and are generally used to transfer
intermittent control data.
(2) Common Serial Peripherals
(a) UART (Universal Asynchronous Receiver Transmitter)
UART is one of the oldest and simplest serial interfaces. Generally, UART is used to transfer data
between different PCBs (Printed Circuit Boards). These PCBs can be either in the same system or across
different systems. In its simplest configuration, UART consists of a two-pin interface. One pin is used for
transmission, and the other for reception.
The data on the UART is transferred word by word. A word consists of a start bit, data bits (5 to 8), an
optional parity bit, and stop bits (1, 1.5 or 2). The individual bits of the data word are transferred one by
one on the serial bus.
Start Bit: The Tx line of a UART transmitter is high during periods of inactivity (when no communication
is taking place). When the transmitter wants to initiate a data transmission, it sends one START bit (drives
the Tx line low) for one bit duration.
Data Bits: The number of data bits can be configured to any value between 5 and 8. UART employs LSB-first
transmission.
Parity Bit: One parity bit can optionally be transmitted along with each data word. The parity can be
configured either as odd or as even.
Stop Bit: After each word transmission, the transmitter transmits stop bits (drives the Tx line high). The
number of stop bits can be configured as 1, 1.5 or 2.
Asynchronous Transmission: UART data transfers are asynchronous. The transmitter transmits each bit
(of the word being transmitted) for a fixed duration (defined by the baud rate). The receiver polls the value
of the transmit line (of the transmitter). In order to receive the data correctly, the receiver needs to be aware
of the duration for which each bit is transmitted (this is defined by the baud rate).
Baud Rate: Baud is a measurement of transmission speed in asynchronous communication. It is defined
as the number of distinct symbol changes made to the transmission medium per second. Since the UART
signal has only two levels (high and low), the baud rate here is also equal to the bit rate.
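The framing rules above (start bit, LSB-first data, optional parity, stop bits) can be sketched as follows. The helper names are ours, and the 10-bits-per-byte throughput figure assumes the common 8N1 configuration (8 data bits, no parity, 1 stop bit):

```python
# Sketch of UART framing: idle-high line, low start bit, data LSB first,
# optional parity, high stop bit(s). Helper names are illustrative only.

def uart_frame(byte, parity=None, stop_bits=1):
    """Return the bit sequence for one word, in the order it hits the wire."""
    bits = [0]                                   # start bit (line driven low)
    data = [(byte >> i) & 1 for i in range(8)]   # LSB-first data bits
    bits += data
    if parity == "even":
        bits.append(sum(data) % 2)               # make the count of 1s even
    elif parity == "odd":
        bits.append(1 - sum(data) % 2)           # make the count of 1s odd
    bits += [1] * stop_bits                      # stop bit(s), line back high
    return bits

def bytes_per_second(baud, bits_per_frame=10):
    """With 8N1 each byte costs 10 bit times, so throughput is baud / 10."""
    return baud // bits_per_frame
```

For example, 0x55 framed as 8N1 produces the alternating pattern [0, 1, 0, 1, 0, 1, 0, 1, 0, 1], and a 115200-baud link moves at most 11520 bytes per second.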
RS-232 and DB-9
UART can be used to transfer data directly across any two devices. However the most common usage of
UART involves transfer of data from a PC (or other host computer) to a remote board (other slave
device). Under such scenarios (where distance between two devices is more than a few inches), physical
interface between Tx and Rx devices is defined by RS-232 specifications. Signals at each end are
terminated to a 9-pin (DB-9) connector.
Debugging UART Interface
Following steps could be helpful while debugging communication problems on a UART interface
(a) UART loop-back: Run the internal loop-back tests on both Rx and Tx (most UART devices provide
this functionality). This will ensure that each device is functional (not damaged)
(b) Check the Configuration: If the communication between two devices is failing, there could be a
configuration mismatch between Tx and Rx. Cross-check the configuration at both sides and ensure that it
is identical.
(c) Check the Serial Cable: Generally two UARTs are connected through a serial cable (which has 9-pin
connectors on both sides). The cable should be a cross-over (Tx on one side connects to Rx on other side).
A faulty (damaged or wrongly crossed) serial cable can also cause erratic behavior. Make sure that the
cable is not damaged.
(d) Probe the Tx signal: If UART communication still remains erratic (after checks a, b and c), the last
resort would be to probe the UART signals using a scope.
Limitation: Both the sender and receiver should agree to a predefined configuration (baud rate, parity
settings, number of data and stop bits). A mismatch in the configuration at the two ends (transmitter and
receiver) will cause communication failure (data corruption). Data rates are very slow. Also, if more
devices are involved in communication, the number of external pins needed on the device increases
proportionally.
(b) SPI
Serial Peripheral Interface (SPI) provides an easy way to communicate across various (SPI compatible)
devices in a system. SPI involves synchronous data transfers. Examples of SPI-compatible peripherals are
microprocessors, data converters and LCD displays. Communication on the SPI bus occurs with a master
and slave relationship. Generally, a microprocessor acts as the SPI bus master, and peripheral devices
(such as data converters or displays) act as slave devices. At times, there could be multiple
microprocessors (or CPUs) on a given SPI bus. In such cases, a HOST processor will act as the SPI
master, and other processors will act as SPI slaves. Multi-master configurations (though rarely used) are
also possible.
SPI is a four-wire interface. The four signals on the SPI bus are:
* CLK: The clock signal is used for synchronizing data transfers. It is an output from the master and an
input to the slave.
* MISO: Stands for Master In, Slave Out. As the name suggests, it is an output from the slave and an
input to the master. This signal is used for transferring data from the slave device to the master device.
* MOSI: Stands for Master Out, Slave In. This signal is an output from the master and an input to the
slave. It is used for transferring data from the master device to the slave device.
* SSEL: Slave Select is an output from the master and an input to the slave. This signal needs to be
asserted (by the master) for any transfers to be recognized by the slave. In a multi-slave configuration,
the master device can have multiple slave select signals (one for each slave), and only the currently
selected slave (corresponding SSEL signal asserted) will acknowledge the data transfers.
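The full-duplex, shift-register nature of SPI (data moving on MOSI and MISO during the same clocks) can be modelled in a few lines. This is a mode-0, MSB-first sketch with invented names, not driver code:

```python
# Toy model of one SPI byte exchange (mode 0, MSB first). Master and slave
# shift registers trade one bit per clock, so the bus is inherently
# full duplex: a byte goes each way in the same 8 clocks.

def spi_exchange(master_byte, slave_byte):
    """Clock 8 bits; return (byte received by master, byte received by slave)."""
    miso_in, mosi_in = 0, 0
    for _ in range(8):
        mosi = (master_byte >> 7) & 1             # master drives MOSI with its MSB
        miso = (slave_byte >> 7) & 1              # slave drives MISO with its MSB
        master_byte = (master_byte << 1) & 0xFF   # both registers shift left
        slave_byte = (slave_byte << 1) & 0xFF
        miso_in = ((miso_in << 1) | miso) & 0xFF  # master samples MISO
        mosi_in = ((mosi_in << 1) | mosi) & 0xFF  # slave samples MOSI
    return miso_in, mosi_in
```

After 8 clocks the two bytes have simply swapped sides: `spi_exchange(0xA5, 0x3C)` returns `(0x3C, 0xA5)`.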
Multiple Slave Scenario
Under the SPI protocol, one master device can be connected to multiple slave devices through multiple
SSEL lines. The master asserts SSEL for only the device with which it wants to communicate. Selecting
multiple slaves at a time can damage the MISO pin (since multiple slaves will try to drive this line).
Multi-mas