CHAPTER 12
Computer Organization and Architecture: Themes and Variations, 1st Edition, Clements
© 2014 Cengage Learning Engineering. All Rights Reserved.

Input/Output

Input/Output is concerned with the mechanisms by which information is moved around a computer and between a computer and peripherals.

Figure 12.1 describes a generic system with a CPU, I/O controllers and peripherals, and a system bus that links the CPU to memory and peripherals. The word peripheral appears twice in Figure 12.1: it describes both an external device, such as a printer or a mouse, that is connected to a computer, and the controller that provides an appropriate interface between the external peripheral and the CPU.

The processor and memory lie at the heart of the system. The peripheral interfaces, connecting the processor and its memory to peripherals, are shown in two boxes; one includes internal peripherals, such as disk drives, and the other includes external peripherals, such as modems, printers, and scanners.

Memory-mapped Peripherals

There's no fundamental difference between an I/O transaction and a memory access. Outputting a word to a peripheral is the same as storing a word in memory, and getting a word from a peripheral is exactly the same as reading a word from memory. Treating I/O transactions as memory accesses is called memory-mapped I/O. This doesn't mean that we can forget about I/O because it's just like accessing memory; the properties of random access memory are radically different from those of typical I/O systems.

When implementing I/O structures we have to take into account the characteristics of the I/O devices themselves; for example, when writing a file to a disk drive you might have to send a new byte of data every few microseconds. Figure 12.3 shows what a typical memory-mapped I/O port (peripheral interface chip) looks like to the processor.

To the host CPU this peripheral appears as the sequence of consecutive memory locations described by Figure 12.4. The left-hand side of the peripheral interface, shaded gray in Figure 12.3, looks exactly like a memory element as far as the CPU is concerned. The other half of the peripheral interface chip, shown in blue, is the peripheral side that performs the specific operations required by the interface.

The memory-mapped port of Figure 12.4 has four consecutive registers at addresses i, i + 1, i + 2, and i + 3. We have assumed that the peripheral is an 8-bit device and that its consecutive locations are each separated by one byte. In a system with a 32-bit data bus, the addresses of the registers would be i, i + 4, i + 8, and i + 12. The first location at address i contains a command register that defines the operating mode and characteristics of the peripheral. Most memory-mapped I/O ports can be configured to operate in several modes, according to the specific application.

The location at address i + 1 contains the port's status, which is set up by the associated peripheral. This status information can be read by the processor to determine whether the port is ready to take part in a data transaction or whether an error condition exists; for example, a printer connected to a memory-mapped I/O port might set an error bit to indicate that it is out of paper. In this example we've created generic status bits such as ERRout, ERRin, RDYout, and RDYin. The locations at addresses i + 2 and i + 3 are used to send data to the peripheral or to receive data from the peripheral.

Peripheral Register Addressing Mechanisms

The command and data-to-peripheral registers are write-only, and the status and data-from-peripheral registers are read-only. A single address line can distinguish between two pairs of registers (i.e., command/status and data in/data out). The processor's read and write signals distinguish between the read-only and write-only registers. Table 12.1 demonstrates this register-addressing scheme. The peripheral provides four internal registers, but the processor sees only two unique locations, N and N + 4. The CPU's R/W* output is used to select one of two pairs of registers. When R/W* = 0, the write-only registers are selected, and when R/W* = 1, the read-only registers are selected. Figure 12.5 emphasizes the way in which peripheral register space can be divided into read-only and write-only regions.

Register address    Function    CPU address    R/W*
i                   status      N              1
i + 1               data out    N + 4          1
i + 2               control     N              0
i + 3               data in     N + 4          0
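The register-selection scheme described above can be sketched as a small software model. This is only an illustration, not the behavior of any particular chip: the class name `Port`, the base address value chosen for `N`, and the attribute names are all hypothetical, and the register roles follow the text's description (command and data-to-peripheral are write-only; status and data-from-peripheral are read-only).

```python
# Hypothetical model: two CPU addresses (N and N + 4) plus the direction of the
# access (the R/W* signal) select one of four internal registers.
N = 0x8000  # assumed base address, chosen only for illustration


class Port:
    def __init__(self):
        self.command = 0          # write-only, reached by a write to N
        self.status = 0           # read-only, reached by a read from N
        self.to_peripheral = 0    # write-only, reached by a write to N + 4
        self.from_peripheral = 0  # read-only, reached by a read from N + 4

    def write(self, address, value):
        """Bus write cycle (R/W* = 0): only write-only registers are visible."""
        if address == N:
            self.command = value
        elif address == N + 4:
            self.to_peripheral = value
        else:
            raise ValueError("address not decoded by this port")

    def read(self, address):
        """Bus read cycle (R/W* = 1): only read-only registers are visible."""
        if address == N:
            return self.status
        if address == N + 4:
            return self.from_peripheral
        raise ValueError("address not decoded by this port")
```

Note that writing to N and then reading from N does not return the value written, because the two accesses reach different registers. That is one of the ways in which a memory-mapped port differs from ordinary memory.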

Figure 12.6 illustrates a register file addressed by a counter. After the peripheral interface has been reset, the internal pointer is loaded with zero. Each successive access to the interface increments the pointer and selects the next register. Peripherals with auto-incrementing pointers are useful when the registers will always be accessed in sequence.
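A register file with an auto-incrementing pointer might be modeled as follows. The class and method names are hypothetical, the registers are assumed to be 8 bits wide, and the wrap-around of the pointer after the last register is an assumption made to keep the sketch simple.

```python
class AutoIncrementRegisterFile:
    """Register file reached through one address; an internal pointer picks the register."""

    def __init__(self, size=4):
        self.registers = [0] * size
        self.pointer = 0

    def reset(self):
        # Resetting the interface loads the internal pointer with zero.
        self.pointer = 0

    def _advance(self):
        # Assumption: the pointer wraps to zero after the last register.
        self.pointer = (self.pointer + 1) % len(self.registers)

    def write(self, value):
        self.registers[self.pointer] = value & 0xFF  # assumed 8-bit registers
        self._advance()

    def read(self):
        value = self.registers[self.pointer]
        self._advance()
        return value
```

Because every access moves the pointer, the order of accesses matters: an extra or missing access leaves all later accesses aimed at the wrong register.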

Peripheral Access and Bus Width

Many peripherals have 8-bit-wide data buses and are interfaced to computers with 16- or 32-bit buses. Life is easy when 8-bit peripherals are connected to 8-bit data buses with 8-bit processors, or when 16-bit peripherals are connected to 16-bit buses with 16-bit processors. Things get more complicated when 8-bit peripherals are interfaced to 16- or 32-bit buses.

Two problems can arise when you interface an 8-bit peripheral to a 16-bit bus: endianism and the mapping of 8-bit registers onto a processor's 16-bit address space. Consider the arrangement in Figure 12.7 where an 8-bit peripheral is interfaced to a 16-bit bus. The peripheral is connected to half the bus's data lines. If the processor supports 8-bit bus transactions, all is well and the registers can be accessed at their byte addresses (at byte offsets 0, 1, 2, and 3).

If the processor supports only 16-bit bus operations, then when a 16-bit value is written to memory all 16 bits are put on the data bus. When the processor performs a byte access, it still carries out a word access but informs the processor interface or memory that only 8 bits are to be transferred. A separate control or address signal is required to specify whether the byte being accessed is the upper or lower byte at the current address.

In this case, the peripheral is hard-wired to one half of the data bus and can respond only to either odd or even byte addresses. In a big-endian environment, the peripheral would be wired to data lines [0:7] and accessed at odd addresses, whereas in a little-endian environment the peripheral would be wired to data lines [0:7] and accessed at even addresses. The peripheral's four addresses would appear to the computer at byte offsets of 0, 2, 4, and 6.

Some processors have dedicated instructions to facilitate data transfer to byte-wide peripherals; for example, the 32-bit 68K has a MOVEP (move peripheral) instruction that copies a 16- or 32-bit value to or from an 8-bit memory-mapped peripheral. Figure 12.8 shows a peripheral with four internal registers and the CPU's address map, where the peripheral's data space is mapped onto successive odd addresses in this big-endian processor's memory space.

Figure 12.9 shows a peripheral with four 8-bit registers. The registers appear to the programmer as locations $08 0001, $08 0003, $08 0005, and $08 0007. Locations $08 0000, $08 0002, $08 0004, and $08 0006 cannot be accessed. MOVEP moves a 16/32-bit value between a register and a byte-wide peripheral. The contents of the register are moved to consecutive even (or odd) byte addresses; for example, MOVEP.L D2, (A0) copies the four bytes in register D2 to addresses [A0] + 0, [A0] + 2, [A0] + 4, and [A0] + 6, where A0 is a pointer register.

Figure 12.10 demonstrates how a MOVEP.L D0, (A0) copies four bytes in D0 to successive odd addresses in memory, starting at location $08 0001. The suffix .L in 68K code indicates a 32-bit operation and .B indicates a byte operation. The most-significant byte in the data register is transferred to the lowest address.

Without the MOVEP instruction, it would take the following code to move four bytes to a memory-mapped peripheral.

    MOVE.L #Peri,A0     ;A0 points to the memory-mapped peripheral
    MOVE.B D0,(6,A0)    ;Move least-significant byte of D0 to the peripheral
    ROR.L  #8,D0        ;Rotate D0 to get the next 8 bits
    MOVE.B D0,(4,A0)    ;Move the next byte, bits 8 to 15, to the peripheral
    ROR.L  #8,D0        ;and so on
    MOVE.B D0,(2,A0)
    ROR.L  #8,D0
    MOVE.B D0,(0,A0)
    ROR.L  #8,D0        ;After four rotations D0 is back to its old value
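The byte placement performed by a 32-bit MOVEP can be checked with a short model. The function name `movep_long` is hypothetical; the sketch simply lists which byte would land at which address, using the rule stated above that the most-significant byte goes to the lowest address and that successive bytes are two addresses apart.

```python
def movep_long(value, base):
    """Return the (address, byte) pairs a MOVEP.L-style transfer would write.

    value: 32-bit register contents; base: address of the first peripheral register.
    """
    pairs = []
    for k in range(4):
        shift = 24 - 8 * k                    # most-significant byte first
        byte = (value >> shift) & 0xFF
        pairs.append((base + 2 * k, byte))    # alternate byte addresses
    return pairs


# Example: D0 = $12345678 written to a peripheral based at $08 0001.
for address, byte in movep_long(0x12345678, 0x080001):
    print(f"${address:06X} <- ${byte:02X}")
```

The same pairs result from the MOVE.B and ROR.L sequence, which writes the bytes in the opposite order in time but to the same addresses.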

Preserving Order in I/O Operations

RISC architectures provide only memory load and store operations and don't implement instructions that facilitate I/O operations. However, there are circumstances where RISC organization and memory-mapped I/O clash. Some memory-mapped peripherals have configuration and self-resetting status registers or autoincrementing pointers. It's important to access such peripherals in the appropriate programmer-defined sequence. Because superscalar RISC processors take an opportunistic approach to memory access, data can be stored in memory out-of-order. Such out-of-order memory accessing doesn't cause problems with data storage and retrieval, but it can disrupt memory-mapped I/O.

The PowerPC implements an EIEIO (enforce in-order execution of I/O) instruction that has no parameters but ensures that all memory accesses previously initiated are completed. Consider this example where two loads are followed by an addition.

    lwz r5,1000(r0)    ;load r5 from memory[1000]
    lwz r6,1040(r0)    ;load r6 from memory[1040]
    add r7,r5,r6       ;r7 = r5 + r6

When these instructions are executed, the processor may swap the order in which r5 and r6 are loaded from memory. As long as the first two loads are executed before the add instruction, the outcome is not dependent on the order of the loads. Now suppose that addresses 1000 and 1040 are memory-mapped locations. If the peripheral is designed so that a read access to address 1000 updates a register at 1040, the sequence of the two load instructions becomes all-important and reversing their order may lead to an incorrect result.

Consider the following example where we have to update a peripheral. Because the register is accessed via a pointer, we write the register address to the peripheral's pointer register before writing data to the register being pointed at. In this example, we want to load peripheral register number 35 with the value 99. The PowerPC code is:

    addi r5,r0,35       ;r5 = 35
    addi r6,r0,99       ;r6 = 99
    stw  r5,1234(r0)    ;store 35 at memory location 1234 (the pointer)
    stw  r6,5678(r0)    ;store 99 at memory location 5678

The two writes must be executed in the correct order. To ensure this, the PowerPC has three synchronization instructions: eieio, sync, and isync. The isync forces instructions or memory transactions to complete before continuing; that is, instructions prior to isync are executed and fetched instructions are discarded.

Then, a new fetch begins. EIEIO forces all posted writes to complete prior to any subsequent writes. The SYNC instruction forces all previous reads and writes to complete on the bus before executing any instructions after it. We can ensure that the previous code runs in the correct order by inserting an EIEIO between the writes.

    addi r5,r0,35       ;r5 = 35
    addi r6,r0,99       ;r6 = 99
    stw  r5,1234(r0)    ;M[1234] = 35; we're changing register 35
    eieio               ;Make sure r5 is written before proceeding
    stw  r6,5678(r0)    ;M[5678] = 99; new register value is 99
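Why the order of the two stores matters can be seen in a small simulation of the pointer-addressed peripheral. This is a sketch under stated assumptions: the class `PointerPeripheral` and the helper `run` are hypothetical, the pointer is assumed to be zero before the first store, and the reordering is imposed by hand rather than produced by a real processor.

```python
POINTER_ADDR = 1234   # address of the peripheral's pointer register (from the example)
DATA_ADDR = 5678      # address through which the selected register is written


class PointerPeripheral:
    def __init__(self):
        self.pointer = 0      # assumption: pointer is zero before the first store
        self.registers = {}   # internal registers, keyed by register number

    def store(self, address, value):
        if address == POINTER_ADDR:
            self.pointer = value
        elif address == DATA_ADDR:
            self.registers[self.pointer] = value


def run(stores):
    """Apply a list of (address, value) stores in the order given."""
    peripheral = PointerPeripheral()
    for address, value in stores:
        peripheral.store(address, value)
    return peripheral.registers


in_order = run([(POINTER_ADDR, 35), (DATA_ADDR, 99)])    # programmer's order
reordered = run([(DATA_ADDR, 99), (POINTER_ADDR, 35)])   # stores swapped

print(in_order)    # register 35 receives 99
print(reordered)   # register 0 receives 99 instead
```

In ordinary memory both orders would leave the same two values in the same two locations; here the swapped order corrupts a different register, which is the situation an ordering instruction between the stores is meant to prevent.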

Data Transfer

Three concepts are vital to an understanding of data transfer: open-loop transfers, closed-loop transfers, and data buffering. In an open-loop transfer, information is sent on its way and its correct reception is assumed. In a closed-loop transfer, the receiver actively acknowledges that the data has arrived. Data buffering is concerned with handling disparities between the rate at which data is transmitted and the rate at which it is consumed by the receiver.

Open-loop Data Transfers

The simplest method of transmitting data is to put the data on a bus and assert a signal, data strobe, to indicate that it is available. Figure 12.11 illustrates an open-loop transmission between a peripheral interface component and an external peripheral (e.g., a printer). The processor moves data to the peripheral interface with its address and data buses and the peripheral interface puts the data on the bus.

The peripheral interface asserts a data available strobe, DAV*, to indicate to the peripheral that the data at its input terminal is valid. The peripheral reads the data and the peripheral interface negates its DAV* strobe to complete the transfer.

Figure 12.12 provides a timing diagram for this information exchange, which is called open loop because there is no feedback to acknowledge that the data has indeed been received. If the peripheral is off line, busy, or just very slow, the data may not be read during the time for which it is available (i.e., DAV* asserted). Open-loop data transfers are also called synchronous transfers because the device receiving the data must be synchronized with the device sending the data.

Closed-loop Data Transfers

In a closed-loop transfer the device receiving data returns an acknowledgment to the sender to close the loop. ACK* (acknowledge) from the peripheral indicates the receipt of data.

The peripheral interface makes the data available and asserts DAV* at B to indicate that the data is valid, just as in an open-loop data transfer. The peripheral receiving the data sees DAV* asserted and reads the data. In turn the peripheral asserts ACK* to inform the interface that the data has been accepted. The interface de-asserts DAV* to complete the exchange. This sequence is known as handshaking. Handshaking supports slow peripherals, because the transfer waits until the peripheral indicates its readiness by asserting ACK*.

The timing diagram in Figure 12.14 is called a handshake because the assertion of ACK* is a response to the assertion of DAV*. The advantage of a closed-loop data transfer is that the originator of the data knows that it has been accepted, so data cannot be lost because it was not read by the remote peripheral.

The handshaking closed-loop protocol can be taken a step further. In the basic handshake, the assertion of DAV* is met by the assertion of ACK* from the peripheral; at this point it is assumed that the data has been received and the data exchange ends. Figure 12.15 shows a fully interlocked handshake in which the sequence of events is more tightly defined and each event triggers the next event in sequence.

At B in Figure 12.15 DAV* is asserted to indicate valid data and at C ACK* is asserted to indicate its receipt. The sequence continues with the negation of DAV* at point D. DAV* can be negated because the assertion of ACK* indicates that DAV* has been recognized. Negating DAV* indicates that the acknowledgement has been detected. The peripheral negates ACK* at E, and the interface removes the data at F after negating DAV*. Point F may come before point E because the removal of the data is a response to the negation of DAV* rather than to the negation of ACK*.
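The fully interlocked sequence can be traced in software, with each step taken only in response to the previous one. This is a sketch, not a timing-accurate model: the function name `interlocked_transfer` is hypothetical, the active-low signals are represented as booleans where True means asserted, and the data-removal step is shown after the ACK* negation although the text notes that it may come first.

```python
def interlocked_transfer(value):
    """Trace one fully interlocked handshake; returns (received value, event trace)."""
    signals = {"data": None, "DAV": False, "ACK": False}  # True = asserted
    trace = []
    received = None

    # Interface: put the data on the lines, then assert DAV*.
    signals["data"] = value
    trace.append("data valid")
    signals["DAV"] = True
    trace.append("B: DAV* asserted")

    # Peripheral: responds to DAV* by reading the data and asserting ACK*.
    if signals["DAV"]:
        received = signals["data"]
        signals["ACK"] = True
        trace.append("C: ACK* asserted")

    # Interface: responds to ACK* by negating DAV*.
    if signals["ACK"]:
        signals["DAV"] = False
        trace.append("D: DAV* negated")

    # Peripheral negates ACK*, and the interface removes the data,
    # both in response to the negation of DAV*.
    if not signals["DAV"]:
        signals["ACK"] = False
        trace.append("E: ACK* negated")
        signals["data"] = None
        trace.append("F: data removed")

    return received, trace


received, trace = interlocked_transfer(0x41)
for event in trace:
    print(event)
```

Because every step is conditional on the previous signal change, a peripheral that never asserts ACK* would leave DAV* asserted and the data in place, which is how the protocol accommodates slow devices.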

Buffering Data

When data is transmitted over a bus, you either have to use it while it is valid, or capture it in a memory device. Figure 12.16 illustrates three input circuits. Figure 12.16(a) uses the instantaneous values on data inputs I0 to I3; that is, the current data values are used and it is necessary for the transmitter to maintain the data values while they are being used.

Figure 12.16(b) illustrates single-buffered input using D flip-flops. When the data is to be read, the flip-flops are latched and the input captured. Single-buffered input captures data and holds it until the next time the latches are clocked.

Figure 12.16(c) provides a solution to the problem where new data arrives before the previous value has been read. Incoming data is latched exactly as before. Data in the input latches is copied to a second set of latches, where it is buffered for a second time. The input side of the buffer can be capturing data while the output side is waiting for the old data to be read.
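Double buffering can be illustrated with two variables standing in for the two sets of latches. The class and method names here are hypothetical, and the two clocks are simply method calls made by the caller.

```python
class DoubleBufferedInput:
    """Two latch stages: the first captures incoming data, the second holds it for the reader."""

    def __init__(self):
        self.input_latch = None
        self.output_latch = None

    def clock_input(self, value):
        # Capture the value currently on the input lines.
        self.input_latch = value

    def clock_output(self):
        # Copy the captured value to the second set of latches.
        self.output_latch = self.input_latch

    def read(self):
        # The reader always sees the second stage.
        return self.output_latch


buf = DoubleBufferedInput()
buf.clock_input(1)
buf.clock_output()
buf.clock_input(2)    # new data arrives before the old value has been read
print(buf.read())     # the old value is still available to the reader
```

The second stage gives the reader up to one full input period to collect each value; if it takes longer than that, data is still lost, which motivates deeper buffers.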

Figure 12.17 gives the timing diagram of a double-buffered input system. The input arrives at fixed time intervals. Input samples are clocked into the input latches at regular intervals by clock C_Ii, where i is the clock pulse number.

The FIFO

A general solution to data buffering is provided by the first-in-first-out (FIFO) memory. Data is written into a FIFO queue one value at a time and read out in the same order. Once the data has been read it cannot be accessed again. A FIFO can be empty, partially filled, or full; FIFOs usually have output flags to indicate these states.

The simplest FIFO structure is a shift register with an input port that receives the data and an output port. The data source provides the FIFO input and a strobe. Similarly, the reader provides a strobe when it wants data. Figure 12.18 describes a FIFO. FULL indicates that no more data can be accepted and EMPTY indicates that no more data can be read. When data arrives at the input terminals, it ripples down the shift register until it arrives at the next free location.

Figure 12.19 demonstrates a 10-stage FIFO as data is added and removed.

The FIFO is usually built around a random access memory element that is arranged as a circular buffer. A read pointer and a write pointer keep track of the data in RAM.

Figure 12.21 illustrates the structure of a dual-port RAM FIFO. The advantage of RAM-based FIFOs over register-based FIFOs is that the fall-through time of a RAM-based FIFO is constant and independent of its length.
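A RAM-based FIFO arranged as a circular buffer can be sketched in a few lines. The class name `RingFifo` and the use of an occupancy counter to derive the FULL and EMPTY flags are illustrative choices, not a description of any particular device.

```python
class RingFifo:
    """FIFO built on a fixed block of storage with a read pointer and a write pointer."""

    def __init__(self, capacity):
        self.ram = [None] * capacity
        self.read_ptr = 0
        self.write_ptr = 0
        self.count = 0            # number of items currently stored

    @property
    def empty(self):
        return self.count == 0

    @property
    def full(self):
        return self.count == len(self.ram)

    def write(self, value):
        if self.full:
            raise OverflowError("FIFO full")
        self.ram[self.write_ptr] = value
        self.write_ptr = (self.write_ptr + 1) % len(self.ram)  # wrap around
        self.count += 1

    def read(self):
        if self.empty:
            raise IndexError("FIFO empty")
        value = self.ram[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % len(self.ram)    # wrap around
        self.count -= 1
        return value
```

A value written to an empty buffer is available to the reader immediately, wherever the pointers happen to be; nothing has to ripple through intermediate stages, which is why the fall-through time does not grow with the buffer length.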

Figure 12.22 demonstrates the use of a typical FIFO in a system with a 32-bit computer using little-endian I/O and an 8-bit port using big-endian I/O. This FIFO is user-configurable and can be set up to perform bus matching; that is, its input and output buses may have different widths. Its port A interface is 32 bits wide and its port B interface is 8 bits wide. You can program it to perform the byte swapping required when data is copied from a little-endian to a big-endian system.

Figure 12.23 gives the timing diagram for the case when two 32-bit words are read into the FIFO and eight 8-bit bytes are read from it.
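Bus matching with optional byte swapping can be approximated as follows. This is a sketch only: the function name `words_to_bytes` and its `swap` flag are hypothetical, and the byte order a real FIFO emits depends on how the device is configured. Here, without swapping, each word is emitted least-significant byte first; with swapping, most-significant byte first.

```python
def words_to_bytes(words, swap=False):
    """Split 32-bit words into a stream of bytes for an 8-bit port.

    swap=False: least-significant byte of each word first.
    swap=True:  most-significant byte of each word first.
    """
    stream = []
    for word in words:
        order = "big" if swap else "little"
        stream.extend(word.to_bytes(4, order))  # int.to_bytes yields 4 bytes in the given order
    return stream


# Two 32-bit words in, eight bytes out.
print([hex(b) for b in words_to_bytes([0x11223344, 0xAABBCCDD], swap=True)])
```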

I/O Strategy

A computer implements I/O transactions in one of three ways:

- It can perform an individual I/O transaction at the point the operation is needed, by programmed I/O.
- It can execute another task until a peripheral signals its readiness to take part in an I/O transaction, by interrupt-driven I/O.
- It can ask special-purpose hardware to perform the I/O transaction, by direct memory access (DMA).

Computer systems may employ a mixture of these strategies.

Programmed I/O

A typical memory-mapped peripheral has a flag bit that is set by the peripheral when it is ready to take part in a data transfer. In programmed I/O the computer interrogates the peripheral's status register and proceeds when the peripheral is ready. We can express this operation in pseudocode as:

    REPEAT
        Read peripheral status
    UNTIL ready
    Transfer data to/from peripheral

The operation

    REPEAT
        Read peripheral status
    UNTIL ready

constitutes a polling loop, because the peripheral's status is continually tested until it is ready to take part in the I/O transaction. In the following example, status bit RDY is set if the peripheral has data. If we take the I/O model of Figure 12.4 and translate the pseudocode into generic assembly language form to perform an input operation, we get

         ADR  r1,i0          ;Register r1 points to the peripheral
         MOV  r2,#Command    ;Define peripheral operating mode
         STR  [r1],r2        ;Set up peripheral. Load the command
    Rpt1 LDR  r3,[r1,#2]     ;Read input status word into r3
         AND  r3,r3,#1       ;Mask status to RDYIN bit
         BEQ  Rpt1           ;Repeat until device ready
         LDR  r3,[r1,#4]     ;Read the data into r3.
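The polling loop can also be exercised against a simulated device. Everything here is hypothetical: the `SlowPeripheral` class, the bit position assumed for the ready flag, and the idea that the device becomes ready after a fixed number of status reads.

```python
RDY_IN = 0x01  # assumed position of the ready bit in the status register


class SlowPeripheral:
    """Simulated device whose ready bit is set only after a number of status reads."""

    def __init__(self, data, ready_after):
        self._data = data
        self._polls_left = ready_after

    def read_status(self):
        if self._polls_left > 0:
            self._polls_left -= 1
            return 0          # not ready yet
        return RDY_IN         # data available

    def read_data(self):
        return self._data


def programmed_input(peripheral):
    """REPEAT read status UNTIL ready; then transfer the data."""
    polls = 0
    while True:
        polls += 1
        if peripheral.read_status() & RDY_IN:   # mask status to the ready bit
            break
    return peripheral.read_data(), polls


data, polls = programmed_input(SlowPeripheral(0x5A, ready_after=3))
print(hex(data), polls)
```

The count of status reads makes the cost of programmed I/O visible: every poll that finds the device not ready is processor time spent doing no useful work.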

Interrupt-driven I/O

A more efficient I/O strategy uses an interrupt handling mechanism to deal with I/O transactions when they occur. The processor carries out another task until a peripheral requests attention. When the peripheral is ready, it interrupts the processor, which carries out the transaction and then returns to its pre-interrupt state.

The two peripheral interface components are each capable of requesting the processor's attention. All peripherals have an active-low interrupt request output, IRQ*, that runs from peripheral to peripheral, and is connected to the processor's IRQ* input.

Active-low means that a low voltage indicates the interrupt request state. The reason that the electrically low state is used as the active state is entirely because of the behavior of transistors; that is, it is an engineering consideration that dates back to the era of the open-collector circuit that could only pull a line down to zero.

Whenever a peripheral wants to take part in an I/O transaction, it asserts its IRQ* output and drives the IRQ* input to the CPU active low. The CPU detects that IRQ* has been asserted and responds to the interrupt request if it has not been masked.

Most processors have an interrupt mask register that allows you to turn off interrupts while the CPU is performing a critical task; for example, a system using real-time monitoring of fast events would not defer to a keyboard input interrupt (even a fast typist is glacially slow compared to a computer's internal operation). Similarly, recovery from a system failure such as a loss of power will be given priority.

The way in which a processor responds to an interrupt is device-dependent. The two peripherals in Figure 12.24 are wired to the common IRQ* line and the CPU can't determine which device interrupted. The CPU identifies the interrupting device by polling each peripheral's status register until the interrupter has been located. Interrupt polling provides interrupt prioritization because important devices whose interrupt requests must be answered rapidly are polled first. In Figure 12.24 each memory-mapped peripheral has an interrupt vector register, IVR, that tells the processor how to find the appropriate interrupt handler. Typically, the IVR supplies a pointer to a table of interrupt vectors.
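Interrupt polling, and the prioritization that falls out of the polling order, can be sketched like this. The function name `find_interrupter`, the device names, and the choice of bit 7 as the interrupt-request flag in each status register are assumptions made for illustration.

```python
IRQ_FLAG = 0x80  # assumed: bit 7 of a status register flags a pending interrupt request


def find_interrupter(devices):
    """Poll status registers in priority order and return the first requester.

    devices: list of (name, status) pairs, most important device first.
    Returns None if no device is requesting service.
    """
    for name, status in devices:
        if status & IRQ_FLAG:
            return name
    return None


# Both devices are requesting; the one polled first wins.
print(find_interrupter([("disk", 0x80), ("keyboard", 0x80)]))
```

The priority is purely a consequence of the order of the list: moving a device earlier in the polling sequence shortens its worst-case response time at the expense of the devices behind it.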

Interrupt Processing

When an interrupt occurs, the computer first decides whether to service it or whether to ignore it. When the computer responds to the interrupt, it carries out the following sequence of actions. It completes the current instruction. The contents of the program counter are saved to allow the program to continue from the point at which it was interrupted. The state of the processor must also be saved. A processor's state is defined by the flag bits of the condition code, plus other status information. A jump is then made to the location of the interrupt handling routine, which is executed like any other program. After this routine has been executed, a return from interrupt is made, the program counter restored, and the system status word returned to its pre-interrupt value.

Figure 12.25 shows how a typical CISC responds to an interrupt request. Stack PSR indicates that the processor status register is pushed on the stack. The interrupt is transparent to the interrupted program and the processor is returned to the state it was in immediately before the interrupt took place.
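The save, jump, and restore sequence can be mimicked with a toy processor model. The class `Cpu`, its fields, and the order in which the program counter and status are pushed are illustrative assumptions; real processors differ in what they stack and in what order.

```python
class Cpu:
    """Toy model holding only a program counter, a status word, and a stack."""

    def __init__(self):
        self.pc = 0
        self.status = 0
        self.stack = []

    def interrupt(self, handler_address):
        # Save the return address and the processor status, then jump to the handler.
        self.stack.append(self.pc)
        self.stack.append(self.status)
        self.pc = handler_address

    def return_from_interrupt(self):
        # Restore in the reverse order of saving.
        self.status = self.stack.pop()
        self.pc = self.stack.pop()


cpu = Cpu()
cpu.pc, cpu.status = 0x1000, 0b0101
cpu.interrupt(0x4000)
cpu.pc, cpu.status = 0x4010, 0b0000   # the handler runs and changes both
cpu.return_from_interrupt()
print(hex(cpu.pc), bin(cpu.status))   # back to the pre-interrupt values
```

Because both values are restored, the interrupted program cannot tell from its program counter or flags that the handler ran, which is what makes the interrupt transparent.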

Nonmaskable Interrupts

An interrupt request may be denied or deferred. Some microprocessors have a nonmaskable interrupt request, NMI, that can't be deferred. A nonmaskable interrupt is reserved for events such as a loss of power. The NMI handler routine forces the processor to deal with the interrupt and to perform an orderly shutdown of the system, before the power drops below a critical level and the computer fails completely.

Prioritized Interrupts

Microprocessors often support prioritized interrupts (i.e., the chip has more than one interrupt request input). Each interrupt has a predefined priority and a new interrupt with a priority lower than or equal to the current one cannot interrupt the processor until the current interrupt has been dealt with. Equally, an interrupt with a higher priority can interrupt the current interrupt.

Nested Interrupts

Interrupts and other processor exceptions have all the characteristics of a subroutine: the return address is stacked at the beginning of the call and then restored once the subroutine has been executed to completion. The interrupt is a subroutine call with an automatic target address supplied in hardware or software and a mechanism that preserves the state of the condition code as well as the program counter. Just as subroutines can be nested, so can interrupts.

Figure 12.26 demonstrates nested interrupts. A level 1 interrupt occurs a second time. A level 2 interrupt takes place before the level 1 interrupt handler has completed its task. The level 1 interrupt handler is interrupted and the level 2 interrupt is processed. Once the level 2 interrupt has been dealt with, a return is made to the level 1 interrupt handler.

Vectored Interrupts

When a processor with a single interrupt request line detects a request for service, it doesn't know which device made the request and can't begin to execute the appropriate interrupt handler until it has identified the source of the interrupt. Without vectored interrupts, the processor must examine each of the peripherals' interrupt status bits in turn. A vectored interrupt solves the problem of identifying the source by forcing the requesting device to identify itself to the processor. When the processor detects an interrupt request, it broadcasts an interrupt acknowledge to all potential interrupters. Each possible interrupter detects the acknowledge from the CPU, and the interrupting device returns a vector that is used by the CPU to invoke the appropriate interrupt handler.
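A minimal sketch of vectored dispatch follows. The `Device` class, the vector numbers 64 and 65, and the handler names are hypothetical; the loop stands in for the broadcast acknowledge, which real hardware delivers to all devices at once.

```python
class Device:
    def __init__(self, name, vector):
        self.name = name
        self.vector = vector
        self.requesting = False

    def acknowledge(self):
        # Only the interrupting device answers the broadcast acknowledge.
        return self.vector if self.requesting else None


def dispatch(devices, vector_table):
    for device in devices:  # models the broadcast acknowledge
        vector = device.acknowledge()
        if vector is not None:
            device.requesting = False
            return vector_table[vector]()  # invoke the handler for this vector
    return None  # nobody answered the acknowledge


vector_table = {64: lambda: "timer handler", 65: lambda: "disk handler"}
timer, disk = Device("timer", 64), Device("disk", 65)
disk.requesting = True
result = dispatch([timer, disk], vector_table)
print(result)
```

The CPU never inspects a status bit: the vector returned by the device selects the handler directly.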

Figure 12.28 demonstrates the 68K's prioritized, vectored interrupts. There are seven levels of interrupt request. Level i is serviced in preference to level j if i > j. The scheme permits nested interrupts: an interrupt at level i can be interrupted by a new interrupt at level j if j > i.
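The priority rule can be illustrated with a short simulation. It is a sketch under simplifying assumptions: requests arrive while earlier handlers are still running, level 0 stands for the normal program, and a deferred request is serviced as soon as the level that blocked it has finished. The function name `simulate` and the trace strings are invented for the example.

```python
def simulate(requests):
    """Return a trace of handler entries and exits for a list of request levels."""
    active = [0]   # stack of levels in service; 0 is the interrupted program
    pending = []   # requests that could not pre-empt the current level
    trace = []
    for level in requests:
        if level > active[-1]:          # strictly higher priority: nest
            active.append(level)
            trace.append(f"enter {level}")
        else:                           # equal or lower priority: wait
            pending.append(level)
            trace.append(f"defer {level}")
    while len(active) > 1:
        trace.append(f"exit {active.pop()}")
        ready = [p for p in pending if p > active[-1]]
        if ready:                       # service the highest deferred request
            nxt = max(ready)
            pending.remove(nxt)
            active.append(nxt)
            trace.append(f"enter {nxt}")
    return trace


trace = simulate([1, 2, 1])
print(trace)
```

Here the level 2 request pre-empts the level 1 handler, while the second level 1 request has to wait until the first level 1 handler has returned.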

Direct Memory Access

The most sophisticated means of dealing with I/O uses direct memory access, DMA, in which data is transferred between a peripheral and memory without the active intervention of a processor. In effect, a dedicated processor performs the I/O transaction by taking control of the system buses and using them to move data directly between a peripheral and the memory. DMA offers a very efficient means of data transfer because the DMA logic is dedicated to I/O processing and a large quantity of data can be transferred in a burst; for example, 128 bytes of input.

Figure 12.29 describes a system that uses DMA to transfer data to disks. A DMA controller, DMAC, controls access to the data bus. The DMA controller must first be loaded with the destination of the data in memory and the number of bytes to be transferred; that is, you have to program the DMA controller before it can be triggered.
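The program-then-trigger pattern might look like the following sketch. The `DMAController` class and its `program` and `transfer` methods are hypothetical names, and memory is modelled as a `bytearray`; a real DMAC exposes these values as memory-mapped registers.

```python
class DMAController:
    def __init__(self, memory):
        self.memory = memory
        self.address = None  # destination address register
        self.count = 0       # byte count register

    def program(self, address, count):
        # The CPU loads the destination and the transfer length first.
        self.address = address
        self.count = count

    def transfer(self, peripheral_data):
        # Triggered later: move bytes into memory without the CPU.
        if self.address is None:
            raise RuntimeError("DMAC must be programmed before it is triggered")
        for byte in peripheral_data[: self.count]:
            self.memory[self.address] = byte
            self.address += 1
            self.count -= 1


memory = bytearray(16)
dmac = DMAController(memory)
dmac.program(address=4, count=3)
dmac.transfer(b"\x0a\x0b\x0c\x0d")  # the fourth byte exceeds the count
print(memory.hex())
```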

Three bus switches control access to the data bus by the CPU, memory, and DMA controller. A bus switch is turned on or off to enable or disable the information path between the bus and the device interfaced to the bus switch. Normally, the CPU bus switch is closed and the DMAC and peripheral bus switches are open. The CPU transfers data between memory and itself by putting an address on the address bus and reading or writing data.

Figure 12.30a illustrates the situation in which the CPU is controlling the buses, and Figure 12.30b demonstrates how the DMA controller takes control of the data bus to perform the data transfer itself.

The Bus

Bus is a contraction of the Latin omnibus, which means "for all". A bus behaves like a highway that is used by multiple devices. In a computer, all the devices that wish to communicate with each other use a bus. Figure 12.31 illustrates the organization of a computer with three buses.

The system bus is made up of the address, data, and control paths from the CPU. Memory and memory-mapped I/O devices are connected to this bus. Such a bus has to be able to operate at the speed of the fastest device connected to it. The system bus demonstrates that a one-size-fits-all approach does not apply to computer design, because it would be hopelessly cost-ineffective to interface low-cost, low-speed peripherals to a high-speed bus.

In systems with more than one CPU (or at least more than one device that can initiate data transfer actions like a CPU), the bus has to decide which of the devices that want to access the bus should be granted access to it. This mechanism is called arbitration and is a key feature of modern system buses. A device that can take control of the system bus is called a bus master, and a device that can only respond to a transaction initiated by a remote bus master is called a bus slave.

In Figure 12.31, the CPU is a bus master and the memory system is a bus slave. One of the I/O ports has been labeled bus master because it can control the bus (e.g., for DMA data transfers), whereas the other peripheral is labeled bus slave because it can respond only to read or write accesses. The connection between the disk drive and its controller is also labeled bus because it represents a specialized and highly dedicated example of the bus.

Bus Structures and Topologies

A simple bus structure is illustrated by the CPU plus memory plus local bus in Figure 12.32. Only one device at a time can put data on the data bus. Data is transferred between the CPU and memory or peripherals. The CPU is the permanent bus master, and only the CPU can put data on the bus or invite memory or peripherals to supply data via the bus.

Figure 12.33 illustrates a bus structure that employs two buses linked by an expansion interface. Each of these separate bus systems may have entirely different levels of functionality; one might be optimized for high-speed processor-to-memory transactions, and the other to support a large range of plug-in peripherals.

Bus Speed

Suppose device A transmits data to device B. Let's go through the sequence of events that take place when device A initiates the data transfer at $t = 0$. Initially, A drives data onto the data bus at time $t_d$, the delay between device A initiating the transfer and the data appearing on the bus. Data propagates along the bus at about 70% of the speed of light, or roughly 0.7 ft/ns, and takes a propagation time $t_p$ to reach B.

When the data reaches B, it must be captured. Latches are specified by their setup and hold times. The data setup time, $t_s$, is the time for which the data must be available at the input to system B for it to be recognized. The data hold time, $t_h$, is the time for which the data must remain stable at system B's input after it has been captured.

The time taken for a data transfer, $t_T$, is therefore $t_T = t_d + t_p + t_s + t_h$. Inserting typical values for these parameters yields $4 + 1.5 + 2 + 0 = 7.5$ ns, corresponding to a data transfer rate of $1/(7.5\ \text{ns}) = 10^9/7.5\ \text{Hz} \approx 133.3$ MHz. A 32-bit-wide bus moves 4 bytes per transfer and can therefore transfer data at a maximum rate of about 533.3 MB/s. In practice, a data transfer requires time to initiate it, called the latency, $t_L$. Taking latency into account gives a maximum data rate of $1/(t_T + t_L)$. Higher data rates can be achieved with pipelining, by transmitting the next data element before system B has completed reading the previous element.
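The arithmetic above can be checked with a few lines of code. The timing values are the typical figures quoted in the text; the 2.5 ns latency passed to `rate_with_latency` is an invented value used only to show the effect of $t_L$.

```python
# Typical values from the text, in seconds.
t_d, t_p, t_s, t_h = 4e-9, 1.5e-9, 2e-9, 0.0

t_T = t_d + t_p + t_s + t_h   # time for one transfer
rate_hz = 1 / t_T             # transfers per second
bytes_per_s = rate_hz * 4     # a 32-bit bus moves 4 bytes per transfer


def rate_with_latency(t_L):
    """Maximum transfer rate when each transfer also costs a latency t_L."""
    return 1 / (t_T + t_L)


print(f"t_T = {t_T * 1e9:.1f} ns")
print(f"rate = {rate_hz / 1e6:.1f} MHz, {bytes_per_s / 1e6:.1f} MB/s")
print(f"with 2.5 ns latency: {rate_with_latency(2.5e-9) / 1e6:.1f} MHz")
```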

Figure 12.35 demonstrates the application of pipelining to the previous example. Data must be stable at the input to system B for at least $t_s + t_h$ seconds; then a new element may replace the previous element. Pipelining allows an ultimate data rate of $1/(t_s + t_h)$.
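As a quick check, the pipelined limit can be compared with the unpipelined rate using the same typical values as the earlier example (4, 1.5, 2, and 0 ns). The variable names are chosen for this sketch only.

```python
# Typical values from the earlier example, in seconds.
t_d, t_p, t_s, t_h = 4e-9, 1.5e-9, 2e-9, 0.0

unpipelined_hz = 1 / (t_d + t_p + t_s + t_h)  # one element in flight at a time
pipelined_hz = 1 / (t_s + t_h)                # limited only by setup + hold
speedup = pipelined_hz / unpipelined_hz

print(f"unpipelined: {unpipelined_hz / 1e6:.1f} MHz")
print(f"pipelined:   {pipelined_hz / 1e6:.1f} MHz")
print(f"speedup:     {speedup:.2f}x")
```

With these figures the ultimate pipelined rate is 500 MHz, since only the 2 ns setup time limits how quickly one element can follow another.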

The Address Bus

Some systems have an explicit address bus that operates in parallel with the data bus. When the processor writes data to memory, an address is transmitted to the memory system at the same time the data is transmitted. Some systems combine the address and data buses into a single multiplexed bus that carries both addresses and data (albeit alternately).

Figure 12.36 describes the multiplexed address/data bus, which requires fewer signal paths and allows connectors and sockets with fewer pins. Multiplexing addresses and data onto the same lines requires a multiplexer at one end of the transmission path and a demultiplexer at the other end. Multiplexed buses can be slower than non-multiplexed buses and are often used when cost is more important than speed.

The efficiency of both non-multiplexed and multiplexed address buses can be improved by operating in a burst mode, in which a sequence of data elements is transmitted to consecutive memory addresses. Burst-mode operation is used to support cache memory systems. Figure 12.37 illustrates the concept of burst-mode addressing, where an address is transmitted for location i and data for locations i, i+1, i+2, and i+3 are transmitted without a further address.
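The benefit of burst mode on a multiplexed bus can be estimated with a simple cycle count. The sketch assumes one bus cycle per address and one per data element, which is a simplification rather than the timing of any real bus; `bus_cycles` is a name invented for the example.

```python
import math


def bus_cycles(words, burst_length=1):
    """Cycles needed on a multiplexed bus: one per address plus one per word."""
    address_cycles = math.ceil(words / burst_length)  # one address per burst
    return address_cycles + words


print(bus_cycles(4))                  # an address before every word
print(bus_cycles(4, burst_length=4))  # one address, then four words
```

Under these assumptions a four-word burst needs five cycles instead of eight.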

The Control Bus

The control bus regulates the flow of information on the bus. Figure 12.38 describes a simple 2-line synchronous control bus that uses a data-direction signal and a data-validation signal. The data-direction signal is R/W*, which is high to indicate a CPU read operation and low to indicate a write operation.

Some systems have separate read and write strobes rather than an R/W* signal. Individual READ* and WRITE* signals indicate three states: an active read state, an active write state, and a bus free state (READ* and WRITE* both negated). An R/W* signal introduces ambiguity because R/W* = 0 always indicates that the bus is executing a write operation, whereas R/W* = 1 indicates either a read operation or that the bus is free. The active-low data valid signal, DAV*, is asserted by the bus master to indicate that a data transfer is taking place.
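The difference between the two schemes can be made concrete with a pair of decoders. Signals are modelled as integers in which 0 means asserted (active low); the function names and the "illegal" result for both strobes asserted together are assumptions of this sketch.

```python
def decode_strobes(read_n, write_n):
    """Decode separate active-low READ* and WRITE* strobes."""
    if read_n == 0 and write_n == 0:
        return "illegal"  # both strobes asserted at once
    if read_n == 0:
        return "read"
    if write_n == 0:
        return "write"
    return "free"         # both negated


def decode_rw(rw_n):
    """Decode a lone R/W* line: the high state is ambiguous."""
    return "write" if rw_n == 0 else "read or free"


def decode_rw_with_dav(rw_n, dav_n):
    """Use DAV* to tell a real read from an idle bus when R/W* is high."""
    if rw_n == 0:
        return "write"
    return "read" if dav_n == 0 else "free"


print(decode_strobes(1, 1), decode_rw(1), decode_rw_with_dav(1, 0))
```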

Let's look at an example of an asynchronous data transfer, a processor memory read cycle. Figure 12.39 provides the simplified read cycle timing diagram of a 68020 processor. The processor is controlled by a clock, CLK, and the minimum bus cycle takes six clock states labeled S0 to S5.

Arbitrating for the Bus

In a system with several bus masters connected to a bus, a mechanism is needed to deal with simultaneous bus requests. The process by which requests are recognized and priority given to one of them is called arbitration. There are two approaches to dealing with multiple requests for a bus: localized arbitration and distributed arbitration. In localized arbitration, an arbitration circuit receives requests from the contending bus masters and then decides which of them is to be given control of the bus. In a system with distributed arbitration, each of the masters takes part in the arbitration process and the system lacks a specific arbiter; each master monitors the other masters and decides whether to continue competing for the bus or whether to give up and wait until later.

Localized Arbitration and the VMEbus

The VMEbus supports several types of functional modules. We are interested in the bus master that controls the bus, the bus requester that requests the bus, and the arbiter that grants the bus to a would-be master. A bus requester is employed by a bus master when it wants to access the VMEbus. A VMEbus is usually housed in a box with a number of slots into which modules can be plugged (rather like the slots used by the PCI bus).

The VMEbus's arbitration bus is described in Figure 12.40. A bus requester uses BR0* to BR3* (bus request 0 to bus request 3) to indicate that the bus master wants the bus. Four bus grant lines are used by the arbiter to grant control of the bus to the requester. Bus clear (BCLR*) and bus busy (BBSY*) control the arbitration process.

The VMEbus arbiter is located in a special position on a VMEbus, slot 1. All bus request lines run the length of the VMEbus, and any would-be master can place a request on one of these lines. The level of the request is user-determined; that is, the user decides which of the four bus request lines is to be connected to a module's request output.

The arbiter reads the bus request inputs from all the slots along the bus, decides which request is to be serviced, and then informs other modules of its decision via its bus grant outputs.

The VMEbus supports four levels of arbitration. We will soon see that each of these four levels can be further subdivided. The bus request lines run the length of the VMEbus and terminate at the arbiter in slot 1.

When one or more bus requesters wish to access the VMEbus, they assert the bus request lines to which they have been assigned; for example, the card in slot 3 might assert bus request line BR1* and the card in slot 5 might assert bus request line BR3*. The arbiter in slot 1 decides whether one of them is to succeed.

If a request on, say, BR2* is successful, the arbiter sends a bus grant message on its level 2 bus grant output, BG2OUT*. We will write BGxIN*, BGxOUT*, and BRx*, where x is 0 to 3, to avoid referring to specific levels.

The BGx* lines do not run along the entire length of the bus. Instead, the VMEbus employs a chain of lines called bus grant out and bus grant in. The BGxIN* and BGxOUT* lines run from slot to slot rather than from end to end. A BGxOUT* line from a left-hand module is passed out on its right as a BGxIN* line. Therefore, the BGxOUT* of one module is connected to the BGxIN* of its right-hand neighbor. The arrangement is called daisy chaining.

A continuous bus line transmits a signal in both directions to all devices connected to it. The daisy-chained line is unidirectional, transmitting a signal from one specific end to the other. Each module connected to (i.e., receiving from and transmitting to) a daisy-chained line may either pass a signal on down the line or inject a signal of its own onto the line.

In Figure 12.42 the requester in slot j requests the bus at level 1 when no other device is requesting the bus. When BR1* is asserted, the arbiter detects it and asserts BG1OUT*, which passes down the bus until it reaches slot j. The arbiter in slot 1 sends a bus grant input to the card in slot 2. The card in slot 2 takes this bus grant input and passes it on as a bus grant output to the card in slot 3, and so on.

Each card receives a bus grant input from its left-hand neighbor and may or may not pass it on as a bus grant output to its right-hand neighbor. A card might choose to terminate the daisy chain signal-passing sequence and not transmit a bus grant signal to its right-hand neighbor. If a slot is empty, bus jumpers (i.e., links) must be provided to route the appropriate BGxIN* signals to the corresponding BGxOUT* terminals.

A requester module makes a bid for control of the system data transfer bus by asserting one of the bus request lines, BR0* to BR3*. Only one line is asserted, and the actual line is chosen by assigning a given priority to the requester. This priority may be assigned by on-board, user-selectable jumpers or dynamically by software.

The arbiter in slot 1 asserts a BGxOUT* line, and a bus grant propagates down the daisy chain. Each BGxOUT* arrives at the BGxIN* of the next module. If that module doesn't want the bus, it passes the grant on via its BGxOUT*. If the module requested the bus, it takes control of the bus. Daisy chaining provides automatic prioritization, because bus requesters nearer the arbiter win the arbitration; this is called geographic prioritization.
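Grant propagation along the daisy chain can be modelled in a few lines. In this sketch the list describes the slots to the right of the arbiter (slot 2 onward) for one request level; `True` means the module wants the bus, `False` means it passes the grant on, and `None` stands for an empty slot fitted with a jumper. These conventions and the function name are assumptions made for the example.

```python
def propagate_grant(chain):
    """Return the slot number that keeps the grant, or None if nobody does."""
    for offset, wants_bus in enumerate(chain):
        slot = offset + 2  # slot 1 holds the arbiter
        if wants_bus:
            return slot    # grant stops here: BGxOUT* is not driven onward
        # Otherwise the module (or jumper) copies BGxIN* to BGxOUT*.
    return None


# Slots 3 and 5 both request at this level; slot 4 is empty with a jumper.
winner = propagate_grant([False, True, None, True])
print(winner)
```

Slot 3 wins purely because it sits closer to the arbiter, which is the geographic prioritization described above.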

Figure 12.43 provides a protocol flowchart for VMEbus arbitration. Initially, a bus master in slot M at a priority less than i is in control of the bus. This current bus master asserts the bus busy signal, BBSY*, that runs the length of the bus. As long as any master is asserting BBSY*, no other master may attempt to gain control of the VMEbus. An active bus master in a VMEbus cannot be forced off the bus.

Suppose a bus requester in slot N requests the bus at a higher priority level than the current master. The arbiter detects the new higher level and asserts its bus clear output, which informs the current master that another, higher-priority device wishes to access the bus.

The current master does not have to relinquish the bus within a prescribed time limit. Typically, it will release the bus at the first convenient instant by negating BBSY*. The VMEbus provides both geographic prioritization determined by a slot's location and an optional prioritization by bus request.

BCLR* is driven only by arbiters that permanently assign fixed priorities to the bus request lines. Other arbitration mechanisms, such as the round robin arbitration scheme to be described later, have no fixed priority, and the arbiter does not make use of the bus clear line.

When the arbiter detects that the current master has released the bus, the arbiter asserts BGiOUT* to indicate to the requester at level i that it has gained control of the bus. The arbiter knows only the level of the request and not which slot it came from. The bus grant message ripples along the bus, entering each module as BGiIN* and leaving as BGiOUT*. When this message reaches the requester in slot N that made the request at level i, the message is not passed on. Instead, the requester asserts BBSY* to show that it now has control of the bus.

What would have happened if a requester also at level i but located nearer to the arbiter than slot N had also requested the bus at approximately the same time? The answer is that the requester closer to the arbiter would have received the bus grant first and have taken control of the bus.

Releasing the Bus

The requester may implement one of two options for releasing the bus: option RWD, release when done, and option ROR, release on request. Option RWD requires the requester to release the bus as soon as the on-board master stops indicating bus busy; that is, the master remains in control of the bus until its task has been completed, which can lead to undue bus hogging. The ROR option is more suitable in systems in which it is unreasonable to grant unlimited bus access to a master. The ROR requester monitors the four bus request lines. If it sees that another requester has requested service, it releases its BBSY* output and defers to the other request. The ROR option also reduces the number of arbitrations requested by a master, as the bus is frequently cleared voluntarily.

The Arbitration Process

Figure 12.44 demonstrates what happens when two requesters at different levels of priority request the bus. Both requesters A and B assert their bus request outputs simultaneously. Assuming that the arbiter detects BR1* and BR2* low, the arbiter asserts BG2IN* on slot 1, because BR2* has a higher priority. When the bus grant has propagated down the daisy chain to requester B, requester B will respond to BG2IN* by asserting BBSY*. Requester B then releases BR2* and informs its own master that the VMEbus is now available.

VMEbus Arbitration Algorithms

The arbiter in slot 1 can use one of three strategies to prioritize bus requests.

1. Option RRS (round robin select). The RRS option assigns priority to the masters on a rotating basis. Each of the four levels of bus request has a turn at being the highest level.
2. Option PRI (prioritized). The PRI option assigns a level of priority to each of the bus request lines from BR3* (highest) to BR0* (lowest).
3. Option SGL (single level). The SGL option provides a minimal arbitration facility using bus request line BR3* only. The priority of individual modules is determined by daisy chaining, so that the module next to the arbiter module in slot 1 of the VMEbus rack has the highest priority. As the position of a module moves further away from the arbiter, its priority reduces.
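The three options can be compared with a small model in which a set of integers 0 to 3 represents the bus request lines currently asserted. The function and class names are invented, and the rotation order used for RRS (after level n is granted, level n-1 becomes the highest) is an assumption of this sketch rather than something stated in the text.

```python
def grant_pri(requests):
    """PRI: fixed priority, BR3* highest and BR0* lowest."""
    return max(requests) if requests else None


def grant_sgl(requests):
    """SGL: only BR3* is monitored; the daisy chain picks the module."""
    return 3 if 3 in requests else None


class RoundRobinArbiter:
    """RRS: the highest-priority level rotates after every grant."""

    def __init__(self):
        self.last = 0  # so that level 3 is examined first

    def grant(self, requests):
        for step in range(1, 5):
            level = (self.last - step) % 4  # the level just served goes last
            if level in requests:
                self.last = level
                return level
        return None


rrs = RoundRobinArbiter()
sequence = [rrs.grant({1, 3}) for _ in range(3)]
print(grant_pri({1, 3}), grant_sgl({1}), sequence)
```

With levels 1 and 3 requesting continuously, PRI would always choose level 3, whereas the round robin arbiter alternates between the two.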

Distributed Arbitration

Not all buses use a centralized arbiter to decide which of the competing bus masters is to get control of the bus. A mechanism called distributed arbitration allows arbitration to take place simultaneously at all slots along the bus. We now describe a backplane bus that supports distributed arbitration, the NuBus, a general-purpose synchronous backplane bus with multiplexed address and data lines that is also known as ANSI/IEEE Std 1196-1987. It was conceived at MIT in 1970 and later supported by Western Digital and Texas Instruments (1983). Apple implemented a subset of NuBus in their Macintosh II.

The key to NuBus arbitration is each module's unique slot number, which ranges from $0_{16}$ to $F_{16}$. When a card in a slot arbitrates for the bus, the card places its slot number on the bus and, as if by magic, any other requester with a lower slot number stops arbitrating for the bus. Equally, if a slot with a higher number wants the bus, the requesting slot stops requesting the bus; that is, if a card arbitrates for the bus and then finds that a card with a higher priority is also arbitrating for the bus, it backs off.

To appreciate how distributed arbitration works, you have to understand the open-collector gate. Historically, the open-collector gate precedes the tristate gate and is used to allow more than one device to drive the same bus. Figure 12.45 illustrates an inverter with an open-collector output. The gate's output can be actively forced to a low voltage. When the input of the gate is 1, the internal transistor switch is closed and its output is forced low, just like a normal inverter.

When the input is 0, the transistor switch is open and the output of the open-collector gate is left floating because it is internally disconnected from the high- or low-level power rails. That is, the open-collector gate has an active-low output state and a floating state and can pull a bus down into a low state, but it can't pull the bus up into a high state.

Figure 12.46 illustrates the key circuit used in a distributed arbiter that has an input X and an output Y. The circuit is also connected to one of the arbitration control lines on the bus. In what follows, we are interested in the relationship between the circuit and the state of the bus. If you use Boolean algebra, you will see that output Y is 0 for any value of input X. This is not the whole story.

Suppose the X input is 0 and the level on the bus is low because another device is driving it low. In this case, the output of the open-collector inverter will also be forced low by the bus. Now both inputs to the AND gate will be 0 and the Y output will be 1. That is, the Y output is 0 unless the input X is 0 and the bus is being driven low by another device. We have a mechanism that can actively drive the bus low, or detect when another device is driving the bus low when we are attempting to drive it high. This mechanism forms the basis of distributed arbitration.

Figure 12.47 shows how the distributed arbiter operates by considering all possible input conditions together with the state of the bus line. Remember that the bus can be floating (not driven) or actively pulled down to a low level. When it is floating, a resistor weakly pulls the bus up to a high level.

Figures 12.47(a) and (b) assume that the bus is floating. The output of the circuit is always 0 and is independent of its input. In a real system, the bus will always be actively pulled down to an electrically low level or weakly pulled up to an electrically high level by a resistor.

In Figures 12.47(c) and (d) the bus is being actively driven to 0. In Figure 12.47(c) the bus is actively being driven low, but the state of the open-collector output is also low, so there is no conflict between the output of the open-collector inverter and the bus. The output of the circuit is 0.

In Figure 12.47(d) the input is 0 and the output of the open-collector gate is floating. The bus is low, and the output of the inverter is pulled down to an electrically low state. The output of the circuit is 1. The output tells the system that another device is driving the bus low in contradiction to the input.

In Figures 12.47(e) and (f) the bus is in a high state because no other device is driving it low. Figure 12.47(e) is the interesting case. Here the input is 1 and the output of the open-collector gate is electrically low. This drives the bus to a low state. In this case the circuit is driving the bus. The output of the circuit is 0. In Figure 12.47(f) the input is 0 and the output of the inverter is floating, so there is no conflict with the state of the bus.

As you can see, there are two special cases. In one, the bus is active-low and the output of the inverter is high, which results in the inverter's output being pulled down. In the other case, the bus is high and the output of the inverter is active-low, which results in the bus being forced low.

The input to this circuit represents the condition "I want the bus" or "I don't want the bus". If the bus is not being driven low, this circuit will drive the bus low itself if its input is 1. This circuit produces a 0 output unless its input is 1 and the bus is being actively driven low by some other device. Table 12.4 summarizes the action of this circuit.

Table 12.4
Situation              Bus condition          Result
I want the bus         Bus free (high level)  Output is 1. Get the bus and drive it low
I want the bus         Bus busy (low level)   Output is 0. I do not get the bus
I do not want the bus  Don't care             Output is 0

Figure 12.48 illustrates the details of part of a NuBus. A potential master that wants to use the bus places its arbitration level on the 4-bit arbitration bus, ID3* to ID0*. Since NuBus uses negative logic, the arbitration number is inverted so that the highest level of priority is 0000 and the least is 1111. NuBus arbitration is simple. If a competing master sees a higher level on the bus than its own level, it ceases to compete for the bus. Each requester simultaneously drives the arbitration bus and observes the signal on the bus. If it detects the presence of a requester at a higher level, it backs off.

ID0* to ID3* define the slot location and priority level of the master, and lines ARB0* to ARB3* are the arbitration lines running the length of the bus. Arbitrate* permits the master to arbitrate for the bus, and the output GRANT is asserted if the master wins the arbitration.

Suppose three masters numbered 4 (0100), 5 (0101), and 2 (0010) put the inverted codes 1011, 1010, and 1101, respectively, onto the arbitration bus. As the arbitration lines are open-collector, any output at a 0 level will pull the corresponding line down to 0. Here, the bus will be forced to 1000. The master at level 2 putting 1101 on the bus will detect that ARB2 is being pulled down and leave the arbitration process. The arbitration bus will now be 1010. The master with the code 1011 will detect that ARB0 is being pulled down and will leave the arbitration process. The value on the arbitration bus is now 1010 and the master with that value has gained control.
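
The worked example above can be replayed in a few lines of Python. This is only a sketch of the idea, not NuBus hardware behavior in detail: the function names `wired_and` and `arbitrate` are invented for illustration, each master is assumed to have a distinct code, and the lines are examined one at a time from the most significant to the least significant.

```python
def wired_and(codes, width=4):
    """Level on an open-collector bus: any driver at 0 pulls that line to 0."""
    level = (1 << width) - 1  # every line floats high when nothing drives it
    for code in codes:
        level &= code
    return level


def arbitrate(codes, width=4):
    """Return the code left on the bus when all losing masters have backed off."""
    contenders = list(codes)
    for bit in reversed(range(width)):  # scan from the MSB line down to the LSB line
        mask = 1 << bit
        if (wired_and(contenders, width) & mask) == 0:
            # A master that leaves this line high while the bus shows it low
            # has been outranked, so it withdraws from the arbitration.
            contenders = [c for c in contenders if (c & mask) == 0]
    return contenders[0]


codes = [0b1011, 0b1010, 0b1101]
print(format(wired_and(codes), "04b"))   # initial state of the arbitration bus
print(format(arbitrate(codes), "04b"))   # code of the winning master
```

With the three codes from the example, the first line printed is the initial bus value 1000 and the second is the winning code 1010.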

PCI Bus

The Peripheral Component Interconnect Local Bus (or just PCI bus) represents a radical change to the PC's system architecture. Intel designed this bus for use in Pentium-based systems towards the end of 1993. The PCI bus is not only much faster than previous buses; it greatly extends the functionality of the PC architecture. Indeed, the PCI bus is central to the PC's expandability and flexibility. The PCI bus allows users to plug cards into the computer system to increase functionality by adding modems, SCSI interfaces, video processors, sound cards, and so on. The PCI bus lets these cards communicate with the CPU via an interface known as a North Bridge. Bus interface circuits have come to be known collectively as a chipset. All PCs with PCI buses require such a chipset.

The PCI bus is called a local bus to contrast it with the address, data, and control signals from the CPU itself. Connecting systems directly to the CPU provides the fastest data transfer rates, and a bus connected directly to a CPU is called a front side bus. The PCI bus supports plug and play capabilities in which PCI plug-in cards are automatically configured at power up and resources such as interrupt requests are assigned to plug and play cards transparently to the user. The original PCI bus operated at 33 MHz and supported 32-bit and 64-bit data buses. PCI bus Version 2.1 supports a 66 MHz clock. The PCI bus is connected to the PC system by means of a single-chip PCI Bridge and to other buses via a second bridge. This arrangement means that a PC with a PCI bus can still support the older ISA bus. As time passes, fewer and fewer new PCs will have ISA buses because new users will demand PCI cards as they are better than ISA cards.
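
The clock rates and bus widths quoted above translate into peak transfer rates as follows. This is a rough sketch that assumes one data transfer per clock cycle (as in the data phases of a burst) and ignores address phases and wait states. The function name `peak_mb_per_s` is made up for this illustration.

```python
def peak_mb_per_s(clock_mhz, width_bits):
    """Peak rate in MB/s, assuming one transfer of width_bits per clock cycle."""
    return clock_mhz * width_bits // 8


for clock_mhz in (33, 66):
    for width_bits in (32, 64):
        rate = peak_mb_per_s(clock_mhz, width_bits)
        print(f"{clock_mhz} MHz, {width_bits}-bit: {rate} MB/s peak")
```

Real transfers fall short of these figures because every transaction spends at least one clock cycle on its address phase.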

Figure 12.49 illustrates the relationship between the PCI bus, the bridge, processor, memory and peripherals. The processor is directly connected to a bridge circuit that allows the processor to access peripherals via the PCI bus. The PCI system consists of the PCI local bus itself, any cards plugged into the bus, and central resources that control the PCI bus. These central resources perform, for example, arbitration between the cards plugged into the bus.

Figure 12.50 shows a system diagram of a PC with a PCI local bus and an ISA bus. A second bridge, commonly called the South Bridge, links the PCI and ISA buses.

Figure 12.51(a) illustrates the relationship between the Pentium 4, its Intel chipset, and the PCI bus. Figure 12.51(b) illustrates the more modern Intel Core i7 Processor interface.

PCI Bus Arbitration

Figure 12.52 demonstrates PCI bus arbitration. The REQ and GNT signals are connected to an arbiter that forms part of the North Bridge. This arbiter reads the requests on REQ0 to REQ3 and returns a grant message on the GNT0 to GNT3 line corresponding to the arbitration winner. When a PCI agent arbitrates for the bus, the arbiter asserts the BPRI signal to inform the host processor that a PCI agent (i.e., a priority agent) requires the host bus.
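
The central arbiter's job can be sketched as a function from request lines to grant lines. The text does not say how the winner is chosen, so the fixed-priority rule used here (the lowest-numbered requester wins) is purely an assumption for illustration, as is the function name `grant`. Signals are modeled as booleans where True means asserted.

```python
def grant(req):
    """Map request lines REQ0..REQ3 to grant lines GNT0..GNT3.

    At most one grant is asserted. The lowest-numbered requester wins,
    which is an assumed policy, not one taken from the text.
    """
    gnt = [False] * len(req)
    for index, requesting in enumerate(req):
        if requesting:
            gnt[index] = True
            break
    return gnt


print(grant([False, True, False, True]))    # agents 1 and 3 request; agent 1 is granted
print(grant([False, False, False, False]))  # no requests, so no grant
```

A practical arbiter would more likely rotate priority between agents so that no requester is starved.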

Data Transactions on the PCI Bus

The PCI bus compensates for the address/data bus bottleneck in several ways. First, it can operate in a burst mode, in which a single address is transmitted and then the address/data bus is used to transmit a sequence of consecutive data values. Second, the PCI bus supports split transactions; that is, one device can use the bus and another device can access the PCI bus before the first transaction has been completed. Split transactions mean that the bus is used more efficiently. Finally, devices connected to the PCI bus can be buffered, which allows data to be transmitted before it is needed.

PCI bus literature has its own terminology (some of which is shared by SCSI systems). A device that acts as a bus master is called an initiator and a device that responds to a bus master is called a target.

Some of the key signals of the PCI bus are given below.

Signal             Function                      Driven by
AD31 to AD0        Multiplexed address and data  Initiator
C/BE3* to C/BE0*   Command/byte enable           Initiator
TRDY*              Target ready                  Target
IRDY*              Initiator ready               Initiator
FRAME*             Frame                         Initiator
DEVSEL*            Device select                 Target

Figure 12.53 illustrates a PCI read cycle in which an initiator reads data from a target on the PCI bus.

Figure 12.54 illustrates a PCI read cycle in which the address phase is followed by three data phases.

The PCI Express Bus

The PCI Express bus was designed to replace the PCI bus. Its goals were to cost less than the existing PCI bus, use off-the-shelf technology (boards, connectors, and circuits), support mobile, desktop and server markets, and be compatible with existing PCI-based systems. PCI Express uses serial transmission to transfer data from point to point.

Figure 12.55 demonstrates the difference between the PCI bus and PCI Express protocols. The PCI Express protocol has echoes of the ISO standard for the Open Systems Interconnection (OSI) model, which attempts to divide any communications system into seven abstract layers, where each layer performs a certain function for the layer above it.

The lowest level of the PCI Express protocol is the physical layer, which is responsible for transferring the bits from point to point. PCI Express uses a serial bus where data is transmitted bit-by-bit along a single line or along a pair of lines using differential encoding. Two serial data paths are provided, one for each data direction; that is, a PCI Express card can both read and write data to the bus simultaneously and support full-duplex operation. The two signal paths are collectively called a lane and it is possible to implement multiple lanes. Performance scales linearly with the number of lanes and you can have a x1 bus, a x2 bus, a x4 bus, a x8 bus, and so on. A single lane supports a peak data rate of 250 MB/s in each direction. A 16-lane system using duplex transmission has an effective data rate of 8 GB/s.
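
The lane arithmetic above can be checked with a short calculation. The sketch below takes the 250 MB/s per lane per direction from the text, treats 1 GB as 1000 MB, and uses a function name (`pcie_bandwidth_mb_per_s`) invented for this example.

```python
LANE_MB_PER_S = 250  # peak rate of one lane in one direction, from the text


def pcie_bandwidth_mb_per_s(lanes, duplex=True):
    """Aggregate peak rate, counting both directions when duplex is True."""
    directions = 2 if duplex else 1
    return LANE_MB_PER_S * lanes * directions


for lanes in (1, 2, 4, 8, 16):
    total = pcie_bandwidth_mb_per_s(lanes)
    print(f"x{lanes}: {total} MB/s in both directions combined")
```

For a x16 link the total is 8000 MB/s, which is the 8 GB/s figure quoted above. Note that this figure counts traffic in both directions; the rate in any one direction is half of it.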

Conventionally, information at the electrical level in digital systems is specified with respect to the ground or chassis; that is, a signal at greater than 3.0 V is interpreted as high, and a signal at less than 0.3 V is interpreted as low. PCI Express uses two signal paths to transmit data and the difference between the two conductors contains the information; for example, the signals may be +V, -V or -V, +V. The advantage of differential transmission is that it is more immune to interference (noise and other signals induced by capacitive or inductive coupling). This form of signaling is called LVDS (low-voltage differential signaling). If both conductors of a pair pick up the same interference, it does not affect the information, which is determined by the difference between the two conductors.
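
A small numerical sketch shows why interference that affects both conductors equally leaves the received bit unchanged. The signal swing of 0.35 V and the function names are hypothetical values chosen for the illustration; they are not taken from the text.

```python
def transmit(bit, v=0.35):
    """Return the (positive, negative) conductor voltages for one bit."""
    return (v, -v) if bit else (-v, v)


def add_common_mode(pair, noise):
    """Add the same interference voltage to both conductors."""
    positive, negative = pair
    return (positive + noise, negative + noise)


def receive(positive, negative):
    """Recover the bit from the difference between the two conductors."""
    return 1 if (positive - negative) > 0 else 0


for noise in (0.0, 0.1, 5.0, -2.0):
    recovered = [receive(*add_common_mode(transmit(bit), noise)) for bit in (0, 1)]
    print(f"noise {noise:+.1f} V -> bits recovered: {recovered}")
```

Even when the interference is far larger than the signal swing, the difference between the conductors keeps its sign, so the bits are recovered correctly. A single-ended receiver comparing one conductor against ground would not be so fortunate.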

The encoding of the bit stream across the serial link ensures that a clock signal is embedded in the data stream and the data stream can be used to recover a clock signal. This means that designers do not have to worry about the distribution of clock signals and delays between data and clocks caused by different path lengths in the signals (an important factor when signaling at 2.5 x 10^9 bits/s). The bit encoding is called 8b/10b because each 8-bit byte is transmitted as 10 bits in order to equalize the number of 1s and 0s transmitted and to ensure that a clock signal can be recovered from the data signal.
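
The cost of 8b/10b encoding can be made concrete with a short calculation. The sketch below assumes a line rate of 2.5 x 10^9 bits/s per lane, the figure for first-generation PCI Express, and the function name is made up for this example.

```python
LINE_RATE_BITS_PER_S = 2_500_000_000  # assumed raw signaling rate of one lane


def payload_bytes_per_s(line_rate, data_bits=8, code_bits=10):
    """Useful bytes per second once the encoding overhead is removed."""
    payload_bits = line_rate * data_bits // code_bits
    return payload_bits // 8


rate = payload_bytes_per_s(LINE_RATE_BITS_PER_S)
print(rate // 1_000_000, "MB/s of payload per lane per direction")
```

Only 8 of every 10 transmitted bits carry data, so a 2.5 Gbit/s lane delivers 250 MB/s of payload in each direction.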

8b/10b Encoding

8b/10b encoding is a means of transmitting serial data using 10 bits to carry 8 bits of information. The additional two bits per byte improve the performance of the transmission mechanism. The ten-bit code is constrained to contain five 1s and five 0s, or four 1s and six 0s, or six 1s and four 0s. This ensures that there are no long series of only 1s or only 0s. A mechanism called running disparity is used to ensure that there is an equal number of 1s and 0s on average; this is necessary to ensure that there is no dc component in the signal.
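
The balance constraint and the idea of running disparity can be illustrated without reproducing the full code tables. The following sketch does not encode bytes at all; it only checks that each 10-bit symbol has four, five, or six 1s and accumulates the excess of 1s over 0s along a stream. The function names and the sample bit patterns are chosen for illustration.

```python
def symbol_disparity(symbol):
    """Number of 1s minus number of 0s in a 10-bit symbol."""
    ones = bin(symbol & 0x3FF).count("1")
    return ones - (10 - ones)


def is_allowed(symbol):
    """True if the symbol has five 1s, or four 1s, or six 1s."""
    return symbol_disparity(symbol) in (-2, 0, 2)


def running_disparity(symbols):
    """Cumulative excess of 1s over 0s after each symbol in the stream."""
    total = 0
    history = []
    for symbol in symbols:
        if not is_allowed(symbol):
            raise ValueError(f"unbalanced symbol {symbol:010b}")
        total += symbol_disparity(symbol)
        history.append(total)
    return history


stream = [0b0011111010, 0b1100000101, 0b1010101010]
print(running_disparity(stream))
```

In the sample stream a symbol with six 1s is followed by one with four 1s, so the cumulative disparity returns to zero. An encoder achieves this by choosing between two alternative codes for a byte according to the disparity so far.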

PCIe Data Link Layer

Data transmitted across systems that support layered protocols looks a bit like a Russian doll with multiple layers of encapsulation. At one end of a link, the application takes a dollop of data and wraps it up with some form of ends or delimiters. Then, the application layer hands the package to another layer (e.g., the data link layer) and that layer in turn wraps up the data with its own terminators. The data link layer passes the data to the physical layer and that too adds beginning and end flags.

Figure 12.57 illustrates the concept of encapsulation using a system with three protocol levels or layers. At the sending end, each protocol layer adds a header and a tail to the information passed down from the layer above. At the receiving end, each layer strips its header and tail off before passing the message up to the next level.
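
Encapsulation with three layers can be sketched in a few lines. The layer names follow the description above, but the bracketed header and tail markers are hypothetical placeholders, not real PCIe framing, and the function names are invented for the example.

```python
# (layer name, header, tail), listed from the top layer down to the lowest
LAYERS = [
    ("application", b"[A]", b"[/A]"),
    ("data link", b"[D]", b"[/D]"),
    ("physical", b"[P]", b"[/P]"),
]


def send(message):
    """Wrap the message layer by layer on the way down the stack."""
    frame = message
    for _name, header, tail in LAYERS:
        frame = header + frame + tail
    return frame


def receive(frame):
    """Strip each layer's header and tail on the way back up the stack."""
    for name, header, tail in reversed(LAYERS):
        if not (frame.startswith(header) and frame.endswith(tail)):
            raise ValueError(f"malformed frame at the {name} layer")
        frame = frame[len(header):len(frame) - len(tail)]
    return frame


frame = send(b"data")
print(frame)
print(receive(frame))
```

The outermost wrapper belongs to the lowest layer, so it is the first to be removed at the receiving end.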

Figure 12.58 illustrates the PCIe bus message structure where the elements of a message are shown in blue and the protocol layers in grey. The highest level is the transaction layer that consists of a header and the actual message itself. The header defines the nature of the data message and includes information such as the address of the data (we will look at the header in more detail later). The transaction layer's tail is an error-detecting code, ECRC.

Figure 12.59 gives the general structure of a packet header that consists of 12 or 16 bytes. This structure means that all the hardware overhead associated with conventional buses (arbitration, interrupts, handshaking, etc.) becomes redundant, at the price of increased latency and reduced efficiency due to the data overhead.

The SCSI and SAS Interfaces

One of the earliest external buses designed to link a computer and peripherals is the SCSI bus. At one time it was the preferred bus in professional and high-end systems. Today, it is in decline in the face of very low-cost high-performance buses such as USB and FireWire. The Small Computer System Interface, SCSI, is an 8-bit parallel bus dating back to 1979, when the disk manufacturer Shugart was looking for a universal interface for its family of hard disks. The SCSI bus is a parallel data bus that incorporates an information exchange protocol optimized for the bus's intended use, the linking of disk drives and other storage systems to a host computer.

Figure 12.61 illustrates the concept of the SCSI bus, which was originally called the SASI bus (Shugart Associates Systems Interface). In 1981 Shugart and NCR worked with ANSI to standardize the SCSI bus, which became X3.131-1986 in 1986.

The original SCSI-1 bus operated at 5 MHz, permitting up to seven peripherals to be connected together. A family of SCSI buses with a common architecture and different levels of performance has been developed. The specification was revised in 1991, providing a fast SCSI-2 bus at 10 MHz and a wide bus with a 16-bit data path. Ultra SCSI, or SCSI-3, was the next step with a clock rate of 20 MHz. All SCSI systems support asynchronous data transfers, but SCSI-2 also supports faster synchronous data transfers. For comparison, the USB 3.0 bus provides a theoretical limit of 4.8 Gbps or 600 MB/s.

Version             Width (bits)  Data rate (MHz)  Throughput (MB/s)
SCSI-1              8             5                5
Fast SCSI           8             10               10
Fast Wide SCSI      16            10               20
Ultra SCSI          8             20               20
Wide Ultra SCSI     16            20               40
Ultra-2 SCSI        8             40               40
Wide Ultra-2 SCSI   16            40               80
Ultra-3 SCSI        16            80               160
Ultra 320 SCSI      16            160              320
Ultra 640           16            320              640
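
The throughput column follows from the other two: each transfer moves one bus-width of data, so throughput in MB/s is the width in bytes multiplied by the data rate in MHz. The sketch below checks that relationship for several rows of the table. The function name is invented for the example.

```python
def scsi_throughput_mb_per_s(width_bits, rate_mhz):
    """Peak throughput: one transfer of width_bits per cycle of the data rate."""
    return (width_bits // 8) * rate_mhz


# (version, width in bits, data rate in MHz, quoted throughput in MB/s)
rows = [
    ("SCSI-1", 8, 5, 5),
    ("Fast Wide SCSI", 16, 10, 20),
    ("Wide Ultra SCSI", 16, 20, 40),
    ("Wide Ultra-2 SCSI", 16, 40, 80),
    ("Ultra-3 SCSI", 16, 80, 160),
    ("Ultra 320 SCSI", 16, 160, 320),
    ("Ultra 640", 16, 320, 640),
]

for version, width_bits, rate_mhz, quoted in rows:
    computed = scsi_throughput_mb_per_s(width_bits, rate_mhz)
    print(f"{version}: computed {computed} MB/s, quoted {quoted} MB/s")
```

Doubling the width from 8 to 16 bits doubles the throughput at the same data rate, which is why each "Wide" variant is twice as fast as its narrow counterpart.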
