
Cmpt 250 Input/Output April 1, 2008

The Input/Output Subsystem

❅ So far, we've discussed the processor and the memory hierarchy, and we've looked at how the two interact. Exclude, for a moment, the secondary memory that lies at the bottom of the memory hierarchy.

❆ The time scale is uniformly fast, with transaction times ranging from subnanosecond to a few hundred nanoseconds.

❆ Memory is random access. The time required for transactions is predictable, and we're dealing with one transaction model.

❆ Interaction between the memory and the CPU is highly structured, occurring in a very predictable way as the CPU fetches and executes instructions. It's also synchronised to the CPU clock.

❆ The CPU and the top levels of the memory hierarchy are very tightly integrated. As a user, you cannot swap out one manufacturer's L2 cache and replace it with another.

❆ The distances involved are small. L1 and L2 cache are often fabricated on the same IC as the CPU. At worst, they may be a separate IC within the same hybrid package. Memory is a few inches away, somewhere on the motherboard.

❅ Input-output, on the other hand, is very different.

❆ The data rate varies from gigabits/second (network interfaces, disks) to seconds/bit (mouse or keyboard events). Meeting the bandwidth requirements is challenging for high-speed i/o devices.

❆ The time required for a transaction is not predictable — most i/o devices are not random-access, so the time required will vary depending on the data involved and the response time of the device. Worse, human interaction may be involved.

❆ The occurrence of an i/o event is not predictable. When it occurs, the CPU may need to respond very quickly (milli- or microseconds). Response time (latency) becomes an issue, in addition to throughput.

❆ Some i/o events are just not as important as others. We may well want to postpone handling an i/o event, or suspend the handling of one event while we deal with another that is more important and/or requires a faster response.

❆ Distance is greater, and the variation is greater. Some peripherals will be within a few inches of the CPU, others may be a few feet away.

❆ There’s an enormous variation in available i/o devices, and consumersexpect to be able to change the number and configuration of i/o deviceswith ease.

❅ As we’ll see, this requires some new capabilities and approaches.

❆ To get the attention of the CPU, we’ll introduce the idea of an interrupt(more generally, an exception). This will allow us to force the CPU tosuspend execution of the current instruction stream and divert to an-other instruction stream to deal with the interrupt. We’ll also see howthis can be made invisible to the interrupted process.

❆ To move data between i/o devices and the CPU and memory, we’ll beusing busses.

❆ To deal with the nearly infinite variety of access models embodied in i/odevices, we’ll use a (relatively) limited number of communication con-ventions called bus protocols, embedded in bus standards (PCI, USB,etc.). Manufacturers package their i/o devices with standard interfacesthat conform to some bus standard.

❆ Because of the range of distances involved, and the differing speeds ofattached devices, bus protocols will often use transaction models thatrequire explicit acknowledgement by each party. There will typically beprovision to vary (usually, extend) a transaction in order to allow oneparty time to respond.

❆ To accommodate the huge variation in data rates, some bus standardsare designed to support low-speed devices, others high-speed devices.One class of interface is a device which acts as a time-division multi-plexer, connecting several slow busses to a high-speed bus.

Interfaces

❅ At its most basic, the role of an interface is to transform the raw interface provided by the i/o device into an interface that conforms to the conventions of digital circuits.

❆ Translation between analog signal levels and digital 1's and 0's is a nearly universal function in interfaces.

I/O devices are a mixture of digital and analog electronic components, optical components, and mechanical components.

For example, a request to read a disk block must be translated into a sequence of analog signals that will cause motors to properly position the disk heads. Once the data begins to travel past the heads, the analog signal generated by sensing the magnetic fields on the disk must be translated into 1's and 0's for use by the CPU.

❆ Another nearly universal function is the provision of buffers for data transfer.

The basic function is to change the blocking of the data, but that's not the important idea. By providing buffers for data transfer, the interface can reduce the amount of work the CPU must perform to transfer data. Buffers also give the CPU more freedom to choose when to respond to an i/o event.

For example, many network protocols transmit data bit by bit over a single wire. If a Gigabit Ethernet interface required the CPU to execute instructions to move each bit of data from the interface to memory, no CPU could keep up. Instead, the interface provides a fairly large data buffer. The CPU executes instructions to load data into this buffer and then executes a few more instructions to tell the interface to transmit the data. Dedicated logic within the interface processes the block of data one bit at a time.

❅ Here’s a block diagram for a generic interface.

[Figure: block diagram of a generic interface. On the computer system side: an interface register plus the select, r/w, request, and ready signals. Inside the interface: a digital data buffer, a control register, a status register, signal conversion circuitry, and control logic, joined by an internal interconnection. On the device side: connections for data, control, and status.]

❆ On the device side, there will be connections for data, control, and status signals. Inside the interface, there will be digital logic — individual registers, or larger memory arrays — to hold the data. Signal conversion circuitry handles blocking (parallel/serial conversion, for example) and conversion to/from the signalling conventions used by the device.

Control can be separated into two distinct groups: control signals which instruct the device to perform some action (via the control register) and control signals which are used to control the movement of data, control, and status information between the interface and the device (the signals coming directly from the control logic).

The directions shown (bidirectional for data, output for control, input for status) are common. A pair of unidirectional connections for data input and output is also possible.

❆ On the computer system side, there is a bidirectional connection for data, with a buffer register. There are also connections for control and status, but they're labelled a bit differently.

Control is exercised by using the select and r/w lines to notify the interface that the computer system wishes to read or write information. The select lines also tell the interface which of the internal registers (data buffer, control, or status) should be selected for the transfer.

The request and ready lines communicate interface status to the computer system. Request is used to request service from the computer system. Ready is used to inform the computer system that the interface has acted on a request from the computer system (data is ready, for a read request; data has been accepted, for a write request) and is prepared to accept a new request.

❆ Inside the interface, we have the circuitry necessary to move digital data between the computer system and the device. There is buffering on the computer system side (the interface register) and on the device side (the data buffer and control and status registers). There will be an interconnection structure (a bus is shown, but point-to-point interconnections and multiplexers will work just as well). In addition, there will be control logic to respond to control signals from the computer system, to generate status signals, and to control the movement of data within the interface.

❅ How does this work? What is the generic sequence of events to transfer data between the computer system and the device?

❆ To read data, the computer system will select the interface and ask for the contents of the status register. The interface will transfer the contents of the status register to the interface register and signal to the computer system that the information is ready.

The computer system will read the information from the interface register and examine it. If the status indicates that the device has data ready, the computer system will select the interface and ask it to write to the control register. The data transferred from the computer system to the interface will be the proper control word to cause the device to transfer data to the data buffer.

The interface control logic will coordinate transfer of the control word to the device, and receipt of the data from the device.

When the data is available, the interface will use request and ready to notify the computer system. The computer system will then select the interface and ask for the content of the data buffer. The interface will move data into the interface register, signal that the data is ready, and it will be read by the computer system.

❆ To write data, the sequence is much the same: The computer system will check the status of the device, issue the appropriate control word, and then send data to the interface for transfer to the device.
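
Here is a rough sketch of that read sequence in C, purely as an illustration of the ordering of steps. The bus_read()/bus_write() primitives, the register names, the status bits, and the control word are hypothetical stand-ins for the select, r/w, request, and ready signalling described above; they do not correspond to any real interface.

    /* A minimal sketch of the generic read sequence (not a real driver).
     * All names below are invented for the example.                      */

    enum iface_reg { REG_STATUS, REG_CONTROL, REG_DATA };

    extern unsigned bus_read(int iface, enum iface_reg r);            /* select + r/w = read  */
    extern void     bus_write(int iface, enum iface_reg r, unsigned v); /* select + r/w = write */

    #define STATUS_DEV_READY 0x01u   /* hypothetical 'device is ready' bit      */
    #define STATUS_BUF_FULL  0x02u   /* hypothetical 'data buffer holds data'   */
    #define CMD_FETCH_DATA   0x10u   /* hypothetical 'move data to buffer' word */

    unsigned read_one_unit(int iface)
    {
        /* 1. Read the status register until the device reports it is ready. */
        while ((bus_read(iface, REG_STATUS) & STATUS_DEV_READY) == 0)
            ;

        /* 2. Write the control word that tells the device to move data
         *    into the interface's data buffer.                              */
        bus_write(iface, REG_CONTROL, CMD_FETCH_DATA);

        /* 3. Wait for the interface to signal that the buffer holds data,
         *    then read the data buffer through the interface register.      */
        while ((bus_read(iface, REG_STATUS) & STATUS_BUF_FULL) == 0)
            ;
        return bus_read(iface, REG_DATA);
    }

A write follows the same skeleton, with the data sent to the interface before the control word is issued.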

❅ Typically, an interface is not connected directly to a computer system. Instead, multiple interfaces are connected to a bus.

[Figure: an interface and its device attached to a bus consisting of data, address, and control lines.]

❆ The bus provides data, address (select) and control lines that are connected to all interfaces.

❆ To communicate with an interface, the computer system sets the address lines to the proper value to select the interface, sets the control signals for read or write, and then transfers data using the data lines.

❆ Clearly, in order for this system to work correctly, we need to set things up so that at most one interface recognises any given address. All outputs from an interface must be equipped with tristate buffers so that only the selected interface attempts to assert a value on a given wire in the bus.

Data Transfer Primitives

Let’s get down to details now, and examine some of the primitive operationsinvolved in transferring data over a bus.

❅ The simplest sort of transaction uses a strobe signal to pace the exchange

5

Page 6: The Input/Output SubsystemCmpt 250 Input/Output April 1, 2008 The Input/Output Subsystem So far, we’ve discussed the processor and the memory hierarchy, and we’ve looked at how

Cmpt 250 Data Transfer Primitives April 1, 2008

of data. To keep the explanation simple, let’s assume that we have only twointerfaces. They are connected by a set of bidirectional data lines and twocontrol signals, strobe and r/w.

❅ It’s useful to take a moment and define the roles that an interface can playin an exchange of data.

❆ One way to characterise the role of an interface is as the source ordestination of the transfer. This role is determined by the direction ofdata transfer.

❆ Another way to characterise the role of an interface is as the initiator(master) or responder (slave) in a transaction.

❆ These are independent of one another: When the master specifies aread, the slave is the source and the master is the destination. Whenthe master specifies a write, the master is the source and the slave isthe destination.

Mano sort of obscures this with the notion of ‘destination-initiated transfer’(commonly called a read) and ‘source-initiated transfer’ (commonly called awrite).

❅ So, how do we use the data, strobe, and r/w signals to transfer data? Let’slook at a read first.

[Figure: master (destination) and slave (source) connected by data, strobe, and r/w lines, with a timing diagram showing r/w set for read, strobe asserted, and the data lines driven by the slave while strobe is high.]

In a read operation, the master is the destination and the slave supplies the data. The data lines start in an undefined state, with the (tristate) drivers for both the master and slave in the hi-impedance state. The r/w signal also starts in an unknown state, simply because we don't know how the master handles it when no transfer is in progress.

The master begins the transfer by setting the r/w signal to 1 (read) and asserting the strobe signal.

The slave sees strobe↑ and responds by placing data onto the data lines.

The master waits for a while, latches the data, and then drops strobe to tell the slave that it's latched the data. When the master drops strobe, r/w no longer needs to be valid and can return to an unknown state.

The slave responds to strobe↓ by removing the data from the data lines (by placing its output drivers in the hi-impedance state).

At this point, the two interfaces are ready to begin another transaction.

❅ And now a write.

[Figure: master (source) and slave (destination) connected by data, strobe, and r/w lines, with a timing diagram showing the master driving the data lines and r/w set for write while strobe is asserted.]

In a write operation, the master is the source and must supply the data.

The master begins the transaction by asserting the data onto the data lines, setting r/w to 0 (write), and asserting strobe.

The slave sees strobe↑ and latches the data from the data lines.

After an appropriate amount of time, the master drops strobe. The diagram shows r/w and data returning to their unknown states.

❅ Notice the subtle difference in timing between the read and write operations.

❆ For a read, the master must ensure that r/w is valid while strobe is asserted. Otherwise, the slave could perform the wrong operation.

The data lines become valid only when the slave responds.

❆ For a write, the master must ensure that r/w and data are both valid while strobe is asserted. Otherwise, the slave could perform the wrong operation, or receive incorrect data.

❆ In the more general case where the master must select some interface as the slave for the transaction, the address (select) lines must also be valid before strobe is asserted.
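
As a concrete illustration, here is a sketch of the master's side of both strobed transfers in C. The signal-setting routines (set_rw, set_strobe, drive_data, release_data, sample_data, delay) are hypothetical stand-ins for driving the physical wires; the point is only the ordering of events, not a real implementation.

    /* A sketch of the master's actions in strobed transfers.
     * All primitives below are invented stand-ins for the signals. */

    extern void     set_rw(int value);        /* 1 = read, 0 = write           */
    extern void     set_strobe(int value);
    extern void     drive_data(unsigned v);   /* enable drivers, put v on bus  */
    extern void     release_data(void);       /* drivers to hi-impedance       */
    extern unsigned sample_data(void);        /* latch the current data lines  */
    extern void     delay(void);              /* allow time for a response     */

    unsigned strobe_read(void)
    {
        set_rw(1);                 /* r/w must be valid before strobe rises    */
        set_strobe(1);             /* slave sees strobe rise, drives the data  */
        delay();                   /* wait for the slave's data to be valid    */
        unsigned v = sample_data();
        set_strobe(0);             /* strobe falls: slave releases the data    */
        return v;
    }

    void strobe_write(unsigned v)
    {
        drive_data(v);             /* data and r/w must both be valid ...      */
        set_rw(0);
        set_strobe(1);             /* ... before strobe rises; slave latches   */
        delay();
        set_strobe(0);
        release_data();
    }

Note that nothing in either routine tells the master whether the slave actually did its part, which is exactly the fault discussed next.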

❅ Strobed data transfer, as just described, has one glaring fault: There's no feedback from the slave to the master. The master is simply assuming that the slave has done its part.

❅ A common technique for providing the missing feedback is called the two-line handshake. It looks like this:

[Timing diagram: req (driven by the master) and rply (driven by the slave) rising and falling in an interlocked pattern.]

There's an interlocking pattern. To start the transaction, the master asserts req, and the slave responds by asserting rply. The transaction ends with a similar interlock: req↓, followed by rply↓.

❅ Let's see how this works in the context of read and write operations for our simple pair of interfaces. First, a read operation.

[Figure: master (destination) and slave (source) connected by data, r/w, req, and rply lines, with a timing diagram showing the data from the slave valid between rply↑ and req↓.]

The master initiates the transaction by setting r/w to indicate a read operation and then asserting req.

The slave responds by placing data onto the data lines. To indicate to the master that data is available, the slave asserts the rply signal. This gives a positive indication that the data lines are valid.

When the master sees rply↑, it knows that the data lines are valid and it can latch the data. When this is complete, the master drops req; the r/w control line can return to an unknown state.

When the slave sees req↓, it has a positive indication that the master has received the data. The slave responds by returning the drivers for data to the hi-impedance state and dropping rply.

The fall of rply indicates the end of the transaction. A new transaction can start only after the final rply↓.

❅ Next, a write.

[Figure: master (source) and slave (destination) connected by data, r/w, req, and rply lines, with a timing diagram showing the data from the master valid from before req↑ until after rply↑.]

The master initiates the transaction by placing data on the data lines, setting r/w to indicate a write, and asserting req.

When the slave has completed the actions required to latch the data, it signals the master by asserting rply.

When the master sees rply↑, it has a positive indication that the slave has successfully performed the write operation. In response, it drops req and ceases to assert data on the data lines. The r/w signal is no longer required and can return to an unknown state.

When the slave sees req↓, it drops rply.

As with a read, rply↓ indicates the end of the transaction. A new transaction can start only after the final rply↓.

❅ Again, notice that the timing of the rise and fall of req and rply is slightly different for read and write. The underlying principle is the same, however: Each transition indicates that some set of signals is valid, or some set of operations has been completed, and it is safe for the partner to proceed to the next step.

Taking the write operation as an example:

❆ The master should not raise req until the data and r/w signals have valid values. Otherwise, the slave could perform the wrong operation or latch invalid data.

❆ The slave should not raise rply until it has latched the data. Otherwise, the master could remove the data before the slave has latched it, or change the r/w control signal, causing the slave to perform an incorrect action.

❆ The fall of req indicates that the operation is over, as far as the master is concerned. The r/w signal should remain valid until after req↓, so that the slave does not perform an incorrect operation. The master has some flexibility in terms of the data lines. Once it has seen rply↑, it knows that the slave has no further need for the data. The only real requirement, in this simple example, is that the master cease to assert data on the data lines (by putting its drivers into the hi-impedance state) before initiating a read operation, so that the data lines are available to the slave.

❆ The fall of rply indicates that the operation is over, as far as the slave is concerned, and indicates the completion of the full transaction. The req and rply signals are now back to their initial state and a new transaction can be initiated.
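
Continuing the C sketches from the strobed case, here is the master's side of a handshaked write, again with hypothetical signal primitives; the two busy-wait loops are the two halves of the interlock just listed.

    /* A sketch of the two-line handshake for a write, seen from the
     * master's side.  The primitives are invented stand-ins for the
     * req, rply, r/w, and data wires.                                  */

    extern void set_req(int value);
    extern int  rply_is_asserted(void);
    extern void set_rw(int value);            /* 1 = read, 0 = write    */
    extern void drive_data(unsigned v);
    extern void release_data(void);

    void handshake_write(unsigned v)
    {
        drive_data(v);                 /* data and r/w valid before req rises */
        set_rw(0);
        set_req(1);

        while (!rply_is_asserted())    /* rply rises: slave has latched data  */
            ;

        set_req(0);                    /* req falls: master is finished       */
        release_data();

        while (rply_is_asserted())     /* rply falls: slave is ready for a    */
            ;                          /* new transaction                     */
    }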

❅ In its explanation of a read operation (destination-initiated transfer), the text states that "The destination unit [master] may not make another request until the source unit [slave] has shown its readiness to provide new data by disabling Reply." This is a little bit misleading. It's possible to attach this meaning to rply↓, but not necessary. The fall of rply simply indicates that the slave's interface logic is ready to start a new transaction. The attached device may or may not be ready to respond. If the device is not ready to respond to a new request, all that will happen is that rply↑ will be delayed in the next transaction until the device is ready.

If you think about it for a bit, this is the right thing to do when the interface is attached to a shared bus. After all, the next transaction on the bus may not involve the same interface. A design goal for interface logic is to free the bus for use by other interfaces as quickly as possible.

Bus Structures

Now that we have some of the basics in hand, let's try to place them in the context of a full bus structure.

❅ The purpose of a bus in the context of computer systems is to allow many different entities to communicate. Here's a minimal example: A CPU, a memory, and a disk, all connected by a bus.

[Figure: a CPU, primary memory (DRAM), and a disk, each attached through its own interface to shared data, address, and control lines.]

There are a number of things to point out in this figure.

❆ Everybody gets a bus interface. If the CPU is going to communicate with disk and memory over a bus, it needs bus interface logic. The same can be said for the primary memory.

❆ What roles do the various components play?

In computer systems, the CPU is always a master when it participates in a bus transaction. Similarly, memory is always a slave.

Interfaces for i/o devices can play both roles. The disk interface will play the role of the slave when the CPU is sending it commands to set up a transfer of data into the primary memory. For the actual data transfer, the disk interface will play the role of the master, controlling the bus cycles which move data from the disk interface to memory. (This is known as DMA (direct memory access) i/o. It's commonly used for high-speed devices in order to spare the CPU the work of executing instructions to transfer data at high speed. We'll come back to this later.)

❆ The function of the address lines becomes more clear: If the CPU is the master for a bus transaction, it must have some way to choose one of the memory or the disk interface as the slave. The interface for the memory will recognise a range of addresses that matches the amount of physical memory present in the system. The interface for the disk will recognise a much smaller set of addresses corresponding to the data buffer and control and status registers in the interface.

Note that the CPU need only write to the address lines, because it's always a master when it participates in a bus transaction. Similarly, the memory need only read the address lines. The disk interface must be able to do both.

❆ Both the CPU and the disk interface can take the role of master in a bus transaction. What happens when both of them want control of the bus? We'll need to devise some method of arbitration — a way to select one interface as the master for the next bus cycle. As with DMA, we'll come back to this later.

❆ Since the CPU is always the master when it participates in a bus transaction, and doesn't even monitor the address lines to see if it's selected, we need some other way to request it to participate in a bus transaction with another interface. For this, we'll use interrupts. As with arbitration, we'll come back to this later.

❅ Now that we have a system model, let's consider a real bus — the PCI bus, introduced in the early 1990's. The description here is far from complete, but it should be enough to give you some idea of how a bus works.

❅ The original PCI bus standard described a parallel bus which could transfer up to 32 bits in parallel.

❆ PCI provides the three major signal groups — data, address, and control — but it uses the same wires for address and data in order to reduce the total number of wires required. This is a common technique used in many bus standards. The wires are first used to transmit an address. All interfaces examine the address, and one interface will recognise that it has been selected as the slave for the transaction. Once the interface has indicated that it's selected, the master removes the address and the same wires are used to transmit data.

❆ The PCI bus is a synchronous bus, i.e., a common clock signal is transmitted to all interfaces on the bus. By default, each step in a read or write transaction takes one clock period. However, as we'll see, there are other control signals which are used to control the progress of the bus cycle. This hybrid structure — a common clock and default timing, combined with some way to delay the progress of the bus transaction — is a very common structure.

❆ The PCI bus standard specifies that signal values change on the falling edge of the clock and are checked on the rising edge of the clock. This ensures that changes have time to propagate from one end of the bus to the other.

❅ Here's an example of a read operation on a PCI bus (the figure is adapted from [2, Figure 23.8]). A PCI bus allows the transmission of multiple units of data during a single transaction. For the read transaction pictured here, four units of data are sent from the slave to the master, starting at the initial address.

[Timing diagram: a PCI read transaction over clock cycles 1 through 9, showing the clock, adr/data, frame, cmd/be, irdy, trdy, and devsel signals. The adr/data lines carry the address and then data 0 through data 3, with wait cycles; cmd/be carries the read command and then the byte enables.]

Clock Cycle #1 At clock↓, the bus master places an address on adr/data, places an operation code (in this case, the code for read) on cmd/be, and asserts frame to indicate that the address and command are valid and a new bus transaction has started. (Notice that many PCI bus signals are active low, so asserting the signal means that the value goes to zero.)

Clock Cycle #2 At clock↑, all interfaces check the frame signal for the 1 → 0 transition that marks the start of a bus transaction. This has just occurred, so the interfaces will latch the address and command values. One interface will recognise its own address. At clock↓, this interface will assert devsel to indicate that it is selected.

Recall that the adr/data lines will be used for both address and data. Now that all interfaces have had a chance to latch the address, the bus master will cease to assert it on these wires, and they will be available for data. Data will come from either the master or the slave, depending on whether this is a write or a read operation. Since this is a read, the master device will put its (tristate) drivers into hi-impedance mode, effectively disconnecting from the wires. This frees them to be driven by the slave in subsequent cycles.

Similarly, the cmd/be lines are changed from the operation code to a set of signals which specify which of the four possible adr/data bytes may be used to transmit data. In the case of the cmd/be signals, however, the master device always drives the signals.

Finally, if the master is ready to accept data, it will assert irdy (initiator ready) as shown.

The signals frame and devsel are playing the role of request and reply, respectively, in the start of a two-line handshake sequence. If no interface asserts devsel, the bus master knows that something has gone wrong (incorrect address, interface failure, etc.) and can attempt error recovery.

Clock Cycle #3 At clock↑, the slave will check cmd/be to see which data bytes can be used to transmit data. Assuming that it's ready to supply data, at clock↓ it will drive data onto adr/data and assert trdy to indicate the availability of data.

Clock Cycle #4 The master device will latch the data on adr/data and check trdy at clock↑. Seeing that trdy is asserted, the master will know that it has latched valid data. The default assumption is that the master will consume the data at the first rising edge after the data appears on the bus.

In this example, the slave device is, for some reason, not prepared to supply new data in this clock cycle. It indicates this by returning trdy to the inactive value (remember, this is an active low signal).

As you can see, the signal trdy allows the slave to delay the progress of the transaction. The signal irdy serves the same purpose for the master.

Clock Cycle #5 Because the slave is not asserting trdy at clock↑, the master knows that there is no valid data on the bus.

At clock↓, the slave is again ready to supply data. It places the data on adr/data and asserts trdy.

Clock Cycle #6 At clock↑, the master device latches the data and knows that it's valid because the slave has asserted trdy.

Clock Cycle #7 At clock↑, the master device again latches valid data. This time, it's the master device which is unprepared to accept new data at the next clock↑. It indicates this by returning irdy to the inactive value.

Clock Cycle #8 The slave, seeing that irdy is not asserted at clock↑, maintains the same data on adr/data for another clock period.

The master device has caught up and latched the data, and at clock↓ it again asserts irdy. This is the final item of data that the master device wants to receive in this transaction. It returns frame to the inactive state, indicating that this is the end of the bus transaction.

Clock Cycle #9 In this clock cycle, the master and slave wrap up the transaction, returning all signals to their initial state.

In response to frame↑, the slave removes the final data item from adr/data and returns devsel and trdy to their inactive values. The signals frame and devsel have completed the second part of the two-line handshake which frames the bus transaction.

The master ceases to assert the byte enable signals on cmd/be and returns irdy to the inactive value.

Interrupts

We have several topics pending from the previous section: DMA i/o, bus arbitration, and interrupts. Interrupts will be covered in this section, and then we'll move on to discuss i/o transfer modes, including DMA i/o. As you'll see at a later point, some of the structures used to manage interrupt handling will be equally useful for bus arbitration.

❅ We need a way for an interface to make the CPU aware that it needs attention, and this is one use of interrupts. The text introduces the concept of interrupts in §10.9, but unfortunately doesn't integrate it into any of the processor designs. It's time to correct that oversight.

❅ Interrupts provide a way to suspend the current instruction execution stream and transfer control flow to a new instruction stream in order to deal with an exceptional event.

❆ Not an unanticipated event. A computer cannot respond to a completely unanticipated event. The best we can manage is advance preparation for an event which we know will occur at some unspecified time in the future.

There must be a sequence of instructions somewhere in memory that can be executed in response to this event. This sequence of instructions is commonly called an interrupt handler or interrupt service routine.

There must be provision in the hardware to accept an interrupt request (a signal indicating that the event has occurred) and transfer control flow to the interrupt handler. The hardware actions which do this are commonly called the hardware interrupt response sequence.

In other words, a human has to anticipate that interrupts might be useful and design hardware to accept interrupt requests and transfer control flow to an interrupt handler. A human must also write the interrupt handler and make the necessary arrangements (i.e., initialise the proper locations in memory with code and data) so that the handler will be executed when the hardware interrupt response sequence is triggered by an interrupt request.

❅ The kinds of events we're talking about here are events that are anticipated, but the exact time of occurrence cannot be specified in advance. Interrupts are divided into three broad categories based on origin:

❆ External interrupts: interrupt requests due to exceptional events originating outside the CPU, such as i/o requests or power failure.

❆ Internal interrupts: interrupt requests due to exceptional events triggered by instruction execution: an attempt to execute an illegal instruction, or division by zero.

❆ Software interrupts: an interrupt request that results directly from the execution of a special instruction (e.g., the SWI instruction in the 68HC12).

Internal interrupts are often called exceptions, and the term interrupt is taken to mean an external interrupt.

❅ You may be asking "Isn't a software interrupt sort of a contradiction? After all, interrupts are supposed to happen at unpredictable times. Why would we want to execute an instruction to trigger the interrupt response sequence?"

It turns out that the same steps used to respond to interrupts are an excellent way for a user program to gain access to operating system services. The relevant question is "How does my program know what address to use when it calls an operating system service routine?" The short answer is "It doesn't." Your program executes a software interrupt instruction with a well-defined code that tells the interrupt handler what system service is requested. This is a bit beyond the scope of Cmpt 250; we won't pursue it further. Take an operating systems course to learn more.

❅ The precise details for responding to an interrupt will vary from one CPU architecture to the next, but the minimum set of actions is as follows:

1. Immediately before the CPU fetches a new instruction, it checks an interrupt request signal. If interrupts are enabled and the interrupt request signal is asserted, the hardware will begin the hardware interrupt response sequence of Step 2.

2. The hardware interrupt response sequence will save the current value of the PC to a known location and load the PC with the address of the first instruction in the interrupt service routine. The CPU will then fetch and execute this instruction.

In short, the hardware interrupt response sequence amounts to a call to the interrupt handler. The details of how to obtain the starting address of the interrupt service routine will be part of the CPU's hardware design and the overall computer system design.

3. The interrupt service routine executes its first instruction. Most often, this instruction will disable further interrupts.

Notice that the interrupt service routine is guaranteed to be able to execute this first instruction, because the CPU hardware will not check again for an interrupt request until execution of this instruction is finished.

4. The interrupt service routine will save any CPU state that might be modified while the service routine executes. The most common items to be saved are the values of any CPU registers which will be used by the service routine.

Remember, the interrupted code is not expecting this to happen. We must be able to restore everything exactly as it was when the CPU hardware accepted the interrupt.

5. The interrupt service routine will execute instructions to determine the cause of the interrupt.

In its simplest form, this might be a check of the status register of each i/o interface, looking for an interface that's ready for an i/o operation. (This activity is commonly called polling.) In more sophisticated designs, the hardware will assist with the task of identifying the source of the interrupt.

6. The interrupt service routine will execute instructions to deal with the cause of the interrupt.

7. The interrupt service routine will execute instructions to restore the state that it saved in Step 4.

8. The interrupt service routine will execute a return from interrupt instruction, which will resume execution of the interrupted program.

The basic action required of a return from interrupt instruction is the same as a return from a subroutine — the PC value saved in Step 2 is loaded into the PC and the next instruction is fetched from that address.

At some point in the course of the actions taken in Steps 2 – 5, the interface will realise that its interrupt request has been acknowledged and it will cease to assert its interrupt request.
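
To make the software side of Steps 3 through 8 concrete, here is a sketch of a polling-style interrupt handler in C. Every name here (NUM_IFACES, iface_status(), iface_service(), the status bit) is hypothetical, and the register save/restore and return-from-interrupt of Steps 2, 4, 7, and 8 are normally generated by the compiler's interrupt prologue/epilogue or written in assembly.

    /* A sketch of a polling-style interrupt service routine.
     * All names are invented for the example.                           */

    #define NUM_IFACES          4
    #define STATUS_IRQ_PENDING  0x80u           /* hypothetical status bit */

    extern void     disable_interrupts(void);
    extern void     enable_interrupts(void);
    extern unsigned iface_status(int i);        /* read an interface's status register */
    extern void     iface_service(int i);       /* deal with that interface's request  */

    void interrupt_handler(void)                /* entered by the hardware in Step 2   */
    {
        disable_interrupts();                   /* Step 3                              */

        /* Step 5: poll each interface's status register to find the one(s)
         * requesting service; more sophisticated hardware would identify
         * the source for us instead.                                       */
        for (int i = 0; i < NUM_IFACES; i++) {
            if (iface_status(i) & STATUS_IRQ_PENDING) {
                iface_service(i);               /* Step 6                              */
            }
        }

        enable_interrupts();
        /* Steps 7 and 8: restore the saved state and return from interrupt,
         * normally handled by the ISR epilogue / RTI instruction.            */
    }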

❅ Now that we know the general sequence of events that occurs when the CPU hardware accepts an interrupt request and services the interrupt, let's look at how an interface can signal an interrupt request to the CPU, and how an interrupt service routine can determine which interface is requesting interrupt service.

❅ A typical CPU will offer only a few inputs for external interrupt requests. It's quite common to have just two: a maskable interrupt and a nonmaskable interrupt.

❆ A maskable interrupt can be enabled or disabled by program control (i.e., by executing instructions). Typically, the CPU will provide one or more bits in a special-purpose CPU register for this purpose. (For example, the interrupt mask (I) bit in the CCR of the 68HC12, and the CLI and SEI instructions which clear and set it to enable or disable, respectively, the maskable interrupt.)

❆ A nonmaskable interrupt cannot be disabled by program control. It is used for high-priority events (typically, power failure) which should never be ignored or postponed.

If you think back to the actions involved in responding to an interrupt, the hardware must be designed to disable the nonmaskable interrupt request signal while responding to this type of interrupt. The hardware must disable it in Step 2 and reenable it as part of the execution of the return from interrupt instruction in Step 8.

❆ In the situation where there are only the nonmaskable and maskable interrupt requests, the nonmaskable interrupt has priority over the maskable interrupt. When a processor provides more than two interrupt request lines, there will also be some way to establish their relative priority. The priority may be hardwired, or it may be adjustable under program control by writing to a special-purpose CPU register.

❅ In systems with many i/o devices operating at varying speeds, it's very useful to establish some priority for responding to requests for service. In the case where the CPU provides only one or two interrupt lines, additional logic is necessary. There are three common configurations: daisy-chain priority logic, parallel priority logic, and a hybrid of the two.

❅ Here’s a figure that illustrates the daisy-chain configuration.

[Figure: daisy-chain configuration. The CPU's IntReq input is a single wired-OR line, pulled up to V+ and driven by the open-collector IRQ outputs of the Device 0 through Device k interfaces. The CPU's IntAck output feeds Device 0's In pin, and each interface's Out pin feeds the In pin of the next interface in the chain.]

When an interface requires the attention of the CPU, it asserts the active low signal IRQ. All IRQ signals are connected to a single wire in a configuration called a 'wired-OR'.

The IRQ output uses a special output circuit called an open-collector output. When it's on, it pulls the output low; when it's off, it's in a hi-impedance state. The difference between an open-collector output and a tri-state output is that an open-collector output has no ability to assert a high output. When no IRQ output is asserted, the resistor pulls the signal value up to a high (inactive) value.

This configuration is called a wired-OR because it performs a logical OR function for active-low signals. When any of the connected signals (in this case, the IRQ outputs of the interfaces) are asserted (active low), the resulting signal (in this case, IntReq) is asserted (active low).

❅ And here's a figure that illustrates the logic used to generate an interrupt request and capture or propagate the acknowledgement.

[Figure: one interface's interrupt request logic. An internal IRQ signal, clocked into a D flip-flop by the internal clock, drives the open-collector IRQ output. The IntAck daisy-chain In signal is either propagated to the Out pin (when the flip-flop is clear) or captured as an internal IntAck (when the flip-flop is set). Signals labelled 'internal' are internal to the device interface.]

Signals labelled ‘internal’ are produced by the interface logic.

When the interface does not require service, the internal interrupt request signal IRQ is not asserted, hence the FF is set to 0 and the external IRQ signal is not asserted. If an IntAck signal arrives at the In input, it will be propagated to the Out output and passed along to the next interface in the chain.

When the interface needs service, it asserts the internal IRQ signal. This will set the FF and assert IRQ.

After a time, the CPU will respond to the interrupt and assert IntAck. When the IntAck signal arrives at the In input, it will not be propagated to the daisy chain Out output, and the internal IntAck signal will be asserted.

The interface and the CPU will now begin to interact over the system bus. When the CPU determines the interface that has captured the IntAck signal, it will drop IntAck to the inactive state. This same interaction will allow the interface logic to recognise that its interrupt is now being serviced. When it sees the fall of IntAck, it will drop the internal IRQ signal. This will cause the FF to be set to 0. The external IRQ will no longer be asserted, and the interface is once again in a state where an arriving IntAck signal will be propagated to the next interface in the chain.

Other designs are certainly possible for this function; the details will depend on system conventions for the signals used to request and acknowledge interrupts.
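
As a way of summarising the capture-or-propagate behaviour, here is a small C model of one daisy-chain stage. The real mechanism is the flip-flop and gates in the figure, not software; the function simply mirrors that logic, and the field names are invented for the example.

    /* A software model (not hardware) of one daisy-chain stage:
     * a requesting stage captures IntAck, a non-requesting stage
     * passes it along to the next stage.                          */

    struct chain_stage {
        int requesting;      /* 1 if this interface has asserted its internal IRQ */
        int internal_ack;    /* asserted when this stage captures IntAck          */
    };

    /* Returns the value the stage places on its Out pin, given its In pin. */
    int daisy_stage(struct chain_stage *s, int ack_in)
    {
        if (s->requesting) {
            s->internal_ack = ack_in;   /* capture the acknowledgement       */
            return 0;                   /* do not pass IntAck down the chain */
        }
        s->internal_ack = 0;
        return ack_in;                  /* propagate IntAck to the next stage */
    }

Chaining stages this way is what gives the configuration its priority: when several interfaces are requesting at once, the requesting interface electrically closest to the CPU captures IntAck first.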

"The interface and the CPU will now begin to interact over the system bus." is a bit vague. Here are two possible scenarios for how this interaction might work:

❆ After the CPU asserts the IntAck signal, it begins a special interrupt response bus transaction to read information from the interface. The interface which has captured IntAck is implicitly selected as the target interface. When the interface participates in this special bus transaction, it knows that its interrupt is being serviced. As part of this interrupt response bus transaction, the interface may supply additional information to the CPU to aid it in locating the starting address of the service routine.

❆ After the CPU asserts the IntAck signal, it begins to execute a generic interrupt service routine. This service routine polls the interfaces on the bus. The interface which has captured IntAck will set a bit in its status register to indicate that it has captured IntAck and should receive service at this time. When the polling routine sees this bit set in the interface's status register, it knows that it's found the right interface.

❅ When the CPU offers multiple interrupt request and acknowledgement signals, these signals will be used as inputs to priority logic which selects the highest priority interrupt for service. The text calls this configuration the parallel priority interrupt method. Typical logic for this function is shown in the following figure.

[Figure: parallel priority interrupt logic. IRQ0 through IRQ3 from the interfaces feed a priority encoder whose 'active' output drives IRQ to the CPU's interrupt response logic and whose 2-bit code output supplies IVec(1:0). The CPU's IntAck enables a decoder which, using the same 2-bit code, asserts one of IntAck0 through IntAck3 back to the interfaces.]

When no interrupt requests are pending, the priority encoder's active output is inactive, as is IRQ. When one or more requests is pending, IRQ is asserted and the binary code corresponding to the highest priority request is available on IVec(1:0). The IRQ signal indicates that an interrupt is pending. The value of IVec(1:0) can be used to quickly select the starting address of the proper interrupt service routine.

When the CPU decides to respond to an interrupt, it will assert IntAck. This will enable the decoder outputs, and the output corresponding to the value of IVec(1:0) will be asserted, acknowledging the interrupt request.
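
A small C model of the encoder and decoder may help to fix the idea. It assumes, arbitrarily, that IRQ0 is the highest-priority line; that ordering is a property of the particular encoder chosen, not something the notes specify.

    /* A software model of the parallel priority logic in the figure.
     * irq is a 4-bit mask of pending requests (bit i corresponds to IRQi). */

    /* Priority encoder: returns -1 when nothing is pending ('active' stays
     * inactive), otherwise the 2-bit code of the highest priority request. */
    int priority_encode(unsigned irq)
    {
        for (int line = 0; line < 4; line++)
            if (irq & (1u << line))
                return line;            /* highest priority pending request */
        return -1;
    }

    /* Decoder: when the CPU asserts IntAck, exactly one IntAck[i] output is
     * asserted, selected by the 2-bit code (IVec).                          */
    void decode_ack(int int_ack, int code, int int_ack_out[4])
    {
        for (int i = 0; i < 4; i++)
            int_ack_out[i] = (int_ack && i == code);
    }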

❅ When the CPU offers only one or two interrupt request and acknowledgement signals, priority interrupt arbitration logic can be constructed external to the CPU in a dedicated interrupt controller. This very common configuration is illustrated in the text in Figure 12-17. The text figure includes an interrupt mask register to disable some or all interrupt requests, and expands on the notion of using the value of IVec(1:0) to construct the address of the interrupt service routine (VAD, in the figure). One can picture this logic as yet another device attached to the system bus. The CPU can write to the mask register and read the VAD register.

❅ The most general interrupt request/acknowledge configuration is a hybrid of the parallel priority and daisy-chain methods. A daisy-chain structure is attached to each pair of parallel priority request/acknowledge signals.

I/O Transfer Modes

Now we’ve laid the necessary groundwork to describe the three modes used fori/o in computer systems.

Keep firmly in mind that modern CPUs perform memory-mapped i/o using thesame load and store instructions used to access memory. A range of addresses isassigned to i/o interfaces. To use the Mano pipelined RISC as an example, whenthe CPU wants to read data from memory, it will execute a load (LD) instruction.When the CPU wants to read data from an i/o device, it will execute a load (LD)instruction. The only difference between the two is the address.
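
In C terms, memory-mapped i/o looks like the sketch below. The two addresses are invented for the example; the essential point is that both dereferences compile to the same kind of load, and volatile keeps the compiler from caching or reordering the device access.

    /* Illustration only: the addresses are made up, and a real system's
     * memory map would define where DRAM ends and i/o registers begin.  */

    #include <stdint.h>

    #define DRAM_WORD  ((volatile uint32_t *)0x00001000u)  /* ordinary memory    */
    #define DEV_DATA   ((volatile uint32_t *)0xFFFF0004u)  /* interface register */

    uint32_t read_both(void)
    {
        uint32_t from_memory = *DRAM_WORD;   /* compiles to a load (LD)        */
        uint32_t from_device = *DEV_DATA;    /* also a load; only the address  */
                                             /* is different                   */
        return from_memory + from_device;
    }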

❅ Most often, the goal of an i/o operation is to transfer a block of data between memory and an i/o device.

❆ There may be an initial exchange between the CPU and an interface as the CPU gives the interface the details of the data transfer to be performed.

❆ There may be a final exchange between the CPU and the interface to wrap up the data transfer and return the interface to the idle state.

❆ For a small set of i/o devices, transfer of one or a few bytes of data directly to a CPU register is all that's required (reading a real-time clock, for example).

❅ Given that we're interested in moving data between an i/o device and memory, if the CPU is directly involved in moving bytes of data it's acting as an intermediary. For a read, data will move from the i/o device interface to a CPU register and then to memory. For a write, the data flows from memory through a CPU register to the i/o device interface. The CPU must execute load and store instructions to move each unit of data.

❆ This is immediately obvious for a RISC instruction set, where the CPU must execute a load instruction followed by a store instruction to transfer data in either direction.

❆ This is less obvious when the CPU supports a CISC instruction set with addressing modes that allow the specification of operands in memory, but it remains true. Even if the instruction set provides an instruction which appears to allow you to specify direct movement of data from one memory location to another, the data will be fetched to the CPU, held in a temporary register, and then transferred to the destination.

❆ You can see that this must be true if you think for a moment about bus cycles. We can specify exactly one address per bus cycle — either a source or a destination — to select the slave for the transaction. The other participant is the master.

❅ Add to the above another consideration: Unlike primary memory, an i/o device is not always ready to transfer data. In addition to executing instructions to move the data between the device's interface and memory, the CPU will need to execute instructions to determine if the i/o device is ready to read or write data.

❅ With this bit of analysis, we can introduce the three modes used for i/o operations:

❆ When performing program-controlled i/o, the CPU executes instructions to determine if the device is ready for a data transfer, and then executes additional instructions to move the data between the device and memory.

❆ When performing interrupt-initiated i/o, the CPU assumes that the interface will produce an interrupt request when the device is ready to transfer data. The CPU will respond to the interrupt and execute instructions to move the data between the device and memory.

❆ When performing direct memory access (DMA) i/o, the CPU executes instructions to tell the interface the details of the transfer. Then, while the CPU performs other work, the data transfer is handled by the device interface (which must have the intelligence to act as the bus master during the transfer). When the transfer is finished, the interface interrupts the CPU and the CPU will execute any instructions necessary to conclude the data transfer.

❅ The goal, of course, is to perform the i/o operation in a cost-effective manner.

❆ At one extreme, program-controlled i/o requires very little hardware or software support, but the CPU will spend a lot of time executing instructions in support of i/o.

❆ At the other extreme, DMA i/o relieves the CPU of all but the essential activity of specifying the i/o operation, but it requires more capable device interfaces.

❆ In the middle, interrupt-driven i/o allows the CPU to do something useful while it's waiting for the device to become ready for a data transfer.

❅ Consider the flowchart of Figure 12-14 in the text, which specifies the actions required for program-controlled i/o. The CPU must execute instructions to poll the interface, followed by instructions to move the data. Here's one possible assembly language sequence to read data from an interface:

    ; R13: destination address of data in memory
    ; R14: number of units of data to be transferred
    ; R15: address of interface status register
    ; R16: address of interface data register
    ; R17: mask to isolate 'ready' bit in status

    poll   LD   R1, R15       ; load status from interface
           AND  R2, R1, R17   ; isolate 'ready' bit in status
           BZ   poll          ; if not set, device not ready
           LD   R1, R16       ; load data from interface data register
           ST   R13, R1       ; store data to memory
           ADI  R13, R13, 1   ; increment data destination pointer
           ADI  R14, R14, -1  ; decrement data count
           BNZ  poll          ; repeat until done

At first glance, the CPU must execute eight instructions for each unit of data read from the interface and written to memory.
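
For comparison, here is the same loop rendered in C, in the memory-mapped style sketched earlier; the register addresses and the ready-bit mask are made up for the example. Per unit transferred it does the same work as the assembly: at least one status poll, one load from the data register, and one store to memory.

    /* Program-controlled read of 'count' units into memory at 'dst'.
     * The register addresses and READY_BIT are hypothetical.           */

    #include <stddef.h>
    #include <stdint.h>

    #define IF_STATUS  ((volatile uint32_t *)0xFFFF0000u)   /* status register */
    #define IF_DATA    ((volatile uint32_t *)0xFFFF0004u)   /* data register   */
    #define READY_BIT  0x01u                                /* 'ready' mask    */

    void programmed_read(uint32_t *dst, size_t count)
    {
        while (count-- > 0) {
            while ((*IF_STATUS & READY_BIT) == 0)   /* poll until the device    */
                ;                                   /* is ready                 */
            *dst++ = *IF_DATA;                      /* load data, store to mem  */
        }
    }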

❅ What will we actually achieve for a slow device (a keyboard, for example)? Assume the Mano pipelined CPU with data forwarding and branch 'prediction' (which always predicts that the branch will not be taken). Assume further a 1 GHz clock frequency, so that one instruction completes execution or one bubble leaves the pipeline every nanosecond.

❆ First, the polling loop contains a control hazard: We'll end up with two bubbles each time it executes and branches back to repeat the poll (i.e., branch 'prediction' fails). If that were the worst of our problems, it would take 5 ns. (five clock periods) to poll the device.

But . . . we can't cache the interface status! (Why?) The absolute best we can hope for is that we can read the interface's status register in the same amount of time it takes us to read primary memory — a few hundred clock periods. Let's use Mano's (optimistic) figure: 100 ns. (100 clock periods) to execute the polling instructions, including the time to access the interface's status register.

This line of reasoning, extended to the rest of the data transfer loop, says that we can assume another 200 ns. to transfer a unit of data once the device indicates it's ready.

❆ A very fast typist can type perhaps 120 words per minute. With an average 7 – 8 characters per word, we have around 15 keystrokes per second. Suppose our process polls the keyboard status register 20 times per second to be sure that no keystroke is missed. How much of the CPU's capacity is the process using? It will need 20 × 100 = 2000 clock periods for polling each second, and there are 10⁹ clock periods available, so polling uses only (2 × 10³)/(1 × 10⁹) = 2 × 10⁻⁶ of the CPU's capacity — 0.0002%.

❆ Clearly the CPU will have no trouble keeping up. But that’s not really the problem. How will the process know to poll once every 50 ms., at just the right moment?

If the process simply executes the polling loop, it will waste a huge amount of time polling (49,998 µs. in every 50,000 µs., by our estimate). And there’s still no guarantee (in a multiprocessing environment) that the deadline will be met, because there’s no guarantee that the process will execute at least once every 50 ms.

For slow devices, the message is clear: We need to find some way to avoid polling.

❅ Before we leave this example, let’s consider how program-controlled i/o will perform when we’re dealing with fast devices.

❆ Suppose we have a Gigabit Ethernet interface. Let’s say that the interface is capable of receiving a byte every 10 ns. and transfers data on the system bus in units of four-byte words. We have 40 ns. to perform a transfer! In this case, program-controlled i/o cannot keep up.

❆ Suppose the i/o device is a disk drive. The data transfer rate once the seek is completed is around 3 MB/sec. If the interface were designed to transfer four-byte data words to the CPU as the data is read from the disk, the required transfer rate would be about (3 MB/sec.)/(4 bytes) = 750,000 transfers per second, or around 1333 ns. per transfer.

In this case, program-controlled i/o, at 300 ns. for a status check and data transfer, can keep up. But the process will need to poll at least 750,000 times per second to make sure no data is dropped. Then (100 ns.)(750 × 10³ polls/sec.)/(10⁹ ns./sec.) = 7.5% of the CPU’s time is required for polling! (Again, under the unrealistic assumption that we can poll just once, at exactly the right time, in every 1333 ns. interval.) This really isn’t acceptable, and we’ve been very generous in our assumptions.

❆ How have we been generous? Well, no self-respecting multiprocessing operating system will let a user process get anywhere near the hardware. So our polling loop really involves a call to the operating system, asking it to read the interface status or data register and return the result. A more realistic estimate of the time involved would be in the neighbourhood of 500 – 1000 ns. per poll. Now that ‘unacceptable’ 7.5% becomes a really unacceptable 40% – 75% of the available CPU cycles!

Keep in mind that other processes will also need to execute. There’s no guarantee that our process will be allowed to execute often enough to maintain a high polling rate.
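To keep the arithmetic honest, here is a small sketch that recomputes the polling overhead figures used above. The inputs (100 ns. per poll, 20 polls per second for the keyboard, 750,000 polls per second for the disk, and 500 – 1000 ns. per poll when an operating-system call is involved) are the assumptions made in these notes, not measurements.

#include <stdio.h>

/* Fraction of a 1 GHz CPU consumed by polling:
   (polls per second * ns per poll) / 1e9 ns per second. */
static double polling_overhead(double polls_per_sec, double ns_per_poll)
{
    return (polls_per_sec * ns_per_poll) / 1e9;
}

int main(void)
{
    printf("keyboard, 20 polls/s:       %.4f%%\n",
           100.0 * polling_overhead(20.0, 100.0));
    printf("disk, 750,000 polls/s:      %.1f%%\n",
           100.0 * polling_overhead(750e3, 100.0));
    printf("disk, polling via OS calls: %.1f%% - %.1f%%\n",
           100.0 * polling_overhead(750e3, 500.0),
           100.0 * polling_overhead(750e3, 1000.0));
    return 0;
}

The last line prints 37.5% and 75.0%, the range rounded above to 40% – 75%.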

❅ The bottom line is that program-controlled i/o is appropriate only in two very dissimilar circumstances:

❆ First, in situations where the timing requirements are forgiving. The device in question is slow, data transfer is well-buffered, it won’t matter if we drop the occasional unit of data, etc.

❆ Second, in situations where the timing requirements are so stringent as to tolerate absolutely no variation. Everything is under the explicit control of a single process. No operating system, no virtual memory or memory hierarchy, no interrupts, nothing that could introduce variation in the response time for critical events. This case is very rare. Unless you find yourself designing fly-by-wire avionics software or critical real-time process control software, it’s unlikely you’ll ever encounter a situation where this style of system design is appropriate.

❅ Can we do better? Surely. We can use interrupts to avoid polling: the interrupt-driven i/o mode. As explained in the previous section, interrupts provide a way for an interface to ask the CPU to set aside the instruction stream that’s currently executing and execute an interrupt handler to deal with the i/o request.

❆ There’s really not much to say here. Assuming the computer system is equipped with the interrupt request and response machinery described in the previous section, all that’s needed is the code for the service routine. And that code will look a lot like the snippet that we used to illustrate program-controlled i/o.
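As a rough illustration only, a service routine for a simple character device might look like the C sketch below. The register address is the same hypothetical one used in the polling sketch, the buffer handling is deliberately simplistic, and how the routine gets registered with the interrupt machinery (and how the interrupt is acknowledged) depends entirely on the particular CPU and operating system.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical memory-mapped data register, as in the polling sketch. */
#define DATA_ADDR ((volatile uint32_t *)0xFFFF0004u)

#define BUF_SIZE 256
static volatile uint32_t buffer[BUF_SIZE];  /* filled one word per interrupt */
static volatile size_t   next = 0;

/* Interrupt service routine: runs only when the interface asserts its
   request line, so the CPU never has to poll.  We assume that reading
   the data register also clears the interface's pending request.       */
void device_isr(void)
{
    uint32_t word = *DATA_ADDR;
    if (next < BUF_SIZE)
        buffer[next++] = word;
    /* on a real system, signal end-of-interrupt to the controller here */
}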

❆ Really, the big improvement in moving to interrupt-driven i/o is that the CPU no longer has to worry about polling often enough, or about wasting huge amounts of time polling too often. The interface will make an interrupt request whenever it requires service.

❆ The important system design consideration is to make sure that the priority assigned to the various i/o devices will ensure timely service for all devices. In particular, we want to make sure that faster devices can postpone, or even interrupt, the service of slower devices.


❅ Can we do better? Surely. With interrupt-driven i/o, the CPU isn’t wasting time checking to see if an interface is ready to transfer data. But it is still executing instructions to perform the actual data transfer.

❆ As we saw from the analysis of the code snippet for program-controlled i/o, when a very fast device is involved the CPU simply cannot execute instructions fast enough. For not-quite-so-fast devices like a disk, the CPU might be able to keep up, but it would not be able to do anything else. To the user, the computer system would appear to freeze while it transferred a disk block or received a packet from the network.

❆ Direct Memory Access (DMA) i/o gives us the means to free the CPU from executing instructions to move the data.

❆ As usual, there’s a cost: The device interface must be designed with the capability to act as bus master for a bus cycle, and it must be able to keep track of the state of a data transaction. The cost is not that great, and the benefits are huge, so modern interfaces for fast devices will provide DMA capability.

❆ There’s one more thing that must be in place for this to work: The bus protocol must provide for multiple bus masters and arbitration for the right to be bus master on the next bus cycle. Again, modern bus protocols will provide this support.

❅ Here’s a diagram that adds DMA capability to the generic interface presented at the beginning of these notes.


[Figure: a DMA-capable interface. The generic interface from the earlier diagram (digital data buffer, control register, status register, interface register, signal conversion circuitry, control logic, and the select, command, request, and ready lines) is extended with an address register, a data count register, interrupt request/acknowledge lines (IntReq, IntAck), and bus request/grant lines (BusReq, BusGrant), connected to the system bus (address, data, and control lines) on one side and to the device on the other.]

❆ An address register is necessary so that the interface can place an address on the address lines to select the slave interface when it runs a bus transaction. A data count register is necessary so that the interface can keep track of the number of bytes of data transferred.

❆ In order to be a bus master, the interface must request the right to run a bus cycle. The bus request (BusReq) and bus grant (BusGrant) signals are used for this purpose.

The same logic structures (daisy-chain and parallel priority) used to arbitrate interrupt request and response can be used for bus request and response. A bus arbitration module will exist somewhere in the computer system’s i/o subsystem to receive bus requests and issue bus grants.
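As a concrete (if purely conceptual) illustration of parallel-priority arbitration, the grant decision amounts to a priority encoder over the pending requests. In hardware this is combinational logic; the C sketch below just models the decision, with bit 0 taken, by assumption, as the highest-priority master.

#include <stdint.h>

/* One request bit per potential bus master; returns a word with at most
   one bit set: the grant for the highest-priority requester (lowest bit),
   or 0 if nobody is requesting the bus.                                   */
static uint32_t bus_grant(uint32_t requests)
{
    return requests & (0u - requests);   /* isolate the lowest set bit */
}

For example, bus_grant(0x6u) returns 0x2: master 1 wins over master 2 when both are requesting.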


❆ As discussed, in order to truly offload work from the CPU, the interface will need the ability to interrupt the CPU, hence it must have interrupt request (IntReq) and interrupt acknowledge (IntAck) signals.

❆ Finally, the control logic for the interface must be augmented with the capability to produce the signal sequences required to run a bus cycle. As we’ve hinted, modern bus protocols will have more than just read and write transactions, so the r/w input has changed to a multibit command input.
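Purely as an illustration of what a multibit command field might encode (the actual encodings vary from one bus protocol to another), it could distinguish single transfers from burst (block) transfers:

/* Hypothetical encoding of a multibit bus command field. */
enum bus_command {
    BUS_CMD_READ        = 0,  /* single-word read                    */
    BUS_CMD_WRITE       = 1,  /* single-word write                   */
    BUS_CMD_READ_BURST  = 2,  /* block read, e.g. a whole cache line */
    BUS_CMD_WRITE_BURST = 3   /* block write                         */
};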

❅ Together, the data count and address registers represent the minimal state required to transfer a block of data. The initial address is loaded into the address register and the data count is loaded into the data count register. With each unit of data transferred over the bus, the data count is decremented and the address is incremented. When the data count reaches zero, the block transfer is complete.
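The following C sketch models that per-transfer behaviour for a device-to-memory transfer. It is only a software stand-in for the interface’s control logic (each call corresponds to one bus cycle), with memory modelled as an ordinary array.

#include <stdint.h>

/* The minimal DMA state held inside the interface. */
struct dma_state {
    uint32_t addr;    /* next memory location to use     */
    uint32_t count;   /* units of data still to transfer */
};

/* One step per unit of data: run one bus cycle (modelled as a store into
   'memory'), then increment the address and decrement the count.
   Returns nonzero when the count reaches zero, i.e. the block is done.   */
static int dma_step(struct dma_state *s, uint32_t *memory, uint32_t data)
{
    memory[s->addr] = data;
    s->addr  += 1;
    s->count -= 1;
    return s->count == 0;
}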

❅ Note that reading or writing the interface’s address and data count registers is performed using standard bus cycles, just as for any other register within the interface. To load the interface’s address register, the CPU will place an address on the address lines of the bus to select the interface’s internal address register as the destination. It will place the value of the starting address for the data transfer on the data lines of the bus and perform a write transaction to the interface. The interface will latch the starting address in its (data) interface register, and then transfer the value over the interface’s internal bus to the internal address register.
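From the CPU’s point of view, then, setting up a transfer is just a handful of ordinary stores to the interface’s registers. The sketch below uses entirely hypothetical register addresses and a hypothetical ‘start’ bit in the control register; completion would typically be reported back to the CPU by an interrupt.

#include <stdint.h>

/* Hypothetical addresses of the DMA-capable interface's registers. */
#define DMA_ADDR_REG   ((volatile uint32_t *)0xFFFF1000u)
#define DMA_COUNT_REG  ((volatile uint32_t *)0xFFFF1004u)
#define DMA_CTRL_REG   ((volatile uint32_t *)0xFFFF1008u)
#define DMA_CTRL_START 0x00000001u           /* hypothetical 'go' bit */

/* Each assignment below is one ordinary bus write to a register inside
   the interface, exactly as for any other device register.             */
static void start_dma_read(uint32_t dest_addr, uint32_t word_count)
{
    *DMA_ADDR_REG  = dest_addr;      /* starting memory address     */
    *DMA_COUNT_REG = word_count;     /* number of units to transfer */
    *DMA_CTRL_REG  = DMA_CTRL_START; /* tell the interface to begin */
}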

❅ To finish off the description of DMA i/o, a few words about §12-7 in the text. Portions of the text are old. Mano has been writing hardware design texts since the early 1970s, and he’s been recycling material for nearly that long. Section 12-7 describes a shared DMA controller configuration, something that’s been obsolete for about 10 years.

❆ In this type of configuration, the DMA controller takes over from the CPU and runs non-standard bus cycles to transfer data between two dumb (non-DMA-capable) interfaces. It’s assumed that the CPU and the DMA controller are the only devices competing to be bus master.

❆ As shown in Figure 12-20, there is direct communication between the i/o interface (I/O peripheral device) and the DMA controller. To request a data transfer, the interface asserts the DMA request signal. The DMA controller then requests the CPU to relinquish the bus via the bus request (BR) signal.


❆ When the CPU returns the bus grant (BG) signal, the DMA controller notifies the device with the DMA acknowledge signal. An odd sort of cooperative bus transaction follows.

Suppose that data is being transferred from the peripheral device to the memory. The DMA controller is in charge of the address and control lines of the system bus. It will set the bus address lines to the memory address where the data is to be stored, and set the bus operation to be a write. The interface for the peripheral is responsible for monitoring the system bus control signals and the DMA acknowledge signal and is expected to place the data on the system bus data lines at the appropriate moment in the bus cycle.

❆ For a transfer from memory to peripheral, the sequence of events is pretty much the same, except that the DMA controller will signal a read operation and the peripheral’s interface is responsible for latching the data.

References

[1] M. Mano and C. Kime. Logic and Computer Design Fundamentals. Prentice Hall, Upper Saddle River, New Jersey, 4th edition, 2008.

[2] B. Parhami. Computer Architecture: From Microprocessors to Supercomputers. Oxford University Press, New York, New York, 2005.
