View
214
Download
1
Embed Size (px)
Citation preview
Interfacing Processors and Peripherals
Andreas KlappeneckerCPSC321 Computer
Architecture
Collection of I/O Devices
Communication between I/O devices, processor and
memory use protocols on the bus and interrupts
Mainmemory
I/Ocontroller
I/Ocontroller
I/Ocontroller
Disk Graphicsoutput
Network
Memory– I/O bus
Processor
Cache
Interrupts
Disk
Impact of I/O on PerformanceA benchmark executes in 100 seconds 90 seconds CPU time 10 seconds I/O timeIf CPU improves 50% per year for next 5 years,
howmuch faster does the benchmark run in 5 years?
90/(1.5)5 = 90/7.59 = 11.851
I/O Devices
Very diverse devices— behavior (i.e., input vs. output)— partner (who is at the other end?)— data rate
Device Behavior Partner Data rate (KB/sec)Keyboard input human 0.01Mouse input human 0.02Voice input input human 0.02Scanner input human 400.00Voice output output human 0.60Line printer output human 1.00Laser printer output human 200.00Graphics display output human 60,000.00Modem input or output machine 2.00-8.00Network/LAN input or output machine 500.00-6000.00Floppy disk storage machine 100.00Optical disk storage machine 1000.00Magnetic tape storage machine 2000.00Magnetic disk storage machine 2000.00-10,000.00
Communicating with Processor
Polling simple I/O device puts information in a status register processor retrieves information check the status periodically
Interrupt driven I/O device notifies processor that it has completed some
operation by causing an interrupt similar to exception, except that it is asynchronous processor must be notified of the device csng interrupt interrupts must be prioritized
I/O Example: Disk Drives
To access data:— seek: position head over the proper track (8 to 20 ms. avg.)— rotational latency: wait for desired sector (.5 / RPM)— transfer: grab the data (one or more sectors) 2 to 15 MB/sec
Platter
Track
Platters
Sectors
Tracks
I/O Example: Buses
Shared communication link (one or more wires) Difficult design:
— may be bottleneck— tradeoffs (buffers for higher bandwidth increases latency)— support for many different devices— cost
Types of buses:— processor-memory (short high speed, custom design)— backplane (high speed, often standardized, e.g., PCI)— I/O (lengthy, different devices, standardized, e.g., SCSI)
Synchronous vs. Asynchronous— use a clock and a synchronous protocol,
fast and small, but every device must operate at same rate and clock skew requires the bus to be short
— don’t use a clock - use handshaking instead
Asynchronous Handshake Protocol
ReadyReq: Indicates a read request from memory
DataRdy: Indicates that data word is now ready on data lines
Ack: Used to acknowledge the ReadyReq or DataRdy signal of the other party
DataRdy
Ack
Data
ReadReq 13
4
57
642 2
Asynchronous Handshake Protocol
1. Memory sees ReadReq, reads address from data bus, raises Ack2. I/O device sees Ack high, releases ReadReq and data lines3. Memory sees ReadReq low, drops Ack to acknowledge ReadReq4. When memory has data ready, it places data from the read request on the data
lines and raises DataRdy5. I/O devices sees DataRdy, reads data from the bus, signals that it has the data
by raising Ack6. Memory sees the Ack signal, drops DataRdy, releases datalines 7. If DataRdy goes low, the I/O device drops Ack to indicate that transmission is
over
DataRdy
Ack
Data
ReadReq 13
4
57
642 2
Synchronous vs. Asynchronous Buses
Compare max. bandwidth for a synchronous bus and an asynchronous bus
Synchronous bus has clock cycle time of 50 ns each transmission takes 1 clock cycle
Asynchronous bus requires 40 ns per handshake
Find bandwidth for each bus when performing one-word reads from a 200ns memory
Synchronous Bus
1. Send address to memory: 50 ns2. Read memory: 200 ns3. Send data to device: 50ns4. Total: 300 ns5. Max. bandwidth:
1. 4 bytes/300ns = 13.3 MB/second
Asynchronous Bus Apparently much slower because each step of the
protocol takes 40 ns and memory access 200 ns Notice that several steps are overlapped with
memory access time Memory receives address at step 1 does not need to put address until step 5 steps 2,3,4 can overlap with memory access
Step 1: 40 ns Step 2,3,4: max(3 x 40ns =120ns, 200 ns) Steps 5,6,7: 3x40ns = 120ns Total time 360ns max. bandwidth 4bytes/360ns=11.1MB/second
Other important issues
Bus Arbitration:— daisy chain arbitration (not very fair)— centralized arbitration (requires an arbiter), e.g., PCI— self selection, e.g., NuBus used in Macintosh— collision detection, e.g., Ethernet
Operating system:— polling, interrupts, DMA
Performance Analysis techniques:— queuing theory— simulation— analysis, i.e., find the weakest link (see “I/O System
Design”)
Overhead of Polling
Overhead of Polling
Ways to Transfer Data between Memory and Device