63
Chapter 6 Storage and Other I/O Topics

Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

  • Upload
    vandieu

  • View
    228

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Chapter 6Storage and Other I/O Topics

Page 2: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

IntroductionIntroduction I/O devices can be characterized by

B h i i t t t t Behavior: input, output, storage Partner: human or machine Data rate: bytes/sec, transfers/sec Data rate: bytes/sec, transfers/sec

I/O bus connections

2

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 3: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Interfacing Processors and PeripheralsPeripherals

I/O Design affected by many factors I/O Design affected by many factors (expandability, resilience)P f Performance:

— access latency th h t— throughput

— connection between devices and th tthe system

— the memory hierarchyth ti t

3

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

— the operating system

Page 4: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Interfacing Processors and PeripheralsPeripherals

A variety of different users (e g banks A variety of different users (e.g., banks, supercomputers, engineers)

Processor

Cache

Interrupts

Main memory

I/O controller

I/O controller

I/O controller

Memory– I/O bus

Disk Graphics output

NetworkDisk

4

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 5: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O/O

Important but neglected“The difficulties in assessing and designing I/O

systems haveoften relegated I/O to second class status”often relegated I/O to second class status”“courses in every aspect of computing, from

programming toprogramming tocomputer architecture often ignore I/O or give

it scanty coverage”y g“textbooks leave the subject to near the end,

making it easierf t d t d i t t t ki it!”

5

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

for students and instructors to skip it!”

Page 6: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O/O

GUILTY!

— we won’t be looking at I/O in much gdetail

— be sure and read Chapter 6 in its entirety.y

— you should probably take a networking

6

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

y p y gclass!

Page 7: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O Devices/O e ces

Very diverse devices Very diverse devices— behavior (i.e., input vs. output)

partner (who is at the other end?)— partner (who is at the other end?)— data rate

7

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 8: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O Devices/O e ces

Device Behavior Partner Data rate (KB/sec)Device Behavior Partner Data rate (KB/sec)Keyboard input human 0.01Mouse input human 0.02Voice input input human 0.02Scanner inp t h man 400 00Scanner input human 400.00Voice output output human 0.60Line printer output human 1.00Laser printer output human 200.00Graphics display output human 60,000.00Modem input or output machine 2.00-8.00Network/LAN input or output machine 500.00-6000.00Floppy disk storage machine 100.00ppy gOptical disk storage machine 1000.00Magnetic tape storage machine 2000.00Magnetic disk storage machine 2000.00-10,000.00

8

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 9: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O System CharacteristicsI/O System Characteristics

Dependability is important Dependability is important Particularly for storage devices

Performance measures Performance measures Latency (response time) Throughput (bandwidth) Throughput (bandwidth) Desktops & embedded systems

Mainly interested in response time & diversity Mainly interested in response time & diversity of devices

Servers

9

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Mainly interested in throughput & expandability of devices

Page 10: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

DependabilityDependabilityService accomplishmentp

Service deliveredas specified

Fault: failure of a component

M t l dFailureRestoration

May or may not lead to system failure

Service interruptionDeviation from

specified service

10

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

p

Page 11: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Dependability MeasuresDependability Measures

Reliability: mean time to failure (MTTF) Reliability: mean time to failure (MTTF) Service interruption: mean time to repair (MTTR) Mean time between failures

MTBF = MTTF + MTTR

Availability = MTTF / (MTTF + MTTR)l b l Improving Availability

Increase MTTF: fault avoidance, fault tolerance, fault forecasting

Reduce MTTR: improved tools and processes for diagnosis and repair

11

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 12: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Disk StorageDisk Storage

Nonvolatile rotating magnetic storage Nonvolatile, rotating magnetic storage

12

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 13: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Disk Sectors and AccessDisk Sectors and Access Each sector records

S t ID Sector ID Data (512 bytes, 4096 bytes proposed) Error correcting code (ECC)Error correcting code (ECC)

Used to hide defects and recording errors Synchronization fields and gaps

A t t i l Access to a sector involves Queuing delay if other accesses are pending Seek: move the heads Seek: move the heads Rotational latency Data transfer

13

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Controller overhead

Page 14: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Disk Access ExampleDisk Access Example

Given Given 512B sector, 15,000rpm, 4ms average seek time,

100MB/s transfer rate, 0.2ms controller overhead, idle diskidle disk

Average read time 4ms seek time

+ ½ / (15 000/60) = 2ms rotational latency+ ½ / (15,000/60) = 2ms rotational latency+ 512 / 100MB/s = 0.005ms transfer time+ 0.2ms controller delay= 6.2ms 6.2ms

If actual average seek time is 1ms Average read time = 3.2ms

14

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 15: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Disk Performance IssuesDisk Performance Issues

Manufacturers quote average seek time Manufacturers quote average seek time Based on all possible seeks Locality and OS scheduling lead to smaller actual average

seek timesseek times

Smart disk controller allocate physical sectors on disk Present logical sector interface to hostg SCSI, ATA, SATA

Disk drives include cachesP f t h t i ti i ti f Prefetch sectors in anticipation of access

Avoid seek and rotational delay

15

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 16: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Flash StorageFlash Storage

Nonvolatile semiconductor storage Nonvolatile semiconductor storage 100× – 1000× faster than disk

S ll l b t Smaller, lower power, more robust But more $/GB (between disk and DRAM)

16

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 17: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Flash TypesFlash Types

NOR flash: bit cell like a NOR gate NOR flash: bit cell like a NOR gate Random read/write access Used for instruction memory in embedded systems Used for instruction memory in embedded systems

NAND flash: bit cell like a NAND gate Denser (bits/area), but block-at-a-time accesse se (b ts/a ea), but b oc at a t e access Cheaper per GB Used for USB keys, media storage, …

17

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 18: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Interconnecting ComponentsComponents

Need interconnections between CPU, memory, I/O controllers

Bus: shared communication channelP ll l t f i f d t d Parallel set of wires for data and synchronization of data transfer

Can become a bottleneckCan become a bottleneck Performance limited by physical factors

Wire length, number of connectionsg , More recent alternative: high-speed

serial connections with switchesk k

18

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Like networks

Page 19: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Bus TypesBus Types

Processor Memory buses Processor-Memory buses Short, high speed

D i i t h d t i ti Design is matched to memory organization

I/O buses Longer, allowing multiple connections Specified by standards for interoperability Connect to processor-memory bus through

a bridge

19

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 20: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Bus Signals and SynchronizationBus Signals and Synchronization

Data lines Data lines Carry address and data Multiplexed or separate

Control lines Indicate data type, synchronize transactions

S h Synchronous Uses a bus clock

Asynchronous Asynchronous Uses request/acknowledge control lines for

handshaking

20

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 21: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O Example: Buses/O a p e: uses

Shared communication link (one or more Shared communication link (one or more wires)

Difficult design: Difficult design:— may be bottleneck— length of the busg— number of devices— tradeoffs (buffers for higher bandwidth

i l )increases latency)— support for many different devices— cost

21

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

— cost

Page 22: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O Example: Buses/O a p e: uses

Types of buses: Types of buses:— processor-memory (short high

speed custom design)speed, custom design)— backplane (high speed, often

standardized e g PCI)standardized, e.g., PCI)— I/O (lengthy, different devices,

standardized e g SCSI USB Firewire)standardized, e.g., SCSI, USB, Firewire)

22

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 23: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O Example: Buses/O a p e: uses

Synchronous vs Asynchronous Synchronous vs. Asynchronous— use a clock and a synchronous

protocol fast and smallprotocol, fast and smallbut every device must operate

at same rate andat same rate andclock skew requires the bus to

be shortbe short— don’t use a clock and instead use

handshaking23

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

handshaking

Page 24: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Examplea p e

R dR

Ack

Data

ReadReq 13

4

5

642 2

DataRdy

57

24

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 25: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O Bus ExamplesI/O Bus ExamplesFirewire USB 2.0 PCI Express Serial ATA Serial

Attached SCSI

Intended use External External Internal Internal External

D i 63 127 1 1 4Devices per channel

63 127 1 1 4

Data width 4 2 2/lane 4 4

P k 50MB/ 0 2MB/ 250MB/ /l 300MB/ 300MB/Peak bandwidth

50MB/s or 100MB/s

0.2MB/s, 1.5MB/s, or 60MB/s

250MB/s/lane1×, 2×, 4×, 8×, 16×, 32×

300MB/s 300MB/s

Hot Yes Yes Depends Yes Yespluggable

p

Max length 4.5m 5m 0.5m 1m 8m

Standard IEEE 1394 USB PCI-SIG SATA-IO INCITS TC

25

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Implementers Forum

T10

Page 26: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Other important issuesO e po a ssues

Bus Arbitration: Bus Arbitration:— daisy chain arbitration (not very

f i )fair)— centralized arbitration (requires

an arbiter), e.g., PCI— self selection, e.g., NuBus used inself selection, e.g., NuBus used in

Macintoshlli i d t ti Eth t

26

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

— collision detection, e.g., Ethernet

Page 27: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Other important issuesO e po a ssues

Operating system: Operating system:— polling— interrupts— DMA

27

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 28: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Other important issuesO e po a ssues

Performance Analysis techniques: Performance Analysis techniques:— queuing theory— simulation— analysis, i.e., find the weakest link a a ys s, e , d t e ea est

(see “I/O System Design”)

M d l Many new developments

28

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 29: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Typical x86 PC I/O SystemTypical x86 PC I/O System

29

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 30: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O ManagementI/O Management

I/O is mediated by the OS I/O is mediated by the OS Multiple programs share I/O resources

Need protection and scheduling Need protection and scheduling

I/O causes asynchronous interruptsSame mechanism as exceptions Same mechanism as exceptions

I/O programming is fiddlyOS provides abstractions to programs OS provides abstractions to programs

30

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 31: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O CommandsI/O Commands

I/O devices are managed by I/O controller hardware I/O devices are managed by I/O controller hardware Transfers data to/from device Synchronizes operations with software

Command registers Command registers Cause device to do something

Status registers Indicate what the device is doing and occurrence of errors

Data registers Write: transfer data to a device Read: transfer data from a device

31

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 32: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O Register MappingI/O Register Mapping

Memory mapped I/O Memory mapped I/O Registers are addressed in same space as memory Address decoder distinguishes between them Address decoder distinguishes between them OS uses address translation mechanism to make

them only accessible to kernel

I/O instructions Separate instructions to access I/O registers Can only be executed in kernel mode Example: x86

32

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 33: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

PollingPolling

Periodically check I/O status register Periodically check I/O status register If device ready, do operation

If t k ti If error, take action

Common in small or low-performance l b dd dreal-time embedded systems

Predictable timing Low hardware cost

In other systems, wastes CPU time

33

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

y ,

Page 34: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

InterruptsInterrupts

When a device is ready or error occurs When a device is ready or error occurs Controller interrupts CPU

Interrupt is like an exceptionh i d i i i But not synchronized to instruction execution

Can invoke handler between instructions Cause information often identifies the interrupting p g

device Priority interrupts

Devices needing more urgent attention get higher Devices needing more urgent attention get higher priority

Can interrupt handler for a lower priority interrupt

34

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 35: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O Data TransferI/O Data Transfer

Polling and interrupt driven I/O Polling and interrupt-driven I/O CPU transfers data between memory and

I/O data registersI/O data registers Time consuming for high-speed devices

Di t (DMA) Direct memory access (DMA) OS provides starting address in memory I/O controller transfers to/from memory

autonomouslyC ll l

35

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Controller interrupts on completion or error

Page 36: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

DMA/Cache InteractionDMA/Cache Interaction

If DMA writes to a memory block that is If DMA writes to a memory block that is cached Cached copy becomes stale Cached copy becomes stale

If write-back cache has dirty block, and DMA reads memory blocky Reads stale data

Need to ensure cache coherence Flush blocks from cache if they will be used for

DMAO h bl l ti f I/O

36

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Or use non-cacheable memory locations for I/O

Page 37: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

DMA/VM InteractionDMA/VM Interaction OS uses virtual addresses for memory

DMA blocks may not be contiguous in physical memory

h ld l dd Should DMA use virtual addresses? Would require controller to do translation

If DMA uses physical addresses May need to break transfers into page-

i d h ksized chunks Or chain multiple transfers

Or allocate contiguous physical pages for

37

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Or allocate contiguous physical pages for DMA

Page 38: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Measuring I/O PerformanceMeasuring I/O Performance

I/O performance depends on I/O performance depends on Hardware: CPU, memory, controllers, buses Software: operating system, database Software: operating system, database

management system, application Workload: request rates and patterns

I/O system design can trade-off between response time and throughput Measurements of throughput often done with

constrained response-time

38

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 39: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Transaction Processing BenchmarksTransaction Processing Benchmarks

Transactions Small data accesses to a DBMS Interested in I/O rate, not data rate

Measure throughputg p Subject to response time limits and failure handling ACID (Atomicity, Consistency, Isolation, Durability) Overall cost per transaction

Transaction Processing Council (TPC) benchmarks (www.tcp.org) TPC-APP: B2B application server and web services TCP-C: on-line order entry environment TCP-E: on-line transaction processing for brokerage firm TPC-H: decision support — business oriented ad-hoc queries

39

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 40: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

File System & Web BenchmarksFile System & Web Benchmarks SPEC System File System (SFS)

Synthetic workload for NFS server, based on monitoring real systemsR lt Results Throughput (operations/sec) Response time (average ms/operation) Response time (average ms/operation)

SPEC Web Server benchmark Measures simultaneous user sessions Measures simultaneous user sessions,

subject to required throughput/session Three workloads: Banking, Ecommerce,

40

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

g, ,and Support

Page 41: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O vs CPU PerformanceI/O vs. CPU Performance Amdahl’s Law

Don’t neglect I/O performance as parallelism increases compute performance

Example Example Benchmark takes 90s CPU time, 10s I/O time Double the number of CPUs/2 years

I/O nchanged I/O unchanged

Year CPU time I/O time Elapsed time % I/O timenow 90s 10s 100s 10%+2 45s 10s 55s 18%+4 23s 10s 33s 31%

41

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

+6 11s 10s 21s 47%

Page 42: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

RAIDRAID

Redundant Array of Inexpensive Redundant Array of Inexpensive (Independent) Disks Use multiple smaller disks (c.f. one large disk)

Parallelism improves performance Parallelism improves performance Plus extra disk(s) for redundant data storage

Provides fault tolerant storage systemg y Especially if failed disks can be “hot swapped”

RAID 0N d d (“AID”?) No redundancy (“AID”?) Just stripe data over multiple disks

But it does improve performance

42

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 43: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

RAID 1 & 2RAID 1 & 2

RAID 1: Mirroring RAID 1: Mirroring N + N disks, replicate data

Write data to both data disk and mirror disk Write data to both data disk and mirror disk On disk failure, read from mirror

RAID 2: Error correcting code (ECC) RAID 2: Error correcting code (ECC) N + E disks (e.g., 10 + 4)

S lit d t t bit l l N di k Split data at bit level across N disks Generate E-bit ECC

43

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Too complex, not used in practice

Page 44: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

RAID 3: Bit Interleaved ParityRAID 3: Bit-Interleaved Parity

N + 1 disks N + 1 disks Data striped across N disks at byte level Redundant disk stores parity Redundant disk stores parity Read access

Read all disks

Write access Generate new parity and update all disks

On failure On failure Use parity to reconstruct missing data

Not widely used

44

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Not widely used

Page 45: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

RAID 4: Block Interleaved ParityRAID 4: Block-Interleaved Parity

N + 1 disks Data striped across N disks at block level Redundant disk stores parity for a group of blocks Read access

Read only the disk holding the required block

Write access Write access Just read disk containing modified block, and parity disk Calculate new parity, update data disk and parity disk

On failure Use parity to reconstruct missing data

N t id l d45

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Not widely used

Page 46: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

RAID 3 vs RAID 4

46

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 47: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

RAID 5: Distributed ParityRAID 5: Distributed Parity N + 1 disks

Lik RAID 4 b t it bl k di t ib t d Like RAID 4, but parity blocks distributed across disks Avoids parity disk being a bottleneck

Wid l d Widely used

47

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 48: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

RAID 6: P + Q RedundancyRAID 6: P + Q Redundancy

N + 2 disks N + 2 disks Like RAID 5, but two lots of parity

G t f lt t l th h Greater fault tolerance through more redundancy

M lti l RAID Multiple RAID More advanced systems give similar fault

t l ith b tt ftolerance with better performance

48

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 49: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

RAID SummaryRAID Summary

RAID can improve performance and RAID can improve performance and availability

High il bilit eq i e hot pping High availability requires hot swapping

Assumes independent disk failures Too bad if the building burns down!

See “Hard Disk Performance, Quality and Reliability” http://www.pcguide.com/ref/hdd/perf/index.htm

49

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

p p g p

Page 50: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O System DesignI/O System Design

Satisfying latency requirements Satisfying latency requirements For time-critical operations If system is unloaded

Add l f Add up latency of components

Maximizing throughput Find “weakest link” (lowest-bandwidth component)( p ) Configure to operate at its maximum bandwidth Balance remaining components in the system

If system is loaded simple analysis is insufficient If system is loaded, simple analysis is insufficient Need to use queuing models or simulation

50

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 51: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Server ComputersServer Computers Applications are increasingly run on pp g y

servers Web search, office apps, virtual worlds, …, pp , ,

Requires large data center servers Multiple processors networks connections Multiple processors, networks connections,

massive storage Space and power constraints Space and power constraints

Server equipment built for 19” racksMultiples of 1 75” (1U) high

51

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Multiples of 1.75 (1U) high

Page 52: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Rack Mounted ServersRack-Mounted ServersSun Fire x4150 1U server

52

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 53: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Sun Fire x4150 1U serverSun Fire x4150 1U server

4 cores4 cores each

16 x 4GB =

53

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

16 x 4GB = 64GB DRAM

Page 54: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

I/O System Design ExampleI/O System Design Example Given a Sun Fire x4150 system with

Workload: 64KB disk reads Each I/O op requires 200,000 user-code instructions and

100,000 OS instructions Each CPU: 109 instructions/sec FSB: 10.6 GB/sec peak

DRAM DDR2 667MHz: 5 336 GB/sec DRAM DDR2 667MHz: 5.336 GB/sec PCI-E 8× bus: 8 × 250MB/sec = 2GB/sec Disks: 15,000 rpm, 2.9ms avg. seek time, p g

112MB/sec transfer rate What I/O rate can be sustained?

For random reads and for sequential reads

54

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

For random reads, and for sequential reads

Page 55: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Design Example (cont)Design Example (cont)

I/O rate for CPUs I/O rate for CPUs Per core: 109/(100,000 + 200,000) = 3,333 8 cores: 26,667 ops/sec

Random reads, I/O rate for disks Assume actual seek time is average/4 Time/op = seek + latency + transfer Time/op seek + latency + transfer

= 2.9ms/4 + 4ms/2 + 64KB/(112MB/s) = 3.3ms 303 ops/sec per disk, 2424 ops/sec for 8 disks

Sequential reads Sequential reads 112MB/s / 64KB = 1750 ops/sec per disk 14,000 ops/sec for 8 disks

55

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 56: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Design Example (cont)Design Example (cont)

PCI-E I/O rate PCI E I/O rate 2GB/sec / 64KB = 31,250 ops/sec

DRAM I/O rate5 336 GB/sec / 64KB 83 375 ops/sec 5.336 GB/sec / 64KB = 83,375 ops/sec

FSB I/O rate Assume we can sustain half the peak rate 5.3 GB/sec / 64KB = 81,540 ops/sec per FSB 163,080 ops/sec for 2 FSBs

Weakest link: disks 2424 ops/sec random, 14,000 ops/sec sequential Other components have ample headroom to accommodate

these rates

56

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 57: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Fallacy: Disk DependabilityFallacy: Disk Dependability If a disk manufacturer quotes MTTF as

1,200,000hr (140yr) A disk will work that long

Wrong: this is the mean time to failure What is the distribution of failures? What is the distribution of failures? What if you have 1000 disks

How many will fail per year?How many will fail per year?

0.73%ehrs/failur1200000

hrs/disk 8760disks 1000(AFR) Rate Failure Annual

57

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

ehrs/failur1200000

Page 58: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Fallacies Fallacies

Disk failure rates are as specified Disk failure rates are as specified Studies of failure rates in the field

Schroeder and Gibson: 2% to 4% vs. 0.6% to 0.8%Pinhei o et al 1 7% (fi st ea ) to 8 6% (thi d ea ) s 1 5% Pinheiro, et al.: 1.7% (first year) to 8.6% (third year) vs. 1.5%

Why?

A 1GB/s interconnect transfers 1GB in one sec But what’s a GB? For bandwidth, use 1GB = 109 B For storage use 1GB = 230 B = 1 075×109 B For storage, use 1GB = 2 B = 1.075×10 B So 1GB/sec is 0.93GB in one second

About 7% error

58

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 59: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Pitfall: Offloading to I/O ProcessorsPitfall: Offloading to I/O Processors

Overhead of managing I/O processor request may dominate Quicker to do small operation on the CPU But I/O architecture may prevent that

I/O processor may be slowerI/O processor may be slower Since it’s supposed to be simpler

Making it faster makes it into a major Making it faster makes it into a major system component

Might need its own coprocessors!

59

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Might need its own coprocessors!

Page 60: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Pitfall: Backing Up to TapePitfall: Backing Up to Tape

Magnetic tape used to have advantages Magnetic tape used to have advantages Removable, high capacity

Ad d d b di k h l Advantages eroded by disk technology developments

Makes better sense to replicate data E.g, RAID, remote mirroring

60

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

Page 61: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Fallacy: Disk SchedulingFallacy: Disk Scheduling

Best to let the OS schedule disk Best to let the OS schedule disk accesses

B t mode n d i e de l ith logi l blo k But modern drives deal with logical block addresses Map to physical track cylinder sector locations Map to physical track, cylinder, sector locations Also, blocks are cached by the drive

OS is unaware of physical locations OS is unaware of physical locations Reordering can reduce performance Depending on placement and caching

61

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

p g p g

Page 62: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Pitfall: Peak PerformancePitfall: Peak Performance Peak I/O rates are nearly impossible to

achieve Usually, some other system component

limits performance E.g., transfers to memory over a bus

Collision with DRAM refresh Arbitration contention with other bus masters

b k b d d h E.g., PCI bus: peak bandwidth ~133 MB/sec

In practice max 80MB/sec sustainable

62

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

In practice, max 80MB/sec sustainable

Page 63: Chapter 6hernande/Elc451/ch6.pdf · Optical disk storage machine 1000.00 Magnetic tape storage machine 2000.00 Magnetic disk storage machine 2000.00-10,000.00 8 ... Dependability

Concluding RemarksConcluding Remarks

I/O performance measures I/O performance measures Throughput, response time Dependability and cost also important

Buses used to connect CPU, memory,I/O controllers

P lli i t t DMA Polling, interrupts, DMA I/O benchmarks

TPC SPECSFS SPECWeb TPC, SPECSFS, SPECWeb RAID

Improves performance and dependability

63

Electrical & Computer EngineeringSchool of Engineering

THE COLLEGE OF NEW JERSEY

p p p y