62
Chap3. Secondary Storage and System Software

Chap3. Secondary Storage and System Software. Chapter Objectives Describe the organization of typical disk drives, including basic units of organization

Embed Size (px)

Citation preview

Chap3. Secondary Storage and System Software

Chapter Objectives

Describe the organization of typical disk drives, including basic units of organization and their

relationships

Identity and describes the factors affecting disk access time, and describe methods for

estimating access times and space requirements

Describe magnetic tapes, identify some tape applications, and investigate the implications of

block size on space requirements and transmission speeds

Identify fundamental differences between media and criteria that can be used to match the

right medium to an application

Describe in general terms the events that occur when data is transmitted between a program and a secondary storage device

Introduce concepts and techniques of buffer management

Illustrate many of the concepts introduced in the chapter, especially system software concepts, in the context of UNIX

Contents

3.1 Disks

3.2 Magnetic Tape

3.3 Disk versus Tape

3.4 Introduction to CD-ROM

3.5 Physical Organization of CD-ROM

3.6 CD-ROM Strengths and Weaknesses

3.7 Storage as a Hierarchy

3.8 A Journey of a Byte

3.9 Buffer Management

3.10 I/O in UNIX

Disks

Random Access Device

Disk device - DASD(Direct Access Storage Devices)

Magnetic disk

hard disk - most common

floppy disk - inexpensive, little capacity

magnetic tape - serial access

Iomega Zip (100 M byte), Jaz (1 G byte)

Nonmagnetic disk

optical discs - increasingly important for secondary storage

e.g. CD, DVD

Organization of Disks(1)

Components of Disk

Tracks

Sectors

the smallest addressable portion of a disk

Disk pack

a collection of a lot of platters

Cylinder (vertical collection of tracks)

– Arms are moving together

Seeking

r/w arm movement

the slowest part of reading data from disk

Organization of Disks(2)

Track Sector

Gaps

Surface of disk showing tracks and sectors

BoomRead/write heads

Spindle

Platters

Schematic illustration of disk drive

Organization of Disks(3)

Estimating Capacities(1)

Cylinder --> Track --> Sector --> Byte

Track capacity

= # of sectors per track * bytes per sector

Cylinder capacity

= # of tracks per cylinder * track capacity

Drive capacity

= # of cylinders * cylinder capacity

** Inner track, Outer track? Same capacity?

Estimating Capacities(2)

Ex) file with 50,000 fixed-length data records on 2.1G disk

# of bytes per sector = 512

# of sectors per track = 63

# of tracks per cylinder = 16

# of cylinders = 4092

How many cylinders does the file require if each data record requires 256 bytes?

50,000/2 = 25,000 sectors

63x16 = 1008 sectors in a cylinder

25,000/1008 = 24.8 cylinders

Organizing Tracks by Sector(1)

Two ways to organize data on a disk

by sector & by user-defined block

Physical placement of sectors

logical view and physical view

adequate way to view a file logically, but always not good way to store sectors

physically

interleaving

—problems occur due to the gap between disk revolution speed

and disk controller speed

—high performance disks (with high speed disk controller) now offer 1:1

interleaving

123

4

2122

23 323334

If interleaving factor is 3

..

..

.

Disk

• Interleaving

Organizing Tracks by Sector(2)

Organizing Tracks by Sector(3)

Clusters (for improving file access speed)

One file consists of one or more clusters

One cluster is a fixed number of contiguous sectors

Cluster is the smallest unit of space for a file

Contiguous sectors per cluster and Mapping table

No additional seek (arm movement) for one cluster

Very useful when access granularity is medium

large cluster vs. small cluster

FAT (File Allocation Table)

– contains a list of all the clusters in file

– each cluster entry --> physical location of the cluster

FAT(File Allocation Table)

- A table containing mappings to the physical locations of all the clusters in all files on disk storage

Cluster number

Clusterlocation

123...

File allocation table(FAT)

The part of the FAT pertaining to our file

.

.

.

2

1

Organizing Tracks by Sector(3)

Organizing Tracks by Sector(4)

Extents (further attempt for file access speed)

Contiguous clusters per extent and Mapping table

File may have several extents

Very useful when accessing the whole file sequentially

Many extents of a file increase amount of seeking

file extents extent

extent

extent

extent

extent extent

Organizing Tracks by Sector(5)

Fragmentation

the space that goes unused within a cluster, block, track, or other unit of physical

storage

internal fragmentation: loss of space within a sector or cluster

trade-offs in use of large cluster sizes

ex) size of sector is 512 bytes and size of all records in file is 300 bytes

– we may enforce one record / sector

– we may allow span sectors

Organizing Tracks by Block(1)

User-defined size of blocks can be fixed or variable

No sector spanning ==> No internal fragmentation

Blocking factor

the number of records that are to be stored in each block

In block-addressing, each block of data is accompanied by one or more subblocks containing

extra information about the data block

Note: the “block” has a different meaning in the context of the UNIX I/O system

Sector organization vs.Sector organization vs. Block organization Block organization

1 1 1 1 1 1

1 1 1 1 1 1

1 1 1 2 2 2

2 2 3 3 3 4

4 4 4 5 5 5

Sector 1 Sector 2 Sector 3 Sector 4 Sector 5

(a) sector organization

(b) block organization

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2

2 2 3 3 3 4

4 4 4 5 5 5

Organizing Tracks by Block(2)

Subblocks

count subblocks

– contain the num of bytes in the accompanying data block

key subblocks

– contain key for last record in data block

Countsubblock Count subblock

Key subblock

Datasubblock

Countsubblock

Datasubblock

Countsubblock

Keysubblock

Datasubblock

Countsubblock

Keysubblock

Datasubblock

Nondata Overhead(1)

Stored during preformatting

Sector-addressable disks (physical)

sector/track address, condition, gaps, and synchronization marks

of no concern to programmer

Block-organized disks (user-defined)

some of nondata for programmer

more nondata provided with blocks than with sectors

Nondata Overhead(2)

Ex) block-addressable disks

20,000 bytes/track

subblock, interblock gap equivalent to 300 bytes/block

If we want to store a file containing 100-byte records, how many records can be

stored per track if blocking factor is 10 or 60 ?

if blocking factor is 10, then 20000 / ( (10*100)+300) = 15 blocks ==> 150

records

if blocking factor is 60, 180 records(=3 blocks)

Cost of Disk Access

Disk Access Time Seek time + Rotational delay + Transfer time

Seek time time to move arm to locate cylinder more costly in multi-user environment average seek time for a particular file operation

Rotational delay time taken for disk to rotate correct sector under read/write

arm Transfer time (for actual data)

number of bytes transferrednumber of bytes on a track rotation time

Timing Computation

Seagate Cheetah 9 Gbyte drive

Minimum seek (track-to-track): 0.78msec

Maximum seek: 19 msec

Average seek: 8 msec

Rotation delay: 3 msec

Read one track: 6 msec

Read one cluster: 0.28 msec

Read 8704K size file

sequentially access the whole file: 1.7 sec

randomly access the whole file: 9.25 sec

Effect of Block Size on Performance (In UNIX)

UC Berkeley CSRG team

Many experiments in BSD UNIX System

Trade-off between block size, fragmentation, access time

Doubling block size to 1,024 bytes improves performance by more than a factor of 2

In 512-byte block size: the slow throughput & low internal fragmentation (6.9%)

In 4,096-byte block: the fastest throughput & internal fragmentation (45.6%)

==> Invent the cluster concept!

Disk as Bottleneck

Disk performance is increasing, but still slow!

Even high-performance network is faster than disk

(5 M/ sec HD is slower than 100M bps LAN)

Disk bound jobs

CPU and network must wait

Solution techniques

multiprogramming

striping - parallel I/Os

avoid accessing disk

– RAM disk, disk caches - locality

Magnetic Tape

A set of parallel tracks

9 tracks - parity bit

Frame

one-bit-wide slice of tape

Interblock gaps

permit stopping and starting

Nine-track tape

Track Frame

011010010

Gap Data block Gap

Estimating Tape Length(1)

There is an interblock gap for each data block

Space requirement s

s = n * ( b + g )

b is the physical length of a data block

g is the length of an interblock gap

n is the number of data blocks

Tape density

Tape speed

Size of interblock gap

Estimating Tape Length(2)

Ex) one million 100-byte records

6,250 BPI tape

0.3 inches of interblock gap

How much tape is needed?

when blocking factor is between 1 and 50

Nominal recording density

Effective recording density:

num of byte per block / num of inches for block

Estimating Data Transmission Times

Factors of data transmission rate

interblock gaps

effective recording density

nominal recording density

speed of r/w head

time to start/stop the tape

Disk VS. Tape

Disk Random access Immediate access Expensive seek in

sequential processing

Tape Sequential access Long-term storage No seek in sequential

processing

Decrease in cost of disk and RAM More RAM space is available in I/O buffers, so disk I/O decreases Tertiary storage for backup: CD-ROM, tape ...

Introduction to CD-ROM

CD-ROM: Compact Disc Read-Only Memory

Can hold over 600MB(200,000 pages)

Easy to replicate

Useful for publishing or distributing medium

But, not storing and retrieving data

CD-ROM is a child of CD audio

CD audio provides

High storage capacity

Moderate data transfer rate

But, against high seek performance

Poor seek performance

History of CD-ROM: Videodisc

Videodisc technology developed in late 1960's and early 1970's

The goal was to store movie

A number of methods for storing video signals

1. Use a needle to respond mechanically to grooves in a disc like a vinyl LP

record

2. Use optical storage

Many companies were fighting which approach should become a standard

VideodiskLaserVisionCD audioCD-ROM

History of CD-ROM: LaserVision

LaserVision

Emerged as the winner of standard battles about standard

Support CLV(Constant Linear Velocity) and CAV(Constant Angular Velocity) format

Fast seek performance by using CAV format

Data are stored in analog form

Why did they fail?

– Earlier disputes over the physical format of the video disc, many incompatible

encoding scheme and error correction techniques were used by many firms

– No standard format the market never grew

History of CD-ROM: CD-ROM

CD-ROM

Philips and Sony developed CD-ROM in 1984 in order to store music on a disc

Use a digital data format

The development of CD-ROM as a licensing system results in widely acceptance in the industry

Promised to provide a standard physical format

Any CD-ROM drive can read any sector which they want

Computer applications store data in a file not in terms of sector, thus, file system standard should

be needed

In late 1985, videodisc/digital data industry moved into CD-ROM industry

In early summer of 1986, an official standard for organizing files was worked out

Physical Organization of Master Disk

Master Disc

Formed by using the digital data, 0 or 1

Made of glass and coated that is changed by the laser beam

Two part of CD-ROM

– Pit

– The areas that is hit by the laser beam

– Scatter the light

– Land

– Smooth, unchanged areas between pits

– Reflect the light

laser beam

pit

land

light

Encoding Scheme of CD-ROM

Encoding scheme

The alternating pattern of high- and low-density reflected light is the signal

1 : transition from pit to land and back again

0s : the amount of time between transitions

Constraint

The limits of the resolution of the optical pickup (generic hardware limit), there must be at least two 0’s

between any pair of 1’s (no two adjacent 1s)

We cannot represent all bit patterns, thus, we need translation scheme

We need at least 14 bits to represent 8 bits under this constraint

EFM(eight to fourteen modulation) coding is used

110 10000100000000EFM000000012

Format of CD-ROM

CD audio chose CLV format instead of CAV format because

CD audio requires large storage space

CD audio is played from the beginning to the end sequentially

Format of CD-ROM

CLV(Constant Linear Velocity)

A single spiral pattern

Same amount of space for each sector

Capability for writing all of sectors at the maximum density

Rotational speed is slower in reading outer edge than in inner edge

Finding the correct speed though trial and error

Characteristics

– Poor seek performance

– No straightforward way to jump to a specified location

Constant Angular Velocity Disk

Magnetic disk usually uses CAV(Constant Angular Velocity)

Concentric tracks and pie-shaped sectors

Data density is higher in inner edge than in outer edge

Storage waste: total storage is less than a half of CLV

Spin the disc at the same speed for all positions

Easy to find a specific location on a disk good seek performance

CAVCLV

Addressing of CD-ROM

Addressing

Magnetic disk: cylinder/track/sector approach

CD-ROM: a sector-addressing scheme

Track density varies thus, each second of playing time on a CD is divided into 75 sectors

75 sectors/sec, 2 Kbytes/sector

At least one-hour of playing time

Maximum capacity can be calculated: 600 Mbytes

60 min * 60 sec/min * 75 sectors/sec = 270,000 sectors

We address a given sector by referring minutes, second, and sector of play

16:22:34 means 34th sector in the 22nd second in the 16th minutes of play

Fundamental Design of the CD disc

Initially designed for delivering digital audio information

Store audio data in digital form

Wave patterns should be converted into digital form

Measure of the height of the sound: 65,536 different gradation(16 bits)

Sampling rate: 44.1 kHz, because of 2 times of 20,000 Hz upto which people can listen

16 bits sample, 44,100 times per second, and two channel for stereo sound, we should store 176,400 bytes per seconds

Storage capacity of CD is 75 sectors per seconds, we have 2,352 bytes per sector

CD-ROM divides this raw sector as shown in the following figure

12 bytessynch

4 bytessector ID

2,048 bytesuser data

4 byteserror

detection

8 bytesnull

276 byteserror

correction

File Structure Problem of CD-ROM

The most important problem in using CD-ROM as a storage media

Strongness and weakness of CD-ROM

Strong aspects of CD-ROM

– Data transfer rate: 75 sectors/sec

– Storage capacity : over 600 Mbytes

– Inexpensive to duplicate and durable Weak aspects of CD-ROM

– Poor seek performance (weak random access)

– Magnetic disk: 30 msec, CD-ROM : 500 msec Comparison of access time of a large file from several media

– RAM: 20 sec, Disk: 58 days, CD-ROM: 2.5 years

We should have a good file structure avoiding seeks to an even greater extent that on magnetic disk

Storage as a Hierarchy

Primary storage

1/1,000,000,000 sec

semi-conductors: registers, RAM, RAM disk, and disk cache

Secondary storage

1/1,000 sec

magnetic disks(DASD), tape and mass storage(Serial)

Off-line storage (archival & back-up)

10 sec

removable magnetic disks, optical discs, tapes

A Journey of a Byte(1)

What happens when a program writes a byte to a file on a disk?

WRITE(TEXT, c, 1);

User Program ---> OS File manager -->

I/O buffer ---> I/O processor --->

Disk controller ---> Disk

A Journey of a Byte(2)

1 The program asks the OS to write the contents of the variable c to the next

available position in TEXT.

2 The OS passes the job on to the file manager

3 The file manager looks up TEXT in a table containing information about it, such

as whether the file is open and available for use, what types of access are

allowed, if any, and what physical file the logical name TEXT corresponds to.

4 The file manager searches a FAT for the physical location of the sector that is to

contain the byte.

5 The file manager makes sure that the last sector in the file has been stored in a

system I/O buffer in RAM, then deposits the ‘P’ into its proper position in the

buffer.

Logical layer

A Journey of a Byte(3)

6 The file manager gives instructions to the I/O processor about where the byte is

stored in RAM and where it needs to be sent on the disk

7 The I/O processor finds a time when the drive is available to receive the data and

puts the data in proper format for the disk. It may also buffer the data to send it

out in chunks of the proper size for the disk

8 The I/O processor sends the data to the disk controller.

9 The controller instructs the drive to move the r/w head to the proper track, waits

for the desired sector to come under the r/w head, then sends the byte to the

drive to be deposited, bit-by-bit, on the surface of the disk.

Physical layer

3.8 A Journey of a Byte

What is buffer?

Definition - the part of main memory available for storage of copies of disk blocks

Program buffers vs. System I/O buffers

Buffer manager

subsystem responsible for the allocation for blocks goal:

– minimize the number of disk access

– utilize the memory space effectively

Buffer Management

Buffer bottlenecks

What if program performs both input and output on one character at a time, and

only one I/O buffer is available?

At least two buffer - for input and output

I/O bound jobs: wait for I/O completion

Buffering Strategies for Performance

use more than one buffer

I/O system processes next block when CPU is processing current one

Buffering Strategies(1): Multiple buffering

Double buffering

swapping roles of two buffers after I/O finished

allows O/S operating on one buffer while the other buffer is being loaded or

emptied

any number of buffers can be used

Buffer pooling

takes from a pool of available buffers

decides which buffer to take from a buffer pool

take buffer by LRU

Double Buffering

Program data area

Program data area

To disk

To diskI/O buffer 1

I/O buffer 1

I/O buffer 2

I/O buffer 2

(a)

(b)

Buffering Strategies(2): Move & Locate mode

Move mode (using both system buffer & program buffer)

moving data from one place in RAM to another before they can be accessed

sometimes, unnecessary data moves

Locate mode (using system buffer only or program buffer only)

perform I/O directly between secondary storage and program buffer (program’s data

area)

system buffers handle all I/Os, but program uses locations through pointer variable

Move mode & Location modeMove mode & Location mode

system buffer

program’s data area

system buffer

disk

diskuser’s

programlocation(pointer)

Movemode

Locatemode

Buffering Strategies(3): Scatter/Gather IO

A file with many blocks and each block with header and data

Need to put the headers in one buffer and the data in a different buffer ==> may occur

complication

Scatter-input mode

a single READ can scatter data into a collection of buffers

Gather-output mode

a single WRITE can gather several buffers and output

Scatter Input & Gather OutputScatter Input & Gather Output

disk

disk

scatterinput

gatheroutput

read()

write()

buffer 1

buffer 2

buffer 1

buffer 3

buffer 2

3.9 Buffer Management

I/O in UNIX<Kernel I/O structure>

3.10 I/O in UNIX

I/O system

PROCESS

KERNEL

user programs libraries shell commands

system callinterface

block I/Osystem(normal files)

characterI/O system(terminals,printers, etc.)

networkI/O system(sockets)

block device drivers

disk disk...

character device drivers

consoles printers...

network interface drivers

...networks...

UNIX Kernel(1)

The central part of UNIX operating system

View all I/O as operating on a sequence of bytes

Make all operations below top layer independent of application’s logical view of file

Data structures related to unix files

file-descriptor table: owned by user process

open-file table: owned by kernel

index-node: a kind of FAT (one inode for each file in use)

index-node table: owned by kernel

3.10 I/O in UNIX

UNIX Kernel(2)

File descriptor table

associates each file descriptor to open file table

every process has its own file descriptor table

Open file table

entries for every open file

file structures: r/w mode, offset, reference count

3.10 I/O in UNIX

UNIX Kernel(3)UNIX Kernel(3)

file descriptor

file tableentry

o(keyboard)1(screen)2(error)3(normal)4(normal)5(normal)...

to open filetable

descriptor table

3.10 I/O in UNIX

UNIX Kernel(4)UNIX Kernel(4)

R/Wmode

# ofprocessesusing it

Offsetof nextaccess

ptr towriteroutine

inodetableentry......

write 1 100 ......to inode table

write() routinefor this typeof file

open file table

3.10 I/O in UNIX

UNIX Kernel(5)

Inode(Index node)

data structure used to describe a file

when a file is opened, a copy of inode is loaded into RAM for rapid access

has a list of disk blocks of the file

this list is UNIX counterpart to FAT

Device driver

I/O processor program invoked by kernel performing I/O for devices

3.10 I/O in UNIX

Inode StructureInode Structure

devicepermissions

owner’s useridfile size

block count

fileallocation

table

.

.

.

.

.

.

An inode

3.10 I/O in UNIX

Hard/Soft Link & File types

Linking file names to Linking file names to files

hard link: direct reference

– pointer from a directory to the inode of a file

– ln src-file target-file

soft(symbolic) link: pathnames

– link a file name to another file name

– ln -s src-file target-file

File typesFile types

normal file (governed by block IO system)

special file: stream, signal control of device (governed by character IO system)

sockets: endpoints of IPC (governed by network IO system)

3.10 I/O in UNIX

Let’s Review !!!

3.1 Disks

3.2 Magnetic Tape

3.3 Disk versus Tape

3.4 Introduction to CD-ROM

3.5 Physical Organization of CD-ROM

3.6 CD-ROM Strengths and Weaknesses

3.7 Storage as a Hierarchy

3.8 A Journey of a Byte

3.9 Buffer Management

3.10 I/O in UNIX