Chapter 11.2: File System Implementation. 11.2 Silberschatz, Galvin and Gagne ©2005 Operating System Concepts Chapter 11: File System Implementation Chapter


Chapter 11: File System Implementation

Chapter 11.1: File-System Structure; File-System Implementation; Directory Implementation; Allocation Methods

Chapter 11.2: Free-Space Management; Recovery; Log-Structured File Systems

Chapter 12 (an introduction): Overview of Mass Storage; Disk; Magnetic Tapes


Free-Space Management

We’ve talked about file-system structure and file-system implementation.

Now: how is free space managed?

What space is available, how much, and where are the free blocks located?

In other words, we need some kind of storage map.

Keeping track of free space requires a free-space list.

The free-space list records all unallocated disk blocks.

Free-space list: a list? Some other kind of structure?

Let’s consider a couple of alternative structures used to manage free disk space.


Bit Vector Approach – used for space allocation

A bit vector (used to represent n blocks) is one approach. Each block is represented by a single bit: 0 means the block is free; 1 means the block is in use (allocated).

Blocks: 0 1 2 … n-1

bit[i] = 0: block[i] free

bit[i] = 1: block[i] occupied

We can also see how hardware features drive software functionality.

Downside: to be efficient in searching, the bit map must be kept in primary memory.

For small disks this is not a problem, but for larger disks (say, a 40 GB disk with 1 KB blocks) a bit map of 5 MB is needed: 40 GB / 1 KB is about 40 million blocks, at one bit each!

A sample bit vector might appear as 0011110011111100110011 ….

This is a very simple approach, and it is very efficient at finding the first free block. In addition, a number of instruction sets contain instructions for bit manipulation.
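
As a rough illustration of the bit-vector idea, here is a minimal Python sketch. It assumes the convention from the diagram above (bit[i] = 0 means block[i] is free, 1 means occupied). The names `BitVector`, `first_free`, and `bitmap_bytes` are made up for this example and do not come from any real file system.

```python
class BitVector:
    """Free-space bit vector: bit i is 0 if block i is free, 1 if occupied."""

    def __init__(self, bits):
        # Build from a string such as "0011"; position i describes block i.
        self.bits = [int(ch) for ch in bits]

    def first_free(self):
        """Return the number of the first free block, or -1 if none is free."""
        for i, bit in enumerate(self.bits):
            if bit == 0:
                return i
        return -1

    def allocate(self):
        """Mark the first free block as occupied and return its number."""
        i = self.first_free()
        if i >= 0:
            self.bits[i] = 1
        return i

    def release(self, i):
        """Mark block i as free again."""
        self.bits[i] = 0


def bitmap_bytes(disk_bytes, block_bytes):
    """Size of the bit map in bytes: one bit per block, eight bits per byte."""
    blocks = disk_bytes // block_bytes
    return (blocks + 7) // 8


if __name__ == "__main__":
    bv = BitVector("0011110011111100110011")  # the sample vector from the slide
    print("first free block:", bv.first_free())
    print("bit map size for 40 GB / 1 KB blocks:",
          bitmap_bytes(40 * 2**30, 2**10), "bytes")
```

A real implementation would scan a word at a time, using the bit-manipulation instructions mentioned above, rather than one bit at a time as this sketch does.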


Linked List Approach used for Allocation

This approach links all free space together via a linked list.

All we really need in memory (cached) is a pointer to the first free block.

All blocks then contain a pointer to the next free block

It is easy, however, to quickly see inefficiencies…

To allocate a lot of disk space, we must read each block in turn to find the pointer to the next one. This is an incredible amount of I/O and is very inefficient from a performance perspective!

If only a small amount of storage is necessary, as is usually the case, the linked list approach to disk allocation may be reasonably efficient.

Very often only a single block is requested.

The FAT approach works along these lines – each entry in the FAT points to the next block.

With the linked list approach, we cannot get contiguous space easily.
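
The cost argument above can be made concrete with a small simulation. This is only a sketch: the "disk" is a Python dictionary, the class name `LinkedFreeList` is hypothetical, and one dictionary lookup stands in for one disk read.

```python
class LinkedFreeList:
    """Free blocks chained together; only the head pointer is kept in memory."""

    def __init__(self, free_blocks):
        # next_ptr simulates the pointer stored inside each free block on disk.
        self.next_ptr = {}
        for cur, nxt in zip(free_blocks, free_blocks[1:]):
            self.next_ptr[cur] = nxt
        if free_blocks:
            self.next_ptr[free_blocks[-1]] = None
        self.head = free_blocks[0] if free_blocks else None
        self.disk_reads = 0

    def allocate(self, count=1):
        """Take up to `count` blocks from the front of the list."""
        allocated = []
        while count > 0 and self.head is not None:
            block = self.head
            # The block must be read to learn where the next free block is.
            self.disk_reads += 1
            self.head = self.next_ptr.pop(block)
            allocated.append(block)
            count -= 1
        return allocated


if __name__ == "__main__":
    fl = LinkedFreeList([2, 3, 4, 5, 8, 9])
    print(fl.allocate(), "after", fl.disk_reads, "read(s)")
    print(fl.allocate(3), "after", fl.disk_reads, "read(s) in total")
```

A single-block request costs one read, which is the common case. A request for k blocks costs k reads, and the blocks returned are contiguous only by luck.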


Implementing the Bit-Mapped Approach

We need to protect these structures very carefully:

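This applies to the pointer to the free list as well as to the bit map.

The bit map itself must be kept on disk.

We must ensure that the copy in memory and the copy on disk do not differ.

We cannot allow a situation where, for some block[i], bit[i] = 1 in memory and bit[i] = 0 on disk: after a crash, the on-disk copy would show an allocated block as free, and the block could be handed out a second time.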

Solution:

Set bit[i] = 1 in disk

Allocate block[i]

Set bit[i] = 1 in memory
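
The three-step ordering above can be sketched as follows. This is a toy model under stated assumptions: the two bit maps are plain Python lists, `BitmapPair` and its methods are names invented for the example, and a "crash" is simulated by returning early.

```python
class BitmapPair:
    """In-memory and on-disk copies of a free-space bit map (1 = occupied)."""

    def __init__(self, n_blocks):
        self.disk = [0] * n_blocks   # persistent copy, survives a crash
        self.mem = [0] * n_blocks    # cached copy, lost in a crash
        self.in_use = set()          # blocks actually handed out to files

    def allocate(self, i, crash_after_step=None):
        """Allocate block i; optionally simulate a crash after step 1 or 2."""
        self.disk[i] = 1             # step 1: set bit[i] = 1 on disk
        if crash_after_step == 1:
            return False
        self.in_use.add(i)           # step 2: allocate block[i]
        if crash_after_step == 2:
            return False
        self.mem[i] = 1              # step 3: set bit[i] = 1 in memory
        return True

    def violations(self):
        """Blocks where bit[i] = 1 in memory but bit[i] = 0 on disk."""
        return [i for i, (m, d) in enumerate(zip(self.mem, self.disk))
                if m == 1 and d == 0]

    def recover(self):
        """After a crash, rebuild the in-memory copy from the disk copy."""
        self.mem = list(self.disk)


if __name__ == "__main__":
    pair = BitmapPair(4)
    pair.allocate(2, crash_after_step=2)
    print("violations after crash:", pair.violations())
    pair.recover()
    print("memory copy after recovery:", pair.mem)
```

Because the disk copy is updated first, a crash at any point leaves at worst a block that is marked occupied but not used (a leak), never a block that is used but marked free.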


Implementation of Linked List and Similar Schemes

First Block – Maintain a pointer to the first free block in memory (preferably in cache). This is simple to program but time-consuming to execute. Each link points to the next link.

Grouping – Store the addresses of n free blocks in the first free block. The first n-1 of these blocks are actually free, but the last one contains the addresses of another n free blocks. This greatly improves performance over the standard linked list.

Counting – Often several contiguous blocks are allocated or freed at the same time, particularly when space is allocated using a contiguous allocation scheme or via clustering. In this case we can keep the address of the first free block and the number of free contiguous blocks that follow it. As a result, each entry in the free-space list contains a disk address and a count of blocks ‘starting’ at that spot, thus shortening the list considerably.
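
Grouping and counting can both be sketched in a few lines of Python. The function names `build_groups` and `to_extents` are hypothetical, and free blocks are represented simply as a list of block numbers. An empty address list marks the end of the chain in this sketch.

```python
def build_groups(free_blocks, n):
    """Grouping: map each 'holder' block to the n addresses stored inside it.

    The last address stored in a holder is itself the next holder.
    """
    groups = {}
    pos = 0
    while pos < len(free_blocks):
        holder = free_blocks[pos]
        groups[holder] = free_blocks[pos + 1:pos + 1 + n]
        pos += n
    return groups


def to_extents(free_blocks):
    """Counting: compress free block numbers into (start, count) pairs."""
    extents = []
    for block in sorted(free_blocks):
        if extents and extents[-1][0] + extents[-1][1] == block:
            # The block directly follows the current run, so extend the run.
            start, count = extents[-1]
            extents[-1] = (start, count + 1)
        else:
            extents.append((block, 1))
    return extents


if __name__ == "__main__":
    free = [2, 3, 4, 5, 8, 9, 10]
    print(build_groups(free, 3))
    print(to_extents(free + [17]))
```

With grouping, one read of a holder block yields n addresses at once. With counting, eight free blocks in three runs need only three list entries.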


Linked Free Space List on Disk

The figure shows that the address of the first free block points into the disk area, where the free blocks are linked from one to the next.


Recovery

I cannot say enough about Recovery.

It is absolutely essential in any non-trivial computing environment.

Disks can and do fail for any number of reasons – power losses, surges, bad sectors on disk, dust, weather, and even malicious intent.

There is nothing more sacred in the world of computers than our data.

Programs can usually be reproduced, people replaced; but safeguarding our data and having it consistent is critical.

Backup and Recovery – two topics – are often not covered in detail in academic environments. But rest assured, in a production environment, these activities and procedures constitute a daily activity and involve planning and established procedures.

I cannot emphasize these topics enough!


Consistency Checking – Recovery

During operations, directory information is kept both in memory and on disk – and it is usually more ‘current’ in memory.

If the directory is kept in cache, it is usually not written back to disk every time it is updated. Doing so would negate the performance gains of the cache.

Systems can and do crash at the absolute worst times. When they do, files, directories, caches, and buffers can easily be left in a very inconsistent state.

Operating systems such as Unix and MS-DOS provide system programs (fsck and chkdsk, respectively) to run consistency checks when needed.

These compare directory data with the data blocks on disk and attempt to repair any inconsistencies they find.
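
To give a feel for what such a checker compares, here is a deliberately simplified Python sketch. It is not how fsck or chkdsk work internally. The directory is assumed to be a dictionary from file name to block numbers, the free-space map is a list with 1 meaning occupied, and `check_consistency` is a name invented for the example.

```python
def check_consistency(directory, bitmap):
    """Compare the blocks claimed by files with the free-space bit map."""
    owner = {}
    multiply_claimed = set()
    for name, blocks in directory.items():
        for block in blocks:
            if block in owner:
                # Two directory entries claim the same block.
                multiply_claimed.add(block)
            owner[block] = name
    # Blocks that a file uses although the bit map says they are free.
    used_but_free = sorted(b for b in owner if bitmap[b] == 0)
    # Blocks marked occupied that no file refers to.
    orphaned = sorted(i for i, bit in enumerate(bitmap)
                      if bit == 1 and i not in owner)
    return {
        "multiply_claimed": sorted(multiply_claimed),
        "used_but_free": used_but_free,
        "orphaned": orphaned,
    }


if __name__ == "__main__":
    directory = {"a": [0, 1], "b": [1, 4]}
    bitmap = [1, 1, 1, 0, 0]
    for problem, blocks in check_consistency(directory, bitmap).items():
        print(problem, blocks)
```

A repair step might then mark the "used but free" blocks as occupied and return the orphaned blocks to the free list. Blocks claimed by two files are harder to fix and may need human judgment.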


Consistency Checking – Recovery (Cont.)

The degree of success in running these system-supplied programs is largely dependent upon the type of allocation (contiguous, linked, indexed) as well as free-space management routines used to allocate disk.

Sometimes broken links can be repaired; sometimes not.

Loss of a directory entry in an indexed organization can be disastrous, because the data blocks have no knowledge of one another. With linked allocation, the blocks are linked from one to the next, so a file can often be reconstructed from its data blocks.

If a link is broken, however, we have big trouble.

Interestingly, Unix caches directory entries for reads, but any writes that cause any kind of space allocation are done synchronously.

This simply means that the allocation is successfully completed on disk before the corresponding data write takes place.

We can still have a problem if a crash occurs during this process, too. But we always try to minimize and localize any problems.


Backup and Restore

As mentioned in previous materials, all viable computing environments back up data from disk periodically to other media – either other disks, mag tapes, etc.

We can then restore from these backups if needed.

Often, directory information is used in building the backup.

Files and directories that have not changed since the last backup need not be backed up again.

Schedule: depends on lots of things…

May do periodic full backups.

May do incremental backups, usually overnight – all files / directories changed since the previous backup.

May simply copy to another medium all files changed since day n.

Since the volume of data is often large, these backups may be written in very large blocks!

Often, too, they are backed up to very low cost, high volume tapes.


Backup and Restore (Cont.)

We also have combinations of these:

Every so often a full backup.

More frequently, an incremental backup.

A restore starts from the last full backup and then applies the incremental backups in order, oldest first.
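
The full-plus-incremental scheme can be sketched as below. Everything here is an assumption made for illustration: files are dictionary entries, modification times are plain numbers, and `changed_since` and `restore` are invented names rather than part of any backup tool.

```python
def changed_since(mtimes, last_backup_time):
    """Pick the files whose modification time is later than the last backup."""
    return sorted(name for name, mtime in mtimes.items()
                  if mtime > last_backup_time)


def restore(full_backup, incrementals):
    """Start from the full backup, then apply incrementals oldest first."""
    state = dict(full_backup)
    for incremental in incrementals:
        # A later incremental overrides an earlier copy of the same file.
        state.update(incremental)
    return state


if __name__ == "__main__":
    mtimes = {"a.txt": 5, "b.txt": 12, "c.txt": 20}
    print("to back up:", changed_since(mtimes, 10))

    full = {"a.txt": "v1", "b.txt": "v1"}
    incrementals = [{"b.txt": "v2"}, {"b.txt": "v3", "c.txt": "v1"}]
    print("restored:", restore(full, incrementals))
```

The order of the incrementals matters: applying them newest first would leave b.txt at an older version. This sketch also ignores deleted files, which a real scheme would have to track.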


Log-Structured File Systems – Motivation

Typical database systems use log-based recovery algorithms as part of their environment and operation.

The same techniques can be applied to the consistency-checking problem in file systems, for use in recovery when needed.

Common operations such as creating a file involve many operations and changes to several key data structures associated with the file system.

Certainly the directory structure will be modified, file control blocks are allocated and descriptions are developed, data itself are modified, and the free-block counts kept in the corresponding data structures must be decremented.

All of these things can be corrupted if a system crash occurs somewhere in this process.


Log-Structured File Systems

We have spoken about backup and recovery and special system programs that are generally available to assist in re-establishing files and directories that are consistent.

It is important to note that in some cases, problems (inconsistencies) may not be recoverable.

Sometimes human intervention is required and the system may simply be unavailable until a recovery of some sort is established.

One solution is to incorporate a log-based recovery approach, in which changes are written sequentially to a log. Each set of operations for performing a specific task is called a transaction. Once changes are written to the log, they are considered ‘committed’, and the system call that writes these changes to the log may then return to the user process.

But the problem is that the file system itself may not be updated yet, as these updates to the file system take place asynchronously.

Once all the changes of a committed transaction have been applied to the file system, the transaction can be removed from the log file.

(It is recommended that the log be separated from the file system itself and perhaps on a different drive – in case the drive goes down.)


Restoring Consistency

If the system crashes, all transactions remaining in the log must still be performed, if any are present.

Even though they were committed by the operating system, they may not yet have been applied to the file system itself.

Difficulties arise if a transaction was aborted, that is, not committed, before the system crashed.

Any partial ‘completion’ of such a transaction in the file system must be undone in order to arrive at a consistent data point.
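
The recovery rule can be sketched as a replay of the log. This is a minimal model, not a real journaling implementation: the file system is a dictionary, each log entry is a dictionary with `id`, `committed`, and `writes` fields (all invented for the example), and changes are assumed to reach the file system only after commit. Under that assumption an uncommitted transaction has nothing to undo and is simply dropped. A system that applies changes earlier would also need undo information.

```python
def recover(log, file_system):
    """Replay committed transactions after a crash, then empty the log."""
    for transaction in log:
        if transaction["committed"]:
            # Redo every write; repeating a write that already reached
            # the file system is harmless because it stores the same value.
            for key, value in transaction["writes"]:
                file_system[key] = value
        # Uncommitted transactions are discarded in this sketch.
    log.clear()
    return file_system


if __name__ == "__main__":
    fs = {"free_count": 100}
    log = [
        {"id": 1, "committed": True,
         "writes": [("dir/a", "inode 7"), ("free_count", 99)]},
        {"id": 2, "committed": False,
         "writes": [("dir/b", "inode 8")]},
    ]
    print(recover(log, fs))
```

After recovery the directory entry and the free-block count from transaction 1 agree with each other, and no trace of the unfinished transaction 2 remains.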

End of Chapter 11.2 – but there’s more!

Chapter 12: Mass-Storage Systems

Overview of Mass Storage Structure

Disk Structure

Disk Attachment

Disk Scheduling

Disk Management

Swap-Space Management

RAID Structure

Stable-Storage Implementation

Tertiary Storage Devices

Operating System Issues

Performance Issues


Limited Objectives

We can view a file system as possessing three components:

A user / programmer interface to the file system

The internal data structures and algorithms used by the operating system to implement this interface, and

The secondary and tertiary storage structures themselves

Here we will describe the physical structure of secondary and tertiary storage devices and the resulting effects on the uses of these devices


Overview of Mass Storage Structure

Magnetic disks provide the bulk of secondary storage in modern computers.

Drives rotate at 60 to 200 times per second.

Transfer rate is the rate at which data flow between the drive and the computer.

Positioning time (random-access time) is the time to move the disk arm to the desired cylinder (seek time) plus the time for the desired sector to rotate under the disk head (rotational latency).

A disk consists of a central spindle with platters attached.

Data is recorded on the top and bottom surface of each platter, except the top surface of the top platter and the bottom surface of the bottom platter (because of dust).

The read/write heads ‘float’ over the surface of the platters, and all arms move together in unison with the arm assembly.

The set of tracks that are ‘carved out’ at each arm position forms a cylinder.

Each track may contain hundreds of sectors, depending on the size of the sectors.

See next slide.


Moving-Head Disk Mechanism

Discuss


Disk Access

The disk spins at a high speed – somewhere between 60 and 200 revolutions per second.

A disk read consists of four components:

1. Seek time – the movement of the arm to the correct cylinder.

2. Head select – negligible.

3. Rotational delay (latency) – on average, the time for half a revolution – until the desired sector / block moves under the read/write head.

4. Data transfer time – copying the data from the disk into the I/O controller unit.

Oftentimes, head select is not counted, because it is done electronically. But a head must still be selected, since this determines which surface is read!
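
The rotational-delay component follows directly from the rotation speed. The short Python sketch below works it out for the two speeds quoted above. The function name `avg_rotational_latency_ms` is made up for the example, and the half-revolution average assumes the desired sector is equally likely to be anywhere on the track.

```python
def avg_rotational_latency_ms(revs_per_second):
    """Average wait for the desired sector: the time for half a revolution."""
    ms_per_revolution = 1000.0 / revs_per_second
    return 0.5 * ms_per_revolution


if __name__ == "__main__":
    for speed in (60, 200):
        print(speed, "rev/s:",
              round(avg_rotational_latency_ms(speed), 2), "ms on average")
```

At 60 revolutions per second one revolution takes about 16.7 msec, so the average latency is a little over 8 msec. At 200 revolutions per second it drops to 2.5 msec.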


Disk Head Crashes

The read/write heads float over a surface.

A head crash results when the disk head makes contact with the disk surface.

This can happen if power is abruptly pulled, although more modern devices store some power so that they can degrade gracefully…

Some disks are removable; this allows other disks (disk packs) to be mounted on the same disk drive.

Some are ‘permanent’ disks in an organization.

These are generally faster and have more capacity and are not application-dependent.

Floppy Disks – inexpensive, removable magnetic disks where the head sits directly on the disk surface.

Floppies rotate much more slowly and have much less capacity than hard disks.


More on Disk

A drive is attached to the computer via an I/O bus.

Buses are the vehicle that supports data transfer, and they are driven by special processors called controllers.

A host controller is located at the computer end of the bus.

The host controller uses the bus to talk to the disk controller built into the drive or storage array.

The computer places a command into the host controller, typically using memory-mapped I/O ports, and the host controller then sends the command via messages to the disk controller.

The disk controller operates the disk drive hardware to carry out the command.

Disk controllers usually have a built-in cache. Data moves between the cache and the disk surface, and between the cache and the host; the direction depends on whether we are reading or writing.


Last Look

One more bit of very interesting data: relative speeds of the disk in performing a read / write:

Seek time – nominally 20 msec

Rotational delay (latency) 8 msec

Data transfer 0.2 msec.

Overall access: 28.2 msec.

Easy to see the emphasis on reducing seek time when allocating disk space!

This is why many allocation schemes allocate in what is called “cylinder mode,” so that successive surfaces lie in the same cylinder thus negating the need to do a seek – substituting only a head select in its place!!
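
The arithmetic behind this slide can be checked with a few lines of Python. The figures are the nominal ones quoted above: 20 msec seek, 8 msec latency, and a 0.2 msec transfer, which is the value the 28.2 msec total implies. The function names are invented for the example, and treating a cylinder-mode access as having zero seek time is a simplifying assumption.

```python
def access_time_ms(seek_ms, latency_ms, transfer_ms):
    """Total time for one access: seek + rotational latency + transfer."""
    return seek_ms + latency_ms + transfer_ms


def seek_share(seek_ms, latency_ms, transfer_ms):
    """Fraction of the total access time that is spent seeking."""
    return seek_ms / access_time_ms(seek_ms, latency_ms, transfer_ms)


if __name__ == "__main__":
    print("with seek:    ", round(access_time_ms(20, 8, 0.2), 1), "ms")
    print("cylinder mode:", round(access_time_ms(0, 8, 0.2), 1), "ms")
    print("seek share:   ", round(100 * seek_share(20, 8, 0.2)), "%")
```

Seeking accounts for roughly 70% of the nominal access time, which is why keeping successive blocks in the same cylinder pays off so well.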


Overview of Mass Storage Structure (Cont.)

Magnetic tape

Was early secondary-storage medium

Relatively permanent and holds large quantities of data

Access time slow, but again, can store huge quantities of data.

Random access ~1000 times slower than disk

Mainly used for backup, storage of infrequently-used data, transfer medium between systems

Please note that in years past, these constituted a primary storage medium for files – as long as they were sequential!!

Kept in spool and wound or rewound past read-write head

Once data under head, transfer rates comparable to disk

20-200GB typical storage


Well…

This is about all the time we have for this course!

Take care, and I hope you have learned a lot.

It was my pleasure to work with you!

End of CSCI 340
