Operating Systems Slides 7 - File Systems


handouts version: http://cs2.swfu.edu.cn/~wx672/lecture_notes/os/slides/fs-a.pdf


File Systems

Wang Xiaolin

June 19, 2013

wx672ster+os@gmail.com

1 / 78

Long-term Information Storage Requirements

▶ Must store large amounts of data
▶ Information stored must survive the termination of the process using it
▶ Multiple processes must be able to access the information concurrently

2 / 78

File-System Structure

File-system design addresses two problems:

1. defining how the FS should look to the user
   ▶ defining a file and its attributes
   ▶ the operations allowed on a file
   ▶ directory structure
2. creating algorithms and data structures to map the logical FS onto the physical disk

3 / 78

File-System — A Layered Design

APPs⇓

Logical FS⇓

File-org module⇓

Basic FS⇓

I/O ctrl⇓

Devices

▶ logical file system: manages metadata information
  - maintains all of the file-system structure (directory structure, FCB)
  - responsible for protection and security
▶ file-organization module
  - translates logical block addresses into physical block addresses
  - keeps track of free blocks
▶ basic file system: issues generic commands to the device driver, e.g.
  - "read drive 1, cylinder 72, track 2, sector 10"
▶ I/O control: device drivers and interrupt handlers
  - device driver: translates high-level commands into hardware-specific instructions

4 / 78

The Operating Structure

APPs⇓

Logical FS⇓

File-org module⇓

Basic FS⇓

I/O ctrl⇓

Devices

Example: to create a file

1. APP calls creat()
2. Logical FS
   2.1 allocates a new FCB
   2.2 updates the in-mem dir structure
   2.3 writes it back to disk
   2.4 calls the file-org module
3. file-organization module
   3.1 maps the directory I/O into disk-block numbers
   3.2 allocates blocks for storing the file's data

Benefit of layered design:

The I/O control and sometimes the basic file system code can be used by multiple file systems.

5 / 78

File— A Logical View Of Information Storage

User's view:

A file is the smallest storage unit on disk.
▶ Data cannot be written to disk unless they are within a file

UNIX view:

Each file is a sequence of 8-bit bytes
▶ It's up to the application program to interpret this byte stream.

6 / 78

File— What Is Stored In A File?

Source code, object files, executable files, shell scripts, PostScript...

Different types of files have different structures:

▶ UNIX looks at contents to determine type
  Shell scripts start with "#!"
  PDF files start with "%PDF..."
  Executables start with a magic number
▶ Windows uses file naming conventions
  executables end with ".exe" and ".com"
  MS-Word files end with ".doc"
  MS-Excel files end with ".xls"

7 / 78

File Naming

Naming rules vary from system to system:

▶ Name length?
▶ Characters? Digits? Special characters?
▶ Extension?
▶ Case sensitive?

8 / 78

File Types

Regular files: ASCII, binary
Directories: maintain the structure of the FS

In UNIX, everything is a file:

Character special files: I/O related, such as terminals, printers...
Block special files: devices that can contain file systems, i.e. disks
    disks: logically, linear collections of blocks; the disk driver translates them into physical block addresses

9 / 78

Binary files:

[Fig. 6-3. (a) An executable file. (b) An archive.]

(a) A UNIX executable file: a header (magic number, text size, data size, BSS size, symbol table size, entry point, flags), followed by text, data, relocation bits, and the symbol table.
(b) A UNIX archive: a sequence of object modules, each preceded by a header (module name, date, owner, protection, size).

10 / 78

File Attributes — Metadata

▶ Name: the only information kept in human-readable form
▶ Identifier: unique tag (number) that identifies the file within the file system
▶ Type: needed for systems that support different types
▶ Location: pointer to file location on device
▶ Size: current file size
▶ Protection: controls who can do reading, writing, executing
▶ Time, date, and user identification: data for protection, security, and usage monitoring

11 / 78

File Operations: POSIX file system calls

1. fd = creat(name, mode)
2. fd = open(name, flags)
3. status = close(fd)
4. byte_count = read(fd, buffer, byte_count)
5. byte_count = write(fd, buffer, byte_count)
6. offset = lseek(fd, offset, whence)
7. status = link(oldname, newname)
8. status = unlink(name)
9. status = truncate(name, size)
10. status = ftruncate(fd, size)
11. status = stat(name, buffer)
12. status = fstat(fd, buffer)
13. status = utimes(name, times)
14. status = chown(name, owner, group)
15. status = fchown(fd, owner, group)
16. status = chmod(name, mode)
17. status = fchmod(fd, mode)

12 / 78

An example program using file system calls:

/* File copy program. Error checking and reporting is minimal. */

#include <sys/types.h>   /* include necessary header files */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]);   /* ANSI prototype */

#define BUF_SIZE 4096       /* use a buffer size of 4096 bytes */
#define OUTPUT_MODE 0700    /* protection bits for output file */
#define TRUE 1              /* loop until break or exit */

int main(int argc, char *argv[])
{
    int in_fd, out_fd, rd_count, wt_count;
    char buffer[BUF_SIZE];

    if (argc != 3) exit(1);   /* syntax error if argc is not 3 */

    /* Open the input file and create the output file */
    in_fd = open(argv[1], O_RDONLY);          /* open the source file */
    if (in_fd < 0) exit(2);                   /* if it cannot be opened, exit */
    out_fd = creat(argv[2], OUTPUT_MODE);     /* create the destination file */
    if (out_fd < 0) exit(3);                  /* if it cannot be created, exit */

    /* Copy loop */
    while (TRUE) {
        rd_count = read(in_fd, buffer, BUF_SIZE);    /* read a block of data */
        if (rd_count <= 0) break;                    /* if end of file or error, exit loop */
        wt_count = write(out_fd, buffer, rd_count);  /* write data */
        if (wt_count <= 0) exit(4);                  /* wt_count <= 0 is an error */
    }

    /* Close the files */
    close(in_fd);
    close(out_fd);
    if (rd_count == 0)   /* no error on last read */
        exit(0);
    else
        exit(5);         /* error on last read */
}

Fig. 6-5. A simple program to copy a file.

13 / 78

open()

fd = open(pathname, flags)

A per-process open-file table is kept in the OS
▶ upon a successful open() syscall, a new entry is added into this table
▶ indexed by file descriptor (fd)

To see files opened by a process, e.g. init:
~$ lsof -p 1

Why is open() needed?

To avoid constant searching
▶ Without open(), every file operation involves searching the directory for the file.

14 / 78

Directories— Single-Level Directory Systems

All files are contained in the same directory.

[Fig. 6-7. A single-level directory system containing four files, owned by three different people, A, B, and C.]

Limitations:
- name collision
- file searching

Often used on simple embedded devices, such as telephones, digital cameras...

15 / 78

Directories— Two-level Directory Systems

A separate directory for each user.

[Fig. 6-8. A two-level directory system. The letters indicate the owners of the directories and files.]

Limitation: hard to access others' files

16 / 78

Directories— Hierarchical Directory Systems

[Fig. 6-9. A hierarchical directory system: the root directory holds user directories (A, B, C), which in turn hold user files and user subdirectories.]

17 / 78

Directories— Path Names

[Figure: a directory tree. ROOT contains bin, boot, dev, etc, home, and var; the lower levels show grub, passwd, staff, stud, and mail, and below them wx672 and 20081152001 (the latter appears twice). Legend: dir, file.]

18 / 78

Directories— Directory Operations

Create    Delete     Rename    Link
Opendir   Closedir   Readdir   Unlink

19 / 78

File System Implementation

A typical file system layout:

|<---------------------- Entire disk ------------------------>|

+-----+-------------+-------------+-------------+-------------+

| MBR | Partition 1 | Partition 2 | Partition 3 | Partition 4 |

+-----+-------------+-------------+-------------+-------------+

_______________________________/ \____________

/ \

+---------------+-----------------+--------------------+---//--+

| Boot Ctrl Blk | Volume Ctrl Blk | Dir Structure | Files |

| (MBR copy) | (Super Blk) | (inodes, root dir) | dirs |

+---------------+-----------------+--------------------+---//--+

|<-------------Master Boot Record (512 Bytes)------------>|

0 439 443 445 509 511

+----//-----+----------+------+------//---------+---------+

| code area | disk-sig | null | partition table | MBR-sig |

| 440 | 4 | 2 | 16x4=64 | 0xAA55 |

+----//-----+----------+------+------//---------+---------+

20 / 78

On-Disk Information Structure

Boot control block: an MBR copy
    UFS: boot block
    NTFS: partition boot sector
Volume control block: contains volume details
    number of blocks, size of blocks
    free-block count, free-block pointers
    free FCB count, free FCB pointers
    UFS: superblock
    NTFS: Master File Table
Directory structure: organizes the files
File control block (FCB): contains file details (metadata)
    UFS: i-node
    NTFS: stored in the MFT using a relational database structure, with one row per file

21 / 78

Each File-System Has a Superblock

Superblock keeps information about the file system:
▶ Type: ext2, ext3, ext4...
▶ Size
▶ Status: how it's mounted, free blocks, free inodes, ...
▶ Information about other metadata structures

~# dumpe2fs /dev/sda1 | grep -i superblock

22 / 78

Implementing Files: Contiguous Allocation

(Textbook excerpt, Chapter 12, File Management, p. 572)

... access, degree of multiprogramming, other performance factors in the system, disk caching, disk scheduling, and so on.

File Allocation Methods. Having looked at the issues of preallocation versus dynamic allocation and portion size, we are in a position to consider specific file allocation methods. Three methods are in common use: contiguous, chained, and indexed. Table 12.3 summarizes some of the characteristics of each method.

With contiguous allocation, a single contiguous set of blocks is allocated to a file at the time of file creation (Figure 12.7). Thus, this is a preallocation strategy, using variable-size portions. The file allocation table needs just a single entry for each file, showing the starting block and the length of the file. Contiguous allocation is the best from the point of view of the individual sequential file. Multiple blocks can be read in at a time to improve I/O performance for sequential processing. It is also easy to retrieve a single block. For example, if a file starts at block b, and the ith block of the file is wanted, its location on secondary storage is simply b + i − 1. Contiguous allocation presents some problems. External fragmentation will occur, making it difficult to find contiguous blocks of space of sufficient length. From time to time, it will be necessary to perform a compaction algorithm to free up additional ...

Table 12.3 File Allocation Methods

                                   Contiguous   Chained        Indexed
Preallocation?                     Necessary    Possible       Possible
Fixed or variable size portions?   Variable     Fixed blocks   Fixed blocks | Variable
Portion size                       Large        Small          Small        | Medium
Allocation frequency               Once         Low to high    High         | Low
Time to allocate                   Medium       Long           Short        | Medium
File allocation table size         One entry    One entry      Large        | Medium

Figure 12.7 Contiguous File Allocation (disk blocks 0 to 34)

File Allocation Table
File Name   Start Block   Length
File A      2             3
File B      9             5
File C      18            8
File D      30            2
File E      26            3

- simple
- good for read only
- fragmentation

23 / 78

Linked List (Chained) Allocation: a pointer in each disk block

(Textbook excerpt, Section 12.6, Secondary Storage Management, p. 573)

... space on the disk (Figure 12.8). Also, with preallocation, it is necessary to declare the size of the file at the time of creation, with the problems mentioned earlier.

At the opposite extreme from contiguous allocation is chained allocation (Figure 12.9). Typically, allocation is on an individual block basis. Each block contains a pointer to the next block in the chain. Again, the file allocation table needs just a single entry for each file, showing the starting block and the length of the file. Although preallocation is possible, it is more common simply to allocate blocks as needed. The selection of blocks is now a simple matter: any free block can be added to a chain. There is no external fragmentation to worry about because only one ...

Figure 12.8 Contiguous File Allocation (After Compaction)

File Allocation Table
File Name   Start Block   Length
File A      0             3
File B      3             5
File C      8             8
File D      19            2
File E      16            3

Figure 12.9 Chained Allocation

File Allocation Table
File Name   Start Block   Length
File B      1             5

- no wasted blocks
- slow random access
- not 2^n (the pointer takes a few bytes of each block, so the data area of a block is no longer a power of two)

24 / 78

Linked List (Chained) Allocation: though there is no external fragmentation, consolidation is still preferred.

(Textbook excerpt, Chapter 12, File Management, p. 574)

... block at a time is needed. This type of physical organization is best suited to sequential files that are to be processed sequentially. To select an individual block of a file requires tracing through the chain to the desired block.

One consequence of chaining, as described so far, is that there is no accommodation of the principle of locality. Thus, if it is necessary to bring in several blocks of a file at a time, as in sequential processing, then a series of accesses to different parts of the disk are required. This is perhaps a more significant effect on a single-user system but may also be of concern on a shared system. To overcome this problem, some systems periodically consolidate files (Figure 12.10).

Indexed allocation addresses many of the problems of contiguous and chained allocation. In this case, the file allocation table contains a separate one-level index for each file; the index has one entry for each portion allocated to the file. Typically, the file indexes are not physically stored as part of the file allocation table. Rather, the file index for a file is kept in a separate block, and the entry for the file in the file allocation table points to that block. Allocation may be on the basis of either fixed-size blocks (Figure 12.11) or variable-size portions (Figure 12.12). Allocation by blocks eliminates external fragmentation, whereas allocation by variable-size portions improves locality. In either case, file consolidation may be done from time to time. File consolidation reduces the size of the index in the case of variable-size portions, but not in the case of block allocation. Indexed allocation supports both sequential and direct access to the file and thus is the most popular form of file allocation.

Free Space Management

Just as the space that is allocated to files must be managed, so the space that is not currently allocated to any file must be managed. To perform any of the file allocation techniques described previously, it is necessary to know what blocks on the disk are available. Thus we need a disk allocation table in addition to a file allocation table. We discuss here a number of techniques that have been implemented.

Figure 12.10 Chained Allocation (After Consolidation)

File Allocation Table
File Name   Start Block   Length
File B      0             5

25 / 78

FAT: Linked list allocation with a table in RAM.

▶ Take the pointer out of each disk block and put it into a table in memory
▶ fast random access (the chain is in RAM)
▶ the data area of a block is 2^n bytes again
▶ the entire table must be in RAM
  disk size ↗ ⇒ FAT size ↗ ⇒ RAM used ↗

Physical block   Next block
0
1
2                10
3                11
4                7      (File A starts here)
5
6                3      (File B starts here)
7                2
8
9
10               12
11               14
12               -1
13
14               -1
15                      (unused block)

File A: 4 → 7 → 2 → 10 → 12
File B: 6 → 3 → 11 → 14

Fig. 6-14. Linked list allocation using a file allocation table in main memory.

26 / 78

Indexed Allocation

(Textbook excerpt, Section 12.6, Secondary Storage Management, p. 575)

Bit Tables. This method uses a vector containing one bit for each block on the disk. Each entry of a 0 corresponds to a free block, and each 1 corresponds to a block in use. For example, for the disk layout of Figure 12.7, a vector of length 35 is needed and would have the following value:

00111000011111000011111111111011000

A bit table has the advantage that it is relatively easy to find one or a contiguous group of free blocks. Thus, a bit table works well with any of the file allocation methods outlined. Another advantage is that it is as small as possible.

Figure 12.11 Indexed Allocation with Block Portions

File Allocation Table
File Name   Index Block
File B      24

Index block 24 lists the file's blocks: 1, 8, 3, 14, 28

Figure 12.12 Indexed Allocation with Variable-Length Portions

File Allocation Table
File Name   Index Block
File B      24

Index block 24:
Start Block   Length
1             3
28            4
14            1

▶ i-node: a data structure for each file
▶ an i-node is in memory only if the file is open
  files opened ↗ ⇒ RAM used ↗

27 / 78

I-node — FCB in UNIX

[Figure: a directory inode (128 B) and a file inode (128 B). Each inode holds Type, Mode, User ID, Group ID, File size, # blocks, # links, Flags, Timestamps (×3), Direct blocks (×12), Single indirect, Double indirect, and Triple indirect. The directory inode's direct blocks give the block numbers of directory blocks, whose entries (., .., passwd, fstab, ...) each carry an inode number. The file inode's direct blocks point to file data blocks; the single indirect entry points to an indirect block with 512 direct entries, the double indirect entry to a block with 512 single indirect entries, and the triple indirect entry to a block with 512 double indirect entries.]

File type   Description
0           Unknown
1           Regular file
2           Directory
3           Character device
4           Block device
5           Named pipe
6           Socket
7           Symbolic link

Mode: 9-bit pattern

28 / 78

Inode Quiz

Given: block size is 1KB, pointer size is 4B
Addressing: byte offset 9000, byte offset 350,000

+----------------+

0 | 4096 |

+----------------+ ---->+----------+ Byte 9000 in a file

1 | 228 | / | 367 | |

+----------------+ / | Data blk | v

2 | 45423 | / +----------+ 8th blk, 808th byte

+----------------+ /

3 | 0 | / -->+------+

+----------------+ / / 0| |

4 | 0 | / / +------+

+----------------+ / / : : :

5 | 11111 | / / +------+ Byte 350,000

+----------------+ / ->+-----+/ 75| 3333 | in a file

6 | 0 | / / 0| 331 | +------+\ |

+----------------+ / / +-----+ : : : \ v

7 | 101 | / / | | +------+ \ 816th byte

+----------------+/ / | : | 255| | \-->+----------+

8 | 367 | / | : | +------+ | 3333 |

+----------------+ / | : | 331 | Data blk |

9 | 0 | / | | Single indirect +----------+

+----------------+ / +-----+

S | 428 (10K+256K) | / 255| |

+----------------+/ +-----+

D | 9156 | 9156 /***********************

+----------------+ Double indirect What about the ZEROs?

T | 824 | ***********************/

+----------------+

29 / 78

UNIX In-Core Data Structure

▶ mount table: info about each mounted FS
▶ directory-structure cache: holds the dir-info of recently accessed dirs
▶ inode table: an in-core version of the on-disk inode table
▶ file table
  ▶ global
  ▶ keeps inode of each open file
  ▶ keeps track of
    ▶ how many processes are associated with each open file
    ▶ where the next read and write will start
    ▶ access rights
▶ user file descriptor table
  ▶ per process
  ▶ identifies all open files for a process

30 / 78

UNIX In-Core Data Structure

open()/creat():

1. adds an entry in each table
2. returns a file descriptor, an index into the user file descriptor table

31 / 78

A File Is Opened By Multiple Processes?

Two levels of internal tables in the OS:

A per-process table (a.k.a. file descriptor table) tracks all files that a process has open. It stores
▶ the current-file-position pointer (not really)
▶ access rights
▶ more...

A system-wide table keeps process-independent information, such as
▶ the location of the file on disk
▶ access dates
▶ file size
▶ file open count: the number of processes opening this file

32 / 78

Per-process FDT

Process 1

+------------------+ System-wide

| ... | open-file table

+------------------+ +------------------+

| Position pointer | | ... |

| Access rights | +------------------+

| ... |\ | ... |

+------------------+ \ +------------------+

| ... | --------->| Location on disk |

+------------------+ | R/W |

| Access dates |

Process 2 | File size |

+------------------+ | Pointer to inode |

| Position pointer | | File-open count |

| Access rights |----------->| ... |

| ... | +------------------+

+------------------+ | ... |

| ... | +------------------+

+------------------+

33 / 78

A process executes the following code:

fd1 = open("/etc/passwd", O_RDONLY);
fd2 = open("local", O_RDWR);
fd3 = open("/etc/passwd", O_WRONLY);

user FDT file table inode table

+--------+ +-----------+ +---------------+

0| STDIN | | : | | : |

+--------+ +-----------+ | : |

1| STDOUT | | count R | | : |

+--------+ -->| 1 |\ +---------------+

2| STDERR | / +-----------+ ‘---->| (/etc/passwd) |

+--------+/ | : | ,-->| count 2 |

3| | +-----------+ | +---------------+

+--------+ | count RW | | | : |

4| |---->| 1 |\ / +---------------+

+--------+ +-----------+ \/ | (local) |

5| | | : | /\--->| count 1 |

+--------+\ +-----------+/ +---------------+

: : : \ | count W | | : |

+--------+ -->| 1 | +---------------+

+-----------+

34 / 78

One more process, B:

fd1 = open("/etc/passwd", O_RDONLY);
fd2 = open("private", O_RDONLY);

user FDT

proc A file table

+--------+ +-----------+ inode table

0| STDIN | | : | +---------------+

+--------+ +-----------+ | : |

1| STDOUT | | count R | | : |

+--------+ ------>| 1 |\ +---------------+

2| STDERR | / +-----------+ \--------->| (/etc/passwd) |

+--------+/ | : | ----->| count 3 |

3| | +-----------+ / ->| |

+--------+ | count RW | / / +---------------+

4| |-------->| 1 |\ / / | : |

+--------+ +-----------+ \/ / | : |

5| | | : | /\ / +---------------+

+--------+\ +-----------+/ -------->| (local) |

: : : \ ---->| count R | / | count 1 |

+--------+ \/ | 1 | / +---------------+

/\ +-----------+ / | : |

proc B | \ | : | / | : |

+--------+ | \ +-----------+/ +---------------+

0| STDIN | | -->| count W | ------->| (private) |

+--------+ | | 1 | / | count 1 |

1| STDOUT | | +-----------+ / +---------------+

+--------+ | | : | / | : |

2| STDERR | / +-----------+/ | : |

+--------+/ | count R | +---------------+

3| | .------>| 1 |

+--------+/ +-----------+

4| |

+--------+

: : :

+--------+

35 / 78

Why File Table?

To allow a parent and child to share a file position, but to provide unrelated processes with their own values.

[Fig. 10-33. The relation between the file descriptor table, the open file description table, and the i-node table. The parent's and the child's file descriptor tables point to the same open file description (file position, R/W, pointer to i-node); an unrelated process's file descriptor table points to a separate open file description. The i-node holds mode, link count, uid, gid, file size, times, the addresses of the first 10 disk blocks, and the single, double, and triple indirect entries, which lead through indirect blocks to pointers to disk blocks.]

36 / 78

Why File Table?

Where to put file position info?

Inode table? No. Multiple processes can open the same file. Each one has its own file position.

User file descriptor table? No. Trouble in file sharing.

Example:

#!/bin/bash
echo hello
echo world

Where should the "world" be?

~$ ./hello.sh > A

37 / 78

Implementing Directories

[Fig. 6-16. A directory with the entries games, mail, news, and work. (a) A simple directory containing fixed-size entries with the disk addresses and attributes in the directory entry. (b) A directory in which each entry just refers to an i-node, a data structure containing the attributes.]

(a) A simple directory (Windows)
    ▶ fixed-size entries
    ▶ disk addresses and attributes in the directory entry
(b) A directory in which each entry just refers to an i-node (UNIX)

38 / 78

How Long A File Name Can Be?

[Figure: two ways of storing the long file names project-budget, personnel, and foo in a directory. (a) In-line: the entry for one file holds the entry length, the file's attributes, and then the name itself. (b) In a heap: the entry for one file holds a pointer to the file's name and the file's attributes; the names themselves are kept together in a heap.]

39 / 78

UNIX Treats a Directory as a File

[Figure: a directory inode (128 B) whose direct blocks lead to a directory block with the entries ., .., passwd, fstab, ..., each carrying an inode number; one of these inode numbers selects a file inode (128 B), whose direct and indirect blocks lead to the file data blocks.]

Example (name, inode number):

.       2
..      2
bin     11116545
boot    2
cdrom   12
dev     3
...

40 / 78

The steps in looking up /usr/ast/mbox:

1. Root directory: looking up usr yields i-node 6
2. I-node 6 is for /usr (mode, size, times); it says that /usr is in block 132
3. Block 132 is the /usr directory: /usr/ast is i-node 26
4. I-node 26 is for /usr/ast (mode, size, times); it says that /usr/ast is in block 406
5. Block 406 is the /usr/ast directory: /usr/ast/mbox is i-node 60

Root directory     Block 132 (/usr)     Block 406 (/usr/ast)
 1  .               6  .                26  .
 1  ..              1  ..                6  ..
 4  bin            19  dick             64  grants
 7  dev            30  erik             92  books
14  lib            51  jim              60  mbox
 9  etc            26  ast              81  minix
 6  usr            45  bal              17  src
 8  tmp

41 / 78

File Sharing— Multiple Users

User IDs identify users, allowing permissions and protections to be per-user

Group IDs allow users to be in groups, permitting group access rights

Example: 9-bit pattern

owner access    7 ⇒ rwx (1 1 1)
group access    5 ⇒ r-x (1 0 1)
public access   0 ⇒ --- (0 0 0)

42 / 78

File Sharing— Remote File Systems

Uses networking to allow file system access between systems
▶ Manually via programs like FTP
▶ Automatically, seamlessly using distributed file systems
▶ Semi-automatically, via the World Wide Web

Client-server model allows clients to mount remote file systems from servers
▶ NFS: standard UNIX client-server file sharing protocol
▶ CIFS: standard Windows protocol
▶ Standard system calls are translated into remote calls

Distributed Information Systems (distributed naming services)
▶ such as LDAP, DNS, NIS, Active Directory implement unified access to information needed for remote computing

43 / 78

File Sharing— Protection

▶ File owner/creator should be able to control:
  ▶ what can be done
  ▶ by whom
▶ Types of access
  ▶ Read
  ▶ Write
  ▶ Execute
  ▶ Append
  ▶ Delete
  ▶ List

44 / 78

Shared Files— Hard Links vs. Soft Links

[Fig. 6-18. File system containing a shared file. The tree under the root directory holds directories and files owned by A, B, and C; one file is marked as the shared file.]

45 / 78

Hard Links:

Hard links: directory entries that share the same inode

46 / 78

Drawback:

[Figure: (a) the file appears in C's directory only: Owner = C, Count = 1. (b) the file appears in both B's and C's directories: Owner = C, Count = 2. (c) the file appears in B's directory only: Owner = C, Count = 1.]

47 / 78

Symbolic Links:

A symbolic link has its own inode and its own directory entry.

48 / 78

Disk Space Management— Statistics

49 / 78

▶ Block size is chosen while creating the FS
▶ Disk I/O performance conflicts with space utilization
  ▶ smaller block size ⇒ better space utilization
  ▶ larger block size ⇒ better disk I/O performance

~$ dumpe2fs /dev/sda1 | grep "Block size"

50 / 78

Keeping Track of Free Blocks

1. Linked List

(Textbook excerpt, Section 10.5, Free-Space Management, p. 443)

[Figure 10.10 Linked free-space list on disk. A grid of disk blocks 0 to 31; the free-space list head points to the first free block, and each free block points to the next.]

... of a large number of free blocks can now be found quickly, unlike the situation when the standard linked-list approach is used.

10.5.4 Counting

Another approach takes advantage of the fact that, generally, several contiguous blocks may be allocated or freed simultaneously, particularly when space is allocated with the contiguous-allocation algorithm or through clustering. Thus, rather than keeping a list of n free disk addresses, we can keep the address of the first free block and the number (n) of free contiguous blocks that follow the first block. Each entry in the free-space list then consists of a disk address and a count. Although each entry requires more space than would a simple disk address, the overall list is shorter, as long as the count is generally greater than 1. Note that this method of tracking free space is similar to the extent method of allocating blocks. These entries can be stored in a B-tree, rather than a linked list, for efficient lookup, insertion, and deletion.

10.5.5 Space Maps

Sun's ZFS file system was designed to encompass huge numbers of files, directories, and even file systems (in ZFS, we can create file-system hierarchies). The resulting data structures could have been large and inefficient if they had not been designed and implemented properly. On these scales, metadata I/O can have a large performance impact. Consider, for example, that if the free-space list is implemented as a bit map, bit maps must be modified both when blocks are allocated and when they are freed. Freeing 1 GB of data on a 1-TB disk could cause thousands of blocks of bit maps to be updated, because those data blocks could be scattered over the entire disk.

2. Bit map (n blocks)

0 1 2 3 4 5 6 7 8 .. n-1

+-+-+-+-+-+-+-+-+-+-//-+-+

|0|0|1|0|1|1|1|0|1| .. |0|

+-+-+-+-+-+-+-+-+-+-//-+-+

bit[i] = 0 ⇒ block[i] is free
bit[i] = 1 ⇒ block[i] is occupied

51 / 78

Journaling File Systems

Operations required to remove a file in UNIX:

1. Remove the file from its directory
   - set inode number to 0
2. Release the i-node to the pool of free i-nodes
   - clear the bit in inode bitmap
3. Return all the disk blocks to the pool of free disk blocks
   - clear the bits in block bitmap

What if a crash occurs between 1 and 2, or between 2 and 3?

52 / 78

Journaling File Systems

Keep a log of what the file system is going to do before it does it

▶ so that if the system crashes before it can do its planned work, upon rebooting the system can look in the log to see what was going on at the time of the crash and finish the job.
▶ NTFS, EXT3, and ReiserFS use journaling, among others

53 / 78

Ext2 File System

Physical Layout:

+------------+---------------+---------------+--//--+---------------+

| Boot Block | Block Group 0 | Block Group 1 | | Block Group n |

+------------+---------------+---------------+--//--+---------------+

__________________________/ \_____________

/ \

+-------+-------------+------------+--------+-------+--------+

| Super | Group | Data Block | inode | inode | Data |

| Block | Descriptors | Bitmap | Bitmap | Table | Blocks |

+-------+-------------+------------+--------+-------+--------+

1 blk n blks 1 blk 1 blk n blks n blks

54 / 78

Ext2 Block groups

The partition is divided into Block Groups

▶ Block groups are the same size: easy locating
▶ Kernel tries to keep a file's data blocks in the same block group: reduces fragmentation
▶ Backup critical info in each block group
▶ The Ext2 inodes for each block group are kept in the inode table
▶ The inode-bitmap keeps track of allocated and unallocated inodes

55 / 78

Group descriptor

▶ Each block group has a group descriptor
▶ All the group descriptors together make the group descriptor table
▶ The table is stored along with the superblock
▶ Block Bitmap: tracks free blocks
▶ Inode Bitmap: tracks free inodes
▶ Inode Table: all inodes in this block group
▶ Free blocks count, Free Inodes count, Used directory count: counters
▶ see more: ∼# dumpe2fs /dev/sda1

56 / 78

Ext2 Block Allocation Policies

626 Chapter 15 The Linux System

[Figure 15.9: ext2fs block-allocation policies. Panels: allocating scattered free blocks; allocating continuous free blocks. Legend: block in use, block selected by allocator, free block, bit boundary, byte boundary, bitmap search.]

these extra blocks to the file. This preallocation helps to reduce fragmentation during interleaved writes to separate files and also reduces the CPU cost of disk allocation by allocating multiple blocks simultaneously. The preallocated blocks are returned to the free-space bitmap when the file is closed.

Figure 15.9 illustrates the allocation policies. Each row represents a sequence of set and unset bits in an allocation bitmap, indicating used and free blocks on disk. In the first case, if we can find any free blocks sufficiently near the start of the search, then we allocate them no matter how fragmented they may be. The fragmentation is partially compensated for by the fact that the blocks are close together and can probably all be read without any disk seeks, and allocating them all to one file is better in the long run than allocating isolated blocks to separate files once large free areas become scarce on disk. In the second case, we have not immediately found a free block close by, so we search forward for an entire free byte in the bitmap. If we allocated that byte as a whole, we would end up creating a fragmented area of free space between it and the allocation preceding it, so before allocating we back up to make this allocation flush with the allocation preceding it, and then we allocate forward to satisfy the default allocation of eight blocks.
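The two cases described above can be sketched as follows. This is a simplified model, not the ext2 kernel code: the bitmap is a Python list of 0/1 values, the size of the "near" window (`near=8`) is an assumption chosen for the example, and preallocation is not modeled.

```python
def allocate_blocks(bitmap, goal, near=8, want=8):
    """Return indices of newly allocated blocks; bitmap[i] == 1 means in use."""
    n = len(bitmap)
    window_end = min(n, goal + near)
    # Case 1: take whatever free blocks lie near the goal, however scattered.
    picked = [i for i in range(goal, window_end) if bitmap[i] == 0][:want]
    if not picked:
        # Case 2: search forward for a whole free byte (8 aligned free bits).
        start = None
        for byte in range((window_end + 7) // 8, n // 8):
            if not any(bitmap[byte * 8:byte * 8 + 8]):
                start = byte * 8
                break
        if start is None:
            return []
        # Back up so the allocation is flush with the preceding used block.
        while start > 0 and bitmap[start - 1] == 0:
            start -= 1
        # Allocate forward, up to the default of `want` blocks.
        end = start
        while end < n and bitmap[end] == 0 and end - start < want:
            end += 1
        picked = list(range(start, end))
    for i in picked:
        bitmap[i] = 1
    return picked


scattered = [1, 0, 1, 0, 0, 1, 1, 1] + [0] * 8
near_goal = allocate_blocks(scattered, goal=0)      # case 1: fragmented but close

crowded = [1] * 8 + [1, 1, 1, 0, 0, 0, 0, 0] + [0] * 16
flush = allocate_blocks(crowded, goal=0)            # case 2: free byte, then back up
```

In the second call the first entirely free byte starts at block 16, but the allocation begins at block 11 so that no small free gap is left behind the preceding used block.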

15.7.3 Journaling

One popular feature in a file system is journaling, whereby modifications to the file system are sequentially written to a journal. A set of operations that performs a specific task is a transaction. Once a transaction is written to the journal, it is considered to be committed, and the system call modifying the file system (write()) can return to the user process, allowing it to continue execution. Meanwhile, the journal entries relating to the transaction are replayed across the actual file-system structures. As the changes are made, a

57 / 78

Maths

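Given block size = 4k and block bitmap = 1 blk, then

$$\text{blocks per group} = 8\,\text{bits} \times 4\text{k} = 32\text{k}$$

How large is a group?

$$\text{group size} = 32\text{k} \times 4\text{k} = 128\,\text{MB}$$

How many block groups are there?

$$\approx \frac{\text{partition size}}{\text{group size}} = \frac{\text{partition size}}{128\text{M}}$$

How many files can I have at most?

$$\approx \frac{\text{partition size}}{\text{block size}} = \frac{\text{partition size}}{4\text{k}}$$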
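The same arithmetic as a short script, for checking the numbers with other sizes. The 100 GiB partition is a made-up example value, and the file-count figure is only the rough upper bound used on the slide (one block per file), not a limit reported by any tool.

```python
KIB = 1024
block_size = 4 * KIB

# One bitmap block holds 8 bits per byte, one bit per block in the group.
blocks_per_group = 8 * block_size
group_size = blocks_per_group * block_size


def block_groups(partition_size):
    """Number of block groups, rounding a partial last group up."""
    return -(-partition_size // group_size)


def max_files_upper_bound(partition_size):
    """Rough bound from the slide: at least one block per file."""
    return partition_size // block_size


partition = 100 * 1024**3          # hypothetical 100 GiB partition
groups = block_groups(partition)
files = max_files_upper_bound(partition)
```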

58 / 78

Ext2 inode

59 / 78

Ext2 inode

Mode: holds two pieces of information
  1. Is it a {file|dir|sym-link|blk-dev|char-dev|FIFO}?
  2. Permissions
Owner info: Owners’ ID of this file or directory
Size: The size of the file in bytes
Timestamps: Accessed, inode-changed (ctime), and last modified times
Datablocks: 15 pointers to data blocks (12 direct + single-, double-, and triple-indirect)

60 / 78

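Max File Size

Given block size = 4k and pointer size = 4B (so one block holds 4k / 4B = 1k pointers), we get:

$$
\begin{aligned}
\text{Max File Size} &= \text{number of pointers} \times \text{block size} \\
&= \Big(\overbrace{\underbrace{12}_{\text{direct}} + \underbrace{1\text{k}}_{\text{1-indirect}} + \underbrace{1\text{k} \times 1\text{k}}_{\text{2-indirect}} + \underbrace{1\text{k} \times 1\text{k} \times 1\text{k}}_{\text{3-indirect}}}^{\text{number of pointers}}\Big) \times 4\text{k} \\
&= 48\text{k} + 4\text{M} + 4\text{G} + 4\text{T}
\end{aligned}
$$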
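The sum can be checked numerically. This sketch only reproduces the pointer arithmetic above; real implementations may impose lower limits through other inode fields, which are not modeled here.

```python
block_size = 4 * 1024
pointer_size = 4
ptrs_per_block = block_size // pointer_size      # 1k pointers per indirect block

direct = 12
single = ptrs_per_block
double = ptrs_per_block ** 2
triple = ptrs_per_block ** 3

max_file_size = (direct + single + double + triple) * block_size
```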

61 / 78

Ext2 Superblock

▶ Magic Number: 0xEF53
▶ Revision Level: determines what new features are available
▶ Mount Count and Maximum Mount Count: determines if the system should be fully checked
▶ Block Group Number: indicates the block group holding this superblock
▶ Block Size: usually 4k
▶ Blocks per Group: 8 bits × block size
▶ Free Blocks: System-wide free blocks
▶ Free Inodes: System-wide free inodes
▶ First Inode: First inode number in the file system
▶ see more: ∼# dumpe2fs /dev/sda1
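A few of these fields can be read straight from a raw image. The sketch below assumes the usual ext2 on-disk layout (superblock at byte 1024, little-endian fields at the offsets shown); the offsets are written from memory of `struct ext2_super_block` and should be checked against the kernel headers before being relied on. The image here is synthetic, built in memory for the example.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # assumed: superblock starts 1024 bytes into the partition
EXT2_MAGIC = 0xEF53


def parse_superblock(image):
    """Extract a handful of superblock fields from a raw image (bytes-like)."""
    sb = bytes(image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024])
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2 superblock")
    free_blocks, free_inodes = struct.unpack_from("<II", sb, 12)
    (log_block_size,) = struct.unpack_from("<I", sb, 24)
    (blocks_per_group,) = struct.unpack_from("<I", sb, 32)
    mount_count, max_mount_count = struct.unpack_from("<Hh", sb, 52)
    return {
        "block_size": 1024 << log_block_size,
        "blocks_per_group": blocks_per_group,
        "free_blocks": free_blocks,
        "free_inodes": free_inodes,
        "mount_count": mount_count,
        "max_mount_count": max_mount_count,
    }


# Synthetic image for illustration only; the values are made up.
image = bytearray(2048)
struct.pack_into("<II", image, SUPERBLOCK_OFFSET + 12, 1000, 500)
struct.pack_into("<I", image, SUPERBLOCK_OFFSET + 24, 2)       # 1024 << 2 = 4096
struct.pack_into("<I", image, SUPERBLOCK_OFFSET + 32, 32768)
struct.pack_into("<Hh", image, SUPERBLOCK_OFFSET + 52, 3, 20)
struct.pack_into("<H", image, SUPERBLOCK_OFFSET + 56, EXT2_MAGIC)

info = parse_superblock(image)
```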

62 / 78

Ext2 File Types

File type   Description
0           Unknown
1           Regular file
2           Directory
3           Character device
4           Block device
5           Named pipe
6           Socket
7           Symbolic link

Device file, pipe, and socket: No data blocks are required. All info is stored in the inode

Fast symbolic link: Short path name (< 60 chars) needs no data block. Can be stored in the 15 pointer fields

63 / 78

Ext2 Directories

0         11|12        23|24              39|40
+----+--+-+-+----+----+--+-+-+----+----+--+-+-+----+----+--//--+
| 21 |12|1|2|.   | 22 |12|2|2|..  | 53 |16|5|2|hell|o   |      |
+----+--+-+-+----+----+--+-+-+----+----+--+-+-+----+----+--//--+

    ,--------> inode number
    |    ,---> record length
    |    | ,---> name length
    |    | | ,---> file type
    |    | | |  ,----> name
  +----+--+-+-+----+
 0| 21 |12|1|2|.   |
  +----+--+-+-+----+
12| 22 |12|2|2|..  |
  +----+--+-+-+----+----+
24| 53 |16|5|2|hell|o   |
  +----+--+-+-+----+----+
40| 67 |28|3|2|usr |
  +----+--+-+-+----+----+
52| 0  |16|7|1|oldf|ile |
  +----+--+-+-+----+----+
68| 34 |12|4|2|sbin|
  +----+--+-+-+----+

▶ Directories are special files
▶ “.” and “..” first
▶ Each record is padded to a multiple of 4 bytes
▶ inode number is 0: deleted file
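The record format in the diagram (inode number, record length, name length, file type, name) can be walked with a short loop. The sketch below rebuilds the example block from the diagram in memory and follows the record lengths; `pack_entry` and `walk` are hypothetical helper names, and the little-endian 4+2+1+1 byte header is an assumption matching the field widths drawn above.

```python
import struct

HEADER = struct.Struct("<IHBB")   # inode, record length, name length, file type


def pack_entry(inode, rec_len, file_type, name, span=None):
    """Build one record, zero-padded to `span` bytes (defaults to rec_len)."""
    raw = HEADER.pack(inode, rec_len, len(name), file_type) + name
    return raw.ljust(span if span is not None else rec_len, b"\0")


def walk(block):
    """Yield (offset, inode, name) for each live record, hopping by rec_len."""
    offset = 0
    while offset < len(block):
        inode, rec_len, name_len, _ftype = HEADER.unpack_from(block, offset)
        if rec_len < HEADER.size:
            break                                   # corrupt record, stop
        name = block[offset + HEADER.size:offset + HEADER.size + name_len]
        if inode != 0:                              # inode 0 marks a deleted entry
            yield offset, inode, name.decode()
        offset += rec_len


block = (
    pack_entry(21, 12, 2, b".")
    + pack_entry(22, 12, 2, b"..")
    + pack_entry(53, 16, 2, b"hello")
    + pack_entry(67, 28, 2, b"usr", span=12)        # rec_len 28 spans the next record
    + pack_entry(0, 16, 1, b"oldfile")              # deleted entry, still on disk
    + pack_entry(34, 12, 2, b"sbin")
)
entries = list(walk(block))
```

The record for "usr" has length 28 although it occupies only 12 bytes, so the walk jumps from offset 40 to offset 68 and never visits the deleted "oldfile" record.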

64 / 78

Many different FS are in use

Windows

uses drive letter (C:, D:, ...) to identify each FS

UNIX

integrates multiple FS into a single structure
▶ From user’s view, there is only one FS hierarchy

∼$ man fs

65 / 78

Virtual File Systems

Put common parts of all FS in a separate layer

▶ It’s a layer in the kernel
▶ It’s a common interface to several kinds of file systems
▶ It calls the underlying concrete FS to actually manage the data

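[Figure: a user process calls the virtual file system through the POSIX interface; the virtual file system calls the file systems FS 1, FS 2, and FS 3 through the VFS interface; the buffer cache lies below them.]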

66 / 78

67 / 78

Virtual File System

▶ Manages kernel level file abstractions in one format for all file systems
▶ Receives system call requests from user level (e.g. write, open, stat, link)
▶ Interacts with a specific file system based on mount point traversal
▶ Receives requests from other parts of the kernel, mostly from memory management

Real File Systems

▶ managing file & directory data
▶ managing meta-data: timestamps, owners, protection, etc.
▶ translate: disk data, NFS data, ... ⟷ VFS data

68 / 78

File System Mounting

[Figure: directory trees on a hard disk and on a diskette, shown first as separate file systems and then combined by mounting.]

Fig. 10-26. (a) Separate file systems. (b) After mounting.

69 / 78

A FS must be mounted before it can be used

Mount: the file system is registered with the VFS

▶ The superblock is read into the VFS superblock
▶ The table of addresses of functions the VFS requires is read into the VFS superblock
▶ The FS’ topology info is mapped onto the VFS superblock data structure

The VFS keeps a list of the mounted file systems together with their superblocks

The VFS superblock contains:
▶ Device, blocksize
▶ Pointer to the root inode
▶ Pointer to a set of superblock routines
▶ Pointer to file_system_type data structure
▶ more...

70 / 78

V-node

▶ Every file/directory in the VFS has a VFS inode, kept in the VFS inode cache
▶ The real FS builds the VFS inode from its own info

Like the EXT2 inodes, the VFS inodes describe

▶ files and directories within the system
▶ the contents and topology of the Virtual File System

71 / 78

VFS Operation

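[Figure: an entry in the process table leads to the file descriptors; a file descriptor leads to a v-node in the VFS; the v-node's function pointers (open, read, write, ...) lead to the read function in FS 1. Arrow label: Call from VFS into FS 1.]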
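The chain in this slide (file descriptor, v-node, function pointer, concrete read function) can be mimicked with a small model. It is only an illustration of the dispatch idea: `VNode`, `Process`, `vfs_read`, and `fs1_read` are hypothetical names, and the "file system" is just a bytes object.

```python
class VNode:
    """In-memory file representation with a table of function pointers."""

    def __init__(self, ops, private):
        self.ops = ops            # e.g. {"read": fs1_read}
        self.private = private    # data owned by the concrete file system


def fs1_read(vnode, offset, size):
    """Read function of a made-up concrete FS: contents live in a bytes object."""
    return vnode.private[offset:offset + size]


class Process:
    def __init__(self):
        self.fd_table = {}        # file descriptor -> [vnode, current position]
        self.next_fd = 0

    def open(self, vnode):
        fd = self.next_fd
        self.next_fd += 1
        self.fd_table[fd] = [vnode, 0]
        return fd


def vfs_read(process, fd, size):
    """Generic read: find the v-node, then call into the concrete FS."""
    entry = process.fd_table[fd]
    vnode, position = entry
    data = vnode.ops["read"](vnode, position, size)   # call from VFS into FS 1
    entry[1] = position + len(data)
    return data


proc = Process()
fd = proc.open(VNode({"read": fs1_read}, b"hello world"))
part1 = vfs_read(proc, fd, 5)
part2 = vfs_read(proc, fd, 6)
```

The generic layer never looks inside `private`; supporting another file system only means supplying a different table of functions.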

72 / 78

Linux VFS

The Common File Model

All other filesystems must map their own concepts into the common file model

For example, FAT filesystems do not have inodes.

▶ The main components of the common file model are
  - superblock – information about mounted filesystem
  - inode – information about a specific file
  - file – information about an open file
  - dentry – information about directory entry
▶ Geared toward Unix FS

73 / 78

The Superblock Object

▶ is implemented by each FS and is used to store information describing that specific FS
▶ usually corresponds to the filesystem superblock or the filesystem control block
▶ Filesystems that are not disk-based (such as sysfs, proc) generate the superblock on-the-fly and store it in memory
▶ struct super_block in <linux/fs.h>
▶ s_op in struct super_block points to struct super_operations, the superblock operations table
▶ Each item in this table is a pointer to a function that operates on a superblock object

74 / 78

The Inode Object

▶ For Unix-style filesystems, this information is simply read from the on-disk inode
▶ For others, the inode object is constructed in memory in whatever manner is applicable to the filesystem
▶ struct inode in <linux/fs.h>
▶ An inode represents each file on a FS, but the inode object is constructed in memory only as files are accessed
▶ includes special files, such as device files or pipes
▶ i_op in struct inode points to struct inode_operations, the inode operations table

75 / 78

The Dentry Object

▶ components in a path
▶ makes path name lookup easier
▶ struct dentry in <linux/dcache.h>
▶ created on-the-fly from a string representation of a path name

76 / 78

Dentry State

▶ used
▶ unused
▶ negative

Dentry Cache

consists of three parts:
1. Lists of “used” dentries
2. A doubly linked “least recently used” list of unused and negative dentry objects
3. A hash table and hashing function used to quickly resolve a given path into the associated dentry object
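The three parts can be modeled roughly as below. This is a teaching sketch, not the Linux dcache: it caches whole path strings rather than per-component dentries, ignores reference counts, and uses made-up names (`DentryCache`, `resolve`, `max_unused`). A negative entry is represented by `None`.

```python
from collections import OrderedDict


class DentryCache:
    def __init__(self, max_unused=2):
        self.table = {}            # hash table: path -> inode number, None = negative
        self.used = set()          # paths currently in use
        self.lru = OrderedDict()   # unused and negative paths, least recent first
        self.max_unused = max_unused

    def lookup(self, path, resolve):
        if path not in self.table:
            self.table[path] = resolve(path)    # slow path: ask the real FS
        self.lru.pop(path, None)
        if self.table[path] is None:
            self._park(path)                    # negative entries are never "used"
        else:
            self.used.add(path)
        return self.table[path]

    def release(self, path):
        """Mark a path as no longer in use; it stays cached until evicted."""
        if path in self.table:
            self.used.discard(path)
            self._park(path)

    def _park(self, path):
        self.lru[path] = True
        self.lru.move_to_end(path)
        while len(self.lru) > self.max_unused:
            victim, _ = self.lru.popitem(last=False)   # evict least recently used
            del self.table[victim]


calls = []


def resolve(path):
    calls.append(path)
    return {"/usr": 67, "/sbin": 34}.get(path)


cache = DentryCache(max_unused=2)
usr_first = cache.lookup("/usr", resolve)
usr_again = cache.lookup("/usr", resolve)      # served from the hash table
missing = cache.lookup("/nope", resolve)       # cached as a negative entry
missing_again = cache.lookup("/nope", resolve)
```

Caching the negative result means the second lookup of a nonexistent name does not reach the underlying file system.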

77 / 78

The File Object

▶ is the in-memory representation of an open file
▶ open() ⇒ create; close() ⇒ destroy
▶ there can be multiple file objects in existence for the same file
  - because multiple processes can open and manipulate a file at the same time
▶ struct file in <linux/fs.h>
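The "multiple file objects for the same file" point can be observed from user space. In the sketch below, the same file is opened twice within one process; each `os.open` call gets its own read position, which is the behavior one would expect if each open produces a separate file object in the kernel. The temporary file name is made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"abcdef")

    fd1 = os.open(path, os.O_RDONLY)
    fd2 = os.open(path, os.O_RDONLY)   # second open of the same file

    first = os.read(fd1, 3)    # advances only fd1's position
    second = os.read(fd2, 3)   # fd2 still starts at the beginning
    third = os.read(fd1, 3)    # fd1 continues where it left off

    os.close(fd1)
    os.close(fd2)
```

Both descriptors refer to the same inode, yet their positions move independently.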

78 / 78
