Disks

Disks. Secondary Storage Secondary Storage Typically – Storage systems outside of “primary memory” – Cannot access data using load/store instructions



Disks


Secondary Storage

• Secondary Storage Typically
  – Storage systems outside of “primary memory”
  – Cannot access data using load/store instructions
• Characteristics
  – Large: Terabytes
  – Cheap: a few cents / GB
  – Persistent: data survives power loss
  – Slow: 100us – 10ms to access
    • RAM: 100ns access


Disks and the OS

• Disks are messy, messy devices
  – Errors, bad blocks, etc…
• OS hides details from higher level software
  – Low level device drivers (block reads, etc)
  – Higher level abstractions (files, databases, etc)
• Different abstractions for different clients
  – Physical disk blocks (surface, cylinder, sector)
  – Logical disk blocks (disk block index #)
  – Logical files (filenames, byte index #)


I/O System

[Figure: layered I/O stack, top to bottom: user processes; OS (file system, I/O system, device driver); device controller; disk]


Disk HW Architecture


Disk Performance

• How long to read or write n sectors?
  – Positioning time + transfer time
  – Positioning time: seek time + rotational delay
  – Transfer time: n / (RPM × sectors per track), in minutes
    • The disk passes RPM × sectors-per-track sectors under the head each minute
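As a rough worked example of the formula above, the sketch below estimates the access time for n contiguous sectors on one track. The function name, the parameters, and the drive numbers are illustrative assumptions, not measurements of any real disk; the rotational delay is taken as half a rotation on average, and track switches are ignored.

```python
def access_time_ms(n_sectors, seek_ms, rpm, sectors_per_track):
    """Estimate the time to read n contiguous sectors, in milliseconds."""
    ms_per_rotation = 60_000 / rpm                  # one full rotation
    rotational_delay_ms = ms_per_rotation / 2       # average wait: half a rotation
    # Fraction of a rotation needed to pass n sectors under the head.
    transfer_ms = n_sectors / sectors_per_track * ms_per_rotation
    return seek_ms + rotational_delay_ms + transfer_ms


# Hypothetical drive: 7200 RPM, 4 ms average seek, 500 sectors per track.
one_sector = access_time_ms(1, seek_ms=4.0, rpm=7200, sectors_per_track=500)
full_track = access_time_ms(500, seek_ms=4.0, rpm=7200, sectors_per_track=500)
print(f"1 sector:    {one_sector:.2f} ms")
print(f"500 sectors: {full_track:.2f} ms")
```

With these assumed numbers, positioning time dominates a single-sector read, which is why large sequential transfers amortize the cost so much better than small ones.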


Disk Performance

• Performance depends on a number of steps
  – Seek: moving disk arm to correct cylinder
    • Depends on how fast disk arm can move
    • Seek times are not improving very quickly (why?)
  – Rotation (latency): wait for sector to rotate under read head
    • Depends on rotation rate of disk
    • Rates are increasing slowly
  – Transfer: Disk Surface -> Disk Controller -> Host Memory
• When the OS uses the disk, it tries to minimize the total cost of all steps
  – Particularly seeks and rotation


Interacting with Disks

• Old days
  – OS would specify cylinder, sector, surface, and transfer size
    • OS would need to know all disk parameters
• Modern disks are even more complicated
  – Not all sectors are the same size, sectors are remapped, etc…
  – Disks now provide higher level interface (SCSI, EIDE)
    • Exports data as logical array of blocks [0…N]
    • Disk controller maps logical blocks to cylinder/surface/sector
    • On board cache
    • Thus, physical parameters are hidden from OS
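To make the logical-to-physical mapping concrete, here is a minimal sketch of how a logical block number could be translated to (cylinder, surface, sector) under an idealized fixed geometry. The function name and the geometry values are assumptions for illustration; real controllers use zoned layouts and remap bad sectors, so their actual mapping is more involved and is not visible to the OS.

```python
def lba_to_chs(lba, surfaces, sectors_per_track):
    """Map a logical block number to (cylinder, surface, sector), all 0-based.

    Assumes every track holds the same number of sectors, which real
    disks do not guarantee.
    """
    blocks_per_cylinder = surfaces * sectors_per_track
    cylinder, remainder = divmod(lba, blocks_per_cylinder)
    surface, sector = divmod(remainder, sectors_per_track)
    return cylinder, surface, sector


# Hypothetical geometry: 4 surfaces, 100 sectors per track.
for block in (0, 399, 450):
    print(block, lba_to_chs(block, surfaces=4, sectors_per_track=100))
```

Consecutive logical blocks fill one track, then the other surfaces of the same cylinder, before moving the arm, which is one reason contiguous block ranges tend to be fast.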


Disk Controller

• Responsible for interface between OS and disk drive
  – Common interfaces: (S)ATA/IDE vs SCSI
    • (S)ATA/IDE used for personal storage
      – Slower rotation speeds, slower seek times
    • SCSI (SAS) used for enterprise systems
      – Faster rotation and seek times
• Basic Operations
  – Read Block <index> / Write Block <index>
• OS does not know of internal disk complexity
  – Disks export array of Logical Block Addresses (LBAs)
  – Disk maps internal structures to LBA numbers
• Implicit Contract:
  – Large sequential accesses to contiguous LBA regions achieve better performance than small transfers or small accesses


Reliability

• Disks fail more often
  – When continuously powered-on
  – With heavy workloads
  – Under high temperatures
• How do disks fail?
  – Whole disk can stop working (e.g. motor dies)
  – Transient problem (cable disconnected)
  – Individual sectors can fail (e.g. head crash or scratch)
    • Data can either be corrupted or (more likely) unreadable
• Disks can internally fix some errors
  – ECC (error correction code): Detect/correct bit errors
  – Retry sector reads/writes
  – Move sectors to other locations on disk
    • Maintain same LBA location


Buffering

• Disks contain internal memory (2MB – 16MB) used as cache
• Read-ahead
  – Read entire cylinder into memory during rotational delay
• Write caching to volatile memory
  – Returns from writes immediately
  – Data stored in disk’s cache
    • Can be lost on power failure
• Command Queuing
  – Keep multiple outstanding requests for disk
  – Disk can reorder (schedule) requests to improve performance


Disk Scheduling

• Goal: Minimize positioning time
  – Performed by both the OS and the disk
• FCFS: Schedule requests in order received
  – Advantage: fair
  – Disadvantage: high seek and rotation cost
• Shortest seek time first (SSTF)
  – Handle nearest cylinder next
  – Advantage: Reduces arm movement (seek time)
  – Disadvantage: Unfair, can starve requests
• Where have we seen this before?
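The trade-off between FCFS and SSTF can be seen by counting cylinders crossed. The sketch below is illustrative only: the function names, the starting head position, and the request queue are assumptions, and the cost model counts seek distance while ignoring rotation.

```python
def fcfs_order(head, requests):
    """FCFS: service requests exactly in arrival order."""
    return list(requests)


def sstf_order(head, requests):
    """SSTF: repeatedly pick the pending request nearest the current head."""
    pending = list(requests)
    order = []
    while pending:
        nearest = min(pending, key=lambda cyl: abs(cyl - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def total_seek(head, order):
    """Total number of cylinders the arm crosses when servicing `order`."""
    distance = 0
    for cyl in order:
        distance += abs(cyl - head)
        head = cyl
    return distance


# Hypothetical queue of cylinder numbers, head starting at cylinder 53.
queue = [98, 183, 37, 122, 14, 124, 65, 67]
print("FCFS:", total_seek(53, fcfs_order(53, queue)))
print("SSTF:", total_seek(53, sstf_order(53, queue)))
```

SSTF crosses far fewer cylinders here, but a steady stream of requests near the head could keep a distant request such as cylinder 183 waiting indefinitely.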


Disk Scheduling

• SCAN (elevator): Move from outer cylinder in, then reverse direction
  – Advantage: More fair to requests, minimizes seeks
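A minimal sketch of the elevator ordering follows. It only computes the order in which pending requests are serviced; a strict SCAN arm would also travel to the edge of the disk before reversing, which this sketch does not model. The function name, the `moving_up` flag, and the request queue are assumptions for illustration.

```python
def scan_order(head, requests, moving_up=True):
    """Service everything in the current direction, then reverse."""
    lower = sorted(cyl for cyl in requests if cyl < head)
    upper = sorted(cyl for cyl in requests if cyl >= head)
    if moving_up:
        # Sweep toward higher cylinders, then come back down.
        return upper + lower[::-1]
    # Sweep toward lower cylinders, then come back up.
    return lower[::-1] + upper


# Hypothetical queue of cylinder numbers, head starting at cylinder 53.
queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(scan_order(53, queue, moving_up=True))
print(scan_order(53, queue, moving_up=False))
```

Every pending request is reached within at most two sweeps, which is what keeps a request from starving the way it can under SSTF.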


SSDs: Solid State Storage

• Forget everything about rotating disks
  – Completely different
  – Greater complexity but higher performance
  – Less density
• A block of NVRAM (non-volatile) arranged in a grid
  – Single Level Cell (SLC): 1 bit per cell
    • Faster and more reliable
  – Multi Level Cell (MLC): 2 bits per cell
    • Slower and less reliable


Role of OS

• Standard Library
  – Common block interface across all storage devices
• Resource Coordination
  – Fairness and isolation
• Protection
  – Prevent applications from crashing system
• Provide file system semantics
  – Even higher level abstraction than block devices


File Systems

• In general file systems are simple
  – Abstraction for secondary storage
    • Files
  – Logical organization of files
    • Directories
  – Sharing of data between users/processes
    • Permissions/ACLs


Files

• Collection of data with some properties
  – Contents, size, owner, permissions
• Files may also have types
  – Understood by file system
    • Device, directory, symbolic link, …
  – Understood by other parts of OS/runtime
    • Executable, DLL, source code, object code, …
• Types can be encoded in name or contents
  – File extension: .exe, .txt, .jpg, .dll
  – Content: “#!<interpreter>”
• Operating system view
  – Bytes mapped to collection of blocks on persistent physical storage


Directories

• Provides:
  – Method for organizing files
  – Namespace that is accessible by both users and FS
  – Map from file name to file data
    • Actually maps name to meta-data, meta-data maps to data
• Directories contain files and other directories
  – /, /usr, /usr/local, …
• Most file systems support notion of a current directory
  – Absolute names: fully qualified starting from root of FS
    • /usr/local (absolute)
  – Relative names: specified with respect to current directory
    • ./local (relative)
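The difference between absolute and relative names can be sketched as a purely lexical resolution step. The helper name `resolve` and the example directories are assumptions; a real file system resolves names by walking directories and following links, not by string manipulation alone.

```python
import posixpath


def resolve(name, current_dir):
    """Turn a name into an absolute path, relative to current_dir if needed."""
    if posixpath.isabs(name):
        # Absolute names already start at the root of the file system.
        return posixpath.normpath(name)
    # Relative names are interpreted with respect to the current directory.
    return posixpath.normpath(posixpath.join(current_dir, name))


print(resolve("/usr/local", current_dir="/tmp"))
print(resolve("./local", current_dir="/usr"))
print(resolve("../bin", current_dir="/usr/local"))
```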


File Meta-Data

• Meta-Data: Additional information associated with a file
  – Name
  – Type
  – Location of data blocks on disk
  – Size
  – Times: Creation, access, modification
  – Ownership
  – Permissions (read/write)
• Stored on disk
  – How and where it’s stored depends on file system
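On a POSIX-style system, much of this meta-data can be inspected through `os.stat`. The sketch below is one way to look at it; the helper name `describe` and the file name `notes.txt` are hypothetical. Note that the name is not part of the result (it lives in the directory entry) and the location of data blocks is not exposed through this interface.

```python
import os
import stat
import tempfile


def describe(path):
    """Return a few meta-data fields the OS keeps for a file."""
    st = os.stat(path)
    return {
        "size": st.st_size,                           # bytes
        "is_directory": stat.S_ISDIR(st.st_mode),     # type
        "permissions": stat.filemode(st.st_mode),     # e.g. '-rw-r--r--'
        "owner_uid": st.st_uid,                       # ownership
        "modified": st.st_mtime,                      # times (seconds since epoch)
        "accessed": st.st_atime,
    }


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "notes.txt")
    with open(path, "w") as f:
        f.write("hello")
    info = describe(path)
    print(info["size"], info["permissions"], info["is_directory"])
```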


Directory entries

• A directory is a file that contains only meta-data
  – List of files with associated meta-data
  – Each file’s meta-data = attributes
    • Size, protection, location on disk, creation time, …
  – List is usually un-ordered (effectively random)
    • Running ‘ls’ sorts files in memory


Directory Trees

• Directory entries specify files…
• But a directory is a file
  – Special bit in meta-data stored in directory entry
• User programs can read directories
• But, only system programs can write directories
  – Why is this true?
• Special directories
  – This directory: ‘.’
  – Parent directory: ‘..’
  – Root: ‘/’
    • Fixed directory entry for its own meta-data


File Operations

• Create file with given pathname /a/b/file
  – Traverse pathname, allocate meta-data and directory entry
• Read-from/write-to offset in file
  – Find (or allocate) blocks on disk, update meta-data
• Delete
  – Remove directory entry, free disk blocks
• Rename
  – Change directory entry
• Copy
  – Allocate new directory entry, allocate new space on disk, copy
• Change permissions
  – Change directory entry, or ACL meta-data


Opening Files

• Files must be opened before accessing data
  – User must specify access mode: read and/or write
• OS tracks open files for each process
  – Table containing open file’s metadata
  – How do you know which table entry belongs to a file?
  – File descriptor = index into open file table
  – Table contains meta-data + current position and access mode
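The role of the file descriptor and the per-open current position can be observed with Python's thin wrappers over the POSIX calls. This is a small sketch, not a description of any particular kernel's table layout; the helper name `read_twice` and the file name are hypothetical.

```python
import os
import tempfile


def read_twice(path):
    """Open a file, read 5 bytes twice, and report the position in between."""
    fd = os.open(path, os.O_RDONLY)          # access mode is fixed at open time
    try:
        first = os.read(fd, 5)
        # Seeking 0 bytes from the current position just reports where we are.
        position = os.lseek(fd, 0, os.SEEK_CUR)
        second = os.read(fd, 5)               # continues from the saved position
        return fd, first, position, second
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "greeting.txt")
    with open(path, "wb") as f:
        f.write(b"hello world")
    fd, first, position, second = read_twice(path)
    print(fd, first, position, second)
```

The descriptor is just a small integer; the position and access mode live in the OS's table entry, which is why the second read picks up where the first one stopped.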


Basic operations

UNIX
• create(name)
• open(name, mode)
• read(fd, buf, len)
• write(fd, buf, len)
• sync(fd)
• seek(fd, pos)
• close(fd)
• unlink(name)
• rename(old, new)

Win32
• CreateFile(name, CREATE)
• CreateFile(name, OPEN)
• ReadFile(handle, …)
• WriteFile(handle, …)
• FlushFileBuffers(handle, …)
• SetFilePointer(handle, …)
• CloseHandle(handle, …)
• DeleteFile(name)
• CopyFile(name, …)
• MoveFile(name, …)


Path name translation

• Example: Open “/one/two/three”
  – int fd = open(“/one/two/three”, O_RDWR);
• What happens in FS?
  – Open directory “/” at known FS location
  – Search for directory entry of “one”
  – Get location of “/one” from directory entry
  – Open “one”, search for “two”, get location of “two”
  – Open “two”, search for “three”, get location of “three”
  – Open file “three”
  – (Of course check permissions at each step)
• FS spends a lot of time walking through directories
  – This is why open is separated from read/write
  – OS can cache directory entries for faster lookups
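The walk described above can be sketched with a toy in-memory file system in which each directory is a dictionary from names to entries. Everything here is an illustrative assumption (the `lookup` function, the dictionary layout, the sample tree), and permission checks are left out.

```python
def lookup(root, path):
    """Walk an absolute path one component at a time.

    Returns (entry, directories_searched) so the cost of the walk is visible.
    """
    node = root
    searched = 0
    for name in [part for part in path.split("/") if part]:
        if not isinstance(node, dict):
            raise NotADirectoryError(name)   # tried to search inside a file
        searched += 1                        # one directory opened and searched
        if name not in node:
            raise FileNotFoundError(name)
        node = node[name]                    # "get location" of the next entry
    return node, searched


# Toy tree: "/" contains "one", which contains "two", which contains file "three".
fs = {"one": {"two": {"three": "file contents"}}}
print(lookup(fs, "/one/two/three"))
```

Each path component costs one directory search, so doing the walk once at open time and handing back a descriptor avoids repeating it on every read or write.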


Acyclic-graph Directories

• More general than a tree structure
  – Add connections across the tree (no cycles)
  – Create links from one file (or directory) to another
• Hard link: “ln a b” (“a” must exist)
  – Idea: Can use name “a” or “b” to access same file data
  – Implementation: Multiple directory entries point to same meta-data
  – What happens when you remove “a”? (Does “b” still exist?)
  – UNIX: No hard links to directories. Why?
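One way to explore the hard-link behavior is to try it on a POSIX-style file system. The sketch below uses `os.link`, the call behind `ln a b`; the helper name `hard_link_demo` is hypothetical, and the result assumes a file system that supports hard links.

```python
import os
import tempfile


def hard_link_demo(directory):
    """Create a, hard-link b to it, remove a, and report what is left."""
    a = os.path.join(directory, "a")
    b = os.path.join(directory, "b")
    with open(a, "w") as f:
        f.write("shared data")
    os.link(a, b)                                   # second directory entry
    same_metadata = os.stat(a).st_ino == os.stat(b).st_ino
    links_before = os.stat(b).st_nlink              # entries pointing at the meta-data
    os.remove(a)                                    # removes only the entry "a"
    with open(b) as f:
        data_after = f.read()
    links_after = os.stat(b).st_nlink
    return same_metadata, links_before, links_after, data_after


with tempfile.TemporaryDirectory() as directory:
    print(hard_link_demo(directory))
```

Both names refer to the same meta-data, and the link count is what lets the file system decide when the data blocks can actually be freed.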


Acyclic-graph Directories

• Symbolic (soft) link: “ln -s a b”
  – Can use name “a” or “b” to get same file data
    • But only if “a” exists
  – When referencing “b”, look up pathname of “a”
  – “b” is a special file (designated via bit in meta-data)
    • Contents of “b” are simply the pathname of “a”
    • Since the data is small, many FS implementations just embed the pathname of “a” in the directory entry for “b”
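The same kind of experiment works for symbolic links, using `os.symlink` (the call behind `ln -s a b`) and `os.readlink`. The helper name `symlink_demo` is hypothetical, and the sketch assumes a platform where unprivileged symlink creation is allowed.

```python
import os
import tempfile


def symlink_demo(directory):
    """Create a, symlink b to it, remove a, and report what b looks like."""
    a = os.path.join(directory, "a")
    b = os.path.join(directory, "b")
    with open(a, "w") as f:
        f.write("target data")
    os.symlink("a", b)                    # b stores the path name "a", nothing else
    stored_path = os.readlink(b)          # read the link itself, not its target
    with open(b) as f:                    # opening b looks up "a" and opens that
        via_link = f.read()
    os.remove(a)
    # b still exists as a link, but the name it stores no longer resolves.
    dangling = os.path.islink(b) and not os.path.exists(b)
    return stored_path, via_link, dangling


with tempfile.TemporaryDirectory() as directory:
    print(symlink_demo(directory))
```

Unlike a hard link, the symbolic link holds only a path name, so it cannot keep the target's data alive once “a” is removed.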