33
1 Data Storage Memory Hierarchy Disks Source: our textbook

1 Data Storage Memory Hierarchy Disks Source: our textbook

Embed Size (px)

Citation preview

Page 1: 1 Data Storage Memory Hierarchy Disks Source: our textbook

1

Data Storage

Memory HierarchyDisks

Source: our textbook

Page 2: 1 Data Storage Memory Hierarchy Disks Source: our textbook

2

Strawman Implementation

Use UNIX file system to store relations, e.g. Students(name, id, dept) in file

/usr/db/Students

One line per tuple, each component stored as character string, with # as a separator, e.g. tuple could be: Smith#123#CS

Store schema in /usr/db/schema, e.g.: Students#name#STR#id#INT#dept#STR

Page 3: 1 Data Storage Memory Hierarchy Disks Source: our textbook

3

Strawman cont'd

To execute SELECT * FROM R WHERE <Condition>:

1. read schema file for R2. check that <Condition> is valid for R3. print out column headers4. read file R and for each line

check the condition print the line if true

Page 4: 1 Data Storage Memory Hierarchy Disks Source: our textbook

4

Strawman cont'd

To execute a query involving a join of two relations R and S:

for each tuple (line) in R do for each tuple (line) in S do if the condition is satisfied then display the desired attributes

Page 5: 1 Data Storage Memory Hierarchy Disks Source: our textbook

5

What's Wrong?

The storage of the tuples on disk is inflexible: if a student changes major from EE to ECON, entire file must be rewritten

Search is very expensive (read entire relation)

Query processing is "brute force" -- there are faster ways to do joins, etc.

Data is not buffered between disk and main memory

No concurrency control No reliability in case of a crash

Page 6: 1 Data Storage Memory Hierarchy Disks Source: our textbook

6

How to Fix these Problems

Take advantage of the characteristics of computer hardware with clever algorithms to do things better

We will cover data storage (predominantly disks) how to represent data elements indexes query optimization failure recovery concurrency control

Page 7: 1 Data Storage Memory Hierarchy Disks Source: our textbook

7

Memory Hierarchy cache

main memory

secondary storage (disk)

tertiary storage (tapes, CD-ROM)

faster,smaller,more expensive

slower,larger, cheaper

Page 8: 1 Data Storage Memory Hierarchy Disks Source: our textbook

8

Cache Memory

Transfer a few bytes at a time between cache and main memory: instruction, integer, floating point, short string

Processor operates on instruction and data in the cache

Typical size: 1 Mbyte (220 bytes) Typical speed to/from main memory:

100 nanosec (1 nanosec = 10-9 sec)

Page 9: 1 Data Storage Memory Hierarchy Disks Source: our textbook

9

Main Memory

Typical size: 100 Mbytes to 10 Gbytes (1 Gbyte = 230 bytes)

Typical access speed (to read or write): 10 to 100 nanosec

At least 100 times larger than cache

At least 10 times slower than cache

Page 10: 1 Data Storage Memory Hierarchy Disks Source: our textbook

10

Secondary Storage Usually disk Divided logically into blocks, unit of transfer

between main memory (called disk I/O) Typical size: 100 Gbytes Typical speed: 10 millisec (10-3 sec) At least 100 times larger than main memory Much slower than main memory and much

much slower than cache: can execute several million instructions during one disk I/O

Page 11: 1 Data Storage Memory Hierarchy Disks Source: our textbook

11

Tertiary Storage

Tape(s) CD-ROM(s) At least 1000 times slower than

secondary storage At least 1000 times larger than

secondary storage

Page 12: 1 Data Storage Memory Hierarchy Disks Source: our textbook

12

Volatile vs. Nonvolatile

Storage is volatile if the data is lost when the power is gone

Usually main memory is volatile Usually secondary and tertiary

storage is nonvolatile Thus every change made to a

database in main memory must be backed up on disk before it can be permanent.

Page 13: 1 Data Storage Memory Hierarchy Disks Source: our textbook

13

Disks

spindledisk heads

platters: each has two surfaces,each surface consists oftracks (concentric rings)

one head per surface,very close to surface,does the reading andwriting

Page 14: 1 Data Storage Memory Hierarchy Disks Source: our textbook

14

More on Disks

orange ring is a track

black squares are gaps,which don't hold data

part of track betweentwo gaps is a sector

one or more sectorsmake a block

Page 15: 1 Data Storage Memory Hierarchy Disks Source: our textbook

15

Disk Controller

controls mechanical actuator that moves the heads in and out (radius, distance from spindle) one track from each surface at the same

radius forms a cylinder

selects a surface selects a sector (senses when that sector

is under the corresponding head) transfers bits

Page 16: 1 Data Storage Memory Hierarchy Disks Source: our textbook

16

Typical Values

Rotation speed: 5400 rmp

Number of platters: 5

Number of tracks/surface: 20,000

Number of sectors/track: 500

Number of bytes/sector: thousands

Page 17: 1 Data Storage Memory Hierarchy Disks Source: our textbook

17

Disk Latency for a Read Time between issuing command to read a

block and when contents of block appear in main memory: time for processor and disk controller to process

request, including resolving any contention (negligible)

seek time: time to move heads to correct radius (0 to 40 millisec)

rotational latency: time until first sector of block is under the head (5 millisec)

transfer time: until all sectors of the block have passed under the head; depends on rotation speed and size of block

Page 18: 1 Data Storage Memory Hierarchy Disks Source: our textbook

18

Disk Latency for Updates

For a write: like reading plus verification (read back and compare)

To modify a block: read it into main memory change it in main memory write it back to disk

Page 19: 1 Data Storage Memory Hierarchy Disks Source: our textbook

19

Moral of the Story

Disks accesses are orders of magnitude slower than accesses to main memory.

They are unavoidable in large databases.

Thus do everything possible to minimize them.

Can lead to different algorithms than for main memory model.

Page 20: 1 Data Storage Memory Hierarchy Disks Source: our textbook

20

Speeding Up Disk Accesses

1. Place blocks accessed together on same cylinder reduces seek time and rotational latency

2. Divide data among several disks head assemblies can move in parallel

3. Mirror a disk: make copies of it speeds up reads: get data from disk

whose head is closest to desired block no effect on writes: write to all copies also helps with fault tolerance

Page 21: 1 Data Storage Memory Hierarchy Disks Source: our textbook

21

Speeding up Disk Accesses

4. Be clever about order in which read and write requests are serviced, i.e., algorithm in OS or DBMS or disk controller

Ex: elevator algorithm

5. Prefetch blocks to main memory in anticipation of future use (buffering)

Page 22: 1 Data Storage Memory Hierarchy Disks Source: our textbook

22

Elevator Algorithm

Works well when there are many "independent" read and write requests, i.e., don't need to be done in a particular order, that are randomly distributed over the disk.

Disk head assembly sweeps in and out repeatedly

When heads pass a cylinder with pending requests, they stop to do the request

When reaching a point with no pending requests ahead, change direction

Page 23: 1 Data Storage Memory Hierarchy Disks Source: our textbook

23

Prefetching

Suppose you can predict order in which blocks will be requested from disk.

Load them into main memory buffers before they are needed.

Have flexibility to schedule the reads efficiently

Can also delay writing buffered blocks if the buffers are not needed immediately

Page 24: 1 Data Storage Memory Hierarchy Disks Source: our textbook

24

Disk Failures

Intermittent failure: attempt to read or write a sector fails but a subsequent try succeeds

Impossible to read sector Impossible to write a sector Disk crash: entire disk becomes

unreadable

Page 25: 1 Data Storage Memory Hierarchy Disks Source: our textbook

25

Coping with Intermittent Failures

Use redundant bits in each sector Store checksums in the redundant

bits After a read, check if checksums are

correct; if not then try again After a write, can do a read and

compare with value written, or be optimistic and just check the checksum of the read

Page 26: 1 Data Storage Memory Hierarchy Disks Source: our textbook

26

Checksums Suppose we use one extra bit, a parity bit.

if the number of 1's in the data bits is odd, then set the parity bit to 1, otherwise to 0

This is not foolproof: 101 and 110 both have even parity so checksum would be 0 for both

Use n parity bits in the checksum: parity bit 1 stores parity of every n-th bit, starting

with first bit, parity bit 2 stores parity of every n-th bit, starting

with second bit, etc. Probability of missing an error is 1/2n

Page 27: 1 Data Storage Memory Hierarchy Disks Source: our textbook

27

Coping with Permanent Read/Write Errors

Stable storage policy: Each "virtual" sector X is represented by

two real sectors, XL and XR. To write value v to X:

repeat {write v to XL, read from XL } until read's checksum is correct or exceed max # of tries

do the same thing with XR

if XL or XR is discovered to be bad, then must find a substitute

Page 28: 1 Data Storage Memory Hierarchy Disks Source: our textbook

28

Handling Write Failures

Suppose write(s) to XL all fail. Then old value is safe in XR.

Suppose write(s) to XR all fail. Then new value is safe in XL.

Assumption is that it is highly unlikely for two sectors to fail around the same time.

Page 29: 1 Data Storage Memory Hierarchy Disks Source: our textbook

29

More on Stable Storage

To read from X: repeatedly read XL until checksum is

good or exceed max # tries if read of XL failed then repeatedly

read XR until checksum is good or exceed max # tries

Handles permanent read failures, unless both XL and XR fail about the same time (unlikely)

Page 30: 1 Data Storage Memory Hierarchy Disks Source: our textbook

30

Coping with Disk Crashes

"Mean time to failure" of a disk is length of time by which 50% of such disks will have had a head crash

Goal is to have a much longer "mean time to data loss" for your system

Key idea: use redundancy Discuss three such approaches next…

Page 31: 1 Data Storage Memory Hierarchy Disks Source: our textbook

31

Mirroring (RAID Level 1) Keep another copy of each disk: write

to both, read from one. Only way data can be lost is if second

disk crashes while first is being repaired.

If mean time to crash of a single disk is 10 years and it takes 3 hours to repair a disk, then mean time to data loss is 146,000 years.

Page 32: 1 Data Storage Memory Hierarchy Disks Source: our textbook

32

Parity Blocks (RAID Level 4)

Drawback of previous scheme is that you need double the number of disks.

Instead use one spare disk no matter how many data disks you have.

Block i of the spare disk contains the parity checks for block i of all the data disks.

If spare disk fails, get a new spare. If a data disk fails, recompute its data from

the other data disks and the spare.

Page 33: 1 Data Storage Memory Hierarchy Disks Source: our textbook

33

RAID Level 5 Drawback of previous scheme is that

spare disk is a bottleneck. Instead, let each data disk also serve

as the spare disk for some blocks.

All these assume only one crash at a time. RAID Level 6 uses error-correcting codes to be able to handle multiple crashes.