CS4432: Database Systems II Data Storage (Sections 11.2, 11.3, 11.4, 11.5)

Data Storage: Overview

• How does a DBMS store and manage large amounts of data?
  – (today, tomorrow)

• What representations and data structures best support efficient manipulation of this data?
  – (next week)

The Memory Hierarchy

Levels, from fastest to slowest: Cache (all levels), Main Memory, Secondary Storage, Tertiary Storage.

• Cache (all levels)
  – Avg. Size: 256 KB – 1 MB
  – Read/Write Time: $10^{-8}$ seconds
  – Random Access
  – Smallest of all memory, and also the most costly.
  – Usually on same chip as processor.
  – Easy to manage in single-processor environments, more complicated in multiprocessor systems.

• Main Memory
  – Avg. Size: 128 MB – 1 GB
  – Read/Write Time: $10^{-7}$ to $10^{-8}$ seconds
  – Random Access
  – Becoming more affordable.
  – Volatile

• Secondary Storage
  – Avg. Size: 30 GB – 160 GB
  – Read/Write Time: $10^{-2}$ seconds
  – NOT Random Access
  – Extremely Affordable: $0.68/GB!!!
  – Can be used for file system, virtual memory, or raw data access.
  – Blocking (need buffering)

• Tertiary Storage
  – Avg. Size: Gigabytes – Terabytes
  – Read/Write Time: $10^{1}$ – $10^{2}$ seconds
  – NOT Random Access, or even remotely close
  – Extremely Affordable: pennies/GB!!!
  – Not efficient for any real-time database purposes; could be used in an offline processing environment.

Memory Hierarchy Summary

[Figure: typical capacity (bytes, $10^{3}$ to $10^{15}$) plotted against access time (sec, $10^{-9}$ to $10^{3}$) for cache, electronic main, electronic secondary, magnetic optical disks, online tape, nearline tape & optical disks, and offline tape.]

Memory Hierarchy Summary

[Figure: cost (dollars/MB, $10^{-4}$ to $10^{4}$) plotted against access time (sec, $10^{-9}$ to $10^{3}$) for cache, electronic main, electronic secondary, magnetic optical disks, online tape, nearline tape & optical disks, and offline tape.]

Motivation

Consider the following algorithm:

For each tuple r in relation R {
    Read the tuple r
    For each tuple s in relation S {
        read the tuple s
        append the entire tuple s to r
    }
}

What is the time complexity of this algorithm?

Motivation

• Complexity:
  – This algorithm is $O(n^2)$! Is it always?
  – Yes, if we assume random access of data.

• Hard disks are NOT random access!
• Unless the data is organized efficiently, this algorithm may perform much worse than $O(n^2)$ suggests.
• We need to know how a hard disk operates to understand how to efficiently store information and optimize storage.
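
As a rough illustration, the nested loop can be turned into a read counter. This is a minimal sketch: the function name `nested_loop_reads` and the relation sizes are assumptions made for the example, and the per-read costs are only the order-of-magnitude figures quoted for main memory and secondary storage in the memory hierarchy.

```python
def nested_loop_reads(num_r: int, num_s: int) -> int:
    """Count the tuple reads performed by the nested-loop algorithm."""
    reads = 0
    for _ in range(num_r):        # for each tuple r in R
        reads += 1                # read the tuple r
        for _ in range(num_s):    # for each tuple s in S
            reads += 1            # read the tuple s
    return reads


reads = nested_loop_reads(1000, 1000)
print(reads)  # 1001000
print(reads * 1e-7, "seconds if every read costs one main-memory access")
print(reads * 1e-2, "seconds if every read costs one disk access")
```

The read count is the same in both cases; only the cost per read changes, which is why the physical organization of the data matters.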

Disk Mechanics

• Many DB-related issues involve hard disk I/O!
• Thus we will now study how a hard disk works.

[Figure: disk assembly, labeled: Disk Head, Platter, Cylinder.]

[Figure: a platter surface, labeled: Track, Sector, Gap.]

[Figure: processor (P) and memory (M) connected to a disk controller (DC) and its disks.]

Disk Controller

• Disk Controller is a processor capable of:
  – Controlling the motion of disk heads
  – Selecting surface from which to read/write
  – Transferring data to/from memory

More Disk Terminology

• Rotation Speed:
  – The speed at which the disk rotates: 5400 RPM = one rotation every 11 ms.
• Number of Tracks:
  – Typically 10,000 to 15,000.
• Bytes per track:
  – ~$10^{5}$ bytes per track

How big is the disk if:

• There are 4 platters
• There are 8192 tracks per surface
• There are 256 sectors per track
• There are 512 bytes per sector

Size = 2 surfaces/platter * num of platters * tracks per surface * sectors per track * bytes per sector

Size = 2 * 4 platters * 8192 tracks/surface * 256 sectors/track * 512 bytes/sector

Size = $2^{33}$ bytes / (1024 bytes/KB) / (1024 KB/MB) / (1024 MB/GB)

Size = $2^{33}$ bytes = $2^{3} \cdot 2^{30}$ bytes = 8 GB

Remember 1 KB = 1024 bytes, not 1000!
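
The capacity calculation can be checked with a few lines of code. This is a minimal sketch; the function name `disk_capacity_bytes` and its parameter names are assumptions chosen for the example, while the numbers are those of the disk described here.

```python
def disk_capacity_bytes(platters, tracks_per_surface, sectors_per_track,
                        bytes_per_sector, surfaces_per_platter=2):
    """Multiply out the disk geometry to get the raw capacity in bytes."""
    return (surfaces_per_platter * platters * tracks_per_surface
            * sectors_per_track * bytes_per_sector)


size = disk_capacity_bytes(platters=4, tracks_per_surface=8192,
                           sectors_per_track=256, bytes_per_sector=512)
print(size)            # 8589934592, i.e. 2**33 bytes
print(size // 2**30)   # 8 (GB, using 1 GB = 2**30 bytes)
```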

What about access time?

[Figure: a request "I want block X" and, some unknown time later, block X in memory.]

Time = Disk Controller Processing Time + Disk Latency + Transfer Time

Access time, Graphically

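[Figure: processor (P), memory (M), and disk controller (DC) with its disks, annotated with the three components of access time: Disk Controller Processing Time, Disk Latency, Transfer Time.]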

Disk Controller Processing Time

Time = Disk Controller Processing Time + Disk Latency + Transfer Time

• CPU request -> disk controller
  – nanoseconds
• Disk controller contention
  – microseconds
• Bus
  – microseconds
• Typically a few microseconds, so this is negligible for our purposes.

Transfer Time

Time = Disk Controller Processing Time + Disk Latency + Transfer Time

• Typically 10 MB/sec
• Or: a 4096-byte block takes ~0.5 ms

Disk Delay

Time = Disk Controller Processing Time + Disk Latency + Transfer Time

Disk Latency (the disk delay) is more complicated:

Disk Delay = Seek Time + Rotational Latency

Seek Time

• Seek time is the most critical component of disk delay.
• Average seek times:
  – Maxtor 40 GB (IDE): ~10 ms
  – Western Digital 20 GB (IDE): ~9 ms
  – Seagate 70 GB (SCSI): ~3.6 ms
  – Maxtor 60 GB (SATA): ~9 ms

Rotational Latency

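[Figure: a track with the head at one position ("Head Here") and the requested block at another ("Block I Want"); the rotational latency is the time for the block to rotate under the head.]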

Average Rotational Latency

• Average latency is about half of the time it takes to make one revolution.

• 3600 RPM = 8.33 ms
• 5400 RPM = 5.56 ms
• 7200 RPM = 4.17 ms
• 10,000 RPM = 3.0 ms (newer drives)
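
These averages follow directly from the rotation speed. The sketch below recomputes them; the helper name `avg_rotational_latency_ms` is an assumption made for this example.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one full revolution, in ms."""
    ms_per_rotation = 60_000 / rpm   # 60,000 ms per minute
    return ms_per_rotation / 2


for rpm in (3600, 5400, 7200, 10_000):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```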

Example Disk Latency Problem

• Calculate the minimum, maximum, and average disk latencies for reading a 4096-byte block on the same hard drive as before:
  – 4 platters
  – 8192 tracks
  – 256 sectors/track
  – 512 bytes/sector
  – Disk rotates at 3840 RPM
  – Seek time: 1 ms to move between cylinders, + 1 ms for every 500 cylinders traveled.
  – Gaps consume 10% of each track

• A 4096-byte block is 8 sectors.
• The disk makes one revolution in 1/64 of a second, so 1 rotation takes 15.6 ms.
• Moving one track takes 1.002 ms. Moving across all tracks takes 17.4 ms.

Solution: Minimum Latency

• Assume best case:
  – head is already on the block we want!
• In that case, it is just the read time of the 8 sectors of the 4096-byte block. We will pass over 8 sectors and 7 gaps.
• Remember: 10% are gaps and 90% are information, or 36° are gaps and 324° are information.

36 × (7/256) + 324 × (8/256) = 11.109 degrees

11.109 / 360 = 0.03086 rot (3.09% of the rotation)

0.03086 rot / 64 rot/sec = 0.482 ms ≈ 0.5 ms

Solution: Maximum Latency

• Now assume worst case:
  – The disk head is over the innermost cylinder and the block we want is on the outermost cylinder,
  – the block we want has just passed under the head, so we have to wait a full rotation.

Time = Time to move from innermost track to outermost track + Time for one full rotation + Time to read 8 sectors
     = 17.4 ms (seek time) + 15.6 ms (one rotation) + 0.5 ms (from minimum latency calculation)
     = 33.5 ms!!

Solution: Average Latency

• Now assume average case:
  – It will take an average amount of time to seek, and
  – the block we want is ½ of a revolution away from the heads.

Time = Time to move over tracks + Time for one-half of a rotation + Time to read 8 sectors
     = 6.5 ms (see Calculating Average Seek Time) + 7.8 ms (0.5 rotation) + 0.5 ms (from minimum latency calculation)
     = 14.8 ms

Solution: Calculating Average Seek Time

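[Figure: average number of cylinders travelled (axis from 0 to 4500) as a function of the initial head position.]

• The graph indicates the average travel distance as a function of the initial head position.
• Integrating over this graph gives about 1/3 of the way across the disk on average: 8192 / 3 ≈ 2730 cylinders.
• Average seek time = 1 + 2730/500 ≈ 6.5 ms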
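
The minimum, maximum, and average latencies of this example can be recomputed in a few lines. This is a sketch under the stated parameters of the example disk (3840 RPM, 8192 tracks, 256 sectors per track, 10% gaps, seek time of 1 ms plus 1 ms per 500 cylinders); the constant and function names are assumptions made for illustration.

```python
ROTATION_MS = 60_000 / 3840      # one rotation: 15.625 ms
TRACKS = 8192
SECTORS_PER_TRACK = 256
GAP_FRACTION = 0.10              # gaps consume 10% of each track


def seek_ms(cylinders: float) -> float:
    """1 ms to move at all, plus 1 ms per 500 cylinders traveled."""
    return 0.0 if cylinders == 0 else 1 + cylinders / 500


def read_ms(sectors: int) -> float:
    """Time to pass over `sectors` sectors and the gaps between them."""
    data = (1 - GAP_FRACTION) * sectors / SECTORS_PER_TRACK
    gaps = GAP_FRACTION * (sectors - 1) / SECTORS_PER_TRACK
    return (data + gaps) * ROTATION_MS


block = read_ms(8)                                   # 4096-byte block
minimum = block                                      # head already on block
maximum = seek_ms(TRACKS) + ROTATION_MS + block      # full seek + full turn
average = seek_ms(TRACKS / 3) + ROTATION_MS / 2 + block

print(f"min {minimum:.3f} ms, max {maximum:.1f} ms, avg {average:.1f} ms")
```

The results agree with the hand calculation once rounded: about 0.5 ms, 33.5 ms, and 14.8 ms.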

Writing Blocks

• Basically same as reading!
• Phew!

Verifying a write

• Verify: same as reading/writing,
  – plus one additional revolution to come back to the block and verify.

• So for our earlier example, to verify each case:
  – MIN: 0.5 ms + 15.6 ms + 0.5 ms = 16.6 ms
  – MAX: 33.5 ms + 15.6 ms + 0.5 ms = 49.6 ms
  – AVG: 14.8 ms + 15.6 ms + 0.5 ms = 30.9 ms

After seeing all of this …

• Which will be faster: sequential I/O or random I/O?

• What are some ways we can improve I/O times without changing the disk features?

Next …

• Disk Optimizations

One Simple Idea : Prefetching

Problem:

• Have a file
  » Sequence of blocks B1, B2
• Have a program
  » Process B1
  » Process B2
  » Process B3

...

Single Buffer Solution

(1) Read B1 -> Buffer
(2) Process Data in Buffer
(3) Read B2 -> Buffer
(4) Process Data in Buffer
...

Say
P = time to process/block
R = time to read in 1 block
n = # blocks

Single buffer time = n(P+R)

Question: Could the DBMS know something about the behavior of such future block accesses?

What if: If we knew more about the sequence of future block accesses, what could we do better, and how?

Idea : Double Buffering/Prefetching

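Memory: two buffers
Disk: blocks A B C D E F G

[Figure: block A is processed while block B is being read into the second buffer; when A is done, B is processed while C is read into the buffer that held A.]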

Say P ≥ R

P = Processing time/block
R = I/O time/block
n = # blocks

What is processing time now?

• Double buffering time = ?

Say P ≥ R

P = Processing time/block
R = I/O time/block
n = # blocks

• Double buffering time = R + nP

• Single buffering time = n(R+P)
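
The two formulas can be compared with a small timeline simulation. This is a minimal sketch, assuming exactly two buffers, one read in flight at a time, and constant P and R per block; the function names are hypothetical. Under those assumptions the simulated double-buffering total equals R + nP whenever P ≥ R.

```python
def single_buffer_time(n, p, r):
    """Read a block, then process it, strictly one after the other."""
    return n * (p + r)


def double_buffer_time(n, p, r):
    """Simulate reading block i+1 while block i is being processed."""
    read_done = [0] * n
    proc_done = [0] * n
    for i in range(n):
        # A read starts after the previous read has finished ...
        start_read = read_done[i - 1] if i >= 1 else 0
        # ... and after the buffer it reuses (block i-2) has been processed.
        if i >= 2:
            start_read = max(start_read, proc_done[i - 2])
        read_done[i] = start_read + r
        # Processing needs the block in memory and the processor free.
        prev_proc = proc_done[i - 1] if i >= 1 else 0
        proc_done[i] = max(read_done[i], prev_proc) + p
    return proc_done[-1] if n else 0


print(single_buffer_time(3, 2, 1), double_buffer_time(3, 2, 1))  # 9 7
```

With n = 3, P = 2, R = 1 the single-buffer total is 3(2 + 1) = 9, while double buffering gives 1 + 3·2 = 7: only the first read is not overlapped with processing.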

Block Size Selection?

• Question : Do we want Small or Big Block Sizes ?

• Pros?
• Cons?

Block Size Selection?

• Big Block -> Amortize I/O Cost
  – Seek and rotational delays are paid once per block, so their cost per byte is reduced …

Unfortunately...

• Big Block -> Read in more useless stuff!
  – and takes longer to read

Using secondary storage effectively

• Example: Sorting data on disk
• General Wisdom:
  – I/O costs dominate
  – Design algorithms to reduce I/O

Disk I/O Model of Computation: Efficient Use of Disk

Example: Sort Task

“Good” DBMS Algorithms

• Try to make sure that if we read a block, we use much of the data on that block

• Try to put blocks together that are accessed together

• Try to buffer commonly used blocks in main memory

Why Sort Example ?

• A classic problem in computer science!
• Data requested in sorted order
  – e.g., find students in increasing gpa order
• Sorting is the first step in bulk loading a B+ tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records (Why?)
• Sort-merge join algorithm involves sorting.
• Problem: sort 1 GB of data with 1 MB of RAM.
  – why not virtual memory?

Sorting Algorithms

• Any example algorithms you know?
• Typically they are main-memory oriented
• They don’t look too good when you take disk I/Os into account (why?)

Merge Sort

• Merge: Merge two sorted lists by repeatedly choosing the smaller of the two “heads” of the lists

• Merge Sort: Divide the records into two parts; merge-sort those recursively, and then merge the resulting lists.
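
As a reminder of the main-memory version, here is a minimal sketch of the two definitions. The function names `merge` and `merge_sort` are chosen for this example, and records are represented as plain Python values compared with `<=`.

```python
def merge(left, right):
    """Merge two sorted lists by repeatedly taking the smaller head."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])    # at most one of these two is non-empty
    out.extend(right[j:])
    return out


def merge_sort(records):
    """Split in two, sort each half recursively, then merge the halves."""
    if len(records) <= 1:
        return list(records)
    mid = len(records) // 2
    return merge(merge_sort(records[:mid]), merge_sort(records[mid:]))


print(merge([2, 3, 4, 6], [4, 7, 8, 9]))  # [2, 3, 4, 4, 6, 7, 8, 9]
```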

2-Way Sort: Requires 3 Buffers

• Pass 0: Read a page, sort it, write it.
  – only one buffer page is used

• Pass 1, 2, 3, …, etc.:
  – three buffer pages used.

[Figure: Disk -> main memory buffers (INPUT 1, INPUT 2, OUTPUT) -> Disk]

Two-Way External Merge Sort

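• Idea: Divide and conquer: sort subfiles and merge

[Figure: sorting a 7-page input file, two records per page; "|" separates runs]

Input file:              3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
PASS 0 -> 1-page runs:   3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
PASS 1 -> 2-page runs:   2,3 4,6 | 4,7 8,9 | 1,3 5,6 | 2
PASS 2 -> 4-page runs:   2,3 4,4 6,7 8,9 | 1,2 3,5 6
PASS 3 -> 8-page runs:   1,2 2,3 3,4 4,5 6,6 7,8 9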

Two-Way External Merge Sort

• Costs for each pass?

• How many passes do we need ?

• What is the total cost for sorting?

Two-Way External Merge Sort

• Each pass we read + write each page in file.
  – Cost per pass = 2 * N

• N pages in file => number of passes: $\lceil \log_2 N \rceil + 1$

• So total cost is: $2N \left( \lceil \log_2 N \rceil + 1 \right)$
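
The pass count and total cost can be computed by following the run length as it doubles. This is a minimal sketch; `two_way_sort_cost` is a name chosen for this example, and costs are counted in page I/Os.

```python
def two_way_sort_cost(n_pages: int):
    """Return (number of passes, total page I/Os) for a file of n_pages."""
    passes = 1            # pass 0 produces sorted 1-page runs
    run_length = 1
    while run_length < n_pages:
        run_length *= 2   # each merge pass doubles the run length
        passes += 1
    return passes, 2 * n_pages * passes   # every pass reads + writes N pages


print(two_way_sort_cost(7))   # (4, 56)
```

For the 7-page file this gives 4 passes (PASS 0 through PASS 3) and 2 · 7 · 4 = 56 page I/Os.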

General External Merge Sort

• What if we had more buffer pages?
• How do we utilize them?

General External Merge Sort

[Figure: Disk -> B main memory buffers (INPUT ?, INPUT ?, …, INPUT ?, OUTPUT ?) -> Disk]

To sort a file with N pages using B buffer pages?

General External Merge Sort

• To sort a file with N pages using B buffer pages
• Phase 1 (pass 0):
  – Fill memory with records
  – Sort using any favorite main-memory sort
  – Write sorted records to disk
  – Repeat above, until all records have been put into one of the sorted lists (runs)

[Figure: Disk -> B main memory buffers (INPUT 1, INPUT 2, …, INPUT B) -> Disk]

General External Merge Sort

• Phase 1 (pass 0): using B buffer pages
  – Produce what output ???
  – Cost (in terms of I/Os) ???

General External Merge Sort

• To sort a file with N pages using B buffer pages:
  – Produce output: sorted runs of B pages each
    • Run sizes: B pages each run.
    • How many runs: $\lceil N / B \rceil$ runs.
  – Cost: ?

[Figure: Disk -> B main memory buffers (INPUT 1, INPUT 2, …, INPUT B-1, OUTPUT) -> Disk]

General External Merge Sort

• To sort a file with N pages using B buffer pages:
  – Pass 0: use B buffer pages.
  – Produce output: sorted runs of B pages each
    • Run sizes: B pages each run.
    • How many runs: $\lceil N / B \rceil$ runs.
  – Cost:
    • 2 * N I/Os

General External Merge Sort

• Sort N pages using B buffer pages:
  – Phase 1 (which is pass 0): Produce sorted runs of B pages each.
  – Phase 2 (may involve several passes 1, 2, 3, etc.): Each pass merges B – 1 runs at a time.

Phase 2

• Initially load the input buffers with the first block of each respective sorted run

• Repeatedly run a competition among the first unchosen records of each of the buffered blocks
  – Move the record with the least key to the output block

• Manage buffers as needed:
  – If an input block is exhausted, get the next block from its file
  – If the output block is full, write it to disk
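
Both phases can be sketched together in a few lines. This is an in-memory illustration only, not a disk-based implementation: the function name `external_merge_sort` and the `page_size` parameter are assumptions made for the example, runs are plain Python lists rather than files, and the standard-library `heapq.merge` stands in for the buffer-by-buffer competition described above.

```python
import heapq


def external_merge_sort(records, page_size, b):
    """Sort `records` pass by pass, as if only b buffer pages were available.

    Returns the sorted list and the number of passes used.
    """
    if b < 3:
        raise ValueError("need at least 3 buffer pages")
    # Phase 1 (pass 0): sort memory-sized chunks of b pages into runs.
    chunk = b * page_size
    runs = [sorted(records[i:i + chunk])
            for i in range(0, len(records), chunk)]
    passes = 1
    # Phase 2: each pass merges b-1 runs at a time
    # (b-1 input buffers, one output buffer).
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + b - 1]))
                for i in range(0, len(runs), b - 1)]
        passes += 1
    return (runs[0] if runs else []), passes


data = [3, 4, 6, 2, 9, 4, 8, 7, 5, 6, 3, 1, 2]   # 7 pages of 2 records
result, passes = external_merge_sort(data, page_size=2, b=3)
print(result, passes)
```

With B = 3 the sketch needs 3 passes on the 7-page file, one fewer than the two-way trace, because pass 0 already produces 3-page runs instead of 1-page runs.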

General External Merge Sort

• Sort N pages using B buffer pages:
  – Phase 1 (which is pass 0): Produce sorted runs of B pages each.
  – Phase 2 (may involve several passes 1, 2, 3, etc.): Number of passes? Cost of each pass?

Cost of External Merge Sort

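• Number of passes: $1 + \lceil \log_{B-1} \lceil N / B \rceil \rceil$
• Cost = 2N * (# of passes)
• Total cost: multiply the above: $2N \left( 1 + \lceil \log_{B-1} \lceil N / B \rceil \rceil \right)$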

Example

• Buffer: 5 buffer pages
• File to sort: 108 pages
  – Pass 0:
    • Size of each run?
    • Number of runs?
  – Pass 1:
    • Size of each run?
    • Number of runs?
  – Pass 2: ???

Example

• Buffer: 5 buffer pages
• File to sort: 108 pages
  – Pass 0: $\lceil 108 / 5 \rceil$ = 22 sorted runs of 5 pages each (last run is only 3 pages)
  – Pass 1: $\lceil 22 / 4 \rceil$ = 6 sorted runs of 20 pages each (last run is only 8 pages)
  – Pass 2: 2 sorted runs, 80 pages and 28 pages
  – Pass 3: Sorted file of 108 pages

• Total I/O costs: ?

Example

• Buffer: 5 buffer pages
• File to sort: 108 pages
  – Pass 0: $\lceil 108 / 5 \rceil$ = 22 sorted runs of 5 pages each (last run is only 3 pages)
  – Pass 1: $\lceil 22 / 4 \rceil$ = 6 sorted runs of 20 pages each (last run is only 8 pages)
  – Pass 2: 2 sorted runs, 80 pages and 28 pages
  – Pass 3: Sorted file of 108 pages

• Total I/O costs: 2 * N * (4 passes) = 2 * 108 * 4 = 864 I/Os
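
The run sizes in this example can be reproduced by tracking only the length of each run. This is a minimal sketch; `run_sizes_per_pass` is a name invented for the example, and it models run lengths in pages rather than actual records.

```python
def run_sizes_per_pass(n_pages: int, b: int):
    """Return one list of run sizes (in pages) for every pass."""
    # Pass 0: sort b pages at a time; the last run may be shorter.
    runs = [min(b, n_pages - start) for start in range(0, n_pages, b)]
    passes = [runs]
    # Later passes: merge b-1 runs at a time until one run remains.
    while len(runs) > 1:
        runs = [sum(runs[i:i + b - 1]) for i in range(0, len(runs), b - 1)]
        passes.append(runs)
    return passes


passes = run_sizes_per_pass(108, 5)
for number, runs in enumerate(passes):
    print(f"Pass {number}: {len(runs)} run(s), sizes {runs}")
print("Total I/Os:", 2 * 108 * len(passes))   # Total I/Os: 864
```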

Number of Passes of External Sort

N                B=3   B=5   B=9   B=17   B=129   B=257
100                7     4     3      2       1       1
1,000             10     5     4      3       2       2
10,000            13     7     5      4       2       2
100,000           17     9     6      5       3       3
1,000,000         20    10     7      5       3       3
10,000,000        23    12     8      6       4       3
100,000,000       26    14     9      7       4       4
1,000,000,000     30    15    10      8       5       4
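
The table entries follow from the pass-count formula $1 + \lceil \log_{B-1} \lceil N / B \rceil \rceil$. The sketch below recomputes them with integer arithmetic only, which avoids floating-point rounding in the logarithm; `num_passes` is a name chosen for this example.

```python
def num_passes(n_pages: int, b: int) -> int:
    """Passes needed to sort n_pages with b buffer pages."""
    runs = -(-n_pages // b)          # ceil(N / B) runs after pass 0
    passes = 1
    while runs > 1:
        runs = -(-runs // (b - 1))   # each pass merges B-1 runs at a time
        passes += 1
    return passes


buffers = (3, 5, 9, 17, 129, 257)
for n in (10**k for k in range(2, 10)):
    print(f"{n:>13,}", *(f"{num_passes(n, b):>4}" for b in buffers))
```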

How large a file can be sorted in 2 passes with a given buffer size M?

???
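
One hedged way to approach this question, assuming M is measured in buffer pages and plays the role of B in the pass-count formula (an interpretation, not something the slide states): two passes means pass 0 plus a single merge pass, so all $\lceil N / M \rceil$ runs produced by pass 0 must fit into one (M – 1)-way merge.

$$\lceil N / M \rceil \le M - 1 \quad\Longleftrightarrow\quad N \le M (M - 1)$$

Under that assumption the limit is roughly $M^2$ pages. For example, M = 257 gives 257 · 256 = 65,792 pages, which agrees with the B = 257 column of the table: 10,000 pages need 2 passes, while 100,000 pages need 3.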

Double Buffering (Useful here)

• To reduce wait time for an I/O request to complete, can prefetch into a "shadow block".
  – Potentially, more passes; in practice, most files still sorted in 2 or at most 3 passes.

[Figure: Disk -> B main memory buffers arranged for a k-way merge, each of block size b: INPUT 1, INPUT 2, …, INPUT k with shadow blocks INPUT 1', INPUT 2', …, INPUT k', plus OUTPUT and OUTPUT' -> Disk]

Sorting Summary

• External sorting is important; DBMS may dedicate part of buffer pool for sorting!

• External merge sort minimizes disk I/O cost
  – Larger block size means less I/O cost per page.
  – Larger block size means smaller # runs merged

• In practice, # of passes rarely > 2 or 3

Re-examine

Improving Access Times of Secondary Storage :

Five Disk Optimizations

Chapter 11.5

Five Optimizations (in disk controller or OS)

• Group blocks accessed together on same cylinder
  – (to reduce seek times)

• One big disk -> several smaller disks
  – (to help read several blocks at same time)

• Mirror disks -> multiple copies of same data
  – (redundant disks to reduce rotational delay)

• Prefetch blocks into memory -> double-buffering.
  – (bring data in early)

• Disk scheduling algorithms -> select the order in which several blocks will be read or written
  – (streamline reads)

Assessment of Five Optimizations

• Effect for “regular predictable tasks”,
  – like one long dedicated process with sequential read
  – e.g., a database SORT (1st phase of multi-way sort)

• Effect for many “unpredictable irregular tasks”
  – like many short processes in parallel
  – e.g., airline reservations or 2nd phase of multi-way sort

• Or, some mixture in workload …

Five Optimizations: Useful or Not?

• Group blocks together on same cylinder

• One big disk -> several smaller disks

• Mirror disks -> multiple copies of same data

• Prefetch blocks -> e.g., double-buffering.

• Disk scheduling -> e.g., elevator algorithm

Assessment of Five Optimizations

• Book has in-depth answer to this assessment !

• So read the book (ch. 11.5).