File Organizations and Indexes ISYS 464


Disk Devices

• Disk drive: Read/write head and access arm.
• Single-sided, double-sided, disk pack
• Track, sector, cylinder (tracks with the same diameter on the various disks)
• Page, block, or physical record: the unit of transfer between disk and primary storage, and vice versa.

• Blocking factor: the number of records in a block

Disk Speed

• Rpm: revolutions per minute
– 2400, 3600, 7200 rpm
• Ex. At 2400 rpm, each revolution takes 1/2400 min.
– 60*1000/2400 = 25 msec per revolution

Time Required to Read One Block

• Seek time
• Rotational delay
– On average, half a revolution
• Block transfer time
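The rotation figures above can be checked with a short Python sketch. The function names are illustrative, and the average rotational delay is taken to be half a revolution, as the slide states.

```python
def revolution_time_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60 * 1000 / rpm


def average_rotational_delay_ms(rpm):
    """Average rotational delay, taken as half a revolution."""
    return revolution_time_ms(rpm) / 2


# The three speeds listed on the Disk Speed slide.
for rpm in (2400, 3600, 7200):
    print(rpm, revolution_time_ms(rpm), average_rotational_delay_ms(rpm))
```

At 2400 rpm this gives 25 msec per revolution and an average rotational delay of 12.5 msec.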

Example

• A student file contains 20,000 records of 113 bytes each. Assuming each block is 512 bytes, how many blocks are needed?
– Blocking factor = floor(block size/record size) = floor(512/113) = 4
– Number of blocks = ceiling(number of records/blocking factor) = ceiling(20,000/4) = 5,000 blocks
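As a quick check of the arithmetic, the two formulas can be written as a small Python sketch. The function names are illustrative, and records are assumed not to span blocks, which is what the floor implies.

```python
import math


def blocking_factor(block_size, record_size):
    """Whole records that fit in one block (records do not span blocks)."""
    return block_size // record_size


def blocks_needed(num_records, block_size, record_size):
    """Blocks required to store the file, rounding up the last partial block."""
    return math.ceil(num_records / blocking_factor(block_size, record_size))


print(blocking_factor(512, 113))        # 4
print(blocks_needed(20_000, 512, 113))  # 5000
```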

Linear Search, Binary Search, and Direct Access

• Assume seek time = s, rotational delay = r, block transfer time = tr, and the file size is 5000 blocks.
• The average time to do a linear search is: s + r + tr*(half of blocks) = s + r + 2500*tr
• If the file is ordered by a key field, the time to do a binary search is: (s + r + tr) * log2(5000)
• If an index is available to enable direct access: s + r + tr

Linear Search and Binary Search

• Assume seek time = s, rotational delay = r, block transfer time = tr, and the file size is 5000 blocks. The average time to do a linear search is:
– s + r + tr*(half of blocks) = s + r + 2500*tr
• Binary search: If the file is ordered by a key field, the time to do a binary search is:
– Number of blocks accessed given n blocks: log2(n)
– (s + r + tr) * log2(5000)
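To get a feel for the difference, the formulas can be evaluated in Python. The timing values below (s = 10 msec, r = 12.5 msec, tr = 1 msec) are assumptions chosen only for illustration; they do not come from the slides.

```python
import math


def linear_search_time(s, r, tr, n_blocks):
    """One seek and rotational delay, then on average half the blocks are read."""
    return s + r + tr * (n_blocks / 2)


def binary_search_time(s, r, tr, n_blocks):
    """Each of the log2(n) probes pays seek, rotational delay and transfer."""
    return (s + r + tr) * math.log2(n_blocks)


def direct_access_time(s, r, tr):
    """A single block access."""
    return s + r + tr


s, r, tr = 10.0, 12.5, 1.0  # assumed values in msec, for illustration only
print(linear_search_time(s, r, tr, 5000))
print(binary_search_time(s, r, tr, 5000))
print(direct_access_time(s, r, tr))
```

With these assumed timings the linear search takes about 2.5 seconds, the binary search under 0.3 seconds, and direct access 23.5 msec. In practice the number of probes is a whole number, so log2(5000) would be rounded up to 13.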


Updating a Record

• Read the block into main memory.

• Change the record in main memory.

• Write the block back to disk.

File Organizations

• Technique for physically arranging records of a file on secondary storage
• Factors for selecting file organization:
– Fast data retrieval and throughput
– Efficient storage space utilization
– Protection from failure and data loss
– Minimizing need for reorganization
– Accommodating growth
– Security from unauthorized use
• Types of file organizations
– Sequential
– Indexed
– Hashed

Access Method

• The steps involved in storing and retrieving records from a file.
– Searching and updating

Unordered Files (Heap Files)

• Records are placed in the file in the same order as they are inserted.
• Searching: must do a linear search if an index is not available.
• Updating:
– Insertion: Read the last page, append the record to it, then write the page back.
– Modification: Search for the block and read it into main memory. Write the block back after making changes.
– Deletion: Mark the record for deletion (deletion flag) and periodically reorganize the file.

Ordered Files

• Enable binary search
• Insertion: May need a temporary overflow file, which is periodically merged with the ordered file.
• Deletion: May need periodic reorganization.

Hash Files (Direct Files)

• The page where a record is stored is determined by a hash function.
• The hash function calculates the address of the page based on the key field of the file:
– Address = H(Key)
• Typical hash function: division/remainder:
– H(Key) = Key Mod M, so 0 <= H(Key) <= M-1
– Where M is the number of blocks allocated to this file.

Disk blocks

[Figure: disk blocks at block addresses 0 to 7. H(K) -> block number; the block address is a physical address.]

Hash File Example

• 8 blocks, each block holds 2 records
• Hash function: Key Mod 8
• Record keys:
– Key = 1821, Key Mod 8 = 5
– 7115, 3
– 2428, 4
– 4750, 6
– 1620, 4
– 4692, 4


Collision Resolution

• Collision: When a record’s home block is full.

• Open addressing (linear probing): Place the record in the first available block.

Searching a Hash File

• Home block = H(SearchKey)
• If found in the home block, the search is successful
• Else
– Search the following blocks in turn until the record is found or a block with empty space is reached

Hash File Performance

• Average Search Length = (Total # of blocks accessed to find all records)/(The number of records in the file)
• Using the previous example (only key 4692, which overflows from block 4 into block 5, needs 2 block accesses):
– (1 + 1 + 1 + 1 + 1 + 2)/6 = 7/6
• Average time needed to find a record in this file:
– (s + r + tr) * 7/6
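The hash file example can be replayed with a minimal Python sketch. The HashFile class and its method names are hypothetical. It assumes linear probing that wraps around from the last block to block 0, and it ignores deletions; neither point is stated on the slides.

```python
class HashFile:
    """Static hash file with Key Mod M hashing and linear probing."""

    def __init__(self, num_blocks, block_capacity):
        self.blocks = [[] for _ in range(num_blocks)]
        self.capacity = block_capacity

    def home_block(self, key):
        return key % len(self.blocks)

    def insert(self, key):
        m = len(self.blocks)
        for step in range(m):
            block = self.blocks[(self.home_block(key) + step) % m]
            if len(block) < self.capacity:  # first block with free space
                block.append(key)
                return
        raise RuntimeError("hash file is full")

    def search_length(self, key):
        """Number of blocks read to find key, or None if it is absent."""
        m = len(self.blocks)
        for step in range(m):
            block = self.blocks[(self.home_block(key) + step) % m]
            if key in block:
                return step + 1
            if len(block) < self.capacity:  # empty space: key cannot be further on
                return None
        return None


keys = [1821, 7115, 2428, 4750, 1620, 4692]
hash_file = HashFile(num_blocks=8, block_capacity=2)
for key in keys:
    hash_file.insert(key)

lengths = [hash_file.search_length(key) for key in keys]
print(lengths, sum(lengths) / len(lengths))
```

Only key 4692 needs a second block access, so the total is 7 accesses for 6 records.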

Factors Affecting Hash File Performance

• The hash function should spread the records evenly over the disk space.
• Use of a low load factor:
– Load factor = (# of records)/(# of available spaces)
• Allow each block to hold more records

Limitations of Hash File

• Cannot be accessed in any other order:
– Direct access only
• Fixed amount of space allocated to the file:
– Static hashing
– Wastes space, hard to grow
• Inappropriate for retrievals based on ranges of values:
– Find EmpID = 123: one hash lookup
– Find EmpID > 123: the hash function cannot help

Index

• A data structure that allows the DBMS to locate particular records in a file more quickly.
• Index file:
– IndexField + RecordPointer
– Ordered according to the indexing field


Types of Index

• Primary index: Index on the primary key field.

• Secondary index: Index on a non-key field.

Index on Ordering Key Field

Index file (SID, block ptr): S05, S12, S25

Data file (ordered by SID):
– Block 1: S05, S07, S10
– Block 2: S12, S15, S20
– Block 3: S25, S27, S30

Note: The number of index entries equals the number of file blocks.

Index on NonOrdering Key Field

[Figure: data file with three blocks holding records S12, S25, S47 / S20, S22, S05 / S30, S33, S27, not ordered by SID. The index is ordered by SID (S05, S12, S20, S22, …) and each entry holds a record ptr.]

Note: The number of index entries equals the number of records.

Index on Ordering NonKey Field (Cluster Index)

[Figure: data file (SID, Major) ordered by Major, with blocks ACCT, ACCT, ACCT / ACCT, CIS, CIS / CIS, FIN, FIN. The index on Major has one entry per distinct value (ACCT, CIS, FIN), each with a block ptr.]

Index on NonOrdering NonKey Field

[Figure: data file (SID, Major) not ordered by Major, with blocks CIS, FIN, ACCT / ACCT, CIS, FIN / MKT, CIS, FIN. The index is ordered by Major (ACCT, ACCT, CIS, CIS, CIS, FIN, …) and each entry holds a record ptr.]


Physical Pointer vs Logical Pointer

When an index on the key field is available, an index on a nonkey field can use record keys as logical pointers.

[Figure: data file (SID, Major) with an index on Major whose entries hold SID values (S12, S22, S25, S05, S27, S47, …) instead of physical record pointers.]

SID is a logical pointer.

The location of S12 can be found by searching the primary index.

Searching with Index

A file with 30,000 records, each record has 100 bytes, block size is 1024 bytes:

• Data file blocking factor = floor(1024/100) = 10
• Data file blocks = ceiling(30,000/10) = 3000 blocks

If the key field has 9 bytes and the physical pointer has 6 bytes, each index entry has 15 bytes:

• Index file blocking factor = floor(1024/15) = 68
• Index file blocks = ceiling(30,000/68) = 442 blocks

Time to search for a record with the index:

• Binary search of the index = log2(442) block accesses
• One data file access
• Time = (s + r + tr) * (1 + log2(442))
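The index calculation above can be reproduced with a short Python sketch. The helper name file_blocks is illustrative. The logarithm is rounded up to a whole number of block accesses, which is a small refinement the slide leaves implicit.

```python
import math


def file_blocks(num_entries, block_size, entry_size):
    """Return (blocking factor, number of blocks) for fixed-size entries."""
    factor = block_size // entry_size
    return factor, math.ceil(num_entries / factor)


data_factor, data_blocks = file_blocks(30_000, 1024, 100)
index_factor, index_blocks = file_blocks(30_000, 1024, 9 + 6)  # key + pointer

# Binary search of the index, then one access to the data file.
index_accesses = math.ceil(math.log2(index_blocks))
total_accesses = index_accesses + 1

# For comparison: binary search directly on the ordered data file.
data_only_accesses = math.ceil(math.log2(data_blocks))

print(data_factor, data_blocks, index_factor, index_blocks)
print(total_accesses, data_only_accesses)
```

Under these figures the index route needs 10 block accesses against 12 for a binary search of the data file, because index entries are much smaller than records.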

Tree

• Nodes:
– Regular nodes (internal nodes): nodes with parent and children
– Root node: node with no parent
– Leaf nodes: nodes with no children
• Level: length of the path from the root to a node.
– Root: level 0
• Balanced tree: All leaf nodes are at the same level.

B-Trees

• If a node can store n pointers (n-1 keys), then each node except the root and leaf nodes has at least ceiling(n/2) pointers.
• Each key in the tree represents (key + RecordPointer)
• All leaf nodes are at the same level.
• When a node splits, it splits into two nodes at the same level, and the middle key is moved up to its parent node.

B-Tree Examples

• A B-Tree with 3 pointers (2 keys) in a node, insert keys: 8, 5, 1, 7, 3, 12, 9, 6, 4

• A B-Tree with 4 pointers (3 keys) in a node, insert keys: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 100, 95

B+ Trees

• Record pointers are stored only at the leaf nodes.
– More keys in a node, shorter path
• Every key must exist at the leaf nodes.
• Every leaf node contains a pointer to the next leaf node.
• Node split:
– Leaf node split: keep the middle key in the left node and duplicate it in the parent node.
– Internal node split: move the middle key up to the parent, as in a B-Tree.
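The two split rules can be illustrated on plain sorted key lists. This is only a sketch of the key movement, not a full B+ tree: pointers are omitted, the function names are hypothetical, and for an even number of keys the lower of the two middle keys is picked, which is an assumed convention.

```python
def split_leaf(keys):
    """Split an overfull, sorted leaf.

    The middle key stays in the left node and a copy goes to the parent.
    Returns (left_keys, right_keys, key_copied_up).
    """
    mid = (len(keys) - 1) // 2
    return keys[:mid + 1], keys[mid + 1:], keys[mid]


def split_internal(keys):
    """Split an overfull, sorted internal node.

    The middle key moves up and is kept in neither half, as in a B-Tree.
    Returns (left_keys, right_keys, key_moved_up).
    """
    mid = (len(keys) - 1) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]


# A 2-key leaf holding 5 and 8 overflows when 1 is inserted.
print(split_leaf([1, 5, 8]))      # ([1, 5], [8], 5)
print(split_internal([1, 5, 8]))  # ([1], [8], 5)
```

The difference between the two outputs is the point of the slide: after a leaf split, key 5 still appears in a leaf, so every key remains reachable through the leaf level.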


B+ Tree Examples

• A B+ Tree with 3 pointers (2 keys) in a node, insert keys: 8, 5, 1, 7, 3, 12, 9, 6

• A B+ Tree with 4 pointers (3 keys) in a node, insert keys: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 100, 95

B+ Tree Advantages

• Shorter tree: Because internal nodes do not include record pointers, internal nodes can hold more keys.
• All keys in the leaf nodes are already in sorted order.
• A B+ Tree can be used to store the data file itself.

Figure 6-8: Bitmap index organization

• A bitmap saves on space requirements
• Rows: possible values of the attribute
• Columns: table rows
• A bit indicates whether the attribute of a row has the value

The bitmap index is used where the values of a field repeat very frequently; it is not used for a primary key index.
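A bitmap index of this shape can be sketched in a few lines of Python. The function names and the sample Major column are made up for illustration, and each bitmap is stored as a plain list of 0/1 values rather than packed bits.

```python
def build_bitmap_index(column_values):
    """Return {attribute value: list of bits}, one bit per table row."""
    index = {}
    for row, value in enumerate(column_values):
        bits = index.setdefault(value, [0] * len(column_values))
        bits[row] = 1
    return index


def matching_rows(index, value):
    """Row numbers whose attribute equals value."""
    return [row for row, bit in enumerate(index.get(value, [])) if bit]


majors = ["CIS", "FIN", "ACCT", "ACCT", "CIS", "FIN"]  # sample data
index = build_bitmap_index(majors)
print(index["CIS"])                  # [1, 0, 0, 0, 1, 0]
print(matching_rows(index, "ACCT"))  # [2, 3]
```

With only three distinct values the index needs three short bitmaps. A primary key would need one bitmap per row, which is why the slide rules it out.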

• Too many indexes will slow down update operations.

Rules for Using Indexes

1. Use on larger tables
2. Index the primary key of each table
3. Index search fields (fields frequently in WHERE clause)
4. Fields in SQL ORDER BY and GROUP BY commands
5. When there are >100 values but not when there are <30 values

Rules for Using Indexes (cont.)

6. Avoid use of indexes for fields with long values; perhaps compress values first
7. DBMS may have limit on number of indexes per table and number of bytes per indexed field(s)
8. Null values will not be referenced from an index
9. Use indexes heavily for non-volatile databases; limit the use of indexes for volatile databases

Why? Because modifications (e.g. inserts, deletes) require updates to occur in index files


Redundant Arrays of Inexpensive (Independent) Disks

• RAID is a method to group more than one drive and make them appear as a single drive.

RAID 0

• Creating a stripe set without parity
• Spreads the data out over various disks

Disk 0   Disk 1   Disk 2   Disk 3
1A       2A       3A       4A
1B       2B       3B       4B
1C       2C       3C       4C

• No redundancy
• Best write performance
• Disks can be accessed in parallel
• Unreliable

Figure 6-10: RAID with four disks and striping

Here, pages 1-4 can be read/written simultaneously
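One way to picture striping is as a round-robin mapping from logical page number to disk. The sketch below assumes that layout and 0-based page numbers. The function name is hypothetical, and real controllers may stripe in larger units than a single page.

```python
def locate_page(page_number, num_disks=4):
    """Map a 0-based logical page number to (disk, stripe) under round-robin striping."""
    return page_number % num_disks, page_number // num_disks


# The first four pages land on four different disks in the same stripe,
# so they can be transferred in parallel.
print([locate_page(page) for page in range(4)])
print(locate_page(5))  # second stripe, Disk 1
```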

RAID 1

• Mirror set
– Primary disk and mirror disk
– 2 writes
– Data can be accessed from either disk.
– Fault tolerance

RAID 5

• Creating a stripe set with parity

Disk 0    Disk 1    Disk 2    Disk 3
ParityA   1A        2A        3A
1B        ParityB   2B        3B
1C        2C        ParityC   3C
1D        2D        3D        ParityD

Exclusive OR, XOR

Condition 1   Condition 2   Condition 1 XOR Condition 2
T             T             F
T             F             T
F             T             T
F             F             F

Creating Parity with XOR

Disk 0    Disk 1   Disk 2   Disk 3
ParityA   1A       2A       3A

1A=1010, 2A=0100, 3A=1100

ParityA = (1A XOR 2A) XOR 3A = 0010

If Disk 0 fails: Recompute ParityA by using (1A XOR 2A) XOR 3A = 0010

If Disk 1 fails: Recover 1A by using (ParityA XOR 2A) XOR 3A = 1010
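The parity example can be verified with Python's bitwise XOR operator. The helper name parity is illustrative; the block values are the 4-bit patterns from the slide.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """XOR all blocks together; also used to rebuild one missing block."""
    return reduce(xor, blocks)


block_1a, block_2a, block_3a = 0b1010, 0b0100, 0b1100

parity_a = parity([block_1a, block_2a, block_3a])
print(format(parity_a, "04b"))  # 0010

# Disk 1 fails: rebuild 1A from the parity and the surviving blocks.
recovered_1a = parity([parity_a, block_2a, block_3a])
print(format(recovered_1a, "04b"))  # 1010
```

The same function serves both purposes because XOR is its own inverse: XOR-ing the parity with all surviving blocks cancels them out and leaves the missing block.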