
Dynamic Hashing and Indexing


Page 1: Dynamic Hashing and Indexing

©Silberschatz, Korth and Sudarshan, Database System Concepts

Dynamic Hashing

Good for a database that grows and shrinks in size.

Allows the hash function to be modified dynamically.

Extendable hashing: one form of dynamic hashing.

This hashing scheme takes advantage of the fact that the result of applying a hashing function is a non-negative integer, which can be represented as a binary number, i.e., a string of bits.

A directory, i.e., an array of 2^d bucket addresses, is maintained, where d is called the global depth of the directory.

A local depth d', stored with each bucket, specifies the number of bits on which the bucket contents are based.

Value of d grows and shrinks as the size of the database grows and shrinks.

Thus, the actual number of buckets is at most 2^d, since several directory entries may point to the same bucket.

The number of buckets changes dynamically due to coalescing and splitting of buckets.

Page 2: Dynamic Hashing and Indexing


Splitting and Coalescing of Buckets

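A bucket is split when it overflows. The local depth d' of the bucket is incremented by one; if d' was equal to d before the split, the directory doubles in size and d is also incremented by one.

For a bucket whose hash values start with 01, after splitting, the first bucket contains the records whose hash values start with 010 and the other those whose hash values start with 011.

Coalescing may occur when records are deleted: two buckets that were produced by a split are merged again and their local depth is decremented by one. If d > d' for all buckets afterwards, the directory is halved and d is decremented by one.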

Page 3: Dynamic Hashing and Indexing

Extendible Hashing - Example

Record   K      h(K)   h(K) in binary
rec1     2639   1      00001
rec2     3760   16     10000
rec3     4692   20     10100
rec4     4871   7      00111
rec5     5659   27     11011
rec6     1821   29     11101
rec7     1074   18     10010
rec8     2115   11     01011
rec9     1620   20     10100
rec10    2428   28     11100
rec11    3943   7      00111
rec12    4750   14     01110
rec13    6975   31     11111
rec14    4981   21     10101
rec15    9208   24     11000
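
The splitting rule can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the algorithm as drawn on the slides: the class and method names are hypothetical, the bucket capacity of two records is assumed from the figures, the directory is indexed by the leading d bits of the 5-bit hash value (as in the 01 to 010/011 example), and coalescing is left out. The h(K) values are taken from the table above.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = []  # list of (hash value, record) pairs


class ExtendibleHash:
    def __init__(self, hash_bits=5, capacity=2):
        self.hash_bits = hash_bits      # assumed width of a hash value
        self.capacity = capacity        # assumed records per bucket
        self.global_depth = 0
        self.directory = [Bucket(0)]    # always 2**global_depth entries

    def _index(self, h):
        # Directory location = leading global_depth bits of the hash value.
        return h >> (self.hash_bits - self.global_depth)

    def lookup(self, h):
        bucket = self.directory[self._index(h)]
        return [rec for hv, rec in bucket.items if hv == h]

    def insert(self, h, record):
        bucket = self.directory[self._index(h)]
        while len(bucket.items) >= self.capacity:   # overflow: split
            if bucket.local_depth == self.hash_bits:
                raise OverflowError("no hash bits left to split on")
            self._split(bucket)
            bucket = self.directory[self._index(h)]
        bucket.items.append((h, record))

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory: every entry becomes two adjacent entries.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_depth += 1
        new_depth = bucket.local_depth + 1
        zero, one = Bucket(new_depth), Bucket(new_depth)
        # Redistribute the records on the newly used bit.
        for hv, rec in bucket.items:
            bit = (hv >> (self.hash_bits - new_depth)) & 1
            (one if bit else zero).items.append((hv, rec))
        # Repoint every directory entry that referred to the old bucket.
        for i, b in enumerate(self.directory):
            if b is bucket:
                bit = (i >> (self.global_depth - new_depth)) & 1
                self.directory[i] = one if bit else zero


table = ExtendibleHash(hash_bits=5, capacity=2)
hashes = [1, 16, 20, 7, 27, 29, 18, 11, 20, 28, 7, 14, 31, 21, 24]
for n, h in enumerate(hashes, start=1):
    table.insert(h, f"rec{n}")
print(table.global_depth, table.lookup(7))
```

Because this sketch splits only the overflowing bucket and follows the rule strictly, its intermediate directories need not match every figure on the following slides.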

Page 4: Dynamic Hashing and Indexing

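[Figure: with global depth d = 0, a single bucket (local depth d1 = 0) holds rec 1 and rec 2; inserting record 3 causes an overflow and the bucket is split. With d = 1, the directory has locations 0 and 1, both buckets have d1 = 1, and rec 1, rec 2, rec 3 and rec 4 are stored; inserting record 5 causes an overflow and a bucket is split. Legend: d1 = local depth, d = global depth.]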

Page 5: Dynamic Hashing and Indexing

[Figure: directory with d = 2 (locations 00, 01, 10, 11). Buckets: rec 2 and rec 3 (d1 = 2); rec 1 and rec 4 (d1 = 1); rec 5 and rec 6 (d1 = 2). Inserting record 7 causes an overflow and a bucket is split.]

Page 6: Dynamic Hashing and Indexing

[Figure: directory with d = 3 (locations 000 to 111). Buckets: rec 2 and rec 7 (d1 = 3); rec 3 (d1 = 3); rec 1 and rec 4 (d1 = 1); rec 5 and rec 6 (d1 = 2). Inserting record 8 causes an overflow and a bucket is split.]

Page 7: Dynamic Hashing and Indexing

[Figure: directory with d = 3 (locations 000 to 111) holding rec 1 to rec 9; four buckets are labeled d1 = 3 and two are labeled d1 = 2. Inserting record 10 causes an overflow and a bucket is split.]

Page 8: Dynamic Hashing and Indexing

[Figure: directory with d = 3 (locations 000 to 111) holding rec 1 to rec 12; six buckets are labeled d1 = 3 and one is labeled d1 = 2. Inserting record 13 causes an overflow and a bucket is split.]

Page 9: Dynamic Hashing and Indexing

[Figure: final directory with d = 4 (locations 0000 to 1111). Two buckets are labeled d1 = 4, five are labeled d1 = 3 and one is labeled d1 = 2. Records shown: rec 1 to rec 8 and rec 10 to rec 15.]

Page 10: Dynamic Hashing and Indexing


Advantages and Disadvantages

Benefits of extendable hashing:

Hash performance does not degrade with growth of the file.

Minimal space overhead.

Disadvantages of extendable hashing:

Extra level of indirection to find the desired record.

The bucket address table may itself become very big (larger than memory); a tree structure is then needed to locate the desired record in the table.

Changing the size of the bucket address table is an expensive operation.

Linear hashing is an alternative mechanism that avoids these disadvantages, at the possible cost of more bucket overflows; it does not need a directory.

Page 11: Dynamic Hashing and Indexing


Indexing

Page 12: Dynamic Hashing and Indexing


Indexing : Basic Concepts

Indexing mechanisms are used to speed up access to desired data, e.g., the catalog of a library.

Search key: an attribute or set of attributes used to look up records in a file.

An index file consists of records (called index entries) of the form (search-key, pointer).

Index files are typically much smaller than the original file.

Two basic kinds of indices:

Ordered indices: search keys are stored in sorted order.

Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”.

Page 13: Dynamic Hashing and Indexing


Index Evaluation Factors

Access types supported efficiently, e.g., records with a specified value in the attribute, or records with an attribute value falling in a specified range of values.

Access time

Insertion time

Deletion time

Space overhead: additional space occupied by an index structure.

Page 14: Dynamic Hashing and Indexing


Ordered Indices

In an ordered index, index entries are stored sorted on the search key value. E.g., author catalog in library.

Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file. Also called a clustering index.

The search key of a primary index is usually but not necessarily the primary key.

Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index.

Index-sequential file: ordered sequential file with a primary index.

Indexing techniques are evaluated on the basis of access types, access time, insertion time, deletion time, and space overhead.

Page 15: Dynamic Hashing and Indexing


Dense Index Files

Dense index — Index record appears for every search-key value in the file.

Page 16: Dynamic Hashing and Indexing


Sparse Index Files

Sparse index: contains index records for only some search-key values.

Applicable when records are sequentially ordered on the search key.

To locate a record with search-key value K we:

Find the index record with the largest search-key value ≤ K.

Search the file sequentially starting at the record to which the index record points.

Less space and less maintenance overhead for insertions and deletions.

Generally slower than a dense index for locating records.

Good tradeoff: a sparse index with an index entry for every block in the file, corresponding to the least search-key value in the block.
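
The two lookup steps can be sketched as follows. This is a minimal illustration, assuming a small in-memory "file" of three blocks with made-up keys and one index entry per block (the tradeoff described above); the names `blocks`, `sparse_index` and `sparse_lookup` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical file: blocks of (search key, record) pairs, sorted on the key.
blocks = [
    [(101, "A"), (110, "B")],
    [(215, "C"), (217, "D")],
    [(305, "E"), (400, "F")],
]
# Sparse index: one (least key in block, block number) entry per block.
sparse_index = [(blk[0][0], i) for i, blk in enumerate(blocks)]
index_keys = [k for k, _ in sparse_index]


def sparse_lookup(key):
    # Step 1: last index entry whose search-key value is <= key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    # Step 2: scan the file sequentially from the block that entry points to.
    for blk in blocks[sparse_index[pos][1]:]:
        for k, rec in blk:
            if k == key:
                return rec
            if k > key:
                return None  # passed the position where key would be
    return None


print(sparse_lookup(217), sparse_lookup(300))
```

The binary search touches only the small index; the sequential scan is what makes a sparse index generally slower than a dense one.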

Page 17: Dynamic Hashing and Indexing


Example of Sparse Index Files

Page 18: Dynamic Hashing and Indexing


Multilevel Index

If primary index does not fit in memory, access becomes expensive.

To reduce the number of disk accesses to index records, treat the primary index kept on disk as a sequential file and construct a sparse index on it.

Outer index: a sparse index of the primary index.

Inner index: the primary index file.

If even outer index is too large to fit in main memory, yet another level of index can be created, and so on.

Indices at all levels must be updated on insertion or deletion from the file.
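
A two-level lookup might look like the sketch below. It assumes a hypothetical inner index split into blocks of two entries, with made-up keys and record pointers; the outer index keeps the first key of each inner block.

```python
from bisect import bisect_right

# Hypothetical inner index: sorted (search key, record pointer) entries,
# stored in index blocks of two entries each.
inner_blocks = [
    [(101, "r0"), (110, "r1")],
    [(215, "r2"), (217, "r3")],
    [(305, "r4"), (400, "r5")],
]
# Outer index: a sparse index on the inner index, one entry per inner block.
outer_keys = [blk[0][0] for blk in inner_blocks]


def multilevel_lookup(key):
    # Level 1: the outer index selects the single inner block to read.
    pos = bisect_right(outer_keys, key) - 1
    if pos < 0:
        return None
    # Level 2: search only that inner block for the key.
    for k, pointer in inner_blocks[pos]:
        if k == key:
            return pointer
    return None


print(multilevel_lookup(217), multilevel_lookup(216))
```

Only one inner block is read per lookup, which is the point of adding the outer level when the inner index does not fit in memory.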

Page 19: Dynamic Hashing and Indexing


Multilevel Index (Cont.)

Page 20: Dynamic Hashing and Indexing


Index Update: Insertion

Single-level index insertion:

Perform a lookup using the search-key value appearing in the record to be inserted.

Dense indices – if the search-key value does not appear in the index, insert it.

Sparse indices – if index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created. In this case, the first search-key value appearing in the new block is inserted into the index.

Multilevel insertion (as well as deletion) algorithms are simple extensions of the single-level algorithms.
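
The single-level insertion rules can be sketched as below. Each index is modeled as a plain sorted list of search-key values with pointers omitted, and the function names and keys are hypothetical.

```python
from bisect import bisect_left, insort


def dense_insert(index_keys, key):
    """Dense index: add the search-key value only if it is absent."""
    pos = bisect_left(index_keys, key)
    if pos == len(index_keys) or index_keys[pos] != key:
        index_keys.insert(pos, key)


def sparse_insert(index_keys, first_key_of_new_block=None):
    """Sparse index with one entry per block: it changes only when the
    insertion created a new block, whose first key is then indexed."""
    if first_key_of_new_block is not None:
        insort(index_keys, first_key_of_new_block)


dense = [101, 215, 305]
dense_insert(dense, 110)    # new search-key value: added
dense_insert(dense, 215)    # already indexed: no change

sparse = [101, 305]
sparse_insert(sparse)       # record fit in an existing block: no change
sparse_insert(sparse, 215)  # a new block starting at key 215 was created
print(dense, sparse)
```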

Page 21: Dynamic Hashing and Indexing


Index Update: Deletion

If deleted record was the only record in the file with its particular search-key value, the search-key is deleted from the index also.

Single-level index deletion:

Dense indices: deletion of the search key is similar to file record deletion.

Sparse indices – if an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order). If the next search-key value already has an index entry, the entry is deleted instead of being replaced.
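
The sparse-index rule is the subtle one, so here is a small sketch of it. The index and the file are modeled as sorted lists of search-key values; `sparse_delete` is a hypothetical helper, and dropping the entry when the deleted key was the last one in the file is an assumption the slide does not cover.

```python
def sparse_delete(index_keys, file_keys, key):
    """Update a sparse index after a record with `key` was deleted.

    file_keys: sorted search-key values still present in the file.
    """
    if key in file_keys or key not in index_keys:
        return  # other records still carry the key, or it was never indexed
    pos = index_keys.index(key)
    later = [k for k in file_keys if k > key]
    if later and later[0] not in index_keys:
        index_keys[pos] = later[0]  # replace with the next search-key value
    else:
        # Next value already has an entry (or, by assumption, none exists).
        del index_keys[pos]


index = [101, 215, 305]
sparse_delete(index, [101, 110, 217, 305, 400], 215)  # replaced by 217
sparse_delete(index, [101, 110, 305, 400], 217)       # 305 already indexed
print(index)
```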

Page 22: Dynamic Hashing and Indexing


Secondary Indices

Frequently, one wants to find all the records whose values in a certain field (which is not the search key of the primary index) satisfy some condition.

Example 1: In the account database stored sequentially by account number, we may want to find all accounts in a particular branch.

Example 2: As above, but where we want to find all accounts with a specified balance or range of balances.

We can have a secondary index with an index record for each search-key value; index record points to a bucket that contains pointers to all the actual records with that particular search-key value.
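
Such a bucket-based secondary index can be sketched as follows. The account rows, branch names and balances are made-up sample data, and the function names are hypothetical; record "pointers" are simply positions in the list.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical account file, stored sequentially by account number.
accounts = [
    ("A-101", "Downtown", 500),
    ("A-110", "Downtown", 600),
    ("A-215", "Mianus", 700),
    ("A-217", "Brighton", 750),
    ("A-218", "Perryridge", 700),
]


def build_secondary_index(records, field):
    # One index record per search-key value; each points to a bucket
    # (a list) of pointers to all records with that value.
    buckets = defaultdict(list)
    for position, record in enumerate(records):
        buckets[record[field]].append(position)
    return dict(sorted(buckets.items()))


branch_index = build_secondary_index(accounts, 1)
balance_index = build_secondary_index(accounts, 2)
balance_keys = list(balance_index)  # sorted search-key values


def accounts_with_balance_between(low, high):
    lo = bisect_left(balance_keys, low)
    hi = bisect_right(balance_keys, high)
    return [accounts[p][0]
            for k in balance_keys[lo:hi]
            for p in balance_index[k]]


print(branch_index["Downtown"], accounts_with_balance_between(600, 700))
```

The file itself stays ordered by account number; only the index is ordered by branch or balance, which is why each index record needs a bucket of pointers rather than a single pointer.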

Page 23: Dynamic Hashing and Indexing


Secondary Index on balance field of account

Page 24: Dynamic Hashing and Indexing


THANK YOU.

That’s all about Indices……