St. Vincent Pallotti College of Engineering & Technology
Department of Computer Engineering
Academic Year: - 2019-20
Sub Code:- BECME402T Course:- File Structure and Data Processing
Semester:-IV Sem. CE Course Teacher:- S.M.Wanjari
Contents
• Introduction
• A Simple Hashing Algorithm
• Hashing Functions and Record Distributions
• How Much Extra Memory Should Be Used?
• Collision Resolution by Progressive Overflow
• Storing More Than One Record per Address: Buckets
• Making Deletions
• Other Collision Resolution Techniques
• Patterns of Record Access
1. Introduction
• O-notation
  • O(1)
  • O(N): sequential searching
  • O(log2 N)
  • O(logk N): B-Tree
• What is Hashing?
  • a = h(K)
  • h (hash function), K (key), a (home address)
  • Example: K = BASS, h = (first char * second char) mod 1000
  • a = h(K) = (66 * 65) mod 1000 = 4,290 mod 1000 = 290
• Collision
  • Example keys:
    LOWELL => a = (76 * 79) mod 1000 = 6,004 mod 1000 = 4
    OLIVIER => a = (79 * 76) mod 1000 = 6,004 mod 1000 = 4
• Several ways to reduce the number of collisions
  • 1. Spread out the records
    • Good hashing algorithms
  • 2. Use extra memory
  • 3. Put more than one record at a single address
    • Buckets
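The example hash function and the LOWELL/OLIVIER collision above can be checked with a short Python sketch. The function names are illustrative and do not come from the slides; the sketch assumes ASCII keys with at least two characters.

```python
def first_two_chars_hash(key: str, address_space: int = 1000) -> int:
    """Multiply the ASCII codes of the first two characters, then take the remainder."""
    return (ord(key[0]) * ord(key[1])) % address_space


def group_by_address(keys):
    """Map each home address to the list of keys that hash to it."""
    groups = {}
    for key in keys:
        groups.setdefault(first_two_chars_hash(key), []).append(key)
    return groups


groups = group_by_address(["BASS", "LOWELL", "OLIVIER"])
for address, synonyms in sorted(groups.items()):
    # More than one key at the same home address means a collision.
    note = "  <- collision" if len(synonyms) > 1 else ""
    print(address, synonyms, note)
```

Because multiplication is commutative, any two keys whose first two letters are swapped collide under this function, which is one reason it spreads records poorly.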
2. A Simple Hashing Algorithm
• 3 Steps
  • 1. Represent the key in numerical form
  • 2. Fold and add
  • 3. Divide by a prime number and use the remainder as the address
• Example
  • Step 1. Represent the Key in Numerical Form
    LOWELL = 76 79 87 69 76 76 32 32 32 32 32 32
             L  O  W  E  L  L  (blanks)
  • Step 2. Fold and Add
    76 79 | 87 69 | 76 76 | 32 32 | 32 32 | 32 32
    7679 + 8769 + 7676 + 3232 + 3232 = 30588
    (30588 + 3232 = 33820, which exceeds the 2-byte maximum of 32767, so each intermediate sum is taken mod 19937)
    7679 + 8769 = 16448 => 16448 mod 19937 = 16448
    16448 + 7676 = 24124 => 24124 mod 19937 = 4187
    4187 + 3232 = 7419 => 7419 mod 19937 = 7419
    7419 + 3232 = 10651 => 10651 mod 19937 = 10651
    10651 + 3232 = 13883 => 13883 mod 19937 = 13883
  • Step 3. Divide by the Size of the Address Space
    a = s mod n (n: # of addresses in file)
    a = 13883 mod 100 = 83
    a = 13883 mod 101 = 46
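The three steps above can be sketched in Python as follows. The function name and the default key length of 12 are assumptions taken from the LOWELL example; 19937 is the intermediate modulus the example uses to keep sums within two bytes.

```python
def fold_and_add_hash(key: str, num_addresses: int, key_length: int = 12) -> int:
    """Fold-and-add hash following the three steps of the worked example."""
    # Step 1: pad the key with blanks (ASCII 32) to a fixed, even length.
    padded = key.ljust(key_length)[:key_length]

    # Step 2: fold the ASCII codes in pairs and add them,
    # reducing mod 19937 after each addition to avoid overflow.
    total = 0
    for i in range(0, key_length, 2):
        pair = ord(padded[i]) * 100 + ord(padded[i + 1])
        total = (total + pair) % 19937

    # Step 3: divide by the size of the address space and keep the remainder.
    return total % num_addresses


print(fold_and_add_hash("LOWELL", 100))
print(fold_and_add_hash("LOWELL", 101))
```

Python integers do not overflow, so the intermediate modulus is kept only to mirror the arithmetic in the example.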
3. Hashing Functions and Record Distributions
• Distributing Records among Addresses
[Figure: three panels mapping records A-G to addresses 1-10, labeled Best, Worst, and Acceptable]
<Figure 11.3> Different distributions. (a) Uniform distribution (Best) (b) Worst case (c) Random distribution (Acceptable)
• Some Other Hashing Methods
  • Better than random
    • Examine keys for a pattern
    • Divide the key by a prime number
  • Random
    • Square the key and take the middle: 453^2 = 205209 => middle two digits 52
    • Radix transformation
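The "square the key and take the middle" method can be sketched as below. The function name is illustrative, and the rule for locating the middle digits when the square has an odd number of digits is an assumption; the slide only gives the 453 example.

```python
def mid_square_hash(key: int, num_digits: int = 2) -> int:
    """Square the key and return its middle `num_digits` digits as the address."""
    squared = str(key * key)
    # Start index chosen so the extracted digits are centered in the square
    # (leaning left when an exact center is not possible: an assumption).
    start = max((len(squared) - num_digits) // 2, 0)
    return int(squared[start:start + num_digits])


print(mid_square_hash(453))
```

With two middle digits the addresses fall in the range 0-99, so the number of digits extracted determines the size of the address space.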
4. How Much Extra Memory Should Be Used?
• Packing Density
  $\text{packing density} = \dfrac{\text{\# of records}}{\text{\# of spaces}} = \dfrac{r}{N}$
• Example
  r = 75 records
  N = 100 addresses
  $\dfrac{75}{100} = 0.75 = 75\%$
• Predicting Collisions for Different Packing Densities

  Packing density (%)   Synonyms (%)
  10                    4.8
  40                    17.6
  70                    28.1
  90                    34.1
  100                   36.8

<Table 11.2> Effect of packing density on the proportion of records not stored at their home addresses
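The slides do not say how the percentages in Table 11.2 are obtained. One common way to estimate them, assumed here, is to model the number of records hashing to an address as Poisson-distributed with mean r/N and to store one record per address. The sketch below follows that assumption; the function name is illustrative.

```python
import math


def synonym_percentage(packing_density: float) -> float:
    """Estimated percent of records not stored at their home address.

    Assumes a Poisson distribution of records per address with mean r/N
    and one record per address (no buckets).
    """
    fraction_empty = math.exp(-packing_density)   # addresses that receive no record
    fraction_used = 1.0 - fraction_empty          # addresses that hold one home record
    overflow_per_address = packing_density - fraction_used
    return 100.0 * overflow_per_address / packing_density


for density in (0.10, 0.40, 0.70, 0.90, 1.00):
    print(f"{density:.0%}: {synonym_percentage(density):.1f}%")
```

Under this model the proportion of synonyms grows quickly as the file fills, which is the trade-off between extra memory and collisions that the table illustrates.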
5. Collision Resolution by Progressive Overflow
• Progressive Overflow
  • Open addressing
  • Linear probing

  Address   Record
  0
  1
  2         Rosen     <- Novak's home address
  3         Jasper    <- York's home address
  4         York

  Key     h(K) address
  York    3
  Novak   2
• Search Length

  Key     Home address   # of accesses (search length)
  Adams   0              1
  Bates   1              1
  Cole    1              2
  Dean    2              2
  Evans   0              5

  Address   Record
  0         Adams
  1         Bates
  2         Cole
  3         Dean
  4         Evans
  5
• Example
  $\text{average search length} = \dfrac{\text{total search length}}{\text{total number of records}}$
  $\text{average search length} = \dfrac{1 + 1 + 2 + 2 + 5}{5} = 2.2$
<Figure 11.7> Average search length versus packing density in a hashed file
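Progressive overflow and the search lengths in the Adams-Evans example can be reproduced with a small simulation. This is a sketch: the function name is illustrative, and wrapping around from the last address to address 0 is an assumption the slides do not show.

```python
def load_with_progressive_overflow(keys_with_homes, num_addresses):
    """Store each key at its home address or the next free address after it.

    Returns the file (a list of slots) and each key's search length, i.e.
    the number of accesses needed to reach the key from its home address.
    """
    table = [None] * num_addresses
    search_lengths = {}
    for key, home in keys_with_homes:
        address, accesses = home, 1
        while table[address] is not None:
            address = (address + 1) % num_addresses  # probe the next address, wrapping
            accesses += 1
            if accesses > num_addresses:
                raise RuntimeError("file is full")
        table[address] = key
        search_lengths[key] = accesses
    return table, search_lengths


records = [("Adams", 0), ("Bates", 1), ("Cole", 1), ("Dean", 2), ("Evans", 0)]
table, lengths = load_with_progressive_overflow(records, num_addresses=6)
average = sum(lengths.values()) / len(lengths)
print(table)
print(lengths, average)
```

Evans needs five accesses even though only one other key shares its home address, because the synonyms of address 1 and 2 have filled the slots in between.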
6. Storing More Than One Record per Address: Buckets
• Buckets

  Key     Home address
  Green   0
  Hall    0
  Jenks   2
  King    3
  Land    3
  Marks   3
  Nutt    3

  Address   Bucket contents
  0         Green, Hall
  1
  2         Jenks
  3         King, Land, Marks
  4         Nutt (overflow from bucket 3)
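The bucket example above can be sketched as follows. The bucket size of 3 is inferred from the figure (bucket 3 holds three records before Nutt overflows), and resolving bucket overflow by moving to the next bucket is an assumption; the function name is illustrative.

```python
def load_into_buckets(keys_with_homes, num_addresses, bucket_size):
    """Place each key in its home bucket, or in the next bucket with room."""
    buckets = [[] for _ in range(num_addresses)]
    for key, home in keys_with_homes:
        address = home
        for _ in range(num_addresses):
            if len(buckets[address]) < bucket_size:
                buckets[address].append(key)
                break
            address = (address + 1) % num_addresses  # bucket full: try the next one
        else:
            raise RuntimeError("all buckets are full")
    return buckets


records = [("Green", 0), ("Hall", 0), ("Jenks", 2), ("King", 3),
           ("Land", 3), ("Marks", 3), ("Nutt", 3)]
buckets = load_into_buckets(records, num_addresses=5, bucket_size=3)
for address, contents in enumerate(buckets):
    print(address, contents)
```

Green and Hall collide at address 0 but neither becomes an overflow record, which is the benefit of buckets: a collision only costs an extra access once the bucket is full.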
• Effects of Buckets on Performance
  $\text{packing density} = \dfrac{r}{bN}$
  r: # of records, N: # of addresses, b: # of records in a bucket

                                  File without buckets   File with buckets
  # of records                    r = 750                r = 750
  # of addresses                  N = 1000               N = 500
  Bucket size                     b = 1                  b = 2
  Packing density                 0.75                   0.75
  Ratio of records to addresses   r/N = 0.75             r/N = 1.5
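The two columns of the comparison can be checked with the packing density formula r/(bN). A minimal sketch, with an illustrative function name:

```python
def packing_density(num_records: int, num_addresses: int, bucket_size: int = 1) -> float:
    """Packing density r / (b * N): records stored divided by record slots available."""
    return num_records / (bucket_size * num_addresses)


without_buckets = packing_density(750, 1000)             # b = 1
with_buckets = packing_density(750, 500, bucket_size=2)  # b = 2
print(without_buckets, with_buckets, 750 / 500)
```

Both files use the same amount of space and have the same packing density; only the ratio of records to addresses changes.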
<Table 11.4> Synonyms causing collisions as a percent of records for different packing densities and different bucket sizes

  Packing density   Bucket size
                    1      2      5      10
  20 %              9.4    2.2    0.1    0.0
  50 %              21.3   10.4   2.5    0.4
  80 %              31.2   20.4   10.3   5.3
  100 %             36.8   27.1   17.6   12.5
7. Making Deletions

  Key      Home address   Actual address
  Adams    0              0
  Jones    1              1
  Morris   1              2
  Smith    0              3

  Address   Record
  0         Adams
  1         Jones
  2         Morris
  3         Smith
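• (1) Tombstones for Handling Deletions

  Before deletion:
  Address   Record
  0         Adams
  1         Jones
  2         Morris
  3         Smith

  * Deletion of Morris
  Address   Record
  0         Adams
  1         Jones
  2         ###
  3         Smith

  • If address 2 were simply left empty, "Smith cannot be found": a search for Smith starts at home address 0 and stops at the first empty address.
  • ###: tombstone. This mark indicates that a record once lived there but no longer does.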
• (2) Implications of Tombstones for Insertions
  • Inserting “Smith”
• (3) Effects of Deletions and Additions on Performance
  • Solution to problem of deteriorating average search length
    • Reorganization
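The role of tombstones in searching, deleting, and inserting can be sketched as below. The function names are illustrative, the key "Mills" is hypothetical, and the sketch assumes linear probing with wrap-around and that no real key equals the tombstone marker.

```python
EMPTY = None
TOMBSTONE = "###"


def search(table, key, home):
    """Probe from the home address; stop at an empty slot, skip tombstones."""
    n = len(table)
    for step in range(n):
        address = (home + step) % n
        if table[address] is EMPTY:
            return -1          # a truly empty slot ends the search
        if table[address] == key:
            return address
    return -1


def delete(table, key, home):
    """Replace the record with a tombstone so later searches keep probing."""
    address = search(table, key, home)
    if address != -1:
        table[address] = TOMBSTONE
    return address


def insert(table, key, home):
    """Check the whole probe sequence for a duplicate before reusing a slot."""
    if search(table, key, home) != -1:
        return -1              # key already in the file: do not insert a duplicate
    n = len(table)
    for step in range(n):
        address = (home + step) % n
        if table[address] is EMPTY or table[address] == TOMBSTONE:
            table[address] = key
            return address
    raise RuntimeError("file is full")


# Deleting Morris by emptying the slot hides Smith (home address 0).
naive = ["Adams", "Jones", EMPTY, "Smith"]
print(search(naive, "Smith", 0))

# Deleting Morris with a tombstone keeps Smith reachable.
table = ["Adams", "Jones", "Morris", "Smith"]
delete(table, "Morris", 1)
print(table, search(table, "Smith", 0))
```

The duplicate check in `insert` is why inserting “Smith” matters: without it, Smith would be written into the tombstoned slot at address 2 while the original copy still sits at address 3.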
8. Other Collision Resolution Techniques
• (1) Double Hashing
  • Second hashing function
  • Adding an increment (c)
  • Seek time overhead
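The slide gives only an outline of double hashing, so the following is a sketch under stated assumptions: a second hash function produces an increment c, and c is added repeatedly to the home address (mod N). The particular `second_hash` formula is hypothetical, and the number of addresses is assumed to be prime so that every address is eventually probed.

```python
def second_hash(key_number: int, num_addresses: int) -> int:
    """Hypothetical second hash: an increment c in the range 1 .. N-1."""
    return 1 + key_number % (num_addresses - 1)


def probe_sequence(home: int, increment: int, num_addresses: int):
    """Addresses visited: home, home + c, home + 2c, ... (all mod N)."""
    return [(home + i * increment) % num_addresses for i in range(num_addresses)]


sequence = probe_sequence(home=3, increment=4, num_addresses=7)
print(sequence)
```

Synonyms with different increments follow different probe sequences, which avoids clustering, but the probed addresses are scattered across the file, hence the seek time overhead noted above.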
• (2) Chained Progressive Overflow

  Key     Home address   Actual address   Search length (1)   Search length (2)
  Adams   0              0                1                   1
  Bates   1              1                1                   1
  Cole    0              2                3                   2
  Dean    1              3                3                   2
  Evans   4              4                1                   1
  Flint   0              5                6                   3

  (1) Progressive overflow without chaining  (2) Chained progressive overflow

  Address   Record   Address of next synonym
  0         Adams    2
  1         Bates    3
  2         Cole     5
  3         Dean     -1
  4         Evans    -1
  5         Flint    -1
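The search lengths for the chained file can be recomputed by following the synonym links from each key's home address. This is a minimal sketch; the variable and function names are illustrative.

```python
records = ["Adams", "Bates", "Cole", "Dean", "Evans", "Flint"]
next_synonym = [2, 3, 5, -1, -1, -1]   # -1 marks the end of a chain
homes = {"Adams": 0, "Bates": 1, "Cole": 0, "Dean": 1, "Evans": 4, "Flint": 0}


def chained_search_length(key, home):
    """Count accesses while following synonym links from the home address."""
    address, accesses = home, 0
    while address != -1:
        accesses += 1
        if records[address] == key:
            return accesses
        address = next_synonym[address]
    return None   # key is not on the chain


lengths = {key: chained_search_length(key, home) for key, home in homes.items()}
average = sum(lengths.values()) / len(lengths)
print(lengths, round(average, 2))
```

Only synonyms are visited on the way to a record, so Flint is reached in 3 accesses instead of the 6 needed when every intervening address is probed.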
• (3) Chaining with a Separate Overflow Area

  Primary data area
  Home address   Record   Link to overflow area
  0              Adams    0
  1              Bates    1
  2
  3
  4              Evans    -1

  Overflow area
  Address   Record   Link
  0         Cole     2
  1         Dean     -1
  2         Flint    -1
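• (4) Scatter Tables: Indexing Revisited
[Figure: a scatter table. A hashed index with addresses 0-4 holds pointers into a separate data file containing the records Adams, Bates, Cole, Dean, Evans, and Flint; each record carries a link to the next synonym, with -1 marking the end of a chain.]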
9. Patterns of Record Access
• A small percentage of the records in a file account for a large percentage of the accesses: 80/20 Rule
  • 80% of the accesses are performed on 20% of the records