
St. Vincent Pallotti College of Engineering & Technology

Department of Computer Engineering

Academic Year: 2019-20

Subject Code: BECME402T
Course: File Structure and Data Processing

Semester: IV Sem. CE
Course Teacher: S.M. Wanjari


Contents
• Introduction
• A Simple Hashing Algorithm
• Hashing Functions and Record Distributions
• How Much Extra Memory Should Be Used?
• Collision Resolution by Progressive Overflow
• Storing More Than One Record per Address: Buckets
• Making Deletions
• Other Collision Resolution Techniques
• Patterns of Record Access


1. Introduction
• O-notation
  - O(1): constant time, the goal of hashing
  - O(N): sequential searching
  - O(log₂ N): binary searching
  - O(logₖ N): B-tree
• What is hashing?
  - a = h(K)
  - h: hash function, K: key, a: home address
• Example: K = BASS, h(K) = (ASCII code of first character * ASCII code of second character) mod 1000
  - a = h(K) = (66 * 65) mod 1000 = 4,290 mod 1000 = 290
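The BASS example can be written as a few lines of Python. This is a minimal sketch of the toy hash function described above; the function name `simple_hash` and the parameter `n_addresses` are illustrative, not from the slides.

```python
def simple_hash(key: str, n_addresses: int = 1000) -> int:
    """Toy hash from the BASS example: multiply the ASCII codes of the
    first two characters and keep the remainder mod the address-space size."""
    return (ord(key[0]) * ord(key[1])) % n_addresses


print(simple_hash("BASS"))  # 66 * 65 = 4290 -> 290
```

Only the first two characters influence the result, which is why this function is useful for illustration but is a poor hash in practice.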


Introduction
• Collision: two different keys hash to the same home address; such keys are called synonyms
  - Example: LOWELL => a = (76 * 79) mod 1000 = 6,004 mod 1000 = 4
             OLIVIER => a = (79 * 76) mod 1000 = 6,004 mod 1000 = 4
• Several ways to reduce the number of collisions:
  1. Spread out the records (good hashing algorithms)
  2. Use extra memory
  3. Put more than one record at a single address (buckets)
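To see the collision concretely, the sketch below groups keys by their home address and keeps only addresses that receive more than one key, i.e. the synonyms. It reuses the same toy hash (product of the first two ASCII codes mod 1000); `find_synonyms` is a hypothetical helper name.

```python
from collections import defaultdict


def simple_hash(key: str, n_addresses: int = 1000) -> int:
    # Same toy hash as the BASS example.
    return (ord(key[0]) * ord(key[1])) % n_addresses


def find_synonyms(keys: list[str]) -> dict[int, list[str]]:
    """Group keys by home address; keep only addresses with a collision."""
    by_address: dict[int, list[str]] = defaultdict(list)
    for key in keys:
        by_address[simple_hash(key)].append(key)
    return {addr: ks for addr, ks in by_address.items() if len(ks) > 1}


print(find_synonyms(["BASS", "LOWELL", "OLIVIER"]))  # {4: ['LOWELL', 'OLIVIER']}
```

Because multiplication is commutative, any two keys whose first two letters are swapped (LO/OL) always collide under this function.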


2. A Simple Hashing Algorithm
• Three steps:
  1. Represent the key in numerical form
  2. Fold and add
  3. Divide by a prime number and use the remainder as the address
• Example
  Step 1. Represent the key in numerical form (ASCII codes, padded with blanks to 12 characters)
  LOWELL = 76 79 87 69 76 76 | 32 32 32 32 32 32
           L  O  W  E  L  L  | blanks


A Simple Hashing Algorithm
• Example
  Step 2. Fold and add
  Fold the codes into pairs: 76 79 | 87 69 | 76 76 | 32 32 | 32 32 | 32 32
  7679 + 8769 + 7676 + 3232 + 3232 = 30588
  Adding the last pair would give 30588 + 3232 = 33820, which exceeds 32767, the maximum value of a 2-byte integer. To avoid overflow, each intermediate sum is reduced mod 19937 (a prime):
  7679 + 8769 = 16448   => 16448 mod 19937 = 16448
  16448 + 7676 = 24124  => 24124 mod 19937 = 4187
  4187 + 3232 = 7419    => 7419 mod 19937 = 7419
  7419 + 3232 = 10651   => 10651 mod 19937 = 10651
  10651 + 3232 = 13883  => 13883 mod 19937 = 13883
  Step 3. Divide by the size of the address space
  a = s mod n (s: folded sum, n: number of addresses in the file)
  a = 13883 mod 100 = 83
  a = 13883 mod 101 = 46
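The three steps can be sketched in Python as follows. This is an illustrative version, not the course's reference code. It assumes keys are uppercased and padded or truncated to 12 characters with blanks, as in the LOWELL example. For such keys every ASCII code has two digits, so `codes[i] * 100 + codes[i + 1]` matches the "76 79 -> 7679" reading of each pair.

```python
def fold_and_add_hash(key: str, n_addresses: int) -> int:
    # Step 1: numerical form. Assumption: uppercase key, padded/truncated
    # to 12 characters with blanks (ASCII 32), as in the LOWELL example.
    padded = key.upper().ljust(12)[:12]
    codes = [ord(ch) for ch in padded]

    # Step 2: fold into character pairs (76, 79 -> 7679) and add.
    # Each partial sum is reduced mod 19937; for uppercase letters and
    # blanks this keeps every intermediate value below 32767, mirroring
    # the 2-byte limit discussed above (Python ints never overflow).
    total = 0
    for i in range(0, len(codes), 2):
        pair = codes[i] * 100 + codes[i + 1]
        total = (total + pair) % 19937

    # Step 3: the remainder modulo the address-space size is the address.
    return total % n_addresses


print(fold_and_add_hash("LOWELL", 100))  # 83
print(fold_and_add_hash("LOWELL", 101))  # 46
```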


3. Hashing Functions and Record Distributions
• Distributing records among addresses

<Figure 11.3> Different distributions of records A to G among addresses 1 to 10. (a) Uniform distribution (best): every record has its own address. (b) Worst case: all records share the same address. (c) Random distribution (acceptable): a few collisions.


Hashing Functions and Record Distributions
• Some other hashing methods
• Better than random
  - Examine keys for a pattern
  - Divide the key by a prime number
• Random
  - Square the key and take the middle: 453² = 205209 => middle two digits = 52
  - Radix transformation
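Two of the "random" methods are sketched below. `mid_square` takes the middle digits of the squared key, reproducing the 453 example. `radix_transform` rewrites the key in another base, reads those digits as a decimal number, and reduces it modulo the address-space size. The base 11 and modulus 99 are illustrative assumptions, not values from the slides.

```python
def mid_square(key: int, n_digits: int = 2) -> int:
    """Square the key and take the middle n_digits digits as the address."""
    square = str(key * key)
    start = (len(square) - n_digits) // 2
    return int(square[start:start + n_digits])


def radix_transform(key: int, base: int = 11, n_addresses: int = 99) -> int:
    """Write key in `base`, read those digits as a decimal number,
    then take the remainder modulo the number of addresses."""
    value, place = 0, 1
    while key > 0:
        value += (key % base) * place   # next base-`base` digit, placed as decimal
        key //= base
        place *= 10
    return value % n_addresses


print(mid_square(453))       # 205209 -> 52
print(radix_transform(453))  # 453 is 382 in base 11; 382 mod 99 = 85
```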


4. How Much Extra Memory Should Be Used?
• Packing density = (number of records) / (number of spaces) = r / N
• Example
  r = 75 records, N = 100 addresses
  Packing density = 75 / 100 = 0.75 = 75%


How Much Extra Memory Should Be Used?
• Predicting collisions for different packing densities

  Packing density (%)   Synonyms (%)
  10                    4.8
  40                    17.6
  70                    28.1
  90                    34.1
  100                   36.8

<Table 11.2> Effect of packing density on the proportion of records not stored at their home addresses
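The percentages in Table 11.2 are consistent with a Poisson model of random hashing. With r records hashed uniformly into N addresses, the probability that a given address receives x records is p(x) = (r/N)^x e^(-r/N) / x!. Each address that receives at least one record keeps exactly one of them at home. The fraction of records stored elsewhere is therefore 1 - N(1 - p(0)) / r. A minimal sketch, assuming uniformly random hashing:

```python
import math


def percent_not_at_home(packing_density: float) -> float:
    """Percent of records not stored at their home address (one record
    per address), assuming a Poisson model of random hashing."""
    lam = packing_density        # r/N: average number of records per address
    p0 = math.exp(-lam)          # probability that an address receives no record
    # Every address that receives >= 1 record stores one of them at home.
    return 100 * (1 - (1 - p0) / lam)


for d in (0.10, 0.40, 0.70, 0.90, 1.00):
    print(f"{d:.0%}: {percent_not_at_home(d):.1f}")
```

Rounded to one decimal, the printed values match the table row by row.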


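5. Collision Resolution by Progressive Overflow
• Progressive overflow: a form of open addressing, also called linear probing. If a record's home address is occupied, the following addresses are tried in order until an empty one is found.

  Address   Record
  0         (empty)
  1         (empty)
  2         Rosen    <- Novak's home address (h(Novak) = 2)
  3         Jasper   <- York's home address (h(York) = 3)
  4         York

• York hashes to address 3, which Jasper occupies, so York is stored at address 4.
• Novak hashes to address 2 (Rosen), and addresses 3 and 4 are also full. If address 4 is the last address in the file, the search wraps around to address 0.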
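A minimal sketch of progressive overflow with wrap-around. The home addresses come from the figure. The insertion order (Rosen and Jasper before York, York before Novak) is an assumption consistent with it. The `insert` function is illustrative.

```python
def insert(table: list, key: str, home: int) -> int:
    """Progressive overflow: store key at its home address or the next
    free address, wrapping from the last address back to 0."""
    n = len(table)
    for step in range(n):
        addr = (home + step) % n
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("file is full")


# Five addresses (0..4); home addresses as in the figure.
table = [None] * 5
for key, home in [("Rosen", 2), ("Jasper", 3), ("York", 3), ("Novak", 2)]:
    insert(table, key, home)
print(table)  # ['Novak', None, 'Rosen', 'Jasper', 'York']
```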


Collision Resolution by Progressive Overflow
• Search length: the number of accesses required to retrieve a record

  Key     Home address   Actual address   Search length
  Adams   0              0                1
  Bates   1              1                1
  Cole    1              2                2
  Dean    2              3                2
  Evans   0              4                5


Collision Resolution by Progressive Overflow
• Search length
• Example
  Average search length = (total search length) / (total number of records)
                        = (1 + 1 + 2 + 2 + 5) / 5 = 2.2

<Figure 11.7> Average search length versus packing density in a hashed file
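The search lengths and their average can be recomputed by loading the five records with progressive overflow and counting the addresses read for each one. This is a sketch. The six-address file size is an assumption; any size of five or more gives the same result here.

```python
def load(records: list[tuple[str, int]], n: int) -> list:
    """Progressive overflow: place each (key, home) at home or the next
    free address, wrapping around. Slots hold (key, home) or None."""
    table = [None] * n
    for key, home in records:
        for step in range(n):
            addr = (home + step) % n
            if table[addr] is None:
                table[addr] = (key, home)
                break
        else:
            raise RuntimeError("file is full")
    return table


def search_length(table: list, key: str, home: int) -> int:
    """Number of addresses read, starting at home, until key is found."""
    n = len(table)
    for step in range(n):
        slot = table[(home + step) % n]
        if slot is None:
            break                      # empty address: key is not in the file
        if slot[0] == key:
            return step + 1
    raise KeyError(key)


records = [("Adams", 0), ("Bates", 1), ("Cole", 1), ("Dean", 2), ("Evans", 0)]
table = load(records, 6)
lengths = [search_length(table, k, h) for k, h in records]
print(lengths)                         # [1, 1, 2, 2, 5]
print(sum(lengths) / len(lengths))     # 2.2
```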


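6. Storing More Than One Record per Address: Buckets
• Buckets: each address (bucket) can hold several records; in this example a bucket holds three records

  Key     Home address
  Green   0
  Hall    0
  Jenks   2
  King    3
  Land    3
  Marx    3
  Nutt    3

  Bucket address   Bucket contents
  0                Green, Hall
  1                (empty)
  2                Jenks
  3                King, Land, Marx
  4                Nutt (overflow: bucket 3 is full)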
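A minimal sketch of loading the bucket example. It assumes a bucket size of three, which is what the figure implies. It also assumes that a record whose home bucket is full overflows progressively into the next bucket with room.

```python
BUCKET_SIZE = 3  # assumption: three records per bucket, as in the example


def insert(buckets: list[list[str]], key: str, home: int) -> int:
    """Store key in its home bucket, or progressively in the next bucket
    with free space (wrapping around). Returns the bucket address used."""
    n = len(buckets)
    for step in range(n):
        addr = (home + step) % n
        if len(buckets[addr]) < BUCKET_SIZE:
            buckets[addr].append(key)
            return addr
    raise RuntimeError("file is full")


buckets = [[] for _ in range(5)]
records = [("Green", 0), ("Hall", 0), ("Jenks", 2), ("King", 3),
           ("Land", 3), ("Marx", 3), ("Nutt", 3)]
for key, home in records:
    insert(buckets, key, home)
print(buckets)
# [['Green', 'Hall'], [], ['Jenks'], ['King', 'Land', 'Marx'], ['Nutt']]
```

Only Nutt is stored away from its home address, even though four keys share address 3.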


Storing More Than One Record per Address: Buckets
• Effects of buckets on performance
  Packing density = r / (bN)
  r: number of records, N: number of addresses, b: number of records in a bucket

                                  File without buckets   File with buckets
  Number of records               r = 750                r = 750
  Number of addresses             N = 1000               N = 500
  Bucket size                     b = 1                  b = 2
  Packing density                 0.75                   0.75
  Ratio of records to addresses   r/N = 0.75             r/N = 1.5
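The two columns of the table can be checked with a one-line helper. This is an illustrative sketch; with the default `bucket_size=1` it reduces to the earlier r / N definition.

```python
def packing_density(r: int, n_addresses: int, bucket_size: int = 1) -> float:
    """r / (b * N): fraction of the available record slots that are filled."""
    return r / (bucket_size * n_addresses)


print(packing_density(750, 1000))     # 0.75  (no buckets)
print(packing_density(750, 500, 2))   # 0.75  (buckets of two)
print(750 / 500)                      # 1.5   records per address
```

The packing density is the same in both files, but the buckets file has more records per address, so each home address can absorb more synonyms.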


Storing More Than One Record per Address: Buckets
<Table 11.4> Synonyms causing collisions as a percent of records for different packing densities and different bucket sizes

  Packing density   Bucket size
                    1      2      5      10
  20%               9.4    2.2    0.1    0.0
  50%               21.3   10.4   2.5    0.4
  80%               31.2   20.4   10.3   5.3
  100%              36.8   27.1   17.6   12.5
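The entries in Table 11.4 are consistent with the same Poisson model, extended to buckets. With bucket size b, the mean number of records per address is r/N = b × packing density. An address that receives x > b records overflows x - b of them. The sketch below computes that expected overflow as a percent of all records. It assumes uniformly random hashing, and `percent_overflow` is a hypothetical name.

```python
import math


def percent_overflow(packing_density: float, bucket_size: int) -> float:
    """Percent of records that do not fit in their home bucket, assuming
    keys hash uniformly at random (Poisson with mean r/N per address)."""
    lam = packing_density * bucket_size      # r/N = b * (r / bN)
    # Expected overflow per address = sum over x > b of (x - b) * p(x),
    # computed as E[X] - b + sum over x < b of (b - x) * p(x) (a finite sum).
    p = math.exp(-lam)                       # p(0)
    shortfall = 0.0
    for x in range(bucket_size):
        shortfall += (bucket_size - x) * p
        p *= lam / (x + 1)                   # p(x + 1) from p(x)
    overflow_per_address = lam - bucket_size + shortfall
    return 100 * overflow_per_address / lam


for density in (0.2, 0.5, 0.8, 1.0):
    row = [round(percent_overflow(density, b), 1) for b in (1, 2, 5, 10)]
    print(f"{density:.0%}", row)
```

With b = 1 this reduces to the formula behind Table 11.2.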


7. Making Deletions

  Key      Home address   Actual address
  Adams    0              0
  Jones    1              1
  Morris   1              2
  Smith    0              3

  Address   Record
  0         Adams
  1         Jones
  2         Morris
  3         Smith


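Making Deletions
• (1) Tombstones for handling deletions
  Before deletion:                  0 Adams | 1 Jones | 2 Morris | 3 Smith
  * Deletion of Morris
  If address 2 is simply emptied:   0 Adams | 1 Jones | 2 (empty) | 3 Smith
  "Smith cannot be found": a search for Smith (home address 0) stops at the empty address 2.
  With a tombstone:                 0 Adams | 1 Jones | 2 ### | 3 Smith
  ### : tombstone. This mark indicates that a record once lived there but no longer does, so a search continues past it.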
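The sketch below contrasts the two ways of deleting Morris under progressive overflow. Emptying the address breaks Smith's probe path, while a tombstone keeps it intact. Representing an empty address as `None` and a tombstone as the string `"###"` is an illustrative assumption.

```python
EMPTY, TOMBSTONE = None, "###"


def search(table: list, key: str, home: int) -> int:
    """Return the address holding key, or -1. A tombstone does not end the
    search; only an address that has never held a record (EMPTY) does."""
    n = len(table)
    for step in range(n):
        addr = (home + step) % n
        if table[addr] is EMPTY:
            return -1
        if table[addr] == key:
            return addr
    return -1


def delete(table: list, key: str, home: int) -> bool:
    """Replace the record with a tombstone instead of emptying its address."""
    addr = search(table, key, home)
    if addr == -1:
        return False
    table[addr] = TOMBSTONE
    return True


# Simply emptying Morris's address breaks the probe path to Smith (home 0):
emptied = ["Adams", "Jones", EMPTY, "Smith"]
print(search(emptied, "Smith", 0))    # -1: Smith cannot be found

# With a tombstone the search continues past address 2:
table = ["Adams", "Jones", "Morris", "Smith"]
delete(table, "Morris", 1)
print(table)                          # ['Adams', 'Jones', '###', 'Smith']
print(search(table, "Smith", 0))      # 3
```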


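Making Deletions
• (2) Implications of tombstones for insertions
  - Inserting "Smith": the first reusable slot on Smith's probe path is the tombstone at address 2, but Smith is already stored at address 3. An insertion routine that stopped at the first tombstone would create a duplicate Smith, so it must search past tombstones to confirm the key is not already in the file.
• (3) Effects of deletions and additions on performance
  - Tombstones build up after many deletions and additions, and every search must pass over them, so the average search length deteriorates
  - Solution to the problem of deteriorating average search length: reorganization (for example, rehashing all records into a fresh file)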
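A minimal sketch of a tombstone-aware insertion under progressive overflow. It remembers the first tombstone it passes but keeps searching until an empty address, so that a duplicate such as the second "Smith" is detected. The `EMPTY`/`TOMBSTONE` markers and the key "Brown" are hypothetical.

```python
EMPTY, TOMBSTONE = None, "###"


def insert(table: list, key: str, home: int) -> int:
    """Insert key, reusing the first tombstone on its probe path, but only
    after checking (past any tombstones) that key is not already stored."""
    n = len(table)
    first_free = -1
    for step in range(n):
        addr = (home + step) % n
        slot = table[addr]
        if slot == key:
            raise KeyError(f"duplicate key {key!r} at address {addr}")
        if slot == TOMBSTONE and first_free == -1:
            first_free = addr          # remember it, but keep searching
        if slot is EMPTY:
            if first_free == -1:
                first_free = addr
            break                      # key cannot appear beyond an empty address
    if first_free == -1:
        raise RuntimeError("file is full")
    table[first_free] = key
    return first_free


table = ["Adams", "Jones", TOMBSTONE, "Smith", EMPTY]
try:
    insert(table, "Smith", 0)
except KeyError as err:
    print("rejected:", err)
print(insert(table, "Brown", 0))   # hypothetical new key, home 0 -> reuses address 2
```

The cost of this safety check is that every insertion walks the whole probe run, tombstones included, which is one reason reorganization becomes worthwhile.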


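8. Other Collision Resolution Techniques
• (1) Double hashing
  - When a collision occurs, apply a second hashing function to the key to produce an increment c
  - Add c to the home address repeatedly (modulo the number of addresses) until a free address is found
  - Drawback: successive probes are scattered across the file, which adds seek-time overhead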
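A minimal double-hashing sketch with integer keys. The hash functions `h1` and `h2` and the file size N = 11 are hypothetical choices. N is prime, so every increment c from 1 to 10 eventually visits every address.

```python
N = 11  # assumption: a prime number of addresses


def h1(key: int) -> int:
    return key % N                 # hypothetical first hash: home address


def h2(key: int) -> int:
    return 1 + key % (N - 1)       # hypothetical second hash: increment c (never 0)


def insert(table: list, key: int) -> int:
    """Probe home, home + c, home + 2c, ... (mod N) until a free address."""
    home, c = h1(key), h2(key)
    for i in range(N):
        addr = (home + i * c) % N
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("file is full")


table = [None] * N
for key in (14, 25, 36):
    print(key, "->", insert(table, key))
```

All three keys share home address 3, but they land at 3, 9 and 10 because each uses a different increment (5, 6 and 7). Linear probing would place them at 3, 4 and 5. The spread-out probes are what cause the extra seek time.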


Other Collision Resolution Techniques
• (2) Chained progressive overflow: each record carries a link to the next record with the same home address
  Search length (1): plain progressive overflow; search length (2): chained progressive overflow

  Key     Home address   Actual address   Search length (1)   Search length (2)
  Adams   0              0                1                   1
  Bates   1              1                1                   1
  Cole    0              2                3                   2
  Dean    1              3                3                   2
  Evans   4              4                1                   1
  Flint   0              5                6                   3

  Address   Record   Next
  0         Adams    2
  1         Bates    3
  2         Cole     5
  3         Dean     -1
  4         Evans    -1
  5         Flint    -1
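A sketch of chained progressive overflow that rebuilds the Next column and the chained search lengths from the table. Free slots are found by progressive overflow, and the new record is linked to the end of its home address's chain. Limitation: the sketch assumes a home address is either empty or holds a record of the same chain, which holds for this data.

```python
def insert(table: list, key: str, home: int) -> int:
    """Chained progressive overflow. Each slot is None or [key, home, next]."""
    n = len(table)
    if table[home] is None:
        table[home] = [key, home, -1]
        return home
    if table[home][1] != home:
        # Sketch limitation: home slot holds a record from another chain.
        raise NotImplementedError("home address occupied by a different chain")
    addr = home
    for _ in range(n - 1):               # progressive overflow for a free slot
        addr = (addr + 1) % n
        if table[addr] is None:
            break
    else:
        raise RuntimeError("file is full")
    table[addr] = [key, home, -1]
    last = home                          # append to the end of the chain
    while table[last][2] != -1:
        last = table[last][2]
    table[last][2] = addr
    return addr


def search_length(table: list, key: str, home: int) -> int:
    """Records read while following the chain from the home address."""
    addr, reads = home, 0
    while addr != -1 and table[addr] is not None:
        reads += 1
        if table[addr][0] == key:
            return reads
        addr = table[addr][2]
    raise KeyError(key)


table = [None] * 6
records = [("Adams", 0), ("Bates", 1), ("Cole", 0),
           ("Dean", 1), ("Evans", 4), ("Flint", 0)]
for key, home in records:
    insert(table, key, home)
print([slot[2] for slot in table])                       # [2, 3, 5, -1, -1, -1]
print([search_length(table, k, h) for k, h in records])  # [1, 1, 2, 2, 1, 3]
```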


Other Collision Resolution Techniques
• (3) Chaining with a separate overflow area

  Primary data area                  Overflow area
  Home address   Record    Next      Address   Record   Next
  0              Adams     0         0         Cole     2
  1              Bates     1         1         Dean     -1
  2              (empty)             2         Flint    -1
  3              (empty)
  4              Evans     -1
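A minimal sketch of chaining with a separate overflow area. A record whose home slot is taken is appended to the overflow area, and the home record's chain is extended to point at it. Loading the six records in the order shown reproduces both areas of the figure. The list-of-lists layout is an illustrative assumption.

```python
def insert(primary: list, overflow: list, key: str, home: int) -> None:
    """Primary slots are None or [key, next]; next indexes the overflow
    area. Overflow records are appended in arrival order as [key, next]."""
    if primary[home] is None:
        primary[home] = [key, -1]
        return
    overflow.append([key, -1])
    new = len(overflow) - 1
    node = primary[home]                 # walk to the end of this home's chain
    while node[1] != -1:
        node = overflow[node[1]]
    node[1] = new


primary, overflow = [None] * 5, []
for key, home in [("Adams", 0), ("Bates", 1), ("Cole", 0),
                  ("Dean", 1), ("Evans", 4), ("Flint", 0)]:
    insert(primary, overflow, key, home)
print(primary)   # [['Adams', 0], ['Bates', 1], None, None, ['Evans', -1]]
print(overflow)  # [['Cole', 2], ['Dean', -1], ['Flint', -1]]
```

Unlike progressive overflow, Cole and Dean never occupy addresses 2 and 3, so later records hashing there still find their home addresses free.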


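Other Collision Resolution Techniques
• (4) Scatter tables: indexing revisited
  - The hashed file holds no records, only pointers (an index) to records kept in a separate data file
  - Records with the same home address are chained together in the data file; -1 ends a chain

<Figure> Scatter table: hashed addresses 0 to 4 point to chains of the records Adams, Bates, Cole, Dean, Evans and Flint in the data file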
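A minimal scatter-table sketch. `scatter[home]` stores only the data-file index of the first record in that home address's chain, and records are appended to the data file in arrival order. Linking each new record at the front of its chain is a design choice of this sketch, not necessarily the arrangement in the figure. The home addresses reuse the earlier chaining example.

```python
def insert(scatter: list, data_file: list, key: str, home: int) -> int:
    """Append [key, next] to data_file and make it the head of its chain.
    scatter[home] is the data-file index of the chain's first record, or -1."""
    data_file.append([key, scatter[home]])
    scatter[home] = len(data_file) - 1
    return scatter[home]


def find(scatter: list, data_file: list, key: str, home: int) -> int:
    """Follow the chain for home; return key's data-file index, or -1."""
    i = scatter[home]
    while i != -1:
        if data_file[i][0] == key:
            return i
        i = data_file[i][1]
    return -1


scatter, data_file = [-1] * 5, []
for key, home in [("Adams", 0), ("Bates", 1), ("Cole", 0),
                  ("Dean", 1), ("Evans", 4), ("Flint", 0)]:
    insert(scatter, data_file, key, home)
print(scatter)                                   # [5, 3, -1, -1, 4]
print(find(scatter, data_file, "Adams", 0))      # 0
```

Because the hashed part holds only small pointers, the data file itself can be organized however suits the application, for example in entry order as here.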


Patterns of Record Access
• A small percentage of the records in a file account for a large percentage of the accesses: the 80/20 rule
  - 80% of the accesses are performed on 20% of the records