© Neeraj Suri EU-NSF ICT March 2006 Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de Introduction to Computer Science 2 Hash Tables (2) Prof. Neeraj Suri Dan Dobre


© Neeraj Suri EU-NSF ICT March 2006

Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de

Introduction to Computer Science 2

Hash Tables (2)

Prof. Neeraj Suri, Dan Dobre


Overview

So far:
• Direct hashing
• Hash functions (folding, modulo, etc.)
• Collision resolution (linear and quadratic probing)

What's next?
• Collision resolution, continued
• Cost analysis of hashing
• Hashing on external memory
• Extendible (dynamic) hashing
• Excursus: (pseudo-)random numbers and their applications


Double/repeated Hashing

If a collision occurs, the key is hashed a second time using another hash function.

This can be generalized: whenever a collision occurs, the key is hashed again using the next hash function.

If the collision persists after all k hash functions have been tried, another technique has to be applied.

Properties:
• Avoids the accumulation of collisions
• Delete remains complex
• Reaching the entire memory space is problematic
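
The procedure above can be sketched in a few lines. This is only an illustration: the three hash functions, the table size 7, and the name `repeated_hash_insert` are assumptions chosen for the example and do not come from the slides.

```python
def repeated_hash_insert(table, key, hash_functions):
    """Try each hash function in turn; return the slot used, or None."""
    for h in hash_functions:
        pos = h(key) % len(table)
        if table[pos] is None or table[pos] == key:
            table[pos] = key
            return pos
    # All k hash functions collided: another technique has to be applied.
    return None


# Hypothetical family of k = 3 hash functions for a table with 7 slots.
hash_functions = [
    lambda k: k % 7,
    lambda k: (k // 7) % 7,
    lambda k: (3 * k + 1) % 7,
]

table = [None] * 7
slots = [repeated_hash_insert(table, k, hash_functions) for k in (11, 32, 25, 46)]
print(slots)
```

With these assumed functions the last key collides under all three of them, which is exactly the case where the slide says another technique is required.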


Chaining of synonyms in the same HT

• Members of a collision class are chained.
• Each memory slot in the HT must have an additional pointer.
• Because there is no separate overflow area, collisions continue to occur due to foreign occupation.
• Chaining does not prevent collisions, but it facilitates the search.
• Delete becomes considerably easier, because only one pointer has to be reset.
• Insert requires following the pointer list until a free place is found.
• If the home address is occupied by another key (which does not belong there), move that key.


Chaining: Example

$h_0(K) = K \bmod 7$; $h_i(K) = (h_0(K) + i) \bmod m$

Insert: 11, 32, 8, 25

Position   0    1    2    3    4    5    6
Key        -    8    -    -    11   32   25
Pointer    -    -    -    -    5    6    -


Chaining: Example

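$h_0(K) = K \bmod 7$; $h_i(K) = (h_0(K) + i) \bmod m$

Now insert 12: its home address 5 is occupied by 32, which does not belong there (the home address of 32 is 4).

Move 32: search left for the pointer that refers to address 5, then move 32 further to position 0.

Table before the move:

Position   0    1    2    3    4    5    6
Key        -    8    -    -    11   32   25
Pointer    -    -    -    -    5    6    -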


Chaining: Example

$h_0(K) = K \bmod 7$; $h_i(K) = (h_0(K) + i) \bmod m$

Now insert 12 in its home address

Position   0    1    2    3    4    5    6
Key        32   8    -    -    11   -    25
Pointer    6    -    -    -    0    -    -


Chaining: Example

$h_0(K) = K \bmod 7$; $h_i(K) = (h_0(K) + i) \bmod m$

Delete 11:
• Follow the chain until 25 is reached (4-0-6)
• Move 25 to its home address 4
• Delete the pointer "6" in address 0

Position   0    1    2    3    4    5    6
Key        32   8    -    -    11   12   25
Pointer    6    -    -    -    0    -    -


Chaining: Example

$h_0(K) = K \bmod 7$; $h_i(K) = (h_0(K) + i) \bmod m$

• The collision chain up to 32 is now broken (address 6 is empty)
• But this is not a problem, since pointers are used for chaining

Position   0    1    2    3    4    5    6
Key        32   8    -    -    25   12   -
Pointer    -    -    -    -    0    -    -
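
The insert steps of this example can be replayed with a small sketch. The class name `ChainedTable`, the attribute names, and the choice to look for free slots by linear probing from the current position are assumptions made for illustration; they are consistent with the tables above but are not taken from the slides. Duplicate keys and deletion are not handled.

```python
class ChainedTable:
    """Chaining of synonyms inside the hash table itself (sketch)."""

    def __init__(self, m=7):
        self.m = m
        self.keys = [None] * m   # stored keys
        self.ptr = [None] * m    # pointer to the next member of the chain

    def _free_slot(self, start):
        # Assumption: free places are found by linear probing from `start`.
        for i in range(1, self.m + 1):
            pos = (start + i) % self.m
            if self.keys[pos] is None:
                return pos
        raise OverflowError("hash table is full")

    def _relocate(self, pos):
        # The key at `pos` is foreign: find its predecessor in its own chain,
        # move the record (with its pointer) and redirect the predecessor.
        pred = self.keys[pos] % self.m
        while self.ptr[pred] != pos:
            pred = self.ptr[pred]
        free = self._free_slot(pos)
        self.keys[free], self.ptr[free] = self.keys[pos], self.ptr[pos]
        self.ptr[pred] = free
        self.keys[pos], self.ptr[pos] = None, None

    def insert(self, key):
        home = key % self.m
        if self.keys[home] is not None and self.keys[home] % self.m != home:
            self._relocate(home)             # home occupied by a foreign key
        if self.keys[home] is None:
            self.keys[home] = key
            return
        pos = home                           # follow the chain to its end
        while self.ptr[pos] is not None:
            pos = self.ptr[pos]
        free = self._free_slot(pos)
        self.keys[free] = key
        self.ptr[pos] = free

    def search(self, key):
        pos = key % self.m
        if self.keys[pos] is None or self.keys[pos] % self.m != pos:
            return None                      # no chain starts at this address
        while pos is not None:
            if self.keys[pos] == key:
                return pos
            pos = self.ptr[pos]
        return None


table = ChainedTable()
for key in (11, 32, 8, 25, 12):
    table.insert(key)
print(table.keys)
print(table.ptr)
```

After the five inserts the sketch ends in the same state as the table shown before deleting 11: key 32 sits at address 0 and the chain for home address 4 runs 4-0-6.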


Chaining with separate overflow

All records that cannot be stored in their own home address are transferred to an overflow area.

The overflow area can be:
• A single overflow area for all synonyms with only one entry point
  - simple, avoids having pointers in the hash table
  - possibly long synonym chains, therefore only suitable with a small collision frequency
• A single overflow area with more than one entry point
  - efficient, since only members of a collision class are browsed
  - requires a pointer for each entry in the hash table
  - the reference to the synonym chain can be implemented using double hashing
  - in the case of collisions, synonyms (mostly few) of 2 collision classes are affected


Chaining with separate overflow

• The separate overflow area can be assigned dynamically.
• The HT can be restricted to the keys in the home address; all data can be stored in the dynamic overflow area.
• Since pointers can refer to any address, this corresponds to a partition of the overflow area.
• Chaining of synonyms is a preferred method.

Position   Key         Pointer (synonym chain)
0          HAYDN       HAENDEL, VIVALDI
1          BEETHOVEN   BACH, BRAHMS
2          CORELLI
3
4          SCHUBERT    LISZT
5          MOZART
6


Hashing: analysis of the costs

Cost measure: number of steps (addressing attempts)

Assumptions:
• The same time effort for all $h(K_p)$ and search steps
• The hash table is occupied by n keys

Search costs $S_n$ = delete costs without rearrangement

Insert costs = costs of an unsuccessful search $U_n$

Delete costs = $S_n$ + rearrangement $R_n$

Costs can be expressed as a function of the allocation factor $\beta = n/m$


Hashing: analysis of the costs – extreme cases

Worst case (one collision class, access as in a linear list):
• $S_n = n$
• $U_n = n + 1$

Best case (no collisions):
• $S_n = 1$
• $U_n = 1$


Hashing: analysis of the costs – average cases

Average case depends on overflow handling

Assumption: $h(K_p)$ distributes the keys uniformly

-> The probability that a key has a given hash value $i$, $0 \le i \le m-1$, is $1/m$


Costs using linear probing

Example: $h_i(k) = (h_0(k) + i) \bmod m$
• With a low allocation of the HT, no problem
• With a higher allocation, drastic degradation

• The probability p that address 7 will be allocated is 1/m, because 6 is free
• The probability that 14 will be allocated is 5/m (the p for 14 as home address plus the sum of the p for 10, 11, 12, 13, which can produce an overflow onto 14)
• Long chains get longer, and chains can grow together (insert in 3 or 14)

[Figure: hash table with addresses 0 to 16]


Costs using linear probing

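According to Knuth:

$S_n = 0.5 \left(1 + \frac{1}{1-\beta}\right)$

$U_n = 0.5 \left(1 + \frac{1}{(1-\beta)^2}\right)$

with $0 \le \beta = n/m < 1$

[Figure: $S_n$ and $U_n$ in steps (1 to 8) plotted against β (0.1 to 0.9)]

The number of search steps increases drastically with a higher allocation factor.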
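
To get a feeling for how quickly these costs grow, the two formulas can be evaluated at the allocation factors shown on the plot axis. The function name `linear_probing_costs` is an assumption made for this sketch, and the formulas are the approximations quoted above, not measurements.

```python
def linear_probing_costs(beta):
    """Return (S_n, U_n) for linear probing at allocation factor beta."""
    if not 0 <= beta < 1:
        raise ValueError("beta must satisfy 0 <= beta < 1")
    s_n = 0.5 * (1 + 1 / (1 - beta))          # successful search
    u_n = 0.5 * (1 + 1 / (1 - beta) ** 2)     # unsuccessful search / insert
    return s_n, u_n


for beta in (0.1, 0.3, 0.5, 0.7, 0.9):
    s_n, u_n = linear_probing_costs(beta)
    print(f"beta={beta:.1f}  S_n={s_n:5.2f}  U_n={u_n:6.2f}")
```

At β = 0.5 a successful search needs 1.5 steps and an unsuccessful one 2.5; at β = 0.9 the unsuccessful search is already at about 50 steps.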


Costs using optimal collision resolution

With optimal methods for collision resolution, an approximately uniform distribution can be assumed despite collisions.
• E.g.: rehashing, pseudo-random numbers, etc.

The probability that a place is occupied or free depends on the number of places already allocated (n) and on the number still available (m-n).
• E.g.: $P_{free} = (m-n)/m$

See the script for details of the derivation.


Costs using optimal collision resolution (2)

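Approximately:

$S_n \approx \left| \frac{1}{\beta} \ln(1-\beta) \right|$

$U_n \approx \frac{1}{1-\beta}$

with $0 \le \beta = n/m < 1$

[Figure: $S_n$ and $U_n$ in steps (1 to 8) plotted against β (0.1 to 0.9)]

The number of search steps can improve drastically with independent allocation after collision resolution.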
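
The slides refer to the script for the derivation. As a rough guide only, the usual textbook argument can be sketched as follows, under the assumption that every probe hits an occupied place independently with probability β (this is the standard approximation, not necessarily the derivation used in the script):

```latex
\begin{align*}
U_n &\approx \sum_{i=0}^{\infty} \beta^{i} = \frac{1}{1-\beta}
  && \text{(probe } i+1 \text{ is needed only if the first } i \text{ probes hit occupied places)} \\
S_n &\approx \frac{1}{n} \sum_{i=0}^{n-1} \frac{1}{1 - i/m}
  \approx \frac{1}{\beta} \int_{0}^{\beta} \frac{dx}{1-x}
  = -\frac{1}{\beta} \ln(1-\beta)
  && \text{(average insertion cost over all } n \text{ stored keys)}
\end{align*}
```

Since $\ln(1-\beta)$ is negative for $0 < \beta < 1$, the last expression equals $\left| \frac{1}{\beta} \ln(1-\beta) \right|$, which is the form given above.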


Costs using separate overflow

Assumptions:
• Uniform distribution of the keys over all chains
• β = n/m keys per chain; furthermore, linear chaining (Q: how big is $S_n$?)

When key i is inserted into the HT, i-1 keys are already in the table, so each chain holds (i-1)/m keys.

The cost of finding a free place is 1 step for the home address plus (i-1)/m steps to reach the end of the chain (we must first check whether the key already exists in the table).

Averaged over all n keys:

$S_n = \frac{1}{n} \sum_{i=1}^{n} \left(1 + \frac{i-1}{m}\right) = 1 + \frac{n-1}{2m} \approx 1 + \frac{\beta}{2}$


Costs using separate overflow

For a successful search, half of the chain is traversed on average.

For an unsuccessful search, the entire chain has to be traversed.

Chaining is superior to the other methods: efficiency remains good even with high overflow (β > 1).

β      0.5    0.75   1      1.5    2      3      4      5
S_n    1.25   1.37   1.5    1.75   2      2.5    3      3.5
U_n    1.11   1.22   1.37   1.72   2.14   3.05   4.02   5.01
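
The table can be recomputed with a short sketch. The $S_n$ row follows the formula $1 + \beta/2$ derived on the previous slide. The slides do not state a formula for $U_n$; the expression $\beta + e^{-\beta}$ used below is an assumption (it is the known result for separate chaining and reproduces the tabulated values after rounding). The function names are hypothetical.

```python
import math


def s_n(beta):
    """Successful search with separate chaining: 1 + beta/2."""
    return 1 + beta / 2


def u_n(beta):
    """Unsuccessful search; assumed formula beta + exp(-beta)."""
    return beta + math.exp(-beta)


for beta in (0.5, 0.75, 1, 1.5, 2, 3, 4, 5):
    print(f"beta={beta:<5} S_n={s_n(beta):.3f}  U_n={u_n(beta):.3f}")
```

Note that the slide lists 1.37 for $S_n$ at β = 0.75, which is the exact value 1.375 with the last digit cut off.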


Hashing on external memory (b>1)

With a bucket factor b > 1, b records can be stored at one address.

Suitable for both main and external memory, and particularly attractive with external memory.

On a collision, the new record is simply stored in the same bucket.

The bucket overflows only with the (b+1)-th entry.

On overflow, the known methods for collision resolution can be applied:
• Overflow in the primary area
• Separate overflow area


Hashing on external memory

Overflow buckets can be assigned dynamically and linked via an overflow address.

One overflow bucket can serve as the overflow area for several home addresses.

Recommended: one chain per collision class.

With b > 1, the allocation factor is $\beta = n/(bm)$.

Order for storing records in a bucket:
• According to the insert sequence (sequential)
• According to the sorting sequence (linked list)


Hashing on external memory

Typical bucket sizes:
• Sector
• Track
• Page

Generally: the transfer unit (1 I/O per bucket)

As with B-trees:
• I/O dominates (approx. 6-10 ms)
• A more complex hash function is justified
• Relative search costs inside one bucket are low

Insert always at the first free space in the chain.

On deletion, there is no need to bridge gaps (or only inside a page).

Empty overflow buckets are removed from the chain.


Example: b=2

b=2; h(k) = k mod 7

Insert: 11, 32, 8, 25, 21, 15, 2, 18, 13, 20, 4, 27

0 1 2 3 4 5 6


Example: b=2 (2)

Now: delete 25

Address               0     1       2    3    4        5    6
Primary bucket        21    8, 15   2    -    11, 32   -    13, 20
1st overflow bucket   -     -       -    -    25, 18   -    27
2nd overflow bucket   -     -       -    -    4        -    -


Example: b=2 (3)

Chains will not be closed! Records are rearranged inside a page if needed.

Address               0     1       2    3    4        5    6
Primary bucket        21    8, 15   2    -    11, 32   -    13, 20
1st overflow bucket   -     -       -    -    18       -    27
2nd overflow bucket   -     -       -    -    4        -    -
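
The b = 2 example can be replayed with a minimal sketch. The representation (one Python list of buckets per address, the first being the primary bucket) and the function names are assumptions for illustration; a real implementation would work on disk pages rather than in-memory lists.

```python
B = 2   # bucket factor from the example
M = 7   # number of home addresses, h(k) = k mod 7


def make_table():
    # Every address always keeps its primary bucket (relative addressing).
    return [[[]] for _ in range(M)]


def insert(table, key):
    chain = table[key % M]
    for bucket in chain:              # insert at the first free space in the chain
        if len(bucket) < B:
            bucket.append(key)
            return
    chain.append([key])               # append a new overflow bucket


def delete(table, key):
    chain = table[key % M]
    for i, bucket in enumerate(chain):
        if key in bucket:
            bucket.remove(key)        # rearranges only inside this bucket
            if not bucket and i > 0:  # empty overflow buckets leave the chain
                del chain[i]
            return True
    return False


table = make_table()
for key in (11, 32, 8, 25, 21, 15, 2, 18, 13, 20, 4, 27):
    insert(table, key)
delete(table, 25)
print(table[4])
```

After deleting 25, the first overflow bucket of address 4 holds only 18 while 4 stays in the second overflow bucket, so the gap is not closed across buckets.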


Summary: Hashing on external memory

Primary buckets always remain assigned because of relative addressing.

Overflow buckets are assigned dynamically (appended); empty overflow buckets are deleted.

With strong shrinkage, buckets may become underfilled (reorganization of the file, e.g., by rehashing all entries stored in the hash table).


Approximate values for Hashing

Selected values for $S_n(b)$ and $U_n(b)$ as a function of b and β

Rule of thumb: b is typically determined by the data transfer unit; select β such that S ~ 1.05 to 1.08 holds


Hashing vs. B+-Tree

With a well-designed hash method, access costs are better than with a B+-tree (1.05 vs. path length).

Disadvantages:
• No sorting of the keys (sequential output has a noticeably higher cost)
• Hashing is static
  - Not extendable; long chains lead to degeneration
  - Even with a small number of keys, it consumes the complete designated memory space
  - (This can also be an advantage: the required memory space is largely defined from the beginning)


Extendible Hashing

Disadvantages of static hash methods with a strongly growing volume of data:
• The primary area must be generously dimensioned from the beginning (poor initial allocation)
• If the capacity of the primary area is exceeded, the overflow chains grow fast, and run-time behavior degrades
• Reorganization requires unloading the entire volume of data and loading it again, which interrupts operation (often not possible, e.g., with 24x7 operation)


Extendible Hashing

Therefore, we need a hash method that:
• Permits dynamic growing and shrinking of the hash area
• Guarantees constant run-time behavior independently of the size of the data
• Requires no more than 2 page accesses to find a record
• Avoids overflow mechanisms and total reorganization
• Guarantees a high allocation of the memory independently of the growth of the key set


Extendible Hashing

Must avoid overflow buckets

We would like stability and are ready to pay for it, i.e., a constant 2 accesses

Available (known to us) techniques:
• Balancing as in B-trees (constant path length)
• Addressing techniques via coding of the key, as in digital trees

Extendible hashing uses these techniques in order to guarantee stable access with exactly 2 I/O operations.


Extendible Hashing

The hash function transforms keys into binary strings (coding).

Only the first n bits are used, as many as necessary (addressing as in the digital tree).

Additional indirection via the container board:
• With few keys, few bits are sufficient
• With many keys, additional bits are used

Containers are added or removed if necessary (balancing).

The container board is "doubled" if necessary: this costs memory space, but no intensive computation.


Example: Extendible Hashing

Insertion sequence: 11, 32, 8, 25, 21, 15, 2, 18, 13, 20, 4, 27

Key   Bit string      Key   Bit string
11    001011          2     000010
32    100000          18    010010
8     001000          13    001101
25    011001          20    010100
21    010101          4     000100
15    001111          27    011011


Extendible Hashing, b=2

Initial situation: the container board contains only one reference, to an empty container

Insert 11 (001011) and 32 (100000): works without problems


Extendible Hashing, b=2

Next key: 8 (001000)

Doesn't fit anymore

Thus: doubling of the capacity through duplication of the container board (still no extra containers!)


Extendible Hashing, b=2

Blue numbers: implicit through the addresses of the container board

Now: next key 8 (001000)

Fits after partitioning the container


Extendible Hashing, b=2

Next key: 25 (011001)

Doesn't fit in the first container, and no other address is available (for partitioning the container), so the container board has to be doubled


Extendible Hashing, b=2

Again: doubling the container board does not generate an extra container

Next key (still): 25 (011001)


Extendible Hashing, b=2

Additional container


Extendible Hashing, b=2

Next key: 21 (010101)

No problems


Extendible Hashing, b=2

Next key: 15 (001111)

Simple doubling of the container board


Extendible Hashing, b=2

Next key: 15 (001111)

Still not possible: doubling again


Extendible Hashing, b=2

Next key: 15 (001111)

Now the selectivity is sufficiently big: container doubling


Extendible Hashing, b=2

Next keys (straightforward): 2 (000010), 18 (010010), 13 (001101), 20 (010100), 4 (000100), 27 (011011)


Extendible Hashing, b=2

Finish


Extendible Hashing

The prefix of the key does not always have to be used; one can also use the postfix.

For keys that are not uniformly distributed, an internal hash function can be used to produce the bit string used in extendible hashing.


Summary, extendible Hashing

Key fragment with n bits: direct hashing (container board)

Containers have a bucket factor b > 1 (typically b > 20)

Search:
• Look up the container address in the container board
• Search in the container (e.g., binary search)


Summary, extendible Hashing

Insert:
• Look up the container address in the container board
• Search in the container
• If found: good, no further action
• If not found:
  - If there is a free slot in the container: insert
  - If there is no free slot:
    - Double the container board until the key fragment is selective enough to establish more containers (note: sometimes the container board doesn't need to be doubled)
    - Add new containers and, if needed, redistribute keys from the old container among the new containers


Summary, extendible Hashing

Delete:
• Look up the container address in the container board
• Search in the container
• If found: delete
• If the container is empty: delete the container and set the pointer in the container board to the neighbor container
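
The search and insert steps summarized above can be sketched as follows, using the 6-bit keys and b = 2 from the walkthrough. This is the common textbook variant with a local depth per container; the class and attribute names are assumptions, delete is omitted, and unlike the slide figures the sketch may create an empty container when a split puts all keys on one side.

```python
BITS = 6   # key width in the slide example
B = 2      # bucket factor


class Container:
    def __init__(self, depth):
        self.depth = depth   # number of prefix bits shared by all keys here
        self.keys = []


class ExtendibleHash:
    def __init__(self):
        self.depth = 0                      # prefix bits used by the board
        self.directory = [Container(0)]     # container board, 2**depth entries

    def _index(self, key):
        return key >> (BITS - self.depth)   # first `depth` bits of the key

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        while True:
            container = self.directory[self._index(key)]
            if key in container.keys:
                return
            if len(container.keys) < B:
                container.keys.append(key)
                return
            if container.depth == BITS:
                raise OverflowError("no bits left to distinguish the keys")
            if container.depth == self.depth:
                # Double the board: every pointer is copied to two
                # successive addresses; no new container is created yet.
                self.directory = [c for c in self.directory for _ in (0, 1)]
                self.depth += 1
            self._split(container)

    def _split(self, container):
        container.depth += 1
        sibling = Container(container.depth)
        shift = BITS - container.depth
        old = container.keys
        container.keys = [k for k in old if (k >> shift) & 1 == 0]
        sibling.keys = [k for k in old if (k >> shift) & 1 == 1]
        for i, c in enumerate(self.directory):
            if c is container and (i >> (self.depth - container.depth)) & 1:
                self.directory[i] = sibling


keys = [11, 32, 8, 25, 21, 15, 2, 18, 13, 20, 4, 27]
table = ExtendibleHash()
for key in keys:
    table.insert(key)
print(table.depth, len(table.directory))
```

For the insertion sequence of the example, the board ends up using the first 4 bits (16 entries), which matches the three consecutive doublings needed for key 15 in the walkthrough.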


Extendible Hashing

In principle very similar to direct hashing using the first bits of the key ($h(k) = \lfloor k / 2^x \rfloor$)

BUT: with direct hashing, doubling the table when an overflow occurs is much more expensive. With extendible hashing, each pointer only has to be set at two successive addresses; with direct hashing, each address has to be split.


Example

[Figure: extendible hashing and direct hashing side by side]

(There is no container board in direct hashing, but we added it here for the sake of understanding)


Analysis, extendible Hashing

Search has a constant cost: two I/O operations

Delete may be combined with the deletion of the container, but the cost is still constant

Insert "usually" needs at most 5 operations (search, write to the container, if needed write to other containers, write to the container board)

BUT IN ADDITION: if needed, reorganization of the container board (duplicate all pointers)


Analysis, extendible Hashing

Doubling of the container board occurs mainly in main memory: low cost in comparison to I/O operations

A very successful and widely used method


Excursus: Pseudo-random numbers

A topic that is closely related to hashing

Why "pseudo"-random numbers?
• The computer is a "good" computational servant
• Algorithms are always executed reliably in the same way
• Consequence: generating random numbers is not a strength of computers!

Applications:
• Games
• Simulation
• Generating keys for cryptography

But especially also numerical solutions of problems


Example of an application

Computation of Pi

Area of the unit circle: Pi

Compute the area of a quarter of the circle (Pi/4) numerically and then multiply by 4 to obtain Pi

[Figure: quarter of the unit circle inside the unit square, side length 1]


Compute Pi

Counting: 36 x 36 = 1296 small boxes

Or roll the dice!

[Figure: quarter circle over a 36 x 36 grid; both axes are labelled with the two-dice outcomes 11 to 66]


Compute Pi

Particularly for computations of multi-dimensional cases (e.g., physical systems with many degrees of freedom, physics simulations, crash tests, …) it isn't possible to go through all possible parameters systematically

The use of (good) multi-dimensional random numbers can lead to better results while using fewer values
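
The "roll the dice" idea can be sketched as a Monte Carlo estimate. The sketch uses Python's standard `random.Random` generator with a fixed seed as the pseudo-random source; the function name `estimate_pi`, the seed, and the sample count are assumptions chosen for illustration.

```python
import random


def estimate_pi(samples, seed=42):
    """Estimate Pi from random points in the unit square."""
    rng = random.Random(seed)        # pseudo-random generator with fixed seed
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:     # point lies inside the quarter circle
            hits += 1
    return 4.0 * hits / samples      # quarter-circle area times 4


print(estimate_pi(200_000))
```

Because the seed is fixed, the same call always returns the same estimate, which is one reason pseudo-random numbers are convenient for numerical work.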


Pseudo-random numbers

For this type of application, pseudo-random numbers are even better than "real" random numbers

How does a normal pseudo-random generator work?
• It needs an initialization value $z_0$
• A random function computes the next random number from the last one: $z_n = Z(z_{n-1})$

The requirements are similar to those for hash and collision resolution functions:
• Uniform distribution of the random numbers
• All random numbers (from a specific interval) should eventually appear once in the sequence


Example: Mid-square-generator

Was implemented, e.g., in the Apple II

$z_n = \mathrm{middle\_digits}(z_{n-1}^2)$

Example: $z_0 = 42$

42 x 42 = 1764; 76 x 76 = 5776 etc.

Sequence: 42 – 76 – 77 – 92 – 46 – 11 – 12 – 14 – 19 – 36 – 29 – 84 – 5 – 2 – 0 – 0 – 0 – …

Many sequences either end with "0" or repeat continuously (24 – 57 – 24 – 57 – …)

Very bad generator
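
A minimal sketch of the two-digit variant used in the example: the square is treated as a four-digit number (with leading zeros) and its middle two digits become the next value. The function names are assumptions for illustration.

```python
def mid_square(z):
    """Middle two digits of the four-digit square of a two-digit number."""
    return (z * z // 10) % 100


def mid_square_sequence(z0, length):
    seq = [z0]
    while len(seq) < length:
        seq.append(mid_square(seq[-1]))
    return seq


print(mid_square_sequence(42, 17))   # the sequence from the example
print(mid_square_sequence(24, 6))    # a short cycle
```

Starting from 42 the sequence collapses to 0 after 14 steps, and starting from 24 it alternates between 24 and 57, which illustrates both failure modes named above.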


Linear congruential generator

Better: linear congruential generator

Looks familiar to us

$z_n = (a \cdot z_{n-1} + b) \bmod m$

Example: $z_n = (21 \cdot z_{n-1} + 17) \bmod 40$

… generates an optimal sequence …
1 - 38 - 15 - 12 - 29 - 26 - 3 - 0 - 17 - 14 - 31 - 28 - 5 - 2 - 19 - 16 - 33 - 30 - 7 - 4 - 21 - 18 - 35 - 32 - 9 - 6 - 23 - 20 - 37 - 34 - 11 - 8 - 25 - 22 - 39 - 36 - 13 - 10 - 27 - 24 - 1
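
The example generator can be checked with a short sketch; "optimal" here means that all 40 values appear before the sequence repeats. The generator function `lcg` and the starting value 1 (taken from the printed sequence) are the only assumptions.

```python
from itertools import islice


def lcg(a, b, m, z0):
    """Yield z0, z1, z2, ... with z_n = (a * z_{n-1} + b) mod m."""
    z = z0
    while True:
        yield z
        z = (a * z + b) % m


seq = list(islice(lcg(21, 17, 40, 1), 41))
print(seq[:8])
print(len(set(seq[:40])), seq[40])
```

The first 40 values are all distinct and the 41st value is 1 again, so the period is the full 40.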


Linear congruential generator

$z_n = (a \cdot z_{n-1} + b) \bmod m$

The parameters a, b, m determine the quality

As in hashing: it is reasonably easy to state the minimal requirements for good quality, e.g., a and m coprime

But: uniform distribution in multiple dimensions is hard

Example: 2, 7, 4, 9, 6, 1, 8, 3, 0, 5, …

One dimension: uniformly distributed

Two dimensions: (2, 7), (4, 9), (6, 1), (8, 3), (0, 5) lie on two "lines", so they are not uniformly distributed


Linear congruential generator

Finding good pseudo-random generators is a separate research area in computer science and mathematics

For numerical applications, pseudo-random numbers are often better than real random numbers

For cryptography this no longer applies: there are plug-in cards that generate real random numbers based on quantum physics …


Thoughts: Hash / Random

Often, the computer produces apparent chaos

The computer cannot really do this: if you look closely, it is always just another kind of order

The "chaotic" arrangement of data in hash tables and pseudo-random generators are good examples of this