Segmented Hash: An Efficient Hash Table Implementation for High Performance Networking Subsystems Sailesh Kumar Patrick Crowley

Segmented Hash: An Efficient Hash Table Implementation for High Performance Networking Subsystems

Sailesh Kumar, Patrick Crowley

2 - Sailesh Kumar - 04/21/23

Problem Statement

How to implement deterministic hash tables

Deterministic performance: worst case close to O(1)

We are given a small amount of on-chip memory

On-chip memory is limited to 1-2 bytes per table entry

In this paper we tackle the above problem

Hash Tables

A hash table uses a hash function to index the table entries
» hash("apple") = 5
» hash("watermelon") = 3
» hash("grapes") = 9
» hash("cantaloupe") = 7
» hash("kiwi") = 0
» hash("mango") = 6
» hash("banana") = 2
» hash("honeydew") = 2

This is called a collision
» Now what?

[Figure: a table with buckets 0 to 9 holding kiwi (0), banana (2), watermelon (3), apple (5), mango (6), cantaloupe (7) and grapes (9). honeydew collides with banana in bucket 2. Three ways of placing honeydew are shown: linear probing, double hashing with Hash2(honeydew) = 3, and linear chaining.]

The number of keys mapped to a bucket is called the collision chain length
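
The chaining case can be sketched in a few lines of Python. This is only an illustration: the hash values are copied from the slide into a lookup table (SLIDE_HASH and build_chained_table are names made up for this sketch), whereas a real table would compute them.

```python
from collections import defaultdict

# Hash values copied from the slide; a real table would compute them.
SLIDE_HASH = {
    "apple": 5, "watermelon": 3, "grapes": 9, "cantaloupe": 7,
    "kiwi": 0, "mango": 6, "banana": 2, "honeydew": 2,
}


def build_chained_table(keys, hash_fn):
    """Insert keys into buckets, appending to the bucket's chain on collision."""
    table = defaultdict(list)
    for key in keys:
        table[hash_fn(key)].append(key)
    return table


table = build_chained_table(SLIDE_HASH, SLIDE_HASH.__getitem__)

# Collision chain length per occupied bucket, and the longest chain overall.
chain_lengths = {bucket: len(chain) for bucket, chain in table.items()}
longest = max(chain_lengths.values())
```

Bucket 2 ends up with the chain banana, honeydew, so a lookup for honeydew has to walk past banana first.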

Performance Analysis

Average performance is O(1)

However, worst-case performance is O(n)

In fact, the probability of a collision chain > 1 is fairly high

[Figure: probability (0 to 0.4) versus load m/n (10% to 100%), with two curves: collision chain > 1 and collision chain > 2. Annotations: keys in the first group take twice the time to be probed; keys in the second group take three times as long.]

There is a fairly high probability that performance drops to a half or a third
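
The slide does not say how its curves were computed. As a rough cross-check, and assuming uniform hashing so that the number of keys in a bucket is approximately Poisson with mean equal to the load m/n, the per-bucket probability of a long chain can be estimated as below. The function name is made up for this sketch, and the per-bucket probability may not be exactly the quantity plotted.

```python
import math


def prob_chain_longer_than(c, load):
    """P(a bucket holds more than c keys), with occupancy ~ Poisson(load).

    Assumption: uniform hashing with many buckets, so the binomial bucket
    occupancy is well approximated by a Poisson distribution.
    """
    at_most_c = sum(
        math.exp(-load) * load**i / math.factorial(i) for i in range(c + 1)
    )
    return 1.0 - at_most_c


# Tabulate both probabilities at a few load levels.
for load_pct in (10, 50, 100):
    load = load_pct / 100
    print(
        load_pct,
        round(prob_chain_longer_than(1, load), 4),
        round(prob_chain_longer_than(2, load), 4),
    )
```

Under this approximation both probabilities grow with the load, which agrees with the shape of the curves described above.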

Segmented Hashing

Uses the power of multiple choices
» has been proposed and used earlier by several authors

An N-way segmented hash
» Logically divides the hash table array into N equal segments
» Maps each incoming key onto one bucket in each segment
» Picks the bucket which is either empty or holds the fewest keys

[Figure: a 4-way segmented hash table. h( ) maps k_i to one bucket per segment and k_i is stored in one of them; k_{i+1} is placed the same way. The numbers in the buckets show how many keys each one holds.]
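
A minimal sketch of the insertion rule above, assuming chained buckets. The class name, the method names and the salted SHA-256 per-segment hash are all illustrative choices, not taken from the slides.

```python
import hashlib


class SegmentedHashTable:
    """N-way segmented hash table with chained buckets (illustrative only)."""

    def __init__(self, num_segments, buckets_per_segment):
        self.n = num_segments
        self.size = buckets_per_segment
        self.segments = [
            [[] for _ in range(buckets_per_segment)] for _ in range(num_segments)
        ]

    def _bucket_index(self, key, segment):
        # Hypothetical per-segment hash: SHA-256 salted with the segment number.
        digest = hashlib.sha256(f"{segment}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.size

    def _candidates(self, key):
        # One candidate bucket per segment.
        return [
            self.segments[s][self._bucket_index(key, s)] for s in range(self.n)
        ]

    def insert(self, key, value):
        candidates = self._candidates(key)
        # Update in place if the key is already stored in some segment.
        for bucket in candidates:
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)
                    return
        # Otherwise pick the candidate bucket holding the fewest keys.
        min(candidates, key=len).append((key, value))

    def lookup(self, key):
        # Without filters, a query has to probe one bucket in every segment.
        for bucket in self._candidates(key):
            for k, v in bucket:
                if k == key:
                    return v
        return None

    def longest_chain(self):
        return max(len(b) for seg in self.segments for b in seg)


table = SegmentedHashTable(num_segments=4, buckets_per_segment=16)
for i in range(64):
    table.insert(f"key{i}", i)
```

Note that lookup touches N buckets per query in this form, which is the cost the later slides address.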

Segmented Hash Performance

More segments improve the probabilistic performance
» With 64 segments, the probability of a collision chain > 2 is nearly zero even at 100% load
» More deterministic hash table performance

[Figure, two panels: Prob. {collision chain > 1} and Prob. {collision chain > 2} versus load m/n (10% to 100%), on a log scale from 1E-15 to 1E+00. The first panel has curves for 1, 4, 8, 16, 32 and 64 segments; the second for 1, 4, 8, 16 and 32 segments.]
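
The trend can be reproduced qualitatively with a small simulation. This is a sketch under simplifying assumptions: uniformly random bucket choices stand in for the hash function, ties go to the lowest-numbered segment, and the measured quantity is the fraction of buckets with a long chain, which may differ from the exact probability plotted on the slide. The function name and parameters are made up for the example.

```python
import random


def fraction_of_long_chains(num_segments, total_buckets, load, longer_than, seed=0):
    """Fraction of buckets whose chain holds more than `longer_than` keys."""
    rng = random.Random(seed)
    per_segment = total_buckets // num_segments
    counts = [[0] * per_segment for _ in range(num_segments)]
    for _ in range(int(load * total_buckets)):
        # One uniformly random candidate bucket per segment stands in for h().
        picks = [(s, rng.randrange(per_segment)) for s in range(num_segments)]
        # Store the key in the candidate bucket that currently holds the fewest keys.
        s, b = min(picks, key=lambda p: counts[p[0]][p[1]])
        counts[s][b] += 1
    long_chains = sum(c > longer_than for seg in counts for c in seg)
    return long_chains / (num_segments * per_segment)


# Compare 1, 4 and 8 segments at 100% load on a 4096-bucket table.
for n in (1, 4, 8):
    print(
        n,
        fraction_of_long_chains(n, 4096, 1.0, longer_than=1),
        fraction_of_long_chains(n, 4096, 1.0, longer_than=2),
    )
```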

An Obvious Deficiency

O(N) memory probes per query
» Requires N times higher memory bandwidth

How to ensure O(1) memory probes per query?

Use Bloom filters implemented in a small on-chip memory (they filter out unnecessary memory accesses)

Before going further, a brief introduction to Bloom filters

[Figure: a 4-way segmented hash table; h(k_i) selects one bucket in each segment, so every query requires 4 probes.]

Bloom Filter

[Figure, four animation frames over an m-bit array with hash functions H1, H2, H3, H4, ..., Hk:
1. Key X: the bits selected by the hash functions are set to 1.
2. Key Y: further bits are set to 1.
3. Query for X: every bit selected by the hash functions is 1, so the filter reports a match.
4. Query for W: every selected bit is 1 as well, so the filter reports a match (false positive).]
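
The behaviour shown in these frames can be sketched as follows. The class and method names are illustrative, and salted SHA-256 is only a stand-in for the hash functions H1 to Hk, which the slides do not specify.

```python
import hashlib


class BloomFilter:
    """m-bit Bloom filter with k hash functions (illustrative sketch)."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, key):
        # Hypothetical H1..Hk: SHA-256 salted with the hash function's index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter(num_bits=64, num_hashes=4)
bf.add("X")
bf.add("Y")
```

A key that was added always matches. A key that was never added usually does not, but it can match if other keys happened to set all of its bits, which is the false positive case in the last frame.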

Adding per Segment Filters

[Figure: the 4-way segmented hash table with one Bloom filter per segment. Each filter has mb bits and is indexed by h1(ki), h2(ki), ..., hk(ki). h(k_i) selects the candidate buckets, and k_i can go to any of 3 of them.]

We can select any of these three segments and insert the key into the corresponding filter
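
A sketch of how per-segment filters cut down the probes, under stated assumptions: the bucket arrays model off-chip memory, the filters model on-chip memory, a single bucket index is shared by all segments, and salted SHA-256 stands in for h and h1 to hk. All names are made up for the example, and the caller chooses the segment at insert time.

```python
import hashlib


def filter_positions(key, num_bits, num_hashes):
    """Bit positions h1(key)..hk(key); salted SHA-256 is a stand-in hash."""
    positions = []
    for i in range(num_hashes):
        digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


class FilteredSegmentedHash:
    def __init__(self, num_segments, buckets_per_segment, bits_per_filter, num_hashes=4):
        self.n, self.size = num_segments, buckets_per_segment
        self.mb, self.k = bits_per_filter, num_hashes
        # Off-chip: one bucket array per segment.
        self.buckets = [[[] for _ in range(self.size)] for _ in range(self.n)]
        # On-chip: one Bloom filter per segment.
        self.filters = [[0] * self.mb for _ in range(self.n)]
        self.probes = 0  # off-chip bucket reads performed by lookups

    def _index(self, key):
        digest = hashlib.sha256(f"bucket:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.size

    def insert(self, key, segment):
        """Store key in the given segment and record it in that segment's filter."""
        self.buckets[segment][self._index(key)].append(key)
        for pos in filter_positions(key, self.mb, self.k):
            self.filters[segment][pos] = 1

    def lookup(self, key):
        """Return the segment holding key, probing only segments whose filter matches."""
        positions = filter_positions(key, self.mb, self.k)
        idx = self._index(key)
        for s in range(self.n):
            if all(self.filters[s][p] for p in positions):  # on-chip check
                self.probes += 1                            # off-chip probe
                if key in self.buckets[s][idx]:
                    return s
        return None


t = FilteredSegmentedHash(num_segments=4, buckets_per_segment=16, bits_per_filter=256)
t.insert("flow-1", segment=2)
found_in = t.lookup("flow-1")
```

Here the lookup for flow-1 reads a single off-chip bucket instead of four, because the filters of the other three segments are empty.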

False Positive Rates

With Bloom filters, there is a likelihood of false positives
» A false positive means an unnecessary memory access

With N segments, the false positive rate will clearly be at least N times higher
» In fact, it will be even higher, because we also have to consider several permutations of false positives

We use the Selective Filter Insertion algorithm, which reduces the false positive rate by several orders of magnitude

Selective Filter Insertion Algorithm

[Figure: the 4-way segmented hash table with one Bloom filter per segment (mb bits each, indexed by h1(ki), h2(ki), ..., hk(ki)). h(k_i) selects the candidate buckets, and k_i can go to any of 3 of them.]

Insert the key into segment 4, since fewer bits are set. Fewer bits set => lower false positive rate

With more segments (or more choices), our algorithm sets far fewer bits in the Bloom filter
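
One reading of this rule, sketched below, is that among the candidate segments the key goes to the one whose filter would have the fewest bits flipped from 0 to 1 by the insertion. That interpretation is an assumption, as are the function names and the salted SHA-256 stand-in for h1 to hk; the paper should be consulted for the exact algorithm.

```python
import hashlib


def bit_positions(key, num_bits, num_hashes):
    """h1(key)..hk(key); salted SHA-256 is a stand-in for real hash functions."""
    positions = []
    for i in range(num_hashes):
        digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


def select_segment(key, filters, candidate_segments, num_hashes):
    """Return the candidate segment whose filter needs the fewest new 1 bits."""

    def new_bits(segment):
        bits = filters[segment]
        wanted = set(bit_positions(key, len(bits), num_hashes))
        return sum(1 for p in wanted if not bits[p])

    return min(candidate_segments, key=new_bits)


def insert_into_filter(key, filters, segment, num_hashes):
    """Set the key's bits in the chosen segment's filter."""
    for p in bit_positions(key, len(filters[segment]), num_hashes):
        filters[segment][p] = 1


# Four segments, one 32-bit filter each, 3 hash functions.
filters = [[0] * 32 for _ in range(4)]

# Pretend an earlier key already set, in segment 3, the bits that "k" needs.
insert_into_filter("k", filters, 3, num_hashes=3)

# Segments 0, 1 and 3 are the candidates; segment 3 needs no new bits.
chosen = select_segment("k", filters, [0, 1, 3], num_hashes=3)
insert_into_filter("k", filters, chosen, num_hashes=3)
```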

Selective Filter Insertion Details

Greedy policy

For every arriving key, we choose the segment where the fewest bits are set in the Bloom filter

We show that this leads to unbalanced segments
» Reduced performance

Selective Filter Insertion Algorithm

[Figure, five animation frames: keys k1, k2, k3, k4 and k5 are inserted one after another, using h( ), h1( ) and h2( ). More bits are set with each insertion, and the last frame (k5) is annotated "Reduced No. of choices".]

Selective Filter Insertion Enhancement

Objective is to keep the segments balanced

Might need to make sub-optimal choices at times

One way is to avoid the most loaded segment
» Reduces the number of choices by 1

However, it leads to situations where two segments alternately take the lead

Things get complicated
» A more detailed version of the algorithm can be found in the paper
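
The simple variant mentioned above, avoiding the most loaded segment, might look like the following. This is a sketch only, not the detailed algorithm from the paper: the function name, the input shapes and the tie-break on segment load are all assumptions made for the example.

```python
def choose_balanced_segment(new_bits_needed, segment_loads):
    """Pick a segment by fewest new filter bits, skipping the most loaded one.

    new_bits_needed: {segment: bits the insertion would flip from 0 to 1},
                     one entry per candidate segment.
    segment_loads:   keys currently stored in each segment, indexed by segment.
    """
    most_loaded = max(range(len(segment_loads)), key=segment_loads.__getitem__)
    allowed = [s for s in new_bits_needed if s != most_loaded]
    if not allowed:  # the most loaded segment was the only candidate
        allowed = list(new_bits_needed)
    # Assumed tie-break: among equal bit counts, prefer the lighter segment.
    return min(allowed, key=lambda s: (new_bits_needed[s], segment_loads[s]))


# Segment 0 would need no new bits but is the most loaded, so it is skipped.
choice = choose_balanced_segment({0: 0, 1: 2, 2: 1}, [10, 3, 4])
```

Skipping segment 0 here is a deliberately sub-optimal choice for the filter, traded for better balance across segments.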

Selective Filter Insertion Results

[Figure: false positive probability (log scale, 1E-11 to 1E-01) versus Bloom filter bits per hash table entry (8 to 64), with optimum k and 64 segments. Curves: normal Bloom filter and Selective Filter Insertions.]

Simulation Results

64K buckets, 32 bits/entry Bloom filter. The simulation runs for 500 phases.
» During every phase, 100,000 random searches are performed. Between two phases, 10,000 random keys are deleted and inserted.

Hash policy = Linear Chaining

[Figure: average search time (1 to 1.5) versus load (0% to 100%), with curves for 1, 4, 16 and 64 segments.]

Conclusion

We presented a way to implement hash tables with deterministic performance
» We utilize a small on-chip memory to achieve it
» We also show that the on-chip memory requirements are modest
» Well within Moore’s law
» A 1M-entry hash table, for example, needs 1-2 MB of on-chip memory

Questions?