An Improved Construction for Counting Bloom Filters Flavio Bonomi Michael Mitzenmacher Rina Panigrahy Sushil Singh George Varghese Presented by: Sailesh Kumar


An Improved Construction for Counting Bloom Filters

Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, George Varghese

Presented by: Sailesh Kumar


2 - Sailesh Kumar - 04/18/23

Bloom Filter

Store a set S = {x1, x2, x3, …, xn} drawn from some universe U, so that we are able to answer queries of the form:

» Is x a member of S?

A Bloom filter is a technique that can answer this with:
» A small amount of space, independent of element size
» Constant query time
» A false positive probability (some probability of a wrong "yes" answer)

Alternative to hashing with some interesting trade-offs
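The membership test described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the class name `BloomFilter`, the byte-per-bit array, and the use of salted SHA-256 to stand in for k independent hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: an m-bit array probed by k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability only

    def _positions(self, item: str):
        # Assumption: k hash functions are simulated by salting SHA-256
        # with the function index; the slides do not prescribe a hash.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True means "possibly in S"; False means "definitely not in S".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=5)
for word in ("x1", "x2", "x3"):
    bf.add(word)
print("x1" in bf)  # inserted elements are always reported as present
```

A query can return a wrong "yes" when other elements happen to have set all k probed bits, but it never returns a wrong "no".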


Bloom Filter

[Figure: inserting X. Hash functions H1, H2, H3, H4, …, Hk map X to positions in the m-bit array, and those bits are set to 1.]


Bloom Filter

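[Figure: inserting Y. H1, H2, H3, H4, …, Hk map Y to positions in the same m-bit array, and those bits are set to 1 alongside the bits already set.]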


Bloom Filter

[Figure: querying X. Every bit selected by H1, H2, H3, H4, …, Hk in the m-bit array is 1, so the filter reports a match.]


Bloom Filter

[Figure: querying W. Every bit selected by H1, H2, H3, H4, …, Hk in the m-bit array is 1, so the filter reports a match (false positive).]


How many Hash Functions?

k = number of hash functions
n = total number of elements
m = number of bits in the array

The objective is to pick k so that the false positive probability (fpp) is minimized.

It is fairly simple to derive that the optimum is k = (ln 2) · (m/n)

» For the optimal k, the fpp is approximately (0.6185)^(m/n)
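The two formulas can be checked numerically. The sketch below assumes the standard approximation fpp ≈ (1 − e^(−kn/m))^k, which the slide does not state explicitly; the function names and the choice m = 8000, n = 1000 are illustrative only.

```python
import math


def false_positive_rate(m: int, n: int, k: float) -> float:
    """Standard approximation (1 - e^(-kn/m))^k for a Bloom filter."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> float:
    """Real-valued k that minimizes the approximation: (ln 2) * m / n."""
    return math.log(2) * m / n


m, n = 8000, 1000  # m/n = 8 bits per element (illustrative values)
k_star = optimal_k(m, n)
# In practice k must be an integer, so compare the candidates directly.
best_int_k = min(range(1, 11), key=lambda k: false_positive_rate(m, n, k))

print(f"optimal real-valued k = {k_star:.2f}")
print(f"best integer k        = {best_int_k}")
print(f"fpp at optimal k      = {false_positive_rate(m, n, k_star):.4f}")
print(f"0.6185^(m/n)          = {0.6185 ** (m / n):.4f}")
```

At the optimum each bit is set with probability 1/2, so the fpp is (1/2)^k = (1/2)^((ln 2) m/n), which is where the constant 0.6185 ≈ 2^(−ln 2) comes from.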


How many Hash Functions?

[Figure: false positive rate (y-axis, 0 to 0.1) versus number of hash functions (x-axis, 0 to 10) for m/n = 8. Optimal k = 8 ln 2 ≈ 5.5]


Counting Bloom Filter

Bloom filters do not support deletes
» Use a counting Bloom filter

Use counters instead of bits in the array
» Instead of setting bits, increment the counters

During a query, a counter > 0 means the corresponding bit is considered set
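A counting Bloom filter along these lines might look as follows. This is a sketch under stated assumptions: the class name, the salted SHA-256 hashing, the default 4-bit counter width, and the policy of leaving a saturated counter untouched are choices made for the example, not details taken from the slides.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter with small counters in place of bits, so items can be removed."""

    def __init__(self, m: int, k: int, counter_bits: int = 4) -> None:
        self.m = m
        self.k = k
        self.max_count = (1 << counter_bits) - 1
        self.counters = [0] * m

    def _positions(self, item: str):
        # Assumption: salted SHA-256 stands in for k independent hash functions.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            if self.counters[pos] < self.max_count:
                self.counters[pos] += 1  # saturate instead of wrapping around

    def remove(self, item: str) -> None:
        # Only meaningful for items that were previously added.
        if item not in self:
            raise KeyError(item)
        for pos in self._positions(item):
            # Assumed policy: a saturated counter is never decremented,
            # because its true value is no longer known.
            if 0 < self.counters[pos] < self.max_count:
                self.counters[pos] -= 1

    def __contains__(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=1024, k=4)
cbf.add("x")
cbf.add("y")
cbf.remove("x")
print("y" in cbf)  # y is unaffected by deleting x
```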


Counting Bloom Filter

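[Figure: inserting X into a counting Bloom filter. H1, H2, H3, H4, …, Hk map X to positions in the m-counter array, and those counters are incremented to 1.]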


Bloom Filter

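[Figure: inserting Y. H1, H2, H3, H4, …, Hk map Y to positions in the same m-counter array; each selected counter is incremented, and some counters now read 2.]

Deletes are straightforward: just decrement the counters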


Improved Counting Bloom Filter

4-bit counters ensure, with very high probability, that counters do not overflow
» 4x increase in space compared to a Bloom filter

Construct an alternative structure that is 2 times more compact than a counting Bloom filter (CBF)
» Based upon d-left hashing and fingerprinting techniques

We need to understand d-left hashing and fingerprinting


Fingerprinting

Temporarily assume that we have a perfect hash function h
» Use some random function to compute c-bit fingerprints
» F() : U -> [2^c]
» False positive prob. = 1/2^c
» 2x more compact than a Bloom filter
» Not easy to compute the perfect hash function h
– Use near-perfect hashing (d-left)

[Figure: h maps Element 1 through Element 5 to distinct cells of an array that holds, in order, Fingerprint(4), Fingerprint(5), Fingerprint(2), Fingerprint(1), Fingerprint(3).]


d-left hashing

Use d equal-sized tables
Use d different hash functions and choose one bucket from each table
A bucket can store multiple elements
Store the element in the least loaded bucket (break ties to the left)

Interesting properties:
» Very small maximum load, O(log log n)
» Maximum load is close to the average load even for small d, such as 4
» 80% space utilization with d = 4
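The insertion rule can be sketched as below. The class name `DLeftTable`, the salted SHA-256 bucket hash, and the table sizes are assumptions for illustration; buckets are left unbounded so that the maximum load can be observed rather than enforced.

```python
import hashlib


class DLeftTable:
    """d-left hashing: d equal-sized tables, one candidate bucket per table."""

    def __init__(self, d: int = 4, buckets_per_table: int = 64) -> None:
        self.d = d
        self.b = buckets_per_table
        self.tables = [[[] for _ in range(self.b)] for _ in range(d)]

    def _bucket(self, table: int, item: str) -> int:
        # Assumption: salting SHA-256 with the table index stands in for
        # d different hash functions.
        digest = hashlib.sha256(f"{table}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.b

    def insert(self, item: str) -> None:
        # Tuples compare by load first, then by table index, so ties are
        # broken toward the leftmost (lowest-numbered) table.
        load, t = min(
            (len(self.tables[t][self._bucket(t, item)]), t) for t in range(self.d)
        )
        self.tables[t][self._bucket(t, item)].append(item)

    def contains(self, item: str) -> bool:
        return any(
            item in self.tables[t][self._bucket(t, item)] for t in range(self.d)
        )

    def max_load(self) -> int:
        return max(len(bucket) for table in self.tables for bucket in table)


table = DLeftTable(d=4, buckets_per_table=64)
for i in range(512):  # average load of 2 items per bucket
    table.insert(f"item{i}")
print("max bucket load:", table.max_load())
```

Running this with different d values is one way to see how close the maximum load stays to the average.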


Improved Counting Bloom Filter

Use d-left hashing: d hash tables, each containing B buckets
» Note that a bucket contains multiple cells; a cell can store a fingerprint and a small counter

In order to store an element, we compute its fingerprint
» The fingerprint consists of two components
– Bucket index – [1, B]
– Remainder – [1, R], thus log2 R bits, stored explicitly
» We use a separate bucket index for each table but identical remainders
» Use the d-left insertion policy; augment fingerprints with counters; if the fingerprint matches, then increment the counter


Improved Counting Bloom Filter

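Each pair below is (bucket index, remainder), one pair per table.

Element x: H(x) = (3, 7), (4, 7). We store the element in the first table (bucket 3, remainder 7).

Element y: H(y) = (1, 5), (5, 5). We store the element in the first table (bucket 1, remainder 5).

Element z: H(z) = (1, 7), (4, 7). We store the element in the second table (bucket 4, remainder 7).

Now, if we try to delete x, remainder 7 is present both in bucket 3 of table 1 and in bucket 4 of table 2, so we do not know whether the fingerprint in table 1 or table 2 has to be removed.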


Improved Counting Bloom Filter

Solve the problem by breaking the hash operation into 2 phases

1st phase: compute a single true fingerprint

2nd phase: to obtain the d locations, apply permutations P_1, …, P_d to the true fingerprint

A permutation of a set is a one-to-one map of the set onto itself

This simple modification enables proper delete operations

Example permutation of {1, 2, 3, 4, 5}:
1 2 3 4 5
3 5 1 2 4
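The two-phase scheme can be sketched as follows. Several details here are assumptions made for the example rather than statements from the slides: the permutations are taken to be multiplication by an odd constant modulo 2^q (which is a bijection on q-bit values), the true fingerprint comes from SHA-256, each bucket is a Python dict from remainder to counter, and counters are left unbounded instead of being capped at 2 bits.

```python
import hashlib


class DLeftCBF:
    """Sketch of a d-left counting Bloom filter with permuted fingerprints."""

    def __init__(self, d=4, bucket_bits=6, remainder_bits=14, cells=8):
        self.r_bits = remainder_bits
        self.q = bucket_bits + remainder_bits  # width of the true fingerprint
        self.cells = cells
        # Assumption: P_i(f) = a_i * f mod 2^q with a_i odd.
        self.multipliers = [2 * i + 3 for i in range(d)]
        # tables[i][bucket] maps remainder -> counter
        self.tables = [[{} for _ in range(1 << bucket_bits)] for _ in range(d)]

    def _fingerprint(self, item: str) -> int:
        # Phase 1: a single true fingerprint of q bits.
        digest = hashlib.sha256(item.encode()).digest()
        return int.from_bytes(digest[:8], "big") % (1 << self.q)

    def _locations(self, item: str):
        # Phase 2: permute the fingerprint once per table, then split the
        # result into (bucket index, remainder).
        f = self._fingerprint(item)
        for i, a in enumerate(self.multipliers):
            p = (a * f) % (1 << self.q)
            yield i, p >> self.r_bits, p & ((1 << self.r_bits) - 1)

    def _find(self, item: str):
        for i, bucket, rem in self._locations(item):
            if rem in self.tables[i][bucket]:
                return i, bucket, rem
        return None

    def add(self, item: str) -> None:
        hit = self._find(item)
        if hit is not None:  # same true fingerprint already stored
            i, bucket, rem = hit
            self.tables[i][bucket][rem] += 1
            return
        # min() returns the first minimum, so ties go to the leftmost table.
        i, bucket, rem = min(
            self._locations(item), key=lambda loc: len(self.tables[loc[0]][loc[1]])
        )
        if len(self.tables[i][bucket]) >= self.cells:
            raise OverflowError("all cells in the chosen bucket are occupied")
        self.tables[i][bucket][rem] = 1

    def remove(self, item: str) -> None:
        hit = self._find(item)
        if hit is None:
            raise KeyError(item)
        i, bucket, rem = hit
        self.tables[i][bucket][rem] -= 1
        if self.tables[i][bucket][rem] == 0:
            del self.tables[i][bucket][rem]

    def __contains__(self, item: str) -> bool:
        return self._find(item) is not None


f = DLeftCBF()
for item in ("x", "y", "x"):
    f.add(item)
f.remove("x")
print("x" in f, "y" in f)
```

Because each table applies a bijection to the same true fingerprint, a stored remainder can match a lookup only if the two elements share that true fingerprint, which is what makes the delete unambiguous.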


Improved Counting Bloom Filter

Claim. When deleting an element in the set, only one remainder corresponding to the element will exist in the tables.

Proof:
» Suppose not. Then there is some element x ∈ S whose remainder, stored in table j, is to be deleted, and at the same time another element y ∈ S such that P_i(f_x) = P_i(f_y) for some i ≠ j.
» Since the P_i are permutations, we must have f_x = f_y, so x and y share the same true fingerprint.
» Suppose x was inserted before y (the other case is symmetric); in this case, when y is inserted, the counter in table j associated with the remainder of x would be incremented rather than a new remainder being stored in table i, contradicting our assumption.


Simulation Results

Target is fpp < 0.002

dlCBF configuration
» d = 4 tables with 2048 buckets each
» Each bucket has 8 cells
» Target load = 0.75 (6 items per bucket)
» 14-bit fingerprint, r
» 2-bit counter to handle identical fingerprints
» Total size of structure = 2^20 bits. Total items = 3 x 2^14

CBF configuration
» 13.5 counters per element (9 hash functions)
» For 3 x 2^14 elements, we will need 2.5 x 2^20 bits, 2.5 times the dlCBF
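The sizes quoted for the two configurations follow from simple arithmetic, which can be reproduced as below. The only figure not on this slide is the 4-bit CBF counter width, taken from the earlier slide on counter overflow; the variable names are illustrative.

```python
# dlCBF configuration from the slide
d, buckets, cells = 4, 2048, 8
fingerprint_bits, counter_bits = 14, 2
target_load = 0.75

dlcbf_bits = d * buckets * cells * (fingerprint_bits + counter_bits)
items = int(d * buckets * cells * target_load)

# CBF configuration: 13.5 counters per element, 4 bits per counter
cbf_counters_per_item = 13.5
cbf_counter_bits = 4
cbf_bits = cbf_counters_per_item * cbf_counter_bits * items

print(f"dlCBF size : {dlcbf_bits} bits")
print(f"items      : {items}")
print(f"CBF size   : {cbf_bits:.0f} bits")
print(f"CBF / dlCBF: {cbf_bits / dlcbf_bits:.2f}")
print(f"dlCBF bits per item: {dlcbf_bits / items:.2f}")
```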


Questions?