An Improved Construction for Counting Bloom Filters Flavio Bonomi Michael Mitzenmacher Rina Panigrahy Sushil Singh George Varghese Presented by: Sailesh

An Improved Construction for Counting Bloom Filters

Flavio Bonomi, Michael Mitzenmacher

Rina Panigrahy, Sushil Singh

George Varghese

Presented by: Sailesh Kumar

2 - Sailesh Kumar - 04/18/23

Bloom Filter

Store a set $S = \{x_1, x_2, x_3, \dots, x_n\}$ drawn from some universe U, so that we are able to answer queries of the form:

» Is x a member of S?

A Bloom filter is a technique that can answer such queries with:
» A small amount of space, independent of element size
» Constant query time
» A false positive probability (some probability of a wrong "yes" answer; a member is never reported as absent)

Alternative to hashing with some interesting trade-offs


Bloom Filter

[Figure: inserting element X. Hash functions H1, H2, H3, H4, …, Hk each map X to one position in the m-bit array, and those k bits are set to 1.]


Bloom Filter

[Figure: inserting element Y. Its k hash positions H1(Y), …, Hk(Y) are also set to 1 in the m-bit array.]


Bloom Filter

[Figure: querying X. All k positions H1(X), …, Hk(X) are 1, so the filter reports a match.]


Bloom Filter

[Figure: querying W, which was never inserted. Its k positions happen to have been set to 1 by X and Y, so the filter reports a match (a false positive).]
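The insert and query steps shown in the slides above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's code: the k hash functions H1, …, Hk are simulated by salting SHA-256 with the hash index, and the class and method names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: an m-bit array and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # the m-bit array

    def _positions(self, item: str) -> list[int]:
        # Assumption: H1..Hk are simulated by salting SHA-256 with the index i.
        out = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            out.append(int.from_bytes(digest[:8], "big") % self.m)
        return out

    def insert(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1  # set each of the k bits

    def query(self, item: str) -> bool:
        # "Match" only if all k bits are set: possibly a false positive,
        # never a false negative.
        return all(self.bits[p] for p in self._positions(item))


bf = BloomFilter(m=64, k=4)
bf.insert("X")
bf.insert("Y")
print(bf.query("X"))  # True
```

A query for an element such as "W" that was never inserted usually returns False, but can return True when all of its k bits were set by other insertions.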


How many Hash Functions?

k = number of hash functions
n = total number of elements
m = number of bits in the array

The objective is to pick k so as to minimize the false positive probability (fpp), which is approximately $(1 - e^{-kn/m})^k$

It is fairly simple to derive that the optimal choice is $k = (\ln 2)\, m/n$

» For the optimal k, the fpp is approximately $(0.6185)^{m/n}$
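Substituting the optimal k into the standard approximation for the false positive probability shows where the constant 0.6185 comes from. This only evaluates the fpp at the stated optimum; it does not repeat the minimization itself.

```latex
% fraction of bits still 0 after inserting n elements with k hashes each:
%   (1 - 1/m)^{kn} \approx e^{-kn/m}
p \approx \left(1 - e^{-kn/m}\right)^{k}

% substitute k = (\ln 2)\, m/n, so that kn/m = \ln 2 and e^{-kn/m} = 1/2:
p \approx \left(\tfrac{1}{2}\right)^{k}
  = \left(\tfrac{1}{2}\right)^{(\ln 2)\, m/n}
  = \left(e^{-(\ln 2)^2}\right)^{m/n}
  \approx (0.6185)^{m/n}
```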


How many Hash Functions?

[Figure: false positive rate (0 to 0.1) versus number of hash functions (0 to 10), for m/n = 8]

Optimal k = 8 ln 2 ≈ 5.5
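The curve in the plot can be reproduced numerically from the approximation $(1 - e^{-kn/m})^k$. A minimal sketch, assuming m/n = 8 as in the plot; the function names are illustrative. Since k must be an integer in practice, it also reports the best integer choice near the real-valued optimum 8 ln 2.

```python
import math


def false_positive_rate(k: float, bits_per_element: float) -> float:
    """Approximation p = (1 - e^{-k n / m})^k, with m/n = bits_per_element."""
    return (1.0 - math.exp(-k / bits_per_element)) ** k


def optimal_k(bits_per_element: float) -> float:
    """Real-valued optimum k = (ln 2) * m / n."""
    return math.log(2) * bits_per_element


ratio = 8  # m/n = 8, as in the plot
for k in range(1, 11):
    print(f"k={k:2d}  fpp={false_positive_rate(k, ratio):.4f}")

best_int_k = min(range(1, 11), key=lambda k: false_positive_rate(k, ratio))
print("real-valued optimum:", round(optimal_k(ratio), 2))  # 5.55
print("best integer k:", best_int_k)  # 6
```

The curve is flat near its minimum: k = 5 and k = 6 give nearly the same rate (about 0.0217 and 0.0216), so rounding the optimum either way costs little.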


Counting Bloom Filter

Bloom filters do not support deletes: clearing a bit could also erase a bit set by another element and cause false negatives
» Use a counting Bloom filter (CBF)

Use counters instead of bits in the array
» Instead of setting the bits, increment the counters

During a query, a counter > 0 is treated as a set bit


Counting Bloom Filter

[Figure: inserting X increments each of its k counters, H1(X), …, Hk(X), from 0 to 1 in the m-counter array]


Bloom Filter

[Figure: inserting Y increments its k counters in the m-counter array; counters shared with X now hold 2]

Deletes are straightforward: just decrement the counters
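A counting Bloom filter changes only the array type and adds a delete operation. The following is a minimal sketch, not the paper's implementation: salted SHA-256 again stands in for H1, …, Hk, the names are hypothetical, and Python integers are unbounded, so the 4-bit counter width is not enforced.

```python
import hashlib


class CountingBloomFilter:
    """Sketch of a counting Bloom filter: an array of m counters instead of m bits."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        # Python ints do not overflow; a real CBF would use small (e.g. 4-bit) counters.
        self.counters = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Assumption: salted SHA-256 stands in for H1..Hk.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def insert(self, item: str) -> None:
        for p in self._positions(item):
            self.counters[p] += 1  # increment instead of setting a bit

    def delete(self, item: str) -> None:
        # Only delete items that were actually inserted; deleting anything
        # else corrupts the counters.
        for p in self._positions(item):
            self.counters[p] -= 1

    def query(self, item: str) -> bool:
        # A counter > 0 is treated as a set bit.
        return all(self.counters[p] > 0 for p in self._positions(item))


cbf = CountingBloomFilter(m=1000, k=4)
cbf.insert("X")
cbf.insert("Y")
cbf.delete("X")
print(cbf.query("Y"))  # True: deleting X never drops Y's counters to 0
```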


Improved Counting Bloom Filter

4-bit counters ensure with very high probability (w.v.h.p.) that counters do not overflow
» A 4x increase in space compared to a Bloom filter

Construct an alternative structure, the d-left counting Bloom filter (dlCBF), that is about twice as compact as a CBF
» Based on d-left hashing and fingerprinting

We need to understand d-left hashing and fingerprinting


Fingerprinting

Temporarily assume that we have a perfect hash function h
» Use some random function to compute c-bit fingerprints
» $F : U \to [2^c]$

» False positive probability = $1/2^c$

» 2x more compact than a Bloom filter

» A perfect hash function h is not easy to compute
– Use near-perfect hashing (d-left) instead

[Figure: the perfect hash h maps Elements 1 to 5 to distinct slots, each holding that element's fingerprint; the slots hold, in order, Fingerprint(4), Fingerprint(5), Fingerprint(2), Fingerprint(1), Fingerprint(3)]
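A small simulation can check the $1/2^c$ false positive probability of the fingerprint-plus-perfect-hash idea. This is a sketch under loud assumptions: the perfect hash h is only simulated (members use their position in S, non-members land in a pseudo-random slot), SHA-256 stands in for the random fingerprint function F, and all names are hypothetical. A real perfect hash function would not store S itself.

```python
import hashlib


def _h64(tag: str, item: str) -> int:
    # 64-bit value from SHA-256 with a domain-separating tag (an assumption).
    return int.from_bytes(hashlib.sha256(f"{tag}|{item}".encode()).digest()[:8], "big")


def fingerprint(item: str, c: int) -> int:
    """F : U -> [2^c], a c-bit fingerprint."""
    return _h64("F", item) % (1 << c)


class PerfectHashFingerprints:
    """One slot per element of a static set S; slot h(x) stores F(x).

    The perfect hash h is only SIMULATED: members use their position in S,
    non-members land in a pseudo-random slot.
    """

    def __init__(self, items: list[str], c: int) -> None:
        self.c = c
        self._index = {x: i for i, x in enumerate(items)}  # simulation of h on S
        self.table = [fingerprint(x, c) for x in items]

    def h(self, item: str) -> int:
        if item in self._index:
            return self._index[item]
        return _h64("h", item) % len(self.table)

    def query(self, item: str) -> bool:
        # Match iff the fingerprint stored in slot h(item) equals F(item).
        return self.table[self.h(item)] == fingerprint(item, self.c)


S = [f"member{i}" for i in range(1000)]
pf = PerfectHashFingerprints(S, c=8)
trials = 20000
false_pos = sum(pf.query(f"other{i}") for i in range(trials)) / trials
print(f"measured {false_pos:.4f} vs 1/2^c = {1 / 2**8:.4f}")
```

Each slot costs only c bits, which is why the approach is attractive once a (near-)perfect hash is available.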


d-left hashing

Use d equal-sized tables
Use d different hash functions and choose one bucket from each table
A bucket can store multiple elements
Store the element in the least loaded bucket (break ties to the left)

Interesting properties:
» Very small maximum load, O(log log n)
» Maximum load is close to the average load even for small d, such as 4
» 80% space utilization with d = 4
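The d-left insertion rule can be sketched as follows. This is illustrative only: SHA-256 salted with the table number stands in for the d hash functions, and `bucket_choice` and `d_left_insert` are hypothetical names. The run fills d = 4 tables to an average of 3 items per bucket and reports the resulting maximum load.

```python
import hashlib


def bucket_choice(item: str, table: int, buckets_per_table: int) -> int:
    # One hash function per table, simulated with salted SHA-256 (an assumption).
    digest = hashlib.sha256(f"t{table}|{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets_per_table


def d_left_insert(tables: list[list[list[str]]], item: str) -> tuple[int, int]:
    """Store item in the least loaded of its d candidate buckets (ties go left)."""
    candidates = [(t, bucket_choice(item, t, len(table))) for t, table in enumerate(tables)]
    loads = [len(tables[t][b]) for t, b in candidates]
    t, b = candidates[loads.index(min(loads))]  # index() finds the leftmost minimum
    tables[t][b].append(item)
    return t, b


d, B = 4, 1024
tables: list[list[list[str]]] = [[[] for _ in range(B)] for _ in range(d)]
n = 3 * d * B  # average load of 3 items per bucket
for i in range(n):
    d_left_insert(tables, f"item{i}")

max_load = max(len(bucket) for table in tables for bucket in table)
print("average load:", n / (d * B), "maximum load:", max_load)
```

Breaking ties to the left is what distinguishes d-left from plain "choose the least loaded of d" hashing; it slightly biases items toward the earlier tables.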



Use d-left hashing: d hash tables, each containing B buckets

» Note that a bucket contains multiple cells; a cell can store a fingerprint and a small counter

In order to store an element, we compute its fingerprint
» The fingerprint consists of two components:

– Bucket index in [1, B]
– Remainder in [1, R], thus $\log_2 R$ bits, stored explicitly

» We use a separate bucket index for each table but the same remainder in every table

» Use the d-left insertion policy and augment each fingerprint with a counter; if a stored fingerprint already matches, increment its counter
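To make the two components concrete, here is a sketch of how one element's d locations might be derived in this first scheme: d independent bucket indices plus one shared remainder. SHA-256 with a per-table tag stands in for the hash functions (an assumption), `naive_locations` is a hypothetical name, and indices are 0-based rather than the 1-based ranges above.

```python
import hashlib


def _h(tag: str, item: str) -> int:
    # 64-bit value from salted SHA-256 (assumption: stands in for the hash functions).
    return int.from_bytes(hashlib.sha256(f"{tag}|{item}".encode()).digest()[:8], "big")


def naive_locations(item: str, d: int, B: int, R: int) -> list[tuple[int, int]]:
    """Return (bucket index, remainder) for each of the d tables.

    Bucket indices come from d separate hash functions; the remainder is the
    same in every table and is the part stored explicitly (log2 R bits).
    """
    remainder = _h("rem", item) % R
    return [(_h(f"bucket{t}", item) % B, remainder) for t in range(d)]


locs = naive_locations("x", d=4, B=2048, R=1 << 14)
print(locs)
```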



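Example with d = 2; H(e) lists (bucket index, remainder) for table 1 and table 2:

Element x: H(x) = (3, 7), (4, 7); we store remainder 7 in bucket 3 of the first table

Element y: H(y) = (1, 5), (5, 5); we store remainder 5 in bucket 1 of the first table

Element z: H(z) = (1, 7), (4, 7); bucket 1 of the first table already holds y, so we store remainder 7 in bucket 4 of the second table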

Now, if we try to delete x, we do not know whether the remainder 7 in table 1 (bucket 3) or the one in table 2 (bucket 4) has to be removed
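The ambiguity can be reproduced directly from the example's hash values. In this sketch the H table is copied from the example; the insert logic follows the naive policy (increment on a matching remainder, otherwise least loaded bucket with ties to the left), and the function names are hypothetical.

```python
# Hash values copied from the example: for each element, (bucket index, remainder)
# in table 1 and in table 2 (so d = 2).
H = {
    "x": [(3, 7), (4, 7)],
    "y": [(1, 5), (5, 5)],
    "z": [(1, 7), (4, 7)],
}

# tables[t] maps bucket index -> {remainder: counter}; buckets are created on demand
tables: list[dict[int, dict[int, int]]] = [{}, {}]


def insert(e: str) -> int:
    """Naive d-left insert; returns the 0-based table that received e."""
    locs = H[e]
    for t, (b, r) in enumerate(locs):
        bucket = tables[t].get(b, {})
        if r in bucket:            # same remainder already at one of e's locations
            bucket[r] += 1
            return t
    loads = [len(tables[t].get(b, {})) for t, (b, _) in enumerate(locs)]
    t = loads.index(min(loads))    # least loaded bucket, ties broken to the left
    b, r = locs[t]
    tables[t].setdefault(b, {})[r] = 1
    return t


def cells_matching(e: str) -> list[int]:
    """0-based tables whose candidate bucket for e holds e's remainder."""
    return [t for t, (b, r) in enumerate(H[e]) if r in tables[t].get(b, {})]


placed = {e: insert(e) + 1 for e in ("x", "y", "z")}
print(placed)                                # {'x': 1, 'y': 1, 'z': 2}
print([t + 1 for t in cells_matching("x")])  # [1, 2]: deleting x is ambiguous
```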



Solve the problem by breaking the hash operation into two phases

1st phase: compute a single true fingerprint

2nd phase: to obtain the d locations, apply permutations $P_1, \dots, P_d$ to the true fingerprint

A permutation of a set is a one-to-one map of the set onto itself

This simple modification enables proper delete operations

Example permutation of {1, 2, 3, 4, 5}: 1 → 3, 2 → 5, 3 → 1, 4 → 2, 5 → 4
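A sketch of the two-phase construction follows. Several choices here are assumptions, not taken from the slides: SHA-256 produces the true fingerprint, each $P_i$ is an affine map $f \mapsto (a_i f + c_i) \bmod BR$ with odd $a_i$ (a permutation because $BR$ is a power of two), and the class name is hypothetical. Each $P_i(f)$ is split into a bucket index and a remainder; the delete step checks that exactly one stored cell matches.

```python
import hashlib


def _h64(tag: str) -> int:
    """64-bit value from SHA-256 (stand-in for the hash functions; an assumption)."""
    return int.from_bytes(hashlib.sha256(tag.encode()).digest()[:8], "big")


class DLeftCBF:
    """Sketch of a d-left counting Bloom filter built from permutations P_1..P_d."""

    def __init__(self, d: int = 4, bucket_bits: int = 11, remainder_bits: int = 14,
                 cells_per_bucket: int = 8) -> None:
        self.d = d
        self.B = 1 << bucket_bits          # buckets per table (2048 by default)
        self.R = 1 << remainder_bits       # remainders in [0, R)
        self.space = self.B * self.R       # true fingerprints live in [0, B*R)
        self.cells = cells_per_bucket
        self.mult = [_h64(f"a{i}") | 1 for i in range(d)]  # odd multipliers
        self.add = [_h64(f"c{i}") for i in range(d)]
        # tables[i][b] maps remainder -> counter for bucket b of table i
        self.tables = [[{} for _ in range(self.B)] for _ in range(d)]

    def _locations(self, item: str) -> list[tuple[int, int]]:
        f = _h64("f|" + item) % self.space  # phase 1: true fingerprint
        # phase 2: P_i(f), split into (bucket index, remainder)
        return [divmod((a * f + c) % self.space, self.R) for a, c in zip(self.mult, self.add)]

    def _matches(self, locs: list[tuple[int, int]]) -> list[int]:
        return [i for i, (b, r) in enumerate(locs) if r in self.tables[i][b]]

    def insert(self, item: str) -> None:
        locs = self._locations(item)
        hits = self._matches(locs)
        if hits:                            # same true fingerprint already stored
            b, r = locs[hits[0]]
            self.tables[hits[0]][b][r] += 1
            return
        loads = [len(self.tables[i][b]) for i, (b, _) in enumerate(locs)]
        i = loads.index(min(loads))         # least loaded, ties to the left
        if loads[i] >= self.cells:
            raise OverflowError("all candidate buckets are full")
        b, r = locs[i]
        self.tables[i][b][r] = 1

    def delete(self, item: str) -> None:
        """Delete an item that was previously inserted."""
        locs = self._locations(item)
        hits = self._matches(locs)
        if len(hits) != 1:
            raise KeyError(f"expected exactly one matching cell, found {len(hits)}")
        b, r = locs[hits[0]]
        cell = self.tables[hits[0]][b]
        cell[r] -= 1
        if cell[r] == 0:
            del cell[r]

    def query(self, item: str) -> bool:
        return bool(self._matches(self._locations(item)))


dl = DLeftCBF()
dl.insert("x")
dl.insert("x")         # second insert finds the stored remainder and bumps its counter
dl.delete("x")
print(dl.query("x"))  # True: one copy of x remains
```

Because each $P_i$ is a bijection, two elements can share a (bucket, remainder) pair in some table only if they share the same true fingerprint, and in that case they share one counter rather than occupying two cells.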



Claim. When deleting an element in the set, only one remainder corresponding to the element will exist in the tables.

Proof (write $f_x$ for the true fingerprint of x):
» Suppose not. Then there is some element x ∈ S whose remainder, stored in table j, is to be deleted, and at the same time another element y ∈ S such that $P_i(f_x) = P_i(f_y)$ for some i ≠ j.

» Since the $P_i$ are permutations, we must have $f_x = f_y$, so x and y share the same true fingerprint.

» Suppose x was inserted before y. Then, when y was inserted, its location $P_j(f_y) = P_j(f_x)$ in table j would have matched the remainder of x, so that counter would have been incremented instead of storing y in table i, contradicting our assumption. The case where y was inserted first is symmetric.


Simulation Results

Target is fpp < 0.002

dlCBF configuration
» d = 4 tables with 2048 buckets each
» Each bucket has 8 cells
» Target load = 0.75 (6 items per bucket)
» 14-bit remainder r per cell
» 2-bit counter per cell to handle identical fingerprints

» Total size of structure = $2^{20}$ bits; total items = $3 \times 2^{14}$

CBF configuration
» 13.5 counters per element (9 hash functions)
» With 4-bit counters, $3 \times 2^{14}$ elements need about $2.5 \times 2^{20}$ bits, 2.5 times the size of the dlCBF
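The sizes on this slide can be checked with a few lines of arithmetic. The dlCBF false positive figure below is only a rough estimate and an assumption of this sketch (a union bound over the d candidate buckets, each holding about 6 remainders); the CBF figure uses the standard approximation $(1 - e^{-kn/m})^k$ with 4-bit counters.

```python
import math

# dlCBF configuration from the slide
d, buckets, cells = 4, 2048, 8
remainder_bits, counter_bits = 14, 2
dlcbf_bits = d * buckets * cells * (remainder_bits + counter_bits)
items = d * buckets * 6            # target load 0.75: 6 of 8 cells per bucket
print(dlcbf_bits == 2**20, items == 3 * 2**14)  # True True

# Rough dlCBF false positive estimate (assumption: union bound over the d
# candidate buckets, each holding about 6 remainders of remainder_bits bits)
dl_fpp = d * 6 / 2**remainder_bits
print(f"dlCBF fpp ~ {dl_fpp:.5f}")

# Standard CBF: 13.5 counters per element, 4-bit counters, 9 hash functions
counters_per_item, k = 13.5, 9
cbf_bits = items * counters_per_item * 4
print(cbf_bits / 2**20, cbf_bits / dlcbf_bits)  # 2.53125 2.53125
k_opt = math.log(2) * counters_per_item          # about 9.36, so 9 hash functions
cbf_fpp = (1 - math.exp(-k / counters_per_item)) ** k
print(f"CBF fpp ~ {cbf_fpp:.5f}")
```

Both estimates come out below the 0.002 target, while the CBF needs roughly 2.5 times as many bits.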


Questions?
