An Improved Construction for Counting Bloom Filters
Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, George Varghese
Presented by: Sailesh Kumar
2 - Sailesh Kumar - 04/18/23
Bloom Filter
Store a set S = {x1, x2, x3, …, xn} over some universe U, so that we are able to answer queries of the form:
» Is x a member of S?
A Bloom filter is a technique that can answer such queries with:
» A small amount of space, independent of element size
» Constant query time
» A small false positive probability (a non-member may be reported as a member)
Alternative to hashing with some interesting trade-offs
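A minimal Python sketch of the structure described above. The class name, the use of SHA-256 with double hashing to derive the k positions, and the sizes in the demo are illustrative assumptions, not something the slides specify.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: an m-bit array probed by k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        # One byte per bit, for readability rather than compactness.
        self.bits = bytearray(m)

    def _positions(self, item: str) -> list[int]:
        # Assumption: k positions are derived from one SHA-256 digest
        # via double hashing (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True means "possibly in S"; False means "definitely not in S".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=5)
bf.add("x")
print("x" in bf)  # True: an inserted element is always reported as a member
```

A query for an element that was never added usually returns False, but it can return True when other insertions happen to have set all k of its bits.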
Bloom Filter
[Figure: element X is hashed by H1, H2, H3, H4, ..., Hk into the m-bit array; the selected bits are set to 1.]
Bloom Filter
[Figure: element Y is hashed by H1, H2, H3, H4, ..., Hk into the m-bit array; its bits are set to 1 alongside those already set.]
Bloom Filter
[Figure: query for X; every bit of the m-bit array selected by H1, H2, H3, H4, ..., Hk is 1, so the filter reports a match.]
Bloom Filter
[Figure: query for W; every bit of the m-bit array selected by H1, H2, H3, H4, ..., Hk happens to be 1, so the filter reports a match (false positive).]
How many Hash Functions?
k = no. of hash functions
n = total no. of elements
m = no. of bits in the array
Objective is to pick k so that we minimize the false positive prob.
It is fairly simple to derive that the optimal k = (ln 2) m/n
» For the optimal k, the fpp is approx. (0.6185)^(m/n)
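The trade-off can be checked numerically. The sketch below uses the standard approximation fpp ≈ (1 - e^(-kn/m))^k, which the slides do not state explicitly; the function names are illustrative.

```python
import math


def optimal_k(m: int, n: int) -> float:
    """Real-valued k that minimizes the approximate false positive probability."""
    return math.log(2) * m / n


def false_positive_prob(m: int, n: int, k: int) -> float:
    """Standard approximation (1 - e^(-k n / m))^k for a Bloom filter."""
    return (1.0 - math.exp(-k * n / m)) ** k


# m/n = 8 bits per element.
print(optimal_k(8, 1))
for k in range(1, 11):
    print(k, false_positive_prob(8, 1, k))
print(0.6185 ** 8)  # closed-form value at the (non-integer) optimum
```

Since k must be an integer in practice, the two neighbours of (ln 2) m/n are the natural candidates; for m/n = 8 they give nearly the same false positive probability.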
How many Hash Functions?
[Plot: false positive rate (y-axis, 0 to 0.1) versus number of hash functions (x-axis, 0 to 10) for m/n = 8. Optimal k = 8 ln 2 ≈ 5.5.]
Counting Bloom Filter
Bloom filters do not support deletes
» Use a counting Bloom filter
Use counters instead of bits in the array
» Instead of setting the bits, increment the counters
During a query, a counter > 0 means the corresponding bit is treated as set
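A minimal sketch of a counting Bloom filter with insert, delete, and query. The class name, the SHA-256 double hashing, and the choice to let counters saturate (rather than wrap) at their maximum are assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Toy counting Bloom filter: m small counters probed by k hash functions."""

    def __init__(self, m: int, k: int, counter_bits: int = 4) -> None:
        self.m = m
        self.k = k
        self.max_count = (1 << counter_bits) - 1
        self.counters = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Assumption: double hashing over one SHA-256 digest gives the k positions.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            # Saturate instead of overflowing; a saturated counter stays stuck.
            if self.counters[pos] < self.max_count:
                self.counters[pos] += 1

    def remove(self, item: str) -> None:
        # Only meaningful for items that were actually inserted.
        for pos in self._positions(item):
            if 0 < self.counters[pos] < self.max_count:
                self.counters[pos] -= 1

    def __contains__(self, item: str) -> bool:
        # A counter > 0 plays the role of a set bit.
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=256, k=4)
cbf.add("x")
cbf.remove("x")
print("x" in cbf)  # False: the only insertion has been undone
```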
Counting Bloom Filter
[Figure: element X is hashed by H1, H2, H3, H4, ..., Hk into the m-counter array; the selected counters are incremented to 1.]
Bloom Filter
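[Figure: element Y is hashed by H1, H2, H3, H4, ..., Hk into the m-counter array; the selected counters are incremented, and some counters now read 2.]
Deletes are straightforward: just decrement the counters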
Improved Counting Bloom Filter
4-bit counters ensure, with very high probability, that counters do not overflow
» 4x increase in space compared to a Bloom filter
Construct an alternative to the counting Bloom filter (CBF) that is 2 times more compact
» Based on d-left hashing and the fingerprinting technique
We need to understand d-left hashing and fingerprinting
Fingerprinting
Temporarily assume that we have a perfect hash function h
» Use some random function to compute c-bit fingerprints
» F() : U -> [2^c]
» False positive prob. = 1/2^c
» 2x more compact than a Bloom filter
» Not easy to compute the perfect hash function h
– Use near-perfect hashing (d-left)
[Figure: the perfect hash function h maps Element 1 through Element 5 to distinct cells, which hold Fingerprint(4), Fingerprint(5), Fingerprint(2), Fingerprint(1), Fingerprint(3).]
d-left hashing
Use d equal-sized tables
Use d different hash functions and choose one bucket from each table
A bucket can store multiple elements
Store the element in the least loaded bucket (break ties to the left)
Interesting properties:
» Very small maximum load, O(log log n)
» Maximum load is close to the average load even for small d such as 4
» 80% space utilization with d = 4
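The insertion policy above can be simulated in a few lines. In this sketch, fully random bucket choices stand in for the d hash functions, and the function name and parameters are illustrative assumptions.

```python
import random


def d_left_max_load(n_items: int, d: int, buckets_per_table: int, seed: int = 0) -> int:
    """Insert n_items with the d-left policy and return the maximum bucket load."""
    rng = random.Random(seed)
    tables = [[0] * buckets_per_table for _ in range(d)]
    for _ in range(n_items):
        # Assumption: one uniformly random bucket per table models the d hashes.
        choices = [rng.randrange(buckets_per_table) for _ in range(d)]
        loads = [tables[t][choices[t]] for t in range(d)]
        # index(min(...)) returns the first minimum, so ties break to the left.
        t = loads.index(min(loads))
        tables[t][choices[t]] += 1
    return max(max(table) for table in tables)


# d = 4 tables of 2048 buckets, average load of 6 items per bucket.
print(d_left_max_load(n_items=4 * 2048 * 6, d=4, buckets_per_table=2048))
# Same number of buckets in a single table (d = 1), for comparison.
print(d_left_max_load(n_items=4 * 2048 * 6, d=1, buckets_per_table=4 * 2048))
```

Comparing the two printed values gives a feel for how much closer the maximum load stays to the average when d choices are available.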
Improved Counting Bloom Filter
Use d-left hashing: d hash tables, each containing B buckets
» Note that a bucket contains multiple cells; a cell can store a fingerprint and a small counter
In order to store an element, we compute its fingerprint
» The fingerprint consists of two components
– Bucket index – [1, B]
– Remainder – [1, R], thus log2 R bits, stored explicitly
» We use separate bucket index for each table but identical remainders
» Use d-left insertion policy; augment fingerprint with counters; if fingerprint matches, then increment the counter
Improved Counting Bloom Filter
Each pair is (bucket index, remainder), one pair per table:
Element x: H(x) = (3, 7), (4, 7); we store the element in the first table (remainder 7)
Element y: H(y) = (1, 5), (5, 5); we store the element in the first table (remainder 5)
Element z: H(z) = (1, 7), (4, 7); we store the element in the second table (remainder 7)
Now, if we try to delete x, we do not know whether the fingerprint in table 1 or table 2 has to be removed
Improved Counting Bloom Filter
Solve the problem by breaking the hash operation into 2 phases
1st phase: compute a single true fingerprint
2nd phase: to obtain d locations, use permutations P1, … Pd
A permutation of a set is a one-to-one map of the set onto itself
This simple modification enables proper delete operations
Example permutation of {1, 2, 3, 4, 5}: 1 -> 3, 2 -> 5, 3 -> 1, 4 -> 2, 5 -> 4
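A minimal sketch of the two-phase scheme, under several assumptions that are not in the slides: the true fingerprint is taken from SHA-256, each permutation is P_i(f) = a_i * f mod 2^q with a fixed odd multiplier a_i (a bijection on q-bit values), a bucket is modelled as a dict from remainder to counter, and counters are unbounded rather than 2 bits wide.

```python
import hashlib


class DLeftCBF:
    """Toy d-left counting Bloom filter using one true fingerprint and d permutations."""

    def __init__(self, bucket_bits=11, remainder_bits=14, cells=8,
                 multipliers=(0x9E3779B1, 0x85EBCA6B, 0xC2B2AE35, 0x27D4EB2F)):
        self.remainder_bits = remainder_bits
        self.fp_bits = bucket_bits + remainder_bits
        self.cells = cells
        # Assumption: odd multipliers make f -> a * f mod 2^fp_bits a permutation.
        self.multipliers = multipliers
        # tables[i][bucket] maps remainder -> counter.
        self.tables = [[{} for _ in range(1 << bucket_bits)] for _ in multipliers]

    def _true_fingerprint(self, item):
        # Phase 1: a single true fingerprint of fp_bits bits.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % (1 << self.fp_bits)

    def _locations(self, item):
        # Phase 2: permute the fingerprint once per table, then split it
        # into (bucket index, remainder).
        f = self._true_fingerprint(item)
        fp_mask = (1 << self.fp_bits) - 1
        rem_mask = (1 << self.remainder_bits) - 1
        locations = []
        for a in self.multipliers:
            p = (a * f) & fp_mask
            locations.append((p >> self.remainder_bits, p & rem_mask))
        return locations

    def add(self, item):
        locations = self._locations(item)
        # If the fingerprint is already stored somewhere, bump its counter.
        for table, (bucket, rem) in zip(self.tables, locations):
            if rem in table[bucket]:
                table[bucket][rem] += 1
                return
        # Otherwise use the least loaded candidate bucket, ties to the left.
        loads = [len(t[b]) for t, (b, _) in zip(self.tables, locations)]
        i = loads.index(min(loads))
        if loads[i] >= self.cells:
            raise OverflowError("all candidate buckets are full")
        bucket, rem = locations[i]
        self.tables[i][bucket][rem] = 1

    def remove(self, item):
        for table, (bucket, rem) in zip(self.tables, self._locations(item)):
            if rem in table[bucket]:
                table[bucket][rem] -= 1
                if table[bucket][rem] == 0:
                    del table[bucket][rem]
                return True
        return False

    def __contains__(self, item):
        return any(rem in table[bucket]
                   for table, (bucket, rem) in zip(self.tables, self._locations(item)))


f = DLeftCBF()
f.add("x")
f.add("x")
f.remove("x")
print("x" in f)  # True: one of the two insertions of "x" remains
```

Because every table location is a permutation of the same true fingerprint, two elements that collide in one table must share the true fingerprint, which is what makes the delete unambiguous.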
Improved Counting Bloom Filter
Claim. When deleting an element in the set, only one remainder corresponding to the element will exist in the table.
Proof:
» Suppose not. Then there is some element x ∈ S to be deleted whose remainder is stored in table j, and at the same time another element y ∈ S such that P_i(f_x) = P_i(f_y) for some i ≠ j.
» Since the P_i are permutations, we must have f_x = f_y, so x and y share the same true fingerprint.
» Suppose, without loss of generality, that x was inserted before y; in this case, when y is inserted, the counter in table j associated with the remainder of x would be incremented, contradicting our assumption.
Simulation Results
Target is fpp < 0.002
dlCBF configuration
» d = 4 tables with 2048 buckets each
» Each bucket has 8 cells
» Target load = 0.75 (6 items per bucket)
» 14-bit fingerprint remainder, r
» 2-bit counter to handle identical fingerprints
» Total size of structure = 2^20 bits. Total items = 3 x 2^14
CBF configuration
» 13.5 counters per element (9 hash functions)
» For 3 x 2^14 elements, we will need 2.5 x 2^20 bits, 2.5 times the dlCBF
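The sizes quoted above can be reproduced with a few lines of arithmetic. The 4-bit CBF counter width is taken from the earlier slide; the dlCBF false positive figure is only a rough union-bound estimate (about 4 x 6 stored remainders compared per lookup), which is an assumption of this sketch rather than a number from the slides.

```python
# dlCBF configuration
d, buckets, cells, items_per_bucket = 4, 2048, 8, 6
remainder_bits, counter_bits = 14, 2

dlcbf_bits = d * buckets * cells * (remainder_bits + counter_bits)
items = d * buckets * items_per_bucket

# CBF configuration: 13.5 counters per element, 4 bits per counter.
cbf_bits = 13.5 * 4 * items

# Rough estimate: a lookup compares against about d * 6 stored remainders.
dlcbf_fpp_estimate = d * items_per_bucket / 2 ** remainder_bits

print(dlcbf_bits == 2 ** 20)   # True
print(items == 3 * 2 ** 14)    # True
print(cbf_bits / dlcbf_bits)   # 2.53125
print(dlcbf_fpp_estimate)      # 0.00146484375
```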
Questions?