Upload
hoangkiet
View
220
Download
0
Embed Size (px)
Citation preview
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Bloom Filters: general theory and variants
Information RetrievalWherever a list or set is used, and space is a consideration, a Bloom Filter should be considered.
When using a Bloom Filter, consider the effects of false positives.
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Indice
1 IntroductionThe problem
2 Bloom Filters: the classicMain ideaMathematics
3 Bloom Filters: variantsSpectral Bloom FiltersBloomier Bloom FiltersOthers
4 ConclusionsApplicationsReferences
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
The problem
Index
1 IntroductionThe problem
2 Bloom Filters: the classicMain ideaMathematics
3 Bloom Filters: variantsSpectral Bloom FiltersBloomier Bloom FiltersOthers
4 ConclusionsApplicationsReferences
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
The problem
Membership query
Definition
The Membership Problem
� Given a set S and an element y : y?∈S
� Given a set S compute its characteristic function χs
χs(y) =
{1, if y ∈ S0, if y 6∈ S
well-known solutions
Linear ScanDeterministic ArraysHash Functions
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
The problem
Membership query
Definition
The Membership Problem
� Given a set S and an element y : y?∈S
� Given a set S compute its characteristic function χs
χs(y) =
{1, if y ∈ S0, if y 6∈ S
well-known solutions
Linear ScanDeterministic ArraysHash Functions
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
The problem
Well-known Solutions
Linear Scan
not in our world
Deterministic Arrays (exactly compute χs)
elements of S belong to a finite universea boolean array big as the universemap elements of universe on it
Hash Functions (approximate χs)
use α bits for each elements of S (usually α = log(n))sort hashed values (sort α-tuples)what may happen with collisions?
Bloom Filters use these as starting point..
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
The problem
Well-known Solutions
Linear Scan
not in our world
Deterministic Arrays (exactly compute χs)
elements of S belong to a finite universea boolean array big as the universemap elements of universe on it
Hash Functions (approximate χs)
use α bits for each elements of S (usually α = log(n))sort hashed values (sort α-tuples)what may happen with collisions?
Bloom Filters use these as starting point..
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
The problem
Well-known Solutions
Linear Scan
not in our world
Deterministic Arrays (exactly compute χs)
elements of S belong to a finite universea boolean array big as the universemap elements of universe on it
Hash Functions (approximate χs)
use α bits for each elements of S (usually α = log(n))sort hashed values (sort α-tuples)what may happen with collisions?
Bloom Filters use these as starting point..
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Reminder
Wherever a list or set is used, and space is aconsideration, a Bloom Filter should be considered.
When using a Bloom Filter, consider the effects of falsepositives.
Bloom[1970]
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Index
1 IntroductionThe problem
2 Bloom Filters: the classicMain ideaMathematics
3 Bloom Filters: variantsSpectral Bloom FiltersBloomier Bloom FiltersOthers
4 ConclusionsApplicationsReferences
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Preconditions
given a set of objects S = {x1, . . . , xn}no restrictions on objects
a vector B of m bits were bi ∈ {0, 1}will discuss about m-value
suppose we have k hash functions h1, . . . , hk
each hi is defined as hi : U ⊇ S → [1;m]hi indexes the B vector
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Building vector B
how to build B
∀i . bi = 1 ⇔ ∃(j , t). hj(xt) = i
B
xi
1 1 100 01
xj
hi1(xi)
hik(xi)hj1
(xj) hjk(xj)
. . .
. . .
. . . . . . . . . . . .
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Building vector B
procedure buildbeginfor each s in S do // is Θ(|S |)for each h in H do // is O(1)B[h(s)] = 1; // suppose is O(1)
donedone
end
All the build is Θ(|S |) = Θ(n) time and Θ(|B|) = Θ(m) space
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Searching into vector B
how to search for y
y ∈ S ⇔ ∀i = 1, . . . , k. bhi (y) = 1
B
y
1 1 100 01
. . .
. . . . . . . . . . . .
y 6∈ S
S is useless when B is built
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Searching into vector B
how to search for y
y ∈ S ⇔ ∀i = 1, . . . , k. bhi (y) = 1
B
y
1 1 100 01
. . .
. . . . . . . . . . . .
y 6∈ S
S is useless when B is built
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Searching into vector B
procedure searchbeginfor each h in H do // is O(1)if B[h(y)] = 0 then return "not found"
donereturn "found"
end
All the search is O(1) time and O(1) space
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Searching into vector B
it’s easy to notice that computing χs(y)
1 if ∃i .Bhi (y) = 0 ⇒ χs(y) = 0 thus y 6∈ S
2 if ∀i .Bhi (y) = 1 ⇒ χs(y) = 1 thus y ∈ S
of course sentence 1 is true
what about 2?
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Searching into vector B
it’s easy to notice that computing χs(y)
1 if ∃i .Bhi (y) = 0 ⇒ χs(y) = 0 thus y 6∈ S
2 if ∀i .Bhi (y) = 1 ⇒ χs(y) = 1 thus y ∈ S
of course sentence 1 is true
what about 2?
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
The real problem of Bloom Filters
Definition
The main problem are false positives:may exist an xj 6= y so that h...(xj) = h...(y)
B
y 6∈ S
1. . . . . . . . . . . .
xj
sentence 2 is not always true
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
What we compute
what may happen with a false positive?
we say ”y ∈ S” even if this is false
we shall say ”y ∈ S , probably”
can we compute this probability?
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Example
B
ACGATACGATTAATACGCAAATCT
h1 h2 h3
3211169410
6119102314
4106913211
. . . . . . . . . . . .
1 2 3 4 5 6 7
. . .
8 9 10 11 12
S = {TTA, TCT, ATA}
0 0 0 0 0 000 0 0 0 0
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Example: building B
B
ACGATACGATTAATACGCAAATCT
h1 h2 h3
3211169410
6119102314
4106913211
. . . . . . . . . . . .
1 2 3 4 5 6 7
. . .
8 9 10 11 12
S = {TTA, TCT, ATA}
1 0 0 0 0 000 0 1 1 0
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Example: building B
B
ACGATACGATTAATACGCAAATCT
h1 h2 h3
3211169410
6119102314
4106913211
. . . . . . . . . . . .
1 2 3 4 5 6 7
. . .
8 9 10 11 12
S = {TTA, TCT, ATA}
1 0 0 0 0 000 1
h1(TCT ) = h2(TTA) = 9
111
we may have introduceda false positive in B10
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Example: building B
B
ACGATACGATTAATACGCAAATCT
h1 h2 h3
3211169410
6119102314
4106913211
. . . . . . . . . . . .
1 2 3 4 5 6 7
. . .
8 9 10 11 12
S = {TTA, TCT, ATA}
1 0 0 0 0 00 11 111
we may have introducedtwo false positives in B10 and B11
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Example: searching into B
B
ACGATACGATTAATACGCAAATCT
h1 h2 h3
3211169410
6119102314
4106913211
. . . . . . . . . . . .
1 2 3 4 5 6 7
. . .
8 9 10 11 12
CGA
S = {TTA, TCT, ATA}
CGA?∈S → NO
1 11 11 10 0 0 0 0 0
Bh3(CGA) = 0
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Example: searching into B
B
ACGATACGATTAATACGCAAATCT
h1 h2 h3
3211169410
6119102314
4106913211
. . . . . . . . . . . .
1 2 3 4 5 6 7
. . .
8 9 10 11 12
AAA
S = {TTA, TCT, ATA}
AAA?∈S → Y ES
1 11 11 10 0 0 0 0 0
h2(ATA) = h3(AAA) = 2
false positive
h1(TTA) = h2(AAA) = 1
h2(TCT ) = h1(AAA) = 4
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Index
1 IntroductionThe problem
2 Bloom Filters: the classicMain ideaMathematics
3 Bloom Filters: variantsSpectral Bloom FiltersBloomier Bloom FiltersOthers
4 ConclusionsApplicationsReferences
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Probability of a false positive
assumption that hash are perfectly random
after build
P(bi = 0) =
(1− 1
m
)kn
≈ e−kn/m = p
probability of a false positive is
(1− e−kn/m)k = (1− p)k = ε
other formulations are asymptotically equivalent
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Optimizing number of hash functions
higher k-value
more chances to find a 0-bit for y 6∈ S
lower k-value
increase fraction of 0-bits in B
minimize the ε function
k̃ = ln 2 · (m/n)
if p = 0.5 then ε is a constant
ε = (0.5)k̃ = (0.6185)m/n
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Optimizing number of hash functions
higher k-value
more chances to find a 0-bit for y 6∈ S
lower k-value
increase fraction of 0-bits in B
minimize the ε function
k̃ = ln 2 · (m/n)
if p = 0.5 then ε is a constant
ε = (0.5)k̃ = (0.6185)m/n
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
How big should be the B vector?
depends on the ε value we want
given n we fix m
m ε %
n 0.61 61%2n 0.38 38%5n 0.09 9%10n 0.008 0.1%
m = O(n) is generally a good choice
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Bloom Filters v.s. hash functions
� hash functions Bloom Filters
build time Θ(n + n log(n)) Θ(n)
space needed Θ(n log(n)) Θ(m)
search time O(log(n)) O(1)
ε value 1/n (1− p)k
Hash functions are Bloom Filters with k = 1
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Bloom Filters tricks
union by OR1 we have sets S1,S2 and Bloom Filters B1,B2
2 suppose m1 = m2 and same hashing functions3 just OR the counters
B12i = B1
i ∨ B2i
halved size1 suppose m = 2α
2 make union by OR of the two half3 when hashing mask high-order bit
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Main ideaMathematics
Summary
we have a tradeoff between space and false positives
ε value is computable (and constant)
we use ”abstraction” provided by hash functions on xi ∈ S
we approximate the characteristic function
we have an easy to code data structure
we started from the Membership Problem, we solve this one:
”Handle massive data sets to support membershipqueries using compact data structure”
what else shall we want?
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Why variants
extend Bloom Filters to multisets
Spectral Bloom Filter, Matias and Cohen [2003]
compute almost any function
Bloomier Filter, Chazelle et al. [2004]
something else?
someone else..
...are up to date results, let’s try to give a brief overview...
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Index
1 IntroductionThe problem
2 Bloom Filters: the classicMain ideaMathematics
3 Bloom Filters: variantsSpectral Bloom FiltersBloomier Bloom FiltersOthers
4 ConclusionsApplicationsReferences
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Spectral Bloom Filters (SBF)
We extend the Bloom Filters to multisets
Definition
M = 〈S , fx〉 is a multiset were
S is a set
fx is a function
fx are the occurrences of x in M
ex M = 〈{A=2,B=1,C=2}, fx〉|M| = 5, fA = fC = 2, fB = 1
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Main features
space usage is slightly larger
performances are generally better
insertions are always possible, deletion not
can be built incrementally for streaming data
we query values fx > T with T a threshold
with T = 0 we guess for membership
we have tricks for SBF
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
SBF
B vector is replaced by a vector of counters C1,C2, . . . ,Cm
Ci is a sum of fx values for each x ∈ S mapping to i
as always, approximations of fx are stored into
Ch1(x),Ch2(x), . . . ,Chk (x)
thus, to compute fx , we have
mx = min{Ch1(x),Ch2(x), . . . ,Chk (x)}
mx is the basic estimator or Minimum Selection (MS)
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
The Minimum Selection
. . . . . .
fx fx fx
fyfz
fz
CiCi−1
. . .Cj+1Cj
h...(y) = h...(x) = i− 1
Ci−1 is not a good approximation of fx (neither of fy )
Ci is an exact approximation of fx
Cj+1 is an exact approximation of fz
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Insertion and Deletion
insertion is simple
increase each counter by 1. . .for each h in H do // O(1)C[h(x)] = C[h(x)] + 1;
done. . .
deletion is simple
decrease each counter by 1
search for an element x
compute the MinimumSelection mx
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
On the error of SBF
error is the same ε of Bloom Filters
Theorem
For all x fx 6 mx . Furthermore fx 6= mx with probability
ESBF = ε ≈ (1− p)k
Proof.
With no collisions mx = fx .With collisions mx > fx .The mx < fx cannot happen with collision.The event fx 6= mx is ”all counters have a collision”, that is a”false positive”.
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Implementing a SBF: challenges
Mainly two challenges
1 vector of counters
computational complexity of
random accessesinsertiondeletion
2 performances
allow insertion/deletion keeping low ESBF
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Solving Problem 2 with Minimal Increase(MI)
We minimize redundant insertions
Minimal Increase principle
When performing insertion of element x, increase onlythe counters that equals mx .Each lookup will return value mx .
We get the inequalityESBF 6 ε
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Minimal Increase: example of increase
insertion is always possible
. . . . . . . . .
x
1 3 2 . . . . . . . . .
x
1 3 22insert x
mx = 1 mx = 2
. . . . . . . . .
x
2 3 23insert x
mx = 33
mx
mx mx
mx mxmx
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Minimal Increase: example of decrease
deletion may introduce false negatives
. . . . . . . . .
x
1 1 1 . . . . . . . . .
x
1 1 1delete ymx = 1 = my 0
mxmx
mx
mx
. . . . . . . . .
x
1 1 1insert y
mx = 1my = 0
0
y
0
y
1
mx = 0 = my
y
10
mx mxmx
my
mymy
my
we lie saying ”x 6∈ S”
MI doesn’t allow deletion
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Solving Problem 2 with Recurring Minimum(RM)
Recurring Minimum : definition
. . . . . .
fx fx fx
fz
fz
mx mz
x has a Recurring Minimum (RM)z has a Single Minimum (SM)
”An element has a RM iff exist more than one counter withits MS value”
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Solving Problem 2 with Recurring Minimum(RM)
We identify Bloom Errors and handle them
Recurring Minimum : principle
For item x with RM we use mx as estimator
ESBF < ε
For items with a single minimum we use a secondary SBFwith |SBF2| � |SBF1|
ESBF2 � ε
Improvements are remarkable.
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Recurrent Minimum: insertion
insertion handles potential future errors1 increase(SBF1,x)2 if x has a RM in SBF1, stop3 look for x in SBF2
1 if x ∈ SBF2 increase(SBF2,x)2 if x 6∈ SBF2 increase(SBF2,search(SBF1,x))
. . . . . .
x
1 2 4
SM
. . . . . . . . .
x
1 6 6insert x
. . .
. . .
. . . . . .0 0..
. . . . . . . . .
x
1 6 6
insert x
. . . . . .0 0..
2
2 2
SM
SM
. . . . . .
x
1 2 42
. . .
. . .
RM
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Recurrent Minimum: lookup and deletion
lookup looks, if needed, in both SBF
1 if x has a RM in SBF1, return it2 say mx2 is value of x in SBF2
1 if mx2 > 0 , return it2 return min value of x in SBF1
deletion is reverse of insertion
1 decrease(SBF1,x)2 if x has a SM in SBF1, decrease(SBF2,x)
As insertion is in both SBT, deletion can’t create falsepositives
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Methods Comparison: MS v.s. MI v.s. RM
error ratesMI ≺ RM ≺ MS = ε
space overheadMI ≺ MS ≺ RM
complexityMS ≺ MI ≺ RM
insertion/deletionMS = RM ≺ MI
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Solving Problem 1 with an integer vector
Each counter fits in one word, for example a 4 bytes word.
All the m-counter are (4m) bytes.To get ε < 0.01% we have m = 10n.So m-counter are (40n) bytes.
With n = 220 (few more than 106 objects) we have that countersneed 40MB!
vector of integer it’s too big
do we need to count up to 232 − 1 ?
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Solving Problem 1 with an integer vector
Each counter fits in one word, for example a 4 bytes word.
All the m-counter are (4m) bytes.To get ε < 0.01% we have m = 10n.So m-counter are (40n) bytes.
With n = 220 (few more than 106 objects) we have that countersneed 40MB!
vector of integer it’s too big
do we need to count up to 232 − 1 ?
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Solving Problem 1 with a static bit vector
Suppose ∀i .fi < 10 thus Cj < 10 + α with α depending fromcollisions.
Use for each counter dlog2 10e = 4 bits (= 0.5 bytes).To get ε < 0.01% we have m = 10n.So m-counter are (5n) bytes.
With n = 220 we have a 5MB static vector!
this static vector doesn’t allow insertion or deletion
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Solving Problem 1 with a static bit vector
Suppose ∀i .fi < 10 thus Cj < 10 + α with α depending fromcollisions.
Use for each counter dlog2 10e = 4 bits (= 0.5 bytes).To get ε < 0.01% we have m = 10n.So m-counter are (5n) bytes.
With n = 220 we have a 5MB static vector!
this static vector doesn’t allow insertion or deletion
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Solving Problem 1 with String Array Index
use number of bits (per counter) strictly needed
use some slack bits
fix a value α > 0add αm bits to vector, one every 1/α items
each counter Ci uses dlog2(Ci )eeach counter counts up to 2dlog2(Ci )e+..
C vector is
(m∑
i=1
dlog2 Cie)
+ αm = N bits
. . .C1 C2 Cm
dlog2C1e dlog2C2e dlog2Cme
. . . C1/α
dlog2C1/αe
slack bit
C2/α
slack bit
dlog2C2/αe
. . .
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
The String Array Index: main idea
first level of pointers to subsequences into SBF
a Coarse Offset Vector to groups of (log N)−size itemsthese pointers are m/ log N
second level may be
other Coarse Offset Vector of pointers to subsequencesa simple vector of offsets
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
The String Array Index: graphic
. . .
. . .Coarse Offset Vector
S
Offset Vectors C.O.V.
Offset Vectors
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
The String Array Index: performances
2-level of pointers to sub-sequences
if N =m∑
i=1(dlog2(Ci )e+ αi )
Theorem
The SAI of size o(N) + O(m) bits can be built in O(m) time,supporting access to sub-sequences in O(1) time
⇓
Theorem
An SBF of size N + o(N) + O(m) bits can be built in O(N) time,supporting lookup in O(1) time.Furthermore, each update takes O(1) amortized time.
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
SBF tricks
merging by addiction1 we have two sets S1,S2 and two SBF C 1,C 2
2 suppose m1 = m2 and same hashing functions
3 just sum the counters
C 12i = C 1
i + C 2i
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Index
1 IntroductionThe problem
2 Bloom Filters: the classicMain ideaMathematics
3 Bloom Filters: variantsSpectral Bloom FiltersBloomier Bloom FiltersOthers
4 ConclusionsApplicationsReferences
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Bloomier Bloom Filters, (BBF)
we compute any function f using a BBF
some constraints on fsame tradeoff inherited by Bloom Filters
”we associate values with a subset of the domain elements”
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Which function
f : D = {0, . . . ,N − 1} → R = {⊥, 0, . . . , 2r − 1}values computed are into S ⊆ D⊥∈ R
S
D R
f(S)
⊥
f
f
error free
error arbitrarilyclose to 1
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Main features
query is O(1)
space requirement is O(nr)
can be generalized to handle dynamic updates
function can be updatedspace unchanged
we query values of f
we may change f (x) for x ∈ S but S is immutable
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Main idea
a false positive in a BBF
”returning a result when the key is not in the map”
we give a simple idea of a BBF
the Bloom Filter cascadecan be formerly generalized
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
A near-optimal and simple BBF
possible values are {0, 1}A0 is a BF with values mapping to 0B0 is a BF with values mapping to 1
we will build many (Ai ,Bi ) (here is the cascade)
we make a cross search
we search as deep as we need
what may happen when searching?
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
A near-optimal and simple BBF
we start looking in (A0,B0)if it is in neither
it is not in the map (surely)
if it is in A0 but not in B0
it does not map to 1 (surely)it does map to 0 (probably)
if it is in A0 and in B0
which one lies ? (false positive)
we have to go recursively into (A1,B1)
A1 are values mapping to 0 that are false positives in B0
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
A near-optimal and simple BBF
we start looking in (A0,B0)if it is in neither
it is not in the map (surely)
if it is in A0 but not in B0
it does not map to 1 (surely)it does map to 0 (probably)
if it is in A0 and in B0
which one lies ? (false positive)
we have to go recursively into (A1,B1)
A1 are values mapping to 0 that are false positives in B0
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
A near-optimal and simple BBF
we start looking in (A0,B0)if it is in neither
it is not in the map (surely)
if it is in A0 but not in B0
it does not map to 1 (surely)it does map to 0 (probably)
if it is in A0 and in B0
which one lies ? (false positive)
we have to go recursively into (A1,B1)
A1 are values mapping to 0 that are false positives in B0
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
A near-optimal and simple BBF
we start looking in (A0,B0)if it is in neither
it is not in the map (surely)
if it is in A0 but not in B0
it does not map to 1 (surely)it does map to 0 (probably)
if it is in A0 and in B0
which one lies ? (false positive)
we have to go recursively into (A1,B1)
A1 are values mapping to 0 that are false positives in B0
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
A BF cascade
A0 B0
falsepositive
falsepositive
A1 B1
f.p. f.p.
. . . . . .
|Ai+1| � |Ai |average search is O(1)
first pairs are generally enough
total space is independent of nfirst pair occupies most space
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
The general idea
results are binary-coded
v ∈ R is coded with βv ∈ {0, 1}q
for each bit of βv
we use the simple BBF
what we get is
space is slightly larger than the space for 2q BFlookup is Θ(q)build is O(n log n)the EBBF is proportional to 2−q
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
The general idea
they use a table T of coded values
T has m locations
we have as always k hash functions
we use a masking value M to reduce EBBF
βf (x) = M ⊕k⊕
i=1
T [hi (x)]
and if x 6∈ S
P[lookup(x ,T ) = ⊥] ≥ 1− k
2q
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Index
1 IntroductionThe problem
2 Bloom Filters: the classicMain ideaMathematics
3 Bloom Filters: variantsSpectral Bloom FiltersBloomier Bloom FiltersOthers
4 ConclusionsApplicationsReferences
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
Spectral Bloom FiltersBloomier Bloom FiltersOthers
Other ”special” Bloom Filters
Counting Bloom Filters
Broder et al. [1998]
Compressed Bloom Filters
M.Mitzenmacher [2002]
Attenuated Bloom Filters
Rhea, Kubiatowicz [2002]
Compact Approximator of Lattice Functions
Boldi, Vigna [2004]
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
ApplicationsReferences
Index
1 IntroductionThe problem
2 Bloom Filters: the classicMain ideaMathematics
3 Bloom Filters: variantsSpectral Bloom FiltersBloomier Bloom FiltersOthers
4 ConclusionsApplicationsReferences
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
ApplicationsReferences
When are really used
routing
probabilistic location and routingshortest path distance information
proxy
Web proxy cache into SQUIDdistributed caching
peer-to-peer
summarize the contents
spell checking
original B.Bloom idea
. . .
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
ApplicationsReferences
Index
1 IntroductionThe problem
2 Bloom Filters: the classicMain ideaMathematics
3 Bloom Filters: variantsSpectral Bloom FiltersBloomier Bloom FiltersOthers
4 ConclusionsApplicationsReferences
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
ApplicationsReferences
References: foundations
B.Bloom”Space/time tradeoffs in hash coding with allowable errors”.CACM,13(7): 422-426, 1970
Saar Cohen, Yossi Matias”Spectral bloom filters”.ACM SIGMOD ’03, 2003
B. Chazelle, J. Kilian, R. Rubinfeld, A. Tal”The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables”.Proceedings of 15th SODA (2004), 30-39, 2004
Bose, Guo, Kranakis, Maheshwari, Morin, Morrison, Smid, Tang”On the false-positive rate of Bloom Filters”.School of Computer Science, Carleton University, 2004
G.Caravagna Bloom Filters, general theory and variants
IntroductionBloom Filters: the classic
Bloom Filters: variantsConclusions
ApplicationsReferences
References: extras
M. Mitzenmacher”Compressed Bloom Filters”.In Proceedings of 20th ACM SIGACT-SIGOPS, 144-150, 2002
P.Boldi, S.Vigna”Compact Approximation of Lattice Functions with Applications to Large-Alphabet Text Search”.Dipartimento di Scienze dell’Informazione, Universita di Milano, 2004
A. Broder, M. Mitzenmacher”Network Applications of Bloom Filters: A Survey”.In Proceedings of 40th Allerton Conference (2004), 636-646, 2002
G.Caravagna Bloom Filters, general theory and variants