The Bloom Paradox Ori Rottenstreich Joint work with Yossi Kanizo and Isaac Keslassy Technion, Israel


• Requirement: A data structure at the user with a fast answer to the membership query: is x ∈ S?

• Solutions:

o O(n) – Searching in a list

o O(log(n)) – Searching in a sorted list

o O(1) – But with false positives / negatives

[Figure: a user fetches elements either from the local cache S (cost = 1) or from the central memory M, which holds all elements (cost = 10).]

Problem Definition

• False Positive: x ∉ S, but the data structure answers x ∈ S.

o Results in a redundant access to the local cache. Additional cost of 1.

• False Negative: x ∈ S, but the data structure answers x ∉ S.

o Results in an expensive access to the central memory instead of the local cache. Additional cost of 10 - 1 = 9.

Two Possible Errors
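The two penalties can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the access costs shown in the slides (1 for the local cache, 10 for the central memory); the function names `access_cost` and `ideal_cost` are illustrative and not part of the talk.

```python
CACHE_COST = 1    # cost of accessing the local cache S (value from the slides)
MEMORY_COST = 10  # cost of accessing the central memory M (value from the slides)


def access_cost(in_cache: bool, answer_in_cache: bool) -> int:
    """Total cost of fetching an element, given the truth and the structure's answer."""
    if answer_in_cache:
        # The cache is tried first; on a miss (false positive) M is accessed as well.
        return CACHE_COST if in_cache else CACHE_COST + MEMORY_COST
    # The structure answered "not in S": go straight to the central memory.
    return MEMORY_COST


def ideal_cost(in_cache: bool) -> int:
    """Cost with a perfect (error-free) membership answer."""
    return CACHE_COST if in_cache else MEMORY_COST


# Extra cost caused by each kind of error.
false_positive_penalty = access_cost(False, True) - ideal_cost(False)
false_negative_penalty = access_cost(True, False) - ideal_cost(True)
```

Under these assumptions a false positive adds 1 to the ideal cost and a false negative adds 9, which matches the figures quoted on the slide.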


• Initialization: Array of m zero bits.

• Insertion: Each of the n elements is hashed k times, and the corresponding bits are set.

• Query: Hashing the element and checking that all k bits are set.

• False positive rate (probability) of approximately (1 - e^(-kn/m))^k

• No false negatives

Bloom Filters (Bloom, 1970)
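The three operations can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used in the talk: the class name `BloomFilter`, the parameter names `m` (number of bits) and `k` (number of hash functions), and the use of salted SHA-256 as the hash family are all assumptions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bits, k hash functions (salted SHA-256, an assumption)."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [0] * m  # initialization: array of zero bits

    def _positions(self, item: str):
        # Derive k bit positions by salting the item with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))
```

An inserted element always answers `True`, since its bits are never reset; an element that was never inserted answers `True` only if all of its positions happen to have been set by other elements.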

[Figure: example bit array, initially all zeros, showing the bits set and checked for the elements x, y, z and w.]

• Cache/Memory Framework

• Packet Classification

• Intrusion Detection

• Routing

• Accounting

• Beyond networking: Spell Checking, DNA Classification

• Can be found in

o Google's web browser Chrome

o Google's database system BigTable

o Facebook's distributed storage system Cassandra

o Mellanox's IB Switch System

Bloom Filters are Widely Used


Outline

Introduction to Bloom Filters

The Bloom Paradox

The Variable-Increment Counting Bloom Filter


The Bloom Paradox


Sometimes, it is better to disregard the Bloom filter results, and in fact not to even query it, thus making the Bloom filter useless.

• Parameters:

• Extreme case without locality: All elements have an equal probability of belonging to the cache.

o Toy example

Example


• Parameters:

• Let B be the set of elements that the Bloom filter indicates are in S.

o In particular, no false negatives → S ⊆ B

• Intuition:

• Surprise: The Bloom filter indicates the membership of elements. Only of them are indeed in S.

[Figure: the user consults the Bloom filter, which represents the set B, before accessing the local cache S (cost = 1) or the central memory M with all elements (cost = 10).]

The Bloom Paradox

• When the Bloom filter states that x ∈ S, it is wrong with probability

• Average cost if we listen to the Bloom filter:

• Average cost if we don’t:

The Bloom filter is useless!

The Bloom Paradox
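The comparison between listening to the Bloom filter and ignoring it can be reproduced numerically. The sketch below is an illustration under stated assumptions, not the slide's own computation: it assumes a uniform a priori membership probability n/N (the "no locality" case), no false negatives, and the access costs 1 and 10 from the slides. The names `n` (cache size), `N` (central memory size) and `fpr` (false positive rate), as well as the numeric values in the usage note, are hypothetical.

```python
def posterior_membership(n: int, N: int, fpr: float) -> float:
    """P(x in S | filter answers positive) under a uniform prior n/N."""
    true_positives = n                # every cached element tests positive
    false_positives = (N - n) * fpr   # expected non-members that test positive
    return true_positives / (true_positives + false_positives)


def cost_if_listening(n: int, N: int, fpr: float,
                      cache_cost: float = 1.0, memory_cost: float = 10.0) -> float:
    """Average cost of a positive answer when we trust it and try the cache first."""
    p_wrong = 1.0 - posterior_membership(n, N, fpr)
    # The cache is always accessed; the central memory only on a false positive.
    return cache_cost + p_wrong * memory_cost


def cost_if_ignoring(memory_cost: float = 10.0) -> float:
    """Average cost when we skip the filter and always go to the central memory."""
    return memory_cost
```

With the hypothetical values `n = 10**4`, `N = 10**9` and `fpr = 1e-3`, a positive answer is wrong about 99% of the time and `cost_if_listening` returns about 10.9, which exceeds the cost of 10 for going straight to the central memory. With `N = 10**5` instead, the same filter is worth listening to, which illustrates the role of the a priori membership probability.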

Don’t listen to the Bloom filter

Outline

Introduction to Bloom Filters

The Bloom Paradox

The Variable-Increment Counting Bloom Filter


• Bloom filters do not support deletions of elements. Simply resetting bits might cause false negatives.

• The solution: Counting Bloom filters, which store an array of counters instead of bits.

o Insertion: Incrementing counters by one.

o Deletion: Decrementing counters by one.

o Query: Checking that counters are positive.

• The same false positive probability.

• Require too much memory, e.g. 57 bits per element for .

Counting Bloom Filters (CBFs)
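A counting Bloom filter differs from the plain filter only in what each array cell stores. The following is a minimal sketch under the same assumptions as a plain Bloom filter sketch would make (salted SHA-256 as the hash family, illustrative names `m` and `k`); it uses unbounded Python integers as counters, whereas a real implementation would use a few bits per counter.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter: m counters, k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def delete(self, item: str) -> None:
        # Only valid for items that were previously inserted.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def query(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))
```

Deleting an element restores exactly the counters it incremented, so the remaining elements still answer `True` and no false negatives are introduced.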

[Figure: counter array of a counting Bloom filter; inserting the elements x and y increments each of their hashed counters by one.]

• Upon query, we should consider the exact values of the counters and not just whether they are positive

• Can we design a deterministic scheme that exploits the exact values of the counters?

• Idea: Use variable increments to encode the element identity

Intuition for Variable Increments


• Each hash entry contains a pair of counters:

o c1, fixed increments → number of elements in entry (as in CBF)

o c2, variable increments → weighted sum of elements

o weights from a pre-determined set

Architecture

[Figure: array of hash entries, each holding the pair of counters c1 and c2.]

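• We use two sets of hash functions:

o The first set uses hash functions whose range is the set of entry indices, i.e. it points to the set of entries.

o The second set uses hash functions whose range is the set of indices of the variable increments, i.e. it points to the pre-determined set of weights.

• Insertion: At each entry selected by the first set, the two counters are updated as follows.

o c1 is incremented by one.

o c2 is incremented by a weight from the pre-determined set, selected by the second set of hash functions.

• Example 1: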

Insertion
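The insertion rule can be sketched as follows. This is a hedged illustration, not the authors' code: the class name `VariableIncrementCBF`, the helper `_hash`, the use of salted SHA-256 for both hash families, and the increment set `(1, 4, 8, 13)` are assumptions (the values 1, 4, 8 and 13 appear in the slides' figures, but treating them as the complete set is a guess). Each entry keeps a fixed-increment counter (called `c1` here) and a variable-increment counter (called `c2` here).

```python
import hashlib

INCREMENTS = (1, 4, 8, 13)  # assumed set of variable increments


def _hash(item: str, salt: str, modulus: int) -> int:
    """Salted SHA-256 reduced to the range [0, modulus)."""
    digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class VariableIncrementCBF:
    def __init__(self, m: int, k: int, increments=INCREMENTS):
        self.m = m
        self.k = k
        self.increments = tuple(increments)
        self.c1 = [0] * m  # number of elements hashed into each entry
        self.c2 = [0] * m  # weighted sum of the elements in each entry

    def _targets(self, item: str):
        """Yield (entry index, variable increment) for each of the k hash pairs."""
        for j in range(self.k):
            entry = _hash(item, f"g{j}", self.m)  # first set: picks the entry
            weight = self.increments[_hash(item, f"h{j}", len(self.increments))]  # second set
            yield entry, weight

    def insert(self, item: str) -> None:
        for entry, weight in self._targets(item):
            self.c1[entry] += 1       # fixed increment
            self.c2[entry] += weight  # variable increment
```

Because the same element always maps to the same (entry, increment) pairs, a deletion could subtract exactly what the insertion added, as in a regular counting Bloom filter.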

[Figure: array of (c1, c2) entries during the insertion of x (variable increments +4 and +8) and z (variable increments +4 and +13).]

• Query ( with )

• We ask whether

o 17 can be a sum of 2 elements from the set including 4

o 30 can be a sum of 3 elements from the set including 8

• No:

• How should we pick the set of variable increments?

Query
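The per-entry question asked during a query ("can c2 be a sum of c1 elements from the set, including this element's increment?") can be answered by brute force for small counters. The sketch below is illustrative only: the function name `may_contain` is hypothetical, and the set `(1, 4, 8, 13)` is an assumption built from the increments visible in the slides' figures.

```python
from itertools import combinations_with_replacement

INCREMENTS = (1, 4, 8, 13)  # assumed set of variable increments


def may_contain(c1: int, c2: int, v: int, increments=INCREMENTS) -> bool:
    """Can c2 be written as a sum of c1 elements of `increments`, one of them being v?"""
    if c1 < 1 or v not in increments:
        return False
    rest = c2 - v  # what the other c1 - 1 elements must add up to
    return any(
        sum(combo) == rest
        for combo in combinations_with_replacement(increments, c1 - 1)
    )
```

Under the assumed set, `may_contain(2, 17, 4)` holds because 17 = 4 + 13, while `may_contain(3, 30, 8)` fails because no two increments sum to 22. A single failing entry is enough to conclude that the queried element is not in the set.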

We should use Bh sequences!

[Figure: array of (c1, c2) entries with the query of y, whose hashed entries are checked against the variable increments 8 and 4.]

• Definition 1: Let D = (v_1, v_2, ..., v_l) be a sequence of positive integers. Then, D is a Bh sequence iff all the sums v_i1 + v_i2 + ... + v_ih with 1 ≤ i1 ≤ i2 ≤ ... ≤ ih ≤ l are distinct.

• Example 2:

All the sums of elements of are distinct:

Therefore, is a sequence.

• Bh sequences are widely used in error-correcting codes.

Bh Sequences
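The defining property of a Bh sequence (all sums of h elements, repetitions allowed, are distinct) is easy to test exhaustively for small sets. The following is a minimal sketch; the function name `is_bh_sequence` is illustrative, and the candidate set `(1, 4, 8, 13)` used in the usage note is an assumption taken from the increments that appear in the slides' figures.

```python
from itertools import combinations_with_replacement


def is_bh_sequence(seq, h: int) -> bool:
    """True iff all sums of h elements of seq (with repetition) are distinct."""
    sums = [sum(combo) for combo in combinations_with_replacement(seq, h)]
    return len(sums) == len(set(sums))
```

For example, `is_bh_sequence((1, 4, 8, 13), 2)` holds, since the ten pairwise sums 2, 5, 8, 9, 12, 14, 16, 17, 21 and 26 are all different, whereas `(1, 2, 3)` is not a B2 sequence because 1 + 3 = 2 + 2.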


The Bh-CBF Scheme Query

• Example 3: is a sequence

o Here, and then necessarily

o Since , the Bh-CBF can determine that

o Since , the Bh-CBF cannot exclude that

[Figure: array of (c1, c2) entries with candidate queries and their variable increments, e.g. y with 8 and 4, and z with 4 and 13.]
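One way to see why Bh sequences help: when an entry holds at most h elements, its weighted sum identifies the increments it contains uniquely, so the filter can check whether the queried element's increment is among them. The sketch below illustrates this idea by brute force and is not the authors' decoding procedure; the function name `decode_entry` is hypothetical and the set `(1, 4, 8, 13)` is an assumption taken from the increments visible in the slides' figures.

```python
from itertools import combinations_with_replacement

INCREMENTS = (1, 4, 8, 13)  # assumed set; it satisfies the B2 property


def decode_entry(c1: int, c2: int, increments=INCREMENTS):
    """Return every multiset of c1 increments whose sum is c2 (as sorted tuples)."""
    return [
        combo
        for combo in combinations_with_replacement(increments, c1)
        if sum(combo) == c2
    ]
```

With the assumed set, `decode_entry(2, 17)` yields only `(4, 13)`, so an element whose increment at that entry is 4 or 13 cannot be excluded, while one whose increment is 1 or 8 can. For entries with more than h elements the decomposition is no longer guaranteed to be unique, and the result is then only a list of candidates.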

• Internet trace (equinix-chicago) with real hash functions.

For the Bh-CBF, (with ).


Experimental Results

• The Bloom Paradox

o Discovery of the Bloom paradox

o Importance of the a priori membership probability

• The Variable-Increment Counting Bloom Filter

o Can extend many variants of the counting Bloom filter

o First time Bh sequences are presented in networking applications

Concluding Remarks


Thank You
