428
Tries and Succinct Data Structures Auto-completion as our target application Rossano Venturini [email protected]

Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Embed Size (px)

Citation preview

Page 1: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Tries and Succinct Data Structures

Auto-completion as our target application

Rossano [email protected]

Page 2: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application
Page 3: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application
Page 4: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application
Page 5: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application
Page 6: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application
Page 7: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application
Page 8: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Dataset?

Page 9: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Dataset? All the past queries

Page 10: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Dataset?Searches?

All the past queries

Page 11: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Dataset?Prefix searchSearches?All the past queries

Page 12: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Dataset?Prefix searchSearches?All the past queries

Data structure?

Page 13: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Dataset?Prefix searchSearches?All the past queries

Data structure? Trie

Page 14: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Dataset?Prefix searchSearches?All the past queries

Data structure? TrieHow to find top-k efficiently?

Page 15: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Dataset?Prefix searchSearches?All the past queries

Data structure? TrieHow to find top-k efficiently?

Project: implement a autocompletion system

Page 16: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application
Page 17: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Problem

Given a dictionary D of n strings of total length m drawn from an alphabet of size σ, build a data structures to answer the following queries:

Page 18: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Problem

Given a dictionary D of n strings of total length m drawn from an alphabet of size σ, build a data structures to answer the following queries:

• Lookup(P): say 1 if P belongs to D, 0 otherwise;

Page 19: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Problem

Given a dictionary D of n strings of total length m drawn from an alphabet of size σ, build a data structures to answer the following queries:

• Lookup(P): say 1 if P belongs to D, 0 otherwise;• WeakPrefix(P): return the consecutive range of strings in D

that are prefixed by P, -1 if there is no such interval;

Page 20: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Problem

Given a dictionary D of n strings of total length m drawn from an alphabet of size σ, build a data structures to answer the following queries:

• Lookup(P): say 1 if P belongs to D, 0 otherwise;• WeakPrefix(P): return the consecutive range of strings in D

that are prefixed by P, -1 if there is no such interval;• StrongPrefix(P): return the consecutive range of strings in D

sharing the longest common prefix with P;

Page 21: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Problem

Given a dictionary D of n strings of total length m drawn from an alphabet of size σ, build a data structures to answer the following queries:

• Lookup(P): say 1 if P belongs to D, 0 otherwise;• WeakPrefix(P): return the consecutive range of strings in D

that are prefixed by P, -1 if there is no such interval;• StrongPrefix(P): return the consecutive range of strings in D

sharing the longest common prefix with P;• Insert(P): insert s in D;

Page 22: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Problem

Given a dictionary D of n strings of total length m drawn from an alphabet of size σ, build a data structures to answer the following queries:

• Lookup(P): say 1 if P belongs to D, 0 otherwise;• WeakPrefix(P): return the consecutive range of strings in D

that are prefixed by P, -1 if there is no such interval;• StrongPrefix(P): return the consecutive range of strings in D

sharing the longest common prefix with P;• Insert(P): insert s in D;• Delete(P): delete s from D;

Page 23: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

Problem

Given a dictionary D of n strings of total length m drawn from an alphabet of size σ, build a data structures to answer the following queries:

• Lookup(P): say 1 if P belongs to D, 0 otherwise;• WeakPrefix(P): return the consecutive range of strings in D

that are prefixed by P, -1 if there is no such interval;• StrongPrefix(P): return the consecutive range of strings in D

sharing the longest common prefix with P;• Insert(P): insert s in D;• Delete(P): delete s from D;

Page 24: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Trie

0

b

1

b

2

a

5

c

6

a

Page 25: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Prefix(c)

Trie

0

b

1

b

2

a

5

c

6

a

Page 26: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Prefix(c)

Trie

0

b

1

b

2

a

5

c

6

a

Page 27: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Prefix(c)

Trie

0

b

1

b

2

a

5

c

6

a

Page 28: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Prefix(c)

Trie

0

b

1

b

2

a

5

c

6

a

Find all the strings prefixed by any pattern P in O(|P| log σ) time

Page 29: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Prefix(c)

Trie

0

b

1

b

2

a

5

c

6

a

Find all the strings prefixed by any pattern P in O(|P| log σ) time Insert? Delete?

Page 30: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Prefix(c)

Trie

0

b

1

b

2

a

5

c

6

aO(m) nodes

O(m log m + m log σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time Insert? Delete?

Page 31: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Prefix(c)

Trie

0

b

1

b

2

a

5

c

6

aO(m) nodes

O(m log m + m log σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time

Save space?

Insert? Delete?

Page 32: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

Page 33: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

O(n) nodes O(n log m + mlog σ) bits of space

Page 34: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

O(n) nodes O(n log m + mlog σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time

Page 35: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

O(n) nodes O(n log m + mlog σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time

Variable length labels on the edges 😱

Page 36: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

O(n) nodes O(n log m + mlog σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time

Page 37: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

O(n) nodes O(n log m + mlog σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time

<a,1,0>

Page 38: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

O(n) nodes O(n log m + mlog σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time

<a,1,0>

label len

Page 39: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

O(n) nodes O(n log m + mlog σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time

<a,1,0>

label len string id

Page 40: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

O(n) nodes O(n log m + mlog σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time

<a,1,0>

label len string id

Page 41: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

O(n) nodes O(n log m + mlog σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time

<a,1,0>

<b,0,1>

label len string id

Page 42: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

O(n) nodes O(n log m + mlog σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time

<a,1,0>

<b,0,1>

label len string id

Page 43: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

O(n) nodes O(n log m + mlog σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time

<a,1,0>

<b,0,1>

<a,1,1>

label len string id

Page 44: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

O(n) nodes O(n log m + mlog σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time

<a,1,0>

<b,0,1>

<a,1,1>

label len string id

<c,1,2> <a,0,3>

<b,0,3><c,0,4><a,1,5><b,1,6>

<c,1,3>

<b,1,5>

Page 45: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

O(n) nodes O(n log m + mlog σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time

<a,1,0>

<b,0,1>

<a,1,1>

Search: every time we traverse an edge we have to access a

dictionary string!

label len string id

<c,1,2> <a,0,3>

<b,0,3><c,0,4><a,1,5><b,1,6>

<c,1,3>

<b,1,5>

Page 46: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

O(n) nodes O(n log m + mlog σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time

<a,1,0>

<b,0,1>

<a,1,1>

Search: every time we traverse an edge we have to access a

dictionary string!

label len string id

<c,1,2> <a,0,3>

<b,0,3><c,0,4><a,1,5><b,1,6>

<c,1,3>

<b,1,5>

Probably a cache miss at each edge 😱. Can we do better?

Page 47: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

(Compact) Trie

O(n) nodes O(n log m + mlog σ) bits of space

Find all the strings prefixed by any pattern P in O(|P| log σ) time

<a,1,0>

<b,0,1>

<a,1,1>

Search: every time we traverse an edge we have to access a

dictionary string!

label len string id

<c,1,2> <a,0,3>

<b,0,3><c,0,4><a,1,5><b,1,6>

<c,1,3>

<b,1,5>

Probably a cache miss at each edge 😱. Can we do better?

Patricia trie & Blind search every time we traverse an

edge, we match the only symbol and skip “label len” symbols in the

pattern

Page 48: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

0000000000S1

0000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

1

2

3

4

S1 S2

0 1

5

S3 S4

0 1

0[00] 1[11]

6

7

S5 S6

0 1

S7

0 1

0 1[01]

S8

0 1

S9

0[00] 1

(b) Patricia trie of S

1 2 3 4 5

1 0 0 1 ⇥

0 0 1 3 ⇥

⇡ 3 2 4 1 5

(c) , , and ⇡

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings ={00000000, 00000001, 0000001, 00000100, cbcc}.

5. A. Brodnik and J. I. Munro. Membership in constant time and almost-minimumspace. SIAM J. Comput., 28(5):1627–1640, 1999.

6. E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in amemory hierarchy. CoRR, cs.DS/0410048, 2004.

7. P. Elias. E�cient storage and retrieval by content and address of static files. J.ACM, 21(2):246–260, 1974.

8. P. Ferragina. On the weak prefix-search problem. In CPM, LNCS vol. 6661, pages261–272. Springer, 2011.

15

000000000S1

000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

S91

S82

6

S77

S6S5

0 1

0 1

3

5

S4S3

0 1

4

S2S1

0 1

0[00] 1[11]

0 1[01]

0 1

0[00] 1

(b) Patricia trie of S

0,000000000S1

8,1S2

5,1110S3

5,1S4

4,10100S5

8,1S6

7,1S7

3,1S8

0,1S9

(c) Front Coding of S

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings S ={000000000, 000000001, 000001110, 000001111, 000010100, 000010101, 00001011, 0001, 1}.

16

Patricia Trie (and blind search)

Page 49: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

0000000000S1

0000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

1

2

3

4

S1 S2

0 1

5

S3 S4

0 1

0[00] 1[11]

6

7

S5 S6

0 1

S7

0 1

0 1[01]

S8

0 1

S9

0[00] 1

(b) Patricia trie of S

1 2 3 4 5

1 0 0 1 ⇥

0 0 1 3 ⇥

⇡ 3 2 4 1 5

(c) , , and ⇡

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings ={00000000, 00000001, 0000001, 00000100, cbcc}.

5. A. Brodnik and J. I. Munro. Membership in constant time and almost-minimumspace. SIAM J. Comput., 28(5):1627–1640, 1999.

6. E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in amemory hierarchy. CoRR, cs.DS/0410048, 2004.

7. P. Elias. E�cient storage and retrieval by content and address of static files. J.ACM, 21(2):246–260, 1974.

8. P. Ferragina. On the weak prefix-search problem. In CPM, LNCS vol. 6661, pages261–272. Springer, 2011.

15

000000000S1

000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

S91

S82

6

S77

S6S5

0 1

0 1

3

5

S4S3

0 1

4

S2S1

0 1

0[00] 1[11]

0 1[01]

0 1

0[00] 1

(b) Patricia trie of S

0,000000000S1

8,1S2

5,1110S3

5,1S4

4,10100S5

8,1S6

7,1S7

3,1S8

0,1S9

(c) Front Coding of S

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings S ={000000000, 000000001, 000001110, 000001111, 000010100, 000010101, 00001011, 0001, 1}.

16

Patricia Trie (and blind search)

Blind search every time we traverse an

edge, we match the only symbol and skip “label len” symbols in the

pattern

Page 50: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

0000000000S1

0000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

1

2

3

4

S1 S2

0 1

5

S3 S4

0 1

0[00] 1[11]

6

7

S5 S6

0 1

S7

0 1

0 1[01]

S8

0 1

S9

0[00] 1

(b) Patricia trie of S

1 2 3 4 5

1 0 0 1 ⇥

0 0 1 3 ⇥

⇡ 3 2 4 1 5

(c) , , and ⇡

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings ={00000000, 00000001, 0000001, 00000100, cbcc}.

5. A. Brodnik and J. I. Munro. Membership in constant time and almost-minimumspace. SIAM J. Comput., 28(5):1627–1640, 1999.

6. E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in amemory hierarchy. CoRR, cs.DS/0410048, 2004.

7. P. Elias. E�cient storage and retrieval by content and address of static files. J.ACM, 21(2):246–260, 1974.

8. P. Ferragina. On the weak prefix-search problem. In CPM, LNCS vol. 6661, pages261–272. Springer, 2011.

15

000000000S1

000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

S91

S82

6

S77

S6S5

0 1

0 1

3

5

S4S3

0 1

4

S2S1

0 1

0[00] 1[11]

0 1[01]

0 1

0[00] 1

(b) Patricia trie of S

0,000000000S1

8,1S2

5,1110S3

5,1S4

4,10100S5

8,1S6

7,1S7

3,1S8

0,1S9

(c) Front Coding of S

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings S ={000000000, 000000001, 000001110, 000001111, 000010100, 000010101, 00001011, 0001, 1}.

16

Patricia Trie (and blind search)

P= 000 0 110Blind search

every time we traverse an edge, we match the only symbol

and skip “label len” symbols in the pattern

Page 51: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

0000000000S1

0000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

1

2

3

4

S1 S2

0 1

5

S3 S4

0 1

0[00] 1[11]

6

7

S5 S6

0 1

S7

0 1

0 1[01]

S8

0 1

S9

0[00] 1

(b) Patricia trie of S

1 2 3 4 5

1 0 0 1 ⇥

0 0 1 3 ⇥

⇡ 3 2 4 1 5

(c) , , and ⇡

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings ={00000000, 00000001, 0000001, 00000100, cbcc}.

5. A. Brodnik and J. I. Munro. Membership in constant time and almost-minimumspace. SIAM J. Comput., 28(5):1627–1640, 1999.

6. E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in amemory hierarchy. CoRR, cs.DS/0410048, 2004.

7. P. Elias. E�cient storage and retrieval by content and address of static files. J.ACM, 21(2):246–260, 1974.

8. P. Ferragina. On the weak prefix-search problem. In CPM, LNCS vol. 6661, pages261–272. Springer, 2011.

15

000000000S1

000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

S91

S82

6

S77

S6S5

0 1

0 1

3

5

S4S3

0 1

4

S2S1

0 1

0[00] 1[11]

0 1[01]

0 1

0[00] 1

(b) Patricia trie of S

0,000000000S1

8,1S2

5,1110S3

5,1S4

4,10100S5

8,1S6

7,1S7

3,1S8

0,1S9

(c) Front Coding of S

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings S ={000000000, 000000001, 000001110, 000001111, 000010100, 000010101, 00001011, 0001, 1}.

16

Patricia Trie (and blind search)

P= 000 0 110Blind search

every time we traverse an edge, we match the only symbol

and skip “label len” symbols in the pattern

Page 52: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

0000000000S1

0000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

1

2

3

4

S1 S2

0 1

5

S3 S4

0 1

0[00] 1[11]

6

7

S5 S6

0 1

S7

0 1

0 1[01]

S8

0 1

S9

0[00] 1

(b) Patricia trie of S

1 2 3 4 5

1 0 0 1 ⇥

0 0 1 3 ⇥

⇡ 3 2 4 1 5

(c) , , and ⇡

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings ={00000000, 00000001, 0000001, 00000100, cbcc}.

5. A. Brodnik and J. I. Munro. Membership in constant time and almost-minimumspace. SIAM J. Comput., 28(5):1627–1640, 1999.

6. E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in amemory hierarchy. CoRR, cs.DS/0410048, 2004.

7. P. Elias. E�cient storage and retrieval by content and address of static files. J.ACM, 21(2):246–260, 1974.

8. P. Ferragina. On the weak prefix-search problem. In CPM, LNCS vol. 6661, pages261–272. Springer, 2011.

15

000000000S1

000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

S91

S82

6

S77

S6S5

0 1

0 1

3

5

S4S3

0 1

4

S2S1

0 1

0[00] 1[11]

0 1[01]

0 1

0[00] 1

(b) Patricia trie of S

0,000000000S1

8,1S2

5,1110S3

5,1S4

4,10100S5

8,1S6

7,1S7

3,1S8

0,1S9

(c) Front Coding of S

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings S ={000000000, 000000001, 000001110, 000001111, 000010100, 000010101, 00001011, 0001, 1}.

16

Patricia Trie (and blind search)

P= 000 0 110Blind search

every time we traverse an edge, we match the only symbol

and skip “label len” symbols in the pattern

Page 53: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

0000000000S1

0000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

1

2

3

4

S1 S2

0 1

5

S3 S4

0 1

0[00] 1[11]

6

7

S5 S6

0 1

S7

0 1

0 1[01]

S8

0 1

S9

0[00] 1

(b) Patricia trie of S

1 2 3 4 5

1 0 0 1 ⇥

0 0 1 3 ⇥

⇡ 3 2 4 1 5

(c) , , and ⇡

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings ={00000000, 00000001, 0000001, 00000100, cbcc}.

5. A. Brodnik and J. I. Munro. Membership in constant time and almost-minimumspace. SIAM J. Comput., 28(5):1627–1640, 1999.

6. E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in amemory hierarchy. CoRR, cs.DS/0410048, 2004.

7. P. Elias. E�cient storage and retrieval by content and address of static files. J.ACM, 21(2):246–260, 1974.

8. P. Ferragina. On the weak prefix-search problem. In CPM, LNCS vol. 6661, pages261–272. Springer, 2011.

15

000000000S1

000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

S91

S82

6

S77

S6S5

0 1

0 1

3

5

S4S3

0 1

4

S2S1

0 1

0[00] 1[11]

0 1[01]

0 1

0[00] 1

(b) Patricia trie of S

0,000000000S1

8,1S2

5,1110S3

5,1S4

4,10100S5

8,1S6

7,1S7

3,1S8

0,1S9

(c) Front Coding of S

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings S ={000000000, 000000001, 000001110, 000001111, 000010100, 000010101, 00001011, 0001, 1}.

16

Patricia Trie (and blind search)

P= 000 0 110Blind search

every time we traverse an edge, we match the only symbol

and skip “label len” symbols in the pattern

Page 54: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

0000000000S1

0000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

1

2

3

4

S1 S2

0 1

5

S3 S4

0 1

0[00] 1[11]

6

7

S5 S6

0 1

S7

0 1

0 1[01]

S8

0 1

S9

0[00] 1

(b) Patricia trie of S

1 2 3 4 5

1 0 0 1 ⇥

0 0 1 3 ⇥

⇡ 3 2 4 1 5

(c) , , and ⇡

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings ={00000000, 00000001, 0000001, 00000100, cbcc}.

5. A. Brodnik and J. I. Munro. Membership in constant time and almost-minimumspace. SIAM J. Comput., 28(5):1627–1640, 1999.

6. E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in amemory hierarchy. CoRR, cs.DS/0410048, 2004.

7. P. Elias. E�cient storage and retrieval by content and address of static files. J.ACM, 21(2):246–260, 1974.

8. P. Ferragina. On the weak prefix-search problem. In CPM, LNCS vol. 6661, pages261–272. Springer, 2011.

15

000000000S1

000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

S91

S82

6

S77

S6S5

0 1

0 1

3

5

S4S3

0 1

4

S2S1

0 1

0[00] 1[11]

0 1[01]

0 1

0[00] 1

(b) Patricia trie of S

0,000000000S1

8,1S2

5,1110S3

5,1S4

4,10100S5

8,1S6

7,1S7

3,1S8

0,1S9

(c) Front Coding of S

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings S ={000000000, 000000001, 000001110, 000001111, 000010100, 000010101, 00001011, 0001, 1}.

16

Patricia Trie (and blind search)

P= 000 0 110Blind search

every time we traverse an edge, we match the only symbol

and skip “label len” symbols in the pattern

Page 55: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

0000000000S1

0000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

1

2

3

4

S1 S2

0 1

5

S3 S4

0 1

0[00] 1[11]

6

7

S5 S6

0 1

S7

0 1

0 1[01]

S8

0 1

S9

0[00] 1

(b) Patricia trie of S

1 2 3 4 5

1 0 0 1 ⇥

0 0 1 3 ⇥

⇡ 3 2 4 1 5

(c) , , and ⇡

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings ={00000000, 00000001, 0000001, 00000100, cbcc}.

5. A. Brodnik and J. I. Munro. Membership in constant time and almost-minimumspace. SIAM J. Comput., 28(5):1627–1640, 1999.

6. E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in amemory hierarchy. CoRR, cs.DS/0410048, 2004.

7. P. Elias. E�cient storage and retrieval by content and address of static files. J.ACM, 21(2):246–260, 1974.

8. P. Ferragina. On the weak prefix-search problem. In CPM, LNCS vol. 6661, pages261–272. Springer, 2011.

15

000000000S1

000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

S91

S82

6

S77

S6S5

0 1

0 1

3

5

S4S3

0 1

4

S2S1

0 1

0[00] 1[11]

0 1[01]

0 1

0[00] 1

(b) Patricia trie of S

0,000000000S1

8,1S2

5,1110S3

5,1S4

4,10100S5

8,1S6

7,1S7

3,1S8

0,1S9

(c) Front Coding of S

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings S ={000000000, 000000001, 000001110, 000001111, 000010100, 000010101, 00001011, 0001, 1}.

16

Patricia Trie (and blind search)

P= 000 0 110Blind search

every time we traverse an edge, we match the only symbol

and skip “label len” symbols in the pattern

Page 56: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

0000000000S1

0000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

1

2

3

4

S1 S2

0 1

5

S3 S4

0 1

0[00] 1[11]

6

7

S5 S6

0 1

S7

0 1

0 1[01]

S8

0 1

S9

0[00] 1

(b) Patricia trie of S

1 2 3 4 5

1 0 0 1 ⇥

0 0 1 3 ⇥

⇡ 3 2 4 1 5

(c) , , and ⇡

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings ={00000000, 00000001, 0000001, 00000100, cbcc}.

5. A. Brodnik and J. I. Munro. Membership in constant time and almost-minimumspace. SIAM J. Comput., 28(5):1627–1640, 1999.

6. E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in amemory hierarchy. CoRR, cs.DS/0410048, 2004.

7. P. Elias. E�cient storage and retrieval by content and address of static files. J.ACM, 21(2):246–260, 1974.

8. P. Ferragina. On the weak prefix-search problem. In CPM, LNCS vol. 6661, pages261–272. Springer, 2011.

15

000000000S1

000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

S91

S82

6

S77

S6S5

0 1

0 1

3

5

S4S3

0 1

4

S2S1

0 1

0[00] 1[11]

0 1[01]

0 1

0[00] 1

(b) Patricia trie of S

0,000000000S1

8,1S2

5,1110S3

5,1S4

4,10100S5

8,1S6

7,1S7

3,1S8

0,1S9

(c) Front Coding of S

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings S ={000000000, 000000001, 000001110, 000001111, 000010100, 000010101, 00001011, 0001, 1}.

16

Patricia Trie (and blind search)

P= 000 0 110Blind search

every time we traverse an edge, we match the only symbol

and skip “label len” symbols in the pattern

What’s the property of the node we identify?

Page 57: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

0000000000S1

0000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

1

2

3

4

S1 S2

0 1

5

S3 S4

0 1

0[00] 1[11]

6

7

S5 S6

0 1

S7

0 1

0 1[01]

S8

0 1

S9

0[00] 1

(b) Patricia trie of S

1 2 3 4 5

1 0 0 1 ⇥

0 0 1 3 ⇥

⇡ 3 2 4 1 5

(c) , , and ⇡

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings ={00000000, 00000001, 0000001, 00000100, cbcc}.

5. A. Brodnik and J. I. Munro. Membership in constant time and almost-minimumspace. SIAM J. Comput., 28(5):1627–1640, 1999.

6. E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in amemory hierarchy. CoRR, cs.DS/0410048, 2004.

7. P. Elias. E�cient storage and retrieval by content and address of static files. J.ACM, 21(2):246–260, 1974.

8. P. Ferragina. On the weak prefix-search problem. In CPM, LNCS vol. 6661, pages261–272. Springer, 2011.

15

000000000S1

000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

S91

S82

6

S77

S6S5

0 1

0 1

3

5

S4S3

0 1

4

S2S1

0 1

0[00] 1[11]

0 1[01]

0 1

0[00] 1

(b) Patricia trie of S

0,000000000S1

8,1S2

5,1110S3

5,1S4

4,10100S5

8,1S6

7,1S7

3,1S8

0,1S9

(c) Front Coding of S

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings S ={000000000, 000000001, 000001110, 000001111, 000010100, 000010101, 00001011, 0001, 1}.

16

Patricia Trie (and blind search)

P= 000 0 110Blind search

every time we traverse an edge, we match the only symbol

and skip “label len” symbols in the pattern

What’s the property of the node we identify?

Any substring in its subtree has the same longest common prefix with P

as the “correct node”

Page 58: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

0000000000S1

0000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

1

2

3

4

S1 S2

0 1

5

S3 S4

0 1

0[00] 1[11]

6

7

S5 S6

0 1

S7

0 1

0 1[01]

S8

0 1

S9

0[00] 1

(b) Patricia trie of S

1 2 3 4 5

1 0 0 1 ⇥

0 0 1 3 ⇥

⇡ 3 2 4 1 5

(c) , , and ⇡

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings ={00000000, 00000001, 0000001, 00000100, cbcc}.

5. A. Brodnik and J. I. Munro. Membership in constant time and almost-minimumspace. SIAM J. Comput., 28(5):1627–1640, 1999.

6. E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in amemory hierarchy. CoRR, cs.DS/0410048, 2004.

7. P. Elias. E�cient storage and retrieval by content and address of static files. J.ACM, 21(2):246–260, 1974.

8. P. Ferragina. On the weak prefix-search problem. In CPM, LNCS vol. 6661, pages261–272. Springer, 2011.

15

000000000S1

000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

S91

S82

6

S77

S6S5

0 1

0 1

3

5

S4S3

0 1

4

S2S1

0 1

0[00] 1[11]

0 1[01]

0 1

0[00] 1

(b) Patricia trie of S

0,000000000S1

8,1S2

5,1110S3

5,1S4

4,10100S5

8,1S6

7,1S7

3,1S8

0,1S9

(c) Front Coding of S

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings S ={000000000, 000000001, 000001110, 000001111, 000010100, 000010101, 00001011, 0001, 1}.

16

Patricia Trie (and blind search)

P= 000 0 110Blind search

every time we traverse an edge, we match the only symbol

and skip “label len” symbols in the pattern

What’s the property of the node we identify?

Any substring in its subtree has the same longest common prefix with P

as the “correct node”

=> the correct node is an ancestor of the identified node.

Page 59: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

0000000000S1

0000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

1

2

3

4

S1 S2

0 1

5

S3 S4

0 1

0[00] 1[11]

6

7

S5 S6

0 1

S7

0 1

0 1[01]

S8

0 1

S9

0[00] 1

(b) Patricia trie of S

1 2 3 4 5

1 0 0 1 ⇥

0 0 1 3 ⇥

⇡ 3 2 4 1 5

(c) , , and ⇡

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings ={00000000, 00000001, 0000001, 00000100, cbcc}.

5. A. Brodnik and J. I. Munro. Membership in constant time and almost-minimumspace. SIAM J. Comput., 28(5):1627–1640, 1999.

6. E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in amemory hierarchy. CoRR, cs.DS/0410048, 2004.

7. P. Elias. E�cient storage and retrieval by content and address of static files. J.ACM, 21(2):246–260, 1974.

8. P. Ferragina. On the weak prefix-search problem. In CPM, LNCS vol. 6661, pages261–272. Springer, 2011.

15

000000000S1

000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

S91

S82

6

S77

S6S5

0 1

0 1

3

5

S4S3

0 1

4

S2S1

0 1

0[00] 1[11]

0 1[01]

0 1

0[00] 1

(b) Patricia trie of S

0,000000000S1

8,1S2

5,1110S3

5,1S4

4,10100S5

8,1S6

7,1S7

3,1S8

0,1S9

(c) Front Coding of S

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings S ={000000000, 000000001, 000001110, 000001111, 000010100, 000010101, 00001011, 0001, 1}.

16

Patricia Trie (and blind search)

P= 000 0 110Blind search

every time we traverse an edge, we match the only symbol

and skip “label len” symbols in the pattern

What’s the property of the node we identify?

Any substring in its subtree has the same longest common prefix with P

as the “correct node”

=> the correct node is an ancestor of the identified node.

Page 60: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

0000000000S1

0000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

1

2

3

4

S1 S2

0 1

5

S3 S4

0 1

0[00] 1[11]

6

7

S5 S6

0 1

S7

0 1

0 1[01]

S8

0 1

S9

0[00] 1

(b) Patricia trie of S

1 2 3 4 5

1 0 0 1 ⇥

0 0 1 3 ⇥

⇡ 3 2 4 1 5

(c) , , and ⇡

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings ={00000000, 00000001, 0000001, 00000100, cbcc}.

5. A. Brodnik and J. I. Munro. Membership in constant time and almost-minimumspace. SIAM J. Comput., 28(5):1627–1640, 1999.

6. E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in amemory hierarchy. CoRR, cs.DS/0410048, 2004.

7. P. Elias. E�cient storage and retrieval by content and address of static files. J.ACM, 21(2):246–260, 1974.

8. P. Ferragina. On the weak prefix-search problem. In CPM, LNCS vol. 6661, pages261–272. Springer, 2011.

15

000000000S1

000000001S2

000001110S3

000001111S4

000010100S5

000010101S6

00001011S7

0001S8

1S9

(a) S

0

S91

S82

6

S77

S6S5

0 1

0 1

3

5

S4S3

0 1

4

S2S1

0 1

0[00] 1[11]

0 1[01]

0 1

0[00] 1

(b) Patricia trie of S

0,000000000S1

8,1S2

5,1110S3

5,1S4

4,10100S5

8,1S6

7,1S7

3,1S8

0,1S9

(c) Front Coding of S

h0, 0, a, 3, 5i

h1, 1, b, 2, 3i

h1, 2, c, 1, 3i

h1, 3, c, 0, 0i

(d) Tuples inducedby string abcc

Figure 1. The picture shows a running example for the set of strings S ={000000000, 000000001, 000001110, 000001111, 000010100, 000010101, 00001011, 0001, 1}.

16

Patricia Trie (and blind search)

P= 000 0 110Blind search

every time we traverse an edge, we match the only symbol

and skip “label len” symbols in the pattern

What’s the property of the node we identify?

Any substring in its subtree has the same longest common prefix with P

as the “correct node”

=> the correct node is an ancestor of the identified node.

Insert? Delete?

Page 61: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Implement a Trie

0

b

1

b

2

a

5

c

6

a

Page 62: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Implement a Trie

0

b

1

b

2

a

5

c

6

a

A node is a struct with- a variable length vector of (sorted)

symbols and pointers to the children

Page 63: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Implement a Trie

0

b

1

b

2

a

5

c

6

a

A node is a struct with- a variable length vector of (sorted)

symbols and pointers to the children

Binary search on symbols to decide which is the edge to follow

Page 64: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Implement a Trie

0

b

1

b

2

a

5

c

6

a

A node is a struct with- a variable length vector of (sorted)

symbols and pointers to the children

Binary search on symbols to decide which is the edge to follow

Symbols and pointers stored far from the node 😱

Page 65: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Implement a Trie

0

b

1

b

2

a

5

c

6

a

A node is a struct with- a variable length vector of (sorted)

symbols and pointers to the children

Binary search on symbols to decide which is the edge to follow

Symbols and pointers stored far from the node 😱 Seach time is O(|P| log σ)

Page 66: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Implement a Trie

0

b

1

b

2

a

5

c

6

a

A node is a struct with- a variable length vector of (sorted)

symbols and pointers to the children

Binary search on symbols to decide which is the edge to follow

Symbols and pointers stored far from the node 😱 Seach time is O(|P| log σ)

Can you solve both these issues?

Page 67: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Implement a Trie

0

b

1

b

2

a

5

c

6

a

A node is a struct with- a variable length vector of (sorted)

symbols and pointers to the children

Binary search on symbols to decide which is the edge to follow

Symbols and pointers stored far from the node 😱 Seach time is O(|P| log σ)

Can you solve both these issues?

A node is a struct with- a binary vector saying which

symbol is represented and a variable length vector of pointers

to the children

Page 68: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Implement a Trie

0

b

1

b

2

a

5

c

6

a

A node is a struct with- a variable length vector of (sorted)

symbols and pointers to the children

Binary search on symbols to decide which is the edge to follow

Symbols and pointers stored far from the node 😱 Seach time is O(|P| log σ)

Can you solve both these issues?

A node is a struct with- a binary vector saying which

symbol is represented and a variable length vector of pointers

to the children

but the space grows to at least σ bits per node!

Page 69: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m = total length of strings in D, σ = alphabet size

3 4

a b

a c

c

a b

bab c

Implement a Trie

0

b

1

b

2

a

5

c

6

a

A node is a struct with- a variable length vector of (sorted)

symbols and pointers to the children

Binary search on symbols to decide which is the edge to follow

Symbols and pointers stored far from the node 😱 Seach time is O(|P| log σ)

Can you solve both these issues?

A node is a struct with- a binary vector saying which

symbol is represented and a variable length vector of pointers

to the children

but the space grows to at least σ bits per node!

Obtaining the best of both with a easy-to-implement

data structure!

Page 70: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

Page 71: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search TreeEvery node has (at most) tree

children

Page 72: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search TreeEvery node has (at most) tree

children

<c =c >croot

Page 73: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search TreeEvery node has (at most) tree

children

Strings that start with c

Strings that start with a symbol <c

Strings that start with a symbol >c

<c =c >croot

Page 74: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search TreeEvery node has (at most) tree

children

Bulid the TST recursively removing the

c

Strings that start with c

Strings that start with a symbol <c

Strings that start with a symbol >c

<c =c >croot

Page 75: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search TreeEvery node has (at most) tree

children

Bulid the TST recursively removing the

c

Strings that start with c

Strings that start with a symbol <c

Strings that start with a symbol >c

Bulid the TST recursively

Bulid the TST recursively

<c =c >croot

Page 76: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

Page 77: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

Page 78: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Page 79: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Page 80: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Page 81: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Page 82: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Page 83: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Page 84: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Page 85: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

Page 86: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

Page 87: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Page 88: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

Page 89: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

How can we choose c to have O(|P| + log n) query

time?

Page 90: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

How can we choose c to have O(|P| + log n) query

time?

Query time is

Page 91: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

How can we choose c to have O(|P| + log n) query

time?

Query time is

O( # )+ # + #

Page 92: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

How can we choose c to have O(|P| + log n) query

time?

Query time is

O( # )+ # + #

O(|P|)

Page 93: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

How can we choose c to have O(|P| + log n) query

time?

Query time is

O( # )+ # + #

O(|P|) Select c to pay O(log n)

Page 94: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

How can we choose c to have O(|P| + log n) query

time?

Query time is

O( # )+ # + #

O(|P|) Select c to pay O(log n)

~ (3way)Quicksort and pivot

Page 95: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

How can we choose c to have O(|P| + log n) query

time?

Query time is

O( # )+ # + #

O(|P|) Select c to pay O(log n)

~ (3way)Quicksort and pivot

Page 96: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

How can we choose c to have O(|P| + log n) query

time?

Query time is

O( # )+ # + #

O(|P|) Select c to pay O(log n)

~ (3way)Quicksort and pivot

c<c >c

Page 97: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

How can we choose c to have O(|P| + log n) query

time?

Query time is

O( # )+ # + #

O(|P|) Select c to pay O(log n)

~ (3way)Quicksort and pivot

c<c >c

<n/2 <n/2

Page 98: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

How can we choose c to have O(|P| + log n) query

time?

Query time is

O( # )+ # + #

O(|P|) Select c to pay O(log n)

~ (3way)Quicksort and pivot

c<c >c

<n/2 <n/2

This way every time we go left or right we (at least) halve the size of the remaining strings

Page 99: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

How can we choose c to have O(|P| + log n) query

time?

Query time is

O( # )+ # + #

O(|P|) Select c to pay O(log n)

~ (3way)Quicksort and pivot

c<c >c

<n/2 <n/2

This way every time we go left or right we (at least) halve the size of the remaining strings

Insert? Delete?

Page 100: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

How can we choose c to have O(|P| + log n) query

time?

Query time is

O( # )+ # + #

O(|P|) Select c to pay O(log n)

~ (3way)Quicksort and pivot

c<c >c

<n/2 <n/2

This way every time we go left or right we (at least) halve the size of the remaining strings

Insert? Delete?

Page 101: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

How can we choose c to have O(|P| + log n) query

time?

Query time is

O( # )+ # + #

O(|P|) Select c to pay O(log n)

~ (3way)Quicksort and pivot

c<c >c

<n/2 <n/2

This way every time we go left or right we (at least) halve the size of the remaining strings

Insert? Delete?

Page 102: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

How can we choose c to have O(|P| + log n) query

time?

Query time is

O( # )+ # + #

O(|P|) Select c to pay O(log n)

~ (3way)Quicksort and pivot

c<c >c

<n/2 <n/2

This way every time we go left or right we (at least) halve the size of the remaining strings

Insert? Delete?

Ohhhh it is simple even in C

Page 103: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Ternary Search Tree

D= { now for tip ilk dim tag jot sob nob sky hut ace bet men egg few jay owl joy rap gig wee was cab wad

caw cue fee tap ago tar jam dug and }

Node have fixed size, independent of σ

All information is on the node

What about query time?

Search time is O(|P| log σ) if c is always the symbol in

the “middle”…

How can we choose c to have O(|P| + log n) query

time?

Query time is

O( # )+ # + #

O(|P|) Select c to pay O(log n)

~ (3way)Quicksort and pivot

c<c >c

<n/2 <n/2

This way every time we go left or right we (at least) halve the size of the remaining strings

Insert? Delete?

Project: Is it possible to increase nodes’ fanout to reduce cache-misses?

Ohhhh it is simple even in C

Page 104: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Dataset?Prefix searchSearches?All the past queries

Data structure? TrieHow to find top-k efficiently?

Page 105: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cFinding Top-1

Page 106: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cFinding Top-1

Score

Page 107: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Finding Top-1

Page 108: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Finding Top-1

Page 109: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Scan to find the maximum!

Finding Top-1

Page 110: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Scan to find the maximum!

Finding Top-1

Page 111: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Scan to find the maximum!

Finding Top-1

Page 112: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Scan to find the maximum!

O(n) query time :-(

Finding Top-1

Page 113: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Scan to find the maximum!

O(n) query time :-(

Finding Top-1

Better ideas?

Page 114: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Finding Top-1

Page 115: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Augment each node with the max (and string id )

within its subtree!

Finding Top-1

Page 116: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Augment each node with the max (and string id )

within its subtree!

4,3

Finding Top-1

Page 117: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Augment each node with the max (and string id )

within its subtree!

4,3 6,5

Finding Top-1

Page 118: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Augment each node with the max (and string id )

within its subtree!

4,3 6,5

6,5

Finding Top-1

Page 119: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Augment each node with the max (and string id )

within its subtree!

4,3 6,5

6,52,1

Finding Top-1

Page 120: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Augment each node with the max (and string id )

within its subtree!

4,3 6,5

6,52,1

7,0

Finding Top-1

Page 121: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Augment each node with the max (and string id )

within its subtree!

4,3 6,5

6,52,1

7,0

Preprocessing time: O(n)

Extra space: O(n log n) bits

Query time: O(1)

Finding Top-1

Page 122: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Augment each node with the max (and string id )

within its subtree!

4,3 6,5

6,52,1

7,0

Preprocessing time: O(n)

Extra space: O(n log n) bits

Query time: O(1)

Solving Top-k?

Finding Top-1

Page 123: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Augment each node with the max (and string id )

within its subtree!

4,3 6,5

6,52,1

7,0

Preprocessing time: O(n)

Extra space: O(n log n) bits

Query time: O(1)

Solving Top-k?

Finding Top-1

Solving Top-k?

- Extra space: O(k*n*log n) bits :-( - You must know k at building time! :-(

Page 124: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Finding Top-1

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

Page 125: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

7 2 1 4 1 6 2S

Finding Top-1

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

Page 126: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Assume you have a Data Structure on top of S answering in O(1) by using O(n) bits

RMQ(i,j) = position of the maximum in the range S[i,j] 7 2 1 4 1 6 2S

Finding Top-1

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

Page 127: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Assume you have a Data Structure on top of S answering in O(1) by using O(n) bits

RMQ(i,j) = position of the maximum in the range S[i,j] 7 2 1 4 1 6 2S

Finding Top-1

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

Page 128: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Assume you have a Data Structure on top of S answering in O(1) by using O(n) bits

RMQ(i,j) = position of the maximum in the range S[i,j] 7 2 1 4 1 6 2S

RMQ(3,6) = 5

Finding Top-1

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

Page 129: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Assume you have a Data Structure on top of S answering in O(1) by using O(n) bits

RMQ(i,j) = position of the maximum in the range S[i,j] 7 2 1 4 1 6 2S

RMQ(3,6) = 5Can you solve Top-2?

Finding Top-1

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

Page 130: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Assume you have a Data Structure on top of S answering in O(1) by using O(n) bits

RMQ(i,j) = position of the maximum in the range S[i,j] 7 2 1 4 1 6 2SCan you solve Top-2?

Finding Top-1

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

Page 131: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

n = |D|, m = total length of strings in D, σ = alphabet size

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

P = cHow to find Top-1?

Assume you have a Data Structure on top of S answering in O(1) by using O(n) bits

RMQ(i,j) = position of the maximum in the range S[i,j] 7 2 1 4 1 6 2SCan you solve Top-2?

Finding Top-1

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

Page 132: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

Page 133: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

Page 134: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

Cartesian Tree

Page 135: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10Cartesian Tree

Page 136: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7

Cartesian Tree

Page 137: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

Cartesian Tree

Page 138: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

5

3

6

1

Cartesian Tree

1

Page 139: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian TreeIt can be built top-down

with RMQ

1

Page 140: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

1

Page 141: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

1

Page 142: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=41

Page 143: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

Results 1

Page 144: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

Results 1

Page 145: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

10Results 1

Page 146: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

Results 1

Page 147: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

Results

10

1

Page 148: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

Results

10

1

Page 149: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

97

Results

10

1

Page 150: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

97

Results

107

1

Page 151: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

97

Results

1079

1

Page 152: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

97

Results

1079

87

1

Page 153: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

97

Results

1079

87

7

1

Page 154: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

97

Results

1079

87

8

7

1

Page 155: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

97

Results

1079

87

8

7

77

1

Page 156: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

97

Results

1079

87

8

7

777

7

1

Page 157: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

97

Results

1079

87

8

7

777

7

Claim: we “touch” at most 2k nodes. ⇒ Query time O(k log k)

1

Page 158: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

97

Results

1079

87

8

7

777

7

Claim: we “touch” at most 2k nodes. ⇒ Query time O(k log k)

Important: the cartesian tree is not built!

1

Page 159: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Finding Top-k

3 5 1 7 1 6 10S 9 8 7 1 4 2… …

10

7 9

85

3

6

1 7

4

1 2

Cartesian Tree

How to find Top-k?

Visit the node starting from the root and try to insert each visited node in a max-Heap storing at most k elements.

Extract (and report) the maximum from the heap and visit its children.

max-Heap

k=4

97

Results

1079

87

8

7

777

7

Claim: we “touch” at most 2k nodes. ⇒ Query time O(k log k)

Important: the cartesian tree is not built!

1

Assume you have a Data Structure on top of S answering in O(1) by using O(n) bits

RMQ(i,j) = position of the maximum in the range S[i,j]

Page 160: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (1)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Page 161: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (1)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n2 log n) bits Query time: O(1)

Page 162: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (1)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n2 log n) bits Query time: O(1)

Precompute the answer to any possible query.

There are O(n2) distinct queries!

Page 163: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (1)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n2 log n) bits Query time: O(1)

M[i,j] = RMQ(i,j)

Precompute the answer to any possible query.

There are O(n2) distinct queries!

Page 164: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (1)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n2 log n) bits Query time: O(1)

M 0 1 2 3 4 5 6 7 8 9 10 1101

5

234

6789

1011

M[i,j] = RMQ(i,j)

Precompute the answer to any possible query.

There are O(n2) distinct queries!

Page 165: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (1)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n2 log n) bits Query time: O(1)

M 0 1 2 3 4 5 6 7 8 9 10 1101

5

234

6789

1011

M[i,j] = RMQ(i,j)

Precompute the answer to any possible query.

There are O(n2) distinct queries!

3

Page 166: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Page 167: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n log2 n) bits Query time: O(1)

Page 168: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n log2 n) bits Query time: O(1)

Maximum in a interval is the max between the maxima of any its subintervals

Page 169: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n log2 n) bits Query time: O(1)

Precompute the answer to every interval of size a power of 2.

There are O(log n) possible intervals starting at any position i.

Maximum in a interval is the max between the maxima of any its subintervals

Page 170: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n log2 n) bits Query time: O(1)

M[i,j] = RMQ(i,i+2j)

Precompute the answer to every interval of size a power of 2.

There are O(log n) possible intervals starting at any position i.

M 0 1 2 3 401

5

234

6789

1011

Maximum in a interval is the max between the maxima of any its subintervals

Page 171: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n log2 n) bits Query time: O(1)

M[i,j] = RMQ(i,i+2j)

Precompute the answer to every interval of size a power of 2.

There are O(log n) possible intervals starting at any position i.

M 0 1 2 3 401

5

234

6789

1011

?

Maximum in a interval is the max between the maxima of any its subintervals

Page 172: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n log2 n) bits Query time: O(1)

M[i,j] = RMQ(i,i+2j)

Precompute the answer to every interval of size a power of 2.

There are O(log n) possible intervals starting at any position i.

M 0 1 2 3 401

5

234

6789

1011

?

Maximum in a interval is the max between the maxima of any its subintervals

9=1+23

Page 173: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n log2 n) bits Query time: O(1)

M[i,j] = RMQ(i,i+2j)

Precompute the answer to every interval of size a power of 2.

There are O(log n) possible intervals starting at any position i.

M 0 1 2 3 401

5

234

6789

1011

?

Maximum in a interval is the max between the maxima of any its subintervals

9=1+23

6

Page 174: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n log2 n) bits Query time: O(1)

M[i,j] = RMQ(i,i+2j)

Precompute the answer to every interval of size a power of 2.

There are O(log n) possible intervals starting at any position i.

M 0 1 2 3 401

5

234

6789

1011

Maximum of a interval is the max between the maxima of any its subintervals

Page 175: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n log2 n) bits Query time: O(1)

M[i,j] = RMQ(i,i+2j)

Precompute the answer to every interval of size a power of 2.

There are O(log n) possible intervals starting at any position i.

M 0 1 2 3 401

5

234

6789

1011

Maximum of a interval is the max between the maxima of any its subintervals

RMQ(1,7) =

Page 176: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n log2 n) bits Query time: O(1)

M[i,j] = RMQ(i,i+2j)

Precompute the answer to every interval of size a power of 2.

There are O(log n) possible intervals starting at any position i.

M 0 1 2 3 401

5

234

6789

1011

Maximum of a interval is the max between the maxima of any its subintervals

argmax(S[M[1,1+22]], S[M[7-22,7]]) = 6RMQ(1,7) =

Page 177: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n log2 n) bits Query time: O(1)

M[i,j] = RMQ(i,i+2j)

Precompute the answer to every interval of size a power of 2.

There are O(log n) possible intervals starting at any position i.

M 0 1 2 3 401

5

234

6789

1011

Maximum of a interval is the max between the maxima of any its subintervals

argmax(S[M[1,1+22]], S[M[7-22,7]]) = 6RMQ(1,7) =

Page 178: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n log2 n) bits Query time: O(1)

M[i,j] = RMQ(i,i+2j)

Precompute the answer to every interval of size a power of 2.

There are O(log n) possible intervals starting at any position i.

M 0 1 2 3 401

5

234

6789

1011

3

Maximum of a interval is the max between the maxima of any its subintervals

argmax(S[M[1,1+22]], S[M[7-22,7]]) = 6RMQ(1,7) =

Page 179: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n log2 n) bits Query time: O(1)

M[i,j] = RMQ(i,i+2j)

Precompute the answer to every interval of size a power of 2.

There are O(log n) possible intervals starting at any position i.

M 0 1 2 3 401

5

234

6789

1011

3

Maximum of a interval is the max between the maxima of any its subintervals

argmax(S[M[1,1+22]], S[M[7-22,7]]) = 6RMQ(1,7) =

Page 180: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n log2 n) bits Query time: O(1)

M[i,j] = RMQ(i,i+2j)

Precompute the answer to every interval of size a power of 2.

There are O(log n) possible intervals starting at any position i.

M 0 1 2 3 401

5

234

6789

1011

3

Maximum of a interval is the max between the maxima of any its subintervals

argmax(S[M[1,1+22]], S[M[7-22,7]]) = 6RMQ(1,7) =

6

Page 181: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (2)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Space: O(n log2 n) bits Query time: O(1)

M[i,j] = RMQ(i,i+2j)

Precompute the answer to every interval of size a power of 2.

There are O(log n) possible intervals starting at any position i.

M 0 1 2 3 401

5

234

6789

1011

3

Maximum of a interval is the max between the maxima of any its subintervals

argmax(S[M[1,1+22]], S[M[7-22,7]]) = 6RMQ(1,7) =

where len =⎣log (j-i+1)⎦RMQ(i,j) = argmax(S[M[i,i+2len]], S[M[j-2len,j]])

6

Page 182: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Page 183: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

Page 184: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log n

Page 185: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log n

Page 186: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log nR

Page 187: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log nR 5

Page 188: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log nR 5 7 10 7

Page 189: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log nR 5 7 10 7

Use the previous solution on R!

Space: ? bits Query time: O(1)

Page 190: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log nR 5 7 10 7

Use the previous solution on R!

Space: ? bits Query time: O(1) Space: O(n log n) bits Query time: O(1)

Page 191: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log nR 5 7 10 7

Use the previous solution on R!

Space: ? bits Query time: O(1) Space: O(n log n) bits Query time: O(1)

RMQ(1,10) = ?

Page 192: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log nR 5 7 10 7

Use the previous solution on R!

Space: ? bits Query time: O(1) Space: O(n log n) bits Query time: O(1)

RMQ(1,10) = ?

Page 193: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log nR 5 7 10 7

Use the previous solution on R!

Space: ? bits Query time: O(1) Space: O(n log n) bits Query time: O(1)

RMQ(1,10) = ?

Page 194: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log nR 5 7 10 7

Use the previous solution on R!

Space: ? bits Query time: O(1) Space: O(n log n) bits Query time: O(1)

RMQ(1,10) = ?

Page 195: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log nR 5 7 10 7

Use the previous solution on R!

Space: ? bits Query time: O(1) Space: O(n log n) bits Query time: O(1)

RMQ(1,10) = ?

O(1) time

Page 196: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log nR 5 7 10 7

Use the previous solution on R!

Space: ? bits Query time: O(1) Space: O(n log n) bits Query time: O(1)

RMQ(1,10) = ?

O(1) time

O(log n) time O(log n) time

Page 197: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log nR 5 7 10 7

Use the previous solution on R!

Space: ? bits Query time: O(1) Space: O(n log n) bits Query time: O(1)

RMQ(1,10) = ?

O(1) time

O(log n) time O(log n) time

Space: O(n log n) bits Query time: O(1)

Page 198: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log nR 5 7 10 7

Use the previous solution on R!

Space: ? bits Query time: O(1) Space: O(n log n) bits Query time: O(1)

RMQ(1,10) = ?

O(1) time

O(log n) time O(log n) time

Space: O(n log n) bits Query time: O(1)

O(1) time O(1) time

Page 199: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log nR 5 7 10 7

Use the previous solution on R!

Space: ? bits Query time: O(1) Space: O(n log n) bits Query time: O(1)

RMQ(1,10) = ?

O(1) time

O(log n) time O(log n) time

Space: O(n log n) bits Query time: O(1)

O(1) time O(1) time

Page 200: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Range Maximum Query (3) Space: O(n log n) bits Query time: O(log n)

3 5 1 7 1 6 10S 9 8 7 1 40 1 2 3 4 5 6 7 8 9 10 11

log nR 5 7 10 7

Use the previous solution on R!

Space: ? bits Query time: O(1) Space: O(n log n) bits Query time: O(1)

RMQ(1,10) = ?

O(1) time

O(log n) time O(log n) time

Space: O(n log n) bits Query time: O(1)

O(1) time O(1) time

Puzzle: Use RMQ to compute LCA(u,v) queries on a tree

Page 201: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m total length of strings in D

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

Summary

Page 202: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m total length of strings in D

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

Summary

Find the node “prefixed” by P

Page 203: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m total length of strings in D

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

Summary

Find the node “prefixed” by P O(|P| + log n) time

Page 204: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m total length of strings in D

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

Summary

Find the node “prefixed” by P O(|P| + log n) time O(m log σ + n log m) bits

Page 205: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m total length of strings in D

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

Summary

Find the node “prefixed” by P O(|P| + log n) time O(m log σ + n log m) bits

Compute the top-k strings

Page 206: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m total length of strings in D

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

Summary

Find the node “prefixed” by P O(|P| + log n) time O(m log σ + n log m) bits

Compute the top-k strings O(k log k) time

Page 207: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m total length of strings in D

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

Summary

Find the node “prefixed” by P O(|P| + log n) time O(m log σ + n log m) bits

Compute the top-k strings O(k log k) time O(n) bits

Page 208: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m total length of strings in D

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

Summary

Find the node “prefixed” by P O(|P| + log n) time O(m log σ + n log m) bits

Compute the top-k strings O(k log k) time O(n) bits

Page 209: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m total length of strings in D

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

Summary

Find the node “prefixed” by P O(|P| + log n) time O(m log σ + n log m) bits

Compute the top-k strings O(k log k) time O(n) bits

3 months query log at Yahoo!

≈600 million of distinct (and clean) queries

Page 210: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m total length of strings in D

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

Summary

Find the node “prefixed” by P O(|P| + log n) time O(m log σ + n log m) bits

Compute the top-k strings O(k log k) time O(n) bits

3 months query log at Yahoo!

≈600 million of distinct (and clean) queries

Trie requires ≈50 Gbytes!

Page 211: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m total length of strings in D

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

Summary

Find the node “prefixed” by P O(|P| + log n) time O(m log σ + n log m) bits

Compute the top-k strings O(k log k) time O(n) bits

3 months query log at Yahoo!

≈600 million of distinct (and clean) queries

Trie requires ≈50 Gbytes!

We will see how to reduce to ≈5 Gbytes!

Page 212: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

0

1 2

3 4 5 6

ab b

ab ca

c

a b

baacb c

Patricia trie

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m total length of strings in D

Page 213: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie

0

1 2

3 4 5 6

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m total length of strings in D

Page 214: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie

0

1 2

3 4 5 6

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m total length of strings in D

First symbol and length of the label

Page 215: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie

0

1 2

3 4 5 6

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m total length of strings in D

Page 216: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie

0

1 2

3 4 5 6

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m total length of strings in D

P = cba

Page 217: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie

0

1 2

3 4 5 6

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m total length of strings in D

P = cba

Page 218: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie

0

1 2

3 4 5 6

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m total length of strings in D

P = cba

Page 219: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie

0

1 2

3 4 5 6

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m total length of strings in D

P = cba

Page 220: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie

0

1 2

3 4 5 6

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m total length of strings in D

P = cba

Page 221: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie

0

1 2

3 4 5 6

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m total length of strings in D

P = cba

Blind search. We can skip symbols and

check at the end.

Page 222: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie

0

1 2

3 4 5 6

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

D = { ab, bab, bca, cab, cac, cbac, cbba }

n = |D|, m total length of strings in D

P = cba

Blind search. We can skip symbols and

check at the end.

O(n log m + m log σ) bitsO(|P| + log m) time with TST structure

Page 223: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Page 224: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Page 225: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank0(7) =

Page 226: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank0(7) = 4

Page 227: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Rank0(7) = 4

Page 228: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank1(7) = 8 - Rank0(7) = 4

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Rank0(7) = 4

Page 229: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank1(7) = 8 - Rank0(7) = 4

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Rank0(7) = 4

Select0(j) = position of the j-th 0 in B

Page 230: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank1(7) = 8 - Rank0(7) = 4

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Rank0(7) = 4

Select0(j) = position of the j-th 0 in B

Select1(j) = position of the j-th 0 in B

Page 231: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank1(7) = 8 - Rank0(7) = 4

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Rank0(7) = 4

Select0(j) = position of the j-th 0 in B

Select1(j) = position of the j-th 0 in B

Select1(4) =

Page 232: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank1(7) = 8 - Rank0(7) = 4

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Rank0(7) = 4

Select0(j) = position of the j-th 0 in B

Select1(j) = position of the j-th 0 in B

Select1(4) = 6

Page 233: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank1(7) = 8 - Rank0(7) = 4

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Rank0(7) = 4

Select0(j) = position of the j-th 0 in B

Select1(j) = position of the j-th 0 in B

Select1(4) = 6 Space: n + O(n log log n/log n) bits Query time: O(1)

Page 234: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank1(7) = 8 - Rank0(7) = 4

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Rank0(7) = 4

Select0(j) = position of the j-th 0 in B

Select1(j) = position of the j-th 0 in B

Select1(4) = 6 Space: n + O(n log log n/log n) bits Query time: O(1)

Page 235: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

Page 236: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

Page 237: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

Page 238: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’

Page 239: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Page 240: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Rank0(7) =

Page 241: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Rank0(7) =

Page 242: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Rank0(7) =

Page 243: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Rank0(7) = 3+

Page 244: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Rank0(7) = 3+

Page 245: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Rank0(7) = 3+

1/2 log n bits fit in a word. O(1) with popcount op!

Page 246: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Rank0(7) = 3+

Page 247: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Rank0(7) = 3+

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

Page 248: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Rank0(7) = 3+

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1

Page 249: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Rank0(7) = 3+

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1

1 = 4

Page 250: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Rank0(7) = 3+

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1

1 = 4

How much space?

Page 251: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Rank0(7) = 3+

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1

1 = 4

How much space?

O(21/2 log n log n) = O(√n log n) cells,

each uses O(log log n) bits

Page 252: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Rank0(7) = 3+

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1

1 = 4

How much space?

O(21/2 log n log n) = O(√n log n) cells,

each uses O(log log n) bits

How much space?

Page 253: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Rank0(7) = 3+

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1

1 = 4

How much space?

O(21/2 log n log n) = O(√n log n) cells,

each uses O(log log n) bits

How much space?

O(n/log n) entries, each uses O(log n) bits

⇒ O(n) bits :-(

Page 254: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

1/2 log n

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

Rank0(7) = 3+

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1

1 = 4

How much space?

O(21/2 log n log n) = O(√n log n) cells,

each uses O(log log n) bits

How much space?

O(n/log n) entries, each uses O(log n) bits

⇒ O(n) bits :-(

Space: O(n) + o(n) bits Query time: O(1)

Page 255: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1/2 log n

Page 256: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1/2 log n

log n

Groups into superblocks!

Page 257: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1/2 log n

log n

B’’ 0 4

Page 258: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1/2 log n

log n

B’’ 0 4Store the # of 0s up to the beginning of its superblock

Page 259: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1/2 log n

log n

B’’ 0 4Store the # of 0s up to the beginning of its superblock

0

Page 260: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1/2 log n

log n

B’’ 0 4Store the # of 0s up to the beginning of its superblock

0

Rank0(j) is split into 3 parts:

- # of 0s up to the superblock of j - # of 0s up to the block of j - # of 0s within the block of j

Page 261: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1/2 log n

log n

B’’ 0 4

0

How much space?

Page 262: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1/2 log n

log n

B’’ 0 4

0

How much space?

O(n/log2 n) entries, each uses O(log n) bits ⇒ O(n/log n) bits :-)

Page 263: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1/2 log n

log n

B’’ 0 4

0

How much space?

Page 264: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1/2 log n

log n

B’’ 0 4

0

How much space?

O(n/log n) entries, each uses O(log log n) bits ⇒ O(n log log n/log n) bits :-)

Page 265: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1/2 log n

log n

B’’ 0 4

0

Page 266: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1/2 log n

log n

B’’ 0 4

0

How to support Select in O(log n) time?

Page 267: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Rank/Select queries

0 0 1 1 1 0 1B 0 1 1 0 10 1 2 3 4 5 6 7 8 9 10 11

Rank0(j) = # of 0 in B[0,j]

Rank1(j) = # of 1 in B[0,j]

Space: n + O(n log log n/log n) bits Query time: O(1)

B’ 0 2 3 4

M1 2 31 2 3

1 2 2000001

0 1 1101

1 1 20101 1 1011

0 1 2100

0 0 11100 0 0111

1/2 log n

log n

B’’ 0 4

0

How to support Select in O(log n) time?

Select can be solved in O(1) time with a more difficult

approach

Page 268: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Page 269: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

Page 270: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

Page 271: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

Trivial representation requires O(n log m) bits :-( Can we do better?

Page 272: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

0

log

m =

log

24

Page 273: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

Page 274: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

Page 275: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

Page 276: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

Page 277: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

Max number of buckets 2log n = n

Page 278: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H

Max number of buckets 2log n = n

Write buckets’ cardinalities in unary

Page 279: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H

Max number of buckets 2log n = n

Write buckets’ cardinalities in unary

Page 280: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

Max number of buckets 2log n = n

Write buckets’ cardinalities in unary

Page 281: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

Max number of buckets 2log n = n

Write buckets’ cardinalities in unary

Page 282: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

1 14 5

06

Max number of buckets 2log n = n

Write buckets’ cardinalities in unary

Page 283: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

1 14 5

06

Max number of buckets 2log n = n

Write buckets’ cardinalities in unary

Page 284: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

1 14 5

06

1 07 8

Max number of buckets 2log n = n

Write buckets’ cardinalities in unary

Page 285: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

1 14 5

06

1 07 8

Max number of buckets 2log n = n

Write buckets’ cardinalities in unary

Page 286: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

1 14 5

06

1 07 8

19

110

011

Max number of buckets 2log n = n

Write buckets’ cardinalities in unary

Page 287: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

1 14 5

06

1 07 8

19

110

011

012

013

Max number of buckets 2log n = n

Write buckets’ cardinalities in unary

Page 288: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

1 14 5

06

1 07 8

19

110

011

012

013

Max number of buckets 2log n = n

Write buckets’ cardinalities in unary

Page 289: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

1 14 5

06

1 07 8

19

110

011

012

013

1 014 15

Max number of buckets 2log n = n

Write buckets’ cardinalities in unary

Page 290: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

1 14 5

06

1 07 8

19

110

011

012

013

1 014 15

Max number of buckets 2log n = n

A 1 for each value, at most a 0 for each bucket. <= 2n bits!

Page 291: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

1 14 5

06

1 07 8

19

110

011

L

012

013

1 014 15

Max number of buckets 2log n = n

A 1 for each value, at most a 0 for each bucket. <= 2n bits!

Page 292: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

1 14 5

06

1 07 8

19

110

011

L

012

013

1 014 15

Max number of buckets 2log n = n

n log (m/n) bits

A 1 for each value, at most a 0 for each bucket. <= 2n bits!

Page 293: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

1 14 5

06

1 07 8

19

110

011

L

012

013

1 014 15

Max number of buckets 2log n = n

Access(6)

Page 294: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

1 14 5

06

1 07 8

19

110

011

L

012

013

1 014 15

Max number of buckets 2log n = n

Access(6)1. Select1(6)-6 =3

Page 295: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

1 14 5

06

1 07 8

19

110

011

L

012

013

1 014 15

Max number of buckets 2log n = n

Access(6)1. Select1(6)-6 =3 011 in binary!

Page 296: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence S of n (positive) increasing integers up to m

Space: n log (m/n) + O(n) bits Access(i) in O(1)

2 3 5S1 2 3

74

115

136

147

248

0

0

0

1

1

0

0

1

0

1

0

0

1

1

1

0

1

0

1

1

0

1

1

0

1

0

1

1

1

0

1

1

0

0

0

0

0

0

1

0

log

m =

log

24

log (m/n)

log n

H 1 1 01 2 3

1 14 5

06

1 07 8

19

110

011

L

012

013

1 014 15

Max number of buckets 2log n = n

Access(6)1. Select1(6)-6 =3 011 in binary!

2. Access L in position 6*log (m/n) = 13

Page 297: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence of n (positive) integers summing up to m

Page 298: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence of n (positive) integers summing up to m

2 2 4 3S1 2 3 4

Page 299: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence of n (positive) integers summing up to m

2 2 4 3S

Trivial representation requires O(n log m) bits :-( Can we do better?

1 2 3 4

Page 300: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representationGiven a sequence of n (positive) integers summing up to m

2 2 4 3S

B

Represent the integer x by writing x-1 in unary to obtain B of m bits with n zeros

1 2 3 4

Page 301: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representation

1 00 1

Given a sequence of n (positive) integers summing up to m

2 2 4 3S

B

Represent the integer x by writing x-1 in unary to obtain B of m bits with n zeros

1 2 3 4

Page 302: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representation

1 00 1

1 02 3

Given a sequence of n (positive) integers summing up to m

2 2 4 3S

B

Represent the integer x by writing x-1 in unary to obtain B of m bits with n zeros

1 2 3 4

Page 303: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representation

1 00 1

1 02 3

1 1 1 04 5 6 7

Given a sequence of n (positive) integers summing up to m

2 2 4 3S

B

Represent the integer x by writing x-1 in unary to obtain B of m bits with n zeros

1 2 3 4

Page 304: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representation

1 00 1

1 02 3

1 1 1 04 5 6 7

1 1 08 9 10

Given a sequence of n (positive) integers summing up to m

2 2 4 3S

B

Represent the integer x by writing x-1 in unary to obtain B of m bits with n zeros

1 2 3 4

Page 305: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representation

1 00 1

1 02 3

1 1 1 04 5 6 7

1 1 08 9 10

The i-th value of S is Select0(i)-Select0(i-1)

Given a sequence of n (positive) integers summing up to m

2 2 4 3S

B

Represent the integer x by writing x-1 in unary to obtain B of m bits with n zeros

1 2 3 4

Page 306: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representation

1 00 1

1 02 3

1 1 1 04 5 6 7

1 1 08 9 10

The i-th value of S is Select0(i)-Select0(i-1)

Given a sequence of n (positive) integers summing up to m

2 2 4 3S

B

Represent the integer x by writing x-1 in unary to obtain B of m bits with n zeros

1 2 3 4

Page 307: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representation

1 00 1

1 02 3

1 1 1 04 5 6 7

1 1 08 9 10

The i-th value of S is Select0(i)-Select0(i-1)

Given a sequence of n (positive) integers summing up to m

2 2 4 3S

B

Represent the integer x by writing x-1 in unary to obtain B of m bits with n zeros Select0(3)=7Select0(2)=3

1 2 3 4

Page 308: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Elias-Fano representation

1 00 1

1 02 3

1 1 1 04 5 6 7

1 1 08 9 10

The i-th value of S is Select0(i)-Select0(i-1)

Given a sequence of n (positive) integers summing up to m

2 2 4 3S

B

Represent the integer x by writing x-1 in unary to obtain B of m bits with n zeros Select0(3)=7Select0(2)=3

Space: n log (m/n) + O(n) bits Select0 in O(1)

1 2 3 4

Page 309: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Unicorn: A System for Searching the Social Graph

Michael Curtiss, Iain Becker, Tudor Bosman, Sergey Doroshenko,

Lucian Grijincu, Tom Jackson, Sandhya Kunnatur, Soren Lassen, Philip Pronin,

Sriram Sankar, Guanghao Shen, Gintaras Woss, Chao Yang, Ning Zhang

Facebook, Inc.

ABSTRACTUnicorn is an online, in-memory social graph-aware index-ing system designed to search trillions of edges between tensof billions of users and entities on thousands of commodityservers. Unicorn is based on standard concepts in informa-tion retrieval, but it includes features to promote resultswith good social proximity. It also supports queries that re-quire multiple round-trips to leaves in order to retrieve ob-jects that are more than one edge away from source nodes.Unicorn is designed to answer billions of queries per day atlatencies in the hundreds of milliseconds, and it serves as aninfrastructural building block for Facebook’s Graph Searchproduct. In this paper, we describe the data model andquery language supported by Unicorn. We also describe itsevolution as it became the primary backend for Facebook’ssearch o↵erings.

1. INTRODUCTIONOver the past three years we have built and deployed a

search system called Unicorn1. Unicorn was designed withthe goal of being able to quickly and scalably search all basicstructured information on the social graph and to performcomplex set operations on the results. Unicorn resemblestraditional search indexing systems [14, 21, 22] and servesits index from memory, but it di↵ers in significant ways be-cause it was built to support social graph retrieval and socialranking from its inception. Unicorn helps products to spliceinteresting views of the social graph online for new user ex-periences.

Unicorn is the primary backend system for Facebook GraphSearch and is designed to serve billions of queries per daywith response latencies less than a few hundred milliseconds.As the product has grown and organically added more fea-tures, Unicorn has been modified to suit the product’s re-quirements. This paper is intended to serve as both a nar-

1The name was chosen because engineers joked that—muchlike the mythical quadruped—this system would solve all ofour problems and heal our woes if only it existed.

Permission to make digital or hard copies of all or part of this work for

personal or classroom use is granted without fee provided that copies are

not made or distributed for profit or commercial advantage and that copies

bear this notice and the full citation on the first page. To copy otherwise, to

republish, to post on servers or to redistribute to lists, requires prior specific

permission and/or a fee. Articles from this volume were invited to present

their results at The 39th International Conference on Very Large Data Bases,

August 26th - 30th 2013, Riva del Garda, Trento, Italy.

Proceedings of the VLDB Endowment, Vol. 6, No. 11Copyright 2013 VLDB Endowment 2150-8097/13/09... $ 10.00.

rative of the evolution of Unicorn’s architecture, as well asdocumentation for the major features and components ofthe system.To the best of our knowledge, no other online graph re-

trieval system has ever been built with the scale of Unicornin terms of both data volume and query volume. The sys-tem serves tens of billions of nodes and trillions of edgesat scale while accounting for per-edge privacy, and it mustalso support realtime updates for all edges and nodes whileserving billions of daily queries at low latencies.This paper includes three main contributions:

• We describe how we applied common information re-trieval architectural concepts to the domain of the so-cial graph.

• We discuss key features for promoting socially relevantsearch results.

• We discuss two operators, apply and extract, whichallow rich semantic graph queries.

This paper is divided into four major parts. In Sections 2–5, we discuss the motivation for building unicorn, its design,and basic API. In Section 6, we describe how Unicorn wasadapted to serve as the backend for Facebook’s typeaheadsearch. We also discuss how to promote and rank sociallyrelevant results. In Sections 7–8, we build on the imple-mentation of typeahead to construct a new kind of searchengine. By performing multi-stage queries that traverse aseries of edges, the system is able to return complex, user-customized views of the social graph. Finally, in Sections8–10, we talk about privacy, scaling, and the system’s per-formance characteristics for typical queries.

2. THE SOCIAL GRAPHFacebook maintains a database of the inter-relationships

between the people and things in the real world, which itcalls the social graph. Like any other directed graph, itconsists of nodes signifying people and things; and edgesrepresenting a relationship between two nodes. In the re-mainder of this paper, we will use the terms node and entityinterchangeably.Facebook’s primary storage and production serving ar-

chitectures are described in [30]. Entities can be fetchedby their primary key, which is a 64-bit identifier (id). Wealso store the edges between entities. Some edges are di-rectional while others are symmetric, and there are manythousands of edge-types. The most well known edge-type

Page 310: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Unicorn: A System for Searching the Social Graph

Michael Curtiss, Iain Becker, Tudor Bosman, Sergey Doroshenko,

Lucian Grijincu, Tom Jackson, Sandhya Kunnatur, Soren Lassen, Philip Pronin,

Sriram Sankar, Guanghao Shen, Gintaras Woss, Chao Yang, Ning Zhang

Facebook, Inc.

ABSTRACTUnicorn is an online, in-memory social graph-aware index-ing system designed to search trillions of edges between tensof billions of users and entities on thousands of commodityservers. Unicorn is based on standard concepts in informa-tion retrieval, but it includes features to promote resultswith good social proximity. It also supports queries that re-quire multiple round-trips to leaves in order to retrieve ob-jects that are more than one edge away from source nodes.Unicorn is designed to answer billions of queries per day atlatencies in the hundreds of milliseconds, and it serves as aninfrastructural building block for Facebook’s Graph Searchproduct. In this paper, we describe the data model andquery language supported by Unicorn. We also describe itsevolution as it became the primary backend for Facebook’ssearch o↵erings.

1. INTRODUCTIONOver the past three years we have built and deployed a

search system called Unicorn1. Unicorn was designed withthe goal of being able to quickly and scalably search all basicstructured information on the social graph and to performcomplex set operations on the results. Unicorn resemblestraditional search indexing systems [14, 21, 22] and servesits index from memory, but it di↵ers in significant ways be-cause it was built to support social graph retrieval and socialranking from its inception. Unicorn helps products to spliceinteresting views of the social graph online for new user ex-periences.

Unicorn is the primary backend system for Facebook GraphSearch and is designed to serve billions of queries per daywith response latencies less than a few hundred milliseconds.As the product has grown and organically added more fea-tures, Unicorn has been modified to suit the product’s re-quirements. This paper is intended to serve as both a nar-

1The name was chosen because engineers joked that—muchlike the mythical quadruped—this system would solve all ofour problems and heal our woes if only it existed.

Permission to make digital or hard copies of all or part of this work for

personal or classroom use is granted without fee provided that copies are

not made or distributed for profit or commercial advantage and that copies

bear this notice and the full citation on the first page. To copy otherwise, to

republish, to post on servers or to redistribute to lists, requires prior specific

permission and/or a fee. Articles from this volume were invited to present

their results at The 39th International Conference on Very Large Data Bases,

August 26th - 30th 2013, Riva del Garda, Trento, Italy.

Proceedings of the VLDB Endowment, Vol. 6, No. 11Copyright 2013 VLDB Endowment 2150-8097/13/09... $ 10.00.

rative of the evolution of Unicorn’s architecture, as well asdocumentation for the major features and components ofthe system.To the best of our knowledge, no other online graph re-

trieval system has ever been built with the scale of Unicornin terms of both data volume and query volume. The sys-tem serves tens of billions of nodes and trillions of edgesat scale while accounting for per-edge privacy, and it mustalso support realtime updates for all edges and nodes whileserving billions of daily queries at low latencies.This paper includes three main contributions:

• We describe how we applied common information re-trieval architectural concepts to the domain of the so-cial graph.

• We discuss key features for promoting socially relevantsearch results.

• We discuss two operators, apply and extract, whichallow rich semantic graph queries.

This paper is divided into four major parts. In Sections 2–5, we discuss the motivation for building unicorn, its design,and basic API. In Section 6, we describe how Unicorn wasadapted to serve as the backend for Facebook’s typeaheadsearch. We also discuss how to promote and rank sociallyrelevant results. In Sections 7–8, we build on the imple-mentation of typeahead to construct a new kind of searchengine. By performing multi-stage queries that traverse aseries of edges, the system is able to return complex, user-customized views of the social graph. Finally, in Sections8–10, we talk about privacy, scaling, and the system’s per-formance characteristics for typical queries.

2. THE SOCIAL GRAPHFacebook maintains a database of the inter-relationships

between the people and things in the real world, which itcalls the social graph. Like any other directed graph, itconsists of nodes signifying people and things; and edgesrepresenting a relationship between two nodes. In the re-mainder of this paper, we will use the terms node and entityinterchangeably.Facebook’s primary storage and production serving ar-

chitectures are described in [30]. Entities can be fetchedby their primary key, which is a 64-bit identifier (id). Wealso store the edges between entities. Some edges are di-rectional while others are symmetric, and there are manythousands of edge-types. The most well known edge-type

Page 311: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Unicorn: A System for Searching the Social Graph

Michael Curtiss, Iain Becker, Tudor Bosman, Sergey Doroshenko,

Lucian Grijincu, Tom Jackson, Sandhya Kunnatur, Soren Lassen, Philip Pronin,

Sriram Sankar, Guanghao Shen, Gintaras Woss, Chao Yang, Ning Zhang

Facebook, Inc.

ABSTRACTUnicorn is an online, in-memory social graph-aware index-ing system designed to search trillions of edges between tensof billions of users and entities on thousands of commodityservers. Unicorn is based on standard concepts in informa-tion retrieval, but it includes features to promote resultswith good social proximity. It also supports queries that re-quire multiple round-trips to leaves in order to retrieve ob-jects that are more than one edge away from source nodes.Unicorn is designed to answer billions of queries per day atlatencies in the hundreds of milliseconds, and it serves as aninfrastructural building block for Facebook’s Graph Searchproduct. In this paper, we describe the data model andquery language supported by Unicorn. We also describe itsevolution as it became the primary backend for Facebook’ssearch o↵erings.

1. INTRODUCTIONOver the past three years we have built and deployed a

search system called Unicorn1. Unicorn was designed withthe goal of being able to quickly and scalably search all basicstructured information on the social graph and to performcomplex set operations on the results. Unicorn resemblestraditional search indexing systems [14, 21, 22] and servesits index from memory, but it di↵ers in significant ways be-cause it was built to support social graph retrieval and socialranking from its inception. Unicorn helps products to spliceinteresting views of the social graph online for new user ex-periences.

Unicorn is the primary backend system for Facebook GraphSearch and is designed to serve billions of queries per daywith response latencies less than a few hundred milliseconds.As the product has grown and organically added more fea-tures, Unicorn has been modified to suit the product’s re-quirements. This paper is intended to serve as both a nar-

1The name was chosen because engineers joked that—muchlike the mythical quadruped—this system would solve all ofour problems and heal our woes if only it existed.

Permission to make digital or hard copies of all or part of this work for

personal or classroom use is granted without fee provided that copies are

not made or distributed for profit or commercial advantage and that copies

bear this notice and the full citation on the first page. To copy otherwise, to

republish, to post on servers or to redistribute to lists, requires prior specific

permission and/or a fee. Articles from this volume were invited to present

their results at The 39th International Conference on Very Large Data Bases,

August 26th - 30th 2013, Riva del Garda, Trento, Italy.

Proceedings of the VLDB Endowment, Vol. 6, No. 11Copyright 2013 VLDB Endowment 2150-8097/13/09... $ 10.00.

rative of the evolution of Unicorn’s architecture, as well asdocumentation for the major features and components ofthe system.To the best of our knowledge, no other online graph re-

trieval system has ever been built with the scale of Unicornin terms of both data volume and query volume. The sys-tem serves tens of billions of nodes and trillions of edgesat scale while accounting for per-edge privacy, and it mustalso support realtime updates for all edges and nodes whileserving billions of daily queries at low latencies.This paper includes three main contributions:

• We describe how we applied common information re-trieval architectural concepts to the domain of the so-cial graph.

• We discuss key features for promoting socially relevantsearch results.

• We discuss two operators, apply and extract, whichallow rich semantic graph queries.

This paper is divided into four major parts. In Sections 2–5, we discuss the motivation for building unicorn, its design,and basic API. In Section 6, we describe how Unicorn wasadapted to serve as the backend for Facebook’s typeaheadsearch. We also discuss how to promote and rank sociallyrelevant results. In Sections 7–8, we build on the imple-mentation of typeahead to construct a new kind of searchengine. By performing multi-stage queries that traverse aseries of edges, the system is able to return complex, user-customized views of the social graph. Finally, in Sections8–10, we talk about privacy, scaling, and the system’s per-formance characteristics for typical queries.

2. THE SOCIAL GRAPHFacebook maintains a database of the inter-relationships

between the people and things in the real world, which itcalls the social graph. Like any other directed graph, itconsists of nodes signifying people and things; and edgesrepresenting a relationship between two nodes. In the re-mainder of this paper, we will use the terms node and entityinterchangeably.Facebook’s primary storage and production serving ar-

chitectures are described in [30]. Entities can be fetchedby their primary key, which is a 64-bit identifier (id). Wealso store the edges between entities. Some edges are di-rectional while others are symmetric, and there are manythousands of edge-types. The most well known edge-type

Page 312: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Unicorn: A System for Searching the Social Graph

Michael Curtiss, Iain Becker, Tudor Bosman, Sergey Doroshenko,

Lucian Grijincu, Tom Jackson, Sandhya Kunnatur, Soren Lassen, Philip Pronin,

Sriram Sankar, Guanghao Shen, Gintaras Woss, Chao Yang, Ning Zhang

Facebook, Inc.

ABSTRACTUnicorn is an online, in-memory social graph-aware index-ing system designed to search trillions of edges between tensof billions of users and entities on thousands of commodityservers. Unicorn is based on standard concepts in informa-tion retrieval, but it includes features to promote resultswith good social proximity. It also supports queries that re-quire multiple round-trips to leaves in order to retrieve ob-jects that are more than one edge away from source nodes.Unicorn is designed to answer billions of queries per day atlatencies in the hundreds of milliseconds, and it serves as aninfrastructural building block for Facebook’s Graph Searchproduct. In this paper, we describe the data model andquery language supported by Unicorn. We also describe itsevolution as it became the primary backend for Facebook’ssearch o↵erings.

1. INTRODUCTIONOver the past three years we have built and deployed a

search system called Unicorn1. Unicorn was designed withthe goal of being able to quickly and scalably search all basicstructured information on the social graph and to performcomplex set operations on the results. Unicorn resemblestraditional search indexing systems [14, 21, 22] and servesits index from memory, but it di↵ers in significant ways be-cause it was built to support social graph retrieval and socialranking from its inception. Unicorn helps products to spliceinteresting views of the social graph online for new user ex-periences.

Unicorn is the primary backend system for Facebook GraphSearch and is designed to serve billions of queries per daywith response latencies less than a few hundred milliseconds.As the product has grown and organically added more fea-tures, Unicorn has been modified to suit the product’s re-quirements. This paper is intended to serve as both a nar-

1The name was chosen because engineers joked that—muchlike the mythical quadruped—this system would solve all ofour problems and heal our woes if only it existed.

Permission to make digital or hard copies of all or part of this work for

personal or classroom use is granted without fee provided that copies are

not made or distributed for profit or commercial advantage and that copies

bear this notice and the full citation on the first page. To copy otherwise, to

republish, to post on servers or to redistribute to lists, requires prior specific

permission and/or a fee. Articles from this volume were invited to present

their results at The 39th International Conference on Very Large Data Bases,

August 26th - 30th 2013, Riva del Garda, Trento, Italy.

Proceedings of the VLDB Endowment, Vol. 6, No. 11Copyright 2013 VLDB Endowment 2150-8097/13/09... $ 10.00.

rative of the evolution of Unicorn’s architecture, as well asdocumentation for the major features and components ofthe system.To the best of our knowledge, no other online graph re-

trieval system has ever been built with the scale of Unicornin terms of both data volume and query volume. The sys-tem serves tens of billions of nodes and trillions of edgesat scale while accounting for per-edge privacy, and it mustalso support realtime updates for all edges and nodes whileserving billions of daily queries at low latencies.This paper includes three main contributions:

• We describe how we applied common information re-trieval architectural concepts to the domain of the so-cial graph.

• We discuss key features for promoting socially relevantsearch results.

• We discuss two operators, apply and extract, whichallow rich semantic graph queries.

This paper is divided into four major parts. In Sections 2–5, we discuss the motivation for building unicorn, its design,and basic API. In Section 6, we describe how Unicorn wasadapted to serve as the backend for Facebook’s typeaheadsearch. We also discuss how to promote and rank sociallyrelevant results. In Sections 7–8, we build on the imple-mentation of typeahead to construct a new kind of searchengine. By performing multi-stage queries that traverse aseries of edges, the system is able to return complex, user-customized views of the social graph. Finally, in Sections8–10, we talk about privacy, scaling, and the system’s per-formance characteristics for typical queries.

2. THE SOCIAL GRAPHFacebook maintains a database of the inter-relationships

between the people and things in the real world, which itcalls the social graph. Like any other directed graph, itconsists of nodes signifying people and things; and edgesrepresenting a relationship between two nodes. In the re-mainder of this paper, we will use the terms node and entityinterchangeably.Facebook’s primary storage and production serving ar-

chitectures are described in [30]. Entities can be fetchedby their primary key, which is a 64-bit identifier (id). Wealso store the edges between entities. Some edges are di-rectional while others are symmetric, and there are manythousands of edge-types. The most well known edge-type

101 102 103 104 1050

20

40

60

Inner Truncation Limit

RelativeDeviation

(%)

The first two plots above show that increasing the in-ner truncation limit leads to higher latency and cost, withlatency passing 100ms at approximately a limit of 5000.Query cost increases sub-linearly, but there will likely alwaysbe queries whose inner result sets will need to be truncatedto prevent latency from exceeding a reasonable threshold(say, 100ms).

This means that—barring architectural changes—we willalways have queries that require inner truncation. As men-tioned in section 7.1.2, good inner query ranking is typicallythe most useful tool. However, it is always possible to con-struct “needle-in-a-haystack” queries that elicit bad perfor-mance. Query planning and selective denormalization canhelp in many of these cases.

The final plot shows that relative deviation increases grad-ually as the truncation limit increases. Larger outer querieshave higher variance per shard as perceived by the rack ag-gregator. This is likely because network queuing delays be-come more likely as query size increases.

11. RELATED WORKIn the last few years, sustained progress has been made in

scaling graph search via the SPARQL language, and someof these systems focus, like Unicorn, on real-time responseto ad-hoc queries [24, 12]. Where Unicorn seeks to handle afinite number of well understood edges and scale to trillionsof edges, SPARQL engines intend to handle arbitrary graphstructure and complex queries, and scale to tens of millionsof edges [24]. That said, it is interesting to note that thecurrent state-of-the-art in performance is based on variantsof a structure from [12] in which data is vertically partitionedand stored in a column store. This data structure uses aclustered B+ tree of (subject-id, value) for each propertyand emphasizes merge-joins, and thus seems to be evolvingtoward a posting-list-style architecture with fast intersectionand union as supported by Unicorn.

Recently, work has been done on adding keyword searchto SPARQL-style queries [28, 15], leading to the integrationof posting lists retrieval with structured indices. This workis currently at much smaller scale than Unicorn. Startingwith XML data graphs, work has been done to search forsubgraphs based on keywords (see, e.g. [16, 20]). The focusof this work is returning a subgraph, while Unicorn returnsan ordered list of entities.

In some work [13, 18], the term ’social search’ in fact refersto a system that supports question-answering, and the socialgraph is used to predict which person can answer a question.

While some similar ranking features may be used, Unicornsupports queries about the social graph rather than via thegraph, a fundamentally di↵erent application.

12. CONCLUSIONIn this paper, we have described the evolution of a graph-

based indexing system and how we added features to make ituseful for a consumer product that receives billions of queriesper week. Our main contributions are showing how manyinformation retrieval concepts can be put to work for serv-ing graph queries, and we described a simple yet practicalmultiple round-trip algorithm for serving even more com-plex queries where edges are not denormalized and insteadmust be traversed sequentially.

13. ACKNOWLEDGMENTSWe would like to acknowledge the product engineers who

use Unicorn as clients every day, and we also want to thankour production engineers who deal with our bugs and helpextinguish our fires. Also, thanks to Cameron Marlow forcontributing graphics and Dhruba Borthakur for conversa-tions about Facebook’s storage architecture. Several Face-book engineers contributed helpful feedback to this paperincluding Philip Bohannon and Chaitanya Mishra. BretTaylor was instrumental in coming up with many of theoriginal ideas for Unicorn and its design. We would alsolike to acknowledge the following individuals for their con-tributions to Unicorn: Spencer Ahrens, Neil Blakey-Milner,Udeepta Bordoloi, Chris Bray, Jordan DeLong, Shuai Ding,Jeremy Lilley, Jim Norris, Feng Qian, Nathan Schrenk, San-jeev Singh, Ryan Stout, Evan Stratford, Scott Straw, andSherman Ye.

Open SourceAll Unicorn index server and aggregator code is written inC++. Unicorn relies extensively on modules in Facebook’s“Folly” Open Source Library [5]. As part of the e↵ort ofreleasing Graph Search, we have open-sourced a C++ im-plementation of the Elias-Fano index representation [31] aspart of Folly.

14. REFERENCES[1] Apache Hadoop. http://hadoop.apache.org/.[2] Apache Thrift. http://thrift.apache.org/.[3] Description of HHVM (PHP Virtual machine).

https://www.facebook.com/note.php?note_id=

10150415177928920.[4] Facebook Graph Search.

https://www.facebook.com/about/graphsearch.[5] Folly GitHub Repository.

http://github.com/facebook/folly.[6] HPHP for PHP GitHub Repository.

http://github.com/facebook/hiphop-php.[7] Open Compute Project.

http://www.opencompute.org/.[8] Scribe Facebook Blog Post. http://www.

facebook.com/note.php?note_id=32008268919.[9] Scribe GitHub Repository.

http://www.github.com/facebook/scribe.[10] The Life of a Typeahead Query. http:

//www.facebook.com/notes/facebook-engineering/

the-life-of-a-typeahead-query/389105248919.

Page 313: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Unicorn: A System for Searching the Social Graph

Michael Curtiss, Iain Becker, Tudor Bosman, Sergey Doroshenko,

Lucian Grijincu, Tom Jackson, Sandhya Kunnatur, Soren Lassen, Philip Pronin,

Sriram Sankar, Guanghao Shen, Gintaras Woss, Chao Yang, Ning Zhang

Facebook, Inc.

ABSTRACTUnicorn is an online, in-memory social graph-aware index-ing system designed to search trillions of edges between tensof billions of users and entities on thousands of commodityservers. Unicorn is based on standard concepts in informa-tion retrieval, but it includes features to promote resultswith good social proximity. It also supports queries that re-quire multiple round-trips to leaves in order to retrieve ob-jects that are more than one edge away from source nodes.Unicorn is designed to answer billions of queries per day atlatencies in the hundreds of milliseconds, and it serves as aninfrastructural building block for Facebook’s Graph Searchproduct. In this paper, we describe the data model andquery language supported by Unicorn. We also describe itsevolution as it became the primary backend for Facebook’ssearch o↵erings.

1. INTRODUCTIONOver the past three years we have built and deployed a

search system called Unicorn1. Unicorn was designed withthe goal of being able to quickly and scalably search all basicstructured information on the social graph and to performcomplex set operations on the results. Unicorn resemblestraditional search indexing systems [14, 21, 22] and servesits index from memory, but it di↵ers in significant ways be-cause it was built to support social graph retrieval and socialranking from its inception. Unicorn helps products to spliceinteresting views of the social graph online for new user ex-periences.

Unicorn is the primary backend system for Facebook GraphSearch and is designed to serve billions of queries per daywith response latencies less than a few hundred milliseconds.As the product has grown and organically added more fea-tures, Unicorn has been modified to suit the product’s re-quirements. This paper is intended to serve as both a nar-

1The name was chosen because engineers joked that—muchlike the mythical quadruped—this system would solve all ofour problems and heal our woes if only it existed.

Permission to make digital or hard copies of all or part of this work for

personal or classroom use is granted without fee provided that copies are

not made or distributed for profit or commercial advantage and that copies

bear this notice and the full citation on the first page. To copy otherwise, to

republish, to post on servers or to redistribute to lists, requires prior specific

permission and/or a fee. Articles from this volume were invited to present

their results at The 39th International Conference on Very Large Data Bases,

August 26th - 30th 2013, Riva del Garda, Trento, Italy.

Proceedings of the VLDB Endowment, Vol. 6, No. 11Copyright 2013 VLDB Endowment 2150-8097/13/09... $ 10.00.

rative of the evolution of Unicorn’s architecture, as well asdocumentation for the major features and components ofthe system.To the best of our knowledge, no other online graph re-

trieval system has ever been built with the scale of Unicornin terms of both data volume and query volume. The sys-tem serves tens of billions of nodes and trillions of edgesat scale while accounting for per-edge privacy, and it mustalso support realtime updates for all edges and nodes whileserving billions of daily queries at low latencies.This paper includes three main contributions:

• We describe how we applied common information re-trieval architectural concepts to the domain of the so-cial graph.

• We discuss key features for promoting socially relevantsearch results.

• We discuss two operators, apply and extract, whichallow rich semantic graph queries.

This paper is divided into four major parts. In Sections 2–5, we discuss the motivation for building unicorn, its design,and basic API. In Section 6, we describe how Unicorn wasadapted to serve as the backend for Facebook’s typeaheadsearch. We also discuss how to promote and rank sociallyrelevant results. In Sections 7–8, we build on the imple-mentation of typeahead to construct a new kind of searchengine. By performing multi-stage queries that traverse aseries of edges, the system is able to return complex, user-customized views of the social graph. Finally, in Sections8–10, we talk about privacy, scaling, and the system’s per-formance characteristics for typical queries.

2. THE SOCIAL GRAPHFacebook maintains a database of the inter-relationships

between the people and things in the real world, which itcalls the social graph. Like any other directed graph, itconsists of nodes signifying people and things; and edgesrepresenting a relationship between two nodes. In the re-mainder of this paper, we will use the terms node and entityinterchangeably.Facebook’s primary storage and production serving ar-

chitectures are described in [30]. Entities can be fetchedby their primary key, which is a 64-bit identifier (id). Wealso store the edges between entities. Some edges are di-rectional while others are symmetric, and there are manythousands of edge-types. The most well known edge-type

101 102 103 104 1050

20

40

60

Inner Truncation Limit

RelativeDeviation

(%)

The first two plots above show that increasing the in-ner truncation limit leads to higher latency and cost, withlatency passing 100ms at approximately a limit of 5000.Query cost increases sub-linearly, but there will likely alwaysbe queries whose inner result sets will need to be truncatedto prevent latency from exceeding a reasonable threshold(say, 100ms).

This means that—barring architectural changes—we willalways have queries that require inner truncation. As men-tioned in section 7.1.2, good inner query ranking is typicallythe most useful tool. However, it is always possible to con-struct “needle-in-a-haystack” queries that elicit bad perfor-mance. Query planning and selective denormalization canhelp in many of these cases.

The final plot shows that relative deviation increases grad-ually as the truncation limit increases. Larger outer querieshave higher variance per shard as perceived by the rack ag-gregator. This is likely because network queuing delays be-come more likely as query size increases.

11. RELATED WORKIn the last few years, sustained progress has been made in

scaling graph search via the SPARQL language, and someof these systems focus, like Unicorn, on real-time responseto ad-hoc queries [24, 12]. Where Unicorn seeks to handle afinite number of well understood edges and scale to trillionsof edges, SPARQL engines intend to handle arbitrary graphstructure and complex queries, and scale to tens of millionsof edges [24]. That said, it is interesting to note that thecurrent state-of-the-art in performance is based on variantsof a structure from [12] in which data is vertically partitionedand stored in a column store. This data structure uses aclustered B+ tree of (subject-id, value) for each propertyand emphasizes merge-joins, and thus seems to be evolvingtoward a posting-list-style architecture with fast intersectionand union as supported by Unicorn.

Recently, work has been done on adding keyword searchto SPARQL-style queries [28, 15], leading to the integrationof posting lists retrieval with structured indices. This workis currently at much smaller scale than Unicorn. Startingwith XML data graphs, work has been done to search forsubgraphs based on keywords (see, e.g. [16, 20]). The focusof this work is returning a subgraph, while Unicorn returnsan ordered list of entities.

In some work [13, 18], the term ’social search’ in fact refersto a system that supports question-answering, and the socialgraph is used to predict which person can answer a question.

While some similar ranking features may be used, Unicornsupports queries about the social graph rather than via thegraph, a fundamentally di↵erent application.

12. CONCLUSIONIn this paper, we have described the evolution of a graph-

based indexing system and how we added features to make ituseful for a consumer product that receives billions of queriesper week. Our main contributions are showing how manyinformation retrieval concepts can be put to work for serv-ing graph queries, and we described a simple yet practicalmultiple round-trip algorithm for serving even more com-plex queries where edges are not denormalized and insteadmust be traversed sequentially.

13. ACKNOWLEDGMENTSWe would like to acknowledge the product engineers who

use Unicorn as clients every day, and we also want to thankour production engineers who deal with our bugs and helpextinguish our fires. Also, thanks to Cameron Marlow forcontributing graphics and Dhruba Borthakur for conversa-tions about Facebook’s storage architecture. Several Face-book engineers contributed helpful feedback to this paperincluding Philip Bohannon and Chaitanya Mishra. BretTaylor was instrumental in coming up with many of theoriginal ideas for Unicorn and its design. We would alsolike to acknowledge the following individuals for their con-tributions to Unicorn: Spencer Ahrens, Neil Blakey-Milner,Udeepta Bordoloi, Chris Bray, Jordan DeLong, Shuai Ding,Jeremy Lilley, Jim Norris, Feng Qian, Nathan Schrenk, San-jeev Singh, Ryan Stout, Evan Stratford, Scott Straw, andSherman Ye.

Open SourceAll Unicorn index server and aggregator code is written inC++. Unicorn relies extensively on modules in Facebook’s“Folly” Open Source Library [5]. As part of the e↵ort ofreleasing Graph Search, we have open-sourced a C++ im-plementation of the Elias-Fano index representation [31] aspart of Folly.

14. REFERENCES[1] Apache Hadoop. http://hadoop.apache.org/.[2] Apache Thrift. http://thrift.apache.org/.[3] Description of HHVM (PHP Virtual machine).

https://www.facebook.com/note.php?note_id=

10150415177928920.[4] Facebook Graph Search.

https://www.facebook.com/about/graphsearch.[5] Folly GitHub Repository.

http://github.com/facebook/folly.[6] HPHP for PHP GitHub Repository.

http://github.com/facebook/hiphop-php.[7] Open Compute Project.

http://www.opencompute.org/.[8] Scribe Facebook Blog Post. http://www.

facebook.com/note.php?note_id=32008268919.[9] Scribe GitHub Repository.

http://www.github.com/facebook/scribe.[10] The Life of a Typeahead Query. http:

//www.facebook.com/notes/facebook-engineering/

the-life-of-a-typeahead-query/389105248919.

Page 314: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Unicorn: A System for Searching the Social Graph

Michael Curtiss, Iain Becker, Tudor Bosman, Sergey Doroshenko,

Lucian Grijincu, Tom Jackson, Sandhya Kunnatur, Soren Lassen, Philip Pronin,

Sriram Sankar, Guanghao Shen, Gintaras Woss, Chao Yang, Ning Zhang

Facebook, Inc.

ABSTRACTUnicorn is an online, in-memory social graph-aware index-ing system designed to search trillions of edges between tensof billions of users and entities on thousands of commodityservers. Unicorn is based on standard concepts in informa-tion retrieval, but it includes features to promote resultswith good social proximity. It also supports queries that re-quire multiple round-trips to leaves in order to retrieve ob-jects that are more than one edge away from source nodes.Unicorn is designed to answer billions of queries per day atlatencies in the hundreds of milliseconds, and it serves as aninfrastructural building block for Facebook’s Graph Searchproduct. In this paper, we describe the data model andquery language supported by Unicorn. We also describe itsevolution as it became the primary backend for Facebook’ssearch o↵erings.

1. INTRODUCTIONOver the past three years we have built and deployed a

search system called Unicorn1. Unicorn was designed withthe goal of being able to quickly and scalably search all basicstructured information on the social graph and to performcomplex set operations on the results. Unicorn resemblestraditional search indexing systems [14, 21, 22] and servesits index from memory, but it di↵ers in significant ways be-cause it was built to support social graph retrieval and socialranking from its inception. Unicorn helps products to spliceinteresting views of the social graph online for new user ex-periences.

Unicorn is the primary backend system for Facebook GraphSearch and is designed to serve billions of queries per daywith response latencies less than a few hundred milliseconds.As the product has grown and organically added more fea-tures, Unicorn has been modified to suit the product’s re-quirements. This paper is intended to serve as both a nar-

1The name was chosen because engineers joked that—muchlike the mythical quadruped—this system would solve all ofour problems and heal our woes if only it existed.

Permission to make digital or hard copies of all or part of this work for

personal or classroom use is granted without fee provided that copies are

not made or distributed for profit or commercial advantage and that copies

bear this notice and the full citation on the first page. To copy otherwise, to

republish, to post on servers or to redistribute to lists, requires prior specific

permission and/or a fee. Articles from this volume were invited to present

their results at The 39th International Conference on Very Large Data Bases,

August 26th - 30th 2013, Riva del Garda, Trento, Italy.

Proceedings of the VLDB Endowment, Vol. 6, No. 11Copyright 2013 VLDB Endowment 2150-8097/13/09... $ 10.00.

rative of the evolution of Unicorn’s architecture, as well asdocumentation for the major features and components ofthe system.To the best of our knowledge, no other online graph re-

trieval system has ever been built with the scale of Unicornin terms of both data volume and query volume. The sys-tem serves tens of billions of nodes and trillions of edgesat scale while accounting for per-edge privacy, and it mustalso support realtime updates for all edges and nodes whileserving billions of daily queries at low latencies.This paper includes three main contributions:

• We describe how we applied common information re-trieval architectural concepts to the domain of the so-cial graph.

• We discuss key features for promoting socially relevantsearch results.

• We discuss two operators, apply and extract, whichallow rich semantic graph queries.

This paper is divided into four major parts. In Sections 2–5, we discuss the motivation for building unicorn, its design,and basic API. In Section 6, we describe how Unicorn wasadapted to serve as the backend for Facebook’s typeaheadsearch. We also discuss how to promote and rank sociallyrelevant results. In Sections 7–8, we build on the imple-mentation of typeahead to construct a new kind of searchengine. By performing multi-stage queries that traverse aseries of edges, the system is able to return complex, user-customized views of the social graph. Finally, in Sections8–10, we talk about privacy, scaling, and the system’s per-formance characteristics for typical queries.

2. THE SOCIAL GRAPHFacebook maintains a database of the inter-relationships

between the people and things in the real world, which itcalls the social graph. Like any other directed graph, itconsists of nodes signifying people and things; and edgesrepresenting a relationship between two nodes. In the re-mainder of this paper, we will use the terms node and entityinterchangeably.Facebook’s primary storage and production serving ar-

chitectures are described in [30]. Entities can be fetchedby their primary key, which is a 64-bit identifier (id). Wealso store the edges between entities. Some edges are di-rectional while others are symmetric, and there are manythousands of edge-types. The most well known edge-type

101 102 103 104 1050

20

40

60

Inner Truncation Limit

RelativeDeviation

(%)

The first two plots above show that increasing the in-ner truncation limit leads to higher latency and cost, withlatency passing 100ms at approximately a limit of 5000.Query cost increases sub-linearly, but there will likely alwaysbe queries whose inner result sets will need to be truncatedto prevent latency from exceeding a reasonable threshold(say, 100ms).

This means that—barring architectural changes—we willalways have queries that require inner truncation. As men-tioned in section 7.1.2, good inner query ranking is typicallythe most useful tool. However, it is always possible to con-struct “needle-in-a-haystack” queries that elicit bad perfor-mance. Query planning and selective denormalization canhelp in many of these cases.

The final plot shows that relative deviation increases grad-ually as the truncation limit increases. Larger outer querieshave higher variance per shard as perceived by the rack ag-gregator. This is likely because network queuing delays be-come more likely as query size increases.

11. RELATED WORKIn the last few years, sustained progress has been made in

scaling graph search via the SPARQL language, and someof these systems focus, like Unicorn, on real-time responseto ad-hoc queries [24, 12]. Where Unicorn seeks to handle afinite number of well understood edges and scale to trillionsof edges, SPARQL engines intend to handle arbitrary graphstructure and complex queries, and scale to tens of millionsof edges [24]. That said, it is interesting to note that thecurrent state-of-the-art in performance is based on variantsof a structure from [12] in which data is vertically partitionedand stored in a column store. This data structure uses aclustered B+ tree of (subject-id, value) for each propertyand emphasizes merge-joins, and thus seems to be evolvingtoward a posting-list-style architecture with fast intersectionand union as supported by Unicorn.

Recently, work has been done on adding keyword searchto SPARQL-style queries [28, 15], leading to the integrationof posting lists retrieval with structured indices. This workis currently at much smaller scale than Unicorn. Startingwith XML data graphs, work has been done to search forsubgraphs based on keywords (see, e.g. [16, 20]). The focusof this work is returning a subgraph, while Unicorn returnsan ordered list of entities.

In some work [13, 18], the term ’social search’ in fact refersto a system that supports question-answering, and the socialgraph is used to predict which person can answer a question.

While some similar ranking features may be used, Unicornsupports queries about the social graph rather than via thegraph, a fundamentally di↵erent application.

12. CONCLUSIONIn this paper, we have described the evolution of a graph-

based indexing system and how we added features to make ituseful for a consumer product that receives billions of queriesper week. Our main contributions are showing how manyinformation retrieval concepts can be put to work for serv-ing graph queries, and we described a simple yet practicalmultiple round-trip algorithm for serving even more com-plex queries where edges are not denormalized and insteadmust be traversed sequentially.

13. ACKNOWLEDGMENTSWe would like to acknowledge the product engineers who

use Unicorn as clients every day, and we also want to thankour production engineers who deal with our bugs and helpextinguish our fires. Also, thanks to Cameron Marlow forcontributing graphics and Dhruba Borthakur for conversa-tions about Facebook’s storage architecture. Several Face-book engineers contributed helpful feedback to this paperincluding Philip Bohannon and Chaitanya Mishra. BretTaylor was instrumental in coming up with many of theoriginal ideas for Unicorn and its design. We would alsolike to acknowledge the following individuals for their con-tributions to Unicorn: Spencer Ahrens, Neil Blakey-Milner,Udeepta Bordoloi, Chris Bray, Jordan DeLong, Shuai Ding,Jeremy Lilley, Jim Norris, Feng Qian, Nathan Schrenk, San-jeev Singh, Ryan Stout, Evan Stratford, Scott Straw, andSherman Ye.

Open SourceAll Unicorn index server and aggregator code is written inC++. Unicorn relies extensively on modules in Facebook’s“Folly” Open Source Library [5]. As part of the e↵ort ofreleasing Graph Search, we have open-sourced a C++ im-plementation of the Elias-Fano index representation [31] aspart of Folly.

14. REFERENCES[1] Apache Hadoop. http://hadoop.apache.org/.[2] Apache Thrift. http://thrift.apache.org/.[3] Description of HHVM (PHP Virtual machine).

https://www.facebook.com/note.php?note_id=

10150415177928920.[4] Facebook Graph Search.

https://www.facebook.com/about/graphsearch.[5] Folly GitHub Repository.

http://github.com/facebook/folly.[6] HPHP for PHP GitHub Repository.

http://github.com/facebook/hiphop-php.[7] Open Compute Project.

http://www.opencompute.org/.[8] Scribe Facebook Blog Post. http://www.

facebook.com/note.php?note_id=32008268919.[9] Scribe GitHub Repository.

http://www.github.com/facebook/scribe.[10] The Life of a Typeahead Query. http:

//www.facebook.com/notes/facebook-engineering/

the-life-of-a-typeahead-query/389105248919.

Page 315: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)[LOUDS - Level-order unary degree sequence]

Page 316: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits

[LOUDS - Level-order unary degree sequence]

Page 317: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

[LOUDS - Level-order unary degree sequence]

Page 318: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

3

[LOUDS - Level-order unary degree sequence]

Page 319: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

3

0 2 2

2 20 0

0 0 0 0

[LOUDS - Level-order unary degree sequence]

Page 320: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

Write the degree sequence in level order

3

0 2 2

2 20 0

0 0 0 0

[LOUDS - Level-order unary degree sequence]

Page 321: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

Write the degree sequence in level order

3

0 2 2

2 20 0

0 0 0 0

D 3

[LOUDS - Level-order unary degree sequence]

Page 322: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

Write the degree sequence in level order

3

0 2 2

2 20 0

0 0 0 0

D 3 0 2 2

[LOUDS - Level-order unary degree sequence]

Page 323: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

Write the degree sequence in level order

3

0 2 2

2 20 0

0 0 0 0

D 3 0 2 2 0 0 2 2

[LOUDS - Level-order unary degree sequence]

Page 324: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

Write the degree sequence in level order

3

0 2 2

2 20 0

0 0 0 0

D 3 0 2 2 0 0 2 2 0 0 0 0

[LOUDS - Level-order unary degree sequence]

Page 325: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

Write the degree sequence in level order

3

0 2 2

2 20 0

0 0 0 0

D 3 0 2 2 0 0 2 2 0 0 0 0A tree is uniquely determined by the

degree sequence

[LOUDS - Level-order unary degree sequence]

Page 326: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

Write the degree sequence in level order

3

0 2 2

2 20 0

0 0 0 0

D 3 0 2 2 0 0 2 2 0 0 0 0A tree is uniquely determined by the

degree sequence

How reconstruct the tree?

[LOUDS - Level-order unary degree sequence]

Page 327: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

Write the degree sequence in level order

3

0 2 2

2 20 0

0 0 0 0

D 3 0 2 2 0 0 2 2 0 0 0 0

It still requires O(n log n) bits :-(

[LOUDS - Level-order unary degree sequence]

Page 328: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

Write the degree sequence in level order

3

0 2 2

2 20 0

0 0 0 0

D 3 0 2 2 0 0 2 2 0 0 0 0

It still requires O(n log n) bits :-(

Solution: write them in unary

[LOUDS - Level-order unary degree sequence]

Page 329: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

Write the degree sequence in level order

3

0 2 2

2 20 0

0 0 0 0

D 3 0 2 2 0 0 2 2 0 0 0 0

It still requires O(n log n) bits :-(

Solution: write them in unary

B

[LOUDS - Level-order unary degree sequence]

Page 330: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

Write the degree sequence in level order

3

0 2 2

2 20 0

0 0 0 0

D 3 0 2 2 0 0 2 2 0 0 0 0

It still requires O(n log n) bits :-(

Solution: write them in unary

B 1110

[LOUDS - Level-order unary degree sequence]

Page 331: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

Write the degree sequence in level order

3

0 2 2

2 20 0

0 0 0 0

D 3 0 2 2 0 0 2 2 0 0 0 0

It still requires O(n log n) bits :-(

Solution: write them in unary

B 1110 0 110 110 0 0 110 110 0 0 0 0

[LOUDS - Level-order unary degree sequence]

Page 332: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

Write the degree sequence in level order

3

0 2 2

2 20 0

0 0 0 0

D 3 0 2 2 0 0 2 2 0 0 0 0

It still requires O(n log n) bits :-(

Solution: write them in unary

B 1110 0 110 110 0 0 110 110 0 0 0 0

B takes 2n - 1 bits! For each node we have a 0 and a 1

(but the root)

[LOUDS - Level-order unary degree sequence]

Page 333: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

Trivial: O(n log n) bits Best: 2n bits

Write the degree sequence in level order

3

0 2 2

2 20 0

0 0 0 0

D 3 0 2 2 0 0 2 2 0 0 0 0

It still requires O(n log n) bits :-(

Solution: write them in unary

B 1110 0 110 110 0 0 110 110 0 0 0 0

B takes 2n - 1 bits! For each node we have a 0 and a 1

(but the root)

Can we navigate the tree?

[LOUDS - Level-order unary degree sequence]

Page 334: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

Page 335: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

Page 336: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

Page 337: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

Page 338: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

Page 339: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

Page 340: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

1

Page 341: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

1 2 3 4 5 6 7 8 9 10 11 12

Page 342: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) =

1 2 3 4 5 6 7 8 9 10 11 12

pos(5)

Page 343: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

pos(5)

Page 344: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ?

Page 345: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

Page 346: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

firstChild(3)

Page 347: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

y= Select0(3)+1=8

firstChild(3)

Page 348: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

if B[y] == 0return -1 // is a leaf

y= Select0(3)+1=8

firstChild(3)

Page 349: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

if B[y] == 0return -1 // is a leaf

return y-x // Rank1(y)

else

y= Select0(3)+1=8

firstChild(3)

Page 350: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

if B[y] == 0return -1 // is a leaf

return y-x // Rank1(y)

else

y= Select0(3)+1=8

firstChild(3) = 8-3 = 5

Page 351: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

if B[y] == 0return -1 // is a leaf

return y-x // Rank1(y)

else

y= Select0(3)+1=8

degree(x) = ?

Page 352: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

if B[y] == 0return -1 // is a leaf

return y-x // Rank1(y)

else

y= Select0(3)+1=8

degree(x) = ? Select0(x+1) - (Select0(x) + 1)

Page 353: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

if B[y] == 0return -1 // is a leaf

return y-x // Rank1(y)

else

y= Select0(3)+1=8

degree(x) = ? Select0(x+1) - (Select0(x) + 1)

degree(3)

Page 354: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

if B[y] == 0return -1 // is a leaf

return y-x // Rank1(y)

else

y= Select0(3)+1=8

degree(x) = ? Select0(x+1) - (Select0(x) + 1)

degree(3) = Select0(4) - (Select0(3) + 1)

Page 355: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

if B[y] == 0return -1 // is a leaf

return y-x // Rank1(y)

else

y= Select0(3)+1=8

degree(x) = ? Select0(x+1) - (Select0(x) + 1)

Select0(4)=10

degree(3) = Select0(4) - (Select0(3) + 1)

Page 356: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

if B[y] == 0return -1 // is a leaf

return y-x // Rank1(y)

else

degree(x) = ? Select0(x+1) - (Select0(x) + 1)

parent(x) =

Page 357: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

if B[y] == 0return -1 // is a leaf

return y-x // Rank1(y)

else

degree(x) = ? Select0(x+1) - (Select0(x) + 1)

parent(x) = Rank0(pos(x))

Page 358: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

if B[y] == 0return -1 // is a leaf

return y-x // Rank1(y)

else

degree(x) = ? Select0(x+1) - (Select0(x) + 1)

parent(x) = Rank0(pos(x))All these operations in O(1) time!

Page 359: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

if B[y] == 0return -1 // is a leaf

return y-x // Rank1(y)

else

degree(x) = ? Select0(x+1) - (Select0(x) + 1)

parent(x) = Rank0(pos(x))

subtreeSize(x) = ?

Page 360: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

if B[y] == 0return -1 // is a leaf

return y-x // Rank1(y)

else

degree(x) = ? Select0(x+1) - (Select0(x) + 1)

parent(x) = Rank0(pos(x))

subtreeSize(x) = ?

Not efficient! Nodes of the subtree are

spread in B

Page 361: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (1)

1 0B 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0

[LOUDS - Level-order unary degree sequence]

1

3 42

7 85 6

9 10 11 12

pos(x) = Select1(x)

1 2 3 4 5 6 7 8 9 10 11 12

firstChild(x) = ? y = Select0(x)+1// start of x’s children in B

if B[y] == 0return -1 // is a leaf

return y-x // Rank1(y)

else

degree(x) = ? Select0(x+1) - (Select0(x) + 1)

parent(x) = Rank0(pos(x))

subtreeSize(x) = ?

Page 362: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)[BP - Balanced parenthesis]

Page 363: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)[BP - Balanced parenthesis]

Page 364: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)[BP - Balanced parenthesis]

1

Page 365: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)[BP - Balanced parenthesis]

1

Page 366: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)[BP - Balanced parenthesis]

1

2

Page 367: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)[BP - Balanced parenthesis]

1

2

Page 368: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)[BP - Balanced parenthesis]

1

32

Page 369: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)[BP - Balanced parenthesis]

1

32

Page 370: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)[BP - Balanced parenthesis]

1

32

Page 371: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)[BP - Balanced parenthesis]

1

32

Page 372: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

Page 373: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

Page 374: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

Page 375: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

A tree is uniquely determined by the vector B

How reconstruct the tree?

Page 376: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

subtree of 6

Page 377: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

subtree of 6( and ) balance themselves

Page 378: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) =

Page 379: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)

Page 380: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (

findClose(7)

Page 381: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (enclose(x) = returns the position of ( enclosing x-th (

enclose(10)

= position of the parent of x in B

Page 382: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (enclose(x) = returns the position of ( enclosing x-th (

enclose(10)

= position of the parent of x in B

They can be implemented in O(1) time.

Page 383: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (enclose(x) = returns the position of ( enclosing x-th (

enclose(10)

= position of the parent of x in Bparent(x) =

They can be implemented in O(1) time.

Page 384: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (enclose(x) = returns the position of ( enclosing x-th (

enclose(10)

= position of the parent of x in Bparent(x) = Rank ( (enclose(x))

They can be implemented in O(1) time.

Page 385: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (enclose(x) = returns the position of ( enclosing x-th (

= position of the parent of x in B

firstChild(x) =parent(x) = Rank ( (enclose(x))

They can be implemented in O(1) time.

Page 386: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (enclose(x) = returns the position of ( enclosing x-th (

= position of the parent of x in B

firstChild(x) =parent(x) = Rank ( (enclose(x))

y = pos(x)+1if B[y] == )

return -1 // is a leaf

return Rank ( (y)else

They can be implemented in O(1) time.

Page 387: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (enclose(x) = returns the position of ( enclosing x-th (

= position of the parent of x in B

firstChild(x) =parent(x) = Rank ( (enclose(x))

y = pos(x)+1if B[y] == )

return -1 // is a leaf

return Rank ( (y)else

sibling(x) =

They can be implemented in O(1) time.

Page 388: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (enclose(x) = returns the position of ( enclosing x-th (

= position of the parent of x in B

firstChild(x) =parent(x) = Rank ( (enclose(x))

y = pos(x)+1if B[y] == )

return -1 // is a leaf

return Rank ( (y)else

sibling(x) = Rank ( (findClose(x)+1) (if any)

They can be implemented in O(1) time.

Page 389: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (enclose(x) = returns the position of ( enclosing x-th (

= position of the parent of x in B

firstChild(x) =parent(x) = Rank ( (enclose(x))

y = pos(x)+1if B[y] == )

return -1 // is a leaf

return Rank ( (y)else

sibling(x) = Rank ( (findClose(x)+1) (if any)subtreeSize(x) =

They can be implemented in O(1) time.

Page 390: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (enclose(x) = returns the position of ( enclosing x-th (

= position of the parent of x in B

firstChild(x) =parent(x) = Rank ( (enclose(x))

y = pos(x)+1if B[y] == )

return -1 // is a leaf

return Rank ( (y)else

sibling(x) = Rank ( (findClose(x)+1) (if any)subtreeSize(x) =

(findClose(x) - pos(x) + 1) / 2

They can be implemented in O(1) time.

Page 391: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (enclose(x) = returns the position of ( enclosing x-th (

= position of the parent of x in B

firstChild(x) =parent(x) = Rank ( (enclose(x))

y = pos(x)+1if B[y] == )

return -1 // is a leaf

return Rank ( (y)else

sibling(x) = Rank ( (findClose(x)+1) (if any)subtreeSize(x) =

(findClose(x) - pos(x) + 1) / 2depth(x) =

They can be implemented in O(1) time.

Page 392: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (enclose(x) = returns the position of ( enclosing x-th (

= position of the parent of x in B

firstChild(x) =parent(x) = Rank ( (enclose(x))

y = pos(x)+1if B[y] == )

return -1 // is a leaf

return Rank ( (y)else

sibling(x) = Rank ( (findClose(x)+1) (if any)subtreeSize(x) =

(findClose(x) - pos(x) + 1) / 2depth(x) =

Rank ( (pos(x)) - Rank ) (pos(x))

They can be implemented in O(1) time.

Page 393: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (enclose(x) = returns the position of ( enclosing x-th (

= position of the parent of x in B

firstChild(x) =parent(x) = Rank ( (enclose(x))

y = pos(x)+1if B[y] == )

return -1 // is a leaf

return Rank ( (y)else

sibling(x) = Rank ( (findClose(x)+1) (if any)subtreeSize(x) =

(findClose(x) - pos(x) + 1) / 2depth(x) =

Rank ( (pos(x)) - Rank ) (pos(x))

They can be implemented in O(1) time.

All these operations in O(1) time!

Page 394: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (enclose(x) = returns the position of ( enclosing x-th (

= position of the parent of x in B

firstChild(x) =parent(x) = Rank ( (enclose(x))

y = pos(x)+1if B[y] == )

return -1 // is a leaf

return Rank ( (y)else

sibling(x) = Rank ( (findClose(x)+1) (if any)subtreeSize(x) =

(findClose(x) - pos(x) + 1) / 2

degree(x) = ?

depth(x) =Rank ( (pos(x)) - Rank ) (pos(x))

They can be implemented in O(1) time.

Page 395: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (2)

B ( ( ) ( ( ) ( ) ) ( ( ( ) ( ) ) ( ( ) ( ) ) ) )

[BP - Balanced parenthesis]

1

32 6

7 104 5

8 9 11 12

1 2 3 4 5 6 7 8 9 10 11 12 12 34 5 678 9 1011 12

pos(x) = Select ( (x)findClose(x) = returns the position of ) matching x-th (enclose(x) = returns the position of ( enclosing x-th (

= position of the parent of x in B

firstChild(x) =parent(x) = Rank ( (enclose(x))

y = pos(x)+1if B[y] == )

return -1 // is a leaf

return Rank ( (y)else

sibling(x) = Rank ( (findClose(x)+1) (if any)subtreeSize(x) =

(findClose(x) - pos(x) + 1) / 2

degree(x) = ?

Quite inefficient! Solved by repeatedly calling sibling to scan x’s children.

depth(x) =Rank ( (pos(x)) - Rank ) (pos(x))

They can be implemented in O(1) time.

Page 396: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (3)[DFUDS - Depth First Unary Degree Sequence

1

32 6

7 104 5

8 9 11 12

Page 397: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (3)[DFUDS - Depth First Unary Degree Sequence

1

32 6

7 104 5

8 9 11 12

Page 398: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (3)[DFUDS - Depth First Unary Degree Sequence

1

32 6

7 104 5

8 9 11 12

Page 399: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (3)[DFUDS - Depth First Unary Degree Sequence

1

32 6

7 104 5

8 9 11 12

Page 400: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (3)

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

[DFUDS - Depth First Unary Degree Sequence

1

32 6

7 104 5

8 9 11 12

1 2 3 546 107 98 11 1291 32 4 875 6 10 11 12

Page 401: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (3)

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

[DFUDS - Depth First Unary Degree Sequence

1

32 6

7 104 5

8 9 11 12

1 2 3 546 107 98 11 1291 32 4 875 6 10 11 12

subtree of 6

Page 402: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (3)

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

[DFUDS - Depth First Unary Degree Sequence

1

32 6

7 104 5

8 9 11 12

1 2 3 546 107 98 11 1291 32 4 875 6 10 11 12

children of 1

Page 403: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (3)

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

[DFUDS - Depth First Unary Degree Sequence

1

32 6

7 104 5

8 9 11 12

1 2 3 546 107 98 11 1291 32 4 875 6 10 11 12

pos(6)

pos(x) = Select ) (x) // closing )

id(pos) = Rank ) (pos)

Page 404: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (3)

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

[DFUDS - Depth First Unary Degree Sequence

1

32 6

7 104 5

8 9 11 12

1 2 3 546 107 98 11 1291 32 4 875 6 10 11 12

degree(x) =

pos(x) = Select ) (x) // closing )

id(pos) = Rank ) (pos)

⇋children of 6

Page 405: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (3)

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

[DFUDS - Depth First Unary Degree Sequence

1

32 6

7 104 5

8 9 11 12

1 2 3 546 107 98 11 1291 32 4 875 6 10 11 12

degree(x) = Select ) (Rank ) (pos(x))) - x

pos(x) = Select ) (x) // closing )

id(pos) = Rank ) (pos)

⇋children of 6

Page 406: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Succinct representation of trees (3)

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

[DFUDS - Depth First Unary Degree Sequence

1

32 6

7 104 5

8 9 11 12

1 2 3 546 107 98 11 1291 32 4 875 6 10 11 12

child(x,i), parent(x), subtreeSize(x), …

degree(x) = Select ) (Rank ) (pos(x))) - x

pos(x) = Select ) (x) // closing )

id(pos) = Rank ) (pos)

All these operations in O(1) time!

Page 407: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

Page 408: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

* a b cac ba cb baS

Page 409: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

* a b cac ba cb baS

Page 410: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

* a b cac ba cb baS

Page 411: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

* a b cac ba cb baS

Page 412: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

c?* a b cac ba cb baS

Page 413: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

* a b cac ba cb baS

Page 414: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

* a b cac ba cb baS

Page 415: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

b?* a b cac ba cb baS

Page 416: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

* a b cac ba cb baS

Page 417: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

* a b cac ba cb baS

Page 418: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

a?* a b cac ba cb baS

Page 419: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

* a b cac ba cb baS

Page 420: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

* a b cac ba cb baS

Search P in O(|P|) time!

Page 421: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

* a b cac ba cb baS

Search P in O(|P|) time!

* 2 1 221 11 11 22L

Page 422: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

* a b cac ba cb baS

Search P in O(|P|) time!

* 2 1 221 11 11 22LElias-Fano representation: n log(m/n) + O(n) bits and

O(1) time access.

Page 423: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

* a b cac ba cb baS

Search P in O(|P|) time!

S, L and B use m log σ + n log (m/n)+ O(n) bits

* 2 1 221 11 11 22L

Page 424: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

Patricia trie with DFUDS

B ( ( ( ( ) ) ( ( ) ) ) ( ( ) ( ( ) ) ) ( ( ) ) )

1

32 6

7 104 5

8 9 11 12

1 2 3 546 87 910 11 1291 32 4 875 6 10 11 12

a,2 b,1

a,2 c,2

c,1

a,1 b,1

b,2a,2b,1 c,1

P = cba

* a b cac ba cb baS

Search P in O(|P|) time!

S, L and B use m log σ + n log (m/n)+ O(n) bits

* 2 1 221 11 11 22L

Project: implement a compact TST

Page 425: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m total length of strings in D

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

Summary

Find the node “prefixed” by P O(|P|) time

Compute the top-k strings O(k log k) time O(n) bits

3 months query log at Yahoo!

≈600 million of distinct (and clean) queries

Trie requires ≈50 Gbytes!

We will see how to reduce to ≈5 Gbytes!

Page 426: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m total length of strings in D

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

Summary

Find the node “prefixed” by P O(|P|) time m log σ + n log (m/n) + O(n) bits

Compute the top-k strings O(k log k) time O(n) bits

3 months query log at Yahoo!

≈600 million of distinct (and clean) queries

Trie requires ≈50 Gbytes!

We will see how to reduce to ≈5 Gbytes!

Page 427: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m total length of strings in D

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

Summary

Find the node “prefixed” by P O(|P|) time m log σ + n log (m/n) + O(n) bits

Compute the top-k strings O(k log k) time O(n) bits

3 months query log at Yahoo!

≈600 million of distinct (and clean) queries

Trie requires ≈50 Gbytes!

We will see how to reduce to ≈5 Gbytes!

to be compared with O(m log σ + n log m) bits

Page 428: Tries and Succinct Data Structures - Persone - …pages.di.unipi.it/.../uploads/sites/7/2015/09/Lez3-4-5.pdfTries and Succinct Data Structures Auto-completion as our target application

D = { ab (7), bab (2), bca (1), cab (4), cac (1), cbac (6), cbba (2) }

n = |D|, m total length of strings in D

7

2 1

4 1 6 2

ab b

ab ca

c

a b

baacb c

Summary

Find the node “prefixed” by P O(|P|) time m log σ + n log (m/n) + O(n) bits

Compute the top-k strings O(k log k) time O(n) bits

3 months query log at Yahoo!

≈600 million of distinct (and clean) queries

Trie requires ≈50 Gbytes!

We will see how to reduce to ≈5 Gbytes!

to be compared with O(m log σ + n log m) bits

(n=) 1 billion of strings of average length 64

m=64*109 symbols and m/n = 64