121
R-way tries ternary search tries character-based operations ROBERT SEDGEWICK | KEVIN WAYNE FOURTH EDITION Algorithms http://algs4.cs.princeton.edu Algorithms R OBERT S EDGEWICK | K EVIN WAYNE 5.2 T RIES

Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

‣ R-way tries

‣ ternary search tries

‣ character-based operations

ROBERT SEDGEWICK | KEVIN WAYNE

F O U R T H E D I T I O N

Algorithms

http://algs4.cs.princeton.edu

Algorithms ROBERT SEDGEWICK | KEVIN WAYNE

5.2 TRIES

Page 2: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Swype

2

Page 3: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

The Parable of Frankie Halfbean

3

Key

Bearman Butter Extrobophile

Dave Chocolate Superpope

Delbert Strawberry Ronald Jenkees

Edith Vanilla My Bloody Valentine

Glaser Cardamom Rx Nightly

James Rocky Road Robots are Supreme

JS Fish The Filthy Reds

Lauren Mint Jon Talabot

Lee Vanilla La(r)va

Lisa Vanilla Blue Peter

Sandra Vanilla Grimes

Swimp Chocolate Sef

Arvind Space Dust Whale Songs

Josh Vanilla Menace to the Mayor

Key

Bearman Butter Extrobophile

Dave Chocolate Superpope

Delbert Strawberry Ronald Jenkees

Edith Vanilla My Bloody Valentine

Glaser Cardamom Rx Nightly

James Rocky Road Robots are Supreme

JS Fish The Filthy RedsLauren Mint Jon Talabot

Lee Vanilla La(r)va

Lisa Vanilla Blue Peter

Sandra Vanilla Grimes

Swimp Chocolate Sef

Page 4: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

The Parable of Frankie Halfbean

4

Key

Bearman Butter Extrobophile

Dave Chocolate Superpope

Delbert Strawberry Ronald Jenkees

Edith Vanilla My Bloody Valentine

Glaser Cardamom Rx Nightly

James Rocky Road Robots are Supreme

JS Fish The Filthy Reds

Lauren Mint Jon Talabot

Lee Vanilla La(r)va

Lisa Vanilla Blue Peter

Sandra Vanilla Grimes

Swimp Chocolate Sef

Key

Bearman Butter Extrobophile

Dave Chocolate Superpope

Delbert Strawberry Ronald Jenkees

Edith Vanilla My Bloody Valentine

Glaser Cardamom Rx Nightly

James Rocky Road Robots are Supreme

JS Fish The Filthy RedsLauren Mint Jon Talabot

Lee Vanilla La(r)va

Lisa Vanilla Blue Peter

Sandra Vanilla Grimes

Swimp Chocolate Sef

Arvind Space Dust Whale Songs

Josh Vanilla Menace to the Mayor

Page 5: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

The Parable of Frankie Halfbean

5

Key

Bearman Butter Extrobophile

Dave Chocolate Superpope

Delbert Strawberry Ronald Jenkees

Edith Vanilla My Bloody Valentine

Glaser Cardamom Rx Nightly

James Rocky Road Robots are Supreme

JS Fish The Filthy Reds

Lauren Mint Jon Talabot

Lee Vanilla La(r)va

Lisa Vanilla Blue Peter

Sandra Vanilla Grimes

Swimp Chocolate Sef

Key

Arvind Space Dust Whale Songs

Bearman Butter Extrobophile

Dave Chocolate Superpope

Delbert Strawberry Ronald Jenkees

Edith Vanilla My Bloody Valentine

Glaser Cardamom Rx Nightly

James Rocky Road Robots are Supreme

Josh Vanilla Menace to the Mayor

JS Fish The Filthy RedsLauren Mint Jon Talabot

Lee Vanilla La(r)va

Lisa Vanilla Blue Peter

Sandra Vanilla Grimes

Swimp Chocolate Sef

String Symbol Tables.

・Fast basic operations.

– Search.

– Insert.

– Delete.

・Support advanced operations.

– Ordered ST operations.

– String operations.

Page 6: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Summary of the performance of symbol-table implementations

Order of growth of the frequency of operations.

Q. Can we do better?

A. Yes, if we can avoid examining the entire key, as with string sorting.

6

implementation

typical caseordered operations

implementation

search insert deleteoperations on keys

red-black BST log N log N log N yes compareTo()

hash table 1 † 1 † 1 † noequals()hashCode()

† under uniform hashing assumption

Page 7: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Is hog in the tree?

7

SHAI HULUD

MAIL

MAID

MAIAS

MAIA

EBOOKS

ARR HORSE

HORPE

Given N long random strings from an R character alphabet stored in a LLRB

and a search key X. Give an order of growth in terms of N for the following:

・What is the maximum number of string compares that you must perform

to see if X is inside the tree?

・At the root, how many characters of X must we compare on average to

determine if X is at the root?

・Extra: If the search makes it to a leaf node, how many characters of X

must we compare on average to determine if X is at the leaf?

HOG

X

Page 8: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Given N long random strings from an R character alphabet stored in a LLRB

and a search key X. Give an order of growth in terms of N for the following:

・Max number of string compares: lg N.

・Average number of character compares at root:

– Same as comparing two totally random strings.

– Overall: Constant in N. 8

2 lg N

SHAI HULUD

MAIL

MAID

MAIAS

MAIA

EBOOKS

ARR HORSE

HORPE

R/(R-1)

HOG

Is hog in the tree?

X

– Chance of zero character match: 1-1/R

– Chance of one character match: 1/R*(1-1/R), ...

– Calculate expected value for number of characters: R/(R-1)

Page 9: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Given N long random strings from an R character alphabet stored in a LLRB

and a search key X. Give an order of growth in terms of N for the following:

・Max number of string compares: lg N.

・Average number of character compares at root:

– Same as comparing two totally random strings.

– Overall: Constant in N. 9

lg N

SHAI HULUD

MAIL

MAID

MAIAS

MAIA

EBOOKS

ARR HORSE

HORPE

1

HOG

Is hog in the tree?

X

Page 10: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Given N long random strings from an R character alphabet stored in a LLRB

and a search key X. Give an order of growth in terms of N for the following:

・Average number of character compares at leaf:

– Observation: Out of N strings, the leaf string is the closest string to X.

– Expected number of leaves that match in first character: N/R.

– Expected number of leaves that match in the first k characters: N/Rk

– How many characters before only one node matches? When k = logR N.

– Order of growth: lg N

10

SHAI HULUD

MAIL

MAID

MAIAS

MAIA

EBOOKS

ARR HORSE

HORPE

HOG

X

Is hog in the tree?

logR N

lg N

1

Page 11: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Given N long random strings from an R character alphabet stored in a LLRB

and a search key X. Give an order of growth in terms of N for the following:

・Average number of character compares at leaf:

– Observation: Out of N strings, the leaf string is the closest string to X.

– Expected number of leaves that match in first character: N/R.

– Expected number of leaves that match in the first k characters: N/Rk

– How many characters before only one node matches? When k = logR N.

– Order of growth: lg N

11

SHAI HULUD

MAIL

MAID

MAIAS

MAIA

EBOOKS

ARR HORSE

HORPE

HOG

X

Is hog in the tree?

lg N

1

lg N

Page 12: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Given N long random strings from an R character alphabet stored in a LLRB

and a search key X, give an order of growth in terms of N for the following:

・Max number of string compares: lg N.

・Average number of character compares at root: Constant (in N).

・Average number of character compares at leaf: lg N.

12

lg N

SHAI HULUD

MAIL

MAID

MAIAS

MAIA

EBOOKS

ARR HORSE

HORPE

1

HOG

X

Is hog in the tree? (order of growth)

lg N

・Total number of compares on a miss: lg2 N

Page 13: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

LLRB String Symbol Table

13

character accesses (typical case)character accesses (typical case)

implementationsearch

hit

search

missinsert

space

(references)

red-black BST L + lg 2 N lg 2 N lg 2 N 4N

lg N

SHAI HULUD

MAIL

MAID

MAIAS

MAIA

EBOOKS

ARR HORSE

HORPE

1

lg N

Page 14: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Assuming no collisions, given N random strings from an R character

alphabet stored in a hash table and a query string of length L, give an order

of growth for:

・The number of characters examined on a search hit

・How many on a search miss

For the linear probing hash table above:

・Which key will result in the most character examinations on a search

hit?14

Search in a Hash Table

MAIA MAIL MAID HORPE ARR HORSE

Vanilla Mint Vanilla Mint Vanilla Vanilla

key hash

MAIA 0

MAIL 2

MAID 2

HORPE 5

ARR 6

HORSE 7HOG

L

Page 15: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Assuming no collisions, given N random strings from an R character

alphabet stored in a hash table and a query string of length L, give an order

of growth for:

・The number of characters examined on a search hit: L

・How many on a search miss:

For the linear probing hash table above:

・Which key will result in the most character examinations on a search

hit?15

Search in a Hash Table

MAIA MAIL MAID HORPE ARR HORSE

Vanilla Mint Vanilla Mint Vanilla Vanilla

key hash

MAIA 0

MAIL 2

MAID 2

HORPE 5

ARR 6

HORSE 7HOG

L

Page 16: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Assuming no collisions, given N random strings from an R character

alphabet stored in a hash table and a query string of length L, give an order

of growth for:

・The number of characters examined on a search hit: L

・How many on a search miss: L

For the linear probing hash table above:

・Which key will result in the most character examinations on a search

hit?16

Search in a Hash Table

MAIA MAIL MAID HORPE ARR HORSE

Vanilla Mint Vanilla Mint Vanilla Vanilla

key hash

MAIA 0

MAIL 2

MAID 2

HORPE 5

ARR 6

HORSE 7HOG

L

Page 17: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Assuming no collisions, given N random strings from an R character

alphabet stored in a hash table and a query string of length L, give an order

of growth for:

・The number of characters examined on a search hit: L

・How many on a search miss: L

For the linear probing hash table above:

・Which key will result in the most character examinations on a search

hit: MAID17

Search in a Hash Table

key hash

MAIA 0

MAIL 2

MAID 2

HORPE 5

ARR 6

HORSE 7

MAIA MAIL MAID HORPE ARR HORSE

Vanilla Mint Vanilla Mint Vanilla Vanilla

HOG

L

Page 18: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

18

String symbol table implementations cost summary

Challenges.

・Efficient performance for string keys.

・Support common string operations (ordered ST and beyond).

Parameters

• N = number of strings

• L = length of string

• R = radix

file size words distinct

moby.txt 1.2 MB 210 K 32 K

actors.txt 82 MB 11.4 M 900 K

character accesses (typical case)character accesses (typical case)character accesses (typical case)character accesses (typical case) dedupdedup

implementationsearch

hit

search

missinsert

space

(references)moby.txt actors.txt

red-black BST L + lg 2 N lg 2 N lg 2 N 4N 1.40 97.4

hashing

(linear probing)L L L 4N to 16N 0.76 40.6

Between 1/8th and 1/2 full

Page 19: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

String symbol table. Symbol table specialized to string keys.

Goal. Faster than hashing, more flexible than BSTs.

19

String symbol table basic API

public class StringST<Value> public class StringST<Value>

StringST()StringST() create an empty symbol table

void put(String key, Value val)put(String key, Value val) put key-value pair into the symbol table

Value get(String key)get(String key) return value paired with given key

void delete(String key)delete(String key) delete key and corresponding value

⋮⋮

Page 20: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

http://algs4.cs.princeton.edu

ROBERT SEDGEWICK | KEVIN WAYNE

Algorithms

‣ R-way tries

‣ ternary search tries

‣ character-based operations

5.2 TRIES

Page 21: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

21

Page 22: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

22

Page 23: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Tries. [from retrieval, but pronounced "try"]

・Store characters in nodes (not keys).

・Each node has R children, one for each possible character.

・For now, we do not draw null links.

23

Tries

e

r

e

a l

l

s

b

y

o

h

e

t

7

50

3

1

6

4

s

l

l

e

h

s

root

link to trie for all keys

that start with slink to trie for all keys

that start with she

value for she in node

corresponding to last

key character

key value

by 4

sea 6

sells 1

she 0

shells 3

shore 7

the 5

for now we do not

draw null links

Page 24: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Follow links corresponding to each character in the key.

・Search hit: node where search ends has a non-null value.

・Search miss: reach null link or node where search ends has null value.

24

Search in a trie

e

r

get("shells")

e

a l

l

s

b

y

o

h

e

t

7

50

3

1

6

4

ss

ll

ll

ee

hh

ss

return value associated

with last key character

(return 3)

3

Page 25: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Follow links corresponding to each character in the key.

・Search hit: node where search ends has a non-null value.

・Search miss: reach null link or node where search ends has null value.

25

Search in a trie

e

r

get("she")

e

a l

l

s

b

y

o

h

e

t

7

50

3

1

6

4

s

l

l

ee

hh

ss

search may terminated

at an intermediate node

(return 0)

0

Page 26: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Follow links corresponding to each character in the key.

・Search hit: node where search ends has a non-null value.

・Search miss: reach null link or node where search ends has null value.

26

Search in a trie

e

r

get("shell")

e

a l

l

s

b

y

o

h

e

t

7

50

3

1

6

4

s

ll

ll

ee

hh

ss

no value associated

with last key character

(return null)

Page 27: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Follow links corresponding to each character in the key.

・Search hit: node where search ends has a non-null value.

・Search miss: reach null link or node where search ends has null value.

27

Search in a trie

e

r

get("shelter")

e

a l

l

s

b

y

o

h

e

t

7

50

3

1

6

4

s

l

ll

ee

hh

ss

no link to t(return null)

Page 28: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Follow links corresponding to each character in the key.

・Encounter a null link: create new node.

・Encounter the last character of the key: set value in that node.

28

Insertion into a trie

e

r

put("shore", 7)

e

a l

l

s

e

l

s

b

y

l

o

h

e

t

hh

ss

7

50

3

1

6

4

Page 29: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Trie construction demo

trie

Page 30: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

e

Trie construction demo

put("she", 0)

h

s

0

value is in node

corresponding to

last character

key is sequence

of characters from

root to value

Page 31: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

e

Trie construction demo

trie

h

s

0

Page 32: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

h

e

Trie construction demo

trie

s

0

Page 33: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

h

e

Trie construction demo

put("sells", 1)

s

l

l

e

ss

1

0

Page 34: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

h

e

Trie construction demo

trie

l

l

s

s

e

1

0

Page 35: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

h

e

Trie construction demo

trie

l

l

s

s

e

0

1

Page 36: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

h

ea

Trie construction demo

put("sea", 2)

l

l

s

ee

ss

2

1

0

Page 37: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

h

ea

Trie construction demo

trie

l

l

s

s

e

1

2 0

Page 38: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

a

Trie construction demo

put("shells", 3)

l

l

s

e

s

l

l

ee

hh

ss

3

1

2 0

Page 39: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

a

Trie construction demo

trie

l

l

s

l

s

s

l

he

e

3

1

2 0

Page 40: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

y

b

a

Trie construction demo

put("by", 4)

l

l

s

l

s

s

l

he

e

4

3

1

2 0

Page 41: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

b

y

a

Trie construction demo

trie

l

l

s

l

s

s

l

he

e

3

1

2

4

0

Page 42: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

b

y

a

Trie construction demo

put("the", 5)

l

l

s

l

s

s

l

he

e e

h

t

5

3

1

2

4

0

Page 43: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

a

Trie construction demo

trie

e

l

l

s

l

s

b

y

s

l

h h

t

e e 5

3

1

2

4

0

Page 44: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

2a

Trie construction demo

put("sea", 6)

l

l

s

e

l

s

b

y

l

h h

e

t

a

ee

ss

6

overwrite

old value with

new value

5

3

1

4

0

Page 45: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Trie construction demo

trie

e

a l

l

s

e

l

s

b

y

s

l

h h

e

t

5

3

1

6

4

0

Page 46: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Trie construction demo

trie

e

a l

l

s

e

l

s

b

y

s

l

h h

e

t

3

1

6

4

0 5

Page 47: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

e

r

Trie construction demo

put("shore", 7)

e

a l

l

s

e

l

s

b

y

l

o

h

e

t

hh

ss

7

5

3

1

6

4

0

Page 48: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Trie construction demo

trie

e

a l

l

s

e

l

s

b

y

s

l

o

r

e

h

t

h

e 5

7

3

1

6

4

0

Page 49: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Trie construction demo

trie

e

a l

l

s

e

l

s

b

y

s

l

o

r

e

h

t

h

e 5

7

3

1

6

4

0

Page 50: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

To delete a key-value pair:

・Find the node corresponding to key and set value to null.

・If node has null value and all null links, remove that node (and recur).

3ss50

Deletion in an R-way trie

e

r

e

a l

l

s

b

y

o

h

e

t

7

50

1

6

4

ll

ll

ee

hh

ss

delete("shells")

set value to null

Page 51: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

To delete a key-value pair:

・Find the node corresponding to key and set value to null.

・If node has null value and all null links, remove that node (and recur).

51

Deletion in an R-way trie

e

r

delete("shells")

e

a l

l

s

b

y

o

h

e

t

7

50

1

6

4

s

s

l

l

e

h

s

null value and links

(delete node)

Page 52: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

52

Trie representation: Java implementation

Node. A value, plus references to R nodes.

private static class Node{ private Object value; private Node[] next = new Node[R];}

Trie representation

each node hasan array of links

and a value

characters are implicitlydefined by link index

s

h

e 0

e

l

l

s 1

a

s

h

e

e

l

l

s

a0

1

22

neither keys nor

characters are

explicitly stored

use Object instead of Value since

no generic array creation in Java

Page 53: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Trie memory usage

e

a l

l

s

e

l

s

b

y

s

l

o

r

e

h

t

h

e 5

7

3

1

6

4

0

How much memory does the trie above use? There are 7 keys, 19 characters, and the alphabet is 256 characters.A. > 100 bytes [150927] C. > 10000 bytes [150949]B. > 1000 bytes [150933]

pollEv.com/jhug text to 37607

key value

by 4

sea 6

sells 1

she 0

shells 3

shore 7

the 5

Page 54: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Trie memory usage

e

a l

l

s

e

l

s

b

y

s

l

o

r

e

h

t

h

e 5

7

3

1

6

4

0

How much memory does the trie above use? C. > 10000 bytes

Every node has 256 links, each link is 8 bytes. With 19 nodes, this comes to at least 19*8*256 which is more than 10 kilobytes.

pollEv.com/jhug text to 37607

key value

by 4

sea 6

sells 1

she 0

shells 3

shore 7

the 5

Page 55: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Given a search string of length L, and a trie of N random nodes on an R character

alphabet:

・What is the average number of nodes examined on a search hit?

・What is the average number of nodes examined on a search miss?55

Trie timing analysis

e

a l

l

s

e

l

s

b

y

s

l

o

r

e

h

t

h

e 5

7

3

1

6

4

0

key value

by 4

sea 6

sells 1

she 0

shells 3

shore 7

the 5

Page 56: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Given a search string of length L, and a trie of N random nodes on an R character

alphabet:

・What is the average number of nodes examined on a search hit?

– Always L, so average is L.

・What is the average number of nodes examined on a search miss?

56

Trie timing analysis

e

a l

l

e

l

b

y

s

o

r

h

t

h

e 56

4

0

Page 57: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Given a search string of length L, and a trie of N random nodes on an R character

alphabet:

・What is the average number of nodes examined on a search hit?

– Always L, so average is L.

・What is the average number of nodes examined on a search miss? logR N

– Every level deep reduces the search space by a factor of R.

– With random keys, tree will be of depth logR N.57

Trie timing analysis

e

a l

l

e

l

b

y

s

o

r

h

t

h

e 56

4

0

Page 58: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

58

String symbol table implementations cost summary

R-way trie.

・Method of choice for small R.

・Too much memory for large R.

Challenge. Use less memory, e.g., 65,536-way trie for Unicode!

character accesses (typical case)character accesses (typical case)character accesses (typical case)character accesses (typical case) dedupdedup

implementationsearch

hit

search

missinsert

space

(references)moby.txt actors.txt

red-black BST L + lg 2 N lg 2 N lg 2 N 4N 1.40 97.4

hashing

(linear probing)L L L 4N to 16N 0.76 40.6

R-way trie L lg N L (R+1) N 1.12out of

memory

Page 59: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

sɹǝʇʇǝן ǝɥʇ ןןɐ ɟosuoısɹǝʌ uʍop-ǝpısdn

sɐɥ ǝpoɔıun

59

Page 60: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

60

Page 61: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

61

Page 62: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

62

Page 63: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

63

Page 64: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

64

Page 65: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

public class TrieST<Value>{ private static final int R = 256; private Node root = new Node();

private static class Node { /* see previous slide */ } public void put(String key, Value val) { root = put(root, key, val, 0); }

private Node put(Node x, String key, Value val, int d) { if (x == null) x = new Node(); if (d == key.length()) { x.val = val; return x; } char c = key.charAt(d); x.next[c] = put(x.next[c], key, val, d+1); return x; }

65

R-way trie

extended ASCII

Page 66: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

public boolean contains(String key) { return get(key) != null; } public Value get(String key) { Node x = get(root, key, 0); if (x == null) return null; return (Value) x.val; }

private Node get(Node x, String key, int d) {

}

}

66

R-way trie

Page 67: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

public boolean contains(String key) { return get(key) != null; } public Value get(String key) { Node x = get(root, key, 0); if (x == null) return null; return (Value) x.val; }

private Node get(Node x, String key, int d) { if (x == null) return null; if (d == key.length()) return x; char c = key.charAt(d); return get(x.next[c], key, d+1); }

}67

R-way trie

Page 68: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

http://algs4.cs.princeton.edu

ROBERT SEDGEWICK | KEVIN WAYNE

Algorithms

‣ R-way tries

‣ ternary search tries

‣ character-based operations

5.2 TRIES

Page 69: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

69

Ternary search tries

・Store characters and values in nodes (not keys).

・Each node has 3 children: smaller (left), equal (middle), larger (right).

Jon L. Bentley* Robert Sedgewick#

Abstract We present theoretical algorithms for sorting and

searching multikey data, and derive from them practical C implementations for applications in which keys are charac- ter strings. The sorting algorithm blends Quicksort and radix sort; it is competitive with the best known C sort codes. The searching algorithm blends tries and binary search trees; it is faster than hashing and other commonly used search methods. The basic ideas behind the algo- rithms date back at least to the 1960s but their practical utility has been overlooked. We also present extensions to more complex string problems, such as partial-match searching.

1. Introduction Section 2 briefly reviews Hoare’s [9] Quicksort and

binary search trees. We emphasize a well-known isomor- phism relating the two, and summarize other basic facts.

The multikey algorithms and data structures are pre- sented in Section 3. Multikey Quicksort orders a set of II vectors with k components each. Like regular Quicksort, it partitions its input into sets less than and greater than a given value; like radix sort, it moves on to the next field once the current input is known to be equal in the given field. A node in a ternary search tree represents a subset of vectors with a partitioning value and three pointers: one to lesser elements and one to greater elements (as in a binary search tree) and one to equal elements, which are then pro- cessed on later fields (as in tries). Many of the structures and analyses have appeared in previous work, but typically as complex theoretical constructions, far removed from practical applications. Our simple framework opens the door for later implementations.

The algorithms are analyzed in Section 4. Many of the analyses are simple derivations of old results.

Section 5 describes efficient C programs derived from the algorithms. The first program is a sorting algorithm

Fast Algorithms for Sorting and Searching Strings

that is competitive with the most efficient string sorting programs known. The second program is a symbol table implementation that is faster than hashing, which is com- monly regarded as the fastest symbol table implementa- tion. The symbol table implementation is much more space-efficient than multiway trees, and supports more advanced searches.

In many application programs, sorts use a Quicksort implementation based on an abstract compare operation, and searches use hashing or binary search trees. These do not take advantage of the properties of string keys, which are widely used in practice. Our algorithms provide a nat- ural and elegant way to adapt classical algorithms to this important class of applications.

Section 6 turns to more difficult string-searching prob- lems. Partial-match queries allow “don’t care” characters (the pattern “so.a”, for instance, matches soda and sofa). The primary result in this section is a ternary search tree implementation of Rivest’s partial-match searching algo- rithm, and experiments on its performance. “Near neigh- bor” queries locate all words within a given Hamming dis- tance of a query word (for instance, code is distance 2 from soda). We give a new algorithm for near neighbor searching in strings, present a simple C implementation, and describe experiments on its efficiency.

Conclusions are offered in Section 7.

2. Background Quicksort is a textbook divide-and-conquer algorithm.

To sort an array, choose a partitioning element, permute the elements such that lesser elements are on one side and greater elements are on the other, and then recursively sort the two subarrays. But what happens to elements equal to the partitioning value? Hoare’s partitioning method is binary: it places lesser elements on the left and greater ele- ments on the right, but equal elements may appear on either side.

* Bell Labs, Lucent Technologies, 700 Mountam Avenue, Murray Hill. NJ 07974; [email protected].

# Princeton University. Princeron. NJ. 08514: [email protected].

Algorithm designers have long recognized the desir- irbility and difficulty of a ternary partitioning method. Sedgewick [22] observes on page 244: “Ideally, we would llke to get all [equal keys1 into position in the file, with all

360

Page 70: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

・Store characters and values in nodes (not keys).

・Each node has 3 children: smaller (left), equal (middle), larger (right).

70

Ternary search tries

TST representation of a trie

each node hasthree links

link to TST for all keysthat start with s

link to TST for all keysthat start witha letter before s

t

h

e 8

a

r

e 12

s

h u

e 10

e

l

l

s 11

l

l

s 15

r 0

e

l

y 13

o

7

r

e

b

y 4

a 14

t

h

e 8

a

r

e 12

s

hu

e 10

e

l

l

s 11

l

l

s 15

r 0

e

l

y 13

o

7

r

e

b

y 4

a14

Page 71: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

71

Search hit in a TST

o

r

e 7

t

h

e 5

b

y 4

a

get("sea")

e

h

s

e

l

l

s 1

6

l

s

l

3

0

a

l

e

h

s

return value associated

with last key character

6

Note: Key is “sea”, not “shela”!

Page 72: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

72

Search miss in a TST

o

r

e 7

t

h

e 5

b

y 4

a

get("shelter")

e

h

s

e

l

l

s 1

6

l

s

l

3

0e

h

s

l

l

no link to t

(return null)

Page 73: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Ternary search trie construction demo

73

ternary search trie

Page 74: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Ternary search trie construction demo

74

put("she", 0)

h

s

value is in node

corresponding to

last character

key is sequence

of characters from

root to value

using middle links

e 0

Page 75: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Ternary search trie construction demo

75

put("she", 0)

e

h

s

0

Page 76: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

l

e

Ternary search trie construction demo

76

put("sells", 1)

e

h

s

s 1

0

l

h

s

Page 77: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Ternary search trie construction demo

77

ternary search trie

e

h

s

e

l

l

s 1

0

Page 78: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

a2

ll

l

s 1

ee

Ternary search trie construction demo

78

put("sea", 2)

e

h

s

0

h

s

Page 79: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

a

Ternary search trie construction demo

79

ternary search trie

e

h

s

e

l

l

s 1

2

0

Page 80: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

s 3

l

a

Ternary search trie construction demo

80

put("shells", 3)

e

h

s

e

l

l

s 1

2

l

0e

h

s

Page 81: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

a

Ternary search trie construction demo

81

ternary search trie

e

h

s

e

l

l

s 1

2

l

s

l

3

0

Page 82: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

b

y 4

a

Ternary search trie construction demo

82

put("by", 4)

e

h

s

e

l

l

s 1

2

l

s

l

3

0

s

Page 83: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

b

y 4

a

Ternary search trie construction demo

83

ternary search trie

e

h

s

e

l

l

s 1

2

l

s

l

3

0

Page 84: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

e 5

h

tb

y 4

a

Ternary search trie construction demo

84

put("the", 5)

e

h

s

e

l

l

s 1

2

l

s

l

3

0

s

Page 85: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

t

h

e 5

b

y 4

a

Ternary search trie construction demo

85

ternary search trie

e

h

s

e

l

l

s 1

2

l

s

l

3

0

Page 86: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

a

l

l

s 1

26

overwrite

old value with

new value

a

l

ee

t

h

e 5

b

y 4

Ternary search trie construction demo

86

put("sea", 6)

e

h

s

l

s

l

3

0

h

s

Page 87: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

t

h

e 5

b

y 4

a

Ternary search trie construction demo

87

ternary search trie

e

h

s

e

l

l

s 1

6

l

s

l

3

0

Page 88: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

e 7

r

o

t

h

e 5

b

y 4

a

Ternary search trie construction demo

88

put("shore", 7)

e

h

s

e

l

l

s 1

6

l

s

l

3

0e

h

s

Page 89: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

o

r

e 7

t

h

e 5

b

y 4

a

Ternary search trie construction demo

89

ternary search trie

e

h

s

e

l

l

s 1

6

l

s

l

3

0

Page 90: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Ternary search trie construction demo

90

ternary search trie

e

a

l

l

s

e

l

s

b

y h

l

o

r

e

t

h

e

s

5

7

3

1

6

4

0

Page 91: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Ternary search trie construction demo

91

ternary search trie

e

a

l

l

s

e

l

s

b

y h

l

o

r

e

t

h

e

s

5

7

3

1

6

4

0

Page 92: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Follow links corresponding to each character in the key.

・If less, take left link; if greater, take right link.

・If equal, take the middle link and move to the next key character.

Search hit. Node where search ends has a non-null value.

Search miss. Reach a null link or node where search ends has null value.

92

Search in a TST

TST search example

return valueassociated with

last key character

match: take middle link,move to next char

mismatch: take left or right link, do not move to next char

t

h

e 8

a

r

e 12

s

hu

e 10

e

l

l

s 11

l

l

s 15

r

e

l

y 13

o

7

r

e

b

y 4

a14

get("sea")

Page 93: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

A TST node is five fields:

・A value.

・A character c.

・A reference to a left TST.

・A reference to a middle TST.

・A reference to a right TST.

93

TST representation in Java

private class Node{ private Value val; private char c; private Node left, mid, right;}

Trie node representations

s

e h u

link for keysthat start with s

link for keysthat start with su

hue

standard array of links (R = 26) ternary search tree (TST)

s

Page 94: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Ternary search trie construction demo

94

ternary search trie

e

a

l

l

s

e

l

s

b

y

h

l

o

r

e

t

h

e

s

5

73

1

6

4 0

How much memory does the trie above use (there are 19 nodes)?A. > 200 bytes [153565] C. > 20000 bytes [153568]B. > 2000 bytes [153567]

pollEv.com/jhug text to 37607

private class Node{ private Value val; private char c; private Node left, mid, right;}

Page 95: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Ternary search trie construction demo

95

ternary search trie

e

a

l

l

s

e

l

s

b

y

h

l

o

r

e

t

h

e

s

5

73

1

6

4 0

How much memory does the trie above use (there are 19 nodes)?A. >200 bytes

At 64 bytes per node, we fall short of 2000 bytes per trie.

pollEv.com/jhug text to 37607

private class Node{ private Value val; private char c; private Node left, mid, right;}

Page 96: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

26-way trie. 26 null links in each leaf.

TST. 3 null links in each leaf.

96

26-way trie vs. TST

26-way trie (1035 null links, not shown)

TST (155 null links)

nowfortipilkdimtagjotsobnobskyhutacebetmeneggfewjayowljoyrapgigweewascabwadcawcuefeetapagotarjamdugand

Page 97: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

97

TST: Java implementation

public class TST<Value>{ private Node root;

private class Node { /* see previous slide */ }

public void put(String key, Value val) { root = put(root, key, val, 0); }

private Node put(Node x, String key, Value val, int d) { char c = key.charAt(d); if (x == null) { x = new Node(); x.c = c; } if (c < x.c) x.left = put(x.left, key, val, d); else if (c > x.c) x.right = put(x.right, key, val, d); else if (d < key.length() - 1) x.mid = put(x.mid, key, val, d+1); else x.val = val; return x; }

Page 98: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

98

String symbol table implementation cost summary

Remark. Can build balanced TSTs via rotations to achieve L + log Nworst-case guarantees.

Bottom line. TST is as fast as hashing (for string keys), space efficient.

character accesses (typical case)character accesses (typical case)character accesses (typical case)character accesses (typical case) dedupdedup

implementationsearch

hit

search

missinsert

space

(references)moby.txt actors.txt

red-black BST L + lg 2 N lg 2 N lg 2 N 4 N 1.40 97.4

hashing

(linear probing)L L L 4 N to 16 N 0.76 40.6

R-way trie L lg N L (R + 1) N 1.12out of

memory

TST L + lg N lg N L + lg N 4 N 0.72 38.7

Page 99: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

99

T9 texting

Goal. Type text messages on a phone keypad.

Multi-tap input. Enter a letter by repeatedly pressing a key.

Ex. hello: 4 4 3 3 5 5 5 5 5 5 6 6 6

T9 text input.

・Find all words that correspond to given sequence of numbers.

・Press 0 to see all completion options.

Ex. hello: 4 3 5 5 6

Q. How to implement? www.t9.com

"a much faster and more fun way to enter text"

Page 100: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Armstrong and Miller

100

Page 101: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Trie

Design a string symbol table for handling t9 texting.

・What are the keys?

・What are the values?

・Which implementation is best?

Bonus: Can you think of a case where we’d want a red-black BST or hash table?101

character accesses (typical case)character accesses (typical case)character accesses (typical case)character accesses (typical case)

implementationsearch

hit

search

missinsert

space

(references)

red-black BST L + lg 2 N lg 2 N lg 2 N 4 N

hashing

(linear probing)L L L 4 N to 16 N

R-way trie L lg N L (R + 1) N

TST L + lg N lg N L + lg N 4 N

www.t9.com

Page 102: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Trie

Design a string symbol table for handling t9 texting.

・What are the keys?

・What are the values?

・Which implementation is best?

Bonus: Can you think of a case where we’d want a red-black BST or hash table?102

character accesses (typical case)character accesses (typical case)character accesses (typical case)character accesses (typical case)

implementationsearch

hit

search

missinsert

space

(references)

red-black BST L + lg 2 N lg 2 N lg 2 N 4 N

hashing

(linear probing)L L L 4 N to 16 N

R-way trie L lg N L (R + 1) N

TST L + lg N lg N L + lg N 4 N

www.t9.com

Page 103: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Trie

Design a string symbol table for handling t9 texting.

・What are the keys? Strings from the alphabet {2, ..., 9}

・What are the values? Sets of english words, e.g. {“SHIV”, “PIGT”}

・Which implementation is best?

– R-way trie is probably best.

Bonus: Can you think of a case where we’d want a red-black BST or hash table?103

character accesses (typical case)character accesses (typical case)character accesses (typical case)character accesses (typical case)

implementationsearch

hit

search

missinsert

space

(references)

red-black BST L + lg 2 N lg 2 N lg 2 N 4 N

hashing

(linear probing)L L L 4 N to 16 N

R-way trie L lg N L (R + 1) N

TST L + lg N lg N L + lg N 4 N

www.t9.com

Page 104: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

104

A letter to t9.com

To: [email protected]: Tue, 25 Oct 2005 14:27:21 -0400 (EDT)

Dear T9 texting folks,

I enjoyed learning about the T9 text systemfrom your webpage, and used it as an examplein my data structures and algorithms class.However, one of my students noticed a bugin your phone keypad

http://www.t9.com/images/how.gif

Somehow, it is missing the letter 's'. (!)

Just wanted to bring this information toyour attention and thank you for your website.

Regards,Kevin

where the @#$% is the 's' ???

Page 105: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

105

A world without 's' ?

To: "'Kevin Wayne'" <[email protected]>Date: Tue, 25 Oct 2005 12:44:42 -0700

Thank you Kevin.

I am glad that you find T9 o valuable for yourcla. I had not noticed thi before. Thank forwriting in and letting u know.

Take care,

Brooke nyder OEM Dev upport AOL/Tegic Communication 1000 Dexter Ave N. uite 300 eattle, WA 98109

ALL INFORMATION CONTAINED IN THIS EMAIL IS CONSIDEREDCONFIDENTIAL AND PROPERTY OF AOL/TEGIC COMMUNICATIONS

Page 106: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

106

TST with R2 branching at root

Hybrid of R-way trie and TST.

・Do R 2-way branching at root.

・Each of R 2 root nodes points to a TST.

Q. What about one- and two-letter words?

TST TST TST TSTTST

array of 262 roots

aa ab ac zy zz

Page 107: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

107

String symbol table implementation cost summary

Bottom line. Faster than hashing for our benchmark client.

character accesses (typical case)character accesses (typical case)character accesses (typical case)character accesses (typical case) dedupdedup

implementationsearch

hit

search

missinsert

space

(references)moby.txt actors.txt

red-black BST L + lg 2 N lg 2 N lg 2 N 4 N 1.40 97.4

hashing

(linear probing)L L L 4 N to 16 N 0.76 40.6

R-way trie L lg N L (R + 1) N 1.12out of

memory

TST L + lg N lg N L + lg N 4 N 0.72 38.7

TST with R2 L + lg N lg N L + lg N 4 N + R2 0.51 32.7

Page 108: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

108

TST vs. hashing

Hashing.

・Need to examine entire key.

・Search hits and misses cost about the same.

・Performance relies on hash function.

・Does not support ordered symbol table operations.

TSTs.

・Works only for strings (or digital keys).

・Only examines just enough key characters.

・Search miss may involve only a few characters.

・Supports ordered symbol table operations (plus others!).

Bottom line. TSTs are:

・Faster than hashing (especially for search misses).

・More flexible than red-black BSTs. [stay tuned]

Page 109: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

http://algs4.cs.princeton.edu

ROBERT SEDGEWICK | KEVIN WAYNE

Algorithms

‣ R-way tries

‣ ternary search tries

‣ character-based operations

5.2 TRIES

Page 110: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Find all keys in a symbol table starting with a given prefix.

Ex. Autocomplete in a cell phone, search bar, text editor, or shell.

・User types characters one at a time.

・System reports all matching strings.

110

Prefix matches

Page 111: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Character-based operations. The string symbol table API supports several

useful character-based operations.

Prefix match [autocomplete]. Keys with prefix sh: she, shells, and shore.

Wildcard match [crosswords]. Keys that match IR..E: IRATE and IRENE.

Longest prefix [routing]. Key that is the longest prefix of shellsort: shells.

111

String symbol table API

key value

by 4

sea 6

sells 1

she 0

shells 3

shore 7

the 5

Page 112: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

To iterate through all keys in sorted order:

・Do inorder traversal of trie; add keys encountered to a queue.

・Maintain sequence of characters on path from root to node.

112

Warmup: ordered iteration

o

r

e 7

t

h

e 5

s

h

e 0

e

l

l

s 1

l

l

s 3

b

y 4

a 6

bbys

seseaselsell

sellsshshe

shellshells

shoshorshore

tth

the

by

by sea

by sea sells

by sea sells she

by sea sells she shells

by sea sells she shells shore

by sea sells she shells shore the

Collecting the keys in a trie (trace)

key q

keysWithPrefix(""); keys()

Page 113: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

To iterate through all keys in sorted order:

・Do inorder traversal of trie; add keys encountered to a queue.

・Maintain sequence of characters on path from root to node.

・Fill in the blanks below.

113

Ordered iteration: Java implementation

public Iterable<String> keys(){ Queue<String> queue = new Queue<String>(); collect(root, "", queue); return queue;}

private void collect(Node x, String prefix, Queue<String> q){ if (x == null) return; if (x.val != null) ?????????????? for (char c = 0; c < R; c++) collect(??????????????);}

sequence of characters

on path from root to x

private static class Node{ private Object val; private Node[] next = new Node[R];}

Page 114: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

To iterate through all keys in sorted order:

・Do inorder traversal of trie; add keys encountered to a queue.

・Maintain sequence of characters on path from root to node.

・Fill in the blanks below.

114

Ordered iteration: Java implementation

public Iterable<String> keys(){ Queue<String> queue = new Queue<String>(); collect(root, "", queue); return queue;}

private void collect(Node x, String prefix, Queue<String> q){ if (x == null) return; if (x.val != null) ?????????????? for (char c = 0; c < R; c++) collect(??????????????);}

sequence of characters

on path from root to x

private static class Node{ private Object val; private Node[] next = new Node[R];}

Page 115: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

To iterate through all keys in sorted order:

・Do inorder traversal of trie; add keys encountered to a queue.

・Maintain sequence of characters on path from root to node.

・Fill in the blanks below.

115

Ordered iteration: Java implementation

public Iterable<String> keys(){ Queue<String> queue = new Queue<String>(); collect(root, "", queue); return queue;}

private void collect(Node x, String prefix, Queue<String> q){ if (x == null) return; if (x.val != null) q.enqueue(R); for (char c = 0; c < R; c++) collect(x.next[c], prefix + c, q);}

sequence of characters

on path from root to x

Page 116: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Find all keys in a symbol table starting with a given prefix.

116

Prefix matches in an R-way trie

public Iterable<String> keysWithPrefix(String prefix){ Queue<String> queue = new Queue<String>(); Node x = get(root, prefix, 0); collect(x, prefix, queue); return queue;}

root of subtrie for all strings

beginning with given prefix

o

r

e 7

t

h

e 5

s

h

e 0

e

l

l

s 1

l

l

s 3

b

y 4

a 6

find subtrie for allkeys beginning with "sh"

o

r

e 7

t

h

e 5

s

h

e 0

e

l

l

s 1

l

l

s 3

b

y 4

a 6

collect keysin that subtrie

keysWithPrefix("sh");

Pre!x match in a trie

shsheshelshellshells

shoshorshore

she

she shells

she shells shore

key q

o

r

e 7

t

h

e 5

s

h

e 0

e

l

l

s 1

l

l

s 3

b

y 4

a 6

find subtrie for allkeys beginning with "sh"

o

r

e 7

t

h

e 5

s

h

e 0

e

l

l

s 1

l

l

s 3

b

y 4

a 6

collect keysin that subtrie

keysWithPrefix("sh");

Pre!x match in a trie

shsheshelshellshells

shoshorshore

she

she shells

she shells shore

key qkey queue

o

r

e 7

t

h

e 5

s

h

e 0

e

l

l

s 1

l

l

s 3

b

y 4

a 6

find subtrie for allkeys beginning with "sh"

o

r

e 7

t

h

e 5

s

h

e 0

e

l

l

s 1

l

l

s 3

b

y 4

a 6

collect keysin that subtrie

keysWithPrefix("sh");

Pre!x match in a trie

shsheshelshellshells

shoshorshore

she

she shells

she shells shore

key q

Page 117: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Use wildcard . to match any character in alphabet.

117

Wildcard matches

co....er

coalizer

coberger

codifier

cofaster

cofather

cognizer

cohelper

colander

coleader

...

compiler

...

composer

computer

cowkeper

.c...c.

acresce

acroach

acuracy

octarch

science

scranch

scratch

scrauch

screich

scrinch

scritch

scrunch

scudick

scutock

Page 118: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Implicit wildcard for basic autocorrect.

・Walk down all character paths adjacent to typed characters.

118

Wildcard matches

helohelpyelpdeli

Page 119: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

119

Patricia trie

Patricia trie. [Practical Algorithm to Retrieve Information Coded in Alphanumeric]

・Remove one-way branching.

・Each node represents a sequence of characters.

・Implementation: one step beyond this course.

Applications.

・Database search.

・P2P network search.

・IP routing tables: find longest prefix match.

・Compressed quad-tree for N-body simulation.

・Efficiently storing and querying XML documents.

Also known as: crit-bit tree, radix tree.

1

1 2

2

put("shells", 1);put("shellfish", 2);

Removing one-way branching in a trie

h

e

l

f

i

s

h

l

s

s s

shell

fish

internalone-way

branching

externalone-way

branching

standardtrie

no one-waybranching

Page 120: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

120

String symbol tables summary

A success story in algorithm design and analysis.

Red-black BST.

・Performance guarantee: log N key compares.

・Supports ordered symbol table API.

Hash tables.

・Performance guarantee: constant number of probes.

・Requires good hash function for key type.

Tries. R-way, TST.

・Expected performance: log N characters accessed on a miss!

・Supports character-based operations.

Bottom line. TSTs are extremely fast.

Page 121: Algorithms...Edith Vanilla My Bloody Valentine Glaser Cardamom Rx Nightly James Rocky Road Robots are Supreme JS Fish The Filthy Reds Lauren Mint Jon Talabot Lee Vanilla La(r)va Lisa

Swype

121