49
The Burrows Wheeler transform and the FM index

The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

The Burrows Wheeler transform and the FM index

Page 2: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Burrows –Wheeler (bzip2)

Currently best algorithm for text compression

High level:

• Apply the Borrows-Wheeler transform

• Use move-to-front to translate the sorted characters to small integers

• Use Huffman coding

Page 3: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

S = abraca שבנויה מכל ההזזות הציקליות Mיצירת מטריצה : Iשלב

: Sשל

M =

a b r a c a # b r a c a # a r a c a # a b a c a # a b r c a # a b r a a # a b r a c # a b r a c a

Page 4: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

:מיון השורות בסדר לקסיקוגרפי: IIשלב

a b r a c a #

b r a c a # a a c a # a b r

c a # a b r a

a # a b r a c # a b r a c a

r a c a # a b

L F

L is the Burrows Wheeler Transform

Page 5: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

a b r a c a #

b r a c a # a r a c a # a b a c a # a b r c a # a b r a a # a b r a c # a b r a c a

a b r a c a #

b r a c a # a a c a # a b r

c a # a b r a

a # a b r a c # a b r a c a

r a c a # a b

Claim: Every column contains all chars.

L F

You can obtain F from L by sorting

Page 6: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

a b r a c a #

b r a c a # a a c a # a b r

c a # a b r a

a # a b r a c # a b r a c a

r a c a # a b

L F

The “a’s” are in the same order in L and in F,

Similarly for every other char.

Page 7: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

From L you can reconstruct the string (backwards)

L

#

a r

a

c a

b

F #

a

r

a

c

a

b

What is the last char of S ?

Page 8: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

From L you can reconstruct the string

L

#

a r

a

c a

b

F

#

a

r

a

c

a

b

What is the last char of S ? a

Page 9: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

From L you can reconstruct the string

L

#

a r

a

c a

b

F

#

a

r

a

c

a

b

ca

Page 10: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

From L you can reconstruct the string

L

#

a r

a

c a

b

F

#

a

r

a

c

a

b

aca

Page 11: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

a b r a c a #

b r a c a # a r a c a # a b a c a # a b r c a # a b r a a # a b r a c # a b r a c a

a b r a c a #

b r a c a # a a c a # a b r

c a # a b r a

a # a b r a c # a b r a c a

r a c a # a b

Sorting is equivalent to computing the suffix array.

L F

Decoding is also linear time

Anslysis so far

Page 12: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Two useful arrays

• We do not really need them for decoding but they will be useful later

• C[·] denotes an array of length |Σ| such that C[c] contains the total number of text characters which are alphabetically smaller than c.

• Occ(c,q) denotes the number of occurrences of character c in the prefix L[1,q] of the transformed text L.

Page 13: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Two useful arrays

#mississipp

i#mississip

ippi#missis

issippi#mis

ississippi#

mississippi

pi#mississi

ppi#mississ

sippi#missi

sissippi#mi

ssippi#miss

ssissippi#m

i

p

s

s

m

#

p

i

s

s

i

i

L

mississippi

# 0

i 1

m 5

p 6

s 8

C i

p

s

s

m

#

p

i

s

s

i

i

occ(s,4) = 2

L

Page 14: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

The LF mapping

#mississipp

i#mississip

ippi#missis

issippi#mis

ississippi#

mississippi

pi#mississi

ppi#mississ

sippi#missi

sissippi#mi

ssippi#miss

ssissippi#m

i

p

s

s

m

#

p

i

s

s

i

i

L

mississippi

# 0

i 1

m 5

p 6

s 8

C

LF(i ) = C[L[i ]] + Occ(L[i ], i )

LF(10) = C[s] + Occ(s, 10) = 12

L(10) = F(12) = s

Page 15: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

The LF mapping

#mississipp

i#mississip

ippi#missis

issippi#mis

ississippi#

mississippi

pi#mississi

ppi#mississ

sippi#missi

sissippi#mi

ssippi#miss

ssissippi#m

i

p

s

s

m

#

p

i

s

s

i

i

L

mississippi

Say T[k] = jth char of L

T[k]

Where is T[k-1] ?

j

T[k-1]

T[k-1] = L[LF[j]]

# 0

i 1

m 5

p 6

s 8

C

Page 16: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Compression ?

L

#

a r

a

c a

b

All we did so far is a rather strange permutation of the text ?

Page 17: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Why is it good ?

a b r a c a #

b r a c a # a r a c a # a b a c a # a b r c a # a b r a a # a b r a c # a b r a c a

a b r a c a #

b r a c a # a a c a # a b r

c a # a b r a

a # a b r a c # a b r a c a

r a c a # a b

L F

Characters with the same (right) context appear together

Page 18: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Move to front

i i s s i p # m s s p i

1 3 1 3 4 4 4 4 4 0 0 0

L

• Replace each char in L with the number of distinct char’s seen since its last occurrence.

• Usually will result in a lot of consecutive 0’s and clusters of small numbers

Page 19: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Move to front – How ?

i i s s i p # m s s p i

s p m i #

4 3 2 1 0

1 3 1 3 4 4 4 4 4 0 0 0

L

s p m # i

4 3 2 1 0

s m # i p

4 3 2 1 0

m # i p s

4 3 2 1 0

And so on…

• Keep an array/list MTF[1,…,|Σ|], initially say sorted

• Replace each char by its index and move it to the front

Page 20: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Final step

• Finish the encoding by applying a prefix code such as Huffman’s to the integer sequence produced by MTF

Page 21: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Variations

• Different prefix codes

• Use run-length encoding: BWT + MFT + RLE + prefix

Page 22: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Example

1. 0 1+1 = 2 10 01 0

2. 00 2+1 = 3 11 11 1

3. 000 3+1 = 4 100 001 00

4. 0000 4+1 = 5 101 101 10

5. 0000000 7+1 = 8 1000 0001000

Run length encoding Replace each sequence 0m by m+1 encoded in binary using 2 new symbols 0,1

RLE(MTF(L)) is defined over { 0,1,1,2,…,|Σ|-1}

MTF(L) = 1000221000023 RLE(MTF(L) ) = 1002211023

Page 23: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Run length encoding You can retrieve as follows

Total number of zeros:

00110 = b0b1b2b3b4

Real # of ones is (1bk-1 … b3b2b1b0)2-1=(101100)2-1

Replace bi by (bi+1)2i zeros

1 1 1 1

0 0 0 0

( 1)2 2 2 2 2 1k k k k

i i i i k

i i i

i i i i

b b b

Page 24: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

A prefix code

1

1 0

0 1

0

1 0

0 1 1

0 1

0 1 0 1

1 2

3 4 5 6

For i=1 to |Σ|-1

log(i+1) zeroes

Followed by a binary

representation of i+1 which takes 1+ log(i+1) bits

0

Page 25: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

A prefix code

1

1 0

0 1

0

1 0

0 1 1

0 1

0 1 0 1

1 2

3 4 5 6

bin(1) 010

bin(2) 011

bin(3) 00100

bin(4) 00101

bin(5) 00110

bin(6) 00111

bin(7) 0001000

0

Page 26: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

How good is the compression?

( ( ( ( )))) 5 ( ) (log( ))kprefix RLE MTF BWT T nH T O n

Theorem (FM 05):

Hk(T) is the kth order empirical entropy of T

This is true simultaneously for every k

Page 27: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

The FM index

Page 28: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

fr occ=2 [lr-fr+1]

Substring search using the BWT

(Count the pattern occurrences)

#mississipp

i#mississip

ippi#missis

issippi#mis

ississippi#

mississippi

pi#mississi

ppi#mississ

sippi#missi

sissippi#mi

ssippi#miss

ssissippi#m

i

p

s

s

m

#

p

i

s

s

i

i

L

mississippi

P = si First step

fr

lr Inductive step: Given fr,lr for P[j+1,p]

Take c=P[j]

P[ j ]

Find the first c in L[fr, lr]

Find the last c in L[fr, lr]

L-to-F mapping of these chars lr

rows prefixed

by char “i” s s

slide stolen from Paolo Ferragina@

# 0

i 1

m 5

p 6

s 8

C

Page 29: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

fr occ=2 [lr-fr+1]

Substring search using the BWT

(Count the pattern occurrences)

#mississipp

i#mississip

ippi#missis

issippi#mis

ississippi#

mississippi

pi#mississi

ppi#mississ

sippi#missi

sissippi#mi

ssippi#miss

ssissippi#m

i

p

s

s

m

#

p

i

s

s

i

i

L

mississippi

P = si First step

fr

lr

lr

rows prefixed

by char “i” s s

fr’ = C[s] + occ(s,1) + 1

lr’ = C[s] + occ(s,5)

# 0

i 1

m 5

p 6

s 8

C

fr’ = C[s] + occ(s,fr-1) + 1 lr’ = C[s] + occ(s,lr)

Page 30: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Make a bit vector for each character

i

p

s

s

m

#

p

i

s

s

i

i

L

0 0 1 1 0 0 0 0 1 1 0 0

occ(s,4) = rank(4)

rank(i) = how many ones are there before position i ?

Page 31: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

How do you answer rank queries ?

0 0 1 1 0 0 0 0 1 1 0 0

rank(i) = how many ones are there before position i ?

We can prepare a vector with all answers

0 0 1 2 2 2 2 2 3 4 4 4

Page 32: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Lets do it with O(n) bits per character

0 0 1 1 0 0 1 0 1 1 0 0

Partition in 2n/log(n) blocks of size log(n)/2

0 0 1 1 0 0 0 0 1 1 0 0

logn/2 2 5 7

Keep the answer for each prefix of the blocks

There are n “kinds” of blocks, prepare a table with all

answers for each block

Page 33: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

0 0 1 1 0 0 1 0 1 1 0 0

In our solution the bit vector takes Θ(n) bits and also the “additionals” take Θ(n) bits

0 0 1 1 0 0 0 0 1 1 0 0

logn/2 2 5 7

Page 34: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Can we do it with smaller overhead : so additionals would

take o(n) ?

0 0 1 1 0 0 1 0 1 1 0 0

superblocks of size log2(n)

0 0 1 1 0 0 0 0 1 1 0 0

log2n 7 13

Each block keeps the number of one in previous blocks that are in the same superblock

0 0 1 1 0 0 0 0 1 1 0 0 0 0 1 1 0 0 0 0 1 1 0 0

2 4

Page 35: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Analysis

0 0 1 1 0 0 1 0 1 1 0 0

The superblock table is of size n/log (n)

0 0 1 1 0 0 0 0 1 1 0 0

log2n 7 13

The block table is of size (loglog(n)) * n/log (n)

0 0 1 1 0 0 0 0 1 1 0 0 0 0 1 1 0 0 0 0 1 1 0 0

2 4

The tables for the blocks √n log(n)loglog(n)

So the additionals take o(n) space

Page 36: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Summary so far

• We can count occurrence using:

• The array C

• A bit vector per character O(|Σ|n)

• Additional tables of size o(n)

Page 37: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Next steps

Do it without keeping the bit vectors themselves ?

Can we do it while keeping only prefix(RLE(MTF(BWT(T)))) ?

How do we find the occurrences themselves ?

Page 38: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Storing prefix(RLE(MTF(BWT(T))))

… L:

𝓁=γlog(n)

0100101 Z=pr(RLE(MTF(BWT(T)))):

Also partition into superblocks (not shown) each consisting of ℓ blocks

Page 39: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

The tables

• NOs[c,f]: #occ of c before superblock f

• NOb[c,i]: #occ of c from the beginning of the superblock and before block i

Page 40: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

The tables

• MTF[i]: The MTF list at the beginning of block i

• S[c,j,Z,M]: #occ of c up to char j of a block whose compressed rep. is Z and MTF at the beginning is M

We would like to use it by retrieving S[c,j,Zi,MTF[i]]

How do we find Zi ?

Page 41: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Two more tables

• Ws[f]: where does the compressed part of superblock f starts

• Wb[i]: The offset of block i within its superblock

• Retrieving block j

[ ] [ ]

( mod 0) [ 1] 1

[ ] [ 1] 1

[ , ]j

js

start Ws s Wb j

end if j Ws s else

Ws s Wb j

Z Z start end

Page 42: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Tables sizes • Ws, Wb, NOs, NOb, MTF:

O(nloglog(n)/log(n))

• What is the size of S ?

S[c,j,Z,M]: #occ of c up to char j of a block whose compressed rep. is Z and the MTF at the beginning is M

| | 1 2 log 1 2 log log( )Z n

Choose such that |Z| δlog(n)

So the # of possible Z’s is nδ

So the size of S is O(nδℓlog(ℓ))

Page 43: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Find the occurrences

• We find fr,lr, so we know that there are lr-fr+1 occurrences

• But how do we find the positions of these occurrences ?

• If we had the suffix array it would have been easy

Page 44: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

The LF mapping..

#mississipp

i#mississip

ippi#missis

issippi#mis

ississippi#

mississippi

pi#mississi

ppi#mississ

sippi#missi

sissippi#mi

ssippi#miss

ssissippi#m

i

p

s

s

m

#

p

i

s

s

i

i

L

mississippi

# 0

i 1

m 5

p 6

s 8

C

LF(i) = C[L[i]] + Occ(L[i], i)

LF(9) = C[s] + Occ(s, 9) = 11

The previous suffix is at row 11

fr

lr

Page 45: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

The LF mapping..

#mississipp

i#mississip

ippi#missis

issippi#mis

ississippi#

mississippi

pi#mississi

ppi#mississ

sippi#missi

sissippi#mi

ssippi#miss

ssissippi#m

i

p

s

s

m

#

p

i

s

s

i

i

L

mississippi

# 0

i 1

m 5

p 6

s 8

C

LF(i) = C[L[i]] + Occ(L[i], i )

LF(11) = C[i] + Occ(i, 11) = 4

The previous suffix is at row 4

fr

lr

Page 46: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Finding the occurrences

#mississipp

i#mississip

ippi#missis

issippi#mis

ississippi#

mississippi

pi#mississi

ppi#mississ

sippi#missi

sissippi#mi

ssippi#miss

ssissippi#m

i

p

s

s

m

#

p

i

s

s

i

i

L

mississippi

# 0

i 1

m 5

p 6

s 8

C

We can keep going until we get to the first prefix (and record where the first prefix is in the array)

fr

lr

By counting steps we discover the position

Page 47: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Finding the occurrences

• This may take too much time…

• Remember the positions of a subset of the suffixes in the suffix array:

Missisipi….

log1+ε(n)

Page 48: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Finding the occurrences

#mississipp

i#mississip

ippi#missis

issippi#mis

ississippi#

mississippi

pi#mississi

ppi#mississ

sippi#missi

sissippi#mi

ssippi#miss

ssissippi#m

i

p

s

s

m

#

p

i

s

s

i

i

L

mississippi

These suffixes are equally spaced along the text

But in the suffix array they are in arbitrary positions

Keep then in a dictionary structure like a hash table

Page 49: The Burrows Wheeler transform and the FM indexhaimk/adv-alg-2014/Burrows-Wheeler.pdf · Burrows –Wheeler (bzip2) Currently best algorithm for text compression High level: • Apply

Summary

Can find each occurrence in O(log1+ε(n)) time

Additional space:

#mississipp

i#mississip

ippi#missis

issippi#mis

ississippi#

mississippi

pi#mississi

ppi#mississ

sippi#missi

sissippi#mi

ssippi#miss

ssissippi#m

i

p

s

s

m

#

p

i

s

s

i

i

L

mississippi

fr

1log( )

log ( ) log ( )

n nO n O

n n

Time to find an occurrence:

1log ( )O n