Large-scale Music Identification – Algorithms and Applications

Eugene Weinstein, PhD Candidate
New York University, Courant Institute
Department of Computer Science

Depth Qualifying Exam

June 20th, 2007

Talk Outline

• Introduction, motivation

• Algorithms for music identification

• Acoustic modeling using Gaussian Mixture Models

• Song set representation using Finite-State Transducers

• Experiments on clean, noisy, and distorted data

• Theoretical results

• New bounds on the size of a factor automaton

• Conclusion


Introduction

• Music identification scenario:

• Match a few seconds of audio to large song database

• Many potential applications: music search, content monitoring, song analysis

• Algorithmic, theoretical, and practical challenges, e.g.,

• Recording can be distorted due to noise, transmission over limited channels

• Might only get short audio snippet from any point in song


Past Work

• Most past work based on hashing, e.g., [Haitsma et al. ’01]

• Exact match required between training and test features

• HMM system over music sound events: [Batlle et al. ’02]

• Similar to speech recognition with unknown phone set

• Hypothesize phones in iterative process:

1. Run decoding on training corpus, obtain labels

2. Estimate phones based on decoding output


Overview

• Start with database of 15,000+ songs

• Compute MFCC features over audio

• Cluster song segments to get initial music phone set

• Learn phone set and train acoustic model for each phone

• Generate compact recognition transducer

• Identify songs using Viterbi decoding

[Figure: Waveform → Cepstra → Phone Set/Acoustic Model → Transcription → Recognition Transducer]
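The clustering step that yields the initial music-phone set can be sketched with plain k-means (Lloyd's algorithm). This is a minimal sketch, assuming each audio segment has already been reduced to one fixed-length feature vector (for example an averaged MFCC vector); the function names, iteration count, and random initialization are illustrative choices, not details given in the talk.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means over equal-length feature vectors.

    Returns one cluster index per point; each index can serve as the initial
    music-phone label of the corresponding segment (hypothetical usage).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

The resulting labels only seed the phone inventory; per the talk, the phone set and acoustic models are then refined by iterative training.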

Acoustic Model Training

• Segment training audio based on spectral change

• Initialize model using k-means clustering over segments

• Iterative training, repeat:

1. Run decoder with current model, get transcriptions

2. Use transcription counts to train GMM for each phone

• Edit distance measures convergence

[Figure: training loop between Phone Set/Acoustic Model and Transcription; plot of edit distance vs. training iteration]

[ICASSP ’07]
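One way to read "edit distance measures convergence" is to compare each song's transcription before and after a training iteration. The sketch below assumes unit-cost Levenshtein distance and a plain sum over songs (`transcription_change` is a hypothetical helper); the talk does not specify the exact cost function or how distances are aggregated.

```python
def edit_distance(a, b):
    """Levenshtein distance between two label sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def transcription_change(old, new):
    """Sum of per-song edit distances between two training iterations.

    old, new: dicts mapping a song id to its list of music-phone labels.
    """
    return sum(edit_distance(old[song], new[song]) for song in old)
```

A value that stops decreasing from one iteration to the next suggests the transcriptions have stabilized.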


Finite-state Transducers

• Finite automata with input and output labels on transitions, possibly weighted

• Widely used in speech and text processing, computational biology, etc.

• We view each song transcription as a string (music phones are the symbols) and construct an FST to map substrings (factors) to songs

Full-song Recognition

• Want transducer mapping complete music phone sequences to corresponding songs (no snippets for now)

• Idea: one state chain per song

• Transition to final state has song identifier as output label (all other output labels are ε’s)

• Using generic automata operations, we construct deterministic minimal transducer for efficient search

Fig. 4. Finite-state transducer T0 mapping each song to its identifier.

Fig. 5. (a) Deterministic and minimal unweighted factor acceptor F(A) for two songs. (b) Deterministic and minimal weighted factor acceptor F_w(A) for two songs.

The bound given by the corollary is not tight for relatively small values of k in the sense that, in practice, the size of the factor automaton does not depend on kn, the sum of the lengths of suffixes of length k, but rather on the number of states of A used for their representation, which for a minimal automaton can be substantially less. However, for large k, e.g., when all strings are of the same length and k is as long as the length of the strings accepted by A, our bound coincides with that of Blumer et al. [1987].

Similar results can be obtained for the number of transitions of the suffix automaton or factor automaton of a suffix-unique automaton (|S(A)|_E ≤ 3|A|_E − 4) and of a k-suffix-unique automaton (|S(A)|_E ≤ 3|A_k|_E + 3kn − 3k − 1), as in the string case.

4 Factor Automata for Music Identification

We have verified the above insights into factor automata in the context of a music identification system [Weinstein and Moreno, 2007]. Music identification is the process of matching an audio stream to a particular song. In our music identification system, we learn an inventory of music phone units similar to phonemes in speech and a unique sequence of music phones characterizing each song. We then view the music phone set as our alphabet and the music phone sequences as a set of strings over that alphabet. The music identification task is then precisely transformed into a factor recognition task. Our approach is to construct a compact transducer that maps music phone sequences to corresponding song identifiers.
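A minimal sketch of the full-song transducer, assuming transcriptions are given as lists of music-phone labels keyed by song name (the label sequences in the test are made up for illustration). Building one chain per song and determinizing amounts to sharing common prefixes in a trie; the song identifier is emitted only at the end of a complete sequence, like the output label on the transition into the final state. The minimization step (sharing common suffixes) is omitted, and the function names are hypothetical.

```python
def build_full_song_transducer(songs):
    """Trie-shaped deterministic transducer for complete transcriptions.

    songs: dict song_id -> list of music-phone labels.
    Returns (transitions, final_output): transitions[state][label] -> state,
    final_output[state] -> song id emitted when a full sequence ends there.
    """
    transitions = [{}]          # state 0 is the initial state
    final_output = {}
    for song_id, phones in songs.items():
        state = 0
        for label in phones:
            if label not in transitions[state]:
                transitions.append({})
                transitions[state][label] = len(transitions) - 1
            state = transitions[state][label]
        # All earlier outputs are epsilon; the id is emitted at the end.
        # (Identical transcriptions would overwrite each other here.)
        final_output[state] = song_id
    return transitions, final_output


def identify_full_song(transitions, final_output, phones):
    """Return the song id for a complete phone sequence, or None."""
    state = 0
    for label in phones:
        state = transitions[state].get(label)
        if state is None:
            return None
    return final_output.get(state)
```

A partial sequence returns `None`, since this transducer only recognizes complete songs; handling snippets requires the factor construction.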


Snippet (Factor) Acceptor

• Need to recognize song parts, or snippets

• Make all states initial & final

• Drop output labels

• Determinize, minimize

• Recognizes all snippets

• But doesn’t identify songs!
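The steps above (make all states initial and final, drop output labels, determinize) can be sketched as a subset construction whose states are sets of (song index, position) pairs. This is an illustrative sketch that skips minimization; `build_factor_acceptor` and `accepts` are hypothetical names. The state reached by a snippet may still contain positions from several songs, which is why the acceptor alone cannot say which song matched.

```python
def build_factor_acceptor(transcriptions):
    """Determinized factor acceptor for a list of song transcriptions.

    Each song is a chain of states (song index, position); every state is
    initial and final and output labels are dropped.  Determinization is a
    subset construction; minimization is omitted.
    """
    start = frozenset((s, i) for s, phones in enumerate(transcriptions)
                      for i in range(len(phones) + 1))
    delta = {}                   # (subset, label) -> subset
    seen = {start}
    stack = [start]
    while stack:
        subset = stack.pop()
        moves = {}
        for s, i in subset:
            phones = transcriptions[s]
            if i < len(phones):
                moves.setdefault(phones[i], set()).add((s, i + 1))
        for label, targets in moves.items():
            target = frozenset(targets)
            delta[(subset, label)] = target
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return start, delta, seen


def accepts(start, delta, snippet):
    """True if the snippet is a factor (substring) of some transcription."""
    state = start
    for label in snippet:
        state = delta.get((state, label))
        if state is None:
            return False
    return True                  # every state is final
```

With transcriptions `[a, b, c]` and `[b, c, d]`, the snippet `[b, c]` is accepted, but the state it reaches holds positions from both songs.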


Weighted Factor Acceptor

• Use numerical song IDs as weights on transitions

• Automata operations preserve total weight along a given path [Mohri ’97]


[ICASSP ’07]
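The effect of the weights can be illustrated in the tropical (min, +) semiring. In this sketch the song index is attached as a final weight at every position of that song rather than as a transition weight. That is an assumption made so that every snippet path carries exactly one song id. The weight of a snippet is then the minimum song id among the songs containing it. Determinization and minimization preserve the weight of each string, so a determinized, minimized acceptor would return the same value; here the non-deterministic acceptor is simply simulated. The slide does not say how the real system resolves snippets shared by several songs.

```python
import math


def snippet_weight(transcriptions, snippet):
    """Tropical-semiring weight of a snippet: min over matching paths of the
    summed weights.

    Assumption for this sketch: every (song, position) state is initial with
    weight 0 and final with final weight equal to the song index, so a snippet
    that occurs only in song s gets weight s.
    """
    # Simulate the non-deterministic acceptor: state -> best weight so far.
    current = {(s, i): 0 for s, phones in enumerate(transcriptions)
               for i in range(len(phones) + 1)}
    for label in snippet:
        following = {}
        for (s, i), w in current.items():
            phones = transcriptions[s]
            if i < len(phones) and phones[i] == label:
                following[(s, i + 1)] = min(w, following.get((s, i + 1), math.inf))
        current = following
    # Add each state's final weight (its song index) and take the minimum.
    return min((w + s for (s, _), w in current.items()), default=math.inf)
```

A snippet found in no song gets weight infinity, which corresponds to rejection.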


Final Transducer

• To construct transducer, turn weights into output labels

• After decoding, sum decoder outputs, get numeric song id

• Full-song transducer: 27.5M states, 27.6M transitions

• Final factor transducer: 32.7M states, 59.6M transitions

• Automaton recognizing any factor of song transcriptions only ~2x bigger than that of entire songs

• This is unexpected, considering there are 15,000 songs, 1,700 average phones per song:

• # possible factors = 15,000 × 1,700² ≈ 43 × 10⁹
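The arithmetic on this slide can be checked directly. The sketch below reproduces the rough factor count and the state and transition ratios between the two transducers, using only the numbers given above. The roughly 2× growth shows up in the transition count, while the number of states grows by about 19%. The function name is illustrative.

```python
def size_summary(songs=15_000, avg_phones=1_700,
                 full=(27.5e6, 27.6e6), factor=(32.7e6, 59.6e6)):
    """Rough factor count and size ratios from the slide.

    full and factor are (states, transitions) of the full-song and factor
    transducers.
    """
    return {
        # songs x (phones per song)^2, the rough count used on the slide
        "rough_factor_count": songs * avg_phones ** 2,
        "state_ratio": factor[0] / full[0],
        "transition_ratio": factor[1] / full[1],
    }


summary = size_summary()
print(f"{summary['rough_factor_count']:.3e}")   # 4.335e+10
print(f"{summary['state_ratio']:.2f}")          # 1.19
print(f"{summary['transition_ratio']:.2f}")     # 2.16
```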


Experiments

• Database: 15,455 songs in MP3 format

• Average song duration: 3.9 minutes

• > 1,000 hours of audio

• “Big” test set: one 10-second snippet per song

• Test identification on “clean” in-set data

• “Small” test set: 1,762 in-set and 1,856 out-of-set snippets

• Test noise robustness and rejection of out-of-set songs


Results: Identification

• FS: Full song (best possible performance): 99.7%

• PS: Partial songs, 10-second snippets: 99.5%

• Test on a range of beam sizes in Viterbi search

• All tests faster than real-time (0.48×real-time for beam=12)

[Plot: accuracy (%) vs. beam size (5 to 30) for FS and PS]

[ICASSP ’07]

Experiments: Detection

• Detection: distinguish in-set from out-of-set songs

• Test detection capability using SVMs

• Construct universal background model by clustering GMM components across all phones

• SVM features: log-likelihood of best path with in-set acoustic model, background model, and their difference

• Radial basis function kernel with a sweep over the parameter space (C and γ)
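A minimal sketch of the detection features and kernel, assuming the two best-path log-likelihoods are already available from the decoder. `detection_features`, `rbf_kernel`, and the `PARAMETER_GRID` values are hypothetical; the talk says only that C and γ were swept. In practice the features would be handed to an SVM package with an RBF kernel rather than used directly like this.

```python
import math


def detection_features(loglik_inset, loglik_background):
    """Best-path log-likelihood under the in-set model, under the background
    model, and their difference."""
    return (loglik_inset, loglik_background, loglik_inset - loglik_background)


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical (C, gamma) grid for the parameter sweep; the talk gives no values.
PARAMETER_GRID = [(c, gamma) for c in (0.1, 1.0, 10.0, 100.0)
                  for gamma in (0.001, 0.01, 0.1, 1.0)]
```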


Noise, Distortions

• Noise condition: additive white noise (harsh environment)

• Add noise at different mixing levels

• Speed-up and slow-down of audio (no pitch shifting)

• Different rate multipliers

• MP3 encode/decode

• Different bitrates

[ISMIR ’07]
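Mixing white noise at a given signal-to-noise ratio can be sketched as follows. The sketch assumes zero-mean Gaussian noise and SNR defined from average power, SNR(dB) = 10 log10(P_signal / P_noise). The talk does not say how the noise was generated; `add_white_noise` and `measured_snr_db` are illustrative helpers operating on a list of samples.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return signal plus zero-mean Gaussian white noise at the target SNR.

    SNR(dB) = 10 * log10(P_signal / P_noise), so the noise variance is
    P_signal / 10 ** (snr_db / 10).
    """
    rng = random.Random(seed)
    p_signal = sum(x * x for x in signal) / len(signal)
    sigma = math.sqrt(p_signal / 10 ** (snr_db / 10))
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """SNR of a noisy signal relative to its clean version, in dB."""
    p_signal = sum(x * x for x in clean) / len(clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)
```

For example, `add_white_noise(samples, 44.0)` would correspond to the 44.0 dB SNR condition in the results.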


Results: Noise, Distortions

Condition                  Identification Accuracy   Detection Accuracy
Clean                      99.4%                     96.9%
White noise @44.0dB SNR    98.5%                     96.8%
Speed up by 2%             96.0%                     96.0%
Slow down by 2%            96.4%                     96.4%
Speed up by 10%            43.2%                     87.7%
Slow down by 10%           45.7%                     85.8%
MP3 re-encode 64kbps       98.1%                     96.6%
MP3 re-encode 32kbps       95.5%                     95.3%

[ICASSP ’07, ISMIR ’07]


Summary So Far

• “Phone” set for music ID can be learned automatically

• Match audio without relying on a direct match between feature values, so the approach should be robust to signal variation

• Acoustic modeling techniques applicable to speech, etc.

• We formulate music ID as a search problem

• Use well-established techniques from speech and text processing (FSTs, GMMs) to make effective system

• FST framework allows efficient matching of song snippets

• Compact factor transducer can be constructed

String Matching in Music ID

• Automata allow us to solve the string matching problem

• But will our approach generalize to larger data sets?

• We thus address a theoretical question

• What is the size of the smallest deterministic automaton accepting the factors of a set of strings U?

• For efficiency, U can be represented with an automaton A

• Or, the set of strings may be given directly as an automaton A

• More general question: What is the size of the factor automaton of A?

[CIAA ’07]


Past Work

• Factor automaton of a string x has at most 2|x| − 2 states and 3|x| − 4 transitions [Crochemore ’85; Blumer et al. ’86]

• Can be constructed by a linear-time online algorithm

• Size bounds for a set of strings U have also previously been studied [Blumer et al. ’87]

• If ‖U‖ is the sum of the lengths of all the strings in U:

• Factor automaton of U has at most 2‖U‖ − 1 states and 3‖U‖ − 3 transitions

• We prove a substantially better bound here
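The linear-time online construction mentioned above is commonly implemented as below. This is a sketch of the standard suffix-automaton algorithm with illustrative names, not code from the talk. Each new character adds one state and at most one clone, so the number of states grows linearly with |x|. The test checks the known suffix-automaton bounds of at most 2|x| − 1 states and 3|x| − 4 transitions for |x| ≥ 3.

```python
def suffix_automaton(x):
    """Online construction of the suffix automaton of the string x.

    Returns (nxt, link, length, finals): nxt[s] maps a symbol to the next
    state, link[s] is the suffix link, length[s] the length of the longest
    string reaching s, and finals the states reached by suffixes of x.
    State 0 is initial.
    """
    nxt, link, length = [{}], [-1], [0]
    last = 0
    for ch in x:
        cur = len(nxt)
        nxt.append({})
        link.append(0)
        length.append(length[last] + 1)
        p = last
        # Add ch-transitions to the new state along the suffix-link path.
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p != -1:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings reaching q.
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    finals = set()
    p = last
    while p != -1:              # states on the suffix-link path are final
        finals.add(p)
        p = link[p]
    return nxt, link, length, finals


def is_suffix_of(automaton, w):
    """True if w is a suffix of the string the automaton was built from."""
    nxt, _, _, finals = automaton
    state = 0
    for ch in w:
        if ch not in nxt[state]:
            return False
        state = nxt[state][ch]
    return state in finals
```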


Suffix Automaton

• We start out with an automaton A recognizing the strings in U

• Let S(A) and F(A) be the deterministic minimal automata recognizing the suffixes and factors of A, respectively

• To construct S(A), make each state of A initial (by adding epsilons), determinize, minimize

• To construct F(A), make each state of S(A) final, minimize

• Consequence: |F(A)| ≤ |S(A)|

Fig. 1. Finite automaton A accepting the strings ac, acab, acba.

Proposition 1. Assume that A is suffix-unique. Let S_A = (Q_A, I_A, F_A, E_A) be the deterministic automaton whose states are the equivalence classes Q_A = {[x] ≠ ∅ : x ∈ Σ*}, its initial state I_A = {[ε]}, its final states F_A = {[x] : end-set(x) ∩ F ≠ ∅}, where F is the set of final states of A, and its transition set E_A = {([x], a, [xa]) : [x], [xa] ∈ Q_A}. Then, S_A is the minimal deterministic suffix automaton of A: S_A = S(A).

Proof. By construction, S_A is deterministic and accepts exactly the set of suffixes of A. Let [x] and [y] be two equivalent states of S_A. Then, for all z ∈ Σ*, [xz] ∈ F_A iff [yz] ∈ F_A, that is, xz is a suffix of A iff yz is a suffix of A. Since A is suffix-unique, this implies that either x is a suffix of y or vice-versa, and thus that [x] = [y]. Thus, S_A is minimal. □

In much of what follows, we will be interested in the case where the automaton A is acyclic. We denote by |A|_Q the number of states of A, by |A|_E the number of transitions of A, and by |A| the size of A, defined as the sum of the number of states and transitions of A.

3 Space Bounds for Factor Automata

The objective of this section is to derive new bounds on the size of S(A) and F(A) in the case of interest for our applications, where A is an acyclic automaton, typically deterministic and minimal, representing a set of strings.

When A represents a single string, there are standard algorithms for constructing S(A) and F(A) from A in linear time [3, 4]. In the general case, S(A) can be constructed from A as follows: add an ε-transition from the initial state of A to each state of A, then apply an ε-removal algorithm, followed by determinization and minimization to obtain S(A). F(A) can be obtained similarly by further making all states final before applying ε-removal, determinization, and minimization. It can also be obtained from S(A) by making all states of S(A) final and applying minimization. Figure 1 shows a simple automaton A accepting three strings and Figure 2 its suffix automaton S(A).
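The general construction can be sketched for an acyclic automaton. Adding an ε-transition from the initial state to every state and then removing the ε-transitions is equivalent to starting the subset construction from the set of all states. Because the result is acyclic, minimization can be done by comparing each state's signature (final flag plus labeled successors). This is an illustrative sketch with hypothetical names, not the implementation used in the paper; the state numbering of A below is our reading of Figure 1. It yields 7 states for S(A) and 6 for F(A).

```python
def suffix_and_factor_state_counts(delta, finals, n_states):
    """Return (|S(A)|_Q, |F(A)|_Q) for an acyclic automaton A.

    delta maps (state, label) -> next state, states are 0..n_states-1, and
    finals is the set of final states of A.
    """
    # Every state initial (epsilon-transitions, then epsilon-removal) is the
    # same as starting the subset construction from the set of all states.
    start = frozenset(range(n_states))
    out = {}                      # subset -> {label: target subset}
    stack = [start]
    while stack:
        subset = stack.pop()
        if subset in out:
            continue
        moves = {}
        for (q, label), r in delta.items():
            if q in subset:
                moves.setdefault(label, set()).add(r)
        out[subset] = {label: frozenset(t) for label, t in moves.items()}
        stack.extend(out[subset].values())

    def count_classes(is_final):
        # Acyclic DFA: two states are equivalent iff they agree on finality
        # and on their (label, equivalence class of target) pairs.
        memo = {}

        def signature(s):
            if s not in memo:
                memo[s] = (is_final(s), tuple(sorted(
                    (label, signature(t)) for label, t in out[s].items())))
            return memo[s]

        signature(start)
        return len(set(memo.values()))

    n_suffix = count_classes(lambda s: bool(s & finals))
    n_factor = count_classes(lambda s: True)   # F(A): every state final
    return n_suffix, n_factor


# Automaton A of Figure 1 (accepts ac, acab, acba), as we read the figure.
A_DELTA = {(0, "a"): 1, (1, "c"): 2, (2, "a"): 3, (3, "b"): 4,
           (2, "b"): 5, (5, "a"): 4}
print(suffix_and_factor_state_counts(A_DELTA, {2, 4}, 6))   # (7, 6)
```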

When A represents a single string x, the size of the automata S(A) and F(A) can be proved to be linear in |x|. More precisely, the following bounds hold for |S(A)| and |F(A)| [4, 3]:

$$|S(A)|_Q \le 2|x| - 1, \qquad |S(A)|_E \le 3|x| - 4, \qquad |F(A)|_Q \le 2|x| - 2, \qquad |F(A)|_E \le 3|x| - 4. \tag{1}$$


Fig. 2. Suffix automaton S(A) of the automaton A of Figure 1.

These bounds are tight for strings of length more than three. [2] gave similar results for the case of a set of strings U by showing that the size of the factor automaton F(U) representing this set is bounded as follows:

$$|F(U)|_Q \le 2\|U\| - 1, \qquad |F(U)|_E \le 3\|U\| - 3, \tag{2}$$

where ‖U‖ denotes the sum of the lengths of all strings in U.

In general, the size of an acyclic automaton A representing a finite set of strings U can be substantially smaller than ‖U‖. In fact, |A| can be exponentially smaller than ‖U‖. Thus, we are interested in bounding the size of S(A) or F(A) in terms of the size of A, rather than the sum of the lengths of all strings accepted by A.
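A standard example of the exponential gap (not taken from the paper): the minimal automaton for U = {a, b}^n has only n + 1 states, while ‖U‖ = n·2^n. The sketch builds that automaton and enumerates its strings to compute ‖U‖ explicitly; the helper names are illustrative.

```python
def chain_automaton(n):
    """Minimal DFA for {a, b}^n: states 0..n, i --a/b--> i + 1, state n final.

    Returns (transitions, final states, number of states).
    """
    delta = {(i, c): i + 1 for i in range(n) for c in "ab"}
    return delta, {n}, n + 1


def accepted_strings(delta, finals, state=0, prefix=""):
    """Enumerate every string accepted from `state` (A must be acyclic)."""
    if state in finals:
        yield prefix
    for (q, c), r in delta.items():
        if q == state:
            yield from accepted_strings(delta, finals, r, prefix + c)
```

For n = 10 the automaton has 11 states, while the 1,024 accepted strings have total length 10,240.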

For any state q of S(A), we denote by suff(q) the set of strings labeling the paths from q to a final state. We also denote by N(q) the set of states in A from which a non-empty string in suff(q) can be read to reach a final state.

Lemma 2. Let A be a suffix-unique automaton and let q and q′ be two states of S(A) such that N(q) ∩ N(q′) ≠ ∅. Then

$$\big(\mathrm{suff}(q) \subseteq \mathrm{suff}(q') \text{ and } N(q) \subseteq N(q')\big) \quad \text{or} \quad \big(\mathrm{suff}(q') \subseteq \mathrm{suff}(q) \text{ and } N(q') \subseteq N(q)\big). \tag{3}$$

Proof. Since S(A) is a minimal automaton, its states are accessible from the initial state. Let u be the label of a path from the initial state I of S(A) to q and similarly u′ the label of a path from I to q′.

By assumption, there exists p ∈ N(q) ∩ N(q′). Thus, there exist non-empty strings v ∈ suff(q) and v′ ∈ suff(q′) such that both v and v′ label paths from p to a final state.

By definition of u and u′, both uv and u′v′ are suffixes of A. Since A is suffix-unique and v is non-empty, there exists a unique string accepted by A and ending with v. There also exists a unique string accepted by A and ending with uv. Thus, these two strings must coincide.

This implies that any string accepted by A and admitting v as suffix also admits uv as suffix. In particular, the label of any path from an initial state to p must admit u as suffix. Reasoning in the same way for v′ lets us conclude that the label of any path from an initial state to p must also admit u′ as suffix. Thus,


Size Bound: Strategy

• Goal: a bound on |F(A)| in terms of |A|

• Work on bounding |S(A)|: consider suffixes only for now

• Idea: each state in S(A) accepts a distinct set of suffixes, so count the number of possible sets of suffixes

• The suffix sets can be arranged in a hierarchy, which is directly related in size to A

• Motivated by similar arguments for single-string case in [Blumer et al. ’86]; string sets in [Blumer et al. ’87]

Suffix Sets

• Automaton A is k-suffix unique if no two strings accepted by A share the same k-length suffix; suffix-unique if k = 1

• Define end-set(x): set of states in A reachable after reading x

• e.g., end-set(ac) = {2, 3, 4, 5}

• x ≡ y denotes end-set(x) = end-set(y)

• This is a right-invariant equivalence relation

• [x] is the equivalence class of x

[Figure: automaton A of Fig. 1]

Notation

• N_str is the number of strings accepted by A

• If q is a state of S(A), suff(q) is the set of suffixes accepted from q

• e.g., suff(3) = {ab, ba}

• N(q) is the set of states in A from which a non-empty string in suff(q) can be read to reach a final state

• e.g., N(3) = {2, 1}

[Figure: automaton A and its suffix automaton S(A)]

Suffix Set Inclusion

• Lemma: Let A be a suffix-unique automaton and let q and q′ be two states of S(A) such that N(q) ∩ N(q′) ≠ ∅. Then

$$\big(\mathrm{suff}(q) \subseteq \mathrm{suff}(q') \text{ and } N(q) \subseteq N(q')\big) \quad \text{or} \quad \big(\mathrm{suff}(q') \subseteq \mathrm{suff}(q) \text{ and } N(q') \subseteq N(q)\big)$$

• Proof: Let paths in S(A) to q and q′ be labeled with u and u′.

• Thus A must have a state p ∈ N(q) ∩ N(q′)

• Thus, there exist paths v ∈ suff(q) and v′ ∈ suff(q′) from p to a final state

[Figure: paths labeled u and u′ to q and q′ in S(A); in A, paths through state p continue with v and v′ to a final state]

Suffix Set Inclusion

• Since A is suffix-unique, any string accepted by A and ending in v must also end in uv

• Thus, any path from the initial state to p must end in u

• By the same reasoning, it must also end in u′

• Hence, u is a suffix of u′, or vice versa

• Assume the former; then suff(q′) ⊆ suff(q), thus N(q′) ⊆ N(q). QED.

[Figure: string x ending in vu, with u and u′ marked as suffixes]

Fig. 3. Illustration of the situation described in Lemma 2. uv and u′v are suffixes of the same string x. Thus, u and u′ are also suffixes of the same string. Thus, u is a suffix of u′ or vice-versa.

u and u′ are suffixes of the same string. Thus, u is a suffix of u′ or vice-versa. Figure 3 illustrates this situation.

Assume without loss of generality that u is a suffix of u′. Then, for any string w, if u′w is a suffix of A, so is uw. Thus, suff(q′) ⊆ suff(q), which implies N(q′) ⊆ N(q). When u′ is a suffix of u, we similarly obtain the other case of the statement of the lemma. □

Note that Lemma 2 holds even when A is a non-deterministic automaton.

Lemma 3. Let A be a suffix-unique deterministic automaton and let q and q′ be two distinct states of S(A) such that N(q) = N(q′). Then either q is a final state and q′ is not, or q′ is a final state and q is not.

Proof. Assume that N(q) = N(q′). By Lemma 2, this implies suff(q) = suff(q′). Thus, the same non-empty strings label the paths from q to a final state and the paths from q′ to a final state. Since S(A) is a minimal automaton, the distinct states q and q′ are not equivalent. Thus, one must admit an empty path to a final state and not the other. □

The following proposition extends the results of [3], which hold for a set of strings, to the case where A is an automaton.

Proposition 2. Let A be a suffix-unique deterministic and minimal automaton accepting strings of length more than three. Then, the number of states of the suffix automaton of A is bounded as follows:

$$|S(A)|_Q \le 2|A|_Q - 3. \tag{4}$$

Proof. If the strings accepted by A are all of the form a^n, S(A) can be derived from A simply by making all its states final, and the bound is trivially achieved. In the remainder of the proof, we can thus assume that not all strings accepted by A are of this form.

Let F be the unique final state of S(A) with no outgoing transitions. Lemmas 2 and 3 help define a tree T associated to all states of S(A) other than F by using the ordering:

$$N(q) \preceq N(q') \iff \begin{cases} N(q) \subset N(q'), & \text{or} \\ N(q) = N(q') \text{ and } q' \text{ final}, \ q \text{ non-final}. \end{cases} \tag{5}$$

We will identify each node of T with its corresponding state in S(A). By Proposition 1, each state q of S(A) can also be identified with an equivalence class


Suffix-unique Bound

• Theorem: If A is a suffix-unique deterministic and minimal automaton, then the number of states of S(A) is bounded as |S(A)|_Q ≤ 2|A|_Q − 3

• Proof (sketch):

• Lemma: For any two states of the suffix automaton, either suffix sets are disjoint, or one includes the other

• We can show that each state q of S(A) corresponds to a distinct equivalence class [x]; count these to get the bound

• The equivalence sets induce a suffix sets hierarchy which we will analyze

[CIAA ’07]
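For small finite sets of strings, the bound of the theorem can be checked by brute force using the Myhill-Nerode characterization: the minimal trim DFA of a finite language L has one state per distinct residual u⁻¹L = {w : uw ∈ L}, over the prefixes u of strings in L. The sketch below (illustrative helper names; suffix-uniqueness is checked for k = 1 only) counts the states of A and S(A) this way without building either automaton. For the strings ac, acab, acba of Figure 1 it gives |A|_Q = 6 and |S(A)|_Q = 7 ≤ 2·6 − 3 = 9. That example includes the two-symbol string ac, so it lies slightly outside the theorem's hypothesis, but the inequality still holds there.

```python
def minimal_dfa_states(language):
    """Number of states of the minimal trim DFA for a finite language.

    One state per distinct residual u^-1 L = {w : uw in L}, where u ranges
    over the prefixes of strings in L (no dead state is counted).
    """
    prefixes = {w[:i] for w in language for i in range(len(w) + 1)}
    residuals = {frozenset(w[len(u):] for w in language if w.startswith(u))
                 for u in prefixes}
    return len(residuals)


def suffixes(language):
    """All suffixes (including the empty string) of strings in the language."""
    return {w[i:] for w in language for i in range(len(w) + 1)}


def is_suffix_unique(language):
    """True if no two (non-empty) strings end with the same symbol (k = 1)."""
    last_symbols = [w[-1] for w in language]
    return len(last_symbols) == len(set(last_symbols))


def check_bound(language):
    """Return (|S(A)|_Q, 2|A|_Q - 3) for the minimal automaton A of the set."""
    a_states = minimal_dfa_states(language)
    s_states = minimal_dfa_states(suffixes(language))
    return s_states, 2 * a_states - 3
```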


Suffix Sets: Non-branching

• Suffix sets either disjoint or inclusive: hierarchy

• Count branching, non-branching nodes separately

• Exclude super-final state F with no outgoing transitions

• Let q be a state in S(A) with equivalence class [x], x longest

• The only way to have a branching node is if there exist factors ax, bx (a ≠ b) (since ≡ is a right-equivalence relation)

• So q is only non-branching when x is a prefix or suffix

• Empty prefix ε not included in non-degenerate cases

• Total non-branching nodes: N_nb ≤ |A|_Q − 2 + N_str

[x]. Let q be a state of S(A) distinct from F, and let [x] be its corresponding equivalence class. Observe that since A is suffix-unique, end-set(x) coincides with N(q).

We will show that the number of nodes of T is at most 2|A|_Q − 4, which will yield the desired bound on the number of states of S(A). To do so, we bound separately the number of non-branching and branching nodes of T.

Let q be a node of T and let [x] be the corresponding equivalence class, with x its longest member. The children of q are the nodes corresponding to the equivalence classes [ax], where a ∈ Σ and ax is a factor of A.

By Lemma 1, if x is a non-suffix and non-prefix factor, then there exist factors ax and bx with a ≠ b. Thus, q admits at least two children corresponding to [ax] and [bx] and is thus a branching node. Thus non-branching nodes can only be either nodes q where x is a prefix, or those where x is a suffix, that is, when q is a final state of S(A).

Since the strings accepted by A are not all of the form a^n for some a ∈ Σ, the empty prefix ε occurs in at least two distinct left contexts a and b with a ≠ b. Thus, the prefix ε, which corresponds to the root of T, is necessarily branching. Also, let f be the unique final state of A with no outgoing transitions. The equivalence class of the longest factor ending in f, that is, the longest string accepted by A, corresponds to the state F in S(A), which is not included in the tree T. Thus, there are at most |A|_Q − 2 non-branching prefixes.

There can be at most one non-branching node for each string accepted by A. Let N_str denote the number of strings accepted by A. Then the number of non-branching nodes N_nb of T is at most N_nb ≤ |A|_Q − 2 + N_str.

To bound the number of branching nodes N_b of T, observe that since A is suffix-unique, each string accepted by A must end with a distinct symbol a_i, i = 1, ..., N_str. Each a_i represents a distinct left context for the empty factor ε; thus the root node [ε] admits all [a_i]s, i = 1, ..., N_str, as children. Let T_{a_i} represent the sub-tree rooted at [a_i] and let n_{a_i} represent the number of leaves of T_{a_i}. Let a_j, j = N_str + 1, ..., N_str + k, denote the other children of the root and let T_{a_j} denote each of the corresponding sub-trees. A tree with n_{a_i} leaves has fewer than n_{a_i} branching nodes. Thus, the number of branching nodes of T_{a_i} is at most n_{a_i} − 1. The total number of leaves of T is at most the number of disjoint subsets of Q excluding the initial state and f.

Note however that when the root node [ε] admits only [a_i]s, i = 1, ..., N_str, as children, that is, when k = 0, then there is at least one a_i, say a_1, that is also a prefix of A, since any other symbol would have been a child of the root node. The node a_1 will then also have a child, since it corresponds to a suffix or final state of S(A). Thus, a_1 cannot be a leaf in that case. Thus, there are at most as many as

$$\sum_{i=1}^{N_{str}+k} n_{a_i} \le |A|_Q - 2 - (1 - \min\{1, k\})$$

leaves, and the total number of branching nodes of T, including the root, is at most

$$N_b \le \sum_{i=1}^{N_{str}+k} (n_{a_i} - 1) + 1 \le |A|_Q - 2 - (1 - \min\{1, k\}) - (N_{str} + k) + 1 \le |A|_Q - 2 - N_{str}.$$

The total number of nodes of the tree T is thus at most N_nb + N_b ≤ 2|A|_Q − 4. □

In the specific case where A represents a single string x, the bound of Proposition 2 matches that of [4] or [3], since |A|_Q = |x| + 1. The bound of Proposition 2

Suffix Sets: Branching

• If a_1, ..., a_{N_str} are the distinct final symbols of each string accepted by A, then each [a_i] is a child of the root

• Let the tree rooted at [a_i] have n_{a_i} leaves (≤ n_{a_i} − 1 branching nodes)

• Total number of leaves is ≤ |A|_Q − 2 (not initial and super-final)

• Total branching: N_b ≤ Σ_{i=1}^{N_str+k} (n_{a_i} − 1) + 1 ≤ |A|_Q − 2 − N_str

• Total size of tree: N_nb + N_b ≤ 2|A|_Q − 4

• Add “super-final” state, get |S(A)|_Q ≤ 2|A|_Q − 3. QED.

[Figure: children [a_1], [a_2], ..., [a_{N_str}], ..., [a_{N_str+k}] of the root]



Final Size Result

• If A is a deterministic minimal automaton representing a set of strings U, then |S(A)|_Q ≤ 2|A_k|_Q + 2kn − 3

• Substantial improvement over previous: |S(U)|_Q ≤ 2‖U‖ − 1

• Here A is k-suffix unique, accepting n strings, and A_k is the part of A after removing all suffixes of length k

• Proof idea: add terminal symbols to make string set suffix-unique, construct suffix automaton, remove symbols

Proposition 3. Let A be a k-suffix-unique deterministic automaton accepting strings of length more than three and let n be the number of strings accepted by A. Then, the following bound holds for the number of states of the suffix automaton of A:

$$|S(A)|_Q \le 2|A_k|_Q + 2kn - 3, \tag{9}$$

where A_k is the part of the automaton of A obtained by removing the states and transitions of all suffixes of length k.

Proof. Let A be a k-suffix-unique deterministic automaton accepting strings of length more than three and let the alphabet Σ be augmented with n temporary symbols $_1, ..., $_n. By marking each string accepted by A with a distinct symbol $_i, we can turn A into a suffix-unique deterministic automaton A′.

To do that, we first unfold all k-length suffixes of A. In the worst case, all these (distinct) suffixes were sharing the same (k − 1)-length suffix. Unfolding can thus increase the number of states of A by as many as kn − n states in the worst case. Marking the end of each suffix with a distinct $-sign further increases the size by n. The resulting automaton A′ is deterministic and |A′|_Q ≤ |A_k|_Q + kn. By Proposition 2, the size of the suffix automaton of A′ is bounded as follows: |S(A′)| ≤ 2|A′| − 3. Since transitions labeled with a $-sign can only appear at the end of successful paths in S(A′), we can remove these transitions, make their origin state final, and minimize the resulting automaton to derive a deterministic automaton A″ accepting the set of suffixes of A. The statement of the proposition follows from the fact that |A″| ≤ |S(A′)|. □

Since the size of F(A) is always less than or equal to that of S(A), we directly obtain the following result.

Corollary 3. Let A be a k-suffix-unique automaton accepting strings of length more than three. Then, the following bound holds for the factor automaton of A:

$$|F(A)|_Q \le 2|A_k|_Q + 2kn - 3. \tag{10}$$

The bound given by the corollary is not tight for relatively small values of k in the sense that, in practice, the size of the factor automaton does not depend on kn, the sum of the lengths of suffixes of length k, but rather on the number of states of A used for their representation, which for a minimal automaton can be substantially less. However, for large k, e.g., when all strings are of the same length and k is as long as the length of the strings accepted by A, our bound coincides with that of [2].



We have verified the above insights into factor automata in the context of a musicidentification system [9]. Music identification is the task of matching an audio

Proposition 3. Let A be a k-suffix-unique deterministic automaton accepting strings of length more than three and let n be the number of strings accepted by A. Then, the following bound holds for the number of states of the suffix automaton of A:

$$|S(A)|_Q \le 2|A_k|_Q + 2kn - 3, \tag{9}$$

where $A_k$ is the part of the automaton A obtained by removing the states and transitions of all suffixes of length k.

Proof. Let A be a k-suffix-unique deterministic automaton accepting strings of length more than three and let the alphabet Σ be augmented with n temporary symbols $\$_1, \ldots, \$_n$. By marking each string accepted by A with a distinct symbol $\$_i$, we can turn A into a suffix-unique deterministic automaton $A'$.

To do that, we first unfold all k-length suffixes of A. In the worst case, all these (distinct) suffixes share the same $(k-1)$-length suffix. Unfolding can thus increase the number of states of A by as many as $kn - n$ states. Marking the end of each suffix with a distinct symbol $\$_i$ further increases the size by n. The resulting automaton $A'$ is deterministic and $|A'|_Q \le |A_k|_Q + kn$. By Proposition 2, the size of the suffix automaton of $A'$ is bounded as follows: $|S(A')| \le 2|A'| - 3$. Since transitions labeled with a symbol $\$_i$ can only appear at the end of successful paths in $S(A')$, we can remove these transitions, make their origin states final, and minimize the resulting automaton to derive a deterministic automaton $A''$ accepting the set of suffixes of A. The statement of the proposition follows from the fact that $|A''| \le |S(A')|$. ∎
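Spelled out as a single chain, the last step of the proof reads as follows. This is only a restatement using the inequalities the proof already states, plus the fact that S(A), being the minimal deterministic automaton accepting the suffixes of A, is no larger than $A''$:

$$
\begin{aligned}
|S(A)|_Q \le |A''|_Q &\le |S(A')|_Q \\
&\le 2|A'|_Q - 3 \\
&\le 2\bigl(|A_k|_Q + kn\bigr) - 3 \\
&= 2|A_k|_Q + 2kn - 3.
\end{aligned}
$$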

Since the size of F(A) is always less than or equal to that of S(A), we directly obtain the following result.

Corollary 3. Let A be a k-suffix-unique automaton accepting strings of length more than three. Then, the following bound holds for the factor automaton of A:

$$|F(A)|_Q \le 2|A_k|_Q + 2kn - 3. \tag{10}$$

The bound given by the corollary is not tight for relatively small values of k, in the sense that, in practice, the size of the factor automaton does not depend on kn, the sum of the lengths of the suffixes of length k, but rather on the number of states of A used to represent them, which for a minimal automaton can be substantially smaller. However, for large k, e.g., when all strings have the same length and k equals that length, our bound coincides with that of [2].
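For the extreme case just mentioned, suppose every string accepted by A has the same length L and take k = L, so that kn = ‖U‖, writing U for the set of strings accepted by A and ‖U‖ for their total length. Assuming, as the state count $|A'|_Q \le |A_k|_Q + kn$ in the proof of Proposition 3 suggests, that $A_k$ then reduces to the initial state alone ($|A_k|_Q = 1$), Corollary 3 becomes:

$$|F(A)|_Q \le 2 \cdot 1 + 2\|U\| - 3 = 2\|U\| - 1,$$

which is the $2\|U\| - 1$ state bound of Blumer et al. [2].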



We have verified the above insights into factor automata in the context of a music identification system [9]. Music identification is the task of matching an audio

$$|S(U)|_E \le 3|A|_E - 4 \qquad |F(U)|_E \le 3|A|_E - 4$$

is tight for strings of length more than three and thus is also tight for automata accepting strings of length more than three. Note that the automaton of Figure 1 is suffix-unique, deterministic, and minimal and has $|A|_Q = 6$ states. The number of states of the minimal suffix automaton of A is $|S(A)|_Q = 7 < 2|A|_Q - 3$.

Corollary 1. Let A be a suffix-unique, deterministic, and minimal automaton accepting strings of length more than three. Then, the number of states of the factor automaton of A is bounded as follows:

$$|F(A)|_Q \le 2|A|_Q - 3. \tag{6}$$

Proof. As mentioned earlier, a factor automaton F(A) can be obtained from a suffix automaton S(A) by making all states final and applying minimization. Thus, $|F(A)| \le |S(A)|$. The result follows from Proposition 2. ∎
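The construction used in this proof can be tried on a single string. The sketch below is illustrative and is not the authors' implementation: it builds the suffix automaton of a string with the standard on-line algorithm (Blumer et al.; Crochemore), then makes all states final and minimizes with Moore's partition refinement, which stands in here for whatever minimization algorithm is used in practice. The function names `suffix_automaton` and `count_minimal_states` are hypothetical.

```python
def suffix_automaton(x):
    """Build the minimal suffix automaton S(x) of one string x with the
    standard on-line algorithm. Returns (trans, final): trans[q] maps each
    symbol to a target state, final is the set of final states."""
    trans, link, length = [{}], [-1], [0]   # state 0 is the initial state
    last = 0
    for a in x:
        cur = len(trans)
        trans.append({}); link.append(0); length.append(length[last] + 1)
        p = last
        # Add a-transitions to the new state along the suffix-link path.
        while p != -1 and a not in trans[p]:
            trans[p][a] = cur
            p = link[p]
        if p != -1:
            q = trans[p][a]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: clone it, then redirect a-transitions to the clone.
                clone = len(trans)
                trans.append(dict(trans[q])); link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and trans[p].get(a) == q:
                    trans[p][a] = clone
                    p = link[p]
                link[q] = link[cur] = clone
        last = cur
    # Final states lie on the suffix-link path from the last state.
    final, p = set(), last
    while p != -1:
        final.add(p)
        p = link[p]
    return trans, final


def count_minimal_states(trans, final):
    """Moore partition refinement on a partial DFA whose states are all
    accessible; a missing transition is treated as going to an implicit
    sink (-1). Returns the number of states after minimization."""
    alphabet = sorted({a for t in trans for a in t})
    block = [int(q in final) for q in range(len(trans))]
    while True:
        sigs = [(block[q],) + tuple(block[t[a]] if a in t else -1 for a in alphabet)
                for q, t in enumerate(trans)]
        ids = {s: i for i, s in enumerate(sorted(set(sigs)))}
        new_block = [ids[s] for s in sigs]
        if len(set(new_block)) == len(set(block)):   # partition is stable
            return len(set(new_block))
        block = new_block


trans, final = suffix_automaton("abbb")
n_suffix = count_minimal_states(trans, final)                    # |S(x)|_Q
n_factor = count_minimal_states(trans, set(range(len(trans))))   # |F(x)|_Q
print(n_suffix, n_factor)  # 7 5
```

A single string x is accepted by a suffix-unique, deterministic, minimal automaton with |x| + 1 states, so Proposition 2 allows at most $2(|x| + 1) - 3 = 2|x| - 1$ states in S(x). For x = abbb that limit (7) is attained, while making every state final lets minimization merge states, giving |F(x)|_Q = 5.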

Blumer et al. (1987) showed that an automaton accepting all factors of a set of strings U has at most $2\|U\| - 1$ states, where $\|U\|$ is the sum of the lengths of all strings in U. The following gives a significantly better bound on the size of the factor automaton of a set of strings U as a function of the number of nodes of a prefix-tree representing U, which is typically substantially smaller than $\|U\|$.

Corollary 2. Let $U = \{x_1, \ldots, x_m\}$ be a set of strings of length more than three and let A be a prefix-tree representing U. Then, the number of states of the factor automaton F(U) and that of the suffix automaton S(U) of the strings of U are bounded as follows:

$$|F(U)|_Q \le 2|A|_Q - 2, \qquad |S(U)|_Q \le 2|A|_Q - 2. \tag{7}$$

Proof. Let B be a prefix-tree representing the set $U' = \{x_1\$_1, \ldots, x_m\$_m\}$, obtained by appending to each string of U a new symbol $\$_i$, $i = 1, \ldots, m$, to make their suffixes distinct, and let $B'$ be the automaton obtained by minimization of B. By construction, B has m more states than A, but since all final states of B are equivalent and merged after minimization, $B'$ has at most one more state than A.

By construction, $B'$ is a suffix-unique automaton and, by Proposition 2, $|S(B')|_Q \le 2|B'|_Q - 3$. Removing from $S(B')$ the transitions labeled with the extra symbols $\$_i$ and connecting the resulting automaton yields the minimal suffix automaton S(U). In $S(B')$, there must be a final state reachable by the transitions labeled with $\$_i$ and only such transitions, which becomes non-accessible after removal of the extra symbols. Thus, S(U) has at least one state fewer than $S(B')$, which gives:

$$|S(U)|_Q \le |S(B')|_Q - 1 \le 2|B'|_Q - 4 \le 2|A|_Q - 2, \tag{8}$$

where the last step uses $|B'|_Q \le |A|_Q + 1$.

A similar bound holds for the factor automaton F(U), following the argument given in the proof of Corollary 1. ∎
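Corollary 2 can be checked by brute force on small sets. The sketch below is only a verification aid, not a construction algorithm: it counts the states of the minimal (trim) deterministic automaton of a finite language as the number of distinct nonempty right quotients $u^{-1}L$ (the Myhill–Nerode classes), takes the prefix-tree size $|A|_Q$ to be the number of distinct prefixes of the strings, and compares the counts with $2|A|_Q - 2$ and with the earlier $2\|U\| - 1$ bound of Blumer et al. All function names are hypothetical, and the enumeration is far too slow for real song collections.

```python
def factors(strings):
    """All factors (substrings) of the strings, including the empty string."""
    return {x[i:j] for x in strings for i in range(len(x) + 1) for j in range(i, len(x) + 1)}


def suffixes(strings):
    """All suffixes of the strings, including the empty string."""
    return {x[i:] for x in strings for i in range(len(x) + 1)}


def prefix_tree_states(strings):
    """|A|_Q for the prefix tree A of the strings: one state per distinct prefix."""
    return len({x[:i] for x in strings for i in range(len(x) + 1)})


def minimal_dfa_states(language):
    """States of the minimal trim DFA of a finite language: one per distinct
    right quotient u^{-1}L, for u ranging over prefixes of words in L."""
    prefixes = {w[:i] for w in language for i in range(len(w) + 1)}
    quotients = {frozenset(w[len(u):] for w in language if w.startswith(u)) for u in prefixes}
    return len(quotients)


U = ["abcd", "abce"]
tree = prefix_tree_states(U)                  # |A|_Q = 6
norm = sum(len(x) for x in U)                 # ||U|| = 8
print(minimal_dfa_states(factors(U)),         # |F(U)|_Q = 5
      minimal_dfa_states(suffixes(U)),        # |S(U)|_Q = 5
      2 * tree - 2,                           # Corollary 2 bound: 10
      2 * norm - 1)                           # Blumer et al. bound: 15
```

On this toy set the prefix-tree bound (10) is already noticeably smaller than the bound in terms of ‖U‖ (15); the gap grows with the amount of prefix sharing among the strings.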

When A is k-suffix-unique with a relatively small k, as in our applications of interest, the following proposition provides a convenient bound on the size of the suffix automaton.


$$|F(U)|_E \le 3\|U\| - 3$$

$$|S(A)|_E \le 2|A_k|_E + 3kn - 3k - 1 \qquad |F(A)|_E \le 2|A_k|_E + 3kn - 3k - 1$$


Music ID Experiments

• In our music ID application, we have

• Factor automaton size scales linearly with # of songs

[Fig. 6 plots: (a) Size vs. # Songs (0 to 16,000), series “# States factor”, “# Arcs factor”, “# States/Arcs Non-factor”; (b) Non-unique songs vs. k (suffix length, 0 to 45).]

Fig. 6. (a) Comparison of automaton sizes for different numbers of songs. “# States/Arcs Non-factor” is the size of the automaton A accepting the entire song transcriptions. “# States factor” and “# Arcs factor” are the number of states and transitions in the weighted factor acceptor $F_w(A)$, respectively. (b) Number of strings in U for which the suffix of length k is also a suffix of another string in U.


References

1. Cyril Allauzen, Mehryar Mohri, and Murat Saraclar. General Indexation of Weighted Automata – Application to Spoken Utterance Retrieval. In Proceedings of the Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval (HLT/NAACL 2004), pages 33–40, Boston, Massachusetts, May 2004.

2. A. Blumer, J. Blumer, A. Ehrenfeucht, D. Haussler, and R. McConnell. Complete inverted files for efficient text retrieval and analysis. Journal of the ACM, 34:578–589, 1987.

3. A. Blumer, J. Blumer, D. Haussler, A. Ehrenfeucht, M.T. Chen, and J. Seiferas. The smallest automaton recognizing the subwords of a text. Theoretical Computer Science, 40:31–55, 1985.

4. M. Crochemore. Transducers and repetitions. Theoretical Computer Science, 45:63–86, 1986.

5. M. Crochemore and W. Rytter. Jewels of Stringology. World Scientific, 2002.

6. Dan Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge, UK, 1997.

7. M. Mohri. Finite-state transducers in language and speech processing. Computational Linguistics, 23(2):269–311, 1997.

8. M. Mohri. Statistical Natural Language Processing. In M. Lothaire, editor, Applied Combinatorics on Words. Cambridge University Press, 2005.

9. E. Weinstein and P. Moreno. Music Identification with Weighted Finite-State Transducers. In Proceedings of ICASSP 2007, Honolulu, Hawaii, 2007.

$$|F(A)|_E \le 2.1|A|_E$$

[CIAA ’07, ISMIR ’07]

Music ID Experiments

• For 15,000+ songs, the transcription set is 45-suffix-unique

• Number of “collisions” among song suffixes/factors drops off rapidly with increasing length
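The collision count plotted in panel (b) of Fig. 6 can be computed directly from the transcriptions. The sketch below assumes, following that caption, that a set U is k-suffix-unique when no string in U shares its length-k suffix with another string in U. Transcriptions are modeled as tuples of hypothetical phone labels; a string shorter than k contributes its whole content, since Python slicing `x[-k:]` returns the entire sequence in that case. The function names are hypothetical.

```python
from collections import Counter


def non_unique_count(U, k):
    """Number of strings in U whose length-k suffix is also the suffix of
    another string in U (the quantity plotted in Fig. 6(b)); k >= 1."""
    counts = Counter(tuple(x[-k:]) for x in U)
    return sum(1 for x in U if counts[tuple(x[-k:])] > 1)


def smallest_unique_k(U, max_k):
    """Smallest k in 1..max_k for which U is k-suffix-unique, or None."""
    for k in range(1, max_k + 1):
        if non_unique_count(U, k) == 0:
            return k
    return None


# Toy transcriptions over hypothetical music-phone labels.
U = [("p1", "p7", "p3", "p9"),
     ("p4", "p2", "p3", "p9"),
     ("p5", "p7", "p8", "p9")]
print([non_unique_count(U, k) for k in range(1, 5)])  # [3, 2, 0, 0]
print(smallest_unique_k(U, 4))                        # 3
```

On the toy data all three strings collide at k = 1, two still collide at k = 2, and the set becomes 3-suffix-unique; a single hashing pass per k keeps the count linear in the total number of suffix symbols examined.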


[Figure 3 plot: Non-unique Factors (0 to 50,000) vs. Factor Length (0 to 120).]

Figure 3. Number of factors occurring in more than one song in S for different factor lengths.

dorsement should be inferred.

7 REFERENCES

[1] E. Batlle, J. Masip, and E. Guaus. Automatic songidentification in noisy broadcast audio. In IASTEDInternational Conference on Signal and Image Pro-cessing, Kauai, Hawaii, 2002.

[2] P. Cano, E. Batlle, T. Kalker, and J. Haitsma. A re-view of audio fingerprinting. Journal of VLSI SignalProcessing Systems, 41:271–284, 2005.

[3] C. Cortes and V. Vapnik. Support-vector networks.Machine Learning, 20(3):273–297, 1995.

[4] M. Covell and S. Baluja. Audio fingerprinting:Combining computer vision & data stream process-ing. In International Conference on Acoustics,Speech, and Signal Processing (ICASSP), Honolulu,Hawaii, 2007.

[5] J. Haitsma, T. Kalker, and J. Oostveen. Robust au-dio hashing for content identification. In Content-Based Multimedia Indexing (CBMI), Brescia, Italy,September 2001.

[6] Y. Ke, D. Hoiem, and R. Sukthankar. Computer vi-sion for music identification. In IEEE Computer So-ciety Conference on Computer Vision and PatternRecognition (CVPR), pages 597–604, San Diego,June 2005.

[7] M.Bacchiani and M. Ostendorf. Joint lexicon,acoustic unit inventory and model design. SpeechCommunication, 29:99–114, November 1999.

[8] M. Mohri. Finite-state transducers in languageand speech processing. Computational Linguistics,23(2):269–311, 1997.

[9] M. Mohri. Statistical Natural Language Processing.In M. Lothaire, editor, Applied Combinatorics onWords. Cambridge University Press, 2005.

[10] M. Mohri, F. C. N. Pereira, and M. Riley.Weighted Finite-State Transducers in Speech Recog-nition. Computer Speech and Language, 16(1):69–88, 2002.

[11] Mehryar Mohri, Pedro Moreno, and Eugene Wein-stein. Factor automata of automata and applications.submitted, 2007.

[12] A. Park and T.J. Hazen. ASR dependent techniquesfor speaker identification. In International Confer-ence on Spoken Language Processing (ICSLP), Den-ver, Colorado, September 2002.

[13] D. Pye. Content-based methods for the managementof digital music. In ICASSP, pages 2437–2440, Is-tanbul, Turkey, June 2000.

[14] A. L. Wang. An industrial-strength audio search al-gorithm. In International Conference on Music In-formation Retrieval (ISMIR), Washington, DC, Oc-tober 2003.

[15] E. Weinstein and P. Moreno. Music identificationwith weighted finite-state transducers. In Interna-tional Conference on Acoustics, Speech, and SignalProcessing (ICASSP), Honolulu, Hawaii, 2007.

31

Automata Summary

• We have addressed the size of the factor automaton of a set of strings or, more generally, of an automaton

• We have proven size bounds substantially better than the previously known bounds of Blumer et al.

• This suggests factor automata are useful for indexing potentially very large sets of strings

• Our conclusions are verified experimentally in our music identification system


Future/Ongoing Work

• More experiments: test accuracy in the presence of different kinds of noise and distortion

• Analyze song structure

• Find repeated phone sequences: chorus detection, etc.

• Find common sequences between songs

• Work on an on-line, linear-time algorithm for suffix/factor automaton construction

• Do a finer theoretical analysis

• Get rid of the kn term in the k-suffix-unique bound


References

• E. Weinstein and P. Moreno. Music Identification with Weighted Finite-State Transducers. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, Hawaii, 2007.

• M. Mohri, P. Moreno, and E. Weinstein. Factor Automata of Automata and Applications. To appear at the International Conference on Implementation and Application of Automata (CIAA), July 2007, Prague, Czech Republic.

• M. Mohri, P. Moreno, and E. Weinstein. Music identification, detection, and analysis in adverse conditions. To appear at the International Conference on Music Information Retrieval (ISMIR), September 2007, Vienna, Austria.

• M. Bacchiani and M. Ostendorf. Joint lexicon, acoustic unit inventory and model design. Speech Communication, 29:99–114, November 1999.

• E. Batlle, J. Masip, and E. Guaus. Automatic song identification in noisy broadcast audio. In IASTED International Conference on Signal and Image Processing, Kauai, Hawaii, 2002.

• A. Blumer, J. Blumer, A. Ehrenfeucht, D. Haussler, and R. McConnell. Complete inverted files for efficient text retrieval and analysis. Journal of the ACM, 34:578–589, 1987.

• A. Blumer, J. Blumer, D. Haussler, A. Ehrenfeucht, M.T. Chen, and J. Seiferas. The smallest automaton recognizing the subwords of a text. Theoretical Computer Science, 40:31–55, 1985.

• M. Crochemore. Transducers and repetitions. Theoretical Computer Science, 45(1):63–86, 1986.

• M. Fink, M. Covell, and S. Baluja. Social and interactive television application based on real time ambient audio identification. EuroITV 2006, May 2006.

• J. Haitsma, T. Kalker, and J. Oostveen. Robust audio hashing for content identification. In Content-Based Multimedia Indexing (CBMI), Brescia, Italy, September 2001.

• M. Mohri. Finite-state transducers in language and speech processing. Computational Linguistics, 23(2):269–311, 1997.

• M. Mohri. Statistical Natural Language Processing. In M. Lothaire, editor, Applied Combinatorics on Words. Cambridge University Press, 2005.

• M. Mohri, F. C. N. Pereira, and M. Riley. Weighted Finite-State Transducers in Speech Recognition. Computer Speech and Language, 16(1):69–88, 2002.


The End

Thank You!
