Spring-2008 MMDB-Audio 1 Audio Databases


Spring-2008 MMDB-Audio 1

Audio Databases

Spring-2008 MMDB-Audio 2

Metadata

Audio content is represented with metadata in much the same way as video.

The metadata used to represent audio content may be viewed as a set of objects spread out over a timeline.

We may index the metadata associated with audio in exactly the same way as we indexed video, and the same query-processing techniques may be used over again.

Spring-2008 MMDB-Audio 3

Example: The following figure shows the line segments associated with part of an opera. Activity1 may be Act 1 of the opera, activity2 may be Act 1, Scene 1, and so on.

Spring-2008 MMDB-Audio 4

Example (cont.): Each activity may have an associated set of fields.

Singers: It may be a set-valued field containing records having a Role, SingerType and SingerName. If the triple (Lohengrin, Tenor, Rene Kollo) appears in the segment [50, 100), then Rene Kollo, a tenor, is singing the role of Lohengrin during the time segment [50, 100) of the opera.

Score: It may be a field of type music_doc which points to a relevant part of the music score associated with the time segment [50, 100).

Transcript: It may be a field of type document that points to the relevant part of the libretto during the time segment [50, 100).
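The activity fields above can be sketched as objects attached to time segments. This is only an illustration: the `Activity` class, the `singers_at` helper, and the bounds chosen for "Act 1" are assumptions made for the example, not part of the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Activity:
    name: str
    start: int   # inclusive start of the time segment
    end: int     # exclusive end, so the segment is [start, end)
    singers: list = field(default_factory=list)  # (role, singer_type, singer_name)

def singers_at(activities, t):
    """Return every (role, type, name) triple whose segment contains time t."""
    return [s for a in activities if a.start <= t < a.end for s in a.singers]

# Hypothetical timeline: only the [50, 100) segment comes from the slide.
opera = [
    Activity("Act 1", 0, 300),
    Activity("Act 1, Scene 1", 50, 100, [("Lohengrin", "Tenor", "Rene Kollo")]),
]

print(singers_at(opera, 75))   # time 75 falls inside [50, 100)
print(singers_at(opera, 100))  # 100 is outside the half-open segment
```

A Score or Transcript field could be added to `Activity` in the same way, as a reference to a document.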

Spring-2008 MMDB-Audio 5

Signal-Based Audio Content

In some applications, creation of metadata is somewhat complex, for example when the speaker is unknown or the content is unclear.

Audio data is treated as a signal over time x.

Different features of the signal are extracted, indexed and stored for efficient retrieval.

Metadata may still be used to complement the signal data.

Spring-2008 MMDB-Audio 6

Sample Audio Signals

Spring-2008 MMDB-Audio 7

Signal

Period of vibration, T = time taken for a “particle” in the wave to return to its starting position, e.g., from point A to point B.

Frequency of vibration, f = number of vibrations per second. f = 1/T.

Velocity, v = the speed at which the crests and troughs move to the right. v = w/T = w f, where w denotes the wavelength of the wave.

Amplitude, a = the maximum intensity of the signal associated with the wave.

Spring-2008 MMDB-Audio 8

Indexing by Segmentation

Split up the audio signal into relatively homogeneous “windows.” This may be done in one of two ways:

The application developer can specify, a priori, a window size w (in sec. or min.), and assume that the wave’s properties within that window are obtained by averaging.

Use a homogeneity predicate as in the case of images, except that this homogeneity predicate applies to the one-dimensional case.
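The first option, a fixed window size with averaged properties, might look like the sketch below. The function name, the use of mean absolute amplitude as the averaged property, and the sample values are assumptions chosen for illustration.

```python
def fixed_windows(samples, sample_rate, window_sec):
    """Split samples into fixed-size windows and average each one.

    Returns (start_time_sec, duration_sec, mean_absolute_amplitude) per window.
    """
    size = max(1, int(sample_rate * window_sec))  # samples per window
    windows = []
    for start in range(0, len(samples), size):
        chunk = samples[start:start + size]
        mean_abs = sum(abs(x) for x in chunk) / len(chunk)
        windows.append((start / sample_rate, len(chunk) / sample_rate, mean_abs))
    return windows

# Hypothetical signal: a quiet second followed by a louder second, 4 samples/sec.
signal = [1, -1, 1, -1, 3, -3, 3, -3]
for window in fixed_windows(signal, sample_rate=4, window_sec=1.0):
    print(window)
```

The second option would replace the fixed `size` with a predicate that decides where one homogeneous stretch ends and the next begins.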

Spring-2008 MMDB-Audio 9

Windowing

The following figure shows a nonhomogeneous audio signal. After it is split into five windows, each window is homogeneous in the sense that it has a constant amplitude, wavelength, and wave velocity.

Spring-2008 MMDB-Audio 10

Indexing Using Feature Extraction

After segmentation, the audio signal may be viewed as a sequence of n windows, w1, …, wn.

For each window, we extract some features associated with the audio signal.

If k features are extracted, then an audio signal may be considered to be a sequence of n points in a k-dimensional space.

Spring-2008 MMDB-Audio 11

Example Features

Intensity (I): the power of the signal generated by the wave (in watts per square meter).

$$I = 2\pi^2 f^2 a^2 \rho v$$

where $\rho$ is the density of the material through which the sound is being propagated.

Loudness (L):

$$L = 10 \log\left(\frac{I}{L_0}\right)$$

where $L_0$ denotes the loudness at the lowest frequency (about 15 Hz) that a human ear can detect.
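As a rough illustration of the two features, the sketch below evaluates intensity as 2π²f²a²ρv and loudness as 10·log(I/L0). The function names and sample values are hypothetical, and the base-10 logarithm is an assumption (the usual decibel convention); the slide does not state the base.

```python
import math

def intensity(freq_hz, amplitude, density, velocity):
    """Intensity of the wave: 2 * pi^2 * f^2 * a^2 * rho * v."""
    return 2 * math.pi ** 2 * freq_hz ** 2 * amplitude ** 2 * density * velocity

def loudness(i, l0):
    """Loudness relative to the reference level l0 (base-10 log assumed)."""
    return 10 * math.log10(i / l0)

# Hypothetical values: an intensity 100 times the reference level.
print(loudness(1e-10, 1e-12))
```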

Spring-2008 MMDB-Audio 12

Content Index

In general, to index the content of an audio signal, we proceed with the following two steps:

1. Find a set w1, …, wn of window segments.

2. For each window wi, store a vector consisting of K acoustical attributes.

An audio database may be viewed as a set of (K+3)-tuples consisting of the audio source (audio file), the window (within that audio file), the duration of the window, and the K feature values associated with that window.

A k-d tree can be used to index audio data.
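The (K+3)-tuple view can be sketched as follows. To keep the example short, the lookup is a plain linear scan standing in for the k-d tree; the file name, the feature values, and both function names are hypothetical.

```python
def build_index(source, windows):
    """Turn (start_sec, duration_sec, features) windows into (K+3)-tuples.

    Each tuple is (audio source, window start, window duration, f1, ..., fK).
    """
    return [(source, start, dur, *features) for start, dur, features in windows]

def nearest(index, query):
    """Return the tuple whose K feature values are closest to the query.

    Linear scan used here in place of a k-d tree.
    """
    def dist2(row):
        return sum((a - b) ** 2 for a, b in zip(row[3:], query))
    return min(index, key=dist2)

# Hypothetical audio file with two windows and K = 2 features each.
index = build_index("opera.wav", [(0.0, 1.0, (1.0, 2.0)), (1.0, 1.0, (5.0, 5.0))])
print(nearest(index, (4.0, 4.0)))
```

A k-d tree would answer the same nearest-window question without visiting every tuple.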

Spring-2008 MMDB-Audio 13

Content-based Retrieval for Music Databases

Spring-2008 MMDB-Audio 14

Introduction

The management of large collections of music data in a multimedia database has received much attention in the past few years.

For music content-based retrieval, we can extract the features, such as melodies, rhythms and chords, from the music data and develop indices that will help to retrieve the relevant music data quickly.

Spring-2008 MMDB-Audio 15

Music Feature String

Ex: “sol-do-re-mi-mi-mi-mi-re-mi-do-do”

Melody feature string: eabccccbaa

Rhythm string: 1-1-1-2-2-1-1-1-1-2-2

Music feature string: e1a1b1c2c2c1c1b1a2a2

A sample of “You Are My Sunshine”

Spring-2008 MMDB-Audio 16

Features of Music Data

Coding scheme:

a music object is coded as a sequence of music segments

music segment = (segment type, segment duration, segment pitch)

four segment types: ┌┐ (type A), └┘ (type B), ┌┘ (type C), and └┐ (type D)

Spring-2008 MMDB-Audio 17

Features of Music Data

For example,

the sequence of music segments: (B,3,-3) (A,1,+1) (D,3,-3) (B,1,-2) (C,1,+2) (C,1,+2) (C,1,+1)

Spring-2008 MMDB-Audio 18

[Figure: the melody plotted as note number (60, 62, 64, 65, 67) against beat, with each music segment = (type, duration, pitch) labeled in order: (B, 3, -3), (A, 1, +1), (D, 3, -3), (B, 1, -2), (C, 1, +2), (C, 1, +2), (C, 1, +1).]

Spring-2008 MMDB-Audio 19

Music Data Retrieval: System Architecture

Spring-2008 MMDB-Audio 20

Indexing

String indexing for music data: suffix tree

Numeric indexing for music data: R-tree

Spring-2008 MMDB-Audio 21

Suffix tree

A suffix tree is an index structure that has been proposed to locate strings that are exactly matched to a target string.

No two edges out of a node can have edge-labels beginning with the same character.

For any leaf i, the concatenation of the edge-labels on the path from the root to leaf i exactly spells out the suffix of the string that starts at position i.
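The idea of locating exact matches through suffixes can be illustrated with a small sketch. Note the simplification: this builds an uncompressed suffix trie (one character per edge) rather than a true suffix tree with multi-character edge labels, and all names are hypothetical.

```python
def new_node():
    return {"children": {}, "starts": []}

def build_suffix_trie(s):
    """Insert every suffix of s; each node records the 1-based start
    positions of the suffixes that pass through it."""
    root = new_node()
    for start in range(len(s)):
        node = root
        for ch in s[start:]:
            node = node["children"].setdefault(ch, new_node())
            node["starts"].append(start + 1)
    return root

def find(root, pattern):
    """Return the 1-based positions where pattern occurs exactly."""
    node = root
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []
    return node["starts"]

trie = build_suffix_trie("ababc")
print(find(trie, "ab"))   # [1, 3]
print(find(trie, "abc"))  # [3]
```

Because no two children of a node share a first character, a lookup follows at most one path from the root.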

Spring-2008 MMDB-Audio 22

Ex: ababc, with suffixes {ababc, babc, abc, bc, c}

[Figure: step-by-step construction of the suffix tree for “ababc”, inserting the suffixes that start at positions 1 to 5 one at a time.]

Spring-2008 MMDB-Audio 23

Ex: “Do Re Do Re Mi” → ababc

[Figure: the resulting suffix tree for “ababc”, with leaves 1 to 5.]

Spring-2008 MMDB-Audio 24

Numeric Mapping

Numeric Mapping Function

$$v(m) = \sum_{i=1}^{m} P(x_i)\, n^{i-1}$$

v(m): the integer value of a segment of m adjacent notes

m: the number of adjacent notes taken from the melody feature string

P(x_i): the integer value of each note, 1 ≤ i ≤ m

n: the number of pitches

Spring-2008 MMDB-Audio 25

Numeric Mapping (cont.)

For example: a music feature string denoted by ‘bcdbc’, n=10, m=4

b c d b → (1, 2, 3, 1): 1*10^0 + 2*10^1 + 3*10^2 + 1*10^3 = 1321

c d b c → (2, 3, 1, 2): 2*10^0 + 3*10^1 + 1*10^2 + 2*10^3 = 2132

Spring-2008 MMDB-Audio 26

Example:

two tigers (S1: Do Re Mi Do Do Re Mi Do)

Melody     Music segment   V(4)                              Integer value
abcaabca   abca            0*10^0+1*10^1+2*10^2+0*10^3       210
           bcaa            1*10^0+2*10^1+0*10^2+0*10^3       21
           caab            2*10^0+0*10^1+0*10^2+1*10^3       1002
           aabc            0*10^0+0*10^1+1*10^2+2*10^3       2100
           abca            0*10^0+1*10^1+2*10^2+0*10^3       210

The integer values of the music segments of “Two Tigers”.
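The numeric mapping is easy to check mechanically. The sketch below assumes notes are coded as letters with a = 0, b = 1, and so on, as in the examples; the function names are hypothetical.

```python
def note_value(ch):
    """Integer value P(x) of a note letter: a -> 0, b -> 1, c -> 2, ..."""
    return ord(ch) - ord("a")

def v(segment, n=10):
    """Numeric mapping: sum of P(x_i) * n^(i-1) over the notes of the segment."""
    return sum(note_value(ch) * n ** i for i, ch in enumerate(segment))

def segment_values(melody, m=4, n=10):
    """Values of every run of m adjacent notes in the melody string."""
    return [v(melody[i:i + m], n) for i in range(len(melody) - m + 1)]

print(segment_values("bcdbc"))     # [1321, 2132]
print(segment_values("abcaabca"))  # [210, 21, 1002, 2100, 210]
```

With n at least as large as the number of distinct pitches, each segment maps to a distinct integer, which is what lets a numeric index stand in for string comparison.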

Spring-2008 MMDB-Audio 27

Numeric Indexing Structure (R-Tree)

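[Figure: R-tree over the integer values of S1. The non-leaf node holds the range (21, 2110); the leaf node holds the keys 21, 210, 1002, 2110; each key points to a linked list of (song, position) entries such as (s1, 2), terminated by NULL.]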

Spring-2008 MMDB-Audio 28

Pitch Change

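abca → bcdb: both have the pitch changes 1, 1, -2

$$c(m) = \sum_{x=1}^{m-1} \bigl(P(x+1) - P(x) + Adj\bigr)\, D^{x-1}$$

m: the number of adjacent notes taken from the melody feature string

Adj: the maximum value of the distance between two pitches

D: the total number of distances between pitches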

Spring-2008 MMDB-Audio 29

Example: “abcaabca”. Suppose n=10, Adj=9, D=19, with segments of m=4 notes.

Music segment   Value (pitch change + Adj)   V(4)                        Integer value
0120            10, 10, 7                    10*19^0+10*19^1+7*19^2      2727
1200            10, 7, 9                     10*19^0+7*19^1+9*19^2       3392
2001            7, 9, 10                     7*19^0+9*19^1+10*19^2       3788
0012            9, 10, 10                    9*19^0+10*19^1+10*19^2      3809
0120            10, 10, 7                    10*19^0+10*19^1+7*19^2      2727
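The pitch-change mapping can be sketched in a few lines. It assumes the same letter coding as before (a = 0, b = 1, ...) and the values Adj = 9 and D = 19 from the example; the function name is hypothetical.

```python
def pitch_change_value(segment, adj=9, d=19):
    """Sum of (P(x+1) - P(x) + Adj) * D^(x-1) over adjacent note pairs."""
    p = [ord(ch) - ord("a") for ch in segment]
    return sum((p[x + 1] - p[x] + adj) * d ** x for x in range(len(p) - 1))

melody = "abcaabca"
for i in range(len(melody) - 3):
    print(melody[i:i + 4], pitch_change_value(melody[i:i + 4]))

# Transposed segments share the same pitch changes, hence the same value.
print(pitch_change_value("abca") == pitch_change_value("bcdb"))
```

Adding Adj keeps every term non-negative, so the digits of the base-D number never borrow from each other.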

Spring-2008 MMDB-Audio 30

Numeric Index

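[Figure: numeric index over the pitch-change values of songs s1 and s2. Non-leaf entries cover the ranges (2726, 2727) and (3392, 3809); each leaf key points to a linked list of (song, position) entries, terminated by NULL.]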

Spring-2008 MMDB-Audio 31

Searching in Numeric Index

Exact Matching

For example: Music query segment is ‘ccdbb’

{ccdbb} → {ccdb}, V(4) = 1322

{ccdbb} → {cdbb}, V(4) = 1132

Spring-2008 MMDB-Audio 32

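[Figure: R-tree over songs s1, s2 and s3, with non-leaf ranges (21, 1002), (1132, 1322), (2110, 3224) and leaf keys 21, 210, 1002, 1132, 1321, 1322, 2110, 2132, 3224; each key points to a linked list of (song, position) entries, terminated by NULL. The query follows the range (1132, 1322).]

Key 1132: (s2, 3), (s3, 4)

Key 1322: (s2, 2), (s3, 1)

Both keys return {s2, s3}, so the candidate set is {s2, s3}. The positions are (2, 3) in s2 and (1, 4) in s3. Only s2 holds the two segments at consecutive positions, so the answer is s2.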
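The exact-matching procedure (split the query into m-note segments, look each value up, keep only songs where the hits line up at consecutive positions) can be sketched as below. A plain dictionary stands in for the R-tree, and the two songs are hypothetical melodies made up for the example, not the s2 and s3 of the figure.

```python
from collections import defaultdict

def v(segment, n=10):
    """Numeric mapping with a -> 0, b -> 1, ..."""
    return sum((ord(ch) - ord("a")) * n ** i for i, ch in enumerate(segment))

def build_index(songs, m=4, n=10):
    """Map each segment value to a list of (song id, 1-based position)."""
    index = defaultdict(list)
    for song_id, melody in songs.items():
        for pos in range(len(melody) - m + 1):
            index[v(melody[pos:pos + m], n)].append((song_id, pos + 1))
    return index

def exact_match(index, query, m=4, n=10):
    """Return (song id, start position) pairs where the whole query occurs."""
    hits = None
    for offset in range(len(query) - m + 1):
        key = v(query[offset:offset + m], n)
        # Shift each hit back by the offset so aligned hits share a start.
        starts = {(sid, pos - offset) for sid, pos in index.get(key, [])}
        hits = starts if hits is None else hits & starts
    return sorted(hits or [])

# Hypothetical songs: both contain 'ccdb' and 'cdbb', only s2 has them adjacent.
songs = {"s2": "accdbb", "s3": "ccdbacdbb"}
index = build_index(songs)
print(exact_match(index, "ccdbb"))  # [('s2', 2)]
```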

Spring-2008 MMDB-Audio 33

Approximate Searching

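We can examine the difference d between the transformed value of the query string and that of the existing data. A data segment is an approximate match when d satisfies one of the following conditions:

$d \le h\,n^{0}$

$d \le h\,n^{1}$ and d is a multiple of $n^{1}$

$d \le h\,n^{2}$ and d is a multiple of $n^{2}$

…

$d \le h\,n^{m-1}$ and d is a multiple of $n^{m-1}$

• n: the number of pitches

• m: the number of adjacent notes taken from the melody feature string

• h: the distance between two pitches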

Spring-2008 MMDB-Audio 34

Example:

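Approximate matching conditions on the difference d between two transformed values, for m=4, n=10, h=1:

1) d ≤ 1*10^0

2) d ≤ 1*10^1 and d is a multiple of 10^1

3) d ≤ 1*10^2 and d is a multiple of 10^2

4) d ≤ 1*10^3 and d is a multiple of 10^3

Ex: “bbcd” has note values 1 1 2 3 and transformed value 3211; “abcd” has note values 0 1 2 3 and transformed value 3210. The difference is 1, which satisfies condition 1.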
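The conditions translate directly into a small predicate. This is a sketch of the listed conditions only, with a hypothetical function name; it assumes the two values come from the numeric mapping with the same m and n.

```python
def approx_match(query_value, data_value, m=4, n=10, h=1):
    """True if the difference d satisfies d <= h * n^k and d is a multiple
    of n^k for some k in 0 .. m-1."""
    d = abs(query_value - data_value)
    return any(d <= h * n ** k and d % n ** k == 0 for k in range(m))

print(approx_match(3211, 3210))  # 'bbcd' vs 'abcd': first note differs by one pitch
print(approx_match(1321, 1331))  # second note differs by one pitch
print(approx_match(1321, 1326))  # difference of 5 satisfies none of the conditions
```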

Spring-2008 MMDB-Audio 35

Multi-Feature indexing

• Combined Suffix Tree

• Independent Suffix Tree

• Twin Suffix Tree

• Grid-Twin Suffix Tree

• Numeric Index

• Hybrid Multi-feature Index

Spring-2008 MMDB-Audio 36

Combined Suffix Tree

Ex: “a1a2b1” → {12, 7}

“121” → {12, 7, 1, 6…}

In the Combined Suffix Tree, the music feature strings are used directly to construct the index.

Spring-2008 MMDB-Audio 37

Independent Suffix Tree

constructed from “a1b2a1b2c2”

(Melody:ababc) (Rhythm:12122)

The Independent Suffix Tree approach separates each feature string into a melody string and a rhythm string and stores them in two independent suffix trees.
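The separation step can be illustrated as follows, assuming each note in the feature string is exactly one melody letter followed by one rhythm digit, as in the example. The function name is hypothetical.

```python
def split_features(feature_string):
    """Split an interleaved feature string into (melody, rhythm)."""
    melody = feature_string[0::2]  # characters at even offsets: note letters
    rhythm = feature_string[1::2]  # characters at odd offsets: durations
    return melody, rhythm

print(split_features("a1b2a1b2c2"))  # ('ababc', '12122')
```

Each of the two strings would then be inserted into its own suffix tree.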

Spring-2008 MMDB-Audio 39

Twin Suffix Tree

The Twin Suffix Tree constructed from “a1b2a2b1a2b2c2”

Spring-2008 MMDB-Audio 41

Grid-Twin Suffix Tree

“a1b2a2c1a3”

Spring-2008 MMDB-Audio 42

Condensed Grid-Twin Suffix Tree

Spring-2008 MMDB-Audio 43

[Figure: condensed grid-twin suffix tree for “abaca” and “caaca”. A 3×3 grid holds the cells aa, ab, ac, ba, bb, bc, ca, cb, cc; each entry stores coordinate data and a song (music) ID.]

Spring-2008 MMDB-Audio 44

Multi-Feature Numeric Indexing for Music Data

[Figure: two-dimensional index space with a melody axis and a rhythm axis, each running from 0 to 1000. Example: “a1b1c1a1”.]

Spring-2008 MMDB-Audio 45

Multi-Feature Numeric Indexing for Music Data

[Figure: index structure with non-leaf nodes, leaf nodes, and linked lists.]

Spring-2008 MMDB-Audio 46

Multi-Feature Numeric Indexing for Music Data

[Figure: three-dimensional index space with melody, rhythm, and chord axes, each marked at 500.]

Spring-2008 MMDB-Audio 47

Hybrid Multi-Feature Index

Uses a multi-feature tree structure instead of the grid structure in GTST.

[Figure: multi-feature tree over the points (2, 3), (3, 5), (6, 2), (5, 5), (4, 3.75), (1, 1), (1.5, 2).]

Spring-2008 MMDB-Audio 48

Suffix Trees with Bit Arrays

Instead of links between corresponding feature nodes, as in the Twin Suffix Tree, bit arrays are created to indicate the relationships between the suffix trees.

Spring-2008 MMDB-Audio 49

Feature Extraction of Music Data

A sequence of notes that appears more than once in a music object is called a repeating pattern.

Much research in musicology and music psychology agrees that repeating patterns are one of the general features in music structure modeling.

Spring-2008 MMDB-Audio 50

Repeating Patterns of Music Data

Repeating pattern: a substring of string S that appears more than once and whose length is equal to or greater than 2.

Non-trivial repeating pattern: a repeating pattern X is non-trivial if no other repeating pattern contains X as a substring and has the same frequency in S.

Fault tolerant non-trivial repeating pattern: sequences that differ in only some of their notes are allowed to count as occurrences of the same non-trivial repeating pattern.

Spring-2008 MMDB-Audio 51

Example:

Consider the melody string “C-D-E-F-C-D-E-C-D-E-F”. This melody string has ten repeating patterns:

RP     C-D-E-F   C-D-E   D-E-F   C-D   D-E   E-F   C   D   E   F
Freq   2         3       2       3     3     2     3   3   3   2

Non-trivial:

freq(“C-D-E-F”) = freq(“D-E-F”) = freq(“E-F”) = freq(“F”) = 2

freq(“C-D-E”) = freq(“C-D”) = freq(“D-E”) = freq(“C”) = freq(“D”) = freq(“E”) = 3

===> only “C-D-E-F” and “C-D-E” are non-trivial.
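A brute-force check of this example is sketched below. It is not one of the extraction methods from these slides: it simply counts every substring (so it is quadratic in the number of substrings) and then drops any pattern that sits inside a longer pattern of equal frequency. Single-note patterns are counted, as in the example table, and the function names are hypothetical.

```python
from collections import Counter

def repeating_patterns(s):
    """All substrings that occur more than once, with their frequencies.
    Overlapping occurrences are counted."""
    counts = Counter(s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1))
    return {p: c for p, c in counts.items() if c >= 2}

def non_trivial(patterns):
    """Keep patterns not contained in another pattern of the same frequency."""
    return {
        p: f for p, f in patterns.items()
        if not any(p != q and p in q and f == g for q, g in patterns.items())
    }

melody = "C-D-E-F-C-D-E-C-D-E-F".replace("-", "")
rps = repeating_patterns(melody)
print(len(rps))          # 10
print(non_trivial(rps))  # {'CDEF': 2, 'CDE': 3}
```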

Spring-2008 MMDB-Audio 52

Music Feature Extractions

• Correlative Matrix

• FastPET

• RP-Tree

• 2RC

• Similar Non-trivial Repeating Pattern

• Fault Tolerant Non-trivial Repeating Patterns

Spring-2008 MMDB-Audio 53

CORRELATIVE MATRIX

The correlative matrix T of the string S = “CAACCAACDCBC”:

       C  A  A  C  C  A  A  C  D  C  B  C
  C    -  0  0  1  1  0  0  1  0  1  0  1
  A       -  1  0  0  2  1  0  0  0  0  0
  A          -  0  0  1  3  0  0  0  0  0
  C             -  1  0  0  4  0  1  0  1
  C                -  0  0  1  0  1  0  1
  A                   -  1  0  0  0  0  0
  A                      -  0  0  0  0  0
  C                         -  0  1  0  1
  D                            -  0  0  0
  C                               -  0  1
  B                                  -  0
  C                                     -

CS = candidate set ==> CS(pattern, rep_count, sub_count)

There are four cases for setting CS:

1. Ti,j = 1 and T(i+1),(j+1) = 0. Ex: T1,4 = 1 and T2,5 = 0 ---> insert CS = ("C",1,0)

2. Ti,j = 1 and T(i+1),(j+1) ≠ 0. Ex: T1,5 = 1 and T2,6 ≠ 0 ---> modify to CS = ("C",2,1)

3. Ti,j > 1 and T(i+1),(j+1) ≠ 0. Ex: T2,6 = 2 and T3,7 ≠ 0 ---> insert CS = ("CA",1,1), ("A",1,1)

4. Ti,j > 1 and T(i+1),(j+1) = 0. Ex: T4,8 = 4 and T5,9 = 0 ---> insert CS = ("CAAC",1,0), ("AAC",1,1), ("AC",1,1); change ("C",6,1) into ("C",7,2)

Spring-2008 MMDB-Audio 54

CORRELATIVE MATRIX (cont.)

There are two more tasks we have to do:

1. If a repeating pattern is a substring of another repeating pattern and their repeating frequencies are the same, it is removed from the candidate set CS.

EX: ("CA",1,1), ("CAA",1,1), ("AA",1,1), ("AAC",1,1) and ("AC",1,1) are removed, since they are all substrings of the repeating pattern ("CAAC",1,0).

2. We should calculate the real repeating frequency f for every repeating pattern found:

$$\text{rep\_count} = \frac{f(f-1)}{2}, \qquad f = \frac{1 + \sqrt{1 + 8 \cdot \text{rep\_count}}}{2}$$

EX: for "C", rep_count = 15, so

$$f = \frac{1 + \sqrt{1 + 8 \cdot 15}}{2} = 6$$

Spring-2008 MMDB-Audio 55

RP-TREE

The RP-tree for the music feature string S=“ABCDEFGHABCDEFGHIJABC”

{ABCDEFGH,2,(1,9)}

{ABCD,2,(1,9)} {BCDE,2,(2,10)} {CDEF,2,(3,11)} {DEFG,2,(4,12)} {EFGH,2,(5,13)}

{AB,3,(1,9,19)} {BC,3,(2,10,20)} {CD,2,(3,11)} {DE,2,(4,12)} {EF,2,(5,13)} {FG,2,(6,14)} {GH,2,(7,15)}

{A,3,(1,9,19)} {B,3,(2,10,20)} {C,3,(3,11,21)} {D,2,(4,12)} {E,2,(5,13)} {F,2,(6,14)} {G,2,(7,15)} {H,2,(8,16)}

Spring-2008 MMDB-Audio 56

RP-TREE (cont.)

[Figure: three panels of RP-tree nodes.]

(a) {ABCDEFGH,2,(1,9)}; {AB,3,(1,9,19)}; {BC,3,(2,10,20)}

(b) {AB,3,(1,9,19)}; {BC,3,(2,10,20)}; {ABCDEFGH,2,(1,9)}; {ABC,3,(1,9,19)}

(c) {ABCDEFGH,2,(1,9)}; {ABC,3,(1,9,19)}

Spring-2008 MMDB-Audio 57

FastPET: Fast Pattern Extracting Technique

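$$M_{i,j} = \begin{cases} 0 & \text{if } S_i \ne S_j \\ 1 & \text{if } S_i = S_j \text{ and } i = 1 \\ M_{(i-1),(j-1)} + 1 & \text{if } S_i = S_j \text{ and } i > 1 \end{cases}$$

Correlative Matrix for “abcdbcdabcabcd” (rows i, columns j):

       a  b  c  d  b  c  d  a  b  c  a  b  c  d
  a    -  0  0  0  0  0  0  1  0  0  1  0  0  0
  b       -  0  0  1  0  0  0  2  0  0  2  0  0
  c          -  0  0  2  0  0  0  3  0  0  3  0
  d             -  0  0  3  0  0  0  0  0  0  4
  b                -  0  0  0  1  0  0  1  0  0
  c                   -  0  0  0  2  0  0  2  0
  d                      -  0  0  0  0  0  0  3
  a                         -  0  0  1  0  0  0
  b                            -  0  0  2  0  0
  c                               -  0  0  3  0
  a                                  -  0  0  0
  b                                     -  0  0
  c                                        -  0
  d                                           -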

FastPET (cont.)

[Figure: the correlative matrix for “abcdbcdabcabcd” (columns j = 1 to 14) after the pattern ‘abc’ is extracted; two entries are marked -1.]

P[8] = {3}, P[11] = {3}

PatternSet = {{‘abc’, 3}}

Spring-2008 MMDB-Audio 59

FastPET (cont.)

[Figure: the correlative matrix for “abcdbcdabcabcd” (columns j = 1 to 14) with the two entries in row a set to -1.]

P[8] = {3}, P[11] = {3, 4}

PatternSet = {{‘abc’, 3}, {‘abcd’, 2}}

Spring-2008 MMDB-Audio 60

FastPET (cont.)

Non-trivial RP for ‘abcdbcdabcabcd’:

Non-trivial repeating pattern   bc         abc      bcd      abcd
Frequency                       4          3        3        2
Pattern length                  2          3        3        4
Starting positions              2,5,9,12   1,8,11   2,5,12   1,11

P[5] = {3}, P[8] = {3}, P[9] = {2}, P[11] = {3, 4}, P[12] = {3}

PatternSet = {{‘bc’, 4}, {‘abc’, 3}, {‘bcd’, 3}, {‘abcd’, 2}}

Spring-2008 MMDB-Audio 61

2RC (Two-Row Comparison)

2RC saves memory: it needs only O(n) space.

Example: S = “abcdbcdabcabcd”

i=1:

  Position   1  2  3  4  5  6  7  8  9 10 11 12 13 14
  S          a  b  c  d  b  c  d  a  b  c  a  b  c  d
  Row A (a)  1  0  0  0  0  0  0  1  0  0  1  0  0  0

Spring-2008 MMDB-Audio 62

2RC (cont.)

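i=2:

  Position   1  2  3  4  5  6  7  8  9 10 11 12 13 14
  S          a  b  c  d  b  c  d  a  b  c  a  b  c  d
  Row A (a)  1  0  0  0  0  0  0  1  0  0  1  0  0  0
  Row B (b)  0  2  0  0  1  0  0  0  2  0  0  2  0  0

$$B_j = \begin{cases} 0 & \text{if } S_i \ne S_j \\ 1 & \text{if } S_i = S_j \text{ and } j = 1 \\ A_{j-1} + 1 & \text{if } S_i = S_j \text{ and } j > 1 \end{cases}$$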

Spring-2008 MMDB-Audio 63

2RC (cont.)

i=3:

  Position   1  2  3  4  5  6  7  8  9 10 11 12 13 14
  S          a  b  c  d  b  c  d  a  b  c  a  b  c  d
  Row A (b)  0  2  0  0  1  0  0  0  2  0  0  2  0  0
  Row B (c)  0  0  3  0  0  2  0  0  0  3  0  0  3  0

Spring-2008 MMDB-Audio 64

2RC (cont.)

i=4:

  Position   1  2  3  4  5  6  7  8  9 10 11 12 13 14
  S          a  b  c  d  b  c  d  a  b  c  a  b  c  d
  Row A (c)  0  0  3  0  0  2  0  0  0  3  0  0  3  0
  Row B (d)  0  0  0  4  0  0  3  0  0  0  0  0  0  4

PatternSet = {{“abc”, 3}}
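The two-row idea can be sketched as a generator that keeps only the previous row (Row A) and the current row (Row B) in memory. It reproduces the rows shown for i = 1 to 4 but leaves out the step that turns rows into PatternSet entries; the function name is hypothetical.

```python
def two_row_scan(s):
    """Yield (i, row_i) for i = 1 .. n, holding just two rows at a time.

    Each row follows: B_j = 0 if S_i != S_j, else A_(j-1) + 1, where the
    padding entry A_0 = 0 makes B_1 = 1 on a match.
    """
    n = len(s)
    prev = [0] * (n + 1)          # Row A, 1-based with padding at index 0
    for i in range(1, n + 1):
        cur = [0] * (n + 1)       # Row B
        for j in range(1, n + 1):
            if s[i - 1] == s[j - 1]:
                cur[j] = prev[j - 1] + 1
        yield i, cur[1:]
        prev = cur                # Row B becomes Row A for the next i

for i, row in two_row_scan("abcdbcdabcabcd"):
    if i <= 4:
        print(i, row)
```

Only `prev` and `cur` are alive at any time, which is where the O(n) space bound comes from.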

Spring-2008 MMDB-Audio 65

True suffix tree approach for non-trivial repeating pattern discovering (TRP)

Step 1. Construct the suffix tree after appending a stop symbol ‘#’ to the end of string S.

Step 2. Find the repeating patterns.

Step 3. Sweep the patterns.

Spring-2008 MMDB-Audio 66

Example 1 - Step 1 of TRP

[Figure: true suffix tree of S = “abcdbcdabcabcd#”, with leaves 1 to 14.]

Spring-2008 MMDB-Audio 67

Example 1 - Step 2 of TRP

Repeating pattern     abcd   abc      bcd      cd       bc
Frequency             2      3        3        3        4
Pattern length        4      3        3        2        2
Starting positions    1,11   1,8,11   2,5,12   3,6,13   2,5,9,12

All repeating patterns of music object S – “abcdbcdabcabcd”.

Spring-2008 MMDB-Audio 68

Example 1 - Step 3 of TRP

Repeating pattern   Pattern length   Starting positions   Ending positions   Scopes
abcd                4                1, 11                4, 14              1~4, 11~14
abc                 3                1, 8, 11             3, 10, 13          1~3, 8~10, 11~13
bcd                 3                2, 5, 12             4, 7, 14           2~4, 5~7, 12~14
cd                  2                3, 6, 13             4, 7, 14           3~4, 6~7, 13~14
bc                  2                2, 5, 9, 12          3, 6, 10, 13       2~3, 5~6, 9~10, 12~13

Pattern sweeping for music object S – “abcdbcdabcabcd”.

Non-trivial repeating patterns: abcd, abc, bcd, bc. Every scope of cd lies inside a scope of bcd, which has the same frequency, so cd is swept.

Spring-2008 MMDB-Audio 69

Example 2 - TRP

Repeating pattern   Length   Frequency   Scope
aa                  2        9           1~2
aaa                 3        8           1~3
aaaa                4        7           1~4
aaaaa               5        6           1~5
aaaaaa              6        5           1~6
aaaaaaa             7        4           1~7
aaaaaaaa            8        3           1~8
aaaaaaaaa           9        2           1~9

Non-trivial repeating pattern

Pattern sweeping for repeating patterns of S = “aaaaaaaaaa”.

Spring-2008 MMDB-Audio 70

Fault Tolerant Non-trivial Repeating Pattern Discovering

Step 1. Constructing Suffix Tree

Step 2. Creating Repeating Pattern Table

Step 3. Greedy Concatenating Repeating Patterns

Step 4. Extracting Fault Tolerant Non-trivial Repeating Patterns

Spring-2008 MMDB-Audio 71

Step 2 of FTRP

RP Table

RP      Length   Start   End                   Scope
abcbc   5        1, 6    1+5-1=5, 6+5-1=10     1~5, 6~10
bcbc    4        2, 7    2+4-1=5, 7+4-1=10     2~5, 7~10
cbc     3        3, 8    3+3-1=5, 8+3-1=10     3~5, 8~10

Creating Repeating Pattern Table

Spring-2008 MMDB-Audio 72

Step 3 of FTRP

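Greedy Concatenating Repeating Patterns

Position   1  2  3  4  5  6  7  8  9 10 11 12 13
Note       b  c  f  d  a  e  h  g  b  c  d  a  e

RP    Scope
bc    1~2, 9~10
dae   4~6, 11~13

Concatenating “bc” and “dae” gives “bc?dae”: the occurrence starting at position 1 has 1 fault (the note at position 3), and the occurrence starting at position 9 has 0 faults.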

Spring-2008 MMDB-Audio 73

Step 4 of FTRP

Extracting Fault Tolerant Non-trivial Repeating Patterns

FTRP     Scope
bc?dae   1~6, 9~13

RP    Scope
bc    1~2, 9~10
dae   4~6, 11~13

“bc” and “dae” are both contained in “bc?dae”.

Spring-2008 MMDB-Audio 74

Performance Study

[Figure: The Effect on Repeating Pattern Found. x-axis: no. of fault tolerant notes allowed (0 to 3); y-axis: avg. no. of RPs (0 to 100).]

Spring-2008 MMDB-Audio 75

Hit Ratio Improvement