36
Rainer Typke, Panos Giannopoulos, Remco C. Veltkamp, Utrecht University, Institute of Information and Computing Sciences Frans Wiering, Ren´ e van Oostrum Center for Geometry, Imaging and Virtual Environments Using Transportation Distances for Measuring Melodic Similarity

Using Transportation Distances for Measuring Melodic ...ismir2003.ismir.net/presentations/Typke.pdf · In a database containing 476,000 melodies, the top 17 matches contain 12 out

  • Upload
    vothien

  • View
    215

  • Download
    0

Embed Size (px)

Citation preview

Rainer Typke, Panos Giannopoulos, Remco C. Veltkamp,

Utrecht University, Institute of Information and Computing Sciences

Frans Wiering, Rene van Oostrum

Center for Geometry, Imaging and Virtual Environments

Using Transportation Distances

for Measuring Melodic Similarity

Using transportation distances such as the Earth Mover’s Distance (EMD) forcomparing symbolic music notation seems to be a good idea:

• Good matching results

• Efficient search possible (e. g. with Proportional Transportation Distance)

• Polyphonic searches in polyphonic music pose no additional complicationsin comparison to monophonic matching

• Can be easily adjusted to different purposes by modifying weighting schemeand ground distance

• Opens up interesting possibilities (e. g., QBH without separate note onsetdetection step)

The EMD has been used for comparing audio, but we are not aware of previouswork involving the comparison of notated music.

c© 2003 Rainer Typke

Why transportation distances are promising

1

In a database containing 476,000 melodies, the top 17 matches contain 12 outof 15 known occurrences of the query, “Roslin Castle”.

c© 2003 Rainer Typke

Good matching results

Distance measure: EMDWeights: Duration only

2

Comparison with earlier work involving the RISM A/II data

Grouping occurrences of “Roslin Castle” together

• Howard (1998) encoded the RISM A/II collection in the DARMS formatand tried various sorting methods. None of his methods sorted more than46 % of the known occurrences together.

• We were able to group 73 % together.

Identifying Anonymous Pieces

• Schlichte (1990) was able to identify 2.08 % of anonymous pieces in theRISM A/II collection by looking for identical Plaine & Easie encodings.

• We compared about 80,000 anonymous incipits to all 476,000 pieces inour database and could identify 3.9 %.

c© 2003 Rainer Typke3

c© 2003 Rainer Typke4

Query: Anonymus: Andante. Keyboard piece without title in a collection ofmanuscripts with piano pieces by Clementi, J. Chr. Bach, Boccherini, and Pleyel,Musikbibliothek, Kloster Einsiedeln, Switzerland.

Match: I. Umlauff: Singspiel “Das Irrlicht”, Basso: “Zu Steffen sprach imTraume” (Due corni, due fagotti, due violini, due viole e basso), manuscript inValdemars Slot, Tasinge, Denmark

Match: Ignaz Umlauff/ A. F. J. Eberl/W. A. Mozart: Ariette variee for piano.Excerpt from “Irrlicht”, “Zu Steffen sprach im Traume.” Manuscripts in Bresciaand Dubrovnik.

17,895 more examples on http://give-lab.cs.uu.nl/MIR/anon/idx.html

Example of an identified piece

time

pitch

1.5

0.5 1.5 0.5 0.50.5

12

c© 2003 Rainer Typke

• Coordinates represent note onset time and pitch.

• Weights should reflect the notes’ importance. So far, we used mainlythe duration, but other aspects can also be reflected in the weights.

Representing Melodies as Weighted Point Sets

5

c© 2003 Rainer Typke

• These melodies differ only in the measure structure.

• By adding a weight component for emphasized notes in every bar, themeasure structure can be taken into account and a distance > 0 canbe achieved for cases like this.

Example for Weight Components: Stress Weight

6

Jean-Baptiste Lully: La Grotte de Versailles

Anonymus: Litanies (Coro, without title)

c© 2003 Rainer Typke

Measures the minimum amount of work needed to transform one weightedpoint set into the other by moving weight.

The set of all possible flows is defined by these constraints:

• no negative flow component.

• no point emits or receives more weight than it has.

• the lighter of the two point sets is completely matched.

The EMD is the weighted sum of the optimum flow components’ distances,divided by the matched weight:

EMD(A,B) =minF∈F

∑m

i=1

∑n

j=1 fijdij

min(W,U)

The Earth Mover’s Distance

7

c© 2003 Rainer Typke

EMD: An example

Anonymus: Roslin Castle

Joseph Aloys Schmittbauer: Lauda Sion – Distance: 0.79

8

c© 2003 Rainer Typke

EMD: An example

Anonymus: Roslin Castle

Joseph Aloys Schmittbauer: Lauda Sion – Distance: 0.79

0.5

0.50.5

0.5

0.50.5

0.50.5 time

pitch

8

blue: Lauda Sion.

c© 2003 Rainer Typke

EMD: An example

Anonymus: Roslin Castle

Joseph Aloys Schmittbauer: Lauda Sion – Distance: 0.79

0.5

0.50.5

0.5

0.50.5

0.50.5 time

pitch

0.50.5

1

0.50.5

10.5

0.50.5

0.50.5

0.51

0.50.5

1

8

red/gray: Roslin Castle. Un-matched points or parts thereofare shown in gray.

blue: Lauda Sion.

c© 2003 Rainer Typke

EMD: An example

Anonymus: Roslin Castle

Joseph Aloys Schmittbauer: Lauda Sion – Distance: 0.79

0.5

0.50.5

0.5

0.50.5

0.50.5 time

pitch

0.50.5

1

0.50.5

10.5

0.50.5

0.50.5

0.51

0.50.5

1

time

pitch

8

red/gray: Roslin Castle. Un-matched points or parts thereofare shown in gray.

blue: Lauda Sion.

c© 2003 Rainer Typke

EMD: An example

Anonymus: Roslin Castle

Joseph Aloys Schmittbauer: Lauda Sion – Distance: 0.79

0.5

0.50.5

0.5

0.50.5

0.50.5 time

pitch

0.50.5

1

0.50.5

10.5

0.50.5

0.50.5

0.51

0.50.5

1

time

pitch

0.5

0.5 0.5

0.50.5

0.5

0.5

0.5

8

red/gray: Roslin Castle. Un-matched points or parts thereofare shown in gray.

blue: Lauda Sion.

Weights are shown as black num-bers, flows as green numbers.

c© 2003 Rainer Typke

EMD: An example

Anonymus: Roslin Castle

Joseph Aloys Schmittbauer: Lauda Sion – Distance: 0.79

0.5

0.50.5

0.5

0.50.5

0.50.5 time

pitch

0.50.5

1

0.50.5

10.5

0.50.5

0.50.5

0.51

0.50.5

1

time

pitch

0.5

0.5 0.5

0.50.5

0.5

0.5

0.5

8

i j fij dij

1 1 0.5 05 2 0.5 1.58 3 0.5 1.59 4 0.5 011 5 0.5 012 6 0.5 1.515 7 0.5 1.78916 8 0.5 0

EMD = 3(0.5·1.5)+0.5·1.7898·0.5 = 0.79

c© 2003 Rainer Typke

Properties of the EMD

• The EMD is continuous.

• For unequal weight sums, it does not have the positivity property. I. e.,partial matching is possible, and there are cases where the EMD does notdistinguish between different pairs of non-identical sets.

• For unequal weight sums, the EMD does not obey the triangle inequality.

The triangle inequality is relevant for indexing with vantage points.

9

1.5 time

pitch

1.5

0.5 1.5

A

c© 2003 Rainer Typke

Properties of the EMD

• The EMD is continuous.

• For unequal weight sums, it does not have the positivity property. I. e.,partial matching is possible, and there are cases where the EMD does notdistinguish between different pairs of non-identical sets.

• For unequal weight sums, the EMD does not obey the triangle inequality.

The triangle inequality is relevant for indexing with vantage points.

time

0.50.5

0.51

2

pitch B

EMD(A,B) > 0

9

1.5 time

pitch

1.5

0.5 1.5

A

c© 2003 Rainer Typke

Properties of the EMD

• The EMD is continuous.

• For unequal weight sums, it does not have the positivity property. I. e.,partial matching is possible, and there are cases where the EMD does notdistinguish between different pairs of non-identical sets.

• For unequal weight sums, the EMD does not obey the triangle inequality.

The triangle inequality is relevant for indexing with vantage points.

time

pitch

0.5 1.50.5

0.5

0.51

2

C

EMD(A,C) = 0

9

c© 2003 Rainer Typke

Properties of the EMD

• The EMD is continuous.

• For unequal weight sums, it does not have the positivity property. I. e.,partial matching is possible, and there are cases where the EMD does notdistinguish between different pairs of non-identical sets.

• For unequal weight sums, the EMD does not obey the triangle inequality.

The triangle inequality is relevant for indexing with vantage points.

time

0.50.5

0.51

2

pitch B

time

pitch

0.5 1.50.5

0.5

0.51

2

C EMD(C,B) = 0

9

1.5 time

pitch

1.5

0.5 1.5

A

c© 2003 Rainer Typke

Properties of the EMD

• The EMD is continuous.

• For unequal weight sums, it does not have the positivity property. I. e.,partial matching is possible, and there are cases where the EMD does notdistinguish between different pairs of non-identical sets.

• For unequal weight sums, the EMD does not obey the triangle inequality.

The triangle inequality is relevant for indexing with vantage points.

time

0.50.5

0.51

2

pitch B

time

pitch

0.5 1.50.5

0.5

0.51

2

C

EMD(A,B) > EMD(A,C)+ EMD(C, B)

9

c© 2003 Rainer Typke

Partial matching with the EMD in a polyphonic piece

10

c© 2003 Rainer Typke

Partial matching with the EMD in a polyphonic piece

1.93 0.46

1.790.33

0.34

1.730.26

2.120.35

0.26

1.45

1.66

red: top voice only (monophonic), recorded with a MIDI keyboard

10

c© 2003 Rainer Typke

Partial matching with the EMD in a polyphonic piece

1.9

0.26

0.3

0.31

3.26

1.41 0.24

1.341.341.55

1.561.61

2.081.34

0.381.49 1.56

1.69

1.721.75

1.71

1.69

1.681.5

1.461.411.34 1.62 1.4 1.33 2.08

1.21

blue/gray: all voices, in a different MIDI keyboard recording. Thenon-matched notes (or parts thereof) are shown gray.

10

c© 2003 Rainer Typke

Partial matching with the EMD in a polyphonic piece

1.93 0.46

1.790.33

0.34

1.730.26

2.120.35

0.26

1.45

1.66

1.9

0.26

0.3

0.31

3.26

1.41 0.24

1.341.341.55

1.561.61

2.081.34

0.381.49 1.56

1.69

1.721.75

1.71

1.69

1.681.5

1.461.411.34 1.62 1.4 1.33 2.08

1.21

red: top voice only (monophonic), recorded with a MIDI keyboardblue/gray: all voices, in a different MIDI keyboard recording. Thenon-matched notes (or parts thereof) are shown gray.

10

c© 2003 Rainer Typke

Partial matching with the EMD in a polyphonic piece

1.75

1.21

1.93

1.79 0.01

0.26

0.07

0.38

0.02

1.66

2.121.73

0.30.07

0.26

1.93 0.46

1.790.33

0.34

1.730.26

2.120.35

0.26

1.45

1.66

1.9

0.26

0.3

0.31

3.26

1.41 0.24

1.341.341.55

1.561.61

2.081.34

0.381.49 1.56

1.69

1.721.75

1.71

1.69

1.681.5

1.461.411.34 1.62 1.4 1.33 2.08

1.21

red: top voice only (monophonic), recorded with a MIDI keyboardblue/gray: all voices, in a different MIDI keyboard recording. Thenon-matched notes (or parts thereof) are shown gray.

10

c© 2003 Rainer Typke

• Takes a weight surplus into account.

• Triangle inequality holds.

• Still does not have the positivity property.

Calculation: for both point sets, the total weight is normalized to 1, whilepreserving the weight proportions. Then, the EMD is calculated.

The Proportional Transportation Distance (Giannopoulos & Veltkamp 2002)

11

1.5 time

pitch

1.5

0.5 1.5

A

c© 2003 Rainer Typke

time

0.50.5

0.51

2

pitch B

EMD(A,B) > 0

• Takes a weight surplus into account.

• Triangle inequality holds.

• Still does not have the positivity property.

Calculation: for both point sets, the total weight is normalized to 1, whilepreserving the weight proportions. Then, the EMD is calculated.

The Proportional Transportation Distance (Giannopoulos & Veltkamp 2002)

11

1.5 time

pitch

1.5

0.5 1.5

A

c© 2003 Rainer Typketime

pitch

0.5 1.50.5

0.5

0.51

2

C

EMD(A,C) > 0

• Takes a weight surplus into account.

• Triangle inequality holds.

• Still does not have the positivity property.

Calculation: for both point sets, the total weight is normalized to 1, whilepreserving the weight proportions. Then, the EMD is calculated.

The Proportional Transportation Distance (Giannopoulos & Veltkamp 2002)

11

c© 2003 Rainer Typke

time

0.50.5

0.51

2

pitch B

time

pitch

0.5 1.50.5

0.5

0.51

2

C EMD(C,B) > 0

• Takes a weight surplus into account.

• Triangle inequality holds.

• Still does not have the positivity property.

Calculation: for both point sets, the total weight is normalized to 1, whilepreserving the weight proportions. Then, the EMD is calculated.

The Proportional Transportation Distance (Giannopoulos & Veltkamp 2002)

11

1.5 time

pitch

1.5

0.5 1.5

A

c© 2003 Rainer Typke

time

0.50.5

0.51

2

pitch B

time

pitch

0.5 1.50.5

0.5

0.51

2

C

EMD(A,B) <= EMD(A,C)+ EMD(C, B)

• Takes a weight surplus into account.

• Triangle inequality holds.

• Still does not have the positivity property.

Calculation: for both point sets, the total weight is normalized to 1, whilepreserving the weight proportions. Then, the EMD is calculated.

The Proportional Transportation Distance (Giannopoulos & Veltkamp 2002)

11

c© 2003 Rainer Typke

• The weight sum is 1 for both melodies.

• Because of this, augmentation or diminution is ignored.

PTD Matching example

12

Alexandre Stievenart: Variations

Anonymus: Les trois cousines - Distance: 0.93

∑n

i=0 wi = 1

∑m

i=0 ui = 10.06

0.030.03

0.06 0.06 0.06 0.060.12 0.06 0.06 0.06 0.06

0.12 0.12

0.06 0.06

0.06 0.060.06 0.06 0.12

0.06 0.06 0.06 0.06 0.06 0.06 0.12

0.06

0.12

0.060.060.06 0.06

0.030.06

0.060.060.03

0.120.060.060.06

c© 2003 Rainer Typke

Indexing with Vantage Objects

Q

Feature Space

We want to retrieve all objects with a distance <= r from the query object Q.

r

13

c© 2003 Rainer Typke

Indexing with Vantage Objects

Q

V1

V2

Feature Space

We want to retrieve all objects with a distance <= r from the query object Q.

r

V1, V2: Vantage objects.

Instead of searching the whole featurespace:

• For each object, pre-calculate thedistances to vantage objects.

• Use these distances as coordinatesin a Euclidean space.

• Restrict the search to those objectswith a Euclidean distance <= r inthe Euclidean space of vantage dis-tances.

13

c© 2003 Rainer Typke

Indexing with Vantage Objects

Q

V1

V2

Feature Space

F

We want to retrieve all objects with a distance <= r from the query object Q.

r

V1, V2: Vantage objects.

Objects with the same coordinates as Qin the Euclidean space lie on the inter-sections of the black circles around V1

and V2.With two vantage objects, there can bean object F 6= Q with the same coordi-nates as Q.

13

c© 2003 Rainer Typke

Indexing with Vantage Objects

Q

V1

V2

Feature Space

We want to retrieve all objects with a distance <= r from the query object Q.

r

V1, V2: Vantage objects.

Objects with a distance <= r from Q inthe Euclidean space lie within the inter-section of the blue bands around V1 andV2.

Search steps:

• Find all objects with an Euclideandistance <= r from Q in the Eu-clidean vantage space.

• Only for those, calculate the dis-tances to Q in the feature spaceand discard those with a distance> r.

13

F

c© 2003 Rainer Typke14

• Query-by-Humming without Note Onset Detection:

– Represent every note in the database with a point set instead of asingle point.

– Match the sequence of fundamental frequencies from FFT windowswith the scores in the database.

This would avoid the notoriously difficult and error-prone note onset de-tection.

• Partial Matching/Tempo Tracking: Split the queries and the scores inthe database into overlapping, polyphonic chunks with a duration of justa few notes and then search for sequences of similar chunks.

– Tempo variations are possible without requiring explicit tempo track-ing. Neither tempo nor measure structure need to be known.

– The indexing method with vantage objects can be used to make thesearch for matching chunks efficient.

Future Goals

c© 2003 Rainer Typke

Measures the minimum amount of work needed to transform one weightedpoint set into the other by moving weight.

A = {a1, a2, .., am}, B = {b1, b2, .., bn}: weighted point setswi, uj ∈ R+ ∪ {0}: their weights.W=

∑m

i=1 wi, U=∑n

i=1 ui.fij : the flow of weight from ai to bj over the ground distance dij .

The set of all possible flows F = [fij ] is defined by these constraints:

1. fij ≥ 0, i = 1, ...,m, j = 1, ..., n

2.∑n

j=1 fij ≤ wi, i = 1, ...,m

3.∑m

i=1 fij ≤ uj , j = 1, ..., n

4.∑m

i=1

∑n

j=1 fij = min(W,U)

EMD(A,B) =minF∈F

∑m

i=1

∑n

j=1 fijdij

min(W,U)

The Earth Mover’s Distance

(no negative flow component.)

} (no point emits or receivesmore weight than it has.)

(the lighter of the two point setsis completely matched.)

7 (long version)