24
Finding Musically Meaningful Words Using Sparse CCA David A. Torres, Douglas Turnbull, Bharath K. Sriperumbudur, Luke Barrington & Gert Lanckriet University of California, San Diego Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 1 / 22

Finding Musically Meaningful Words Using Sparse CCA

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Finding Musically Meaningful Words Using Sparse CCA

Finding Musically Meaningful Words Using Sparse CCA

David A. Torres, Douglas Turnbull, Bharath K. Sriperumbudur,Luke Barrington & Gert Lanckriet

University of California, San Diego

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 1 / 22

Page 2: Finding Musically Meaningful Words Using Sparse CCA

Introduction

Goal: Create a content-based music search engine for natural languagequeries.

it annotates songs with semantically meaningful words andretrieve relevant songs based on a text query.CAL music search engine [Turnbull et al., 2007].

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 2 / 22

Page 3: Finding Musically Meaningful Words Using Sparse CCA

Introduction

Goal: Create a content-based music search engine for natural languagequeries.

it annotates songs with semantically meaningful words andretrieve relevant songs based on a text query.CAL music search engine [Turnbull et al., 2007].

Problem: Picking a vocabulary of musically meaningful words (vocabularyselection).

discover words that can be modeled accurately.

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 2 / 22

Page 4: Finding Musically Meaningful Words Using Sparse CCA

Introduction

Goal: Create a content-based music search engine for natural languagequeries.

it annotates songs with semantically meaningful words andretrieve relevant songs based on a text query.CAL music search engine [Turnbull et al., 2007].

Problem: Picking a vocabulary of musically meaningful words (vocabularyselection).

discover words that can be modeled accurately.

Solution: Find words that have a high correlation with the audio featurerepresentation.

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 2 / 22

Page 5: Finding Musically Meaningful Words Using Sparse CCA

Two-view Representation

Consider a set of annotated songs. Each song is represented by:

Annotation vector in a semantic space

Audio feature vector in a acoustic space

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 3 / 22

Page 6: Finding Musically Meaningful Words Using Sparse CCA

Semantic Representation

Vocabulary of words

CAL 500: 174 phrases from a human survey

Instrumentation, genre, emotion, usages, visual characteristics

LastFM: 15,000 tags from social music site

Web mining: 100,000+ words minded from text documents

Annotation vector, s

Each element represents the semantic association between a word and thesong.

s ∈ Rd , where d is the size of the vocabulary.

Example: Frank Sinatra’s ”Fly me the moon”

Vocabulary={funk, jazz, guitar, female vocals, sad, passionate}s = [ 0

4, 3

4, 4

4, 0

4, 2

4, 1

4]

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 4 / 22

Page 7: Finding Musically Meaningful Words Using Sparse CCA

Acoustic Representation

Each song is represented by an audio feature vector a that is automaticallyextracted from the audio-content.

Mel-frequency cepstral coefficients (MFCC).

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 5 / 22

Page 8: Finding Musically Meaningful Words Using Sparse CCA

Canonical Correlation Analysis

Let X ∈ Rdx and Y ∈ R

dy be two random variables.

Problem: Find wx and wy such that ρ(wTx X,wT

y Y) is maximized.

Solution: Solve

maxwx , wy

wTx Sxywy

wTx Sxxwx

wTy Syywy

(1)

which is equivalent to

maxwx , wy

wTx Sxywy

s.t. wTx Sxxwx = 1 ,wT

y Syywy = 1. (2)

The above is the variational formulation of CCA.

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 6 / 22

Page 9: Finding Musically Meaningful Words Using Sparse CCA

Canonical Correlation Analysis

In our analysis, a variation of Eq. (2) is used as given below.

maxw

wTPw

s.t. wTQw = 1. (3)

where P =

(

0 Sxy

Syx 0

)

, Q =

(

Sxx 0

0 Syy

)

and w =

(

wx

wy

)

.

Eq. (3) is a generalized eigenvalue problem with P being indefinite and

Q ∈ Sdx+dy

++ .

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 7 / 22

Page 10: Finding Musically Meaningful Words Using Sparse CCA

Need for sparsity

CCA solution is usually not sparse.

The solution vector has components along all the features (here, words).

Difficult to interpret the results.

Few relevant features might be sufficient to describe the correlation.

In our application, vocabulary pruning results in modeling fewer words.

Solution: Sparsify the CCA solution.

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 8 / 22

Page 11: Finding Musically Meaningful Words Using Sparse CCA

Sparse CCA

Heurisitc: wy = [wy1, . . . ,wyny

]T . If |wyi| < ǫ, choose wyi

= 0. (non-optimal)

Solution: Introduce the sparsity constraint in CCA’s variational formulation.

Sparse CCA: The variational formulation is given by

maxw

wTPw

s.t. wTQw = 1

||w||0 ≤ k, (4)

where 1 ≤ k ≤ n, n = dx + dy and ||w||0 is the cardinality of w.

Issues: Eq. (4) is NP-hard and therefore intractable. ℓ1-relaxation is stillcomputationally hard.

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 9 / 22

Page 12: Finding Musically Meaningful Words Using Sparse CCA

Convex Relaxation

Primal:

maxw

wTPw

s.t. wTQw ≤ 1

||w||1 ≤ k. (5)

Trick: Compute the bi-dual (dual of the dual of the primal).

Bi-dual:

maxW,w

tr(WP)

s.t. tr(WQ) ≤ 1

||w||1 ≤ k(

W w

wT 1

)

� 0. (SDP) (6)

Issue: SDP relaxation is prohibitively expensive to solve for large n.

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 10 / 22

Page 13: Finding Musically Meaningful Words Using Sparse CCA

Approximation to ||x||0

Two observations

The ℓ1-norm relaxation does not simplify Eq. (4) ⇒ a better approximation tocardinality would improve sparsity.

The convex SDP approximation to Eq. (4) scales terribly in size ⇒ use alocally convergent algorithm with better scalability.

Eq. (4) can be written as

maxw

wTPw − ρ ||w||0

s.t. wTQw ≤ 1, (7)

where ρ ≥ 0.

Approximate ||x||0 by∑n

i=1 log(|xi |). (Refer to [Sriperumbudur et al., 2007]for more details)

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 11 / 22

Page 14: Finding Musically Meaningful Words Using Sparse CCA

Approximation to ||x||0

Eq. (7) can be written as

minw

µ||w||2 −

(

wT (P + µI)w − ρ

n∑

i=1

log |wi |

)

s.t. wTQw ≤ 1. (8)

where µ ≥ max(0,−λmin(P)).

The objective in Eq. (8) is a difference of two convex functions and thereforeis a d.c. program.

Solving Eq. (8) using the DC minimization algorithm(DCA) [Tao and An, 1998] yields the following algorithm.

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 12 / 22

Page 15: Finding Musically Meaningful Words Using Sparse CCA

Sparse CCA Algorithm

Require: P ∈ Sn, Q ∈ S

n++ and ρ ≥ 0

1: Choosew0 ∈ {w : wTQw ≤ 1} arbitrarily2: repeat

3:

w̄∗ = arg minw̄

µw̄TD2(wl)w̄ − 2wTl [P + µI]D(wl)w̄ + ρ||w̄||1

s.t. w̄TD(wl)QD(wl)w̄ ≤ 1 (9)

4: wl+1 = D(wl )w̄∗

5: until wl+1 = wl

6: return wl , w̄∗

where D(w) = diag(w).

solves a sequence of convex quadratically constrained quadratic programs(QCQPs).

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 13 / 22

Page 16: Finding Musically Meaningful Words Using Sparse CCA

Modification to Vocabulary Selection

For vocabulary selection, the sparsity constraint is required only on wy

instead of on w.

Modify Eq. (9) as

w̄∗ = arg minw̄

µw̄TD2(wl)w̄ − 2wTl [P + µI]D(wl )w̄ + ||τ ◦ w̄||1

s.t. w̄TD(wl)QD(wl)w̄ ≤ 1 (10)

where (p ◦ q)i = piqi and τ = [0, 0, dx. . ., 0, ρ, ρ, dy. . ., ρ]T .

The non-zero elements of wy can be interpreted as those words which have ahigh correlation with the audio representation.

Setting ρ: Not straightforward (increasing ρ reduces the vocabulary size).

Issues: Quality of the solution is hard to derive unlike in SDP.

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 14 / 22

Page 17: Finding Musically Meaningful Words Using Sparse CCA

Experimental Setup

Dataset: CAL500 [Turnbull et al., 2007]

500 songs by 500 artists

Semantic representation:

173 words (e.g. genre, instrumentation, usages, emotions,vocals, etc.)

Annotation vector, s is an average from 4 listeners.

Word agreement score: measures how consistently listenersapply a word to songs.

Acoustic representation:

Bag of dynamic MFCC vectors (52-dimensional).

Duplicate annotation vector for each dynamic MFCC.

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 15 / 22

Page 18: Finding Musically Meaningful Words Using Sparse CCA

Experiment: Vocabulary Pruning

Web2131 Text corpus [Turnbull et al., 2006]

Collection of 2131 songs and accompanying expert song reviews mined fromwww.allmusic.com.

315 word vocabulary.

Annotation vector is based on the presence or absence of a word in the review.

More noisy word-song relationships than CAL500.

Experimental design

Merge vocabularies: 173+315=488 words.

Prune noisy words as we increase amount of sparsity in CCA.

Hypothesis

Web2131 words will be pruned before CAL500 words.

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 16 / 22

Page 19: Finding Musically Meaningful Words Using Sparse CCA

Results: Vocabulary Pruning

Vocabulary size 488 249 203 149 103 50

# CAL500 words 173 118 101 85 65 39

# Web2131 words 315 131 102 64 38 11

%Web2131 .64 .52 .50 .42 .36 .22

Table: The fraction of noisy web-mined words in a vocabulary as vocabulary size isreduced: As the size shrinks sparse CCA prunes noisy words and the web-mined wordsare eliminated over higher quality CAL500 words.

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 17 / 22

Page 20: Finding Musically Meaningful Words Using Sparse CCA

Experiment: Vocabulary Selection for Music Retrieval

P(song |word) is modeled as a Gaussian mixture model.

The system can annotate a novel song with words from its vocabulary or itcan retrieve an ordered list of novel songs based on a keyword query.

Evaluation metric for retrieval: Area under the ROC curve.

204060801001201401601800.62

0.64

0.66

0.68

0.7

0.72

0.74

0.76

mea

n av

g ro

c

vocab size

sparse ccarandomhuman agreement

Figure: Comparison of vocabulary selection techniques: We compare vocabulary selectionusing human agreement, acoustic correlation, and a random baseline, as it effectsretrieval performance.

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 18 / 22

Page 21: Finding Musically Meaningful Words Using Sparse CCA

Summary

Constructing a meaningful vocabulary is the first step in building acontent-based, natural-language search engine for music.

Given a semantic representation and acoustic representation, sparse CCA canbe used to find musically meaningful words.

semantic dimensions linearly correlated with audio features.

Automatically pruning words is important when using noisy sources ofsemantic information.

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 19 / 22

Page 22: Finding Musically Meaningful Words Using Sparse CCA

References

Sriperumbudur, B. K., Torres, D., and Lanckriet, G. R. G. (2007).Sparse eigen methods by d.c. programming.In ICML 2007.

Tao, P. D. and An, L. T. H. (1998).D.c. optimization algorithms for solving the trust region subproblem.SIAM J. Optim., pages 476–505.

Turnbull, D., Barrington, L., and Lanckriet, G. R. G. (2006).Modelling music and words using a multi-class naive bayes approach.In ISMIR 2006.

Turnbull, D., Barrington, L., Torres, D., and Lanckriet, G. R. G. (2007).Towards musical query-by-semantic description using the cal500 dataset.In SIGIR 2007.

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 20 / 22

Page 23: Finding Musically Meaningful Words Using Sparse CCA

Questions

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 21 / 22

Page 24: Finding Musically Meaningful Words Using Sparse CCA

Thank You

Bharath K. Sriperumbudur (UCSD) Finding Musically Meaningful Words Using Sparse CCA Music, Brain & Cognition Workshop 22 / 22