
Sparse and Low Rank Representations in Music Signal Analysis


Constantine Kotropoulos, Yannis Panagakis
Artificial Intelligence & Information Analysis Laboratory, Department of Informatics
Aristotle University of Thessaloniki, Thessaloniki 54124, GREECE
2nd Greek Signal Processing Jam, Thessaloniki, May 17th, 2012


Outline

1 Introduction

2 Auditory spectro-temporal modulations

3 Suitable data representations for classification

4 Joint sparse low-rank representations in the ideal case

5 Joint sparse low-rank representations in the presence of noise

6 Joint sparse low-rank subspace-based classification

7 Music signal analysis

8 Conclusions



Introduction

Music genre classification
Genre: the most popular description of music content, despite the lack of a commonly agreed definition. The task is to classify music recordings into distinguishable genres using information extracted from the audio signal.

Musical structure analysis
The task is to derive the musical form, i.e., the structural description of a music piece at the time scale of segments (such as intro, verse, chorus, and bridge), from the audio signal.

Music tagging
Tags: text-based labels encoding semantic information related to music (e.g., instrumentation, genres, emotions). Manual tagging is expensive, time consuming, and applicable mainly to popular music; automatic tagging is fast and can be applied to new and unpopular music.


Motivation
The appealing properties of slow temporal and spectro-temporal modulations from the human perceptual point of view [a];
The strong theoretical foundations of sparse representations [b, c] and low-rank representations [d].

[a] K. Wang and S. A. Shamma, "Spectral shape analysis in the central auditory system," IEEE Trans. Speech and Audio Processing, vol. 3, no. 5, pp. 382-396, 1995.
[b] E. J. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Information Theory, vol. 52, no. 2, pp. 489-509, February 2006.
[c] D. L. Donoho, "Compressed sensing," IEEE Trans. Information Theory, vol. 52, no. 4, pp. 1289-1306, April 2006.
[d] G. Liu, Z. Lin, S. Yan, J. Sun, and Y. Ma, "Robust recovery of subspace structures by low-rank representation," IEEE Trans. Pattern Analysis and Machine Intelligence, 2011, arXiv:1010.2955v4 (preprint).



Notations

Span
Let span(X) denote the linear space spanned by the columns of X. Then, Y ∈ span(X) denotes that all column vectors of Y belong to span(X).


Vector norms
‖x‖0 is the ℓ0 quasi-norm counting the number of nonzero entries in x.
If |·| denotes the absolute value operator, ‖x‖1 = ∑_i |x_i| and ‖x‖2 = (∑_i x_i²)^{1/2} are the ℓ1 and the ℓ2 norm of x, respectively.


Matrix norms

Mixed ℓp,q matrix norm: ‖X‖p,q = (∑_i (∑_j |x_ij|^p)^{q/p})^{1/q}.
For p = q = 0, the matrix ℓ0 quasi-norm ‖X‖0 returns the number of nonzero entries in X. For p = q = 1, the matrix ℓ1 norm is obtained: ‖X‖1 = ∑_i ∑_j |x_ij|.
Frobenius norm: ‖X‖F = (∑_i ∑_j x_ij²)^{1/2}.
ℓ2/ℓ1 norm of X: ‖X‖2,1 = ∑_j (∑_i x_ij²)^{1/2}.
Nuclear norm of X, ‖X‖∗: the sum of the singular values of X.
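For concreteness, all of these norms are one-liners in numpy; a minimal sketch (the toy matrix and variable names are ours):

    import numpy as np

    X = np.random.randn(5, 4)

    l0  = np.count_nonzero(X)                        # matrix l0 quasi-norm
    l1  = np.abs(X).sum()                            # matrix l1 norm
    fro = np.sqrt((X ** 2).sum())                    # Frobenius norm
    l21 = np.linalg.norm(X, axis=0).sum()            # l2/l1 norm: sum of column l2 norms
    nuc = np.linalg.svd(X, compute_uv=False).sum()   # nuclear norm: sum of singular values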


Support
A vector x is said to be q-sparse if the size of the support of x (i.e., the set of indices associated with nonzero vector elements) is no larger than q.
The support of a collection of vectors X = [x1, x2, . . . , xN] is defined as the union of all the individual supports.
A matrix X is called q joint sparse if |supp(X)| ≤ q. That is, there are at most q rows in X that contain nonzero elements, because ‖X‖0,q = |supp(X)| for any q [a].

[a] M. Davies and Y. Eldar, "Rank awareness in joint sparse recovery," arXiv:1004.4529v1, 2010.



Auditory spectro-temporal modulations

Computational auditory model
It is inspired by psychoacoustical and neurophysiological investigations in the early and central stages of the human auditory system.

[Block diagram: the early auditory model maps the audio signal to an auditory spectrogram; the central auditory model then yields the auditory temporal modulations and the auditory spectro-temporal modulations (cortical representation).]



Early auditory system
Auditory Spectrogram: time-frequency distribution of energy along a tonotopic (logarithmic frequency) axis.

[Figure: the early auditory model maps the waveform to the auditory spectrogram.]


Central auditory system - Temporal modulations

[Figure: the auditory spectrogram is mapped to the auditory temporal modulations along a rate axis ω (Hz).]


Auditory temporal modulations across 10 music genres
[Figure: one panel per genre: Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Pop, Rock, Reggae.]


Central auditory system - Spectro-temporal modulations

[Figure: the auditory spectrogram is mapped to the auditory spectro-temporal modulations over rate ω (Hz) and scale Ω (cycles/octave).]


Efficient implementation through constant Q transform (CQT)


Parameters and implementation (1)
The audio signal is analyzed by employing 128 constant-Q filters covering 8 octaves from 44.9 Hz to 11 kHz (i.e., 16 filters per octave). The magnitude of the CQT is compressed by raising each element of the CQT matrix to the power of 0.1 [a].
The 2D multiresolution wavelet analysis is implemented via a bank of 2D Gaussian filters with scales ∈ {0.25, 0.5, 1, 2, 4, 8} (cycles/octave) and rates ∈ {±2, ±4, ±8, ±16, ±32} (Hz).
For each music recording, the extracted 4D cortical representation is time-averaged, and the resulting rate-scale-frequency 3D cortical representation is thus obtained.

[a] C. Schoerkhuber and A. Klapuri, "Constant-Q transform toolbox for music processing," in 7th Sound and Music Computing Conf., Barcelona, Spain, 2010.
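As a rough illustration of this front end, the CQT stage could be sketched with librosa; this is an assumption of ours for the sketch, not the authors' implementation (they used the toolbox cited above), and recording.wav is a hypothetical input:

    import numpy as np
    import librosa

    # 128 CQT bins over 8 octaves, 16 bins per octave, starting at 44.9 Hz,
    # with magnitudes compressed to the 0.1 power, as on the slide.
    y, sr = librosa.load("recording.wav", sr=None)   # hypothetical input file
    C = librosa.cqt(y, sr=sr, fmin=44.9, n_bins=128, bins_per_octave=16)
    V = np.abs(C) ** 0.1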



Parameters and implementation (2)
To sum up, each music recording is represented by a vector x ∈ R_+^7680 obtained by stacking the elements of the 3D cortical representation into a vector.
An ensemble of music recordings is represented by the data matrix X ∈ R_+^{7680×S}, where S is the number of available recordings.
Each row of X is normalized to the range [0, 1] by subtracting from each entry the row minimum and then dividing by the difference between the row maximum and the row minimum.



Learning Problem

Statement
Let X ∈ R^{d×S} be the data matrix that contains S vector samples of size d in its columns. That is, x_s ∈ R^d, s = 1, 2, . . . , S.
Without loss of generality, the data matrix can be partitioned as X = [A | Y], where
  A = [A1 | A2 | . . . | AK] ∈ R^{d×N} represents a set of N training samples that belong to K classes;
  Y = [Y1 | Y2 | . . . | YK] ∈ R^{d×M} contains the M = S − N test vector samples in its columns.
If certain assumptions hold, learn a block-diagonal matrix Z = diag[Z1, Z2, . . . , ZK] ∈ R^{N×M} such that Y = AZ.



Assumptions
If
1. the data are exactly drawn from independent linear subspaces, i.e., span(Ak) linearly spans the k-th class data space, k = 1, 2, . . . , K;
2. Y ∈ span(A);
3. the data contain neither outliers nor noise,
then each test vector sample that belongs to the k-th class can be represented as a linear combination of the training samples in Ak.



Solutions

Sparsest Representation (SR)
Z ∈ R^{N×M} is the sparsest representation of the test data Y ∈ R^{d×M} with respect to the training data A ∈ R^{d×N}, obtained by solving the optimization problem [a]:

    SR: argmin_{z_i} ‖z_i‖0 subject to y_i = A z_i,    (1)

[a] E. Elhamifar and R. Vidal, "Sparse subspace clustering," in IEEE Int. Conf. Computer Vision and Pattern Recognition, Miami, FL, USA, 2009, pp. 2790-2797.


Lowest-rank representation (LRR)
Alternatively, Z ∈ R^{N×M} is the lowest-rank representation of the test data Y ∈ R^{d×M} with respect to the training data A ∈ R^{d×N}, obtained by solving the optimization problem [a]:

    LRR: argmin_Z rank(Z) subject to Y = A Z.    (2)

[a] G. Liu, Z. Lin, S. Yan, J. Sun, and Y. Ma (2011).


Convex relaxations
The convex envelope of the ℓ0 norm is the ℓ1 norm [a], while the convex envelope of the rank function is the nuclear norm [b]. Convex relaxations are obtained by replacing the ℓ0 norm and the rank function with their convex envelopes:

    SR: argmin_{z_i} ‖z_i‖1 subject to y_i = A z_i,    (3)

    LRR: argmin_Z ‖Z‖∗ subject to Y = A Z.    (4)

[a] D. Donoho, "For most large underdetermined systems of equations, the minimal l1-norm near-solution approximates the sparsest near-solution," Communications on Pure and Applied Mathematics, vol. 59, no. 7, pp. 907-934, 2006.
[b] M. Fazel, Matrix Rank Minimization with Applications, Ph.D. thesis, Dept. Electrical Engineering, Stanford University, CA, USA, 2002.
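Both relaxations are standard convex programs; as a hedged sketch (assuming the cvxpy package; the toy data and variable names are ours), they can be prototyped as follows:

    import cvxpy as cp
    import numpy as np

    d, N, M = 60, 40, 10
    A = np.random.randn(d, N)
    Y = A @ np.random.randn(N, M)   # keep Y in span(A) so the constraints are feasible

    # SR (3): one l1 program per test column y_i
    z = cp.Variable(N)
    cp.Problem(cp.Minimize(cp.norm1(z)), [A @ z == Y[:, 0]]).solve()

    # LRR (4): a single nuclear-norm program over the whole test matrix
    Z = cp.Variable((N, M))
    cp.Problem(cp.Minimize(cp.normNuc(Z)), [A @ Z == Y]).solve()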


SR pros and cons
The SR matrix Z ∈ R^{N×M} is sparse block-diagonal and has good discriminative properties, as has been demonstrated for SR-based classifiers [a]. However, the SR
1. cannot model generic subspace structures. Indeed, the SR accurately models subregions of subspaces, the so-called bouquets, rather than generic subspaces [b];
2. does not capture the global structure of the data, since it is computed for each data sample individually. Although sparsity offers an efficient representation, it damages the high within-class homogeneity that is desirable for classification, especially in the presence of noise.

[a] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.
[b] J. Wright and Y. Ma, "Dense error correction via l1-minimization," IEEE Trans. Information Theory, vol. 56, no. 7, pp. 3540-3560, 2010.



LRR pros and cons
The LRR matrix Z ∈ R^{N×M}
1. models data stemming from generic subspace structures;
2. accurately preserves the global data structure;
3. for clean data, also exhibits dense within-class homogeneity and zero between-class affinities, making it an appealing representation for classification purposes, e.g., in music mood classification [a];
4. for data contaminated with noise and outliers, the low-rank constraint seems to enforce noise correction [b].
But the LRR loses sparsity within the classes.

[a] Y. Panagakis and C. Kotropoulos, "Automatic music mood classification via low-rank representation," in Proc. 19th European Signal Processing Conf., Barcelona, Spain, 2011, pp. 689-693.
[b] E. J. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?," Journal of ACM, vol. 58, no. 3, pp. 1-37, 2011.



Joint sparse low-rank representations (JSLRR)

Motivation
Intuitively, a representation matrix able to reveal the most characteristic subregions of the subspaces must be simultaneously row sparse and low-rank.
Row sparsity ensures that only a small fraction of the training samples is involved in the representation.
The low-rank constraint ensures that the representation vectors (i.e., the columns of the representation matrix) are correlated, in the sense that data lying in a single subspace are represented as a linear combination of the same few training samples.


Problem statement and solution
The JSLRR of Y ∈ R^{d×M} with respect to A ∈ R^{d×N} is the matrix Z ∈ R^{N×M} with rank r ≪ min(q, M), where q ≪ N is the size of the support of Z.
It can be found by minimizing the rank function regularized by the ℓ0,q quasi-norm.
The ℓ0,q regularization term ensures that the low-rank matrix is also row sparse, since ‖Z‖0,q = |supp(Z)| for any q.
A convex relaxation of this problem is solved instead:

    JSLRR: argmin_Z ‖Z‖∗ + θ1 ‖Z‖1 subject to Y = A Z,    (5)

where the term ‖Z‖1 promotes sparsity in the LRR matrix and θ1 > 0 balances the two norms in (5).


Any theoretical guarantee?
The JSLRR has a block-diagonal structure, a property that makes it appealing for classification. This fact is proved in Theorem 1, which is a consequence of Lemma 1.


Lemma 1
Let ‖·‖θ = ‖·‖∗ + θ‖·‖1, with θ > 0. For any four matrices B, C, D, and F of compatible dimensions,

    ‖[ B  C ; D  F ]‖θ ≥ ‖[ B  0 ; 0  F ]‖θ = ‖B‖θ + ‖F‖θ,    (6)

where [ B C ; D F ] denotes the block matrix with blocks B, C, D, and F.


Theorem 1
Assume that the data are exactly drawn from independent linear subspaces. That is, span(Ak) linearly spans the training vectors of the k-th class, k = 1, 2, . . . , K, and Y ∈ span(A). Then, the minimizer of (5) is block-diagonal.


Example 1

Ideal case
Four linear pairwise independent subspaces are constructed, whose bases {Ui}, i = 1, . . . , 4, are computed by Ui+1 = R Ui, i = 1, 2, 3.
U1 ∈ R^{600×110} is a column-orthonormal random matrix and R ∈ R^{600×600} is a random rotation matrix.
The data matrix X = [X1, X2, X3, X4] ∈ R^{600×400} is obtained by picking 100 samples from each subspace. That is, Xi ∈ R^{600×100}, i = 1, 2, 3, 4.
Next, the data matrix is partitioned into the training matrix A ∈ R^{600×360} and the test matrix Y ∈ R^{600×40} by employing 10-fold cross-validation.
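A minimal numpy sketch of this construction (drawing the orthonormal basis and the rotation via QR factorizations is our implementation choice):

    import numpy as np

    rng = np.random.default_rng(0)
    d, r, n = 600, 110, 100

    U1, _ = np.linalg.qr(rng.standard_normal((d, r)))  # column-orthonormal basis U1
    R, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal "rotation"

    bases = [U1]
    for _ in range(3):
        bases.append(R @ bases[-1])                    # U_{i+1} = R U_i, i = 1, 2, 3

    # 100 samples per subspace: X_i = U_i C_i with random coefficients C_i
    X = np.hstack([U @ rng.standard_normal((r, n)) for U in bases])   # 600 x 400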


[Figure: the JSLRR, LRR, and SR matrices Z ∈ R^{360×40} obtained in the ideal case.]



JSLRR

Revisiting
The data are only approximately drawn from a union of subspaces. The deviations from the ideal assumptions can be treated collectively as additive noise contaminating the ideal model, i.e., Y = AZ + E.
The noise term E models both small (but densely supported) deviations and gross (but sparse) corruptions of the observations (i.e., outliers or missing data).
In the presence of noise, both the rank and the density of the representation matrix Z increase, since the columns of Z contain nonzero elements associated with more than one class.
By requiring a reduced rank of Z or an increased sparsity of Z, the noise in the test set can be smoothed while Z simultaneously admits a close-to-block-diagonal structure.


Robust JSLRR

Optimization Problem
A solution is sought for the convex optimization problem:

    Robust JSLRR: argmin_{Z,E} ‖Z‖∗ + θ1 ‖Z‖1 + θ2 ‖E‖2,1 subject to Y = A Z + E,    (7)

where θ2 > 0 is a regularization parameter and ‖·‖2,1 denotes the ℓ2/ℓ1 norm.
Problem (7) can be solved iteratively by employing the Linearized Alternating Direction Augmented Lagrange Multiplier (LADALM) method [a], a variant of the Alternating Direction Augmented Lagrange Multiplier method [b].

[a] J. Yang and X. M. Yuan, "Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization," Math. Comput., (to appear) 2011.
[b] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Athena Scientific, Belmont, MA, 2/e, 1996.
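Before turning to LADALM, note that (7) can also be prototyped directly with a generic convex solver; a minimal cvxpy sketch under assumed toy data and regularization weights (ours, not the authors'):

    import cvxpy as cp
    import numpy as np

    d, N, M = 60, 40, 10
    A = np.random.randn(d, N)
    Y = A @ np.random.randn(N, M) + 0.01 * np.random.randn(d, M)  # noisy test data

    theta1, theta2 = 0.1, 1.0
    Z = cp.Variable((N, M))
    E = cp.Variable((d, M))
    l21 = cp.sum(cp.norm(E, 2, axis=0))          # sum of column l2 norms of E
    prob = cp.Problem(cp.Minimize(cp.normNuc(Z) + theta1 * cp.norm1(Z) + theta2 * l21),
                      [Y == A @ Z + E])
    prob.solve()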


LADALM
That is, one solves

    argmin_{J,Z,W,E} ‖J‖∗ + θ1 ‖W‖1 + θ2 ‖E‖2,1 subject to Y = A Z + E, Z = J, J = W,    (8)

by minimizing the augmented Lagrangian function:

    L(J, Z, W, E, Λ1, Λ2, Λ3) = ‖J‖∗ + θ1 ‖W‖1 + θ2 ‖E‖2,1
        + tr(Λ1ᵀ (Y − AZ − E)) + tr(Λ2ᵀ (Z − J)) + tr(Λ3ᵀ (J − W))
        + (µ/2) (‖Y − AZ − E‖F² + ‖Z − J‖F² + ‖J − W‖F²),    (9)

where Λ1, Λ2, and Λ3 are the Lagrange multipliers and µ > 0 is a penalty parameter.


Optimization with respect to J[t]

    J[t+1] = argmin_{J[t]} L(J[t], Z[t], W[t], E[t], Λ1[t], Λ2[t], Λ3[t])
           ≈ argmin_{J[t]} (1/µ) ‖J[t]‖∗ + (1/2) ‖J[t] − (Z[t] − J[t] − Λ3[t]/µ + W[t] + Λ2[t]/µ)‖F²

    J[t+1] ← D_{µ⁻¹}[ Z[t] − J[t] − Λ3[t]/µ + W[t] + Λ2[t]/µ ].    (10)

The solution is obtained via the singular value thresholding operator, defined for any matrix Q as D_τ[Q] = U S_τ[Σ] Vᵀ, with Q = U Σ Vᵀ being the singular value decomposition and S_τ[q] = sgn(q) max(|q| − τ, 0) being the shrinkage operator.
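Both operators translate directly into numpy; a small sketch of ours:

    import numpy as np

    def shrink(Q, tau):
        """Elementwise shrinkage S_tau[q] = sgn(q) * max(|q| - tau, 0)."""
        return np.sign(Q) * np.maximum(np.abs(Q) - tau, 0.0)

    def svt(Q, tau):
        """Singular value thresholding D_tau[Q] = U S_tau[Sigma] V^T."""
        U, s, Vt = np.linalg.svd(Q, full_matrices=False)
        return U @ np.diag(shrink(s, tau)) @ Vt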


Optimization with respect to Z[t]

    Z[t+1] = argmin_{Z[t]} L(J[t+1], Z[t], W[t], E[t], Λ1[t], Λ2[t], Λ3[t])

    Z[t+1] = (I + Aᵀ A)⁻¹ ( Aᵀ (Y − E[t]) + J[t+1] + (Aᵀ Λ1[t] − Λ2[t])/µ ),    (11)

i.e., an unconstrained least-squares problem.


Optimization with respect to W[t]

    W[t+1] = argmin_{W[t]} L(J[t+1], Z[t+1], W[t], E[t], Λ1[t], Λ2[t], Λ3[t])
           = argmin_{W[t]} (θ1/µ) ‖W[t]‖1 + (1/2) ‖W[t] − (J[t+1] + Λ3[t]/µ)‖F²

    W[t+1] ← S_{θ1 µ⁻¹}[ J[t+1] + Λ3[t]/µ ].    (12)


Optimization with respect to E[t]

    E[t+1] = argmin_{E[t]} L(J[t+1], Z[t+1], W[t+1], E[t], Λ1[t], Λ2[t], Λ3[t])
           = argmin_{E[t]} (θ2/µ) ‖E[t]‖2,1 + (1/2) ‖E[t] − (Y − A Z[t+1] + Λ1[t]/µ)‖F².    (13)

Let M[t] = Y − A Z[t+1] + Λ1[t]/µ. Update E[t+1] column-wise as follows:

    e_j[t+1] ← S_{θ2 µ⁻¹}[ ‖m_j[t]‖2 ] · m_j[t] / ‖m_j[t]‖2.    (14)


Updating of the Lagrange multiplier matrices

    Λ1[t+1] = Λ1[t] + µ (Y − A Z[t+1] − E[t+1]),
    Λ2[t+1] = Λ2[t] + µ (Z[t+1] − J[t+1]),
    Λ3[t+1] = Λ3[t] + µ (J[t+1] − W[t+1]).    (15)
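Putting (10)-(15) together, one full LADALM iteration can be sketched as below, reusing the svt and shrink helpers above. The fixed µ, the zero initializations, and the iteration cap are simplifications of ours; practical implementations typically grow µ across iterations and test the constraint residuals for convergence:

    import numpy as np

    def robust_jslrr(A, Y, theta1=0.1, theta2=1.0, mu=1.0, n_iter=200):
        """Minimal LADALM sketch for problem (8); returns the pair (Z, E)."""
        d, N = A.shape
        M = Y.shape[1]
        J = np.zeros((N, M))
        Z = np.zeros((N, M))
        W = np.zeros((N, M))
        E = np.zeros((d, M))
        L1, L2, L3 = np.zeros((d, M)), np.zeros((N, M)), np.zeros((N, M))
        G = np.linalg.inv(np.eye(N) + A.T @ A)      # cached inverse for the Z step (11)
        for _ in range(n_iter):
            J = svt(Z - J - L3 / mu + W + L2 / mu, 1.0 / mu)            # step (10)
            Z = G @ (A.T @ (Y - E) + J + (A.T @ L1 - L2) / mu)          # step (11)
            W = shrink(J + L3 / mu, theta1 / mu)                        # step (12)
            Mt = Y - A @ Z + L1 / mu                                    # steps (13)-(14)
            col = np.linalg.norm(Mt, axis=0)
            E = Mt * (np.maximum(col - theta2 / mu, 0.0) / (col + 1e-12))
            L1 = L1 + mu * (Y - A @ Z - E)                              # step (15)
            L2 = L2 + mu * (Z - J)
            L3 = L3 + mu * (J - W)
        return Z, E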


Special cases

Robust joint SR (JSR)
The solution of the convex optimization problem is sought:

    Robust JSR: argmin_{Z,E} ‖Z‖1 + θ2 ‖E‖2,1 subject to Y = A Z + E.    (16)

(16) takes into account the correlations between the test samples, while seeking to jointly represent the test samples of a specific class by a few columns of the training matrix.
The corresponding augmented Lagrangian is

    L1(Z, J, E, Λ1, Λ2) = ‖J‖1 + θ2 ‖E‖2,1 + tr(Λ1ᵀ (Y − AZ − E)) + tr(Λ2ᵀ (Z − J))
        + (µ/2) (‖Y − AZ − E‖F² + ‖Z − J‖F²),    (17)

where Λ1 and Λ2 are Lagrange multipliers and µ > 0 is a penalty parameter.

where Λ1,Λ2 are Lagrange multipliers and µ > 0 is a penaltyparameter.

Sparse and Low Rank Representations in Music Signal Analysis 33/54

Page 81: Sparse and Low Rank Representations in Music Signal  Analysis

Special cases

Robust joint SR (JSR)The solution of the convex optimization problem is sought:

Robust JSR: argminZ,E

‖Z‖1 + θ2‖E‖2,1 subject to Y = A Z + E.

(16)

(16) takes into account the correlations between the test samples,while seeking to jointly represent the test samples from a specificclass by a few columns of the training matrix.

L1(Z,J,E,Λ1,Λ2) = ‖J‖1 + θ2‖E‖2,1 + tr(ΛT

1 (Y− AZ− E))

+tr(ΛT

2 (Z− J))

2

(‖Y− AZ− E‖2F + ‖Z− J‖2F

), (17)

where Λ1,Λ2 are Lagrange multipliers and µ > 0 is a penaltyparameter.

Sparse and Low Rank Representations in Music Signal Analysis 33/54

Page 82: Sparse and Low Rank Representations in Music Signal  Analysis

Special cases

Robust joint SR (JSR)The solution of the convex optimization problem is sought:

Robust JSR: argminZ,E

‖Z‖1 + θ2‖E‖2,1 subject to Y = A Z + E.

(16)

(16) takes into account the correlations between the test samples,while seeking to jointly represent the test samples from a specificclass by a few columns of the training matrix.

L1(Z,J,E,Λ1,Λ2) = ‖J‖1 + θ2‖E‖2,1 + tr(ΛT

1 (Y− AZ− E))

+tr(ΛT

2 (Z− J))

2

(‖Y− AZ− E‖2F + ‖Z− J‖2F

), (17)

where Λ1,Λ2 are Lagrange multipliers and µ > 0 is a penaltyparameter.

Sparse and Low Rank Representations in Music Signal Analysis 33/54

Page 83: Sparse and Low Rank Representations in Music Signal  Analysis

Special cases

Robust LRR
The solution of the convex optimization problem is sought:

    Robust LRR: argmin_{Z,E} ‖Z‖∗ + θ2 ‖E‖2,1 subject to Y = A Z + E,    (18)

by minimizing an augmented Lagrangian function similar to (17), where the first term ‖J‖1 is replaced by ‖J‖∗.


Example 2

Noisy case
Four linear pairwise independent subspaces are constructed as in Example 1, and the matrices A ∈ R^{600×360} and Y ∈ R^{600×40} are obtained.
We randomly pick 50 columns of A and replace them by a linear combination, with random weights, of randomly chosen vectors from two subspaces. Thus, the training set is now contaminated by outliers.
The 5th column of the test matrix Y is replaced by a linear combination of vectors not drawn from any of the 4 subspaces, and the 15th column of Y is replaced by a vector drawn from the 1st and the 4th subspace, as previously described.
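A numpy sketch of the outlier injection into the training matrix, assuming A holds the 600×360 training matrix and rng, bases, and r carry over from the Example 1 sketch:

    # Replace 50 random training columns by cross-subspace mixtures (outliers).
    idx = rng.choice(A.shape[1], size=50, replace=False)
    for j in idx:
        i1, i2 = rng.choice(4, size=2, replace=False)      # two subspaces at random
        w1, w2 = rng.standard_normal(2)                    # random mixing weights
        A[:, j] = w1 * (bases[i1] @ rng.standard_normal(r)) \
                + w2 * (bases[i2] @ rng.standard_normal(r))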


[Figure] Representation matrices (zoomed in on the 5th and 15th test samples).


1 Introduction

2 Auditory spectro-temporal modulations

3 Suitable data representations for classification

4 Joint sparse low-rank representations in the ideal case

5 Joint sparse low-rank representations in the presence of noise

6 Joint sparse low-rank subspace-based classification

7 Music signal analysis

8 Conclusions


Joint sparse low-rank subspace-based classification

Algorithm

Input: training matrix A ∈ Rd×N and test matrix Y ∈ Rd×M.
Output: a class label for each column of Y.

1  Solve (8) to obtain Z ∈ RN×M and E ∈ Rd×M.
2  for m = 1 to M
3      ym = ym − em.
4      for k = 1 to K
5          Compute the residuals rk(ym) = ‖ym − A δk(zm)‖2.
6      end for
7      class(ym) = argmink rk(ym).
8  end for
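A minimal NumPy sketch of steps 2-8, assuming Z and E have already been obtained from (8) (e.g., with a solver like the one sketched earlier) and that a hypothetical labels array stores the class index of each training column:

    import numpy as np

    def classify(A, Y, Z, E, labels, K):
        # Assign each test column to the class with the smallest residual.
        preds = []
        for m in range(Y.shape[1]):
            y = Y[:, m] - E[:, m]                    # step 3: remove the error term
            z = Z[:, m]
            residuals = [np.linalg.norm(y - A @ np.where(labels == k, z, 0.0))
                         for k in range(K)]          # step 5: r_k with delta_k(z_m)
            preds.append(int(np.argmin(residuals)))  # step 7
        return np.array(preds)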


Joint sparse low-rank subspace-based classification

Linearity concentration index

The LCI of a coefficient vector zm ∈ RN associated with the mth test sample is defined as

LCI(zm) = (K · maxk ‖δk(zm)‖2 / ‖zm‖2 − 1) / (K − 1) ∈ [0, 1].  (19)

If LCI(zm) = 1, the test sample is drawn from a single subspace. If LCI(zm) = 0, the test sample is drawn evenly from all subspaces. By choosing a threshold c ∈ (0, 1), the mth test sample is declared valid if LCI(zm) > c. Otherwise, the test sample can either be rejected as totally invalid (for very small values of LCI(zm)) or be classified into multiple classes by assigning to it the labels associated with the largest values of ‖δk(zm)‖2.
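A short sketch of (19), under the same assumed labels bookkeeping as in the classification sketch:

    import numpy as np

    def lci(z, labels, K):
        # Linearity concentration index of a coefficient vector (eq. 19).
        total = max(np.linalg.norm(z), 1e-12)
        frac = max(np.linalg.norm(z[labels == k]) for k in range(K)) / total
        return (K * frac - 1.0) / (K - 1.0)

A test sample m would then be accepted when lci(Z[:, m], labels, K) > c, and rejected or multiply labeled otherwise.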


1 Introduction

2 Auditory spectro-temporal modulations

3 Suitable data representations for classification

4 Joint sparse low-rank representations in the ideal case

5 Joint sparse low-rank representations in the presence of noise

6 Joint sparse low-rank subspace-based classification

7 Music signal analysis

8 Conclusions


Music genre classification: Datasets and evaluation procedure

GTZAN dataset

1000 audio recordings, each 30 seconds long [a];

10 genre classes: Blues, Classical, Country, Disco, HipHop, Jazz,Metal, Pop, Reggae, and Rock;

Each genre class contains 100 audio recordings.

The recordings are converted to monaural WAV format at a 16 kHz sampling rate with 16-bit resolution and normalized, so that they have zero mean amplitude and unit variance.

[a] G. Tzanetakis and P. Cook, “Musical genre classification of audio signals,” IEEE Trans. Speech and Audio Processing, vol. 10, no. 5, pp. 293-302, July 2002.


Music genre classification: Datasets and evaluation procedure

ISMIR 2004 Genre dataset

1458 full audio recordings;

6 genre classes: Classical (640), Electronic (229), JazzBlues (52), MetalPunk (90), RockPop (203), World (244).


Music genre classification: Datasets and evaluation procedure

Protocols

GTZAN dataset: stratified 10-fold cross-validation. Each training set consists of 900 audio recordings, yielding a training matrix AGTZAN (a possible realization is sketched below).

ISMIR 2004 Genre dataset: the ISMIR 2004 Audio Description Contest protocol defines training and evaluation sets, each consisting of 729 audio files.
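For illustration, one possible realization of the GTZAN protocol with scikit-learn; the shuffling and the seed are assumptions:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    labels = np.repeat(np.arange(10), 100)   # 10 genres x 100 recordings each
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(np.zeros(1000), labels):
        # The 900 training recordings form the columns of A_GTZAN;
        # the remaining 100 form the test matrix Y.
        pass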


Music genre classification: Datasets and evaluation procedure

Classifiers

the JSLRSC, the JSSC, and the LRSC;

the SRC [a] with the coefficients estimated by the LASSO [b];

the linear regression classifier (LRC) [c];

the SVM with a linear kernel, and the NN classifier with the cosine similarity.

[a] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.

[b] R. Tibshirani, “Regression shrinkage and selection via the LASSO,” J. Royal Statist. Soc. B, vol. 58, no. 1, pp. 267-288, 1996.

[c] I. Naseem, R. Togneri, and M. Bennamoun, “Linear regression for face recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 2106-2112, 2010.


Music genre classification

[Figure] Parameters θ1 > 0 and θ2 > 0.


Music genre classification

Classification accuracy

                                    GTZAN                           ISMIR 2004 Genre
Classifier  Parameters                     Accuracy (%)   Parameters                     Accuracy (%)
JSLRSC      tuning                         90.40 (3.06)   tuning                         88.75
JSSC        tuning                         88.80 (3.22)   tuning                         87.51
LRSC        tuning                         88.70 (2.79)   tuning                         85.18
JSLRSC      θ1 = 0.2, θ2 = 0.5, ρ = 1.1    88.30 (2.21)   θ1 = 0.5, θ2 = 0.2, ρ = 1.4    86.18
JSSC        θ2 = 0.5, ρ = 1.2              88.00 (3.26)   θ2 = 0.5, ρ = 1.2              87.51
LRSC        θ2 = 0.2, ρ = 1.4              87.70 (2.54)   θ2 = 0.2, ρ = 1.4              84.63
SRC         -                              86.50 (2.46)   -                              83.95
LRC         -                              87.30 (3.05)   -                              62.41
SVM         -                              86.20 (2.52)   -                              83.25
NN          -                              81.30 (2.79)   -                              79.42

(For GTZAN, the values in parentheses are the standard deviations over the 10 folds.)


Music genre classification

Comparison with the state of the art

                 GTZAN                                 ISMIR 2004 Genre
Rank   Reference               Accuracy (%)   Reference               Accuracy (%)
1)     Chang et al. [a]        92.70          Lee et al. [b]          86.83
2)     Lee et al.              90.60          Holzapfel et al. [c]    83.50
3)     Panagakis et al. [d]    84.30          Panagakis et al.        83.15
4)     Bergstra et al. [e]     82.50          Pampalk et al.          82.30
5)     Tsunoo et al. [f]       77.20

[a] K. Chang, J. S. R. Jang, and C. S. Iliopoulos, “Music genre classification via compressive sampling,” in Proc. 11th Int. Symp. Music Information Retrieval, pp. 387-392, 2010.

[b] C. H. Lee, J. L. Shih, K. M. Yu, and H. S. Lin, “Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features,” IEEE Trans. Multimedia, vol. 11, no. 4, pp. 670-682, 2009.

[c] A. Holzapfel and Y. Stylianou, “Musical genre classification using nonnegative matrix factorization-based features,” IEEE Trans. Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 424-434, February 2008.

[d] Y. Panagakis, C. Kotropoulos, and G. R. Arce, “Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification,” IEEE Trans. Audio, Speech, and Language Technology, vol. 18, no. 3, pp. 576-588, 2010.

[e] J. Bergstra, N. Casagrande, D. Erhan, D. Eck, and B. Kegl, “Aggregate features and AdaBoost for music classification,” Machine Learning, vol. 65, no. 2-3, pp. 473-484, 2006.

[f] E. Tsunoo, G. Tzanetakis, N. Ono, and S. Sagayama, “Beyond timbral statistics: Improving music classification using percussive patterns and bass lines,” IEEE Trans. Audio, Speech, and Language Processing, vol. 19, no. 4, pp. 1003-1014, 2011.


Music genre classification

[Figure] Confusion matrices.


Music genre classification

Dimensionality reduction via random projections

Let the true low dimensionality of the data be denoted by r. A random projection matrix, drawn from a zero-mean normal distribution, provides with high probability a stable embedding [a], with the dimensionality of the projection d′ selected as the minimum value such that d′ > 2r log(7680/d′).

r is estimated by robust principal component analysis on a training set for each dataset.

This yields d′ = 1581 for the GTZAN dataset and d′ = 1398 for the ISMIR 2004 Genre dataset.

[a] R. G. Baraniuk, V. Cevher, and M. B. Wakin, “Low-dimensional models for dimensionality reduction and signal recovery: A geometric perspective,” Proceedings of the IEEE, vol. 98, no. 6, pp. 959-971, 2010.
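A minimal sketch of the embedding-dimension rule and of the projection itself. The 1/sqrt(d′) column scaling is a common choice and an assumption here, since the slide only specifies a zero-mean normal distribution; 7680 is the ambient dimension appearing in the selection rule:

    import numpy as np

    def embedding_dim(r, d=7680):
        # Smallest d' with d' > 2 r log(d / d'). The left side grows and the
        # right side shrinks in d', so a linear scan finds the minimum.
        for dp in range(1, d + 1):
            if dp > 2 * r * np.log(d / dp):
                return dp
        return d

    def random_project(X, d_prime, seed=0):
        # Gaussian random projection of the d x n data matrix X.
        rng = np.random.default_rng(seed)
        P = rng.normal(0.0, 1.0 / np.sqrt(d_prime), size=(d_prime, X.shape[0]))
        return P @ X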


Music genre classification

Accuracy after dimensionality reduction

                                    GTZAN                           ISMIR 2004 Genre
Classifier  Parameters                     Accuracy (%)   Parameters                     Accuracy (%)
JSLRSC      θ1 = 1.8, θ2 = 0.7, ρ = 1.4    87.50 (2.41)   θ1 = 1.5, θ2 = 0.2, ρ = 1.4    85.87
JSSC        θ2 = 1.5, ρ = 1.2              86.90 (3.28)   θ2 = 0.6, ρ = 1.1              87.30
LRSC        θ2 = 0.7, ρ = 1.4              86.60 (2.75)   θ2 = 0.8, ρ = 2.4              84.08
SRC         -                              86.90 (3.21)   -                              83.67
LRC         -                              85.30 (3.16)   -                              54.18
SVM         -                              86.00 (2.53)   -                              83.26
NN          -                              80.80 (3.01)   -                              78.87


Music genre classification

Accuracy after rejecting 1 out of 5 test samples

The JSLRSC achieves a classification accuracy of 95.51% on the GTZAN dataset. For the ISMIR 2004 Genre dataset, the accuracy of the JSSC is 92.63%, while that of the JSLRSC is 91.55%.

[Figure] Classification accuracy (%) versus the rejection threshold c for the JSLRSC, JSSC, LRSC, LRC, SRC, SVM, and NN classifiers (two panels, one per dataset).


Music structure analysis

Optimization problem

Given a music recording of K music segments, represented by a sequence of beat-synchronous feature vectors X = [x1|x2| . . . |xN] ∈ Rd×N, learn Z ∈ RN×N by minimizing

argmin_Z λ1‖Z‖1 + (λ2/2)‖Z‖F² subject to X = XZ, zii = 0,

or, in the presence of noise, by minimizing

argmin_{Z,E} λ1‖Z‖1 + (λ2/2)‖Z‖F² + λ3‖E‖1 subject to X = XZ + E, zii = 0.

Let Z = UΣVᵀ. Define Ũ = UΣ^(1/2). Set M = ŨŨᵀ. Build a nonnegative symmetric affinity matrix W ∈ RN×N+ with elements wij = m²ij and apply normalized cuts [a].

[a] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, 2000.
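A sketch of this post-processing, with scikit-learn's SpectralClustering standing in for the normalized cuts of Shi and Malik (an assumption; the original algorithm differs in detail):

    import numpy as np
    from sklearn.cluster import SpectralClustering

    def segment_labels(Z, K):
        # Z = U S V^T, U~ = U S^(1/2), M = U~ U~^T, w_ij = m_ij^2.
        U, s, _ = np.linalg.svd(Z, full_matrices=False)
        U_tilde = U * np.sqrt(s)
        W = (U_tilde @ U_tilde.T) ** 2   # nonnegative symmetric affinity
        return SpectralClustering(n_clusters=K, affinity="precomputed",
                                  random_state=0).fit_predict(W)

Each of the N beat-synchronous frames then receives one of K segment labels.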


Music tagging

Optimization problem

Assume that the tag-recording matrix Y and the matrix of the ATM representations X are jointly low-rank. Learn a low-rank weight matrix W such that:

argmin_{W,E} ‖W‖∗ + λ‖E‖1 subject to Y = WX + E.  (20)
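Once W is learned, tags for a new recording can be scored from its ATM representation x. Returning the top-scoring tags is an assumed decision rule here, since the slide only defines the learning problem:

    import numpy as np

    def predict_tags(W, x, tag_names, top_n=5):
        # Score every tag for the representation x and keep the top_n.
        scores = W @ x
        return [tag_names[i] for i in np.argsort(scores)[::-1][:top_n]]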


Example 3


1 Introduction

2 Auditory spectro-temporal modulations

3 Suitable data representations for classification

4 Joint sparse low-rank representations in the ideal case

5 Joint sparse low-rank representations in the presence of noise

6 Joint sparse low-rank subspace-based classification

7 Music signal analysis

8 Conclusions


Conclusions

Summary - Future work

A robust framework for solving classification and clustering problems in music signal analysis has been developed.

In all three problems addressed, the proposed techniques either achieve top performance or meet the state of the art.

Efficient implementations exploiting incremental update rules are desperately needed.

Performance improvement for small sample sets deserves further elaboration.


Thank you! Questions?
