
A small footprint for audio and music classification

Hamid Eghbal-zadeh



Outline

1. Introduction

2. I-Vector representation

3. Some results

4. Conclusion



INTRODUCTION



A small footprint for Audio and Music classification


Figure: pipeline Audio → Acoustic features → Front-end → Small footprint (a fixed-length vector $(a_1, a_2, \dots, a_n)$) → Classifier; feature extraction is signal processing, the later stages are machine learning.

Front-end:
• Block-level features (genre classification) [Seyerlehner, 2010]
• Adapted GMM means (genre classification) [Charbuillet, 2011]
• Adapted RBM weights (speaker verification) [Ghahabi, 2014]
• Factor analysis (artist recognition) [Eghbal-zadeh, 2015]


Dev: Dev db → Universal Background Model (UBM)
Train: Train db + UBM → UBM adaptation → adapted UBM params → classifier (training)
Test: Test db + UBM → UBM adaptation → adapted UBM params → classifier (testing)
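The UBM adaptation step can be illustrated with relevance-MAP adaptation of the GMM means, a common choice in GMM-UBM systems; the slides do not state the adaptation rule, so this is an assumption. The minimal NumPy sketch below also assumes a diagonal-covariance UBM and a relevance factor of 16 (a typical but hypothetical value); `map_adapt_means` and the toy data are illustrative only.

```python
import numpy as np

def log_gaussian_diag(X, means, variances):
    """log N(x_t; mu_c, diag(var_c)) for every frame and component: (T, D) -> (T, C)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                 # (T, C, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(variances), axis=1)             # (C,)
    return -0.5 * (D * np.log(2 * np.pi) + log_det[None, :] + mahal)

def posteriors(X, weights, means, variances):
    """Frame-level component posteriors gamma_t(c) under a diagonal-covariance GMM."""
    log_p = np.log(weights)[None, :] + log_gaussian_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)               # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of the UBM means (weights and variances are kept)."""
    gamma = posteriors(X, weights, means, variances)        # (T, C)
    n = gamma.sum(axis=0)                                   # soft frame counts per component
    f = gamma.T @ X                                         # posterior-weighted frame sums (C, D)
    ex = f / np.maximum(n, 1e-10)[:, None]                  # posterior-weighted data mean
    alpha = (n / (n + relevance))[:, None]                  # more data -> stronger adaptation
    return alpha * ex + (1.0 - alpha) * means

rng = np.random.default_rng(0)
C, D = 4, 3
ubm_w = np.full(C, 1.0 / C)
ubm_mu = rng.normal(size=(C, D))
ubm_var = np.ones((C, D))
song_frames = rng.normal(loc=0.5, size=(200, D))           # stand-in for one song's MFCC frames
adapted = map_adapt_means(song_frames, ubm_w, ubm_mu, ubm_var)
supervector = adapted.reshape(-1)                           # C * D values for any song length
print(supervector.shape)  # (12,)
```

Stacking the adapted means gives one fixed-length vector per song, whatever its duration, which is what lets a standard classifier consume it.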


With a factor analysis step:
Dev: Dev db → Universal Background Model (UBM)
Train: Train db + UBM → UBM adaptation → adapted UBM params → factor analysis → classifier (training)
Test: Test db + UBM → UBM adaptation → adapted UBM params → factor analysis → classifier (testing)


Effect of the factor analysis step


Figure: songs from 3 genres of the GTZAN dataset [Eghbal-zadeh, ISMIR 2015]. Left: with factor analysis; right: without factor analysis.

Figure: artist recognition performance on Artist20 with and without factor analysis (FA) [Eghbal-zadeh, EUSIPCO 2015].


Other benefits:

• Noise-robust features [Eghbal-zadeh, ISMIR 2016]

• Can be combined with neural networks [Eghbal-zadeh, DAFx 2016]

• Successfully used in different tasks:
  – Speaker verification
  – Language recognition
  – Artist recognition
  – Music similarity
  – Audio scene classification


Why apply factor analysis?

• The resulting factors provide an information-rich, fixed-length, low-dimensional representation

• They follow a single Gaussian distribution
  – so the properties of Gaussians can be exploited

• They are easy to score
  – e.g., with the cosine distance

• They are latent factors with good discriminative power, estimated by a factor analysis procedure
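Scoring with the cosine distance needs only a dot product and two norms. The sketch below assumes each class is represented by the average of its training i-vectors, which is one simple option and not necessarily the one used in the cited systems; the function names and toy vectors are hypothetical.

```python
import numpy as np

def cosine_score(w1, w2):
    """Cosine similarity between two i-vectors (1.0 = same direction)."""
    return float(np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2)))

def classify_by_cosine(test_ivec, class_ivecs):
    """Pick the class whose averaged training i-vector scores highest.

    class_ivecs: dict mapping a class label to an array of shape (n_c, dim).
    """
    scores = {label: cosine_score(test_ivec, ivecs.mean(axis=0))
              for label, ivecs in class_ivecs.items()}
    return max(scores, key=scores.get), scores

train = {
    "artist_a": np.array([[1.0, 0.1, 0.0], [0.9, -0.1, 0.1]]),
    "artist_b": np.array([[0.0, 1.0, 0.2], [0.1, 0.8, -0.2]]),
}
label, scores = classify_by_cosine(np.array([0.8, 0.2, 0.0]), train)
print(label)  # artist_a
```

Because the cosine score ignores vector length, only the direction of an i-vector matters, which keeps scoring cheap and free of trained parameters.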


I-VECTOR REPRESENTATION AS A SMALL FOOTPRINT


Dev: Dev db → UBM (GMM)
Train: Train db + UBM → adapted GMM params (statistical representation) → factor analysis → classifier (training)
Test: Test db + UBM → adapted GMM params (statistical representation) → factor analysis → classifier (testing)
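Training the UBM on the development set is ordinary maximum-likelihood GMM fitting. The NumPy sketch below uses EM for a diagonal-covariance GMM; the farthest-point initialization, the variance floor, the iteration count and the toy two-cluster data are assumptions for illustration, not details taken from the slides.

```python
import numpy as np

def train_diag_gmm(X, n_components, n_iter=50, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM (the UBM) to pooled dev frames X of shape (frames, D) with EM."""
    rng = np.random.default_rng(seed)
    n_frames, _ = X.shape
    # Farthest-point initialization of the means (a simple, hypothetical choice).
    idx = [int(rng.integers(n_frames))]
    for _ in range(1, n_components):
        d2 = np.min(((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2), axis=1)
        idx.append(int(np.argmax(d2)))
    means = X[idx].copy()
    variances = np.tile(X.var(axis=0), (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: frame-level component posteriors, computed in the log domain.
        diff = X[:, None, :] - means[None, :, :]
        log_p = (np.log(weights)[None, :]
                 - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
                 - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2))
        log_p -= log_p.max(axis=1, keepdims=True)
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and floored diagonal variances.
        n = gamma.sum(axis=0) + 1e-10
        weights = n / n_frames
        means = (gamma.T @ X) / n[:, None]
        variances = np.maximum((gamma.T @ X ** 2) / n[:, None] - means ** 2, var_floor)
    return weights, means, variances

rng = np.random.default_rng(0)
dev_frames = np.vstack([rng.normal(-5.0, 1.0, size=(500, 2)),
                        rng.normal(5.0, 1.0, size=(500, 2))])  # toy stand-in for dev-set MFCCs
ubm_w, ubm_mu, ubm_var = train_diag_gmm(dev_frames, n_components=2)
```

Real UBMs typically use many more components and far more frames; the toy setting only shows the mechanics.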


Different factor analysis approaches:

Eigenvoice FA:
$M = m + Vy$
with $M$ the adapted GMM mean (supervector), $m$ the UBM mean, $V$ the eigenvoice subspace and $y$ the hidden vector.

Joint Factor Analysis (JFA):
$M = m + Vy + Ux + Dz$
with $V$ the artist subspace, $U$ the song subspace and $Dz$ the residual.

I-vector FA:
$M = m + Ty$
with $T$ a low-rank matrix that models artist and song variability together, and $y$ the hidden vector (the i-vector).
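Given Baum–Welch statistics and a trained total-variability matrix $T$, the i-vector is usually taken as the posterior mean of $y$ in $M = m + Ty$, namely $w = (I + \sum_c N_c T_c^{\top}\Sigma_c^{-1}T_c)^{-1}\sum_c T_c^{\top}\Sigma_c^{-1}\tilde{F}_c$. The sketch below implements that standard formula, assuming diagonal UBM covariances and a $T$ that has already been estimated (training $T$ is not shown). The function name and the random toy inputs are hypothetical.

```python
import numpy as np

def extract_ivector(N, F_centered, T, ubm_variances):
    """Posterior mean of y in M = m + T y, given Baum-Welch statistics.

    N: (C,) zeroth-order stats; F_centered: (C, D) first-order stats centered on the UBM means;
    T: (C*D, R) total-variability matrix; ubm_variances: (C, D) diagonal UBM covariances.
    """
    C, D = F_centered.shape
    R = T.shape[1]
    T_c = T.reshape(C, D, R)                        # per-component blocks T_c (component-major rows)
    T_scaled = T_c / ubm_variances[:, :, None]      # Sigma_c^{-1} T_c for diagonal Sigma_c
    precision = np.eye(R)
    for c in range(C):
        precision += N[c] * T_c[c].T @ T_scaled[c]  # I + sum_c N_c T_c' Sigma_c^{-1} T_c
    linear = np.einsum("cdr,cd->r", T_scaled, F_centered)  # sum_c T_c' Sigma_c^{-1} F_c
    return np.linalg.solve(precision, linear)

rng = np.random.default_rng(1)
C, D, R = 8, 5, 3
T = rng.normal(scale=0.5, size=(C * D, R))
ubm_var = np.ones((C, D))
N = rng.uniform(10, 50, size=C)
F = rng.normal(size=(C, D))
w = extract_ivector(N, F, T, ubm_var)
print(w.shape)  # (3,)
```

Working per component avoids building the full $CD \times CD$ covariance, so the cost grows with the number of components rather than with the square of the supervector size.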


An example of an i-vector-based system

Extract features {MFCC} → [Compute statistics → Extract i-vectors] {i-vector extraction} → Post-processing {LDA/WCCN/…} → Classification {cosine score, …}
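The "compute statistics" step usually means collecting zeroth- and first-order Baum–Welch statistics of the frames with respect to the UBM; these are the inputs to i-vector extraction. The sketch assumes a diagonal-covariance UBM and takes the features (e.g. MFCCs) as an already computed array; random numbers stand in for real features, and the function name is hypothetical.

```python
import numpy as np

def baum_welch_stats(X, weights, means, variances):
    """Zeroth- and centered first-order statistics of frames X (T, D) w.r.t. a diagonal UBM."""
    diff = X[:, None, :] - means[None, :, :]                       # (T, C, D)
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
             - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2))
    log_p -= log_p.max(axis=1, keepdims=True)                      # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                     # frame posteriors gamma_t(c)
    N = gamma.sum(axis=0)                                          # (C,): sum_t gamma_t(c)
    F_centered = gamma.T @ X - N[:, None] * means                  # (C, D): sum_t gamma_t(c)(x_t - m_c)
    return N, F_centered

rng = np.random.default_rng(3)
C, D = 4, 3
ubm_w = np.full(C, 1.0 / C)
ubm_mu = rng.normal(size=(C, D))
ubm_var = np.ones((C, D))
frames = rng.normal(size=(100, D))   # stand-in for the MFCC frames of one recording
N, F_centered = baum_welch_stats(frames, ubm_w, ubm_mu, ubm_var)
```

The statistics already have a fixed size ($C$ and $C \times D$ values) regardless of the number of frames, and the zeroth-order counts always sum to the number of frames.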


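Within-Class Covariance Normalization

$\bar{w}_c = \frac{1}{n_c}\sum_{i=1}^{n_c} w_i^c$

$W = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{n_c}\sum_{i=1}^{n_c}\left(w_i^c - \bar{w}_c\right)\left(w_i^c - \bar{w}_c\right)^{\top}$

$W^{-1} = BB^{\top}$

where $\bar{w}_c$ is the averaged i-vector for class $c$, $w_i^c$ the $i$-th i-vector from class $c$, $n_c$ the number of i-vectors from class $c$, $C$ the number of classes, $W$ the within-class covariance matrix, and $B$ the WCCN projection matrix (obtained by Cholesky decomposition of $W^{-1}$), which maps an i-vector $w$ to $B^{\top}w$.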


Within-Class Covariance Normalization

Figure: i-vectors of two classes (Class A, Class B) before and after the WCCN projection; the within-class variability is reduced.
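The WCCN definitions translate into a few lines of NumPy. In this sketch $B$ is obtained with a Cholesky decomposition of $W^{-1}$, and i-vectors stored as rows are projected with `ivecs @ B` (the row form of $B^{\top}w$); the function names and the two-dimensional toy classes are hypothetical.

```python
import numpy as np

def within_class_cov(ivectors, labels):
    """W = (1/C) sum_c (1/n_c) sum_i (w_i^c - mean_c)(w_i^c - mean_c)^T."""
    classes = np.unique(labels)
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    for c in classes:
        x = ivectors[labels == c]
        centered = x - x.mean(axis=0)          # subtract the class-average i-vector
        W += centered.T @ centered / len(x)
    return W / len(classes)

def train_wccn(ivectors, labels):
    """Lower-triangular B with B B^T = W^{-1}."""
    return np.linalg.cholesky(np.linalg.inv(within_class_cov(ivectors, labels)))

rng = np.random.default_rng(2)
labels = np.repeat(np.array([0, 1, 2]), 50)
centers = np.array([[0.0, 0.0], [4.0, 1.0], [1.0, 5.0]])
scale = np.array([3.0, 0.5])                   # elongated within-class spread
ivecs = centers[labels] + rng.normal(size=(150, 2)) * scale
B = train_wccn(ivecs, labels)
projected = ivecs @ B                          # row-vector form of w -> B^T w
```

Since $B^{\top} W B = I$, the within-class covariance of the projected i-vectors is the identity: directions in which a class varies strongly are shrunk, which is what the figure illustrates.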


Some results


Tasks

• Audio scene classification
  – DCASE 2016 challenge
  – 15 different scenes (30-second recordings from train, tram, office, outdoor, etc.)
  – We won the challenge!

• Music similarity
  – GTZAN and 1517-Artists
  – Evaluated using genre labels

• Music artist recognition
  – Artist20 and MSD
  – Noise-robust MAR under 12 noise conditions (4 noise types at 3 SNR levels)

Audio Scene Classification Challenge (DCASE 2016 [1])

• Our approach: an i-vector/DNN hybrid (4 submissions among 49 participants)
  – 1st place: hybrid
  – 2nd place: i-vector
  – 5th place: i-vector
  – 14th place: DNN

[1] http://www.cs.tut.fi/sgn/arg/dcase2016/

Music Similarity [ISMIR 2015]

• UBM trained on the 1517-Artists dataset, tested on GTZAN

• I-vectors are extracted in an unsupervised manner

• Evaluated with genre labels

Music Artist Recognition [EUSIPCO 2015]

• Artist20 dataset
  – 20 artists
  – 1413 songs

Music Artist Recognition [DAFx 2016]

• MSD dataset
  – 50 artists
  – 5,000 songs

Figure: CDB-Net; Experiment 2 – raw i-vectors

Noise-Robust Music Artist Recognition [ISMIR 2016]

• Artist20 dataset
  – 4 different noises:
    • festival noise
    • humming noise
    • pink noise
    • PUB noise
  – 3 different SNR levels
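The slides do not say how the noisy conditions were generated. A common approach, assumed here, is to scale a noise recording so that the signal-to-noise ratio $10\log_{10}(P_{\text{signal}}/P_{\text{noise}})$ hits a target value and then add it to the clean audio. The function names and the SNR values below are hypothetical; the SNR levels actually used in the experiments are not given here.

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal` after scaling it to the requested SNR (in dB)."""
    noise = np.resize(noise, signal.shape)      # repeat or trim the noise to the signal length
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise

def measured_snr_db(clean, noisy):
    """SNR of `noisy` relative to `clean`, in dB."""
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))

rng = np.random.default_rng(4)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)   # 1 s toy tone at 16 kHz
noise = rng.normal(size=4000)                                # stand-in for a noise recording
noisy = {level: mix_at_snr(clean, noise, level) for level in (0, 10, 20)}  # hypothetical levels
```

Scaling the noise rather than the music keeps the clean signal level identical across conditions, so only the SNR changes between test sets.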


Conclusion


Conclusion:

• A small footprint obtained with factor analysis (FA)

• Useful for a variety of audio- and music-related tasks

• Robust against noise

• Useful as input features for neural networks

Thank you for your time