A small footprint for audio and music classification
Hamid Eghbal-zadeh
Outline
1. Introduction
2. I-Vector representation
3. Some results
4. Conclusion
INTRODUCTION
A small footprint for Audio and Music classification
Audio → Acoustic features → Front-end → Small footprint $[a_1, a_2, \ldots, a_n]$ → Classifier
(Stages: signal processing, then machine learning)
Front-end:
• Block-level features (Genre classification) [Seyerlehner, 2010]
• Adapted GMM means (Genre classification) [Charbuillet, 2011]
• Adapted RBM weights (Speaker verification) [Ghahabi, 2014]
• Factor Analysis (Artist recognition) [Eghbal-zadeh, 2015]
Dev: Dev db → Universal Background Model (UBM)
Train: Train db + UBM → UBM adaptation → Adapted UBM params → Classifier (train)
Test: Test db + UBM → UBM adaptation → Adapted UBM params → Classifier (test)
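The slides do not say which adaptation method is used. One common way to implement the UBM adaptation step is relevance-MAP adaptation of the GMM means, as in "adapted GMM means" front-ends. The sketch below rests on several assumptions: diagonal covariances, a relevance factor of 16 (a common default, not taken from the slides), and a toy two-component UBM. The function names are hypothetical.

```python
import numpy as np


def gmm_posteriors(X, weights, means, variances):
    """Posterior of each diagonal-covariance Gaussian for each frame.

    X: (T, D) frames; weights: (C,); means, variances: (C, D).
    Returns gamma with shape (T, C); each row sums to 1.
    """
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                       # (T, C, D)
    log_p = (np.log(weights)
             - 0.5 * (D * np.log(2 * np.pi) + np.log(variances).sum(axis=1))
             - 0.5 * (diff ** 2 / variances).sum(axis=2))           # (T, C)
    log_p -= log_p.max(axis=1, keepdims=True)                       # numerical stability
    gamma = np.exp(log_p)
    return gamma / gamma.sum(axis=1, keepdims=True)


def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of the UBM means to the frames of one recording."""
    gamma = gmm_posteriors(X, weights, means, variances)
    n = gamma.sum(axis=0)                                           # soft counts per component
    ex = (gamma.T @ X) / np.maximum(n, 1e-10)[:, None]              # per-component data mean
    alpha = (n / (n + relevance))[:, None]                          # data-vs-prior weight
    return alpha * ex + (1.0 - alpha) * means


# Toy 2-component UBM in 2-D (hypothetical values, not a trained model)
rng = np.random.default_rng(0)
ubm_w = np.array([0.5, 0.5])
ubm_mu = np.array([[0.0, 0.0], [10.0, 10.0]])
ubm_var = np.ones((2, 2))

song_frames = rng.normal(loc=[1.0, 1.0], scale=1.0, size=(200, 2))  # stand-in for one song
adapted = map_adapt_means(song_frames, ubm_w, ubm_mu, ubm_var)
supervector = adapted.ravel()   # fixed length C * D, independent of the number of frames
print(supervector.shape)        # (4,)
```

The component that explains the song's frames moves toward them. The component that is barely observed stays at its UBM mean. The stacked means therefore give a fixed-length vector per recording, whatever the recording's length.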
Dev: Dev db → Universal Background Model (UBM)
Train: Train db + UBM → UBM adaptation → Adapted UBM params → Factor analysis → Classifier (train)
Test: Test db + UBM → UBM adaptation → Adapted UBM params → Factor analysis → Classifier (test)
Effect of the Factor Analysis step
[Figure: Songs from 3 genres in the GTZAN dataset [Eghbal-zadeh, ISMIR2015]. Left: with Factor Analysis; right: without Factor Analysis.]
[Figure: Artist recognition performance on Artist20 with and without Factor Analysis (FA) [Eghbal-zadeh, Eusipco2015].]
Other benefits:
• Noise-Robust features [Eghbal-zadeh,ISMIR2016]
• Combined with Neural Nets [Eghbal-zadeh, DAFx2016]
• Successfully used in different tasks:
– Speaker verification
– Language recognition
– Artist recognition
– Music similarity
– Audio scene classification
Why apply Factor Analysis?
• The resulting factors provide an information-rich, fixed-length, low-dimensional representation
• They follow a single Gaussian distribution, so the properties of Gaussians can be exploited
• They can be scored easily, e.g., using cosine distance
• They are latent factors estimated by a Factor Analysis procedure and have good discriminative power
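Cosine scoring needs only a dot product and two norms. The sketch below makes two assumptions: each class is represented by a single model i-vector (for example, the mean i-vector of its training songs), and the class labels and vectors are hypothetical toy values.

```python
import numpy as np


def cosine_score(w_a, w_b):
    """Cosine similarity between two i-vectors (1 = same direction, -1 = opposite)."""
    return float(np.dot(w_a, w_b) / (np.linalg.norm(w_a) * np.linalg.norm(w_b)))


def classify_by_cosine(w_test, class_models):
    """Return the label whose model i-vector has the highest cosine score, plus all scores."""
    scores = {label: cosine_score(w_test, m) for label, m in class_models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical 3-dimensional model i-vectors for two classes
models = {
    "rock": np.array([1.0, 0.2, -0.1]),
    "jazz": np.array([-0.3, 1.0, 0.4]),
}
label, scores = classify_by_cosine(np.array([0.9, 0.1, 0.0]), models)
print(label)  # rock
```

The score depends only on direction, not on vector length, so no per-class score calibration is needed.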
I-VECTOR REPRESENTATION AS A SMALL FOOTPRINT
Dev: Dev db → UBM (GMM)
Train: Train db + UBM → Adapted GMM params (statistical representation) → Factor analysis → Classifier (train)
Test: Test db + UBM → Adapted GMM params (statistical representation) → Factor analysis → Classifier (test)
Different Factor Analysis approaches:
• Eigenvoice FA: $M = m + Vy$
– $M$: adapted GMM mean; $m$: UBM mean; $V$: eigenvoice subspace; $y$: hidden vector
• Joint Factor Analysis (JFA): $M = m + Vy + Ux + Dz$
– $V$: artist subspace; $U$: song subspace; $Dz$: residual
• I-vector FA: $M = m + Ty$
– $T$: a low-rank matrix that models both artist and song together; $y$: hidden vector (i-vector)
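Under the i-vector model $M = m + Ty$, the i-vector is the posterior mean of the hidden vector $y$ given a recording's zeroth-order statistics $N_c$ and UBM-centered first-order statistics $\tilde F_c$. That posterior mean is $w = L^{-1} \sum_c T_c^\top \Sigma_c^{-1} \tilde F_c$, with $L = I + \sum_c N_c T_c^\top \Sigma_c^{-1} T_c$.

The sketch below rests on these assumptions:
- $T$ is already trained (training $T$ by EM is not shown).
- The UBM covariances are diagonal.
- Supervectors are laid out component by component.
- The function name and sizes are hypothetical.

```python
import numpy as np


def extract_ivector(N, F_centered, T, ubm_var):
    """Posterior mean of y in M = m + T y, given one recording's Baum-Welch stats.

    N:          (C,)      zeroth-order statistics per UBM component
    F_centered: (C, D)    first-order statistics centered on the UBM means
    T:          (C*D, R)  total-variability matrix (assumed already trained)
    ubm_var:    (C, D)    diagonal UBM covariances
    """
    C, D = F_centered.shape
    R = T.shape[1]
    inv_var = 1.0 / ubm_var.reshape(C * D)               # diagonal of Sigma^{-1}, component-major
    n_rep = np.repeat(N, D)                              # N_c repeated for each of the D dims
    # Posterior precision: L = I + sum_c N_c T_c^T Sigma_c^{-1} T_c
    L = np.eye(R) + T.T @ ((n_rep * inv_var)[:, None] * T)
    # Linear term: sum_c T_c^T Sigma_c^{-1} F~_c
    b = T.T @ (inv_var * F_centered.reshape(C * D))
    return np.linalg.solve(L, b)                         # i-vector = L^{-1} b


# Hypothetical sizes: C=4 components, D=3 feature dims, R=2 i-vector dims
rng = np.random.default_rng(1)
C, D, R = 4, 3, 2
T = rng.normal(size=(C * D, R))
ubm_var = np.ones((C, D))
N = np.array([10.0, 5.0, 0.0, 20.0])
F_centered = rng.normal(size=(C, D))

w = extract_ivector(N, F_centered, T, ubm_var)
print(w.shape)  # (2,)
```

A component with $N_c = 0$ contributes nothing to either term. The $I$ in $L$ comes from the standard-normal prior on $y$, so a recording with little data is pulled toward the zero vector.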
An example of i-vector based systems
1. Extract features: MFCC
2. Compute statistics
3. Extract i-vectors
4. Post-processing: LDA/WCCN/…
5. Classification: cosine score, …
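The "compute statistics" step turns a variable number of feature frames into per-component Baum-Welch statistics under the UBM. The sketch below makes three assumptions: a diagonal-covariance UBM, random frames standing in for MFCCs (MFCC extraction is not shown), and hypothetical UBM parameters.

```python
import numpy as np


def baum_welch_stats(X, weights, means, variances):
    """Zeroth-order (N) and UBM-centered first-order (F~) statistics of frames X.

    X: (T, D) feature frames; weights: (C,); means, variances: (C, D) of a diagonal UBM.
    """
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                        # (T, C, D)
    log_p = (np.log(weights)
             - 0.5 * (D * np.log(2 * np.pi) + np.log(variances).sum(axis=1))
             - 0.5 * (diff ** 2 / variances).sum(axis=2))            # (T, C) joint log-likelihoods
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                        # frame-level posteriors
    N = gamma.sum(axis=0)                                            # N_c = sum_t gamma_t(c)
    F_centered = gamma.T @ X - N[:, None] * means                    # sum_t gamma_t(c) (x_t - m_c)
    return N, F_centered


# Random frames stand in for MFCCs; the UBM parameters are hypothetical
rng = np.random.default_rng(2)
frames = rng.normal(size=(300, 3))
ubm_w = np.full(4, 0.25)
ubm_mu = rng.normal(size=(4, 3))
ubm_var = np.ones((4, 3))

N, F_centered = baum_welch_stats(frames, ubm_w, ubm_mu, ubm_var)
print(N.sum())  # approximately 300, the number of frames
```

The statistics have a fixed size, $C$ and $C \times D$, whatever the recording's length. This is what lets the next step produce a fixed-length i-vector.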
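Within-Class Covariance Normalization
$W = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{n_c}\sum_{i=1}^{n_c}\left(w_i^c - \bar{w}_c\right)\left(w_i^c - \bar{w}_c\right)^\top, \qquad \bar{w}_c = \frac{1}{n_c}\sum_{i=1}^{n_c} w_i^c$
$W^{-1} = BB^\top, \qquad w \mapsto B^\top w$
where $\bar{w}_c$ is the averaged i-vector for class $c$, $w_i^c$ is the $i$-th i-vector from class $c$, $n_c$ is the number of i-vectors from class $c$, $C$ is the number of classes, $B$ is the WCCN projection matrix, and $W$ is the within-class covariance matrix.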
Within-Class Covariance Normalization
[Figure: i-vectors of Class A and Class B under the WCCN projection.]
The within-class variability is reduced.
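WCCN can be computed in a few lines. First estimate the class-averaged within-class covariance $W$ of the training i-vectors. Then take the Cholesky factor $B$ of $W^{-1}$ and project each i-vector as $B^\top w$. The sketch below uses two hypothetical, synthetic classes; the function names are assumptions.

```python
import numpy as np


def within_class_cov(ivectors, labels):
    """W = (1/C) sum_c (1/n_c) sum_i (w_i^c - mean_c)(w_i^c - mean_c)^T."""
    classes = np.unique(labels)
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    for c in classes:
        members = ivectors[labels == c]
        centered = members - members.mean(axis=0)       # w_i^c - mean_c
        W += centered.T @ centered / len(members)       # (1/n_c) sum_i (...)(...)^T
    return W / len(classes)                             # (1/C) sum_c


def wccn_projection(ivectors, labels):
    """Lower-triangular B with B B^T = W^{-1} (Cholesky factor)."""
    return np.linalg.cholesky(np.linalg.inv(within_class_cov(ivectors, labels)))


# Two hypothetical classes with large within-class spread along the first axis
rng = np.random.default_rng(3)
class_a = rng.normal(size=(500, 2)) * [3.0, 0.5]
class_b = rng.normal(size=(500, 2)) * [3.0, 0.5] + [0.0, 4.0]
X = np.vstack([class_a, class_b])
y = np.array([0] * 500 + [1] * 500)

B = wccn_projection(X, y)
X_wccn = X @ B   # row-vector form of w -> B^T w
print(np.diag(within_class_cov(X, y)), np.diag(within_class_cov(X_wccn, y)))
```

On the training i-vectors, the projected within-class covariance is exactly the identity. The direction with large within-class spread (the first axis here) is therefore shrunk relative to the others before cosine scoring.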
Some results
Tasks
• Audio Scene Classification
– DCASE-2016 challenge
– 15 different scenes (30-second recordings from, e.g., train, tram, office, outdoor)
– We won the challenge!
• Music Similarity
– GTZAN and 1517-Artists
– Evaluated using genre labels
• Music Artist Recognition
– Artist20 and MSD
– Noise-robust MAR under 12 noise conditions (4 noise types × 3 SNR levels)
Audio Scene Classification Challenge (DCASE-2016 [1])
• Our approach: an i-vector/DNN hybrid (4 submissions among 49 participants)
– 1st place: hybrid
– 2nd place: i-vector
– 5th place: i-vector
– 14th place: DNN
[1] http://www.cs.tut.fi/sgn/arg/dcase2016/
Music Similarity [ISMIR-2015]
• UBM trained on the 1517-Artists db, tested on GTZAN
• I-vectors are extracted in an unsupervised manner
• Evaluated with genre labels
Music Artist Recognition [Eusipco-2015]
• Artist20 db
– 20 artists
– 1413 songs
Music Artist Recognition [DAFx-2016]
• MSD db
– 50 artists
– 5,000 songs
[Figure: CDB-Net; Experiment 2 – Raw i-vectors]
Noise-Robust Music Artist Recognition [ISMIR-2016]
• Artist20 db
– 4 different noises:
• festival noise
• humming noise
• pink noise
• PUB noise
– 3 different SNR levels
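Noisy test conditions at a given SNR are usually built by scaling a noise recording so that the signal-to-noise power ratio hits a target, then adding it to the clean audio. The slide does not list the SNR values or the exact mixing procedure. This is a minimal sketch assuming power-based scaling, with a sine tone and white noise as stand-ins. The 10 dB value is purely illustrative, and the function name is hypothetical.

```python
import numpy as np


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that 10*log10(P_signal / P_noise) equals snr_db, then add it."""
    noise = np.resize(noise, signal.shape)              # loop/trim noise to the signal length
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise


rng = np.random.default_rng(4)
sr = 22050
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 440 * t)       # stand-in for a song excerpt
noise = rng.normal(size=sr // 2)                # stand-in for a recorded noise type
noisy = mix_at_snr(clean, noise, snr_db=10.0)

measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(measured, 6))  # 10.0
```

Repeating the same noise clip with `np.resize` keeps the sketch short. Real experiments may instead pick a random noise segment of the right length.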
Conclusion
Conclusion:
• A small footprint using FA
• Useful for different audio and music related tasks
• Robustness against noise
• Useful as Neural Net features
Thank you for your time