A small footprint for audio and music classification

Hamid Eghbal-zadeh

Outline

1. Introduction

2. I-Vector representation

3. Some results

4. Conclusion

INTRODUCTION

A small footprint for Audio and Music classification

[Figure: processing pipeline. Audio → Acoustic features → Front-end → Small footprint → Classifier. The diagram also shows a vector (a_1, a_2, ..., a_n).]

o Front-end:
• Block-level features (Genre classification) [Seyerlehner, 2010]
• Adapted GMM means (Genre classification) [Charbuillet, 2011]
• Adapted RBM weights (Speaker verification) [Ghahabi, 2014]
• Factor Analysis (Artist recognition) [Eghbal-zadeh, 2015]

[Figure labels: Signal processing; Machine learning]

[Figure: machine learning stage built around a Universal Background Model]

Train:
• Dev db → Universal Background Model (UBM)
• Train db → UBM adaptation → Adapted UBM params → Classifier

• Test db → Classifier
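The "UBM adaptation" box is not spelled out on the slide. One common choice, assumed here rather than taken from the slides, is mean-only MAP adaptation of a GMM-UBM: each component mean is pulled toward the data in proportion to how many frames that component explains. The function name, the argument layout, and the relevance factor of 16 are illustrative assumptions.

```python
import numpy as np

def map_adapt_means(ubm_means, N, S, relevance=16.0):
    """Mean-only MAP adaptation (a sketch, not the author's code).

    ubm_means: (C, D) UBM component means
    N:         (C,)   soft frame counts per component for one recording
    S:         (C, D) posterior-weighted sums of the frames per component
    relevance: hypothetical relevance factor controlling the prior strength
    """
    alpha = N / (N + relevance)                       # data weight per component
    data_means = S / np.maximum(N, 1e-10)[:, None]    # avoid division by zero
    return alpha[:, None] * data_means + (1.0 - alpha)[:, None] * ubm_means

# Toy example: 2 components, 3-dimensional features
ubm_means = np.zeros((2, 3))
N = np.array([0.0, 1e9])                              # component 0 saw no data
S = np.array([[0.0, 0.0, 0.0], [1e9, 2e9, 3e9]])
adapted = map_adapt_means(ubm_means, N, S)
print(adapted)
```

A component with no data keeps its UBM mean, while a heavily occupied component moves almost entirely to the data mean. Stacking the adapted means gives the kind of fixed-length "adapted UBM params" vector the diagram feeds to the classifier.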

[Figure: the same diagram (Signal processing and Machine learning stages; Dev db → UBM; Train db → UBM adaptation → Adapted UBM params → Classifier; Test db → Classifier) with a Factor analysis block added.]

Effect of the Factor Analysis step

[Figure: songs from 3 genres in the GTZAN dataset [Eghbal-zadeh, ISMIR2015]. Right: without Factor Analysis. Left: with Factor Analysis.]

[Figure: artist recognition performance on Artist20 without FA and with FA [Eghbal-zadeh, Eusipco2015].]

Other benefits:

• Noise-Robust features [Eghbal-zadeh,ISMIR2016]

• Combined with Neural Nets [Eghbal-zadeh, DAFx2016]

• Successfully used in different tasks:
– Speaker verification
– Language recognition
– Artist recognition
– Music similarity
– Audio scene classification

Why apply Factor Analysis?

• The factors it produces provide an information-rich, fixed-length, low-dimensional representation

• They follow a single-Gaussian distribution
– We can use the properties of Gaussians

• They can be easily scored
– Using cosine distance

• They are latent factors estimated by a Factor Analysis procedure and have good discriminative power
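As a rough illustration of the cosine scoring mentioned above, the sketch below compares a test vector with one model vector per class and picks the best match. The function names, the toy 3-dimensional vectors, and the "one averaged vector per class" setup are assumptions made for the example; scoring uses cosine similarity, which is one minus the cosine distance.

```python
import numpy as np

def cosine_score(w1, w2):
    """Cosine similarity between two vectors (1 means same direction)."""
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))

def classify(test_vector, class_models):
    """Return the class whose model vector scores highest against the test vector."""
    return max(class_models, key=lambda c: cosine_score(test_vector, class_models[c]))

# Toy vectors standing in for real i-vectors (hypothetical class names)
models = {
    "artist_a": np.array([1.0, 0.0, 0.0]),
    "artist_b": np.array([0.0, 1.0, 0.0]),
}
predicted = classify(np.array([0.9, 0.1, 0.0]), models)
print(predicted)
```

Because the score depends only on direction, it ignores vector length, which is part of why it is cheap to compute on fixed-length representations.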

I-VECTOR REPRESENTATION AS A SMALL FOOTPRINT

[Figure: the pipeline with a GMM as UBM; Signal processing and Machine learning stages]

Train:
• Dev db → UBM (GMM)
• Train db → UBM → Adapted GMM params (statistical representation) → Factor analysis → Classifier

• Test db → UBM → Adapted GMM params (statistical representation) → Factor analysis → Classifier

Different Factor Analysis approaches:

Eigenvoice FA: M = m + Vy
• M: adapted GMM mean
• m: UBM mean
• V: eigenvoice subspace
• y: hidden vector

Joint Factor Analysis (JFA): M = m + Vy + Ux + Dz
• M: adapted GMM mean
• m: UBM mean
• V: artist subspace
• U: song subspace
• Dz: residual

I-vector FA: M = m + Ty
• M: adapted GMM mean
• m: UBM mean
• T: low-rank matrix that models both artist and song together
• y: hidden vector (i-vector)
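To make the i-vector model M = m + T y more concrete, here is a minimal sketch of the usual point estimate of y for one recording: the posterior mean given the recording's zero-order and centered first-order statistics under a diagonal-covariance UBM. The slides do not show this step, so the function name, argument layout, and toy sizes are assumptions; training of T itself is not covered.

```python
import numpy as np

def extract_ivector(N, F, T, sigma):
    """Posterior mean of y in M = m + T y (a sketch under stated assumptions).

    N:     (C,)     zero-order statistics (soft counts per UBM component)
    F:     (C, D)   first-order statistics centered on the UBM means
    T:     (C*D, R) low-rank matrix, rows ordered component by component
    sigma: (C, D)   diagonal UBM covariances
    """
    C, D = F.shape
    R = T.shape[1]
    T_scaled = T / sigma.reshape(C * D, 1)        # Sigma^{-1} T
    N_expanded = np.repeat(N, D)                  # one count per supervector row
    # Posterior precision: I + T^T Sigma^{-1} N T
    precision = np.eye(R) + T_scaled.T @ (N_expanded[:, None] * T)
    # Posterior mean: precision^{-1} T^T Sigma^{-1} F
    return np.linalg.solve(precision, T_scaled.T @ F.reshape(C * D))

# Toy check: noise-free statistics generated from a known hidden vector
rng = np.random.default_rng(0)
C, D, R = 8, 4, 3
T = rng.standard_normal((C * D, R))
sigma = np.ones((C, D))
y_true = rng.standard_normal(R)
N = np.full(C, 1e6)                               # many frames per component
F = N[:, None] * (T @ y_true).reshape(C, D)
y_hat = extract_ivector(N, F, T, sigma)
print(y_true, y_hat)
```

With many frames the estimate approaches the generating vector; with few frames the identity term in the precision shrinks it toward zero, reflecting the standard-normal prior on y.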

An example of i-vector based systems

[Figure: processing chain]
• Extract features {MFCC}
• Compute statistics, Extract i-vectors {I-vector extraction}
• Post-processing {LDA/WCCN/…}
• Classification {Cosine score, …}
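The "Compute statistics" step can be sketched as follows, assuming a diagonal-covariance GMM as UBM and frame-level features such as MFCCs. The code computes each frame's posterior over components and accumulates zero-order and centered first-order statistics. All names and toy values are illustrative assumptions, not the author's implementation.

```python
import numpy as np

def baum_welch_stats(X, weights, means, variances):
    """Zero-order and centered first-order statistics of frames X under a diagonal GMM.

    X: (n_frames, D); weights: (C,); means, variances: (C, D)
    """
    diff = X[:, None, :] - means[None, :, :]                      # (n, C, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss                        # (n, C)
    log_post -= log_post.max(axis=1, keepdims=True)               # numerical stability
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)                     # frame posteriors
    N = gamma.sum(axis=0)                                         # (C,) soft counts
    F = gamma.T @ X - N[:, None] * means                          # (C, D) centered sums
    return N, F

# Toy data: 200 random 3-dimensional frames and a 2-component GMM
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
weights = np.array([0.5, 0.5])
means = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
variances = np.ones((2, 3))
N, F = baum_welch_stats(X, weights, means, variances)
print(N, F.shape)
```

Whatever the recording length, the output has a fixed size (C counts and a C-by-D matrix), which is what lets the following steps work with fixed-length inputs.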

Within-Class Covariance Normalization

[Equation labels:]
• Within-class covariance matrix
• WCCN projection matrix
• Averaged i-vectors for class c
• i-th i-vector from class c
• Number of i-vectors from class c
• Number of classes
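The equations these labels belong to did not survive extraction. The standard WCCN definition, which uses exactly these quantities, is sketched below. The symbol names are assumptions: S_w is the within-class covariance matrix, B the WCCN projection matrix, w_i^c the i-th i-vector from class c, \bar{w}_c the averaged i-vector for class c, n_c the number of i-vectors from class c, and C the number of classes.

```latex
S_w = \frac{1}{C} \sum_{c=1}^{C} \frac{1}{n_c} \sum_{i=1}^{n_c}
      \left(w_i^c - \bar{w}_c\right)\left(w_i^c - \bar{w}_c\right)^{\top},
\qquad
\bar{w}_c = \frac{1}{n_c} \sum_{i=1}^{n_c} w_i^c,
\qquad
S_w^{-1} = B B^{\top}
```

B is typically obtained by a Cholesky decomposition of S_w^{-1}, and an i-vector w is projected as B^{\top} w.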

Within-Class Covariance Normalization

[Figure: i-vectors of Class A and Class B before and after the WCCN projection. The within-class variability is reduced.]
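A minimal NumPy sketch of WCCN, assuming the standard formulation (average the per-class covariances, then take the Cholesky factor of the inverse), can make the effect concrete. Function names and the toy data are assumptions for illustration only.

```python
import numpy as np

def within_class_covariance(vectors, labels):
    """Average of the per-class covariance matrices."""
    classes = np.unique(labels)
    dim = vectors.shape[1]
    S_w = np.zeros((dim, dim))
    for c in classes:
        class_vectors = vectors[labels == c]
        centered = class_vectors - class_vectors.mean(axis=0)
        S_w += centered.T @ centered / len(class_vectors)
    return S_w / len(classes)

def wccn_projection(vectors, labels):
    """Projection matrix B with B B^T = S_w^{-1} (lower-triangular Cholesky factor)."""
    S_w = within_class_covariance(vectors, labels)
    return np.linalg.cholesky(np.linalg.inv(S_w))

# Toy data: 3 classes, 50 four-dimensional vectors each, unequal spread per dimension
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 50)
vectors = rng.standard_normal((150, 4)) * np.array([1.0, 2.0, 0.5, 3.0]) + labels[:, None]
B = wccn_projection(vectors, labels)
projected = vectors @ B                     # row-wise B^T w
S_after = within_class_covariance(projected, labels)
print(np.round(S_after, 3))
```

After projection the within-class covariance of the training vectors becomes the identity matrix, so directions with large within-class spread are scaled down before cosine scoring.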

Some results

• Audio Scene Classification

– DCASE-2016 challenge

– 15 different scenes (30-second recordings from train, tram, office, outdoor, etc.)

– We won the challenge!

• Music Similarity

– GTZAN and 1517Artists

– Eval using genre

• Music Artist Recognition

– Artist20 and MSD

– Noise-robust MAR using 12 different kinds and levels of noise

Audio Scene Classification Challenge (DCASE-2016 [1])

• Our approach: an i-vector/DNN hybrid (4 submissions among 49 participants)
– 1st place: hybrid
– 2nd place: i-vector
– 5th place: i-vector
– 14th place: DNN

[1] http://www.cs.tut.fi/sgn/arg/dcase2016/

Music Similarity [ISMIR-2015]

• UBM trained on 1517Artists db, tested on GTZAN

• I-vectors are extracted unsupervised

• Evaluated with genre labels

Music Artist Recognition [Eusipco-2015]

• Artist20 db
– 20 artists
– 1413 songs

Music Artist Recognition [DAFx-2016]

• MSD db
– 50 artists
– 5,000 songs

[Figure labels: CDB-Net; Experiment 2 – Raw i-vectors]

Noise-Robust Music Artist Recognition [ISMIR-2016]

• Artist20 db
– 4 different noises:
  • festival noise
  • humming noise
  • pink noise
  • PUB noise
– 3 different SNR levels
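For readers unfamiliar with SNR-controlled noise conditions, the sketch below shows one usual way to mix a noise recording into a signal at a target signal-to-noise ratio. The slides do not say how the noisy data were produced or which SNR values were used, so the function, the random stand-in signals, and the 10 dB value are purely illustrative assumptions.

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Add noise to signal so that the result has the requested SNR in dB.

    Assumes noise is at least as long as signal; extra noise samples are dropped.
    """
    noise = noise[:len(signal)]
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale noise so that 10 * log10(signal_power / scaled_noise_power) == snr_db
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise

# Random arrays standing in for a song excerpt and a noise recording
rng = np.random.default_rng(0)
song = rng.standard_normal(1000)
noise = rng.standard_normal(2000)
noisy = mix_at_snr(song, noise, snr_db=10.0)
measured_snr = 10.0 * np.log10(np.mean(song ** 2) / np.mean((noisy - song) ** 2))
print(round(measured_snr, 3))
```

Repeating this for each noise type and each SNR level yields one noisy copy of the test set per condition.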

Conclusion

• A small footprint using FA

• Useful for different audio and music related tasks

• Robustness against noise

• Useful as Neural Net features

Thank you for your time
