Speaker Recognition using Gaussian Mixture Model

GMM: Gaussian mixture models

04/10/2023

Saurab Dulal

IOE, Pulchowk Campus

Introduction to GMM

• Gaussian: “Gaussian is a characteristic symmetric "bell curve" shape that quickly falls off towards 0 (practically)”

• Mixture Model: “mixture model is a probabilistic model which assumes the underlying data to belong to a mixture distribution”

Introduction to GMM

• Mathematical Description of GMM

p(x) = w1 p1(x) + w2 p2(x) + w3 p3(x) + … + wn pn(x)

where p(x) = mixture density

w1, w2, …, wn = mixture weights or mixture coefficients (non-negative, summing to 1)

pi(x) = component density functions

Fig: Image showing best-fit Gaussian

Introduction to GMM

“The most common mixture distribution is the Gaussian (Normal) density function, in which each of the mixture components are Gaussian distributions, each with their own mean and variance parameters.”

p(x) = w1 N(x | µ1, Σ1) + w2 N(x | µ2, Σ2) + … + wn N(x | µn, Σn)

The µi are the means and the Σi are the covariance matrices of the individual components (probability density functions).
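The weighted sum above can be checked numerically. The following is a minimal sketch for the one-dimensional case, assuming NumPy is available; the function names and the parameter values are made up for illustration and are not taken from the slides.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution N(x | mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mixture_pdf(x, weights, means, variances):
    """p(x) = sum_i w_i N(x | mu_i, sigma_i^2), evaluated elementwise on x."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for w, m, v in zip(weights, means, variances):
        total += w * gaussian_pdf(x, m, v)
    return total

# Illustrative (made-up) parameters: two components, weights sum to 1.
weights = [0.4, 0.6]
means = [0.0, 5.0]
variances = [1.0, 2.0]

grid = np.linspace(-10.0, 20.0, 3001)
density = mixture_pdf(grid, weights, means, variances)

# Riemann-sum check: a valid density should integrate to roughly 1.
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 3))
```

Because the weights sum to 1 and each component integrates to 1, the mixture is itself a valid density, which is what the area check is meant to illustrate.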

[Figure: two Gaussian components, G1 with weight w1 and G2 with weight w2, plotted as p(x) for x from -5 to 10. Panels: Component 1 and Component 2; Component Models; Mixture Model.]

GMM for Speaker Recognition

Motivation

• Interpretation that Gaussian components represent some general speaker-dependent spectral shapes

• Capability of Gaussian mixtures to model arbitrary densities

Description of SR using GMM

• Speech Analysis
• Model Description
• Model Interpretations
• Maximum Likelihood Parameters Estimation
• Speaker Identification

Speech Analysis

• Linear predictive coding (LPC)
• Mel-scale filter-bank (to reduce noise)

Analysis ends with the generation of cepstrum coefficients x1', x2', x3', …, xn'.

A cepstrum is the result of taking the inverse Fourier transform (IFT) of the logarithm of the estimated spectrum of a signal.

Cosine transform
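The cepstrum definition can be sketched directly with an FFT. This is only the plain definition (inverse transform of the log magnitude spectrum), assuming NumPy; it is not the LPC or mel filter-bank front end listed on the slide, and the function name and the `eps` guard are assumptions added for illustration.

```python
import numpy as np

def real_cepstrum(frame, eps=1e-12):
    """Inverse FFT of the log magnitude spectrum of one signal frame."""
    spectrum = np.fft.fft(frame)
    log_magnitude = np.log(np.abs(spectrum) + eps)  # eps avoids log(0)
    return np.fft.ifft(log_magnitude).real

# Hypothetical test frame: a unit impulse has a flat magnitude spectrum,
# so its log spectrum and its cepstrum are numerically zero everywhere.
impulse = np.zeros(256)
impulse[0] = 1.0
cepstrum = real_cepstrum(impulse)
print(cepstrum.shape, float(np.max(np.abs(cepstrum))))
```

In practice a short-time frame of speech would be windowed first, and only the first few coefficients would be kept as the feature vector.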

2000/05/03 11

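Model Description

Gaussian Mixture Density

$$p(\vec{x} \mid \lambda) = \sum_{i=1}^{M} p_i \, b_i(\vec{x})$$

where $\vec{x}$ is a D-dimensional random vector.

Component Density

$$b_i(\vec{x}) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\vec{x} - \vec{\mu}_i)' \, \Sigma_i^{-1} (\vec{x} - \vec{\mu}_i) \right\}$$

Speaker Model

$$\lambda = \{ p_i, \vec{\mu}_i, \Sigma_i \}, \quad i = 1, \ldots, M$$

Covariance matrix: nodal, grand, or global; nodal, diagonal (used here)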
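For a nodal, diagonal covariance model, the mixture density can be evaluated in the log domain to avoid underflow. The sketch below assumes NumPy and stores each diagonal covariance as a vector of variances; the function names and the example parameters are illustrative assumptions, not part of the slides.

```python
import numpy as np

def log_component_densities(X, means, variances):
    """log b_i(x_t) for diagonal covariances; returns shape (T, M)."""
    X = np.atleast_2d(X)                      # (T, D)
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]  # (T, M, D)
    mahalanobis = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    # For a diagonal covariance, log|Sigma_i| is the sum of the log variances.
    log_norm = -0.5 * (D * np.log(2.0 * np.pi) + np.sum(np.log(variances), axis=1))
    return log_norm[None, :] - 0.5 * mahalanobis

def gmm_log_density(X, weights, means, variances):
    """log p(x_t | lambda) = log sum_i p_i b_i(x_t), via log-sum-exp."""
    log_terms = np.log(weights)[None, :] + log_component_densities(X, means, variances)
    peak = np.max(log_terms, axis=1, keepdims=True)
    return peak[:, 0] + np.log(np.sum(np.exp(log_terms - peak), axis=1))

# Made-up model with M = 2 components in D = 2 dimensions.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.array([[1.0, 1.0], [1.0, 1.0]])
print(gmm_log_density(np.array([[0.0, 0.0]]), weights, means, variances))
```

Subtracting the per-frame maximum before exponentiating is a common numerical safeguard; it does not change the value of the density.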

Choice of Covariance Matrix

• Nodal covariance: one covariance matrix per Gaussian component

• Grand covariance: one covariance matrix for all Gaussian components in a speaker model

• Global covariance: a single covariance matrix shared by all speaker models

Model Interpretation

• Intuitive notion: acoustic classes (vowels, nasals, fricatives) reflect some general speaker-dependent vocal tract configurations that are useful for characterizing speaker identity

• GMMs have the ability to form smooth approximations to arbitrarily shaped densities

• A GMM not only gives a smooth approximation but also captures the multimodal nature of the densities


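ML (Maximum Likelihood) Parameter Estimation

Steps:

1. Begin with an initial model $\lambda$.

2. Estimate a new model $\bar{\lambda}$ such that the likelihood of the mixture density does not decrease:

$$p(X \mid \bar{\lambda}) \ge p(X \mid \lambda)$$

3. Repeat step 2 until a convergence threshold is reached.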


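(Mixture Weights)

$$\bar{p}_i = \frac{1}{T} \sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda)$$

(Means)

$$\bar{\mu}_i = \frac{\sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda) \, x_t}{\sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda)}$$

(Variances)

$$\bar{\sigma}_i^2 = \frac{\sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda) \, x_t^2}{\sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda)} - \bar{\mu}_i^2$$

where the a posteriori probability of component i is the weighted component density divided by the mixture density:

$$p(i \mid \vec{x}_t, \lambda) = \frac{p_i \, b_i(\vec{x}_t)}{\sum_{k=1}^{M} p_k \, b_k(\vec{x}_t)}$$

Here $\sigma_i^2$, $x_t$, and $\mu_i$ refer to arbitrary elements of the vectors $\vec{\sigma}_i^2$, $\vec{x}_t$, and $\vec{\mu}_i$, respectively.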
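The iterative estimation of mixture weights, means and variances can be sketched as a small EM loop for a diagonal-covariance GMM. This is a minimal illustration assuming NumPy; the function names, the random initialization, the variance floor and the stopping tolerance are all assumptions made for the example, and the data is synthetic rather than real speech features.

```python
import numpy as np

def log_joint(X, weights, means, variances):
    """log(p_i * b_i(x_t)) for every frame t and component i; shape (T, M)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    quad = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_norm = -0.5 * (D * np.log(2.0 * np.pi) + np.sum(np.log(variances), axis=1))
    return np.log(weights)[None, :] + log_norm[None, :] - 0.5 * quad

def em_step(X, weights, means, variances, var_floor=1e-6):
    """One EM iteration; returns updated parameters and log p(X | old model)."""
    lj = log_joint(X, weights, means, variances)
    peak = lj.max(axis=1, keepdims=True)
    log_px = peak[:, 0] + np.log(np.exp(lj - peak).sum(axis=1))  # log p(x_t | lambda)
    post = np.exp(lj - log_px[:, None])                           # p(i | x_t, lambda)
    counts = post.sum(axis=0)                                     # sum over frames t
    new_weights = counts / X.shape[0]
    new_means = (post.T @ X) / counts[:, None]
    new_vars = (post.T @ (X ** 2)) / counts[:, None] - new_means ** 2
    new_vars = np.maximum(new_vars, var_floor)  # variance floor (an added safeguard)
    return new_weights, new_means, new_vars, float(log_px.sum())

def fit_gmm(X, M, max_iter=100, tol=1e-6, seed=0):
    """Run EM from a random initial model until the log-likelihood gain is below tol."""
    rng = np.random.default_rng(seed)
    weights = np.full(M, 1.0 / M)
    means = X[rng.choice(X.shape[0], size=M, replace=False)].copy()
    variances = np.tile(X.var(axis=0), (M, 1))
    history = []
    for _ in range(max_iter):
        weights, means, variances, ll = em_step(X, weights, means, variances)
        if history and ll - history[-1] < tol:
            history.append(ll)
            break
        history.append(ll)
    return weights, means, variances, history

# Synthetic two-cluster data (not real speech features).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, size=(200, 2)),
               rng.normal(2.0, 0.5, size=(200, 2))])
weights, means, variances, history = fit_gmm(X, M=2)
print(len(history), history[-1])
```

The recorded log-likelihood values should be non-decreasing from one iteration to the next, which is the property that step 2 of the procedure relies on.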

[Figure sequence: EM fitting on the anemia patients and controls data (axis label: Red Blood Cell Volume). Panels: the data; EM iterations 1, 3, 5, 10, 15 and 25; the log-likelihood as a function of EM iterations; and the anemia data with labels (Anemia Group, Control Group).]


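Speaker Identification

A group of speakers S = {1, 2, …, S} is represented by GMMs λ1, λ2, …, λS. The objective is to find the speaker model which has the maximum a posteriori probability for a given observation sequence X:

$$\hat{S} = \arg\max_{1 \le k \le S} \Pr(\lambda_k \mid X) = \arg\max_{1 \le k \le S} \frac{p(X \mid \lambda_k) \Pr(\lambda_k)}{p(X)}$$

Assuming equally likely speakers, and noting that p(X) is the same for all speaker models, this reduces to

$$\hat{S} = \arg\max_{1 \le k \le S} p(X \mid \lambda_k)$$

Taking the log and assuming independent observations gives

$$\hat{S} = \arg\max_{1 \le k \le S} \sum_{t=1}^{T} \log p(\vec{x}_t \mid \lambda_k)$$

in which

$$p(\vec{x}_t \mid \lambda_k) = \sum_{i=1}^{M} p_i \, b_i(\vec{x}_t)$$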
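The decision rule amounts to scoring the observation sequence against every speaker model and picking the highest total log-likelihood. The sketch below assumes NumPy, diagonal-covariance models stored as (weights, means, variances) tuples, and equal speaker priors; the names, the two speaker models and the synthetic utterance are all made up for illustration. Note that the returned index is 0-based, whereas the slide numbers speakers from 1.

```python
import numpy as np

def frame_log_likelihoods(X, model):
    """log p(x_t | lambda_k) for a diagonal-covariance GMM; shape (T,)."""
    weights, means, variances = model
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    quad = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_norm = -0.5 * (D * np.log(2.0 * np.pi) + np.sum(np.log(variances), axis=1))
    log_terms = np.log(weights)[None, :] + log_norm[None, :] - 0.5 * quad
    peak = log_terms.max(axis=1, keepdims=True)
    return peak[:, 0] + np.log(np.exp(log_terms - peak).sum(axis=1))

def identify_speaker(X, models):
    """Return the index k maximizing sum_t log p(x_t | lambda_k), plus all scores."""
    scores = np.array([frame_log_likelihoods(X, m).sum() for m in models])
    return int(np.argmax(scores)), scores

# Two made-up speaker models (weights, means, variances), D = 2, M = 2 each.
speaker_a = (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [1.0, 1.0]]), np.ones((2, 2)))
speaker_b = (np.array([0.5, 0.5]), np.array([[4.0, 4.0], [5.0, 5.0]]), np.ones((2, 2)))

rng = np.random.default_rng(0)
utterance = rng.normal(4.5, 1.0, size=(50, 2))  # synthetic frames near speaker_b
best, scores = identify_speaker(utterance, [speaker_a, speaker_b])
print(best, scores)
```

Summing log-likelihoods over frames, rather than multiplying likelihoods, keeps the scores in a numerically safe range for long utterances.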

References

D. A. Reynolds and R. C. Rose, “Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models”, IEEE Trans. on Speech and Audio Processing, vol. 3, no. 1, pp. 72-83, January 1995.

• http://en.wikipedia.org/wiki/Probability_density_function
• http://crsouza.blogspot.com/2010/10/gaussian-mixture-models-and-expectation.html
• https://www.ll.mit.edu/mission/communications/ist/publications/0802_Reynolds_Biometrics-GMM.pdf
• http://statweb.stanford.edu/~tibs/stat315a/LECTURES/em.pdf
• http://eprints.pascal-network.org/archive/00008291/01/SoftAssignReconstr_ICIP2011.pdf
• http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html
