Audio Source Separation And ICA by Mike Davies & Nikolaos Mitianoudis Digital Signal Processing Lab Queen Mary, University of London


Page 1

Audio Source Separation and ICA

by

Mike Davies & Nikolaos Mitianoudis
Digital Signal Processing Lab
Queen Mary, University of London

Page 2

Outline of Talk

Introduction:
- Audio Source Separation: Beamforming, ICA & CASA

ICA for source separation
- dealing with convolutive mixtures

A Frequency Domain Framework
- unmixing in the frequency domain
- source modelling & the permutation problem

Beamforming for Source Separation
- using geometric information

A Reverberant BSS Example:
- ICA as a beamformer
- Real reverberant transfer functions
- Using beamforming with ICA
- Moving sources

Page 3

Cocktail Party Problem

Page 4

Computational Cocktail Party Problem

Page 5

Audio Source Separation Problem

Page 6

Computational tools for audio source separation

• Computational Auditory Scene Analysis (CASA): typically extracts one source from a single channel of audio using heuristic psychological grouping rules (pattern matching).

• Blind Source Separation (BSS, aka ICA): uses spatial diversity based on source independence. Extensions include convolutive mixing and overcomplete mixtures.

• Beamforming: uses spatial diversity based on the known geometry of the microphone array and the directions of arrival (DOA) of the source signals.

Page 7

Review of ICA

The ICA model:

$$x(t) = A\,s(t), \qquad x, s \in \mathbb{R}^n$$

Aim: estimate s(t) from x(t) (mixing matrix A unknown).

If the number of sources equals the number of observations, we can estimate s(t) by estimating $W = A^{-1}$, giving $\hat{s} = Wx$. A is identifiable (up to scaling and permutation of the sources) if we assume the sources are statistically independent,

$$P(s(t)) = \prod_i P(s_i(t)),$$

and non-Gaussian.

Page 8

ICA for Audio Source Separation

Audio observations are linear convolutions (plus additive noise):

$$x_i(t) = \sum_{j=1}^{m} \sum_{k=1}^{q} a_{ij}(k)\, s_j(t-k) + e_i(t)$$

The unmixing filter uses an FIR approximation (complete case):

$$\hat{s}_j(t) = \sum_{i=1}^{n} \sum_{k=1}^{p} w_{ij}(k)\, x_i(t-k)$$
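The convolutive mixing model on this slide can be illustrated with a minimal NumPy sketch. The function name `convolutive_mix`, the array layout and the choice to index filter taps from k = 0 are assumptions made for illustration, not part of the slides.

```python
import numpy as np

def convolutive_mix(sources, filters, noise_std=0.0, rng=None):
    """Compute x_i(t) = sum_j sum_k a_ij(k) s_j(t - k) + e_i(t).

    sources: (m, T) array of source signals.
    filters: (n, m, q) array; filters[i, j] holds the FIR taps from source j
             to sensor i. Taps are indexed from k = 0 here (an assumption).
    """
    n, m, q = filters.shape
    T = sources.shape[1]
    x = np.zeros((n, T))
    for i in range(n):
        for j in range(m):
            # Truncate the full convolution to the original signal length.
            x[i] += np.convolve(sources[j], filters[i, j])[:T]
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        x += noise_std * rng.standard_normal(x.shape)
    return x
```

The same routine can apply FIR unmixing filters: pass the observations as `sources` and an array with `filters[j, i]` set to the taps that map sensor i to output j.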

Page 9

Frequency (subband) filtering

[Figure: block diagram. The inputs x1, x2, x3 each pass through an L-point STFT to give X(ω); unmixing matrices W1 … Wn are applied to give S(ω); L-point ISTFTs produce the outputs s1, s2, s3.]

The unmixing filtering can be efficiently performed within a subband framework.

This does not necessarily imply a frequency domain model for the sources.
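A rough sketch of subband unmixing, assuming a Hann-windowed STFT with 50% overlap and one unmixing matrix per frequency bin. The helper names (`stft`, `istft`, `subband_unmix`) and the window and hop choices are illustrative assumptions; the slides do not specify them.

```python
import numpy as np

def stft(x, L):
    """Hann-windowed L-point STFT with hop L/2; returns (bins, frames)."""
    hop = L // 2
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(L) / L)  # periodic Hann
    n_frames = 1 + (len(x) - L) // hop
    frames = np.stack([x[m * hop:m * hop + L] * win for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(X, L):
    """Overlap-add inverse of stft(); exact away from the first and last half-frame."""
    hop = L // 2
    frames = np.fft.irfft(X.T, n=L, axis=1)
    out = np.zeros(hop * (frames.shape[0] - 1) + L)
    for m, frame in enumerate(frames):
        out[m * hop:m * hop + L] += frame
    return out

def subband_unmix(x, W, L):
    """Apply one unmixing matrix per frequency bin.

    x: (n, T) mixtures; W: (L // 2 + 1, n, n) complex matrices, one per bin.
    """
    X = np.stack([stft(ch, L) for ch in x])      # (n, bins, frames)
    S = np.einsum('fij,jft->ift', W, X)          # S(w) = W(w) X(w) in each bin
    return np.stack([istft(Sc, L) for Sc in S])
```

With identity matrices in every bin the routine returns the input unchanged (apart from the edges), which is a convenient sanity check before plugging in learned matrices.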

Page 10

ML Natural Gradient Algorithm

Various authors have suggested a simple gradient-based algorithm for ICA:

$$\Delta W \propto \left(I - \phi(s)\,s^{T}\right) W$$

This can be viewed as a Maximum Likelihood estimator with

$$\phi(s) = -\frac{\partial \log P(s)}{\partial s}$$

φ(s) often takes a tanh-like shape (super-Gaussian prior).

For convolutive mixing this can be adapted to (time domain source model):

$$\Delta W(\omega) \propto \left(I - \mathrm{fft}\big(\phi(s(t))\big)\, s(\omega)^{H}\right) W(\omega)$$
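For the instantaneous (non-convolutive) case, the natural gradient rule can be sketched in a few lines of NumPy. This is a minimal batch version with φ(s) = tanh(s); the step size, iteration count, mixing matrix and Laplacian test sources are all hypothetical choices for illustration.

```python
import numpy as np

def natural_gradient_ica(x, n_iter=500, lr=0.1):
    """Batch natural-gradient ICA with phi(s) = tanh(s). x: (n, T) mixtures."""
    n, T = x.shape
    W = np.eye(n)
    for _ in range(n_iter):
        s = W @ x                                            # current source estimates
        W += lr * (np.eye(n) - np.tanh(s) @ s.T / T) @ W     # W <- W + lr (I - phi(s) s^T) W
    return W

rng = np.random.default_rng(0)
s_true = rng.laplace(size=(2, 5000))          # super-Gaussian test sources
A = np.array([[1.0, 0.6], [0.5, 1.0]])        # hypothetical mixing matrix
W = natural_gradient_ica(A @ s_true)
G = W @ A                                     # ideally a scaled permutation matrix
```

Inspecting `G` rather than `W` is the usual way to judge separation, since the product of unmixing and mixing matrices should have one dominant entry per row.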

Page 11

Frequency (subband) filtering

[Figure: subband filtering block diagram (inputs x1, x2, x3; L-point STFTs giving X(ω); unmixing matrices W1 … Wn giving S(ω); L-point ISTFTs giving outputs s1, s2, s3), with an added "Source model" block and an STFT block.]

Time domain modelling (e.g. Lee et al. 1997):

$$\Delta W(\omega) \propto \left(I - \mathrm{fft}\big(\phi(s(t))\big)\, s(\omega)^{H}\right) W(\omega)$$

Page 12

Frequency Domain Source Model

An alternative strategy is to model the sources in the frequency domain (e.g. Smaragdis 1997).

Advantages:

• Computational Efficiency

• Sparser Statistics (→ better estimates)

$$\Delta W(\omega) \propto \left(I - \phi(s(\omega))\, s(\omega)^{H}\right) W(\omega)$$
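A sketch of the frequency domain update for a single bin follows. The slides do not specify the complex nonlinearity, so `phi` below (tanh applied to the magnitude, phase preserved) is a hypothetical choice, as are the step size, the test sources and the mixing matrix.

```python
import numpy as np

def phi(y):
    """Hypothetical complex nonlinearity: tanh on the magnitude, phase kept."""
    mag = np.abs(y)
    return np.tanh(mag) * y / np.maximum(mag, 1e-12)

def fd_ica_bin(X, n_iter=500, lr=0.1):
    """Natural-gradient ICA for one frequency bin. X: (n, frames) complex STFT data."""
    n, T = X.shape
    W = np.eye(n, dtype=complex)
    for _ in range(n_iter):
        S = W @ X
        # W(w) <- W(w) + lr (I - phi(s(w)) s(w)^H) W(w)
        W += lr * (np.eye(n) - phi(S) @ S.conj().T / T) @ W
    return W

rng = np.random.default_rng(0)
mag = rng.exponential(size=(2, 4000))                  # sparse magnitudes
phase = rng.uniform(0, 2 * np.pi, size=(2, 4000))
S_bin = mag * np.exp(1j * phase)                       # synthetic bin-level sources
A_bin = np.array([[1.0, 0.6j], [0.5, 1.0]])            # hypothetical mixing at this bin
W_bin = fd_ica_bin(A_bin @ S_bin)
G_bin = W_bin @ A_bin
```

Running this independently in every bin gives one matrix per frequency, with no guarantee that the row order agrees from one bin to the next.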

Page 13

Frequency (subband) filtering

[Figure: subband filtering block diagram (inputs x1, x2, x3; L-point STFTs giving X(ω); unmixing matrices W1 … Wn giving S(ω); L-point ISTFTs giving outputs s1, s2, s3).]

Frequency domain modelling (e.g. Smaragdis 1997):

$$\Delta W(\omega) \propto \left(I - \phi(s(\omega))\, s(\omega)^{H}\right) W(\omega)$$

Disadvantage: The Permutation Problem.
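The permutation problem arises because ICA in each bin recovers the sources only up to order and scale, so a row-swapped unmixing matrix is an equally valid solution for that bin. The toy example below, with hypothetical synthetic spectra and a fixed hypothetical mixing matrix, shows how one swapped bin puts the wrong source into an output channel.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bins, n_frames = 4, 1000
# Hypothetical per-bin source spectra (illustration only).
S = rng.laplace(size=(n_bins, 2, n_frames)) + 1j * rng.laplace(size=(n_bins, 2, n_frames))
A = np.array([[1.0, 0.5 + 0.2j], [0.3 - 0.4j, 1.0]])
X = A @ S                                   # mixtures, bin by bin

W = np.tile(np.linalg.inv(A), (n_bins, 1, 1))
W[2] = W[2][::-1].copy()                    # bin 2: rows swapped, still separates
Y = W @ X

# Output 0 now carries source 0 in bins 0, 1, 3 but source 1 in bin 2.
print(np.allclose(Y[0, 0], S[0, 0]), np.allclose(Y[2, 0], S[2, 1]))
```

After the inverse STFT, output 0 would therefore contain a band of the other source, which is why the permutations must be aligned across frequency.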

Page 14

Solutions to Permutation Problem

Source Modelling Solutions

• Time Domain: no permutation problem (Lee et al. 1997).
• Time-Frequency: couples adaptive filters,
  – using signal envelopes (Ikeda et al. 1999) or
  – TF generative models (Mitianoudis & Davies 2001).

Permutation problem can persist with gradient learning (Davies 2002).

Channel Modelling Solutions

• Constrained Unmixing Filters: couples adaptive filters
  – Heuristic (Smaragdis 1997)
  – Constrained filter model (Parra & Spence 1998)

Solutions tend to get trapped in local minima (Ikram & Morgan 2000)
  – Directivity patterns to resolve permutation (Kurita et al. 2000)

Problems at high frequencies and with high reverberation.
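To give a flavour of envelope-based coupling, here is a simple sketch that reorders the outputs in each bin so that their amplitude envelopes correlate best with a running reference. It is loosely in the spirit of envelope methods and is not the procedure of any of the cited papers; the function name and the greedy bin-by-bin strategy are assumptions.

```python
import numpy as np
from itertools import permutations

def align_by_envelope(Y):
    """Y: (bins, n, frames) separated STFT outputs. Returns a reordered copy."""
    Y = Y.copy()
    n = Y.shape[1]
    ref = np.abs(Y[0])                            # running reference envelopes
    for f in range(1, Y.shape[0]):
        env = np.abs(Y[f])
        corr = np.corrcoef(ref, env)[:n, n:]      # corr[i, j]: reference i vs. output j
        best = max(permutations(range(n)),
                   key=lambda p: sum(corr[i, p[i]] for i in range(n)))
        Y[f] = Y[f][list(best)]                   # reorder the outputs of this bin
        ref += np.abs(Y[f])                       # accumulate aligned envelopes
    return Y
```

A greedy pass like this can propagate an early mistake to later bins, which is one reason such heuristics struggle at high frequencies and in strong reverberation.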

Page 15

Permutation Problem Example

Two speech signals mixed with a single echo of about 5 ms.

[Figure panels: Mitianoudis & Davies Alg.; Smaragdis Alg.]

Page 16

Beamforming for Source Separation

A traditional approach to microphone array processing is to use Beamforming.

Microphone outputs are combined to amplify signals from a desired direction while suppressing signals from other directions. Hence ICA is a blind beamformer!

Note that beamformer directivity patterns are frequency dependent.

[Figure: narrowband beamformer directivity pattern against direction of arrival, showing the main lobe and nulls.]
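A narrowband directivity pattern can be computed directly from the sensor weights. The sketch below assumes a far-field source, a uniform linear array, angles measured from broadside and a speed of sound of 343 m/s; the function names and the 10 cm spacing are hypothetical.

```python
import numpy as np

C_SOUND = 343.0  # assumed speed of sound (m/s)

def steering_vector(theta_deg, freq, d, n_mics=2):
    """Far-field array response; theta is measured from broadside, in degrees."""
    delays = np.arange(n_mics) * d * np.sin(np.radians(theta_deg)) / C_SOUND
    return np.exp(-2j * np.pi * freq * delays)

def directivity(w, freq, d, thetas_deg):
    """Magnitude response |sum_i w_i a_i(theta)| over a grid of angles."""
    return np.array([abs(np.dot(w, steering_vector(t, freq, d, len(w))))
                     for t in thetas_deg])

thetas = np.linspace(-90, 90, 181)
d = 0.1                                            # hypothetical 10 cm spacing
w_sum = np.array([0.5, 0.5])                       # delay-and-sum aimed at broadside
w_null = np.array([1.0, -1.0 / steering_vector(25.0, 1000.0, d)[1]])  # null at 25 degrees
pattern_sum = directivity(w_sum, 1000.0, d, thetas)
pattern_null = directivity(w_null, 1000.0, d, thetas)
```

Evaluating the same weights at a different `freq` changes the pattern, which is the frequency dependence noted on the slide.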

Page 17

ICA as a Beamformer

FD-ICA is essentially an FD beamformer: it places nulls on the other sources, so as to separate one source at a time.

[Figure: two-sensor array with spacing d and arrival angle θ; the null direction is marked.]

ICA employs statistical information only.

Beamforming employs geometrical information, i.e. Directions Of Arrival (DOA).

One can perform permutation alignment for FD-ICA using DOA, i.e. align the directivity patterns.
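To see ICA acting as a null beamformer, one can scan the directivity pattern of each unmixing row and read off where it is smallest. The sketch assumes a two-sensor array, a far-field single-delay model and a speed of sound of 343 m/s; the function name, spacing, frequency and source angles are hypothetical.

```python
import numpy as np

def null_direction(w_row, freq, d, c=343.0, n_grid=361):
    """Angle (degrees) at which the row's directivity pattern is smallest."""
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_grid)
    phase = 2 * np.pi * freq * d * np.sin(thetas) / c
    pattern = np.abs(w_row[0] + w_row[1] * np.exp(-1j * phase))
    return np.degrees(thetas[np.argmin(pattern)])

# Ideal single-delay mixing for sources at 25 and -40 degrees (illustration only).
freq, d = 500.0, 0.1
doas = np.radians([25.0, -40.0])
A = np.vstack([np.ones(2), np.exp(-2j * np.pi * freq * d * np.sin(doas) / 343.0)])
W = np.linalg.inv(A)
nulls = [null_direction(row, freq, d) for row in W]   # row 0 nulls source 1, row 1 nulls source 0
```

Comparing the null directions found in each bin with the expected DOAs gives a criterion for deciding which row belongs to which source.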

Page 18

Ideal Directivity Patterns

Single delay transfer function ~ anechoic room. Ideal situation for permutation alignment.

[Figure: ideal directivity pattern. Annotations: a null around 25°; multiple ripples around c/d Hz.]
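The appearance of extra nulls near c/d Hz can be checked with a short derivation. It assumes a two-sensor array with spacing d, a far-field single-delay model and a beamformer with one intended null at θ₀; the symbol D(f, θ) for the directivity pattern is introduced here for illustration.

```latex
|D(f,\theta)| = \left| 1 - e^{\,j 2\pi f d (\sin\theta_0 - \sin\theta)/c} \right|
              = 2 \left| \sin\!\left( \frac{\pi f d\,(\sin\theta - \sin\theta_0)}{c} \right) \right|

% Zeros of the pattern:
\sin\theta = \sin\theta_0 + k\,\frac{c}{f d}, \qquad k \in \mathbb{Z}

% A second null (k \neq 0) falls inside [-90^\circ, 90^\circ] once
f \;\ge\; \frac{c}{d\,(1 + |\sin\theta_0|)}, \qquad
\text{e.g. } \theta_0 = 25^\circ \;\Rightarrow\; f \gtrsim 0.70\,\frac{c}{d}
```

Under these assumptions the threshold always lies between c/(2d) and c/d, which is consistent with ripples appearing around c/d Hz.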

Page 19

A real room experiment

[Figure: room layout. Labelled dimensions: ~7.5 m, ~6 m, 2 m, 1.5 m, 1 m.]

We recorded a two-microphone, two-speaker setup in a real lecture room to explore the application of beamforming to BSS.

Page 20

Real Directivity Patterns

Directivity pattern for source 1, estimated and aligned by Likelihood Ratio (amplitude-only criterion).

[Figure annotation: DOA around 22°]

Observations

• More smeared than single delay.
• A main DOA is still apparent.

Questions

• How can we accurately estimate DOA from a directivity pattern?
• How can we align the permutations to form a consistent beam-pattern?
• Can we approximate with a single delay?

Page 21

DOA estimation ambiguity

Multiple nulls appear above c/d Hz, which makes the DOA difficult to estimate.

• Saruwatari et al. used null statistics along all frequencies to estimate DOA.

• Ikram and Morgan used only lower frequencies to estimate DOA.

• Estimate the directivity pattern averaged along frequency for several frequency bands.

• The average directivity pattern between 0 and 2 kHz can give a consistent DOA.
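The low-band averaging idea can be sketched as follows: normalise the directivity pattern of one (already aligned) unmixing row in each bin, average over the bins up to 2 kHz, and take the angle of the minimum as the DOA estimate. The function name, the per-bin normalisation and the angle grid are assumptions for illustration, as is the far-field two-sensor model.

```python
import numpy as np

def doa_from_average_pattern(W_rows, freqs, d, f_max=2000.0, c=343.0, n_grid=361):
    """Estimate a DOA (degrees) from the average low-frequency directivity pattern.

    W_rows: (bins, 2) complex array, one unmixing row per bin (permutation aligned).
    freqs:  (bins,) bin centre frequencies in Hz.
    """
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_grid)
    keep = (freqs > 0) & (freqs <= f_max)          # skip DC, keep the low band
    Wk, fk = W_rows[keep], freqs[keep]
    phase = 2 * np.pi * fk[:, None] * d * np.sin(thetas)[None, :] / c
    pattern = np.abs(Wk[:, 0:1] + Wk[:, 1:2] * np.exp(-1j * phase))
    pattern /= pattern.max(axis=1, keepdims=True)  # normalise each bin
    return np.degrees(thetas[np.argmin(pattern.mean(axis=0))])
```

Averaging smooths out the small bin-to-bin drift of the nulls, so the minimum of the mean pattern is more stable than the null of any single bin.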

Page 22

DOA estimation ambiguity (cont)

The exact low-frequency range is dependent on d.

Multiple nulls appear at higher frequencies. For small d, the recorded signals will be more similar => low separation quality.

For more accurate DOA estimation, one can use extra sensors and subspace methods like MUSIC (Parra and Alvino 2002).

Sensor spacing choice is a trade-off between separation quality and beampattern clarity.

Page 23

Permutation alignment using DOA

Basic Problem:

The nulls drift slightly around the DOA, due to reverberation.

Solution:

Look for a null in a “neighbourhood” around the DOA.

• Not accurate enough.
• Definition of neighbourhood.
• Classification really difficult in mid-higher frequencies.

Remedy:

Use beamforming (phase information) in lower-mid frequencies and LR (amplitude information) for mid-higher frequencies.
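The neighbourhood search for one bin of a 2x2 problem might look like the sketch below. It covers only the beamforming (phase) part intended for the lower-mid frequencies; the LR criterion for higher bins is not implemented. The function names, the 10-degree half-width and the far-field two-sensor model are hypothetical.

```python
import numpy as np

def row_pattern(w_row, freq, d, thetas_deg, c=343.0):
    """Unit-norm directivity pattern of one unmixing row for a two-sensor array."""
    phase = 2 * np.pi * freq * d * np.sin(np.radians(thetas_deg)) / c
    return np.abs(w_row[0] + w_row[1] * np.exp(-1j * phase)) / np.linalg.norm(w_row)

def align_bin(W_f, freq, d, doas_deg, half_width=10.0):
    """Order the rows of the 2x2 matrix W_f so that row 0 nulls source 1
    and row 1 nulls source 0, searching a neighbourhood around each DOA."""
    def depth(row, doa):
        grid = np.linspace(doa - half_width, doa + half_width, 41)
        return row_pattern(row, freq, d, grid).min()   # deepest point near the DOA
    keep = depth(W_f[0], doas_deg[1]) + depth(W_f[1], doas_deg[0])
    swap = depth(W_f[0], doas_deg[0]) + depth(W_f[1], doas_deg[1])
    return W_f if keep <= swap else W_f[::-1]
```

The result depends on `half_width`, which reflects the difficulty noted above of defining the neighbourhood.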

Page 24

Permutation alignment using DOA

Sound Samples:

Mixtures:

Separated using LR:

Using BF:

Page 25

Sensitivity analysis

Effects of a misplaced beamformer:

Repeated the recordings with source 2 misplaced by 50 cm.

Beamformer’s sensitivity to movement

We unmixed the 50cm recordings and compared the beampatterns.

We observed the following:

Page 26

Sensitivity analysis (cont)

A moving source will not greatly affect our beamformer at lower frequencies, but mainly at higher frequencies.

[Figure panels: beampatterns at 160 Hz and at 750 Hz.]

Page 27

Sensitivity analysis (cont)

Distortion introduced due to movement.

We used the original filters to unmix the 50 cm case.

Distortion is a function of frequency.

Page 28

Sensitivity analysis (cont)

Distortion introduced due to movement.

The source that moved can still be separated, but is a bit more echoic due to incorrect mapping.

The source that didn’t move won’t be separated, due to incorrect beamforming. It will still be mapped back correctly.

Page 29

Conclusions

Beamforming is a useful tool for permutation alignment. It is a semi-blind method, since it exploits the known array configuration.

Phase information seems more important at lower frequencies.

Amplitude information seems more important at higher frequencies. (Lord Rayleigh’s Law of Hearing)

Distortion introduced due to movement is a function of frequency.

Problems when aligning at higher frequencies.