
The Use of Sparsity Hypothesis for Source Separation

Prasad Sudhakar, PhD
Post-doctoral researcher, ICTEAM/ELEN, UCL

prasad.sudhakar@uclouvain.be

13 October, 2011

SPS Seminar Series, ICTEAM/ELEN, UCL

1

2

Context

[Figure: sources → mixing process → mixtures]

3

Source separation

[Figure: M mixtures → unmixing process → N source estimates; M < N: underdetermined]

4

Acoustic mixing model

[Figure: N sources pass through mixing filters of length L to produce M convolutive mixtures]

5

Blind filter estimation and source separation

• Source separation: known mixtures, unknown sources
• Filter estimation: known mixtures, unknown filters
• Mathematically ill-posed, and hence impossible to solve as such
• Needs further hypotheses

[Diagram: Filter Estimation and Source Estimation blocks]

6

Enabling hypotheses

• Independent Component Analysis (ICA)
  • Hypothesis: sources are statistically independent
  • Objective: estimate the sources by maximising independence (e.g. minimise mutual information)

• Non-negative Matrix Factorisation (NMF)
  • Hypothesis: the mixtures can be factored into two non-negative matrices
  • Objective: seek factor matrices which well approximate the mixtures (a minimal sketch follows this list)

• Sparse Component Analysis (SCA)
  • Hypothesis: sources and/or mixing filters are sparse
  • Objective: seek sparse sources and/or filters from the mixtures

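As a concrete illustration of the NMF hypothesis above (a generic sketch, not the specific algorithm behind any work cited in this talk), the following toy code factors a non-negative matrix with the classical Lee-Seung multiplicative updates; the rank, iteration count, and random test matrix are illustrative assumptions.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-9):
    """Approximate a non-negative matrix X as W @ H using multiplicative updates."""
    rng = np.random.default_rng(0)
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update activations
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update basis vectors
    return W, H

# Toy usage: factor a random non-negative "spectrogram" into 2 components
X = np.abs(np.random.default_rng(1).standard_normal((64, 100)))
W, H = nmf(X, rank=2)
print("relative approximation error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```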
7

Filter types

[Figure: filter types: Instantaneous, Anechoic, Sparse, Fully convolutive]

8

Sparse sources

• Many audio sources are sparse in the time-frequency domain
• Short-Time Fourier Transform (STFT): commonly used analysis tool

[Figure: time-domain waveform → STFT → sparse time-frequency representation]

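A minimal sketch of the point above: a toy "audio" signal is sparse in the STFT domain in the sense that a small fraction of time-frequency coefficients carries almost all of the energy. The signal, sampling rate, and window size are arbitrary illustrative choices.

```python
import numpy as np
from scipy.signal import stft

fs = 8000
t = np.arange(0, 1.0, 1 / fs)
# Toy "audio" signal: two tones switching on at different times
x = np.sin(2 * np.pi * 440 * t) * (t > 0.2) + 0.5 * np.sin(2 * np.pi * 1320 * t) * (t > 0.6)

f, frames, X = stft(x, fs=fs, nperseg=512)
energy = np.abs(X).ravel() ** 2

# Fraction of TF coefficients needed to capture 99% of the energy: small => sparse
sorted_energy = np.sort(energy)[::-1]
cumulative = np.cumsum(sorted_energy) / sorted_energy.sum()
k = int(np.searchsorted(cumulative, 0.99)) + 1
print(f"{k} of {energy.size} TF coefficients ({100 * k / energy.size:.1f}%) carry 99% of the energy")
```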
9

Can Sparsity still help?

10

Landscape of methods

[Figure: landscape of methods. Horizontal axis: mixing complexity, from Instantaneous/Anechoic to Fully convolutive. Vertical axis: relative number of sources (N = 1, N = M, N <= M, N > M) and source sparsity. State of the art in the different regions: Sparse Component Analysis (SCA) for sparse sources, Cross-Relation (CR) based approaches, and 1) Independent Component Analysis (ICA), 2) Non-negative Matrix Factorisation (NMF).]

10

What is sparsity?

• Sparsity: few significant coefficients compared to the size of the signal

• True sparsity measure: the $\ell_0$ pseudo-norm $\|\mathbf{x}\|_0 = \#\{i : x_i \neq 0\}$
  • Discrete, sensitive to noise
• Relaxed convex norm: the $\ell_1$ norm $\|\mathbf{x}\|_1 = \sum_i |x_i|$
• $k$-sparse vector: $\|\mathbf{x}\|_0 \leq k$

11

Relevance of sparsity

• Helps reduce problem complexity

• Aids data compression: data can be sparse in a transform domain

• e.g. DCT of natural images, STFT of audio sources, etc.

• Of late, a popular prior for solving linear inverse problems: sparse recovery problems

• compressed sensing

12

Sparsity for linear underdetermined systems

• Suppose we need to solve $\mathbf{b} = \mathbf{A}\mathbf{x}$, with $\mathbf{A}$ having fewer rows than columns
• Under-determined: non-uniqueness of the solution
• Prior: the solution is sparse
• Ideally: $\min_{\mathbf{x}} \|\mathbf{x}\|_0$ subject to $\mathbf{b} = \mathbf{A}\mathbf{x}$
• Sparse approximation: $\min_{\mathbf{x}} \|\mathbf{x}\|_0$ subject to $\|\mathbf{b} - \mathbf{A}\mathbf{x}\|_2 \leq \epsilon$

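The $\ell_0$ problems above are combinatorial; in practice one solves a surrogate. A minimal sketch of the convex-relaxation route (anticipating the next slide) is iterative soft-thresholding (ISTA) for the $\ell_1$-penalised least-squares problem; the matrix sizes, sparsity level, and penalty are illustrative assumptions, and this is not the solver used in the talk.

```python
import numpy as np

def ista(A, b, lam=0.01, n_iter=2000):
    """ISTA for min_x 0.5*||A x - b||^2 + lam*||x||_1 (an l1 relaxation)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2           # 1 / Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x - step * (A.T @ (A @ x - b))           # gradient step on the data term
        x = np.sign(x) * np.maximum(np.abs(x) - lam * step, 0.0)  # soft threshold
    return x

# Toy under-determined system: 30 equations, 100 unknowns, 5-sparse ground truth
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100)) / np.sqrt(30)
x_true = np.zeros(100)
x_true[rng.choice(100, 5, replace=False)] = rng.standard_normal(5)
x_hat = ista(A, A @ x_true)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```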
13

Algorithmic families for sparse recovery

Greedy methods
• Principle: iteratively estimate the non-zero coefficients
• Sub-optimal and heuristic
• Easy on computations
• Theoretically not very well understood
• Ex.: Matching Pursuit, OMP, StOMP, etc.

Convex relaxation
• Principle: replace the $\ell_0$ norm by convex norms
• Provides provably optimal solutions in several settings
• Computation intensive
• Well rounded theory, thanks to 'Compressed Sensing'
• Ex.: Basis Pursuit, BPDN, etc.

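For the greedy family above, a compact Orthogonal Matching Pursuit sketch is given below; the dictionary size and the 3-sparse test vector are illustrative, and no claim is made that this matches the exact variants used in the works cited in this talk.

```python
import numpy as np

def omp(A, b, k):
    """Orthogonal Matching Pursuit: greedily select k columns of A to explain b."""
    residual, support = b.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # atom most correlated with residual
        if j not in support:
            support.append(j)
        coeffs, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)  # least-squares re-fit
        residual = b - A[:, support] @ coeffs
    x = np.zeros(A.shape[1])
    x[support] = coeffs
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 120))
A /= np.linalg.norm(A, axis=0)                       # unit-norm atoms
x_true = np.zeros(120)
x_true[[3, 17, 58]] = [1.0, -2.0, 0.5]
x_hat = omp(A, A @ x_true, k=3)
print("recovered support:", np.flatnonzero(np.abs(x_hat) > 1e-8).tolist())
```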
14

Sparse component analysis

15

Estimating sparse sources with known parameters

• Suppose the mixing matrix A is known; then the sources can be estimated by sparse recovery ($\ell_1$ minimisation subject to the mixing constraint)

• Likewise when the mixing operator represents convolutive mixtures

What if the mixing coefficients are unknown?

• Model: Instantaneous mixtures of sparse sources

16

Instantaneous mixtures of sparse sources

• Model: sparse and disjoint sources, instantaneous mixture

[Figure: sparse and disjoint source spectrograms and their instantaneous mixture]

References: A. Jourjine et al., 2000; P. Bofill & M. Zibulevsky, 2001; M. Zibulevsky & B.A. Pearlmutter, 2001

• Suppose the active TF sets of the sources are known: use time-frequency masking and invert the transform to estimate the sources

• How to obtain the sets? Use geometric ideas

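A minimal illustration of the masking step described above, on a mono toy mixture with an oracle mask (the mask is computed from the true sources; obtaining such sets blindly is exactly the problem the following slides address). Sampling rate, tones, and window length are arbitrary choices.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 8000
t = np.arange(0, 1.0, 1 / fs)
s1 = np.sin(2 * np.pi * 300 * t) * (t < 0.5)      # source 1: low tone, first half
s2 = np.sin(2 * np.pi * 2000 * t) * (t >= 0.5)    # source 2: high tone, second half
x = s1 + s2                                       # mono instantaneous mixture

_, _, X = stft(x, fs=fs, nperseg=256)
_, _, S1 = stft(s1, fs=fs, nperseg=256)
_, _, S2 = stft(s2, fs=fs, nperseg=256)

# Oracle mask: TF points where source 1 dominates (estimating such sets
# blindly is the real problem)
mask1 = np.abs(S1) > np.abs(S2)

_, s1_hat = istft(X * mask1, fs=fs, nperseg=256)
_, s2_hat = istft(X * ~mask1, fs=fs, nperseg=256)
err = np.linalg.norm(s1_hat[:len(s1)] - s1) / np.linalg.norm(s1)
print(f"relative error on source 1: {err:.3f}")
```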
17

Use of scatter plots to identify which source is active at which TF points (stereo, M = 2)

[Scatter plot of the stereo mixture coefficients: colours indicate sources, lengths indicate scaling]

• Knowledge of these sets can then be used for: 1) filter estimation and 2) separation
• Up to a permutation and scaling: $\tilde{\mathbf{A}} \approx \mathbf{A}\,\Lambda\,\mathbf{P}$

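A rough sketch of the scatter-plot idea on synthetic sparse, essentially disjoint sources: points where a single source is active line up along the corresponding column of the mixing matrix, so a simple direction histogram reveals the mixing directions, up to permutation and scaling. The mixing matrix, sparsity levels, and the crude histogram clustering are illustrative stand-ins for the cited methods.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Two sparse, essentially disjoint sources (most coefficients are zero)
s = np.zeros((2, n))
s[0, rng.choice(n, 400, replace=False)] = rng.standard_normal(400)
s[1, rng.choice(n, 400, replace=False)] = rng.standard_normal(400)

A = np.array([[0.9, 0.4],
              [0.44, 0.9]])        # unknown stereo mixing matrix (to be estimated)
x = A @ s                          # the two mixtures

# At points where a single source is active, (x1, x2) lies on the line
# spanned by the corresponding column of A
active = np.abs(x).sum(axis=0) > 1e-6
angles = np.arctan2(x[1, active], x[0, active]) % np.pi

# Crude clustering: pick the two most populated direction bins
# (columns are recovered only up to permutation and scaling)
hist, edges = np.histogram(angles, bins=180)
peaks = np.sort(np.argsort(hist)[-2:])
est_angles = (edges[peaks] + edges[peaks + 1]) / 2

print("true column directions:", np.sort(np.arctan2(A[1], A[0])))
print("estimated directions  :", est_angles)
```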
18

Generalisations and improvements

• Anechoic mixtures:

• The mixing coefficients also involve a phase parameter

• In the stereo case, only one phase needs to be estimated (fixing the other)

• Extension to quasi-disjoint sources

• In a generic setting, one can allow at most M-1 sources to be active at a given TF location

• Statistical confidence measures to obtain the sets

• Convolutive mixtures
• Narrowband approximation through a suitable transform, like the STFT:

$$x_i(t) = \sum_{j=1}^{N} (a_{ij} \star s_j)(t) \;\longrightarrow\; x_i(\tau, f) \approx \sum_{j=1}^{N} a_{ij}(\tau, f)\, s_j(\tau, f), \quad 1 \leq i \leq M$$

• One convolutive problem reduces to a set of complex instantaneous problems (one per frequency bin)

Can we use the techniques developed for instantaneous mixtures?

19

Convolutive mixtures: Permutation and scaling

[Scatter plot of the estimated mixing parameters]

• Frequency-dependent scaling

Conclusions

• Scatter plots alone are not sufficient

• Need permutation and scaling correction

• Time/frequency-domain ICA approaches also suffer

Consequences
• Each frequency bin is arbitrarily scaled and permuted
• The estimated filter coefficients at each frequency equal the true ones only up to a frequency-dependent scaling and permutation of the sources

20

Sparse filters for permutation correction

21

Relevance of sparse filters

An example of an underwater communication channel

• Few reflection paths = time-domain sparsity
  - Underwater acoustics
  - Communications engineering: wide-band CDMA
  - Biomedical: ultrasound imaging

Picture courtesy: Berger et al., 2009

22

Permutation correction problem

• Setting: Permutation only

• Claim: suppose the filters are sufficiently sparse; then the permutation-corrected filters have the smallest $\ell_1$ norm, and hence the permutations can be recovered by $\ell_1$ minimisation

• Problem: given the frequency-domain filters permuted at an unknown set of frequencies, recover the permutations

[Plot: success/failure of permutation recovery as a function of the number of permuted frequencies]

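A small empirical check of the claim above, with p = 1 for concreteness: two sparse filters are generated, their frequency responses are swapped on a few random bins, and the ℓ1 norm of the resulting time-domain filters is compared with that of the true filters. Filter length, sparsity, and the number of permuted bins are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 256

def sparse_filter(k):
    a = np.zeros(L)
    a[rng.choice(L, k, replace=False)] = rng.standard_normal(k)
    return a

a1, a2 = sparse_filter(8), sparse_filter(8)
A1, A2 = np.fft.fft(a1), np.fft.fft(a2)

def l1_after_permutation(n_perm):
    """Swap the two frequency responses on n_perm random bins (and their
    conjugate-symmetric partners, so the filters stay real) and return the
    total l1 norm of the resulting time-domain filters."""
    B1, B2 = A1.copy(), A2.copy()
    for f in rng.choice(np.arange(1, L // 2), n_perm, replace=False):
        for g in (f, L - f):
            B1[g], B2[g] = A2[g], A1[g]
    b1, b2 = np.fft.ifft(B1).real, np.fft.ifft(B2).real
    return np.abs(b1).sum() + np.abs(b2).sum()

print("true filters, l1 =", round(np.abs(a1).sum() + np.abs(a2).sum(), 3))
for n_perm in (5, 20, 60):
    print(n_perm, "permuted bins, l1 =", round(l1_after_permutation(n_perm), 3))
```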
23

Permutation correction by $\ell_1$ minimisation

• $\mathbf{A}_{TD}$: true time-domain filter matrix; $\tilde{\mathbf{A}}_{TD}$: estimated filter matrix
• Recovery performance measure: based on the discrepancy between $\tilde{\mathbf{A}}_{TD}$ and $\mathbf{A}_{TD}$
• Conclusion: a sparsity prior on the filters helps solve the permutation problem, in the absence of scaling.

24

Scaling correction

25

Single input - two output setting

• Model: $x_1 = a_1 \star s$, $\quad x_2 = a_2 \star s$

Cross-relation (CR)
• Property: $a_2 \star x_1 = a_1 \star x_2$, i.e. $(a_2 \star x_1 - a_1 \star x_2)(t) = 0$ for all $t$
• Consequence: removes TF-dependent source scaling
• Matrix form: $\mathbf{X}\,\mathbf{a} = \mathbf{0}$, where $\mathbf{X}$ is a double Toeplitz matrix built from the mixtures and $\mathbf{a} = [a_1; a_2]$

• Issues:
  1. Normalisation of the solution
  2. Possibly non-unique solution
• Conclusion: needs a prior

References: H. Liu et al., 1994; G. Xu et al., 1995

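A numerical check of the cross-relation on synthetic sparse channels, together with the double-Toeplitz matrix form stated above; the conv_matrix helper and all sizes are illustrative, not the talk's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
T, L = 2000, 16
s = rng.standard_normal(T)                              # unknown source
a1 = np.zeros(L); a1[[0, 5, 11]] = [1.0, -0.6, 0.3]     # sparse channel 1
a2 = np.zeros(L); a2[[2, 7]] = [0.8, 0.4]               # sparse channel 2
x1, x2 = np.convolve(a1, s), np.convolve(a2, s)         # the two observed mixtures

# Cross-relation: a2 * x1 = a1 * x2, since both equal a1 * a2 * s
lhs, rhs = np.convolve(a2, x1), np.convolve(a1, x2)
print("max CR violation:", np.max(np.abs(lhs - rhs)))   # zero up to round-off

def conv_matrix(x, L):
    """Tall Toeplitz matrix C with C @ a == np.convolve(x, a) for len(a) == L."""
    C = np.zeros((len(x) + L - 1, L))
    for k in range(L):
        C[k:k + len(x), k] = x
    return C

# Double Toeplitz matrix [T(x2) | -T(x1)] annihilates the stacked true filters
X = np.hstack([conv_matrix(x2, L), -conv_matrix(x1, L)])
print("||X a|| =", np.linalg.norm(X @ np.concatenate([a1, a2])))
```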
26

Formulation of sparse filter estimation problem

• Filter recovery problem formulation: $\min_{\mathbf{a}} \|\mathbf{a}\|_1$ subject to $\|\mathbf{X}\mathbf{a}\|_2 \leq \epsilon$, with a normalisation constraint to exclude the trivial solution $\mathbf{a} = \mathbf{0}$
• In standard sparse recovery problems, $\ell_1$ minimisation seeks sparse solutions
• Convex, and can be solved using standard packages
• Noise aware
• Issue: how to exploit this approach in the multiple-source setting?

References: A. Aïssa-El-Bey et al., 2008

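A minimal sketch of the formulation above in an idealised noiseless case: the constraint on ||Xa|| is replaced by the equality Xa = 0, and the normalisation is taken to be a1[0] = 1 (which assumes the first filter tap is non-zero). With the standard split a = u - v this becomes a linear programme; this is an assumption-laden toy, not the noise-aware formulation of the cited work.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
T, L = 400, 12
s = rng.standard_normal(T)
a1 = np.zeros(L); a1[[0, 4, 9]] = [1.0, -0.5, 0.25]     # note: a1[0] = 1 matches the normalisation
a2 = np.zeros(L); a2[[1, 6]] = [0.7, 0.35]
x1, x2 = np.convolve(a1, s), np.convolve(a2, s)

def conv_matrix(x, L):
    C = np.zeros((len(x) + L - 1, L))
    for k in range(L):
        C[k:k + len(x), k] = x
    return C

X = np.hstack([conv_matrix(x2, L), -conv_matrix(x1, L)])  # cross-relation matrix

# min ||a||_1  s.t.  X a = 0  and  a1[0] = 1, as an LP with the split a = u - v
n = 2 * L
e0 = np.zeros(n); e0[0] = 1.0
A_eq = np.vstack([np.hstack([X, -X]), np.hstack([e0, -e0])])
b_eq = np.zeros(A_eq.shape[0]); b_eq[-1] = 1.0
res = linprog(c=np.ones(2 * n), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
a_hat = res.x[:n] - res.x[n:]

a_true = np.concatenate([a1, a2])
print("relative error:", np.linalg.norm(a_hat - a_true) / np.linalg.norm(a_true))
```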
27

Revisiting the landscape

[Figure: the landscape of methods revisited. Axes as before: mixing complexity (Instantaneous/Anechoic to Fully convolutive) and relative number of sources (N = 1, N = M, N <= M, N > M, sparse). Regions are marked with: scatter plots, Sparse Component Analysis (SCA) for source sparsity, Cross-Relation (CR) based approaches, and the multiple filter estimation framework, which must deal with permutations and scaling.]

28

Multiple sparse filter estimation

29

References: A. Aïssa-El-Bey et al., 2009

Multiple filter estimation using time-domain disjointness and CR

• Suppose there exist intervals where only one source is active

• The mixtures contain intervals where only one source contributes

• Mixtures in those intervals satisfy time-domain CR

• Filters can be estimated by solving the sparse recovery problem

What if the sources are not disjoint?

30

Multiple filter estimation using time-frequency (TF) domain disjointness and CR

• Given the mixtures, if we know which source is active at which TF locations,
• and build, for each source $j$, a matrix $\mathbf{B}^{\Omega_j}_{NB}$ or $\mathbf{B}^{\Omega_j}_{WB}$ from the TF points in $\Omega_j$ where only source $j$ is active,
• then we can express the CR in the TF domain in two different forms:
  1) narrowband (NB) approximation
  2) wideband (WB) formulation

31

Narrowband and Wideband CR

• We have: the time-domain cross-relation (CR-TD)
• Given the active TF sets, it leads to:
  • CR-NB: the narrowband CR, using the narrowband approximation
  • CR-WB: the wideband CR, with no narrowband approximation

32

Multiple filter estimation using TF domain CR

• Set $\mathbf{B} = \mathbf{B}^{\Omega_j}_{NB}$ (narrowband) or $\mathbf{B} = \mathbf{B}^{\Omega_j}_{WB}$ (wideband)

• Further, use the matrix $\mathbf{B}$ to solve problem (A), the sparse filter recovery problem of the earlier formulation with $\mathbf{B}$ in place of $\mathbf{X}$, and estimate the filters associated with source $j$

• A single $N$-filter estimation problem is reduced to $N$ single-filter estimation problems

33

A two-stage framework

• For each source:
  1) Time-frequency clustering: identify the time-frequency regions where only one source is active
  2) Filter estimation: solve the optimisation problem to estimate the filters

• Clustering: a difficult problem; it dictates the performance of filter estimation
• Filter estimation: using generic toolboxes for convex programming

34

Experiments with controlled setting

• Setup: two sources, flute and guitar; one is mixed instantaneously, the other through a sparse filter

[Figure: the sparse filter and the flute and guitar source signals]

• Main issue: blind clustering of TF points where only source 2 is active
• Remove the points corresponding to the instantaneously mixed source

35

Experiments: Blind clustering

• STFT magnitude of a mixture
• Use DEMIX or a similar approach to identify the regions where source 1 is active
• Remove the points corresponding to source 1
• Remaining points correspond to source 2

Reference: Arberet et al., 2010

36

Performance measure

• SNR measure of the estimated filters

• Takes care of the global shift and global scale ambiguities inherent to the problem formulation

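One plausible implementation of such a measure (the talk does not spell out its exact formula): the SNR in dB between the true and estimated filter, maximised over a global integer shift and the least-squares optimal global scale; circular shifts are used here for simplicity.

```python
import numpy as np

def filter_snr(a_true, a_est):
    """SNR (dB) between true and estimated filters, maximised over a global
    circular shift and the least-squares optimal global scale."""
    best = -np.inf
    for shift in range(len(a_true)):
        a = np.roll(a_est, shift)
        scale = np.dot(a, a_true) / (np.dot(a, a) + 1e-12)   # optimal scale
        err = a_true - scale * a
        snr = 10 * np.log10(np.dot(a_true, a_true) / (np.dot(err, err) + 1e-12))
        best = max(best, snr)
    return best

# Example: the estimate is a shifted, rescaled, slightly noisy copy of the truth
rng = np.random.default_rng(0)
a_true = np.zeros(64); a_true[[3, 10, 40]] = [1.0, -0.5, 0.2]
a_est = 2.5 * np.roll(a_true, 7) + 0.01 * rng.standard_normal(64)
print(f"filter SNR: {filter_snr(a_true, a_est):.1f} dB")
```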
37

Results

• Debiasing: extract the valid support and readjust the coefficients by performing a least-squares minimisation on that support

• The wideband method with debiasing outperforms the state of the art by at least 10 dB

Reference: C. Knapp & G. Carter, 1976

38

Summary and perspectives

39

Summary

Sparsity hypothesis: can it be used to solve problems beyond standard linear inverse problems?

• Yes: a framework for the multiple filter estimation problem, central to convolutive source separation,
• by combining the notions of time-frequency domain sparsity of sources and time-domain sparsity of filters
• Empirical results show the ability of the two-stage framework to estimate the filters

40

Perspectives (1/2)

• Central issue with the framework: blind clustering
  • Ideas from anechoic settings?
  • Anechoic approximations using DOA information
  • Cluster initialisation using filter approximations?

[Diagram: Clustering and Filter Estimation blocks, with noise]

• From filter estimation to source estimation

41

Perspectives (2/2)

• Connections with subspace learning

• Sparse vector orthogonal to the matrix B
• Subspaces characterised by sparse vectors

• Exploiting sparsity in non-standard domains• Connections with synthesis and analysis priors

• Exploiting structured sparsity

• Theoretical analysis of the filter estimation framework• Understanding identifiability and recovery conditions

42

Some relevant publications

1) P. Sudhakar. Sparse models and convex optimisation for convolutive blind source separation. PhD thesis, University of Rennes 1, France, February 2011.

2) A. Benichoux, P. Sudhakar, R. Gribonval. Well-posedness of the frequency permutation problem in sparse filter estimation with lp minimization. In SPARS'11, June 2011.

3) P. Sudhakar, S. Arberet and R. Gribonval. Sparse models for multiple mixing filter estimation from stereo convolutive mixtures. Submitted to IEEE TASLP, June 2011.

4) S. Arberet, P. Sudhakar, and R. Gribonval. A wideband doubly-sparse approach for MITO sparse filter estimation. In Proceedings of ICASSP 2011, May 2011.

5) P. Sudhakar, S. Arberet, and R. Gribonval. Double Sparsity: Towards Blind Estimation of Multiple Channels. In Proceedings of LVA/ICA 2010, St. Malo, France.

6) P. Sudhakar and R. Gribonval. A sparsity-based method to solve the permutation indeterminacy in frequency domain convolutive blind source separation. In Proceedings of ICA 2009, Paraty, Brazil.

•Clipart from: http://www.clker.com/

Thanks to my collaborators

• Remi Gribonval, METISS, INRIA Rennes-Bretagne Atlantique, France remi.gribonval@inria.fr

• Simon Arberet, LTS2, EPFL, Switzerland simon.arberet@epfl.ch

43

44

Structure of NB matrix

• CR at a given point
• For a given frame index
• If there are frames, then define
  (where the forward Fourier matrix has size $F \times F$)
• It satisfies:

45

Projection of convolved sequences

Lemma 1: Let be a bounded real valued signal, let be a finite real valued signal and let be a finite signal, possibly complex,

then

where

46

Structure of WB matrix

TF domain CR

If is a STFT dictionary of one sample shift, then

By lemma 1

47

Structure of WB matrix

If we have

and

then we can define which satisfies

48

Time-frequency disjointness in NB formulation

Consider

If then

49

Time-frequency disjointness in WB formulation

Consider

If then

Note that

By lemma 1

50

Oracle clustering

True filters satisfy the CR for source k

CR-NB

CR-WB

51

Blind clustering

Cluster initialisation using filter approximations

[Diagram: Clustering and Filter Estimation blocks, with noise]

52

Permutation correction: Disjoint time supports

Theorem 1: Let the filters have mutually disjoint supports and let the permuted filters be obtained after frequency-domain permutations at a given set of frequency indices; then the permuted filters have a larger norm than the true filters.

• The result is independent of:
  1) Sparsity
  2) Number of permutations

53

Theorem 2: Let $a_1$ and $a_2$ be two sparse filters and let $\tilde{a}_1$ and $\tilde{a}_2$ be the filters obtained after frequency-domain permutations at frequency indices in a given set; then

a)


Permutation correction: Sparse filters

• Doesn’t assume disjoint supports

• Gives a regime of sparsity and number of permutations for which norm minimisation recovers the permutations

54

Permutation correction: Equality case

b) If the two filters have disjoint supports, then

Further, equality in the above equation implies
EITHER I)
OR II)

• The inequality result comes from Theorem 1
• The equality condition implies a global permutation
• Conclusion: under the assumed conditions
  • permuted filters have a larger norm than the corresponding true filters
  • true filters can be uniquely recovered by norm minimisation

55

Variation of norm against permutations

• Performance measure:

• Objective: to assess whether permutations increase the norm of the filters

• Conclusion: empirically, the bigger the number of permutations, the larger the increase in norm

• Results:

56

Sparsity in the thesis

In this work, we use the sparsity hypothesis twice

1. Source sparsity in the time-frequency domain

2. Filter sparsity in the time domain

57

Plan of the talk

1. Tools

i. Sparse component analysis

ii. Cross-relation based approaches

2. Permutation correction using sparsity

3. Framework for multiple filter estimation

4. Summary and perspectives

58

Time-frequency masking

• Model: sparse and disjoint sources, instantaneous mixture

• Consequence: one just needs to know which source is active at which TF locations for source localisation and separation

• How to identify these locations? Use scatter plots

References: A. Jourjine et al., 2000; P. Bofill & M. Zibulevsky, 2001; M. Zibulevsky & B.A. Pearlmutter, 2001

59

Clustering

• In this work, two kinds of experiments are done:

Experiments with synthetic data
• Goal: to assess the performance of the filter estimation step
• Oracle clustering:
  • uses the knowledge of the true filters (ground truth)
  • depends on a threshold

Experiments with audio data
• Goal: to assess the overall performance in a realistic setting
• Blind clustering: assumes all sources except one are instantaneously mixed

60

Experiments with synthetic data

• Source model: sum of sinusoids with Gaussian envelopes of random lengths

• Study the effect of:
  1) STFT window size F
  2) Clustering threshold
  on the filter recovery performance, using oracle clustering

• Sparse filters of length L = 256

• Number of sources N = 3

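A rough stand-in for the synthetic setup described on this slide: sources built as sums of sinusoids with Gaussian envelopes of random lengths, N = 3 sources, stereo mixtures, sparse filters of length L = 256. The number of envelopes, frequency range, and the choice of 8 non-zero filter taps are assumptions; the original experiments' exact parameters are not given here.

```python
import numpy as np

def synthetic_source(T, n_events=20, fs=8000, seed=0):
    """Sum of sinusoids with Gaussian envelopes of random lengths."""
    rng = np.random.default_rng(seed)
    t = np.arange(T) / fs
    s = np.zeros(T)
    for _ in range(n_events):
        f0 = rng.uniform(100, 3000)          # random frequency
        centre = rng.uniform(0, T / fs)      # random position in time
        width = rng.uniform(0.01, 0.1)       # random envelope width (seconds)
        env = np.exp(-0.5 * ((t - centre) / width) ** 2)
        s += env * np.sin(2 * np.pi * f0 * t + rng.uniform(0, 2 * np.pi))
    return s / np.max(np.abs(s))

# N = 3 sources, stereo mixtures, sparse filters of length L = 256
rng = np.random.default_rng(1)
N, L, T = 3, 256, 4 * 8000
filters = np.zeros((2, N, L))
for m in range(2):
    for j in range(N):
        filters[m, j, rng.choice(L, 8, replace=False)] = rng.standard_normal(8)
sources = np.stack([synthetic_source(T, seed=j) for j in range(N)])
mixtures = np.stack([sum(np.convolve(filters[m, j], sources[j])[:T] for j in range(N))
                     for m in range(2)])
print(mixtures.shape)   # (2, 32000)
```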
61

Results: Effect of STFT window size

[Plots: filter recovery performance vs STFT window size, for CR-NB and CR-WB]

• The NB approach gets better as the window size increases relative to the filter length

• The WB approach performs better when the window size is less than or equal to the filter length

62

Results: Effect of clustering threshold

[Plots: filter recovery performance vs clustering threshold, for CR-NB and CR-WB, with window sizes 1024 and 128]

• The NB approach degrades as the threshold increases, due to the smaller number of observations

• The WB approach performs better as the threshold increases, due to more accurate CR
