34
Advances in Models for Acoustic Processing David Barber and Taylan Cemgil Signal Processing and Communications Lab. 9 Dec 2006 Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada

Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Advances in Models for Acoustic Processing

David Barber and Taylan Cemgil

Signal Processing and Communications Lab.

9 Dec 2006

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada

Page 2: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Outline

• Acoustic Modeling and applications

• Parameter estimation and Inference

– Subspace methods, Variational, Monte Carlo

• Issues

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 1

Page 3: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Acoustic Modeling

t/sec

f/Hz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

30

(Speech)

t/sec

f/Hz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 2

Page 4: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Probabilistic Models

• Once a realistic model is constructed many related task can be cast to posteriorinference problems

p(Structure|Observations) ∝ p(Observations|Structure)p(Structure)

– analysis,– localisation,– restoration,– transcription,– source separation,– identification,– coding,– resynthesis, cross synthesis

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 3

Page 5: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Source Separation

s1t

s2t

. . . sn

t

x1t

. . . xm

t

t = 1 . . . T

a1 r1 . . . a

m rm

• Joint estimation Sources, Channel noise and mixing system

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 4

Page 6: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Source Separation

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 5

Page 7: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Source Separation

t/sec

f/Hz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

(Speech + Piano + Guitar)

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 6

Page 8: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Audio Interpolation

• Estimate missing samples given observed ones

• Restoration, concatenative expressive speech synthesis, ...

0 50 100 150 200 250 300 350 400 450 500

0

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 7

Page 9: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Audio Interpolation

p(x¬κ|xκ) ∝

dHp(x¬κ|H)p(xκ|H)p(H)

H ≡ (parameters, hidden states)

H

x¬κ xκ

Missing y

0 50 100 150 200 250 300 350 400 450 500

0

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 8

Page 10: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Application: Analysis of Polyphonic Audio

νfr

eque

ncy

k

x k

• Each latent process ν = 1 . . . W corresponds to a “voice”. Indicators r1:W,1:K

encode a latent “piano roll”

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 9

Page 11: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Tempo, Rhythm, Meter analysis

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1:K)

50 100 150 200 250 300 350 400 450

800

600

400

200 −10

−5

0Q

uart

er n

otes

per

min

. log p(nk|y

1:K)

50 100 150 200 250 300 350 400 450

180

120

60−10

−5

0

p(rk|y

1:K)

Frame Index, k

50 100 150 200 250 300 350 400 450

Triplets

Duplets 0.20.40.60.8

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 10

Page 12: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Hierarchical ModelingS ore Expression

Ä äÅ ä

øø

tt tt tt tIt tt tYtc d tY t t

øø

tYt tt tIt tIt tt tY"tt d tY t t

0 500 1000 1500 2000 2500

−3.5

−3

−2.5

ω

t

True EstimatedPiano Roll

0 500 1000 1500 2000 2500

θ

Audio SignalCemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 11

Page 13: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Hierarchical Modeling

M

1 2 : : : tv1 v2 : : : vtk1 k2 : : : kth1 h2 : : : ht1 2 : : : tm1 m2 : : : mt

gj;1 gj;2 : : : gj;trj;1 rj;2 : : : rj;tnj;1 nj;2 : : : nj;txj;1 xj;2 : : : xj;tyj;1 yj;2 : : : yj;ty1 y2 : : : ytCemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 12

Page 14: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Time Series Modeling

• Sound is primarily about oscillations and resonance

• Cascade of second order sytems

• Audio signals can often be compactly represented by sinusoidals

(real) yn =

p∑

k=1

αke−γkn cos(ωkn + φk)

(complex) yn =

p∑

k=1

ck(e−γk+jωk)n

y = F (γ1:p, ω1:p)c

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 13

Page 15: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

State space Parametrisation

xn+1 =

e−γ1+jω1

. . .e−γp+jωp

︸ ︷︷ ︸

A

xn x0 =

c1

c2...cp

yn =(

1 1 . . . 1 1)

︸ ︷︷ ︸

C

xn

x0 x1 . . . xk−1 xk . . . xK

y1 . . . yk−1 yk . . . yK

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 14

Page 16: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

State Space Parametrisation

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 15

Page 17: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Classical System identification approach

• The state space representation implies

xn+1 = Axn

yn = Cxn⇒ yn = CAnx0

• Therefore we can write for arbitrary L and M the Hankel matrix

y0 y1 . . . yM

y1 y2 . . . yM+1... ... . . . ...

yL yL+1 . . . yL+M

︸ ︷︷ ︸

Y

=

C

CA...

CAL

︸ ︷︷ ︸

ΓL+1

(x0 Ax0 . . . AMx0

)

︸ ︷︷ ︸

ΩM+1

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 16

Page 18: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Identification via matrix factorisation

1. Given the “impulse response” Hankel matrix Y (Ho and Kalman 1966, Rao and Arun1992, Viberg 1995), compute a matrix factorisation (typically via SVD)

Y = ΓL+1ΩM+1 =

C

CA...

CAL

(x0 Ax0 . . . AMx0

)

︸ ︷︷ ︸

2. Read off C and x0 from factors ΓL+1 and ΩM+1

3. Compute transition matrix by exploiting shift invariance

CA

CA2

...CAL

=

C

CA...

CAL−1

A ⇒ A = Γ†1:nΓ2:n+1

Matrix factorisation ideas have lead to useful methods (N4SID, NMF, MMMF...)

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 17

Page 19: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Pros and Cons

• Uses well understood algorithms from numerical linear algebra ⇒ often quitefast and numerically stable

• Model selection can be based on numerical rank analysis; inspection ofsingular values e.t.c.

• Handling of uncertainty and nonstationarity is not very transparent

• Prior knowledge is hard to incorporate

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 18

Page 20: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Hierarchical Factorial Models

• Each component models a latent process

• The observations are projections

rν0 · · · rν

k · · · rνK

θν0 · · · θ

νk · · · θ

νK

ν = 1 . . . W

yk yK

• Generalises Source-filter models

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 19

Page 21: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Harmonic model with changepoints

rk|rk−1 ∼ p(rk|rk−1)

θk|θk−1, rk ∼ [rk = 0]N (Aθk−1, Q)︸ ︷︷ ︸

reg

+ [rk = 1]N (0, S)︸ ︷︷ ︸

new

yk|θk ∼ N (Cθk, R)

A =

G2ω

. . .GH

ω

N

Gω = ρk

(cos(ω) − sin(ω)sin(ω) cos(ω)

)

damping factor 0 < ρk < 1, framelength N and damped sinusoidal basis matrix C of sizeN × 2H

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 20

Page 22: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Harmonic model with changepoints

r kfr

eque

ncy

k

k

x k

• Each changepoint denotes the onset of a new audio event

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 21

Page 23: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Monophonic transcription

• Detecting onsets, offsets and pitch (Cemgil et. al. 2006, IEEE TSALP)

500 1000 1500 2000 2500 3000 3500

Exact inference is possible

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 22

Page 24: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Factorial Changepoint model

r0,ν ∼ C(r0,ν; π0,ν)

θ0,ν ∼ N (θ0,ν; µν, Pν)

rk,ν|rk−1,ν ∼ C(rk,ν; πν(rt−1,ν)) Changepoint indicator

θk,ν|θk−1,ν ∼ N (θk,ν; Aν(rk)θk−1,ν, Qν(rk)) Latent state

yk|θk,1:W ∼ N (yk; Ckθk,1:W , R) Observation

rν0 · · · rν

k · · · rνK

θν0 · · · θ

νk · · · θ

νK

ν = 1 . . . W

yk yK

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 23

Page 25: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Application: Analysis of Polyphonic Audio

νfr

eque

ncy

k

x k

• Each latent changepoint process ν = 1 . . . W corresponds to a “piano key”.Indicators r1:W,1:K encode a latent “piano roll”

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 24

Page 26: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Single time slice - Bayesian Variable Selection

ri ∼ C(ri; πon, πoff)

si|ri ∼ [ri = on]N (si; 0, Σ) + [ri 6= on]δ(si)

x|s1:W ∼ N (x; Cs1:W , R)

C ≡ [ C1 . . . Ci . . . CW ]

r1 . . . rW

s1 . . . sW

x

• Generalized Linear Model – Column’s of C are the basis vectors• The exact posterior is a mixture of 2W Gaussians• When W is large, computation of posterior features becomes intractable.• Sparsity by construction (Olshausen and Millman, Attias, ...)

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 25

Page 27: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Chord detection example

0 50 100 150 200 250 300 350 400−20

−10

0

10

20

0 π/4 π/2 3π/40

100

200

300

400

500

600

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 26

Page 28: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Inference : Iterative Improvement

r∗1:W = arg maxr1:W

ds1:Wp(y|s1:W )p(s1:W |r1:W )p(r1:W )

iteration r1 rM log p(y1:T , r1:M )

1 • −1220638254

2 • • −665073975

3 • • • −311983860

4 • • • • −162334351

5 • • • • • −43419569

6 • • • • • • −1633593

7 • • • • • • • −14336

8 • • • • • • • • −5766

9 • • • • • • • −5210

10 • • • • • • −4664

True • • • • • • −4664

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 27

Page 29: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Inference : MCMC/Gibbs sampler

• MCMC: Construct a markov chain with stationary distribution as the desiredposterior P

• Gibbs sampler: We cycle through all variables rν = 1 . . . W and sample fromfull conditionals

rν ∼ p(rν|r(t+1)1 , r

(t+1)2 , . . . , r

(t+1)ν−1 , r

(t)ν+1, . . . , r

(t)W )

• Rao-Blackwellisation: Conditioned on r1:W , the latent variables s1:W can beintegrated over analytically.

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 28

Page 30: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Variational Bayes – Structured mean field

• VB: Approximate a complicated distribution P with a simpler, tractable one Qin the sense of

Q∗ = argminQ

KL(Q||P)

• KL is the Kullback-Leibler divergence

KL(Q||P) ≡ 〈logQ〉Q − 〈logP〉Q ≥ 0

• If Q obeys the factorisation as Q =∏

ν Qν the solution is given by the fixedpoint

Qν ∝ exp(〈logP〉Q¬ν

)

• Leads to powerful generalisations of the Expectation Maximisation (EM)algorithm (Hinton and Neal 1998, Attias 2000)

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 29

Page 31: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

MCMC versus Variational Bayes (VB)

• Each configuration of r1:W corresponds to a corner of a W dimensionalhypercube

b

b b

bb

b

b

b

• MCMC moves along the edges stochastically

• VB moves inside the hypercube deterministically

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 30

Page 32: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Sequential Inference

• Filtering: Mixture Kalman Filter (Rao-Blackwellized PF) (Chen and Liu 2001)

• MMAP: Breadth-first search algorithm with greedy or randomised pruning,multi-hypothesis tracker (MHT)

• For each hypothesis, there are 2W possible branches at each timeslice

⇒ Need a fast proposal to find promising branches without exhaustiveevaluation

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 31

Page 33: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Music Processing challenges

• Computational modeling of human listening and music performance abilities

– complex and nonstationary temporal structure, both on physical-signal andcognitive-symbolic level

– Applications: Interactive Music performance, Musicology, Music InformationRetrieval, Education

• Analysis

– identification of individual sound events - notes, kicks– invariant characteristics - timbre– extraction of higher structure information - tempo, harmony, rhythm– not well defined attributes - expression, mood, genre

• Synthesis

– design of soud synthesis models - abstract or physical– performance rendering: generation of a physically, perceptually or artistically

feasible control policy

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 32

Page 34: Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops › amac › amac-intro.pdfAdvances in Models for Acoustic Processing David Barber and

Issues

• What types of modelling approaches are useful for acoustic processing (e.g.hierarchical, generative, discriminative) ?

• What classes of inference algorithms are suitable for these potentially largeand hybrid models of sound ?

• How can we improve the quality and speed of inference ?

• Can efficient online algorithms be developed?

• How can we learn efficient auditory codes based on independenceassumptions about the generating processes?

• What can biology and cognitive science can tell us about acousticrepresentations and processing? (and vice versa)

Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 33