Lecture 8: Time-Frequency Representations

Dennis Sun
Stats 253

July 16, 2014

Outline of Lecture

1 Short-Time Fourier Transform

2 Non-negative Matrix Factorization

3 Audio Source Separation

4 Wrapping Up

Short-Time Fourier Transform

Stationary Processes

• We’ve seen the Spectral Representation Theorem, which says that all stationary processes are essentially just linear combinations of sinusoids.

• So: is a typical music recording stationary?

Short-Time Fourier Transform

• Idea: Audio is locally stationary.

• Take Fourier transform of only a chunk of a signal at a time.
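
The idea in the two bullets above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in the lecture: the Hann window, the frame length, the hop size, and the toy two-tone signal are all assumptions, and the naive DFT is only suitable for tiny inputs.

```python
import cmath
import math

def dft(frame):
    """Naive O(n^2) discrete Fourier transform of one chunk."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def stft(signal, frame_len, hop):
    """Return a list of columns; column m is the DFT of the m-th windowed chunk."""
    # Hann window (an assumed choice) tapers each chunk toward zero at its edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len)
              for t in range(frame_len)]
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + t] * window[t] for t in range(frame_len)]
        columns.append(dft(chunk))
    return columns

# Toy non-stationary signal: 4 cycles per 32 samples, then 8 cycles per 32 samples.
n = 32
low = [math.sin(2 * math.pi * 4 * t / n) for t in range(4 * n)]
high = [math.sin(2 * math.pi * 8 * t / n) for t in range(4 * n)]
signal = low + high

S = stft(signal, frame_len=n, hop=n // 2)

# Strongest non-negative frequency bin in each chunk.
peaks = [max(range(n // 2 + 1), key=lambda k: abs(col[k])) for col in S]
```

A single Fourier transform of the whole signal would show both tones with no indication of when each occurs; the per-chunk peaks move from bin 4 to bin 8 as the signal changes.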

[Figure: the spectrogram filled in step by step as the Fourier transform is taken of successive chunks of the signal. Axes: Time (secs), Frequency (Hz), 0 to 8000 Hz.]

Plot log-magnitudes (decibels) to make the contrast clearer.
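
As a small sketch of the decibel conversion, assuming the usual 20 log10 convention for magnitudes and a hypothetical `floor` parameter to guard against taking the log of zero:

```python
import math

def to_decibels(magnitude, floor=1e-10):
    """Convert a magnitude to decibels; `floor` (an assumed value) avoids log(0)."""
    return 20.0 * math.log10(max(magnitude, floor))

# Magnitudes spanning three orders of magnitude become evenly spaced in dB.
magnitudes = [1.0, 0.1, 0.001]
db = [to_decibels(m) for m in magnitudes]
```

On the log scale, weak components that would be invisible next to the strongest ones on a linear color scale sit only a few tens of dB below them.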

[Figure: the spectrogram replotted with log-magnitudes. Axes: Time (secs), Frequency (Hz), 0 to 8000 Hz.]

STFT of the Audio File

[Figure: spectrogram of the full audio file. Axes: Time (secs), ticks at 1 to 4; Frequency (Hz), 0 to 8000 Hz.]

Summary

• The Short-Time Fourier Transform (STFT) takes in an input signal, and computes local DFTs to obtain a matrix.

• A plot of the magnitudes of this complex-valued matrix is called a spectrogram. (Log-magnitudes are often plotted instead of magnitudes.)

• There is also an inverse STFT (ISTFT), which takes in a complex matrix, computes the IDFT of each column, and adds the signal piece by piece to recover the time-domain signal.
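
The round trip described in the summary can be sketched as follows. The assumptions here are a periodic Hann window with 50% overlap (chosen because the shifted windows then sum to one, so plain overlap-add recovers the signal away from the edges), a naive DFT, and a made-up test signal.

```python
import cmath
import math

def dft(x, sign=-1):
    """Naive DFT; sign=+1 gives the unnormalized inverse transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    return [value / len(X) for value in dft(X, sign=+1)]

def hann(n):
    # Periodic Hann window: copies shifted by n/2 add up to exactly 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]

def stft(signal, frame_len, hop):
    w = hann(frame_len)
    return [dft([signal[s + t] * w[t] for t in range(frame_len)])
            for s in range(0, len(signal) - frame_len + 1, hop)]

def istft(columns, frame_len, hop):
    """IDFT of each column, added back piece by piece at its time offset."""
    out = [0.0] * (hop * (len(columns) - 1) + frame_len)
    for m, col in enumerate(columns):
        piece = idft(col)
        for t in range(frame_len):
            out[m * hop + t] += piece[t].real
    return out

signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(64)]
frame_len, hop = 16, 8
rebuilt = istft(stft(signal, frame_len, hop), frame_len, hop)

# The first and last half-frames are covered by only one window, so compare
# the interior samples only.
interior = range(hop, len(rebuilt) - hop)
max_error = max(abs(rebuilt[t] - signal[t]) for t in interior)
```

With other windows or hop sizes the overlapping windows generally do not sum to one, and the overlap-added output has to be rescaled accordingly.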

Non-negative Matrix Factorization

Matrix Factorization

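In many applications, we wish to decompose a matrix X as a product of two matrices:

$$X = AB.$$

The most important example is principal components analysis (PCA).

$$\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \approx \begin{bmatrix} d_1 u_{11} & d_2 u_{12} \\ \vdots & \vdots \\ d_1 u_{n1} & d_2 u_{n2} \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \end{bmatrix}$$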

Non-Negative Matrix Factorization

$$\underbrace{V}_{\text{Data}} \approx \underbrace{W}_{\text{Basis Vectors}} \; \underbrace{H}_{\text{Weights}}$$

• A matrix factorization where everything is non-negative

• $V \in \mathbb{R}_+^{F \times T}$ - original non-negative data

• $W \in \mathbb{R}_+^{F \times K}$ - matrix of basis vectors, dictionary elements

• $H \in \mathbb{R}_+^{K \times T}$ - matrix of activations, weights, or gains

• Typically, $K \ll F, T$

    • A compressed representation of the data

    • A low-rank approximation to V

NMF With Spectrogram Data

V ≈ W H

Figure: NMF of Mary Had a Little Lamb with K = 3

Factorization Interpretation I

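Each column of V is a weighted sum of the columns of W.

$$\begin{bmatrix} v_1 & v_2 & \cdots & v_T \end{bmatrix} \approx \begin{bmatrix} \sum_{j=1}^K H_{j1} w_j & \sum_{j=1}^K H_{j2} w_j & \cdots & \sum_{j=1}^K H_{jT} w_j \end{bmatrix}$$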

Factorization Interpretation II

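V is a sum of rank-1 matrices.

$$\begin{bmatrix} v_1 & v_2 & \cdots & v_T \end{bmatrix} \approx \begin{bmatrix} w_1 & w_2 & \cdots & w_K \end{bmatrix} \begin{bmatrix} h_1^T \\ h_2^T \\ \vdots \\ h_K^T \end{bmatrix}$$

$$V \approx w_1 h_1^T + w_2 h_2^T + \cdots + w_K h_K^T$$

[Figure: the spectrogram shown as a sum of rank-1 matrices.]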
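
The rank-1 view can be checked numerically on a tiny example. The matrices below are made up for illustration (F = 3, K = 2, T = 4), and the helper names are not from the lecture.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def outer(w, h):
    """Outer product w h^T, a rank-1 matrix."""
    return [[wi * hj for hj in h] for wi in w]

W = [[1, 0],
     [2, 1],
     [0, 3]]            # columns w_1, w_2
H = [[1, 2, 0, 1],
     [0, 1, 1, 2]]      # rows h_1^T, h_2^T

product = matmul(W, H)

# Accumulate w_k h_k^T over k and compare with the full product.
rank_one_sum = [[0] * 4 for _ in range(3)]
for k in range(2):
    w_k = [row[k] for row in W]
    term = outer(w_k, H[k])
    rank_one_sum = [[a + b for a, b in zip(r1, r2)]
                    for r1, r2 in zip(rank_one_sum, term)]
```

For a spectrogram, each term pairs one spectral shape (a column of W) with its activation over time (the matching row of H).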

The NMF Cost Function

• So far we’ve been vague by saying V ≈ WH.

• Formally, NMF solves the optimization problem

$$\underset{W,H}{\text{minimize}} \quad D(V \mid WH) \quad \text{subject to } W, H \ge 0$$

for some choice of “distance” measure D.

• Choices of D:

    • Euclidean distance:

$$D(V \mid \hat{V}) = \sum_{i,j} (V_{ij} - \hat{V}_{ij})^2$$

    • Kullback-Leibler (KL) divergence:

$$D(V \mid \hat{V}) = \sum_{i,j} \left( V_{ij} \log \frac{V_{ij}}{\hat{V}_{ij}} - V_{ij} + \hat{V}_{ij} \right)$$

• KL divergence is more appropriate for audio.
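
Both cost functions are short to write down in code. This sketch uses plain lists of rows and the convention 0 log 0 = 0; the example matrices are made up.

```python
import math

def euclidean(V, V_hat):
    """Sum of squared entrywise differences."""
    return sum((v - vh) ** 2
               for row_v, row_h in zip(V, V_hat)
               for v, vh in zip(row_v, row_h))

def kl_divergence(V, V_hat):
    """Generalized KL divergence D(V | V_hat); assumes V_hat > 0 entrywise."""
    total = 0.0
    for row_v, row_h in zip(V, V_hat):
        for v, vh in zip(row_v, row_h):
            if v > 0:                      # convention: 0 * log(0) = 0
                total += v * math.log(v / vh)
            total += vh - v
    return total

V = [[1.0, 2.0], [3.0, 4.0]]
V_hat = [[1.0, 1.0], [3.0, 8.0]]

d_euc = euclidean(V, V_hat)        # 0 + 1 + 0 + 16
d_kl = kl_divergence(V, V_hat)     # 3 - 2 log 2
```

Unlike the Euclidean distance, the KL divergence is not symmetric in its two arguments, which is why “distance” appears in quotes.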

Algorithms for NMF

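• General Strategy: Fix W and update H. Fix H and update W. Iterate until convergence.

• The two problems are symmetric. So let’s look at fixing W and updating H (for KL divergence).

$$D(V \mid WH) = -\sum_{i,j} V_{ij} \log \sum_k W_{ik} H_{kj} + \sum_{i,j} \sum_k W_{ik} H_{kj} + \text{const.}$$

• This cannot be minimized in closed form for H. (Try it!)

• Suppose $\tilde{H}_{kj}$ is our current guess and define $\pi_{ijk} = \dfrac{W_{ik} \tilde{H}_{kj}}{\sum_{k'} W_{ik'} \tilde{H}_{k'j}}$. Then we can write

$$D(V \mid WH) = -\sum_{i,j} V_{ij} \log \sum_k \pi_{ijk} \frac{W_{ik} H_{kj}}{\pi_{ijk}} + \sum_{i,j} \sum_k W_{ik} H_{kj} + \text{const.}$$

and use Jensen’s inequality on the log. Since $\sum_k \pi_{ijk} = 1$, this gives

$$D(V \mid WH) \le -\sum_{i,j} V_{ij} \sum_k \pi_{ijk} \log \frac{W_{ik} H_{kj}}{\pi_{ijk}} + \sum_{i,j} \sum_k W_{ik} H_{kj} + \text{const.},$$

with equality at $H = \tilde{H}$.

• This “majorizing” function can easily be minimized!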
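
Setting the derivative of the majorizing function with respect to each H_kj to zero gives a multiplicative update: the current guess H̃_kj is multiplied by Σ_i W_ik V_ij / (WH̃)_ij and divided by Σ_i W_ik. The sketch below alternates this update for H and, by symmetry, for W. The random initialization, the iteration count, the small `eps` guard, and the toy matrix are assumptions for illustration only.

```python
import math
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def kl_divergence(V, V_hat):
    total = 0.0
    for row_v, row_h in zip(V, V_hat):
        for v, vh in zip(row_v, row_h):
            if v > 0:
                total += v * math.log(v / vh)
            total += vh - v
    return total

def update_H(V, W, H, eps=1e-12):
    """Minimize the majorizer over H with W fixed (one multiplicative step)."""
    WH = matmul(W, H)
    F, K, T = len(W), len(H), len(H[0])
    new_H = []
    for k in range(K):
        col_sum = sum(W[i][k] for i in range(F))
        new_H.append([
            H[k][j] * sum(W[i][k] * V[i][j] / (WH[i][j] + eps) for i in range(F))
            / (col_sum + eps)
            for j in range(T)
        ])
    return new_H

def nmf_kl(V, K, n_iter=100, seed=0):
    rng = random.Random(seed)
    F, T = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(K)] for _ in range(F)]
    H = [[rng.random() + 0.1 for _ in range(T)] for _ in range(K)]
    history = []
    for _ in range(n_iter):
        H = update_H(V, W, H)
        # Symmetric problem: V^T ~ H^T W^T, so W^T plays the role of H.
        W = transpose(update_H(transpose(V), transpose(H), transpose(W)))
        history.append(kl_divergence(V, matmul(W, H)))
    return W, H, history

V = [[5.0, 1.0, 3.0, 4.0],
     [2.0, 6.0, 1.0, 3.0],
     [1.0, 2.0, 7.0, 2.0]]
W, H, history = nmf_kl(V, K=2)
```

Because each step minimizes a function that lies above the cost and touches it at the current guess, the recorded divergences never increase, and the updates keep W and H non-negative without any explicit projection.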

Audio Source Separation

The Problem of Source Separation

How do we recover the individual sources from just the mixture?

Source Separation Pipeline

$V \approx WH$

Ideal Pipeline:

1 Find a segment of music where only the backing band is playing. Take the magnitude STFT and run NMF. Keep $W_{\text{bg}}$.

2 Find a segment of music where only vocals are present. Take the magnitude STFT and run NMF. Keep $W_{\text{vocal}}$.

3 Now take the STFT of the mixture. Run NMF on the magnitudes, where $W_{\text{bg}}$ and $W_{\text{vocal}}$ are fixed, and estimate $H_{\text{bg}}$ and $H_{\text{vocal}}$.

$$V \approx \begin{bmatrix} W_{\text{bg}} & W_{\text{vocal}} \end{bmatrix} \begin{bmatrix} H_{\text{bg}} \\ H_{\text{vocal}} \end{bmatrix}$$

4 The backing band magnitudes can be recovered as $W_{\text{bg}} H_{\text{bg}}$. Use the phases from the STFT, and take the ISTFT to recover the time-domain signal.
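
Steps 3 and 4 can be sketched on toy data. Everything concrete below is hypothetical: the one-column dictionaries stand in for the output of steps 1 and 2, the "mixture STFT" is synthesized from known activations so the result can be checked, and the final ISTFT is left out.

```python
import cmath

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def update_H(V, W, H, eps=1e-12):
    """One multiplicative KL update of H with W held fixed."""
    WH = matmul(W, H)
    F, K, T = len(W), len(H), len(H[0])
    return [[H[k][j] * sum(W[i][k] * V[i][j] / (WH[i][j] + eps) for i in range(F))
             / (sum(W[i][k] for i in range(F)) + eps)
             for j in range(T)] for k in range(K)]

# Hypothetical dictionaries from steps 1 and 2 (F = 4 bins, one basis vector each).
W_bg = [[0.6], [0.3], [0.1], [0.0]]
W_vocal = [[0.0], [0.1], [0.3], [0.6]]
W = [bg + vo for bg, vo in zip(W_bg, W_vocal)]       # [W_bg  W_vocal]

# Toy complex STFT of the mixture, built from known activations and phases.
H_true = [[1.0, 3.0, 2.0],
          [1.0, 0.5, 2.0]]
magnitudes = matmul(W, H_true)
phases = [[0.1 * (i + j) for j in range(3)] for i in range(4)]
X_mix = [[cmath.rect(magnitudes[i][j], phases[i][j]) for j in range(3)]
         for i in range(4)]

# Step 3: NMF on the mixture magnitudes with W fixed; only H is updated.
V = [[abs(x) for x in row] for row in X_mix]
H = [[1.0] * 3 for _ in range(2)]
for _ in range(200):
    H = update_H(V, W, H)
H_bg, H_vocal = [H[0]], [H[1]]

# Step 4: background magnitudes W_bg H_bg, combined with the mixture phases.
V_bg = matmul(W_bg, H_bg)
X_bg = [[cmath.rect(V_bg[i][j], cmath.phase(X_mix[i][j])) for j in range(3)]
        for i in range(4)]
```

The matrix `X_bg` is what would be passed to the ISTFT. With W fixed the problem is convex in H, which is why the activations here are recovered essentially exactly; real dictionaries overlap more and the fit is only approximate.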

In practice:

1 Find a segment of music where only the backing band is playing. Take the magnitude STFT and run NMF. Keep $W_{\text{bg}}$.

2 (Skipped: $W_{\text{vocal}}$ is not learned in advance; it is estimated from the mixture in step 3.)

3 Now take the STFT of the mixture. Run NMF on the magnitudes, where $W_{\text{bg}}$ is fixed, and estimate $W_{\text{vocal}}$, $H_{\text{bg}}$ and $H_{\text{vocal}}$.

$$V \approx \begin{bmatrix} W_{\text{bg}} & W_{\text{vocal}} \end{bmatrix} \begin{bmatrix} H_{\text{bg}} \\ H_{\text{vocal}} \end{bmatrix}$$

4 The backing band magnitudes can be recovered as $W_{\text{bg}} H_{\text{bg}}$. Use the phases from the STFT, and take the ISTFT to recover the time-domain signal.
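
A sketch of the in-practice variant of step 3, where the background dictionary stays fixed while the vocal dictionary and all activations are updated. The helper `update_W_columns`, its `free` argument, and the toy data are hypothetical names and values chosen for illustration.

```python
import math
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def kl_divergence(V, V_hat):
    total = 0.0
    for row_v, row_h in zip(V, V_hat):
        for v, vh in zip(row_v, row_h):
            if v > 0:
                total += v * math.log(v / vh)
            total += vh - v
    return total

def update_H(V, W, H, eps=1e-12):
    WH = matmul(W, H)
    F, K, T = len(W), len(H), len(H[0])
    return [[H[k][j] * sum(W[i][k] * V[i][j] / (WH[i][j] + eps) for i in range(F))
             / (sum(W[i][k] for i in range(F)) + eps)
             for j in range(T)] for k in range(K)]

def update_W_columns(V, W, H, free, eps=1e-12):
    """Multiplicative KL update of only the columns of W listed in `free`."""
    WH = matmul(W, H)
    F, T = len(V), len(V[0])
    new_W = [row[:] for row in W]          # fixed columns are copied unchanged
    for k in free:
        row_sum = sum(H[k])
        for i in range(F):
            ratio = sum(H[k][j] * V[i][j] / (WH[i][j] + eps) for j in range(T))
            new_W[i][k] = W[i][k] * ratio / (row_sum + eps)
    return new_W

# Toy mixture magnitudes: a known background basis plus an unknown vocal basis.
W_bg = [0.6, 0.3, 0.1, 0.0]
W_vocal_true = [0.0, 0.1, 0.3, 0.6]
H_true = [[1.0, 3.0, 2.0, 0.5, 1.5],
          [1.0, 0.5, 2.0, 2.5, 0.2]]
V = matmul([[b, v] for b, v in zip(W_bg, W_vocal_true)], H_true)

rng = random.Random(0)
W = [[b, rng.random() + 0.1] for b in W_bg]     # column 0 fixed, column 1 free
H = [[rng.random() + 0.1 for _ in range(5)] for _ in range(2)]
history = []
for _ in range(100):
    H = update_H(V, W, H)
    W = update_W_columns(V, W, H, free=[1])
    history.append(kl_divergence(V, matmul(W, H)))
```

The majorizing function is separable across the entries of W, so updating only the vocal column still cannot increase the cost.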

A Demo

Let’s try this in R!

Wrapping Up

Projects

• Project proposals due Friday. (Please fill in survey first.)

• I will be available after class. I will also be available tomorrow morning. Please e-mail me to schedule a time.

Homework 3

• Geostatistics packages in R: gstat and geoR

• Prediction competition
