40
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio Topic 6 The Digital Fourier Transform 1

Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

  • Upload
    others

  • View
    26

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Topic 6

The Digital Fourier Transform

1

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why bother bull The ear processes sound by decomposing it into sine

waves at different frequencies bull An algorithm that does the same would be a step towards

machines that hears things as humans do bull Sohelliphow do we do this by machine

0 10 20 30 40 50 60 70 80 90 100-25

-2

-15

-1

-05

0

05

1

15

2

25

220 Hz

660 Hz

1100 HzPeripheral Auditory system

Cochlea Auditory nerve

0 10 20 30 40 50 60 70 80 90 100-1

-08

-06

-04

-02

0

02

04

06

08

1

0 10 20 30 40 50 60 70 80 90 100-1

-08

-06

-04

-02

0

02

04

06

08

1

0 10 20 30 40 50 60 70 80 90 100-1

-08

-06

-04

-02

0

02

04

06

08

1

2

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Jean Baptiste Joseph FourierHe was a French mathematician and physicist who lived from 1768-1830

He presented a paper in 1807 to the Institut de France claiming any continuous signal with finite period could be represented as the sum of an infinite series of properly chosen sinusoidal wave functions at different frequencies

We now call this the Fourier Series

Could this be the way to decompose sounds into different frequencies like the ear does

3

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

FactoidAmong the reviewers of Fourierrsquos paper were two famous mathematicians

ndash Joseph Louis Lagrange (1736-1813) ndash Pierre Simon de Laplace (1749-1827)

Lagrange said sine waves could not perfectly represent signals with discontinuous slopes like square waves (He was technically right)

Thus the Institut de France did not publish Fourierrsquos work until 15 years later after Lagrange died

4

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Fourier Series

[ ]01

( ) cos( ) sin( )n n n nn

x t A A t B tω ωinfin

=

= + +sum

Given a periodic function x(t)

An INFINITE series of sine and cosine functions reproduces x(t)

0 2 n n n Tω ω π= =

the period

IntegersmtxmTtx isinforall=+ )()(

Where frequency ωn is an integer multiple n of fundamental frequency ω0

5

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Aside Two defs for ldquofrequencyrdquo

bull The frequency of a periodic function can be defined in two ways

f =ω 2π =1TFrequency

Angular Frequency Period

6

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fourier Series

01

cos( )( )

sin( )n n

n n n

A tx t A

B tω

ω

infin

=

⎡ ⎤= + ⎢ ⎥+⎣ ⎦

sumThe original signal

Quite a few

The frequency of this component

Amplitude the contribution of this component

7

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it

It isnrsquot practical to calculate an infinite number of things in the general case

We need to frame the problem as a finite one so we can actually solve the general case

8

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 2

A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely

Real world audio is not infinite in lengthhellipand things are only locally periodic

9

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window

STEP 2 Pretend what you saw in that window goes on forever

Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points

10

Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)

Discrete Fourier Transformbull Represents a finite sequence of complex values as a

finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly

reproduced by the finite set of sinusoids

[ ]X k

DFT0 N-1

0 N-1

0 N-1

0 N-1

Real portion

Imaginary portion

N2

N2

Real portion

Imaginary portion

[ ]x nTime domain Frequency domain

IDFT

11

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Digital Sampling

0123456

-4-3-2-1

AM

PLI

TUD

E

TIME

sample interval

quantization increment

An analog signal is sampled into sequence of discrete sample points x[n]

12

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Windowing x[n] is windowed by function w[n]

(multiply the ith value of x by the ith value of w)

x[n] w[n] z[n]

x =

Example windowing x[n] with a rectangular window

13

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Only whatrsquos in the window

z[n]

Do the DFT only on the values in the window

x[n] w[n]

x =

Ignore Ignore

14

Windowing can lead to problemsThe original signal The sample window

What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window

15

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 2: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why bother bull The ear processes sound by decomposing it into sine

waves at different frequencies bull An algorithm that does the same would be a step towards

machines that hears things as humans do bull Sohelliphow do we do this by machine

0 10 20 30 40 50 60 70 80 90 100-25

-2

-15

-1

-05

0

05

1

15

2

25

220 Hz

660 Hz

1100 HzPeripheral Auditory system

Cochlea Auditory nerve

0 10 20 30 40 50 60 70 80 90 100-1

-08

-06

-04

-02

0

02

04

06

08

1

0 10 20 30 40 50 60 70 80 90 100-1

-08

-06

-04

-02

0

02

04

06

08

1

0 10 20 30 40 50 60 70 80 90 100-1

-08

-06

-04

-02

0

02

04

06

08

1

2

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Jean Baptiste Joseph FourierHe was a French mathematician and physicist who lived from 1768-1830

He presented a paper in 1807 to the Institut de France claiming any continuous signal with finite period could be represented as the sum of an infinite series of properly chosen sinusoidal wave functions at different frequencies

We now call this the Fourier Series

Could this be the way to decompose sounds into different frequencies like the ear does

3

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

FactoidAmong the reviewers of Fourierrsquos paper were two famous mathematicians

ndash Joseph Louis Lagrange (1736-1813) ndash Pierre Simon de Laplace (1749-1827)

Lagrange said sine waves could not perfectly represent signals with discontinuous slopes like square waves (He was technically right)

Thus the Institut de France did not publish Fourierrsquos work until 15 years later after Lagrange died

4

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Fourier Series

[ ]01

( ) cos( ) sin( )n n n nn

x t A A t B tω ωinfin

=

= + +sum

Given a periodic function x(t)

An INFINITE series of sine and cosine functions reproduces x(t)

0 2 n n n Tω ω π= =

the period

IntegersmtxmTtx isinforall=+ )()(

Where frequency ωn is an integer multiple n of fundamental frequency ω0

5

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Aside Two defs for ldquofrequencyrdquo

bull The frequency of a periodic function can be defined in two ways

f =ω 2π =1TFrequency

Angular Frequency Period

6

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fourier Series

01

cos( )( )

sin( )n n

n n n

A tx t A

B tω

ω

infin

=

⎡ ⎤= + ⎢ ⎥+⎣ ⎦

sumThe original signal

Quite a few

The frequency of this component

Amplitude the contribution of this component

7

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it

It isnrsquot practical to calculate an infinite number of things in the general case

We need to frame the problem as a finite one so we can actually solve the general case

8

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 2

A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely

Real world audio is not infinite in lengthhellipand things are only locally periodic

9

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window

STEP 2 Pretend what you saw in that window goes on forever

Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points

10

Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)

Discrete Fourier Transformbull Represents a finite sequence of complex values as a

finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly

reproduced by the finite set of sinusoids

[ ]X k

DFT0 N-1

0 N-1

0 N-1

0 N-1

Real portion

Imaginary portion

N2

N2

Real portion

Imaginary portion

[ ]x nTime domain Frequency domain

IDFT

11

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Digital Sampling

0123456

-4-3-2-1

AM

PLI

TUD

E

TIME

sample interval

quantization increment

An analog signal is sampled into sequence of discrete sample points x[n]

12

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Windowing x[n] is windowed by function w[n]

(multiply the ith value of x by the ith value of w)

x[n] w[n] z[n]

x =

Example windowing x[n] with a rectangular window

13

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Only whatrsquos in the window

z[n]

Do the DFT only on the values in the window

x[n] w[n]

x =

Ignore Ignore

14

Windowing can lead to problemsThe original signal The sample window

What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window

15

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 3: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Jean Baptiste Joseph FourierHe was a French mathematician and physicist who lived from 1768-1830

He presented a paper in 1807 to the Institut de France claiming any continuous signal with finite period could be represented as the sum of an infinite series of properly chosen sinusoidal wave functions at different frequencies

We now call this the Fourier Series

Could this be the way to decompose sounds into different frequencies like the ear does

3

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

FactoidAmong the reviewers of Fourierrsquos paper were two famous mathematicians

ndash Joseph Louis Lagrange (1736-1813) ndash Pierre Simon de Laplace (1749-1827)

Lagrange said sine waves could not perfectly represent signals with discontinuous slopes like square waves (He was technically right)

Thus the Institut de France did not publish Fourierrsquos work until 15 years later after Lagrange died

4

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Fourier Series

[ ]01

( ) cos( ) sin( )n n n nn

x t A A t B tω ωinfin

=

= + +sum

Given a periodic function x(t)

An INFINITE series of sine and cosine functions reproduces x(t)

0 2 n n n Tω ω π= =

the period

IntegersmtxmTtx isinforall=+ )()(

Where frequency ωn is an integer multiple n of fundamental frequency ω0

5

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Aside Two defs for ldquofrequencyrdquo

bull The frequency of a periodic function can be defined in two ways

f =ω 2π =1TFrequency

Angular Frequency Period

6

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fourier Series

01

cos( )( )

sin( )n n

n n n

A tx t A

B tω

ω

infin

=

⎡ ⎤= + ⎢ ⎥+⎣ ⎦

sumThe original signal

Quite a few

The frequency of this component

Amplitude the contribution of this component

7

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it

It isnrsquot practical to calculate an infinite number of things in the general case

We need to frame the problem as a finite one so we can actually solve the general case

8

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 2

A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely

Real world audio is not infinite in lengthhellipand things are only locally periodic

9

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window

STEP 2 Pretend what you saw in that window goes on forever

Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points

10

Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)

Discrete Fourier Transformbull Represents a finite sequence of complex values as a

finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly

reproduced by the finite set of sinusoids

[ ]X k

DFT0 N-1

0 N-1

0 N-1

0 N-1

Real portion

Imaginary portion

N2

N2

Real portion

Imaginary portion

[ ]x nTime domain Frequency domain

IDFT

11

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Digital Sampling

0123456

-4-3-2-1

AM

PLI

TUD

E

TIME

sample interval

quantization increment

An analog signal is sampled into sequence of discrete sample points x[n]

12

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Windowing x[n] is windowed by function w[n]

(multiply the ith value of x by the ith value of w)

x[n] w[n] z[n]

x =

Example windowing x[n] with a rectangular window

13

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Only whatrsquos in the window

z[n]

Do the DFT only on the values in the window

x[n] w[n]

x =

Ignore Ignore

14

Windowing can lead to problemsThe original signal The sample window

What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window

15

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 4: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

FactoidAmong the reviewers of Fourierrsquos paper were two famous mathematicians

ndash Joseph Louis Lagrange (1736-1813) ndash Pierre Simon de Laplace (1749-1827)

Lagrange said sine waves could not perfectly represent signals with discontinuous slopes like square waves (He was technically right)

Thus the Institut de France did not publish Fourierrsquos work until 15 years later after Lagrange died

4

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Fourier Series

[ ]01

( ) cos( ) sin( )n n n nn

x t A A t B tω ωinfin

=

= + +sum

Given a periodic function x(t)

An INFINITE series of sine and cosine functions reproduces x(t)

0 2 n n n Tω ω π= =

the period

IntegersmtxmTtx isinforall=+ )()(

Where frequency ωn is an integer multiple n of fundamental frequency ω0

5

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Aside Two defs for ldquofrequencyrdquo

bull The frequency of a periodic function can be defined in two ways

f =ω 2π =1TFrequency

Angular Frequency Period

6

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fourier Series

01

cos( )( )

sin( )n n

n n n

A tx t A

B tω

ω

infin

=

⎡ ⎤= + ⎢ ⎥+⎣ ⎦

sumThe original signal

Quite a few

The frequency of this component

Amplitude the contribution of this component

7

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it

It isnrsquot practical to calculate an infinite number of things in the general case

We need to frame the problem as a finite one so we can actually solve the general case

8

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 2

A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely

Real world audio is not infinite in lengthhellipand things are only locally periodic

9

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window

STEP 2 Pretend what you saw in that window goes on forever

Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points

10

Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)

Discrete Fourier Transformbull Represents a finite sequence of complex values as a

finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly

reproduced by the finite set of sinusoids

[ ]X k

DFT0 N-1

0 N-1

0 N-1

0 N-1

Real portion

Imaginary portion

N2

N2

Real portion

Imaginary portion

[ ]x nTime domain Frequency domain

IDFT

11

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Digital Sampling

0123456

-4-3-2-1

AM

PLI

TUD

E

TIME

sample interval

quantization increment

An analog signal is sampled into sequence of discrete sample points x[n]

12

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Windowing x[n] is windowed by function w[n]

(multiply the ith value of x by the ith value of w)

x[n] w[n] z[n]

x =

Example windowing x[n] with a rectangular window

13

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Only whatrsquos in the window

z[n]

Do the DFT only on the values in the window

x[n] w[n]

x =

Ignore Ignore

14

Windowing can lead to problemsThe original signal The sample window

What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window

15

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 5: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Fourier Series

[ ]01

( ) cos( ) sin( )n n n nn

x t A A t B tω ωinfin

=

= + +sum

Given a periodic function x(t)

An INFINITE series of sine and cosine functions reproduces x(t)

0 2 n n n Tω ω π= =

the period

IntegersmtxmTtx isinforall=+ )()(

Where frequency ωn is an integer multiple n of fundamental frequency ω0

5

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Aside Two defs for ldquofrequencyrdquo

bull The frequency of a periodic function can be defined in two ways

f =ω 2π =1TFrequency

Angular Frequency Period

6

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fourier Series

01

cos( )( )

sin( )n n

n n n

A tx t A

B tω

ω

infin

=

⎡ ⎤= + ⎢ ⎥+⎣ ⎦

sumThe original signal

Quite a few

The frequency of this component

Amplitude the contribution of this component

7

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it

It isnrsquot practical to calculate an infinite number of things in the general case

We need to frame the problem as a finite one so we can actually solve the general case

8

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 2

A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely

Real world audio is not infinite in lengthhellipand things are only locally periodic

9

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window

STEP 2 Pretend what you saw in that window goes on forever

Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points

10

Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)

Discrete Fourier Transformbull Represents a finite sequence of complex values as a

finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly

reproduced by the finite set of sinusoids

[ ]X k

DFT0 N-1

0 N-1

0 N-1

0 N-1

Real portion

Imaginary portion

N2

N2

Real portion

Imaginary portion

[ ]x nTime domain Frequency domain

IDFT

11

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Digital Sampling

0123456

-4-3-2-1

AM

PLI

TUD

E

TIME

sample interval

quantization increment

An analog signal is sampled into sequence of discrete sample points x[n]

12

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Windowing x[n] is windowed by function w[n]

(multiply the ith value of x by the ith value of w)

x[n] w[n] z[n]

x =

Example windowing x[n] with a rectangular window

13

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Only whatrsquos in the window

z[n]

Do the DFT only on the values in the window

x[n] w[n]

x =

Ignore Ignore

14

Windowing can lead to problemsThe original signal The sample window

What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window

15

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 6: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Aside Two defs for ldquofrequencyrdquo

bull The frequency of a periodic function can be defined in two ways

f =ω 2π =1TFrequency

Angular Frequency Period

6

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fourier Series

01

cos( )( )

sin( )n n

n n n

A tx t A

B tω

ω

infin

=

⎡ ⎤= + ⎢ ⎥+⎣ ⎦

sumThe original signal

Quite a few

The frequency of this component

Amplitude the contribution of this component

7

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it

It isnrsquot practical to calculate an infinite number of things in the general case

We need to frame the problem as a finite one so we can actually solve the general case

8

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 2

A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely

Real world audio is not infinite in lengthhellipand things are only locally periodic

9

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window

STEP 2 Pretend what you saw in that window goes on forever

Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points

10

Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)

Discrete Fourier Transformbull Represents a finite sequence of complex values as a

finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly

reproduced by the finite set of sinusoids

[ ]X k

DFT0 N-1

0 N-1

0 N-1

0 N-1

Real portion

Imaginary portion

N2

N2

Real portion

Imaginary portion

[ ]x nTime domain Frequency domain

IDFT

11

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Digital Sampling

0123456

-4-3-2-1

AM

PLI

TUD

E

TIME

sample interval

quantization increment

An analog signal is sampled into sequence of discrete sample points x[n]

12

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Windowing x[n] is windowed by function w[n]

(multiply the ith value of x by the ith value of w)

x[n] w[n] z[n]

x =

Example windowing x[n] with a rectangular window

13

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Only whatrsquos in the window

z[n]

Do the DFT only on the values in the window

x[n] w[n]

x =

Ignore Ignore

14

Windowing can lead to problemsThe original signal The sample window

What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window

15

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 7: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fourier Series

01

cos( )( )

sin( )n n

n n n

A tx t A

B tω

ω

infin

=

⎡ ⎤= + ⎢ ⎥+⎣ ⎦

sumThe original signal

Quite a few

The frequency of this component

Amplitude the contribution of this component

7

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it

It isnrsquot practical to calculate an infinite number of things in the general case

We need to frame the problem as a finite one so we can actually solve the general case

8

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 2

A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely

Real world audio is not infinite in lengthhellipand things are only locally periodic

9

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window

STEP 2 Pretend what you saw in that window goes on forever

Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points

10

Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)

Discrete Fourier Transformbull Represents a finite sequence of complex values as a

finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly

reproduced by the finite set of sinusoids

[ ]X k

DFT0 N-1

0 N-1

0 N-1

0 N-1

Real portion

Imaginary portion

N2

N2

Real portion

Imaginary portion

[ ]x nTime domain Frequency domain

IDFT

11

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Digital Sampling

0123456

-4-3-2-1

AM

PLI

TUD

E

TIME

sample interval

quantization increment

An analog signal is sampled into sequence of discrete sample points x[n]

12

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Windowing x[n] is windowed by function w[n]

(multiply the ith value of x by the ith value of w)

x[n] w[n] z[n]

x =

Example windowing x[n] with a rectangular window

13

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Only whatrsquos in the window

z[n]

Do the DFT only on the values in the window

x[n] w[n]

x =

Ignore Ignore

14

Windowing can lead to problemsThe original signal The sample window

What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window

15

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 8: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it

It isnrsquot practical to calculate an infinite number of things in the general case

We need to frame the problem as a finite one so we can actually solve the general case

8

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 2

A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely

Real world audio is not infinite in lengthhellipand things are only locally periodic

9

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window

STEP 2 Pretend what you saw in that window goes on forever

Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points

10

Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)

Discrete Fourier Transformbull Represents a finite sequence of complex values as a

finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly

reproduced by the finite set of sinusoids

[ ]X k

DFT0 N-1

0 N-1

0 N-1

0 N-1

Real portion

Imaginary portion

N2

N2

Real portion

Imaginary portion

[ ]x nTime domain Frequency domain

IDFT

11

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Digital Sampling

0123456

-4-3-2-1

AM

PLI

TUD

E

TIME

sample interval

quantization increment

An analog signal is sampled into sequence of discrete sample points x[n]

12

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Windowing x[n] is windowed by function w[n]

(multiply the ith value of x by the ith value of w)

x[n] w[n] z[n]

x =

Example windowing x[n] with a rectangular window

13

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Only whatrsquos in the window

z[n]

Do the DFT only on the values in the window

x[n] w[n]

x =

Ignore Ignore

14

Windowing can lead to problemsThe original signal The sample window

What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window

15

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 9: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

PROBLEM 2

A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely

Real world audio is not infinite in lengthhellipand things are only locally periodic

9

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window

STEP 2 Pretend what you saw in that window goes on forever

Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points

10

Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)

Discrete Fourier Transformbull Represents a finite sequence of complex values as a

finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly

reproduced by the finite set of sinusoids

[ ]X k

DFT0 N-1

0 N-1

0 N-1

0 N-1

Real portion

Imaginary portion

N2

N2

Real portion

Imaginary portion

[ ]x nTime domain Frequency domain

IDFT

11

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Digital Sampling

0123456

-4-3-2-1

AM

PLI

TUD

E

TIME

sample interval

quantization increment

An analog signal is sampled into sequence of discrete sample points x[n]

12

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Windowing x[n] is windowed by function w[n]

(multiply the ith value of x by the ith value of w)

x[n] w[n] z[n]

x =

Example windowing x[n] with a rectangular window

13

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Only whatrsquos in the window

z[n]

Do the DFT only on the values in the window

x[n] w[n]

x =

Ignore Ignore

14

Windowing can lead to problemsThe original signal The sample window

What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window

15

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 10: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window

STEP 2 Pretend what you saw in that window goes on forever

Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points

10

Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)

Discrete Fourier Transformbull Represents a finite sequence of complex values as a

finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly

reproduced by the finite set of sinusoids

[ ]X k

DFT0 N-1

0 N-1

0 N-1

0 N-1

Real portion

Imaginary portion

N2

N2

Real portion

Imaginary portion

[ ]x nTime domain Frequency domain

IDFT

11

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Digital Sampling

0123456

-4-3-2-1

AM

PLI

TUD

E

TIME

sample interval

quantization increment

An analog signal is sampled into sequence of discrete sample points x[n]

12

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Windowing x[n] is windowed by function w[n]

(multiply the ith value of x by the ith value of w)

x[n] w[n] z[n]

x =

Example windowing x[n] with a rectangular window

13

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Only whatrsquos in the window

z[n]

Do the DFT only on the values in the window

x[n] w[n]

x =

Ignore Ignore

14

Windowing can lead to problemsThe original signal The sample window

What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window

15

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 11: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)

Discrete Fourier Transformbull Represents a finite sequence of complex values as a

finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly

reproduced by the finite set of sinusoids

[ ]X k

DFT0 N-1

0 N-1

0 N-1

0 N-1

Real portion

Imaginary portion

N2

N2

Real portion

Imaginary portion

[ ]x nTime domain Frequency domain

IDFT

11

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Digital Sampling

0123456

-4-3-2-1

AM

PLI

TUD

E

TIME

sample interval

quantization increment

An analog signal is sampled into sequence of discrete sample points x[n]

12

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Windowing x[n] is windowed by function w[n]

(multiply the ith value of x by the ith value of w)

x[n] w[n] z[n]

x =

Example windowing x[n] with a rectangular window

13

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Only whatrsquos in the window

z[n]

Do the DFT only on the values in the window

x[n] w[n]

x =

Ignore Ignore

14

Windowing can lead to problemsThe original signal The sample window

What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window

15

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 12: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Digital Sampling

0123456

-4-3-2-1

AM

PLI

TUD

E

TIME

sample interval

quantization increment

An analog signal is sampled into sequence of discrete sample points x[n]

12

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Windowing x[n] is windowed by function w[n]

(multiply the ith value of x by the ith value of w)

x[n] w[n] z[n]

x =

Example windowing x[n] with a rectangular window

13

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Only whatrsquos in the window

z[n]

Do the DFT only on the values in the window

x[n] w[n]

x =

Ignore Ignore

14

Windowing can lead to problemsThe original signal The sample window

What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window

15

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 13: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Windowing x[n] is windowed by function w[n]

(multiply the ith value of x by the ith value of w)

x[n] w[n] z[n]

x =

Example windowing x[n] with a rectangular window

13

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Only whatrsquos in the window

z[n]

Do the DFT only on the values in the window

x[n] w[n]

x =

Ignore Ignore

14

Windowing can lead to problemsThe original signal The sample window

What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window

15

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 14: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Only whatrsquos in the window

z[n]

Do the DFT only on the values in the window

x[n] w[n]

x =

Ignore Ignore

14

Windowing can lead to problemsThe original signal The sample window

What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window

15

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 15: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Windowing can lead to problemsThe original signal The sample window

What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window

15

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 16: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window size

bull Your window should be longer than the period of the function you want to analyze

bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear

16

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 17: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Window shape

bull Making the window ldquosmallrdquo at the edges reduces weirdness

BETTER FOR GETT ING A GOOD IDEA

BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON

17

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 18: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Why window shape matters

bull Donrsquot forget that a DFT assumes the signal in the window is periodic

bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal

bull Making the edges of the window less prominent helps suppress undesirable artifacts

18

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 19: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some famous windowsbull Rectangular

bull Hann (Julius von Hann)

bull Bartlett

[ ] 1w n =

[ ] 205 1 cos

1nw n

Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠

Note we assume w[n] = 0 outside some range [0N]

sample

sample

sample

ampl

itude

ampl

itude

ampl

itude

⎟⎟⎠

⎞⎜⎜⎝

⎛ minusminusminus

minus

minus=

21

21

12][ NnNN

nw19

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 20: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The DFT and IDFT formulae

21

021

0

[ ] [ ]

1[ ] [ ]

iN knN

n

iN knN

k

X k x n e

x n X k eN

π

π

minus minus

=

minus

=

=

=

sum

sum

Discrete Fourier Transform

Inverse Discrete Fourier Transform

Whatrsquos different between them

20

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 21: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Some points

bull If you have n samples in your window you will get n frequencies of analysis from your FFT

bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain

21

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 22: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Freq of a Signal

bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of

an FFT bull The lowest frequency a sine or cosine can have

and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0

0 0 2 1f Tω π= =22

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 23: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Fundamental Frequency of Analysisbull The lowest frequency component a Fourier

transform can analyze meaningfully bull All the frequencies of analysis are integer

multiples of the fundamental frequency of analysis

bull This is also the spacing between frequencies of analysis

fanalysis =SN

Number of samples in my window

Sample rate

23

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 24: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

About Frequencies of Analysis

bull A recording with N points produces a DFT with N points

bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)

bull Letrsquos look at an example

24

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 25: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The numbers in an 8 point FFT

-3 -i -1-i 0 1+i 2 1-i 0 -1+i

-013 -098 038 -027 -013 -027 -063 -098

0 1 2 3 4 5 6 7Array Index

8 point signal

FFT of signal

DC offset Nyquist frequency

Complex conjugates

Better index 0 1 2 3 4 (-4) -3 -2 -1

Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN

0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8

25

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 26: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Another view of the same data

Complex conjugates

Nyquist frequency 26

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 27: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Numbers

Ay

( )2 2

cossin

x Ay A

A x y

θ

θ

=

=

= +

( ) cos sin

z x iy

A iθ θ

equiv +

equiv +

x

imag

inar

y

real

θ

27

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 28: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Eulerrsquos Formula

bull Useful for relating polar coordinates to rectangular coordinates

cos sinThus

i

i

e i

z Ae

θ

θ

θ θ= +

=AMPLITUDE

PHASE

28

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 29: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull POLAR notation EASIER for this

( ) ( )

1 2

1 2

1 1 2 2

3 1 2

i i

i

z A e z A e

z A A e

θ θ

θ θ+

= =

=

29

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 30: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Multiplying Complex Numbers

bull Cartesian works as followshellip

( )1 1 1 2 2 2

3 1 2 1 2 1 2 2 1

z x iy z x iy

z x x y y i x y x y

= + = +

= minus + +

30

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 31: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Complex Conjugate

bull the complex conjugate of a complex number is given by changing the sign of the imaginary part

z a ibz a ib= +

= minus

A complex number

Its complex conjugate

31

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 32: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

DFT in Cartesian Coordinates

21

0

1

0

[ ] [ ]

2 2[ ] cos sin

iN knN

n

N

n

X k x n e

kn knx n i

N N

π

π π

minus minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

I got this using Eulerrsquos formula

In Cartesian coordinates the fact that these are complex values is more obvious

32

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 33: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Inverse DFT

21

0

1

0

1[ ] [ ]

1 2 2[ ] cos sin

iN knN

k

N

k

x n X k eN

kn knX k i

N N N

π

π π

minus

=

minus

=

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

sum

complex number

33

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 34: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Computational complexity

bull How many operations does this take for each frequency

bull How many operations total

1

0

2 2[ ] [ ] cos sin

N

n

kn knX k x n i

N Nπ πminus

=

⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠

sum

34

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 35: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The FFT

bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2

35

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 36: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Steps when making a spectrogram

bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to

remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a

good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the

graph) horizontal = time step (left is earlier right is later) and color = how loud

36

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 37: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Short time Fourier Transform

bullBreak signal into windows bullApply windowing function bullCalculate fft of each window

37

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 38: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

Reconstructing the waveform

38

To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT

Do the inverse Fourier transform Align the resulting signal where it was originally

Now element-wise add the values in the time-aligned windows

Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 39: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogramspectrogram(y256128256fsyaxis)

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40

Page 40: Topic 6 - Interactive Audio Labinteractiveaudiolab.github.io/.../MPM19-topic6-DFT.pdf · Topic 6 The Digital Fourier Transform!1. Bryan Pardo, 2008, Northwestern University EECS 352:

Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio

The Spectrogram

bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab

spectrogram(y10245121024fsyaxis)

40