Upload
others
View
26
Download
0
Embed Size (px)
Citation preview
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Topic 6
The Digital Fourier Transform
1
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why bother bull The ear processes sound by decomposing it into sine
waves at different frequencies bull An algorithm that does the same would be a step towards
machines that hears things as humans do bull Sohelliphow do we do this by machine
0 10 20 30 40 50 60 70 80 90 100-25
-2
-15
-1
-05
0
05
1
15
2
25
220 Hz
660 Hz
1100 HzPeripheral Auditory system
Cochlea Auditory nerve
0 10 20 30 40 50 60 70 80 90 100-1
-08
-06
-04
-02
0
02
04
06
08
1
0 10 20 30 40 50 60 70 80 90 100-1
-08
-06
-04
-02
0
02
04
06
08
1
0 10 20 30 40 50 60 70 80 90 100-1
-08
-06
-04
-02
0
02
04
06
08
1
2
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Jean Baptiste Joseph FourierHe was a French mathematician and physicist who lived from 1768-1830
He presented a paper in 1807 to the Institut de France claiming any continuous signal with finite period could be represented as the sum of an infinite series of properly chosen sinusoidal wave functions at different frequencies
We now call this the Fourier Series
Could this be the way to decompose sounds into different frequencies like the ear does
3
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
FactoidAmong the reviewers of Fourierrsquos paper were two famous mathematicians
ndash Joseph Louis Lagrange (1736-1813) ndash Pierre Simon de Laplace (1749-1827)
Lagrange said sine waves could not perfectly represent signals with discontinuous slopes like square waves (He was technically right)
Thus the Institut de France did not publish Fourierrsquos work until 15 years later after Lagrange died
4
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Fourier Series
[ ]01
( ) cos( ) sin( )n n n nn
x t A A t B tω ωinfin
=
= + +sum
Given a periodic function x(t)
An INFINITE series of sine and cosine functions reproduces x(t)
0 2 n n n Tω ω π= =
the period
IntegersmtxmTtx isinforall=+ )()(
Where frequency ωn is an integer multiple n of fundamental frequency ω0
5
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Aside Two defs for ldquofrequencyrdquo
bull The frequency of a periodic function can be defined in two ways
f =ω 2π =1TFrequency
Angular Frequency Period
6
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fourier Series
01
cos( )( )
sin( )n n
n n n
A tx t A
B tω
ω
infin
=
⎡ ⎤= + ⎢ ⎥+⎣ ⎦
sumThe original signal
Quite a few
The frequency of this component
Amplitude the contribution of this component
7
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it
It isnrsquot practical to calculate an infinite number of things in the general case
We need to frame the problem as a finite one so we can actually solve the general case
8
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 2
A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely
Real world audio is not infinite in lengthhellipand things are only locally periodic
9
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window
STEP 2 Pretend what you saw in that window goes on forever
Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points
10
Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)
Discrete Fourier Transformbull Represents a finite sequence of complex values as a
finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly
reproduced by the finite set of sinusoids
[ ]X k
DFT0 N-1
0 N-1
0 N-1
0 N-1
Real portion
Imaginary portion
N2
N2
Real portion
Imaginary portion
[ ]x nTime domain Frequency domain
IDFT
11
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Digital Sampling
0123456
-4-3-2-1
AM
PLI
TUD
E
TIME
sample interval
quantization increment
An analog signal is sampled into sequence of discrete sample points x[n]
12
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Windowing x[n] is windowed by function w[n]
(multiply the ith value of x by the ith value of w)
x[n] w[n] z[n]
x =
Example windowing x[n] with a rectangular window
13
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Only whatrsquos in the window
z[n]
Do the DFT only on the values in the window
x[n] w[n]
x =
Ignore Ignore
14
Windowing can lead to problemsThe original signal The sample window
What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window
15
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why bother bull The ear processes sound by decomposing it into sine
waves at different frequencies bull An algorithm that does the same would be a step towards
machines that hears things as humans do bull Sohelliphow do we do this by machine
0 10 20 30 40 50 60 70 80 90 100-25
-2
-15
-1
-05
0
05
1
15
2
25
220 Hz
660 Hz
1100 HzPeripheral Auditory system
Cochlea Auditory nerve
0 10 20 30 40 50 60 70 80 90 100-1
-08
-06
-04
-02
0
02
04
06
08
1
0 10 20 30 40 50 60 70 80 90 100-1
-08
-06
-04
-02
0
02
04
06
08
1
0 10 20 30 40 50 60 70 80 90 100-1
-08
-06
-04
-02
0
02
04
06
08
1
2
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Jean Baptiste Joseph FourierHe was a French mathematician and physicist who lived from 1768-1830
He presented a paper in 1807 to the Institut de France claiming any continuous signal with finite period could be represented as the sum of an infinite series of properly chosen sinusoidal wave functions at different frequencies
We now call this the Fourier Series
Could this be the way to decompose sounds into different frequencies like the ear does
3
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
FactoidAmong the reviewers of Fourierrsquos paper were two famous mathematicians
ndash Joseph Louis Lagrange (1736-1813) ndash Pierre Simon de Laplace (1749-1827)
Lagrange said sine waves could not perfectly represent signals with discontinuous slopes like square waves (He was technically right)
Thus the Institut de France did not publish Fourierrsquos work until 15 years later after Lagrange died
4
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Fourier Series
[ ]01
( ) cos( ) sin( )n n n nn
x t A A t B tω ωinfin
=
= + +sum
Given a periodic function x(t)
An INFINITE series of sine and cosine functions reproduces x(t)
0 2 n n n Tω ω π= =
the period
IntegersmtxmTtx isinforall=+ )()(
Where frequency ωn is an integer multiple n of fundamental frequency ω0
5
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Aside Two defs for ldquofrequencyrdquo
bull The frequency of a periodic function can be defined in two ways
f =ω 2π =1TFrequency
Angular Frequency Period
6
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fourier Series
01
cos( )( )
sin( )n n
n n n
A tx t A
B tω
ω
infin
=
⎡ ⎤= + ⎢ ⎥+⎣ ⎦
sumThe original signal
Quite a few
The frequency of this component
Amplitude the contribution of this component
7
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it
It isnrsquot practical to calculate an infinite number of things in the general case
We need to frame the problem as a finite one so we can actually solve the general case
8
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 2
A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely
Real world audio is not infinite in lengthhellipand things are only locally periodic
9
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window
STEP 2 Pretend what you saw in that window goes on forever
Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points
10
Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)
Discrete Fourier Transformbull Represents a finite sequence of complex values as a
finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly
reproduced by the finite set of sinusoids
[ ]X k
DFT0 N-1
0 N-1
0 N-1
0 N-1
Real portion
Imaginary portion
N2
N2
Real portion
Imaginary portion
[ ]x nTime domain Frequency domain
IDFT
11
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Digital Sampling
0123456
-4-3-2-1
AM
PLI
TUD
E
TIME
sample interval
quantization increment
An analog signal is sampled into sequence of discrete sample points x[n]
12
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Windowing x[n] is windowed by function w[n]
(multiply the ith value of x by the ith value of w)
x[n] w[n] z[n]
x =
Example windowing x[n] with a rectangular window
13
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Only whatrsquos in the window
z[n]
Do the DFT only on the values in the window
x[n] w[n]
x =
Ignore Ignore
14
Windowing can lead to problemsThe original signal The sample window
What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window
15
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Jean Baptiste Joseph FourierHe was a French mathematician and physicist who lived from 1768-1830
He presented a paper in 1807 to the Institut de France claiming any continuous signal with finite period could be represented as the sum of an infinite series of properly chosen sinusoidal wave functions at different frequencies
We now call this the Fourier Series
Could this be the way to decompose sounds into different frequencies like the ear does
3
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
FactoidAmong the reviewers of Fourierrsquos paper were two famous mathematicians
ndash Joseph Louis Lagrange (1736-1813) ndash Pierre Simon de Laplace (1749-1827)
Lagrange said sine waves could not perfectly represent signals with discontinuous slopes like square waves (He was technically right)
Thus the Institut de France did not publish Fourierrsquos work until 15 years later after Lagrange died
4
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Fourier Series
[ ]01
( ) cos( ) sin( )n n n nn
x t A A t B tω ωinfin
=
= + +sum
Given a periodic function x(t)
An INFINITE series of sine and cosine functions reproduces x(t)
0 2 n n n Tω ω π= =
the period
IntegersmtxmTtx isinforall=+ )()(
Where frequency ωn is an integer multiple n of fundamental frequency ω0
5
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Aside Two defs for ldquofrequencyrdquo
bull The frequency of a periodic function can be defined in two ways
f =ω 2π =1TFrequency
Angular Frequency Period
6
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fourier Series
01
cos( )( )
sin( )n n
n n n
A tx t A
B tω
ω
infin
=
⎡ ⎤= + ⎢ ⎥+⎣ ⎦
sumThe original signal
Quite a few
The frequency of this component
Amplitude the contribution of this component
7
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it
It isnrsquot practical to calculate an infinite number of things in the general case
We need to frame the problem as a finite one so we can actually solve the general case
8
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 2
A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely
Real world audio is not infinite in lengthhellipand things are only locally periodic
9
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window
STEP 2 Pretend what you saw in that window goes on forever
Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points
10
Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)
Discrete Fourier Transformbull Represents a finite sequence of complex values as a
finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly
reproduced by the finite set of sinusoids
[ ]X k
DFT0 N-1
0 N-1
0 N-1
0 N-1
Real portion
Imaginary portion
N2
N2
Real portion
Imaginary portion
[ ]x nTime domain Frequency domain
IDFT
11
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Digital Sampling
0123456
-4-3-2-1
AM
PLI
TUD
E
TIME
sample interval
quantization increment
An analog signal is sampled into sequence of discrete sample points x[n]
12
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Windowing x[n] is windowed by function w[n]
(multiply the ith value of x by the ith value of w)
x[n] w[n] z[n]
x =
Example windowing x[n] with a rectangular window
13
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Only whatrsquos in the window
z[n]
Do the DFT only on the values in the window
x[n] w[n]
x =
Ignore Ignore
14
Windowing can lead to problemsThe original signal The sample window
What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window
15
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
FactoidAmong the reviewers of Fourierrsquos paper were two famous mathematicians
ndash Joseph Louis Lagrange (1736-1813) ndash Pierre Simon de Laplace (1749-1827)
Lagrange said sine waves could not perfectly represent signals with discontinuous slopes like square waves (He was technically right)
Thus the Institut de France did not publish Fourierrsquos work until 15 years later after Lagrange died
4
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Fourier Series
[ ]01
( ) cos( ) sin( )n n n nn
x t A A t B tω ωinfin
=
= + +sum
Given a periodic function x(t)
An INFINITE series of sine and cosine functions reproduces x(t)
0 2 n n n Tω ω π= =
the period
IntegersmtxmTtx isinforall=+ )()(
Where frequency ωn is an integer multiple n of fundamental frequency ω0
5
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Aside Two defs for ldquofrequencyrdquo
bull The frequency of a periodic function can be defined in two ways
f =ω 2π =1TFrequency
Angular Frequency Period
6
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fourier Series
01
cos( )( )
sin( )n n
n n n
A tx t A
B tω
ω
infin
=
⎡ ⎤= + ⎢ ⎥+⎣ ⎦
sumThe original signal
Quite a few
The frequency of this component
Amplitude the contribution of this component
7
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it
It isnrsquot practical to calculate an infinite number of things in the general case
We need to frame the problem as a finite one so we can actually solve the general case
8
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 2
A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely
Real world audio is not infinite in lengthhellipand things are only locally periodic
9
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window
STEP 2 Pretend what you saw in that window goes on forever
Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points
10
Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)
Discrete Fourier Transformbull Represents a finite sequence of complex values as a
finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly
reproduced by the finite set of sinusoids
[ ]X k
DFT0 N-1
0 N-1
0 N-1
0 N-1
Real portion
Imaginary portion
N2
N2
Real portion
Imaginary portion
[ ]x nTime domain Frequency domain
IDFT
11
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Digital Sampling
0123456
-4-3-2-1
AM
PLI
TUD
E
TIME
sample interval
quantization increment
An analog signal is sampled into sequence of discrete sample points x[n]
12
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Windowing x[n] is windowed by function w[n]
(multiply the ith value of x by the ith value of w)
x[n] w[n] z[n]
x =
Example windowing x[n] with a rectangular window
13
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Only whatrsquos in the window
z[n]
Do the DFT only on the values in the window
x[n] w[n]
x =
Ignore Ignore
14
Windowing can lead to problemsThe original signal The sample window
What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window
15
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Fourier Series
[ ]01
( ) cos( ) sin( )n n n nn
x t A A t B tω ωinfin
=
= + +sum
Given a periodic function x(t)
An INFINITE series of sine and cosine functions reproduces x(t)
0 2 n n n Tω ω π= =
the period
IntegersmtxmTtx isinforall=+ )()(
Where frequency ωn is an integer multiple n of fundamental frequency ω0
5
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Aside Two defs for ldquofrequencyrdquo
bull The frequency of a periodic function can be defined in two ways
f =ω 2π =1TFrequency
Angular Frequency Period
6
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fourier Series
01
cos( )( )
sin( )n n
n n n
A tx t A
B tω
ω
infin
=
⎡ ⎤= + ⎢ ⎥+⎣ ⎦
sumThe original signal
Quite a few
The frequency of this component
Amplitude the contribution of this component
7
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it
It isnrsquot practical to calculate an infinite number of things in the general case
We need to frame the problem as a finite one so we can actually solve the general case
8
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 2
A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely
Real world audio is not infinite in lengthhellipand things are only locally periodic
9
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window
STEP 2 Pretend what you saw in that window goes on forever
Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points
10
Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)
Discrete Fourier Transformbull Represents a finite sequence of complex values as a
finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly
reproduced by the finite set of sinusoids
[ ]X k
DFT0 N-1
0 N-1
0 N-1
0 N-1
Real portion
Imaginary portion
N2
N2
Real portion
Imaginary portion
[ ]x nTime domain Frequency domain
IDFT
11
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Digital Sampling
0123456
-4-3-2-1
AM
PLI
TUD
E
TIME
sample interval
quantization increment
An analog signal is sampled into sequence of discrete sample points x[n]
12
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Windowing x[n] is windowed by function w[n]
(multiply the ith value of x by the ith value of w)
x[n] w[n] z[n]
x =
Example windowing x[n] with a rectangular window
13
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Only whatrsquos in the window
z[n]
Do the DFT only on the values in the window
x[n] w[n]
x =
Ignore Ignore
14
Windowing can lead to problemsThe original signal The sample window
What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window
15
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Aside Two defs for ldquofrequencyrdquo
bull The frequency of a periodic function can be defined in two ways
f =ω 2π =1TFrequency
Angular Frequency Period
6
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fourier Series
01
cos( )( )
sin( )n n
n n n
A tx t A
B tω
ω
infin
=
⎡ ⎤= + ⎢ ⎥+⎣ ⎦
sumThe original signal
Quite a few
The frequency of this component
Amplitude the contribution of this component
7
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it
It isnrsquot practical to calculate an infinite number of things in the general case
We need to frame the problem as a finite one so we can actually solve the general case
8
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 2
A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely
Real world audio is not infinite in lengthhellipand things are only locally periodic
9
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window
STEP 2 Pretend what you saw in that window goes on forever
Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points
10
Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)
Discrete Fourier Transformbull Represents a finite sequence of complex values as a
finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly
reproduced by the finite set of sinusoids
[ ]X k
DFT0 N-1
0 N-1
0 N-1
0 N-1
Real portion
Imaginary portion
N2
N2
Real portion
Imaginary portion
[ ]x nTime domain Frequency domain
IDFT
11
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Digital Sampling
0123456
-4-3-2-1
AM
PLI
TUD
E
TIME
sample interval
quantization increment
An analog signal is sampled into sequence of discrete sample points x[n]
12
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Windowing x[n] is windowed by function w[n]
(multiply the ith value of x by the ith value of w)
x[n] w[n] z[n]
x =
Example windowing x[n] with a rectangular window
13
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Only whatrsquos in the window
z[n]
Do the DFT only on the values in the window
x[n] w[n]
x =
Ignore Ignore
14
Windowing can lead to problemsThe original signal The sample window
What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window
15
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fourier Series
01
cos( )( )
sin( )n n
n n n
A tx t A
B tω
ω
infin
=
⎡ ⎤= + ⎢ ⎥+⎣ ⎦
sumThe original signal
Quite a few
The frequency of this component
Amplitude the contribution of this component
7
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it
It isnrsquot practical to calculate an infinite number of things in the general case
We need to frame the problem as a finite one so we can actually solve the general case
8
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 2
A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely
Real world audio is not infinite in lengthhellipand things are only locally periodic
9
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window
STEP 2 Pretend what you saw in that window goes on forever
Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points
10
Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)
Discrete Fourier Transformbull Represents a finite sequence of complex values as a
finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly
reproduced by the finite set of sinusoids
[ ]X k
DFT0 N-1
0 N-1
0 N-1
0 N-1
Real portion
Imaginary portion
N2
N2
Real portion
Imaginary portion
[ ]x nTime domain Frequency domain
IDFT
11
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Digital Sampling
0123456
-4-3-2-1
AM
PLI
TUD
E
TIME
sample interval
quantization increment
An analog signal is sampled into sequence of discrete sample points x[n]
12
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Windowing x[n] is windowed by function w[n]
(multiply the ith value of x by the ith value of w)
x[n] w[n] z[n]
x =
Example windowing x[n] with a rectangular window
13
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Only whatrsquos in the window
z[n]
Do the DFT only on the values in the window
x[n] w[n]
x =
Ignore Ignore
14
Windowing can lead to problemsThe original signal The sample window
What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window
15
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 1 The Fourier Series has an infinite number of sinusoids in it
It isnrsquot practical to calculate an infinite number of things in the general case
We need to frame the problem as a finite one so we can actually solve the general case
8
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 2
A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely
Real world audio is not infinite in lengthhellipand things are only locally periodic
9
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window
STEP 2 Pretend what you saw in that window goes on forever
Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points
10
Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)
Discrete Fourier Transformbull Represents a finite sequence of complex values as a
finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly
reproduced by the finite set of sinusoids
[ ]X k
DFT0 N-1
0 N-1
0 N-1
0 N-1
Real portion
Imaginary portion
N2
N2
Real portion
Imaginary portion
[ ]x nTime domain Frequency domain
IDFT
11
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Digital Sampling
0123456
-4-3-2-1
AM
PLI
TUD
E
TIME
sample interval
quantization increment
An analog signal is sampled into sequence of discrete sample points x[n]
12
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Windowing x[n] is windowed by function w[n]
(multiply the ith value of x by the ith value of w)
x[n] w[n] z[n]
x =
Example windowing x[n] with a rectangular window
13
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Only whatrsquos in the window
z[n]
Do the DFT only on the values in the window
x[n] w[n]
x =
Ignore Ignore
14
Windowing can lead to problemsThe original signal The sample window
What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window
15
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
PROBLEM 2
A function representable by a Fourier series is perfectly periodic Therefore it goes on infinitely
Real world audio is not infinite in lengthhellipand things are only locally periodic
9
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window
STEP 2 Pretend what you saw in that window goes on forever
Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points
10
Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)
Discrete Fourier Transformbull Represents a finite sequence of complex values as a
finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly
reproduced by the finite set of sinusoids
[ ]X k
DFT0 N-1
0 N-1
0 N-1
0 N-1
Real portion
Imaginary portion
N2
N2
Real portion
Imaginary portion
[ ]x nTime domain Frequency domain
IDFT
11
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Digital Sampling
0123456
-4-3-2-1
AM
PLI
TUD
E
TIME
sample interval
quantization increment
An analog signal is sampled into sequence of discrete sample points x[n]
12
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Windowing x[n] is windowed by function w[n]
(multiply the ith value of x by the ith value of w)
x[n] w[n] z[n]
x =
Example windowing x[n] with a rectangular window
13
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Only whatrsquos in the window
z[n]
Do the DFT only on the values in the window
x[n] w[n]
x =
Ignore Ignore
14
Windowing can lead to problemsThe original signal The sample window
What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window
15
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Solution Sample Window HopeSTEP 1 Take a ldquosnapshotrdquo of the signal by sampling it a finite number of times within a brief time window
STEP 2 Pretend what you saw in that window goes on forever
Voila Now you have a ldquoperiodicrdquo signal represented by a finite number of points
10
Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)
Discrete Fourier Transformbull Represents a finite sequence of complex values as a
finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly
reproduced by the finite set of sinusoids
[ ]X k
DFT0 N-1
0 N-1
0 N-1
0 N-1
Real portion
Imaginary portion
N2
N2
Real portion
Imaginary portion
[ ]x nTime domain Frequency domain
IDFT
11
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Digital Sampling
0123456
-4-3-2-1
AM
PLI
TUD
E
TIME
sample interval
quantization increment
An analog signal is sampled into sequence of discrete sample points x[n]
12
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Windowing x[n] is windowed by function w[n]
(multiply the ith value of x by the ith value of w)
x[n] w[n] z[n]
x =
Example windowing x[n] with a rectangular window
13
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Only whatrsquos in the window
z[n]
Do the DFT only on the values in the window
x[n] w[n]
x =
Ignore Ignore
14
Windowing can lead to problemsThe original signal The sample window
What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window
15
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Image from The Scientist and Engineers Guide to Digital Signal Processing by Steven Smith)
Discrete Fourier Transformbull Represents a finite sequence of complex values as a
finite number of discrete sinusoidal functions bull This finite sequence of samples can be perfectly
reproduced by the finite set of sinusoids
[ ]X k
DFT0 N-1
0 N-1
0 N-1
0 N-1
Real portion
Imaginary portion
N2
N2
Real portion
Imaginary portion
[ ]x nTime domain Frequency domain
IDFT
11
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Digital Sampling
0123456
-4-3-2-1
AM
PLI
TUD
E
TIME
sample interval
quantization increment
An analog signal is sampled into sequence of discrete sample points x[n]
12
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Windowing x[n] is windowed by function w[n]
(multiply the ith value of x by the ith value of w)
x[n] w[n] z[n]
x =
Example windowing x[n] with a rectangular window
13
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Only whatrsquos in the window
z[n]
Do the DFT only on the values in the window
x[n] w[n]
x =
Ignore Ignore
14
Windowing can lead to problemsThe original signal The sample window
What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window
15
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Digital Sampling
0123456
-4-3-2-1
AM
PLI
TUD
E
TIME
sample interval
quantization increment
An analog signal is sampled into sequence of discrete sample points x[n]
12
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Windowing x[n] is windowed by function w[n]
(multiply the ith value of x by the ith value of w)
x[n] w[n] z[n]
x =
Example windowing x[n] with a rectangular window
13
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Only whatrsquos in the window
z[n]
Do the DFT only on the values in the window
x[n] w[n]
x =
Ignore Ignore
14
Windowing can lead to problemsThe original signal The sample window
What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window
15
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Windowing x[n] is windowed by function w[n]
(multiply the ith value of x by the ith value of w)
x[n] w[n] z[n]
x =
Example windowing x[n] with a rectangular window
13
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Only whatrsquos in the window
z[n]
Do the DFT only on the values in the window
x[n] w[n]
x =
Ignore Ignore
14
Windowing can lead to problemsThe original signal The sample window
What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window
15
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Only whatrsquos in the window
z[n]
Do the DFT only on the values in the window
x[n] w[n]
x =
Ignore Ignore
14
Windowing can lead to problemsThe original signal The sample window
What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window
15
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Windowing can lead to problemsThe original signal The sample window
What the Fourier Transform ldquoimaginesrdquo the signal looks like based on what was in the window
15
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window size
bull Your window should be longer than the period of the function you want to analyze
bull At a sample rate of 8000 Hz what is the minimum window size that can capture the lowest sound you can hear
16
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Window shape
bull Making the window ldquosmallrdquo at the edges reduces weirdness
BETTER FOR GETT ING A GOOD IDEA
BAD FOR GETTING A GOOD IDEA OF WHATrsquoS GOING ON
17
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Why window shape matters
bull Donrsquot forget that a DFT assumes the signal in the window is periodic
bull The boundary conditions mess things uphellipunless you manage to have a window whose length is an exact integer multiple of the period of your signal
bull Making the edges of the window less prominent helps suppress undesirable artifacts
18
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some famous windowsbull Rectangular
bull Hann (Julius von Hann)
bull Bartlett
[ ] 1w n =
[ ] 205 1 cos
1nw n
Nπ⎛ ⎞⎛ ⎞= minus ⎜ ⎟⎜ ⎟minus⎝ ⎠⎝ ⎠
Note we assume w[n] = 0 outside some range [0N]
sample
sample
sample
ampl
itude
ampl
itude
ampl
itude
⎟⎟⎠
⎞⎜⎜⎝
⎛ minusminusminus
minus
minus=
21
21
12][ NnNN
nw19
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The DFT and IDFT formulae
21
021
0
[ ] [ ]
1[ ] [ ]
iN knN
n
iN knN
k
X k x n e
x n X k eN
π
π
minus minus
=
minus
=
=
=
sum
sum
Discrete Fourier Transform
Inverse Discrete Fourier Transform
Whatrsquos different between them
20
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Some points
bull If you have n samples in your window you will get n frequencies of analysis from your FFT
bull If the samples were grabbed at regular times then the analysis frequencies will be spaced regularly in the frequency domain
21
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Freq of a Signal
bull This is NOT the DC offset bull This is NOT fundamental frequency of analysis of
an FFT bull The lowest frequency a sine or cosine can have
and still fit into one period of the signal function bull Often called ldquoF zerordquo and written F0
0 0 2 1f Tω π= =22
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Fundamental Frequency of Analysisbull The lowest frequency component a Fourier
transform can analyze meaningfully bull All the frequencies of analysis are integer
multiples of the fundamental frequency of analysis
bull This is also the spacing between frequencies of analysis
fanalysis =SN
Number of samples in my window
Sample rate
23
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
About Frequencies of Analysis
bull A recording with N points produces a DFT with N points
bull The energy in the DFT is symmetric around two ldquopivot pointsrdquo the DC offset (the 0 frequency) and frac12 the sample frequency (S)
bull Letrsquos look at an example
24
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The numbers in an 8 point FFT
-3 -i -1-i 0 1+i 2 1-i 0 -1+i
-013 -098 038 -027 -013 -027 -063 -098
0 1 2 3 4 5 6 7Array Index
8 point signal
FFT of signal
DC offset Nyquist frequency
Complex conjugates
Better index 0 1 2 3 4 (-4) -3 -2 -1
Frequency of FFT 0 SN 2SN 3SN 4SN -3SN -2SN -SN
0 125Hz 250Hz 375Hz 500Hz -375Hz -250Hz -125HzIf S=1000 Hz and N = 8
25
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Another view of the same data
Complex conjugates
Nyquist frequency 26
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Numbers
Ay
( )2 2
cossin
x Ay A
A x y
θ
θ
=
=
= +
( ) cos sin
z x iy
A iθ θ
equiv +
equiv +
x
imag
inar
y
real
θ
27
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Eulerrsquos Formula
bull Useful for relating polar coordinates to rectangular coordinates
cos sinThus
i
i
e i
z Ae
θ
θ
θ θ= +
=AMPLITUDE
PHASE
28
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull POLAR notation EASIER for this
( ) ( )
1 2
1 2
1 1 2 2
3 1 2
i i
i
z A e z A e
z A A e
θ θ
θ θ+
= =
=
29
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Multiplying Complex Numbers
bull Cartesian works as followshellip
( )1 1 1 2 2 2
3 1 2 1 2 1 2 2 1
z x iy z x iy
z x x y y i x y x y
= + = +
= minus + +
30
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Complex Conjugate
bull the complex conjugate of a complex number is given by changing the sign of the imaginary part
z a ibz a ib= +
= minus
A complex number
Its complex conjugate
31
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
DFT in Cartesian Coordinates
21
0
1
0
[ ] [ ]
2 2[ ] cos sin
iN knN
n
N
n
X k x n e
kn knx n i
N N
π
π π
minus minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
I got this using Eulerrsquos formula
In Cartesian coordinates the fact that these are complex values is more obvious
32
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Inverse DFT
21
0
1
0
1[ ] [ ]
1 2 2[ ] cos sin
iN knN
k
N
k
x n X k eN
kn knX k i
N N N
π
π π
minus
=
minus
=
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
sum
complex number
33
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Computational complexity
bull How many operations does this take for each frequency
bull How many operations total
1
0
2 2[ ] [ ] cos sin
N
n
kn knX k x n i
N Nπ πminus
=
⎛ ⎞⎛ ⎞ ⎛ ⎞= minus + minus⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
sum
34
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The FFT
bull Fast Fourier Transform ndash A much much faster way to do the DFT ndash Introduced by Carl F Gauss in 1805 (ish) ndash Rediscovered by Cooley amp Tukey in 1965 ndash Big O notation for this is O(N log N) ndash Matlab functions fft and ifft are standard ndash REQUIRES the window size be a power of 2
35
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Steps when making a spectrogram
bull Cut up a signal into (possibly overlapping) windows bull Apply a windowing function (see slide 14) if you want to
remove those edge effects bull Take the FFT of each window bull if X is the FFT do npabs(X) to get the magnitude of the signal bull For display take the log of the values (deciBels would be a
good way) bull Only display frequencies up to 12 the sample rate bull Plot it where vertical = freq (low frequecy = lower on the
graph) horizontal = time step (left is earlier right is later) and color = how loud
36
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Short time Fourier Transform
bullBreak signal into windows bullApply windowing function bullCalculate fft of each window
37
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
Reconstructing the waveform
38
To invert the STFT to reconstruct the original wave form do thishellip For each complex spectrum in your STFT
Do the inverse Fourier transform Align the resulting signal where it was originally
Now element-wise add the values in the time-aligned windows
Note if you use a Hann window and the hop size is 12 the window size yoursquoll end up with a perfect reconstruction in the overlapped portions Think about why that is
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogramspectrogram(y256128256fsyaxis)
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab39
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40
Bryan Pardo 2008 Northwestern University EECS 352 Machine Perception of Music and Audio
The Spectrogram
bullA series of short term DFTs bullTypically just displays the magnitudes in X of the frequencies up to frac12 sampe rate bullThere is a spectrogram function in matlab
spectrogram(y10245121024fsyaxis)
40