7/31/2019 Lecture 1 Speech Analysis
Lecture 1
Overview of Sound
Lecture overview
Overview of sound processing part of module
Sound processing coursework
Speech processing applications
Analysis of speech signals
Time-domain
Frequency-domain
Time-frequency domain (spectrogram)
Sound processing component overview
Lecture 1: Analysis of sound signals
Lecture 2: Fourier transform
Lecture 3: Acoustic phonetics
Lecture 4: Articulatory phonetics
Lecture 5: Speech recognition I - overview
Lecture 6: Speech recognition II - feature extraction
Lecture 7: Speech recognition III - acoustic modelling
Lecture 8: Speech recognition IV - language modelling
Lecture 9: Speech enhancement
Lecture 10: Case study: Formula 1 motor racing
Sound processing coursework
Design, implementation and testing of a speech recogniser capable of providing voice dialling for the class of CMPE3I07 in noisy conditions
Coursework to be undertaken in pairs
Two parts to assessment:
Technical report outlining design and evaluation of speech recogniser
Practical demonstration
Tasks to be carried out:
Speech collection
Speech labelling
Design and implementation of feature extraction
Training of speech recogniser
Evaluation of speech recogniser
Aim to combine use of commercial tools and toolkits and own implementation through MATLAB
Speech Filing System (SFS)
http://www.phon.ucl.ac.uk/resource/sfs/
HMM Toolkit (HTK)
http://htk.eng.cam.ac.uk/
MATLAB
Speech processing
Sound part of module will concentrate mainly on speech processing
Will study core signal processing techniques:
Spectral analysis and spectrograms
Fourier transform
Acoustic and articulatory phonetics
Feature extraction
Acoustic modelling
Language modelling
Classification
Filtering
Examine them in the context of speech processing applications
Speech recognition
Speech enhancement
Speech synthesis
Work in speech processing requires a wide range of skills and knowledge
Where are the jobs in speech processing?
Speech (and signal) processing is found in a very broad range of applications, services and products
Many companies involved in speech processing
Specific speech processing companies, e.g. Nuance, SRC, ...
Computer companies: Apple, IBM, Microsoft, ...
Internet companies: Google, Yahoo, Skype, ...
Mobile phone companies and providers: Nokia, Motorola, ...
Telcos: BT, AT&T, France Telecom, Deutsche Telekom, ...
Plus many other smaller companies in a range of areas
Speech recognition market worth $40 billion in 2010, growth of 8.8% per year expected
Signal processing even more wide ranging, e.g. acoustics, sonar, medical, image recognition/processing, ...
Speech analysis
Audio signals are perceived and understood (in the case of speech) by their frequency content, not by their waveform representation
The ear acts as a frequency analyser and feeds information about frequency content to the brain
In many speech processing applications the frequency content of signals is required: speech recognition, coding, synthesis, enhancement, etc.
However, frequency analysis is also important for other signals such as images and radio-frequency signals: in fact, for any information-bearing signal
To highlight this will now compare the time-domain and frequency-domain representations of signals to see what information can be
Time-varying nature of speech
A speech signal changes constantly as different speech sounds are made
For speech recognition, synthesis and coding applications the frequency content of the signal needs to be measured every 10-50 ms (pseudo-stationary regions)
So we need a frequency analysis technique that can analyse short periods of signal
This is the Discrete Fourier Transform (DFT) (discrete because the signal is sampled, not continuous)
[Figure: speech waveform with 500 ms and 30 ms spans marked]
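Measuring frequency content every 10-50 ms means cutting the sampled signal into short frames first. The following is a minimal sketch in Python (the course itself uses MATLAB). The 8 kHz sample rate, the 30 ms frame length and the 10 ms hop are assumed example values, and `split_into_frames` is a hypothetical helper name.

```python
def split_into_frames(signal, sample_rate, frame_ms=30.0, hop_ms=10.0):
    """Split a sampled signal into short, overlapping frames.

    frame_ms and hop_ms are assumed example values inside the 10-50 ms
    range quoted for pseudo-stationary regions of speech.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop_len
    return frames


# Assumed example: 500 ms of (silent) signal sampled at 8 kHz.
sample_rate = 8000
signal = [0.0] * 4000
frames = split_into_frames(signal, sample_rate)
print(len(frames), len(frames[0]))
```

Each frame is short enough to be treated as approximately stationary, so a DFT can be taken of each one separately.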
Digital speech: sampling and quantisation
Take analogue speech signal: continuous in amplitude, continuous in time
Sampling: take samples of the waveform every Ts seconds
Quantisation: allocate sample amplitudes to nearest quantisation levels
Can represent the discrete time, discrete amplitude signal as a vector, x(n)
x = [1, 3, 7, 7, 5, 3, 3, 3, 3, 1, -3, -7, -7]
So, x(1) = 1; x(2) = 3; ... x(11) = -3, ...
[Figure: waveform with axes time and amplitude, sampling period Ts and quantisation levels q1 to q8 marked]
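The two steps, sampling and quantisation, can be sketched in a few lines of Python. This is an illustration under assumptions, not the slide's actual data: the 100 Hz sine, the 1 ms sampling period and the eight odd-valued levels (picked to resemble the values in x) are all made up for the example, and the function names are hypothetical.

```python
import math


def sample(analogue, duration_s, ts):
    """Sampling: evaluate a continuous-time function every ts seconds."""
    n_samples = int(round(duration_s / ts))
    return [analogue(n * ts) for n in range(n_samples)]


def quantise(samples, levels):
    """Quantisation: map each sample to the nearest quantisation level."""
    return [min(levels, key=lambda q: abs(q - s)) for s in samples]


def analogue_signal(t):
    """Assumed stand-in for the analogue signal: a 100 Hz sine, amplitude 7."""
    return 7.0 * math.sin(2.0 * math.pi * 100.0 * t)


# Assumed values: eight levels (q1..q8) and a sampling period Ts of 1 ms.
levels = [-7, -5, -3, -1, 1, 3, 5, 7]
ts = 0.001

x = quantise(sample(analogue_signal, 0.01, ts), levels)
print(x)
```

Note that Python lists are indexed from 0, so the slide's x(1) corresponds to `x[0]` here.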
Time-domain analysis of speech
Examine a time-domain waveform of a sentence of speech
x-axis shows time (seconds or samples)
y-axis shows amplitude of each sample
What does it show?
Duration of utterance
Guide to energy (speech or non-speech)
Maybe indication of voicing (voiced or unvoiced)
Quite limited in terms of detail shown
[Figure: time-domain waveform of the utterance "A tanker is a ship designed to carry large volumes of oil or other liquid cargo"]
Now zoom in to look at a small section of the utterance
This shows more detail
Speech/non-speech and voiced/unvoiced more clearly identified
Still limited: cannot identify actual sound (phoneme)
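The time-domain cues mentioned above (energy for speech/non-speech, voicing) can be measured frame by frame. The sketch below uses short-time energy plus the zero-crossing rate; the zero-crossing rate is an assumed, commonly used voicing cue and is not named in the slides. The three test frames are synthetic stand-ins, not real speech.

```python
import math
import random


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


# Assumed example frames: 30 ms at 8 kHz (240 samples each).
sample_rate = 8000
n = 240
# Periodic, high-energy frame standing in for a voiced sound.
voiced_like = [math.sin(2 * math.pi * 120 * i / sample_rate) for i in range(n)]
# Low-level noise standing in for an unvoiced sound.
rng = random.Random(0)
unvoiced_like = [0.1 * rng.uniform(-1.0, 1.0) for _ in range(n)]
# All-zero frame standing in for non-speech.
silence = [0.0] * n

for name, frame in [("voiced", voiced_like),
                    ("unvoiced", unvoiced_like),
                    ("silence", silence)]:
    print(name, short_time_energy(frame), zero_crossing_rate(frame))
```

In this toy case the voiced-like frame has high energy and few zero crossings, the noise frame has low energy and many crossings, and silence has zero energy. None of these measures identifies the phoneme, which is the limitation noted above.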
Frequency-domain analysis of speech
For frequency-domain analysis need to transform the time-domain signal into the frequency-domain
Several methods exist to do this, e.g. Fourier transform, filterbank
For signal processing applications most common is to use the Fourier transform
Fourier transform comes in different forms:
Fourier transform
Discrete Fourier transform (DFT)
Fast Fourier transform (FFT): fft function in MATLAB
[Figure: time-domain signal -> DFT -> frequency-domain signal]
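To make the transform concrete, here is a minimal sketch of the DFT written directly from its definition, X(k) = sum over n of x(n) exp(-j 2 pi k n / N), in Python. It is deliberately the slow, direct form; an FFT such as MATLAB's fft computes the same values far more efficiently. The 1000 Hz test tone, the 8 kHz sample rate and N = 64 are assumed example values.

```python
import cmath
import math


def dft(x):
    """Direct DFT from the definition (O(N^2); an FFT gives the same result)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def magnitude_spectrum(x):
    """Magnitude of each DFT coefficient."""
    return [abs(c) for c in dft(x)]


# Assumed example: a 1000 Hz cosine sampled at 8 kHz, 64 samples.
sample_rate = 8000
n_points = 64
x = [math.cos(2 * math.pi * 1000 * n / sample_rate) for n in range(n_points)]

mags = magnitude_spectrum(x)
# Search only bins 0..N/2: for a real signal the upper half mirrors the lower.
peak_bin = max(range(n_points // 2 + 1), key=lambda k: mags[k])
peak_hz = peak_bin * sample_rate / n_points
print(peak_bin, peak_hz)
```

Bin k corresponds to a frequency of k * sample_rate / N, so the 1000 Hz tone shows up in bin 8 here.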
This is useful as it shows which frequencies are present in a signal and how much energy is present at each frequency
This is important for analysing and classifying signals and generating signals
[Figure: time-domain signals and the frequency-domain signals obtained from them with the DFT]
[Figure: DFT of a speech segment, giving the magnitude spectrum]
Magnitude spectrum provides much more information
Spectral envelope (phoneme sound)
Harmonics (pitch)
Energy
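As a rough illustration of how harmonics in the magnitude spectrum relate to pitch, the Python sketch below builds a synthetic voiced-like signal from three harmonics of an assumed 125 Hz fundamental, then reads the harmonic frequencies off the spectral peaks. Taking the lowest peak as the pitch is a simplification that works for this clean toy signal; it is not a robust pitch estimator for real speech. All signal parameters and the 10% peak threshold are assumptions.

```python
import cmath
import math


def magnitude_spectrum(x):
    """Magnitudes of DFT bins 0..N/2, computed directly from the definition."""
    n_points = len(x)
    return [
        abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
                for n in range(n_points)))
        for k in range(n_points // 2 + 1)
    ]


# Assumed toy signal: harmonics of 125 Hz with decaying amplitudes.
sample_rate = 8000
n_points = 256
f0 = 125.0
amplitudes = [1.0, 0.6, 0.3]
x = [
    sum(a * math.sin(2 * math.pi * (h + 1) * f0 * n / sample_rate)
        for h, a in enumerate(amplitudes))
    for n in range(n_points)
]

mags = magnitude_spectrum(x)
# A peak is a bin above an assumed threshold that exceeds both neighbours.
threshold = 0.1 * max(mags)
peaks = [
    k for k in range(1, len(mags) - 1)
    if mags[k] > threshold and mags[k] > mags[k - 1] and mags[k] > mags[k + 1]
]
harmonics_hz = [k * sample_rate / n_points for k in peaks]
pitch_hz = harmonics_hz[0]
print(harmonics_hz, pitch_hz)
```

The relative heights of the peaks trace out the spectral envelope, which is the part of the spectrum that distinguishes one speech sound from another.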
Time-frequency analysis of speech
Time-domain analysis shows that speech signals change substantially over time: speech is a time-varying signal
Frequency-domain analysis enables us to see the frequency composition of signals, which is much more useful for analysis than the time-domain signal
However, frequency-domain analysis can only be performed on quasi-stationary portions of the signal, which are short by nature (10-50 ms)
Solution is time-frequency analysis, or spectrogram
Process to create a spectrogram:
1. Extract short-duration window of signal
2. Take DFT and obtain magnitude spectrum
3. Map spectral amplitudes to colours
4. Plot colours
5. Return to 1 until end of signal
[Figure: successive windows along the time axis, each passed through the DFT to give one column of the spectrogram (axes: time, frequency)]
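The five-step process can be sketched in Python as follows. This is a minimal sketch under several assumptions: a plain rectangular window, a direct DFT rather than an FFT, text characters standing in for colours, and a synthetic test signal that jumps from 500 Hz to 1500 Hz halfway through. The frame length (128 samples, 16 ms at 8 kHz) and hop (64 samples) are example values.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Step 2: DFT of one frame, magnitudes of bins 0..N/2."""
    n_points = len(frame)
    return [
        abs(sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
                for n in range(n_points)))
        for k in range(n_points // 2 + 1)
    ]


def spectrogram(signal, frame_len, hop_len):
    """Steps 1, 2 and 5: one magnitude spectrum per short window."""
    columns = []
    start = 0
    while start + frame_len <= len(signal):
        columns.append(magnitude_spectrum(signal[start:start + frame_len]))
        start += hop_len
    return columns


def to_shades(columns, shades=" .:*#"):
    """Steps 3 and 4: map amplitudes to characters, highest frequency on top."""
    peak = max(max(col) for col in columns) or 1.0
    rows = []
    for k in reversed(range(len(columns[0]))):
        rows.append("".join(
            shades[min(len(shades) - 1, int(len(shades) * col[k] / peak))]
            for col in columns
        ))
    return rows


# Assumed test signal: 500 Hz for 512 samples, then 1500 Hz for 512 samples.
sample_rate = 8000
signal = [
    math.sin(2 * math.pi * (500 if n < 512 else 1500) * n / sample_rate)
    for n in range(1024)
]

columns = spectrogram(signal, frame_len=128, hop_len=64)
first_peak = max(range(len(columns[0])), key=lambda k: columns[0][k])
last_peak = max(range(len(columns[-1])), key=lambda k: columns[-1][k])
print("\n".join(to_shades(columns)))
```

Each column is one point in time and each row one frequency bin, so the jump in frequency appears as the dark band moving to a higher row partway along the time axis.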
[Figure: spectrogram of the words "large volumes of oil or other liquid cargo" (x-axis: time)]
Time-frequency analysis of other signals
What characteristics does this signal have?
What could have produced this sound?
Compare to equivalent time-domain signal
Very hard to identify any features: only duration and gradual increase in energy
What characteristics does this signal have?
What could have produced this sound?
Also information about the recording
Summary
Considered methods for analysing the characteristics and features of an audio signal
Observed that real signals (e.g. speech) are not stationary but can vary rapidly over time
Time-domain can show this variation
Frequency-domain usually provides more information but requires a quasi-stationary signal for analysis
One solution is time-frequency representation (spectrogram)
Will return to spectrograms when we study acoustic phonetics