Lecture 1 Speech Analysis

7/31/2019 Lecture 1 Speech Analysis


Lecture 1

Overview of Sound


Lecture overview

Overview of sound processing part of module

Sound processing coursework

Speech processing applications

Analysis of speech signals

Time-domain

Frequency-domain

Time-frequency domain (spectrogram)


Sound processing component overview

Lecture 1 Analysis of sound signals

Lecture 2 Fourier transform

Lecture 3 Acoustic phonetics

Lecture 4 Articulatory phonetics

Lecture 5 Speech recognition I - overview

Lecture 6 Speech recognition II - feature extraction

Lecture 7 Speech recognition III - acoustic modelling

Lecture 8 Speech recognition IV - language modelling

Lecture 9 Speech enhancement

Lecture 10 Case study: Formula 1 motor racing



Design, implementation and testing of a speech recogniser capable of providing voice dialling for the class of CMPE3I07 in noisy conditions

Coursework to be undertaken in pairs

Two parts to assessment:

Technical report outlining design and evaluation of speech recogniser

Practical demonstration

Tasks to be carried out:

Speech collection

Speech labelling

Design and implementation of feature extraction

Training of speech recogniser

Evaluation of speech recogniser

Aim to combine use of commercial tools and toolkits and own implementation through MATLAB








Speech collection

Speech labelling





Speech Filing System (SFS)

http://www.phon.ucl.ac.uk/resource/sfs/

HMM Toolkit (HTK)

http://htk.eng.cam.ac.uk/


MATLAB


Speech processing

Sound part of module will concentrate mainly on speech processing

Will study core signal processing techniques:

Spectral analysis and spectrograms

Fourier transform

Acoustic and articulatory phonetics

Feature extraction

Acoustic modelling

Language modelling

Classification

Filtering

Examine them in the context of speech processing applications

Speech recognition

Speech enhancement

Speech synthesis

Work in speech processing requires a wide range of skills and knowledge


Where are the jobs in speech processing?

Speech (and signal) processing is found in a very broad range of applications, services and products

Many companies involved in speech processing:

Specific speech processing companies, e.g. Nuance, SRC, ...

Computer companies: Apple, IBM, Microsoft, ...

Internet companies: Google, Yahoo, Skype, ...

Mobile phone companies and providers: Nokia, Motorola, ...

Telcos: BT, AT&T, France Telecom, Deutsche Telekom, ...

Plus many other smaller companies in a range of areas

Speech recognition market worth $40 billion in 2010, growth of 8.8% per year expected

Signal processing even more wide ranging, e.g. acoustics, sonar, medical, image recognition/processing, ...


Speech analysis

Audio signals are perceived and understood (in the case of speech) by their frequency content, not by their waveform representation

The ear acts as a frequency analyser and feeds information about frequency content to the brain

In many speech processing applications the frequency content of signals is required: speech recognition, coding, synthesis, enhancement, etc.

However, frequency analysis is also important for other signals such as images and radio-frequency signals: in fact, for any information-bearing signal

To highlight this, we will now compare the time-domain and frequency-domain representations of signals to see what information can be obtained from each


Time-varying nature of speech

A speech signal changes constantly as different speech sounds are made

For speech recognition, synthesis and coding applications the frequency content of the signal needs to be measured every 10-50 ms, over pseudo-stationary regions


[Figure: speech waveform, with 500 ms and 30 ms sections marked]

So we need a frequency analysis technique that can analyse short periods of signal

This is the Discrete Fourier Transform (DFT) (discrete because the signal is sampled, not continuous)
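As a rough illustration of analysing a short period of signal, the sketch below cuts a 30 ms frame out of a 500 ms signal and applies a direct (slow) DFT to it. The 8 kHz sampling rate and the 400 Hz sine test signal are assumptions made for the example and are not taken from the slides.

```python
import cmath
import math


def dft(frame):
    """Direct O(N^2) DFT: X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N)."""
    n_samples = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


SAMPLE_RATE_HZ = 8000   # assumed sampling rate
FRAME_MS = 30           # short analysis period, as on the slide
frame_len = SAMPLE_RATE_HZ * FRAME_MS // 1000   # samples in one frame

# Hypothetical test signal: 500 ms of a 400 Hz sine wave.
signal = [math.sin(2 * math.pi * 400 * n / SAMPLE_RATE_HZ)
          for n in range(SAMPLE_RATE_HZ // 2)]

frame = signal[:frame_len]            # take only the first 30 ms
spectrum = dft(frame)

# Keep the bins below half the sampling rate and find the strongest one.
magnitudes = [abs(c) for c in spectrum[:frame_len // 2]]
peak_bin = max(range(len(magnitudes)), key=lambda k: magnitudes[k])
peak_hz = peak_bin * SAMPLE_RATE_HZ / frame_len
print(frame_len, peak_bin, peak_hz)
```

Each DFT bin k corresponds to the frequency k * SAMPLE_RATE_HZ / frame_len, so a shorter frame gives coarser frequency resolution.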


Digital speech - sampling and quantisation

Take analogue speech signal: continuous in amplitude, continuous in time

[Figure: analogue speech waveform; axes: time, amplitude]

Sampling - take samples of the waveform every Ts seconds

[Figure: sampled waveform with sampling period Ts and quantisation levels q1 to q8; axes: time, amplitude]

Quantisation - allocate sample amplitudes to nearest quantisation levels

Can represent the discrete time, discrete amplitude signal as a vector, x(n)

x = [1, 3, 7, 7, 5, 3, 3, 3, 3, 1, -3, -7, -7]

So, x(1) = 1; x(2) = 3; ... x(11) = -3, ...
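The two steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the eight level values (odd integers from -7 to 7, matching the values in the example vector), the 100 Hz test waveform and the sampling period are all chosen for the example. Note that Python lists are indexed from 0, so the slide's x(1) is x[0] here.

```python
import math

# Assumed values for the eight quantisation levels q1..q8.
LEVELS = [-7, -5, -3, -1, 1, 3, 5, 7]


def sample(waveform, ts, count):
    """Sampling: evaluate the continuous waveform every ts seconds."""
    return [waveform(n * ts) for n in range(count)]


def quantise(samples, levels):
    """Quantisation: map each sample to the nearest allowed level."""
    return [min(levels, key=lambda q: abs(q - s)) for s in samples]


def analogue(t):
    """Hypothetical 'analogue' signal: a 100 Hz sine with a small phase offset."""
    return 7.5 * math.sin(2 * math.pi * 100 * t + math.pi / 16)


TS = 1 / 1600                       # assumed sampling period: 16 samples per cycle
samples = sample(analogue, TS, 16)  # discrete time, continuous amplitude
x = quantise(samples, LEVELS)       # discrete time, discrete amplitude
print(x)
```

The difference between each entry of samples and the matching entry of x is the quantisation error, which shrinks as more levels are used.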


Time-domain analysis of speech

Examine a time-domain waveform of a sentence of speech

x-axis shows time, in seconds or samples

y-axis shows amplitude of each sample

What does it show?

Duration of utterance

Guide to energy: speech or non-speech

Maybe indication of voicing: voiced or unvoiced

Quite limited in terms of detail shown

Utterance shown: "A tanker is a ship designed to carry large volumes of oil or other liquid cargo"



Now zoom in to look at a small section of the utterance

This shows more detail: speech/non-speech and voiced/unvoiced are more clearly identified

Still limited: cannot identify the actual sound (phoneme)
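Two simple time-domain measures are often used for these distinctions: short-time energy (speech or non-speech) and zero-crossing rate (a rough cue for voiced or unvoiced). The sketch below applies them to synthetic frames. The signals, the 8 kHz sampling rate and the frame length are assumptions for illustration only, and real speech is much less clear-cut.

```python
import math
import random


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


SAMPLE_RATE_HZ = 8000   # assumed sampling rate
FRAME_LEN = 240         # 30 ms at 8 kHz

rng = random.Random(0)  # fixed seed so the example is repeatable

# Hypothetical stand-ins for three kinds of frame:
silence = [0.001 * rng.uniform(-1, 1) for _ in range(FRAME_LEN)]   # very low level noise
voiced = [math.sin(2 * math.pi * 120 * n / SAMPLE_RATE_HZ)         # periodic, low frequency
          for n in range(FRAME_LEN)]
unvoiced = [0.3 * rng.uniform(-1, 1) for _ in range(FRAME_LEN)]    # noise-like

for name, frame in [("silence", silence), ("voiced", voiced), ("unvoiced", unvoiced)]:
    print(name, short_time_energy(frame), zero_crossing_rate(frame))
```

Neither measure says which phoneme is being spoken, which is the limitation noted above.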


Frequency-domain analysis of speech

For frequency-domain analysis need to transform the time-domain signal into the frequency domain

Several methods exist to do this, e.g. Fourier transform, filterbank

For signal processing applications most common is to use the Fourier transform

Fourier transform comes in different forms:

Fourier transform

Discrete Fourier transform (DFT)

Fast Fourier transform (FFT) - fft function in MATLAB
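The FFT is a fast algorithm for computing the DFT, so both give the same values. The sketch below checks this in plain Python with a direct DFT and a recursive radix-2 FFT. It is an illustration only and is not how MATLAB's fft is implemented. The input is the first eight samples of the example vector x from earlier, since this simple FFT assumes a power-of-two length.

```python
import cmath


def dft(x):
    """Direct DFT, O(N^2) operations."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 FFT, O(N log N); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])   # DFT of even-indexed samples
    odd = fft(x[1::2])    # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


x = [1, 3, 7, 7, 5, 3, 3, 3]   # first eight samples of the example vector
slow = dft(x)
fast = fft(x)
max_diff = max(abs(a - b) for a, b in zip(slow, fast))
print(max_diff)   # tiny: only floating-point rounding separates the two
```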

[Diagram: time-domain signal -> DFT -> frequency-domain signal]

This is useful as it shows which frequencies are present in a signal and how much energy is present at each frequency

This is important for analysing, classifying and generating signals

[Figure: time-domain signal segment -> DFT -> magnitude spectrum]

Magnitude spectrum provides much more information:

Spectral envelope (phoneme sound)

Harmonics (pitch)

Energy
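Two of these quantities can be read from a magnitude spectrum with very little code. The sketch below builds a synthetic harmonic frame, finds the harmonic peaks (the lowest gives the pitch) and computes the energy from the spectrum using Parseval's relation. The 200 Hz fundamental, the harmonic amplitudes and the 8 kHz sampling rate are assumptions for the example. The spectral envelope is not estimated here.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """|X[k]| for k = 0 .. N/2, using a direct DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


SAMPLE_RATE_HZ = 8000          # assumed sampling rate
FRAME_LEN = 240                # 30 ms frame
F0_HZ = 200                    # hypothetical pitch of the test signal
AMPLITUDES = [1.0, 0.6, 0.3]   # hypothetical amplitudes of harmonics 1 to 3

frame = [sum(a * math.sin(2 * math.pi * F0_HZ * (h + 1) * n / SAMPLE_RATE_HZ)
             for h, a in enumerate(AMPLITUDES))
         for n in range(FRAME_LEN)]

mags = magnitude_spectrum(frame)

# Harmonics: local maxima that stand well above the background.
threshold = 0.1 * max(mags)
peaks = [k for k in range(1, len(mags) - 1)
         if mags[k] > threshold and mags[k] > mags[k - 1] and mags[k] > mags[k + 1]]
harmonics_hz = [k * SAMPLE_RATE_HZ / FRAME_LEN for k in peaks]
pitch_hz = harmonics_hz[0]     # lowest harmonic is the fundamental

# Energy: the same value from the samples and from the spectrum (Parseval).
energy_time = sum(s * s for s in frame)
energy_freq = (mags[0] ** 2 + 2 * sum(m * m for m in mags[1:-1]) + mags[-1] ** 2) / FRAME_LEN
print(harmonics_hz, pitch_hz, energy_time, energy_freq)
```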


Time-frequency analysis of speech



Time-domain analysis shows that speech signals change substantially over time: speech is a time-varying signal

Frequency-domain analysis enables us to see the frequency composition of signals, which is much more useful for analysis than the time-domain signal

However, frequency-domain analysis can only be performed on quasi-stationary portions of the signal, which are short by nature (10-50 ms)

Solution is time-frequency analysis, or the spectrogram


Process to create a spectrogram:

1. Extract short-duration window of signal

2. Take DFT and obtain magnitude spectrum

3. Allocate colours to the spectral amplitudes

4. Plot colours

5. Return to 1 until end of signal
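The five steps above can be sketched as follows. This is a minimal version under stated assumptions: frames do not overlap, no window function is applied, text characters stand in for colours, and the test signal (a 500 Hz tone followed by a 1000 Hz tone at an 8 kHz sampling rate) is hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """|X[k]| for k = 0 .. N/2, using a direct DFT (step 2)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(signal, frame_len, hop):
    """One magnitude spectrum per frame, in time order."""
    columns = []
    start = 0
    while start + frame_len <= len(signal):        # step 5: repeat until end of signal
        frame = signal[start:start + frame_len]    # step 1: short-duration window
        columns.append(magnitude_spectrum(frame))  # step 2: DFT magnitude
        start += hop
    return columns


SHADES = " .:*#"   # stand-in for a colour map, weakest to strongest


def to_shades(columns):
    """Step 3: allocate a shade to each spectral amplitude."""
    peak = max(max(col) for col in columns) or 1.0
    top = len(SHADES) - 1
    return [[SHADES[min(int(len(SHADES) * m / peak), top)] for m in col]
            for col in columns]


SAMPLE_RATE_HZ = 8000   # assumed sampling rate
FRAME_LEN = 64          # 8 ms frames, kept short so the slow DFT stays quick

# Hypothetical signal: 256 samples at 500 Hz, then 256 samples at 1000 Hz.
signal = [math.sin(2 * math.pi * (500 if n < 256 else 1000) * n / SAMPLE_RATE_HZ)
          for n in range(512)]

columns = spectrogram(signal, FRAME_LEN, hop=FRAME_LEN)
image = to_shades(columns)
peak_bins = [max(range(len(col)), key=lambda k: col[k]) for col in columns]

# Step 4: plot, with time left to right and frequency increasing upwards.
for k in reversed(range(FRAME_LEN // 2 + 1)):
    print("".join(column[k] for column in image))
```

The change of frequency halfway through the signal appears as a step in the plotted row, which a single DFT over the whole signal would not localise in time.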

[Figure: spectrogram built up frame by frame via the DFT; axes: time, frequency; speech shown: "large volumes of oil or other liquid cargo"]

Time-frequency analysis of other signals



What characteristics does this signal have?

What could have produced this sound?


Compare to equivalent time-domain signal

Very hard to identify any features: only duration and a gradual increase in energy




Also information about the recording


Summary



Considered methods for analysing the characteristics and features of an audio signal

Observed that real signals (e.g. speech) are not stationary but can vary rapidly over time

Time-domain can show this variation

Frequency-domain usually provides more information but requires a quasi-stationary signal for analysis

One solution is time-frequency representation (spectrogram)

Will return to spectrograms when we study acoustic phonetics
