Lecture 1 Speech Analysis

  • 7/31/2019 Lecture 1 Speech Analysis

    Lecture 1

    Overview of Sound

    Lecture overview

    Overview of sound processing part of module

    Sound processing coursework

    Speech processing applications

    Analysis of speech signals

        Time-domain

        Frequency-domain

        Time-frequency domain (spectrogram)

    Sound processing component overview

    Lecture 1 - Analysis of sound signals

    Lecture 2 - Fourier transform

    Lecture 3 - Acoustic phonetics

    Lecture 4 - Articulatory phonetics

    Lecture 5 - Speech recognition I: overview

    Lecture 6 - Speech recognition II: feature extraction

    Lecture 7 - Speech recognition III: acoustic modelling

    Lecture 8 - Speech recognition IV: language modelling

    Lecture 9 - Speech enhancement

    Lecture 10 - Case study: Formula 1 motor racing

    Sound processing coursework

    Design, implementation and testing of a speech recogniser capable of providing voice dialling for the class of CMPE3I07 in noisy conditions

    Coursework to be undertaken in pairs

    Two parts to assessment:

        Technical report outlining design and evaluation of speech recogniser

        Practical demonstration

    Tasks to be carried out:

        Speech collection

        Speech labelling

        Design and implementation of feature extraction

        Training of speech recogniser

        Evaluation of speech recogniser

    Aim to combine use of commercial tools and toolkits and own implementation through MATLAB:

        Speech Filing System (SFS): http://www.phon.ucl.ac.uk/resource/sfs/

        HMM Toolkit (HTK): http://htk.eng.cam.ac.uk/

        MATLAB

    Speech processing

    Sound part of module will concentrate mainly on speech processing

    Will study core signal processing techniques:

        Spectral analysis and spectrograms

        Fourier transform

        Acoustic and articulatory phonetics

        Feature extraction

        Acoustic modelling

        Language modelling

        Classification

        Filtering

    Examine them in the context of speech processing applications:

        Speech recognition

        Speech enhancement

        Speech synthesis

    Work in speech processing requires a wide range of skills and knowledge

    Where are the jobs in speech processing?

    Speech (and signal) processing is found in a very broad range of applications, services and products

    Many companies involved in speech processing:

        Specific speech processing companies, e.g. Nuance, SRC, ...

        Computer companies: Apple, IBM, Microsoft, ...

        Internet companies: Google, Yahoo, Skype, ...

        Mobile phone companies and providers: Nokia, Motorola, ...

        Telcos: BT, AT&T, France Telecom, Deutsche Telekom, ...

        Plus many other smaller companies in a range of areas

    Speech recognition market worth $40 billion in 2010, growth of 8.8% per year expected

    Signal processing even more wide ranging, e.g. acoustics, sonar, medical, image recognition/processing, ...

    Speech analysis

    Audio signals are perceived and understood (in the case of speech) by their frequency content, not by their waveform representation

    The ear acts as a frequency analyser and feeds information about frequency content to the brain

    In many speech processing applications the frequency content of signals is required: speech recognition, coding, synthesis, enhancement, etc.

    However, frequency analysis is also important for other signals such as images and radio-frequency signals: in fact, for any information-bearing signal

    To highlight this, will now compare the time-domain and frequency-domain representations of signals to see what information can be obtained from each

    Time-varying nature of speech

    A speech signal changes constantly as different speech sounds are made

    For speech recognition, synthesis and coding applications the frequency content of the signal needs to be measured every 10-50 ms (pseudo-stationary regions)

    [Figure: speech waveform with 500 ms and 30 ms spans labelled]

    So we need a frequency analysis technique that can analyse short periods of signal

    This is the Discrete Fourier Transform (DFT) (discrete because the signal is sampled, not continuous)
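
    A minimal sketch of cutting a sampled signal into short pseudo-stationary frames ready for frequency analysis. It is written in Python rather than the MATLAB used for the coursework. The 8000 Hz sampling rate, 30 ms frame length, 10 ms step and the name frame_signal are illustrative assumptions, not values taken from the lecture.

```python
def frame_signal(x, fs, frame_ms=30.0, step_ms=10.0):
    """Split the sample list x into short, overlapping frames.

    fs is the sampling rate in Hz. frame_ms and step_ms are assumed
    example values inside the 10-50 ms range quoted in the lecture.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))  # samples per frame
    step = int(round(fs * step_ms / 1000.0))        # samples between frame starts
    frames = []
    start = 0
    # Keep complete frames only; a trailing partial frame is dropped.
    while start + frame_len <= len(x):
        frames.append(x[start:start + frame_len])
        start += step
    return frames


# 500 ms of a dummy (all-zero) signal at an assumed 8000 Hz sampling rate.
fs = 8000
x = [0.0] * (fs // 2)
frames = frame_signal(x, fs)
```

    With these assumed values each frame holds 240 samples and a new frame starts every 80 samples, so neighbouring frames overlap.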

    Digital speech - sampling and quantisation

    Take analogue speech signal - continuous in amplitude, continuous in time

    Sampling - take samples of the waveform every Ts seconds

    Quantisation - allocate sample amplitudes to nearest quantisation levels

    [Figure: sampled waveform, amplitude against time, showing sampling period Ts and quantisation levels q1 to q8]

    Can represent the discrete time, discrete amplitude signal as a vector, x(n)

    x = [1, 3, 7, 7, 5, 3, 3, 3, 3, 1, -3, -7, -7]

    So, x(1) = 1; x(2) = 3; ... x(11) = -3, ...
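
    The two steps can be sketched as follows, in Python rather than MATLAB. The 100 Hz test tone, the 8000 Hz sampling rate and the eight level values (chosen to match the odd integers in the example vector) are assumptions made for illustration. The names sample and quantise are hypothetical.

```python
import math


def sample(signal, duration_s, ts):
    """Evaluate a continuous-time function every ts seconds."""
    n_samples = int(round(duration_s / ts))
    return [signal(n * ts) for n in range(n_samples)]


def quantise(samples, levels):
    """Replace each sample by the nearest quantisation level."""
    return [min(levels, key=lambda q: abs(q - s)) for s in samples]


# Assumed values for the eight levels q1..q8.
LEVELS = [-7, -5, -3, -1, 1, 3, 5, 7]


def analogue(t):
    """Hypothetical stand-in for the analogue input: a 100 Hz tone."""
    return 7.0 * math.sin(2 * math.pi * 100 * t)


ts = 1 / 8000                      # sampling period Ts for an assumed 8 kHz rate
x = quantise(sample(analogue, 0.01, ts), LEVELS)
```

    Python lists are indexed from 0, so x[0] here plays the role of x(1) in the notation above.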

    Time-domain analysis of speech

    Examine a time-domain waveform of a sentence of speech

    x-axis shows time - seconds or samples

    y-axis shows amplitude of each sample

    [Figure: time-domain waveform of the utterance "A tanker is a ship designed to carry large volumes of oil or other liquid cargo"]

    What does it show?

    Duration of utterance

    Guide to energy - speech or non-speech

    Maybe indication of voicing - voiced or unvoiced

    Quite limited in terms of detail shown
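
    A rough sketch, in Python, of the measurements a time-domain waveform supports: duration, and frame energy as a guide to speech or non-speech. The zero-crossing count is a commonly used rough cue for voicing that is not named in the lecture and is included as an assumption. The synthetic signal, the 80-sample frame and the energy threshold are also illustrative choices.

```python
def duration_seconds(x, fs):
    """Duration of the utterance from its sample count and sampling rate."""
    return len(x) / fs


def short_time_energy(x, frame_len):
    """Sum of squared amplitudes in consecutive non-overlapping frames."""
    return [sum(s * s for s in x[i:i + frame_len])
            for i in range(0, len(x) - frame_len + 1, frame_len)]


def zero_crossings(frame):
    """Number of sign changes between neighbouring samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


fs = 8000  # assumed sampling rate in Hz
# Synthetic stand-in for an utterance: silence, a burst of signal, silence.
x = [0.0] * 80 + [5.0, -5.0] * 40 + [0.0] * 80
energy = short_time_energy(x, 80)
is_speech = [e > 1.0 for e in energy]  # assumed, hand-picked threshold
```

    On real speech the threshold would need tuning, and neither measure identifies the actual sound being spoken, which matches the limitation noted above.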

    Now zoom in to look at a small section of the utterance

    This shows more detail

    Speech/nonspeech, voiced/unvoiced more clearly identified

    Still limited - cannot identify actual sound (phoneme)

    Frequency-domain analysis of speech

    For frequency-domain analysis need to transform the time-domain signal into the frequency domain

    Several methods exist to do this, e.g. Fourier transform, filterbank

    For signal processing applications most common is to use the Fourier transform

    Fourier transform comes in different forms:

        Fourier transform

        Discrete Fourier transform (DFT)

        Fast Fourier transform (FFT) - fft function in MATLAB

    [Figure: time-domain signal passed through the DFT to give the frequency-domain signal]

    This is useful as it shows which frequencies are present in a signal and how much energy is present at each frequency

    This is important for analysing and classifying signals and generating signals

    [Figure: time-domain signals and the frequency-domain signals obtained from them via the DFT]

    [Figure: short segment of speech passed through the DFT to give its magnitude spectrum]

    Magnitude spectrum provides much more information:

        Spectral envelope (phoneme sound)

        Harmonics (pitch)

        Energy
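
    A minimal sketch of obtaining a magnitude spectrum with a direct DFT, in Python rather than MATLAB. An FFT routine such as MATLAB's fft computes the same values far more quickly. The 1000 Hz test tone, 8000 Hz sampling rate and 64-point length are assumptions chosen so that the tone falls exactly on one frequency bin.

```python
import cmath
import math


def dft(x):
    """Direct DFT: X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N)."""
    n_points = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                for n in range(n_points))
            for k in range(n_points)]


def magnitude_spectrum(x):
    """Magnitudes of bins 0..N/2; for a real signal the rest mirror these."""
    return [abs(c) for c in dft(x)[:len(x) // 2 + 1]]


fs = 8000        # assumed sampling rate in Hz
n_points = 64    # assumed analysis length in samples (8 ms at 8 kHz)
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(n_points)]

mags = magnitude_spectrum(x)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * fs / n_points   # bin k corresponds to k * fs / N Hz
```

    Here all of the energy lands in the bin for 1000 Hz. A real speech frame would instead show harmonics under a spectral envelope.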

    Time-frequency analysis of speech

    Time-domain analysis shows that speech signals change substantially over time: speech is a time-varying signal

    Frequency-domain analysis enables us to see the frequency composition of signals, which is much more useful for analysis than the time-domain signal

    However, frequency-domain analysis can only be performed on quasi-stationary portions of the signal, which are short by nature (10-50 ms)

    Solution is time-frequency analysis, or spectrogram

    Process to create a spectrogram:

    1. Extract short-duration window of signal

    2. Take DFT and obtain magnitude spectrum

    3. Allocate different colours to spectral amplitudes

    4. Plot colours

    5. Return to 1 until end of signal

    [Figure: spectrogram construction via the DFT, with frequency plotted against time]
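
    The five steps can be sketched in Python as below. This is an illustration under stated assumptions rather than the coursework implementation: text characters stand in for colours, the window is rectangular with no overlap, and the test signal is a hypothetical tone that jumps from 500 Hz to 2000 Hz at an assumed 8000 Hz sampling rate.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Step 2: DFT magnitudes for bins 0..N/2 of one frame."""
    n_points = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                    for n in range(n_points)))
            for k in range(n_points // 2 + 1)]


def spectrogram(x, frame_len, step):
    """Steps 1, 2 and 5: move a window along x, one spectrum per position."""
    return [magnitude_spectrum(x[start:start + frame_len])
            for start in range(0, len(x) - frame_len + 1, step)]


def to_shades(columns, shades=" .:*#"):
    """Steps 3 and 4: map magnitudes to characters; highest frequency on top."""
    peak = max(max(col) for col in columns) or 1.0
    top = len(shades) - 1
    rows = []
    for k in reversed(range(len(columns[0]))):
        rows.append("".join(
            shades[min(top, int(len(shades) * col[k] / peak))]
            for col in columns))
    return rows


fs = 8000  # assumed sampling rate in Hz
x = ([math.sin(2 * math.pi * 500 * n / fs) for n in range(256)]
     + [math.sin(2 * math.pi * 2000 * n / fs) for n in range(256)])
spec = spectrogram(x, frame_len=64, step=64)
picture = to_shades(spec)   # print("\n".join(picture)) to view it
```

    Practical spectrograms usually apply a tapered window such as a Hamming window and overlap successive windows. Both are left out here to keep the sketch short.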

    [Figure: spectrogram of the utterance portion "large volumes of oil or other liquid cargo", plotted against time]

    Time-frequency analysis of other signals

    What characteristics does this signal have?

    What could have produced this sound?

    Compare to equivalent time-domain signal

    Very hard to identify any features - only duration and gradual increase in energy

    What characteristics does this signal have?

    What could have produced this sound?

    Also information about the recording

    Summary

    Considered methods for analysing the characteristics and features of an audio signal

    Observed that real signals (e.g. speech) are not stationary but can vary rapidly over time

    Time-domain can show this variation

    Frequency-domain usually provides more information but requires a quasi-stationary signal for analysis

    One solution is time-frequency representation (spectrogram)

    Will return to spectrograms when we study acoustic phonetics
