Speech & Audio Processing

Speech & Audio Coding Examples


April 18, 2023 Veton Këpuska

A Simple Speech Coder

LPC Based Analysis Structure

[Block diagram. Blocks: Audio Input, Pre-emphasis, Windowing Analysis, Linear Prediction Analysis (Auto-Correlation, Levinson-Durbin), Analysis Filter, Quantization. Signals: Residual, Filter Coeffs.]


Windowing Analysis Stage

N – Length of the Analysis Window (10-30 msec)
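As a minimal sketch of what the window length means in samples, the following splits a signal into overlapping, windowed frames. The sampling rate, the 20 msec frame length, the 50% overlap, the Hamming window, and the placeholder signal are all assumptions chosen for illustration; they are not taken from the slides.

```matlab
fs = 8000;                          % assumed sampling rate in Hz
frame_ms = 20;                      % assumed window length, inside the 10-30 msec range
N = round(fs * frame_ms / 1000);    % window length in samples
hop = N / 2;                        % assumed 50% overlap between frames

t = (0:fs-1)' / fs;
x = sin(2*pi*200*t);                % placeholder 1 s signal standing in for speech

w = 0.54 - 0.46 * cos(2*pi*(0:N-1)' / (N-1));   % Hamming window of length N

num_frames = floor((length(x) - N) / hop) + 1;
frames = zeros(N, num_frames);
for m = 1:num_frames
    idx = (m-1)*hop + (1:N)';       % sample indices of frame m
    frames(:, m) = x(idx) .* w;     % windowed frame, one per column
end
```

Each column of `frames` is then what the linear prediction analysis stage would operate on.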


Some Analysis Windows


MATLAB Useful Functions

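wintool – Use “doc wintool” for more information
window – Use “doc window” for the list of supported windows
Define your own window if needed, e.g. the sine window and the Vorbis window, for 0 ≤ n ≤ N-1:

$$w_{\text{sine}}[n] = \sin\left(\frac{\pi\,(n + 0.5)}{N}\right)$$

$$w_{\text{vorbis}}[n] = \sin\left(\frac{\pi}{2}\,\sin^{2}\left(\frac{\pi\,(n + 0.5)}{N}\right)\right)$$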
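The sine and Vorbis windows can be defined directly in a few lines. This is a sketch using the standard definitions of both windows; the length N = 512 is an arbitrary choice. The last lines check the Princen-Bradley condition w[n]^2 + w[n+N/2]^2 = 1, the property that makes these windows suitable for MDCT-based coders.

```matlab
N = 512;                                   % assumed window length (even)
n = (0:N-1)';

w_sine   = sin(pi * (n + 0.5) / N);                       % sine window
w_vorbis = sin((pi/2) * sin(pi * (n + 0.5) / N).^2);      % Vorbis window

% Princen-Bradley (power complementarity) check between the two halves
pb_sine   = w_sine(1:N/2).^2   + w_sine(N/2+1:N).^2;
pb_vorbis = w_vorbis(1:N/2).^2 + w_vorbis(N/2+1:N).^2;

fprintf('max deviation from 1: sine %.2e, vorbis %.2e\n', ...
        max(abs(pb_sine - 1)), max(abs(pb_vorbis - 1)));
```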


LPC Analysis Stage

LPC Method Described in: Ch5-Analysis_&_Synthesis_of_Pole-Zero_Speech_Models.ppt
Summary:
  Perform Autocorrelation
  Solve system of equations with Durbin-Levinson Method
MATLAB help: doc lpc, etc.
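The two steps in the summary can be sketched without the toolbox `lpc` function. The code below assumes the convention A(z) = 1 - sum_k a_k z^-k (so MATLAB's `lpc` would return `[1; -a]`), and uses the impulse response of a hypothetical second-order all-pole filter as a test frame so that the expected answer is known.

```matlab
% Test frame: impulse response of 1/A(z) with known predictor coefficients
a_true = [1.2; -0.72];                          % hypothetical values
x = filter(1, [1; -a_true], [1; zeros(399,1)]);

p = 2;                                          % prediction order

% Step 1: autocorrelation r[0..p]
r = zeros(p+1, 1);
for i = 0:p
    r(i+1) = sum(x(1:end-i) .* x(1+i:end));
end

% Step 2: Levinson-Durbin recursion
a = zeros(p, 1);        % predictor coefficients a_1..a_p
k = zeros(p, 1);        % PARCOR (reflection) coefficients
E = r(1);               % prediction error energy
for i = 1:p
    acc = r(i+1);
    for j = 1:i-1
        acc = acc - a(j) * r(i-j+1);
    end
    k(i) = acc / E;
    a_prev = a;
    a(i) = k(i);
    for j = 1:i-1
        a(j) = a_prev(j) - k(i) * a_prev(i-j);
    end
    E = (1 - k(i)^2) * E;
end
disp(a');
```

For this test frame the recursion should recover `a_true` up to the truncation of the impulse response.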


Example of MATLAB Code

function myLPCCodec(wavfile, N)
% MYLPCCODEC  LPC analysis/synthesis demo.
%   wavfile - input MS wav file
%   N       - LPC filter order
[x, fs] = audioread(wavfile);   % wavread is no longer available in recent MATLAB releases
% plot(x);
% Playing the original signal
soundsc(x, fs);
% Performing LPC analysis using the MATLAB lpc function
[a, g] = lpc(x, N);   % a = [1 a(2) ... a(N+1)], g = prediction error variance
% Filtering with the estimated coefficients to produce the predicted samples
est_x = filter([0 -a(2:end)], 1, x);
% Error (residual) signal
e = x - est_x;
% Testing the quality of the predicted samples
soundsc(est_x, fs);
% Synthesis stage with zero loss of information: pass the residual through
% the all-pole filter 1/A(z). e already carries the gain, so g is not applied.
syn_x = filter(1, a, e);
soundsc(syn_x, fs);

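Synthesis filter, with input $g\,e[n]$ and output $\hat{s}[n]$:

$$H(z) = \frac{1}{A(z)}$$

$$\hat{s}[n] = \sum_{k=1}^{p} a_k\,\hat{s}[n-k] + g\,e[n]$$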


Analysis of Quantization Errors

Use MATLAB functions to research the effects of quantization errors introduced by the precision of the arithmetic operations and the representation of the filter and error signal:
  Double (float64) representation (software emulation)
  Float (float32) representation (software emulation)
  Int (int32) representation (hardware emulation)
  Short (int16) representation (hardware emulation)

Useful MATLAB functions: fix, floor, round, ceil
Example:
  sig_hat = fix(sig*2^(B-1))/2^(B-1);   % truncation of sig to B bits
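A small experiment along these lines, assuming a signal scaled to the range (-1, 1): the same B-bit quantization is done once with `fix` (truncation) and once with `round`, and the resulting signal-to-noise ratios are compared. The test tone and B = 8 are arbitrary choices for illustration.

```matlab
B = 8;                                   % assumed word length in bits
t = (0:999)' / 8000;
sig = 0.9 * sin(2*pi*440*t);             % placeholder signal in (-1, 1)

scale = 2^(B-1);
sig_trunc = fix(sig * scale) / scale;    % truncation toward zero
sig_round = round(sig * scale) / scale;  % rounding to nearest level

err_trunc = sig - sig_trunc;
err_round = sig - sig_round;

snr_trunc = 10*log10(sum(sig.^2) / sum(err_trunc.^2));
snr_round = 10*log10(sum(sig.^2) / sum(err_round.^2));
fprintf('SNR with fix: %.1f dB, with round: %.1f dB\n', snr_trunc, snr_round);
```

Truncation error is bounded by one quantization step and rounding error by half a step, which is why `round` is normally the better choice when the hardware allows it.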


Quantization of Error Signal & Filter Coefficients

Can Apply ADPCM for Error Signal
Filter Coefficients in the Direct Filter Form are found to be sensitive to quantization errors:
  A small quantization error can have a large effect on filter characteristics.
  The issue is that polynomial coefficients have a non-linear mapping to the poles of the filter (i.e., the roots of the polynomial).
Alternate representations are possible that have significantly better tolerance to quantization error.
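The sensitivity of the direct form can be observed numerically. The sketch below builds a hypothetical 6th-order denominator with poles clustered near the unit circle, truncates its coefficients to B bits, and compares the largest pole radius before and after. The pole positions and B = 8 are assumptions made only for this illustration.

```matlab
r = 0.98;                                   % assumed pole radius
theta = pi * [0.10 0.12 0.14];              % assumed pole angles (clustered)
poles = [r*exp(1i*theta), r*exp(-1i*theta)];
a = real(poly(poles));                      % direct-form coefficients of A(z)

B = 8;                                      % assumed coefficient word length
a_q = fix(a * 2^(B-1)) / 2^(B-1);           % truncated coefficients

max_coef_err = max(abs(a - a_q));
max_radius   = max(abs(roots(a)));
max_radius_q = max(abs(roots(a_q)));

fprintf('largest coefficient error: %.4f\n', max_coef_err);
fprintf('largest pole radius: original %.4f, quantized %.4f\n', ...
        max_radius, max_radius_q);
```

A quantized radius at or above 1 would mean that the synthesis filter 1/A(z) has become unstable even though each coefficient changed by less than one quantization step.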


LPC Filter Representations

As noted previously when the Levinson-Durbin algorithm was introduced, one alternate representation of the filter coefficients was also mentioned: the PARCOR coefficients.

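LPC to PARCOR:

$$a_j^{(p)} = a_j, \quad 1 \le j \le p$$

$$\text{for } i = p,\, p-1,\, \ldots,\, 1:$$

$$k_i = a_i^{(i)}$$

$$a_j^{(i-1)} = \frac{a_j^{(i)} + a_i^{(i)}\,a_{i-j}^{(i)}}{1 - k_i^{2}}, \quad 1 \le j \le i-1$$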


PARCOR Filter Representation

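PARCOR to LPC:

$$\text{for } i = 1,\, 2,\, \ldots,\, p:$$

$$a_i^{(i)} = k_i$$

$$a_j^{(i)} = a_j^{(i-1)} - k_i\,a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$$

$$a_j = a_j^{(p)}, \quad 1 \le j \le p$$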
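Both conversions can be sketched as a round trip: step down from the predictor coefficients to the PARCOR coefficients, then step up again. The code assumes the convention A(z) = 1 - sum_j a_j z^-j, and the third-order coefficient vector is a hypothetical stable example. The Signal Processing Toolbox also offers `poly2rc` and `rc2poly`, but their sign conventions should be checked against the documentation before mixing them with this sketch.

```matlab
alpha = [2.1; -1.8; 0.648];      % hypothetical stable predictor coefficients
p = length(alpha);

% LPC to PARCOR (step-down recursion)
a = alpha;
k = zeros(p, 1);
for i = p:-1:1
    k(i) = a(i);
    a_prev = zeros(i-1, 1);
    for j = 1:i-1
        a_prev(j) = (a(j) + a(i) * a(i-j)) / (1 - k(i)^2);
    end
    a = a_prev;
end

% PARCOR to LPC (step-up recursion)
a = zeros(0, 1);
for i = 1:p
    a_new = zeros(i, 1);
    a_new(i) = k(i);
    for j = 1:i-1
        a_new(j) = a(j) - k(i) * a(i-j);
    end
    a = a_new;
end

fprintf('round-trip error: %.2e\n', max(abs(a - alpha)));
```

For a stable synthesis filter every |k_i| is below 1, which is an easy condition to preserve after quantization.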


Line Spectral Frequency Representation

It turns out that PARCOR coefficients can be represented with Line Spectral Frequencies (LSF), which have significantly better quantization properties.

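Note that:

$$H(z) = \frac{1}{A(z)}$$

The PARCOR lattice structure of the LPC synthesis filter above:

[Lattice diagram from Input to Output. Stages with delay elements z^-1 and coefficients k_p, k_{p-1}, ..., with k_0 = -1 and k_{p+1} = ∓1. Internal signals A_0, ..., A_{p-1}, A_p and B_0, ..., B_{p-1}, B_p.]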


Line Spectral Frequency Representation

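From previous slide the following holds:

$$A_p(z) = A_{p-1}(z) - k_p\,B_{p-1}(z)$$

$$B_p(z) = z^{-1}\left[B_{p-1}(z) - k_p\,A_{p-1}(z)\right]$$

$$A_0(z) = 1 \quad \text{and} \quad B_0(z) = z^{-1}$$

$$B_p(z) = z^{-(p+1)}\,A_p(z^{-1})$$

From this realization of the filter the LSP representation is derived: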


LSF Representation

$$k_{p+1} = 1: \quad P(z) = A_p(z) - B_p(z)$$

$$k_{p+1} = -1: \quad Q(z) = A_p(z) + B_p(z)$$

$$A_p(z) = \frac{1}{2}\left[P(z) + Q(z)\right]$$
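A sketch of how the line spectral frequencies follow from P(z) and Q(z): build both polynomials from A_p(z) and its mirrored counterpart B_p(z), find their roots, and keep the root angles strictly between 0 and pi. It assumes A_p(z) = 1 - sum_j a_j z^-j and reuses a hypothetical stable third-order predictor; the toolbox function `poly2lsf` performs a comparable conversion with its own conventions.

```matlab
alpha = [2.1; -1.8; 0.648];      % hypothetical stable predictor coefficients
p = length(alpha);

A  = [1; -alpha; 0];             % A_p(z): coefficients of z^0 ... z^-(p+1)
Bp = flipud(A);                  % B_p(z) = z^-(p+1) * A_p(1/z)

P = A - Bp;                      % lattice closed with k_{p+1} = 1
Q = A + Bp;                      % lattice closed with k_{p+1} = -1

rP = roots(P);
rQ = roots(Q);

% LSFs are the root angles in (0, pi); roots at z = 1 and z = -1 are excluded
w = angle([rP; rQ]);
tol = 1e-6;
lsf = sort(w(w > tol & w < pi - tol));

disp(lsf');
```

All roots of P(z) and Q(z) lie on the unit circle, so each one is described by a single angle, and those angles are the values that get quantized.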


LPC Synthesis Filter with LSF

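$$H(z) = \frac{1}{A_p(z)} = \frac{1}{1 + \left(A_p(z) - 1\right)} = \frac{1}{1 + \frac{1}{2}\left[\left(P(z) - 1\right) + \left(Q(z) - 1\right)\right]}$$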


A Simple Speech Coder

LPC Based Synthesis Structure

[Block diagram. Blocks: Decoding, Synthesis Filter, De-emphasis, Audio Output. Signals: Residual Signal, Filter Coeffs.]


Audio Coding


Audio Coding

Most of the Audio Coding Standards use principles of Psychoacoustics.

Example of Basic Structure of MP3 encoder:

[Block diagram. Blocks: Audio Input, Filterbank & Transform, Psychoacoustic Model, Quantization, Bit-stream.]


Basic Structure of Audio Coders

Filterbank Processing
Psychoacoustic Model
Quantization


Filter Bank Analysis Synthesis


Filterbank Processing:

Splitting full-band signal into several sub-bands:
  Uniform sub-bands (FFT)
  Critical Band (FFT followed by non-linear transformation)
    Reflect Human Auditory Apparatus.
    Mel-Scale and Bark-Scale transformations

$$\text{Mel}(f) = 1127.01048\,\ln\left(1 + \frac{f}{700}\right)$$

$$\text{Bark}(f) = 13\arctan(0.00076\,f) + 3.5\arctan\left(\left(\frac{f}{7500}\right)^{2}\right)$$
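The two scale transformations translate directly into anonymous functions. This is a sketch of the formulas above; the function names `hz2mel_fn` and `hz2bark_fn` and the list of test frequencies are made up for illustration.

```matlab
% Frequency f in Hz; both functions accept scalars or vectors
hz2mel_fn  = @(f) 1127.01048 * log(1 + f / 700);
hz2bark_fn = @(f) 13 * atan(0.00076 * f) + 3.5 * atan((f / 7500).^2);

f = [100 500 1000 4000 8000];          % assumed test frequencies in Hz
mel  = hz2mel_fn(f);
bark = hz2bark_fn(f);

for i = 1:length(f)
    fprintf('%6d Hz -> %8.1f mel, %5.2f Bark\n', f(i), mel(i), bark(i));
end
```

Both mappings are close to linear at low frequencies and compress the high frequencies, which is the behavior the critical-band filterbank is meant to reflect.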


Mel-Scale


Bark-Scale


Analysis Structure of Filterbank

[Block diagram. Audio Input feeds N parallel branches; branch k consists of the filter hk[n], down-sampling (↓) and an MDCT. The MDCT outputs go to Quantization, which produces the Bit Stream.]

hk[n] – Impulse response of the k-th quadrature mirror filter
N – Number of channels, typically 32
↓ – Down-sampling
MDCT – Modified Discrete Cosine Transform


Synthesis Structure of Filterbank

[Block diagram. The Bit Stream is passed to Decoding, which feeds N parallel branches; branch k consists of an IMDCT, up-sampling (↑) and the filter gk[n]. The branch outputs form the Audio Output.]

gk[n] – Impulse response of the k-th inverse quadrature mirror filter
N – Number of channels, typically 32
↑ – Up-sampling
IMDCT – Inverse Modified Discrete Cosine Transform
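The MDCT/IMDCT pair used in the two structures can be sketched in isolation, leaving out the polyphase filters, quantization, and bit-stream coding. The code assumes the common MDCT definition with 2N input samples and N coefficients per block, a sine window, and 50% overlap; here N is the number of MDCT coefficients per block, not the number of filterbank channels. Overlap-adding the windowed IMDCT outputs cancels the time-domain aliasing, so the fully overlapped samples are reconstructed exactly.

```matlab
N = 8;                                    % assumed MDCT coefficients per block
n = (0:2*N-1)';                           % time index inside a block
k = 0:N-1;                                % coefficient index
C = cos(pi/N * (n + 0.5 + N/2) * (k + 0.5));   % 2N-by-N MDCT basis
w = sin(pi * (n + 0.5) / (2*N));          % sine window of length 2N

x = sin(0.3 * (1:4*N)') + 0.05 * (1:4*N)';     % placeholder input signal
y = zeros(size(x));

for b = 0:2                               % three blocks with hop size N
    idx = b*N + (1:2*N)';
    X = C' * (w .* x(idx));               % analysis: window, then MDCT
    xb = (2/N) * (C * X);                 % synthesis: IMDCT (still aliased)
    y(idx) = y(idx) + w .* xb;            % window again and overlap-add
end

% Only samples covered by two blocks are free of aliasing
err = max(abs(y(N+1:3*N) - x(N+1:3*N)));
fprintf('max reconstruction error in the overlapped region: %.2e\n', err);
```

The first and last N samples are covered by a single block only, so they remain aliased; a real coder handles this by padding the signal at both ends.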


Psycho-Acoustic Modeling


Psychoacoustic Model

Masking threshold according to human auditory perception.
The masking threshold is used to quantize the Discrete Cosine Transform coefficients.
Analysis is done in the frequency domain, represented by the DFT and computed by the FFT.


Threshold of Hearing

Absolute threshold of audibly perceptible events in quiet conditions (no other sounds).

Any signal below the threshold can be removed without effect on the perception.
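The slides do not give a formula for the threshold in quiet. One widely used approximation, assumed here for illustration, is Terhardt's formula, which gives the threshold in dB SPL as a function of frequency. The sketch evaluates it and flags which hypothetical spectral components fall below the threshold; the component frequencies and levels are invented for the example.

```matlab
% Terhardt's approximation of the threshold in quiet (dB SPL), f in Hz.
% This formula is an assumption; it is not taken from the slides.
threshold_quiet = @(f) 3.64 * (f/1000).^(-0.8) ...
                     - 6.5 * exp(-0.6 * (f/1000 - 3.3).^2) ...
                     + 1e-3 * (f/1000).^4;

f_comp  = [100 1000 3300 15000];     % hypothetical component frequencies (Hz)
level   = [ 20    2   -2    30];     % hypothetical component levels (dB SPL)

Tq = threshold_quiet(f_comp);
inaudible = level < Tq;              % components that could be removed

for i = 1:length(f_comp)
    fprintf('%6d Hz: level %5.1f dB, threshold %6.1f dB, inaudible = %d\n', ...
            f_comp(i), level(i), Tq(i), inaudible(i));
end
```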


Threshold of Hearing


Frequency Masking

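Schröder Spreading Function
Bark Scale Function:

$$z(f) = 13\arctan(0.00076\,f) + 3.5\arctan\left(\left(\frac{f}{7500}\right)^{2}\right)$$

$$\delta z = z(f_{\text{maskee}}) - z(f_{\text{masker}})$$

$$10\log_{10} F(\delta z) = 15.81 + 7.5\,(\delta z + 0.474) - 17.5\left(1 + (\delta z + 0.474)^{2}\right)^{1/2}$$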
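The spreading function can be evaluated for a given masker and maskee as a rough sketch of how far the masking extends. The 1 kHz masker with 900 Hz and 5 kHz maskees mirrors the tone examples elsewhere in these slides; the variable names are made up for illustration.

```matlab
% Bark scale z(f) and Schroeder spreading function (result in dB)
bark_fn   = @(f) 13 * atan(0.00076 * f) + 3.5 * atan((f / 7500).^2);
spread_dB = @(dz) 15.81 + 7.5 * (dz + 0.474) ...
                - 17.5 * sqrt(1 + (dz + 0.474).^2);

f_masker = 1000;                                   % masker frequency in Hz
dz_900   = bark_fn(900)  - bark_fn(f_masker);      % maskee just below the masker
dz_5000  = bark_fn(5000) - bark_fn(f_masker);      % maskee far above the masker

s_900  = spread_dB(dz_900);
s_5000 = spread_dB(dz_5000);

fprintf(' 900 Hz: dz = %6.2f Bark, spreading = %7.1f dB\n', dz_900,  s_900);
fprintf('5000 Hz: dz = %6.2f Bark, spreading = %7.1f dB\n', dz_5000, s_5000);
```

The spreading value is close to 0 dB at the masker itself and falls off with distance on the Bark scale, so a nearby 900 Hz tone is affected far more strongly by the 1 kHz masker than a 5 kHz tone.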


Masking Curve


Primary Tone 1kHz


Masked Tone 900 Hz


Combined Sound 1kHz + 0.9kHz


Combined 1kHz + 0.9kHz (-10dB)


Combined 1kHz + 5kHz (-10dB)
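Test signals like the ones named on these slides can be generated in a few lines. This sketch assumes that "(-10dB)" means the second tone is 10 dB below the 1 kHz primary tone; the sampling rate and the one-second duration are also assumptions. The playback calls are commented out so that the script runs without audio hardware.

```matlab
fs = 16000;                          % assumed sampling rate in Hz
t  = (0:fs-1)' / fs;                 % one second of samples

primary = sin(2*pi*1000*t);          % primary tone, 1 kHz
g = 10^(-10/20);                     % amplitude factor for -10 dB

tone_900  = g * sin(2*pi*900*t);     % 0.9 kHz tone, 10 dB below the primary
tone_5000 = g * sin(2*pi*5000*t);    % 5 kHz tone, 10 dB below the primary

combined_900  = primary + tone_900;  % tones close together in frequency
combined_5000 = primary + tone_5000; % tones far apart in frequency

rel_level_dB = 20*log10(sqrt(mean(tone_900.^2)) / sqrt(mean(primary.^2)));
fprintf('level of the second tone relative to the primary: %.2f dB\n', rel_level_dB);

% soundsc(combined_900, fs);
% soundsc(combined_5000, fs);
```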


END
