
6-Speech Quality Assessment

Quality Levels

Subjective Tests

Objective Tests

Intelligibility

Naturalness

Quality Levels

Synthetic Quality (below 4.8 kbps)

Communication Quality (4.8 to 13 kbps)

Toll Quality (13 to 64 kbps)

Broadcast Quality (above 64 kbps)

Test Types

            | Intelligibility             | Naturalness
Subjective  | DRT, MRT                    | MOS, DAM
Objective   | None (future ASR systems)   | AI, Global SNR, Seg. SNR, FW-Seg. SNR, Itakura Measure, WSSM

First Class: Subjective Intelligibility Tests

Diagnostic Rhyme Test (DRT)
- The listener selects between two CVC (consonant-vowel-consonant) words that differ only in the first consonant.
- The first consonants should have specific properties.
- Ex.: hop - fop and than - dan

Modified Rhyme Test (MRT)
- The listener selects among several CVC words that differ in the first consonant.
- Ex.: cat, bat, rat, mat, fat, sat

First Class (Cont'd): Subjective Intelligibility Tests

DRT is widely applicable and credible.

In this test the listener can hear the speech only once.

$$ \mathrm{DRT} = \frac{N_{\mathrm{Correct}} - N_{\mathrm{Incorrect}}}{N_{\mathrm{Tests}}} \times 100\% $$
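The DRT score is the number of correct answers minus the number of incorrect answers, divided by the number of tests, as a percentage. A minimal sketch of that arithmetic follows. The function name `drt_score` is illustrative, and it assumes every trial is answered, so that the number of tests equals correct plus incorrect.

```python
def drt_score(n_correct, n_incorrect):
    """Return the DRT score in percent: 100 * (correct - incorrect) / tests.

    Assumption: every trial is answered, so tests = correct + incorrect.
    """
    n_tests = n_correct + n_incorrect
    if n_tests == 0:
        raise ValueError("at least one trial is required")
    return 100.0 * (n_correct - n_incorrect) / n_tests


print(drt_score(90, 10))  # 80.0
```

Subtracting the incorrect answers corrects for guessing: with two alternatives, pure guessing gives about as many correct as incorrect answers and therefore a score near 0%.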

Second Class: Subjective Naturalness Tests

Mean Opinion Score (MOS)
- MOS is widely applicable and credible.
- In this test the listener can hear the speech many times.

Diagnostic Acceptability Measure (DAM)
- This test is very complex.

Mean Opinion Score (MOS)

MOS scores are defined as follows:

Score | Speech Quality
1     | Not Acceptable
2     | Weak
3     | Medium
4     | Good
5     | Excellent
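As the name suggests, the MOS of a system is the mean of the individual listener ratings on the 1 to 5 scale. A small sketch is given below. The function name `mean_opinion_score` and the example ratings are made up for illustration.

```python
# Score labels taken from the MOS scale above.
MOS_LABELS = {1: "Not Acceptable", 2: "Weak", 3: "Medium", 4: "Good", 5: "Excellent"}


def mean_opinion_score(ratings):
    """Average a collection of listener ratings, each an integer from 1 to 5."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if r not in MOS_LABELS:
            raise ValueError(f"rating {r!r} is outside the 1-5 scale")
    return sum(ratings) / len(ratings)


score = mean_opinion_score([4, 5, 3, 4])  # hypothetical ratings from four listeners
print(score)  # 4.0
```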

Diagnostic Acceptability Measure (DAM)

This test is very complex.

It scores 19 different parameters, which fall into 3 main groups:
- Signal Quality
- Background Quality
- Total Quality

Objective Tests

These tests cannot be used for intelligibility, because an automatic system cannot judge speech intelligibility.

Objective tests can only be used for speech naturalness.

Objective Tests (Cont'd)

Articulation Index (AI)

Signal to Noise Ratio (SNR)
- Global (Classic) SNR
- Segmental SNR
- Frequency Weighted Segmental SNR

Articulation Index (AI)

AI assumes that the distortions in different frequency bands are independent, and measures signal quality in each band separately.

In each band it determines the percentage of the signal that is perceptible to the listener.

[Figure: 20 frequency bands spanning 200 Hz to 6100 Hz]

Articulation Index (Cont'd)

A signal is perceptible to the listener when it is:
- 1. above the human hearing threshold
- 2. below the human pain threshold
- 3. above the masking noise level

In each case, one of conditions 1 or 3 prevails.

Articulation Index (Cont'd)

In AI, the SNR is measured separately in each band.

$$ \mathrm{AI} = \frac{1}{20} \sum_{j=1}^{20} \frac{\min(\mathrm{SNR}_j,\, 30)}{30} $$
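The AI averages the per-band SNR over the 20 bands, with each band's SNR capped at 30 dB and divided by 30. A minimal sketch of that computation follows. It assumes the per-band SNR values in dB are already available, and the function name `articulation_index` is illustrative. Negative SNR values are not clipped to zero here, because the formula on the slide applies only the upper cap.

```python
def articulation_index(band_snrs_db, cap_db=30.0):
    """Average of min(SNR_j, cap) / cap over all bands (20 bands on the slide)."""
    if not band_snrs_db:
        raise ValueError("need at least one band SNR")
    total = sum(min(snr, cap_db) / cap_db for snr in band_snrs_db)
    return total / len(band_snrs_db)


# Ten bands at 15 dB and ten bands at 45 dB (the latter are capped at 30 dB).
print(articulation_index([15.0] * 10 + [45.0] * 10))  # 0.75
```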

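Signal to Noise Ratio (SNR)

$$ \varepsilon(n) = s(n) - \hat{s}(n) $$

$$ E_\varepsilon = \sum_n \varepsilon^2(n) = \sum_n [s(n) - \hat{s}(n)]^2 $$

$$ E_s = \sum_n s^2(n) $$

$$ \mathrm{SNR}_{global} = 10 \log_{10} \frac{E_s}{E_\varepsilon} = 10 \log_{10} \frac{\sum_n s^2(n)}{\sum_n [s(n) - \hat{s}(n)]^2} $$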
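The global SNR is the ratio, in dB, of the energy of the original signal s(n) to the energy of the error between s(n) and the signal under assessment. A short sketch for two equal-length sample sequences follows. The function name `global_snr_db` and the handling of the zero-error and zero-energy cases are assumptions made for the example.

```python
import math


def global_snr_db(s, s_hat):
    """Global SNR in dB: 10*log10(sum s^2 / sum (s - s_hat)^2)."""
    if len(s) != len(s_hat):
        raise ValueError("signals must have the same length")
    e_signal = sum(x * x for x in s)
    e_error = sum((x - y) ** 2 for x, y in zip(s, s_hat))
    if e_signal == 0:
        raise ValueError("reference signal has zero energy")
    if e_error == 0:
        return math.inf  # identical signals: no error energy at all
    return 10.0 * math.log10(e_signal / e_error)


# An error of 10% of the amplitude on every sample gives roughly 20 dB.
print(global_snr_db([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9]))
```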

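Segmental SNR

$$ \mathrm{SNR}_{seg} = \frac{1}{M} \sum_{j=0}^{M-1} 10 \log_{10} \left[ \frac{\sum_{n=m_j-N+1}^{m_j} s^2(n)}{\sum_{n=m_j-N+1}^{m_j} [s(n) - \hat{s}(n)]^2} \right] $$

The term inside the outer sum is the SNR of the j'th frame.

M: number of frames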
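Segmental SNR computes the SNR in dB frame by frame and then averages over the M frames, so quiet frames count as much as loud ones. The sketch below makes two assumptions of its own: frames are non-overlapping with a fixed length `frame_len`, and frames with zero signal energy or zero error are skipped, since their dB value is undefined.

```python
import math


def segmental_snr_db(s, s_hat, frame_len):
    """Mean of the per-frame SNRs in dB over non-overlapping frames."""
    if len(s) != len(s_hat) or frame_len <= 0:
        raise ValueError("signals must match in length and frame_len must be positive")
    frame_snrs = []
    for start in range(0, len(s) - frame_len + 1, frame_len):
        ref = s[start:start + frame_len]
        deg = s_hat[start:start + frame_len]
        e_signal = sum(x * x for x in ref)
        e_error = sum((x - y) ** 2 for x, y in zip(ref, deg))
        if e_signal > 0 and e_error > 0:  # assumption: skip undefined frames
            frame_snrs.append(10.0 * math.log10(e_signal / e_error))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)


# Frame 0 has about 20 dB, frame 1 about 40 dB, so the mean is about 30 dB.
print(segmental_snr_db([1.0, 1.0, 1.0, 1.0], [0.9, 0.9, 0.99, 0.99], frame_len=2))
```

For the same pair of signals the global SNR is only about 23 dB, because the frame with the larger error dominates the total error energy.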

Frequency Weighted Segmental SNR

$$ \mathrm{SNR}_{fw\text{-}seg} = \frac{1}{M} \sum_{j=0}^{M-1} 10 \log_{10} \left[ \frac{\sum_{k=1}^{K} W_{j,k}\, [E_{s,k}(m_j) / E_{\varepsilon,k}(m_j)]}{\sum_{k=1}^{K} W_{j,k}} \right] $$

K: number of frequency bands

M: number of frames
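In the frequency weighted variant, each frame contributes the dB value of a weighted average of per-band signal-to-error energy ratios. The sketch below assumes the band energies and weights have already been computed by some filter bank, which is not shown. Each argument is a list of M frames, each holding K band values, and the names are illustrative.

```python
import math


def fw_segmental_snr_db(signal_energy, error_energy, weights):
    """Average over frames of 10*log10(weighted mean of per-band energy ratios).

    Assumption: all error energies and the weight sum of each frame are positive.
    """
    frame_values = []
    for e_s, e_err, w in zip(signal_energy, error_energy, weights):
        weighted = sum(wk * es / ee for wk, es, ee in zip(w, e_s, e_err))
        frame_values.append(10.0 * math.log10(weighted / sum(w)))
    if not frame_values:
        raise ValueError("need at least one frame")
    return sum(frame_values) / len(frame_values)


# Two frames, two bands, equal weights: frame SNRs of 20 dB and 10 dB.
print(fw_segmental_snr_db([[100.0, 100.0], [10.0, 10.0]],
                          [[1.0, 1.0], [1.0, 1.0]],
                          [[1.0, 1.0], [1.0, 1.0]]))  # 15.0
```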

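Itakura Measure

[Figure: spectrum $S(\omega)$ and its envelope $H(\omega)$]

$H(\omega)$ is the envelope spectrum.

$$ S(\omega) = \mathcal{F}\{R(\tau)\}, \qquad S(\omega) = |X(\omega)|^2 $$

$H(\omega)$ is obtained from an all-pole (AR) model.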

Itakura Measure (Cont'd)

$$ H(\omega) = \frac{1}{1 - \sum_{i=1}^{p} a_i e^{-j\omega i}} $$
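The all-pole (AR) model gives the envelope as one over a polynomial in e^{-jω} whose coefficients are the p autoregressive coefficients. The sketch below evaluates that response at a single frequency. It assumes the sign convention H(ω) = 1 / (1 - Σ a_i e^{-jωi}), and texts that define the predictor with the opposite sign flip the sign of each coefficient. The function name is illustrative.

```python
import cmath
import math


def all_pole_response(a, omega):
    """Evaluate H(omega) = 1 / (1 - sum_i a[i-1] * exp(-j*omega*i)), i = 1..p.

    Assumption: sign convention as stated above; `a` holds a_1 .. a_p.
    """
    denom = 1 - sum(a_i * cmath.exp(-1j * omega * i)
                    for i, a_i in enumerate(a, start=1))
    return 1 / denom


# A single coefficient a_1 = 0.5 emphasises low frequencies:
print(abs(all_pole_response([0.5], 0.0)))      # 2.0
print(abs(all_pole_response([0.5], math.pi)))  # about 0.667
```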

This measure is based on the spectral difference between the original signal and the signal under assessment.

$a_i$: Autoregressive Coefficients

$K_i$: Reflection Coefficients

$R_i$: Autocorrelation Coefficients


$$ d(g_s(m), g_{\hat{s}}(m)) = \frac{1}{M} \sum_{l=1}^{M} [g_s(l, m) - g_{\hat{s}}(l, m)]^2 $$

m: index of frame

l: index of coefficients
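For a single frame m, this distance is the mean squared difference between the M coefficients of the original signal and the M coefficients of the signal under assessment. A minimal sketch is below. It assumes the two coefficient vectors have already been extracted by an LPC analysis, which is not shown, and the function name is illustrative.

```python
def parameter_distance(g_ref, g_test):
    """Mean squared difference between two coefficient vectors of one frame."""
    if len(g_ref) != len(g_test) or not g_ref:
        raise ValueError("coefficient vectors must be non-empty and equal in length")
    return sum((a - b) ** 2 for a, b in zip(g_ref, g_test)) / len(g_ref)


# Only the third coefficient differs (by 2), so the distance is 4 / 3.
print(parameter_distance([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```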


$$ \tilde{d}_{lp}(\xi_s(m), \xi_{\hat{s}}(m')) = \frac{\sum_{l=1}^{M} W_{l,m,m'}\, [\xi_s(l, m) - \xi_{\hat{s}}(l, m')]^2}{\sum_{l=1}^{M} W_{l,m,m'}} $$

$\xi_s(l, m)$ is the l'th parameter of the frame that ends at the m'th sample.

Weighted Spectral Slope Measure (WSSM)

$$ \Delta|s(k, m)| = |s(k+1, m)| - |s(k, m)|, \qquad \Delta|\hat{s}(k, m)| = |\hat{s}(k+1, m)| - |\hat{s}(k, m)| $$

$$ d_{WSSM}(|s(\omega, m)|, |\hat{s}(\omega, m)|) = K + \sum_{k=1}^{36} W_{k,m}\, [\Delta|s(k, m)| - \Delta|\hat{s}(k, m)|]^2 $$

$s(k, m)$ is the STFT of the k'th band of the frame that ends at the m'th sample.

$|s(k+1, m)|$ and $|s(k, m)|$ are in dB.
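WSSM compares spectral slopes, that is, differences between adjacent band magnitudes in dB, rather than the magnitudes themselves. The sketch below works on one frame. It assumes the band magnitudes in dB and the per-band weights are supplied by the caller, and it treats the constant K as a plain additive parameter `k_const`. The function names are illustrative. With N band values it produces N - 1 slope terms, where the slide sums 36 terms.

```python
def spectral_slopes(band_db):
    """Slope of band k: magnitude of band k+1 minus magnitude of band k (in dB)."""
    return [band_db[k + 1] - band_db[k] for k in range(len(band_db) - 1)]


def wssm_distance(ref_db, test_db, weights, k_const=0.0):
    """k_const plus the weighted sum of squared slope differences for one frame."""
    ref_slopes = spectral_slopes(ref_db)
    test_slopes = spectral_slopes(test_db)
    if not (len(ref_slopes) == len(test_slopes) == len(weights)):
        raise ValueError("need one weight per slope and equal band counts")
    return k_const + sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, ref_slopes, test_slopes))


# Slopes are [3, 2] versus [1, 4]: 1.0 * 2**2 + 0.5 * (-2)**2 = 6.0
print(wssm_distance([0.0, 3.0, 5.0], [0.0, 1.0, 5.0], [1.0, 0.5]))  # 6.0
```

Because only slopes enter the sum, adding the same number of dB to every band of one signal (a pure gain change) leaves the distance unchanged.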