6 - Speech Quality Assessment
Quality Levels
Subjective Tests
Objective Tests
Intelligibility
Naturalness
Quality Levels
Synthetic Quality (below 4.8 kbps)
Communication Quality (4.8 to 13 kbps)
Toll Quality (13 to 64 kbps)
Broadcast Quality (above 64 kbps)
Test Types
             Intelligibility              Naturalness
Subjective   DRT, MRT                     MOS, DAM
Objective    None. Future ASR systems     AI, Global SNR, Seg. SNR, FW-Seg. SNR, Itakura Measure, WSSM
First Class: Subjective Intelligibility Tests
Diagnostic Rhyme Test (DRT)
– The listener selects between two CVC (consonant-vowel-consonant) words that differ in the first consonant
– The first consonant should have specific properties
– Ex. hop - fop and than - dan
Modified Rhyme Test (MRT)
– The listener selects among several CVC words that differ in the first consonant
– Ex. cat, bat, rat, mat, fat, sat
First Class (Cont’d): Subjective Intelligibility Tests
DRT is widely applicable and credible
In this test the listener can hear the speech only once
$$ \mathrm{DRT} = \frac{N_{Correct} - N_{Incorrect}}{N_{Tests}} \times 100\% $$
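The DRT score above can be sketched in a few lines. This is only an illustration: the function name `drt_score` and its parameter names are hypothetical, and the counts in the usage line are made-up numbers, not results from any test.

```python
def drt_score(n_correct: int, n_incorrect: int, n_tests: int) -> float:
    """DRT score in percent: (correct - incorrect) / tests * 100.

    The names used here are illustrative, not taken from the slides.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    if n_correct + n_incorrect > n_tests:
        raise ValueError("more answers than tests")
    return 100.0 * (n_correct - n_incorrect) / n_tests


# Made-up example: 100 word pairs, 90 identified correctly, 10 incorrectly.
example_score = drt_score(n_correct=90, n_incorrect=10, n_tests=100)
```

Subtracting the incorrect answers compensates for guessing: with two alternatives, a listener who guesses at random scores about 0%.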
Second Class: Subjective Naturalness Tests
Mean Opinion Score (MOS)
– MOS is widely applicable and credible
– In this test the listener can hear the speech many times
Diagnostic Acceptability Measure (DAM)
– This test is very complex
Mean Opinion Score (MOS)
The MOS scores are defined as follows:
Score   Speech Quality
1       Not Acceptable
2       Weak
3       Medium
4       Good
5       Excellent
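As its name says, the MOS is the mean of the listeners' individual scores on this five-point scale. A minimal sketch follows; the function name and the ratings in the usage line are hypothetical.

```python
# Labels of the five-point scale, as listed in the table above.
MOS_LABELS = {1: "Not Acceptable", 2: "Weak", 3: "Medium", 4: "Good", 5: "Excellent"}


def mean_opinion_score(ratings):
    """Average a list of integer listener ratings in the range 1..5."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in MOS_LABELS:
            raise ValueError(f"rating out of scale: {rating!r}")
    return sum(ratings) / len(ratings)


# Made-up ratings from five listeners.
example_mos = mean_opinion_score([4, 5, 3, 4, 4])
```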
Diagnostic Acceptability Measure (DAM)
This test is very complex
The test scores 19 different parameters, which fall into 3 main groups:
– Signal Quality
– Background Quality
– Total Quality
Objective Tests
These tests cannot be used for intelligibility, because an automatic system cannot recognize speech intelligibility
Objective tests can only be used for speech naturalness
Objective Tests (Cont’d)
Articulation Index (AI)
Signal to Noise Ratio (SNR)
– Global (Classic) SNR
– Segmental SNR
– Frequency Weighted Segmental SNR
Articulation Index (AI)
AI assumes that the distortions in different frequency bands are independent, and measures signal quality in each band.
In each band, it determines the percentage of the signal that is perceptible to the listener
[Figure: frequency axis from 200 Hz to 6100 Hz, divided into 20 bands]
Articulation Index (Cont’d)
A signal is perceptible to the listener when it is:
– 1- Above the human hearing threshold
– 2- Below the human pain threshold
– 3- Above the masking noise level
– In each case, one of conditions 1 or 3 prevails
Articulation Index (Cont’d)
In AI, the SNR is measured separately in each band
$$ \mathrm{AI} = \frac{1}{20} \sum_{j=1}^{20} \frac{\min(\mathrm{SNR}_j,\, 30)}{30} $$
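The AI sum can be sketched as follows, assuming the per-band SNR values (in dB) have already been measured. The function name and the `cap_db` parameter are hypothetical. The sketch applies only the upper limit of 30 dB, as in the formula, and does nothing special for negative band SNRs.

```python
def articulation_index(band_snrs_db, cap_db=30.0):
    """Average of min(SNR_j, cap) / cap over all bands.

    band_snrs_db: one SNR value in dB per band (20 bands on the slides).
    """
    if not band_snrs_db:
        raise ValueError("at least one band SNR is required")
    capped = [min(snr, cap_db) / cap_db for snr in band_snrs_db]
    return sum(capped) / len(capped)


# Made-up example: 20 bands, each with an SNR of 15 dB.
example_ai = articulation_index([15.0] * 20)
```

Because of the cap, a band with an SNR above 30 dB contributes no more than a band at exactly 30 dB.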
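Signal to Noise Ratio (SNR)
$$ \varepsilon(n) = s(n) - \hat{s}(n) $$
$$ E_\varepsilon = \sum_n \varepsilon^2(n) = \sum_n [s(n) - \hat{s}(n)]^2 $$
$$ E_s = \sum_n s^2(n) $$
$$ \mathrm{SNR}_{global} = 10 \log_{10} \frac{E_s}{E_\varepsilon} = 10 \log_{10} \frac{\sum_n s^2(n)}{\sum_n [s(n) - \hat{s}(n)]^2} $$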
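A minimal sketch of the global SNR, assuming `clean` holds the samples of s(n) and `processed` holds the samples of ŝ(n), both the same length. The function and argument names are hypothetical.

```python
import math


def global_snr_db(clean, processed):
    """10 * log10(signal energy / error energy) over the whole signal."""
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in clean)
    error_energy = sum((s - p) ** 2 for s, p in zip(clean, processed))
    if signal_energy == 0.0:
        raise ValueError("signal energy is zero")
    if error_energy == 0.0:
        return math.inf  # identical signals: no error at all
    return 10.0 * math.log10(signal_energy / error_energy)


# Made-up example: every sample is off by 10% of its amplitude.
example_snr = global_snr_db([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9])
```

In this example the error energy is about 1/100 of the signal energy, so the result is close to 20 dB.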
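Segmental SNR
$$ \mathrm{SNR}_{seg} = \frac{1}{M} \sum_{j=0}^{M-1} 10 \log_{10} \left[ \frac{\sum_{n=m_j-N+1}^{m_j} s^2(n)}{\sum_{n=m_j-N+1}^{m_j} [s(n) - \hat{s}(n)]^2} \right] $$
The term inside the sum over j is the SNR of the j'th frame
M : Number of frames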
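The segmental SNR averages the per-frame SNR values in dB, so quiet frames count as much as loud ones. The sketch below assumes non-overlapping frames of `frame_len` samples and drops any incomplete frame at the end; both choices, and all names, are assumptions for illustration. Frames with zero signal or error energy are not handled and raise an error.

```python
import math


def segmental_snr_db(clean, processed, frame_len):
    """Mean of the per-frame SNRs (in dB) over M non-overlapping frames."""
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    n_frames = len(clean) // frame_len  # M; a trailing partial frame is dropped
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    frame_snrs = []
    for j in range(n_frames):
        start, stop = j * frame_len, (j + 1) * frame_len
        signal_energy = sum(s * s for s in clean[start:stop])
        error_energy = sum(
            (s - p) ** 2 for s, p in zip(clean[start:stop], processed[start:stop])
        )
        if signal_energy == 0.0 or error_energy == 0.0:
            raise ValueError(f"frame {j} has zero signal or error energy")
        frame_snrs.append(10.0 * math.log10(signal_energy / error_energy))
    return sum(frame_snrs) / n_frames


# Made-up example: one lightly distorted frame and one heavily distorted frame.
example_clean = [1.0] * 8
example_processed = [0.9] * 4 + [0.0] * 4
example_seg_snr = segmental_snr_db(example_clean, example_processed, frame_len=4)
```

Here the two frames have SNRs of about 20 dB and 0 dB, so the segmental SNR is about 10 dB. A single SNR computed over all eight samples would be about 3 dB, because the badly distorted frame dominates the total error energy.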
Frequency Weighted Segmental SNR
$$ \mathrm{SNR}_{fw\text{-}seg} = \frac{1}{M} \sum_{j=0}^{M-1} 10 \log_{10} \left[ \frac{\sum_{k=1}^{K} W_{j,k} \left[ E_{s,k}(m_j) / E_{\varepsilon,k}(m_j) \right]}{\sum_{k=1}^{K} W_{j,k}} \right] $$
K : Number of frequency bands
M : Number of frames
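A sketch of the frequency weighted variant, assuming the signal and error energies per frame and per band have already been computed (how the bands are obtained is not covered here). For each of the M frames, the K band energy ratios are averaged with the weights, the result is converted to dB, and the dB values are averaged over the frames. All names and the example numbers are hypothetical.

```python
import math


def fw_segmental_snr_db(signal_energy, error_energy, weights):
    """Frequency weighted segmental SNR.

    Each argument is a list of M frames; each frame is a list of K band values.
    """
    n_frames = len(signal_energy)
    if n_frames == 0 or len(error_energy) != n_frames or len(weights) != n_frames:
        raise ValueError("all inputs must have the same, non-zero number of frames")
    total_db = 0.0
    for e_s, e_err, w in zip(signal_energy, error_energy, weights):
        if not (len(e_s) == len(e_err) == len(w)):
            raise ValueError("band counts differ within a frame")
        # Weighted average of the per-band energy ratios in this frame.
        weighted_ratio = sum(wk * sk / ek for wk, sk, ek in zip(w, e_s, e_err)) / sum(w)
        total_db += 10.0 * math.log10(weighted_ratio)
    return total_db / n_frames


# Made-up example with M = 2 frames and K = 2 bands.
example_fw = fw_segmental_snr_db(
    signal_energy=[[100.0, 100.0], [10.0, 1000.0]],
    error_energy=[[1.0, 1.0], [1.0, 1.0]],
    weights=[[1.0, 1.0], [1.0, 0.0]],
)
```

In the second frame the weight of the second band is zero, so only the first band (ratio 10, or 10 dB) counts; with 20 dB for the first frame the result is 15 dB.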
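Itakura Measure
[Figure: spectrum $S(\omega)$ and its envelope $H(\omega)$]
$H(\omega)$ is the envelope spectrum
$$ S(\omega) = \mathcal{F}\{R(\tau)\}, \qquad S(\omega) = |X(\omega)|^2 $$
Uses the All-Pole (AR) Model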
Itakura Measure (Cont’d)
$$ H(\omega) = \frac{1}{1 - \sum_{i=1}^{p} a_i e^{-j\omega i}} $$
This measure is based on the spectral difference between the original signal and the signal being assessed
$a_i$ : Autoregressive Coefficients
$K_i$ : Reflection Coefficients
$R_i$ : Autocorrelation Coefficients
Itakura Measure (Cont’d)
$$ d(g_s(m), g_{\hat{s}}(m)) = \frac{1}{M} \sum_{l=1}^{M} [g_s(l,m) - g_{\hat{s}}(l,m)]^2 $$
m : Index of frame
l : Index of coefficients
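Itakura Measure (Cont’d)
$$ \tilde{d}_{lp}(\lambda_s(m), \lambda_{\hat{s}}(m')) = \left[ \frac{\sum_{l=1}^{M} W_{l,m,m'} \, |\lambda_s(l,m) - \lambda_{\hat{s}}(l,m')|^p}{\sum_{l=1}^{M} W_{l,m,m'}} \right]^{1/p} $$
$\lambda_s(l,m)$ is the l'th parameter of the frame ending at the m'th sample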
Weighted Spectral Slope Measure (WSSM)
$$ \Delta|s(k,m)| = |s(k+1,m)| - |s(k,m)| $$
$$ \Delta|\hat{s}(k,m)| = |\hat{s}(k+1,m)| - |\hat{s}(k,m)| $$
$$ d_{WSSM}(|s(m)|, |\hat{s}(m)|) = K + \sum_{k=1}^{36} W_{k,m} \, [\Delta|s(k,m)| - \Delta|\hat{s}(k,m)|]^2 $$
$s(k,m)$ is the STFT of the k'th band of the frame ending at the m'th sample
$|s(k+1,m)|$ and $|s(k,m)|$ are in dB.
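A sketch of the WSSM distance for a single frame, assuming the band magnitudes in dB are already available for both signals. The slope in band k is the difference between the levels of bands k+1 and k, and the distance is a constant plus the weighted sum of squared slope differences. The names `band_levels`, `weights` and `k_const` are hypothetical, and the number of bands follows the input length rather than being fixed at 36.

```python
def spectral_slopes(band_levels_db):
    """Slope in band k: level of band k+1 minus level of band k (both in dB)."""
    return [
        band_levels_db[k + 1] - band_levels_db[k]
        for k in range(len(band_levels_db) - 1)
    ]


def wssm_distance(clean_levels_db, processed_levels_db, weights, k_const=0.0):
    """k_const + sum over bands of weight * (slope difference)^2, for one frame."""
    if len(clean_levels_db) != len(processed_levels_db):
        raise ValueError("both spectra need the same number of bands")
    clean_slopes = spectral_slopes(clean_levels_db)
    processed_slopes = spectral_slopes(processed_levels_db)
    if len(weights) != len(clean_slopes):
        raise ValueError("one weight per slope is required")
    return k_const + sum(
        w * (a - b) ** 2 for w, a, b in zip(weights, clean_slopes, processed_slopes)
    )


# Made-up example with three bands, which gives two slopes.
example_wssm = wssm_distance(
    clean_levels_db=[0.0, 3.0, 5.0],
    processed_levels_db=[0.0, 1.0, 5.0],
    weights=[1.0, 0.5],
    k_const=1.0,
)
```

In the example the slopes are [3, 2] and [1, 4], so the squared differences are 4 and 4 and the distance is 1 + 1.0 * 4 + 0.5 * 4 = 7.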