
2nd Workshop on Wideband Speech Quality - June 2005


Perceptual Wideband Audio Quality Assessments Using PEAQ

Christian Schmidmer, Opticom GmbH, Erlangen



Contents

• Quality, definitions
• User expectation
• Subjective tests
• Psychoacoustics
• PEAQ
• PESQ vs. PEAQ


What is “Quality”?

“Quality is the difference between what we perceive and what we expect.”

From habilitation thesis of Prof. Ute Jekosch

“…they are used to phones that sound like a phone.”
Frank Meier, Infineon

Maybe more important: …is for free.


Differences in Perception of Voice and Audio

• Experience, a priori knowledge
• Expectation
• Cognitive effects
• “Error correction”
• Different subjective tests require different models


The Problem of Subjective Scales

MP3 @ High Quality:

Bitrate      MOS
256 kBit/s   5
128 kBit/s   4
64 kBit/s    1

MP3 @ Intermediate Quality:

Bitrate      MOS
128 kBit/s   5
64 kBit/s    4
16 kBit/s    1

The range of qualities in the subjective test defines the subjective scale!


MOS according to P.800

Standardized listening test procedure according to ITU-T P.800 ff.

• Absolute Category Rating (ACR) test, no comparison to a reference signal (original)
• Question: “How good does it sound?”
• 5-point grading scale (“opinion scale”)
• Averaging over test subjects gives the MOS (“Mean Opinion Score”)
• Language dependent!

Grade   Category
5       Excellent
4       Good
3       Fair
2       Poor
1       Bad
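The averaging step can be sketched in a few lines of Python. This is only an illustration of how a MOS is formed from ACR votes. The function name and the ratings are hypothetical and not taken from the slides or from P.800.

```python
from statistics import mean

# ACR categories of the 5-point opinion scale listed on the slide.
ACR_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings):
    """Average the ACR votes of all test subjects for one test condition."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in ACR_LABELS:
            raise ValueError(f"not a valid ACR grade: {rating!r}")
    return mean(ratings)


# Hypothetical votes of eight subjects for a single condition.
ratings = [4, 5, 3, 4, 4, 3, 5, 4]
mos = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f}")
```

A real test would also report a confidence interval per condition, which is left out here for brevity.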


Subjective Assessment in ITU-R BS.1116

• Standardised assessment procedure for 'small impairments' in audio systems (ITU-R 1994)
• Comparison between reference and test signal
• Very sensitive to subtle distortions
• Double-blind triple-stimulus with hidden reference

[Figure: the known original is compared with stimuli A and B, which are presented either as original / coded or as coded / original]


Subjective Assessment in ITU-R BS.1116

• Continuous grading scale with “anchors”
• “Subjective Difference Grade” (SDG)
• Question: “How different do the files sound?”

Impairment                      Grade
Imperceptible                   5.0
Perceptible, but not annoying   4.0
Slightly annoying               3.0
Annoying                        2.0
Very annoying                   1.0

The ITU-R five-grade impairment scale
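The slide names the SDG without spelling out how it is formed. A minimal sketch, assuming the usual definition in which the SDG is the grade given to the coded signal minus the grade given to the hidden reference, could look like this. The function name and the trial grades are hypothetical.

```python
def subjective_difference_grade(grade_test, grade_reference):
    """SDG = grade of the signal under test minus grade of the hidden reference.

    Both grades are on the continuous 1.0..5.0 impairment scale, so the SDG
    is 0 for a transparent codec and approaches -4 for "very annoying".
    """
    for grade in (grade_test, grade_reference):
        if not 1.0 <= grade <= 5.0:
            raise ValueError(f"grade outside the 1.0..5.0 scale: {grade}")
    return grade_test - grade_reference


# Hypothetical (coded signal, hidden reference) grades from three trials.
trials = [(4.2, 5.0), (3.5, 4.9), (4.8, 5.0)]
sdgs = [subjective_difference_grade(test, ref) for test, ref in trials]
mean_sdg = sum(sdgs) / len(sdgs)
print(f"mean SDG = {mean_sdg:.2f}")
```

A positive SDG means the listener rated the hidden reference below the coded signal, that is, the listener failed to identify the reference in that trial.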


Subjective Testing of Intermediate Audio Quality (IAQ)

“MUSHRA” Multi Stimulus Test with Hidden Reference and Anchors

• developed by EBU working group B/AIM
• targets IAQ
• ITU-R BS.1534


MUSHRA Test

Training of Subjects

• subjects can randomly access all types of codecs at similar bitrate

• comparison with CD quality reference

• two low-pass 'anchors' (7 kHz, 3.5 kHz) included


MUSHRA Test

Scoring Phase

• comparison with CD reference, hidden reference included
• two low-pass 'anchors' (7 kHz, 3.5 kHz) included
• subjects can randomly assess all codecs under test of similar bitrate at the same time
• subjects adjust slider, no score involved
• slider mapped to 0..100


Comparison of Subjective Test Methods

                  P.800               BS.1116            BS.1534
Reference         Not included        Hidden and known   Hidden and known
Impairments       Large..very large   Small              Large
Main application  Speech quality      Audio quality      Intermediate audio quality
Subjects          Inexperienced       Expert listeners   Expert listeners
Reliability       Good                Excellent          Good

Comments:
P.800:   Not applicable to music, influenced by a priori knowledge and expectation
BS.1116: Problems with low quality
BS.1534: Selection of anchors very critical


Temporal Masking

[Figure: sensation level SL [dB] (0 to 60) over time t [ms], showing the premasking, simultaneous masking and postmasking regions around the masker]

• Premasking: 2-5 ms
• Postmasking: 120 ms
• Depending on the signal characteristics of the masker


Pitch Scale / Critical Bands

Bark Scale

[Figure: critical band number (0 to 25) over frequency in Hz (0 to 20000)]

A sine tone and a noise of critical bandwidth with the same center frequency and energy density are perceived equally loud.
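The curve in the figure can be approximated numerically. The sketch below uses the Zwicker and Terhardt closed-form approximation of the Bark scale, which is an assumption on my part. The slides do not state which formula was plotted, and the PEAQ recommendation defines its own frequency-to-pitch mapping, which may differ.

```python
import math


def hz_to_bark(frequency_hz):
    """Approximate critical-band rate in Bark (Zwicker/Terhardt formula)."""
    return (13.0 * math.atan(0.00076 * frequency_hz)
            + 3.5 * math.atan((frequency_hz / 7500.0) ** 2))


# The mapping is roughly linear below 500 Hz and compressive above.
for frequency in (100, 500, 1000, 4000, 10000, 20000):
    print(f"{frequency:>6} Hz -> {hz_to_bark(frequency):5.2f} Bark")
```

With this approximation the audible range up to 20 kHz spans a little under 25 Bark, which is consistent with the axis range of the figure.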


Threshold in Quiet - Masked Threshold

[Figure: level L_T in dB (0 to 80) over frequency f_T in kHz (0.02 to 20, logarithmic axis), showing the threshold in quiet]
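For readers without the figure, the shape of the threshold in quiet can be reproduced with Terhardt's well-known closed-form approximation. This formula is not part of the slides and is used here only as an illustration. It covers the threshold in quiet, not the masked threshold.

```python
import math


def threshold_in_quiet_db(frequency_hz):
    """Approximate threshold in quiet in dB SPL (Terhardt's approximation).

    Only meaningful for positive frequencies in the audible range.
    """
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    f_khz = frequency_hz / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)


# Same frequency range as the axis of the figure (0.02 kHz to 20 kHz).
for frequency in (20, 100, 1000, 3300, 10000, 20000):
    print(f"{frequency:>6} Hz -> {threshold_in_quiet_db(frequency):6.1f} dB SPL")
```

The approximation shows the familiar behaviour: high thresholds at low frequencies, maximum sensitivity around 3 to 4 kHz, and a steep rise towards 20 kHz.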


PEAQ

ITU-R TG 10/4: Call for proposals (1995)

PEAQ is based on:
– PAQM: KPN Research, Netherlands / OPTICOM
– NMR: Fraunhofer, Germany / OPTICOM
– DIX: TU Berlin / Deutsche Telekom Berkom
– POM: CCETT, France
– PERCEVAL: CRC, Canada
– "Tool box": IRT, Germany

Jan. 1999: released as ITU-R Rec. BS.1387


Intrusive Testing

[Figure: endpoints A and B connected through Network X and Network Y]

Comparison with known stimulus:
+ Very high accuracy
+ Black box approach – no knowledge of DUT
- Requires a reference signal
- Generates traffic

Alternatively both signals may be captured by the test system!


Two Versions of PEAQ:

• PEAQ “Basic”: computational efficiency, realtime performance
• PEAQ “Advanced”: highest possible accuracy


Structure of a perceptual measurement tool

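[Figure: the reference (= sent file) and the test signal (= received file) each pass through a perceptual model; a feature extractor compares the two model outputs, and a cognitive model maps the extracted features to a MOS (quality measure)]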


Perceptual Model, PEAQ “Basic”

Blocks of the processing chain shown in the diagram:

• Inputs: input signal, fs = 48 kHz (fs = 44.1 kHz), and listening level (dB SPL)
• FFT & scaling: 2048 points, 42.6 ms / 23.4 Hz
• Outer and middle ear weighting
• Grouping into critical bands: ¼ Bark (“pitch”)
• Internal noise (added)
• Spreading
• Temporal masking: forward masking
• Output: excitation
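The frame figures quoted for the FFT follow directly from the frame length and the sampling rate. A short sketch of that arithmetic (function names are illustrative only):

```python
FRAME_LENGTH = 2048  # FFT size in samples, as given on the slide


def frame_duration_ms(sample_rate_hz, frame_length=FRAME_LENGTH):
    """Duration of one analysis frame in milliseconds."""
    return 1000.0 * frame_length / sample_rate_hz


def bin_spacing_hz(sample_rate_hz, frame_length=FRAME_LENGTH):
    """Frequency spacing between adjacent FFT bins in Hz."""
    return sample_rate_hz / frame_length


for sample_rate in (48_000, 44_100):
    print(f"fs = {sample_rate} Hz: "
          f"{frame_duration_ms(sample_rate):.2f} ms per frame, "
          f"{bin_spacing_hz(sample_rate):.2f} Hz per bin")
```

At 48 kHz this gives about 42.67 ms and 23.44 Hz, matching the rounded values on the slide. At 44.1 kHz the same 2048-point frame is slightly longer and the bins are slightly narrower.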


MOVs used in PEAQ “Basic” Version

Model Output Variable (MOV)                Interpretation
WinModDiff1B, AvgModDiff1B, AvgModDiff2B   Changes in modulation (related to roughness)
RmsNoiseLoudB                              Loudness of the distortion
BandwidthRefB, BandwidthTestB              Linear distortions (frequency response etc.)
RelDistFramesB                             Frequency of audible distortions
Total NMRB                                 Noise-to-mask ratio
MFPDB, ADBB                                Detection probability
EHSB                                       Harmonic structure of the error

Table 0.1: MOVs used by the PEAQ "Basic" version, and their interpretation


Perceptual Model, PEAQ “Advanced”

Blocks shown in the diagram:

• Inputs: input signal, fs = 48 kHz (fs = 44.1 kHz), and listening level (dB SPL)
• Scaling
• Filterbank: 40 auditory bands, subsampling 1:32
• Outer and middle ear filtering
• Spreading and backward masking
• Subsampling 1:6
• Internal noise (added)
• Forward masking
• “Pitch” (label in the diagram)
• Output: excitation

Temporal resolution: 0.66 ms and 4 ms
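The two temporal resolutions are consistent with the two subsampling factors at fs = 48 kHz. The sketch below works through that arithmetic. Assigning 0.66 ms to the filterbank output and 4 ms to the output of the second subsampling stage is my reading of the diagram, not something the slide states explicitly.

```python
def sample_period_ms(sample_rate_hz, decimation):
    """Time between output samples after decimating by the given factor."""
    return 1000.0 * decimation / sample_rate_hz


SAMPLE_RATE_HZ = 48_000
FILTERBANK_DECIMATION = 32  # "Subsampling 1:32"
SECOND_DECIMATION = 6       # "Subsampling 1:6"

after_filterbank_ms = sample_period_ms(SAMPLE_RATE_HZ, FILTERBANK_DECIMATION)
after_second_stage_ms = sample_period_ms(
    SAMPLE_RATE_HZ, FILTERBANK_DECIMATION * SECOND_DECIMATION)

print(f"after 1:32 subsampling:       {after_filterbank_ms:.3f} ms")
print(f"after further 1:6 subsampling: {after_second_stage_ms:.3f} ms")
```

The first value is 32/48000 s, roughly 0.667 ms, which the slide gives as 0.66 ms. The second is 192/48000 s, exactly 4 ms.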


PEAQ vs. MUSHRA

EBU Tests of Internet Audio Codecs, codecs under test:

• Microsoft Windows Media 4
• MPEG-4 AAC (Fraunhofer)
• MP3 (Fraunhofer)
• QuickTime 4, Music-Codec 2 (QDesign)
• RealAudio 5.0
• RealAudio G2
• MPEG-4 TwinVQ (Yamaha)


Constraints of MUSHRA Testing

• no absolute scores
  -> scores depend on the test condition
• low-pass anchors are only one quality dimension
  -> disturbance of artefacts is another one
• spreading of the scale from best to worst
  -> what about adding new items to an existing test?

In order to verify PEAQ performance we must adjust the best and worst item (not the anchors!)
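One possible reading of "adjusting the best and worst item" is a linear mapping of the objective scores onto the subjective scale, fixed at the two items that the listeners rated best and worst. The sketch below implements that reading. It is an assumption about the procedure, and the function name and all scores are hypothetical.

```python
def align_endpoints(objective, subjective):
    """Linearly map objective scores onto the subjective scale.

    The mapping is fixed by two points: the items with the best and the
    worst subjective score keep exactly their subjective values.
    """
    if len(objective) != len(subjective) or len(objective) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    best = max(range(len(subjective)), key=subjective.__getitem__)
    worst = min(range(len(subjective)), key=subjective.__getitem__)
    span = objective[best] - objective[worst]
    if span == 0:
        raise ValueError("best and worst item have the same objective score")
    slope = (subjective[best] - subjective[worst]) / span
    return [subjective[worst] + slope * (score - objective[worst])
            for score in objective]


# Hypothetical objective scores and MUSHRA scores for four codecs.
objective_scores = [-0.5, -1.0, -1.5, -2.5]
mushra_scores = [80.0, 60.0, 55.0, 30.0]
aligned = align_endpoints(objective_scores, mushra_scores)
print(aligned)
```

After the alignment only the items between the two endpoints carry information about how well the objective measure tracks the listeners, since the endpoints match by construction.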


PEAQ vs. MUSHRA (EBU Test)

Objective Ranking

Bitrate (kbps)   Obj. #   DI (BV)   Codec     Subj. #
48               1        -0.77     7 kHz     1
48               2        -1.25     AAC       2
48               3        -1.28     MP3       3
48               4        -1.77     MS        4
48               5        -2.36     3.5 kHz   5

64               1        -0.32     AAC       1
64               2        -0.59     MS        2
64               3        -0.77     7 kHz     3
64               4        -1.28     MP3       4
64               5        -2.36     3.5 kHz   5
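Agreement between two rankings can be quantified with Spearman's rank correlation. The sketch below applies it to the 48 kbit/s rows of the table, assuming that the first rank column is the objective rank (sorted by DI) and the last one is the subjective rank. The slides do not report a correlation figure, so this is only an illustration of the comparison.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings without ties."""
    if len(ranks_a) != len(ranks_b) or len(ranks_a) < 2:
        raise ValueError("need two equally long rankings with at least 2 items")
    n = len(ranks_a)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * squared_diffs / (n * (n * n - 1))


# (codec, objective rank, subjective rank) for the 48 kbit/s condition.
rows_48 = [("7kHz", 1, 1), ("AAC", 2, 2), ("MP3", 3, 3),
           ("MS", 4, 4), ("3.5kHz", 5, 5)]
objective_ranks = [row[1] for row in rows_48]
subjective_ranks = [row[2] for row in rows_48]
rho_48 = spearman_rho(objective_ranks, subjective_ranks)
print(f"Spearman rho at 48 kbit/s: {rho_48:.2f}")
```

Identical rankings give a coefficient of 1, and fully reversed rankings give -1. The simplified formula used here is only valid when no ranks are tied.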


Results

[Figure: 48 kbps Stereo - DR. Subjective and objective scores (0 to 100) per codec number (1 to 8)]


Final Question:

Can I use PESQ instead of PEAQ?

Perception of voice differs from perception of music

PESQ time alignment fails on music

PEAQ and PESQ are modelling different subjective tests

No!


More Information:

OPTICOM Germany
www.opticom.de

Thank you!