State of the Art in Perceptual Coding: MPEG-2/4 Advanced Audio Coding (AAC)

Prof. Dr. Karlheinz Brandenburg, [email protected]



History

1994: Official start of AAC development

Goal: Development of a new powerful state-of-the-art multi-channel coder without compatibility constraints

1997: AAC International standard (IS)

1999: AAC part of the MPEG-4 standard

Today: favorite coder for many application areas such as Internet audio, solid state players, ISDN music transmission, high definition TV (HDTV), and satellite and terrestrial digital audio broadcasting


Overview (1)

Next generation mono/stereo/multichannel coding

Same quality at half the bit-rate

International cooperation of the Fraunhofer Institute and companies like AT&T, Sony and Dolby

Most efficient MPEG method for audio data compression up until now

Driving force to develop AAC was the quest for an efficient coding method for surround signals, like 5-channel signals (cinemas)


Overview (2)

Makes use of the signal masking properties of the human ear in order to reduce the amount of data

Quantization noise is distributed to frequency bands in such a way that it is masked by the total signal

Iterative encoder structure using Huffman coding and non-uniform quantization: features found in Layer 3 and PAC

Window type and block switching: features found in AC-3, Layer 3, PAC, plus new ones

Temporal Noise Shaping (TNS): new technique


Overview (3)

Prediction

Bit reservoir

M/S stereo coding

Intensity stereo coding

Gain control


AAC Encoder Overview


MPEG-AAC: Basic Features

High frequency resolution filter bank-based coder (1024 subband MDCT with 50% overlap)

1:8 block switching (1024/128 subband MDCT)

Non-uniform quantizer

Noise shaping in half critical bands (scalefactor bands)

Huffman coding of scalefactors and spectral coefficients
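The non-uniform quantizer listed above can be illustrated with a small sketch. It assumes the 3/4 power-law companding that is commonly described for AAC and Layer-3; the rounding offset `MAGIC` and the helper `step_from_scalefactor` are illustrative assumptions, not values taken from these slides.

```python
MAGIC = 0.4054  # rounding offset often quoted for AAC/Layer-3 encoders (assumption)


def quantize(x, step):
    """Power-law quantizer: compress |x| / step with exponent 3/4, then round."""
    mag = (abs(x) / step) ** 0.75
    q = int(mag + MAGIC)
    return q if x >= 0 else -q


def dequantize(q, step):
    """Inverse mapping: expand |q| with exponent 4/3 and rescale by the step."""
    mag = abs(q) ** (4.0 / 3.0) * step
    return mag if q >= 0 else -mag


def step_from_scalefactor(sf):
    """Hypothetical mapping: each scalefactor increment scales the step by 2**(1/4)."""
    return 2.0 ** (sf / 4.0)
```

Because of the companding, the spacing of the reconstruction levels grows with amplitude, so large coefficients are quantized more coarsely than small ones.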


MPEG-2 AAC: Advanced Coding Tools

Window shape adaptation: usually fixed (sine or KBD)

Temporal noise shaping (TNS): often used

Gain control (SSR, Scalable Sampling Rate profile only): not often used

Backward adaptive prediction: not often used


Frequency response of Sine and Fielder window

[Figure: frequency responses of the Fielder (KBD) window and the sine window]
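The two window shapes named above can be computed with a few lines of code. This is a minimal sketch: the Bessel function is evaluated by its power series, and the shape parameter `alpha=4.0` is an assumption chosen for illustration. Both windows should satisfy the power-complementary condition w[n]^2 + w[n + N/2]^2 = 1 that the MDCT needs for aliasing cancellation.

```python
import math


def bessel_i0(x, terms=50):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def sine_window(length):
    """Sine window of the given (even) length."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def kbd_window(length, alpha=4.0):
    """Kaiser-Bessel-derived window; alpha is an assumed shape parameter."""
    half = length // 2
    # Kaiser kernel with half + 1 points (normalization cancels in the ratio below).
    kaiser = [
        bessel_i0(math.pi * alpha * math.sqrt(max(0.0, 1.0 - (2.0 * j / half - 1.0) ** 2)))
        for j in range(half + 1)
    ]
    total = sum(kaiser)
    left, acc = [], 0.0
    for j in range(half):
        acc += kaiser[j]
        left.append(math.sqrt(acc / total))  # square root of the running sum
    return left + left[::-1]  # second half mirrors the first
```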


MPEG-2 AAC: Joint Stereo Tools

Mid/Side stereo (MS) per scalefactor band

Intensity stereo coding between channel pairs

Coupling channel(s)

Other Features:

Flexible bitstream format for up to 48 channels

Low Frequency Enhancement (LFE) channel(s)
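The Mid/Side tool listed above can be sketched per scalefactor band as follows. The sum/difference matrix with a factor of 1/2 and the energy-based decision rule `choose_ms` are illustrative assumptions; a real encoder would base the decision on its perceptual model and bit demand.

```python
def ms_encode(left, right):
    """Convert the L/R coefficients of one band into mid and side signals."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(band):
    return sum(v * v for v in band)


def choose_ms(left, right):
    """Hypothetical per-band rule: use M/S when it leaves one channel nearly empty."""
    mid, side = ms_encode(left, right)
    return min(energy(mid), energy(side)) < min(energy(left), energy(right))
```

For strongly correlated channels the side signal carries very little energy, which is where the tool saves bits.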


MPEG-2 AAC Performance

Test Results:

Broadcast quality at 320 kbit/s for 5 channels (better than MPEG-2 Layer II at 640 kbit/s)

Broadcast quality at 128 kbit/s stereo

Comparison to other codecs: AAC at 96 kbit/s stereo is comparable to

AC-3 at 160 kbit/s

Layer II at 192 kbit/s

Layer III at 128 kbit/s

(Hence: AAC is successful in providing higher compression ratios)

Very low bitrates (comparison within MPEG): AAC best audio coder at bitrates down to 16 kbit/s for mono and stereo


Differences MPEG-2 AAC and MPEG Audio Layer-3

Filter bank

ISO/MPEG Audio Layer-3 uses a hybrid filter bank, chosen for reasons of compatibility

MPEG-2 AAC uses a plain Modified Discrete Cosine Transform (MDCT) to reduce aliasing

Together with the increased number of subbands (1024 instead of 576 samples), the MDCT outperforms the filter banks of previous coding methods


Differences MPEG-2 AAC and MPEG Audio Layer-3

Temporal Noise Shaping (TNS)

Shapes the distribution of quantization noise in time by prediction in the frequency domain

Voice signals in particular experience considerable improvement through TNS

Prediction (in band, in time domain)

A technique commonly established in the area of speech coding systems

It benefits from the fact that stationary audio signals are predictable to a certain extent

But requires higher computational complexity


Differences MPEG-2 AAC and MPEG Audio Layer-3

Quantization

By allowing finer control of quantization resolution, the given bit rate can be used more efficiently

Bit-stream format

The information to be transmitted undergoes entropy coding in order to keep redundancy as low as possible

The optimization of these coding methods together with a flexible bit-stream structure has made further improvement of the coding efficiency possible


Filter Bank Details

MDCT (Princen / Bradley): TDAC, MLT, cosine modulated filter bank

critical sampling

time domain aliasing cancellation

Block switching to adjust the impulse response

Window type switching

sine window

Kaiser Bessel Derived (KBD) window
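Critical sampling and time domain aliasing cancellation can be checked numerically with a direct (slow) MDCT. The sketch below assumes a sine window, a hop of `n_bands` samples (50% overlap), a signal length that is a multiple of `n_bands`, and a 2/N scaling in the inverse transform; the function names are illustrative only. Each block of 2N windowed samples yields only N coefficients, yet overlap-add of the inverse transforms should return the input.

```python
import math


def sine_window(length):
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block, window):
    """Forward MDCT: 2N windowed input samples give N coefficients."""
    n_bands = len(block) // 2
    n0 = 0.5 + n_bands / 2.0
    return [
        sum(window[n] * block[n] * math.cos(math.pi / n_bands * (n + n0) * (k + 0.5))
            for n in range(2 * n_bands))
        for k in range(n_bands)
    ]


def imdct(coeffs, window):
    """Inverse MDCT: N coefficients give 2N windowed, time-aliased samples."""
    n_bands = len(coeffs)
    n0 = 0.5 + n_bands / 2.0
    return [
        2.0 / n_bands * window[n]
        * sum(coeffs[k] * math.cos(math.pi / n_bands * (n + n0) * (k + 0.5))
              for k in range(n_bands))
        for n in range(2 * n_bands)
    ]


def analysis_synthesis(signal, n_bands):
    """Run MDCT and IMDCT with 50% overlap-add; the aliasing cancels between blocks."""
    window = sine_window(2 * n_bands)
    padded = [0.0] * n_bands + list(signal) + [0.0] * n_bands
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_bands + 1, n_bands):
        frame = imdct(mdct(padded[start:start + 2 * n_bands], window), window)
        for n, value in enumerate(frame):
            out[start + n] += value
    return out[n_bands:n_bands + len(signal)]
```

A single inverse-transformed block still contains the mirrored (aliased) signal; only the sum of two overlapping blocks is alias-free, which is why the transform can be critically sampled.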


MPEG-4 General Audio Coding


MPEG-4 Audio

Interactivity

High compression

Universal accessibility

MPEG-4 addresses applications in the shaded area.

[Figure: overlapping application areas ‘TV/film’, ‘Telecom’ and ‘Computer’; labels: interactivity, wireless, AV-data]

wide range of bit rates, more channels possible


A short view into MPEG-4 Audio (1)

Very diverse requirements, so no single algorithm:

Music synthesis (Structured Audio), a kind of extension of MIDI

Very low rate parametric coding (HILN, HVXC)

Speech coding (CELP)

Perceptual Coding ("General Audio") over a wide range of bitrates

High quality coding done via AAC with additional coding tools:

TwinVQ, scalability tools

Perceptual Noise Substitution (PNS)

Backwards compatibility, no new coding paradigm for high quality audio


A short view into MPEG-4 Audio (2)

MPEG-4 General Audio Coding: The “all-round coder” in MPEG-4 audio

MPEG-4 Extensions:

Perceptual Noise Substitution (PNS)

Long Term Prediction

TwinVQ Coding Core (important for low bitrates)


MPEG-4 Audio Algorithms

[Figure: overview of the MPEG-4 audio algorithms; surviving labels: DSL, text to speech]


MPEG-4 Audio Profiles

Speech Audio Profile

Parametric speech coder (HVXC)

CELP speech coder

Synthesis Audio Profile

generate speech and sound

Scalable Audio Profile

contains the speech audio profile

General audio coding (AAC)

TwinVQ tools

Scalable coding of speech and music

Main Audio Profile

contains all other profiles


Temporal Noise Shaping: TNS


Temporal Noise Shaping (1)

[Figure: signals over time t within 1 frame, e.g. castanet and e.g. speech; shown are the original, the quantization noise, and the result with TNS]


Why use TNS instead of block switching?

A low number of subbands leads to a higher bit rate. This is OK if it only happens occasionally, because a buffer can then be used.

It becomes a problem if there are many peaks, such as the glottal pulses in speech (every few ms!): the bit rate would become too high, or the quantization noise too high.

An alternative approach is needed: TNS

But: TNS is not really a replacement for block switching


Temporal Noise Shaping (2)

Solution for avoiding quantization noise spread:

Make smaller frames (works for attacks but not for speech; decreases coding efficiency)

Higher time resolution to shape quantization noise: TNS

Limitation of TNS:

Time domain aliasing


Speech Coding as Model for TNS

In frequency domain the prediction error is flat

TNS predicts in frequency domain instead of time domain, shapes noise in time domain.

[Figure: predictive speech codec with encoder and decoder]

Encoder: compute LPC coefficients and transmit them to the decoder; a linear FIR filter produces the prediction error; the predictor only knows past samples

Prediction error: should have a flat spectrum; the frequency response of the encoder structure is the inverse of the signal

Decoder: the structure should have a frequency response like the signal; the output should be the original spectrum

Quantization noise is shaped accordingly in the frequency domain


TNS

Switch roles of time and frequency domain:

predict not over time, but over frequencies, over the subbands

quantization error is shaped (after decoding) in the time domain (instead of the frequency domain) like the signal

hopefully reduces pre-echo artifacts

But: aliasing in time domain limits effectiveness (peaks are mirrored over time)


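Structure of TNS (encoder)

[Figure: Audio → MDCT (1024 bands) → TNS (predictor with z^-1 delay running over the sequence of subbands) → Q; the predictor coefficients are transmitted as side info]

The signal is already in the AAC subbands. The predictor runs from one subband to the next, starting at the lowest subband: from subband 0 it predicts subband 1, and so on.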


TNS Decoder

[Figure: subbands → TNS (predictor with z^-1 delay) → synthesis MDCT (1024 bands) → Audio]
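The encoder and decoder structures above can be sketched as a linear predictor that runs over the frequency index instead of the time index. This is a minimal illustration under several assumptions: the coefficients come from a plain autocorrelation method with Levinson-Durbin recursion, the whole spectrum is filtered at once, and quantization of the residual and of the coefficients is left out. All function names are hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    return [sum(x[k] * x[k - lag] for k in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return a[1..order] such that x[k] is approximated by sum(a[i] * x[k - i])."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        refl = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = refl
        for j in range(1, i):
            new_a[j] = a[j] - refl * a[i - j]
        a = new_a
        err *= 1.0 - refl * refl
    return a[1:]


def tns_analysis(spectrum, coeffs):
    """Encoder side: prediction error from subband to subband (FIR filter)."""
    out = []
    for k in range(len(spectrum)):
        pred = sum(c * spectrum[k - i - 1]
                   for i, c in enumerate(coeffs) if k - i - 1 >= 0)
        out.append(spectrum[k] - pred)
    return out


def tns_synthesis(residual, coeffs):
    """Decoder side: inverse (recursive) filter rebuilds the spectrum."""
    out = []
    for k in range(len(residual)):
        pred = sum(c * out[k - i - 1]
                   for i, c in enumerate(coeffs) if k - i - 1 >= 0)
        out.append(residual[k] + pred)
    return out


# A spectrum that oscillates over frequency (as for a peaky time signal) is predictable.
spectrum = [math.cos(0.4 * k) for k in range(32)]
coeffs = levinson_durbin(autocorrelation(spectrum, 2), 2)
residual = tns_analysis(spectrum, coeffs)
restored = tns_synthesis(residual, coeffs)
```

In a codec the residual would be quantized between the two filters, so the decoder filter would impose the temporal shape of the signal on the quantization noise.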


Extension: Perceptual Noise Substitution (PNS)


Perceptual Noise Substitution (1)

Background:

Parametric coding of signals gives a very compact signal representation

Parametric coding of noise-like signal components has been used widely e.g. in speech coding

Can similar techniques be used in perceptual audio coding?

MPEG-4:

Perceptual Noise Substitution (PNS) permits a frequency selective parametric coding of noise-like signal components


Perceptual Noise Substitution (2)


Perceptual Noise Substitution (3)

Principle:

Noise-like signal components are detected on a scalefactor band basis

Corresponding groups of spectral coefficients are excluded from quantization/coding

Instead, only a "noise substitution flag" plus total power of the substituted band is transmitted in the bitstream

Decoder inserts pseudo random vectors with desired target power as spectral coefficients

Highly compact representation for noise-like spectral components
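The principle above can be sketched for a single scalefactor band. The dictionary layout, the uniform noise source and the function names are assumptions made for illustration; detection of noise-like bands and the actual bitstream syntax are not shown.

```python
import math
import random


def pns_encode_band(coeffs):
    """Replace the band by a flag plus its total power (nothing else is kept)."""
    return {"noise_flag": True, "power": sum(c * c for c in coeffs)}


def pns_decode_band(params, width, rng):
    """Insert a pseudo-random vector scaled to the transmitted total power."""
    noise = [rng.uniform(-1.0, 1.0) for _ in range(width)]
    noise_energy = sum(v * v for v in noise)
    gain = math.sqrt(params["power"] / noise_energy) if noise_energy > 0.0 else 0.0
    return [gain * v for v in noise]
```

The decoded coefficients differ from the originals sample by sample; only the band power is preserved, which is sufficient when the band is perceived as noise.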


Extension: Long Term Prediction


Prediction (1)

Background:

Tone-like signals require much higher coding precision than noise-like signals (e.g. 20 dB vs. 6 dB SNR)

High bit rate necessary to code signals with many tonal components (e.g. Harpsichord, Pitch Pipe)

Stationary tonal signal components are predictable, even in the downsampled subbands

Further quality enhancement by predictive coding


Prediction (2)

MPEG-2 AAC:

Prediction of each spectral coefficient; backward adaptive 2nd order lattice predictor

High complexity (ca. 50% of decoder computation & RAM)

Prohibitive for cost sensitive applications, used in “MPEG-2 Main Profile” only

MPEG-4 AAC:

Long Term Predictor (LTP) as known from speech coding, before MDCT on time domain signal.

New: Integration into perceptual audio coder

Lower complexity: Saving of approx. 50% in terms of computation and memory over MPEG-2 predictors
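A long term predictor in the time domain can be sketched as a search for the lag and gain that best predict the current frame from past samples. The sketch below is a simplified illustration with hypothetical names: it only tries lags of at least one frame length so that the whole predicted segment lies in the history, it only accepts positive gains, and it leaves out quantization of lag and gain as well as the MDCT that follows in the coder.

```python
import math


def ltp_search(history, frame, min_lag, max_lag):
    """Return (lag, gain) of the past segment that best matches `frame`."""
    n = len(frame)
    best_lag, best_gain, best_score = None, 0.0, -1.0
    for lag in range(max(min_lag, n), max_lag + 1):
        start = len(history) - lag
        if start < 0:
            break
        past = history[start:start + n]
        past_energy = sum(p * p for p in past)
        cross = sum(p * f for p, f in zip(past, frame))
        if past_energy == 0.0 or cross <= 0.0:
            continue  # assumption: only positive gains are considered
        score = cross * cross / past_energy
        if score > best_score:
            best_lag, best_gain, best_score = lag, cross / past_energy, score
    return best_lag, best_gain


def ltp_residual(history, frame, lag, gain):
    """Subtract the scaled past segment from the current frame."""
    start = len(history) - lag
    past = history[start:start + len(frame)]
    return [f - gain * p for f, p in zip(frame, past)]
```

For a stationary tonal signal the best lag is a multiple of the period and the residual is close to zero, which is the case the slides describe as benefiting from prediction.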


Long Term Prediction


Transform-Domain Weighted Interleave VQ (1)

Background:

Audio coding at extremely low bitrates (≥ 6 kbit/s)

MPEG-4:

Transform-Domain Weighted Interleave Vector Quantization (TwinVQ) as additional coding kernel

Fully integrated into MPEG-4 AAC coding system:

Uses same spectral representation as AAC coder

Makes use of other MPEG-4 tools (e.g. LTP, TNS, joint stereo coding)

Possible core coder for MPEG-4 scalable coding


Transform-Domain Weighted Interleave VQ (2)

Structure:

Normalization of spectral coefficients:

LPC envelope (overall spectral shape)

Periodic component coding (harmonic components)

Bark-scale envelope coding (additional flattening)

Vector Quantization (VQ) process:

Interleaving of spectral coefficients into new sub-vectors

Vector quantization (two sets of codebooks, weighted distortion measure)

allows distortion control by perceptual model
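The two steps of the VQ process can be sketched as follows. This is a simplified illustration with hypothetical names: the interleaving rule (sub-vector i takes every `n_sub`-th coefficient starting at i) is an assumed layout, and the search uses a single codebook rather than the two sets of codebooks mentioned above. The weights stand in for the perceptual model that controls the distortion.

```python
def interleave(coeffs, n_sub):
    """Sub-vector i collects coefficients i, i + n_sub, i + 2 * n_sub, ..."""
    return [coeffs[i::n_sub] for i in range(n_sub)]


def deinterleave(subvectors):
    """Undo interleave() and restore the original coefficient order."""
    n_sub = len(subvectors)
    out = [0.0] * sum(len(vec) for vec in subvectors)
    for i, vec in enumerate(subvectors):
        out[i::n_sub] = vec
    return out


def weighted_vq(vec, weights, codebook):
    """Return the index of the code vector with the smallest weighted squared error."""
    def distortion(code):
        return sum(w * (v - c) ** 2 for v, c, w in zip(vec, code, weights))
    return min(range(len(codebook)), key=lambda idx: distortion(codebook[idx]))
```

Interleaving spreads neighbouring coefficients over different sub-vectors, so each sub-vector sees a similar mix of low and high frequencies.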


Transform-Domain Weighted Interleave VQ (3)


Conclusions

The “all-round” coding system among the MPEG-4 audio schemes, providing a set of powerful tools

Based on MPEG-2 Advanced Audio Coding kernel

Several enhancements for improved coding efficiency

Perceptual Noise Substitution (PNS): Exploiting noise-like components in the signal

Long Term Prediction (LTP): Taking advantage of very stationary / tonal signals

TwinVQ coder kernel supports audio coding at extremely low data rates (MPEG-4 scalable coding)


MPEG-4 Scalable Audio Coding


Scalable Audio Coding

Embed lower quality (e.g. lower bandwidth) bitstream in higher bandwidth bitstream

Key functionality for MPEG-4 audio

Main types of scalability:

Small step scalability

Enhancement layers of ~ 1 kbit/s

Large step scalability

Enhancement layers of 8 kbit/s and more

All natural audio coding in MPEG-4 supports scalability
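The idea of embedding a lower quality bitstream in a higher quality one can be sketched with successive refinement. This is a generic illustration, not the MPEG-4 scalable syntax: each layer quantizes what the previous layers left over, with step sizes chosen freely for the example, and a decoder may stop after any layer.

```python
def encode_layers(coeffs, steps):
    """Quantize the running residual once per layer, coarsest step first."""
    layers, residual = [], list(coeffs)
    for step in steps:
        indices = [round(r / step) for r in residual]
        layers.append(indices)
        residual = [r - q * step for r, q in zip(residual, indices)]
    return layers


def decode_layers(layers, steps):
    """Sum the contributions of however many layers were received."""
    out = [0.0] * len(layers[0])
    for indices, step in zip(layers, steps):
        out = [o + q * step for o, q in zip(out, indices)]
    return out


def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))
```

Decoding `layers[:1]` gives the base quality, and every additional enhancement layer reduces the reconstruction error further.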


The Scalable Audio Profile

Four levels defined according to

sampling frequency

number of channels / objects

Objects in the scalable audio profile

AAC LC

AAC LTP

AAC Scalable

TwinVQ

CELP

HVXC

TTSI


Speech Coding

HVXC (Harmonic Vector EXcitation Coding)

Bit-rates typically 2 kbps - 4 kbps

Parametric speech coding: High quality for coding of clean speech

Speed change / pitch change capability

CELP (Code Excited Linear Prediction)

Bit-rates typically 6 kbps - 24 kbps

Very flexible configuration possibilities

Support for 8 kHz and 16 kHz sampling


MPEG-4 Low Delay Audio Coding


MPEG-4 Version 2 Low Delay Audio Coding

Target:

High audio and speech quality

Low bitrate

Low algorithmic delay (20 ms)

Solution:

MPEG-4 Version 2 Low Delay Audio Coder:

Derived from MPEG-2/4 "Advanced Audio Coding" (AAC)

Specific modifications for low-delay operation


Delay Sources in Perceptual Audio Coding

Framing delay

Filter bank delay

Look-ahead delay for block switching

Use of bit reservoir

Overall delay:


Example: Delay of AAC Codec (48 kHz / 64 kbps)

Framing delay: 1024 samples

Filter bank delay: 1024 samples

Look-ahead delay for block switching: 576 samples

Use of bit reservoir: 74.7 ms

Overall delay:


Low Delay AAC Codec (48 kHz, min. delay mode)

Reduced filter bank delay: 959 samples

No block switching, hence no look-ahead delay: 0 samples

Minimal bit reservoir: 0...32 bits

Overall delay:
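The delay contributions listed on the two preceding slides can be added up with a few lines of arithmetic. The sketch assumes that the individual contributions simply sum, converts sample counts at 48 kHz to milliseconds, and neglects the 0...32 bit reservoir of the low delay mode; the function names are illustrative.

```python
def samples_to_ms(samples, sample_rate_hz):
    """Convert a delay given in samples to milliseconds."""
    return 1000.0 * samples / sample_rate_hz


def overall_delay_ms(sample_delays, sample_rate_hz, extra_ms=0.0):
    """Assumption: the total is the plain sum of all listed contributions."""
    return sum(samples_to_ms(s, sample_rate_hz) for s in sample_delays) + extra_ms


# AAC at 48 kHz / 64 kbps: framing, filter bank, look-ahead, plus bit reservoir.
aac_delay = overall_delay_ms([1024, 1024, 576], 48000, extra_ms=74.7)

# Low delay AAC at 48 kHz, minimum delay mode: reduced filter bank delay only.
ld_delay = overall_delay_ms([959], 48000)

print(f"AAC: {aac_delay:.1f} ms, low delay AAC: {ld_delay:.1f} ms")
```

Under these assumptions the low delay figure comes out just below the 20 ms target stated for the low delay coder.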


Delay vs. Bitrate


Pre-echo Behavior


Pre-echo Reduction by Window Shape Adaptation


Test Results - Comparison to "MP3"


Summary

The MPEG-4 V2 low-delay coder provides

High audio quality for music and speech

Algorithmic delay of 20 ms enables two-way communications

Audio quality scales with bitrate

Stereo and multi-channel capabilities (inherited from MPEG-2/4 AAC)

Compares well with MP3

Low computational & memory complexity

Error robustness