Audio Coding and MP3

Wolfgang Leister
contributions by: Torbjørn Ekman
Norsk Regnesentral
26-Feb-03


What is Sound?

• Sound waves: 20 Hz - 20 kHz
• Speed: 331.3 m/s (air)
• Wavelength: 16.6 m (at 20 Hz) - 1.66 cm (at 20 kHz)
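The wavelength range follows from the relation wavelength = speed / frequency. The following is a minimal sketch that evaluates it at both ends of the audible range, using the speed of sound quoted on the slide; the function name `wavelength_m` is only illustrative.

```python
SPEED_OF_SOUND_M_PER_S = 331.3  # value quoted on the slide (air)


def wavelength_m(frequency_hz: float,
                 speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Wavelength in metres: lambda = c / f."""
    return speed_m_per_s / frequency_hz


# Evaluate both ends of the audible range.
for f in (20.0, 20_000.0):
    print(f"{f:>8.0f} Hz -> {wavelength_m(f) * 100:.2f} cm")
```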

Analogue audio

• frequencies: 20 Hz - 20 kHz
• mono: $x(t)$ scalar
• stereo: $x(t) = \begin{pmatrix} x_l(t) \\ x_r(t) \end{pmatrix}$

Audio Compression

• small files, low data rate at transmission
• reconstruction must be (as much as possible) equal to the original signal
• redundancy (lossless coding)
• irrelevancy (do not code what you cannot hear)

Data rates

Quality          Sample rate [kHz]  Bit/sample  Channels  Data rate [kb/s]  Frequency [Hz]
Telephone         8.000              8          Mono         64.0           200-3400
AM radio (MW)    11.025              8          Mono         88.2
FM radio (UKW)   22.050             16          Stereo      705.6
CD               44.100             16          Stereo     1411.2           20-20000
DAT              48.000             16          Stereo     1536.0           20-20000
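Each data rate in the table is the product of sample rate, bits per sample and number of channels. The sketch below recomputes a few rows; the helper name `pcm_data_rate_kbps` is an assumption made for illustration, and 1 kb/s is taken as 1000 bit/s.

```python
def pcm_data_rate_kbps(sample_rate_hz: int, bits_per_sample: int,
                       channels: int) -> float:
    """Uncompressed PCM data rate in kb/s (1 kb = 1000 bit)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


# (sample rate in Hz, bits per sample, channels)
formats = {
    "Telephone": (8_000, 8, 1),
    "CD": (44_100, 16, 2),
    "DAT": (48_000, 16, 2),
}

for name, params in formats.items():
    print(f"{name:<10} {pcm_data_rate_kbps(*params):8.1f} kb/s")
```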

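Dynamics compression

• A-Law

$$
S' = \begin{cases}
\operatorname{sign}(S) \cdot \dfrac{A \cdot \operatorname{abs}(S)}{1 + \ln A} & \text{for } \operatorname{abs}(S) \le \dfrac{1}{A} \\[1ex]
\operatorname{sign}(S) \cdot \dfrac{1 + \ln(A \cdot \operatorname{abs}(S))}{1 + \ln A} & \text{else}
\end{cases}
$$

• µ-Law

$$
S' = \operatorname{sign}(S) \cdot \frac{\ln(1 + \mu \cdot \operatorname{abs}(S))}{\ln(1 + \mu)}, \quad \mu = 255
$$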
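As an illustration of dynamics compression, here is a minimal sketch of the usual µ-law and A-law compressor curves for an input S normalised to [-1, 1]. µ = 255 is the value given on the slide; the A-law parameter A = 87.6 is a commonly used value and an assumption here, since the slide does not state it.

```python
import math

MU = 255.0   # from the slide
A = 87.6     # assumption: commonly used A-law parameter, not on the slide


def mu_law(s: float, mu: float = MU) -> float:
    """S' = sign(S) * ln(1 + mu*|S|) / ln(1 + mu), for |S| <= 1."""
    return math.copysign(math.log1p(mu * abs(s)) / math.log1p(mu), s)


def a_law(s: float, a: float = A) -> float:
    """Linear segment for |S| <= 1/A, logarithmic segment otherwise."""
    x = abs(s)
    if x <= 1.0 / a:
        y = a * x / (1.0 + math.log(a))
    else:
        y = (1.0 + math.log(a * x)) / (1.0 + math.log(a))
    return math.copysign(y, s)


# Small amplitudes are expanded, full scale maps to full scale.
for s in (0.01, 0.1, 1.0):
    print(f"S={s:5.2f}  mu-law={mu_law(s):.3f}  A-law={a_law(s):.3f}")
```

Both curves give quiet passages a larger share of the output range, which is why 8-bit telephone PCM remains usable for speech.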


Masking

• Threshold for the human ear
• Threshold changes:
  • with neighbouring frequencies (example: 0.5, 1, 4, 8 kHz)
  • in time

Sampling

• When $x(t)$ is bandwidth limited: $f > \omega \Rightarrow x(f) = 0$
• then $x(t) = \sum_{n=-\infty}^{\infty} x[n] \, g(t - n \cdot \Delta t)$
• with $\Delta t = \frac{1}{f_s} < \frac{1}{2\omega}$, $x[n] = x(n \cdot \Delta t)$, $g(t) = \frac{\sin(2\pi\omega t)}{2\pi\omega t}$
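The reconstruction sum can be tried numerically. The sketch below samples a 3 Hz sine at 20 Hz and rebuilds its value between two sample instants with the kernel g(t) = sin(2πωt)/(2πωt). It assumes the limiting case ω = 1/(2Δt) and truncates the infinite sum to 2001 samples, so the result is only approximate; all names are illustrative.

```python
import math


def sinc_kernel(t: float, omega: float) -> float:
    """g(t) = sin(2*pi*omega*t) / (2*pi*omega*t), with g(0) = 1."""
    arg = 2.0 * math.pi * omega * t
    return 1.0 if arg == 0.0 else math.sin(arg) / arg


def reconstruct(samples, first_index: int, dt: float, t: float) -> float:
    """Truncated version of x(t) = sum_n x[n] * g(t - n*dt).

    Assumption: omega is set to the limiting value 1 / (2*dt).
    """
    omega = 1.0 / (2.0 * dt)
    return sum(x_n * sinc_kernel(t - (first_index + i) * dt, omega)
               for i, x_n in enumerate(samples))


f0 = 3.0            # signal frequency in Hz, well below fs / 2
fs = 20.0           # sampling frequency in Hz
dt = 1.0 / fs
n_min, n_max = -1000, 1000
samples = [math.sin(2 * math.pi * f0 * n * dt) for n in range(n_min, n_max + 1)]

t = 0.0123          # a time between two sample instants
approx = reconstruct(samples, n_min, dt, t)
exact = math.sin(2 * math.pi * f0 * t)
print(f"reconstructed {approx:.5f}, exact {exact:.5f}")
```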

Quantisation

• $x \rightarrow Q(x)$
• $k$ bits $\Rightarrow L = 2^k$ representations
• representatives $\{y_1, \ldots, y_n\}$
• $|x - y_i| \le |x - y_j|$ for all $j$ $\Rightarrow Q(x) = y_i$
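The nearest-representative rule can be sketched in a few lines. The example assumes a uniform quantiser on [-1, 1] with L = 2^k mid-point levels; the slide does not prescribe how the representatives are placed, so `make_levels` is only one possible choice.

```python
def make_levels(k: int, lo: float = -1.0, hi: float = 1.0) -> list:
    """L = 2**k uniformly spaced representatives on [lo, hi] (assumed layout)."""
    count = 2 ** k
    step = (hi - lo) / count
    return [lo + (i + 0.5) * step for i in range(count)]


def quantise(x: float, levels: list) -> float:
    """Q(x) = the representative y_i with the smallest distance |x - y_i|."""
    return min(levels, key=lambda y: abs(x - y))


levels = make_levels(3)          # 3 bits -> 8 representatives
print(levels)
print(quantise(0.1, levels))     # nearest level to 0.1
print(quantise(5.0, levels))     # out-of-range input goes to the top level
```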

PCM = Pulse Code Modulation

• Sampling: $\{x(t)\} \rightarrow \{x[n]\}$
• Quantisation: $\{x[n]\} \rightarrow \{Q(x[n])\}$
• Coding: $\{Q(x[n])\} \rightarrow \{i_n\}$
• Play: $y(t) = \sum_n Q(x[n]) \cdot g(t - n \cdot \Delta t)$

[Slide labels: redundancy, irrelevancy]

Stereo CD Audio

• Data rate: $2 \cdot 16\,\text{bit} \cdot 44.1 \cdot 10^3\,\text{s}^{-1} = 1411.2 \cdot 10^3\,\text{bit/s}$

MPEG compression factors

• MPEG 1 Audio: PCM 32, 44.1, 48 kHz, max 448 kBit/s
• MPEG 2 Audio: PCM 16, 22.05, 24, 32, 44.1, 48 kHz, max 384 kBit/s

MPEG Audio Layer I,II,III

• Layer I
• Layer II ⇒ Digital TV
• Layer III ⇒ MP3

MP3 - MPEG 1 Audio Layer 3

• Sampling: 16 kHz - 48 kHz
• Bit rate: 32 kb/s - 192 kb/s (CD Audio: 44.1 kHz, 1411 kb/s)
• www.iis.fhg.de/amm/gallery/index.html
• Karlheinz Brandenburg: “MP3 and AAC explained”, http://www.exp-math.uni-essen.de/~dreibh/diplom/bra99.pdf


perceptual encoding / decoding


Filterbank


Ideal sub-band coder

• impossible: ideal sub-band coder
• downsampling ⇒ aliasing
• possible: “nearly perfect”

$$
H_m(f) = \begin{cases} 1 & \text{for } f \in D_m,\ m = 1, \ldots, M \\ 0 & \text{else} \end{cases}
$$

Downsampling

• from $M \cdot f_s$ back to $f_s$
• sub-bandwidth $B$, upper frequency is a multiple of $B$
• can sample at $f_s = 2B$ (instead of $M \cdot f_s = 2 \cdot M \cdot B$)
• $x_m[n] \rightarrow \; \downarrow M \; \rightarrow y_m[k]$, with $y_m[k] = x_m[k \cdot M]$
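Downsampling by M keeps every M-th sample, y[k] = x[k·M]. The sketch below shows the index mapping on a ramp and then the aliasing this causes when the input is not band-limited beforehand: a tone at 3/8 cycles per sample becomes indistinguishable from one at 1/4 cycles per sample after downsampling by 2. The test signals are made up for illustration.

```python
import math


def downsample(x: list, m: int) -> list:
    """Return y with y[k] = x[k * m], i.e. keep every m-th sample."""
    return x[::m]


# A ramp makes the index mapping visible.
ramp = list(range(12))
print(downsample(ramp, 4))  # [0, 4, 8]

# Aliasing: with M = 2 the new Nyquist limit is 1/4 cycles per old sample,
# so a tone at 3/8 cycles per sample folds down.
tone = [math.cos(2 * math.pi * (3 / 8) * n) for n in range(64)]
folded = downsample(tone, 2)

# The folded tone sits at 3/4 cycles per new sample, which equals
# 1/4 cycles per new sample for a sampled cosine.
alias = [math.cos(2 * math.pi * (1 / 4) * k) for k in range(len(folded))]
max_diff = max(abs(a - b) for a, b in zip(folded, alias))
print(f"largest difference to the alias tone: {max_diff:.2e}")
```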

Filterbank in MPEG-1 audio layer 1-3

• Polyphase filterbank
• 32 subbands
• 512 tap FIR filters
• 80 additions and multiplications per output
• Equal width
• Not perfect reconstruction
• Frequency overlap

A closer look

• The subbands overlap at 3 dB to the adjacent bands.
• The leakage to the other bands is small.
• The total response almost adds up to one (0 dB).

White noise

• The white noise is run through the filterbank.
  • The samples from each band are played in the order of the subbands.
• The subsampled filtered sequence.
  • The samples from each band are played in the order of the subbands.
• The reconstruction error is -84 dB.

Nonideal filterbanks

• In a perfect filterbank the first part is the only part.
• The second part consists of the aliasing terms.
• The filterbank is designed so that the aliasing is small.

$$
Y(e^{j\omega}) =
\underbrace{X(e^{j\omega}) \, \frac{1}{M} \sum_{k=0}^{M-1} H_k^R(e^{j\omega}) \, H_k^A(e^{j\omega})}_{1}
+ \underbrace{\frac{1}{M} \sum_{n=1}^{M-1} X\!\left(e^{j(\omega - 2\pi n / M)}\right) \sum_{k=0}^{M-1} H_k^R(e^{j\omega}) \, H_k^A\!\left(e^{j(\omega - 2\pi n / M)}\right)}_{2}
$$

Tubthumper, a time domain view

The red line is the reconstruction error after splitting the signal into subbands, downsampling and applying the synthesis filterbank. The reconstruction error is -84 dB and sounds like [audio example not included].

Tubthumper, frequency view

[Figure: subband spectra, panels "No subsampling" and "Subsampled 32 times"]

Subband                   1     2     4     8     16    32
Center frequency [kHz]    0.3   1.0   2.4   5.2   10.7  21.7

Filterbank MPEG

[Figure: the polyphase filterbank splits the signal into subbands (band 1, band 2, ..., band 31); each band is grouped into blocks of 12 samples, three blocks shown per band]

• Layer I frame: 384 samples
• Layer II/III frame: 1152 samples
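The frame sizes follow from the block structure: 32 subbands with 12 samples each give a Layer I frame, and three such blocks per band give a Layer II/III frame. The sketch below checks this arithmetic and converts the frame sizes to durations; the 48 kHz sample rate is just an example value.

```python
SUBBANDS = 32            # number of filterbank subbands
SAMPLES_PER_BLOCK = 12   # samples per subband in one block

layer1_frame = SUBBANDS * SAMPLES_PER_BLOCK   # one block per band
layer23_frame = 3 * layer1_frame              # three blocks per band


def frame_duration_ms(frame_samples: int, sample_rate_hz: int) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_samples / sample_rate_hz


print(layer1_frame, layer23_frame)  # 384 1152
for frame in (layer1_frame, layer23_frame):
    print(f"{frame} samples at 48 kHz: {frame_duration_ms(frame, 48_000):.1f} ms")
```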

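Critical Bands

• Heinrich Barkhausen (1881-1956)
• psycho-acoustic
• width measured in bark

$$
\mathrm{bark}(f) = \begin{cases} f / 100 & \text{for } f < 500 \\ 9 + 4 \cdot \log_2(f / 1000) & \text{else} \end{cases}
$$

with $f$ in Hz. The logarithm is taken to base 2, so that both branches give 5 bark at $f = 500$.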
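The piecewise bark approximation is easy to evaluate. The sketch below assumes the logarithm is base 2, which is the choice that makes the two branches meet at 500 Hz; the function name `hz_to_bark` is illustrative.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Piecewise approximation of the bark scale.

    Linear below 500 Hz, logarithmic (base 2, an assumption) above.
    """
    if f_hz < 500.0:
        return f_hz / 100.0
    return 9.0 + 4.0 * math.log2(f_hz / 1000.0)


for f in (100, 500, 1000, 2000, 16000):
    print(f"{f:>6} Hz -> {hz_to_bark(f):5.1f} bark")
```

Above 500 Hz each doubling of frequency adds 4 bark, so the critical bands become wider in Hz as the frequency rises.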

MPEG - Sub bands

• Layer I: 32 bands, 625 Hz each, Fourier transform
• Layer II: 32 bands, three frames, time masking
• Layer III: Division according to critical bands

MPEG masking

• Psycho-acoustic model
• masking of neighbouring bands
• signals are coded when above masking threshold
• MUSICAM (Masking-pattern adapted Universal Subband Integrated Coding and Multiplexing)
• Layer I: simplified, Layer II: entirely, Layer III: with other methods

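Example: Masking MPEG Audio

band      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
level     1   8  12  10   6   2  10  60  35  20  15   2   3   5   3   1
masking   ?   ?   ?   ?   ?   ?  12   x  15   ?   ?   ?   ?   ?   ?   ?
coding    ?   ?   ?   ?   ?   ?   -   x   x   ?   ?   ?   ?   ?   ?   ?

The strong signal in band 8 (level 60) masks its neighbours: band 7 (level 10, below the masking threshold 12) is not coded, while band 9 (level 35, above the masking threshold 15) is coded.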


MPEG-1 Layer 3 encoder

MP3

• Filter bank - sub bands
• Series MDCT
• fine grain frequency resolution
• non-uniform quantisation
• perception model
• Huffman coding

MP3 (vs. Layer I/II)

• modified DCT (Series MDCT vs. FFT)
• critical bands
• Huffman coding
• entropy reduction
• dynamics compression
• difference and sum of stereo signals

MPEG Audio Layer I,II,III

• Layer I: 19 ms delay, FFT, 384 samples, frequency masking, equal bands
• Layer II: 35 ms delay, FFT, 1152 samples, frequency masking, time simulated, equal bands
• Layer III: 59 ms delay, DCT, 1152 samples, frequency and time masking, bands as in bark scale

MPEG Layer I, II, III

Format            Subj. quality  Bit rate [kbit/s]  Compression  1 min audio
Audio CD          CD             1400               1:1          10.58 MB
MPEG1 Layer I     CD             384                3.6:1        2.88 MB
MPEG1 Layer II    CD             256                5.5:1        1.92 MB
MPEG1 Layer III   CD             128                11:1         962 kB
MPEG2 Layer III   Radio          64                 22:1         481 kB
MPEG2 Layer III   Telephone      16                 88:1         120 kB
CS-ACELP          Speech         5.3                264:1        40 kB
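The last two columns can be derived from the bit rate. The sketch below computes the size of one minute of audio and the compression factor relative to CD audio; it assumes 1 kB = 1000 bytes and a CD reference rate of 1411.2 kbit/s, so small rounding differences to the table are expected.

```python
CD_RATE_KBPS = 1411.2  # 2 channels * 16 bit * 44.1 kHz


def size_kb_per_minute(bitrate_kbps: float) -> float:
    """Size of one minute of audio in kB (assumes 1 kB = 1000 bytes)."""
    return bitrate_kbps * 60 / 8


def compression_ratio(bitrate_kbps: float,
                      reference_kbps: float = CD_RATE_KBPS) -> float:
    """Compression factor relative to the uncompressed reference rate."""
    return reference_kbps / bitrate_kbps


for name, rate in [("MPEG1 Layer I", 384), ("MPEG1 Layer II", 256),
                   ("MPEG1 Layer III", 128), ("MPEG2 Layer III", 16)]:
    print(f"{name:<16} {size_kb_per_minute(rate):7.0f} kB/min  "
          f"{compression_ratio(rate):5.1f}:1")
```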


MPEG-2 AAC

Audio Formats

• PCM - Pulse Code Modulation
  ITU G.711; speech data 4 kHz bandwidth, 64 kb/s data rate
• ADPCM (Adaptive Differential PCM)
  ITU G.726, G.727; 16, 24, 32, 40 kBit/s. Standard for CCITT G.721
• SB-ADPCM (Sub-Band ADPCM)
  ISDN, G.722; 7 kHz bandwidth in 64 kBit/s streams

Audio Formats

• AIFF - Audio Interchange File Format
  Apple (extension from IFF by Electronic Arts)
• Wave (by Microsoft and IBM)
  Part of RIFF (Resource Interchange File Format)
• NeXT/Sun Audio File Format
  ! big endian

Proprietary Audio Formats

• AT&T Proprietary Compression Algorithm
• EPAC (Bell Labs)
• Microsoft Windows Media Audio (WMA)
• AC-3 Audio Code No. 3 - Dolby Digital Surround

Speech compression formats

• GSM 06.10: 160 13-bit values are compressed into 260 bits (33 bytes); at 8000 samples/s this results in a data rate of 1650 bytes/s
• CELP (Code Excited Linear Prediction): analytical model
• LD-CELP (Low Delay CELP): G.728
• LPC-10E (Linear Prediction Coder, Enhanced): military coder, analytical model, 2.4 kBit/s; understandable, but low quality
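The GSM figures can be checked with a short calculation: a frame of 160 samples at 8000 samples/s lasts 20 ms, so 50 frames are sent per second. The sketch below only restates the numbers from the slide; storing each 260-bit frame in 33 whole bytes is taken from the slide as well.

```python
import math

SAMPLE_RATE_HZ = 8000
SAMPLES_PER_FRAME = 160
BITS_PER_INPUT_SAMPLE = 13
BITS_PER_CODED_FRAME = 260

frames_per_second = SAMPLE_RATE_HZ // SAMPLES_PER_FRAME          # 50 frames/s
bytes_per_frame = math.ceil(BITS_PER_CODED_FRAME / 8)            # 260 bits fit in 33 bytes
byte_rate = bytes_per_frame * frames_per_second                  # bytes per second
net_bit_rate = BITS_PER_CODED_FRAME * frames_per_second          # coded bits per second
input_bits_per_frame = SAMPLES_PER_FRAME * BITS_PER_INPUT_SAMPLE
compression = input_bits_per_frame / BITS_PER_CODED_FRAME

print(f"{frames_per_second} frames/s, {byte_rate} bytes/s, "
      f"{net_bit_rate} bit/s, compression {compression:.0f}:1")
```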


End of Part

Thank you for your attention!