ESE250: Digital Audio Basics
Week 7, February 23, 2012
Psychoacoustic Compression
ESE 250 – S’12 Kod & DeHon



Course Map

[Figure: course map; numbers correspond to course weeks (2, 5, 6, 11, 12, 13)]


Today: audio signal processing – putting it all together



Today’s Agenda

• How do we compress from the WAV bit rate (per channel) of ~700 kbps @ 44.1 kHz
• down to the MP3 target (per channel) of ~60 kbps @ 44.1 kHz?
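The size of this gap can be checked with a little arithmetic. The sketch below assumes the WAV stream is 16-bit PCM per channel. That is an assumption, because the slide gives only the ~700 kbps figure. It computes the uncompressed rate, the compression ratio needed, and the average bits per sample left at the MP3 target.

```python
# Per-channel bit rates for the agenda question.
# Assumption: WAV audio is 16-bit PCM; the slide states only ~700 kbps.
SAMPLE_RATE_HZ = 44_100      # 44.1 kHz, as on the slide
BITS_PER_SAMPLE = 16         # assumed WAV sample width

def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Uncompressed PCM bit rate for one channel, in kbit/s."""
    return sample_rate_hz * bits_per_sample / 1000.0

wav_kbps = pcm_bit_rate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
mp3_target_kbps = 60.0                      # slide's MP3 target per channel
ratio = wav_kbps / mp3_target_kbps          # how much we must squeeze

# Average bits per sample the coder may spend to hit the target.
mp3_bits_per_sample = mp3_target_kbps * 1000.0 / SAMPLE_RATE_HZ

print(f"WAV: {wav_kbps:.1f} kbps, ratio: {ratio:.1f}x, "
      f"MP3 budget: {mp3_bits_per_sample:.2f} bits/sample")
```

Under this assumption, the coder must spend well under 2 bits per sample on average, instead of 16.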


Where are we?
• Week 2: received signal is sampled & quantized, q = PCM[ r ]
• Week 3: quantized signal is coded, c = code[ q ]
• Week 4: sampled signal is first transformed into the frequency domain, Q = DFT[ q ]
• Week 5: signal is oversampled & low-pass filtered, Q = LPF[ DFT(q + n) ]
• Week 6: transformed signal is analyzed using human psychoacoustic models
• Week 7: acoustically interesting signal is “perceptually coded”, C = MP3[ Q ]

[Figure: processing chain r(t) → Oversample → q + n → DFT → Q + N → LPF → Q → Perceptual Coding → C → Store / Transmit → Decode → Produce → p(t), with stages labeled by course week (Weeks 3 to 6)]

[Painter & Spanias. Proc.IEEE, 88(4):451–512, 2000]



Week 5 Review (Oversampling)
• Audio-relevant signal q(t):
  o time window of $T_0$ seconds per “block”
  o Nyquist sample period $T_A$, ideally giving $n_A = T_0 / T_A$ samples per block:
    $q = (q_{-n_A}, \dots, q_{-2}, q_{-1}, q_0, q_1, q_2, \dots, q_{n_A}) = \mathrm{PCM}[q(t)]$
• Ambient noise n(t): Nyquist sample period $T_N \ll T_A$
• Receive the signal in a block with $n_S = T_0 / T_N \gg n_A = T_0 / T_A$ samples; ultimately, record
  $r = (r_{-n_S}, \dots, r_{-2}, r_{-1}, r_0, r_1, r_2, \dots, r_{n_S}) = \mathrm{PCM}[r(t)] = \mathrm{PCM}[q(t) + n(t)] = q + n$
  where $q + n = (q_{-n_S} + n_{-n_S}, \dots, q_{-1} + n_{-1}, q_0 + n_0, q_1 + n_1, \dots, q_{n_S} + n_{n_S})$
[Figure: q(t), n(t), and r(t) = q(t) + n(t) over one block, t ∈ [−T0/2, T0/2], with sample periods T_A and T_N marked]


Week 5 Review (Anti-Aliasing)
• Given r, compute the frequency-domain representation
  $R = \mathrm{DFT}[r] = (R_{-n_S}, \dots, R_{-1}, R_0, R_1, \dots, R_{n_S}) = (Q_{-n_S} + N_{-n_S}, \dots, Q_{-1} + N_{-1}, Q_0 + N_0, Q_1 + N_1, \dots, Q_{n_S} + N_{n_S})$
• Introduce assumptions about frequency content:
  $|k| > n_A \Rightarrow Q_k = 0$, and $|k| < n_A \Rightarrow N_k = 0$
• to realize
  $R = (0, \dots, 0, Q_{-n_A}, \dots, Q_{-1}, Q_0, Q_1, \dots, Q_{n_A}, 0, \dots, 0) + (N_{-n_S}, \dots, N_{-n_A}, 0, \dots, 0, N_{n_A}, \dots, N_{n_S})$
• Low-pass filter:
  $Q = (Q_{-n_A}, \dots, Q_{-1}, Q_0, Q_1, \dots, Q_{n_A})$
• Bit count $= n_A \cdot 32$:
  o $T_0 \approx 1/44$ s $\Rightarrow n_A \approx 1000$
  o $T_0 \approx 1/88$ s $\Rightarrow n_A \approx 500$
[Figure: spectrum over k from −n_S to n_S after the DFT, and after the LPF, which keeps only −n_A ≤ k ≤ n_A]
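The DFT and low-pass steps can be sketched numerically. This is a minimal illustration, not course code. It uses a naive O(n²) DFT with centered indices k = −n_S, …, n_S, and it models the LPF as an ideal brick-wall filter that keeps |k| ≤ n_A. The helper names `centered_dft` and `ideal_lpf` and the sizes n_S = 8, n_A = 3 are hypothetical choices for the example.

```python
import cmath
import math

def centered_dft(x):
    """Naive DFT of samples x[-n..n] (a list of length 2n+1),
    returning coefficients X[-n..n] in the same centered layout."""
    size = len(x)
    n = (size - 1) // 2
    out = []
    for k in range(-n, n + 1):
        acc = 0j
        for t in range(-n, n + 1):
            acc += x[t + n] * cmath.exp(-2j * math.pi * k * t / size)
        out.append(acc)
    return out

def ideal_lpf(X, n_a):
    """Keep only the centered bins |k| <= n_a (the Q part); drop the rest."""
    n = (len(X) - 1) // 2
    return [X[k + n] for k in range(-n_a, n_a + 1)]

# Hypothetical block: an in-band tone at bin 2 plus out-of-band "noise" at bin 7.
n_s, n_a = 8, 3
size = 2 * n_s + 1
r = [math.cos(2 * math.pi * 2 * t / size) + 0.3 * math.cos(2 * math.pi * 7 * t / size)
     for t in range(-n_s, n_s + 1)]

R = centered_dft(r)      # R = Q + N, over k = -n_s..n_s
Q = ideal_lpf(R, n_a)    # only the in-band coefficients survive
print(len(Q), "in-band coefficients kept")
```

The bin-7 component is present in `R` but absent from `Q`. This is the assumption "|k| > n_A ⇒ Q_k = 0" at work: everything above n_A is treated as noise.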


Week 6 Review (Hearing Model)

• Power Spectrum Model of Hearing:
  o Critical bands: the auditory system contains a finite array of adaptively tunable, overlapping bandpass filters
  o Frequency bins: humans process a signal’s component (against a noisy background) in the one filter with the closest center frequency
  o Masking: certain signal components in a given band are “favored” and others are filtered out

• Established through decades of psychoacoustic experiments

• Model underlying today’s algorithmic thinking

B.C.J. Moore. Int.Rev.Neurobiol., 70:49–86, 2005.
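The “frequency bins” idea says a component is processed by the one filter whose center frequency is closest. That can be sketched as a nearest-neighbor lookup. The center frequencies below are a short hypothetical list for illustration, not a psychoacoustic table. This simple lookup also ignores the overlap and adaptive tuning of the real filters.

```python
import bisect

def closest_filter(freq_hz, centers_hz):
    """Index of the filter whose center frequency is nearest to freq_hz.
    centers_hz must be sorted in increasing order."""
    i = bisect.bisect_left(centers_hz, freq_hz)
    if i == 0:
        return 0
    if i == len(centers_hz):
        return len(centers_hz) - 1
    # Compare the neighbors on either side of the insertion point.
    below, above = centers_hz[i - 1], centers_hz[i]
    return i - 1 if freq_hz - below <= above - freq_hz else i

# Hypothetical filter-bank centers (Hz), for illustration only.
CENTERS = [100, 250, 450, 700, 1000, 1400, 2000, 3000, 4500, 7000]

for f in (90, 600, 1150, 9000):
    print(f, "Hz -> filter", closest_filter(f, CENTERS))
```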


Today: Critical Band Assignments
• Acquire & transform a “frame”
• Assign frequencies to bands: use a psychoacoustic-model lookup to determine the frequency bandwidth of each critical band
[Figure: magnitude spectrum |Q| over k from −n_S to n_S; a lookup table (LUT) groups the coefficients Q_1, Q_2, …, Q_k, …, Q_{n_A} into critical bands 1, 2, …, k, …, 22]
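The lookup step can be approximated in code. This sketch swaps in a stand-in for the course’s lookup table. It maps each DFT bin’s frequency to a critical-band index using Zwicker and Terhardt’s analytic Bark approximation, z(f) = 13·arctan(0.00076 f) + 3.5·arctan((f/7500)²), and takes the band as floor(z). The frame length of 1024 samples at 44.1 kHz is hypothetical, so the band count need not match the slide’s figure.

```python
import math
from collections import defaultdict

def bark(freq_hz: float) -> float:
    """Zwicker & Terhardt approximation of critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def assign_bins_to_bands(n_bins: int, sample_rate_hz: float, frame_len: int):
    """Group DFT bins 1..n_bins into critical bands keyed by floor(Bark)."""
    bands = defaultdict(list)
    for k in range(1, n_bins + 1):
        f = k * sample_rate_hz / frame_len      # center frequency of bin k
        bands[int(bark(f))].append(k)
    return dict(bands)

# Hypothetical frame: 1024 samples at 44.1 kHz, positive-frequency bins only.
bands = assign_bins_to_bands(n_bins=511, sample_rate_hz=44_100, frame_len=1024)
for z in sorted(bands)[:4]:
    print("band", z, "holds bins", bands[z][0], "to", bands[z][-1])
print("number of bands:", len(bands))
```

Low bands hold only a few bins and high bands hold many. This matches the slide’s point that critical-band bandwidth widens with frequency.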


Today: Code (Compress) Frequencies
• Use the psychoacoustic model
• to minimize bits per critical band: $C_{\text{band }k}(j) = \mathrm{Round}[Q_{\text{band }k}(j), \text{Level}]$
• by appeal to “masking” models: $Q_{\text{band }k} = C_{\text{band }k} + D_{\text{band }k}$, where the distortion (“perceptual noise”) $D_{\text{band }k}$ between the retained signal $C_{\text{band }k}$ and the actual signal $Q_{\text{band }k}$ should be “masked” by the retained signal
[Figure: per-band lossy coding]
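The split Q = C + D can be illustrated with a uniform quantizer. This is a minimal sketch under an assumed reading of Round[·, Level], since the slides do not define it: values are rounded to multiples of a step equal to the band’s peak amplitude divided by 2^(bits − 1). The band amplitudes are hypothetical.

```python
def round_to_level(values, bits):
    """Quantize reals to multiples of a step set by the band peak and bit count.
    (Assumed reading of Round[Q, Level]; not a definition from the slides.)"""
    peak = max(abs(v) for v in values) or 1.0
    step = peak / 2 ** (bits - 1)
    return [step * round(v / step) for v in values], step

def split_signal(band, bits):
    """Return retained signal C, distortion D, and step, with band == C + D."""
    c, step = round_to_level(band, bits)
    d = [q - r for q, r in zip(band, c)]
    return c, d, step

band_k = [0.12, -0.47, 0.93, 0.31, -0.08]    # hypothetical band amplitudes
for bits in (1, 2, 4):
    c, d, step = split_signal(band_k, bits)
    noise_energy = sum(x * x for x in d)
    print(bits, "bit(s): noise energy", round(noise_energy, 4))
```

Each extra bit halves the step, so the distortion D shrinks. The masking model then decides when D is small enough to stop spending bits.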


• E.g., look at the kth band, $Q_{\text{band }k} = (Q_{\text{band }k}(1), \dots, Q_{\text{band }k}(m))$, with amplitudes represented as reals
• Determine the masking paradigm: tone-masking-noise, noise-masking-tone, or noise-masking-noise
• E.g., for a tone masker, pick the tone frequency, band k (j), with maximal amplitude in the band
• Choose a quantization level and compute the compressed signal
  $C_{\text{band }k}(j) = \mathrm{Round}[Q_{\text{band }k}(j), \text{Level}]$, with $C_{\text{band }k} = (0, \dots, 0, C_{\text{band }k}(j), 0, \dots, 0)$
• Assess the noise magnitude $|D_{\text{band }k}|$, where
  $D_{\text{band }k} = (Q_{\text{band }k}(1), \dots, Q_{\text{band }k}(j), \dots, Q_{\text{band }k}(m)) - (0, \dots, C_{\text{band }k}(j), \dots, 0)$
• Use the psychoacoustic model to determine whether the compressed signal will mask the distortion noise for that band
[Figure: Single Band. SPL vs. frequency for the real input, the 1-bit and 2-bit retained signals, and the corresponding |Noise|; the SMR for 2 bits exceeds the SMR for 1 bit: more bits yield a larger signal-to-mask ratio]
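The single-band procedure can be sketched end to end for the tone-masking-noise case. The numbers here are hypothetical: the band amplitudes, the quantizer’s full-scale range `FULL_SCALE`, and above all the masking rule. That rule assumes the masker hides any in-band distortion whose energy sits at least `OFFSET_DB` below the retained masker’s energy. Real psychoacoustic models compute this threshold far more carefully.

```python
import math

FULL_SCALE = 1.0   # hypothetical amplitude range of the quantizer
OFFSET_DB = 20.0   # hypothetical rule: masker hides noise at least this many dB below it

def code_tone_masker(band, bits):
    """Retain only the peak component j, quantized with `bits` bits; zero the rest."""
    j = max(range(len(band)), key=lambda i: abs(band[i]))
    step = FULL_SCALE / 2 ** (bits - 1)
    c = [0.0] * len(band)
    c[j] = step * round(band[j] / step)
    return c

def masker_to_noise_db(band, c):
    """Energy of the retained masker C over energy of the distortion D = Q - C, in dB."""
    noise = sum((q - r) ** 2 for q, r in zip(band, c))
    masker = sum(r * r for r in c)
    if masker == 0.0:
        return -math.inf
    if noise == 0.0:
        return math.inf
    return 10 * math.log10(masker / noise)

def min_bits_to_mask(band, max_bits=8):
    """Smallest bit count whose distortion the (hypothetical) masking rule accepts."""
    for bits in range(1, max_bits + 1):
        if masker_to_noise_db(band, code_tone_masker(band, bits)) >= OFFSET_DB:
            return bits
    return None

band_k = [0.01, 0.02, 0.83, 0.015, 0.005]   # hypothetical band with one tonal peak
for bits in (1, 2, 3, 4):
    ratio = masker_to_noise_db(band_k, code_tone_masker(band_k, bits))
    print(bits, "bit(s):", round(ratio, 1), "dB")
print("bits needed:", min_bits_to_mask(band_k))
```

As in the figure, spending more bits on the masker raises its margin over the distortion until the masking rule is satisfied.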


Overview of Perceptual Coding
• Goals
  o digitally represent a signal with the minimum number of bits
  o “transparent” reproduction: the most sensitive human cannot distinguish between the original and the generated signal
• Perceptual Entropy: using a psychoacoustic model to estimate the information content of audio signals [J. D. Johnston. IEEE J. Sel. Ar. Comm., 6(2):314–323, 1988]
  o suggested transparency is achievable at 2 bits per sample, or 88 kbps @ 44.1 kHz
• Our present (WAV) bit count = 32 bits per sample
  o $T_0 \approx 1/44$ s $\Rightarrow n_A \approx 1000 \Rightarrow$ bits per frame $\approx 32 \cdot 1000 = 32$ kbit; at 44 frames/s, $\approx 1.4 \cdot 10^3$ kbps
  o $T_0 \approx 1/88$ s $\Rightarrow n_A \approx 500 \Rightarrow$ bits per frame $\approx 32 \cdot 500 = 16$ kbit; at 88 frames/s, $\approx 1.4 \cdot 10^3$ kbps
• Our leverage?

[Painter & Spanias. Proc.IEEE, 88(4):451–512, 2000]
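The bit counts above follow from a few multiplications. The sketch below reproduces them from the slide’s values: 32 bits per sample at 44.1 kHz, with frames of T0 = 1/44 s and T0 = 1/88 s. It also shows the leverage implied by Johnston’s 2-bits-per-sample estimate. The function name `frame_budget` is illustrative.

```python
SAMPLE_RATE_HZ = 44_100
WAV_BITS_PER_SAMPLE = 32        # slide's present WAV bit count
PE_BITS_PER_SAMPLE = 2          # Johnston's perceptual-entropy estimate

def frame_budget(frames_per_second: int, bits_per_sample: int):
    """Return (samples per frame, bits per frame, bit rate in kbps)."""
    samples = SAMPLE_RATE_HZ / frames_per_second
    bits = samples * bits_per_sample
    return samples, bits, bits * frames_per_second / 1000.0

for fps in (44, 88):
    n_a, bits, kbps = frame_budget(fps, WAV_BITS_PER_SAMPLE)
    print(f"T0 = 1/{fps} s: n_A ~ {n_a:.0f}, {bits / 1000:.1f} kbit/frame, {kbps:.0f} kbps")

wav_kbps = frame_budget(44, WAV_BITS_PER_SAMPLE)[2]
pe_kbps = frame_budget(44, PE_BITS_PER_SAMPLE)[2]
print(f"transparent target ~ {pe_kbps:.1f} kbps, leverage ~ {wav_kbps / pe_kbps:.0f}x")
```

Frame length does not change the bit rate. Only the bits per sample do, which is why the leverage comes from spending fewer bits per sample.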


What are the “Knobs”?
• “Reservoir”: are all frames equally full of audio information?
• Masking: how should we exploit the perceptual model?
• Local decoupling: can we exploit the (rough) independence of each band?
• Global accounting: how should we re-impose the average frame rate?


MP3 Encoder Design Strategy
• Target: ~$10^3 \approx 2^{10}$ bits per frame
  o ~2 bits per sample
  o ~512 samples per frame
• Use the masker (signal)-masking-maskee (noise) paradigm
  o assume 1 masker “costs” ~$2^K$ bits $\Rightarrow$ the retained signal C should consist of ~$2^{10-K}$ maskers on average
  o with specified amplitudes
  o at specified frequencies
  o allocated to some subset of the ~$32 = 2^5$ critical bands
• Rough algorithm for computing the retained signal:
  (i) the frame bit reservoir supplies bits to each critical band
  (ii) commit to masker(s) within each band
  (iii) attempt to mask within-band distortion with the available bits
  (iv) each band gives bits back to, or takes more from, the reservoir
  (v) iterate
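Steps (i) to (v) can be sketched as a small redistribution loop. This illustrates the idea and is not the MP3 standard’s procedure. Each band’s demand is a hypothetical input: the number of bits it needs to mask its own distortion (steps ii and iii). The reservoir hands out equal shares. Bands take only what they need, and any unused share stays in the pool for bands that still need more.

```python
def allocate_bits(frame_budget, demands):
    """Hand out a frame's bits to critical bands from a shared reservoir.

    demands[i] is the (hypothetical) number of bits band i needs to mask its
    own distortion. Returns (bits per band, bits left in the reservoir)."""
    alloc = [0] * len(demands)
    reservoir = frame_budget
    while reservoir > 0:                            # (v) iterate
        needy = [i for i, d in enumerate(demands) if alloc[i] < d]
        if not needy:
            break                                   # every band is satisfied
        share = max(1, reservoir // len(needy))     # (i) equal share of what is left
        for i in needy:
            grant = min(share, demands[i] - alloc[i], reservoir)
            alloc[i] += grant                       # (iv) band takes only what it needs
            reservoir -= grant                      # unused share stays in the pool
    return alloc, reservoir

# ~2^10 bits per frame, as on the slide; the band demands are made up.
print(allocate_bits(1024, [300, 40, 200, 10, 150, 90, 60, 20]))
print(allocate_bits(1024, [600, 500, 100]))
```

In the first call, every band is satisfied and bits are left over. A reservoir could carry those bits to a later, more demanding frame. In the second call, demand exceeds the budget, so the two heavy bands split whatever remains.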


Emerging Picture

[Figure: audio quality vs. bits retained, with curves for Knob #1, Knob #2, and Knobs #1 & #2 combined]


The Ultimate Boss
• Subjective quality scales
• International standards
[Figure: (a) absolute impairment scale; (b) differential grades scale]

[Painter & Spanias. Proc.IEEE, 88(4):451–512, 2000]


General Dimensions of Merit
• Bit rate: bits per sample × samples per second
• Complexity: computational effort required to encode and decode
• Delay: time required to encode and decode


MP3 Perceptual Coding Algorithm
• Commit to observation window
  o “Long” frame (complex sound; frequency resolution)
  o “Short” frame (transient sound; temporal resolution)
• Estimate perceptual entropy
  o Analyze each critical band
  o Characterize masker
  o Estimate mask-to-noise threshold
  o Update “bit reservoir”
• Spectral quantization/coding loop
  o Allocate bits per critical band
  o Quantize band to bits-allowed levels
  o Run Huffman coding & count actual bits

[ Raissi. Technical report, MP3’ Tech, December 2002]



Resolution: Time vs. Frequency
• Example: two sample plots in the time-frequency plane, showing masking thresholds for (a) castanets and (b) piccolo
• Recall the masking threshold definition: lower-volume signals in relation to a specified critical band (simultaneous, just before, or soon after) are inaudible
• Affects the choice of observation window (“frame”):
  o (a) Castanets: prefer ~10 ms time resolution, which implies blurrier frequency resolution
  o (b) Piccolo: prefer ~2 critical-band frequency resolution, which implies blurrier temporal resolution
[Painter & Spanias. Proc.IEEE, 88(4):451–512, 2000]
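The trade-off follows from the DFT. A frame of N samples at sample rate f_s lasts N/f_s seconds and has bins f_s/N hertz apart, so sharpening one blurs the other. The sketch below computes both quantities for a few frame lengths. It also adds an energy-ratio transient detector that chooses between a “long” and a “short” frame. The detector, its sub-block count and its threshold are a hypothetical heuristic, not the switching rule of any particular MP3 encoder.

```python
import math

SAMPLE_RATE_HZ = 44_100

def resolution(frame_len: int):
    """Return (time span in ms, frequency bin spacing in Hz) of one frame."""
    return 1000.0 * frame_len / SAMPLE_RATE_HZ, SAMPLE_RATE_HZ / frame_len

def choose_window(samples, n_sub=8, threshold=10.0):
    """Hypothetical transient detector: pick a "short" frame when the energy
    of some sub-block jumps by more than `threshold` times the previous one."""
    size = len(samples) // n_sub
    energies = [sum(x * x for x in samples[i * size:(i + 1) * size]) + 1e-12
                for i in range(n_sub)]
    jumps = [energies[i] / energies[i - 1] for i in range(1, n_sub)]
    return "short" if max(jumps) > threshold else "long"

for n in (192, 441, 1024, 2048):
    ms, hz = resolution(n)
    print(f"N = {n:4d}: {ms:5.1f} ms per frame, {hz:6.1f} Hz per bin")

# Hypothetical test signals: a steady tone (piccolo-like) vs. silence then a click.
steady = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE_HZ) for t in range(1024)]
click = [0.0] * 900 + [1.0] * 124
print(choose_window(steady), choose_window(click))
```

At 44.1 kHz, the castanets’ ~10 ms corresponds to N = 441 samples, which leaves bins 100 Hz apart.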



Use of Perceptual Entropy

• Transform coding: trade vector quantization for scalar quantization by transforming to the (decoupled) frequency domain
• Critical band analysis:
  o compute the spectral power in each band, $P_f = |S_f|^2$
  o determine the masker for each band via the SFM (spectral flatness measure)
  o determine the JND (“just noticeable distortion”) threshold for each band
• Bit assignment
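The masker-characterization step can be sketched with the spectral flatness measure. SFM is the ratio of the geometric mean to the arithmetic mean of the band power P_f = |S_f|². It is near 1 (0 dB) for flat, noise-like spectra and near 0 for peaky, tone-like ones. The mapping to a tonality coefficient, α = min(SFM_dB / −60, 1), follows the form used in Johnston-style perceptual models. Treat the −60 dB reference as an assumption to check against the source. The band values are hypothetical.

```python
import math

SFM_DB_MAX = -60.0   # assumed reference: an SFM of -60 dB counts as fully tonal

def band_power(spectrum):
    """Spectral power P_f = |S_f|^2 for each complex coefficient in the band."""
    return [abs(s) ** 2 for s in spectrum]

def tonality(spectrum, floor=1e-20):
    """Return (SFM in dB, tonality alpha in [0, 1]) for one critical band."""
    p = [max(x, floor) for x in band_power(spectrum)]
    geo = math.exp(sum(math.log(x) for x in p) / len(p))   # geometric mean
    arith = sum(p) / len(p)                                 # arithmetic mean
    sfm_db = 10 * math.log10(geo / arith)
    alpha = max(0.0, min(sfm_db / SFM_DB_MAX, 1.0))
    return sfm_db, alpha

noise_like = [1 + 0j, 0.9 - 0.2j, -1.1 + 0.1j, 0.2 + 1j]   # roughly flat power
tone_like = [0.001, 0.002, 5.0 + 0j, 0.001]                 # one dominant peak
print(tonality(noise_like), tonality(tone_like))
```

An α near 1 suggests treating the band’s masker as a tone (tone-masking-noise). An α near 0 suggests a noise masker.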



Spectral Quantization/Coding Loop
• Inner loop: global gain, for overall bit-rate control
  o shared across all spectral values
  o a larger value means an increased quantizer step size, hence increased quantization noise (“distortion”)
• Outer loop: adjusts scale factors, reallocating bits to bands
  o affects only the spectral values within a critical band
  o ensures the masker for that band will mask the distortion
  o a larger mask signal relative to the threshold implies fewer bits are needed; a smaller mask relative to the threshold implies more bits are needed
• Loop termination: when the bit-rate constraint is satisfied with no audible distortion, or after a set number of iterations (with excessive bits spent)
[Figure: SPL (dB) vs. frequency (Hz) across critical bands k−1, k, k+1, contrasting a band with a larger mask-to-noise ratio and one with a smaller mask-to-noise ratio]
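The two nested loops can be sketched as follows. This is a simplified illustration, not the standard’s procedure, and all names are hypothetical:

- The quantizer step 2^(gg/4)·2^(−sf/2) in `quantize` only mimics the roles on this slide: the global gain coarsens everything, and a scale factor refines one band.
- `bit_cost` stands in for Huffman coding.
- `allowed_noise` stands in for each band’s masking threshold.

```python
import math

def quantize(x, gg, sf):
    """Uniform quantizer (simplified): a larger global gain gg coarsens every
    band, a larger scale factor sf refines only the band it belongs to."""
    step = 2.0 ** (gg / 4.0) * 2.0 ** (-sf / 2.0)
    return [round(v / step) for v in x], step

def bit_cost(levels):
    """Hypothetical stand-in for Huffman coding: ~log2(2|q| + 1) bits per value."""
    return sum(math.log2(2 * abs(q) + 1) for q in levels)

def encode_frame(bands, allowed_noise, budget, max_outer=20):
    """Return (global gain, scale factors, success flag) for one frame."""
    sfs = [0] * len(bands)
    gg = -40
    for _ in range(max_outer):
        # Inner loop: raise the global gain until the whole frame fits the budget.
        gg = -40
        while sum(bit_cost(quantize(x, gg, sf)[0]) for x, sf in zip(bands, sfs)) > budget:
            gg += 1
        # Outer loop: find bands whose distortion exceeds their masking threshold.
        noisy = []
        for i, (x, sf) in enumerate(zip(bands, sfs)):
            levels, step = quantize(x, gg, sf)
            noise = sum((v - q * step) ** 2 for v, q in zip(x, levels))
            if noise > allowed_noise[i]:
                noisy.append(i)
        if not noisy:
            return gg, sfs, True           # bit rate met, no audible distortion
        for i in noisy:
            sfs[i] += 1                    # finer quantization for those bands
    return gg, sfs, False                  # gave up after max_outer iterations

bands = [[1.0, 0.5, -0.3], [0.2, -0.1, 0.05]]   # hypothetical spectral values
allowed = [1e-3, 1e-3]                           # hypothetical masking thresholds
print(encode_frame(bands, allowed, budget=200))
print(encode_frame(bands, allowed, budget=0))
```

With a generous budget, the first pass already succeeds. With no bits at all, the outer loop keeps raising scale factors while the inner loop keeps coarsening, so the sketch stops after `max_outer` iterations. This corresponds to the “set number of iterations” exit.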


Typical Quantization Control

[You & Chen. Multim.Tools & Appl., 40(3):341–359, 2008.]

[Figure: quantization control by critical band index. The inner loop provides bit-rate control and the outer loop band-specific distortion control, mapping input masker amplitude to quantized output amplitude. Two panels plot Output Amplitude [quantized SPL] against Input Amplitude [real SPL] on axes up to 25, for GlobalGain = 220, ScaleFactor = 2 and for GlobalGain = 230, ScaleFactor = 2.]
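The two panels can be imitated with a quantizer modeled on MP3’s power-law form. It uses a step of 2^((GlobalGain − 210)/4), a scale-factor refinement of 2^(−0.5·ScaleFactor), and 3/4-power compression before rounding. This is a simplified sketch: it uses plain rounding and omits the real encoder’s further refinements. The `sf_multiplier = 0.5` default is an assumption. The inputs span the slide’s 0 to 25 axis.

```python
def quantize(x, global_gain, scalefactor, sf_multiplier=0.5):
    """Simplified MP3-style quantizer: a larger global gain coarsens the step,
    a larger scale factor refines it within its band. Returns (level, step)."""
    step = 2.0 ** ((global_gain - 210) / 4.0) * 2.0 ** (-sf_multiplier * scalefactor)
    level = round((abs(x) / step) ** 0.75)
    return (-level if x < 0 else level), step

def dequantize(level, step):
    """Inverse mapping: |level|^(4/3) times the step, with the sign restored."""
    mag = abs(level) ** (4.0 / 3.0) * step
    return -mag if level < 0 else mag

inputs = range(0, 26)   # input amplitudes spanning the slide's 0..25 axis
for gg in (220, 230):
    levels = [quantize(x, gg, scalefactor=2)[0] for x in inputs]
    print(f"GlobalGain = {gg}: {len(set(levels))} distinct output levels, max {max(levels)}")
```

Raising GlobalGain by 10 multiplies the step by 2^2.5, so fewer output levels cover the same input range. This is the coarser staircase of the second panel.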



To Probe Further
• Tutorials on psychoacoustic coding (in increasing order of abstraction and generality)
  o D. Pan. A tutorial on MPEG/audio compression. IEEE Multimedia, 2(2):60–74, 1995.
  o N. Jayant, J. Johnston, and R. Safranek. Signal compression based on models of human perception. Proceedings of the IEEE, 81(10):1385–1422, 1993.
  o V. K. Goyal. Theoretical foundations of transform coding. IEEE Signal Processing Magazine, 18(5):9–21, 2001.
• Lightweight overview of MP3
  o R. Raissi. The theory behind MP3. Technical report, MP3’ Tech, December 2002.
• Scientific basis of the MP3 coding standard
  o J. D. Johnston. Transform coding of audio signals using perceptual noise criteria. IEEE Journal on Selected Areas in Communications, 6(2):314–323, 1988.
