ESE250: Digital Audio Basics

Week 7: Psychoacoustic Compression

February 23, 2012

ESE 250 – S’12 Kod & DeHon

Course Map

[Figure: course map; numbers correspond to course weeks]

Today: audio signal processing – putting it all together

Today’s Agenda

• How do we compress from the WAV bit rate (per channel): ~ 700 kbps @ 44.1 kHz
• down to the MP3 target (per channel): ~ 60 kbps @ 44.1 kHz?

Where are we?

• Week 2: received signal is sampled & quantized: q = PCM[ r ]
• Week 3: quantized signal is coded: c = code[ q ]
• Week 4: sampled signal is first transformed into the frequency domain: Q = DFT[ q ]
• Week 5: signal is oversampled & low-pass filtered: Q = LPF[ DFT(q + n) ]
• Week 6: transformed signal is analyzed using human psychoacoustic models
• Week 7: acoustically interesting signal is “perceptually coded”: C = MP3[ Q ]

[Figure: block diagram of the signal chain: r(t), Oversample, q + n, DFT, Q + N, LPF, Q, Perceptual Coding, C, Store / Transmit, Decode, Produce, p(t); stages labeled Week 3 through Week 6]

[Painter & Spanias. Proc.IEEE, 88(4):451–512, 2000]



Week 5 Review (Oversampling)

• audio-relevant signal q(t)
  - time window: $T_0$ sec per “block”
  - Nyquist sampling interval: $T_A$
  - ideally $n_A = T_0 / T_A$ samples per “block”:
    $q = (q_{-n_A}, \dots, q_{-2}, q_{-1}, q_0, q_1, q_2, \dots, q_{n_A}) = \mathrm{PCM}[\,q(t)\,]$
• ambient noise n(t)
  - Nyquist sampling interval: $T_N \ll T_A$
• received signal in a block
  - $n_S = T_0 / T_N \gg n_A = T_0 / T_A$
  - ultimately, record
    $r = (r_{-n_S}, \dots, r_{-2}, r_{-1}, r_0, r_1, r_2, \dots, r_{n_S}) = \mathrm{PCM}[\,r(t)\,] = \mathrm{PCM}[\,q(t) + n(t)\,] = q + n$
    $\phantom{r} = (q_{-n_S} + n_{-n_S}, \dots, q_{-2} + n_{-2}, q_{-1} + n_{-1}, q_0 + n_0, q_1 + n_1, q_2 + n_2, \dots, q_{n_S} + n_{n_S})$

[Figure: q(t), n(t), and r(t) = q(t) + n(t) plotted over one block from -T0/2 to T0/2, with sample spacings TA and TN]
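The oversampling bookkeeping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the block length, the two sampling intervals, and the two sinusoids standing in for q(t) and n(t) are made-up values, and `sample_block` is a hypothetical helper that indexes samples from 0 rather than from -n to n as the slide does.

```python
import math

def sample_block(signal, t0, period):
    """Sample signal(t) on [-t0/2, t0/2) with the given sampling interval."""
    count = round(t0 / period)
    return [signal(-t0 / 2 + i * period) for i in range(count)]

# Illustrative values only (assumptions, not taken from the slides).
T0 = 0.01           # block length in seconds
T_A = 1 / 40_000    # sampling interval that suffices for the audio signal
T_N = 1 / 160_000   # finer interval needed for the noise, T_N << T_A

def q_t(t):
    """Stand-in audio signal: a 1 kHz tone."""
    return math.sin(2 * math.pi * 1_000 * t)

def n_t(t):
    """Stand-in ambient noise: a 60 kHz tone, far above the audio band."""
    return 0.1 * math.sin(2 * math.pi * 60_000 * t)

n_A = round(T0 / T_A)   # samples per block at the audio rate
n_S = round(T0 / T_N)   # samples per block at the oversampled rate

q = sample_block(q_t, T0, T_N)
n = sample_block(n_t, T0, T_N)
r = [qi + ni for qi, ni in zip(q, n)]   # r = q + n, sample by sample
```

The recorded block has n_S entries even though n_A would suffice for the audio alone; the extra samples exist only so that the noise can be represented without aliasing.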


Week 5 Review (Anti-Aliasing)

• Given r, with r(t) = q(t) + n(t), compute the frequency domain representation
  $R = \mathrm{DFT}[\,r\,] = (R_{-n_S}, \dots, R_{-2}, R_{-1}, R_0, R_1, R_2, \dots, R_{n_S})$
  $\phantom{R} = (Q_{-n_S} + N_{-n_S}, \dots, Q_{-2} + N_{-2}, Q_{-1} + N_{-1}, Q_0 + N_0, Q_1 + N_1, Q_2 + N_2, \dots, Q_{n_S} + N_{n_S})$
• introduce assumptions about frequency content:
  $|k| > n_A = \omega_A / \omega_0 \Rightarrow Q_k = 0$
  $|k| < n_A = \omega_A / \omega_0 \Rightarrow N_k = 0$
• to realize
  $R = (0, \dots, 0, Q_{-n_A}, \dots, Q_{-2}, Q_{-1}, Q_0, Q_1, Q_2, \dots, Q_{n_A}, 0, \dots, 0) + (N_{-n_S}, \dots, N_{-n_A}, 0, \dots, 0, N_{n_A}, \dots, N_{n_S})$
• Low Pass Filter
  $Q = (Q_{-n_A}, \dots, Q_{-2}, Q_{-1}, Q_0, Q_1, Q_2, \dots, Q_{n_A})$
• Bit count = $n_A \times 32$
  - $T_0 \approx 1/44$ sec $\Rightarrow n_A \approx 1000$
  - $T_0 \approx 1/88$ sec $\Rightarrow n_A \approx 500$

[Figure: DFT followed by LPF; frequency axis marked at -nS, -nA, nA, nS]
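A minimal sketch of the DFT-then-LPF step, assuming a toy block of 64 samples, an audio component at bin 3, a noise component at bin 20, and a cutoff n_A = 8. All of these numbers and the helper names `dft` and `low_pass` are illustrative choices, and the DFT is the direct O(N^2) sum rather than an FFT.

```python
import cmath
import math

def dft(x):
    """Direct DFT with centered indexing: returns {k: X_k} for k = -N/2 .. N/2 - 1."""
    n = len(x)
    return {
        k: sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)) / n
        for k in range(-(n // 2), n // 2)
    }

def low_pass(spectrum, n_a):
    """Zero every coefficient whose index magnitude exceeds n_a."""
    return {k: (v if abs(k) <= n_a else 0j) for k, v in spectrum.items()}

N = 64      # samples in the block (illustrative)
n_A = 8     # highest audio-relevant frequency index (illustrative)

q = [math.cos(2 * math.pi * 3 * i / N) for i in range(N)]          # in-band audio
n = [0.5 * math.cos(2 * math.pi * 20 * i / N) for i in range(N)]   # out-of-band noise
r = [qi + ni for qi, ni in zip(q, n)]

R = dft(r)                 # R = Q + N
Q_hat = low_pass(R, n_A)   # LPF keeps only |k| <= n_A
Q = dft(q)                 # reference: spectrum of the clean signal
```

Because the audio occupies only |k| <= n_A and the noise only |k| > n_A, zeroing the high bins returns the clean spectrum up to rounding error.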


Week 6 Review (Hearing Model)

• Power Spectrum Model of Hearing:
  - Critical Bands: the auditory system contains a finite array of adaptively tunable, overlapping bandpass filters
  - Frequency Bins: humans process a signal’s component (against a noisy background) in the one filter with the closest center frequency
  - Masking: certain signal components in a given band are “favored” and others are filtered out
• Established through decades of psychoacoustic experiments
• Model underlying today’s algorithmic thinking

B.C.J. Moore. Int.Rev.Neurobiol., 70:49–86, 2005.


Today: Critical Band Assignments

• acquire & transform “frame”
• assign frequencies to bands
  - use psychoacoustic model lookup to determine the frequency bandwidth of each critical band

[Figure: spectrum magnitudes |Q| (Q1, Q2, Q3, …, Qk-1, Qk, Qk+1, …, QnA) mapped through a lookup table (LUT) to critical bands 1, 2, …, k, …, 22]
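A lookup of this kind might be sketched as below. The band edges are the commonly tabulated Bark-scale critical band edges, used here as an assumption: the slide's own LUT is not shown, and its figure numbers only 22 bands where this table has 24. The function names and the 44 Hz bin spacing in the usage line are likewise illustrative.

```python
from bisect import bisect_right

# Commonly tabulated Bark-scale band edges in Hz (assumed stand-in for the slide's LUT).
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500]

def band_of(freq_hz):
    """Return the 1-based critical band index; frequencies above the top edge go to the last band."""
    idx = bisect_right(BARK_EDGES_HZ, freq_hz)
    return min(max(idx, 1), len(BARK_EDGES_HZ) - 1)

def assign_bands(num_bins, bin_spacing_hz):
    """Group DFT bin indices 1..num_bins by critical band."""
    bands = {}
    for k in range(1, num_bins + 1):
        bands.setdefault(band_of(k * bin_spacing_hz), []).append(k)
    return bands

# Usage: 500 bins spaced 44 Hz apart (illustrative numbers).
bands = assign_bands(500, 44.0)
```

Low bands collect only a few bins each while high bands collect many, which reflects the widening of critical bands with frequency.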


Today: Code (Compress) Frequencies

• use psychoacoustic model
• to minimize bits per critical band
  $C_{\mathrm{band}\,k}(j) = \mathrm{Round}[\,Q_{\mathrm{band}\,k}(j), \mathrm{Level}\,]$
• by appeal to “masking” models
  $Q_{\mathrm{band}\,k} = C_{\mathrm{band}\,k} + D_{\mathrm{band}\,k}$
  where the distortion (“perceptual noise”) $D_{\mathrm{band}\,k}$ between the retained signal $C_{\mathrm{band}\,k}$ and the actual signal $Q_{\mathrm{band}\,k}$ should be “masked” by the retained signal

[Figure: bands passed through lossy coding]


Single Band

• E.g., look at the kth band
  $Q_{\mathrm{band}\,k} = (Q_{\mathrm{band}\,k}(1), \dots, Q_{\mathrm{band}\,k}(m))$, amplitudes represented as reals
• Determine masking paradigm
  - tone-masking-noise
  - noise-masking-tone
  - noise-masking-noise
• E.g., for a tone masker, pick the tone frequency, band k(j), at maximal amplitude in the band
• Choose quantization level and compute compressed signal
  $C_{\mathrm{band}\,k}(j) = \mathrm{Round}[\,Q_{\mathrm{band}\,k}(j), \mathrm{Level}\,]$
  $C_{\mathrm{band}\,k} = (0, \dots, C_{\mathrm{band}\,k}(j), \dots, 0)$
• Assess noise magnitude, $|D_{\mathrm{band}\,k}|$
  $D_{\mathrm{band}\,k} = (Q_{\mathrm{band}\,k}(1), \dots, Q_{\mathrm{band}\,k}(j), \dots, Q_{\mathrm{band}\,k}(m)) - (0, \dots, C_{\mathrm{band}\,k}(j), \dots, 0)$
• Use psychoacoustic model to determine whether the compressed signal will mask the distortion noise for that band

[Figure: SPL versus frequency panels for the real input, the 1 bit signal and 1 bit |noise|, and the 2 bit signal and 2 bit |noise|, with the SMR marked for 1 bit and for 2 bits; more bits yield a larger signal-to-mask ratio]
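The single-band steps can be sketched as follows, assuming a uniform quantizer whose step is the full scale divided by 2^bits as a stand-in for Round[·, Level]. The band amplitudes, the full-scale value, and the function names are hypothetical, and the final masking decision is left out because it depends on a psychoacoustic threshold the slide does not specify.

```python
def quantize_band(band, bits, full_scale):
    """Keep only the largest-magnitude component, rounded to a grid of 2**bits steps.

    Returns (c, d) with band = c + d, where c is the retained signal and
    d is the distortion ("perceptual noise").
    """
    j = max(range(len(band)), key=lambda i: abs(band[i]))   # tone masker position
    step = full_scale / 2 ** bits                           # quantization level
    c = [0.0] * len(band)
    c[j] = round(band[j] / step) * step
    d = [qi - ci for qi, ci in zip(band, c)]
    return c, d

def noise_magnitude(d):
    """Euclidean norm of the distortion vector."""
    return sum(x * x for x in d) ** 0.5

# Illustrative band amplitudes (assumed values).
band = [0.1, 0.7, 0.2, 0.05]
c1, d1 = quantize_band(band, bits=1, full_scale=1.0)
c2, d2 = quantize_band(band, bits=2, full_scale=1.0)
```

Because each extra bit halves the step, the finer grid contains the coarser one, so the error on the retained component never grows as bits are added. That is the numerical counterpart of "more bits yields larger signal-to-mask ratio".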


Overview of Perceptual Coding

• Goals
  - digitally represent a signal
  - minimum number of bits
  - “transparent” reproduction
    o even the most sensitive human cannot distinguish between the original and the generated signal
• Perceptual Entropy
  - uses a psychoacoustic model to estimate the information content of audio signals [J. D. Johnston. IEEE J. Sel. Ar. Comm., 6(2):314–323, 1988]
  - suggested transparency achievable at 2 bits per sample, or 88 kbps @ 44.1 kHz
• Our present (WAV) bit count = 32 bits per sample
  - $T_0 \approx 1/44$ sec $\Rightarrow n_A \approx 1000 \Rightarrow$ bit rate $\approx 32 \cdot 44$ kbps $\approx 1.4 \cdot 10^3$ kbps
  - $T_0 \approx 1/88$ sec $\Rightarrow n_A \approx 500 \Rightarrow$ bit rate $\approx 16 \cdot 88$ kbps $\approx 1.4 \cdot 10^3$ kbps
• Our leverage?
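The gap between the present bit count and the perceptual-entropy estimate is simple arithmetic, written out here with the exact 44.1 kHz sample rate rather than the rounded 44 used on the slide. The function name is a hypothetical helper.

```python
SAMPLE_RATE_HZ = 44_100

def bit_rate_kbps(bits_per_sample, sample_rate_hz=SAMPLE_RATE_HZ):
    """Bit rate in kbps for a given number of bits per sample."""
    return bits_per_sample * sample_rate_hz / 1000

wav_kbps = bit_rate_kbps(32)            # 1411.2 kbps, the slide's ~1.4 * 10^3 kbps
transparent_kbps = bit_rate_kbps(2)     # 88.2 kbps, the slide's ~88 kbps
compression_ratio = wav_kbps / transparent_kbps   # 32 / 2 = 16
```

A factor of about 16 is therefore the leverage the psychoacoustic model has to supply.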



What are the “Knobs”?

• “Reservoir”: Are all frames equally full of audio information?
• Masking: How should we exploit the perceptual model?
• Local decoupling: Can we exploit (rough) independence of each band?
• Global accounting: How should we re-impose the average frame-rate?


MP3 Encoder Design Strategy

• Target: ~ 1032 bits per frame
  - ~ 2 bits per sample
  - ~ 512 samples per frame
• Use masker(signal)-masking-maskee(noise) paradigm
  - assume 1 masker “costs” ~ $2^K$ bits
  - $\Rightarrow$ retained signal, C, should consist of ~ $2^{10-K}$ maskers on average
    o specified amplitudes
    o at specified frequencies
  - allocated to some subset of the ~ $32 = 2^5$ critical bands
• Rough algorithm for computing retained signal
  (i) frame bit-reservoir: supplies bits per each critical band
  (ii) commit to masker(s) within each band
  (iii) attempt to mask within-band distortion with available bits
  (iv) each band: give bits back to or take more from reservoir
  (v) iterate
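One way to picture steps (i) through (v) is a greedy allocation in which the reservoir hands out one bit at a time to whichever band currently has the most unmasked distortion. This is a generic textbook-style stand-in, not the MP3 procedure itself. It assumes each band is summarized by a signal-to-mask ratio in dB and that each extra bit lowers the quantization noise by about 6.02 dB; the SMR values and names below are hypothetical.

```python
DB_PER_BIT = 6.02   # approximate noise reduction per additional quantizer bit

def allocate_bits(smr_db, reservoir_bits):
    """Greedy allocation: return (bits per band, bits left in the reservoir)."""
    bits = [0] * len(smr_db)

    def noise_to_mask(b):
        # Positive value means the distortion in band b is still audible.
        return smr_db[b] - DB_PER_BIT * bits[b]

    while reservoir_bits > 0:
        worst = max(range(len(smr_db)), key=noise_to_mask)
        if noise_to_mask(worst) <= 0:
            break                      # every band is masked; keep the rest
        bits[worst] += 1               # band takes one more bit
        reservoir_bits -= 1
    return bits, reservoir_bits

# Illustrative signal-to-mask ratios for four bands (assumed values).
smr = [20.0, 5.0, 0.0, 13.0]
bits_rich, left_rich = allocate_bits(smr, 1024)   # plenty of bits: stop when masked
bits_poor, left_poor = allocate_bits(smr, 3)      # scarce bits: worst bands first
```

With a generous reservoir the loop stops as soon as all distortion is masked and returns the unused bits; with a scarce one it spends every bit where the noise is most exposed.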


Emerging Picture

[Figure: audio quality versus bits retained, with curves labeled Knob #1, Knob #2, and Knobs #1 & #2]


The Ultimate Boss

• Subjective Quality Scales
• International Standards
  (a) Absolute impairment
  (b) Differential grades



General Dimensions of Merit

• Bit Rate:
  - bits per sample
  - samples per second

• Complexity: computational effort required to encode and decode

• Delay: time required to encode and decode


MP3 Perceptual Coding Algorithm

• Commit to Observation Window
  - “Long” frame (complex sound; frequency resolution)
  - “Short” frame (transient sound; temporal resolution)
• Estimate Perceptual Entropy
  - Analyze each critical band
  - Characterize masker
  - Estimate mask-to-noise threshold
  - Update “bit reservoir”
• Spectral Quantization/Coding Loop
  - Allocate bits per critical band
  - Quantize band to bits-allowed levels
  - Run Huffman & count actual bits

[Raissi. Technical report, MP3’ Tech, December 2002]





Resolution: Time vs. Frequency

• Example: two sample plots in the time-frequency plane
  - Masking thresholds for (a) castanets (b) piccolo
  - Recall masking threshold def’n:
    o lower volume signals
    o in relation to specified critical band (simultaneous; just before; soon after)
    o are inaudible
• Affects choice of observation window (“frame”)
  (a) Castanets: prefer ~ 10 ms time resolution
      - implies blurrier frequency resolution
  (b) Piccolo: prefer ~ 2 critical-band frequency resolution
      - implies blurrier temporal resolution

[Painter & Spanias. Proc.IEEE, 88(4):451–512, 2000]
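The trade-off between the two window choices follows from the DFT itself: a frame of N samples at sample rate f_s lasts N / f_s seconds and resolves frequencies N-th of f_s apart, so the two resolutions always multiply to 1. A minimal sketch, assuming a 44.1 kHz sample rate and illustrative frame lengths:

```python
def frame_resolution(frame_samples, sample_rate_hz=44_100):
    """Return (time resolution in seconds, frequency bin spacing in Hz) for one frame."""
    time_res_s = frame_samples / sample_rate_hz
    freq_res_hz = sample_rate_hz / frame_samples
    return time_res_s, freq_res_hz

short_time, short_freq = frame_resolution(441)     # about 10 ms frame: 100 Hz bins
long_time, long_freq = frame_resolution(4410)      # about 100 ms frame: 10 Hz bins
```

A castanet-like transient calls for the short frame and accepts the coarse 100 Hz bins; a piccolo-like sustained tone calls for the long frame and accepts the coarser timing.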









Use of Perceptual Entropy

• Transform Coding
  - trade vector quantization for scalar quantization
  - by transforming to the frequency domain (decoupled)
• Critical Band Analysis
  - compute spectral power in each band, $P_f = |S_f|$
  - determine masker for each band via SFM (spectral flatness measure)
  - determine JND (“just noticeable distortion”) threshold for each band
• Bit Assignment







[You & Chen. Multim.Tools & Appl., 40(3):341–359, 2008.]
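The SFM step can be sketched as below. The spectral flatness measure is the ratio of the geometric mean to the arithmetic mean of the power spectrum, expressed in dB; mapping it to a tonality coefficient by dividing by a reference of -60 dB and capping at 1 follows the commonly cited form of Johnston's method and is treated here as an assumption, as are the function names and the test spectra.

```python
import math

def sfm_db(power):
    """Spectral flatness in dB: 10*log10(geometric mean / arithmetic mean).

    All power values must be strictly positive.
    """
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return 10 * math.log10(geometric / arithmetic)

def tonality(power, sfm_db_max=-60.0):
    """Tonality coefficient in [0, 1]: 0 for noise-like, 1 for tone-like spectra."""
    return min(sfm_db(power) / sfm_db_max, 1.0)

noise_like = [1.0] * 8                      # flat spectrum
tone_like = [1e-9, 1e-9, 1.0, 1e-9]         # one dominant component

noise_tonality = tonality(noise_like)
tone_tonality = tonality(tone_like)
```

A band judged tone-like is treated as a tone masker and a band judged noise-like as a noise masker, which in turn sets how much distortion the band can hide.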


Spectral Quantization/Coding Loop

• Inner Loop
  - Global gain: overall bit rate control
  - Shared across all spectral values
  - Larger value
    o increased quantizer step size
    o increased quantization noise (“distortion”)
• Outer Loop
  - Adjusts scale factors, reallocating bits to bands
  - Affects only the spectral values within a critical band
  - Ensures the masker for that band will mask the distortion
    o Larger mask signal relative to threshold implies fewer bits needed
    o Smaller mask relative to threshold implies more bits needed
• Loop Termination
  - when the bit rate constraint is satisfied with no audible distortion
  - or after a set number of iterations (with excessive bits spent)

[Figure: SPL (dB) versus frequency (Hz) across critical bands k-1, k, and k+1, contrasting a larger and a smaller mask-to-noise ratio]
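The interplay of the two loops can be sketched as below. This is a heavily simplified model, not the standard's procedure. It assumes a uniform quantizer, a crude bit count (one bit for a zero, magnitude bits plus a sign bit otherwise) in place of Huffman coding, a per-band amplification of 2^(sf/2), and a per-band distortion threshold standing in for the psychoacoustic model. Every name and number is hypothetical.

```python
def count_bits(q):
    """Crude stand-in for Huffman coding: 1 bit per zero, magnitude plus sign otherwise."""
    return sum(1 if v == 0 else abs(v).bit_length() + 1 for v in q)

def quantize(spectrum, bands, sf, step):
    """Amplify each band by its scale factor, then quantize with the global step."""
    q = [0] * len(spectrum)
    for b, (lo, hi) in enumerate(bands):
        gain = 2.0 ** (sf[b] / 2.0)
        for i in range(lo, hi):
            q[i] = round(spectrum[i] * gain / step)
    return q

def band_distortion(spectrum, bands, sf, step, q):
    """Squared reconstruction error per band."""
    out = []
    for b, (lo, hi) in enumerate(bands):
        gain = 2.0 ** (sf[b] / 2.0)
        out.append(sum((spectrum[i] - q[i] * step / gain) ** 2 for i in range(lo, hi)))
    return out

def inner_loop(spectrum, bands, sf, budget):
    """Raise the global step size until the bit count fits the budget."""
    step = 1.0
    while True:
        q = quantize(spectrum, bands, sf, step)
        if count_bits(q) <= budget:
            return step, q
        step *= 2 ** 0.25

def coding_loop(spectrum, bands, thresholds, budget, max_outer=16):
    """Outer loop: amplify bands whose distortion exceeds its threshold, then refit."""
    if budget < len(spectrum):
        raise ValueError("budget must allow at least one bit per coefficient")
    sf = [0] * len(bands)
    for attempt in range(max_outer):
        step, q = inner_loop(spectrum, bands, sf, budget)
        dist = band_distortion(spectrum, bands, sf, step, q)
        noisy = [b for b in range(len(bands)) if dist[b] > thresholds[b]]
        if not noisy or attempt == max_outer - 1:
            return step, sf, q
        for b in noisy:
            sf[b] += 1   # finer effective step for this band on the next pass

spectrum = [40.0, 3.0, 1.0, 0.5, 25.0, 2.0, 1.5, 0.2]
bands = [(0, 4), (4, 8)]
step, sf, q = coding_loop(spectrum, bands, thresholds=[4.0, 4.0], budget=24)
```

The inner loop alone always meets the bit budget; the outer loop can fail to meet every threshold, in which case it gives up after `max_outer` passes, mirroring the "set number of iterations" exit on the slide.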


Typical Quantization Control

[You & Chen. Multim.Tools & Appl., 40(3):341–359, 2008.]

[Figure: two panels of output amplitude (quantized SPL) versus input amplitude (real SPL), one for GlobalGain = 220 and one for GlobalGain = 230, both with ScaleFactor = 2; quantized output amplitude versus input masker amplitude plotted over frequency by critical band index; inner loop: bit rate control; outer loop: band-specific distortion control]



[Figure: “Long” frame (complex sound; frequency resolution) versus “Short” frame (transient sound; temporal resolution)]





To Probe Further

• Tutorials on Psychoacoustic Coding (in increasing order of abstraction and generality)
  - D. Pan. A tutorial on MPEG/audio compression. IEEE MultiMedia, 2(2):60–74, 1995.
  - Nikil Jayant, James Johnston, and Robert Safranek. Signal compression based on models of human perception. Proceedings of the IEEE, 81(10):1385–1422, 1993.
  - V. K. Goyal. Theoretical foundations of transform coding. IEEE Signal Processing Magazine, 18(5):9–21, 2001.
• Lightweight Overview of MP3
  - Rassol Raissi. The theory behind mp3. Technical report, MP3’ Tech, December 2002.
• Scientific Basis of MP3 Coding Standard
  - J. D. Johnston. Transform coding of audio signals using perceptual noise criteria. IEEE Journal on Selected Areas in Communications, 6(2):314–323, 1988.

