ESE 250 – S’12 Kod & DeHon Week 7 Psychoacoustic Compression
ESE250:Digital Audio Basics
Week 7 February 23, 2012
Psychoacoustic Compression

Course Map
[Figure: course map; numbers correspond to course weeks.]
Today (Week 7, Psychoacoustic Compression): audio signal processing – putting it all together

Today’s Agenda
• How do we compress from
  - WAV bit rate (per channel): ~ 700 kbps @ 44.1 kHz
• down to
  - MP3 target (per channel): ~ 60 kbps @ 44.1 kHz?

Where are we?
• Week 2: received signal is sampled & quantized
  q = PCM[ r ]
• Week 3: quantized signal is coded
  c = code[ q ]
• Week 4: sampled signal first transformed into frequency domain
  Q = DFT[ q ]
• Week 5: signal oversampled & low-pass filtered
  Q = LPF[ DFT(q + n) ]
• Week 6: transformed signal analyzed using human psychoacoustic models
• Week 7: acoustically interesting signal is “perceptually coded”
  C = MP3[ Q ]
[Figure: block diagram of the processing chain (oversample, DFT, LPF, perceptual coding, store / transmit, decode, produce) carrying the signals r(t), q + n, Q + N, Q, C, and p(t), with blocks annotated by course week.]
[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]

Week 5 Review (Oversampling)
• audio-relevant signal $q(t)$
  - time window: $T_0$ sec per “block”
  - Nyquist sampling interval: $T_A$
  - ideally $n_A = T_0 / T_A$ samples per “block”:
    $q = (q_{-n_A}, \ldots, q_{-2}, q_{-1}, q_0, q_1, q_2, \ldots, q_{n_A}) = \mathrm{PCM}[q(t)]$
• ambient noise $n(t)$
  - Nyquist sampling interval: $T_N \ll T_A$
• receive signal in a block
  - $n_S = T_0 / T_N \gg n_A = T_0 / T_A$
  - ultimately, record
    $r = (r_{-n_N}, \ldots, r_{-2}, r_{-1}, r_0, r_1, r_2, \ldots, r_{n_N}) = \mathrm{PCM}[r(t)] = \mathrm{PCM}[q(t) + n(t)] = q + n$
    $= (q_{-n_N} + n_{-n_N}, \ldots, q_{-1} + n_{-1}, q_0 + n_0, q_1 + n_1, \ldots, q_{n_N} + n_{n_N})$
[Figure: q(t), n(t), and r(t) = q(t) + n(t) plotted over the block from −T0/2 to T0/2, with sample spacings TA and TN marked.]

Week 5 Review (Anti-Aliasing)
• Given $r$, compute frequency domain representation
  $R = \mathrm{DFT}[r]$
  $= (R_{-n_N}, \ldots, R_{-2}, R_{-1}, R_0, R_1, R_2, \ldots, R_{n_N})$
  $= (Q_{-n_N} + N_{-n_N}, \ldots, Q_{-1} + N_{-1}, Q_0 + N_0, Q_1 + N_1, \ldots, Q_{n_N} + N_{n_N})$
• introduce assumptions about frequency content:
  $|k| > n_A = \omega_A / \omega_0 \Rightarrow Q_k = 0$
  $|k| < n_A = \omega_A / \omega_0 \Rightarrow N_k = 0$
• to realize
  $R = (0, \ldots, 0, Q_{-n_A}, \ldots, Q_{-1}, Q_0, Q_1, \ldots, Q_{n_A}, 0, \ldots, 0)$
  $\quad + (N_{-n_M}, \ldots, N_{-n_A}, 0, \ldots, 0, N_{n_A}, \ldots, N_{n_M})$
• Low Pass Filter
  $Q = (Q_{-n_A}, \ldots, Q_{-2}, Q_{-1}, Q_0, Q_1, Q_2, \ldots, Q_{n_A})$
• Bit count = $n_A \cdot 32$
  - $T_0 \approx 1/44$ sec $\Rightarrow n_A \approx 1000$
  - $T_0 \approx 1/88$ sec $\Rightarrow n_A \approx 500$
[Figure: spectrum over bins −nS … nS before the DFT and LPF steps, and the retained bins −nA … nA after them.]
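
The anti-aliasing review above can be sketched as a direct DFT followed by zeroing every bin with |k| > n_A. This is a minimal illustration under stated assumptions, not the course's code: the function names `dft`, `idft`, and `ideal_lowpass`, the block length of 64 samples, and the two test tones are all hypothetical choices made for the example.

```python
import cmath
import math

def dft(x):
    """Direct O(n^2) DFT: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def ideal_lowpass(X, n_a):
    """Keep bins with |k| <= n_a; negative frequency -k is stored at index n - k."""
    n = len(X)
    return [X[k] if min(k, n - k) <= n_a else 0 for k in range(n)]

n_s = 64  # samples per block (hypothetical)
n_a = 4   # highest retained bin index (hypothetical)

# Audio-relevant signal in bin 3, "ambient noise" in bin 20 (both made up).
q = [math.cos(2 * math.pi * 3 * t / n_s) for t in range(n_s)]
noise = [0.5 * math.cos(2 * math.pi * 20 * t / n_s) for t in range(n_s)]
r = [a + b for a, b in zip(q, noise)]

Q = ideal_lowpass(dft(r), n_a)
q_hat = [v.real for v in idft(Q)]
max_err = max(abs(a - b) for a, b in zip(q, q_hat))
```

Because the noise tone sits entirely above bin n_A, the filtered block `q_hat` matches `q` up to floating-point rounding. Real recordings do not separate this cleanly, which is why the slide states the band assumptions explicitly.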

Week 6 Review (Hearing Model)
• Power Spectrum Model of Hearing:
  - Critical Bands: auditory system contains finite array of adaptively tunable, overlapping bandpass filters
  - Frequency Bins: humans process a signal’s component (against noisy background) in the one filter with closest center frequency
  - Masking: certain signal components in a given band are “favored” and others are filtered out
• Established through decades of psychoacoustic experiments
• Model underlying today’s algorithmic thinking
[B.C.J. Moore. Int. Rev. Neurobiol., 70:49–86, 2005]

Today: Critical Band Assignments
• acquire & transform “frame”
• assign frequencies to bands
  - use psychoacoustic model lookup to determine frequency bandwidth of each critical band
[Figure: |Q| over the frame's frequency bins Q1 … QnA, mapped through a lookup table (LUT) onto critical bands 1, 2, …, k, …, 22.]
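
The lookup step above can be sketched as a sorted table of band edges plus a binary search. The table used here is Zwicker's commonly tabulated Bark band edges, which is an assumption: it has 24 bands, whereas the lecture's figure shows 22, and the lecture's own LUT is not given in these slides. The names `BARK_EDGES_HZ`, `band_of`, and `assign_bins` are hypothetical.

```python
from bisect import bisect_right

# Upper edges of each band in Hz (Zwicker's Bark table; assumed, not the lecture's LUT).
BARK_EDGES_HZ = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
                 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500,
                 12000, 15500]

def band_of(freq_hz):
    """Return the 1-based band index containing freq_hz.

    Frequencies above the last edge are clamped into the top band.
    """
    return min(bisect_right(BARK_EDGES_HZ, freq_hz), len(BARK_EDGES_HZ) - 1) + 1

def assign_bins(n_bins, bin_spacing_hz):
    """Group DFT bin indices 0..n_bins-1 by the band their frequency falls in."""
    bands = {}
    for k in range(n_bins):
        bands.setdefault(band_of(k * bin_spacing_hz), []).append(k)
    return bands

# Hypothetical frame: 512 non-negative-frequency bins of a 1024-point DFT at 44.1 kHz.
bands = assign_bins(512, 44100 / 1024)
```

With roughly 43 Hz between bins, the lowest bands hold only two or three bins each while the top band holds well over a hundred, which is the uneven frequency resolution the critical-band model describes.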

Today: Code (Compress) Frequencies
• use psychoacoustic model
• to minimize bits per critical band
  $C_{\mathrm{band}\,k}(j) = \mathrm{Round}[Q_{\mathrm{band}\,k}(j), \mathrm{Level}]$
• by appeal to “masking” models
  $Q_{\mathrm{band}\,k} = C_{\mathrm{band}\,k} + D_{\mathrm{band}\,k}$
  where the distortion (“perceptual noise”) $D_{\mathrm{band}\,k}$ between retained signal $C_{\mathrm{band}\,k}$ and actual signal $Q_{\mathrm{band}\,k}$ should be “masked” by the retained signal
[Figure: critical bands passed through lossy coding.]

Single Band
• E.g., look at kth band
  $Q_{\mathrm{band}\,k} = (Q_{\mathrm{band}\,k}(1), \ldots, Q_{\mathrm{band}\,k}(m))$, amplitudes represented as reals
• Determine masking paradigm
  - tone-masking-noise
  - noise-masking-tone
  - noise-masking-noise
• E.g., for tone masker, pick tone frequency, band k(j), at maximal amplitude in the band
• Choose quantization level and compute compressed signal
  $C_{\mathrm{band}\,k}(j) = \mathrm{Round}[Q_{\mathrm{band}\,k}(j), \mathrm{Level}]$
  $C_{\mathrm{band}\,k} = (0, \ldots, C_{\mathrm{band}\,k}(j), \ldots, 0)$
• Assess noise magnitude, $|D_{\mathrm{band}\,k}|$
  $D_{\mathrm{band}\,k} = (Q_{\mathrm{band}\,k}(1), \ldots, Q_{\mathrm{band}\,k}(j), \ldots, Q_{\mathrm{band}\,k}(m)) - (0, \ldots, C_{\mathrm{band}\,k}(j), \ldots, 0)$
• Use psychoacoustic model to determine whether compressed signal will mask the distortion noise for that band
[Figure: SPL versus frequency panels for the real input, the 1-bit and 2-bit signals, and the corresponding 1-bit and 2-bit |noise|, with the SMR marked for 1 bit and for 2 bits. More bits yields a larger signal-to-mask ratio.]
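
The single-band steps above can be sketched as follows for the tone-masker case. This is an illustration under assumptions: `Level` is interpreted here as a quantizer step size, the function names `code_band` and `magnitude` are hypothetical, and the sample amplitudes are made up. Deciding whether the retained tone actually masks the distortion needs a psychoacoustic model, which is not part of this sketch.

```python
def code_band(q_band, level):
    """Keep only the largest-magnitude component, rounded to a multiple of `level`.

    Returns (c_band, d_band) such that q_band = c_band + d_band elementwise.
    """
    j = max(range(len(q_band)), key=lambda i: abs(q_band[i]))
    c_band = [0.0] * len(q_band)
    c_band[j] = round(q_band[j] / level) * level
    d_band = [q - c for q, c in zip(q_band, c_band)]
    return c_band, d_band

def magnitude(v):
    """Euclidean norm, used here as the noise magnitude |D|."""
    return sum(x * x for x in v) ** 0.5

q_band = [0.3, 0.1, 5.2, 0.4]                       # made-up band amplitudes
c_coarse, d_coarse = code_band(q_band, level=4.0)   # coarse quantizer
c_fine, d_fine = code_band(q_band, level=1.0)       # finer quantizer
```

The finer quantizer leaves a smaller distortion vector than the coarse one, at the cost of more bits to describe the retained component.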

Overview of Perceptual Coding
• Goals
  - digitally represent a signal
  - minimum number of bits
  - “transparent” reproduction
    o most sensitive human cannot distinguish between original and generated signal
• Perceptual Entropy
  - using psychoacoustic model to estimate information content of audio signals
  - [J. D. Johnston. IEEE J. Sel. Ar. Comm., 6(2):314–323, 1988] suggested transparency achievable at 2 bits per sample, or 88 kbps @ 44.1 kHz
• Our present (WAV) bit count = 32 bits per sample
  - $T_0 \approx 1/44$ sec $\Rightarrow n_A \approx 1000 \Rightarrow$ bits per frame $\approx 32 \cdot 44$ kbps $\approx 1.4 \cdot 10^3$ kbps
  - $T_0 \approx 1/88$ sec $\Rightarrow n_A \approx 500 \Rightarrow$ bits per frame $\approx 16 \cdot 88$ kbps $\approx 1.4 \cdot 10^3$ kbps
• Our leverage?
[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]
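
The bit-rate figures above follow from bits per sample times samples per second. The short calculation below reproduces them; the helper name `bit_rate_kbps` is hypothetical, and the 32 bits per sample figure is taken from the slide as given.

```python
SAMPLE_RATE_HZ = 44_100

def bit_rate_kbps(bits_per_sample, sample_rate_hz=SAMPLE_RATE_HZ):
    """Bit rate in kbps = bits per sample * samples per second / 1000."""
    return bits_per_sample * sample_rate_hz / 1000.0

wav_kbps = bit_rate_kbps(32)       # about 1.4 * 10^3 kbps, the slide's WAV figure
target_kbps = bit_rate_kbps(2)     # about 88 kbps, the transparency estimate
reduction = wav_kbps / target_kbps # roughly a 16-fold reduction
```

The gap between the two rates is the budget the perceptual coder has to close.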

What are the “Knobs”?
• “Reservoir”
  - Are all frames equally full of audio information?
• Masking
  - How should we exploit the perceptual model?
• Local decoupling
  - Can we exploit (rough) independence of each band?
• Global accounting
  - How should we re-impose the average frame-rate?

MP3 Encoder Design Strategy
• Target: ~ 1032 bits per frame
  - ~ 2 bits per sample
  - ~ 512 samples per frame
• Use masker(signal)-masking-maskee(noise) paradigm
  - assume 1 masker “costs” ~ $2^K$ bits $\Rightarrow$ retained signal, $C$, should consist of ~ $2^{10-K}$ maskers on average
    o specified amplitudes
    o at specified frequencies
  - allocated to some subset of the ~ $32 = 2^5$ critical bands
• Rough algorithm for computing retained signal
  (i) frame bit-reservoir: supplies bits per each critical band
  (ii) commit to masker(s) within each band
  (iii) attempt to mask within-band distortion with available bits
  (iv) each band: give bits back to or take more from reservoir
  (v) iterate
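
Steps (i), (iv), and (v) of the rough algorithm above can be sketched as a give-back and take-more loop over a shared reservoir. This is a simplified illustration, not the MP3 allocation procedure: the per-band demands in `needed` are assumed to come from a psychoacoustic model that is not shown, and the function name `allocate_bits` is hypothetical. The frame budget of 2 × 512 = 1024 bits is the product of the two approximate per-frame figures on the slide.

```python
def allocate_bits(needed, frame_budget):
    """Split frame_budget bits across bands given per-band demands `needed`.

    Each band starts with an equal share and keeps only what it needs;
    the surplus forms a reservoir that is handed to bands still short,
    one pass at a time, until the reservoir or the demand runs out.
    """
    n = len(needed)
    share = frame_budget // n
    alloc = [min(share, need) for need in needed]
    reservoir = frame_budget - sum(alloc)
    while reservoir > 0:
        short = [b for b in range(n) if alloc[b] < needed[b]]
        if not short:
            break
        grant = max(1, reservoir // len(short))
        for b in short:
            give = min(grant, needed[b] - alloc[b], reservoir)
            alloc[b] += give
            reservoir -= give
    return alloc, reservoir

# Hypothetical demands for four bands sharing a 120-bit budget.
alloc, left = allocate_bits([10, 50, 20, 60], 120)

# Quiet frame: a 2 * 512 bit budget with little demand leaves bits in the reservoir.
quiet_alloc, quiet_left = allocate_bits([5, 5], 2 * 512)
```

In the first call the two undemanding bands return their surplus and the two demanding bands split it; in the second, most of the budget stays in the reservoir, which corresponds to the "are all frames equally full?" question on the previous slide.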

Emerging Picture
[Figure: audio quality versus bits retained, showing the effect of Knob #1, Knob #2, and Knobs #1 & #2 together.]

The Ultimate Boss
• Subjective Quality Scales
• International Standards
  (a) Absolute impairment
  (b) Differential grades
[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]

General Dimensions of Merit
• Bit Rate: bits per sample × samples per second
• Complexity: computational effort required to encode and decode
• Delay: time required to encode and decode

MP3 Perceptual Coding Algorithm
• Commit to Observation Window
  - “Long” frame (complex sound; frequency resolution)
  - “Short” frame (transient sound; temporal resolution)
• Estimate Perceptual Entropy
  - Analyze each critical band
  - Characterize masker
  - Estimate mask-to-noise threshold
  - Update “bit reservoir”
• Spectral Quantization/Coding Loop
  - Allocate bits per critical band
  - Quantize band to bits-allowed levels
  - Run Huffman & count actual bits
[Raissi. Technical report, MP3’ Tech, December 2002]

Resolution: Time vs. Frequency
• Example: two sample plots in time-frequency plane
  - Masking thresholds for (a) castanets (b) piccolo
  - Recall masking threshold definition:
    o lower volume signals
    o in relation to specified critical band (simultaneous; just before; soon after)
    o are inaudible
• Affects choice of observation window (“frame”)
  (a) Castanets: prefer ~ 10 ms time resolution
      - implies blurrier frequency resolution
  (b) Piccolo: prefer ~ 2 Critical-band frequency resolution
      - implies blurrier temporal resolution
[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]
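
The trade-off above can be made concrete with DFT arithmetic: a frame of N samples lasts N / f_s seconds and resolves frequencies f_s / N apart. The frame lengths below are hypothetical round numbers chosen to hit ~10 ms, not the actual MP3 block sizes, and the helper name `frame_resolution` is an assumption.

```python
def frame_resolution(n_samples, sample_rate_hz=44_100):
    """Return (frame duration in ms, DFT bin spacing in Hz) for one frame."""
    duration_ms = 1000.0 * n_samples / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / n_samples
    return duration_ms, bin_spacing_hz

short_frame = frame_resolution(441)    # 10 ms frame, bins 100 Hz apart
long_frame = frame_resolution(4410)    # 100 ms frame, bins 10 Hz apart
```

Shortening the frame by a factor of ten sharpens the time resolution by that factor and coarsens the frequency resolution by the same factor, which is why a transient like castanets and a sustained tone like a piccolo call for different windows.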

Use of Perceptual Entropy
• Transform Coding
  - Trade vector quantization for scalar quantization
  - by transforming to frequency domain (decoupled)
• Critical Band Analysis
  - Compute spectral power in each band, $P_f = |S_f|^2$
  - Determine masker for each band via SFM (spectral flatness measure)
  - Determine JND (“just noticeable distortion”) threshold for each band
• Bit Assignment
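
The SFM step in the critical band analysis above can be sketched as the ratio of the geometric mean to the arithmetic mean of the band powers, expressed in dB. Mapping the result to a tonality coefficient with a −60 dB reference is the value commonly quoted for Johnston-style coders and is treated here as an assumption; the function names and sample spectra are hypothetical.

```python
import math

def spectral_flatness_db(power):
    """SFM in dB: 10 * log10(geometric mean / arithmetic mean).

    All entries of `power` must be strictly positive.
    """
    n = len(power)
    log_gm = sum(math.log10(p) for p in power) / n
    am = sum(power) / n
    return 10.0 * (log_gm - math.log10(am))

def tonality(sfm_db, sfm_db_max=-60.0):
    """Coefficient in [0, 1]: near 1 is tone-like, near 0 is noise-like.

    The -60 dB reference is an assumed constant, not taken from the slides.
    """
    return min(sfm_db / sfm_db_max, 1.0)

flat = [1.0, 1.0, 1.0, 1.0]        # noise-like band: equal power everywhere
peaky = [1e-6, 1e-6, 1.0, 1e-6]    # tone-like band: one dominant component

flat_alpha = tonality(spectral_flatness_db(flat))
peaky_alpha = tonality(spectral_flatness_db(peaky))
```

A flat band scores 0 dB and is classed as noise-like, while a band dominated by one component scores strongly negative and is classed as tone-like, which selects between the tone-masking-noise and noise-masking-tone paradigms.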

Spectral Quantization/Coding Loop
• Inner loop
  - Global gain – overall bit rate control
  - Shared across all spectral values
  - Larger value
    o increased quantizer step size
    o increased quantization noise (“distortion”)
• Outer loop
  - Adjusts scale factors – reallocating bits to bands
  - Affects only the spectral values within a critical band
  - Ensures the masker for that band will mask the distortion
    o Larger mask signal relative to threshold implies fewer bits needed
    o Smaller mask relative to threshold implies more bits needed
• Loop termination
  - when bit rate constraint is satisfied with no audible distortion
  - or after a set number of iterations (with excessive bits spent)
[Figure: SPL (dB) versus frequency (Hz) across critical bands k−1, k, and k+1, contrasting a larger and a smaller mask-to-noise ratio.]
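
The two nested loops above can be sketched as follows. This is a toy model under stated assumptions, not the MP3 reference encoder: the bit cost is a crude magnitude-plus-sign count standing in for Huffman coding, the global gain is represented directly as a step size, the per-band `allowed` distortion is assumed to come from a psychoacoustic model that is not shown, and all function names are hypothetical.

```python
def quantize(bands, scale, step):
    """Quantize each value after band-specific scaling with a shared step size."""
    return [[round(x * scale[b] / step) for x in band]
            for b, band in enumerate(bands)]

def bit_count(quantized):
    """Crude cost model standing in for Huffman coding: magnitude bits + sign bit."""
    return sum(abs(v).bit_length() + 1 for band in quantized for v in band)

def distortion(bands, quantized, scale, step):
    """Per-band squared error between the input and its dequantized value."""
    return [sum((x - v * step / scale[b]) ** 2 for x, v in zip(band, qb))
            for b, (band, qb) in enumerate(zip(bands, quantized))]

def inner_loop(bands, scale, budget, step=1.0):
    """Rate control: raise the shared step size until the frame fits the budget."""
    if budget < sum(len(band) for band in bands):
        raise ValueError("budget below one bit per coefficient in this cost model")
    while bit_count(quantize(bands, scale, step)) > budget:
        step *= 2 ** 0.25
    return step

def coding_loop(bands, allowed, budget, max_iter=16):
    """Distortion control: boost scale factors of bands whose noise is too loud.

    Returns (quantized, scale, step, masked) for the last state evaluated;
    `masked` is False if the iteration limit was reached first.
    """
    scale = [1.0] * len(bands)
    for _ in range(max_iter):
        step = inner_loop(bands, scale, budget)
        quantized = quantize(bands, scale, step)
        noise = distortion(bands, quantized, scale, step)
        loud = [b for b in range(len(bands)) if noise[b] > allowed[b]]
        result = (quantized, list(scale), step, not loud)
        if not loud:
            break
        for b in loud:
            scale[b] *= 2 ** 0.5   # finer effective step for this band only
    return result

bands = [[8.0, 0.5], [3.0, 2.5]]   # made-up spectral values in two bands
quantized, scale, step, masked = coding_loop(bands, allowed=[1.0, 1.0], budget=10)
```

Raising the shared step size lowers the bit count for every band at once, while boosting one band's scale factor spends more of the budget on that band alone, so the two loops pull against each other until both constraints hold or the iteration limit is reached.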

Typical Quantization Control
[Figure: quantization control diagram. The input masker amplitude per critical band index (frequency) passes through the inner loop (bit rate control) and the outer loop (band-specific distortion control) to give the quantized output amplitude. Two panels plot output amplitude [quantized SPL] against input amplitude [real SPL], one for GlobalGain = 220, ScaleFactor = 2 and one for GlobalGain = 230, ScaleFactor = 2.]
[You & Chen. Multim. Tools & Appl., 40(3):341–359, 2008]

To Probe Further
• Tutorials on Psychoacoustic Coding (in increasing order of abstraction and generality)
  - D. Pan. A tutorial on MPEG/audio compression. IEEE Multimedia, 2(2):60–74, 1995.
  - N. Jayant, J. Johnston, and R. Safranek. Signal compression based on models of human perception. Proceedings of the IEEE, 81(10):1385–1422, 1993.
  - V. K. Goyal. Theoretical foundations of transform coding. IEEE Signal Processing Magazine, 18(5):9–21, 2001.
• Lightweight Overview of MP3
  - Rassol Raissi. The theory behind mp3. Technical report, MP3’ Tech, December 2002.
• Scientific Basis of MP3 Coding Standard
  - J. D. Johnston. Transform coding of audio signals using perceptual noise criteria. IEEE Journal on Selected Areas in Communications, 6(2):314–323, 1988.