The evolution of the FFmpeg AAC encoder and the advanced coding techniques available in AAC
Rostislav [email protected]
2015-09-19
Structure
• Windows (psy system)
• Scalefactor bands (coder)
  • Band type
  • Scalefactor index
• Spectral coefficients (MDCT)
Coding tools available in AAC
Not shown: PNS (MPEG4), AAC-LTP (MPEG4), AAC-HE, AAC-HE v2 (AAC-HE + parametric stereo).
Encoding order
PNS
• Perceptual Noise Substitution
• Operates on scalefactor bands
• Unzeroes zeroed bands
• Replaces non-zeroed bands
• Saves bits by not having to encode the spectral coefficients
[Figures: original spectrum; spectrum without PNS; spectrum with PNS]
PNS decision making
• Only consider SFBs starting at over 4500 Hz
• Energy vs psychoacoustic threshold
• SFB Energy spread
• Energy quantization error
TNS (filter)
• Operates on a single window
• LPC + Quantization
• Multitap FIR filter
• Can be slid in any direction
• A single window can have up to 4 filters
• Attempts to mask quantization artifacts
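Because the filter runs along the frequency axis of one window, the sliding direction is simply the order in which the coefficients are visited. The following is a minimal sketch that assumes the LPC coefficients are already known; the function name `tns_fir_filter`, its parameters, and the sign convention (adding the weighted neighbours) are assumptions for illustration.

```c
/*
 * Apply a multitap FIR filter across the spectral coefficients of one window.
 * in/out : coefficient range the filter covers (size entries, separate buffers)
 * lpc    : filter taps, lpc[0] weights the nearest already-visited neighbour
 * backward: 0 = slide from low to high frequency, 1 = high to low
 */
static void tns_fir_filter(const float *in, float *out, int size,
                           const float *lpc, int order, int backward)
{
    int pos = backward ? size - 1 : 0;
    int inc = backward ? -1 : 1;

    for (int m = 0; m < size; m++, pos += inc) {
        /* Only taps that reach coefficients inside the range are used. */
        int taps = m < order ? m : order;
        float acc = in[pos];

        for (int i = 1; i <= taps; i++)
            acc += lpc[i - 1] * in[pos - i * inc];
        out[pos] = acc;
    }
}
```

With a single tap of 0.5 and the input {1, 2, 3}, the forward direction yields {1, 2.5, 4} and the backward direction yields {2, 3.5, 3}.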
TNS (LPC)
Get the envelope of the spectrum using LPC:
The resulting LPC coefficients are converted to reflection coefficients, which are then quantized and written to the bitstream.
/* Scale factors for positive and negative coefficients (coef_res bits) */
float Cp = ((1 << (coef_res-1)) - 0.5)/(M_PI/2.0);
float Cn = ((1 << (coef_res-1)) + 0.5)/(M_PI/2.0);
/* Reflection coefficient quantization */
for (int i = 0; i < order; i++)
    idx[i] = roundf(asin(c[i])*((c[i] >= 0) ? Cp : Cn));
/* Inverse quantization */
for (int i = 0; i < order; i++)
    c[i] = sin(idx[i]/((idx[i] >= 0) ? Cp : Cn));
/* Conversion to unsigned: keep only the low (coef_size - compression) bits */
for (int i = 0; i < order; i++)
    idx[i] = idx[i] & ~(~0U << (coef_size - compression));
static const INTFLOAT tns_tmp2_map_1_4[8] = {
    Q31( 0.00000000f), Q31(-0.20791170f), Q31(-0.40673664f),
    Q31(-0.58778524f), Q31( 0.67369562f), Q31( 0.52643216f),
    Q31( 0.36124167f), Q31( 0.18374951f),
};
static const INTFLOAT tns_tmp2_map_0_4[16] = {
    Q31( 0.00000000f), Q31(-0.20791170f), Q31(-0.40673664f),
    Q31(-0.58778524f), Q31(-0.74314481f), Q31(-0.86602539f),
    Q31(-0.95105654f), Q31(-0.99452192f), Q31( 0.99573416f),
    Q31( 0.96182561f), Q31( 0.89516330f), Q31( 0.79801720f),
    Q31( 0.67369562f), Q31( 0.52643216f), Q31( 0.36124167f),
    Q31( 0.18374951f),
};
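The slide does not spell out the LPC step itself. One common way to obtain reflection coefficients directly is the Levinson-Durbin recursion over the autocorrelation of the spectral coefficients; the sketch below assumes that approach. The function name `lpc_reflection`, the `TNS_MAX_ORDER` constant, and the sign convention are illustrative and not taken from FFmpeg.

```c
#include <assert.h>

#define TNS_MAX_ORDER 20 /* assumed upper bound on the filter order */

/*
 * Levinson-Durbin recursion.
 * r     : autocorrelation values r[0..order]
 * refl  : receives order reflection coefficients
 * Returns the remaining prediction error energy.
 * Convention (assumed): predictor polynomial A(z) = 1 + a[1]z^-1 + ...
 */
static double lpc_reflection(const double *r, int order, double *refl)
{
    double a[TNS_MAX_ORDER + 1] = { 1.0 };
    double tmp[TNS_MAX_ORDER + 1];
    double err = r[0];

    assert(order >= 1 && order <= TNS_MAX_ORDER);
    for (int i = 1; i <= order; i++) {
        double acc = r[i];
        double k;

        if (err <= 0.0) {
            /* Silent or degenerate input: no further prediction possible. */
            for (int j = i; j <= order; j++)
                refl[j - 1] = 0.0;
            break;
        }
        for (int j = 1; j < i; j++)
            acc += a[j] * r[i - j];
        k = -acc / err;
        refl[i - 1] = k;
        /* Update the direct-form coefficients for the next order. */
        for (int j = 1; j < i; j++)
            tmp[j] = a[j] + k * a[i - j];
        for (int j = 1; j < i; j++)
            a[j] = tmp[j];
        a[i] = k;
        err *= 1.0 - k * k;
    }
    return err;
}
```

For a stable filter every reflection coefficient lies in (-1, 1), which is what makes the asin-based quantization shown above applicable.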
TNS (bitstream)
• Quantization
• Filter direction
• Coefficient bitsize
• Coefficient compression
• A single window can have up to 4 filters
• Attempts to mask quantization artifacts
Intensity Stereo
• Encodes similar scalefactor bands
• Mixes spectral coefficients of both channels
• Encodes scaling as a scalefactor and phase as band type
• Decoder reconstructs coefficients
NB: M/S coding can be indicated but phase needs to be flipped
Intensity stereo energy calculations
$$\mathrm{ener}(sfb) = \sum_{i=0}^{\mathrm{coef}} \mathrm{coef}[i]^2 \tag{1}$$
(This is actually the energy provided by the psychoacoustic model)
$$\mathrm{enersum}(sfb) = \sum_{i=0}^{\mathrm{coef}} \left(|\mathrm{coef0}[i]| + |\mathrm{coef1}[i]|\right)^2 \tag{2}$$
(Specifications unclear)
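The two sums above translate directly into loops over the coefficients of one scalefactor band. This is a minimal sketch; the function names and the `len` parameter (number of coefficients in the band) are hypothetical.

```c
#include <math.h>

/* Equation (1): energy of one channel's coefficients in a band. */
static float band_energy(const float *coef, int len)
{
    float ener = 0.0f;

    for (int i = 0; i < len; i++)
        ener += coef[i] * coef[i];
    return ener;
}

/* Equation (2): energy of the summed magnitudes of both channels. */
static float band_energy_sum(const float *coef0, const float *coef1, int len)
{
    float ener = 0.0f;

    for (int i = 0; i < len; i++) {
        float sum = fabsf(coef0[i]) + fabsf(coef1[i]);
        ener += sum * sum;
    }
    return ener;
}
```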
Intensity Stereo phase determination
Each spectral coefficient of each channel can have a different phase, but you can only indicate a single phase for all coefficients of a single channel. What do you do?
Measure phase majority?
• Fast
• Inaccurate
• Sounds horrible with M/S switched on
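A phase majority vote can be as small as counting sign agreements between the two channels. The sketch below assumes "phase" means the relative sign of the left and right coefficients; the function name is hypothetical.

```c
/*
 * Majority vote over one scalefactor band.
 * Returns +1 if most coefficient pairs share a sign (in phase),
 * -1 if most pairs have opposite signs (out of phase).
 */
static int is_phase_majority(const float *left, const float *right, int len)
{
    int votes = 0;

    for (int i = 0; i < len; i++)
        votes += (left[i] * right[i] >= 0.0f) ? 1 : -1;
    return votes >= 0 ? 1 : -1;
}
```

Every pair gets one vote regardless of its magnitude, so a few large out-of-phase coefficients can be outvoted by many tiny in-phase ones, which is one way the inaccuracy noted above can arise.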
Intensity Stereo phase determination
Measure distortion for both phases?
• Slow, need to repeat most calculations twice
• As accurate as it can be
• Invariant with respect to M/S coding
Intensity Stereo distortion measurements
Using the precalculated energy values and a given phase,
• Create IS coefficients
• Compare the cost of left + right channels vs the IS
• Flag IS if the IS distortion is less
Main prediction
• Indicate profile is AAC-Main
• Operates on a single scalefactor band
• Predicts
NB: AAC-Main profile doesn’t require any prediction to be flagged, but it can be set to extend TNS filter order to 20.
Main prediction operation
• Create predicted coefficients from previous frame
• Measure distortion and bit cost for every SFB
• Feed all coefficients into the prediction formulae
• Replace any flagged coefficients with the prediction error
• Calculate coefficients for the next frame and store them
• Verify all predictor groups have had a reset in 240 frames
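The reset bookkeeping in the last step can be sketched as a round-robin schedule. The sketch assumes 30 predictor reset groups and resets the oldest group every 240/30 = 8 frames, so no group goes longer than 240 frames without a reset; the struct, the function name, and the schedule itself are hypothetical rather than the encoder's actual logic.

```c
#define PRED_GROUPS         30  /* assumed number of predictor reset groups */
#define PRED_RESET_LIMIT    240 /* limit from the list above, in frames */
#define PRED_RESET_INTERVAL (PRED_RESET_LIMIT / PRED_GROUPS)

typedef struct PredResetState {
    unsigned frame;        /* frames processed so far */
    int age[PRED_GROUPS];  /* frames since each group was last reset */
} PredResetState;

/*
 * Call once per frame. Returns the 1-based group to reset in this frame,
 * or 0 when no reset is scheduled.
 */
static int pred_reset_tick(PredResetState *s)
{
    int oldest = 0;

    for (int g = 0; g < PRED_GROUPS; g++) {
        s->age[g]++;
        if (s->age[g] > s->age[oldest])
            oldest = g;
    }
    if (++s->frame % PRED_RESET_INTERVAL)
        return 0;
    s->age[oldest] = 0;
    return oldest + 1;
}
```

Starting from a zero-initialised state, the first reset is issued on the eighth frame for group 1, and the groups then take turns.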
Mid-Side coding
• Operates on SFBs
• Tries to reduce coefficient difference during encoding
• Stores sum and difference of coefficients in both channels
• Flagging works the same way IS band flagging does
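The sum and difference transform is small enough to show in full. The sketch below assumes the usual convention in which the encoder halves both values so that the decoder only has to add and subtract; the function names are hypothetical.

```c
/* Encoder side: mid = (L + R) / 2, side = (L - R) / 2 for one band. */
static void ms_encode(const float *left, const float *right,
                      float *mid, float *side, int len)
{
    for (int i = 0; i < len; i++) {
        mid[i]  = (left[i] + right[i]) * 0.5f;
        side[i] = (left[i] - right[i]) * 0.5f;
    }
}

/* Decoder side: L = mid + side, R = mid - side. */
static void ms_decode(const float *mid, const float *side,
                      float *left, float *right, int len)
{
    for (int i = 0; i < len; i++) {
        left[i]  = mid[i] + side[i];
        right[i] = mid[i] - side[i];
    }
}
```

When the two channels are nearly identical the side coefficients are close to zero, which is where the saving comes from.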
Pulses
• Operates on up to four coefficients within a single SFB
• Able to reduce their amplitude by an integer amount
• ...no one actually uses it
Long term prediction
• Operates on SFBs
• Requires the AAC-LTP profile, but again the profile doesn’t require using it
• Is really a quick and dirty hacked-on extension
Long term prediction operation
• Get current raw time dependent samples
• Feed them into an autocorrelation algorithm
• Get a lag value and an amplitude value
• Predict the current coefficients
• MDCT the current coefficients
• Use the previous coefficients as an overlap
• Compare the coefficients with the current spectral coeffs
• Flag whichever SFBs take less to encode
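The lag and amplitude search in the first steps can be sketched as a normalized cross-correlation between the current samples and the sample history. This is a simplified illustration: it only searches lags that keep the whole prediction inside the history buffer, and the function name, parameters, and scoring rule are assumptions.

```c
#include <assert.h>

/*
 * hist : hist_len past time-domain samples, oldest first
 * cur  : len samples of the current frame
 * gain : receives the amplitude for the chosen lag
 * Returns the best lag in [len, hist_len], or 0 if nothing correlates.
 */
static int ltp_find_lag(const float *hist, int hist_len,
                        const float *cur, int len, float *gain)
{
    int best_lag = 0;
    float best_score = 0.0f;

    assert(len > 0 && hist_len >= len);
    *gain = 0.0f;
    for (int lag = len; lag <= hist_len; lag++) {
        const float *past = hist + hist_len - lag; /* samples lag steps back */
        float corr = 0.0f, energy = 0.0f, score;

        for (int i = 0; i < len; i++) {
            corr   += cur[i] * past[i];
            energy += past[i] * past[i];
        }
        if (corr <= 0.0f || energy <= 0.0f)
            continue;
        /* Energy removed from the residual by the best gain at this lag. */
        score = corr * corr / energy;
        if (score > best_score) {
            best_score = score;
            best_lag   = lag;
            *gain      = corr / energy;
        }
    }
    return best_lag;
}
```

The predicted signal for the chosen lag is then `gain * past[i]`, which would be transformed with the MDCT and compared per SFB against the actual spectral coefficients as the remaining steps describe.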
AAC-HE
AAC-HE v2 (Parametric Stereo)
The End
Coffee break starts now