The evolution of the FFmpeg AAC encoder and the advanced coding techniques available in AAC

Rostislav Pehlivanov [email protected]

2015-09-19

Structure

• Windows (psy system)

• Scalefactor bands (coder)

• Band type

• Scalefactor index

• Spectral coefficients (MDCT)

Coding tools available in AAC

Not shown: PNS (MPEG4), AAC-LTP (MPEG4), AAC-HE, AAC-HE v2 (AAC-HE + parametric stereo).

Encoding order

PNS

• Perceptual Noise Substitution

• Operates on scalefactor bands

• Unzeroes zeroed bands

• Replaces non-zeroed bands

• Saves bits by not having to encode the spectral coefficients

Original Spectrum

No PNS

PNS spectrum

PNS decision making

• Only consider SFBs starting at over 4500 Hz

• Energy vs psychoacoustic threshold

• SFB Energy spread

• Energy quantization error

TNS (filter)

• Operates on a single window

• LPC + Quantization

• Multitap FIR filter

• Can be slid in any direction

• A single window can have up to 4 filters

• Attempts to mask quantization artifacts

TNS (LPC)

Get the envelope of the spectrum using LPC:

The resulting LPC coefficients are converted to reflection coefficients which are then quantized and written to the bitstream.
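One common way to obtain reflection coefficients is the Levinson-Durbin recursion, which produces them as a by-product of solving for the LPC coefficients. The sketch below assumes the autocorrelation `r[0..order]` of the spectral coefficients has already been computed; the name `reflection_from_autocorr`, the `MAX_ORDER` limit and the sign convention are assumptions for illustration, not taken from the encoder.

```c
#define MAX_ORDER 20 /* assumed upper bound on the filter order */

/* Hypothetical helper: Levinson-Durbin recursion.
 * r: autocorrelation values r[0..order]
 * k: output, `order` reflection coefficients
 * Returns 0 on success, -1 on invalid input. */
static int reflection_from_autocorr(const double *r, int order, double *k)
{
    double a[MAX_ORDER + 1] = { 0 }; /* predictor coefficients so far */
    double tmp[MAX_ORDER + 1];
    double err = r[0];               /* prediction error energy */

    if (order < 1 || order > MAX_ORDER || err <= 0.0)
        return -1;

    for (int m = 1; m <= order; m++) {
        double acc = r[m];
        for (int j = 1; j < m; j++)
            acc -= a[j] * r[m - j];
        k[m - 1] = acc / err;

        /* Update the predictor from order m-1 to order m */
        for (int j = 1; j < m; j++)
            tmp[j] = a[j] - k[m - 1] * a[m - j];
        for (int j = 1; j < m; j++)
            a[j] = tmp[j];
        a[m] = k[m - 1];

        err *= 1.0 - k[m - 1] * k[m - 1];
        if (err <= 0.0) {
            /* Signal is perfectly predicted: remaining coefficients are zero */
            for (int j = m; j < order; j++)
                k[j] = 0.0;
            break;
        }
    }
    return 0;
}
```

For a well-behaved autocorrelation sequence each `k[i]` lies in (-1, 1), which is what makes the arcsine-based quantization of the coefficients possible.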

/* Scale factors for positive (Cp) and negative (Cn) coefficients */
float Cp = ((1 << (coef_res-1)) - 0.5)/(M_PI/2.0);
float Cn = ((1 << (coef_res-1)) + 0.5)/(M_PI/2.0);

/* Reflection coefficient quantization */
for (int i = 0; i < order; i++)
    idx[i] = roundf(asin(c[i])*((c[i] >= 0) ? Cp : Cn));

/* Inverse quantization */
for (int i = 0; i < order; i++)
    c[i] = sin(idx[i]/((idx[i] >= 0) ? Cp : Cn));

/* Conversion to unsigned: keep only the bits that are transmitted */
for (int i = 0; i < order; i++)
    idx[i] = idx[i] & ~(~0U << (coef_size - compression));

static const INTFLOAT tns_tmp2_map_1_4[8] = {
    Q31( 0.00000000f), Q31(-0.20791170f), Q31(-0.40673664f),
    Q31(-0.58778524f), Q31( 0.67369562f), Q31( 0.52643216f),
    Q31( 0.36124167f), Q31( 0.18374951f),
};

static const INTFLOAT tns_tmp2_map_0_4[16] = {
    Q31( 0.00000000f), Q31(-0.20791170f), Q31(-0.40673664f),
    Q31(-0.58778524f), Q31(-0.74314481f), Q31(-0.86602539f),
    Q31(-0.95105654f), Q31(-0.99452192f), Q31( 0.99573416f),
    Q31( 0.96182561f), Q31( 0.89516330f), Q31( 0.79801720f),
    Q31( 0.67369562f), Q31( 0.52643216f), Q31( 0.36124167f),
    Q31( 0.18374951f),
};

TNS (bitstream)

• Quantization

• Filter direction

• Coefficient bitsize

• Coefficient compression

• A single window can have up to 4 filters

• Attempts to mask quantization artifacts

Intensity Stereo

• Encodes similar scalefactor bands

• Mixes spectral coefficients of both channels

• Encodes scaling as a scalefactor and phase as band type

• Decoder reconstructs coefficients

NB: M/S coding can be indicated but phase needs to be flipped

Intensity stereo energy calculations

$$\mathrm{ener}(sfb) = \sum_{i=0}^{\mathrm{coef}} \mathrm{coef}[i]^2 \tag{1}$$

(This is actually the energy provided by the psychoacoustic model)

$$\mathrm{enersum}(sfb) = \sum_{i=0}^{\mathrm{coef}} \left(|\mathrm{coef}_0[i]| + |\mathrm{coef}_1[i]|\right)^2 \tag{2}$$

(Specifications unclear)

Intensity Stereo phase determination

Each spectral coefficient of each channel can have a different phase, but you can only indicate a single phase for all coefficients of a single channel. What do you do?

Measure phase majority?

• Fast

• Inaccurate

• Sounds horrible with M/S switched on

Intensity Stereo phase determination

Measure distortion for both phases?

• Slow, need to repeat most calculations twice

• As accurate as it can be

• Invariant with respect to M/S coding

Intensity Stereo distortion measurements

Using the precalculated energy values and a given phase,

• Create IS coefficients

• Compare the cost of left + right channels vs the IS

• Flag IS if the IS distortion is less

Main prediction

• Indicate profile is AAC-Main

• Operates on a single scalefactor band

• Predicts

NB: AAC-Main profile doesn’t require any prediction to be flagged, but it can be set to extend TNS filter order to 20.

Main prediction operation

• Create predicted coefficients from previous frame

• Measure distortion and bit cost for every SFB

• Feed all coefficients into the prediction formulae

• Replace any flagged coefficients with the prediction error

• Calculate coefficients for the next frame and store them

• Verify all predictor groups have had a reset in 240 frames

Mid-Side coding

• Operates on SFBs

• Tries to reduce coefficient difference during encoding

• Stores sum and difference of coefficients in both channels

• Flagging works the same way IS band flagging does
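The sum and difference transform for one band can be written as below. The factor of 0.5 on the encoder side follows the usual AAC convention, so that the decoder recovers the channels with a plain sum and difference; the function names `ms_encode` and `ms_decode` are made up for this sketch.

```c
/* Hypothetical helpers: Mid/Side transform of one scalefactor band. */
static void ms_encode(const float *L, const float *R,
                      float *M, float *S, int n)
{
    for (int i = 0; i < n; i++) {
        M[i] = (L[i] + R[i]) * 0.5f; /* mid: average of both channels */
        S[i] = (L[i] - R[i]) * 0.5f; /* side: half the difference */
    }
}

static void ms_decode(const float *M, const float *S,
                      float *L, float *R, int n)
{
    for (int i = 0; i < n; i++) {
        L[i] = M[i] + S[i];
        R[i] = M[i] - S[i];
    }
}
```

When the two channels are nearly identical, the side signal is close to zero and costs few bits to code.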

Pulses

• Operates on exactly four coefficients within a single SFB

• Able to reduce their amplitude by an integer amount

• ...no one actually uses it

Long term prediction

• Requires signalling the AAC-LTP profile, but, as with Main prediction, using the tool itself is optional

• Is really a quick and dirty hacked-on extension

Long term prediction operation

• Get current raw time-domain samples

• Feed them into an autocorrelation algorithm

• Get a lag value and an amplitude value

• Predict the current coefficients

• MDCT the current coefficients

• Use the previous coefficients as an overlap

• Compare the coefficients with the current spectral coeffs

• Flag whichever SFBs take less to encode
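The first three steps above (correlation search, lag and amplitude) might look roughly like the following. This is a simplified sketch: it restricts the lag to at least one block length so that only past samples are needed, works on unquantized values, and the name `ltp_search` and its buffer layout are assumptions rather than the encoder's actual interface.

```c
/* Hypothetical helper: find the lag and gain that best predict the current
 * block from past samples.
 * hist: hist_len past samples, oldest first
 * cur:  n current time-domain samples
 * gain: output, prediction amplitude for the chosen lag
 * Returns the best lag in [n, hist_len], or -1 if none was usable. */
static int ltp_search(const double *hist, int hist_len,
                      const double *cur, int n, double *gain)
{
    int best_lag = -1;
    double best_score = -1.0;

    *gain = 0.0;
    for (int lag = n; lag <= hist_len; lag++) {
        const double *ref = hist + hist_len - lag; /* candidate segment */
        double corr = 0.0, ener = 0.0;

        for (int i = 0; i < n; i++) {
            corr += cur[i] * ref[i];
            ener += ref[i] * ref[i];
        }
        if (ener <= 0.0)
            continue;

        /* Normalized correlation: energy removed by the best-fit gain */
        const double score = corr * corr / ener;
        if (score > best_score) {
            best_score = score;
            best_lag   = lag;
            *gain      = corr / ener;
        }
    }
    return best_lag;
}
```

The predicted block is then `gain * hist[hist_len - lag + i]`, which would be passed through the MDCT and compared against the real spectrum band by band.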

AAC-HE

AAC-HE v2 (Parametric Stereo)

The End

Coffee break starts now
