Coding Efficiency and Quality Improvement for MPEG Surround Encoding

Jun. 2, 2008
Student: Shang-Yu Yeh (葉尚諭)
Advisor: Dr. Hsueh-Ming Hang (杭學鳴)


My Work
  • Design MPEG Surround encoding algorithms
    • Subset coding mode
    • Parameter band stride
    • Parameter sets
    • Adaptive smoothing
  • Implementation in the reference software

Outline
  • MPEG Surround Introduction
    • Spatial Hearing
    • MPEG Surround Encoder
    • MPEG Surround Decoder
  • Proposed Procedures and Experimental Results
  • Conclusion and Future Work
  • Demo

Spatial Hearing
  • Describes how humans locate sound sources in the horizontal plane
    • Interaural Level Difference (ILD)
    • Interaural Time Difference (ITD)
    • Interaural Coherence (IC)
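The three cues can be illustrated with a minimal sketch. The function name `interaural_cues`, the sign convention (positive ITD means the left signal lags), and the use of the cross-correlation peak are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def interaural_cues(left, right, fs):
    """Estimate ILD (dB), ITD (seconds) and IC from a pair of ear signals."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    e_left = np.sum(left ** 2)
    e_right = np.sum(right ** 2)
    # ILD: energy ratio between the two ears, in dB.
    ild_db = 10.0 * np.log10(e_left / e_right)
    # Full cross-correlation; the first output sample is lag -(len(right) - 1).
    xcorr = np.correlate(left, right, mode="full")
    lags = np.arange(-(len(right) - 1), len(left))
    peak = int(np.argmax(np.abs(xcorr)))
    # ITD: lag of the correlation peak (positive: left lags right).
    itd_s = lags[peak] / fs
    # IC: normalized height of the correlation peak.
    ic = np.abs(xcorr[peak]) / np.sqrt(e_left * e_right)
    return ild_db, itd_s, ic

# Example: the left signal is a quieter copy of the right one, delayed by 3 samples.
rng = np.random.default_rng(0)
base = rng.standard_normal(4099)
right_sig = base[3:]
left_sig = 0.5 * base[:-3]
ild, itd, ic = interaural_cues(left_sig, right_sig, fs=48000)
```

In this example the ILD is close to -6 dB, the ITD equals 3 samples, and the IC is close to 1 because the two signals are fully coherent.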

MPEG Surround
  • Low-bitrate parametric coding technology for multi-channel audio signals
  • Backward compatible with stereo equipment
  • Standardization
    • Call for Proposals (CfP) on Spatial Audio Coding (SAC) in March 2004
    • Finalized in July 2006 (ISO/IEC 23003-1)

MPEG Surround Encoder
  • Captures the spatial image of the multi-channel audio
  • Generates a mono/stereo downmix
  • The downmix is coded by a conventional audio encoder (e.g., MP3, AAC)

MPEG Surround Decoder
  • Synthesizes the multi-channel output signal
  • Backward compatibility

Downmix and Parameter Extraction
  • Two elementary blocks are used to construct hierarchical structures
    • R-OTT box (Reverse One-To-Two box): 2-channel input, 1-channel output, plus spatial parameters
    • R-TTT box (Reverse Two-To-Three box): 3-channel input, 2-channel output, plus spatial parameters

Parameter Sets and Bands
  • Parameter sets: grouping of time slots
  • Parameter bands: grouping of subbands
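The two groupings can be sketched as a summation over a time slot by subband grid. The 32 time slots and 71 bands follow the speaker notes; the border values and the function name `group_energies` are hypothetical and chosen only for illustration.

```python
import numpy as np

def group_energies(energy, slot_borders, band_borders):
    """Sum a (time slot x subband) energy grid into (parameter set x parameter band) tiles.

    slot_borders and band_borders list the start index of every group followed
    by the total length, e.g. [0, 16, 32] describes two parameter sets.
    """
    energy = np.asarray(energy, dtype=float)
    # Group subbands into parameter bands (columns) ...
    per_band = np.add.reduceat(energy, band_borders[:-1], axis=1)
    # ... then group time slots into parameter sets (rows).
    return np.add.reduceat(per_band, slot_borders[:-1], axis=0)

# Hypothetical example: 32 time slots, 71 bands, 2 parameter sets,
# and 7 non-uniform parameter bands (narrow at low frequencies).
energy = np.ones((32, 71))
tiles = group_energies(energy, [0, 16, 32], [0, 1, 2, 4, 8, 16, 32, 71])
```

Each tile then holds the energy from which one parameter (per parameter set and parameter band) would be extracted.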

R-OTT Box
  • Creates a mono downmix from a stereo input
  • Extracts the relevant spatial parameters
    • Channel Level Differences (CLD)
    • Inter-Channel Coherence (ICC)
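The equations on this slide were not preserved. The sketch below uses the commonly stated definitions (CLD as the energy ratio in dB, ICC as the real part of the normalized cross-correlation) and should be read as an assumption rather than as the slide's own formulas. The plain sum used for the downmix and the name `r_ott_parameters` are also illustrative.

```python
import numpy as np

def r_ott_parameters(x1, x2, eps=1e-12):
    """Return (mono downmix, CLD in dB, ICC) for one parameter band of a channel pair."""
    x1 = np.asarray(x1, dtype=complex)
    x2 = np.asarray(x2, dtype=complex)
    e1 = np.sum(np.abs(x1) ** 2) + eps   # energy of channel 1
    e2 = np.sum(np.abs(x2) ** 2) + eps   # energy of channel 2
    cld_db = 10.0 * np.log10(e1 / e2)
    icc = float(np.real(np.sum(x1 * np.conj(x2))) / np.sqrt(e1 * e2))
    downmix = x1 + x2                    # assumption: unscaled sum
    return downmix, float(cld_db), icc

# Channel 2 is channel 1 at half amplitude: CLD is about 6.02 dB, ICC is about 1.
_, cld_a, icc_a = r_ott_parameters([1, 1, 1, 1], [0.5, 0.5, 0.5, 0.5])
# Equal energy but uncorrelated samples: CLD is about 0 dB, ICC is about 0.
_, cld_b, icc_b = r_ott_parameters([1, 1, 1, 1], [1, -1, 1, -1])
```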

R-TTT Box
  • Creates a stereo downmix from three input channels
  • Two ways to reconstruct the third signal:
    • Prediction mode: 2 CPCs and ICC
    • Energy mode: 2 CLDs

Quantization and Entropy Coding Schemes
  • Quantization: fine and coarse
  • Entropy coding: differential coding + Huffman tables
    • Differential coding in frequency (DF) and in time (DT)
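Differential coding in frequency (DF) and in time (DT) can be sketched as follows. The selection rule, which picks the scheme with the smaller sum of absolute differences as a stand-in for Huffman code length, is an assumption for illustration and not the rule used in the reference software.

```python
import numpy as np

def diff_freq(idx):
    """DF: send the first band as-is, then each band as a difference to the band below."""
    idx = np.asarray(idx, dtype=int)
    return np.concatenate(([idx[0]], np.diff(idx)))

def diff_time(idx, prev_idx):
    """DT: send each band as a difference to the same band of the previous parameter set."""
    return np.asarray(idx, dtype=int) - np.asarray(prev_idx, dtype=int)

def pick_scheme(idx, prev_idx):
    """Pick DF or DT using the sum of absolute differences as a rough cost (assumption)."""
    candidates = {"DF": diff_freq(idx), "DT": diff_time(idx, prev_idx)}
    name = min(candidates, key=lambda k: int(np.sum(np.abs(candidates[k]))))
    return name, candidates[name]

# Quantization indices that barely change over time favor DT.
scheme, data = pick_scheme([3, 4, 4, 5], [3, 4, 4, 4])
```

Small differences concentrate the data around zero, which is what makes the subsequent Huffman coding effective.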

Outline
  • MPEG Surround Introduction
  • Proposed Procedures and Experimental Results
    • Subset coding mode
    • Parameter band stride
    • Parameter sets
    • Adaptive smoothing
  • Conclusion and Future Work
  • Demo

New Encoder Structure
  • 4 additional modules [figure not preserved]

Subset Coding Mode
  • 4 coding modes for each parameter subset:
    • Default (0)
    • Keep (1)
    • Interpolation (2)
    • Lossless (3)
  • The reference software implements only the Lossless mode

Subset Coding Mode
  • Flow chart:
    • Search modes 0, 1, and 2 for the one with the least error
    • Compare the least error with a threshold; if it exceeds the threshold, use the Lossless mode
  • Exploits correlation along time
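The mode decision can be sketched as below. How each mode reconstructs the subset (default values, the previous subset, the midpoint of the neighbors) and the mean absolute error measure are assumptions for illustration; the slides do not specify them.

```python
import numpy as np

DEFAULT, KEEP, INTERPOLATE, LOSSLESS = 0, 1, 2, 3

def choose_mode(current, previous, following, default, threshold):
    """Return (mode, error) for one parameter subset."""
    current = np.asarray(current, dtype=float)
    previous = np.asarray(previous, dtype=float)
    following = np.asarray(following, dtype=float)
    # What the decoder would reconstruct in each non-lossless mode (assumed).
    candidates = {
        DEFAULT: np.asarray(default, dtype=float),
        KEEP: previous,
        INTERPOLATE: 0.5 * (previous + following),
    }
    errors = {m: float(np.mean(np.abs(current - rec))) for m, rec in candidates.items()}
    best = min(errors, key=errors.get)
    if errors[best] <= threshold:
        return best, errors[best]     # cheap mode is good enough
    return LOSSLESS, 0.0              # otherwise transmit the data itself

mode_a, _ = choose_mode([2, 2], [2, 2], [5, 5], [0, 0], threshold=0.5)
mode_b, _ = choose_mode([3, 3], [1, 1], [5, 5], [0, 0], threshold=0.5)
mode_c, _ = choose_mode([10, 10], [1, 1], [2, 2], [0, 0], threshold=0.5)
```

Here the first subset is unchanged from its predecessor (Keep), the second lies midway between its neighbors (Interpolation), and the third matches nothing and falls back to Lossless.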

Experimental Results
  • Only the Lossless mode costs bits
  • The bitrate reduction can be estimated; theoretical (theo) and experimental (exp) bitrate reduction (%) for 1, 2, and 4 parameter sets (ps):

Test sequence   theo_ps1   exp_ps1   theo_ps2   exp_ps2   theo_ps4   exp_ps4
1               59.04      44.57     51.80      34.64     35.28      20.51
2               75.74      55.46     77.66      55.16     74.59      52.79
3               66.85      47.98     58.86      39.41     40.01      23.66
4               59.53      44.06     50.97      34.11     33.40      19.08
5               63.37      47.04     59.24      40.96     42.06      26.05

Experimental Results
  • Comparisons [figure not preserved]

Experimental Results
  • 2 phenomena:
    • Theoretical results are larger than experimental results
      • Differential coding schemes
    • As the number of parameter sets increases, theoretical and experimental results decrease
      • Probability distributions

Experimental Results
  • Distributions of DT data [figure not preserved]
  • Standard deviation of the DT data:

CLD, ps=1   CLD, ps=4   ICC, ps=1   ICC, ps=4
1.77        2.13        0.84        1.21

Parameter Band Stride
  • The parameter bands themselves cannot be adjusted
  • The frequency resolution is adjusted by the parameter band stride
  • 4 strides for each parameter subset

Number of parameter groups for each stride:

Parameter bands   Stride 1   Stride 2   Stride 5   Stride 28
4                 4          2          1          1
5                 5          3          1          1
7                 7          4          2          1
10                10         5          2          1
14                14         7          3          1
20                20         10         4          1
28                28         14         6          1

The number of parameter groups is the ceiling of (parameter bands / stride); for example, 14 parameter bands with stride 5 give 3 parameter groups.
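The table can be reproduced with the ceiling function. The names `num_groups` and `table` are illustrative only.

```python
import math

STRIDES = (1, 2, 5, 28)
PARAMETER_BANDS = (4, 5, 7, 10, 14, 20, 28)

def num_groups(num_bands, stride):
    """Number of parameter groups when `stride` adjacent parameter bands share one value."""
    return math.ceil(num_bands / stride)

# One row per parameter-band configuration, one column per stride.
table = {pb: [num_groups(pb, s) for s in STRIDES] for pb in PARAMETER_BANDS}

for pb, row in table.items():
    print(pb, row)
```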

Parameter Band Stride
  • Exploits correlation in frequency
  • Combined with the pairing decision
  • Flow chart:
    • 2 successive lossless subsets
    • 1 single subset

Parameter Band Stride
  • 4 possible results:
    • 2 successive subsets in a pair with the same stride (>1)
    • 2 successive subsets using different strides (>1)
    • 2 successive subsets in a pair with stride = 1
    • 1 subset coded individually

Experimental Results
  • The bitrate can be estimated; theoretical (theo) and experimental (exp) bitrate reduction (%) for 1, 2, and 4 parameter sets (ps):

Test sequence   theo_ps1   exp_ps1   theo_ps2   exp_ps2   theo_ps4   exp_ps4
1               45.28      23.29     36.10      15.58     27.64      9.95
2               51.78      21.87     48.57      20.96     46.28      19.54
3               40.04      17.16     34.61      13.99     28.87      10.71
4               45.99      23.52     39.89      18.65     32.51      13.18
5               44.93      23.02     38.65      18.29     31.87      13.37

Experimental Results
  • 2 phenomena:
    • Theoretical results are larger than experimental results
      • Differential coding schemes
    • As the number of parameter sets increases, theoretical and experimental results decrease
      • Probability distributions

Experimental Results
  • Distributions of DF data [figure not preserved]
  • Standard deviation of the DF data:

CLD, ps=1   CLD, ps=4   ICC, ps=1   ICC, ps=4
2.8         3.02        1.47        1.75

Comparisons of the 2 Modules
  • Using the coding mode is more efficient than using the parameter band stride
  • Compare the DT and DF data (standard deviation):

DT data
Test sequence   ICC, ps=1   ICC, ps=4   CLD, ps=1   CLD, ps=4
1               1.07        1.31        1.37        1.66
2               0.81        0.86        2.15        1.4
3               0.79        1.17        1.94        1.75
4               0.84        1.21        1.77        2.13
5               0.92        1.14        1.46        1.61

DF data
Test sequence   ICC, ps=1   ICC, ps=4   CLD, ps=1   CLD, ps=4
1               1.85        2.14        2.95        3.27
2               1.85        1.98        4.14        4.24
3               1.61        1.89        3.14        3.39
4               1.47        1.75        2.8         3.02
5               1.77        1.99        2.94        3.19

Comparisons of the 2 Modules
  • The theoretical results for the parameter band stride are overestimated more than those for the coding mode
    • Differential coding schemes

29stridebitratecoding modedifferential coding5seqcoding modepcmdtdfstridepcmpcm2strideentropy codinggainstride29Experimental Results-Combined with Coding ModeBitrate reduction percentage: 25~55%Complexity: 0.13%ps1ps2ps4154.1442.0827.06258.3657.3855.36350.3743.0729.04452.7542.2027.36554.5248.0633.40302modulebitrate25~55pscomplexitymodule0.13%filter bank30Time ResolutionDescribing the number of parameters for each parameter band2 kinds of framing:Fixed framing: divided into equal partsVariable framing: arbitrary divisions1~8 parameter setsRequiring dynamic decision 31time resolframeps??Spec21)decsetdec2)dec8quality31Time ResolutionA border existsLarge difference of parametersCalculate the differences of backward and forward extractions

Division at time slots with larger differences

32?time borderpspstime slot32Time Resolutionafd

33inputframeframeps1)tree structuretime slot2)2slot2sample peakthresholdpeak3)frametime slotpeakpeakborder ? peak peak countslotgroupgroupborder bordergroupslotcountcount countweighting
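A minimal sketch of the border search, assuming the extracted parameter is a CLD computed from per-slot channel energies. The window length, the 6 dB threshold, the peak-picking rule, and the function names are hypothetical; the slides only state that backward and forward extractions are compared and that borders are placed where the difference is large.

```python
import numpy as np

def cld_db(e1, e2, eps=1e-12):
    """Level difference in dB between two channels over a group of time slots."""
    return 10.0 * np.log10((np.sum(e1) + eps) / (np.sum(e2) + eps))

def find_borders(e1, e2, window=2, threshold=6.0, max_sets=8):
    """Return time slots at which a new parameter set should start."""
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    n = len(e1)
    diffs = np.zeros(n)
    for t in range(window, n - window + 1):
        backward = cld_db(e1[t - window:t], e2[t - window:t])   # slots before t
        forward = cld_db(e1[t:t + window], e2[t:t + window])    # slots from t on
        diffs[t] = abs(forward - backward)
    # Keep local peaks of the difference that exceed the threshold.
    peaks = [t for t in range(1, n - 1)
             if diffs[t] > threshold and diffs[t] >= diffs[t - 1] and diffs[t] >= diffs[t + 1]]
    # At most max_sets parameter sets means at most max_sets - 1 borders.
    peaks.sort(key=lambda t: diffs[t], reverse=True)
    return sorted(peaks[:max_sets - 1])

# One channel jumps by 20 dB at time slot 16 of a 32-slot frame.
e1 = np.ones(32)
e2 = np.where(np.arange(32) < 16, 1.0, 100.0)
borders = find_borders(e1, e2)
```

In this example a single border is found at time slot 16, so the frame would be split into two parameter sets.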

Experimental Results
  • Waveforms [figure not preserved]

Experimental Results
  • Additional bitrate:

Test sequence            1      2      3      4       5
Additional bitrate (%)   4.09   4.83   6.34   24.78   4.00

  • Complexity: 9 (parameters) × 32 (time slots) × 71 (hybrid bands) × 2L (window) iterations

Parameter Smoothing
  • Compensates for artifacts caused by coarse quantization
  • Performed at the decoder side
  • 1st-order IIR filter

Parameter Smoothing
  • Flow chart:
    • Compare the smoothed coarse-quantized parameters with the fine-quantized parameters
    • Choose the configuration with the least error
  • 4 candidate smoothing time constants: 64, 128, 256, 512
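The decision can be sketched as a first-order IIR smoother tried with each candidate time constant. The mapping from time constant to filter coefficient (`slot_ms / tau`), the slot duration, and the squared-error measure are assumptions for illustration, not values taken from the slides.

```python
import numpy as np

TIME_CONSTANTS_MS = (64, 128, 256, 512)

def smooth(values, alpha):
    """First-order IIR filter: s[n] = alpha * x[n] + (1 - alpha) * s[n - 1]."""
    out = np.empty(len(values))
    state = values[0]
    for n, v in enumerate(values):
        state = alpha * v + (1.0 - alpha) * state
        out[n] = state
    return out

def choose_smoothing(coarse, fine, slot_ms=23.2):
    """Return (time constant or None, error); None means smoothing is switched off."""
    coarse = np.asarray(coarse, dtype=float)
    fine = np.asarray(fine, dtype=float)
    best_tau, best_err = None, float(np.sum((coarse - fine) ** 2))
    for tau in TIME_CONSTANTS_MS:
        alpha = min(1.0, slot_ms / tau)   # assumed mapping, see lead-in
        err = float(np.sum((smooth(coarse, alpha) - fine) ** 2))
        if err < best_err:
            best_tau, best_err = tau, err
    return best_tau, best_err

# Coarse values toggling around a steady fine value benefit from smoothing.
tau_on, err_on = choose_smoothing(np.tile([0.0, 2.0], 32), np.ones(64))
# Identical coarse and fine values need no smoothing.
tau_off, err_off = choose_smoothing(np.ones(8), np.ones(8))
```

Because the filter runs at the decoder, the encoder only has to signal the chosen configuration.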

Experimental Results
  • Waveforms [figure not preserved]

Experimental Results
  • Bitrate variations:

Test sequence                             1        2       3       4        5
Bitrate change % (cf. coarse quantized)   0.51     0.55    0.69    0.64     0.53
Bitrate change % (cf. fine quantized)     -11.53   -7.37   -7.03   -10.93   -11.00

  • Complexity: 0.4%

Conclusion
  • Implementation of several encoding procedures in the reference software
  • Exploits correlation along the time axis and the frequency axis
    • Bitrate reduction: 25~55%
    • Theoretical estimation
  • Adaptive time resolution and parameter smoothing

Future Work
  • Modify the error measures
    • Different band weightings
    • Different parameter weightings
  • Find a more precise evaluation of quality for fine-tuning
  • Some other tools
    • Residual coding, temporal shaping, etc.

Appendix

Filter Banks
  • 2 stages
    • Stage 1: uniform 64-band QMF filter bank
    • Stage 2: the 3 lowest QMF bands are filtered again for better low-frequency resolution: QMF band 0 into 6 sub-subbands, QMF bands 1 and 2 into 2 sub-subbands each, giving 71 bands
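The band count from the speaker notes can be checked with a few lines of arithmetic. The names below are illustrative only.

```python
QMF_BANDS = 64
# Sub-subbands produced by the second stage for the three lowest QMF bands,
# as stated in the notes: band 0 -> 6, bands 1 and 2 -> 2 each.
SPLITS = {0: 6, 1: 2, 2: 2}

def hybrid_band_count(qmf_bands=QMF_BANDS, splits=SPLITS):
    """Total bands after replacing each split QMF band by its sub-subbands."""
    return qmf_bands - len(splits) + sum(splits.values())

total_bands = hybrid_band_count()
print(total_bands)
```

The result is 64 - 3 + 10 = 71 bands, matching the figure quoted throughout the slides.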

OTT Box
  • Synthesizes two channels from a mono downmix using the spatial parameters

R-TTT Box (2/2)
  • Prediction mode:
    • 2 CPCs (channel prediction coefficients) and 1 ICC
  • Energy mode:
    • 2 CLDs

TTT Box
  • Prediction mode:
    • With residual signal: 2 CPCs
    • Without residual signal: use the ICC to compensate for the energy loss
  • Energy mode:
    • Energy reconstruction

Experimental Results
  • Parameter band stride [figure not preserved]

Bitrate reduction % without any error

Data mode: dm0_xxx
Input     theo_ps1   exp_ps1   theo_ps2   exp_ps2   theo_ps4   exp_ps4
Input01   11.45      3.58      11.15      2.49      11.14      2.34
Input02   40.60      24.03     39.56      20.63     38.36      19.81
Input03   14.23      5.49      13.28      3.76      13.12      3.50
Input04   11.45      3.64      11.14      2.52      11.12      2.28
Input05   12.03      4.10      11.46      2.71      11.39      2.54

Data mode: dmx_000
Input     theo_ps1   exp_ps1   theo_ps2   exp_ps2   theo_ps4   exp_ps4
Input01   10.57      1.03      9.59       0.39      9.13       0.40
Input02   33.67      10.54     32.68      10.76     32.11      10.58
Input03   13.88      2.39      12.00      1.32      11.13      1.21
Input04   9.47       0.66      9.32       0.47      9.07       0.40
Input05   10.10      1.06      9.41       0.46      8.92       0.43

Reference Software Encoder
  • Parameter sets = 1
  • Parameter bands = 20
  • Tree structures: 5-1-5₁, 5-1-5₂, 5-2-5
  • Time slots: 16, 32
  • Fine quantization
  • Differential coding in T/F, PCM + 1D Huffman

[Figure: DT distributions of CLD and ICC for test sequences 1, 2, 3, and 5]

Prediction Mode of R-TTT Box
  • 2 ways of decoding:
    • With residual signal
    • Without residual signal: use the ICC to compensate for the energy loss
  • How to decide appropriate CPCs and ICC?

Prediction Mode of R-TTT Box
  • Choose the CPCs to make the prediction more precise
  • Residual energy -> 0 means good prediction
  • Not verified yet, since the coder is not considered
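One way to make the residual energy small is an ordinary least-squares fit of the third signal from the two downmix channels. This is an illustrative sketch only; it is not the procedure from the standard or from the slides, which leave the CPC criterion open. The function name and the example signals are hypothetical.

```python
import numpy as np

def estimate_cpcs(d_left, d_right, target):
    """Least-squares CPCs (c1, c2) such that target ~ c1 * d_left + c2 * d_right."""
    a = np.column_stack([d_left, d_right]).astype(float)
    target = np.asarray(target, dtype=float)
    cpcs, *_ = np.linalg.lstsq(a, target, rcond=None)
    residual = target - a @ cpcs
    # A residual energy near zero indicates a good prediction.
    return cpcs, float(np.sum(residual ** 2))

# Hypothetical example: the third signal is an exact mix of the downmix channels.
d_l = np.array([1.0, 0.0, 2.0, 1.0])
d_r = np.array([0.0, 1.0, 1.0, 3.0])
third = 0.5 * d_l + 0.25 * d_r
cpcs, residual_energy = estimate_cpcs(d_l, d_r, third)
```

With a real core coder in the loop the downmix is distorted, which is why the slides note that the choice has not been verified yet.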


[Figure: MPEG Surround Encoder. Blocks: T/F Transform, Downmix, Spatial Parameter Estimation, F/T Transform, Audio Encoder. Outputs: Compressed Audio Bitstream, Spatial Parameters.]

[Figure: MPEG Surround Decoder. Blocks: Audio Decoder, T/F Transform, Surround Synthesis, F/T Transform. Inputs: Compressed Audio Bitstream, Spatial Parameters. Includes a Legacy Decoding path.]