Low complexity AVS-M by implementing machine learning algorithm C4.5
By: Ramolia Pragnesh R.
Guided by: Dr. K. R. Rao
Term: Spring 2011


Motivation:
Increasing demand for multimedia content over the internet and wireless networks.
Bandwidth is too expensive a resource to scale in proportion to the growing demand for data.
The video codec plays an important role here, compressing the data with high-efficiency tools.
In codecs, high efficiency comes along with high complexity.
Hardware solutions for low-end devices such as mobile phones are very expensive and also create problems of overheating and power consumption.

Brief overview of the thesis:

Figure 1: Proposed encoder with C4.5

Table of contents:
Overview of AVS-M.
Complexity calculation in AVS-M.
Various approaches to reduce complexity.
Introduction to machine learning and the C4.5 algorithm.
Proposed encoder.
Results.
Future work.
References.

Introduction to AVS-M [24]
AVS-M is the seventh part of the video coding standard developed by the AVS working group of China, targeting mobile applications.
It has 9 different levels for different formats [16].
It supports only progressive video coding and hence codes frames only [22].
It uses only the 4:2:0 chroma sub-sampling format [22].
It uses only I and P frames [22].

Different parts of AVS [10]

Table 1: Different parts of AVS
Part  Name
1     System
2     Video
3     Audio
4     Conformance test
5     Reference software
6     Digital media rights management
7     Mobile video
8     Transmit AVS via IP network
9     AVS file format
10    Mobile speech and audio coding

Key tools of AVS-M [31]:
Network abstraction layer (NAL).
Supplemental enhancement information (SEI).
Transform: 4x4 integer transform.
Adaptive quantization with step size varying from 0 to 63.
Intra prediction: 9 modes (Fig. 5), simple 4x4 intra prediction and direct intra prediction [25].
Inter prediction: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4 block sizes for ME/MC (Fig. 7).
Quarter-pixel accuracy in motion estimation.
Simplified in-loop de-blocking filter.
Entropy coding.
Error resilience.

Layered data structure
Sequence > G.O.P. > Picture > Slice > Macroblock > Block

Figure 2: Layered data structure of AVS-M

AVS-M codec [10]
Each MB needs to be intra or inter predicted.
Switch S0 (Fig. 3) is used to choose between inter and intra prediction based on the type of MB.
The unit for intra prediction is a 4x4 block, and predictions are derived from the left and upper blocks.
Inter predictions are derived from blocks of varying sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4) in locally reconstructed frames.
Transform coefficients are coded by VLC.
A deblocking filter is applied to the reconstructed image.

Encoder

Figure 3: Encoder of AVS-M [10]

Decoder

Figure 4: Decoder of AVS-M [10]

Intra adaptive directional prediction [25]

Figure 5: Intra adaptive directional prediction

Intra prediction
The intra prediction scheme in AVS-M is much simpler than that of the H.264 baseline profile.
It uses the 4x4 block as the unit for intra prediction.
It uses 2 intra prediction modes: intra_4x4 and direct intra prediction.
Intra_4x4 uses a content-based most probable intra mode decision, shown in Table 2, to save bits, where U and L represent the upper and left blocks as shown in Fig. 6.

Direct intra prediction provides much of the compression, based on a trade-off decision.

Fig. 6: Current block with its upper block (U) and left block (L) [16]

Intra prediction (continued)
Table 2: Content-based most probable mode decision table [25]; rows are the upper-block mode U, columns the left-block mode L (a lookup sketch follows after this slide).

U \ L   -1   0   1   2   3   4   5   6   7   8
 -1      8   8   8   8   8   8   8   8   8   8
  0      8   0   0   2   0   0   0   2   0   2
  1      8   2   1   2   2   2   2   2   2   2
  2      8   2   2   2   2   2   2   2   2   2
  3      8   2   1   2   3   4   5   2   7   2
  4      8   4   4   2   4   4   4   6   4   4
  5      8   5   5   2   5   5   5   6   5   5
  6      8   6   6   6   6   6   6   6   6   6
  7      8   7   7   2   7   7   7   6   7   7
  8      8   0   1   2   3   4   5   6   7   8

Mode -1 is assigned to L or U when the current block has no left or upper block, respectively.

Inter-frame prediction
The block size in inter-frame prediction can be 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, or 4x4, depending on the amount of information present within the macroblock [9].
Motion is predicted up to quarter-pixel accuracy; if half_pixel_mv_flag is 1, motion vectors are limited to half-pixel accuracy.
Half-pixel and quarter-pixel accurate motion vectors are calculated by interpolating the reference frame with filters (Fig. 8).
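Table 2 can be implemented as a simple lookup. The following is a minimal Python sketch, assuming the orientation reconstructed above (rows indexed by U, columns by L); it is illustrative, not code from the reference software.

```python
# Table 2 as a nested list: MOST_PROBABLE[U + 1][L + 1]; a neighbour mode of -1
# means the left/upper block is unavailable.
MOST_PROBABLE = [
    [8, 8, 8, 8, 8, 8, 8, 8, 8, 8],   # U = -1
    [8, 0, 0, 2, 0, 0, 0, 2, 0, 2],   # U = 0
    [8, 2, 1, 2, 2, 2, 2, 2, 2, 2],   # U = 1
    [8, 2, 2, 2, 2, 2, 2, 2, 2, 2],   # U = 2
    [8, 2, 1, 2, 3, 4, 5, 2, 7, 2],   # U = 3
    [8, 4, 4, 2, 4, 4, 4, 6, 4, 4],   # U = 4
    [8, 5, 5, 2, 5, 5, 5, 6, 5, 5],   # U = 5
    [8, 6, 6, 6, 6, 6, 6, 6, 6, 6],   # U = 6
    [8, 7, 7, 2, 7, 7, 7, 6, 7, 7],   # U = 7
    [8, 0, 1, 2, 3, 4, 5, 6, 7, 8],   # U = 8
]

def most_probable_mode(upper_mode: int, left_mode: int) -> int:
    """Look up the most probable intra_4x4 mode for the current block."""
    return MOST_PROBABLE[upper_mode + 1][left_mode + 1]

print(most_probable_mode(-1, 3))  # 8: upper block unavailable
```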

Inter frame block sizes [9]:
7 block sizes are present in AVS-M for inter frame prediction.

Figure 7: Inter frame prediction block sizes

Sub-pixel motion estimation by interpolation [15][16]:

Figure 8: Interpolation of sub-pixels (hatched lines show half-pixels, empty circles are quarter-pixels, and capital letters represent full-pixels).

Complexity calculation for AVS-M
Variable block sizes: 7 block sizes in inter mode.
It supports 9 intra_4x4 modes and 1 direct intra prediction mode.
Full search for motion estimation gives the optimum result, but it comes with high implementation complexity.
For example, assuming full search (FS), M block types, N reference frames and a search range of +/- W for each reference frame and block type, the encoder must check N x M x (2W + 1)^2 positions to find the inter prediction mode and its motion vector, and that only at integer-pixel accuracy (the sketch below illustrates the size of this search space).
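To give a feel for the numbers, here is a minimal Python sketch of the N x M x (2W + 1)^2 count; the search range of +/-16 is assumed purely for illustration and is not taken from the thesis.

```python
def full_search_positions(num_refs: int, num_block_types: int, search_range: int) -> int:
    """Number of candidate positions a full search must evaluate: N x M x (2W + 1)^2."""
    return num_refs * num_block_types * (2 * search_range + 1) ** 2

# AVS-M P frames use a single reference frame and 7 inter block types;
# a search range of +/-16 is an illustrative assumption.
print(full_search_positions(num_refs=1, num_block_types=7, search_range=16))  # 7623
```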

Continued
7 inter prediction modes (because of the 7 different block sizes), 9 intra_4x4 modes, 1 direct intra prediction mode, and quarter-pixel accuracy in motion vector estimation all add to the complexity.

Various techniques to reduce complexity
Intra mode selection algorithm [26].
Intra-only spatial-prediction scheme [27].
Fast mode decision algorithm for intra prediction in H.264/AVC [28].
Dynamic control of motion estimation search parameters for low-complexity H.264 [29].
Adaptive algorithm for fast motion estimation [30].
Adaptive algorithm for fast motion estimation in H.264/MPEG-4 AVC [4].

Introduction to machine learning [32]:
Machine learning is a branch of science that develops algorithms allowing computers to learn from data and improve their behaviour.
Machine learning algorithms are applied in a large number of fields: machine vision, medical diagnostics, fraudulent transaction detection, image processing, wireless communication and market analysis are just a few among them.

Machine learning algorithm C4.5 [33]
It was developed by J. R. Quinlan.
It is a descendant of ID3 and CLS [C4.5 doc].
It uses a divide-and-conquer approach to grow a tree.
It uses two possible criteria to select the test at each node of the tree: information gain and gain ratio (a sketch of both follows below).
The initial tree is pruned to avoid overfitting, which would otherwise introduce errors in prediction.
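A minimal sketch of the two splitting criteria, assuming discrete-valued attributes and class labels; these are the standard formulas, not code from the thesis.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction obtained by splitting on the attribute at attr_index."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(rows, labels, attr_index):
    """Information gain normalised by the split information, as used by C4.5."""
    total = len(labels)
    sizes = Counter(row[attr_index] for row in rows).values()
    split_info = -sum((n / total) * math.log2(n / total) for n in sizes)
    return information_gain(rows, labels, attr_index) / split_info if split_info else 0.0
```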

Proposed encoder:

Implementation steps [2]:
Select a number of frames of a QCIF video sequence as training sequences.
Obtain the required attributes offline.
Encode the training sequence using the full-complexity AVS-M encoder.
Store the attributes calculated offline and the mode decisions taken by the encoder in an ARFF file (a sketch of the file layout follows below).
Feed this ARFF file to the WEKA tool, which gives a decision tree similar to that of Figure 10.
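A minimal sketch of writing such an ARFF file; the attribute names (macroblock mean, variance, 16x16 SAD) are hypothetical placeholders, since the exact attributes are not listed on this slide.

```python
# Hypothetical sketch: dump offline attributes plus the full encoder's mode
# decision into an ARFF file that WEKA's J48 (C4.5) learner can read.
HEADER = """@RELATION avsm_mode_decision

@ATTRIBUTE mb_mean      NUMERIC
@ATTRIBUTE mb_variance  NUMERIC
@ATTRIBUTE sad_16x16    NUMERIC
@ATTRIBUTE mode         {16x16,16x8,8x16,8x8,8x4,4x8,4x4,INTRA}

@DATA
"""

def write_arff(path, samples):
    """samples: iterable of (mean, variance, sad, mode) tuples, one per macroblock."""
    with open(path, "w") as f:
        f.write(HEADER)
        for mean, var, sad, mode in samples:
            f.write(f"{mean:.2f},{var:.2f},{sad:.2f},{mode}\n")

write_arff("training.arff", [(112.4, 38.9, 1540.0, "16x16"), (97.1, 210.5, 4321.0, "8x8")])
```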

Continued
Mask the motion estimation part in the actual AVS-M encoder.
Overwrite it with if-else statements based on the decision tree (a hypothetical sketch follows below).
Compare the performance of the simplified codec with the actual AVS-M.
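The actual replacement is written in the encoder's C source, but the shape of the logic is simply a hard-coded tree. The Python sketch below is hypothetical: the attribute names and thresholds are illustrative, not those of the tree in Figure 10.

```python
def decide_mb_mode(mb_variance: float, sad_16x16: float) -> str:
    """Hypothetical hard-coded decision tree replacing full motion estimation."""
    if mb_variance < 25.0:      # flat macroblock: a large partition is enough
        return "16x16"
    if sad_16x16 < 1200.0:      # a cheap 16x16 match is already good
        return "16x16"
    if mb_variance < 150.0:
        return "8x8"
    return "4x4"                # highly detailed block: smallest partitions

print(decide_mb_mode(mb_variance=300.0, sad_16x16=2500.0))  # 4x4
```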

Decision tree used in the encoder:

Figure 10: C4.5 decision tree for mode decision.

Results:

Comparison of encoding time:
Seq.  Sequence            AVS-M (s)   Proposed (s)   % reduction in encoding time
1     Akiyo_qcif          1.497       0.390          73.94
2     Highway_qcif        1.934       0.390          79.83
3     Coastguard_qcif     2.699       0.390          85.56
4     Bridge-close        2.075       0.359          82.70
5     News_qcif           1.732       0.374          78.41
6     Miss-america_qcif   1.747       0.452          74.13
7     Container_qcif      1.903       0.405          78.72
8     Carphone_qcif       1.872       0.405          78.36
9     Foreman_qcif        1.997       0.405          79.71

Bar chart comparing encoding time (sec.)

Comparison of PSNR (Y in dB):
Seq.  Sequence            AVS-M       Proposed       % reduction in PSNR (Y)
1     Akiyo_qcif          37.316987   37.176920      0.375
2     Highway_qcif        37.622730   37.116283      1.34
3     Coastguard_qcif     34.206012   33.834510      1.08
4     Bridge-close        34.361937   33.416094      2.70
5     News_qcif           35.964311   35.575041      1.08
6     Miss-america_qcif   39.417091   38.895065      1.32
7     Container_qcif      35.798379   35.076394      2.01
8     Carphone_qcif       36.517526   34.486256      5.56
9     Foreman_qcif        35.977242   34.749983      3.41

Bar chart comparing PSNR (Y in dB)

Comparison of PSNR (U in dB):
Seq.  Sequence            AVS-M       Proposed       % reduction in PSNR (U)
1     Akiyo_qcif          39.285791   39.180394      0.268
2     Highway_qcif        37.678196   37.656085      0.059
3     Coastguard_qcif     43.270733   43.169228      0.023
4     Bridge-close        36.929451   36.741130      0.050
5     News_qcif           38.555165   38.355257      0.052
6     Miss-america_qcif   38.947323   38.895372      0.013
7     Container_qcif      39.586359   39.325356      0.065
8     Carphone_qcif       39.951664   39.775975      0.044
9     Foreman_qcif        40.318186   40.114908      0.050

Bar chart comparing PSNR (U in dB)

Comparison of PSNR (V in dB):
Seq.  Sequence            AVS-M       Proposed       % reduction in PSNR (V)
1     Akiyo_qcif          40.506783   40.452600      0.013
2     Highway_qcif        38.493726   38.502246      -0.002
3     Coastguard_qcif     44.820145   44.577757      0.054
4     Bridge-close        37.376204   37.257109      0.031
5     News_qcif           39.619799   39.583642      0.009
6     Miss-america_qcif   38.929404   38.647452      0.072
7     Container_qcif      39.617169   39.488338      0.033
8     Carphone_qcif       40.275814   40.190305      0.021
9     Foreman_qcif        40.925833   40.786655      0.034

Bar chart comparing PSNR (V in dB)

Comparison of PSNR (YUV in dB):
Seq.  Sequence            AVS-M       Proposed       % reduction in PSNR (YUV)
1     Akiyo_qcif          38.006087   37.878926      0.033
2     Highway_qcif        37.732928   37.390016      0.090
3     Coastguard_qcif     35.741460   35.380718      1.009
4     Bridge-close        35.074161   34.300780      2.204
5     News_qcif           36.766114   36.429744      0.091
6     Miss-america_qcif   39.244104   38.850075      1.004
7     Container_qcif      36.732660   36.100675      1.720
8     Carphone_qcif       37.412563   35.663827      4.742
9     Foreman_qcif        37.044786   35.856083      3.208

Bar chart comparing PSNR (YUV in dB)

Comparison of number of bits used:
Seq.  Sequence            AVS-M bits  Proposed bits  % increase in bits
1     Akiyo_qcif          27283       26867          -1.524
2     Highway_qcif        27346       27098          -0.090
3     Coastguard_qcif     55790       54702          -1.950
4     Bridge-close        49424       44405          -10.154
5     News_qcif           42013       41877          -0.324
6     Miss-america_qcif   20689       20385          -1.469
7     Container_qcif      37725       36061          -4.410
8     Carphone_qcif       47101       45925          -2.496
9     Foreman_qcif        47029       46557          -1.003

Bar chart comparing number of bits

Comparison of performances of AVS-M and the proposed encoder:
Metric                           Akiyo_qcif   Container_qcif   Bridge-close_qcif   Miss-america_qcif   Foreman_qcif
PSNR (Y) dB, AVS-M               38.876830    36.852617        35.705636           41.028109           37.394567
PSNR (Y) dB, proposed            36.958418    33.763365        33.269825           38.091754           35.005621
% decrease in PSNR (Y)           4.92         7.95             6.82                7.15                6.38
PSNR (U) dB, AVS-M               40.388664    40.580305        37.145837           39.639008           41.00515186
PSNR (U) dB, proposed            40.167291    40.127251        36.935639           39.508902           40.930060
% decrease in PSNR (U)           0.05         1.11             0.056               0.032               0.018
PSNR (V) dB, AVS-M               41.280113    40.484948        37.877927           40.017777           42.023883
PSNR (V) dB, proposed            41.207557    40.255924        37.772218           39.779168           41.934222
% decrease in PSNR (V)           0.02         0.056            0.027               0.059               0.021

Continued:
Metric                           Akiyo_qcif   Container_qcif   Bridge-close_qcif   Miss-america_qcif   Foreman_qcif
PSNR (YUV) dB, AVS-M             39.430119    37.769947        36.222158           40.586585           38.380581
PSNR (YUV) dB, proposed          37.854104    35.0472241       34.251494           38.481546           36.272297
% decrease in PSNR (YUV)         3.99         7.2              5.44                5.18                5.49
No. of bits used, AVS-M          204723       252364           549384              230396              766770
No. of bits used, proposed       209955       253908           398959              227076              841282
% saving in bits                 2.5          0.06             -27.38              -1.44               9.71
Encoding time (s), AVS-M         49.873       66.143           65.692              54.615              72.634
Encoding time (s), proposed      10.186       14.352           11.730              14.071              13.915
% decrease in encoding time      79.57        78.30            82.14               74.23               80.84

Conclusions:
Implementing C4.5 in AVS-M reduces the encoding time of a sequence by 75%-80%.
Except for Foreman_qcif and Carphone_qcif, there is no considerable loss in PSNR of the Y component.
There is almost no loss in PSNR of the U and V components.
Surprisingly, there are also considerable savings in the number of bits used to encode the sequence, which is a bonus.
The tree trained on 4 frames also works for 100 frames and for all sequences.

Future work:
Intense study of pattern recognition and machine learning can be undertaken to develop a better tree.
A deeper study of video attributes and their final impact on the mode decision can also help in developing a good decision tree.
Other machine learning algorithms can be implemented in AVS-M or other codecs to reduce complexity.
The bitstream obtained by this encoder can be multiplexed with an audio bitstream for streaming applications.

References:
[1] Multimedia processing project report, EE5359 course website, UTA: http://ee.uta.edu/Dip/Courses/EE5359/Multimedia%20Processing%20project%20report%20final.pdf
[2] P. Carrillo, H. Kalva and T. Pin, "Low complexity H.264 video encoding", SPIE, vol. 7443, paper 74430A, Aug. 2009.
[3] Kusrini and S. Hartati, "Implementation of C4.5 algorithm to evaluate the cancellation possibility of new student applicants at STMIK AMIKOM Yogyakarta", Proceedings of the International Conference on Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia, June 17-19, 2007.
[4] S. Saponara, et al., "Adaptive algorithm for fast motion estimation in H.264/MPEG-4 AVC", Proc. Eusipco 2004, pp. 569-572, Wien, Sept. 2004.
[5] Decision tree basics: http://dms.irb.hr/tutorial/tut_dtrees.php
[6] WEKA software: http://www.cs.waikato.ac.nz/ml/weka/

Continued
[7] X. Jing and L. P. Chau, "An efficient inter mode decision approach for H.264 video coding", International Conference on Multimedia and Expo (ICME), pp. 1111-1114, July 2004.
[8] Software download: ftp://159.226.42.57/public/avs_doc/avs_software
[9] PowerPoint slides by L. Yu, chair of AVS video: http://www-ee.uta.edu/dip/Courses/EE5351/ISPACSAVS.pdf
[10] L. Fan, Mobile Multimedia Broadcasting Standards, ISBN 978-0-387-78263-8, Springer US, 2009.
[11] AVS working group official website: http://www.avs.org.cn
[12] Test sequences: http://trace.eas.asu.edu/yuv/index.html
[13] Y. Xiang et al., "Perceptual evaluation of AVS-M based on mobile platform", Congress on Image and Signal Processing 2008, vol. 2, pp. 76-79, 27-30 May 2008.

Continued
[14] M. Liu and Z. Wei, "A fast mode decision algorithm for intra prediction in AVS-M video coding", ICWAPR '07, vol. 1, pp. 326-331, Nov. 2007.
[15] L. Yu et al., "Overview of AVS-video: tools, performance and complexity", SPIE VCIP, vol. 5960, pp. 596021-1 to 596021-12, Beijing, China, July 2005.
[16] L. Yu, S. Chen and J. Wang, "Overview of AVS-video coding standards", special issue on AVS, Signal Processing: Image Communication, vol. 24, pp. 247-262, April 2009.
[17] Y. Shen, et al., "A simplified intra prediction method", AVS Doc. AVS-M 1419, 2004.
[18] F. Yi, et al., "An improvement of intra prediction mode coding", AVS Doc. AVS-M 1456, 2004.
[19] L. Xiong, "Improvement of chroma intra prediction", AVS Doc. AVS-M 1379, 2004.

Continued
[20] X. Mao, et al., "Adaptive block size coding for AVS-X profile", AVS Doc. AVS-M 2372, 2008.
[21] R. Wang, et al., "Sub-pixel motion compensation interpolation filter in AVS", 2004 IEEE International Conference on Multimedia and Expo, vol. 1, pp. 93-96, 2004.
[22] F. Yi et al., "Low-complexity tools in AVS Part 7", Journal of Computer Science and Technology, vol. 21, pp. 345-353, May 2006.
[23] W. Gao and T. Huang, "AVS standard - status and future plan", Workshop on Multimedia New Technologies and Applications, Shenzhen, China, Oct. 2007.
[24] W. Gao et al., "AVS - the Chinese next-generation video coding standard", National Association of Broadcasters, Las Vegas, 2004.
[25] Z. Ma, et al., "Intra coding of AVS Part 7 video coding standard", Journal of Computer Science and Technology, vol. 21, Feb. 2006.

Continued
[26] J. Kim et al., "H.264 intra mode decision for reducing complexity using directional masks and neighboring modes", PSIVT 2006, LNCS 4319, pp. 959-968, 2006.
[27] Xin and Vetro, "Fast mode decision for intra-only H.264/AVC coding", TR 2006-034, May 2006.
[28] Pan et al., "Fast mode decision algorithm for intra prediction in H.264/AVC video coding", IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 7, July 2005.
[29] S. Saponara et al., "Dynamic control of motion estimation search parameters for low complex H.264 video coding", IEEE Transactions on Consumer Electronics, vol. 52, no. 1, Feb. 2006.

Continued
[30] C.-C. Lien and C.-P. Yu, "A fast mode decision method for H.264/AVC using the spatial-temporal prediction scheme", ICPR 2006.
[31] Information technology - advanced coding of audio and video, Part 7: mobile video.
[32] O. Maimon and L. Rokach, The Data Mining and Knowledge Discovery Handbook, Springer.
[33] X. Wu et al., "Top 10 algorithms in data mining" (survey paper), Springer-Verlag London Limited, 2007.
