[IEEE 2010 International Conference on Computer and Communication Engineering (ICCCE) - Kuala Lumpur, Malaysia (2010.05.11-2010.05.12)] International Conference on Computer and Communication

Fast Mode Decision for Scalable Video Coding over Wireless Network

Haris Al Qodri Maarif, Teddy Surya Gunawan, Akhmad Unggul Priantoro

Department of Electrical and Computer Engineering, International Islamic University Malaysia, Kuala Lumpur, Malaysia

Abstract: Scalable video coding (H.264/SVC) is the scalable extension of the H.264/AVC video coding standard. Because of its scalability, H.264/SVC has attracted great interest for video transmission. With H.264/SVC, a partial bit stream can be transmitted and decoded to provide video at lower temporal or spatial resolutions, or with reduced fidelity, while retaining a reconstruction quality that is high relative to the rate of the partial bit stream. Hence, H.264/SVC provides functionalities such as graceful degradation in lossy transmission environments, e.g. wireless networks, as well as bit rate, format, and power adaptation. This paper deals with the Medium-Grain SNR Scalability (MGS) scheme and fast coding mode decision. The MGS scheme with two enhancement layers is applied to enhance the streaming video quality. To perform an optimum mode decision, motion estimation is performed for all Macro Block (MB) modes, and the Rate Distortion (RD) costs are compared to identify the MB mode with the smallest RD cost. This increases the computational complexity of H.264/SVC encoding. Therefore, a fast mode decision algorithm was implemented to shorten the encoding time. In the experiments, the fast mode decision reduced the encoding time by up to 30.22%.

Keywords: scalable video coding, wireless network, fast mode decision, rate distortion

I. INTRODUCTION

Recently, scalable video coding (SVC) has become very popular because of its ability to adapt to varying network conditions. SVC allows partial transmission and decoding of a bit stream [1]. The encoded data from the SVC encoder contains a base layer and one or more enhancement layers. The base layer contains the main information of the video and should be transmitted with very high reliability. The enhancement layers, on the other hand, may be dropped or only partially transmitted according to the available network bitrate [2, 3]. This allows fast and accurate adaptation to variable bit rate channels based on network conditions.

Over the past few years, there has been active research on fast mode decision algorithms, driven by the need to speed up the encoding process [4, 5]. In some papers [6, 7], fast mode decision in scalable video coding not only shortens the encoding time but also maintains the quality of the encoded video. As reported in [4-7], encoding became more than two times faster with a negligible reduction in quality.

The objective of this paper is to implement fast mode decision in the scalable video coding standard and to evaluate its performance in terms of video quality and encoding time. The full mode exhaustive search and the fast mode decision algorithm are compared. The rest of the paper is organized as follows. Section II describes scalable video coding, while Section III explains the wireless network. The implementation of the proposed fast mode decision algorithm is presented in Section IV. Section V evaluates the performance, and Section VI concludes the paper.

II. SCALABLE VIDEO CODING

Scalable video coding (SVC) is the scalable extension of the H.264/AVC standard [8] and is classified as a layered video codec [9]. SVC is used in many video applications, e.g. video streaming and video communication, because of its capability to adapt to various wireless network conditions, i.e. to variable bit rate channels. SVC can produce multiple bit-streams from one source and can code the data at different image sizes, frame rates, and bit rates. SVC attempts to meet the current demand of digital video technology, which needs to play digital video over many different transmission bandwidths and on many video devices such as video telephones, mobile phones, pocket PCs, etc. Different video applications require different video transmission rates and different video sizes.

The scalability of SVC can be achieved in terms of temporal scalability (frame rate), spatial scalability (video size), and quality scalability (SNR scalability). Temporal scalability is provided by motion compensated temporal filtering (MCTF) and a prediction structure with hierarchical bidirectional images (B images). Spatial scalability uses a multi-layer prediction approach across different video sizes. It defines a base layer, which contains the main information, and an enhancement layer, which enhances the base layer video quality and provides scalability across the different video sizes. Quality (SNR) scalability, on the other hand, can be implemented by two methods, i.e. coarse grain scalability (CGS) and medium grain scalability (MGS).

International Conference on Computer and Communication Engineering (ICCCE 2010), 11-13 May 2010, Kuala Lumpur, Malaysia

978-1-4244-6235-3/10/$26.00 ©2010 IEEE

In spatial scalability, the enhancement layers can be coded with both the temporal motion prediction and the inter-layer prediction mechanisms. Inter-layer prediction improves the coding efficiency of the enhancement layer by reusing as much base layer information as possible. Spatial scalable coding includes three inter-layer prediction methods: prediction of motion parameters, prediction of the residual signal, and inter-layer intra texture prediction [10].

Inter-layer motion prediction is employed to improve the coding efficiency of the enhancement layers by using the base layer motion data. When the two layers have the same spatial resolution, the enhancement layer can directly use the base layer’s motion information, such as macroblock partitioning and motion vectors. When the spatial resolution is different, the base layer’s motion information is scaled by a factor of 2 for the enhancement layer.
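The motion prediction rule described above can be illustrated with a short sketch. It is not taken from JSVM; the `MotionVector` type and the function name are hypothetical, and the vector units are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotionVector:
    # Displacement components; the unit (e.g. quarter samples) is an assumption.
    dx: int
    dy: int


def predict_enhancement_mv(base_mv: MotionVector, same_resolution: bool) -> MotionVector:
    """Derive the enhancement layer motion vector predictor from the base layer."""
    if same_resolution:
        # Same spatial resolution: reuse the base layer vector directly.
        return base_mv
    # Different resolution (QCIF to CIF): scale the vector by a factor of 2.
    return MotionVector(base_mv.dx * 2, base_mv.dy * 2)


base = MotionVector(dx=3, dy=-5)
print(predict_enhancement_mv(base, same_resolution=True))
print(predict_enhancement_mv(base, same_resolution=False))
```

The sketch covers motion vectors only; macroblock partition sizes would be scaled in the same way.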

For residual prediction, if the spatial resolution is the same, the base layer residual is used as the prediction of the current residual, and the difference between the current residual signal and the base layer’s is coded. If the spatial resolution is different, the base layer residual is upsampled to predict the current residual. Inter-layer intra texture prediction takes place when the enhancement layer macroblock is coded in “BlSkip mode” and the base layer macroblock is intra-coded. In that case, the base layer intra signal is used as the prediction of the enhancement layer macroblock.
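A minimal sketch of the residual prediction step is given below. The nearest-neighbour upsampling is only a stand-in for the interpolation filter used by an actual SVC encoder, and all function names and sample values are hypothetical.

```python
def upsample2x(block):
    """Nearest-neighbour 2x upsampling of a 2-D list (stand-in for the codec filter)."""
    out = []
    for row in block:
        wide = [v for v in row for _ in range(2)]  # repeat each sample horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def residual_difference(enh_residual, base_residual, same_resolution):
    """Return the signal that is coded: enhancement residual minus its base layer prediction."""
    pred = base_residual if same_resolution else upsample2x(base_residual)
    return [[e - p for e, p in zip(erow, prow)]
            for erow, prow in zip(enh_residual, pred)]


# Hypothetical 2x2 base layer residual and 4x4 enhancement layer residual.
base = [[4, -2], [0, 6]]
enh = [[5, 4, -2, -1],
       [4, 4, -3, -2],
       [0, 1, 6, 7],
       [0, 0, 5, 6]]
diff = residual_difference(enh, base, same_resolution=False)
print(diff)
```

The difference signal contains mostly small values, which is why coding it is cheaper than coding the enhancement layer residual directly.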

III. WIRELESS LOCAL AREA NETWORK (IEEE 802.11)

IEEE 802.11 is becoming the most popular standard for WLANs because of its newer, higher-rate transmission modes. Because it is not tied to a wired medium, wireless LAN offers flexibility and simplicity as well as cost effectiveness and mobility. The IEEE 802.11 standard and its extensions define the specifications for the MAC and the different physical layers, for example the 802.11a physical layer and the 802.11 MAC PCF [11].

Within the Open System Interconnection (OSI) model, the IEEE 802.11 Physical Layer (PHY) provides the interface between the MAC and the wireless medium. The IEEE 802.11 MAC is the set of rules that determines how the medium is accessed and how data is sent, while the PHY handles the details of transmitting and receiving the data [12].

The IEEE 802.11 MAC uses Carrier Sense Multiple Access (CSMA) to control access to the transmission medium, together with Collision Avoidance (CA) for efficient transmission on the radio link. The IEEE 802.11 MAC allows two different coordination functions: the Distributed Coordination Function (DCF) and the Point Coordination Function (PCF). DCF checks whether the transmission medium is free before transmitting. PCF is more suitable for video streaming applications [11].
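The carrier sensing behaviour of DCF can be sketched as follows. This is a strongly simplified illustration and not part of the paper's experiments: inter-frame spacing and acknowledgements are omitted, the function names are hypothetical, and the contention window limits `cw_min` and `cw_max` are illustrative defaults.

```python
import random


def dcf_backoff_slots(retry, rng, cw_min=15, cw_max=1023):
    """Draw a random backoff (in slots) for the given retry count.

    The contention window roughly doubles after each failed attempt and is
    capped at cw_max. Both limits are illustrative assumptions.
    """
    cw = min((cw_min + 1) * (2 ** retry) - 1, cw_max)
    return rng.randint(0, cw)


def slots_to_wait(medium_busy, retry, rng):
    """Return 0 if the medium is sensed free, otherwise a random backoff in slots."""
    if not medium_busy:
        return 0
    return dcf_backoff_slots(retry, rng)


rng = random.Random(2010)  # seeded so that repeated runs behave identically
for retry in range(4):
    print(retry, slots_to_wait(True, retry, rng))
```

Randomizing the waiting time is what lowers the chance that two stations that sensed the same busy medium start transmitting at the same moment.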

The IEEE 802.11 PHY defines four data exchange techniques: Infrared (IR), Frequency Hopping Spread Spectrum (FHSS) – 802.11, Direct Sequence Spread Spectrum (DSSS) – 802.11b/g, and Orthogonal Frequency Division Multiplexing (OFDM) – 802.11a/g/n.

In terms of the transmission link, IEEE 802.11 offers different transmission bitrates, which are standardized in IEEE 802.11b, IEEE 802.11g, and IEEE 802.11a. The maximum bitrate for IEEE 802.11b is 11 Mbps. The maximum bitrate for IEEE 802.11g and IEEE 802.11a is 54 Mbps.

Figure 1. Block diagram of IEEE 802.11 standard

Figure 1 illustrates the IEEE 802.11 standard, a link layer that uses 802.2/LLC encapsulation. The specification includes the 802.11 MAC and multiple physical layers (PHY), which apply Frequency Hopping Spread Spectrum (FHSS/802.11), Direct Sequence Spread Spectrum (DSSS/802.11b/g), and Orthogonal Frequency Division Multiplexing (OFDM/802.11a/g/n).

IV. IMPLEMENTATION OF FAST MODE DECISION

The fast mode decision algorithm is implemented by downsampling the video size in the enhancement layer. This is done to decrease the coding complexity in the enhancement layer. Since the resolution of the enhancement layer is twice that of the base layer, the encoding process takes almost 80% of the encoding bit streams if both layers use the same video size [4].

As shown in Figure 2, on the encoder side the input video is encoded into two spatial layers, i.e. one base layer (layer 0) and one enhancement layer (layer 1), at different spatial resolutions: QCIF (176x144) and CIF (352x288). To decide the MB mode, motion estimation is performed to obtain the Rate Distortion (RD) cost of each candidate mode, and these values are compared to select the MB mode with the smallest RD cost.

Figure 2. Encoding algorithm with downsampling
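The selection of the MB mode with the smallest RD cost can be sketched as below. The paper does not state its cost function; the Lagrangian form J = D + lambda * R commonly used in H.264 encoders is assumed here, and the mode names, distortion values, rates, and lambda are hypothetical.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits


def best_mode(candidates, lam):
    """Return the mode name with the smallest RD cost.

    candidates maps a mode name to a (distortion, rate_bits) pair obtained
    from motion estimation or intra prediction for that mode.
    """
    return min(candidates, key=lambda m: rd_cost(candidates[m][0], candidates[m][1], lam))


# Hypothetical measurements for one macroblock.
candidates = {
    "SKIP": (1200.0, 2),
    "INTER_16x16": (800.0, 40),
    "INTER_8x8": (600.0, 95),
    "INTRA_4x4": (700.0, 150),
}
print(best_mode(candidates, lam=5.0))
```

An exhaustive search fills this table for every mode of every macroblock, which is the cost that a fast mode decision tries to avoid by evaluating fewer candidates or smaller frames.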

The difference in spatial resolution between the base layer and the enhancement layer increases the computation time of the encoding process. The enhancement layer is therefore downsampled on the encoder side to reduce the encoding complexity and the encoding time. After decoding, the video is upsampled to the original size, as shown in Figure 3.

Figure 3. Algorithm in the Decoder Side
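The downsampling and upsampling steps of Figures 2 and 3 can be illustrated on a small luma block. The paper does not specify the filters, so 2x2 averaging and nearest-neighbour repetition are used here purely as stand-ins; the function names and sample values are hypothetical.

```python
def downsample2x(frame):
    """Halve width and height by averaging each 2x2 block (with rounding)."""
    h, w = len(frame), len(frame[0])
    return [
        [(frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1] + 2) // 4
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]


def upsample2x(frame):
    """Double width and height by repeating every sample (nearest neighbour)."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# Hypothetical 4x4 luma block standing in for a CIF frame.
frame = [[10, 12, 20, 22],
         [14, 16, 24, 26],
         [30, 30, 40, 44],
         [30, 30, 40, 44]]
small = downsample2x(frame)      # encoded at reduced size
restored = upsample2x(small)     # restored to the original size after decoding
print(small)
print(restored)
```

The restored block has the original dimensions but has lost fine detail, which corresponds to the small quality reduction that is traded for shorter encoding time.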

V. SIMULATION RESULTS AND DISCUSSION

JSVM reference software 9.15 [11] is used to implement the algorithm and to perform the simulation. It is the reference implementation of scalable video coding. It includes all key components, such as motion compensation, intra prediction, transform and entropy coding, the deblocking filter, and Network Abstraction Layer (NAL) unit packetization.

The software requires some configuration to run the encoding and decoding processes. The configuration parameters are stored in configuration files, which are divided into a main configuration file and layer configuration files. The main configuration file contains the parameter settings for the whole encoder system, while each layer configuration file contains the parameter settings for one particular layer. The parameters in the configuration files should be defined properly so that the encoding process meets the simulation objectives.

For the simulation, video sequences in YUV format, i.e. the Foreman and News sequences, were used as test sequences to observe the output of the video encoder. The frame rate of the video sequences was 30 frames per second. Two layers were used, i.e. a base layer and an enhancement layer at different spatial resolutions. Video in QCIF resolution was used as the base layer, and video in CIF resolution as the enhancement layer. A GOP size of 16 was used, which also automatically defines the number of I and P frames. The quantization parameter (QP) was 38. Some default parameters of JSVM software version 9.15 that support scalable encoding were also used, as described in the JSVM software manual [13].
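The relation between the GOP size and the temporal layers can be checked with a short calculation. The sketch assumes a dyadic hierarchical prediction structure and a full frame rate of 30 Hz, the highest frequency listed in Table III; the function name is hypothetical.

```python
import math


def temporal_layer_rates(full_rate_hz, gop_size):
    """Frame rates of the dyadic temporal layers, lowest layer first.

    Assumes gop_size is a power of two, so that each additional temporal
    layer doubles the frame rate.
    """
    levels = int(math.log2(gop_size))
    return [full_rate_hz / (2 ** k) for k in range(levels, -1, -1)]


rates = temporal_layer_rates(30.0, 16)
print(rates)
```

Under these assumptions a GOP size of 16 yields five temporal layers, and the lowest rate of 1.875 Hz corresponds to the rounded value 1.88 Hz in Table III.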

We carried out three experiments for evaluation purposes. Three encoding schemes, i.e. full block search, fast search, and the proposed fast mode algorithm, were implemented and evaluated. The encoded video is evaluated both subjectively and objectively. The subjective evaluation is based on personal opinion, and the objective evaluation is based on the calculation of BDBR, BDPSNR, and Time Saving. BDBR is the average difference in bit rate and BDPSNR is the average difference in PSNR between two rate-distortion curves [14]. Time Saving is the reduction in computation time of the proposed scheme relative to the JSVM scheme.
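The paper does not spell out the formula behind Time Saving. A commonly used definition, assumed here, is the relative reduction in encoding time with respect to the reference encoder. The function name and the timings in the example are hypothetical and are not measurements from the paper.

```python
def time_saving_percent(t_reference, t_proposed):
    """Relative encoding time reduction in percent (assumed definition).

    t_reference: encoding time of the reference JSVM scheme, in seconds.
    t_proposed:  encoding time of the proposed scheme, in seconds.
    """
    if t_reference <= 0:
        raise ValueError("reference time must be positive")
    return (t_reference - t_proposed) / t_reference * 100.0


# Hypothetical timings: 200 s for the reference and 150 s for the proposed scheme.
print(time_saving_percent(200.0, 150.0))
```

With this definition a positive value means the proposed scheme is faster, and a value of 0 means no change.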

Tables I and II show the simulation results. Table I compares the proposed algorithm with fast search. The proposed scheme reduces the encoding time by up to 30%, with a negligible PSNR difference of 0.100 – 0.118 dB and a bit rate decrease of 4% – 5%. Table II compares the proposed algorithm with full search. Here the proposed scheme reduces the encoding time by up to 83%, with a negligible PSNR difference of 0.110 dB and a bit rate decrease of 4% – 5%.

TABLE I. SCALABLE EXTENSION OF H.264/AVC FOR JSVM BASIC AND PROPOSED ALGORITHM

Video      BDBR (%)   BDPSNR (dB)   Time Saving (%)
News       -5.521     0.118         30.22
Foreman    -4.238     0.100         29.65

TABLE II. SCALABLE EXTENSION H.264/AVC FOR JSVM FAST SEARCH AND JSVM FULL SEARCH

Video      BDBR (%)   BDPSNR (dB)   Time Saving (%)
News       -5.277     0.110         83.40
Foreman    -4.331     0.110         82.68

TABLE III. SCALABLE EXTENSION H.264/AVC TEST STREAM

                                News                                Foreman
Scheme              Freq   Bitrate  Min Bitrate  Y-PSNR    Bitrate  Min Bitrate  Y-PSNR
                    (Hz)   (kbps)   (kbps)       (dB)      (kbps)   (kbps)       (dB)

Fast Search         1.88   36.15    36.15        34.54     29.55    29.55        34.27
                    3.75   41.28    41.28        34.19     37.33    37.33        33.57
                    7.5    45.71    45.71        33.91     44.86    44.86        33.08
                    15     50.43    50.43        33.70     53.16    53.16        32.80
                    30     54.35    54.35        33.58     59.51    59.51        32.59

Full Search         1.88   36.15    36.15        34.54     29.55    29.55        34.27
                    3.75   41.28    41.28        34.20     37.32    37.32        33.57
                    7.5    45.65    45.65        33.92     44.78    44.78        33.09
                    15     50.36    50.36        33.72     53.07    53.07        32.81
                    30     54.32    54.32        33.60     59.39    59.39        32.61

Fast Search Prop.   1.88   37.05    36.20        34.58     30.41    29.60        34.31
                    3.75   42.81    41.35        34.22     38.80    37.39        33.60
                    7.5    48.38    45.78        33.94     47.40    44.82        33.11
                    15     55.42    50.53        33.74     58.01    53.16        32.84
                    30     64.04    54.47        33.61     68.98    59.45        32.63

Table III lists the range of frequencies (frame rates), which represents the temporal scalability of scalable video coding, i.e. the scalability of the bitstream. If the network is in good condition, high-quality encoded video with a higher bitrate can be transmitted over the network, and vice versa.
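How a sender could use the operating points of Table III to adapt to the network can be sketched as follows. The bitrates are the News values of the fast search scheme from Table III; the selection rule (highest frame rate that fits the available bandwidth) and the function name are assumptions made for illustration, not the adaptation scheme of the paper.

```python
# (frame rate in Hz, bitrate in kbps) for News with fast search, from Table III.
NEWS_FAST_SEARCH = [
    (1.88, 36.15),
    (3.75, 41.28),
    (7.5, 45.71),
    (15.0, 50.43),
    (30.0, 54.35),
]


def select_operating_point(points, available_kbps):
    """Pick the highest frame rate whose bitrate fits the available bandwidth.

    Returns None when even the lowest operating point does not fit.
    """
    feasible = [p for p in points if p[1] <= available_kbps]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p[0])


for bandwidth in (30.0, 48.0, 100.0):
    print(bandwidth, select_operating_point(NEWS_FAST_SEARCH, bandwidth))
```

Because the temporal layers are nested, moving to a lower operating point only requires dropping packets of the higher layers; no re-encoding is needed.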

In the subjective evaluation, the proposed scheme gives a faster encoding time and acceptable quality of the encoded video. There is some reduction in quality with the proposed scheme, but it can be considered negligible. As mentioned in [10], the quality of SVC-encoded video becomes poorer when the bandwidth is low. Figure 4 and Figure 5 show the reconstructed images from the proposed algorithm. It can be seen that the proposed scheme has acceptable quality, with a negligible reduction compared to the reconstructed images from the JSVM scheme.

Figure 4. Reconstructed Foreman Video Sequence

Figure 5. Reconstructed News Video Sequence

VI. CONCLUSIONS

A fast mode decision algorithm for scalable video coding has been presented. The scalability achieved can be used under various wireless network conditions: higher quality for higher bandwidth and lower quality for lower bandwidth. Based on the simulation results, it can be concluded that the implemented scheme in the scalable video encoder provides faster encoding together with spatial, temporal, and quality scalability. The proposed scheme saved up to 30.22% of the encoding time with a negligible PSNR difference of 0.100 – 0.118 dB and a 4% – 5% bit rate reduction compared to the original fast search in JSVM. It reduced the encoding time by up to 83% with a negligible PSNR difference of 0.110 dB and a 4% – 5% bit rate reduction compared to the original full search in JSVM. In addition, subjective and objective evaluation showed that the reconstructed video quality was maintained, with only a negligible reduction detected. Future work will include optimization of the JSVM software parameters and the algorithm, implementation in real-time hardware, and simulation under various network conditions.

ACKNOWLEDGMENT

This research has been supported by the International Islamic University Malaysia Research Endowment Fund 2009.

REFERENCES

[1] B.J. Kim, Z. Xiong, and W. A. Pearlman, “Low bit-rate scalable video coding with 3D set partitioning in hierarchical trees (3D SPIHT),” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 8, pp. 1374-1387, December, 2000.

[2] S. McCanne, M. Vetterli, and V. Jacobson, “Low-complexity video coding for receiver-driven layered multicast,” in IEEE Journal on Selected Areas in Communication, vol. 15, pp. 983-1001, August, 1997.

[3] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1103–1120, September, 2007.

[4] G. Goh, J. Kang, M. Cho, and K. Chung, “Fast Mode Decision for Scalable Video Coding Based on Neighboring Macroblock Analysis,” in Proceedings of the 2009 ACM Symposium on Applied Computing, pp. 1845-1846, 2009.

[5] H. Li, Z. G. Li, and C. Wen, “Fast Mode Decision for Coarse Grain SNR Scalable Video Coding,” in IEEE International Conference on Acoustics, vol. 2, pp. 545-548, May, 2006.

[6] H. Li, Z. G. Li, C. Wen, and L. Chau, “Fast Mode Decision for Spatial Scalable Video Coding,” in IEEE International Symposium on Circuits and Systems, pp. 3005-3008, September, 2006.

[7] H. Li, Z. G. Li, and C. Wen, “Fast Mode Decision Algorithm for Inter-Frame Coding in Fully Scalable Video Coding,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 7, pp. 889-896, July, 2006.

[8] H. Schwarz, D. Marpe, T. Schierl, and T. Wiegand, “Combined scalability support for the scalable extensions of H.264/AVC,” in International Conference on Multimedia and Expo, pp. 1-4, 2005.

[9] G. Liebl, M. Wagner, J. Pandel, and W. Weng, “An RTP payload format for erasure-resilient transmission of progressive multimedia streams,” in IETF, 2004.

[10] J. Ye and J. Liu, “An Improved Method for Scalable Video Coding at Low Bit Rates,” in International Symposium on Intelligent Signal Processing and Communication Systems, 2007.

[11] X. Xiaofeng, S. Mihaela, K. Santhana, C. Sunghyun, and W. Yao, “Adaptive error control for fine-granular-scalability video coding over IEEE 802.11 wireless LANs,” in Proceedings of International Conference on Multimedia and Expo, pp. 669-672, July, 2003.

[12] M. S. Gast, 802.11 Networks: The Definitive Guide, O'Reilly, 2002.

[13] JSVM Software Manual, 2009.

[14] G. Bjontegaard, “Calculation of Average PSNR Differences between RD-curves,” VCEG-M33, 13th meeting: Austin, Texas, USA, April 2-4, 2001.
