
2012 International Conference on Signal Processing and Communications (SPCOM), Bangalore, Karnataka, India, 22-25 July 2012.



A Color Video Compression Technique using Key Frames and a Low Complexity Color Transfer

Rakesh Agarwal
Dept. of Electrical Engg., Indian Institute of Technology, Kanpur
Email: rkagrw@gmail.com

Sumana Gupta
Dept. of Electrical Engg., Indian Institute of Technology, Kanpur
Email: sumana@iitk.ac.in

Varaprasad Gude
Dept. of Electrical Engg., Indian Institute of Technology, Kanpur
Email: varaprasad.gude@gmail.com

Abstract- In this work, a novel method for color video compression using key-frame-based color transfer is proposed. In this scheme, compression is achieved by discarding the color information of all but a few selected frames. These selected frames are either key frames (frames selected by a key frame selection algorithm) or Intra-coded (I) frames. The partially colored video is compressed using a standard encoder, thereby achieving higher compression. In the proposed decoder, a standard decoder first generates the partially colored video sequence from the compressed input. A color transfer algorithm is then used to generate the fully colored video sequence. The complexity of the proposed decoder is close to that of a standard decoder, allowing its use in a wide variety of applications such as video broadcasting, video streaming, and handheld devices.

Index Terms— color, compression, video, complexity, key frames.

1. INTRODUCTION
With the exponential increase in the use of digital media, there is a strong need for new methods of efficient video storage and compression. A color video occupies much more space than a grayscale video. If a color video is converted into a partially colored video and then encoded, a significant increase in the compression ratio is possible.

However, the partially colored video should be such that it can be easily colored using traditional image color transfer algorithms without much degradation in quality. Thus the number of frames kept colored during encoding should be neither so small that the quality of the decoded video degrades beyond an acceptable limit, nor so large that the advantage in compression becomes negligible. Manual intervention for this purpose is not justified. The complexity of the algorithm should also be kept within practical limits.

2. RELATED WORK
A work by Kumar et al. [4] explores the possibilities in this area of color video compression. They assigned frames at a fixed, pre-defined interval as reference frames. The color of these reference frames was retained while that of the others was removed. The video was then encoded using a standard codec, thereby giving higher compression. The video was then decoded using a standard codec, and the resulting partially colored video was colorized using a color transfer algorithm and motion vectors. The work achieved a higher compression ratio than a standard codec alone.

However, the most important drawback of this work is that it chooses reference frames at a pre-defined interval. If the video being compressed has very little motion activity, the reference frames selected by uniform sampling (as in this work) may be far more numerous than required for sufficient quality of color transfer at the decoder, so the compression achieved in such cases will be lower than what is possible. On the other hand, for videos with high motion activity, uniform sampling may choose fewer representative frames than required for maintaining sufficient quality of color transfer at the decoder. Even the position of the representative frames becomes important depending on the content of the video. Another drawback of this work is that it does not identify shots in the sequence prior to the selection of representative frames, so the scheme may not work satisfactorily for sequences with more than one shot.

3. PROPOSED APPROACH
In this work, we aim to use existing key frame selection algorithms to find reference frames (referred to as Key Frames). These algorithms select Key Frames adaptively, without any manual intervention: fewer Key Frames are selected for videos with lower motion activity and more Key Frames for videos with higher motion activity. We also use shot detection algorithms to identify shot boundaries prior to Key Frame selection. At the decoder end, we use a simple color transfer algorithm to transfer color to the uncolored frames, using the colored frames as references. Instead of recalculating motion vectors at the decoder, we reuse the motion vectors already present in the decoder for colorization.

3.1 PROPOSED ENCODER
Figure 1(a) gives the block diagram of the proposed encoder. The video sequence is first segmented into shots using a shot boundary detection algorithm. The algorithm used in this work is based on the work of Kim et al. [3] and uses a weighted variance and an adaptive threshold to identify shot boundaries. The algorithm is computationally very simple and gives satisfactory results for all kinds of shot boundaries.

978-1-4673-2014-6/12/$31.00 ©2012 IEEE

Figure 1(a): Proposed Encoder Block Diagram; (b) Proposed Decoder Block Diagram.
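The weighted-variance formulation of Kim et al. [3] is not reproduced here, but the adaptive-threshold idea can be sketched as follows; the luminance-histogram difference measure, the window size, and the factor `k` are illustrative assumptions, not the published method.

```python
import numpy as np

def detect_shot_boundaries(frames, k=3.0, window=10):
    """Illustrative adaptive-threshold shot detector (not Kim et al.'s
    exact weighted-variance method). A cut is declared where the
    luminance-histogram difference exceeds mean + k * std of the
    recent differences in a sliding window."""
    diffs = []
    for prev, cur in zip(frames, frames[1:]):
        h1, _ = np.histogram(prev, bins=64, range=(0, 255))
        h2, _ = np.histogram(cur, bins=64, range=(0, 255))
        diffs.append(np.abs(h1 - h2).sum() / prev.size)
    boundaries = []
    for i, d in enumerate(diffs):
        local = diffs[max(0, i - window):i] or [0.0]
        thresh = np.mean(local) + k * np.std(local) + 1e-6
        if d > thresh:
            boundaries.append(i + 1)   # first frame of the new shot
    return boundaries
```

The adaptive threshold is what lets one detector handle both quiet and busy footage: a difference is only a cut if it stands out from its local neighborhood.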

After segmentation into shots, a key frame extraction algorithm (based on the work of Sze et al. [5]) is used to identify Key Frames within each shot. This algorithm uses the spatio-temporal distribution of pixel values throughout the shot to identify key frames. It calculates a hypothetical reference frame (referred to as the Temporally Maximum Occurrence Frame, or TMOF) based on this distribution, and computes the distance of each frame from the TMOF. The peaks of this distance curve are chosen as the Key Frames. The validity of the selected Key Frames was verified by using mutual information in place of the distance metric (as explained in subsequent sections).
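A rough sketch of the TMOF idea follows; for brevity, the per-pixel temporal mean stands in for the occurrence-based TMOF of Sze et al. [5], which is an assumption made here, not their construction.

```python
import numpy as np

def select_key_frames(frames):
    """Simplified sketch of TMOF-style key frame selection. The true
    TMOF is built from the temporal occurrence of pixel values; here
    the per-pixel temporal mean stands in for it. Key frames are the
    local peaks of the per-frame distance curve."""
    stack = np.stack(frames).astype(float)          # shape (T, H, W)
    tmof = stack.mean(axis=0)                       # stand-in reference frame
    dist = np.abs(stack - tmof).mean(axis=(1, 2))   # distance of each frame from TMOF
    keys = [t for t in range(1, len(frames) - 1)
            if dist[t] > dist[t - 1] and dist[t] > dist[t + 1]]
    return keys, dist
```

Frames far from the shot's "typical" content produce peaks in the distance curve, which is why busier shots yield more key frames.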

After selection of the Key Frames, the color of all frames other than the Key Frames and the Intra-coded (I) frames is removed. Key Frames are kept colored because they are required for sufficient quality of color transfer at the decoder, while I frames are kept colored because there is no motion vector information corresponding to these frames at the decoder. The resulting partially colored video is then compressed using a standard encoder.
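This de-colorization step can be sketched as follows, assuming frames held in YCbCr with 128 as the neutral chroma value (the function name and frame layout are illustrative, not from the paper):

```python
import numpy as np

def decolorize(frames_ycbcr, key_frames, i_frames):
    """Neutralize the chroma planes (Cb, Cr = 128) of every frame that
    is neither a Key Frame nor an I frame; luma (Y) is left untouched."""
    keep = set(key_frames) | set(i_frames)
    out = []
    for idx, f in enumerate(frames_ycbcr):
        f = f.copy()
        if idx not in keep:
            f[..., 1:] = 128        # Cb and Cr set to the neutral value
        out.append(f)
    return out
```

Since the neutralized chroma planes are constant, they quantize to runs of zeros in the encoder, which is the source of the compression gain.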

3.2 PROPOSED DECODER
Figure 1(b) shows the block diagram of the proposed decoder. First, the compressed video is decoded using a standard decoder. This generates the partially colored video that was compressed by the standard encoder. The partially colored video is then colorized using a color transfer algorithm [2]. This color transfer finds the ‘best match’ in the luminance components of the two frames (the colored reference and the uncolored target frame) and transfers the chrominance components corresponding to this best match in the reference frame to the target frame. The algorithm uses a simple error distance criterion to find the best match. The motion vectors required for the color transfer are obtained from the decoder.
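A minimal sketch of such a motion-compensated chroma transfer follows; the block size and the motion-vector dictionary layout are illustrative assumptions, and the luminance ‘best match’ search of [2] is replaced here by the motion vectors already available in the decoder, as the text describes.

```python
import numpy as np

def transfer_chroma(target, reference, motion_vectors, block=8):
    """For every `block` x `block` block of the uncolored target frame,
    copy the chroma (Cb, Cr) of the motion-compensated block in the
    colored reference frame. `motion_vectors[(by, bx)] = (dy, dx)` maps
    block coordinates to a displacement; luma (Y) is untouched."""
    out = target.copy()
    h, w = target.shape[:2]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = motion_vectors.get((by // block, bx // block), (0, 0))
            sy = min(max(by + dy, 0), h - block)   # clamp to frame bounds
            sx = min(max(bx + dx, 0), w - block)
            out[by:by + block, bx:bx + block, 1:] = \
                reference[sy:sy + block, sx:sx + block, 1:]
    return out
```

Reusing the decoder's motion vectors is what keeps the decoder complexity close to a standard decoder: no new block search is performed at colorization time.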

4. RESULTS
Since the video is stripped of its color components for all but a few frames, an advantage in compression is expected: the fewer the colored frames, the greater the expected advantage. This section gives the experimental results for the different components of the proposed scheme, along with a brief theoretical explanation of the results.

4.1 SHOT DETECTION
The shot detection algorithm was tested on various video sequences, and the shot boundaries it identified were verified manually. Besides detecting hard cuts, the algorithm successfully detected the other transitions present in different sequences.

4.2 KEY FRAME EXTRACTION
The Key Frame extraction algorithm was verified on two video sequences. The first, a slow sequence (‘Claire’), generated 6 Key Frames out of a total of 411 frames, while the second, a faster sequence (‘Football’), generated 12 Key Frames out of a total of 104 frames. The Key Frames correspond to the red asterisks in Figures 2 and 3 for the two sequences respectively. The number of selected Key Frames clearly shows the advantage of using a Key Frame extraction algorithm: for the sequence with low motion activity, very few Key Frames (about 1.5% of all frames) were selected, promising higher compression, while for the sequence with higher motion activity, more Key Frames (about 11.5% of all frames) were selected, ensuring the color transfer quality at the decoder. The algorithm was also verified by using mutual information in place of the distance metric for determining Key Frames (i.e., the inverse of the mutual information between the TMOF and the individual frames was plotted, and its major peaks were selected as the Key Frames).

Figure 2: Weighted Distribution curve for the ‘Claire’ sequence.

Figure 3: Weighted Distribution curve for the ‘Football’ sequence.

In Figure 4, red asterisks show the Key Frames selected by the described algorithm, while magenta asterisks show the Key Frames selected using the mutual information metric, for the ‘Football’ sequence. It can be clearly seen that the two sets of Key Frames are quite close to each other. In addition, the number of Key Frames selected by the two methods was equal or similar; for the ‘Football’ sequence it was 12 for both.

Figure 4: Comparison of Key Frame selection using the Weighted Distance Distribution curve and Mutual Information for the ‘Football’ sequence.

4.3 ENCODER
The ‘Claire’ video sequence was used to study the effect of the Quality Scaling Factor (QSF) of a standard codec on the compression ratio of the proposed encoder. The QSF is the value of the quantization parameter of a standard encoder (MPEG-2 in this work): the DCT coefficients of a block are divided by the QSF to achieve higher compression. The higher the value of the QSF, the higher the compression achieved; the quality, however, goes down (because of the increased quantization error). Thus the QSF provides a means to trade off quality against compression. It is defined separately for each type of frame (I, P and B).
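The quantization step described above can be illustrated with a toy block of DCT coefficients (the coefficient values below are made up for illustration; a standard encoder also applies a per-position quantization matrix, omitted here):

```python
import numpy as np

def quantize(dct_coeffs, qsf):
    """Divide DCT coefficients by the QSF and round, as in the
    quantization stage of a standard encoder."""
    return np.round(dct_coeffs / qsf)

# A toy 4x4 block: a large DC term and progressively smaller AC terms.
coeffs = np.array([[120.0, 9.0, 4.0, 1.0],
                   [ 10.0, 5.0, 2.0, 0.5],
                   [  4.0, 2.0, 1.0, 0.2],
                   [  1.0, 0.5, 0.2, 0.1]])

for qsf in (1, 4, 8):
    zeros = int((quantize(coeffs, qsf) == 0).sum())
    print(qsf, zeros)   # the count of zero coefficients grows with QSF
```

More zeros mean longer zero runs for the entropy coder, hence higher compression; for the proposed encoder the chroma coefficients of uncolored frames are already zero, which is why raising the QSF helps it less.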

A decrease in the advantage in compression ratio was observed with increasing QSF values. One reason for this is the increase in the compression ratio of a standard encoder with increasing QSF: as the QSF is increased, more and more DCT coefficients become zero, with a corresponding increase in the compression ratio of a standard encoder. In the proposed encoder, however, most of the chrominance components are already zero (de-colorized), so an increase in the QSF does not help compression to the same extent. The only non-zero components are the luminance components of all frames and the chrominance components of the Key Frames and I frames (the colored frames). This can be observed in Figures 5 and 6. In Figure 5, a definite advantage in compression with increasing QSF can be seen for the colored frames (Key Frames and I frames, shown as K and I respectively); for the uncolored frames, this effect is somewhat subdued. Figure 6 shows the effect of the QSF on the size (in MB) of the chroma components of each frame: for the uncolored frames, since these components are already zero, the QSF has a negligible effect, while for the colored frames the effect is prominent. Another important observation from Figure 5 is that, besides the colored frames, the immediately following frames also show a reduction in compression. This is because of the predictive coding of such frames: the colored frames have non-zero chrominance components while the following frames have zero chrominance components, so the error between the two is significantly high, with a corresponding reduction in compression.

Figure 5: Performance comparison of the proposed codec with MPEG-2 across different frames in the ‘Claire’ sequence.

Figure 6: Comparison of relative chroma component size for different values of QSF in the ‘Claire’ sequence.

Table 1 summarizes the results for different sequences using MPEG-2 as the standard codec, with a simple profile and an I period of 15 (QSF: 1 for I frames, 8 for P frames). It can be observed that the number of key frames selected affects the advantage in compression ratio to a great extent.

Table 1: Comparison of the Proposed Encoder with MPEG-2 (QSF: I=1, P=8).

Video Sequence      Frames  Shots  Key Frames  Total Colored Frames  Increase in Compression Ratio (%)
Mobile (352x288)    300     1      7           27                    16.70
Ice Age (352x240)   536     7      26          62                    9.65
Claire (176x140)    411     1      6           34                    14.75
Flower (352x288)    250     1      8           25                    12.89
Football (320x240)  104     1      12          19                    10.95

Table 2 gives a comparison of the compression ratios obtained using different standard encoders. This comparison was made using the “Prism Video Converter” v1.40 (© NCH Software). The profile of the encoders, the I period and the QSF values were not available.

Table 2: Comparison of Compression Ratio for different Standard Codecs.

Sequence  3gp-H263  Proposed Encoder  H264    Proposed Encoder  MPEG-2  Proposed Encoder
Ice Age   300.96    747.83            327.73  872.03            79.61   333.49
Football  242.48    300.06            37.62   571.55            12.54   181.86
Flower    291.22    362.26            43.94   802.84            14.85   243.48
Claire    101.78    94.05             158.08  200.81            45.86   71.44
Mobile    280.24    282.90            37.65   430.51            14.93   121.74

Table 3: Results of different Quality Metrics for various QSF values (for the ‘Claire’ sequence).

QSF        MPEG-2 PSNR  WPSNR  WMAE    Proposed PSNR  WPSNR  WMAE
I=1,P=1    48.02        59.17  0.0011  44.39          52.40  0.0024
I=4,P=4    47.55        57.72  0.0013  43.00          50.75  0.0029
I=8,P=8    45.28        54.42  0.0019  43.45          51.37  0.0027
I=1,P=8    47.47        58.42  0.0012  44.23          52.40  0.0024
I=1,P=16   46.56        56.48  0.0015  43.92          52.04  0.0025
I=8,P=24   42.18        50.46  0.0030  39.89          47.13  0.0044
I=8,P=28   39.00        47.54  0.0067  38.10          45.03  0.0056
I=16,P=31  35.90        43.48  0.0067  35.50          42.38  0.0076

Table 5: Comparison of decoding time for 100 frames (in seconds).

Sequence                       Football  AMD  Flower  Claire  Mobile  Foreman
Decode Time, MPEG-2            24        22   29      9       30      31
Decode Time, Proposed Decoder  61        64   82      25      85      83

4.4 DECODER
Various quality metrics were used to estimate the degradation in quality after color transfer. One was the standard Peak Signal-to-Noise Ratio (PSNR). The other two were perceptual measures defined by Ameer et al. [1]: the Weber-based Mean Absolute Error (WMAE) and the Weber-based PSNR (WPSNR). The value of the WMAE lies in the range 0 to 1, with 0 signifying a perfect reconstruction and 1 signifying a total loss of the image.
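PSNR is standard; for the Weber-based measure, the exact weighting of Ameer et al. [1] is not reproduced here, so the sketch below divides each absolute error by the local reference intensity, a Weber's-law-style weight that is an assumption of this sketch, not their published formula.

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Standard Peak Signal-to-Noise Ratio in dB."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def wmae(ref, test, c=1.0):
    """Sketch of a Weber-weighted MAE, clipped to [0, 1]. Each absolute
    error is divided by the local reference intensity (plus a small
    constant c), following Weber's law; the exact weighting of Ameer
    et al. is not reproduced here."""
    ref = ref.astype(float)
    err = np.abs(ref - test.astype(float)) / (ref + c)
    return float(np.clip(np.mean(err), 0.0, 1.0))
```

The Weber weighting makes the metric perceptual: the same absolute error counts for more against a dark background than a bright one.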

Figure 7 shows the PSNR values for the first 100 frames of the ‘Claire’ sequence. It can be clearly observed that there is no degradation in quality with respect to the standard codec for the colored frames (Key Frames and I frames), because these frames are processed in the same way as in a standard codec. For the uncolored frames there is a slight degradation in quality (in the range of 1-3 dB). Table 3 summarizes the values of the various quality metrics for different QSF values for the ‘Claire’ sequence; degradation in quality can be observed with increasing QSF.

This is in agreement with the working of the QSF as discussed before. However, the difference in the values of the various metrics between a standard codec and the proposed codec reduces with increasing QSF.

Table 4 summarizes the results for the various sequences used in this work. The scheme was designed so that the complexity of the proposed decoder stays close to that of a standard decoder, and decoding time provides a good measure of this complexity. The proposed scheme was implemented in MATLAB. A comparison of the decoding times thus obtained is presented in Table 5.

Table 4: Results of different Quality Metrics for various sequences (QSF: I=1, P=8).

Sequence            MPEG-2 PSNR  WPSNR  WMAE    Proposed PSNR  WPSNR  WMAE
Mobile (352x288)    39.96        46.59  0.0047  34.14          33.90  0.0202
Claire (176x140)    47.46        58.42  0.0012  44.22          52.40  0.0024
Foreman (352x288)   46.40        48.75  0.0037  45.99          40.18  0.0098
Flower (352x288)    41.04        43.35  0.0068  36.58          33.62  0.0208
Football (320x240)  43.22        46.56  0.0047  38.87          37.27  0.0137


Figure 7: PSNR values for the first 100 frames of the ‘Claire’ sequence.

Figure 8 (next page) shows original and decoded frames from two of the video sequences used for testing (‘Football’ and ‘Flower’). The ‘Football’ sequence is a relatively fast sequence (with higher motion activity) compared to the ‘Flower’ sequence. It can be observed that the decoded frames are perceptually similar to the original ones. The quality metrics mentioned above were also used to verify the quality of the decoded video sequences. It can further be observed that the decoding times of the proposed decoder and the standard decoder are close. There is a slight increase in decoding time, primarily for two reasons: first, the use of MATLAB, which takes significantly more time than dedicated software, and second, the use of optimized code for the standard codec versus custom, un-optimized code for the proposed codec. Nevertheless, these timings give a relative idea of the closeness of the proposed decoder to a standard decoder in terms of computational complexity.

Figure 8: (a) to (c) & (d) to (f) show the original frames, and (g) to (i) & (j) to (l) show the corresponding decoded frames, from the ‘Flower’ sequence and the ‘Football’ sequence respectively.

Table 6 gives a comparison between the PSNR values for the proposed codec and MPEG-2, along with the corresponding advantage in compression (for the proposed codec with respect to MPEG-2). The results are for the ‘Claire’ sequence.

Table 6: Comparison of PSNR values and advantage in compression for different QSF values.

QSF       Compression Increment (%)  PSNR (Proposed)  PSNR (MPEG-2)
I=1,P=1   25.78                      44.21            46.03
I=4,P=4   20.55                      43.52            45.79
I=8,P=8   16.50                      43.74            44.66
I=1,P=8   14.74                      44.13            45.75
I=1,P=16  8.24                       43.97            45.30
I=8,P=24  0.09                       41.97            43.11
I=8,P=28  -3.79                      41.07            41.51

5. CONCLUSION
In this work, a novel scheme for color video compression is proposed. The periodic reference-frame-based algorithm of [4] was improved by incorporating a shot detection algorithm and a key frame extraction algorithm. These algorithms were chosen so as to keep the overall complexity of the codec (especially the decoder) within practical limits and close to that of a standard codec. A marked improvement in compression ratio over a standard codec was observed. The degradation in quality of the proposed codec relative to a standard codec was also found to be small (a 1-10 dB reduction in PSNR).

6. REFERENCES
[1] Ameer, S., and Basir, O., “Objective image quality measure based on Weber-weighted mean absolute error”, in Proceedings of the IEEE Int. Conf. on Signal Processing (ICOSP’08), pp. 728-732, DOI: 10.1109/ICOSP.2008.4697233, 2008.

[2] Jacob, V. G., and Gupta, S., “Colorization of grayscale images and videos using a semi-automatic approach”, in Proceedings of the IEEE Int. Conf. on Image Processing (ICIP’09), pp. 1653-1656, DOI: 10.1109/ICIP.2009.5413392.

[3] Kim, W. H., Moon, K. S., and Kim, J. N., “Adaptive shot change detection for hardware application”, ISICA’08, Springer-Verlag, Berlin Heidelberg, LNCS 5370, 2008, pp. 778-784.

[4] Kumar, R. K., and Mitra, S. K., “Motion estimation based color transfer and its application to color video compression”, Pattern Analysis and Applications, Springer-Verlag London Limited, Short Paper, 2007, DOI: 10.1007/s10044-007-0086-6.

[5] Sze, K. W., Lam, K. M., and Qiu, G., “A new key frame representation for video segment retrieval”, IEEE Trans. Circuits and Systems for Video Technology, Vol. 15, No. 9, Sept. 2005.