
Design and Implementation of Real-Time Software-Based H.261 Video Codec

Wen-Shiung Chen,1 Yuan-Yu Peng,2 Yung-Tsang Chang,3 Jen-Tse Wang3

1 VIP-CC Laboratory, Department of Electrical Engineering, National Chi Nan University, Pu-Li, Nan-Tou, Republic of China

2 Department of Electrical Engineering, Feng Chia University, Taichung, Republic of China

3 Department of Information Management, Hsiuping Institute of Technology, Taichung, Republic of China

ABSTRACT: ITU-T H.261 is a video coding standard for videophone and video-conferencing applications on LANs and ISDN, which requires a great amount of computing power for the DCT and motion estimation, traditionally provided by hardware. Since motion estimation is a major obstacle in developing a real-time video codec, in this paper we propose a simple and fast motion estimation algorithm to reduce search time. Mainly, a real-time software-based H.261 video codec is investigated and implemented, in which several fast methods, such as programming techniques and Intel MMX™ instructions, are used to improve computing speed. The experimental results demonstrate that our H.261 codec can compress video in CIF format at over 30 fps and in QCIF format at 105 fps, and can achieve a very high decoding rate. © 2002 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 12, 73–83, 2002; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ima.10013

Key words: multimedia communications; videophone; video-conferencing; video codec; H.261; motion estimation

I. INTRODUCTION
Owing to the fast and mature development of communications and networking technology, multimedia communication, which combines text, sound, audio, and video data, has become the major trend of communication in the 21st century. It provides people with a friendly interface for communicating with one another via sound and video. However, video coding becomes a key task in such applications because of bandwidth shortage (Rao, 1996). Most existing video coding techniques, such as ITU-T H.261 (Rao, 1996; ITUT, 1993) and ISO MPEG (Rao, 1996), have been designed to compress data by reducing both spatial and temporal correlation. Traditionally, the ITU-T H.261 video compression standard is implemented in hardware to facilitate videophone and video-conferencing services over LANs or ISDN. With the advances of current technology, increasingly fast processors are making software-based video codec implementations possible for real-time applications.

The H.261 standard achieves a high video data compression ratio, so it can serve as a high-quality, low-bandwidth solution. There are two major categories of H.261 video compression techniques. One is intraframe coding, which includes DPCM, transform coding, and adaptive variable-length coding (for example, Huffman/run-length coding); the other is interframe coding, which includes motion estimation (ME) and motion compensation (MC). Interframe coding contributes much of the video data compression, even though it takes up most of the computation time in encoding. A number of studies have been conducted on the design and implementation of H.261 (Bellini, 1999; Huang, 1996).

The most popular method for ME is the block-based, or block-matching, approach (BMA), which uses the current block as a basis to search for a resembling block in the previous frame and obtains their displacement, called the motion vector (MV). During the encoding process, the encoder transmits the MV in place of directly coded DCT coefficients. MV coding requires far fewer bits than DCT coefficients do, thus yielding a much higher compression ratio. However, the full search method performs an exhaustive search over every candidate position for each block to acquire an MV, and thus results in heavy computation in the encoder (Rao, 1996).

Many fast search algorithms have been proposed to resolve the computational complexity of full search (Rao, 1996; Tekalp, 1995). They fall into two classes: (i) those that reduce the number of distortion measurements during the search, such as three-step search (TSS) (Koga, 1981), new three-step search (NTSS) (Li, 1994), simple efficient search (SES) (Lu, 1997), fast three-step search (FTSS) (Kim, 1998), and new fast three-step search (NFTSS) (Kim, 1999), and (ii) those that improve the distortion computation speed through operations such as middle stop and partial sampling (Tekalp, 1995).

In this paper, we combine ideas from the NTSS with the FTSS to design a new ME search algorithm. In addition, we use the MMX instruction set (Lempel, 1997; Peleg, 1996), a SIMD architecture, to enhance the computation speed of ME in our implementation. The main topic of this paper is the design and implementation of a real-time software-based H.261 codec. Section II reviews the H.261 coding standard. The proposed H.261 codec and its real-time software-based implementation are described in Section III. Section IV shows the experimental results. Finally, a conclusion is presented in Section V.

Correspondence to: Wen-Shiung Chen

© 2002 Wiley Periodicals, Inc.


II. REVIEW OF H.261 VIDEO CODEC
The H.261 standard, proposed by ITU-T to operate at low bit rates, performs well at network bandwidths of p × 64 Kbps. The block diagram of an H.261 video encoder is shown in Fig. 1. The main elements are the 2-D DCT/IDCT transform (Chen, 1977), quantization, variable-length coding (VLC) (Hashemian, 1995), and frame prediction (Rao, 1996). The H.261 standard supports two video frame formats: CIF (352 × 288 pixels) and QCIF (176 × 144 pixels). Like MPEG-1, the compressed data stream is arranged hierarchically into four layers: picture, group of blocks (GOB), 16 × 16 macroblock (MB), and 8 × 8 block. An MB consists of four luminance (Y) blocks and two chrominance (Cb and Cr) blocks.

The video compression scheme chosen for the H.261 standard has two main operation modes: intra and inter. The intra mode is similar to JPEG still-image compression in that it is based on block-by-block DCT coding. In the inter mode, a temporal prediction is first formed, with or without MC, and the interframe prediction error is then DCT encoded. Each mode offers several options, such as changing the quantization scale parameter and using a filter with MC. Frame prediction (i.e., ME and MC) is performed in a manner similar to that in MPEG-1, except that only I-frames and P-frames are used. The MB organization and the classification of an MB as intra-mode or inter-mode follow the approach in MPEG-1.

In the H.261 standard, the three major coding modes are "intra," "inter," and "inter+MC." To select the best coding mode for each MB, a suitable criterion must be designed. In general, the variances of the original MB, the MB difference, and the displaced MB difference under the best MV estimate are used in the criterion. A typical coding scheme is described as follows:

(1) For each MB, ME is performed, producing an MV and a displaced MB difference. If the variance of the displaced MB difference is smaller than a threshold, then the "inter+MC" mode is selected, and the MV is encoded by using DPCM and transmitted. The displaced MB difference block is DCT encoded.

(2) Otherwise, an MV will not be transmitted, and a decision between the "inter" and "intra" modes must be made. If the variance of the original MB is smaller than a threshold, then the "intra" mode is selected, and each original block is DCT encoded: first, the DCT of each 8 × 8 block is performed; then the transformed coefficients are quantized, and the quantized values are encoded with a variable-length code, such as DPCM and a combination of Huffman and run-length coding. Otherwise, the "inter" (zero-MV) mode is selected and, similarly, the MB difference block is DCT encoded.

Typically, the ME method, the criterion for selecting a suitable coding mode, and the criterion for deciding whether to transmit or skip an MB are not prescribed by the H.261 recommendation; they are left to the particular implementation, to be adapted to the complexity of the input video and the output data rate constraints.

Figure 1. Block diagram of the H.261 encoder.

74 Vol. 12, 73–83 (2002)

In the H.261 decoder, the received data stream is parsed and then processed by a VLC decoder to recover the MVs and the quantized DCT coefficients. The coefficients are de-quantized and passed through the IDCT. Depending on the coding mode, macroblocks from a prior frame may also be added to the current data to form the reconstructed image sequence.

III. REAL-TIME SOFTWARE-BASED H.261 CODEC
While existing video compression techniques such as MPEG and H.261 achieve high compression ratios by exploiting interframe correlation, their use of ME/MC to reduce temporal redundancy makes real-time encoding complicated, and even infeasible. Since encoding speed has become a major consideration, especially in real-time applications, a simple and efficient ME search algorithm with a low computational load is proposed in this section. Techniques for reducing the computational requirements of the encoding process are also employed in the implementation.

A. Memory Mapping. The H.261 video codec designed in this paper is implemented in the C language under the Microsoft Windows system. The input video data are allocated physically in 1-D memory while being treated logically as 2-D arrays. Moreover, in H.261 the data are arranged hierarchically, so accessing a specific MB in a specific GOB always requires an address transformation to obtain its correct address. For instance, in CIF format the initial address of the 10th MB in the 3rd GOB is mapped into the linear location by

⌊(3 − 1)/2⌋ × (3 × 16 × 352) + ((3 − 1) mod 2) × (16 × 11) + ⌊(10 − 1)/11⌋ × (16 × 352) + ((10 − 1) mod 11) × 16 = 17,040.  (1)

The above transformation uses a number of multiplication, division, and addition operations. These operations can be eliminated to improve coding efficiency. Thus, we adopt a fast table look-up through a memory-mapping table in place of the address transformation.

The encoder builds the memory-mapping table before encoding. The number of entries in the table equals the total number of MBs in a frame: 396 (33 MBs × 12 GOBs) in CIF format and 99 in QCIF. The indexing order of the table is the same as the coding order of the encoding process. Since the table is built off-line before encoding, each MB data access takes only one table look-up with no memory address computation, so memory access time is greatly reduced.
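As an illustration, the look-up table implied by Eq. (1) might be built as in the following C sketch; the function names, the 1-based GOB/MB indexing, and the per-term grouping are our own assumptions rather than the authors' code:

```c
#include <assert.h>

/* CIF geometry used in the paper: 352-pixel-wide luma, 12 GOBs of 33 MBs,
 * GOBs tiled two per row (each 11 MBs wide, 3 MBs high). */
#define CIF_WIDTH   352
#define GOBS        12
#define MBS_PER_GOB 33

static int mb_offset[GOBS * MBS_PER_GOB]; /* one entry per MB in a frame */

/* Built once before encoding; gob and mb are 1-based as in Eq. (1). */
static void build_mb_table(void)
{
    for (int gob = 1; gob <= GOBS; gob++)
        for (int mb = 1; mb <= MBS_PER_GOB; mb++)
            mb_offset[(gob - 1) * MBS_PER_GOB + (mb - 1)] =
                ((gob - 1) / 2) * (3 * 16 * CIF_WIDTH) + /* GOB row       */
                ((gob - 1) % 2) * (16 * 11) +            /* GOB column    */
                ((mb - 1) / 11) * (16 * CIF_WIDTH) +     /* MB row in GOB */
                ((mb - 1) % 11) * 16;                    /* MB column     */
}

/* Per-MB access is then a single look-up instead of Eq. (1). */
static int mb_addr(int gob, int mb)
{
    return mb_offset[(gob - 1) * MBS_PER_GOB + (mb - 1)];
}
```

For the example of Eq. (1), mb_addr(3, 10) yields 17,040 with no multiplication or division at access time.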

B. Fast Quantization Table. In an encoder, the quantization process usually requires many comparison instructions and division operations, degrading encoding efficiency; empirically, it may take 10–15% of the total encoding time. In our implementation we again use a fast table look-up, via a preset quantization table, in place of the time-consuming quantization computation. Since the DCT coefficients in H.261 range from −2,048 to 2,047, the quantized values range from −127 to 127 once the quality factor Q is fixed (e.g., Q = 16). Accordingly, for each value of Q we allocate an array of 4,096 elements in which each location stores the corresponding quantized value. If the implementation allows Qmax different quality factors, then Qmax 1-D arrays, each with 4,096 elements, are needed. In the quantization process, a quantized coefficient is retrieved with a single table look-up and no computation, so the processing time in the quantizer is greatly reduced. Experimental results show that this speeds up the quantization process by 4–6 times.
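The table can be sketched in C as follows; the truncating-division quantizer rule here is only a stand-in (the real H.261 rule also treats intra DC coefficients and dead zones specially), and all names are ours:

```c
#include <assert.h>

#define COEF_RANGE 4096  /* DCT coefficients span -2,048..2,047 */
#define QMAX       31    /* H.261 quantizer scale runs 1..31    */

static signed char quant_lut[QMAX + 1][COEF_RANGE];

/* Stand-in scalar rule: truncating division by 2Q, clipped to -127..127.
 * Any per-Q scalar rule can be baked into the table the same way. */
static int quantize_slow(int coef, int q)
{
    int level = coef / (2 * q);
    if (level > 127)  level = 127;
    if (level < -127) level = -127;
    return level;
}

/* Built once, off-line, for every allowed quality factor Q. */
static void build_quant_lut(void)
{
    for (int q = 1; q <= QMAX; q++)
        for (int c = -2048; c < 2048; c++)
            quant_lut[q][c + 2048] = (signed char)quantize_slow(c, q);
}

/* Inner-loop version: one look-up, no division or comparison. */
static int quantize_fast(int coef, int q)
{
    return quant_lut[q][coef + 2048];
}
```

The memory cost is modest: 31 × 4,096 one-byte entries, well worth the 4–6× speedup reported above.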

C. Fast VLC Decoding. The H.261 standard specifies five classes of VLC coding tables: MBA, MTYPE, MVD, CBP, and TCOEFF. In the encoder, a table look-up is performed on these tables to quickly retrieve the corresponding bit-stream code. In the decoder, we use a binary tree as the decoding tree. For fast decoding, the binary decoding tree is realized as a 2 × N 2-D array rather than a linked-list structure, with each column of the array holding a node's left and right subtrees, respectively. An example of the 2-D array realization of the binary decoding tree is shown in Fig. 2. Given the array in the decoder and starting from index 0, we can efficiently decode the bit streams, for example, "0101" and "10," into the values X and Y, respectively.

Table I. Comparison of number of search points using the different algorithms for four image sequences.

Algorithm   Football   Tennis   Foreman   Ms. Am.   Average
FS          961        961      961       961       961
TSS         33         33       33        33        33
NTSS        29.68      28.49    25.78     25.22     27.29
FTSS        16.00      16.00    16.00     16.00     16.00
MFTSS       15.92      16.03    15.60     15.89     15.86

Figure 2. 2-D array realization of the decoding tree: example.
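The array walk can be sketched in C; the table below encodes made-up codes (not the real H.261 tables, whose layout Fig. 2 illustrates) chosen so that "0101" decodes to X and "10" to Y:

```c
#include <assert.h>

/* Hypothetical 2xN decoding array. Row b of column n is followed on
 * input bit b. An entry >= 0 is the index of the next column; an entry
 * < 0 is a leaf holding a decoded symbol s stored as ~s (entries ~0
 * are unreachable for valid codes). */
enum { NODES = 5 };
static const int tree[2][NODES] = {
    /* bit 0 */ {  1, ~0,  3, ~0, ~'Y' },
    /* bit 1 */ {  4,  2, ~0, ~'X', ~0 },
};

/* Walk the array from column 0, one bit at a time, until a leaf. */
static int vlc_decode(const char *bits)
{
    int node = 0;
    for (;;) {
        int e = tree[*bits++ - '0'][node];
        if (e < 0) return ~e; /* leaf: recover the symbol */
        node = e;             /* internal: follow the edge */
    }
}
```

Because each step is a plain array index, the decoder avoids the pointer chasing of a linked-list tree.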

D. Coding Mode Decision Criterion. When a frame is being encoded, the MTYPE of every MB is determined by a coding mode decision criterion and stored in an MTYPE array before encoding. The coding mode decision criterion designed in this system is described as follows.

(1) Use the SAD measurement to compute the block difference (BD) between the current MB and the MB at the same position in the previous frame.

(2) If BD < th_fastout, that is, the co-located MB already gives good image quality, then select the INTER coding mode and go to Step (7). This is called the "fast jump" process.

(3) Perform the MFTSS search algorithm to get a motion vector (MV), and compute the displaced block difference (DBD) between the current MB and the best-matched MB.

Figure 3. Reconstructed images with cumulative error of "Ms. America." (a) FTSS and (b) our MFTSS. From left to right: frame 5, frame 10, frame 20, and frame 30.

Figure 4. Reconstructed images with cumulative error of "Table Tennis." (a) FTSS and (b) our MFTSS. From left to right: frame 5, frame 10, frame 20, and frame 30.


(4) If DBD > th_intra, that is, the image quality of even the best-matched MB is unfavorable, then discard it, select the INTRA mode, and go to Step (7).

(5) If BD ≤ DBD, that is, the co-located MB found in Step (1) is at least as good as the one found in Step (3), then select the INTER mode and go to Step (7).

(6) If DBD > th_filter, that is, the quality of the best-matched MB is still not good enough, then select the INTER+MC+FIL mode; otherwise, select the INTER+MC mode.

(7) Go to Step (1) to process the next MB.

It is noted that the "fast jump" process in Steps (1) and (2) greatly reduces unnecessary ME searches and yields about a 30% improvement in motion estimation performance.
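The decision logic of Steps (1)-(6) can be condensed into a small C function. The threshold names, the inequality directions, and the idea of passing both SAD values in are our reading of the text, not the authors' code; in the real encoder, DBD is computed only after the fast jump of Step (2) fails:

```c
#include <assert.h>

typedef enum { MODE_INTRA, MODE_INTER, MODE_INTER_MC, MODE_INTER_MC_FIL } Mtype;

/* bd:  SAD against the co-located MB in the previous frame (Step 1)
 * dbd: SAD against the best-matched MB found by MFTSS      (Step 3) */
static Mtype decide_mode(int bd, int dbd,
                         int th_fastout, int th_intra, int th_filter)
{
    if (bd < th_fastout) return MODE_INTER;        /* (2) fast jump        */
    if (dbd > th_intra)  return MODE_INTRA;        /* (4) prediction bad   */
    if (bd <= dbd)       return MODE_INTER;        /* (5) zero MV is best  */
    if (dbd > th_filter) return MODE_INTER_MC_FIL; /* (6) MC + loop filter */
    return MODE_INTER_MC;
}
```

The threshold values themselves are tuning parameters left to the implementation.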

E. A New ME Search Algorithm. The FTSS algorithm (Kim, 1998) is based on the TSS (Koga, 1981) and SES (Lu, 1997) algorithms. Basically, it is a fast search model designed to avoid unnecessary search steps. Using the FTSS to search for an MV takes 16 search points, only about half as many as the original TSS algorithm (see Table I). However, the reconstructed images with cumulative error shown in Figures 3(a) and 4(a) reveal that the FTSS seriously sacrifices image quality.

In this section, a new ME method, called "modified fast three-step search" (MFTSS), is proposed to improve the image quality obtained with the FTSS algorithm without sacrificing its search speed. In general, scenes in video-conferencing and videophone applications usually have a static background, and the speaker makes only small head movements and hand gestures. Hence, nearly zero MVs are usually produced by the frame's ME processing. The idea of the MFTSS algorithm is shown in Figure 5 and described as follows.

Following the idea of the NTSS, the MFTSS algorithm examines eight additional search points around the central point, in addition to those of the first search step of the FTSS. If the best-matched point is located at one of these eight points, then three new search points along the direction of the MV, as shown in Figure 5, are further checked to obtain a better match. Otherwise, the algorithm follows the FTSS procedure.

Table I compares the number of search points (i.e., SAD computations) of the FS, TSS (Koga, 1981), NTSS (Li, 1994), FTSS (Kim, 1998), and our MFTSS algorithms on four image sequences. The results show that our new algorithm reduces the number of search points of the NTSS by 42% and even uses slightly fewer than the FTSS. Figure 6 compares the PSNR of the FTSS, NTSS, and MFTSS for "Ms. America," "Table Tennis," and "Foreman," respectively. The improvements in PSNR over the FTSS are 0.72 dB, 1.4 dB, and 1.15 dB, respectively; over the NTSS they are 0.24 dB, 0.05 dB, and 0.4 dB.

As the reconstructed images with cumulative error in Figures 3 and 4 show, our MFTSS algorithm obtains much better image quality than the FTSS. Figure 7 illustrates that the execution speed of the ME process is increased 17 times over the PVRG codec by using the MFTSS algorithm, MMX instructions, and the fast jump.

Figure 5. The proposed MFTSS algorithm.

Figure 6. Comparison of PSNR between FTSS, NTSS, and MFTSS: (a) "Ms. America," (b) "Table Tennis," and (c) "Foreman."

F. Distortion Measurement. Distortion measurement is essential for determining the matching result in the ME process. The mean-squared error (MSE) and the sum of absolute differences (SAD) are commonly used. The SAD is defined as

SAD = Σi |ai − bi|,  (2)

where ai and bi are the gray levels of corresponding pixels. Since ME is the most time-consuming part of the encoding process, we choose the SAD as the distortion measure to reduce computational complexity, because unlike the MSE it involves only additions and subtractions.
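For a 16 × 16 MB, Eq. (2) amounts to the following C routine (the name sad16 and the stride parameter are ours):

```c
#include <assert.h>
#include <stdlib.h>

/* Eq. (2) over a 16x16 macroblock: additions and subtractions only.
 * stride is the frame width in pixels; cur and ref point at the
 * top-left pixel of the current and candidate blocks. */
static int sad16(const unsigned char *cur, const unsigned char *ref,
                 int stride)
{
    int sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += abs(cur[y * stride + x] - ref[y * stride + x]);
    return sad;
}
```

This scalar form is the baseline that the MMX version in the next subsection accelerates.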

G. Motion Estimation Speedup Using MMX Instructions. An MMX register is 64 bits long, so it can access eight one-byte image pixels simultaneously. From the viewpoint of the processor, one wide transfer is far cheaper than repeated narrow ones, so a single 64-bit data access is more efficient than accessing the same data with eight separate 8-bit accesses (Lempel, 1997; Barad, 1996).

Table II. Frame rate of our H.261 codec system.

                      Ms. America   Foreman   Author
Encoding rate (fps)   33.61         21.31     105.41
Decoding rate (fps)   168.35        73.42     249.58

Figure 8. Fast SAD computation using MMX instructions.

Figure 7. Encoding time distributions: (a) "Ms. America," (b) "Foreman," and (c) "Author."

Figure 9. Coding performance in bit rates for (a) "Ms. America," (b) "Foreman," and (c) "Author."

Moreover, Intel's MMX™ instruction set offers unsigned saturation instructions that compute an absolute value more quickly than operand comparison does. The absolute value of a subtraction (i.e., |a − b|) may be computed with two psubusb operations and one por operation, as shown in Figure 8. Since an MB is 16 × 16, we need only 32 such eight-pixel-wide absolute-difference operations, rather than 256 scalar absolute-value computations, to obtain the SAD of an MB, so the search speed of our MFTSS algorithm is greatly enhanced. The pseudocode is listed as follows.

load    [Source_Data1], FrameData           /* load data from the current frame  */
load    [Source_Data2], FrameData           /* load data from the current frame  */
load    [Estimation_Data1], PrevFrameData   /* load data from the previous frame */
load    [Estimation_Data2], PrevFrameData   /* load data from the previous frame */
psubusb [Source_Data1], [Estimation_Data1]  /* unsigned saturation subtraction   */
psubusb [Estimation_Data2], [Source_Data2]  /* unsigned saturation subtraction   */
por     [Source_Data1], [Estimation_Data2]  /* result |a − b| in Source_Data1    */
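The identity the pseudocode relies on, |a − b| = usat(a − b) | usat(b − a) for unsigned bytes, can be checked in scalar C (MMX applies it to eight byte lanes at once; the function names here are illustrative):

```c
#include <assert.h>

/* What psubusb does in each byte lane: unsigned saturating subtract. */
static unsigned char subs_u8(unsigned char a, unsigned char b)
{
    return a > b ? (unsigned char)(a - b) : 0;
}

/* |a - b| from two saturating subtracts and an OR, as in Figure 8:
 * one of the two subtracts is always 0, so the OR keeps the other. */
static unsigned char absdiff_u8(unsigned char a, unsigned char b)
{
    return subs_u8(a, b) | subs_u8(b, a);
}
```

Because one operand of the OR is always zero, no branch or comparison is needed in the vector version.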

IV. RESULTS AND DISCUSSION

A. Testing Environment. In the experiments, three CIF sequences, "Ms. America," "Table Tennis" (150 frames), and "Foreman" (250 frames), and a QCIF sequence, "Author" (150 frames), captured by the authors, are used to test our H.261 video codec. The program runs on a Pentium II 400 MHz desktop PC. The results are compared with those of the Portable Video Research Group (PVRG) H.261 codec designed at Stanford University.

B. Coding Efficiency. The test results in Figure 7 reveal that the two most time-consuming processes in encoding are ME and quantization. By applying the MFTSS search algorithm and programming techniques, such as the MMX instruction set and the fast jump, to the implementation of the software-based H.261 coder, a large encoding speed improvement in ME and quantization is achieved. Figure 7 shows the encoding time distributions of Stanford's PVRG H.261 codec and of our H.261 codec. In our implementation, the workload of ME is reduced to below 10% of the total encoding time. Using the quantization table likewise greatly improves the performance of quantization.

The frame rates of the PVRG H.261 encoder are 7.92 fps, 4.32 fps, and 32.63 fps for "Ms. America," "Foreman," and "Author," respectively. Table II shows that the frame rates of our H.261 encoder are 33.61 fps, 21.31 fps, and 105.41 fps, respectively. Our codec also achieves very high decoding rates of 168.35 fps, 73.42 fps, and 249.58 fps, respectively. As a result, our H.261 codec performs better in encoding efficiency than the PVRG H.261 codec.

C. Coding Performance. Figure 9 shows the bit rates achieved by our codec. The coding performance in PSNR of our codec and of the PVRG codec is compared in Figure 10. Our codec's quality is not as good as that of the PVRG H.261 codec because the "fast jump" process and a fast-search ME algorithm are used in our implementation, whereas full-search ME is used in the PVRG codec. However, coding efficiency matters more than decoded image quality in videophone applications. The reconstructed image sequences of "Ms. America," "Foreman," and "Author" are shown in Figure 11, Figure 12, and Figure 13, respectively.

V. CONCLUSION
In this paper, a new ME search algorithm is proposed, and the encoder and decoder of an H.261 system are implemented on a PC. Our codec achieves encoding rates of over 30 fps for the CIF format and 105 fps for QCIF on a Pentium II 400 MHz PC, and its decoding rate is very high. Overall, our H.261 codec can meet the needs of real-time applications. In our implementation, the DCT becomes the bottleneck of coding efficiency. Hence, improving the efficiency of the DCT computation, designing a more efficient algorithm to further improve the performance and efficiency of ME, and devising an intelligent criterion for deciding the coding mode will be pursued in our future research.

Figure 10. Coding performance in PSNR for (a) "Ms. America," (b) "Foreman," and (c) "Author."

Figure 11. Reconstructed image sequence of "Ms. America."

Figure 12. Reconstructed image sequence of "Foreman."

Figure 13. Reconstructed image sequence of "Author."

REFERENCES

K.R. Rao and J.J. Hwang, Techniques and Standards for Image, Video and Audio Coding, Prentice-Hall PTR, 1996.

ITU-T Recommendation H.261, Video codec for audiovisual services at p × 64 kbit/s, Mar. 1993.

A. Bellini, F. Del Lungo, F. Gori, R. Grossi and M. Guarducci, A fast H.261 software codec for high quality videoconferencing on PCs, IEEE Int Conf Multimedia Comp Syst, vol. 2, pp. 1007–1008, 1999.

D.-Y. Hsiau and J.-L. Wu, Real-time PC-based software implementation of H.261 video codec, IEEE Trans Consumer Elec, vol. 43, No. 4, Nov. 1997.

T. Moriyoshi, H. Shinohara, T. Miyazaki and I. Kuroda, Real-time software video codec with a fast adaptive motion vector search, IEEE Workshop Sig Proc Sys (SiPS), pp. 44–53, 1999.

W. Tan, E. Chan and A. Zakhor, Real-time software implementation of scalable video codec, IEEE Proc Inter Conf Imag Proc, vol. 1, pp. 17–20, 1996.

K. Wang, J. Normile, H.-J. Wu, D. Ponceleon, K. Chu and K.-K. Sung, A real-time software-only H.261 codec, ICASSP, vol. 4, pp. 2719–2722, 1995.

A. Young, Software CODEC algorithms for desktop videoconferencing, IEEE Proc of the 37th Midwest Symp on Circ and Sys, vol. 2, pp. 896–899, 1994.

H.-C. Huang and J.-L. Wu, Novel real-time software-based video coding algorithms, IEEE Trans Consumer Elec, vol. 39, No. 3, Aug. 1993.

H.-C. Huang and J.-L. Wu, New generation of real-time software-based video codec: popular video coder II (PVC-II), IEEE Trans Consumer Elec, vol. 42, No. 4, Nov. 1996.

O. Lempel, A. Peleg and U. Weiser, Intel's MMX™ technology: a new instruction set extension, IEEE Proc Compcon '97, pp. 255–259, 1997.

H. Barad, B. Eitan, K. Gottlieb, M. Gutman, N. Hoffman, O. Lempel, A. Peleg and U. Weiser, Intel's multimedia architecture extension, Nineteenth Convention of Elec and Elec Eng in Israel, pp. 148–151, 1996.

A. Peleg and U. Weiser, MMX technology extension to the Intel architecture, IEEE Micro, vol. 16, No. 4, pp. 42–50, Aug. 1996.

R. Hashemian, Memory efficient and high-speed search Huffman coding, IEEE Trans Comm, vol. 43, No. 10, pp. 2576–2581, Oct. 1995.

W.H. Chen, C.H. Smith, and S.C. Fralick, A fast computation algorithm for the discrete cosine transform, IEEE Trans Comm, vol. COM-25, pp. 1004–1009, 1977.

J.R. Jain and A.K. Jain, Displacement measurement and its application in interframe image coding, IEEE Trans Comm, vol. COM-29, No. 12, pp. 1799–1808, Dec. 1981.

T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, Motion-compensated interframe coding for video conferencing, Proc NTC 81, pp. C9.6.1–C9.6.5, New Orleans, LA, 1981.

R. Li, B. Zeng and M.L. Liou, A new three-step search algorithm for block motion estimation, IEEE Trans Circuits Sys Video Tech, vol. 4, pp. 438–442, Aug. 1994.

J. Lu and M.L. Liou, A simple and efficient search algorithm for block-matching motion estimation, IEEE Trans Circuits Sys Video Tech, vol. 7, pp. 429–433, Apr. 1997.

J.N. Kim and T.S. Choi, A fast three-step search algorithm with minimum checking points using unimodal error surface assumption, IEEE Trans Consumer Elec, vol. 44, No. 3, pp. 638–648, Aug. 1998.

J.N. Kim and T.S. Choi, A fast motion estimation for software-based real-time video coding, IEEE Trans Consumer Elec, vol. 45, No. 2, May 1999.

A.M. Tekalp, Digital Video Processing, Prentice-Hall, 1995.
