04042353

8/2/2019 04042353


Temporal Motion Prediction for Fast Motion Estimation in Multiple Reference Frames

G Nageswara Rao, PSSBK Gupta

Emuzed A Flextronics Company, Bangalore, India

Email: {nageswararao.gunupudi, shyam.pallapothu}@flextronicssoftware.com

Abstract--- In the new video coding standard, H.264/MPEG-4 Part 10, motion compensation may use multiple reference frames, which improves rate-distortion performance at the cost of a drastic increase in complexity. The additional computation is proportional to the number of reference frames searched, whereas the reduction in prediction residue depends heavily on the nature of the sequence. In this paper, we present a fast technique that predicts the motion vector in each reference frame in order to speed up the matching process for multiple reference frames. The proposed technique chooses a motion centre and searches around it with a radius of 1 or 2 pixels in all reference frames, with the exception of the one that immediately precedes the current frame. For that frame, any motion estimation technique can be used. The results show that the proposed technique reduces the computational requirements down to that required for single reference frame motion estimation, with only a negligible loss of objective quality.

Index terms--- H.264/AVC, Motion Estimation,

Multi Frame Motion Compensation (MFMC)

I. INTRODUCTION

H.264/AVC is the newest video coding standard [1] of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. H.264/AVC introduces many new coding and error resilience tools and, as a result, achieves a significant improvement in rate-distortion efficiency relative to existing standards: a bit-rate reduction of up to 50% over the MPEG-4 Advanced Simple Profile. The motion compensation process defined in H.264, with quarter-pixel accuracy, variable block sizes and multiple reference frames, greatly reduces prediction errors. Multi-frame motion compensation (MFMC) is one such tool; it provides significant coding gain as well as better error robustness. The multi-frame buffer stores, at both encoder and decoder, frames that are useful for motion-compensated prediction. Multi-frame motion estimation was proposed by Budagavi and Gibson [2] as a technique to improve the error resilience of compressed video. The MFMC coder exploits the redundancy that exists across multiple frames in typical video-conferencing sequences to achieve additional compression over the single frame motion compensation (SFMC) approach [3]. A further advantage of MFMC is that it is more robust to error propagation than traditional SFMC, because it uses information from multiple frames.

As a consequence of the many prediction tools in motion compensation, the motion estimation process in H.264 becomes considerably more complex. Multi-frame motion compensation improves rate-distortion performance substantially, but it places a much higher load on the system. If temporal correlations between the reference frames are ignored, conventional single-frame search algorithms can still be applied to multi-frame motion estimation, using a rather inefficient frame-by-frame approach. This, however, multiplies the complexity of motion estimation (ME) by the number of reference frames relative to single frame motion estimation, because ME must be carried out for every reference frame. This is not feasible for most real-time implementations. Moreover, the decrease in prediction residue depends on the nature of the sequence: sometimes the prediction gain is very significant, but sometimes a lot of computation is spent without any considerable bit-rate reduction. As a consequence, several fast techniques have recently been developed for motion estimation in multiple reference frames. We present an effective algorithm that accelerates multiple reference frame ME without significant loss of video quality.

In this paper we propose an algorithm that uses the motion vectors already computed with respect to the first reference frame, i.e. the reference frame closest to the current frame, to predict the best motion vectors for the second and subsequent reference frames. The proposed scheme minimizes the computational overhead of motion estimation in reference frames other than the first. The proposed technique is independent of the ME algorithm used for the first reference frame. The main goal of the algorithm is to perform a fast search in all reference frames other than the first while maintaining a PSNR similar to that obtained when the full search block-matching algorithm (FSBMA) is used in each reference frame. The rest of the paper presents statistics of multi-frame motion estimation (Section II), the proposed motion vector prediction system and the algorithm for motion estimation (Sections III and IV), simulation results (Section V) and conclusions (Section VI).

2006 IEEE International Symposium on Signal Processing and Information Technology

0-7803-9754-1/06/$20.00 © 2006 IEEE


II. MULTI FRAME MOTION ESTIMATION

Multi-frame motion estimation extends the temporal displacement vectors used in block-matching video coding by permitting motion-compensated prediction from more frames than just the most recently decoded one. The use of multiple frames for motion estimation in many cases provides a significant improvement in coding gain [5] and also better error robustness. It is widely agreed that motion estimation is the most complex module of standards-based video encoders, and the complexity increases N times when the motion vector search covers N reference frames. Nevertheless, the decrease in prediction residue depends on the nature of the video content. The newly proposed standard H.264 supports hybrid block motion compensation with multiple reference frames, which significantly increases the complexity of video encoders for real-time implementations. Given an inter mode, the reference software JM9.5 adopts a full search and carries out the matching process in all reference frames one by one. The best mode is chosen by minimizing a Lagrangian cost function, which considers both the 2-D 4x4 Hadamard transformed SAD (SATD) and the number of bits required to code the side information. Table.1 gives the PSNR results and the ME complexity in seconds for different sequences with one, two and three reference frames at a bit rate of 512 kbps. Table.2 shows the percentage of references taken from the first and from the other reference frames. The number of macroblocks referenced decreases for reference frames farther from the current frame. It can also be observed that the number of references from all other reference frames is less than that from the first reference frame.
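The cost described above can be sketched as follows. This is a minimal illustration, not the JM9.5 implementation: the function names, the multiplier `lam` and the bit count `side_info_bits` are hypothetical, and any normalization that a particular encoder applies to the SATD is omitted.

```python
import numpy as np

# 4x4 Hadamard matrix; every entry is +1 or -1.
H4 = np.array([[1,  1,  1,  1],
               [1,  1, -1, -1],
               [1, -1, -1,  1],
               [1, -1,  1, -1]], dtype=np.int64)


def satd4x4(cur, ref):
    """Sum of absolute 2-D Hadamard-transformed differences of two 4x4 blocks."""
    diff = cur.astype(np.int64) - ref.astype(np.int64)
    return int(np.abs(H4 @ diff @ H4.T).sum())


def rd_cost(cur, ref, side_info_bits, lam):
    """Lagrangian cost: distortion (SATD) plus lam times the side-information bits.

    `lam` and `side_info_bits` are assumed inputs supplied by the encoder.
    """
    return satd4x4(cur, ref) + lam * side_info_bits
```

A candidate with a slightly larger SATD can still win if it needs fewer bits for its motion vector and reference index, which is why the reference frame index overhead matters in the statistics discussed here.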

Table.2 shows that the first reference frame is the most probable reference for motion compensation, except for the Salesman sequence. The references from the other reference frames are nevertheless not insignificant, which shows the scope for designing fast and accurate ME techniques for secondary reference frames. A PSNR loss is observed for the Salesman sequence (Table.1) when going from two to three reference frames. This is explained by its reference frame statistics in Table.2: the increase in overhead bits for reference frame index signaling in the three reference frame case is not compensated by the gain in residual error.

Frames and Motion Vectors classification in MFMC:

The statistics and observations presented above clearly give special importance to the reference frame that immediately precedes the current frame and to the motion vectors corresponding to that reference frame. Hence a classification of the reference frames and motion vectors used in MFMC is presented here. We term the reference frame that immediately precedes the current frame the primary reference frame (PRF) and all others secondary reference frames (SRF). Similarly, the motion vectors obtained with respect to the primary reference frame are termed primary motion vectors (PMV) and all others secondary motion vectors (SMV). This classification adds clarity to the motion estimation process in terms of complexity-accuracy trade-offs. The proposed technique addresses motion estimation in the secondary reference frames; the motion estimation algorithm for the primary reference frame may be any fast technique.

III. TEMPORAL PREDICTION OF MOTION VECTOR

The temporal prediction algorithm proposed here is based on the fact that the motion of macroblocks can be tracked temporally across frames through the primary motion vectors of the frames between the current frame and the reference frame. The best match for a block of the current frame in a given reference frame can be tracked by adding the motion vectors between every two neighboring frames that lie between the current frame and that reference frame, i.e. the primary motion vectors of the intervening frames.

Table.1. Quality (PSNR) and complexity for multiple reference frame motion estimation

Table.2. Reference frame statistics with 10 reference frames


It is empirically observed that motion across frames is linear and smooth. Hence, tracking the motion between every two frames using primary motion vectors, and summing the motion vectors of every pair of adjacent frames between the current frame and the secondary reference frame, can identify the best match of a block of the current frame.

Suppose the current block is at $(x, y)$ in the current frame (time T) and is best matched at location $(x_1, y_1)$ in the primary reference frame (time T-1), so that the displacement from $(x, y)$ to $(x_1, y_1)$ is the primary motion vector (PMV) of the block. The block at $(x, y)$ can then be expected to match closely the block that is the best match for $(x_1, y_1)$. However, $(x_1, y_1)$ may not be aligned with a block boundary of the reference frame at T-1. Therefore $(x_1, y_1)$ is approximated to the nearest block boundary $(xb_1, yb_1)$, and the motion vector of the block at $(xb_1, yb_1)$, say $(mvx_2, mvy_2)$, is used as the predicted motion vector for the reference frame at T-2. If $(xb_1, yb_1)$ falls outside the frame boundary, it is approximated to the nearest block position. The best match for the current block is then most likely to lie around $(x_2, y_2)$ in the first secondary reference frame, so the motion search for the current block in its first secondary reference frame can be carried out around $(x_2, y_2)$. The search center for motion estimation in the first secondary reference frame can thus be formulated as in Eq.1. Fig.2 illustrates the temporal motion tracking system for secondary reference frames.

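$$
\begin{aligned}
(mvx_2,\, mvy_2) &= \text{PMV of block } (xb_1,\, yb_1) \\
(x_2,\, y_2) &= (xb_1 + mvx_2,\; yb_1 + mvy_2) \\
xb_1 &= \big( (x_1 + \mathrm{BlockSize}/2) \gg B \big) \times \mathrm{BlockSize} \\
yb_1 &= \big( (y_1 + \mathrm{BlockSize}/2) \gg B \big) \times \mathrm{BlockSize}
\end{aligned}
\qquad (1)
$$

where $B = \log_2(\mathrm{BlockSize})$.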
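Eq.1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: `pmv_field` is a hypothetical row-major grid holding one primary motion vector `(mvx, mvy)` per block of the frame at T-1, the block size is a power of two, and the frame dimensions are multiples of the block size. The clamping step is one possible reading of "approximated to the nearest block position".

```python
def nearest_block_origin(coord, block_size):
    """Round a pixel coordinate to the nearest multiple of block_size (Eq.1)."""
    b = block_size.bit_length() - 1  # B = log2(BlockSize) for a power of two
    return ((coord + block_size // 2) >> b) * block_size


def secondary_search_centre(x1, y1, pmv_field, block_size, width, height):
    """Return (x2, y2), the search centre in the first secondary reference frame."""
    xb1 = nearest_block_origin(x1, block_size)
    yb1 = nearest_block_origin(y1, block_size)
    # Positions outside the frame are moved to the nearest block inside it.
    xb1 = min(max(xb1, 0), width - block_size)
    yb1 = min(max(yb1, 0), height - block_size)
    # PMV of the block at (xb1, yb1), i.e. (mvx2, mvy2) in Eq.1.
    mvx2, mvy2 = pmv_field[yb1 // block_size][xb1 // block_size]
    return xb1 + mvx2, yb1 + mvy2
```

For example, with 16x16 blocks a best match at `(20, 5)` is snapped to the block at `(16, 0)`, and that block's primary motion vector gives the search centre.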

The deviation of the predicted motion vector from the final best motion vector in the secondary reference frame is shown for the Foreman and Mobile sequences in Fig.1. About 80% to 90% of the predicted motion vectors fall within a displacement of 2 pixels from the best motion vector, and 60% fall within 1 pixel. Thus a motion search around the predicted motion vector calculated as described in Eq.1, with a search range of ±2 pixels, is sufficient for finding the best match of a given macroblock in a given reference frame. The PSNR results for fast motion estimation with search ranges of ±1 and ±2 (referred to as 1x1 and 2x2 searches, respectively) are given in Table.3. To further speed up the computation of motion vectors for secondary reference frames, we have chosen an iterative pattern with a ±1 search area: the search is first carried out in the ±1 area and terminates if the best motion vector is found at the center of the search area; otherwise the search continues in a ±1 range around the best motion vector position of the current search area. Table.3 also shows the results for this iterative search strategy with a search window of ±1 pixel and two iterations. The early exit of the iterative technique enables much faster convergence of motion estimation in secondary reference frames. Fig.3 gives various alternative search patterns that can be applied around the search center.
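The iterative ±1 search with early exit might look like the following sketch. It assumes integer-pel positions and uses plain SAD as the matching cost for brevity; the paper's encoder uses a rate-distortion cost, and the function names here are hypothetical.

```python
import numpy as np


def sad(cur_block, ref, x, y):
    """SAD between cur_block and the block of ref whose top-left corner is (x, y).

    Returns None when the candidate block does not lie fully inside ref.
    """
    h, w = cur_block.shape
    if x < 0 or y < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
        return None
    cand = ref[y:y + h, x:x + w].astype(np.int64)
    return int(np.abs(cur_block.astype(np.int64) - cand).sum())


def iterative_search(cur_block, ref, centre, max_iters=2):
    """Refine `centre` with up to max_iters passes over a +/-1 pixel window."""
    best = centre
    best_cost = sad(cur_block, ref, *centre)
    if best_cost is None:
        raise ValueError("search centre must lie inside the reference frame")
    for _ in range(max_iters):
        start = best
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                cand = (start[0] + dx, start[1] + dy)
                cost = sad(cur_block, ref, *cand)
                if cost is not None and cost < best_cost:
                    best, best_cost = cand, cost
        if best == start:  # early exit: the centre is already the best match
            break
    return best, best_cost
```

With two iterations the search can reach a best match up to 2 pixels away from the predicted centre, while blocks whose prediction is already exact stop after a single pass.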

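Fig.1

Fig.2. Motion vector tracking over the frames to predict the motion vector in secondary reference frames. The figure shows the current frame T with the block at $(x, y)$, the primary reference frame T-1 (PRF) with $(x_1, y_1)$, and the secondary reference frame T-2 (SRF) with $(x_2, y_2)$.

Fig.3. Search pattern for secondary reference frame motion estimation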


IV. PROPOSED ALGORITHM

The observations made in the previous sections lead to a fast motion estimation algorithm for secondary reference frames, which is presented in this section. The seed point for the motion search is obtained from the primary motion vectors (PMVs) of the frames between the current frame and the reference frame under ME search. However, to cover different kinds of motion and make the algorithm more robust, we also include the zero motion vector (ZMV) and the predicted motion vector defined in the H.264 standard (SPMV) in the seed point selection. Among the ZMV, the temporally predicted motion vector and the SPMV, the motion vector that results in the least rate-distortion cost is selected as the seed point, and an iterative search within a ±1 search area is carried out about the seed point. The algorithm for motion estimation with multiple reference frames is summarized below.

Step1: Perform motion estimation in the primary reference frame (any fast technique may be applied) and store the primary motion vectors.

Step2: Compute the temporal motion vector predictor using the primary motion vectors of the frames between the current frame and the reference frame under search. For the first secondary reference frame, the seed is the primary motion vector of the block in the primary reference frame that maximally covers the best match of the current block.

Step3: Choose the best seed among the zero motion vector, the standard predicted motion vector and the temporally predicted motion vector derived in Step2 by minimizing the R-D cost.

Step4: Search for the best match of the current block in the secondary reference frame around the best seed chosen in Step3 with a radius of 1 pixel.

Step5: If the best MV found in Step4 is the search center, i.e. the seed point, accept it as the best MV for the current block and go to Step6. Otherwise, take the best MV of the ±1 search as the new search center and continue the best MV search about it with a ±1 search range.

Step6: Repeat Step2 to Step5 for all the secondary reference frames.
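Step2 and Step3 can be sketched as below. This is an illustrative sketch, not the authors' code: `pmv_fields` is a hypothetical list in which entry 0 holds the primary motion vectors of the current frame's blocks, entry 1 those of the frame at T-1, and so on, each as a row-major grid of `(mvx, mvy)` pairs. `cost_fn` stands in for the encoder's R-D cost of a candidate motion vector and is likewise an assumption.

```python
def temporal_predictor(x, y, pmv_fields, block_size):
    """Chain primary motion vectors frame by frame (Step2).

    Returns the predicted motion vector of the block at (x, y) with respect
    to the reference frame reached after following every field in pmv_fields.
    """
    px, py = x, y
    for field in pmv_fields:
        rows, cols = len(field), len(field[0])
        # Nearest block index; clamping maps out-of-frame positions
        # to the nearest block inside the frame.
        bx = min(max((px + block_size // 2) // block_size, 0), cols - 1)
        by = min(max((py + block_size // 2) // block_size, 0), rows - 1)
        mvx, mvy = field[by][bx]
        px, py = bx * block_size + mvx, by * block_size + mvy
    return px - x, py - y


def choose_seed(candidates, cost_fn):
    """Pick the candidate motion vector with the lowest cost (Step3)."""
    return min(candidates, key=cost_fn)
```

A typical call would pass `[(0, 0), spmv, temporal_predictor(...)]` as `candidates`, where `spmv` is the standard predicted motion vector. The chosen seed then becomes the centre of the ±1 search in Step4. For the first secondary reference frame the chain has two entries, which reproduces Eq.1.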

V. SIMULATION RESULTS

The JM9.5 reference software is used to generate the simulation results for the proposed algorithm. All results are generated in the Baseline profile at CIF (352x288) resolution, 512 kbps and a frame rate of 15 fps. The same techniques were also verified at different bit rates. The Foreman, Hall, Football, Claire, Flower, and Mother and Daughter sequences are used. Only I and P frames are used, with a GOV length of 60, a search range of 32 and inter prediction modes up to 8x8 blocks. Table.3 compares the PSNR and the motion estimation time in seconds of the full search algorithm and the proposed techniques with two reference frames used for motion compensation. The full search block matching algorithm is used for motion estimation in the primary reference frame.
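As a rough illustration of the saving, the number of integer-pel candidate positions per block and per secondary reference frame can be compared. The count assumes a square window of ±32 for the full search and ignores sub-pixel refinement, the evaluation of the seed candidates, and positions shared between the two ±1 passes.

$$
\begin{aligned}
N_{\text{full}} &= (2 \cdot 32 + 1)^2 = 4225 \\
N_{\pm 1,\ \text{one pass}} &= (2 \cdot 1 + 1)^2 = 9 \\
N_{\pm 1,\ \text{two passes}} &\le 2 \cdot 9 = 18
\end{aligned}
$$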

VI. CONCLUSIONS

A fast technique for predicting motion in multiple reference frames is presented. The proposed algorithm scales the complexity of motion estimation in the multiple reference frame scenario down to close to that required for single reference frame motion estimation, with negligible loss of PSNR. The proposed scheme also minimizes memory traffic in hardware implementations, because the search area is minimized.

REFERENCES

[1] Joint Video Team of ITU-T and ISO/IEC JTC 1, ITU-T Rec. H.264 / ISO/IEC 14496-10 AVC, March 2003.

[2] M. Budagavi and J. Gibson, "Multi-frame Block Motion Compensated Video Coding for Wireless Channels," in Thirtieth Asilomar Conf. on Signals, Systems, and Computers, vol. 2, pp. 953-957, Nov. 1996.

[3] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, July 2003.

[4] Yi-Hon Hsiao, Tien-Hsu Lee and Pao-Chi Chang, "Short/long-term motion vector prediction in multi-frame video coding system," ICIP 2004, pp. 1449-1452.

[5] T. Wiegand, X. Zhang and B. Girod, "Long-term memory motion compensated prediction," IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 1, pp. 70-84, Feb. 1999.

Table.3. Quality (PSNR) and Complexity of different fast MFMC schemes proposed with two reference frames
