
ELSEVIER Signal Processing: Image Communication 6 (1994) 241-252

SIGNAL PROCESSING: IMAGE COMMUNICATION

Motion estimation using combined shape and edge matching

Tero Koivunen*, Jouni Salonen

Nokia Research Center, Image Processing Group, Kanslerinkatu 8, 33720 Tampere, Finland

Received 31 July 1993; revised 10 December 1993

Abstract

This paper describes a new motion estimation algorithm that can be used in motion compensated standards conversions. As an example, we propose a scanning rate conversion method for motion compensated field rate upconversion (FRU) from an interlaced 50 Hz to progressive 100 Hz format. Motion estimation is done on a block matching basis using a bit level classification (BLC) approach. Motion vector selection is controlled by both the quantized rough shape and the edge information of the image. Non-linear vector postprocessing is used to obtain a consistent and uniform vector field. Block segmentation is used to improve the accuracy of the vector field.

Key words: Motion estimation; Block matching; Edge detection; Segmentation; Interpolation

1. Introduction

The advances in VLSI technology and the breakthrough of digital techniques in consumer electronics have made it possible to integrate and further improve television picture quality by means of digital signal processing. The first pioneering television receiver concepts appeared in the early 1980s. Since that time, scan rate conversions and other DSP techniques have been developed, the most sophisticated of which operate in the temporal domain, taking into account the effect of motion.

Scan rate conversions are required in standards conversions, where both the field rate and the number of lines are different. In compatible HDTV and computer applications such as multimedia, further conversions such as deinterlacing, pixel ratio conversion, and picture aspect ratio conversion are needed.

* Corresponding author.

0923-5965/94/$7.00 © 1994 Elsevier Science B.V. All rights reserved SSDI 0923-5965(94)00010-6

To overcome the undesired effects of the low refresh rate, field rate upconversion (FRU) from 50 Hz to 100 Hz can be used. The upconversion methods currently employed in advanced state-of-the-art television sets are simple field or frame repetition methods, often referred to as AABB and ABAB methods, respectively. The main disadvantage of these methods is poor motion portrayal [8, 11]. Non-linear filtering techniques, such as temporal median based algorithms, have a property of inherently adapting to motion [2, 3, 6, 9]. In the so-called motion adaptive systems, interpolation is controlled by a motion decision signal, implying separate processing for stationary and moving regions. By switching the signal processing during occurrences of image motion, motion portrayal can be improved [15]. With motion estimation, the


static and moving branches are more integrated, as the range and direction of motion is computed. Ideally, there are no transition or switching artefacts from stationary to moving mode.

The effect of motion compensation in an FRU application is shown in Fig. 1, with a simple rotating wheel example. In a standard AABB field repetition method, the motion phase of the previous field is simply repeated. With true motion compensation, a new motion phase is created, facilitating better motion rendition than with field repetition.

Motion is not directly observable from the intensity or luminance of the image produced by the camera, but must be estimated from the data sequence. Motion is estimated between two or more original frames, the result of which is often described as a motion vector, having a length and direction corresponding to the estimated displacement. Motion estimation methods can be divided into two main classes: motion parameter extraction methods and pixel velocity measuring methods.

In motion parameter extraction, movement is described in terms of parameters like translation, rotation and zooming, which are extracted from a sequence of moving two-dimensional objects. Due to their imaging limitations and complexity, motion parameter extracting techniques are not particularly well suited for motion estimation in television.

Pixel velocity measuring methods can be further divided into methods based on Fourier techniques, spatio-temporal differentials, and block matching [10, 18].

Fig. 1. The effect of motion compensation.

Fourier techniques involve correlation of two images by first performing a two-dimensional Fourier transform on each image, multiplying the corresponding frequency components, and performing a reverse transform on the result. The output is a correlation surface with a peak at the coordinates corresponding to the shift between the two images. If the amplitude of all the frequency components is normalized prior to the reverse transform, only the phase information is used. This technique is often referred to as phase correlation. It is not affected by brightness changes, and has good noise immunity. The method is well suited for measuring global motion, and with some additional processing, accuracy can be improved [13].

Techniques based on spatio-temporal differentials rely on purely translational models. It is assumed that the intensity variation across a television field is a linear function of displacement. More complex movement can be estimated by restricting the estimation region. It is also necessary to assume that the brightness of a moving object does not change. Several differentials are calculated in the temporal, horizontal and vertical directions. The spatial displacement is given by these differential ratios, in units of pixels per frame. The implementation is relatively simple. Such techniques work well for subpixel shifts, but usually fail for large movement [14, 19]. They can be improved by introducing recursion, but convergence can be slow or may not occur at all.
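A least-squares version of this differential approach can be sketched as follows. It is an illustrative minimal example under the stated assumptions (constant brightness, a single small global translation); the function name is our own.

```python
import numpy as np

def gradient_displacement(prev, curr):
    """Solve Ix*dx + Iy*dy = -It in the least-squares sense over the
    whole region, giving the displacement in pixels per frame.
    Valid only for small, smooth shifts, as noted in the text."""
    prev = prev.astype(float)
    curr = curr.astype(float)
    Iy, Ix = np.gradient(prev)       # spatial differentials
    It = curr - prev                 # temporal differential
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (dx, dy), *_ = np.linalg.lstsq(A, b, rcond=None)
    return dx, dy
```

Running this on a frame pair with a shift of several pixels illustrates the failure mode mentioned above: the linearization of the intensity profile no longer holds and the estimate collapses.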

A commonly used variant of the latter is block matching, where region-by-region estimation is used instead of pixel-by-pixel estimation. There are a number of match criteria to be used in block matching. The best known of them are the mean absolute difference (MAD) and mean squared difference (MSD) criteria. In these approaches, average error values are calculated within a block, and the block location having the smallest error value within the search area is used to determine the motion vector. The MAD criterion is mostly used since its implementation is feasible, even though its performance is not especially good. In contrast, the MSD error criterion performs well but is rarely used due to its hardware complexity.

A matching criterion whose performance is close to that of the MSD and which is relatively easy to implement is the pixel difference classification (PDC) criterion [5]. In PDC, the behaviour of individual samples is taken into account in the search: each pixel is classified as either a matching or a mismatching one. That is to say, the match count of a block plays the same role as the MSD or MAE of a block.

Fig. 2. Block diagram of the motion estimator.
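The PDC criterion described above can be sketched directly. This is a minimal illustration; the default threshold value is an arbitrary assumption, not taken from [5].

```python
import numpy as np

def pdc_match_count(ref_block, tgt_block, threshold=4):
    """PDC score for one candidate block position: count the samples
    whose absolute difference from the corresponding target sample is
    within the threshold. The candidate with the HIGHEST count wins,
    instead of the lowest MAD/MSD error."""
    diff = np.abs(ref_block.astype(int) - tgt_block.astype(int))
    return int(np.sum(diff <= threshold))
```

Counting matches instead of accumulating error values means each pixel contributes a single bit to the score, which is what makes the criterion hardware-friendly.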

In the following, a new block matching algorithm is described. Section 2 explains the basic block matching algorithm, Section 3 deals with postprocessing of the estimated vectors, and in Section 4, motion compensated interpolation from 50 Hz to 100 Hz field rate is taken as an example of a video display standards conversion.

The block diagram of the motion estimator in a motion compensated field rate upconversion application is shown in Fig. 2.

2. The block matching algorithm

The block matching algorithm consists of the following main functions: edge analysis, interlaced- to-progressive conversion, and block matching.

2.1. Edge detection and analysis

The edge analysis is carried out by measuring the correlation of samples with ten differently oriented operators, which altogether form a 6 × 3 detection window. Correlation values are considered as a main direction and three subdirections, which together provide a hierarchical and robust way to select the most probable edge direction in a neighbourhood. The algorithm used here is based on the one described in [16], extended with horizontal edge detection.

Fig. 3 illustrates the possible edge/filtering directions; the angles given are based on square pixels.

Fig. 3. Detected edge orientations (0°, 26°, 34°, 45°, 64°, 90°, 116°, 135°, 146°, 154°, 180°).

The edge analysis produces edge information which is


coded with 4 bits and stored in a local memory, to be used in interpolation and motion estimation. An example of edge detection and coding can be seen in Fig. 9, taken from the KEW GARDENS sequence.

2.2. Edge adaptive interlace-to-progressive conversion

The vertical resolution of the estimator is improved when progressive pictures are used for estimation. Additionally, the progressive picture format provides easier access to other scanning formats. Therefore, the first stage in the field rate upconversion procedure is interlace-to-progressive conversion.

Deinterlacing is based on edge analysis and the use of an edge controlled FIR-median-hybrid (FMH) filter bank. The FMH-filters used are the simplest ones, consisting of a three-point median filter with 1- or 2-tap FIR-filters as substructures. There is a dedicated FMH-filter for each direction. An introduction to FMH-filters can be found in [7]. The performance of the edge adaptive IPC algorithm is compared with four other methods in Section 5.
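A single output sample of a three-point FIR-median-hybrid structure can be sketched as below. The particular choice of substructures here (the two spatial neighbours along the interpolation direction and a temporal sample from the previous field) is our own illustrative assumption; the paper's dedicated directional filter bank is not reproduced.

```python
import numpy as np

def fmh_sample(up, down, temporal):
    """Median of the three substructure outputs for one missing pixel."""
    return float(np.median([up, down, temporal]))

def deinterlace_line(line_above, line_below, prev_field_line):
    """Interpolate one missing line of a field, sample by sample
    (vertical direction only, for brevity)."""
    return np.array([fmh_sample(u, d, t)
                     for u, d, t in zip(line_above, line_below,
                                        prev_field_line)])
```

In a static area the temporal sample tends to pass through the median; across a moving edge the median rejects it in favour of the spatial neighbours, which is the inherent motion adaptation of median-based structures mentioned in the Introduction.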

2.3. Block matching

The block matching algorithm is affected, in addition to the matching criterion, by the block size and search area. An upper limit for the block size is imposed by the simple translational model, since rotation and zooming cannot be easily modelled.

A large matching block is needed to obtain a good estimate for global motion, but is not accurate enough to estimate the motion of a small object, or in the case of covering and uncovering motion. A small matching block, on the other hand, may give multiple matches.

A large search area tends to increase the number of ambiguous vector candidates, especially if a small matching block is used. In particular, a region of repetitive structure may cause a non-uniform vector field. To alleviate this problem, the search area can be divided into multiple overlapping annular regions, each of which results in a vector candidate [17]. The choice of the search area could also be made based on a priori knowledge of the range of the motion to be estimated, but this would also call for special hardware, e.g. a random access memory for the controlled block search.

Fig. 4. Block matching: a 16 × 16 matching block in P(t) and a 50 × 50 search area in P(t + T).

A compromise between the resolution and the reliability of the vector field is made by fixing the matching block size to 16 × 16 pixels, and the search area to 50 × 50 pixels. In addition, the matching block is subsampled by a factor of 2 in a quincunx pattern, depicted in Fig. 4.

2.3.1. Matching criterion

The pixel difference classification (PDC) matching criterion is based on counting matched samples by the absolute difference. Each sample in a reference block is classified as a matching sample if and only if it differs only slightly from the corresponding sample in a target block. The greater the number of matching samples in a block, the better the match for the block.

In order to avoid unnecessary vector search operations in stationary regions and thereby improve vector field quality, a simple motion detector is included in the frame memory block. Motion is detected between two progressive frames on a block basis. The pixel difference count is thresholded for each block to obtain a motion decision.


The motion detection function could be replaced by introducing a zero motion vector in the MSE vector selection.

2.3.2. Quantization

Prior to the matching operations, the incoming luminance signal is quantized from 8 bits to 4 bits, to obtain rough shape information. It has been found that the discarded four LSBs do not have a significant effect on the reliability of the estimation. An example of a quantized image used in shape matching is shown in Fig. 10.

The signal is further quantized from 4 bits to 2 bits, to be used in vector segmentation. As a result, the hardware complexity and the memory requirements of the system are reduced.

In this motion estimation algorithm, bit level classification (BLC) is employed in block matching. The BLC is similar to the PDC, except for the matching criterion: the absolute difference and threshold are replaced by quantization (4 bits) and an equality comparison. According to simulations, this simplification has only a minor effect on the matching quality, while essentially decreasing the computational complexity.

2.3.3. Shape matching

A sample match M at location (x, y, t) is defined between two corresponding samples of different time instants, and is used to give the shape match (see Fig. 5).

$$M(x, y, t) = \begin{cases} 1 & \text{if } I[x - \alpha D_x,\ y - \alpha D_y,\ t - \alpha T] = I[x + (1-\alpha)D_x,\ y + (1-\alpha)D_y,\ t + (1-\alpha)T], \\ 0 & \text{otherwise}, \end{cases} \tag{1}$$

where x is the horizontal sample location, y is the vertical sample location, t is the temporal sample location, I is the intensity (luminance), D is the estimated spatial displacement, T is the temporal displacement between the fields and α is the scaling factor according to the time displacement.

Simplifying this for pixel p(x, y) in the case of field rate doubling (α = 0.5), we get

$$M(p, t) = \begin{cases} 1 & \text{if } I[p - D/2,\ t - T/2] = I[p + D/2,\ t + T/2], \\ 0 & \text{otherwise}. \end{cases} \tag{2}$$

Further, a block match for an n × m block B(i, j) is obtained by summing the sample matches:

$$B(i, j) = \sum_{k=1}^{n} \sum_{l=1}^{m} M(x_k, y_l, t), \tag{3a}$$

where B is subsampled with a quincunx pattern, depicted in Fig. 4. Thus, for Eq. (3a),

$$(k + l) \bmod 2 = 0. \tag{3b}$$
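Eqs. (2)-(3b) condense into a short sketch, assuming 8-bit luminance input and the 4-bit quantization of Section 2.3.2; the function names are our own.

```python
import numpy as np

def quantize4(img):
    """Keep the four MSBs of the 8-bit luminance (rough shape)."""
    return np.asarray(img, dtype=np.uint8) >> 4

def shape_match(prev_block, next_block):
    """Block match B(i, j): count the quincunx samples
    ((k + l) mod 2 == 0) whose 4-bit shape codes are exactly equal,
    per Eqs. (2)-(3b). Higher is better."""
    eq = quantize4(prev_block) == quantize4(next_block)
    k, l = np.indices(eq.shape)
    return int(np.sum(eq & ((k + l) % 2 == 0)))
```

For each candidate displacement D, the two blocks would be taken at p − D/2 in P(t − T/2) and p + D/2 in P(t + T/2); the candidate with the highest count gives the shape-match vector.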

2.3.4. Edge matching

In addition to the shape information, edge orientation information is used correspondingly in matching, and is obtained from the edge adaptive deinterlacing module. The pixelwise edge information is coded with 4 bits, and the BLC is employed in exactly the same way as in the shape matching. This means that there are two parallel and independent matching operations: one for the 4-bit shape and another for the 4-bit edge information, both giving the best match in the highest match count sense. Thus, two vector candidates are obtained for each block, and when the 3 × 3 neighbourhood is included for MSE selection, there are 18 possible vectors for each 16 × 16 pixel block. The MSE is computed using the full 8-bit resolution.

Table 1 lists the average share of the two matching criteria for three test sequences. 'Both' indicates that both the shape match and the edge match result in the same vector. It can be seen that the

Table 1
Origin of the MSE selected vector

Sequence        Shape match   Edge match   Both
BBC DISC        26%           14%          60%
TRAIN           27%           10%          63%
KEW GARDENS     35%            9%          56%


percentage of edge match only is relatively small, about 10%, but it proved to be subjectively important. By introducing the edge matching criterion, the interpolated picture quality improved significantly.

An example of the coded edge orientation signal is shown in Fig. 9, where samples labelled to an edge are shown in different shades of gray, e.g. 0° = 10%, 154° = 20%, and so on to 26° = 100%, which corresponds to a gray level of 255 for an 8-bit signal. Areas with no edge detected are shown in black. Edge orientations are shown in Fig. 3.

3. Motion vector processing

The motion vector field obtained from the MSE selection can be improved by smoothing and segmenting, in order to reduce block artefacts in the interpolated picture.

3.1. Vector field smoothing

The consistency of the final vector field is improved by vector median filtering. The vector median operation has been found to be well suited for motion vector processing [12].

The vector median of the set $v_1, v_2, v_3, \ldots, v_N$ is defined as $v_{VM}$, such that

$$v_{VM} \in \{v_i \mid i = 1, 2, 3, \ldots, N\}, \tag{4a}$$

and for all $j = 1, 2, 3, \ldots, N$,

$$\sum_{i=1}^{N} \| v_{VM} - v_i \| \leq \sum_{i=1}^{N} \| v_j - v_i \|. \tag{4b}$$

With Hamming and Euclidean distances in Eq. (4b), both VM1 and VM2 can be defined, using the $L_1$ and $L_2$ norms, respectively. The former was selected for its better performance and feasible implementation. A spatial 3 × 3 (N = 9) square window is used for the vector median filtering.
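The VM1 selection of Eq. (4) with the L1 norm can be sketched directly; a brute-force O(N²) search is adequate for N = 9.

```python
import numpy as np

def vector_median(vectors):
    """Return the member of the set whose summed L1 distance to all
    other members is smallest (Eq. (4) with the L1 norm)."""
    v = np.asarray(vectors, dtype=float)
    costs = np.abs(v[:, None, :] - v[None, :, :]).sum(axis=(1, 2))
    return tuple(v[int(np.argmin(costs))])
```

Unlike a componentwise median, the result is always one of the input vectors, which is why the operation does not invent new, spurious displacements.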

3.2. Refined vector allocation by pixel segmentation

The vector field smoothing operation is not sufficient by itself, as it occasionally introduces new artefacts to the interpolated picture. As a countermeasure, we have used block segmentation. In areas of connecting motion, the vector field is further spatially segmented to obtain better resolution. Without any a priori knowledge of the picture content, a fairly conservative decision threshold is adopted. Segmentation is only performed when either vector component of the estimated vector deviates somewhat from the components of the neighbouring vectors. A spatial 3 × 3 square window is used:

$$\mathrm{Segm}(k, l) = \begin{cases} \text{true} & \text{if } |D_x - D_{x_k}| > t_{segm} \text{ or } |D_y - D_{y_k}| > t_{segm}, \\ \text{false} & \text{otherwise}, \end{cases} \tag{5}$$

where k and l are the block indices, $(D_{x_k}, D_{y_k})$ is the displacement vector of the center block in the processing window, and $(D_x, D_y)$ are the displacement vectors of the surrounding neighbouring blocks. We have used a threshold value $t_{segm} = 3$.
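The decision of Eq. (5) amounts to a simple threshold test over the 3 × 3 neighbourhood. The sketch below is a direct transcription; the function and argument names are our own.

```python
def needs_segmentation(center, neighbours, t_segm=3):
    """True when either component of a neighbouring block's vector
    deviates from the center block's vector by more than t_segm
    (Eq. (5))."""
    cx, cy = center
    return any(abs(nx - cx) > t_segm or abs(ny - cy) > t_segm
               for nx, ny in neighbours)
```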

Segmentation is performed separately for each quarter of the block with single pixel accuracy by simple 2-bit MSB quantization. Thus, at most four levels can be found in each block. The dominant level is then interpreted as one segment, and the remaining three as another. The resulting two segments are tested with candidate vectors using the MSE criterion. Three candidate motion vectors are used: the estimated vector, a zero (0, 0) vector, and the vector from the 3 × 3 block neighbourhood which has the largest deviation from the estimated vector.

4. Motion compensated interpolation

There are two basic principles for motion compensated interpolation: vector based pixel insertion and temporal averaging.

In vector based pixel insertion, pixels in the previous or the next frame are shifted in the direction of the estimated motion vector. Insertion is typically used with bidirectional estimation.

In temporal averaging, pixels of the two temporally adjacent fields are averaged to obtain a new motion phase. The location of the averaged pixels is determined by the estimated displacement


Fig. 5. Temporal interpolation.

D = (D_x, D_y). In Fig. 5, the position of the sample being processed in the frame E(t) to be estimated is denoted by p. The location of the corresponding displaced sample in the existing progressive frame P(t − T/2) is given by p − αD, and the intensity by I(x − αD_x, y − αD_y, t − αT), where T denotes the time interval between the existing frames P, and α is a weighting factor depending on the temporal position of the estimated frame E(t) with respect to frames P(t − T/2) and P(t + T/2). Given the intensity of the sample in frame P(t + T/2) in a similar way, the intensity of the estimated sample can be interpolated:

$$I(x, y, t) = (1 - \alpha)\, I(x - \alpha D_x,\ y - \alpha D_y,\ t - \alpha T) + \alpha\, I(x + (1-\alpha)D_x,\ y + (1-\alpha)D_y,\ t + (1-\alpha)T). \tag{6}$$

In the special case of field rate upconversion from 50 Hz to 100 Hz, α = 0.5, as shown in Fig. 5.
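For a single global displacement, Eq. (6) reduces to the sketch below. Integer displacements and wrap-around shifts via np.roll are simplifying assumptions of this illustration; per-block vectors would shift each block separately.

```python
import numpy as np

def mc_interpolate(prev_frame, next_frame, d, alpha=0.5):
    """Motion compensated temporal averaging, Eq. (6), for one global
    displacement d = (dy, dx) in pixels per frame interval T."""
    dy, dx = d
    # I(x - alpha*D): shift the previous frame forward along D.
    fwd = np.roll(prev_frame.astype(float),
                  (round(alpha * dy), round(alpha * dx)), axis=(0, 1))
    # I(x + (1 - alpha)*D): shift the next frame backward along D.
    bwd = np.roll(next_frame.astype(float),
                  (-round((1 - alpha) * dy), -round((1 - alpha) * dx)),
                  axis=(0, 1))
    return (1 - alpha) * fwd + alpha * bwd
```

With α = 0.5 a correctly estimated vector makes the two shifted samples coincide on the same moving detail, so the average creates the new motion phase halfway between the existing fields.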

The problem of interpolating occluding objects in areas of connecting motion in the estimated frame is solved with three simple interpolation rules, which take into account the motion of the surrounding segments. Eq. (7a) gives the condition for inserting the estimated pixel E(t) from the next frame P(t + T/2), Eq. (7b) gives the condition for inserting the estimated pixel E(t) from the previous frame P(t − T/2), and Eq. (7c) is for temporal averaging between the previous and the next frame.

$$E(i, j, t) = P(i + D_x/2,\ j + D_y/2,\ t + T/2), \quad \text{if } D_x(i, j) > D_x(i - 1, j) \text{ or } D_y(i, j) > D_y(i, j - 1), \tag{7a}$$

$$E(i, j, t) = P(i - D_x/2,\ j - D_y/2,\ t - T/2), \quad \text{if } D_x(i, j) > D_x(i + 1, j) \text{ or } D_y(i, j) > D_y(i, j + 1), \tag{7b}$$

$$E(i, j, t) = \big(P(i + D_x/2,\ j + D_y/2,\ t + T/2) + P(i - D_x/2,\ j - D_y/2,\ t - T/2)\big)/2, \quad \text{otherwise}. \tag{7c}$$

For computational simplicity, we have applied the interpolation rules to cases which involve a zero motion vector, i.e. D(i, j) = (0, 0). The processing is applied to samples within a range of D/2 from the border of the moving segment.
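The three rules of Eqs. (7a)-(7c) boil down to comparing each block vector against its causal and anti-causal neighbours; the sketch below is a direct transcription, with string labels of our own for the three outcomes.

```python
def interpolation_rule(Dx, Dy, i, j):
    """Pick the interpolation source for block (i, j) per
    Eqs. (7a)-(7c). Dx and Dy are 2-D arrays of block vector
    components; the caller keeps (i, j) away from the borders."""
    if Dx[i][j] > Dx[i - 1][j] or Dy[i][j] > Dy[i][j - 1]:
        return "next"        # Eq. (7a): insert from P(t + T/2)
    if Dx[i][j] > Dx[i + 1][j] or Dy[i][j] > Dy[i][j + 1]:
        return "previous"    # Eq. (7b): insert from P(t - T/2)
    return "average"         # Eq. (7c): temporal average
```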

As a result, the motion compensated interpolation process is adaptive to motion both covering and uncovering a still picture segment.

5. Results

We have used mainly critical natural sequences for computer simulations, taken from the Eureka 95 HDTV project library. Those chosen here are BBC DISC, TRAIN, and KEW GARDENS, samples of which are shown in Figs. 12-14.
- BBC DISC is a rotating cardboard, which has numerous detailed objects attached to it, and a rotating spoked wheel at the front.
- TRAIN has a miniature train moving past a station, with a camera zoom to the detailed houses in the background. In the above two sequences, both the beginning and end are static.
- KEW GARDENS is an outdoor panning shot of the location with a moving camera. The superimposed text at the bottom is static.

The original sequences are in 625/50/1:1 format. The interlaced signal is obtained by discarding every other line from each original progressive frame. In order to obtain more critical test material for motion estimation simulations, the deinterlaced sequences have been accelerated by a factor of 2 or 4, resulting in a higher motion speed, up to 25 pixels/20 ms.

Performance measures are given in terms of Peak Signal to Noise Ratio (PSNR). PSNR is defined as

$$\mathrm{PSNR} = -10 \log_{10}\!\left(\mathrm{MSE}/I_{max}^2\right), \tag{8a}$$

where

$$\mathrm{MSE} = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \big(I(i, j) - I'(i, j)\big)^2, \tag{8b}$$


NM being the image or block size. I_max is the maximum intensity, which for an 8-bit signal is 255. I is the intensity of the original progressive image, and I' is the intensity of the processed progressive image.
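Eqs. (8a)-(8b) translate directly into code; the function name is our own.

```python
import numpy as np

def psnr(original, processed, i_max=255.0):
    """Peak Signal to Noise Ratio in dB, Eqs. (8a)-(8b)."""
    o = np.asarray(original, dtype=float)
    p = np.asarray(processed, dtype=float)
    mse = np.mean((o - p) ** 2)                 # (8b), NM samples
    return -10.0 * np.log10(mse / i_max ** 2)   # (8a)
```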

Five different IPC algorithms have been compared in terms of PSNR:
- A vertical 3-point median (VMED) [4].
- A vertical 2-point average (VAVE).
- A 5 × 3 vertical-temporal FIR (VT-FIR) [1].
- Two edge adaptive algorithms with an inter-field FIR Median Hybrid filter bank (EA-FMH), and an intra-field FIR filter bank (EA-FIR) [16].

The results are given in Figs. 6-8. The significantly greater PSNR values at the beginning and end of both the BBC DISC and TRAIN sequences are due to the fact that they both start and end with still images. Table 2 summarizes the average IPC errors.

The performance of the VMED is two-fold: in static areas of the BBC DISC and TRAIN sequences it gives clearly the best results, but it performs worst for detailed moving pictures. The other non-linear algorithm, the EA-FMH, has a similar performance, but is more robust. VT-FIR performs best among the linear filters, but is the most complex, with two field memories and several line memories. By comparing VAVE and EA-FIR, it can be seen that the edge orientation control gives about a 0.5 dB improvement for material containing diagonal frequencies. Subjectively, the improvement is much greater. The EA-FMH filter was chosen due

- " VMED V A V E . . . . V I ' - H R - EA- - - EA-FIR

35~ t ~

z30 r ~

25 0 i t i i

5O

frame

Fig. 7. IPC PSNR of the TRAIN sequence.

- ' " ~ V A V E """ VT-PIR-- EA- - - F_A-FIR FIVtq

VAVE . . . . V T - F I R - EA- - - EA-FIR

,,,35

25

] !

i i i i i

5O

frame

Fig. 6. IPC PSNR of the BBC DISC sequence.

35 r r ~

z

i i t i t

5O

frm~

Fig. 8. IPC PSNR of the KEW GARDENS sequence.

Table 2
Average IPC PSNR (dB)

Sequence        VMED    VAVE    VT-FIR   EA-FMH   EA-FIR
BBC DISC        28.46   29.53   30.00    30.12    30.14
TRAIN           30.86   30.85   31.53    30.70    30.55
KEW GARDENS     33.51   30.87   31.89    33.19    31.19


Fig. 9. An example of edge detection and coding.

Fig. 10. An example of 4-bit quantization for shape matching.

to its robust performance and simple implementation.

The estimation results are illustrated in Fig. 11. As an example, a detail of the KEW GARDENS sequence has been extracted and magnified for closer inspection. In reference to Table 3, Fig. 11(a) shows the interpolated picture before segmentation, and Fig. 11(b) shows the interpolated picture after segmentation. The quantized picture used in

segmentation is shown in Fig. 11(c). The block artefacts are clearly visible in Fig. 11(a), especially in the engraved text in the wall, and in the superimposed text. Both of these have disappeared in Fig. 11(b). The differences in the PSNR values of Table 3 relative to the results obtained using the standard MAE criterion are marginal, but the subjective visual improvement of the picture quality is evident. Additionally, remaining errors of the type seen at


Fig. 11. (a) Interpolation result before segmentation. (b) Interpolation result after segmentation. (c) 2-bit quantization used in segmentation.

7". Koirunen, J. Salonen / Signal Processing: hnage Communication 6 (1994) 241-252

Table 3
Average estimation PSNR (dB)

Sequence        Before segmentation   After segmentation   MAE Ref.
BBC DISC        28.16                 28.04                27.39
TRAIN           28.94                 28.96                28.13
KEW GARDENS     29.01                 29.24                29.19

Fig. 12. A sample frame of estimated motion vectors for the BBC DISC sequence.

Fig. 14. A sample frame of estimated motion vectors for the KEW GARDENS sequence.

the right of the symbols 'S', ',' and 'L' can be corrected by means of the adaptive interpolation scheme described in Section 4.

The final vector fields, superimposed on the average of the two temporally adjacent original frames, are shown in Figs. 12-14.

Fig. 13. A sample frame of estimated motion vectors for the TRAIN sequence.

6. Conclusion

A new motion estimation algorithm, used for motion compensated field rate upconversion, is proposed in this paper. The algorithm utilizes both spatial and temporal information; in particular, individual pixels play an important role in block


matching. Several new features, characteristic of the algorithm, can be identified:
- the edge adaptive interlace-to-progressive conversion (IPC) algorithm;
- combined edge and shape information matching;
- a modified pixel difference classification criterion, suitable for matching coded images;
- image segmentation for adaptive block size;
- temporal direction adaptive motion interpolation.

The overall hardware complexity of the estimator is close to that of the MAE based system. The matching operation in the proposed algorithm is relatively simple because adders are replaced by counters and comparators, but the additional vector postprocessing compensates for this advantage.
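The counter-based matching mentioned above can be illustrated with a pixel-difference-classification style criterion: instead of accumulating absolute errors as in MAE matching, it counts the pixels whose difference stays below a threshold and selects the candidate vector with the highest count. This is a minimal sketch of the general principle, not the proposed BLC estimator itself; the block size, search range and threshold values are illustrative assumptions.

```python
import numpy as np

def pdc_match_count(block_cur, block_ref, threshold=8):
    """Count 'matching' pixels: those whose absolute difference is
    below the threshold. A larger count indicates a better match."""
    diff = np.abs(block_cur.astype(int) - block_ref.astype(int))
    return int(np.sum(diff < threshold))

def best_vector(cur, ref, bx, by, bsize=8, search=4, threshold=8):
    """Full-search block matching that maximizes the match count
    (a counter/comparator operation) instead of minimizing a sum
    of absolute errors (an adder operation, as in MAE matching)."""
    h, w = cur.shape
    block = cur[by:by + bsize, bx:bx + bsize]
    best_count, best_mv = -1, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if 0 <= y and y + bsize <= h and 0 <= x and x + bsize <= w:
                count = pdc_match_count(block, ref[y:y + bsize, x:x + bsize],
                                        threshold)
                if count > best_count:
                    best_count, best_mv = count, (dx, dy)
    return best_mv
```

Replacing the error accumulator with a per-pixel threshold test is what allows a hardware realization with counters and comparators rather than wide adders.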

One of the future tasks is to extend the adaptive interpolation scheme to a general case.
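The motion interpolation feature listed among the conclusions can be illustrated with a simplified motion-compensated frame averager: each output block of the intermediate frame is the mean of the previous and next frames, fetched at positions displaced by half the estimated motion vector. This is a sketch of the general idea only, not the authors' temporal direction adaptive scheme; the block size, boundary clamping and vector-field layout are assumptions.

```python
import numpy as np

def mc_interpolate(prev_f, next_f, mv_field, bsize=8):
    """Motion-compensated interpolation of a frame midway between
    prev_f and next_f: each block averages the two source blocks
    displaced by +/- half the block's motion vector (dx, dy)."""
    h, w = prev_f.shape
    out = np.zeros((h, w))
    for by in range(0, h, bsize):
        for bx in range(0, w, bsize):
            dx, dy = mv_field[by // bsize][bx // bsize]
            # half-displaced source positions, clamped to the frame
            y0 = np.clip(by - dy // 2, 0, h - bsize)
            x0 = np.clip(bx - dx // 2, 0, w - bsize)
            y1 = np.clip(by + dy // 2, 0, h - bsize)
            x1 = np.clip(bx + dx // 2, 0, w - bsize)
            out[by:by + bsize, bx:bx + bsize] = 0.5 * (
                prev_f[y0:y0 + bsize, x0:x0 + bsize] +
                next_f[y1:y1 + bsize, x1:x1 + bsize])
    return out
```

With a zero vector field this degenerates to plain temporal averaging; with correct vectors, moving objects are rendered at their midway positions instead of being blurred.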

References

[1] D.M. Ackroyd and M. Weston, "Interpolating interlaced television pictures", Image Technol., November 1990, pp. 430-433.

[2] B. Alp, P. Haavisto, T. Jarske, K. Öistämö and Y. Neuvo, "Median-based algorithms for image sequence processing", Proc. Visual Communications and Image Processing, Lausanne, Switzerland, 1990, pp. 122-134.

[3] H.-J. Dreier, "Line flicker reduction by adaptive signal processing", 3rd Internat. Workshop on HDTV, Torino, Italy, August 30-September 1, 1989.

[4] T. Doyle and P. Frencken, "Median filtering of television images", ICCE Digest of Technical Papers, 1986, pp. 186-187.

[5] H. Gharavi and M. Mills, "Blockmatching motion estimation - New results", IEEE Trans. Circuits and Systems, Vol. 37, No. 5, May 1990, pp. 649-651.

[6] P. Haavisto, J. Juhola and Y. Neuvo, "Fractional frame rate up-conversion using weighted median filters", IEEE Trans. Consumer Electron., Vol. 35, No. 3, August 1989, pp. 272-278.

[7] P. Heinonen and Y. Neuvo, "FIR-median hybrid filters", IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-35, No. 6, June 1987, pp. 832-838.

[8] C. Hentschel, "Linear and nonlinear procedures for flicker reduction", IEEE Trans. Consumer Electron., Vol. CE-33, No. 3, August 1987, pp. 192-198.

[9] C. Hentschel, "Comparison between median filtering and vertical edge controlled interpolation for flicker reduction", IEEE Trans. Consumer Electron., Vol. 35, No. 3, August 1989, pp. 279-289.

[10] T.S. Huang, ed., Image Sequence Analysis, Springer, Berlin, 1981.

[11] R.N. Jackson and M.J.J.C. Annegarn, "Compatible systems for high quality television", SMPTE J., Vol. 92, No. 7, 1983, pp. 719-723.

[12] T. Koivunen and A. Nieminen, "Motion field restoration using vector median filtering on HDTV sequences", Proc. Visual Communications and Image Processing, Lausanne, Switzerland, 1990.

[13] J.J. Pearson et al., "Video-rate image correlation processor", Appl. Digital Image Process. SPIE, Vol. 119, IOCC 1977, pp. 197-205.

[14] D. Pele, P. Siohan and B. Choquet, "Field rate conversion by motion estimation/compensation", 2nd Internat. Workshop on Signal Processing of HDTV, L'Aquila, Italy, 1988.

[15] A. Roberts, The improved display of 625-line television pictures: Adaptive interpolation, BBC Research Dept. Report, 1985/5, May 1985.

[16] J. Salonen and S. Kalli, "Edge adaptive interpolation for scanning rate conversions", in: E. Dubois and L. Chiariglione, eds., Signal Processing of HDTV, IV, Elsevier, Amsterdam, 1993, pp. 757-764.

[17] J. Salonen and T. Koivunen, "A new motion compensated field rate upconversion algorithm", Proc. IEEE Winter Workshop on Nonlinear Signal Processing, Tampere, Finland, 1993.

[18] G.A. Thomas, Television motion measurement for DATV and other applications, BBC Research Dept. Report, 1987/11, September 1987.

[19] S.C. Wells, "Motion estimation algorithms and their application to hybrid coding", IEEE Colloquium on Motion Adaptive Video Processing, May 1984, IEE Digest No. 1984/59.