
FRAME DIFFERENCE NORMALIZATION: AN APPROACH TO REDUCE ERROR RATES OF CUT DETECTION ALGORITHMS FOR MPEG VIDEOS

Ralph Ewerth¹ and Bernd Freisleben¹,²

¹SFB/FK 615, University of Siegen, D-57068 Siegen, Germany
²Dept. of Math. and Computer Science, University of Marburg, D-35032 Marburg, Germany

{ewerth, freisleb}@informatik.uni-marburg.de

ABSTRACT

The segmentation of video sequences into shots is the first step towards video content analysis. Two kinds of shot boundaries can be distinguished: abrupt scene changes (“cuts”) and gradual transitions. In this paper, we present a technique to reduce the error rates of cut detection algorithms that are based on pixel-wise or histogram-based frame difference metrics and operate directly on compressed MPEG video data. The proposed approach, called “Frame Difference Normalization” (FDN), aims to eliminate the effects of a specific frame type pattern in MPEG streams that is responsible for such errors. Experimental results are presented to demonstrate the benefits of our proposal and its superiority over a more general noise filter. Furthermore, the proposed method is not limited to a particular algorithm but is applicable to an entire class of cut detection algorithms.

1. INTRODUCTION

Several research efforts have been made in recent years to address the problem of detecting shot boundaries in digital videos. There are two kinds of shot boundaries: (a) abrupt scene changes (called “cuts”), and (b) gradual transitions between two different shots. This paper focuses on the problem of detecting cuts. According to Lienhart [6], a cut is defined as the direct concatenation of two shots with no transitional frames involved; cuts lead to a perceptible temporal visual discontinuity.

One could argue that the problem of detecting cuts has been solved satisfactorily, since many researchers have reported very good recall and precision rates of up to 100%. Here, recall is the number of correctly detected cuts divided by the number of actually existing cuts, and precision is the number of correctly detected cuts divided by the total number of detected cuts (including “false alarms”). However, the test set used in our experiments contained many MPEG videos for which a high-quality cut detection algorithm produced a systematic detection error. It can be shown that frame differences are often biased depending on the MPEG-specific frame types involved. In this paper, a method to handle such errors is proposed.
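The two quality measures defined above can be written out directly. This is a minimal Python sketch; the function names and the counts in the usage line are hypothetical and only illustrate the definitions.

```python
def recall(correct: int, existing: int) -> float:
    """Correctly detected cuts divided by the number of cuts that actually exist."""
    return correct / existing if existing else 0.0


def precision(correct: int, false_alarms: int) -> float:
    """Correctly detected cuts divided by all detections (correct ones plus false alarms)."""
    detected = correct + false_alarms
    return correct / detected if detected else 0.0


# Hypothetical example: 100 real cuts, 95 of them found, plus 5 false alarms.
print(recall(95, 100), precision(95, 5))
```

Note that false alarms only affect precision, while missed cuts only affect recall, which is why both measures are reported together.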

The basic idea of our approach is to normalize frame differences appropriately in order to increase detection quality. The performance of our proposal is demonstrated by experimental results for examples taken from the MPEG-7 test content set, which has been suggested in [2] as the standard test set for video segmentation research. Furthermore, the approach is general in the sense that it can be used in conjunction with an entire class of cut detection algorithms, namely those that are based on pixel-wise or histogram-based frame difference metrics and operate on MPEG videos.

This paper is organized as follows. Section 2 explains some basic principles of cut detection algorithms and mentions some recent developments. In Section 3, the reason for a specific kind of error is discussed and our solution to this problem, “Frame Difference Normalization” (FDN), is presented. Section 4 presents experimental results obtained with an implementation of a particular cut detection algorithm under different test conditions. Section 5 concludes the paper and outlines areas for future research.

2. RELATED WORK

2.1 Frame to frame differences and thresholds

Considering Lienhart’s definition mentioned above, it is reasonable to look at the differences between two consecutive frames to detect a cut. A large number of different metrics have been defined to estimate frame differences (e.g. [5], [6], [9]). A straightforward approach is to measure the differences between corresponding pixels of consecutive frames, but this approach is very sensitive to object motion, camera motion, brightness changes and noise. Hanjalic [4] includes motion estimation and compensation for small sub-images to remove motion-induced artifacts. Many approaches (e.g. [7] and [9]) propose the use of histograms, since they are less sensitive to motion and the other events mentioned above.
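To make the contrast between the two families of metrics concrete, the following minimal Python sketch computes a pixel-wise difference (sum of absolute differences) and a gray-level histogram difference for two frames. The function names, the 64-bin quantization and the bin-wise L1 distance are illustrative assumptions; the cited works use a variety of metrics.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame, intensities 0..255


def pixel_difference(a: Frame, b: Frame) -> int:
    """Sum of absolute differences between corresponding pixels."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def histogram(frame: Frame, bins: int = 64) -> List[int]:
    """Gray-level histogram with `bins` equally wide bins over 0..255."""
    h = [0] * bins
    for row in frame:
        for p in row:
            h[p * bins // 256] += 1
    return h


def histogram_difference(a: Frame, b: Frame, bins: int = 64) -> int:
    """Bin-wise sum of absolute histogram differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(histogram(a, bins), histogram(b, bins)))


# A one-pixel horizontal shift of a striped pattern (simulated motion)
a = [[0, 255, 0, 255]]
b = [[255, 0, 255, 0]]
print(pixel_difference(a, b), histogram_difference(a, b))  # 1020 0
```

The shifted stripes produce the largest possible pixel-wise difference, yet their histograms are identical, which illustrates why histogram-based metrics are less sensitive to motion.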



The estimated difference between consecutive frames is commonly used to decide whether there is a cut at frame k. For this purpose, a threshold is used in the following way: if the frame difference from frame k-1 to frame k exceeds a given value t, a cut is declared at this position. However, applying a global threshold value t to an entire sequence results in many false alarms and missed cuts. Furthermore, the problem of determining an appropriate value for t must be solved. To address the first issue, Yeo and Liu [9] suggest a sliding window technique and use a local threshold within such a window. This window consists of 2m+1 frame differences, for a small m > 0. To decide whether there is a cut at position k, the frame differences between the neighboring frames are taken into account. A peak at frame position k is considered a cut only if it is the maximum value in the window and is n times larger than the second largest peak in the window. This principle has been used in many variations ([4], [7], [8], [9]) and typically results in robust detection performance.
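The sliding window rule can be sketched in a few lines of Python. This is a minimal sketch, not the implementation of [9]: it assumes that diffs[k] holds the k-th frame difference, that windows are truncated at both ends of the sequence, and that “n times larger” is tested as at least n times the second largest value in the window.

```python
from typing import List, Sequence


def detect_cuts(diffs: Sequence[float], m: int = 5, n: float = 2.0) -> List[int]:
    """Return indices k whose difference is the window maximum and >= n * second largest.

    The window covers diffs[k-m .. k+m] (2m+1 values), truncated at both ends.
    """
    cuts = []
    for k, d in enumerate(diffs):
        lo, hi = max(0, k - m), min(len(diffs), k + m + 1)
        window = list(diffs[lo:hi])
        if d <= 0 or d < max(window):
            continue  # not a (positive) maximum of its window
        others = window[:k - lo] + window[k - lo + 1:]  # window without position k
        second = max(others, default=0.0)
        if d >= n * second:
            cuts.append(k)
    return cuts


diffs = [10, 12, 11, 50, 10, 11, 12, 10, 9, 11, 10]
print(detect_cuts(diffs, m=5, n=2))  # [3]
```

If two positions share the window maximum, the second largest value equals the maximum and neither is reported, so for n > 1 a cut requires a single dominant peak.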

2.2 MPEG compressed domain and DC frames

The overwhelming majority of digital video sequences are available in a compressed format, mainly MPEG. MPEG distinguishes between I-, P- and B-frames. An I-frame is encoded independently of other frames. The encoding of a P-frame is based on a previous I- or P-frame (called the reference frame), while the encoding of a B-frame can be based on two reference frames: a previous as well as a subsequent I- or P-frame. For small pixel blocks in these P- and B-frames, motion vectors can be used that point to similar blocks in a reference frame.
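The reference structure described above can be made concrete for a frame type pattern given in display order. This minimal Python sketch (the function name is hypothetical) simply looks up the nearest I- or P-frame before and after each frame; it deliberately ignores coding order, GOP boundaries and any encoder-specific rules that real MPEG streams also involve.

```python
from typing import List, Optional, Tuple

Ref = Tuple[str, Optional[int], Optional[int]]  # (frame type, previous ref, next ref)


def reference_frames(pattern: str) -> List[Ref]:
    """For each frame (display order) return its type and reference frame indices.

    I-frames use no reference, P-frames use the nearest previous I- or P-frame,
    B-frames use the nearest previous and the nearest subsequent I- or P-frame
    (None if that frame lies outside the given pattern).
    """
    anchors = [i for i, t in enumerate(pattern) if t in "IP"]
    result: List[Ref] = []
    for i, t in enumerate(pattern):
        prev_ref = max((a for a in anchors if a < i), default=None)
        next_ref = min((a for a in anchors if a > i), default=None)
        if t == "I":
            result.append((t, None, None))
        elif t == "P":
            result.append((t, prev_ref, None))
        elif t == "B":
            result.append((t, prev_ref, next_ref))
        else:
            raise ValueError(f"unknown frame type {t!r}")
    return result


print(reference_frames("IBBPBBI"))
# [('I', None, None), ('B', 0, 3), ('B', 0, 3), ('P', 0, None), ('B', 3, 6), ('B', 3, 6), ('I', None, None)]
```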

For cut detection, it is reasonable to use the MPEG bit stream information with the lowest possible decoding cost. For example, DC images, i.e. sub-images that consist of the dequantized DC coefficients of the DCT (discrete cosine transform) blocks, have the advantage that they still contain sufficient information for content analysis (because a DC coefficient is, up to a constant scale factor, equivalent to the average value of a single DCT block of 8*8 pixels), while the effort to extract them from the bit stream is much lower than for a complete frame. While the extraction of DC coefficients is trivial for I-frames, motion compensation has to be performed for P- and B-frames. An approximation with a tolerable error is presented by Yeo and Liu [9] and by Shen and Delp [7]. Their approaches avoid decoding the blocks referenced by a motion vector.
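The relationship between DC coefficients and block averages can be illustrated on already-decoded pixels. This minimal Python sketch (function names hypothetical) computes a DC image as the mean of each 8×8 block, and the DC term of the 8×8 DCT-II as defined for MPEG, which equals the sum of the 64 samples divided by 8, i.e. 8 times the block mean. It is only an illustration: the approaches in [7] and [9] read the DC coefficients from the bit stream instead of decoding pixels.

```python
from typing import List

Frame = List[List[float]]

BLOCK = 8  # MPEG DCT blocks are 8x8 pixels


def dc_coefficient(block: Frame) -> float:
    """DC term F(0,0) of the 8x8 DCT-II: sum of the 64 samples / 8 (= 8 * mean)."""
    return sum(sum(row) for row in block) / 8.0


def dc_image(frame: Frame) -> Frame:
    """Mean of every 8x8 block; frame dimensions must be multiples of 8."""
    h, w = len(frame), len(frame[0])
    if h % BLOCK or w % BLOCK:
        raise ValueError("frame size must be a multiple of 8")
    out: Frame = []
    for by in range(0, h, BLOCK):
        row = []
        for bx in range(0, w, BLOCK):
            total = sum(frame[y][x] for y in range(by, by + BLOCK)
                        for x in range(bx, bx + BLOCK))
            row.append(total / (BLOCK * BLOCK))
        out.append(row)
    return out


# 16x16 frame: two upper blocks with values 10 and 20, two lower blocks with 30
frame = [[10] * 8 + [20] * 8 for _ in range(8)] + [[30] * 16 for _ in range(8)]
print(dc_image(frame))  # [[10.0, 20.0], [30.0, 30.0]]
```

A frame of 352×288 pixels, for instance, shrinks to a 44×36 DC image, which is why frame differences on DC images are cheap to compute.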

2.3 High-quality cut detection algorithms

In recent years, several studies have addressed the problem of cut detection and reported very good detection results, e.g. the algorithm of Yeo and Liu [9]. Chen et al. [1] determine the probability of a cut by applying a binary regression tree to a multi-dimensional feature vector for each frame, and Hanjalic [4] proposed a statistical approach. The authors report very high detection rates, ranging from 92% recall and 94% precision in [1] up to 100% for both in [4]. Truong et al. [8] have proposed a local mean ratio filter (recall: 97.9%; precision: 97.5%) as an enhancement to histogram-based cut detection algorithms in order to reduce noise in frame difference sequences.

3. PROBLEM AND PROPOSED SOLUTION

3.1 Problem: Consequences of MPEG frame encoding for cut detection algorithms

MPEG encoders try to achieve a certain bit rate, so an encoder cannot spend an arbitrary number of bits on a frame. Since an I-frame is encoded without any reference frame, it requires comparatively many bits compared to P- or B-frames (inter frames). In addition, the degree of accuracy can vary depending on the frame type due to effects of motion compensation, e.g. inaccuracy or quantization of macroblock differences. Thus, a bias can often be found in the estimated frame differences of MPEG videos. This bias depends on the properties of the frames involved in the calculation. The difference values can be estimated, e.g., with either histogram-based or pixel-based metrics. For a commonly used IPB pattern like “IBBPBBPBBPBBIB...”, there are at least five different frame type transitions: I to B, B to B, B to P, P to B and B to I. For example, the average difference values for specific frame transitions were calculated for the video sequence “riscos-sl.mpg” from the MPEG-7 test set mentioned above. These average values varied from 139.5 (B to B) up to 367.5 (B to I).
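The frame type transitions of a pattern can be enumerated by pairing each frame type with that of its successor. A minimal Python sketch (the function name is hypothetical) applied to the pattern quoted above:

```python
from collections import Counter
from typing import Dict, Tuple


def transition_types(pattern: str) -> Dict[Tuple[str, str], int]:
    """Count (type of frame i, type of frame i+1) pairs in a frame type string."""
    return dict(Counter(zip(pattern, pattern[1:])))


print(transition_types("IBBPBBPBBPBBIB"))
# {('I', 'B'): 2, ('B', 'B'): 4, ('B', 'P'): 3, ('P', 'B'): 3, ('B', 'I'): 1}
```

The result contains exactly the five transition types named above. The B-to-I transition occurs only once per I-frame period, so a single biased B-to-I difference can easily stand out within a short window of differences.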

Now the problem can be stated more clearly. Let us assume that the approach from [9] is used with the parameter m=5 and the threshold value n=2, a setting that in general gives very good detection results. The frame difference with the maximum value within a sliding window is accepted as a cut if it is n times larger than the second largest value inside this window. Figure 1 shows the histogram differences for an almost completely uneventful scene without any cut. Since the maximum difference, between frames 2881 and 2882 (a B-to-I transition), is more than n times larger than the second largest peak, our detector concludes that there is a cut and thus produces a false alarm. Many false alarms of this kind have been found in different videos. Instead of simply increasing the parameter n, which would result in a lower recall rate, we now propose a method that handles such false alarms without reducing the recall rate.
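As a rough illustration of why the B-to-I transition alone can pass the threshold test, assume (hypothetically, not taken from Figure 1) that the other differences in the 2m+1 = 11 window lie at the B-to-B average reported above for “riscos-sl.mpg”, and that the peak takes the B-to-I average. The ratio of the peak to the second largest value is then

```latex
\frac{\bar{d}_{(B,I)}}{\bar{d}_{(B,B)}} = \frac{367.5}{139.5} \approx 2.63 > n = 2
```

so under this assumption the B-to-I difference is reported as a cut even though no cut is present.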



We assume that the performance of the cut detection algorithms mentioned in the previous section would decrease if they were applied to MPEG videos exhibiting the noise pattern described above. They are potentially affected because these algorithms use either histogram-based or pixel-based metrics for frame differences.

3.2 Solution: Frame Difference Normalization (FDN)

Our approach to eliminating the kind of errors described above is called “Frame Difference Normalization” (FDN). First, a set S of tuples for the possible frame transition types is defined, where I, P and B denote the MPEG frame types:

S={(I,I),(I,P),(I,B),(P,I),(P,P),(P,B),(B,I),(B,P),(B,B)} (1)

Let $s \in S$, and let $d_{i,s}$ be the frame difference between frame $i$ and $i+1$, where $s$ is the frame type transition at this position. Now, the average difference value for each specific frame type transition $s$ is calculated. Let $f$ be the number of frames in a given video, and let $f_s$ be the number of frame transitions of type $s$; then the average value for a frame transition type $s$ is estimated as shown in formula (2), where $p$ is a number between 0 and 1.

=
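The averaging and normalization steps can be sketched as follows. This minimal Python sketch rests on explicit assumptions: the role of the parameter p in formula (2) is not reproduced, so each transition average is simply the mean of all $d_{i,s}$ of that type, and the normalization is assumed to divide each difference by the average of its transition type. The function names are hypothetical, and this is not claimed to be the authors' exact formulation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Transition = Tuple[str, str]


def transition_averages(diffs: List[float], types: str) -> Dict[Transition, float]:
    """Plain mean of d_{i,s} for every transition type s = (types[i], types[i+1]).

    Assumption: formula (2) additionally involves a parameter p, which this
    sketch ignores; large differences at genuine cuts are included in the mean.
    """
    if len(types) != len(diffs) + 1:
        raise ValueError("need one frame type per frame (len(diffs) + 1)")
    sums: Dict[Transition, float] = defaultdict(float)
    counts: Dict[Transition, int] = defaultdict(int)
    for i, d in enumerate(diffs):
        s = (types[i], types[i + 1])
        sums[s] += d
        counts[s] += 1
    return {s: sums[s] / counts[s] for s in sums}


def normalize(diffs: List[float], types: str) -> List[float]:
    """Hypothetical normalization: divide each d_{i,s} by the average for its type s."""
    avg = transition_averages(diffs, types)
    out = []
    for i, d in enumerate(diffs):
        a = avg[(types[i], types[i + 1])]
        out.append(d / a if a > 0 else 0.0)
    return out


# Hypothetical differences with a strong B-to-I bias (positions 5 and 11)
types = "IBBPBBIBBPBBI"
diffs = [150, 140, 160, 150, 140, 370, 150, 140, 160, 150, 140, 365]
print([round(x, 2) for x in normalize(diffs, types)])
```

In this hypothetical example, the B-to-I values dominate the raw sequence but end up on the same scale as all other transitions after normalization. The normalized sequence would then be handed to an unchanged cut detector, such as the sliding window rule of Section 2.1.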



Table 1: Results for the Yeo/Liu implementation and its FDN variant, both tested with different threshold values.

The local mean ratio filter is intended to reduce noise and accentuate peaks that represent a cut. Both implementations were tested with different threshold values, and the FDN technique described in Section 3 was added to the Y/L implementation.

A subset of the MPEG-7 test content was used as video test material, consisting of 10 MPEG-1 videos from different genres with a total length of over 400,000 frames and containing 2807 hard cuts. This subset was chosen randomly in order to obtain videos created with different encoders; in 5 of the 10 videos, the noise pattern caused by the IPB frame types was present. The results of the different tests are displayed in Tables 1 and 2, which list the number of actually existing cuts, the number of detected cuts, the number of false alarms and the total number of errors (missed cuts plus false alarms).

FDN leads to a substantial error reduction. Table 1 demonstrates that for n=2 the number of false alarms drops sharply (from 963 to 387), while the number of correctly detected cuts remains high. This is important in cases where a high rate of correct detections is desired. For n>2, the total number of errors is reduced noticeably, too. The FDN method also leads to lower error rates than our implementation of the more general local mean ratio filter (see Table 2). We conclude that for MPEG videos the systematic reduction of noise with FDN is more effective than noise reduction with a local mean ratio filter.
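The error counts in Tables 1 and 2 can be reproduced from the other rows, since errors are missed cuts (existing minus correctly detected cuts) plus false alarms. A minimal Python sketch; the dictionary keys are ad hoc labels and the numbers are copied from the two tables.

```python
TOTAL_CUTS = 2807

# (correctly detected cuts, false alarms), copied from Tables 1 and 2
RESULTS = {
    "Y/L n=2": (2654, 963),
    "FDN n=2": (2652, 387),
    "Y/L n=2.5": (2560, 227),
    "FDN n=2.5": (2576, 202),
    "Y/L n=3": (2439, 131),
    "FDN n=3": (2475, 144),
    "LMI alpha=4": (2680, 677),
    "LMI alpha=4.5": (2633, 403),
    "LMI alpha=5": (2591, 265),
    "LMI alpha=6": (2434, 161),
}


def errors(detected: int, false_alarms: int, total: int = TOTAL_CUTS) -> int:
    """Total errors = missed cuts + false alarms."""
    return (total - detected) + false_alarms


for name, (det, fa) in RESULTS.items():
    print(f"{name:14s} errors={errors(det, fa)}")

# Relative error reduction of FDN over the plain Y/L detector at each threshold
for n in ("2", "2.5", "3"):
    yl = errors(*RESULTS[f"Y/L n={n}"])
    fdn = errors(*RESULTS[f"FDN n={n}"])
    print(f"n={n}: {100 * (yl - fdn) / yl:.1f}% fewer errors with FDN")
```

The computed values match the Errors rows of both tables. Relative to the plain Y/L detector, FDN removes about 51% of the errors at n=2 and about 9% and 5% at n=2.5 and n=3. Its best result (433 errors at n=2.5) is also lower than the best local mean ratio filter result (481 errors at α=5).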

5. CONCLUSIONS

A method to reduce noise caused by the IPB frame pattern in MPEG streams has been presented in this paper. Taking this noise pattern into account and normalizing the difference values with the FDN method led to a noticeable error reduction. Overall, the systematic noise reduction in MPEG videos with FDN was superior to a more general noise filter, a local mean ratio filter. The advantage of FDN for MPEG videos is that it reduces a specific noise pattern, while a general noise filter smooths all difference values in the same way. Furthermore, FDN could be added to a number of different algorithms, such as [1], [4], [7] and [9]. In future work, the reasons for the described IPB pattern should be analyzed in detail, e.g. by investigating factors such as the impact of the bit rate, the chosen encoder and so on. Considering such encoding parameters could improve many video analysis algorithms. Also, the effects of adding the FDN technique to other cut detection algorithms should be analyzed. Finally, dissolve detection algorithms for MPEG videos could benefit from FDN.

Implementation   LMI    LMI     LMI    LMI
Threshold        α=4    α=4.5   α=5    α=6
No. of cuts      2807   2807    2807   2807
Detected cuts    2680   2633    2591   2434
False alarms     677    403     265    161
Errors           804    577     481    534

Table 2: Results for the Yeo/Liu implementation extended with a local mean ratio filter, tested with different threshold values.

6. ACKNOWLEDGEMENT

This work is financially supported by the Deutsche Forschungsgemeinschaft (SFB/FK 615, Project MT). The authors would like to thank M. Grube and J. Waldhans for their implementation work and J. Gllavata, M. Gollnick, M. Grauer, F. Mansouri, E. Papalilo, R. Sennert and J. Wagner for their valuable support.

7. REFERENCES

[1] J.-Y. Chen, C. Taskiran, A. Albiol, E. J. Delp and C. A. Bouman, “ViBE: A Compressed Video Database Structured for Active Browsing and Search”, to appear in IEEE Transactions on Multimedia, 2003.

[2] T. Chua, M. Kankanhalli and Y. Lin, “A General Framework for Video Segmentation Based on Temporal Multi-Resolution Analysis”, in Proc. of Int’l Workshop on Advanced Image Technology, pp. 119-124, Fujisawa, Japan, 2000.

[3] U. Gargi, R. Kasturi and S. H. Strayer, “Performance Characterization of Video-Shot-Change Detection Methods”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 10, No. 1, pp. 1-13, 2000.

[4] A. Hanjalic, “Shot Boundary Detection: Unraveled and Resolved?”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, pp. 533-544, 2002.

[5] I. Koprinska and S. Carrato, “Temporal Video Segmentation: A Survey”, Signal Processing: Image Communication, Vol. 16, pp. 477-500, 2001.

[6] R. Lienhart, “Reliable Transition Detection in Videos: A Survey and Practitioner’s Guide”, to appear in International Journal of Image and Graphics, 2003.

[7] K. Shen and E. J. Delp, “A Fast Algorithm for Video Parsing Using MPEG Compressed Sequences”, in Proc. of IEEE ICIP 1995, Washington, DC, pp. 252-255, 1995.

[8] B. T. Truong, C. Dorai and S. Venkatesh, “New Enhancements to Cut, Fade, and Dissolve Detection Processes in Video Segmentation”, in Proc. ACM Multimedia 2000, pp. 219-227, 2000.

[9] B. Yeo and B. Liu, “Rapid Scene Analysis on Compressed Video”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 5, No. 6, pp. 533-544, 1995.

Implementation   Y/L    FDN    Y/L     FDN     Y/L    FDN
Threshold        n=2    n=2    n=2.5   n=2.5   n=3    n=3
No. of cuts      2807   2807   2807    2807    2807   2807
Detected cuts    2654   2652   2560    2576    2439   2475
False alarms     963    387    227     202     131    144
Errors           1116   542    474     433     499    476
