8/16/2019 Deteksi Mata rabun
1/4
FRAME DIFFERENCE NORMALIZATION: AN APPROACH TO REDUCE ERROR
RATES OF CUT DETECTION ALGORITHMS FOR MPEG VIDEOS
Ralph Ewerth1 and Bernd Freisleben1,2
1SFB/FK 615, University of Siegen, D-57068 Siegen, Germany
2Dept. of Math. and Computer Science, University of Marburg, D-35032 Marburg, Germany
{ewerth, freisleb}@informatik.uni-marburg.de
ABSTRACT
The segmentation of video sequences into shots is the first
step towards video content analysis. Two kinds of shot
boundaries can be distinguished: abrupt scene changes (“cuts”) and gradual transitions. In this paper, we present
a technique to reduce the error rates of cut detection
algorithms based on pixel-wise or histogram-based frame
difference metrics when operating directly on compressed
MPEG video data. The proposed approach, called “Frame
Difference Normalization” (FDN), intends to eliminate
the effects of a specific frame pattern in MPEG streams
responsible for causing such errors. Experimental results
will be presented to demonstrate the benefits of our
proposal and its superiority over a more general noise
filter. Furthermore, the proposed method is not limited to
a particular algorithm but it is applicable to an entire class
of cut detection algorithms.
1. INTRODUCTION
Several research efforts have been made in recent years to
address the problem of detecting shot boundaries in digital
videos. There are two kinds of shot boundaries: (a) abrupt
scene changes (called “cuts”), and (b) gradual transitions
between two different shots. This paper focuses on the
problem of detecting cuts. Lienhart [6] states that a cut is
defined as the direct concatenation of two shots with no transitional frames involved; cuts lead to a perceptible
temporal visual discontinuity.
One could argue that the problem of detecting cuts has
been solved satisfactorily since many researchers reported
very good recall and precision rates ranging up to 100%,
where recall is the number of correctly detected cuts
divided by the number of cuts actually present, and
precision is the number of correctly detected cuts divided
by the total number of detected cuts (including “false
alarms”). However, in the test set used in our experiments
there were many MPEG videos for which a high-quality
cut detection algorithm produced a systematic detection
error. It can be shown that there often is a bias in frame
differences depending on MPEG specific frame types. In
this paper, a method to handle such errors is proposed.
The basic idea of our approach is to adequately normalize
frame differences to increase the detection quality. The performance of our proposal will be demonstrated by
presenting experimental results for examples taken from
the MPEG-7 test content set that has been suggested as
the standard test set for video segmentation research in
[2]. Furthermore, it is worth mentioning that the approach
is general in the sense that it can be used in conjunction
with an entire class of cut detection algorithms, namely
those based on pixel-wise or histogram-based frame
difference metrics and operating on MPEG videos.
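The recall and precision definitions given above can be written down directly. The following is a minimal sketch; the function name, argument names and the example counts are illustrative assumptions and are not taken from the experiments.

```python
from typing import Tuple


def recall_precision(correct: int, false_alarms: int, actual: int) -> Tuple[float, float]:
    """Return (recall, precision) for a cut detector.

    correct      -- number of correctly detected cuts
    false_alarms -- number of reported cuts that are not real cuts
    actual       -- number of cuts actually present in the video
    """
    if actual <= 0:
        raise ValueError("actual must be positive")
    detected = correct + false_alarms  # total number of reported cuts
    recall = correct / actual
    precision = correct / detected if detected else 0.0
    return recall, precision


# Hypothetical counts: 100 real cuts, 90 of them found, 10 false alarms.
recall, precision = recall_precision(correct=90, false_alarms=10, actual=100)
print(f"recall={recall:.2f} precision={precision:.2f}")
```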
This paper is organized as follows. In section 2, some
basic principles of cut detection algorithms are explained
and some recent developments are mentioned. In section
3, the cause of a specific kind of error is discussed
and our solution to this problem is presented: “Frame
Difference Normalization” (FDN). Section 4 presents
experimental results obtained with an implementation of a
particular cut detection algorithm in different test
conditions. Section 5 concludes the paper and outlines
areas for future research.
2. RELATED WORK
2.1 Frame to frame differences and thresholds
Considering Lienhart’s definition mentioned above it is reasonable to look at the differences between two
consecutive frames to detect a cut. A large number of
different metrics has been defined to estimate frame
differences (e.g. [5], [6], [9]). A straightforward approach
is to measure the differences between the particular pixels
of consecutive frames, but this approach is very sensitive
to object motion, camera motion, brightness changes and
noise. Hanjalic [4] includes motion estimation and
compensation for small sub-images to remove artifacts
based on motion. Many approaches (e.g. [7] and [9])
propose the usage of histograms since they are less
sensitive to motion and the other events mentioned above.
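The contrast between the two families of metrics can be illustrated with a small sketch. The frame representation (rows of 8-bit gray values), the bin count and the function names are assumptions chosen for illustration; they do not reproduce any specific metric from the cited work.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # rows of 8-bit gray values (assumed layout)


def pixel_difference(a: Frame, b: Frame) -> int:
    """Sum of absolute pixel-wise differences between two frames."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def histogram(frame: Frame, bins: int = 4) -> List[int]:
    """Gray-value histogram with equally wide bins."""
    width = 256 // bins
    counts = [0] * bins
    for row in frame:
        for value in row:
            counts[min(value // width, bins - 1)] += 1
    return counts


def histogram_difference(a: Frame, b: Frame, bins: int = 4) -> int:
    """Sum of absolute bin-wise differences between two histograms."""
    return sum(abs(x - y) for x, y in zip(histogram(a, bins), histogram(b, bins)))


# A bright object moves one pixel to the right; the content is otherwise unchanged.
frame1 = [[200, 10, 10, 10],
          [200, 10, 10, 10]]
frame2 = [[10, 200, 10, 10],
          [10, 200, 10, 10]]
print(pixel_difference(frame1, frame2))      # 760
print(histogram_difference(frame1, frame2))  # 0
```

In this toy case the pixel-wise metric reacts strongly to the motion, while the histogram metric does not react at all.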
The estimated difference between consecutive frames
is commonly used to decide whether there is a cut at frame
k. To this end, a threshold is used in the following way: If
the frame difference from frame k-1 to frame k exceeds a
given value t, then there is a cut at this position. However,
applying a global threshold value t to an entire sequence results in many false alarms and missed cuts. Furthermore,
the problem of determining an appropriate value t must be
solved. To address the first issue, Yeo and Liu [9] suggest
a sliding window technique and use a local threshold
within such a window. This window consists of 2*m+1
frame differences, for a small m > 0. To decide whether
there is a cut at position k, the frame differences between
the neighboring frames are taken into account. A peak at
frame position k is considered as a cut only if it is the
maximum value and n times larger than the second largest
peak in the window. This principle has been used in many
variations ([4], [7], [8], [9]) and typically results in a
robust detection performance.
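The sliding window rule described above can be sketched as follows. This is an illustrative reading, not the implementation from [9]: the function name, the use of ">=" for "n times larger", and skipping the first and last m positions are assumptions.

```python
from typing import List, Sequence


def detect_cuts(diffs: Sequence[float], m: int = 5, n: float = 2.0) -> List[int]:
    """Return the positions k in the difference sequence that are accepted as cuts.

    A position k is accepted if diffs[k] is the strict maximum of the
    2*m+1 values centered at k and is at least n times the second
    largest value in that window.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    cuts = []
    for k in range(m, len(diffs) - m):  # positions without a full window are skipped
        window = list(diffs[k - m:k + m + 1])
        peak = window[m]
        second = max(window[:m] + window[m + 1:])
        if peak > second and peak >= n * second:
            cuts.append(k)
    return cuts


# Hypothetical difference values with one clear peak at position 5.
diffs = [10, 12, 11, 9, 10, 95, 11, 10, 12, 9, 11]
print(detect_cuts(diffs, m=5, n=2.0))  # [5]
```

Because the threshold is relative to the neighboring values, the same n can be used for quiet and for busy passages of a video.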
2.2 MPEG compressed domain and DC frames
The overwhelming majority of digital video sequences is
available in a compressed format, mainly MPEG. MPEG
distinguishes between I, P, and B frames. An I-frame is a
frame encoded independently of other frames. The
encoding of a P-frame is based on either a previous I- or
P-frame (called reference frame), while the encoding of a
B-frame can be based on two reference frames, a previous
as well as a subsequent I- or P-frame. For small pixel
blocks in these P- and B-frames, motion vectors can be used that point to similar blocks in a reference frame.
For cut detection it is reasonable to use the MPEG bit
stream information with the lowest possible decoding
cost. For example, the advantage of DC images, i.e. sub-
images that consist of the dequantized DC coefficients of
the DCT (discrete cosine transform) blocks, is that they
still contain sufficient information for content analysis
(because a DC coefficient is equivalent to the average
value of a single DCT block consisting of 8*8 pixels)
while the effort to extract them from the bit stream is
much lower than for a complete frame. While the
extraction of DC coefficients is trivial for I-frames, for P-
and B-frames motion compensation has to be performed. An approximation with a tolerable error is
presented by Yeo and Liu [9] and Shen and Delp [7].
Their approaches avoid decoding the blocks referenced by
a motion vector.
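To make the notion of a DC image concrete, the sketch below averages each 8*8 block of a frame. It works on decoded pixel values purely for illustration; a compressed-domain detector would read the DC coefficients from the bit stream instead. The function name and the requirement that the frame dimensions are multiples of the block size are assumptions.

```python
from typing import List, Sequence


def dc_image(frame: Sequence[Sequence[int]], block: int = 8) -> List[List[float]]:
    """Replace every block x block tile of the frame by its average value."""
    height, width = len(frame), len(frame[0])
    if height % block or width % block:
        raise ValueError("frame dimensions must be multiples of the block size")
    result = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            total = sum(frame[y][x]
                        for y in range(top, top + block)
                        for x in range(left, left + block))
            row.append(total / (block * block))
        result.append(row)
    return result


# An 8x16 frame: the left block is uniformly 50, the right block uniformly 100.
frame = [[50] * 8 + [100] * 8 for _ in range(8)]
print(dc_image(frame))  # [[50.0, 100.0]]
```

The result has 1/64 of the original number of values, which is why frame differences on DC images are cheap to compute.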
2.3 High-quality cut detection algorithms
In recent years, several studies addressed the problem of
cut detection and reported very good detection results, e.g.
the algorithm of Yeo and Liu [9]. Chen et al. determine
the probability for a cut by applying a binary regression
tree to a multi-dimensional feature vector for each frame
[1], while Hanjalic proposed a statistical approach [4]. The
authors report very high detection rates, starting from
92% recall and 94% precision in [1] up to 100% for both
in [4]. Truong et al. [8] have proposed a local mean ratio filter (recall: 97.9%; precision: 97.5%) as an enhancement
to histogram-based cut detection algorithms in order to
reduce noise in frame difference sequences.
3. PROBLEM AND PROPOSED SOLUTION
3.1 Problem: Consequences of MPEG frame encoding
for cut detection algorithms
MPEG encoders try to achieve a certain bit rate so there is
not an arbitrary number of bits an encoder could use to
encode a frame. Since an I-frame is encoded without any
reference frame, the number of required bits is
high compared to P- or B-frames (inter
frames). In addition, the degree of accuracy can vary
depending on the frame type due to effects of motion
compensation, e.g. inaccuracy or quantization of
macroblock differences. Thus, often a bias can be found
in the estimated frame differences in MPEG videos. This
bias depends on the properties of the frames that were
involved in the calculation. The difference values can be
estimated e.g. either with histogram or pixel based
metrics. For a commonly used IPB-pattern like
“IBBPBBPBBPBBIB...” there are at least five different
frame type transitions: I to B, B to B, B to P, P to B and B to I. For example, for the video sequence “riscos-sl.mpg”
from the mentioned MPEG-7 test set the average
difference values for specific frame transitions were
calculated. These average values varied from 139.5 (B to
B) up to 367.5 (B to I).
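Average difference values per frame type transition, as reported above, can be obtained by grouping the difference sequence by the types of the two frames involved. The following sketch assumes that the frame types are available as a string such as "IBBPBB..." and that diffs[i] is the difference between frame i and frame i+1; the numbers in the example are made up and are not the values measured for “riscos-sl.mpg”.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Transition = Tuple[str, str]


def transition_averages(frame_types: str, diffs: Sequence[float]) -> Dict[Transition, float]:
    """Average frame difference for every frame type transition that occurs."""
    if len(diffs) != len(frame_types) - 1:
        raise ValueError("expected one difference per pair of consecutive frames")
    sums: Dict[Transition, float] = defaultdict(float)
    counts: Dict[Transition, int] = defaultdict(int)
    for i, d in enumerate(diffs):
        s = (frame_types[i], frame_types[i + 1])
        sums[s] += d
        counts[s] += 1
    return {s: sums[s] / counts[s] for s in sums}


# Hypothetical values for one short group of pictures.
averages = transition_averages("IBBPBBI", [200, 140, 180, 210, 130, 360])
for s, value in sorted(averages.items()):
    print(s, value)
```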
Now, the problem can be pointed out more clearly. Let
us assume using the approach from [9] with parameter
m=5 and the threshold value n=2 that in general gives
very good detection results. The frame difference position
with the maximum value within a sliding window is
accepted as a cut if it is n times larger than the second
largest value inside this window. In figure 1, the
histogram differences are shown for a nearly completely uneventful scene without any cut. Since the maximum
difference between frame 2881 and 2882 (B->I transition)
is more than n times larger than the second largest peak,
our detector concludes that there is a cut and thus
produces a false alarm. Many false alarms of this kind
have been found in different videos. Instead of just
increasing the parameter n which would result in a lower
recall rate, a method is proposed now to handle such false
alarms without reducing the recall rate.
We assume that the performance of the cut detection
algorithms mentioned in the previous section would
decrease if they were applied to the kind of MPEG videos
exhibiting the noise pattern as described above. They are
potentially affected because these algorithms use either
histogram or pixel based metrics for frame differences.
3.2 Solution: Frame Difference Normalization (FDN)
Our approach to eliminate the kind of errors described
above is called “Frame Difference Normalization”. First, a
set S of tuples for possible frame transition types is
defined where I, P and B match the MPEG frame types:
S={(I,I),(I,P),(I,B),(P,I),(P,P),(P,B),(B,I),(B,P),(B,B)} (1)
Let s be in S, and let d_{i,s} be the frame difference between
frame i and i+1 with the frame type transition s at this
position. Now, the average difference value for each
specific frame type transition s is calculated. Let f be the
number of frames in a given video, and let f_s be the
number of frame transitions of type s; then the average
value for a frame transition type s is estimated as shown in
formula (2), where p is a number between 0 and 1.
=
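The general idea of normalizing each difference by an average for its transition type can be sketched as follows. This is only one plausible reading and not the authors' formula (2): it assumes that the average for a transition type s is taken over the p * f_s smallest difference values of that type (so that real cuts do not inflate it) and that normalization means dividing d_{i,s} by this average. The function name and the example values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Transition = Tuple[str, str]


def normalize_differences(frame_types: str, diffs: Sequence[float],
                          p: float = 0.7) -> List[float]:
    """Divide each difference by a trimmed average of its transition type.

    Assumption: the average of type s uses the int(p * f_s) smallest
    values of that type (at least one value).
    """
    if len(diffs) != len(frame_types) - 1:
        raise ValueError("expected one difference per pair of consecutive frames")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    by_type: Dict[Transition, List[float]] = {}
    for i, d in enumerate(diffs):
        by_type.setdefault((frame_types[i], frame_types[i + 1]), []).append(d)
    average: Dict[Transition, float] = {}
    for s, values in by_type.items():
        keep = max(1, int(p * len(values)))
        average[s] = sum(sorted(values)[:keep]) / keep
    normalized = []
    for i, d in enumerate(diffs):
        avg = average[(frame_types[i], frame_types[i + 1])]
        normalized.append(d / avg if avg > 0 else d)
    return normalized


# Hypothetical sequence: B->I differences are biased (360), one real cut (1400).
types = "IBBIBBIBBI"
diffs = [150, 140, 360, 150, 1400, 360, 150, 140, 360]
print(normalize_differences(types, diffs, p=0.7))
# [1.0, 1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0, 1.0]
```

In this toy example the periodic B to I peaks disappear after normalization, while the real cut still stands out clearly.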
Table 1: Results for the Yeo/Liu implementation and its FDN
variant, both tested with different threshold values.
The local mean ratio filter is supposed to reduce noise and
accentuate peaks representing a cut. Both implementations
were tested with different threshold values and the FDN
technique as described in section 3 has been added to the
Y/L implementation.
A subset of the MPEG-7 test content has been used as
video test material consisting of 10 MPEG-1 videos from
different genres with a total length of over 400,000 frames
containing 2807 hard cuts. This subset has been randomly
chosen in order to get videos that were created with different encoders; in 5 out of 10 videos the noise
pattern caused by IPB frame types was present. The
results for the different tests are displayed in Tables 1 and
2, where the number of cuts actually present, the number of
detected cuts, the number of false alarms and the total
number of errors (missed cuts plus false alarms) are listed.
FDN leads to a significant error reduction. Table 1
demonstrates that there is a large reduction of false alarms
in case of n=2 while keeping a high number of correctly
detected cuts. This is an important aspect for those cases
where a high rate of correct detections is desired. In the
cases n>2 the total number of errors is reduced
noticeably, too. The FDN method leads to lower error
rates than our implementation of the more general local
mean ratio filter (see Table 2). We conclude that for
MPEG videos the systematic reduction of noise with FDN
is more effective than with a local mean ratio filter.
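The error totals in Table 1 follow from the other rows: errors are missed cuts plus false alarms, which is consistent with reading “Detected cuts” as correctly detected cuts. The short sketch below recomputes them from the Table 1 figures and derives the relative error reduction of FDN over the plain Y/L implementation; the function name and the data layout are illustrative assumptions.

```python
ACTUAL_CUTS = 2807

# Threshold n -> variant -> (detected cuts, false alarms), taken from Table 1.
TABLE_1 = {
    2.0: {"Y/L": (2654, 963), "FDN": (2652, 387)},
    2.5: {"Y/L": (2560, 227), "FDN": (2576, 202)},
    3.0: {"Y/L": (2439, 131), "FDN": (2475, 144)},
}


def total_errors(detected: int, false_alarms: int, actual: int = ACTUAL_CUTS) -> int:
    """Missed cuts plus false alarms."""
    return (actual - detected) + false_alarms


for n, variants in TABLE_1.items():
    yl = total_errors(*variants["Y/L"])
    fdn = total_errors(*variants["FDN"])
    reduction = 100.0 * (yl - fdn) / yl
    print(f"n={n}: Y/L errors={yl}, FDN errors={fdn}, reduction={reduction:.1f}%")
```

For n=2 this gives 1116 errors without and 542 errors with FDN, i.e. roughly half as many.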
5. CONCLUSIONS
A method to reduce noise caused by the IPB-frame pattern
in MPEG streams has been presented in this paper.
Considering this noise pattern and normalizing the
difference values with the FDN method led to a noticeable
error reduction. Overall, the systematic noise reduction in MPEG videos with FDN was superior to a more general
noise filter, a local mean ratio filter. The advantage of
FDN for MPEG videos is that it reduces a specific noise
pattern while a general noise filter smoothes all difference
values in the same way. Furthermore, it can be stated that
FDN could be added to a number of different algorithms,
like [1], [4], [7], [9]. In future work, the reasons for the
described IPB pattern should be analyzed in detail, e.g. by
investigating factors such as the impact of bit rate, the chosen
encoder and so on. Considering such encoding parameters
could enhance a lot of video analysis algorithms. Also, the
effects of adding the FDN technique to other cut detection
algorithms should be analyzed. Finally, dissolve detection
algorithms for MPEG videos could benefit from FDN.

Implement.       LMI     LMI     LMI     LMI
Threshold        α=4     α=4.5   α=5     α=6
No. of cuts      2807    2807    2807    2807
Detected cuts    2680    2633    2591    2434
False alarms     677     403     265     161
Errors           804     577     481     534

Table 2: Results for the Yeo/Liu implementation extended with
a local mean ratio filter, tested with different threshold values.
6. ACKNOWLEDGEMENT
This work is financially supported by the Deutsche
Forschungsgemeinschaft (SFB/FK 615, Project MT). The
authors would like to thank M. Grube and J. Waldhans for
their implementation work and J. Gllavata, M. Gollnick, M. Grauer, F. Mansouri, E. Papalilo, R. Sennert and J.
Wagner for their valuable support.
7. REFERENCES
[1] J.-Y. Chen, C. Taskiran, A. Albiol, E. J. Delp and C. A.
Bouman, “ViBE: A Compressed Video Database Structured for
Active Browsing and Search”, to appear in IEEE Transactions
on Multimedia, 2003.
[2] T. Chua, M. Kankanhalli and Y. Lin, “A General Framework
for Video Segmentation Based on Temporal Multi-Resolution
Analysis” in Proc. of Int’l Workshop on Advanced Image
Technology, pp. 119-124, Fujisawa, Japan 2000.
[3] U. Gargi, R. Kasturi and S. H. Strayer, “Performance
Characterization of Video-Shot-Change Detection Methods” in
IEEE Transactions on Circuits and Systems for Video
Technology, Vol. 10, No. 1, pp. 1-13, 2000.
[4] A. Hanjalic, “Shot Boundary Detection: Unraveled and
Resolved?,” IEEE Transactions on Circuits and Systems for
Video Technology, Vol. 12, No. 2, pp. 90-105, 2002.
[5] I. Koprinska and S. Carrato, “Temporal Video Segmentation:
A Survey” in Signal Processing: Image Communication, Vol. 16,
pp. 477-500, 2001.
[6] R. Lienhart, “Reliable Transition Detection in Videos: A
Survey and Practitioner’s Guide.” to appear in International
Journal of Image and Graphics, 2003.
[7] K. Shen and E. J. Delp, “A Fast Algorithm for Video Parsing
Using MPEG Compressed Sequences”, in Proc. of IEEE ICIP
1995, Washington, DC., pp. 252-255, 1995.
[8] B. T. Truong, C. Dorai and S. Venkatesh, “New
Enhancements to Cut, Fade, and Dissolve Detection Processes in
Video Segmentation” in Proc. ACM Multimedia 2000, pp. 219-
227, 2000.
[9] B. Yeo and B. Liu, “Rapid Scene Analysis on Compressed
Video” IEEE Transactions on Circuits and Systems for Video
Technology, Vol. 5, No. 6, pp. 533-544, 1995.
Implement.       Y/L     FDN     Y/L     FDN     Y/L     FDN
Threshold        n=2     n=2     n=2.5   n=2.5   n=3     n=3
No. of cuts      2807    2807    2807    2807    2807    2807
Detected cuts    2654    2652    2560    2576    2439    2475
False alarms     963     387     227     202     131     144
Errors           1116    542     474     433     499     476