Deteksi Mata rabun

  • 8/16/2019 Deteksi Mata rabun

    FRAME DIFFERENCE NORMALIZATION: AN APPROACH TO REDUCE ERROR RATES OF CUT DETECTION ALGORITHMS FOR MPEG VIDEOS

    Ralph Ewerth (1) and Bernd Freisleben (1,2)

    (1) SFB/FK 615, University of Siegen, D-57068 Siegen, Germany
    (2) Dept. of Math. and Computer Science, University of Marburg, D-35032 Marburg, Germany

    {ewerth, freisleb}@informatik.uni-marburg.de

    ABSTRACT

    The segmentation of video sequences into shots is the first

    step towards video content analysis. Two kinds of shot

    boundaries can be distinguished: abrupt scene changes (“cuts”) and gradual transitions. In this paper, we present

    a technique to reduce the error rates of cut detection

    algorithms based on pixel-wise or histogram-based frame

    difference metrics when operating directly on compressed

    MPEG video data. The proposed approach, called “Frame

    Difference Normalization” (FDN), intends to eliminate

    the effects of a specific frame pattern in MPEG streams

    responsible for causing such errors. Experimental results

    will be presented to demonstrate the benefits of our

     proposal and its superiority over a more general noise

    filter. Furthermore, the proposed method is not limited to

    a particular algorithm but it is applicable to an entire class

    of cut detection algorithms.

    1. INTRODUCTION

    Several research efforts have been made in recent years to

    address the problem of detecting shot boundaries in digital

    videos. There are two kinds of shot boundaries: (a) abrupt

    scene changes (called “cuts”), and (b) gradual transitions

     between two different shots. This paper focuses on the

     problem of detecting cuts. Lienhart [6] states that a cut is

    defined as the direct concatenation of two shots with no transitional frames involved; cuts lead to a perceptible

    temporal visual discontinuity.

    One could argue that the problem of detecting cuts has

     been solved satisfactorily since many researchers reported

    very good recall and precision rates ranging up to 100%,

    where recall is the number of correctly detected cuts

    divided by the number of cuts that actually exist, and

     precision is the number of correctly detected cuts divided

     by the total number of detected cuts (including “false

    alarms”). However, in the test set used in our experiments

    there were many MPEG videos for which a high-quality

    cut detection algorithm produced a systematic detection

    error. It can be shown that there often is a bias in frame

    differences depending on MPEG specific frame types. In

    this paper, a method to handle such errors is proposed.
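    The two quality measures defined above can be written compactly. The symbols C (correctly detected cuts), M (missed cuts) and F (false alarms) are introduced here for illustration only and are not part of the original notation; the number of cuts that actually exist is then C + M, and the total number of detected cuts is C + F.

```latex
\[
\text{recall} = \frac{C}{C + M},
\qquad
\text{precision} = \frac{C}{C + F}
\]
```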

    The basic idea of our approach is to adequately normalize

    frame differences to increase the detection quality. The performance of our proposal will be demonstrated by

     presenting experimental results for examples taken from

    the MPEG-7 test content set that has been suggested as

    the standard test set for video segmentation research in

    [2]. Furthermore, it is worth mentioning that the approach

    is general in the sense that it can be used in conjunction

    with an entire class of cut detection algorithms, namely

    those based on pixel-wise or histogram-based frame

    difference metrics and operating on MPEG videos.

    This paper is organized as follows. In section 2, some

     basic principles of cut detection algorithms are explained

    and some recent developments are mentioned. In section

    3, the reason for a specific kind of errors will be discussed

    and our solution to this problem is presented: “Frame

    Difference Normalization” (FDN). Section 4 presents

    experimental results obtained with an implementation of a

     particular cut detection algorithm in different test

    conditions. Section 5 concludes the paper and outlines

    areas for future research.

    2. RELATED WORK

    2.1 Frame to frame differences and thresholds

    Considering Lienhart’s definition mentioned above, it is reasonable to look at the differences between two

    consecutive frames to detect a cut. A large number of

    different metrics has been defined to estimate frame

    differences (e.g. [5], [6], [9]). A straightforward approach

    is to measure the differences between the particular pixels

    of consecutive frames, but this approach is very sensitive

    to object motion, camera motion, brightness changes and

    noise. Hanjalic [4] includes motion estimation and

    compensation for small sub-images to remove artifacts

     based on motion. Many approaches (e.g. [7] and [9])

     propose the usage of histograms since they are less

    sensitive to motion and the other events mentioned above.
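    The contrast between pixel-wise and histogram-based metrics can be illustrated with a minimal sketch. The concrete metrics used here (sum of absolute differences, four equal-width intensity bins) and all function names are assumptions chosen for illustration; the cited papers define their own variants.

```python
def pixel_difference(frame_a, frame_b):
    """Sum of absolute differences between corresponding pixels."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b))


def histogram(frame, bins=4, max_value=256):
    """Count how many pixels fall into each of `bins` equal-width intensity bins."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // max_value] += 1
    return counts


def histogram_difference(frame_a, frame_b, bins=4):
    """Sum of absolute bin-wise differences between the two frame histograms."""
    hist_a = histogram(frame_a, bins)
    hist_b = histogram(frame_b, bins)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


# Two tiny greyscale "frames" given as flattened pixel lists. The second frame
# holds the same pixel values in a different order, imitating object motion.
frame_1 = [10, 10, 200, 200]
frame_2 = [200, 200, 10, 10]

pixel_diff = pixel_difference(frame_1, frame_2)     # large: every pixel changed
hist_diff = histogram_difference(frame_1, frame_2)  # zero: same value distribution
```

    In this toy case the pixel-wise metric reacts strongly to the rearrangement, while the histogram metric does not react at all, which is the robustness to motion that the text refers to.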

    The estimated difference between consecutive frames
    is commonly used to decide whether there is a cut at frame
    k. For this purpose, a threshold is used in the following way: if
    the frame difference from frame k-1 to frame k exceeds a
    given value t, then there is a cut at this position. However,
    applying a global threshold value t to an entire sequence
    results in many false alarms and missed cuts. Furthermore,

    the problem of determining an appropriate value t must be

    solved. To address the first issue, Yeo and Liu [9] suggest

    a sliding window technique and use a local threshold

    within such a window. This window consists of 2*m+1 

    frame differences, for a small m > 0. To decide whether

    there is a cut at position k, the frame differences between

    the neighboring frames are taken into account. A peak at

    frame position k   is considered as a cut only if it is the

    maximum value and n times larger than the second largest

     peak in the window. This principle has been used in many

    variations ([4], [7], [8], [9]) and typically results in a

    robust detection performance.
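    The sliding window rule described above can be sketched as follows. This is a minimal illustration, not the implementation of [9]: the function name, the indexing convention (diffs[k] is the difference from frame k-1 to frame k) and the decision to skip positions closer than m to either end of the sequence are assumptions.

```python
def detect_cuts(diffs, m=5, n=2.0):
    """Return positions k accepted as cuts by the local-threshold rule.

    A position is accepted if its value is the maximum of the window of
    2*m+1 differences centred on it and at least n times the second
    largest value in that window.
    """
    cuts = []
    for k in range(m, len(diffs) - m):
        window = diffs[k - m:k + m + 1]
        centre = diffs[k]
        # Largest value in the window apart from the centre position
        second = max(window[:m] + window[m + 1:])
        if centre > second and centre >= n * second:
            cuts.append(k)
    return cuts


# Hypothetical difference sequence with one pronounced peak at position 5
diffs = [10, 12, 11, 9, 10, 300, 11, 10, 12, 9, 10, 11, 10]
cuts = detect_cuts(diffs, m=3, n=2.0)
```

    Because the threshold is relative to the neighbouring values, the same rule adapts to quiet and busy passages of a video without a global threshold t.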

    2.2 MPEG compressed domain and DC frames

    The overwhelming majority of digital video sequences is

    available in a compressed format, mainly MPEG. MPEG

    distinguishes between I, P, and B frames. An I-frame is a

    frame encoded independently of other frames. The

    encoding of a P-frame is based on either a previous I- or

    P-frame (called reference frame), while the encoding of a

    B-frame can be based on two reference frames, a previous

    as well as a subsequent I- or P-frame. For small pixel

    blocks in these P- and B-frames, motion vectors can be used that point to similar blocks in a reference frame.

    For cut detection it is reasonable to use the MPEG bit

    stream information with the lowest possible decoding

    cost. For example, the advantage of DC images, i.e. sub-

    images that consist of the dequantized DC coefficients of

    the DCT (discrete cosine transform) blocks, is that they

    still contain sufficient information for content analysis

    (because a DC coefficient is equivalent to the average

    value of a single DCT block consisting of 8*8 pixels)

    while the effort to extract them from the bit stream is

    much lower than for a complete frame. While the

    extraction of DC coefficients is trivial for I-frames, for P-

    and B-frames motion compensation has to be performed. An approximation with a tolerable error is

     presented by Yeo and Liu [9] and Shen and Delp [7].

    Their approaches avoid decoding the blocks referenced by

    a motion vector.
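    To illustrate what a DC image contains, the following sketch averages every 8*8 block of an already decoded frame. This is only an illustration of the resulting low-resolution image: a compressed-domain implementation would read the DC coefficients from the bit stream instead of decoding pixels, and the function name and frame layout (a list of rows) are assumptions.

```python
def dc_image(frame, block=8):
    """Approximate a DC image by averaging each block x block tile of a frame.

    `frame` is a list of rows of luminance values whose width and height are
    multiples of `block`. The result holds one value per tile.
    """
    height, width = len(frame), len(frame[0])
    result = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            total = sum(
                frame[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            row.append(total / (block * block))
        result.append(row)
    return result


# A 16x16 frame whose left half is dark (0) and whose right half is bright (255)
frame = [[0] * 8 + [255] * 8 for _ in range(16)]
small = dc_image(frame)  # 2x2 image, one value per 8x8 block
```

    The reduced image has 1/64 of the original number of values, which is why frame differences on DC images are cheap to compute.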

    2.3 High-quality cut detection algorithms 

    In recent years, several studies addressed the problem of

    cut detection and reported very good detection results, e.g.

    the algorithm of Yeo and Liu [9]. Chen et al. determine

    the probability for a cut by applying a binary regression

    tree to a multi-dimensional feature vector for each frame

    [1], while Hanjalic proposed a statistical approach [4]. The

    authors report very high detection rates, starting from

    92% recall and 94% precision in [1] up to 100% for both

    in [4]. Truong et al. [8] have proposed a local mean ratio filter  (recall: 97.9%; precision: 97.5%) as an enhancement

    to histogram-based cut detection algorithms in order to

    reduce noise in frame difference sequences.

    3. PROBLEM AND PROPOSED SOLUTION

    3.1 Problem: Consequences of MPEG frame encoding

    for cut detection algorithms

    MPEG encoders try to achieve a certain bit rate, so an
    encoder cannot spend an arbitrary number of bits on a
    frame. Since an I-frame is encoded without any
    reference frame, the number of required bits is
    high relative to P- or B-frames (inter
    frames). On the other hand, the degree of accuracy can vary
    depending on the frame type due to effects of motion
    compensation, e.g. inaccuracy or quantization of
    macroblock differences. Thus, a bias can often be found

    in the estimated frame differences in MPEG videos. This

     bias depends on the properties of the frames that were

    involved in the calculation. The difference values can be

    estimated e.g. either with histogram or pixel based

    metrics. For a commonly used IPB-pattern like

    “IBBPBBPBBPBBIB...” there are at least five different

    frame type transitions: I to B, B to B, B to P, P to B and B to I. For example, for the video sequence “riscos-sl.mpg”

    from the mentioned MPEG-7 test set the average

    difference values for specific frame transitions were

    calculated. These average values varied from 139.5 (B to

    B) up to 367.5 (B to I).
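    The five transition types named above can be derived directly from the frame pattern. The short sketch below, with variable names chosen for illustration, enumerates the transitions in the pattern prefix quoted in the text and counts how often each one occurs.

```python
from collections import Counter

pattern = "IBBPBBPBBPBBIB"  # frame types in the order quoted in the text

# Each transition is the pair (type of frame i, type of frame i+1)
transitions = list(zip(pattern, pattern[1:]))
counts = Counter(transitions)

for (src, dst), count in sorted(counts.items()):
    print(f"{src} to {dst}: {count}")
```

    Within one such group of frames the B to I transition occurs only once, so its difference value stands alone among many B to B, B to P and P to B values.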

    Now the problem can be stated more clearly. Let
    us assume that the approach from [9] is used with parameter
    m=5 and threshold value n=2, a setting that in general gives
    very good detection results. The frame difference position

    with the maximum value within a sliding window is

    accepted as a cut if it is n  times larger than the second

    largest value inside this window. In figure 1, the

    histogram differences are shown for a nearly completely uneventful scene without any cut. Since the maximum

    difference between frame 2881 and 2882 (B->I transition)

    is more than n times larger than the second largest peak,

    our detector concludes that there is a cut and thus

     produces a false alarm. Many false alarms of this kind

    have been found in different videos. Instead of just

    increasing the parameter n which would result in a lower

    recall rate, a method is proposed now to handle such false

    alarms without reducing the recall rate.
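    The mechanism behind such a false alarm can be reproduced with a small numerical sketch. Only the values 139.5 (B to B) and 367.5 (B to I) are the averages quoted in the text; the values for the other transition types and the chosen frame pattern are invented for illustration.

```python
# Hypothetical typical differences per transition type in an uneventful scene.
# Only "BB" and "BI" are taken from the text; the rest are assumed values.
TYPICAL = {"IB": 170.0, "BB": 139.5, "BP": 160.0, "PB": 165.0, "BI": 367.5}

# 12 frames give 11 transitions, i.e. exactly one window for m=5,
# with the B to I transition in the centre of the window.
pattern = "PBBPBBIBBPBB"
window = [TYPICAL[a + b] for a, b in zip(pattern, pattern[1:])]

m, n = 5, 2
centre = window[m]                          # the B to I difference
second = max(window[:m] + window[m + 1:])   # largest remaining value
false_alarm = centre > second and centre >= n * second
```

    Although the scene contains no cut, the B to I difference is more than n=2 times the second largest value in the window, so the rule accepts it.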

    We assume that the performance of the cut detection

    algorithms mentioned in the previous section would

    decrease if they were applied to the kind of MPEG videos

    exhibiting the noise pattern as described above. They are

     potentially affected because these algorithms use either

    histogram or pixel based metrics for frame differences.

    3.2 Solution: Frame Difference Normalization (FDN)

    Our approach to eliminate the kind of errors described

    above is called “Frame Difference Normalization”. First, a

    set S   of tuples for possible frame transition types is

    defined where I, P and B match the MPEG frame types:

    S={(I,I),(I,P),(I,B),(P,I),(P,P),(P,B),(B,I),(B,P),(B,B)} (1)

    Let s be in S, and d_{i,s} the frame difference between
    frame i and i+1 with the frame type transition s at this
    position. Now, the average difference value for each
    specific frame type transition s is calculated. Let f be the
    number of frames in a given video, and let f_s be the
    number of frame transitions of type s; then the average
    value for a frame transition type s is estimated as shown in
    formula (2), where p is a number between 0 and 1.
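    The bookkeeping behind these definitions can be sketched as follows. The sketch builds the set S of formula (1), counts f_s, and computes a plain arithmetic mean of d_{i,s} per transition type. The plain mean is an assumption made for illustration: the role of the parameter p is not modelled here, and the function and variable names are hypothetical.

```python
from itertools import product

FRAME_TYPES = "IPB"
# The set S of formula (1): all ordered pairs of frame types
S = set(product(FRAME_TYPES, repeat=2))


def transition_averages(frame_types, diffs):
    """Plain arithmetic mean of d_{i,s} for each transition type s that occurs.

    frame_types[i] is the type of frame i; diffs[i] is the difference between
    frame i and frame i+1, so len(diffs) == len(frame_types) - 1.
    """
    sums = {s: 0.0 for s in S}
    f_s = {s: 0 for s in S}  # number of transitions of type s
    for i, d in enumerate(diffs):
        s = (frame_types[i], frame_types[i + 1])
        sums[s] += d
        f_s[s] += 1
    # Transition types that never occur have no average and are left out
    return {s: sums[s] / f_s[s] for s in S if f_s[s] > 0}


# Hypothetical frame types and difference values
types = "IBBPBBI"
diffs = [200.0, 100.0, 150.0, 160.0, 120.0, 400.0]
averages = transition_averages(types, diffs)
```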

    Table 1: Results for the Yeo/Liu implementation and its FDN

    variant, both tested with different threshold values.

    The local mean ratio filter is supposed to reduce noise and

    accentuate peaks representing a cut. Both implementations

    were tested with different threshold values and the FDN

    technique as described in section 3 has been added to the

    Y/L implementation.

    A subset of the MPEG-7 test content has been used as

    video test material consisting of 10 MPEG-1 videos from

    different genres with a total length of over 400,000 frames
    containing 2807 hard cuts. This subset has been randomly
    chosen in order to get videos that were created with
    different encoders; in 5 out of 10 videos, the noise
    pattern caused by IPB frame types was present. The
    results for the different tests are displayed in Tables 1 and
    2, where the number of cuts that actually exist, the number of
    detected cuts, the number of false alarms and the total
    number of errors (missed cuts plus false alarms) are listed.

    FDN leads to a significant error reduction. Table 1

    demonstrates that there is a large reduction of false alarms

    in case of n=2 while keeping a high number of correctly

    detected cuts. This is an important aspect for those cases

    where a high rate of correct detections is desired. In the

    cases n>2 the total number of errors is reduced

    noticeably, too. The FDN method leads to lower error

    rates than our implementation of the more general local

    mean ratio filter (see Table 2). We conclude that for

    MPEG videos the systematic reduction of noise with FDN

    is more effective than with a local mean ratio filter.
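    The counts in the tables can be converted into the recall and precision measures defined in the introduction. The sketch below does this for the n=2 columns of Table 1. It assumes that the row "Detected cuts" counts correctly detected cuts only, which is consistent with the listed error totals (missed cuts plus false alarms); the function name is hypothetical.

```python
def recall_precision(total_cuts, detected, false_alarms):
    """Return (recall, precision) from the counts listed in the result tables."""
    recall = detected / total_cuts
    precision = detected / (detected + false_alarms)
    return recall, precision


# Counts for n=2 taken from Table 1: (existing cuts, detected cuts, false alarms)
results = {
    "Y/L n=2": (2807, 2654, 963),
    "FDN n=2": (2807, 2652, 387),
}

for name, (total, detected, false_alarms) in results.items():
    recall, precision = recall_precision(total, detected, false_alarms)
    errors = (total - detected) + false_alarms  # missed cuts plus false alarms
    print(f"{name}: recall={recall:.3f} precision={precision:.3f} errors={errors}")
```

    For these two columns the recall is nearly identical, while the precision differs considerably because of the lower number of false alarms with FDN.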

    5. CONCLUSIONS

    A method to reduce noise caused by the IPB-frame pattern

    in MPEG streams has been presented in this paper.

    Considering this noise pattern and normalizing the

    difference values with the FDN method led to a noticeable

    error reduction. Overall, the systematic noise reduction in MPEG videos with FDN was superior to a more general

    noise filter, a local mean ratio filter. The advantage of

    FDN for MPEG videos is that it reduces a specific noise

     pattern while a general noise filter smoothes all difference

    values in the same way. Furthermore, it can be stated that

    FDN could be added to a number of different algorithms,

    like [1], [4], [7], [9]. In future work, the reasons for the
    described IPB pattern should be analyzed in detail, e.g. by
    investigating factors such as the impact of bit rate, the chosen
    encoder and so on. Considering such encoding parameters
    could enhance a lot of video analysis algorithms. Also, the
    effects of adding the FDN technique to other cut detection
    algorithms should be analyzed. Finally, dissolve detection
    algorithms for MPEG videos could benefit from FDN.

    Implement.       LMI     LMI      LMI     LMI
    Threshold        α=4     α=4.5    α=5     α=6
    No. of cuts      2807    2807     2807    2807
    Detected cuts    2680    2633     2591    2434
    False alarms     677     403      265     161
    Errors           804     577      481     534

    Table 2: Results for the Yeo/Liu implementation extended with
    a local mean ratio filter tested with different threshold values.

    6. ACKNOWLEDGEMENT

    This work is financially supported by the Deutsche

    Forschungsgemeinschaft (SFB/FK 615, Project MT). The

    authors would like to thank M. Grube and J. Waldhans for

    their implementation work and J. Gllavata, M. Gollnick, M. Grauer, F. Mansouri, E. Papalilo, R. Sennert and J.

    Wagner for their valuable support.

    7. REFERENCES

    [1] J.-Y. Chen, C. Taskiran, A. Albiol, E. J. Delp and C. A.

    Bouman, “ViBE: A Compressed Video Database Structured for

    Active Browsing and Search”, to appear in  IEEE Transactions

    on Multimedia, 2003.

    [2] T. Chua, M. Kankanhalli and Y. Lin, “A General Framework

    for Video Segmentation Based on Temporal Multi-Resolution

    Analysis” in  Proc. of Int’l Workshop on Advanced Image

    Technology, pp. 119-124, Fujisawa, Japan 2000.

    [3] U. Gargi, R. Kasturi and S. H. Strayer, “Performance

    Characterization of Video-Shot-Change Detection Methods” in

    IEEE Transactions on Circuits and Systems for Video

    Technology, Vol. 10, No. 1, pp. 1-13, 2000.

    [4] A. Hanjalic, “Shot Boundary Detection: Unraveled and

    Resolved?,”  IEEE Transactions on Circuits and Systems for

    Video Technology, Vol. 12, pp. 533-544, 2002.

    [5] I. Koprinska and S. Carrato, “Temporal Video Segmentation:

    A Survey” in Signal Processing: Image Communication 16

    (2001), pp. 477-500, 2001.

    [6] R. Lienhart, “Reliable Transition Detection in Videos: A

    Survey and Practitioner’s Guide.” to appear in  International

     Journal of Image and Graphics, 2003

    [7] K. Shen and E. J. Delp, “A Fast Algorithm for Video Parsing

    Using MPEG Compressed Sequences”, in Proc. of IEEE ICIP

    1995, Washington, DC., pp. 252-255, 1995.

    [8] B. T. Truong, C. Dorai and S. Venkatesh, “New

    Enhancements to Cut, Fade, and Dissolve Detection Processes in

    Video Segmentation” in  Proc. ACM Multimedia 2000, pp. 219-

    227, 2000.

    [9] B. Yeo and B. Liu, “Rapid Scene Analysis on Compressed

    Video”, IEEE Transactions on Circuits and Systems for Video

    Technology, Vol. 5, No. 6, pp. 533-544, 1995. 

    Implement.       Y/L     FDN     Y/L      FDN      Y/L     FDN
    Threshold        n=2     n=2     n=2.5    n=2.5    n=3     n=3
    No. of cuts      2807    2807    2807     2807     2807    2807
    Detected cuts    2654    2652    2560     2576     2439    2475
    False alarms     963     387     227      202      131     144
    Errors           1116    542     474      433      499     476