
Video Shot Boundary Detection at RMIT University

Timo Volkmer, Saied Tahaghoghi, and Hugh E. Williams
School of Computer Science & IT, RMIT University

{tvolkmer, saied, hugh}@cs.rmit.edu.au

Overview

Our general approach

The moving query window

Details of the approach

How we measure frame similarity

Improvements for 2004 cut detection

Detection of gradual transitions

Evaluation

Experimental results

Conclusions

The Moving Query Window

A moving query window consists of two equal-sized half windows, surrounding a current frame

The moving query window is advanced through the video frame-by-frame

[Diagram: the pre-frame half-window, the current frame, and the post-frame half-window]

Cut detection and gradual transition detection are performed in separate decision stages during a single pass
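As a rough illustration, a minimal Python sketch of how such a window might be advanced through a frame sequence; the half-window size and the names used here are assumptions for illustration, not the RMIT implementation:

```python
# Minimal sketch of a moving query window (illustrative only).
from collections import deque

HALF_WINDOW = 12  # assumed half-window size; the slides report a 24-frame window for news


def moving_query_windows(frames):
    """Yield (pre_frames, current_frame, post_frames) as the window advances
    through the video one frame at a time."""
    buffer = deque(maxlen=2 * HALF_WINDOW + 1)
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == buffer.maxlen:
            window = list(buffer)
            pre = window[:HALF_WINDOW]       # half-window before the current frame
            current = window[HALF_WINDOW]    # frame under examination
            post = window[HALF_WINDOW + 1:]  # half-window after the current frame
            yield pre, current, post
```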

Frame feature representation

We use one-dimensional, localised histograms in the HSV colour space, computed over a 4x4 grid of frame regions (16 bins per colour component)

A colour histogram represents each frame region. Corresponding regions are compared

Different weights can be applied to each region during comparison
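A minimal sketch of this representation, assuming OpenCV and NumPy; the L1 region distance and the equal default weights are illustrative choices, not necessarily the exact measure used in the RMIT system:

```python
# Minimal sketch of localised HSV histograms over a 4x4 grid of regions.
import cv2
import numpy as np

REGIONS = 4   # 4x4 grid of regions
BINS = 16     # bins per colour component


def frame_histograms(frame_bgr):
    """Return one HSV histogram (3 x 16 bins) per region of a 4x4 grid."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    h, w = hsv.shape[:2]
    histograms = []
    for row in range(REGIONS):
        for col in range(REGIONS):
            region = hsv[row * h // REGIONS:(row + 1) * h // REGIONS,
                         col * w // REGIONS:(col + 1) * w // REGIONS]
            channels = [cv2.calcHist([region], [c], None, [BINS],
                                     [0, 180] if c == 0 else [0, 256]).ravel()
                        for c in range(3)]
            hist = np.concatenate(channels)
            histograms.append(hist / hist.sum())  # normalise each region histogram
    return histograms


def frame_distance(hists_a, hists_b, weights=None):
    """Weighted sum of per-region L1 distances between corresponding regions
    (illustrative distance; equal weights unless specified)."""
    weights = weights or [1.0] * len(hists_a)
    return sum(w * np.abs(a - b).sum()
               for w, a, b in zip(weights, hists_a, hists_b))
```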

Cut detection

We disregard the four central regions of each frame to avoid the effect of rapid activity (that is, their weight = 0)

Using the remaining regions, each frame in the moving window is ranked by decreasing similarity to the current frame

Frame similarity is the sum of the inter-region similarities

The number of pre-frames that are ranked in the top half of the rankings is monitored

When a cut is passed, the number of top ranked pre-frames (usually) rises to a maximum and falls to a minimum within a few frames

We have determined an optimum window size and optimum thresholds that are effective for all our training sets

Our cut detection is (now) parameter-free
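A simplified sketch of the ranking stage, in Python; frame_distance is the illustrative measure from the previous sketch, and the peak-and-trough check with its high/low arguments is a rough stand-in for the tuned, parameter-free decision described above:

```python
# Simplified sketch of ranking-based cut detection (thresholds are placeholders).
def top_half_pre_frame_count(pre, current, post, distance):
    """Rank all window frames by similarity to the current frame and count how
    many pre-frames fall in the top half of the ranking."""
    candidates = [('pre', f) for f in pre] + [('post', f) for f in post]
    # A smaller distance means a higher similarity, so sort ascending by distance.
    ranked = sorted(candidates, key=lambda item: distance(current, item[1]))
    top_half = ranked[:len(ranked) // 2]
    return sum(1 for side, _ in top_half if side == 'pre')


def looks_like_cut(counts, high, low):
    """Declare a cut when the pre-frame count rises to a maximum and then falls
    to a minimum within the monitored span of frames (simplified check)."""
    peak = max(counts)
    trough_after_peak = min(counts[counts.index(peak):])
    return peak >= high and trough_after_peak <= low
```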

Gradual transition detection

Pre-frames and post-frames are combined into two distinct sets of frames. The average distance of each set to the current frame is computed

We use all frame regions (with identical weights)

The ratio between the pre-frame set distance and the post-frame set distance, the PrePostRatio, is monitored

The end of most gradual transitions is indicated by a peak in the PrePostRatio curve

We maintain a moving average PrePostRatio for calculating a dynamic threshold to detect transitions

As a final decision step, we require a minimum difference between the last frame of the previous shot and the first frame of the new shot
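A minimal Python sketch of the PrePostRatio and the dynamic threshold; the history length and threshold factor shown are assumed placeholder values, and the final minimum-difference check between shots is omitted:

```python
# Minimal sketch of PrePostRatio-based gradual transition detection.
from collections import deque


def pre_post_ratio(pre, current, post, distance):
    """Ratio of the average pre-frame distance to the average post-frame
    distance, both measured against the current frame (all regions, equal weights)."""
    pre_dist = sum(distance(current, f) for f in pre) / len(pre)
    post_dist = sum(distance(current, f) for f in post) / len(post)
    return pre_dist / post_dist if post_dist > 0 else float('inf')


class GradualTransitionDetector:
    def __init__(self, history_length=50, threshold_factor=2.0):  # assumed values
        self.history = deque(maxlen=history_length)
        self.threshold_factor = threshold_factor

    def update(self, ratio):
        """Flag a possible end of transition when the ratio peaks above a
        dynamic threshold derived from the moving average of recent ratios."""
        if self.history:
            threshold = (sum(self.history) / len(self.history)) * self.threshold_factor
        else:
            threshold = float('inf')
        self.history.append(ratio)
        return ratio > threshold
```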

PrePostRatio in detail

A schematised dissolve between shot A and shot B:

[Diagram: the query window (pre-frames, current frame, post-frames) at successive positions across the dissolve. As B frames progressively enter the post half-window and then the pre half-window, the PrePostRatio is minimal, slowly rising, steeply rising, at its maximum, and then falling.]

The PrePostRatio is usually minimal at the beginning of a gradual transition and rises up to a maximum at the end of the transition

PrePostRatio curve example

The curve shows two short gradual transitions and two cuts within a range of 1000 frames

[Plot: PrePostRatio against frame number, for frames 0 to 1000]

Training and Evaluation

We have trained on the TRECVID 2003 shot boundary test set

Main parameters for gradual transition detection (see the sketch after this list) are:

The query window size

The size of the history buffer for dynamic thresholding

A threshold level factor
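An illustrative grouping of these parameters in Python; only the 24-frame window for television news is taken from these slides, the other values are placeholders:

```python
# Illustrative parameter set; values other than the window size are assumptions.
GRADUAL_TRANSITION_PARAMS = {
    'query_window_size': 24,        # frames; the slides report a fixed 24-frame window for news
    'history_buffer_length': 50,    # assumed length of the PrePostRatio history for dynamic thresholding
    'threshold_level_factor': 2.0,  # assumed multiplier applied to the moving-average threshold
}
```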

Results are discussed on the next slides. (We achieve similar or better results on the 2002 and 2001 test sets in blind runs.)

Results at TRECVID 2004

SysID     All                  Cuts                 Gradual Transitions
          Recall   Precision   Recall   Precision   Recall   Precision
rmit1     0.915    0.829       0.944    0.922       0.852    0.671
rmit2     0.901    0.850       0.944    0.921       0.810    0.714
rmit3     0.907    0.859       0.944    0.921       0.828    0.738
rmit4     0.893    0.870       0.944    0.921       0.783    0.762
rmit5     0.897    0.877       0.944    0.921       0.798    0.782
rmit6     0.883    0.885       0.944    0.921       0.753    0.802
rmit7     0.889    0.890       0.944    0.921       0.772    0.819
rmit8     0.871    0.893       0.944    0.921       0.715    0.824
rmit9     0.881    0.899       0.944    0.921       0.746    0.844
rmit10    0.860    0.900       0.944    0.921       0.681    0.844

Overall results

[Scatter plot: average precision against average recall, both from 0.7 to 1.0, for all transitions; the numbered RMIT runs are plotted alongside the submissions of other groups]

Frame recall and precision for gradual transitions

[Scatter plot: average frame precision against average frame recall, both from 0.6 to 1.0, for gradual transitions; RMIT runs are plotted alongside other systems]

Discussion

Cut detection is highly effective

This year, recall is 94% and precision is 92%. Improvements over 2003 are due to ignoring the central regions

Gradual transition detection has improved significantly since 2003: recall is now between 68% and 85%, precision between 67% and 84%

A high detection threshold favours precision, a low threshold favours recall

A short detection threshold history length was found to be preferable

The final decision step reduces false positives

For television news, we are able to use a fixed moving query window size of 24 frames

We experimented with a simple ASR-based technique in 10 additional runs, removing detected transitions that coincided with spoken words. This was ad hoc and very unsuccessful

Conclusions

Disregarding the focus area of frames for cut detection has improved our results by 3% in recall and 9% in precision

Our parameter-free ranking scheme is highly effective in cut detection on a wide variety of footage

Our gradual transition detection method is relatively simple and needs only a few parameters

The additional final decision step reduces false positives and improves results significantly

The use of localised histograms and more dynamic thresholding also improved results in gradual transition detection

Our approach is computationally inexpensive, simple to implement, and effective

15,500 seconds to process the video (around 4 hours, 18 minutes)

Questions?