Video Shot Boundary Detection at RMIT University
Timo Volkmer, Saied Tahaghoghi, and Hugh E. Williams
School of Computer Science & IT, RMIT University
{tvolkmer, saied, hugh}@cs.rmit.edu.au
Overview
Our general approach
  The moving query window
Details of the approach
  How we measure frame similarity
  Improvements for 2004 cut detection
  Detection of gradual transitions
Evaluation
  Experimental results
Conclusions
The Moving Query Window
A moving query window consists of two equal-sized half windows, surrounding a current frame
The moving query window is advanced through the video frame-by-frame
[Figure: the moving query window, showing the pre-frames, the current frame, and the post-frames]
Cut detection and gradual transition detection are performed in separate decision stages during a single pass
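The window layout described above can be sketched as a generator of frame indices. This is a minimal illustration, not the authors' code: the function name and the choice to skip positions where a half window would run past the start or end of the video are assumptions.

```python
def moving_query_windows(num_frames, half_window):
    """Yield (pre_frames, current_frame, post_frames) index tuples.

    Assumption: only positions with two complete half windows are visited;
    the slides do not say how the start and end of the video are handled.
    """
    for current in range(half_window, num_frames - half_window):
        pre = list(range(current - half_window, current))
        post = list(range(current + 1, current + half_window + 1))
        yield pre, current, post


# Advance frame-by-frame through a 7-frame clip with half windows of 2 frames.
for pre, current, post in moving_query_windows(7, 2):
    print(pre, current, post)
```

Both decision stages can then consume the same window positions in one pass.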
Frame feature representation
We use one-dimensional, localised histograms with 4x4 regions in the HSV colour space (16 bins per colour component)
A colour histogram represents each frame region. Corresponding regions are compared
Different weights can be applied to each region during comparison
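A rough sketch of the localised histogram feature and a weighted region-by-region comparison follows. Several details are assumptions made only for illustration: frames are nested lists of (h, s, v) tuples with components scaled to [0, 1), each region histogram is normalised by its pixel count, and regions are compared with a Manhattan (L1) distance. The slides do not name the distance measure that was actually used.

```python
def region_histograms(frame, grid=4, bins=16):
    """Return grid*grid histograms, each with `bins` bins per H, S and V component.

    `frame` is a list of rows, each row a list of (h, s, v) tuples in [0, 1).
    """
    height, width = len(frame), len(frame[0])
    hists = [[0.0] * (3 * bins) for _ in range(grid * grid)]
    counts = [0] * (grid * grid)
    for y, row in enumerate(frame):
        for x, pixel in enumerate(row):
            region = (y * grid // height) * grid + (x * grid // width)
            counts[region] += 1
            for component, value in enumerate(pixel):
                bin_index = min(int(value * bins), bins - 1)
                hists[region][component * bins + bin_index] += 1.0
    # Normalise so that regions of different pixel counts are comparable.
    return [[v / max(n, 1) for v in hist] for hist, n in zip(hists, counts)]


def frame_distance(hists_a, hists_b, weights):
    """Weighted sum of per-region L1 distances (the L1 choice is an assumption)."""
    return sum(
        w * sum(abs(a - b) for a, b in zip(region_a, region_b))
        for w, region_a, region_b in zip(weights, hists_a, hists_b)
    )
```

Setting a region's weight to 0 removes it from the comparison, and equal weights treat all 16 regions alike.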
Cut detection
We disregard the four central regions of each frame to avoid the effect of rapid activity (that is, their weight = 0)
Using the remaining regions, each frame in the moving window is ranked by decreasing similarity to the current frame
Frame similarity is the sum of the inter-region similarities
The number of pre-frames that are ranked in the top half of the rankings is monitored
When a cut is passed, the number of top ranked pre-frames (usually) rises to a maximum and falls to a minimum within a few frames
We have determined an optimum window size and optimum thresholds that are effective for all our training sets
Our cut detection is (now) parameter free
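The ranking step can be sketched as follows. The code takes precomputed distances between each window frame and the current frame, counts how many pre-frames land in the top half of the ranking, and then looks for the rise-to-maximum, fall-to-minimum pattern. The parameters `upper`, `lower` and `max_gap` are hypothetical placeholders: the slides state that fixed optimum thresholds were determined but do not give their values.

```python
def pre_frames_in_top_half(pre_distances, post_distances):
    """Count pre-frames among the half of the window most similar to the current frame."""
    labelled = [(d, "pre") for d in pre_distances] + [(d, "post") for d in post_distances]
    # Smaller distance means more similar. The sort is stable, so ties favour
    # pre-frames here; that tie-break is an arbitrary choice for this sketch.
    labelled.sort(key=lambda item: item[0])
    top_half = labelled[: len(labelled) // 2]
    return sum(1 for _, side in top_half if side == "pre")


def detect_cut_positions(counts, upper, lower, max_gap):
    """Report indices where the count falls to `lower` within `max_gap` frames of reaching `upper`."""
    cuts = []
    last_high = None
    for i, count in enumerate(counts):
        if count >= upper:
            last_high = i
        elif count <= lower and last_high is not None and i - last_high <= max_gap:
            cuts.append(i)
            last_high = None
    return cuts


counts = [3, 3, 4, 6, 6, 0, 0, 3, 3]
print(detect_cut_positions(counts, upper=6, lower=0, max_gap=2))
```

Just before a cut, all pre-frames belong to the same shot as the current frame and dominate the top half. Just after it, the post-frames do, which is why the count swings from its maximum to its minimum.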
Gradual transition detection
Pre-frames and post-frames are combined into two distinct sets of frames. The average distance of each set to the current frame is computed
We use all frame regions (with identical weights)
The ratio between the pre-frame set distance and the post-frame set distance, the PrePostRatio, is monitored
The end of most gradual transitions is indicated by a peak in the PrePostRatio curve
We maintain a moving average PrePostRatio for calculating a dynamic threshold to detect transitions
As a final decision step, we require a minimum difference between the last frame of the previous shot and the first frame of the new shot
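One way these steps might fit together is sketched below. The threshold rule (a level factor times the moving average held in a fixed-size history buffer), the local-maximum peak test, the small `eps` guard against division by zero, and all function and parameter names are assumptions. The slides name the ingredients but not the exact formula.

```python
from collections import deque


def pre_post_ratio(pre_distances, post_distances, eps=1e-9):
    """Average pre-frame distance divided by average post-frame distance."""
    pre_avg = sum(pre_distances) / len(pre_distances)
    post_avg = sum(post_distances) / len(post_distances)
    return pre_avg / (post_avg + eps)


def detect_ratio_peaks(ratios, history_size, level_factor):
    """Return indices of local maxima that exceed the dynamic threshold."""
    history = deque(maxlen=history_size)  # most recent PrePostRatio values
    peaks = []
    for i, ratio in enumerate(ratios):
        if len(history) == history_size:
            threshold = level_factor * sum(history) / history_size
            is_local_max = (
                0 < i < len(ratios) - 1
                and ratios[i - 1] < ratio >= ratios[i + 1]
            )
            if is_local_max and ratio > threshold:
                peaks.append(i)
        history.append(ratio)
    return peaks


def passes_final_check(boundary_distance, min_difference):
    """Final decision step: the frames on either side of the transition must differ enough."""
    return boundary_distance >= min_difference


ratios = [1.0, 1.0, 1.0, 1.0, 1.2, 2.0, 4.0, 2.0, 1.0, 1.0]
print(detect_ratio_peaks(ratios, history_size=3, level_factor=2.0))
```

Under this formulation, a higher `level_factor` rejects more candidate peaks, and a shorter history lets the threshold follow local activity more closely.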
PrePostRatio in detail
A schematised dissolve between a shot A and a shot B:
[Figure: successive positions of the moving window (pre-frames, current frame, post-frames) as it passes over the dissolve from A frames to B frames. The PrePostRatio at these positions is labelled, in order: minimal, slowly rising, steeply rising, maximum, falling.]
The PrePostRatio is usually minimal at the beginning of a gradual transition and rises up to a maximum at the end of the transition
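The behaviour can be reproduced on a toy dissolve. In this sketch each frame is reduced to a single hypothetical scalar feature (0.0 for shot A, 1.0 for shot B, linearly blended in between), frame distance is the absolute difference, and a small `noise_floor` is added to the denominator as a stand-in for the residual differences that real frames of one shot always show. None of these values come from the slides.

```python
def blend(t, start=9, length=6):
    """Scalar stand-in for a frame feature: 0.0 in shot A, 1.0 in shot B.

    Frames up to 9 are pure A, frames 10 to 14 are blended, frame 15 onwards is pure B.
    """
    return min(max((t - start) / length, 0.0), 1.0)


def ratio_at(current, half_window=4, noise_floor=0.05):
    """PrePostRatio at `current` for the toy feature above."""
    pre = [abs(blend(current) - blend(current - k)) for k in range(1, half_window + 1)]
    post = [abs(blend(current) - blend(current + k)) for k in range(1, half_window + 1)]
    return (sum(pre) / half_window) / (sum(post) / half_window + noise_floor)


ratios = {t: ratio_at(t) for t in range(4, 26)}
peak_frame = max(ratios, key=ratios.get)
print(peak_frame)
```

In this toy setting the ratio is zero at the last pure A frame, climbs through the blended frames, and peaks at the first pure B frame, after which it falls again.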
PrePostRatio curve example
The curve shows two short gradual transitions and two cuts within a range of 1000 frames
[Figure: PrePostRatio (vertical axis) plotted against Frames (horizontal axis, 0 to 1000)]
Training and Evaluation
We have trained on the TRECVID 2003 shot boundary test set
The main parameters for gradual transition detection are:
The query window size
The size of the history buffer for dynamic thresholding
A threshold level factor
Results are discussed on the next slides. (We achieve similar or better results on the 2002 and 2001 test sets in blind runs.)
Results at TRECVID 2004
SysID     All                   Cuts                  Gradual Transitions
          Recall   Precision    Recall   Precision    Recall   Precision
rmit1     0.915    0.829        0.944    0.922        0.852    0.671
rmit2     0.901    0.850        0.944    0.921        0.810    0.714
rmit3     0.907    0.859        0.944    0.921        0.828    0.738
rmit4     0.893    0.870        0.944    0.921        0.783    0.762
rmit5     0.897    0.877        0.944    0.921        0.798    0.782
rmit6     0.883    0.885        0.944    0.921        0.753    0.802
rmit7     0.889    0.890        0.944    0.921        0.772    0.819
rmit8     0.871    0.893        0.944    0.921        0.715    0.824
rmit9     0.881    0.899        0.944    0.921        0.746    0.844
rmit10    0.860    0.900        0.944    0.921        0.681    0.844
Overall results
[Figure: scatter plot of Average Precision against Average Recall (both axes 0.7 to 1.0) for all transitions, with numbered points for the RMIT runs shown alongside other participants' results]
Frame recall and precision for gradual transitions
[Figure: scatter plot of Average Frame Precision against Average Frame Recall (both axes 0.6 to 1.0), with numbered points for the RMIT runs shown alongside other participants' results]
Discussion
Cut detection is highly effective
  This year, recall is 94% and precision is 92%. Improvements from 2003 are due to ignoring the centre region
Gradual detection has improved significantly since 2003:
  Recall is now between 68% and 85%, precision between 67% and 84%
  A high detection threshold favours precision, a low one favours recall
  A short detection threshold history length was found to be preferable
  The final decision step reduces false positives
  For television news, we are able to use a fixed moving query window size of 24 frames
Experimented with a simple ASR technique in 10 additional runs, which removed detected transitions that coincided with spoken words. Ad hoc, very unsuccessful…
Conclusions
Disregarding the focus area of frames for cut detection has improved our results by 3% in recall and 9% in precision
Our parameter-free ranking scheme is highly effective in cut detection on a wide variety of footage
Our gradual transition detection method is relatively simple and needs only a few parameters
The additional final decision step reduces false positives and improves results significantly
The use of localised histograms and more dynamic thresholding also improved results in gradual transition detection
Our approach is computationally inexpensive, simple to implement, and effective
15,500 seconds to process the video (around 4 hours, 18 minutes)