MMM 2005, The Chinese University of Hong Kong

Video Summarization Using Mutual Reinforcement Principle and Shot Arrangement Patterns
Shi Lu, Michael R. Lyu and Irwin King
{slu, lyu, King}@cse.cuhk.edu.hk
Department of Computer Science and Engineering
The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Jan. 12, 2005
Outline
- Introduction
  - Background and motivation
  - Goals
- The Proposed Method
  - Video structure analysis
  - Video shot arrangement patterns
  - Mutual reinforcement principle
  - Video skim selection
- Experiment results
- Conclusion
Background and Motivation
- Huge volumes of video data are distributed over the Web
- Browsing and managing huge video databases is time-consuming
- Video summarization helps the user quickly grasp the content of a video
- Two kinds of applications:
  - Dynamic video skimming
  - Static video summary
- We mainly focus on generating dynamic video skims for movies
Goals
Goals for video summarization:
- Conciseness
  - Given the target length of the video skim
- Content coverage
  - Visual diversity and temporal coverage
- Balanced structural coverage
- Visual coherence
Workflow
[Figure: workflow diagram. Raw video -> video segmentation -> video shots -> structure analysis -> video scene boundaries -> find the shot arrangement patterns -> select important patterns (importance score by mutual reinforcement) -> sub skims -> concatenate -> final skim]
Video Structure
- A video narrates a story just like an article does
- Hierarchy, from top to bottom:
  - Video (story)
  - Video scenes (paragraph)
  - Video shot groups
  - Video shots (sentence)
  - Video frames
- Can be built bottom-up
Video Scene Formation
- Loop scenes and progressive scenes
- Group visually similar video shots into groups
  - ToC method by Y. Rui et al.
  - Spectral graph partitioning by J. B. Shi et al.
- Intersecting groups form loop scenes
- Loop scenes depict an event that happens at one place
- Progressive scenes: "transitions" between events, or dynamic events
- Summarize each video scene separately
Video Scene Analysis
- Scene importance: length and complexity
- Content entropy for loop scenes
  - Measures the complexity of a loop scene
  $$Entropy(Sc_i) = -\sum_{Sg_j \in Sc_i} \frac{l_{Sg_j}}{l_{Sc_i}} \log\left(\frac{l_{Sg_j}}{l_{Sc_i}}\right)$$
  - $l_{Sg_j}$: length of a member video shot group
  - $l_{Sc_i}$: total length of the video scene
- For progressive scenes, we only consider their length
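The content entropy of a loop scene is the entropy of the length shares of its member shot groups. A minimal sketch, assuming the natural logarithm (the slide does not state the base); the function name `scene_entropy` is hypothetical and not taken from the talk:

```python
import math


def scene_entropy(group_lengths):
    """Entropy of the length shares of a scene's member shot groups.

    group_lengths: lengths of the shot groups in one loop scene; their sum
    is taken as the total scene length. The log base (natural) is an
    assumption.
    """
    total = sum(group_lengths)
    entropy = 0.0
    for length in group_lengths:
        if length > 0:  # a zero-length group contributes nothing
            share = length / total
            entropy -= share * math.log(share)
    return entropy
```

A scene made of a single shot group gets entropy 0, while a scene split evenly between two groups gets log 2, so scenes that alternate among more groups score as more complex.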
Skim Length Distribution
- Determine each video scene's target skim length, given the total skim length $L_{vs}$
- Determine each progressive scene's skim length
  - If $\frac{l_{Sc_i}}{L_v} L_{vs} < t_1$, discard it, else
  $$L_{vs}^{i} = \frac{l_{Sc_i}}{L_v} L_{vs}$$
- Determine each loop scene's skim length
  - If
  $$L_{vs}^{i} = \frac{l_{Sc_i}\,Entropy(Sc_i)}{\sum_j l_{Sc_j}\,Entropy(Sc_j)}\,L'_{vs} < t_2,$$
  discard it
- Redistribute $L'_{vs}$ to the remaining scenes
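The allocation steps can be sketched as follows. This is one possible reading of the slide, not the authors' implementation: progressive scenes receive a share proportional to their length, loop scenes share what is left in proportion to length times entropy, and any scene whose share falls below its threshold is discarded. The function name, the argument layout, and the reading of "redistribute" as "recompute the shares over the surviving loop scenes" are all assumptions.

```python
def distribute_skim_length(prog_lengths, loop_scenes, video_length,
                           skim_length, t1, t2):
    """Hypothetical sketch of the skim length distribution.

    prog_lengths: {scene_id: length} for progressive scenes.
    loop_scenes:  {scene_id: (length, entropy)} for loop scenes.
    Returns (progressive_allocation, loop_allocation) as dicts.
    """
    # Progressive scenes: share proportional to scene length, dropped below t1.
    prog_alloc = {}
    for scene_id, length in prog_lengths.items():
        share = length * skim_length / video_length
        if share >= t1:
            prog_alloc[scene_id] = share

    # Assumption: loop scenes share whatever the progressive scenes left over.
    remaining = skim_length - sum(prog_alloc.values())

    # Loop scenes: weight = length * entropy. Drop shares below t2, then
    # recompute over the survivors until every kept share is large enough.
    weights = {sid: length * ent for sid, (length, ent) in loop_scenes.items()}
    while weights:
        total = sum(weights.values())
        if total == 0:
            break
        shares = {sid: remaining * w / total for sid, w in weights.items()}
        too_small = [sid for sid, s in shares.items() if s < t2]
        if not too_small:
            return prog_alloc, shares
        for sid in too_small:
            del weights[sid]
    return prog_alloc, {}
```

With a 1000-unit video, a 150-unit skim, `t1 = 3` and `t2 = 10`, a 10-unit progressive scene is discarded, and a loop scene with weight 9 out of 609 is dropped so that its share goes to the other loop scenes.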
Shot Arrangement Patterns
- The way the director arranges the video shots conveys his intention
- For each scene, the video shot group labels form a string (e.g. 1232432452...)
- k-Non-Repetitive String (k-nrs)
- Minimal content redundancy and visual coherence: good video skim candidates
- String coverage: {3124} covers {312, 124, 31, 12, 24, 3, 1, 2, 4}
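The two string notions above can be illustrated with a short sketch. It assumes that a non-repetitive string is a contiguous run of shot group labels in which no label occurs twice, and that a string covers all of its shorter contiguous substrings, which matches the {3124} example. Extracting only the maximal runs is a simplification chosen here; the function names are hypothetical.

```python
def maximal_nrs(labels):
    """Maximal contiguous substrings of `labels` with no repeated label."""
    result = []
    prev_end = 0
    for start in range(len(labels)):
        seen = set()
        end = start
        # Extend the window until a label repeats or the string ends.
        while end < len(labels) and labels[end] not in seen:
            seen.add(labels[end])
            end += 1
        # The run is maximal only if it reaches past the previous run's end.
        if end > prev_end:
            result.append(labels[start:end])
        prev_end = end
    return result


def covered(string):
    """All shorter contiguous substrings of `string`."""
    n = len(string)
    subs = {string[i:j] for i in range(n) for j in range(i + 1, n + 1)}
    subs.discard(string)
    return subs


print(maximal_nrs("1232432452"))  # ['123', '324', '243', '432', '3245', '452']
print(sorted(covered("3124"), key=lambda s: (-len(s), s)))
```

Each label is treated as a single character, which only works for up to ten shot groups per scene when digits are used; a list of labels would lift that limit.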
Shot Arrangement Patterns
[Figure: several detected nrs strings]
Video Semantics
- Low-level features and high-level concepts: the semantic gap
- A summary based on low-level features cannot ensure the perceived quality
- Solution: obtain video semantic information by manual or semi-automatic annotation
- Usage:
  - Retrieval
  - Summary
Video Semantics
- Concept representation for a video shot
- The most popular question: who has done what?
- The two major contexts: who, what action
- Concept term and video shot description (user editable and reusable)
Video Semantics
- Concept term and video shot description
- Term (key word): denotes an entity, e.g. "Joe", "talking", "in the bank"
- Context: "who", "what action", ...
- Shot description: the set $\{t_1, \ldots, t_n\}$ comprising all the concept terms that are related to the shot
- Obtained by semi-automatic or manual video annotation
Mutual Reinforcement
- How to measure the priority for a set of concept terms and a set of descriptions?
- A more important description should contain more important terms
- A more important term should be contained in more important descriptions
- Mutual reinforcement principle
Mutual Reinforcement
- Let W be the weight matrix describing the relationship between the term set $\{t_i\}$ and the shot description set $\{d_i\}$ (elements of W can have various definitions, e.g. the number of occurrences of a term in a description)
- Let U and V be the vectors of importance values of the concept term set and the video shot description set
- We have
  $$U = \frac{1}{k_1} W V, \qquad V = \frac{1}{k_2} W^T U$$
  where $k_1$ and $k_2$ are constants
- U and V can be calculated by SVD of W
Mutual Reinforcement
- For each semantic context:
  - We choose the singular vectors corresponding to W's largest singular value as the importance vectors for concept terms and shot descriptions
  - Since W is non-negative, the first singular vector V will be non-negative
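The mutual reinforcement relation can be illustrated without a linear algebra library. The sketch below uses power iteration in place of a full SVD: it alternates U <- W V and V <- W^T U with normalization, which converges to the singular vectors of the largest singular value when that value is unique. Rows of W are terms and columns are shot descriptions, W counts term occurrences (one of the definitions the slides allow), and the function names and the three-term example are hypothetical.

```python
import math


def build_weight_matrix(terms, descriptions):
    """W[i][j] = number of occurrences of terms[i] in descriptions[j]."""
    return [[float(desc.count(term)) for desc in descriptions]
            for term in terms]


def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mutual_reinforcement(W, iterations=100):
    """Leading singular vectors of a non-negative W by power iteration.

    Returns (u, v): term importances and description importances.
    Assumes W has at least one non-zero entry in every iteration product.
    """
    n_terms, n_descs = len(W), len(W[0])
    v = _normalize([1.0] * n_descs)
    for _ in range(iterations):
        # Terms are important if they appear in important descriptions.
        u = _normalize([sum(W[i][j] * v[j] for j in range(n_descs))
                        for i in range(n_terms)])
        # Descriptions are important if they contain important terms.
        v = _normalize([sum(W[i][j] * u[i] for i in range(n_terms))
                        for j in range(n_descs)])
    return u, v


# Hypothetical "who" context: four shots annotated with character names.
terms = ["Joe", "Mary", "Tom"]
descriptions = [["Joe"], ["Joe", "Mary"], ["Mary"], ["Joe", "Mary", "Tom"]]
W = build_weight_matrix(terms, descriptions)
u, v = mutual_reinforcement(W)
```

In this example the fourth shot, which mentions every character, receives the highest description score, and "Tom", who appears in only one shot, receives the lowest term score. Starting from a positive vector keeps every iterate non-negative, in line with the non-negativity remark above.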
Mutual Reinforcement
[Figure: importance calculation on 76 video shots, based on context "who"]
Video Summarization
- Based on the result of mutual reinforcement, we can determine the relative priority between video shots
- The generated skim can ensure coverage of the semantic content
- The shot importance vector $V$ is computed from $V_{who}$ and $V_{what}$
Video Skim Selection
- Input: the decomposed nrs string set from a scene and the importance scores
- do
  - Select the most important k-nrs string into the skim shot set
  - Remove from the original set those nrs strings covered by the selected string
- until the target skim length is reached
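The selection loop can be sketched as a greedy procedure. The slides do not say how a string's importance is derived from the shot scores or how skim length is measured, so this sketch assumes that a string's importance is the sum of its labels' scores and that length is counted in labels; both choices, and the function name, are hypothetical.

```python
def select_skim(candidates, label_score, target_length):
    """Greedy selection of nrs strings for one scene (hypothetical sketch).

    candidates:    nrs strings, one character per shot group label.
    label_score:   {label: importance}, e.g. from mutual reinforcement.
    target_length: stop once this many labels have been selected.
    """
    def importance(string):
        # Assumption: a string is as important as the sum of its labels.
        return sum(label_score[label] for label in string)

    remaining = list(dict.fromkeys(candidates))  # drop duplicates, keep order
    selected, total = [], 0
    while remaining and total < target_length:
        best = max(remaining, key=importance)
        selected.append(best)
        total += len(best)
        # Remove the chosen string and every string it covers (its substrings).
        remaining = [s for s in remaining if s not in best]
    return selected


scores = {"1": 0.4, "2": 0.3, "3": 0.2, "4": 0.1, "5": 0.5}
print(select_skim(["3124", "312", "124", "31", "45", "5"], scores, 6))
# ['3124', '45']
```

Removing covered strings after each pick is what keeps the skim from repeating content already shown by a longer selected pattern.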
Video Skim Selection
[Figure: video skim selection]
Evaluation
- Subjective experiment: 15 people were invited to watch video skims generated from 4 videos with skim rates of 0.15 and 0.30
- Questions about main actors and key events:
  - Who has done what? (meaningfulness score)
  - Which skim looks better? (favorite score)
- Compared with our previous graph-based algorithm
- Achieves better coherence
Summary
- A novel dynamic video summarization method is proposed
- Video structure analysis
  - Determine video scene boundaries
  - Analyze the shot arrangement patterns
  - Scene complexity and target skim length
- Mutual reinforcement
  - Utilizing the semantic information
  - An importance measure for video shot patterns
- Video skim selection