
Lecture 12: Video Representation, Summarisation, and Query

Dr Jing Chen
NICTA & CSE UNSW

CS9519 Multimedia Systems
S2 2006

[email protected]

COMP9519 Multimedia Systems – Lecture 12 – Slide 2 – J Chen

Last week …

Structure of video
  Frame
  Shot
  Scene
  Story

Why video structure analysis
Transition effect between shots
Shot segmentation
Scene segmentation

COMP9519 Multimedia Systems – Lecture 12 – Slide 3 – J Chen

Last Week-- A diagram of video structure

[Diagram: video data is divided into shots (shot #1 ... shot #21), each represented by a keyframe; the shots are grouped into scenes or stories (scene #1 ... scene #8).]

* H.B.Kang, Video Abstraction Techniques For A Digital Library

COMP9519 Multimedia Systems – Lecture 12 – Slide 4 – J Chen

Last Week -- Structure of Video Sequence

[Diagram: a story is made up of scenes (scene1, scene2), scenes are made up of shots (shot1 ... shot5), and shots are made up of frames 1, 2, ..., N.]


COMP9519 Multimedia Systems – Lecture 12 – Slide 5 – J Chen

Last Week -- Why video structure analysis?

Typical video retrieval system diagram

Some applications:
  Indexing and browsing
  Non-linear editing
  Event detection

[Diagram blocks: Video data; Segmentation; Key frame computation; Feature extraction (Color, Motion, Shape, …); Video query, retrieval and production; Video browsing]

* Yan Liu & Fei Li

COMP9519 Multimedia Systems – Lecture 12 – Slide 6 – J Chen

Last Week -- Types of Shot Transition

Cut
Fade
Dissolve
Wipe

COMP9519 Multimedia Systems – Lecture 12 – Slide 7 – J Chen

Last Week -- video shot segmentation methods

Spatial domain approaches
  Pixel domain -- Frame differencing
  Motion compensated frame differencing
  Histograms (global, joint and local)
  Model driven

Compressed domain approaches
  DCT coefficients
  DC terms
  MB types of B frame

COMP9519 Multimedia Systems – Lecture 12 – Slide 8 – J Chen

Last Week -- Thresholding vs Clustering based shot segmentation

Thresholding
  Local decision (based on the info of very few frames)
  Thresholds are typically highly sensitive to the type of input video

Clustering
  View shot segmentation as a k-class unsupervised clustering problem
  Assign frames to one of the k classes via k-means
  Global decision
  Not only eliminates the need for threshold setting but also allows multiple features to be used simultaneously to improve the performance
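The clustering view can be sketched in a few lines. This is a minimal illustration only, assuming k = 2 (shot change vs. no change) and a single feature per frame pair, such as a histogram difference between consecutive frames. The function name and the toy values are hypothetical.

```python
def two_means_boundaries(diffs, iterations=20):
    """Split 1-D frame differences into 'no change' and 'shot change' with k-means (k = 2)."""
    lo, hi = min(diffs), max(diffs)  # start the two centroids at the extremes
    for _ in range(iterations):
        low = [d for d in diffs if abs(d - lo) <= abs(d - hi)]
        high = [d for d in diffs if abs(d - lo) > abs(d - hi)]
        if not high:  # every difference is in one class: no boundary found
            return []
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    # diffs[i] compares frame i with frame i + 1, so a new shot starts at frame i + 1
    return [i + 1 for i, d in enumerate(diffs) if abs(d - lo) > abs(d - hi)]


# Toy data: two large differences stand out from the background.
diffs = [0.02, 0.03, 0.01, 0.85, 0.02, 0.04, 0.90, 0.03]
boundaries = two_means_boundaries(diffs)
```

No threshold is set by hand here: the split between the two classes comes from the data of the whole sequence, which is the "global decision" noted above. Using several features at once would mean clustering vectors instead of scalars.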


COMP9519 Multimedia Systems – Lecture 12 – Slide 9 – J Chen

Last Week – Scene Segmentation

Reference frame (R-Frame)

Distance between two shots

Clustering of Shots – Visual Dissimilarity

R-frames for shot i RAi RBi RCi . . .

R-frames for shot j RAj RBj RCj RCj. . .

COMP9519 Multimedia Systems – Lecture 12 – Slide 10 – J Chen

Last Week – A diagram of scene segmentation with scene transition graph

COMP9519 Multimedia Systems – Lecture 12 – Slide 11 – J Chen

Outline

We have covered
  Feature extraction
  Image/video retrieval system
  Video structure analysis

Q: How do we visualise the results?
  How to browse large video files?
  How to present the retrieval results?

Requirements
  Under the constraint of the available screen space
  Efficient and user friendly
  Browse a large number of videos (1000s)

COMP9519 Multimedia Systems – Lecture 12 – Slide 12 – J Chen

Outline

Storyboards
Hierarchical browser
Video directory browser
Video summary
Video skimming
Thumbnail images
Browsing many clips

Concept of video query


COMP9519 Multimedia Systems – Lecture 12 – Slide 13 – J Chen

Video browser/retrieval interface

Traditional interfaces:
  storyboards

How to show the temporal variation in the video?
  child storyboard
  animated gifs
  skims
  hierarchical browser

* H. Sundaram

COMP9519 Multimedia Systems – Lecture 12 – Slide 14 – J Chen

Visual Browsing Example 1

Lateral browser surrounding temporal browser, courtesy of Imperial College London

Carnegie Mellon

COMP9519 Multimedia Systems – Lecture 12 – Slide 15 – J Chen

Visual Browsing Example 2

Best “people” shots, Carnegie Mellon Informedia system:

COMP9519 Multimedia Systems – Lecture 12 – Slide 16 – J Chen

Storyboards

Say this is the result of a search for video data.

What are the problems here?

* H. Sundaram


COMP9519 Multimedia Systems – Lecture 12 – Slide 17 – J Chen

Child storyboards

Clicking on a thumbnail pops up a child storyboard.

The storyboard shows the temporal behavior of the video
  15 min of video.
  Still no audio!
  Clicking plays the shot
  How to select key-frames?
    Clustering

* H. Sundaram

COMP9519 Multimedia Systems – Lecture 12 – Slide 18 – J Chen

Key-frame Selection

Considerations
  Flexibility (number and level)
  Fidelity (content comprehension)

Approaches
  Fixed number, fixed spacing
  First/last frame, clean frame
  Difference, motion
  Clustering
    Cluster all frames using the complete-link algorithm (FXPAL)
    Complete link uses the maximum of the pair-wise distances between frames to determine the inter-cluster similarity, which produces small, tightly bound clusters
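A minimal sketch of complete-link clustering for key-frame selection follows. It makes several assumptions that are not on the slide: each frame is described by a small colour histogram, the L1 distance compares histograms, and the medoid of each cluster is shown as its key-frame. All function names are hypothetical, and the naive search over cluster pairs is only suitable for small examples.

```python
def l1_distance(h1, h2):
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def complete_link_clusters(frames, num_clusters, dist=l1_distance):
    """Merge clusters bottom-up; the distance between two clusters is the
    maximum pair-wise distance between their members (complete link)."""
    num_clusters = max(1, num_clusters)
    clusters = [[i] for i in range(len(frames))]

    def linkage(a, b):
        return max(dist(frames[i], frames[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        pairs = [(x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters))]
        x, y = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def key_frames(frames, num_clusters, dist=l1_distance):
    """Assumed selection rule: take the medoid (member with the smallest
    total distance to the other members) of every cluster."""
    chosen = []
    for members in complete_link_clusters(frames, num_clusters, dist):
        medoid = min(members,
                     key=lambda i: sum(dist(frames[i], frames[j]) for j in members))
        chosen.append(medoid)
    return sorted(chosen)


# Toy two-bin histograms for six frames: three "red" frames, then three "blue" ones.
frames = [(10, 0), (9, 1), (7, 3), (2, 8), (1, 9), (0, 10)]
clusters = complete_link_clusters(frames, 2)
selected = key_frames(frames, 2)
```

Because a merge is judged by its worst pair, a cluster never absorbs a frame that is far from any of its members, which is why the clusters stay small and tight.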

* H. Sundaram

COMP9519 Multimedia Systems – Lecture 12 – Slide 19 – J Chen

More Advanced Browsing and Summarization Schemes

Hierarchical browser

Video directory browser

Video summary

Video skimming

Thumbnail images

Browsing many clips

COMP9519 Multimedia Systems – Lecture 12 – Slide 20 – J Chen

Key-frame Based Hierarchical Video Browser

* H.J. Zhang et al., Video Parsing, Retrieval and Browsing: An Integrated and Content-Based Solution, ACM MM 1995


COMP9519 Multimedia Systems – Lecture 12 – Slide 21 – J Chen

Visual Summaries for Video News Cluster at the Scene Level

* ClassView paper

COMP9519 Multimedia Systems – Lecture 12 – Slide 22 – J Chen

Data Structure and Browser Layout for Key-Frame Based Hierarchical Browser

COMP9519 Multimedia Systems – Lecture 12 – Slide 23 – J Chen

Scene Transition Graph

COMP9519 Multimedia Systems – Lecture 12 – Slide 24 – J Chen

Web-based Video Directory Browser (FX-PAL)


COMP9519 Multimedia Systems – Lecture 12 – Slide 25 – J Chen

Key Frames Attached to Time Scale

The positions of the key-frames are marked by blue triangles along a mouse-sensitive time scale adjacent to the key-frame.

As the mouse moves over the time scale, the key-frame for the corresponding time is shown and the triangle for that key-frame turns red.

COMP9519 Multimedia Systems – Lecture 12 – Slide 26 – J Chen

Mapping Confidence Scores to Gray Levels

Metadata: annotation of audio/video
Translate metadata values into a “confidence score”
Present the confidence score by levels of gray
  High confidence areas are marked in black
  Areas of lower confidence fade progressively to white
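The mapping can be sketched as below. The slide does not say how scores are scaled, so this assumes confidence scores normalised to [0, 1], a linear ramp, and 8-bit gray levels. The function name is hypothetical.

```python
def confidence_to_gray(confidence):
    """Map a confidence score in [0, 1] to an 8-bit gray level.

    0 is black (high confidence), 255 is white (no confidence).
    """
    c = min(max(confidence, 0.0), 1.0)  # clamp out-of-range scores
    return round(255 * (1.0 - c))


# One gray level per time slot of a (made-up) metadata track.
track = [0.0, 0.2, 0.9, 1.0]
gray_levels = [confidence_to_gray(c) for c in track]
```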

COMP9519 Multimedia Systems – Lecture 12 – Slide 27 – J Chen

Confidence Score Display

Features can be selected from a pull-down menu

COMP9519 Multimedia Systems – Lecture 12 – Slide 28 – J Chen

Metadata media player


COMP9519 Multimedia Systems – Lecture 12 – Slide 29 – J Chen

Demo of video directory browser

http://www.fxpal.com/?p=mbase

COMP9519 Multimedia Systems – Lecture 12 – Slide 30 – J Chen

Summary of video

Present key-frames (images) in a compact, visually pleasing display

Given:
  2D space constraints
  Key-frame set
  Importance measures

What is the best display layout?
  Comic book concept

Issues:
  Time order vs. layout order
  Preserve high-level structures
  Importance measures

Video Manga (FXPAL)

COMP9519 Multimedia Systems – Lecture 12 – Slide 31 – J Chen

Discard Key-frames (in Video Manga)

Key frame extraction
  Starting point: mentioned earlier

Too many key-frames!
  Calculate an importance score for each segment based on its rarity and duration.
  Longer shots are preferred because they are likely to be important in the video.
  Repeated shots receive lower scores
    They do not add much to the summary even if they are long.
  Segments with an importance score higher than a threshold are selected to generate a pictorial summary.
  For each segment chosen, the frame nearest the center of the segment is extracted as a representative key-frame.
  Frames are sized according to the importance measure of their originating segments
    Higher importance segments are represented with larger key-frames.
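The slide does not give the scoring formula. One weighting that matches the description, and is assumed here, is the segment duration multiplied by the log of the inverse share of video time taken by the segment's cluster: long segments score high, and segments from frequently repeated clusters score low. The function names, the cluster labels, and the threshold value are all hypothetical.

```python
import math


def importance_scores(segments):
    """segments: list of (duration_seconds, cluster_id) pairs.

    Assumed score: duration * log(total_time / time_spent_in_that_cluster).
    """
    total = sum(duration for duration, _ in segments)
    cluster_time = {}
    for duration, cluster in segments:
        cluster_time[cluster] = cluster_time.get(cluster, 0.0) + duration
    return [duration * math.log(total / cluster_time[cluster])
            for duration, cluster in segments]


def select_segments(segments, threshold):
    """Indices of the segments whose importance exceeds the threshold."""
    scores = importance_scores(segments)
    return [i for i, score in enumerate(scores) if score > threshold]


# Toy news clip: the anchor shot repeats, the field report is long, the graphic is rare.
segments = [(10, "anchor"), (30, "field"), (10, "anchor"), (12, "anchor"), (8, "graphic")]
scores = importance_scores(segments)
selected = select_segments(segments, threshold=10.0)
```

In this toy clip the long field report and the rare graphic survive, while the repeated anchor shots are discarded. The same scores could then drive key-frame size, for example by bucketing them into a few frame sizes.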

COMP9519 Multimedia Systems – Lecture 12 – Slide 32 – J Chen

Key-frame Packing

Arrange the key frames in a logical order

Fit the available space efficiently


COMP9519 Multimedia Systems – Lecture 12 – Slide 33 – J Chen

Video Manga (FX-PAL)

COMP9519 Multimedia Systems – Lecture 12 – Slide 34 – J Chen

Web-based Interactive Video Summary

Browse the video based on either key-frames or the timeline
Pop up captions as the mouse moves over an image

Space saving

COMP9519 Multimedia Systems – Lecture 12 – Slide 35 – J Chen

Playing video

Clicking on a key-frame starts video playback from the beginning of that segment

COMP9519 Multimedia Systems – Lecture 12 – Slide 36 – J Chen

Demo of Video Manga

http://www.fxpal.com/?p=mbase


COMP9519 Multimedia Systems – Lecture 12 – Slide 37 – J Chen

Video skims

A temporal, multimedia abstraction that incorporates both video and audio information from a longer source.

Goal:
  to communicate the essential content of a video in an order of magnitude less time.

COMP9519 Multimedia Systems – Lecture 12 – Slide 38 – J Chen

Generalized Video Skim Creation Process (CMU-Informedia)

COMP9519 Multimedia Systems – Lecture 12 – Slide 39 – J Chen

Audio/video alignment in skims

Default skim (DEF)
  Dropping video at regular intervals

Image-centric skim (IMG)
  Emphasizes visual content, decomposing the source into component shots, detecting “important” objects, such as faces and text, and identifying structural motion within a shot

Audio-centric skim (AUD)
  Derives solely from audio information. Automatic speech recognition and alignment techniques register the audio track to the video’s text transcript.

“Integrated best” skim (BOTH)
  Merges the image-centric and audio-centric approaches while maintaining moderate audio/video synchrony.
  Top-rated audio regions are selected as in the AUD skim.
  The audio is then augmented with imagery selected using IMG heuristics from a temporal window extending five seconds before and after the audio region.
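The BOTH skim can be sketched roughly as follows. This is an illustration under stated assumptions, not the Informedia implementation: audio regions and candidate images are assumed to arrive with precomputed scores, a simple time budget limits the skim, and the single best-scoring image inside the five-second window is attached to each audio region. All names are hypothetical.

```python
def integrated_skim(audio_regions, images, budget, window=5.0):
    """audio_regions: list of (start, end, score); images: list of (time, score).

    Returns [((start, end), image_time_or_None), ...] in temporal order.
    """
    chosen, used = [], 0.0
    # Take the top-rated audio regions first, as in the AUD skim.
    for start, end, _ in sorted(audio_regions, key=lambda r: r[2], reverse=True):
        length = end - start
        if used + length > budget:
            continue  # this region does not fit in the remaining time
        # Imagery may come from up to `window` seconds before or after the audio.
        candidates = [img for img in images
                      if start - window <= img[0] <= end + window]
        best = max(candidates, key=lambda img: img[1], default=None)
        chosen.append(((start, end), best[0] if best else None))
        used += length
    return sorted(chosen, key=lambda item: item[0])


audio_regions = [(10, 14, 0.9), (40, 45, 0.7), (70, 73, 0.8)]
images = [(8, 0.5), (16, 0.9), (43, 0.6), (90, 0.95)]
skim = integrated_skim(audio_regions, images, budget=8)
```

Limiting the imagery to a window around the audio is what keeps the audio and video loosely in sync: the picture shown is never more than a few seconds away from the words being heard.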

COMP9519 Multimedia Systems – Lecture 12 – Slide 40 – J Chen

“Skim Video”: Extracting Significant Content

Skim Video (78 frames)

Original Video (1100 frames)


COMP9519 Multimedia Systems – Lecture 12 – Slide 41 – J Chen

The Informedia skims

Temporal abstraction
  motivates viewers

Time compression
  preserve essential data

Segments with matching words are combined
Each segment is extended based on the “goodness scores” of the ending point, until the time budget is reached

Issues:
  Choppy presentation
  Temporal syntax (e.g., dialog)
  Early cutout of sentence, scene, audio

* Sundaram

COMP9519 Multimedia Systems – Lecture 12 – Slide 42 – J Chen

Thumbnail images

COMP9519 Multimedia Systems – Lecture 12 – Slide 43 – J Chen

Empirical Study Into Thumbnail Images

COMP9519 Multimedia Systems – Lecture 12 – Slide 44 – J Chen

Text-based Result List


COMP9519 Multimedia Systems – Lecture 12 – Slide 45 – J Chen

“Naïve” Thumbnail List (Uses First Shot Image)

COMP9519 Multimedia Systems – Lecture 12 – Slide 46 – J Chen

Query-based Thumbnail Result List

COMP9519 Multimedia Systems – Lecture 12 – Slide 47 – J Chen

Query-based Thumbnail Selection Process

1. Decompose video segment into shots.
2. Compute representative frame for each shot.
3. Locate query scoring words (shown by arrows).
4. Use frame from highest scoring shot.
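The four steps can be sketched as below, assuming steps 1 and 2 have already produced a list of shots with one representative frame each, and that the transcript is time-aligned with the video. The data layout, the per-word weights, and the file names are hypothetical.

```python
def query_thumbnail(shots, transcript, query_weights):
    """shots: list of (start, end, key_frame); transcript: list of (time, word).

    Scores each shot by the query words spoken during it and returns the
    representative frame of the highest scoring shot.
    """
    def score(shot):
        start, end, _ = shot
        return sum(query_weights.get(word.lower(), 0.0)
                   for time, word in transcript if start <= time < end)

    # max() keeps the earliest shot on ties, so with no matching words the
    # result falls back to the first shot (the "naive" thumbnail).
    best = max(shots, key=score)
    return best[2]


shots = [(0, 10, "frame_0005.jpg"), (10, 25, "frame_0017.jpg"), (25, 40, "frame_0032.jpg")]
transcript = [(3, "government"), (12, "refugee"), (14, "asylum"), (30, "rights")]
query_weights = {"refugee": 0.5, "asylum": 0.3, "rights": 0.2}
thumbnail = query_thumbnail(shots, transcript, query_weights)
```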

COMP9519 Multimedia Systems – Lecture 12 – Slide 48 – J Chen

Thumbnail Study Results

[Figure: four bar charts comparing the Text, First, and Query treatments: Time (secs.), Score (max = 400), Titles Browsed, and subjective rating from 1 (terrible) to 9 (wonderful).]

© Copyright 2003 Michael G. Christel 48 CarnegieMellon


COMP9519 Multimedia Systems – Lecture 12 – Slide 49 – J Chen

Empirical Study Summary*

Significant performance improvements for query-based thumbnail treatment over other two treatments
Subjective satisfaction significantly greater for query-based thumbnail treatment
Subjects could not identify differences between thumbnail treatments, but their performance definitely showed differences!

* Christel, M., Winkler, D., and Taylor, C.R. Improving Access to a Digital Video Library. In Human-Computer Interaction: INTERACT97, Chapman & Hall, London, 1997, 524-531

COMP9519 Multimedia Systems – Lecture 12 – Slide 50 – J Chen

Thumbnail View with Query Relevance Bar

© Copyright 2003 Michael G. Christel 50 CarnegieMellon

COMP9519 Multimedia Systems – Lecture 12 – Slide 51 – J Chen

Close-up of Thumbnail with Relevance Bar

Relevance score of [0, 100]
  This document has score of 30

Color-coded scoring words:
  “Asylum” contributes some, “rights” a bit, “refugee” contributes 50%

Query-based thumbnail

Shortcut to storyboard

* Christel

COMP9519 Multimedia Systems – Lecture 12 – Slide 52 – J Chen

Match bars


COMP9519 Multimedia Systems – Lecture 12 – Slide 53 – J Chen

Using Match Info to Reduce Storyboard Size

COMP9519 Multimedia Systems – Lecture 12 – Slide 54 – J Chen

Browsing many files (FXPAL)

COMP9519 Multimedia Systems – Lecture 12 – Slide 55 – J Chen

Video editing user interface (FXPAL)

The top display lets users select clips from the raw video.
The bottom display lets users organize the clips along the timeline and change the lengths of the clips.

COMP9519 Multimedia Systems – Lecture 12 – Slide 56 – J Chen

Flipping through images in a pile

Cluster all clips by the similarity of their color histograms

Place similar clips into the same pile

Each clip is represented by one key-frame in a pile

The clips are stacked in temporal order.


COMP9519 Multimedia Systems – Lecture 12 – Slide 57 – J Chen

Expanding a pile of video clips

To see the additional images in a pile, the user can expand the pile by clicking on it. The current display is faded out and the images of the pile are shown in an area in the middle of the faded-out display. The timeline displays the coverage of the expanded view in light gray and the coverage of the pile in a darker color as before.

COMP9519 Multimedia Systems – Lecture 12 – Slide 58 – J Chen

Outline

Storyboards
Hierarchical browser
Video directory browser
Video summary
Video skimming
Thumbnail images
Browsing many clips

Concept of video query

COMP9519 Multimedia Systems – Lecture 12 – Slide 59 – J Chen

Feature-based Similarity Search

Video Query

COMP9519 Multimedia Systems – Lecture 12 – Slide 60 – J Chen

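Query types

Point query
  Specifies a point Q in the data space and retrieves all point objects in the database with identical coordinates:
  $\mathrm{PointQuery}(DB, Q) = \{P \in DB \mid P = Q\}$

Range query
  Given a query point Q, a distance r, and a distance function M, retrieve all points P from the database which have a distance smaller than or equal to r from Q according to M:
  $\mathrm{RangeQuery}(DB, Q, r, M) = \{P \in DB \mid \delta_M(P, Q) \le r\}$
  where $\delta_M(P, Q)$ denotes the distance between P and Q under M.

Nearest neighbor query
  Given a query point Q, retrieve the nearest neighbor point P from the database, i.e., find an object P such that
  $\forall P' \in DB: \delta_M(P, Q) \le \delta_M(P', Q)$

K-nearest neighbor query
  Given a query point, return the k nearest neighbor points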
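The four query types can be written directly over a list of feature vectors. This is a minimal sketch that scans every record and uses the Euclidean distance by default. The function names and the toy database are hypothetical.

```python
import heapq
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def point_query(db, q):
    """All points with coordinates identical to q."""
    return [p for p in db if p == q]


def range_query(db, q, r, dist=euclidean):
    """All points within distance r of q."""
    return [p for p in db if dist(p, q) <= r]


def nearest_neighbor(db, q, dist=euclidean):
    """The single point closest to q."""
    return min(db, key=lambda p: dist(p, q))


def k_nearest_neighbors(db, q, k, dist=euclidean):
    """The k points closest to q, nearest first."""
    return heapq.nsmallest(k, db, key=lambda p: dist(p, q))


db = [(0.0, 0.0), (1.0, 1.0), (2.5, 2.5), (5.0, 5.0)]
q = (1.0, 1.0)
exact = point_query(db, q)
within = range_query(db, q, r=1.5)
nearest = nearest_neighbor(db, q)
two_nearest = k_nearest_neighbors(db, q, k=2)
```

For feature-based video search the point query is rarely useful, since two feature vectors are almost never exactly equal. Range and k-nearest neighbor queries are the usual choices.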


COMP9519 Multimedia Systems – Lecture 12 – Slide 61 – J Chen

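Distance functions

For d-dimensional points P and Q:

Euclidean (L2): $\delta(P, Q) = \sqrt{\sum_{i=1}^{d} (P_i - Q_i)^2}$
Manhattan (L1): $\delta(P, Q) = \sum_{i=1}^{d} |P_i - Q_i|$
Maximum (L∞): $\delta(P, Q) = \max_{i} |P_i - Q_i|$
Weighted Euclidean: $\delta(P, Q) = \sqrt{\sum_{i=1}^{d} w_i (P_i - Q_i)^2}$
Weighted maximum: $\delta(P, Q) = \max_{i} w_i |P_i - Q_i|$
Ellipsoid: $\delta(P, Q) = \sqrt{(P - Q)^T W (P - Q)}$, where W is a positive definite similarity matrix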
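The distance functions listed above are short enough to write out in full. The sketch below uses the standard textbook definitions, with plain tuples for points, a weight per dimension for the weighted variants, and a nested list for the matrix W. The function names are hypothetical.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def maximum(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


def weighted_euclidean(p, q, w):
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, p, q)))


def weighted_maximum(p, q, w):
    return max(wi * abs(a - b) for wi, a, b in zip(w, p, q))


def ellipsoid(p, q, W):
    """sqrt((p - q)^T W (p - q)); W must be positive definite so the
    quantity under the root is never negative."""
    diff = [a - b for a, b in zip(p, q)]
    n = len(diff)
    return math.sqrt(sum(diff[i] * W[i][j] * diff[j]
                         for i in range(n) for j in range(n)))


p, q = (0.0, 0.0), (3.0, 4.0)
identity = [[1.0, 0.0], [0.0, 1.0]]
```

With W set to the identity matrix the ellipsoid distance reduces to the Euclidean distance, and with a diagonal W it reduces to the weighted Euclidean distance. Off-diagonal entries let correlated dimensions, such as similar colour bins, count as partial matches.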

COMP9519 Multimedia Systems – Lecture 12 – Slide 62 – J Chen

Query without index

Sequential scan
  Sequentially scan through all records in the database

Size of database
  Storage cost is O(dn), where d is the dimensionality of a record and n is the number of records in the DB, assuming floating point data

The time to process a query is O(dn)
  Infeasible for a large database with millions of records!

Q: Is there a better way to search? Index
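A quick calculation shows what O(dn) means in practice. The numbers below are illustrative assumptions only: one million records, 64-dimensional feature vectors, and 4-byte floating point values.

```python
def scan_cost(n, d, bytes_per_value=4):
    """Storage in bytes and coordinate comparisons per query for a sequential scan."""
    storage_bytes = n * d * bytes_per_value
    operations_per_query = n * d
    return storage_bytes, operations_per_query


storage_bytes, operations_per_query = scan_cost(n=1_000_000, d=64)
# storage_bytes is 256,000,000 (about 256 MB);
# operations_per_query is 64,000,000 coordinate comparisons.
```

Every query touches every coordinate of every record, so the cost grows linearly with both the database size and the feature dimensionality. That is the motivation for an index.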

COMP9519 Multimedia Systems – Lecture 12 – Slide 63 – J Chen

Conclusion

Storyboards
Hierarchical browser
Video directory browser
Video summary
Video skimming
Thumbnail images
Browsing many clips

Concept of video query

COMP9519 Multimedia Systems – Lecture 12 – Slide 64 – J Chen

Some references

Chapter 4 of the book Multimedia Information Retrieval and Management

Boreczky, J., Girgensohn, A., Golovchinsky, G., and Uchihashi, S. An Interactive Comic Book Presentation for Exploring Video. In CHI 2000 Conference Proceedings, ACM Press, 2000, 185-192

Christel, M., Smith, M., Taylor, C.R., and Winkler, D. Evolving Video Skims into Useful Multimedia Abstractions. In Proc. ACM CHI ’98 (Los Angeles, CA, April 1998), ACM Press, 171-178

Christel, M., Winkler, D., and Taylor, C.R. Improving Access to a Digital Video Library. In Human-Computer Interaction: INTERACT97, Chapman & Hall, London, 1997, 524-531