22
Multimedia Database Management System - Chapter 7 Video Indexing and Retrieval Rachmat Wahid Saleh Insani, S.Kom

Video Indexing and Retrieval

Embed Size (px)

Citation preview

Page 1: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

VideoIndexing and Retrieval

Rachmat Wahid Saleh Insani, S.Kom

Page 2: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

Video

A complete video consist of: • Subtitle. • Sound track. • Images recorded or played out continuously at a

fixed rate.

Page 3: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

Video IR Method• Metadata-based method

Structured metadata using traditional DBMS.

• Text-based method Associated subtitles using IR technique.

• Audio-based method Associated soundtracks using Audio IR.

• Content-based method Video as collection of independent images. Video sequences divides into groups of similar frames.

• Integrated approach Combination of two or more above methods.

Page 4: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

Shot-based Video IR

Video is made of video shot. A shot is a sequence of contiguous frames have one or more following features:

• The frames depict the same scene. • The frames signify a single camera operation. • The frames contain a distinct event. • The frames are chosen as a single indexable entity

by the user.

Page 5: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

Shot-based Video IRShot-based video IR consist of few steps:

• Segment the video into shots This is called video temporal segmentation

• Index each shot The steps: identify keyframes for each shot, and use image IR

• Apply a similarity measurement between queries and video shots, then retrieve shots with high similarities Use image IR based on feature vectors in 2nd step

Page 6: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

Video Shot Detection or Segmentation

Segmentation is a process for dividing a video sequence into shots. Require suitable quantitative measure that captures the differences between a pair of frames. If the differences exceeds a threshold, it may be interpreted as indicating a segment boundary. Establishing suitable differences metrics and technique for applying them are the key issues in automatic partitioning.

Page 7: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

Basic Video Segment Techniques

The key issue is how to measure the frame-to-frame differences. There are two basic techniques: • Sum pixel-to-pixel differences between neighbouring

frames If the sum is larger than a press threshold, a shot boundary exist between these two frames

• Measures colour histogram distance between neighbouring frames Object motion causes histogram differences. If large differences is found, a camera break occurred

Page 8: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

Video Indexing and Retrieval

Shots are needs to be represented and indexed, in order to locate and retrieved the shots quickly. We represents each shot by one or more keyframes or r frames (representative frames). Video retrieval is based on similarity between the query and r frames.

Page 9: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

IR Based on r Frames of Video Shots

How to represent and index each shot in IR based on r frames? 1. Use r frames (representative frames) to represent each

shot. 2. During retrieval, features on the r frames are

compared with queries. 3. If the frames is similar/relevant, it is presented to the

user. How many representative frame(s) should be used in a shot, and how we select the representative frames?

Page 10: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

How Many R Frames Should be Used in a Shot?

There are few methods: • Uses one r frames per shot. • Assigns the number of r frames to shots according to

their length. • Divides a shot into subshots, and assign one r frame

into each subshot. Subshot are detected based on changes on the content.

Page 11: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

How to Select the R Frames?

There are few methods: • Use first frame for each segment/shot as the r frame. • Use a frame which is similar with average frame.

Each pixel in this frame is the average of pixel values at the same grid point in all frames of the segment.

• Use a frame whose its histogram is closest to the average histogram. The histograms of all the frames in the segment are averaged.

Page 12: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

IR Based on Motion Information

• Motion information is derived from optical flow or motion vectors. Parameters used are: • Motion content

Total amount of motion within a given video

• Motion uniformity Smoothness of the motion within a video as a function of time

• Motion panning Left-to-right or Right-to-left motion of the camera

• Motion tilting Vertical motion component of the motion within a video sequence

Page 13: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

IR Based on ObjectsObject is a group of pixels that move together. To index the video, use the segmented objects. Object motion can help to construct a description of that motion for use in subsequent retrieval of the video shot. Object-based video IR is easy when the video is compressed using the MPEG-4 object-based coding standard.

Page 14: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

MPEG-4 Object based Coding Standard

An MPEG-4 video session (VS) is a collection of one or more VOs. A VO consists of one or more video object layers (VOLs). Each VOL consists of an ordered sequence of snapshots in time called the video object planes (VOP) A VOP is a semantic object in the scene containing shape and motion information. Accompanying each VOP is the composition that indicates where and when each VOP is to be displayed

Page 15: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

MPEG-4 Object based Coding Standard

To index MPEG-4 compressed video, we use the following parameters: • The birth and death frames of individual objects. • Global motion characteristics/camera operations

observed in the scene. • Representative key frames that capture the major

transformations each object undergoes. • The dominant motion characteristics of each object

throughout its lifetime.

Page 16: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

IR Based on Metadata

Video IR based on metadata is using conventional DBMS. Metadata such as: title, video type, directors, genre, etc.

Page 17: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

IR Based on Annotation

Video IR based on annotation is using IR technique. Annotation is obtained in ways: • Video is manually interpreted and annotated.

A time consuming task, automatic high-level video content understanding is currently not possible for general video.

• Videos have associated transcripts and subtitles. • Speech recognition within video.

Extract spoken words for indexing and retrieval.

Page 18: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

Integrated Video Indexing and Retrieval

Combining all techniques discussed before.

Page 19: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

Effective Video Representation and Abstraction

Video representation and abstraction tools have three main applications:

• First, the application is video browsing • Second, the application is for presentation of video

retrieval results. • Third, the application is to reduce network

bandwidth requirements and delay. How do we organize and compactly represents the video?

Page 20: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

Topical or Subject Classification

Organize based on subject classification.

Page 21: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

StoryboardA storyboard is a collection of representative frames that faithfully represent the main events and action in a video. The main difference is that a clipmap shows three-dimensional micons but a storyboard shows only representative frames. A storyboard is 2 orders of magnitude smaller (in storage requirements) than the corresponding compressed entire video. This greatly reduces the bandwidth and delay required to deliver the information over a network for quick preview or browsing.

Page 22: Video Indexing and Retrieval

Multimedia Database Management System - Chapter 7

Scene Transition GraphThe scene transition graph (STG) is a directed graph structure that compactly captures both the content and temporal flow of video. An STG consists of a number of nodes connected by directed edges. Each node is represented by a typical image and represents one or more video shots. Directed edges indicate the content and temporal flow of the video.