Multimedia Information Retrieval
Modern Information Retrieval Course
Computer Engineering Department
Sharif University of Technology
Spring 2006
Sharif University, Modern Information Retrieval Course, Spring 2006
Outline
Introduction
Text-Based MMIR
Content-Based Retrieval
  Multimedia IR Model
  Image Retrieval
  Audio Retrieval
  Video Retrieval
Conclusions
Support variety of data
Different kinds of media
  Image: graph, …
  Audio: music, speech, …
  Video
MMIR Motivations
Content, content, and more content… How to get what is needed?
Increasing availability of multimedia information
Difficult to find, select, filter, manage AV content
More and more situations where it is necessary to have ‘information about the content’
Key Issues in MMIR
Goals
Want to make multimedia content searchable like text information, because the value of content depends on how easy it is to find, filter, manage, and use it.
Need a content description method beyond simple text annotation
MMIR Approaches
Text-Based MMIR
Content-Based MMIR
Text-Based Retrieval
based on text associated with the file
URL: http://www.host.com/animals/dogs/poodle.gif
Alt text: <img src=URL alt="picture of poodle">
Hyperlink text: <a href=URL>Sally the poodle</a>
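The three text sources above (URL, alt text, hyperlink text) can be gathered into index terms roughly as follows. This is a minimal sketch using only the Python standard library; the names `ImageTextCollector`, `url_terms` and `index_terms` are illustrative assumptions, not part of any search engine described in these slides.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def url_terms(url):
    """Terms from the URL path, e.g. /animals/dogs/poodle.gif."""
    path = urlparse(url).path
    stem = path.rsplit(".", 1)[0]  # drop the file extension
    return tokenize(stem)


class ImageTextCollector(HTMLParser):
    """Collect alt text and hyperlink text that refer to one image URL."""

    def __init__(self, image_url):
        super().__init__()
        self.image_url = image_url
        self.terms = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src") == self.image_url:
            self.terms += tokenize(attrs.get("alt") or "")
        elif tag == "a" and attrs.get("href") == self.image_url:
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self.terms += tokenize(data)


def index_terms(image_url, page_html):
    """All text evidence for one image: URL terms, alt text, anchor text."""
    collector = ImageTextCollector(image_url)
    collector.feed(page_html)
    return url_terms(image_url) + collector.terms
```

A real engine would also weight these sources differently and use the surrounding page text; the sketch only shows where the terms come from.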
Text-based Search Engines
Indexing based on text in the container webpage
  Http://www.google.com
  Http://www.ditto.com
  …
Keyword-based System
[Diagram: User, Information Need, Keyword, Automatic Annotation, Video Database]
Automatic annotation: including filename, video title, caption, related web page
Why this happens?
Most of these search engines are keyword based
  Have to represent your idea in keywords
  These keywords are expected to appear in the filename, or corresponding webpage
Image: The Google Approach
How does image search work?
  Google analyzes the text on the page adjacent to the image, the image caption and dozens of other factors to determine the image content. Google also uses sophisticated algorithms to remove duplicates and ensure that the highest quality images are presented first in your results.
Examples
  Campanile tcd
  Cliffs of Moher
Recall may not be great…
Google image search
Problems with Text-Based
The text in the ALT attribute has to be written manually
  Expensive
  Time consuming
It is incomplete and subjective
Some features are difficult to define in text, such as texture or object shape
Therefore…
Unable to handle semantic meaning of images
Unable to handle visual position
Unable to handle time information
Unable to use images as query
…
So…
Works better for simple concepts
  e.g. a picture of a giraffe
Doesn’t work for complex queries
  e.g. a picture of a brick home with black shutters and white pillars, with a pickup truck in front of it (image)
Architecture for Multimedia Retrieval
[Diagram labels: Feature extraction (manual / automatic); AV Description; Storage; Encoding (for transmission); Transmission; Decoding (for transmission); Browse; Search / query (pull); Filter (push); Conf. points; Human or machine]
Query-retrieval matrix
[Matrix figure: query types (text, stills, sketches, speech, sound, humming, examples) against document types (text, video, images, speech, music, sketches, multimedia)]
Example cells:
  conventional text retrieval
  hum a tune and get a music piece
  you roar and get a wildlife documentary
  type “floods” and get BBC radio news
Main Components
Feature Extraction & Analysis
Description Schemes
Searching & Filtering
Examples:
  IBM’s Query By Image Content (QBIC)
  Virage’s VIR Image Engine
  Online: http://collage.nhil.com/
Internal representation
Using attributes is not sufficient
Feature
  Information extracted from objects
Multimedia object is represented as a set of features
Features can be assigned manually, automatically, or using a hybrid approach
Features for MMIR
high-level features
  words and phrases from text, speech recognition
medium-level features
  face detector, regions classifiers, outdoor etc
low-level features
  Fourier transforms, wavelet decomposition, texture histograms, colour histograms, shape primitives, filter primitives
Internal representation
Values of some specific features are assigned to an object by comparing the object with some previously classified objects
Feature extraction cannot be precise
  A weight is usually assigned to each feature value, representing the uncertainty of assigning such a value to that feature
  e.g. 80% sure that a shape is a square
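One way to obtain such a weighted feature value is a nearest-neighbour vote against the previously classified objects. The slides do not say how the weight is computed, so the vote share used below is an assumption, as are the function name `assign_feature` and the sample data in the usage note.

```python
import math
from collections import Counter


def assign_feature(vector, classified, k=5):
    """Return (value, weight) for one object.

    classified: list of (feature_vector, label) pairs for known objects.
    value:  the majority label among the k nearest classified objects.
    weight: the share of those k neighbours that voted for the value.
    """
    nearest = sorted(classified, key=lambda item: math.dist(vector, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    value, count = votes.most_common(1)[0]
    return value, count / len(nearest)
```

With four "square" neighbours and one "rectangle" neighbour among the five nearest, the function returns `("square", 0.8)`, which matches the "80% sure" reading above.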
MMIR Model’s Main Components
Query Language
Indexing and Searching
Query languages
In designing a multimedia query language, two main aspects require attention:
  How the user enters his/her request to the system
  Which conditions on multimedia objects can be specified in the user request
Request specification
Interfaces
  Browsing and navigation
  Specifying the conditions the objects of interest must satisfy, by means of queries
Queries can be specified in two different ways
  Using a specific query language
  Query by example
    Using actual data (object example)
Conditions on multimedia data
Query predicates
  Attribute predicates
    Concern the attributes for which an exact value is supplied for each object
    Exact-match retrieval
  Structural predicates
    Concern the structure of multimedia objects
    Can be answered by metadata and information about the database schema
    “Find all multimedia objects containing at least one image and a video clip”
Conditions on multimedia data
Semantic predicates
  Concern the semantic content of the required data, depending on the features that have been extracted and stored for each multimedia object
  “Find all the red houses”
  Exact match cannot be applied
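The three predicate kinds can be contrasted in a few lines. This is a sketch under stated assumptions: `MediaObject`, its fields and the similarity formula are hypothetical, chosen only to show that attribute and structural predicates give yes/no answers while a semantic predicate gives a graded score.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MediaObject:
    attributes: dict                              # exact values, e.g. {"title": "house"}
    parts: list                                   # structural metadata, e.g. ["image", "video"]
    features: dict = field(default_factory=dict)  # extracted features, e.g. {"mean_rgb": (250, 10, 10)}


def attribute_predicate(obj, name, value):
    """Exact-match retrieval on a stored attribute."""
    return obj.attributes.get(name) == value


def structural_predicate(obj, required_parts):
    """True if the object contains every required component type."""
    return all(part in obj.parts for part in required_parts)


def semantic_score(obj, feature, target):
    """Graded similarity in (0, 1]; 1.0 means identical feature vectors.

    Exact match cannot be applied, so objects are ranked by this score.
    """
    if feature not in obj.features:
        return 0.0
    return 1.0 / (1.0 + math.dist(obj.features[feature], target))
```

A query such as "find all the red houses" would then rank objects by `semantic_score(obj, "mean_rgb", (255, 0, 0))` rather than test for equality.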
Indexing and searching
Searching similar patterns
  Distance function
    Given two objects, O1 and O2, the distance (= dissimilarity) of the two objects is denoted by D(O1, O2)
  Similarity queries
    Whole match
    Sub-pattern match
    Nearest neighbors
    All pairs
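The four query types can be written directly in terms of a distance function D. The sketch below assumes objects are numeric feature vectors and takes D to be Euclidean distance; both are illustrative choices, and the linear scans stand in for the index structures a real system would use.

```python
import math
from itertools import combinations


def D(o1, o2):
    """Distance (dissimilarity) between two feature vectors."""
    return math.dist(o1, o2)


def whole_match(query, objects, eps):
    """All objects within distance eps of the query."""
    return [o for o in objects if D(query, o) <= eps]


def sub_pattern_match(query, sequence, eps):
    """Start offsets where a window of the sequence is within eps of the query."""
    n = len(query)
    return [i for i in range(len(sequence) - n + 1)
            if D(query, sequence[i:i + n]) <= eps]


def nearest_neighbors(query, objects, k):
    """The k objects closest to the query."""
    return sorted(objects, key=lambda o: D(query, o))[:k]


def all_pairs(objects, eps):
    """All pairs of objects that are within eps of each other."""
    return [(a, b) for a, b in combinations(objects, 2) if D(a, b) <= eps]
```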
Spatial access methods
Map objects into points in f-dimensional space, and use multiattribute access methods (also referred to as spatial access methods or SAMs) to cluster them and to search for them
Methods
  R*-trees and the rest of the R-tree family
  Linear quadtrees
  Grid-files
Linear quadtrees and grid files explode exponentially with the dimensionality
R-tree
Represent a spatial object by its minimum bounding rectangle (MBR)
Data rectangles are grouped to form parent nodes (recursively grouped)
The MBR of a parent node completely contains the MBRs of its children
MBRs are allowed to overlap
Nodes of the tree correspond to disk pages
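The MBR properties above can be illustrated with a toy structure. This is not a full R-tree: the grouping in `build` is a naive sort-and-chunk stand-in for real insertion and split heuristics, the input is assumed non-empty, and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MBR:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other):
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)

    def contains(self, other):
        return (self.xmin <= other.xmin and self.ymin <= other.ymin and
                self.xmax >= other.xmax and self.ymax >= other.ymax)


def enclose(boxes):
    """Smallest rectangle that completely contains all the given MBRs."""
    boxes = list(boxes)
    return MBR(min(b.xmin for b in boxes), min(b.ymin for b in boxes),
               max(b.xmax for b in boxes), max(b.ymax for b in boxes))


@dataclass
class Node:
    mbr: MBR
    children: list = field(default_factory=list)  # empty for a data rectangle


def build(rects, fanout=2):
    """Group data rectangles into parent nodes, level by level."""
    level = [Node(r) for r in sorted(rects, key=lambda r: (r.xmin, r.ymin))]
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        level = [Node(enclose(n.mbr for n in g), g) for g in groups]
    return level[0]


def search(node, query):
    """Return data rectangles intersecting the query, pruning by parent MBR."""
    if not node.mbr.intersects(query):
        return []
    if not node.children:
        return [node.mbr]
    return [hit for child in node.children for hit in search(child, query)]
```

The search only descends into a node when the query intersects that node's MBR, which is what lets whole subtrees (disk pages) be skipped.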
Visual Features ...
  Colour
  Shape
  Texture
Histograms
Greyscale histogram of image A, assuming 256 intensity levels: $h_A(l)$ for $l = 1, \dots, 256$
$h_A(l) = \#\{(i,j) \mid A(i,j) = l,\ i = 1, \dots, m,\ j = 1, \dots, n\}$
i.e. a count of the number of pixels at each level
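The definition translates into a short counting loop. In this sketch the image is assumed to be a list of rows of integer intensities, and levels are indexed 0 to 255 rather than 1 to 256 as on the slide.

```python
def grey_histogram(image, levels=256):
    """h[l] = number of pixels (i, j) whose intensity image[i][j] equals l."""
    h = [0] * levels
    for row in image:
        for value in row:
            h[value] += 1
    return h
```

The entries always sum to the number of pixels, m times n.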
Colour Histogram
Describes the colors and their percentages in an image.
$f_c(I) = \{(I_j, P_j) \mid I_j \in \text{ColorValue},\ 0 \le P_j \le 1,\ \sum_{j=1}^{N} P_j = 1,\ 1 \le j \le N\}$
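A colour histogram of this kind pairs each colour value with the fraction of pixels that have it. The sketch below assumes pixels are (r, g, b) tuples with 8-bit channels and a non-empty image; the quantization into 4 bins per channel is an arbitrary choice. The comparison function is histogram intersection, one common similarity measure for normalized histograms, included here as an illustration rather than as the method used in the course.

```python
from collections import Counter


def colour_histogram(pixels, bins_per_channel=4):
    """Map each quantized colour to its fraction of pixels; fractions sum to 1."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {colour: n / total for colour, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 when the two normalized histograms coincide."""
    return sum(min(p, h2.get(colour, 0.0)) for colour, p in h1.items())
```

Because the histogram ignores pixel positions, rearranging the pixels of an image leaves it unchanged, so two visually different images can have identical histograms.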
Texture Matching
Texture characterizes small-scale regularity
  Color describes pixels, texture describes regions
Described by several types of features
  e.g., smoothness, periodicity, directionality
Perform weighted vector space matching
  Usually in combination with a color histogram
Texture Test Patterns
Image Retrieval using low level features
See IBM demos at:
  http://wwwqbic.almaden.ibm.com/
  http://mp7.watson.ibm.com/ (video)
Hermitage Museum
  www.hermitagemuseum.org
Berkeley Blobworld
But…
Low-level features don’t work in all cases
Solution: Regional Low-level Image Feature
Segmentation into objects
Extract low-level features from each region
Solution: High-level Image Feature
Objects: Persons, Roads, Cars, Skies…
Scenes: Indoors, Outdoors, Cityscape, Landscape, Water, Office, Factory…
Event: Parade, Explosion, Picnic, Playing Soccer…
Generated from low-level features
Audio Genres
Important types of audio data
  Speech-centered
    Radio programs
    Telephone conversations
    Recorded meetings
  Music-centered
    Instrumental, vocal
  Other sources
    Alarms, instrumentation, surveillance, …
Speech-based Documents
Radio/TV news retrieval.
Search archival radio/news broadcasts.
Video and audio email.
Knowledge management: transfer of tacit knowledge to others.
Search audio archives of meetings, lectures, etc…
Preamble
Two utterances of the same words by the same person under the same conditions generate very different waveforms.
Loudness, pitch, brightness, bandwidth, harmonicity, and other sources of variation are all continuous variables, analogous to color and texture in images.
Detectable Speech Features
Content
  Phonemes, one-best word recognition, n-best
Identity
  Speaker identification, speaker segmentation
Language
  Language, dialect, accent
Other measurable parameters
  Time, duration, channel, environment
How Speech Recognition Works
Three stages
  What sounds were made?
    Convert from waveform to subword units (phonemes)
  How could the sounds be grouped into words?
    Identify the most probable word segmentation points
  Which of the possible words were spoken?
    Based on likelihood of possible multiword sequences
All three stages are learned from training data
  Using hill climbing to train a “Hidden Markov Model”
Speech Recognition
[Diagram: Phoneme Detection, then Word Construction, then Word Selection. Other labels: phoneme n-grams; phoneme lattices; words; phoneme transcription dictionary; word n-gram language model; one-best phoneme transcription; N-best phoneme sequences; one-best word transcript]
Music and audio analysis
Music is a large and extremely variable audio class.
The range of sounds is large, from music genres to animal cries to synthesizer samples.
Any of the above can and will occur in combination.
Audio retrieval-by-content
Requires some measure of audio similarity.
Most approaches to general audio retrieval take a perceptual approach, using measures such as loudness.
A neural net can map a sound clip to a text description; an obvious drawback is the subjective nature of audio description.
Sample system: Muscle Fish
Analyzes sound files for a specific set of psychoacoustic features.
This results in a vector of attributes that include loudness, pitch, bandwidth and harmonicity.
Given enough training samples, a Gaussian classifier can be constructed, or the vectors can be used for retrieval.
A Euclidean distance is used as a measure of similarity.
For retrieval, the distance is computed between a given sound example and all other sound examples (about 400 in the demonstration).
Sounds are ranked by distance, with the closer ones being more similar.
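Ranking by Euclidean distance over such attribute vectors takes only a few lines. In this sketch each sound is assumed to be a vector of (loudness, pitch, bandwidth, harmonicity) already scaled to comparable ranges; the function name and the sample values in the usage note are made up for illustration.

```python
import math


def rank_by_distance(query, sounds):
    """Return sound names ordered from most to least similar to the query.

    sounds: dict mapping a sound name to its feature vector.
    """
    return sorted(sounds, key=lambda name: math.dist(query, sounds[name]))
```

For example, with `sounds = {"laugh": (0.6, 0.5, 0.4, 0.7), "bell": (0.3, 0.9, 0.2, 0.95), "rain": (0.4, 0.1, 0.9, 0.1)}` and a bell-like query `(0.3, 0.85, 0.25, 0.9)`, "bell" is ranked first. Without scaling, the attribute with the largest numeric range would dominate the distance.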
Music and MIDI retrieval
Using archives of MIDI files, which are score-like representations of music intended for musical synthesizers or sequencers.
Given a melodic query, the MIDI files can be searched for similar melodies.
Polyphonic Music Indexing Technique
n-grams
  encode music as text strings using pitch and onsets
  index text words with text search engine
  process query in the same way
  application: e.g., Query by Humming
Monophonic pitch n-gramming
Interval: 0 +7 0 +2 0 -2 0 -2 0
Example: musical strings with interval-only representation
  [0 +7 0 +2] → ZGZB
  [+7 0 +2 0] → GZBZ
  [0 +2 0 -2] → ZBZb
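The example strings are consistent with a simple encoding: 0 becomes Z, an upward interval of k semitones becomes the k-th uppercase letter, and a downward interval becomes the k-th lowercase letter. That mapping is inferred from the three examples only, so treat it as an assumption; the sketch also assumes intervals of at most 25 semitones, and the function names are hypothetical.

```python
import string


def intervals(pitches):
    """Semitone differences between consecutive pitches (e.g. MIDI note numbers)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def interval_symbol(interval):
    """Encode one interval as a letter: 0 is Z, +k uppercase, -k lowercase."""
    if interval == 0:
        return "Z"
    letters = string.ascii_uppercase if interval > 0 else string.ascii_lowercase
    return letters[abs(interval) - 1]


def interval_ngrams(interval_seq, n=4):
    """Overlapping n-grams over the encoded interval string."""
    symbols = "".join(interval_symbol(i) for i in interval_seq)
    return [symbols[i:i + n] for i in range(len(symbols) - n + 1)]
```

The resulting "words" can be indexed by an ordinary text search engine, and a hummed query is processed the same way before lookup.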
Application
Increasing demand for visual information retrieval
  Retrieve useful information from databases
  Sharing and distributing video data through computer networks
Example: BBC
  BBC archive has +500k queries plus 1M new items … per year
  From the BBC …
    Police car with blue light flashing
    Government plan to improve reading standards
    Two shot of Kenneth Clarke and William Hague
Video Search: Active Research Area
Video Search: Features
Color
  Robust to background
  Independent of size, orientation
  Color Histogram [Swain & Ballard]
    “Sensitive to noise and sparse”: Cumulative Histograms [Stricker & Orengo]
  Color Moments
  Color Sets: map RGB color space to Hue Saturation Value, and quantize [Smith, Chang]
  Color layout: local color features by dividing image into regions
  Color Autocorrelograms
Texture
  One of the earliest image features [Haralick et al 70s]
  Co-occurrence matrix
    Orientation and distance on gray-scale pixels
    Contrast, inverse difference moment, and entropy [Gotlieb & Kreyszig]
  Human visual texture properties: coarseness, contrast, directionality, line-likeness, regularity and roughness [Tamura et al]
  Wavelet Transforms [90s]
    [Smith & Chang] extracted mean and variance from wavelet subbands
  Gabor Filters
  And so on
Region Segmentation
  Partition image into regions
  Strong segmentation: object segmentation is difficult.
  Weak segmentation: region segmentation based on some homogeneity criteria
Scene Segmentation
  Shot detection, scene detection
  Look for changes in color, texture, brightness
  Context based scene segmentation applied to certain categories such as broadcast news
Video Search: Features
Face
  Face detection is highly reliable
    Neural Networks [Rowley]
    Wavelet based histograms of facial features [Schneiderman]
  Face recognition for video is still a challenging problem.
    EigenFaces: extract eigenvectors and use as feature space
OCR
  OCR is fairly successful technology.
  Accurate, especially with good matching vocabularies.
  Script recognition still an open problem.
ASR
  Automatic speech recognition fairly accurate for medium to large vocabulary broadcast type data
  Large number of available speech vendors.
  Still open for free conversational speech in noisy conditions.
Shape
  Outer boundary based vs. region based
  Fourier descriptors
  Moment invariants
  Finite Element Method (stiffness matrix: how each point is connected to others; eigenvectors of matrix)
  Turning function based (similar to Fourier descriptor), convex/concave polygons [Arkin et al]
  Wavelet transforms leverage multiresolution [Chuang & Kuo]
  Chamfer matching for comparing 2 shapes (linear dimension rather than area)
  3-D object representations using similar invariant features
  Well-known edge detection algorithms.
Video Structures
Image structure
  Absolute positioning, relative positioning
Object motion
  Translation, rotation
Camera motion
  Pan, zoom, perspective change
Shot transitions
  Cut, fade, dissolve, …
Typical Retrieval Framework
User: provides query information that represents his information needs
Database: stores a large collection of video data
Goal: find the most relevant shots from the database
  Shots: “paragraph” in video, typically 20 – 40 seconds, which is the basic unit of video retrieval
Bridging the Gap
[Diagram: Video Database, User, Result]
Automatically Structure Video Data
The first step for video retrieval: video “programmes” are structured into logical scenes, and physical shots
If dealing with text, then the structure is obvious: paragraph, section, topic, page, etc.
All text-based indexing, retrieval, linking, etc. builds upon this structure
Automatic shot boundary detection and selection of representative keyframes is usually the first step
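A very simple form of shot boundary detection compares the intensity histograms of consecutive frames and declares a cut when they differ sharply. The sketch below is an assumption-laden illustration, not the method of any system named in these slides: frames are flat lists of 8-bit grey values, the threshold is arbitrary, only hard cuts are handled (fades and dissolves need more care), and the keyframe is simply the middle frame of each shot.

```python
def frame_histogram(frame, bins=8, levels=256):
    """Normalized grey-level histogram of one frame (a flat list of pixel values)."""
    step = levels // bins
    h = [0.0] * bins
    for value in frame:
        h[value // step] += 1.0 / len(frame)
    return h


def shot_boundaries(frames, threshold=0.5):
    """Indices i where frame i starts a new shot.

    A boundary is declared when the L1 distance between the histograms of
    frame i-1 and frame i exceeds the threshold (the distance lies in [0, 2]).
    """
    hists = [frame_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i])) > threshold]


def keyframes(frames, boundaries):
    """Index of the middle frame of each shot, used as its representative."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [len(frames)]
    return [(s + e - 1) // 2 for s, e in zip(starts, ends)]
```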
Typical automatic structuring of video
A set of shots
a video document
Keyframe browser combined with transcript or object-based search
Ideal solution
[Diagram: Video Database, Video Structure, User, Information Need, Understanding the semantic meaning and retrieve, Result]
However,
1. Hard to represent query in natural language and for computer to understand
2. Computers have no experience
3. Other representation restriction like position, time
Alternative Solution
[Diagram: Video Database, Video Structure, User, Information Need, Match and combine, Result]
Provide evidence of relevant information (text, image, audio)
Evidence-based Retrieval System
General framework for current video retrieval systems
Video retrieval based on the evidence from both users and database, including
  Text information
  Image information
  Motion information
  Audio information
Return a relevance score for each piece of evidence
Combination of the scores
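The slides do not specify how the per-evidence scores are combined. One common baseline is a weighted linear sum, sketched below; the weights, modality names and function name are illustrative assumptions, and the scores are assumed to be normalized to comparable ranges beforehand.

```python
def combine_scores(evidence_scores, weights):
    """Rank shots by a weighted sum of their per-modality relevance scores.

    evidence_scores: dict modality -> {shot_id: score}
    weights:         dict modality -> weight (missing modalities count as 0)
    Returns shot ids ordered from most to least relevant.
    """
    combined = {}
    for modality, scores in evidence_scores.items():
        w = weights.get(modality, 0.0)
        for shot, score in scores.items():
            combined[shot] = combined.get(shot, 0.0) + w * score
    return sorted(combined, key=combined.get, reverse=True)
```

A shot that is missing from one modality simply contributes nothing for that modality, so strong text evidence can still outrank a shot with moderate image and audio evidence.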
Keyword-based System
[Diagram: Video Database, Video Structure, Automatic Annotation, Keyword, Information Need, User]
Automatic annotation: including filename, video title, caption, related web page
Keyword-based System
[Diagram: Video Database, Video Structure, Automatic Annotation, Manual Annotation, Keyword, Information Need, User]
Manual Annotation
Manually creating annotation/keywords for image / video data
Examples: Gettyimage.com (image retrieval)
Pros:
  Represent the semantic meaning of video
Cons:
  Time-consuming, labor-intensive
  Keyword is not enough to represent information need
Speech and OCR transcription
[Diagram: Video Database, Video Structure, Annotation, Speech Transcription, OCR Transcription, Keyword, Information Need, User]
Query using speech/OCR information
Query: Find pictures of Harry Hertz, Director of the National Quality Program, NIST
Speech: We’re looking for people that have a broad range of expertise that have business knowledge that have knowledge on quality management on quality improvement and in particular …
OCR: H,arry Hertz a Director aro 7 wa-,i,,ty Program,Harry Hertz a Director
What we lack?
[Diagram: Video Database, Video Structure, Annotation, Speech Transcription, OCR Transcription, Image Information, Keyword, Information Need, User]
Image-based Retrieval
[Diagram: Video Database, Video Structure, Text Information, Image Feature, Keyword, Query Images, Information Need, User]
Image-based Retrieval
[Diagram: Video Database, Video Structure, Text Information, Image Feature (Low-level Feature, High-level Feature), Keyword, Query Images, Information Need, User]
More Evidence in Video Retrieval
[Diagram: Video Database, Video Structure, Text Information, Image Information, Motion Information, Audio Information, Keyword, Query Images, Motion, Audio, Information Need, User]
MPEG-7: The Objective
Standardize object-based description tools for various types of audiovisual information, allowing fast and efficient content searching, filtering and identification, and addressing a large range of applications.
New objective for MPEG:
  MPEG-1, -2 and -4 represent the content itself (‘the bits’)
  MPEG-7 should represent information about the content (‘the bits about the bits’)
Scope of MPEG-7
[Diagram: Description creation, description, Description consumption]
Not the description creation
Not the description consumption
Just the description! This is the scope of MPEG-7.
The goal is to define the minimum that enables interoperability.
MPEG-7 Terminology: Descriptor
Descriptor (D): A Descriptor is a representation of a Feature. A Descriptor defines the syntax and the semantics of the Feature representation.
Examples:
  Feature            Descriptor
  Color              Histogram of Y, U, V components
  Shape              ART moments
  Motion             Motion field, coefficients of a model
  Audio frequency    Average frequency components
  Title              Text
  Annotation         Text
  Genre              Text, index in a thesaurus
Conclusions
Simple image retrieval is commercially available
  Color histograms, texture, limited shape information
Segmentation-based retrieval is still in the lab
  Keep an eye on the Berkeley group
Limited audio indexing is practical now
  Audio feature matching, answering machine detection
Conclusions
Multimedia IR
  Text: good solutions exist
  Video, Image, Sound: a lot of work to do.
Conclusions
The goal of content-based video retrieval is to build a more intelligent video retrieval engine via semantic meaning
Many applications in daily life
Combine evidence from different aspects
Hot research topic, few business systems
State-of-the-art performance is still unacceptable for normal users; room to improve
Conclusions
Problems with Content-Based MMIR
  Must have an example image
  Example image is 2-D
    Hence only that view of the object will be returned
  Large amount of image data
  Similar colour histogram does not equal similar image
Usually the best results come from a combination of both text and content searching
Conclusions
Combination of multi-modal results
  Different characteristics of multi-modal information
    Text-based information: better for middle and high level queries
    Image-based information: better for low and middle level queries
  Combination of multi-modal information
Conclusions
Challenging research questions
Draws on
  computer vision, audio processing, natural language analysis, unstructured document analysis, information retrieval, information visualisation, computer human interaction, artificial intelligence