Multimedia Information Management Final Project Multimodal Searching Sara Egidi Fabio Greco Alessio Villardita


Page 1: Multimedia information management

Multimedia Information Management

Final Project

Multimodal Searching

Sara Egidi, Fabio Greco, Alessio Villardita

Page 2: Multimedia information management

Roadmap

● System Architecture

● Dataset

● Feature extraction

● Feature quantization

● Indexing

● Searching

● Interface Implementation

● Results

Page 3: Multimedia information management

Project objectives

Development of a search engine that supports textual, visual and multimodal searches.

Implementation steps:

● Extraction of global deep features

● Indexing of visual features

● Indexing of image metadata (tags)

● Combination of text and visual features at search time

Extensions:

● Visual features from multiple layers

● Display of classification results

Page 4: Multimedia information management

Dataset

25000 images from Flickr

● Raw (Exif): includes information on camera, settings, date, time and possibly location.

● Annotations: very few in number (29), too few to be representative or useful for indexing purposes.

● Tags: preprocessed by Flickr. Because tags are written by users, some entries are meaningless and therefore useless.

Page 5: Multimedia information management

Deep feature extraction

AlexNet

Global features:

● FC6: layer 6

● FC7: layer 7

● FC8: class labels

Page 6: Multimedia information management

Overall System Architecture

Page 7: Multimedia information management

Feature quantization

From deep features to alphanumeric strings:

● Each component of the feature vector is associated with a unique alphanumeric keyword.

● To take the weight of each feature into account, the float value of each component is multiplied by a quantization factor Q and converted to an integer with Math.round; the component's keyword is then repeated that many times.

Example (Q = 30):

[ 0.20 0.005 0.12 0.29 ] → [ 6 0.15 3.6 8.7 ] → [ 6 0 4 9 ] → [ A1 A1 A1 A1 A1 A1 A3 A3 A3 A3 A4 A4 A4 A4 A4 A4 A4 A4 A4 ]
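The quantization step above can be sketched in a few lines of Java. This is a minimal illustration, not the project's actual code: the class name `FeatureQuantizer`, the method `quantize`, and the keyword prefix `A` with 1-based component indices are assumptions taken from the slide's example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are illustrative, not taken from the project source.
final class FeatureQuantizer {

    // Assumed keyword scheme: "A" followed by the 1-based component index,
    // as in the slide's example (A1, A3, A4).
    private static final String PREFIX = "A";

    /**
     * Scales each component by q, rounds it with Math.round, and emits the
     * component's keyword that many times. Components that round to zero
     * (or below) contribute no tokens.
     */
    static String quantize(float[] features, int q) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < features.length; i++) {
            int count = Math.round(features[i] * q);
            for (int k = 0; k < count; k++) {
                tokens.add(PREFIX + (i + 1));
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        float[] vector = {0.20f, 0.005f, 0.12f, 0.29f};
        // Prints: A1 A1 A1 A1 A1 A1 A3 A3 A3 A3 A4 A4 A4 A4 A4 A4 A4 A4 A4
        System.out.println(quantize(vector, 30));
    }
}
```

Repeating a keyword in proportion to its component value lets a text search engine's term-frequency scoring stand in for the feature weight, so the resulting string can be indexed like ordinary text.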

Page 8: Multimedia information management

Indexing

Index fields:

● Id

● Tags

● Deep feature 6

● Deep feature 7

● Class label (layer 8)

Query types:

● Text Query

● Visual Query

● Text + Visual Query

Page 9: Multimedia information management

Searching

5 different combinations:

● Text

● Text + uploaded image

● Text + indexed image

● Uploaded image

● Indexed image
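The five combinations listed above follow from two inputs: whether the user typed any text, and whether the query image is absent, uploaded, or already in the index. A minimal sketch of that dispatch, where `SearchModes`, `ImageSource`, `Mode` and `resolve` are hypothetical names and not taken from the project source:

```java
// Hypothetical dispatch helper; all names here are illustrative assumptions.
final class SearchModes {

    // Where the query image comes from, if there is one.
    enum ImageSource { NONE, UPLOADED, INDEXED }

    // The five search combinations from the slide.
    enum Mode {
        TEXT,
        TEXT_AND_UPLOADED_IMAGE,
        TEXT_AND_INDEXED_IMAGE,
        UPLOADED_IMAGE,
        INDEXED_IMAGE
    }

    /** Maps the user's inputs to one of the five search combinations. */
    static Mode resolve(String text, ImageSource image) {
        boolean hasText = text != null && !text.trim().isEmpty();
        switch (image) {
            case UPLOADED:
                return hasText ? Mode.TEXT_AND_UPLOADED_IMAGE : Mode.UPLOADED_IMAGE;
            case INDEXED:
                return hasText ? Mode.TEXT_AND_INDEXED_IMAGE : Mode.INDEXED_IMAGE;
            default:
                // No image: only a text search is possible.
                if (!hasText) {
                    throw new IllegalArgumentException("query needs text or an image");
                }
                return Mode.TEXT;
        }
    }

    public static void main(String[] args) {
        // Prints: TEXT_AND_UPLOADED_IMAGE
        System.out.println(resolve("dog", ImageSource.UPLOADED));
    }
}
```

An uploaded image would need its deep features extracted at query time, while an indexed image could reuse the features already stored in the index; the sketch only names the cases and leaves that work to the caller.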

Page 10: Multimedia information management

Interface Implementation

Page 11: Multimedia information management

Results

● Without text (Visual Query)

● With text "dog" (Multimodal Query)

● Different layers

Page 12: Multimedia information management

References

● M. J. Huiskes, M. S. Lew (2008). The MIR Flickr Retrieval Evaluation. ACM International Conference on Multimedia Information Retrieval (MIR'08), Vancouver, Canada.

● C. Gennaro. Large Scale Deep Convolutional Neural Network Features Search with Lucene.

Source code: http://www.github.com/egidisa/MultiModalSearch