Multimedia Information Management

Final Project

Multimodal Searching

Sara Egidi

Fabio Greco

Alessio Villardita

Roadmap

● System Architecture
● Dataset
● Feature extraction
● Features quantization
● Indexing
● Searching
● Interface Implementation
● Results

Project objectives

Development of a search engine that allows textual, visual and multimodal searches.

Implementation steps:

● Extraction of global deep features
● Indexing of visual features
● Indexing of image metadata (tags)
● Combination of text and visual features at search time

Extensions:

● Multiple layers’ visual features
● Show classification results

Dataset

25,000 images from Flickr

● Raw EXIF: includes information on camera, settings, date, time and possibly location.
● Annotations: very few in number (29), too few to be representative or useful for indexing purposes.
● Tags: preprocessed by Flickr. Because tags are written by users, some entries are meaningless and therefore useless.

Deep feature extraction

AlexNet

Global features:

● FC6: layer 6
● FC7: layer 7
● FC8: class labels
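The FC8 output can be turned into displayable classification results by ranking its scores. The following is a minimal sketch, not the project's code: it assumes FC8 yields one raw score per class and that a parallel list of class names is available. The function name `top_classes` and the `labels` argument are hypothetical. If the extracted FC8 values are already probabilities, the softmax step changes the values but not the ranking.

```python
import math


def top_classes(fc8_scores, labels, k=3):
    """Return the k highest-scoring (label, probability) pairs.

    fc8_scores: raw per-class scores (assumed to come from FC8).
    labels:     hypothetical list of class names, same length as fc8_scores.
    """
    if len(fc8_scores) != len(labels):
        raise ValueError("fc8_scores and labels must have the same length")

    # Softmax, subtracting the maximum first for numerical stability.
    peak = max(fc8_scores)
    exps = [math.exp(score - peak) for score in fc8_scores]
    total = sum(exps)
    probabilities = [value / total for value in exps]

    # Rank classes by probability, highest first.
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Calling `top_classes(scores, names, k=1)` would give the single label to show next to a result image.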

Overall System Architecture

Features quantization

From deep features to alphanumeric strings:

● each component of the feature vector is associated with a unique alphanumeric keyword
● to take the feature weight into account, the float value of each component is multiplied by a quantization factor Q and rounded to an integer with Math.round; the keyword is then repeated that many times

Example with Q = 30:

[ 0.20 0.005 0.12 0.29 ] → [ 6 0.15 3.6 8.7 ] → [ 6 0 4 9 ] → [ A1 A1 A1 A1 A1 A1 A3 A3 A3 A3 A4 A4 A4 A4 A4 A4 A4 A4 A4 ]
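The quantization example above can be reproduced with a short Python sketch. The function name `quantize_to_text` and the keyword prefix `A` are illustrative choices that follow the slide's example. Rounding is done as floor(x + 0.5) to mimic Java's Math.round, because Python's built-in `round` rounds halves to the nearest even integer. Clamping negative values to zero is an assumption, since a keyword cannot be repeated a negative number of times.

```python
import math


def quantize_to_text(features, q, prefix="A"):
    """Turn a feature vector into a space-separated surrogate text.

    Component i (1-based) becomes the keyword f"{prefix}{i}", repeated
    round(value * q) times, so larger components yield more repetitions.
    """
    terms = []
    for index, value in enumerate(features, start=1):
        # Round half up, as Java's Math.round does; clamp negatives to 0
        # (assumption: negative components contribute no keywords).
        repeats = max(0, math.floor(value * q + 0.5))
        terms.extend([f"{prefix}{index}"] * repeats)
    return " ".join(terms)


if __name__ == "__main__":
    print(quantize_to_text([0.20, 0.005, 0.12, 0.29], q=30))
```

The second component (0.005 × 30 = 0.15) rounds to 0, so its keyword A2 does not appear at all. A larger Q keeps more of the small components at the cost of longer strings.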

Indexing

● Id
● Tags
● Deep feature 6
● Deep feature 7
● Class label (lvl 8)
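The indexed fields above can be pictured as one record per image, with the deep features stored as quantized keyword strings. The sketch below is a toy stand-in for a real text index: the class `ImageDocument`, its attribute names, and the scoring function are all hypothetical, and the raw term-frequency product used here is an assumption, not the scoring that Lucene applies.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ImageDocument:
    """One index entry; field names mirror the slide, not the real schema."""
    image_id: str
    tags: str         # user tags as space-separated text
    fc6_text: str     # quantized FC6 features as keyword text
    fc7_text: str     # quantized FC7 features as keyword text
    class_label: str  # label derived from FC8


def term_overlap_score(query_text, field_text):
    """Sum, over shared terms, of query frequency times field frequency."""
    query_tf = Counter(query_text.split())
    field_tf = Counter(field_text.split())
    return sum(count * field_tf[term] for term, count in query_tf.items())


def search_field(documents, field_name, query_text, limit=10):
    """Rank documents by overlap with the query on a single field."""
    scored = [(term_overlap_score(query_text, getattr(doc, field_name)),
               doc.image_id) for doc in documents]
    matches = [item for item in scored if item[0] > 0]
    # Highest score first; ties broken by image id for a stable order.
    matches.sort(key=lambda item: (-item[0], item[1]))
    return matches[:limit]
```

Because repeated keywords raise term frequency, images whose strong feature components match the query's strong components score highest, which is the reason for repeating keywords in proportion to the feature value.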

Query types: text query, visual query, text + visual query.

Searching

5 different combinations:

● Text
● Text + uploaded image
● Text + indexed image
● Uploaded image
● Indexed image
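The five combinations above follow from which inputs the user supplies. The sketch below shows one way to resolve the search mode and then merge text and visual rankings. The slides do not say how the two scores are combined, so the weighted sum of max-normalized scores in `combine_scores` is purely an assumption, as are all names and the default weight.

```python
from enum import Enum


class QueryMode(Enum):
    TEXT = "text"
    TEXT_UPLOADED = "text + uploaded image"
    TEXT_INDEXED = "text + indexed image"
    UPLOADED = "uploaded image"
    INDEXED = "indexed image"


def resolve_mode(text=None, uploaded_image=None, indexed_image_id=None):
    """Map the supplied inputs to one of the five search combinations."""
    has_text = bool(text and text.strip())
    if uploaded_image is not None and indexed_image_id is not None:
        raise ValueError("provide at most one image source")
    if uploaded_image is not None:
        return QueryMode.TEXT_UPLOADED if has_text else QueryMode.UPLOADED
    if indexed_image_id is not None:
        return QueryMode.TEXT_INDEXED if has_text else QueryMode.INDEXED
    if has_text:
        return QueryMode.TEXT
    raise ValueError("empty query")


def combine_scores(text_scores, visual_scores, text_weight=0.5):
    """Merge two {image_id: score} maps (hypothetical combination rule)."""
    def normalise(scores):
        # Scale each ranking so that its best hit has score 1.0.
        top = max(scores.values(), default=0.0)
        return {key: (value / top if top > 0 else 0.0)
                for key, value in scores.items()}

    text_n = normalise(text_scores)
    visual_n = normalise(visual_scores)
    combined = {
        image_id: text_weight * text_n.get(image_id, 0.0)
        + (1.0 - text_weight) * visual_n.get(image_id, 0.0)
        for image_id in set(text_n) | set(visual_n)
    }
    # Highest combined score first; ties broken by image id.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

With this rule, an image that matches both the text and the visual query outranks one that matches only a single modality, which is the behaviour a multimodal query is meant to give.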


Interface Implementation

Results

Without text (Visual Query)

With text "dog" (Multimodal Query)

Different layers

References

M. J. Huiskes, M. S. Lew (2008). The MIR Flickr Retrieval Evaluation. ACM International Conference on Multimedia Information Retrieval (MIR'08), Vancouver, Canada. http://press.liacs.nl/mirflickr/mirflickr.pdf (bib: http://press.liacs.nl/mirflickr/mirflickr.bib)

C. Gennaro. Large Scale Deep Convolutional Neural Network Features Search with Lucene. http://arxiv.org/find/cs/1/au:+Gennaro_C/0/1/0/all/0/1

Source code: http://www.github.com/egidisa/MultiModalSearch
