55
A Sketch-based Approach for Multimedia Retrieval MS (by research) Thesis, August, 2012 - November, 2016 Koustav Ghosal Supervisor : Dr. Anoop Namboodiri Centre for Visual Information Technology IIIT Hyderabad November 29, 2016 Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere) A Sketch-based Approach for Multimedia Retrieval November 29, 2016 1 / 55

A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

A Sketch-based Approach for Multimedia RetrievalMS (by research) Thesis,

August, 2012 - November, 2016

Koustav GhosalSupervisor : Dr. Anoop Namboodiri

Centre for Visual Information TechnologyIIIT Hyderabad

November 29, 2016

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 1 / 55

Page 2: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Outline

1 Motivation

2 Video Retrieval

3 Image Retrieval

4 Zero-Shot Learning

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 2 / 55

Page 3: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Outline

1 Motivation

2 Video Retrieval

3 Image Retrieval

4 Zero-Shot Learning

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 3 / 55

Page 4: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Content Based Multimedia RetrievalTextual and Example Based Queries

(a) (b)

Figure: (a) Query by Text (b) Query by Example

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 4 / 55

Page 5: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Content Based Multimedia RetrievalLimitations of Text based Query

”Car”

Figure: Image Search

Metadata may not represent the original content.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 5 / 55

Page 6: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Content Based Multimedia RetrievalLimitations of Text based Query

”All those red coloured vehicles which came south and turned left”

Figure: Tracking

A more complicated event means a more complicated query.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 6 / 55

Page 7: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Content Based Multimedia RetrievalLimitations of Example based Query

Examples are not always available.In fact, their absence being the reason for the search.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 7 / 55

Page 8: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Why Sketch-based queries ?Advantages

Efficiently encodes information like shape, pose, colour, size etc. , allat once.

A free-hand sketch is more convenient to draw than typing lengthyqueries.

A sketch is closer to the content of a video as compared to meta-data(tags, comments, captions).

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 8 / 55

Page 9: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Challenges in Sketch-based systemsPerceptual Variability

(a) (b) (c) (d)

Figure: Different interpretations of the same trajectory. (a) Original (b),(c),(d)User Inputs

Cognitive variation in motion perception in human beings .

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 9 / 55

Page 10: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Challenges in Sketch-based systemsMultimodality

An image is a multi-channel dense representation of an object/scene.

(a) (b) (c)

Figure: (a) Query (b) Results (c) Desired Output

”A simple sketch is a high level sparse representation of the object/scenebeing searched for.” [Li et al., 2015]

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 10 / 55

Page 11: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Challenges in Sketch-based systemsScarcity of Data

(a) ImageNet (b) Caltech

(c)TU-Berlin (d) Sketchy

Figure: Number of samples (a) 14,197,122 (b) 30,607 (c) 20,000 (d) 75,471

The system should be able to generalize to unknown classes.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 11 / 55

Page 12: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Challenges in Sketch-based systemsSummary

Perceptual Variability.

Multimodality.

Scarcity of Data.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 12 / 55

Page 13: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Challenges in Sketch-based systemsSummary

Perceptual Variability : Video Retrieval

Multimodality : Image Retrieval

Scarcity of Data : Zero Shot Learning

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 13 / 55

Page 14: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Outline

1 Motivation

2 Video Retrieval

3 Image Retrieval

4 Zero-Shot Learning

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 14 / 55

Page 15: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Motion

Videos characterized by motion.

(a) (b)

Figure: (a) Human Tracking (b) Sports Analysis

0Image Courtsey : www.google.comKoustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 15 / 55

Page 16: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

FeaturesQualitative

Objective : Features minimize perceptual variability.

Qualitative Spatio Temporal Features

Features based on motion properties which tell us “how” rather than “howmuch”.

Shape

Direction

Scale

Combines 3 different aspects of motion.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 16 / 55

Page 17: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Aspects of MotionShape

(a) (b)circles

(c) (d)

Figure: A sample motion with the corresponding m-segments: (a) Original (b)Smooth and Normalized (c) m-segments (d) Circle-Based Representation

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 17 / 55

Page 18: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Aspects of MotionShape

J = minx0,y0,r

n∑i

x2i + y2i − 2x0xi − 2y0yi + x20 + y20 + r2

S = ( xµ, yµ, r , w , s )(xµ, yµ), r = center, radius of the circle.

w = Slope of the segment.s = Normalized length of arc.

K-Means, Bag of Motion

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 18 / 55

Page 19: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Aspects of MotionScale, Direction

0 20 40 60 80 100 120 1401.5

1.0

0.5

0.0

0.5

1.0

1.5

0 20 40 60 80 100 120 1400

20

40

60

80

100

(a) (b) (c) (d)

Figure: A Spiral Motion from our synthetic dataset (a) Points sampledequidistantly in each segment (b) Directions tracked for each equipoint segment(c) Temporal Change of Direction (d) Temporal Change of scale

Trajectory Direction = (α1, α2, . . . , αn)αk = sin θk

Trajectory Scale = (d1, d2, . . . , dn)dk = distance of the current segment from the mean.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 19 / 55

Page 20: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Summary of Features4 types

Bag of Motion : Trajectory = Histogram.

Ordered Bag of Motion : Trajectory = (s1, s2, . . . , sm), wheresk = ( xµ, yµ, r , w , s )k

Direction : = (α1, α2, . . . , αn)

Scale : = (d1, d2, . . . , dn)

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 20 / 55

Page 21: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Pipeline

ResultsFilters

1. Bag Of Motions

2. Ordered BoM

3. Direction

4. Scale

1 2 3 44 3 2 1

Query Database

1

2

3

4

Update Score

Figure: Cascaded Retrieval

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 21 / 55

Page 22: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Datasets

(a) Pool Dataset (b) Synthetic Dataset.

Figure: (a) Five classes each containing 20 videos each. (b) Five classescontaining 20 videos each. Thus the dataset had 200 videos

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 22 / 55

Page 23: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

ResultsA Sample Retrieval

Figure: Qualitative Results

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 23 / 55

Page 24: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

ResultsQuantitative Results : Accuracy

0 2 4 6 8 10 12 14 16Top K retrievals

0

20

40

60

80

100

Acc

ura

cy

33

5056 57 60 62

66 67 69 69 71 74 74 76 78

0 2 4 6 8 10 12 14 16Top K retrievals

0

20

40

60

80

100

Acc

ura

cy

26

3843

5056 58

6267 69 71 72 75 78 78 79

(a) Pool Videos dataset (b) Synthetic Motion dataset

Figure: Accuracy at different top K retrievals.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 24 / 55

Page 25: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

ResultsQuantitative Results : Mean Reciprocal Rank

0.0 0.2 0.4 0.6 0.8 1.0Reciprocal Ranks

0

10

20

30

40

50

Num

ber

of

Queri

es

0.0 0.2 0.4 0.6 0.8 1.0Reciprocal Ranks

0

10

20

30

40

50

Num

ber

of

Queri

es

(a)Pool Videos dataset (b) Synthetic Motion dataset

Figure: Mean Reciprocal Ranks

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 25 / 55

Page 26: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Outline

1 Motivation

2 Video Retrieval

3 Image Retrieval

4 Zero-Shot Learning

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 26 / 55

Page 27: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Sparsity of Sketches

Two different modalities

(b)

(a) (c)

Figure: (a) Query (b) Results (c) Desired Output

Should not be compared directly.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 27 / 55

Page 28: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Our ModelTwo Modalities

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 28 / 55

Page 29: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Our ModelCorrespondence

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 29 / 55

Page 30: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Our Model

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 30 / 55

Page 31: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Our Model

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 31 / 55

Page 32: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Our ModelTwo Modalities

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 32 / 55

Page 33: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Standard CCA

Given two sets I and S ,

I = (I1, I2, . . . , In)

S = (S1,S2, . . . ,Sn)

we try two find two subspaces,

P I =<WI , I >,PS =<WS , S >

such that that their correlation is maximized.

ρ = maxWI ,WS

corr (P I ,PS)

= maxWI ,WS

〈PS ,P I 〉‖PS‖‖P I‖

(1)

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 33 / 55

Page 34: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Standard CCAcontinued...

As derived in [Hardoon et al., 2004], Equation 1 reduces to,

ρ = maxWI ,WS

W ′I CovISWS√

W ′I CovIIWI

√W ′

SCovSSWS

(2)

and the covariance matrix of (AI ,AS) given by:

Cov = E

[ (AI

AS

) (AI

AS

)′]

=

[CovII CovISCovSI CovSS

](3)

Equation 2 can be solved as an Eigen value problem for WI and WS .

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 34 / 55

Page 35: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Cluster CCA

As suggested in [Rasiwasia et al., 2014], we compute the covariancematrices as follows,

CovIS =1

M

C∑c=1

|I c |∑j=1

|Sc |∑k=1

I cj Sc ′k (4)

CovII =1

M

C∑c=1

|I c |∑j=1

|Sc |I cj I c′

j (5)

CovSS =1

M

C∑c=1

|Sc |∑k=1

|I c |SckS

c ′k (6)

where M =∑C

c=1 |I c ||Sc |, is the total number of pairwisecorrespondences across C classes.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 35 / 55

Page 36: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

PipelineTraining and Testing

(a)

(b)

Figure: Proposed pipeline (a) Training (b) Retrieval

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 36 / 55

Page 37: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Datasets

TU Berlin Dataset, 250 categories, 80 sample each category

Caltech 256, Pascal VOC 2007

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 37 / 55

Page 38: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Features

Table 1 : Summary of Features

Feature Dimension SourceCALTECH - SIFT 1000 Vl-Feat. [Vedaldi and Fulkerson, 2008]CALTECH - HOG 20000 Vl-Feat [Vedaldi and Fulkerson, 2008]CALTECH - CNN 4096 Krizhevsky et al. [Krizhevsky et al., 2012]PASCAL - SIFT 1000 Guillaumin et al.in [Guillaumin et al., 2010]PASCAL - HOG 20000 Vl-Feat [Vedaldi and Fulkerson, 2008]PASCAL - CNN 4096 Krizhevsky et al. [Krizhevsky et al., 2012]

TU-BERLIN - SIFT-Like 501 Eitz et al.in [Eitz et al., 2012]TU-BERLIN - HOG 20000 Vl-Feat [Vedaldi and Fulkerson, 2008]TU-BERLIN - Fisher 250000 Rosalia et al. [Schneider and Tuytelaars, 2014]TU-BERLIN - CNN 4096 Yang et al. [Yang and Hospedales, 2015]

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 38 / 55

Page 39: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

ResultsQualitative

Figure: Example Retrievals From Our System

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 39 / 55

Page 40: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

ResultsQuantitative

Table : Mean Average Precision (MAP) for Image-Sketch feature combinations

Dataset SIFT-SIFT SIFT-HOG SIFT-Fisher HOG-SIFT HOG-HOG HOG-Fisher CNN-CNN

Caltech 0.06 0.03 0.20 0.14 0.02 0.01 0.20Pascal 0.13 0.12 0.05 0.18 0.09 0.06 0.06

Table : Performance improvement in mAP values

Dataset Features Before CCA After CCA

Caltech SIFT-Fisher 0.01 0.20Caltech CNN-CNN 0.01 0.20Pascal HOG-SIFT 0.01 0.18Pascal SIFT-SIFT 0.06 0.13

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 40 / 55

Page 41: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Outline

1 Motivation

2 Video Retrieval

3 Image Retrieval

4 Zero-Shot Learning

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 41 / 55

Page 42: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Zero-Shot Learning

(a) (b)

Figure: (a) Standard Classifier (b) Zero-Shot Classifier

Knowledge Database acts as an Oracle, who knows everything

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 42 / 55

Page 43: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Word 2 Vec

Word 2 Vec by Mikolov et al. [Mikolov et al., 2013] is a vector space,where semantically similar words are mapped together.Apples and Oranges are closer than Apples and Mumbai.The distance between two classes in Word 2 Vec space representstheir semantic similarity.

Figure: An artistic expression of Word2Vec vector space.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 43 / 55

Page 44: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

TrainingObjective Function

We train a CNN as in [Yu et al., 2015] for TU Berlin and asin [Chatfield et al., 2014] for Caltech256 dataset and replace the soft-max

layer as follows.

J(θ) =∑y∈Y

∑x i∈XY

||wy − g(x i )||2

A close miss is penalized less than a distant miss.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 44 / 55

Page 45: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Datasets

TU Berlin Dataset, 250 categories, 80 sample each category

Caltech 256, Pascal VOC 2007

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 45 / 55

Page 46: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

ResultsQualitative

Figure: Example Queries

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 46 / 55

Page 47: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Experiments

Sketch Classification.

Uni-Modal Sketch Retrieval.

Cross-Modal Retrieval.

Zero Shot Retrieval.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 47 / 55

Page 48: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

ResultsClassification

Algorithm Features Classifier Accuracy

TU-Berlin DatasetYang et al. [Yu et al., 2015] CNN-Ensemble Soft-Max 74.9%Yang et al. [Yu et al., 2015] CNN-Single Soft-Max 72.6%Schneider et al. [Schneider and Tuytelaars, 2014] Fisher SVM 63.1%Eitz et al. [Eitz et al., 2012] BOW SVM 56%Proposed DFSR Random Forest 70.22%

Table: Classification Results show that our features perform reasonably wellalmost at par with the state of the art methods.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 48 / 55

Page 49: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

ResultsUni-Modal Retrieval

(a) (b)

Figure: Uni-modal retrieval : (a) Sketch Modality (b) Image Modality

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 49 / 55

Page 50: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

ResultsCross-Modal Retrieval

Figure: Cross Modal Retrieval

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 50 / 55

Page 51: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

ResultsZero Shot Retrieval

(a) (b)

Figure: Zero-Shot Cross-Modal Retrieval : (a) PR curve for the worst performingpartition. It can be observed that DFSR outperform the features proposed byYang et al. [Yang and Hospedales, 2015]. (b) PR curve for the best performingpartition.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 51 / 55

Page 52: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Summary

Figure: Related Domains

Qualitative features for video retrieval.

Cluster CCA for Multi Modal Image Retrieval.

Deep Features for Semantic Rerieval.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 52 / 55

Page 53: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

References I

Chatfield, K., Simonyan, K., Vedaldi, A., and Zisserman, A. (2014). Return of the devilin the details: Delving deep into convolutional nets. In British Machine VisionConference.

Eitz, M., Hays, J., and Alexa, M. (2012). How do humans sketch objects? ACM Trans.Graph.

Guillaumin, M., Verbeek, J., and Schmid, C. (2010). Multimodal semi-supervisedlearning for image classification. In CVPR.

Hardoon, D., Szedmak, S., and Shawe-Taylor, J. (2004). Canonical correlation analysis:An overview with application to learning methods. Neural computation.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification withdeep convolutional neural networks. In NIPS.

Li, Y., Hospedales, T. M., Song, Y.-Z., and Gong, S. (2015). Free-hand sketchrecognition by multi-kernel feature learning. CVIU.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 53 / 55

Page 54: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

References II

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributedrepresentations of words and phrases and their compositionality. In Advances in neuralinformation processing systems, pages 3111–3119.

Rasiwasia, N., Mahajan, D., Mahadevan, V., and Aggarwal, G. (2014). Clustercanonical correlation analysis. In AI Statistics.

Schneider, R. G. and Tuytelaars, T. (2014). Sketch classification andclassification-driven analysis using fisher vectors. TOG.

Vedaldi, A. and Fulkerson, B. (2008). VLFeat: An open and portable library ofcomputer vision algorithms. http://www.vlfeat.org/.

Yang, Y. and Hospedales, T. M. (2015). Deep neural networks for sketch recognition.arXiv preprint arXiv:1501.07873.

Yu, Q., Yang, Y., Song, Y.-Z., Xiang, T., and Hospedales, T. M. (2015). Sketch-a-netthat beats humans. In BMVC.

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 54 / 55

Page 55: A Sketch-based Approach for Multimedia Retrieval...Outline 1 Motivation 2 Video Retrieval 3 Image Retrieval 4 Zero-Shot Learning Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities

Thank [email protected]

Koustav Ghosal Supervisor : Dr. Anoop Namboodiri (Universities of Somewhere and Elsewhere)A Sketch-based Approach for Multimedia Retrieval November 29, 2016 55 / 55