Time-Sensitive Egocentric Image Retrieval for Finding Objects in Lifelogs

Author: Cristian Reyes
Advisors: Xavier Giró, Eva Mohedano, Kevin McGuinness

14th July 2016


Outline

1. Motivation
2. Methodology
3. Experiments
4. Results
5. Conclusions

Lifelogging

Extract value from this new data



Can’t find my phone

Motivation


Hundreds of images!

Review

Egocentric cameras may help

Last time seen: at the CAFE

Task

Goal: Retrieve a useful image to find the object.


How: Exploiting visual and temporal information.

Outline

1. Motivation
2. Methodology
3. Experiments
4. Results
5. Conclusions

System Overview


Visual Ranking - Descriptors

Convolutional Neural Networks

A CNN architecture (VGG-16) is used for feature extraction, taking the activations of its conv5_1 layer as local features.

Eva Mohedano, Amaia Salvador, Kevin McGuinness, Ferran Marques, Noel E. O'Connor, and Xavier Giro-i Nieto. Bags of local convolutional features for scalable instance search. In Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR). ACM, 2016.
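A minimal sketch of the descriptor extraction step, assuming PyTorch and torchvision's pretrained VGG-16; the thesis pipeline may use a different framework, and the input resolution chosen here is only illustrative.

```python
# Sketch: extracting conv5_1 local features from VGG-16 (torchvision),
# approximating the descriptor extraction step described above.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

# In torchvision's vgg16.features, index 24 is the conv5_1 convolution;
# we truncate the network right after its ReLU (index 25).
conv5_1 = torch.nn.Sequential(*list(vgg.children())[:26])

preprocess = T.Compose([
    T.Resize((336, 448)),                      # keep some spatial resolution (assumed size)
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],    # ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])

def extract_local_features(image_path):
    """Return an (N, 512) tensor of local conv5_1 activations for one image."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        fmap = conv5_1(img)                    # (1, 512, H', W')
    return fmap.squeeze(0).flatten(1).T        # one 512-d vector per spatial location
```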

Visual Ranking - Descriptors

Bag of Words

Eva Mohedano, Amaia Salvador, Kevin McGuinness, Ferran Marques, Noel E. O'Connor, and Xavier Giro-i Nieto. Bags of local convolutional features for scalable instance search. In Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR). ACM, 2016.
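A minimal sketch of how an image's local features could be pooled into a bag-of-visual-words histogram once a codebook is available; the hard assignment and L2 normalization shown here are common choices, not necessarily the exact ones used in the thesis.

```python
# Sketch: encoding an image's local CNN features as a bag-of-visual-words
# histogram, assuming a codebook (k-means centroids) is already trained.
import numpy as np

def bow_encode(local_features, centroids):
    """local_features: (N, D) conv5_1 vectors; centroids: (K, D) visual words.
    Returns an L2-normalized K-dim histogram of visual-word assignments."""
    # Squared distances between every local feature and every centroid.
    d2 = (np.sum(local_features**2, axis=1, keepdims=True)
          - 2.0 * local_features @ centroids.T
          + np.sum(centroids**2, axis=1))
    words = np.argmin(d2, axis=1)                      # hard assignment
    hist = np.bincount(words, minlength=len(centroids)).astype(np.float32)
    return hist / (np.linalg.norm(hist) + 1e-12)

def cosine_similarity(a, b):
    """Similarity between two BoW histograms (both already L2-normalized)."""
    return float(np.dot(a, b))
```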

Visual Ranking - Queries

5 visual examples


Visual Ranking - Queries

3 masking strategies (sketched in code below)

Full Image (FI)

Hard Bounding Box (HBB)

Soft Bounding Box (SBB)
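A minimal sketch of the three query masking strategies as weights over the conv5_1 grid, assuming the query bounding box is already mapped to feature-map coordinates; the Gaussian blur width for the soft mask is an assumption.

```python
# Sketch: the three query masking strategies as per-location weights on the
# conv5_1 grid (H' x W'). The blur width for the soft mask is an assumption.
import numpy as np
from scipy.ndimage import gaussian_filter

def query_mask(h, w, bbox=None, mode="FI", sigma=3.0):
    """bbox = (row0, row1, col0, col1) in feature-map coordinates."""
    if mode == "FI" or bbox is None:         # Full Image: keep everything
        return np.ones((h, w), dtype=np.float32)
    r0, r1, c0, c1 = bbox
    mask = np.zeros((h, w), dtype=np.float32)
    mask[r0:r1, c0:c1] = 1.0                 # Hard Bounding Box
    if mode == "SBB":                        # Soft Bounding Box: blurred edges
        mask = gaussian_filter(mask, sigma=sigma)
        mask /= mask.max() + 1e-12
    return mask

# The mask can then weight each local feature's contribution to the BoW
# histogram, e.g. accumulate mask[i, j] instead of 1 for the word at (i, j).
```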


Visual Ranking - Target

3 masking strategies (see the sketch below)

Full Image (FI)

Center Bias (CB)

Saliency Mask (SM)

Saliency Map

Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i Nieto. Shallow and deep convolutional networks for saliency prediction. CVPR 2016.
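A minimal sketch of the Center Bias target weighting as a 2D Gaussian over the feature grid; a saliency map predicted by a model such as the one cited above could be resized and used as a weight map in the same way. The Gaussian spread is an assumption.

```python
# Sketch: center-bias weighting as a separable 2D Gaussian over the feature
# grid; a predicted saliency map could be resized and used the same way.
import numpy as np

def center_bias(h, w, sigma_frac=0.25):
    """2D Gaussian centered on the image, sigma as a fraction of each side."""
    ys = np.exp(-0.5 * ((np.arange(h) - (h - 1) / 2) / (sigma_frac * h)) ** 2)
    xs = np.exp(-0.5 * ((np.arange(w) - (w - 1) / 2) / (sigma_frac * w)) ** 2)
    return np.outer(ys, xs)   # (h, w) weights, highest at the center
```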

System Overview


Candidate Selection

2 thresholding strategies:

Absolute: Threshold on Visual Similarity Scores (TVSS)

Adaptive: Nearest Neighbor Distance Ratio (NNDR)

Parameters are learnt from the training data.

D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.
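A minimal sketch of the two candidate selection strategies; the threshold and ratio values below are placeholders, since the actual parameters are learnt on the training days, and the NNDR formulation here is one plausible adaptation of Lowe's ratio test to a ranking of similarity scores.

```python
# Sketch: the two thresholding strategies for candidate selection.
# Exact formulations and thresholds are learnt in the thesis; the values
# here are placeholders.

def tvss(scores, threshold=0.2):
    """Absolute: keep ranks whose visual similarity exceeds a fixed value."""
    return [i for i, s in enumerate(scores) if s >= threshold]

def nndr(scores, ratio=0.8):
    """Adaptive: keep ranks whose score is close enough to the best score,
    in the spirit of Lowe's nearest-neighbor distance ratio test."""
    best = max(scores)
    return [i for i, s in enumerate(scores) if s >= ratio * best]

# Example: scores of a visual ranking, best first.
candidates = nndr([0.91, 0.85, 0.40, 0.12], ratio=0.8)   # -> [0, 1]
```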

System Overview


Temporal-Aware Reranking

Time-stamp sorting

Redundancy


Temporal Diversity


Effect of Diversity

New locations introduced
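A minimal sketch of the temporal reranking idea: candidates surviving the threshold are sorted by time stamp and thinned so that several near-simultaneous images of the same moment do not crowd out other locations; the 10-minute window is an assumed value for illustration.

```python
# Sketch: temporal-aware reranking. Candidates are sorted by capture time
# (most recent first) and thinned so near-duplicate moments are not repeated.
from datetime import timedelta

def temporal_rerank(candidates, window=timedelta(minutes=10)):
    """candidates: list of (image_id, timestamp). Returns a diverse,
    time-sorted list, keeping one image per temporal window."""
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
    kept, last_time = [], None
    for image_id, t in ordered:
        if last_time is None or (last_time - t) > window:
            kept.append((image_id, t))       # new moment: keep it
            last_time = t
    return kept
```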


Outline

1. Motivation
2. Methodology
3. Experiments
4. Results
5. Conclusions

Compare different configurations

1. Dataset of images

2. Queries

3. Annotations

4. Metric

Evaluate quantitatively


Dataset

EDUB [1]
● 4912 images
● 4 users
● 2 days/user
● Narrative Clip 1
General behavior

NTCIR-Lifelog [2]
● 88185 images
● 3 users
● 30 days/user
● Autographer
Wide-angle lens

1. Marc Bolaños and Petia Radeva. Ego-object discovery.
2. C. Gurrin, H. Joho, F. Hopfgartner, L. Zhou, and R. Albatal. NTCIR Lifelog: The first test collection for lifelog research.

Dataset - Query Definition

Query examples from EDUB and NTCIR-Lifelog

Annotation Strategy

1. Annotate the 3 last occurrences
→ The system is expected to find all 3

2. Annotate only the last occurrence

3. Annotate the whole scene of the last occurrence
→ Neighbor images may also be helpful

Metric

Mean Average Precision (mAP): considers all relevant images
Mean Reciprocal Rank (MRR): considers only the best-ranked relevant image
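A minimal sketch of the two measures for a single ranked list; mAP and MRR are their means over all queries.

```python
# Sketch: the two evaluation measures on one ranked list of image ids.
# AP looks at all relevant images; RR only at the best-ranked relevant one.

def average_precision(ranking, relevant):
    hits, precisions = 0, []
    for rank, image_id in enumerate(ranking, start=1):
        if image_id in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def reciprocal_rank(ranking, relevant):
    for rank, image_id in enumerate(ranking, start=1):
        if image_id in relevant:
            return 1.0 / rank
    return 0.0

# mAP / MRR are the means of these values over all queries.
```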

Training

TRAINING: 9 days / TEST: 15 days

Training - Codebook

~13,500 images
~18,000,000 feature vectors
25,000 clusters
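A minimal sketch of the codebook training step, assuming scikit-learn's MiniBatchKMeans over the stacked local descriptors; the actual clustering implementation used in the thesis may differ.

```python
# Sketch: training the 25,000-word visual codebook over the local conv5_1
# descriptors of the training images, using mini-batch k-means for scale.
from sklearn.cluster import MiniBatchKMeans

def train_codebook(descriptors, n_words=25000):
    """descriptors: (N, 512) array stacking local features of ~13,500 images."""
    kmeans = MiniBatchKMeans(n_clusters=n_words, batch_size=10000,
                             n_init=3, random_state=0)
    kmeans.fit(descriptors)
    return kmeans.cluster_centers_            # (n_words, 512) visual words

# Usage: centroids = train_codebook(all_local_features)
```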


Training - Thresholds

Threshold tuning for Time-Stamp Sorting and Interleaving, with Full Image, Center Bias, and Saliency Mask targets: same optimal thresholds in all cases.

Parameters summary

Outline

1. Motivation
2. Methodology
3. Experiments
4. Results
5. Conclusions

Discussion & Conclusions


Query approach comparison: Full Image (FI), Soft Bounding Box (SBB), Hard Bounding Box (HBB)

Discussion & Conclusions


Target approach comparison: Full Image, Saliency Mask, Center Bias

Discussion & Conclusions


Candidate selection with time-stamp reordering: Adaptive (NNDR) vs. Absolute (TVSS)

Discussion & Conclusions


Candidate selection with interleaving: Adaptive (NNDR) vs. Absolute (TVSS)

Discussion & Conclusions

The system helps the user

Discussion & Conclusions

Diversity helps

Discussion & Conclusions

Objects are not always in the center

Discussion & Conclusions

Saliency Maps do not help here

But they do here

Discussion & Conclusions

Parameters are not independent.


Lifelogging Tools and Applications

Acknowledgements

Financial Support

Noel E. O'Connor, Cathal Gurrin, Albert Gil


Using Saliency Mask for target:
f(Q)            NNDR     TVSS     NNDR + I   TVSS + I
Saliency Mask   0.201    0.217    0.210      0.226

Using Full Image for target:
f(Q)            NNDR     TVSS     NNDR + I   TVSS + I
Saliency Mask   0.176    0.271    0.190      0.279

Using Center Bias for target:
f(Q)            NNDR     TVSS     NNDR + I   TVSS + I
Saliency Mask   0.162    0.198    0.173      0.213

Visual Ranking - Descriptors

Bag of Words: an example using text

A) There is a red ball in a white box.
B) The red box contains a white ball.

Word        Sentence A   Sentence B
there       1            0
is          1            0
a           2            1
red         1            1
ball        1            1
in          1            0
white       1            1
box         1            1
the         0            1
contains    0            1

Cosine Similarity Score = 0.68
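A minimal sketch that reproduces the 0.68 cosine similarity directly from the word counts in the table above.

```python
# Sketch: reproducing the cosine similarity of the two sentences from their
# word-count vectors in the table above.
import math

counts_a = {"there": 1, "is": 1, "a": 2, "red": 1, "ball": 1,
            "in": 1, "white": 1, "box": 1, "the": 0, "contains": 0}
counts_b = {"there": 0, "is": 0, "a": 1, "red": 1, "ball": 1,
            "in": 0, "white": 1, "box": 1, "the": 1, "contains": 1}

dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
norm_b = math.sqrt(sum(v * v for v in counts_b.values()))

print(round(dot / (norm_a * norm_b), 2))   # 0.68
```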


Outline

1. Motivation
2. Methodology
3. Experiments
4. Results
5. Conclusions

Conclusions

● The system accomplishes its task.
● Thresholding and temporal reranking have improved performance.
● Center Bias does not necessarily improve performance.
● Saliency Maps have improved performance.
● Parameters are not independent when measuring with A-MRR.
● Good baseline for further research.
