Time-Sensitive Egocentric Image Retrieval for Finding Objects in Lifelogs

Author: Cristian Reyes
Advisors: Xavier Giró, Eva Mohedano, Kevin McGuinness

14th July 2016


Outline

1. Motivation
2. Methodology
3. Experiments
4. Results
5. Conclusions

Lifelogging

Extract value from this new data



Can’t find my phone

Motivation


Hundreds of images!

Review

Egocentric cameras may help

Last time seen: at the CAFE

Task

Goal: Retrieve a useful image to find the object.


How: Exploiting visual and temporal information.

Outline

1. Motivation
2. Methodology
3. Experiments
4. Results
5. Conclusions

System Overview


Visual Ranking - Descriptors

Convolutional Neural Networks

A CNN architecture (VGG-16) is used for feature extraction, taking the activations of its conv5_1 layer as local features.

Eva Mohedano, Amaia Salvador, Kevin McGuinness, Ferran Marques, Noel E. O'Connor, and Xavier Giro-i Nieto. Bags of local convolutional features for scalable instance search. In Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR). ACM, 2016.
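A minimal sketch of the descriptor extraction step, assuming PyTorch and torchvision's pretrained VGG-16; the thesis pipeline may use a different framework, and the input resolution chosen here is only illustrative.

```python
# Sketch: extracting conv5_1 local features from VGG-16 (torchvision),
# approximating the descriptor extraction step described above.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

# In torchvision's vgg16.features, index 24 is the conv5_1 convolution;
# we truncate the network right after its ReLU (index 25).
conv5_1 = torch.nn.Sequential(*list(vgg.children())[:26])

preprocess = T.Compose([
    T.Resize((336, 448)),                      # keep some spatial resolution (assumed size)
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],    # ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])

def extract_local_features(image_path):
    """Return an (N, 512) tensor of local conv5_1 activations for one image."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        fmap = conv5_1(img)                    # (1, 512, H', W')
    return fmap.squeeze(0).flatten(1).T        # one 512-d vector per spatial location
```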

Visual Ranking - Descriptors

Bag of Words

Eva Mohedano, Amaia Salvador, Kevin McGuinness, Ferran Marques, Noel E. O'Connor, and Xavier Giro-i Nieto. Bags of local convolutional features for scalable instance search. In Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR). ACM, 2016.
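A minimal sketch of how an image's local features could be pooled into a bag-of-visual-words histogram once a codebook is available; the hard assignment and L2 normalization shown here are common choices, not necessarily the exact ones used in the thesis.

```python
# Sketch: encoding an image's local CNN features as a bag-of-visual-words
# histogram, assuming a codebook (k-means centroids) is already trained.
import numpy as np

def bow_encode(local_features, centroids):
    """local_features: (N, D) conv5_1 vectors; centroids: (K, D) visual words.
    Returns an L2-normalized K-dim histogram of visual-word assignments."""
    # Squared distances between every local feature and every centroid.
    d2 = (np.sum(local_features**2, axis=1, keepdims=True)
          - 2.0 * local_features @ centroids.T
          + np.sum(centroids**2, axis=1))
    words = np.argmin(d2, axis=1)                      # hard assignment
    hist = np.bincount(words, minlength=len(centroids)).astype(np.float32)
    return hist / (np.linalg.norm(hist) + 1e-12)

def cosine_similarity(a, b):
    """Similarity between two BoW histograms (both already L2-normalized)."""
    return float(np.dot(a, b))
```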

Visual Ranking - Queries

5 visual examples


Visual Ranking - Queries

3 masking strategies (sketched in code below)

Full Image (FI)

Hard Bounding Box (HBB)

Soft Bounding Box (SBB)
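A minimal sketch of the three query masking strategies as weights over the conv5_1 grid, assuming the query bounding box is already mapped to feature-map coordinates; the Gaussian blur width for the soft mask is an assumption.

```python
# Sketch: the three query masking strategies as per-location weights on the
# conv5_1 grid (H' x W'). The blur width for the soft mask is an assumption.
import numpy as np
from scipy.ndimage import gaussian_filter

def query_mask(h, w, bbox=None, mode="FI", sigma=3.0):
    """bbox = (row0, row1, col0, col1) in feature-map coordinates."""
    if mode == "FI" or bbox is None:         # Full Image: keep everything
        return np.ones((h, w), dtype=np.float32)
    r0, r1, c0, c1 = bbox
    mask = np.zeros((h, w), dtype=np.float32)
    mask[r0:r1, c0:c1] = 1.0                 # Hard Bounding Box
    if mode == "SBB":                        # Soft Bounding Box: blurred edges
        mask = gaussian_filter(mask, sigma=sigma)
        mask /= mask.max() + 1e-12
    return mask

# The mask can then weight each local feature's contribution to the BoW
# histogram, e.g. accumulate mask[i, j] instead of 1 for the word at (i, j).
```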


Visual Ranking - Target

3 masking strategies (see the sketch below)

Full Image (FI)

Center Bias (CB)

Saliency Mask (SM)

Saliency Map

Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, and Xavier Giro-i Nieto. Shallow and deep convolutional networks for saliency prediction. CVPR 2016.
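A minimal sketch of the Center Bias target weighting as a 2D Gaussian over the feature grid; a saliency map predicted by a model such as the one cited above could be resized and used as a weight map in the same way. The Gaussian spread is an assumption.

```python
# Sketch: center-bias weighting as a separable 2D Gaussian over the feature
# grid; a predicted saliency map could be resized and used the same way.
import numpy as np

def center_bias(h, w, sigma_frac=0.25):
    """2D Gaussian centered on the image, sigma as a fraction of each side."""
    ys = np.exp(-0.5 * ((np.arange(h) - (h - 1) / 2) / (sigma_frac * h)) ** 2)
    xs = np.exp(-0.5 * ((np.arange(w) - (w - 1) / 2) / (sigma_frac * w)) ** 2)
    return np.outer(ys, xs)   # (h, w) weights, highest at the center
```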

System Overview


Candidate Selection

2 thresholding strategies:

Absolute: Threshold on Visual Similarity Scores (TVSS)

Adaptive: Nearest Neighbor Distance Ratio (NNDR)

Parameters are learnt from the training data.

D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.
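A minimal sketch of the two candidate selection strategies; the threshold and ratio values below are placeholders, since the actual parameters are learnt on the training days, and the NNDR formulation here is one plausible adaptation of Lowe's ratio test to a ranking of similarity scores.

```python
# Sketch: the two thresholding strategies for candidate selection.
# Exact formulations and thresholds are learnt in the thesis; the values
# here are placeholders.

def tvss(scores, threshold=0.2):
    """Absolute: keep ranks whose visual similarity exceeds a fixed value."""
    return [i for i, s in enumerate(scores) if s >= threshold]

def nndr(scores, ratio=0.8):
    """Adaptive: keep ranks whose score is close enough to the best score,
    in the spirit of Lowe's nearest-neighbor distance ratio test."""
    best = max(scores)
    return [i for i, s in enumerate(scores) if s >= ratio * best]

# Example: scores of a visual ranking, best first.
candidates = nndr([0.91, 0.85, 0.40, 0.12], ratio=0.8)   # -> [0, 1]
```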

System Overview


Temporal-Aware Reranking

Time-stamp sorting

Redundancy


Temporal Diversity


Effect of Diversity

New locations introduced
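A minimal sketch of the temporal reranking idea: candidates surviving the threshold are sorted by time stamp and thinned so that several near-simultaneous images of the same moment do not crowd out other locations; the 10-minute window is an assumed value for illustration.

```python
# Sketch: temporal-aware reranking. Candidates are sorted by capture time
# (most recent first) and thinned so near-duplicate moments are not repeated.
from datetime import timedelta

def temporal_rerank(candidates, window=timedelta(minutes=10)):
    """candidates: list of (image_id, timestamp). Returns a diverse,
    time-sorted list, keeping one image per temporal window."""
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
    kept, last_time = [], None
    for image_id, t in ordered:
        if last_time is None or (last_time - t) > window:
            kept.append((image_id, t))       # new moment: keep it
            last_time = t
    return kept
```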


Outline

1. Motivation
2. Methodology
3. Experiments
4. Results
5. Conclusions

Compare different configurations

1. Dataset of images

2. Queries

3. Annotations

4. Metric

Evaluate quantitatively


Dataset

EDUB [1]
● 4912 images
● 4 users
● 2 days/user
● Narrative Clip 1
General behavior

NTCIR-Lifelog [2]
● 88185 images
● 3 users
● 30 days/user
● Autographer
Wide-angle lens

1. Marc Bolaños and Petia Radeva. Ego-object discovery.
2. C. Gurrin, H. Joho, F. Hopfgartner, L. Zhou, and R. Albatal. NTCIR Lifelog: The first test collection for lifelog research.

Dataset - Query Definition

Query examples from EDUB and NTCIR-Lifelog

Annotation Strategy

1. Annotate the 3 last occurrences
→ The system is expected to find all 3

2. Annotate only the last occurrence

3. Annotate the whole scene of the last occurrence
→ Neighbor images may also be helpful

Metric

Mean Average Precision (mAP): considers all relevant images
Mean Reciprocal Rank (MRR): considers only the best-ranked relevant image
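A minimal sketch of the two measures for a single ranked list; mAP and MRR are their means over all queries.

```python
# Sketch: the two evaluation measures on one ranked list of image ids.
# AP looks at all relevant images; RR only at the best-ranked relevant one.

def average_precision(ranking, relevant):
    hits, precisions = 0, []
    for rank, image_id in enumerate(ranking, start=1):
        if image_id in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def reciprocal_rank(ranking, relevant):
    for rank, image_id in enumerate(ranking, start=1):
        if image_id in relevant:
            return 1.0 / rank
    return 0.0

# mAP / MRR are the means of these values over all queries.
```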

Training

TRAINING: 9 days / TEST: 15 days

Training - Codebook

~13,500 images
~18,000,000 feature vectors
25,000 clusters
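A minimal sketch of the codebook training step, assuming scikit-learn's MiniBatchKMeans over the stacked local descriptors; the actual clustering implementation used in the thesis may differ.

```python
# Sketch: training the 25,000-word visual codebook over the local conv5_1
# descriptors of the training images, using mini-batch k-means for scale.
from sklearn.cluster import MiniBatchKMeans

def train_codebook(descriptors, n_words=25000):
    """descriptors: (N, 512) array stacking local features of ~13,500 images."""
    kmeans = MiniBatchKMeans(n_clusters=n_words, batch_size=10000,
                             n_init=3, random_state=0)
    kmeans.fit(descriptors)
    return kmeans.cluster_centers_            # (n_words, 512) visual words

# Usage: centroids = train_codebook(all_local_features)
```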


Training - Thresholds

Threshold tuning for Time-Stamp Sorting and Interleaving, with Full Image, Center Bias, and Saliency Mask targets: same optimal thresholds in all cases.

Parameters summary

Outline

1. Motivation
2. Methodology
3. Experiments
4. Results
5. Conclusions

Discussion & Conclusions


Query approach comparison: Full Image (FI), Soft Bounding Box (SBB), Hard Bounding Box (HBB)

Discussion & Conclusions


Target approach comparison: Full Image, Saliency Mask, Center Bias

Discussion & Conclusions


Candidate selection with time-stamp reordering: Adaptive (NNDR) vs. Absolute (TVSS)

Discussion & Conclusions


Candidate selection with interleaving: Adaptive (NNDR) vs. Absolute (TVSS)

Discussion & Conclusions

The system helps the user

Discussion & Conclusions

Diversity helps

Discussion & Conclusions

Objects are not always in the center

Discussion & Conclusions

Saliency Maps do not help here

But they do here

Discussion & Conclusions

Parameters are not independent.


Lifelogging Tools and Applications

Acknowledgements

Financial Support

Noel E. O'Connor, Cathal Gurrin, Albert Gil


Using Saliency Mask for target:
f(Q)            NNDR     TVSS     NNDR + I   TVSS + I
Saliency Mask   0.201    0.217    0.210      0.226

Using Full Image for target:
f(Q)            NNDR     TVSS     NNDR + I   TVSS + I
Saliency Mask   0.176    0.271    0.190      0.279

Using Center Bias for target:
f(Q)            NNDR     TVSS     NNDR + I   TVSS + I
Saliency Mask   0.162    0.198    0.173      0.213

Visual Ranking - Descriptors

Bag of Words: an example using text

A) There is a red ball in a white box.
B) The red box contains a white ball.

Word        Sentence A   Sentence B
there       1            0
is          1            0
a           2            1
red         1            1
ball        1            1
in          1            0
white       1            1
box         1            1
the         0            1
contains    0            1

Cosine Similarity Score = 0.68
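A minimal sketch that reproduces the 0.68 cosine similarity directly from the word counts in the table above.

```python
# Sketch: reproducing the cosine similarity of the two sentences from their
# word-count vectors in the table above.
import math

counts_a = {"there": 1, "is": 1, "a": 2, "red": 1, "ball": 1,
            "in": 1, "white": 1, "box": 1, "the": 0, "contains": 0}
counts_b = {"there": 0, "is": 0, "a": 1, "red": 1, "ball": 1,
            "in": 0, "white": 1, "box": 1, "the": 1, "contains": 1}

dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
norm_b = math.sqrt(sum(v * v for v in counts_b.values()))

print(round(dot / (norm_a * norm_b), 2))   # 0.68
```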


Outline

1. Motivation
2. Methodology
3. Experiments
4. Results
5. Conclusions

Conclusions

● The system accomplishes its task.
● Thresholding and temporal reranking have improved performance.
● Center Bias does not necessarily improve performance.
● Saliency Maps have improved performance.
● Parameters are not independent when measuring with A-MRR.
● Good baseline for further research.
