75
1 Vincent Lepetit University of Bordeaux, France & TU Graz, Austria Deep Learning for 3D Localization

Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

1!

!

Vincent Lepetit !

University of Bordeaux, France & TU Graz, Austria!

Deep Learning for 3D Localization !

Page 2: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

2!

•  3D object detection from color images; !!

•  Accurate geolocalization without registered images. !

Page 3: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

3!

BB8: 3D Pose Without Using Depth !

BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth. Mahdi Rad, Vincent Lepetit, ICCV 2017.!

Page 4: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

4!

Predicting the 3D Pose given a 2D Location of the object !

(Exponential

Map / quaternion /

rotation matrix,

Translation)

CNN

Solution #1: Directly predicting the pose!

Page 5: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

5!

Predicting 2D locations from an image is an easier regression task; !

We can compute the 3D pose from these 2D locations. !

We Can Do Better !

(the 2D

projections of

the 8 corners of

the 3D bounding

box)

CNN

Page 6: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

6!

à we can compute the 3D pose using a PnP algorithm. !

Getting the 3D Pose !

Page 7: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

7!

Training Set!Split the LINEMOD data: 15% of images for training, 85% for testing. !

!

Augmentation (200,000 images in total):!

extraction from real image !

random scaling ! random background !

random translation !

Other examples:!

Page 8: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

8!

CNN !

handle only 1 object, !OR !

VGG + retraining of the fully connected layers + fine-tuning of the last convolutional layers, can handle all 15 objects of the LINEMOD dataset. !

Page 9: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

9!

Direct Pose Estimation VS BB8 !

Direct Pose BB8

Average 64.9 85.4

Ape* 91.2 96.2

Bench Vise 61.3 80.2

Camera 43.1 82.8

Can 62.5 85.8

Cat* 93.1 97.2

Driller* 46.5 77.6

Duck 67.9 84.6

Egg Box 68.2 90.1

Glue 69.3 93.5

Hole Puncher 78.2 91.7

Iron 64.5 79.0

Lamp 50.4 79.9

Phone 46.9 80.0

2D projection metric !!

Page 10: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

10!

Refining the Pose !

input image!

3D model rendered from the current pose estimate !

CNN2 (2D displacements

improving the

current estimates

of the corners'

projections)

Page 11: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

11!

Refining the Pose !BB8 BB8 +

Refinement

Average 85.4 91.7

Ape 96.2 97.6

Bench Vise 80.2 92.0

Camera 82.8 88.3

Can 85.8 93.7

Cat 97.2 98.7

Driller 77.6 83.4

Duck 84.6 94.1

Egg Box 90.1 93.4

Glue 93.5 96.0

Hole Puncher 91.7 97.4

Iron 79.0 85.2

Lamp 79.9 83.8

Phone 80.0 88.8

Page 12: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

12!

Others [Brachmann16] VS Ours!

Metric 2D Projection 6D Pose 5cm and 5°

Sequence others ours others ours others ours

Average 73.7 89.4 50.2 62.8 40.6 69.0

Ape 85.2 96.5 33.2 40.2 34.4 80.0

Bench Vise 67.9 91.0 64.8 92.0 40.6 81.8

Camera 58.7 86.2 38.4 56.2 30.5 60.3

Can 70.8 92.1 62.9 64.6 48.4 77.1

Cat 84.2 98.7 42.7 62.3 34.6 79.6

Driller 73.9 80.7 61.9 74.1 54.5 69.3

Duck 73.1 92.4 30.2 44.8 22.0 53.6

Egg Box 83.1 91.1 49.9 58.1 57.1 81.3

Glue 74.2 92.5 31.2 41.6 23.6 54.2

Hole

Puncher

78.9 95.1 52.8 67.1 47.3 73.1

Iron 83.6 85.0 80.0 84.9 58.7 61.3

Lamp 64.0 75.5 67.0 76.3 49.3 67.4

Phone 60.6 85.1 38.1 53.9 26.8 58.4

Page 13: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

13!

Robustness to Partial Occlusion !

When generating training images, we randomly superimpose

objects from other sequences to the target object to be robust to

occlusion: !

Page 14: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

14!

Robustness to Partial Occlusion !

Page 15: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

15!

(Almost) Symmetric Objects !

Page 16: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

16!

T-Less Dataset [Hodan et al]: !(Almost) Symmetric Objects !

Page 17: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

17!

What Is the Problem?!

The network is trained to predict very different poses for very similar images, or exactly the same images if the

object is perfectly symmetrical. !

image space !pose space !

Page 18: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

18!

Solution (1)!Let β be the angle of symmetry of the object: !

1.  We train a regressor (Convolutional Neural Network in practice) to predict the projection of the 3D bounding box

as in our previous method BUT ONLY on a restricted

range: [0°, β/2].!

!

β = 180° in this object !

0° …! β/2 !β/4 … !

à no more ambiguities!

Page 19: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

19!

Solution (2)!

2. To handle larger ranges, we train a classifier (also a Convolutional Neural Network) to tell if the pose is between [0°,

β/2] or between [β/2, β]: !

β/4! 3β/4!

Page 20: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

20!

Solution (3)!At run-time, if the pose is between [β/2, β] (as given by the classifier), we flip the image before applying the regressor: !

β/4! 3β/4!

Page 21: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

21!

Solution (4)!(still at run-time), if we flipped the image before applying the regressor, we flip the corners of the bounding box predicted

by the regressor:!

!

!

!

!

!

!

!!

!

Page 22: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

22!

Page 23: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

23!

Robustness to Light Changes: Adaptive Local Contrast Normalization (ALCN)!

27

ALCN: Adaptive Local Contrast Normalization for Robust Object Detection and 3D Pose Estimation. Mahdi Rad, Peter Roth, Vincent Lepetit, BMVC 2017. !

Training Sequence! Test Sequence!

Page 24: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

24!

Existing Illumination Normalization Methods !

!

Difference-of-Gaussians!

Page 25: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

25!

Difference-of-Gaussians !

!!"#

= !!!!! ∗ ! − !!!!!! ∗ !!

Observation: The parameters should be carefully tuned for optimal performance !

Page 26: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

26!

Idea: The Parameters Should Adapt to the Input Image !

!!"#$

= ! !! ! .!!!!"#$

!

!!!

∗ !!

BUT we cannot train this CNN in a standard supervised manner.!

* I Normalizer

f =

f .

.

.

N

f 2

f 1

+ ALCN

I

Page 27: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

27!

Training !

Solution: train jointly in a supervised way together with another CNN !

* Image

window

Normalizer

f

Detector

g

+ + + = BG

Obj. #1

Obj. #N

f . . . N f 2 f 1

Page 28: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

28!

Training !

We augment the Phos dataset:!

with synthetic images: !

Page 29: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

29!

Comparing with Existing Methods !

!

Difference-of-Gaussians!

ALCN [our method] using 1 real image

of a target object!

Page 30: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

30!

Predicted Filters for Different Input !

(a)

35

Page 31: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

31!

Predicted Filters for different input !

(b)

35

Page 32: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

32!

Predicted Filters for different input !

(c)

35

Page 33: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

33!

Predicted Filters for different input !

(d)

35

Page 34: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

34!

Color Image Normalization!

abcolor

RGBimage

lightness Normalizedlightness

Normalizedcolor image

Page 35: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

35!

Explicit Normalization vs. !Illumination Robustness with Deep Learning !

!

AUC

Shallow 0.401

ALCN + Shallow 0.787

VGG 0.606

ResNet (20 Layers) 0.456

ResNet (32 Layers) 0.498

ResNet (44 Layers) 0.518

ResNet (56 Layers) 0.589

ResNet (110 Layers) 0.565

Page 36: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

36!

Evaluation !

w/o with illumination changes

sequence #1 #2 #3 #4 #5 #6 #7 #8

BB8 100 47.3 18.3 32.7 0.00 0.00 0.00 0.00

BB8 + ALCN 100 77.8 60.7 70.7 64.1 51.4 76.2 50.1

Green: Ground Truth Red: BB8

Blue: BB8 + ALCN

Page 37: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

37!

Accurate Localization from Images !Accurate Camera Registration in Urban Environments Using High-Level Feature Matching, Anil Armagan, Martin Hirzer, Peter Roth, Vincent Lepetit, BMVC 2017. !

Page 38: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

38!

reprojection of the 2D map using the pose from

the sensors !

Page 39: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

39!

Use registered images? Very cumbersome !!

Idea: Use 2.5D maps from OpenStreetMap!

Page 40: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

40!

In 2.5D maps, buildings are modeled as 2D polygons + height !

We Want to Use 2.5D Maps !

Page 41: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

41!

- sensors give accurate angles wrt gravity: !à we can work on rectified images.!

!

!

!

!

!

!

!

!

!

- we assume we know the altitude of the camera; !

!

Remain 3 degrees-of-freedom: !

2D translation + rotation along the ground plane !

Original image ! Rectified image!

Page 42: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

42!

?!2.5D map around the GPS location!

Page 43: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

43!

Matching High-Level Features !

Page 44: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

44!

Matching High-Level Features !

3 correspondences between building corners in the image and in the

2D map à pose (2d translation + angle) !

+ RANSAC!

Page 45: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

45!

Matching High-Level Features !

Page 46: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

46!

Matching High-Level Features !

1 correspondence between a façade in the image and a façade in the

2D map à pose (2d translation + angle) !

+ RANSAC!

Page 47: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

47!

Semantic Segmentation and !Façades' Normals Estimation !

Semantic Segmentation! Façades' Normal Estimation!

CNN1 CNN2

Page 48: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

48!

Creating Training Data !

3D Tracker!

with human supervision !

Image Sequences!

3D Model from 2.5D Map!

Façade, Corners, and Normals Labels !

Page 49: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

49!

The segmentation is robust to (limited) occlusions. !

Segmentation Results !

Page 50: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

50!

Locating the Buildings' Corners and Façades !

x-coordinates for the building corners !min and max x-coordinates for the façades !

orientation (1 angle) for the façades!

Page 51: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

51!

Selecting the Best Hypothesis: Maximum Likelihood!

maxpose

X

x

logPclass(Render(pose),x)(x)

Page 52: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

52!

Some Results !

Page 53: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

53!

Some More Results !

Page 54: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

54!

Efficient 3D Tracking in Urban Environments with Semantic

Segmentation, Martin Hirzer, Clemens Arth, Peter Roth, Vincent Lepetit,

BMVC 2017. !

Page 55: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

56!

Thanks for listening! !!

Questions? !!

Page 56: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

57!

Thanks for listening! !!

Questions? !!

Page 57: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

58!

Training Set!Split the LINEMOD data: 15% of images for training, 85% for testing. !

!

Augment the training set: !

1.  Extract the object from a training image; !

2.  Scale the segmented object; !

3.  Change the background with a random image from ImageNet;!

4.  Shift the object by some pixels.!

!

We generate 200,000 training images !

Page 58: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

59!

Results !

!

•  Rectangular cuboid: !

!

Previous method Ours

Green: Ground truth - Blue: Estimated Pose

Page 59: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

60!

Implementation !

Train two CNNs:!1.  Train a regressor using training images

between range of [0° - α, β/2 + α] !–  β is the symmetric angle (e.g. β = 180° for

rectangular cuboid.!

–  α << β!

–  α helps to have more accurate results for 0° and β. !

2.  Train a classifier to predict if the angle is between 0° and β/2 (class 1), or between β/2 and β (class 2). !

–  If it is detected as class 2, we rotate the image by β degree, then apply the regressor to predict BB. !

α

α

ω

1

1

2

2

β/2

Page 60: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

61!

Implementation !

Examples: !

!

!

β

Rectangular cuboid: β = 180°

Cylindrical: β = 0°, α = 0°

Page 61: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

62!

(the 2D projections of the 8

corners of the 3D bounding box)

kfΘ(

prediction for the i-th corner (2 values) !

network parameters !

Θ̂ = argminX

(W,p)∈TrainingSet

8X

i=1

kfΘ(W )[i]� pik2

CNN !

kfΘ(W )[

projection of the i-th corner !

Page 62: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

63!

But Before That, We Need to Find the Object in 2D !

W

Detector !

Page 63: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

64!

1. We split the input image into regions of size 128x128:!

128

128

384

512

Page 64: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

65!

2. We segment each region as an 8x8 !binary mask.!Each block of the masks corresponds to a 16x16 image window!

8

8

Page 65: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

66!

3. We only keep the largest component. !

Page 66: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

67!

4. We segment each active block again. !Decreases the uncertainty from 16px to 4px !

67

Page 67: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

68!

5. We use the centroid of the segmentation as the center of window W

68

Page 68: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

69!

Robustness to Light Changes: !Adaptive Local Contrast Normalization !

Normalization to illumination model: !

Page 69: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

70!

Comparing with Existing Methods !

!

37

Page 70: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

71!

From Image Window Normalization !to Whole Image Normalization !

!

!

!

!! ! !" = !!!(!!")!

Image I

!!"#$

= ! !!!!"#$ ∗ ! ! !∘ !

!

!!!

!!

!!"!"#$

= ! !! !!" .!!!!"#$

!

!!!

∗ !!" !Iij

Fk !! ! !" !

F(I) Normalizer

f

Page 71: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

72!

Image

I

Normalizer

f

F N

*

F 2

F 1

I

I

I

*

*

+

+

+

I

∘!

∘!

∘!

N ALCN

2 ALCN

1 ALCN

ALCN

From Image Window Normalization !to Whole Image Normalization !

40

Page 72: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

73!

Robustness to Light Changes: !Adaptive Local Contrast Normalization !

How we can detect the target object:!

•  Under challenging illumination conditions !

•  From very few training samples !

!

Example: !

28

Page 73: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

74!

Semantic Segmentation !

Page 74: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

76!

SegNet Video !

Page 75: Deep Learning for 3D Localization - University of Oxfordseminars/seminars/Extra/2017_09_05_Vincen… · Ape* 91.2 96.2 Bench Vise 61.3 80.2 Camera 43.1 82.8 Can 62.5 85.8 Cat* 93.1

85!

Images with Various Illuminations After Filtering !