Deep Learning for 3D Localization

Vincent Lepetit

University of Bordeaux, France & TU Graz, Austria

2

•  3D object detection from color images;

•  Accurate geolocalization without registered images.

3

BB8: 3D Pose Without Using Depth

BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth. Mahdi Rad, Vincent Lepetit, ICCV 2017.

4

Predicting the 3D Pose Given a 2D Location of the Object

Solution #1: directly predicting the pose. A CNN outputs the pose directly (rotation as an exponential map / quaternion / rotation matrix, plus a translation).

5

We Can Do Better

Instead, the CNN predicts the 2D projections of the 8 corners of the object's 3D bounding box:

Predicting 2D locations from an image is an easier regression task;

we can compute the 3D pose from these 2D locations.

6

Getting the 3D Pose

→ we can compute the 3D pose from the predicted corner projections using a PnP algorithm. A minimal sketch of this step is given below.
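For reference, a minimal sketch of this pose recovery with OpenCV's PnP solver (illustrative code, not the authors' implementation):

```python
import numpy as np
import cv2

def pose_from_corners(corners_2d, bbox_3d, K):
    """corners_2d: (8, 2) predicted projections of the bounding box corners.
    bbox_3d: (8, 3) corners of the object's 3D bounding box, object coordinates.
    K: (3, 3) camera intrinsic matrix."""
    ok, rvec, tvec = cv2.solvePnP(bbox_3d.astype(np.float64),
                                  corners_2d.astype(np.float64),
                                  K, None, flags=cv2.SOLVEPNP_ITERATIVE)
    assert ok, "PnP failed"
    R, _ = cv2.Rodrigues(rvec)  # axis-angle to rotation matrix
    return R, tvec              # object pose in camera coordinates
```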

7

Training Set

Split the LINEMOD data: 15% of images for training, 85% for testing.

Augmentation (200,000 images in total): extraction from a real image, random scaling, random background, random translation.

Other examples:

8

CNN: either a network that handles only 1 object, OR VGG + retraining of the fully connected layers + fine-tuning of the last convolutional layers, which can handle all 15 objects of the LINEMOD dataset.

9

Direct Pose Estimation vs. BB8 (2D projection metric)

Object         Direct Pose   BB8
Average        64.9          85.4
Ape*           91.2          96.2
Bench Vise     61.3          80.2
Camera         43.1          82.8
Can            62.5          85.8
Cat*           93.1          97.2
Driller*       46.5          77.6
Duck           67.9          84.6
Egg Box        68.2          90.1
Glue           69.3          93.5
Hole Puncher   78.2          91.7
Iron           64.5          79.0
Lamp           50.4          79.9
Phone          46.9          80.0

10

Refining the Pose

A second CNN takes the input image and the 3D model rendered from the current pose estimate, and predicts 2D displacements that improve the current estimates of the corners' projections.

11

Refining the Pose

Object         BB8    BB8 + Refinement
Average        85.4   91.7
Ape            96.2   97.6
Bench Vise     80.2   92.0
Camera         82.8   88.3
Can            85.8   93.7
Cat            97.2   98.7
Driller        77.6   83.4
Duck           84.6   94.1
Egg Box        90.1   93.4
Glue           93.5   96.0
Hole Puncher   91.7   97.4
Iron           79.0   85.2
Lamp           79.9   83.8
Phone          80.0   88.8

12

Others [Brachmann16] vs. Ours

Metric:        2D Projection     6D Pose           5cm and 5°
Sequence       others   ours     others   ours     others   ours
Average        73.7     89.4     50.2     62.8     40.6     69.0
Ape            85.2     96.5     33.2     40.2     34.4     80.0
Bench Vise     67.9     91.0     64.8     92.0     40.6     81.8
Camera         58.7     86.2     38.4     56.2     30.5     60.3
Can            70.8     92.1     62.9     64.6     48.4     77.1
Cat            84.2     98.7     42.7     62.3     34.6     79.6
Driller        73.9     80.7     61.9     74.1     54.5     69.3
Duck           73.1     92.4     30.2     44.8     22.0     53.6
Egg Box        83.1     91.1     49.9     58.1     57.1     81.3
Glue           74.2     92.5     31.2     41.6     23.6     54.2
Hole Puncher   78.9     95.1     52.8     67.1     47.3     73.1
Iron           83.6     85.0     80.0     84.9     58.7     61.3
Lamp           64.0     75.5     67.0     76.3     49.3     67.4
Phone          60.6     85.1     38.1     53.9     26.8     58.4

13

Robustness to Partial Occlusion

When generating training images, we randomly superimpose objects from other sequences onto the target object, to be robust to occlusion:

14

Robustness to Partial Occlusion

15

(Almost) Symmetric Objects

16

T-LESS Dataset [Hodan et al.]: (Almost) Symmetric Objects

17

What Is the Problem?

The network is trained to predict very different poses for very similar images, or for exactly the same image if the object is perfectly symmetrical.

[figure: ambiguous mapping from image space to pose space]

18

Solution (1)

Let β be the angle of symmetry of the object.

1.  We train a regressor (a Convolutional Neural Network in practice) to predict the projection of the 3D bounding box as in our previous method, BUT ONLY on a restricted range of rotations: [0°, β/2].

β = 180° for this object.

→ no more ambiguities

19

Solution (2)

2.  To handle larger ranges, we train a classifier (also a Convolutional Neural Network) to tell if the pose is in [0°, β/2] or in [β/2, β]:

20

Solution (3)

At run-time, if the pose is in [β/2, β] (as given by the classifier), we flip the image before applying the regressor:

21

Solution (4)

Still at run-time: if we flipped the image before applying the regressor, we also flip the corners of the bounding box predicted by the regressor. A run-time sketch follows.
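A minimal run-time sketch of this symmetry handling, assuming a trained `range_classifier` and `corner_regressor` (hypothetical callables standing in for the two CNNs):

```python
import numpy as np

def predict_corners_symmetric(window, range_classifier, corner_regressor):
    """window: (H, W, 3) image window centered on the object."""
    flipped = range_classifier(window) == 1  # class 1: pose in [beta/2, beta]
    if flipped:
        window = window[:, ::-1, :]          # mirror the image horizontally
    corners = corner_regressor(window)       # (8, 2) predicted 2D projections
    if flipped:
        # undo the mirroring on the predicted corners:
        corners[:, 0] = window.shape[1] - 1 - corners[:, 0]
    return corners
```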

22

23

Robustness to Light Changes: Adaptive Local Contrast Normalization (ALCN)

ALCN: Adaptive Local Contrast Normalization for Robust Object Detection and 3D Pose Estimation. Mahdi Rad, Peter Roth, Vincent Lepetit, BMVC 2017.

Training Sequence / Test Sequence

24

Existing Illumination Normalization Methods

Difference-of-Gaussians

25

Difference-of-Gaussians

$I_{\text{DoG}} = G_{\sigma_1} * I - G_{\sigma_2} * I$

where $G_\sigma$ is a Gaussian kernel of standard deviation $\sigma$ and $*$ denotes convolution.

Observation: the parameters $\sigma_1, \sigma_2$ should be carefully tuned for optimal performance.
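A short sketch of this standard normalization (not code from the talk); the two sigmas must be tuned per dataset:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_normalize(image, sigma1=1.0, sigma2=3.0):
    """image: 2D grayscale array; returns the band-pass filtered image."""
    image = image.astype(np.float32)
    return gaussian_filter(image, sigma1) - gaussian_filter(image, sigma2)
```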

26

Idea: The Parameters Should Adapt to the Input Image

$I_{\text{ALCN}} = \sum_{k=1}^{N} f_k(I) \, (F_k * I)$

where the $F_k$ are $N$ base filters and the weights $f_1(I), \dots, f_N(I)$ are predicted from the input window $I$ by a CNN, the Normalizer $f$.

BUT we cannot train this CNN in a standard supervised manner.
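A sketch of this forward pass (PyTorch; the filter bank and Normalizer architecture are illustrative assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ALCN(nn.Module):
    def __init__(self, base_filters, normalizer):
        super().__init__()
        # base_filters: (N, 1, k, k) filter bank; normalizer: CNN -> (B, N)
        self.register_buffer("base_filters", base_filters)
        self.normalizer = normalizer

    def forward(self, x):  # x: (B, 1, H, W) image windows
        w = self.normalizer(x)  # (B, N) predicted weights f_k(I)
        responses = F.conv2d(x, self.base_filters,
                             padding=self.base_filters.shape[-1] // 2)  # F_k * I
        return (w[:, :, None, None] * responses).sum(dim=1, keepdim=True)
```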

27

Training

Solution: train the Normalizer f jointly, in a supervised way, with another CNN, a Detector g. The Detector classifies each normalized image window as background (BG) or one of the objects (Obj. #1 … Obj. #N); the Normalizer is trained only through this classification loss.
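A minimal joint-training step under the same assumptions as the sketch above:

```python
import torch

def train_step(alcn, detector, optimizer, windows, labels):
    """windows: (B, 1, H, W) image windows; labels: (B,) in {BG, obj 1..N}."""
    optimizer.zero_grad()
    normalized = alcn(windows)     # adaptive normalization (Normalizer f)
    logits = detector(normalized)  # classification (Detector g)
    loss = torch.nn.functional.cross_entropy(logits, labels)
    loss.backward()                # gradients flow into both f and g
    optimizer.step()
    return loss.item()
```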

28

Training

We augment the Phos dataset with synthetic images:

29

Comparing with Existing Methods

Difference-of-Gaussians vs. ALCN [our method] using 1 real image of a target object

30

Predicted Filters for Different Inputs (a)

31

Predicted Filters for Different Inputs (b)

32

Predicted Filters for Different Inputs (c)

33

Predicted Filters for Different Inputs (d)

34

Color Image Normalization

[diagram: the RGB image is split into a lightness channel and color (ab) channels; the lightness channel is normalized and recombined with the color channels into a normalized color image]
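An illustrative sketch of this pipeline, assuming a Lab decomposition and some lightness normalizer `normalize` (e.g., ALCN):

```python
from skimage import color

def normalize_color_image(rgb, normalize):
    lab = color.rgb2lab(rgb)              # split lightness / color channels
    lab[..., 0] = normalize(lab[..., 0])  # normalize the lightness (L) only
    return color.lab2rgb(lab)             # recombine into a color image
```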

35

Explicit Normalization vs. Illumination Robustness with Deep Learning

Method               AUC
Shallow              0.401
ALCN + Shallow       0.787
VGG                  0.606
ResNet (20 layers)   0.456
ResNet (32 layers)   0.498
ResNet (44 layers)   0.518
ResNet (56 layers)   0.589
ResNet (110 layers)  0.565

36

Evaluation

Sequence #1 is without illumination changes; sequences #2-#8 are with illumination changes.

Sequence     #1    #2    #3    #4    #5    #6    #7    #8
BB8          100   47.3  18.3  32.7  0.00  0.00  0.00  0.00
BB8 + ALCN   100   77.8  60.7  70.7  64.1  51.4  76.2  50.1

Green: ground truth; Red: BB8; Blue: BB8 + ALCN.

37

Accurate Localization from Images

Accurate Camera Registration in Urban Environments Using High-Level Feature Matching, Anil Armagan, Martin Hirzer, Peter Roth, Vincent Lepetit, BMVC 2017.

38

Reprojection of the 2D map using the pose from the sensors.

39

Use registered images? Very cumbersome!

Idea: use 2.5D maps from OpenStreetMap.

40

We Want to Use 2.5D Maps

In 2.5D maps, buildings are modeled as 2D polygons + a height.

41

- The sensors give accurate angles with respect to gravity:
  → we can work on rectified images.

- We assume we know the altitude of the camera.

3 degrees of freedom remain: 2D translation + rotation in the ground plane.

Original image / Rectified image

42

2.5D map around the GPS location: where exactly is the camera?

43

Matching High-Level Features

44

Matching High-Level Features

3 correspondences between building corners in the image and in the 2D map → pose (2D translation + angle), with RANSAC over the putative correspondences. A sketch of the underlying geometry is given below.
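One way to make this concrete (my own illustrative formulation, not the paper's code): in a rectified image, a detected corner at column u gives a bearing angle atan((u - cx) / fx); three corner-to-map correspondences then constrain the 3-DoF pose, and RANSAC selects the best hypothesis over putative matches:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(pose, map_pts, bearings):
    """pose = (tx, ty, theta); map_pts: (n, 2) corner positions in the 2D map;
    bearings: (n,) observed bearing angles in the rectified image."""
    tx, ty, theta = pose
    d = map_pts - np.array([tx, ty])             # directions to map corners
    err = (np.arctan2(d[:, 1], d[:, 0]) - theta) - bearings
    return np.arctan2(np.sin(err), np.cos(err))  # wrap angles to [-pi, pi]

def pose_from_3_corners(map_pts, bearings, init=(0.0, 0.0, 0.0)):
    # 3 residuals, 3 unknowns: a minimal solver usable inside a RANSAC loop
    return least_squares(residuals, np.array(init), args=(map_pts, bearings)).x
```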

45

Matching High-Level Features

46

Matching High-Level Features

1 correspondence between a façade in the image and a façade in the 2D map → pose (2D translation + angle), again with RANSAC.

47

Semantic Segmentation and Façades' Normals Estimation

CNN1 performs the semantic segmentation; CNN2 estimates the façades' normals.

48

Creating Training Data

A 3D tracker, with human supervision, registers image sequences to the 3D model built from the 2.5D map; this provides façade, corner, and normal labels.

49

Segmentation Results

The segmentation is robust to (limited) occlusions.

50

Locating the Buildings' Corners and Façades

From the network outputs we get the x-coordinates of the building corners, the min and max x-coordinates of the façades, and the orientation (1 angle) of each façade.

51

Selecting the Best Hypothesis: Maximum Likelihood

$\max_{\text{pose}} \sum_{x} \log P_{\text{class}(\text{Render}(\text{pose}),\,x)}(x)$

where $\text{class}(\text{Render}(\text{pose}), x)$ is the label at pixel $x$ in the rendering of the 2.5D map under the candidate pose, and $P_c(x)$ is the probability for class $c$ at pixel $x$ predicted by the segmentation network.
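A sketch of this scoring step, assuming a hypothetical `render_labels(pose)` that rasterizes the 2.5D map into a per-pixel class map, and the (H, W, C) log-softmax output `log_probs` of the segmentation network:

```python
import numpy as np

def pose_log_likelihood(pose, log_probs, render_labels):
    classes = render_labels(pose)            # (H, W) integer class map
    h, w = classes.shape
    ys, xs = np.mgrid[0:h, 0:w]
    return log_probs[ys, xs, classes].sum()  # sum over pixels of log P_class(x)(x)

def select_best_pose(poses, log_probs, render_labels):
    # maximum-likelihood selection over the pose hypotheses
    return max(poses, key=lambda p: pose_log_likelihood(p, log_probs, render_labels))
```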

52

Some Results

53

Some More Results

54

Efficient 3D Tracking in Urban Environments with Semantic Segmentation, Martin Hirzer, Clemens Arth, Peter Roth, Vincent Lepetit, BMVC 2017.

56

Thanks for listening!

Questions?

58

Training Set

Split the LINEMOD data: 15% of images for training, 85% for testing.

Augment the training set (a small sketch of such a pipeline is given after this list):

1.  Extract the object from a training image;

2.  Scale the segmented object;

3.  Change the background to a random image from ImageNet;

4.  Shift the object by some pixels.

We generate 200,000 training images.
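An illustrative augmentation sketch following steps 1-4 (the object mask and helper names are assumptions, not the original code):

```python
import numpy as np
import cv2

def augment(image, mask, backgrounds, rng, max_shift=10):
    """image: (H, W, 3) training image; mask: (H, W) binary object mask;
    backgrounds: list of ImageNet images; rng: np.random.Generator."""
    s = rng.uniform(0.8, 1.2)                                   # 2. random scale
    image = cv2.resize(image, None, fx=s, fy=s)
    mask = cv2.resize(mask.astype(np.uint8), None, fx=s, fy=s) > 0
    h, w = mask.shape
    bg = cv2.resize(backgrounds[rng.integers(len(backgrounds))], (w, h))  # 3.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)    # 4. random shift
    out = bg.copy()
    ys, xs = np.nonzero(mask)                                   # 1. extracted object
    out[np.clip(ys + dy, 0, h - 1), np.clip(xs + dx, 0, w - 1)] = image[ys, xs]
    return out
```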

59

Results

•  Rectangular cuboid:

Previous method vs. Ours

Green: ground truth - Blue: estimated pose

60

Implementation

Train two CNNs:

1.  Train a regressor using training images in the range [0° − α, β/2 + α]:
    –  β is the angle of symmetry (e.g., β = 180° for a rectangular cuboid);
    –  α << β;
    –  α helps to obtain more accurate results at 0° and β.

2.  Train a classifier to predict if the angle is between 0° and β/2 (class 1), or between β/2 and β (class 2):
    –  if class 2 is detected, we rotate the image by β degrees, then apply the regressor to predict the bounding box.

61

Implementation

Examples:

Rectangular cuboid: β = 180°

Cylindrical: β = 0°, α = 0°

62

The CNN $f_\Theta$ predicts the 2D projections of the 8 corners of the 3D bounding box. Its parameters $\Theta$ are learned by minimizing

$\hat{\Theta} = \arg\min_{\Theta} \sum_{(W,\mathbf{p}) \in \text{TrainingSet}} \sum_{i=1}^{8} \left\| f_\Theta(W)[i] - \mathbf{p}_i \right\|^2$

where $f_\Theta(W)[i]$ is the prediction for the i-th corner (2 values) and $\mathbf{p}_i$ is the 2D projection of the i-th corner.
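A minimal PyTorch sketch of this objective (illustrative, not the original implementation):

```python
import torch

def corner_loss(net, windows, corners_gt):
    """windows: (B, 3, H, W) image windows; corners_gt: (B, 8, 2) projections."""
    pred = net(windows).view(-1, 8, 2)  # f_Theta(W)[i] for i = 1..8
    return ((pred - corners_gt) ** 2).sum(dim=(1, 2)).mean()
```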

63

But Before That, We Need to Find the Object in 2D

A detector provides the image window W centered on the object.

64

1.  We split the input image (512 × 384 in this example) into regions of size 128 × 128:

65

2.  We segment each region as an 8 × 8 binary mask; each block of the mask corresponds to a 16 × 16 image window.

66

3.  We only keep the largest component.

67

4.  We segment each active block again; this decreases the uncertainty from 16 px to 4 px.

68

5.  We use the centroid of the segmentation as the center of the window W. A sketch of this coarse-to-fine detection is given below.
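A coarse-to-fine detection sketch following steps 1-5; the two segmentation networks `segment_coarse` and `segment_fine` are hypothetical callables returning binary masks, and the image size is assumed divisible into 128 × 128 regions:

```python
import numpy as np
from scipy.ndimage import label

def detect_object_center(image, segment_coarse, segment_fine):
    H, W = image.shape[:2]
    # 1.-2. coarse 8x8 masks per 128x128 region; each mask cell covers 16x16 px:
    coarse = np.zeros((H // 16, W // 16), dtype=bool)
    for y in range(0, H, 128):
        for x in range(0, W, 128):
            coarse[y // 16:(y + 128) // 16, x // 16:(x + 128) // 16] = \
                segment_coarse(image[y:y + 128, x:x + 128])
    # 3. keep only the largest connected component:
    labels, n = label(coarse)
    if n == 0:
        return None
    largest = np.argmax(np.bincount(labels.ravel())[1:]) + 1
    coarse &= labels == largest
    # 4. re-segment each active 16x16 block at 4 px resolution:
    fine = np.zeros((H // 4, W // 4), dtype=bool)
    for cy, cx in zip(*np.nonzero(coarse)):
        y, x = cy * 16, cx * 16
        fine[y // 4:y // 4 + 4, x // 4:x // 4 + 4] = \
            segment_fine(image[y:y + 16, x:x + 16])
    # 5. centroid of the fine segmentation = center of the window W:
    ys, xs = np.nonzero(fine)
    return 4 * xs.mean(), 4 * ys.mean()
```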

69

Robustness to Light Changes: Adaptive Local Contrast Normalization

Normalization to an illumination model:

70

Comparing with Existing Methods

71

From Image Window Normalization to Whole Image Normalization

Per window: $I_{ij}^{\text{ALCN}} = \sum_{k=1}^{N} f_k(I_{ij}) \, (F_k * I_{ij})$

Per whole image: $I^{\text{ALCN}} = \sum_{k=1}^{N} (F_k * I) \circ W_k$, with per-pixel weight maps $W_k(I)_{ij} = f_k(I_{ij})$

where $\circ$ is the element-wise (Hadamard) product and $I_{ij}$ is the window centered on pixel $(i, j)$.

72

From Image Window Normalization to Whole Image Normalization

[diagram: the Normalizer f predicts N per-pixel weight maps from the image I; each map is multiplied element-wise (∘) with the response of I to the filters F_1 … F_N, and the N products are summed to give the ALCN-normalized image]
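A sketch of this whole-image variant under the same assumptions as before, with a fully convolutional Normalizer that outputs one weight map per base filter:

```python
import torch
import torch.nn.functional as F

def alcn_whole_image(image, base_filters, normalizer_fcn):
    """image: (B, 1, H, W); base_filters: (N, 1, k, k);
    normalizer_fcn: fully convolutional net -> (B, N, H, W) weight maps."""
    weights = normalizer_fcn(image)                            # W_k, per pixel
    responses = F.conv2d(image, base_filters,
                         padding=base_filters.shape[-1] // 2)  # F_k * I
    return (weights * responses).sum(dim=1, keepdim=True)      # sum of Hadamard products
```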

73

Robustness to Light Changes: Adaptive Local Contrast Normalization

How can we detect the target object:

•  under challenging illumination conditions,

•  from very few training samples?

Example:

74

Semantic Segmentation

76

SegNet Video

85

Images with Various Illuminations After Filtering
