© 2018 Toyota Research Institute. Public.
Beyond Supervised Driving
Adrien Gaidon (Twitter: @adnothing)
Machine Learning Lead & Senior Research Scientist
Toyota Research Institute (TRI), CA, USA
Agenda
● Why Beyond Supervised Driving
● Sim2Real adaptation: SPIGAN
● Self-Supervised Learning of Depth
"Crowdsourced steering doesn't sound quite as appealing as self driving"
Credit: https://xkcd.com/1897/
But, but, … supervised learning Just Works™ !
ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape
F. Manhardt, W. Kehl, A. Gaidon
https://arxiv.org/abs/1812.02781
Learning to Fuse Things and Stuff
J. Li, A. Raventos, A. Bhargava, T. Tagawa, A. Gaidon
https://arxiv.org/abs/1812.01192
"The revolution will not be supervised" -- A. Efros
Credit: Ed Olson, May Mobility
Drive With Data. Anywhere. Anytime.
100M Cars, 95% Parked
Amount of data per day: ~10s of PB
Machine Learning at Scale
[Chart: performance vs. amount of data — Deep Learning keeps improving with more data, while older algorithms plateau. Supervised Learning: learning with labeled data. Credit: supervise.ly]
Machine Learning at Scale: Beyond Supervised Learning
[Chart: performance vs. amount of data — Supervised & Self-Supervised Learning (learning with labeled and "structured" unlabeled data) continues improving beyond supervised Deep Learning and older algorithms. Inset: GPU-days of training compute, 2017–2019.]
Agenda
● Why Beyond Supervised Driving
● Sim2Real adaptation: SPIGAN
● Self-Supervised Learning of Depth
SPIGAN: Privileged Adversarial Learning from Simulation
Kuan-Hui Lee, German Ros, Jie Li, Adrien Gaidon
ICLR 2019 [arxiv]
Learning Using Simulator Privileged Information
Gaidon et al., "Virtual Worlds as Proxy for Multi-Object Tracking Analysis", CVPR'16
Ros et al, "The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes", CVPR'16
de Souza et al, "Procedural Generation of Videos to Train Deep Action Recognition Networks.", CVPR'17
Network Architecture
Minimax Learning Objective
adversarial loss
privileged regularization
perceptual regularization(self-regularization)
task loss(this is what we care about)
Least-squares Adversarial Loss
[equation omitted] where the two expectations are taken over the real-world and synthetic data distributions, respectively.
X. Mao et al., "Multi-class Generative Adversarial Networks with the L2 Loss Function", 2016.
J.-Y. Zhu et al., "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks", ICCV 2017.
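The least-squares formulation replaces the usual GAN log-loss with squared errors, as in LSGAN. A minimal numpy sketch of the two objectives (function names are illustrative, not from the paper; the real labels 1/0 are the standard LSGAN targets):

```python
import numpy as np

def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares discriminator loss: push D(real) -> 1, D(fake) -> 0."""
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)

def lsgan_generator_loss(d_fake):
    """Least-squares generator loss: push D(G(synthetic)) -> 1."""
    return 0.5 * np.mean((d_fake - 1.0) ** 2)
```

Both losses are zero exactly when the discriminator outputs hit their targets, which is what makes the squared error well-behaved compared to a saturating log-loss.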
Task Loss: Cross-Entropy
synthetic image → refined synthetic image ("fake")
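The task network is trained with a standard per-pixel cross-entropy on both the synthetic and the refined ("fake") images. A numpy sketch of the per-pixel loss (shapes and names are assumptions for illustration):

```python
import numpy as np

def cross_entropy_loss(logits, labels):
    """Per-pixel cross-entropy for semantic segmentation.
    logits: (H, W, C) raw class scores; labels: (H, W) integer class ids."""
    # Numerically stable log-softmax over the class dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    h, w = labels.shape
    # Pick the log-probability of the ground-truth class at each pixel.
    picked = log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -picked.mean()
```

With uniform logits over C classes the loss is log(C), a useful sanity check when debugging segmentation training.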
Privileged Loss & Regularization
❏ Privileged auxiliary task loss: prediction of depth (from z-buffer)
❏ Perceptual regularization: where Φ is a mapping from image space to a pre-determined feature space (we use VGG19 deep features)
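A minimal sketch of the two terms, assuming L1 distances (the paper's exact norms and weights may differ; function names are illustrative):

```python
import numpy as np

def privileged_depth_loss(pred_depth, zbuffer_depth):
    """Privileged auxiliary task: regress the simulator's z-buffer depth,
    available for free for every synthetic pixel."""
    return np.abs(pred_depth - zbuffer_depth).mean()

def perceptual_regularization(phi_input, phi_refined):
    """Self-regularization: keep the refined image close to the input
    synthetic image in a fixed feature space (phi = e.g. VGG19 activations)."""
    return np.abs(phi_input - phi_refined).mean()
```

The privileged term anchors the shared representation to geometry, while the perceptual term prevents the generator from drifting semantically away from its input.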
Experiments
❏ Datasets:
❏ [Sim] SYNTHIA: #train = 15k
❏ [Real] Cityscapes: #train = 3k, #test = 0.5k
❏ [Real] Vistas: #train = 16k, #test = 2k
❏ Settings:
❏ Task network T: FCN8s
❏ Standard resolution of 128×256
❏ 7 categories (flat, construction, object, nature, sky, human, and vehicle)
❏ Evaluation metric: mean intersection-over-union (mIoU)
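For reference, mIoU over the 7 categories can be computed as follows (a standard implementation sketch, not the authors' evaluation code):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union from flat integer label arrays.
    Classes absent from both prediction and ground truth are skipped."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```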
Experiments: Synthia → Cityscapes
Experiments: Synthia → Cityscapes+Vistas
Experiments: negative transfer?
Experiments: Synthia → Vistas
Hot off the press...
Exploring the Limitations of Behavior Cloning for Autonomous Driving
F. Codevilla, E. Santana, A. M. López, A. Gaidon
paper: https://arxiv.org/abs/1904.08980
code: https://github.com/felipecode/coiltraine
Agenda
● Why Beyond Supervised Driving
● Sim2Real adaptation: SPIGAN
● Self-Supervised Learning: SuperDepth
Self-Supervised Learning at Toyota-scale
● SuperDepth: Self-Supervised Monocular Depth
○ Exploit large volumes of unlabeled, structured camera data
○ Training only requires unlabeled driving video data!
● Why MonoDepth?
○ LiDAR: expensive, bulky
○ Cameras
■ Rich semantic and geometric sensing
■ Ubiquitous (2019 Toyota models)
Toyota Safety Sense 2.0 Camera
Supervised Learning
Raw Data → Model → Predictions
Target Values/Labels → Loss
(raw data: easy to acquire; labels: expensive/difficult to acquire)
Self-Supervised Learning
Raw Data → Model → Predictions
Prior Knowledge → Loss
(all inputs easy to acquire)
Monocular Depth Estimation
Single RGB Image → MonoDepth Network → Predicted Depth Image
SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation
Sudeep Pillai, Rares Ambrus, Adrien Gaidon
ICRA 2019 [arxiv + video]
Self-Supervised Monocular Depth
[Diagram: stereo camera (left/right images) → MonoDepth Network → depth; geometric constraints turn view synthesis into a proxy loss]
C. Godard, O. Mac Aodha, and G. J. Brostow, "Unsupervised monocular depth estimation with left-right consistency," CVPR 2017
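The view-synthesis proxy loss reconstructs one stereo view from the other using the predicted disparity, then compares it photometrically to the real view. A toy numpy sketch with nearest-neighbor sampling (real pipelines use differentiable bilinear sampling; names and the simple L1 photometric term are illustrative):

```python
import numpy as np

def synthesize_left(right_img, disparity):
    """Reconstruct the left view by sampling the right image at x - d(x).
    Grayscale (H, W) images; nearest-neighbor sampling for simplicity."""
    h, w = right_img.shape
    rows = np.repeat(np.arange(h)[:, None], w, axis=1)
    cols = np.clip(np.round(np.arange(w)[None, :] - disparity).astype(int), 0, w - 1)
    return right_img[rows, cols]

def photometric_loss(left_img, right_img, disparity):
    """L1 photometric proxy loss between real and synthesized left views."""
    return np.abs(left_img - synthesize_left(right_img, disparity)).mean()
```

Because the sampling is a function of the predicted disparity, the photometric error back-propagates into the depth network with no ground-truth depth ever needed.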
Self-Supervised Depth Learning Objective
● Photometric loss via view synthesis
● Occlusion regularization
● Depth regularization (edge-aware depth smoothing)
● Depth model parameters
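The edge-aware smoothing term penalizes disparity gradients everywhere except across strong image gradients, where depth discontinuities are legitimate. A numpy sketch of the common formulation (e.g. Godard et al.; the exact weighting in the paper may differ):

```python
import numpy as np

def edge_aware_smoothness(disp, img):
    """Edge-aware smoothness: penalize disparity gradients, downweighted
    where the image itself has strong gradients (likely object edges).
    disp: (H, W) disparities, img: (H, W) grayscale intensities."""
    dx_d = np.abs(np.diff(disp, axis=1))
    dy_d = np.abs(np.diff(disp, axis=0))
    wx = np.exp(-np.abs(np.diff(img, axis=1)))  # weight -> 0 at image edges
    wy = np.exp(-np.abs(np.diff(img, axis=0)))
    return (dx_d * wx).mean() + (dy_d * wy).mean()
```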
Self-Supervised Objective
Overall objective:
● Multi-scale photometric/appearance loss
● Disparity smoothness loss
● Occlusion regularization loss
Photometric Loss ++
● Multi-scale photometric loss is limited by resolution
● Super-resolve disparities → synthesize at high resolutions
Resolution Matters for View Synthesis!
SuperDepth: Depth Super-Resolution
A. Odena, V. Dumoulin, and C. Olah, "Deconvolution and checkerboard artifacts," Distill, vol. 1, no. 10, p. e3, 2016.
W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," CVPR 2016.
Depth Super-Resolution
● Sub-pixel convolutions for disparity super-resolution (SP)
○ Replace resize-convolutions with sub-pixel convolutions
Modified DispNet Architecture
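The sub-pixel (depth-to-space) rearrangement that follows the convolution can be sketched in numpy; it is equivalent to PyTorch's `pixel_shuffle` and avoids the checkerboard artifacts of transposed convolutions:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel rearrangement: (C*r*r, H, W) -> (C, H*r, W*r).
    This is the depth-to-space step of a sub-pixel convolutional layer
    (Shi et al., 2016): each group of r*r channels fills an r x r block."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)     # split channels into (c, r, r)
    x = x.transpose(0, 3, 1, 4, 2)   # interleave: (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)
```

The convolution itself stays at low resolution; only this cheap reshuffle produces the high-resolution disparity map.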
Sub-pixel convolutions (SP), Differentiable Flip Augmentation (FA)
Disparity Estimation Performance
Pose Estimation Performance
● Improved pose estimation from accurate disparities
● Scale recovery from a single camera
● Improved photometric loss (crisp boundaries)
Qualitative Comparison to State-of-the-Art
Dense Monocular 3D Reconstruction
PackNet-SfM: 3D Packing for Self-Supervised Monocular Depth Estimation
V. Guizilini, R. Ambrus, S. Pillai, A. Gaidon
https://arxiv.org/abs/1905.02693
Self-Supervised Monocular Depth
[Diagram: monocular video (frames t−1, t) → MonoDepth Network → depth; geometric constraints turn view synthesis into a proxy loss]
C. Godard, O. Mac Aodha, and G. J. Brostow, "Unsupervised monocular depth estimation with left-right consistency," CVPR 2017
PackNet for Self-Supervised Depth
● Series of residual blocks
● No downsampling: spatial information is packed
● No upsampling: representations are unpacked
● Super-resolution of intermediate depths
Packing/Unpacking blocks
Packing block Unpacking block
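The core of the packing block is an invertible space-to-depth rearrangement, so downsampling discards no spatial information (unlike max-pooling); PackNet follows it with 3D convolutions, but this sketch shows only the rearrangement:

```python
import numpy as np

def space_to_depth(x, r):
    """Packing step: fold each r x r spatial block into channels,
    (C, H, W) -> (C*r*r, H//r, W//r). The inverse rearrangement
    (depth-to-space) unpacks it exactly, losing nothing."""
    c, h, w = x.shape
    x = x.reshape(c, h // r, r, w // r, r)
    x = x.transpose(0, 2, 4, 1, 3)   # (c, r, r, h//r, w//r)
    return x.reshape(c * r * r, h // r, w // r)
```

Because the operation is a pure permutation of pixels, the decoder can recover fine detail that pooling-based encoders would have thrown away.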
Velocity Scaling
PackNet Experimental Results
Conclusion
Beyond Supervised Driving at TRI
● Why? Need all the data for true autonomy
● SPIGAN: unsupervised sim2real adaptation using privileged information from the simulator
● SuperDepth + PackNet-SfM: resolution is key for self-supervised monocular depth estimation
● Learning from structured unlabeled data
Newton + Hinton
Toyota Research Institute (TRI)
AI Research for Toyota's Future Products
SAFETY: Guardian · ACCESS: Chauffeur · QUALITY OF LIFE: Robots
Thank You!