© 2018 Toyota Research Institute. Public.
Beyond Supervised Driving
Adrien Gaidon (Twitter: @adnothing)
Machine Learning Lead & Senior Research Scientist
Toyota Research Institute (TRI), CA, USA
Agenda
● Why Beyond Supervised Driving
● Sim2Real adaptation: SPIGAN
● Self-Supervised Learning of Depth
"Crowdsourced steering doesn't sound quite as appealing as self driving"
Credit: https://xkcd.com/1897/
But, but, … supervised learning Just Works™ !
ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape
F. Manhardt, W. Kehl, A. Gaidon
https://arxiv.org/abs/1812.02781
Learning to Fuse Things and Stuff
J. Li, A. Raventos, A. Bhargava, T. Tagawa, A. Gaidon
https://arxiv.org/abs/1812.01192
"The revolution will not be supervised" -- A. Efros
Credit: Ed Olson, May Mobility
Drive With Data. Anywhere. Anytime.
100M Cars, 95% Parked
Amount of data per day: ~10s of PB
Machine Learning at Scale
[Chart: performance vs. amount of data — Deep Learning keeps improving with more data, while older algorithms plateau. Supervised Learning: learning with labeled data. Credit: supervise.ly]
Machine Learning at Scale: Beyond Supervised Learning
[Chart: performance vs. amount of data — Supervised & Self-Supervised Learning (learning with labeled and "structured" unlabeled data) continues improving beyond supervised Deep Learning and older algorithms. Inset: GPU-days of training compute, 2017–2019.]
Agenda
● Why Beyond Supervised Driving
● Sim2Real adaptation: SPIGAN
● Self-Supervised Learning of Depth
SPIGAN: Privileged Adversarial Learning from Simulation
Kuan-Hui Lee, German Ros, Jie Li, Adrien Gaidon
ICLR 2019 [arxiv]
Learning Using Simulator Privileged Information
Gaidon et al., "Virtual Worlds as Proxy for Multi-Object Tracking Analysis", CVPR'16
Ros et al, "The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes", CVPR'16
de Souza et al, "Procedural Generation of Videos to Train Deep Action Recognition Networks.", CVPR'17
Network Architecture
Minimax Learning Objective
adversarial loss
privileged regularization
perceptual regularization(self-regularization)
task loss(this is what we care about)
Least-squares Adversarial Loss
[equation omitted] where the two expectations are taken over the real-world and synthetic data distributions, respectively.
X. Mao et al., "Multi-class Generative Adversarial Networks with the L2 Loss Function", 2016.
J.-Y. Zhu et al., "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks", ICCV 2017.
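The least-squares formulation replaces the usual GAN log-loss with squared errors, as in LSGAN. A minimal numpy sketch of the two objectives (function names are illustrative, not from the paper; the real labels 1/0 are the standard LSGAN targets):

```python
import numpy as np

def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares discriminator loss: push D(real) -> 1, D(fake) -> 0."""
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)

def lsgan_generator_loss(d_fake):
    """Least-squares generator loss: push D(G(synthetic)) -> 1."""
    return 0.5 * np.mean((d_fake - 1.0) ** 2)
```

Both losses are zero exactly when the discriminator outputs hit their targets, which is what makes the squared error well-behaved compared to a saturating log-loss.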
Task Loss: Cross-Entropy
synthetic image → refined synthetic image ("fake")
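The task network is trained with a standard per-pixel cross-entropy on both the synthetic and the refined ("fake") images. A numpy sketch of the per-pixel loss (shapes and names are assumptions for illustration):

```python
import numpy as np

def cross_entropy_loss(logits, labels):
    """Per-pixel cross-entropy for semantic segmentation.
    logits: (H, W, C) raw class scores; labels: (H, W) integer class ids."""
    # Numerically stable log-softmax over the class dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    h, w = labels.shape
    # Pick the log-probability of the ground-truth class at each pixel.
    picked = log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -picked.mean()
```

With uniform logits over C classes the loss is log(C), a useful sanity check when debugging segmentation training.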
Privileged Loss & Regularization
❏ Privileged auxiliary task loss: prediction of depth (from z-buffer)
❏ Perceptual regularization: where Φ is a mapping from image space to a pre-determined feature space (we use VGG19 deep features)
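A minimal sketch of the two terms, assuming L1 distances (the paper's exact norms and weights may differ; function names are illustrative):

```python
import numpy as np

def privileged_depth_loss(pred_depth, zbuffer_depth):
    """Privileged auxiliary task: regress the simulator's z-buffer depth,
    available for free for every synthetic pixel."""
    return np.abs(pred_depth - zbuffer_depth).mean()

def perceptual_regularization(phi_input, phi_refined):
    """Self-regularization: keep the refined image close to the input
    synthetic image in a fixed feature space (phi = e.g. VGG19 activations)."""
    return np.abs(phi_input - phi_refined).mean()
```

The privileged term anchors the shared representation to geometry, while the perceptual term prevents the generator from drifting semantically away from its input.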
Experiments
❏ Datasets:
❏ [Sim] SYNTHIA: #train = 15k
❏ [Real] Cityscapes: #train = 3k, #test = 0.5k
❏ [Real] Vistas: #train = 16k, #test = 2k
❏ Settings:
❏ Task network T: FCN8s
❏ Standard resolution of 128×256
❏ 7 categories (flat, construction, object, nature, sky, human, and vehicle)
❏ Evaluation metric: mean intersection-over-union (mIoU)
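For reference, mIoU over the 7 categories can be computed as follows (a standard implementation sketch, not the authors' evaluation code):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union from flat integer label arrays.
    Classes absent from both prediction and ground truth are skipped."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```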
Experiments: Synthia → Cityscapes
Experiments: Synthia → Cityscapes+Vistas
Experiments: negative transfer?
Experiments: Synthia → Vistas
Hot off the press...
Exploring the Limitations of Behavior Cloning for Autonomous Driving
F. Codevilla, E. Santana, A. M. López, A. Gaidon
paper: https://arxiv.org/abs/1904.08980
code: https://github.com/felipecode/coiltraine
Agenda
● Why Beyond Supervised Driving
● Sim2Real adaptation: SPIGAN
● Self-Supervised Learning: SuperDepth
Self-Supervised Learning at Toyota-scale
● SuperDepth: Self-Supervised Monocular Depth
○ Exploit large volumes of unlabeled, structured camera data
○ Training only requires unlabeled driving video data!
● Why MonoDepth?
○ LiDAR: expensive, bulky
○ Cameras
■ Rich semantic and geometric sensing
■ Ubiquitous (2019 Toyota models)
Toyota Safety Sense 2.0 Camera
Supervised Learning
Raw Data → Model → Predictions
Target Values/Labels → Loss
(raw data: easy to acquire; labels: expensive/difficult to acquire)
Self-Supervised Learning
Raw Data → Model → Predictions
Prior Knowledge → Loss
(all inputs easy to acquire)
Monocular Depth Estimation
Single RGB Image → MonoDepth Network → Predicted Depth Image
SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation
Sudeep Pillai, Rares Ambrus, Adrien Gaidon
ICRA 2019 [arxiv + video]
Self-Supervised Monocular Depth
[Diagram: stereo camera (left/right images) → MonoDepth Network → depth; geometric constraints turn view synthesis into a proxy loss]
C. Godard, O. Mac Aodha, and G. J. Brostow, "Unsupervised monocular depth estimation with left-right consistency," CVPR 2017
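The view-synthesis proxy loss reconstructs one stereo view from the other using the predicted disparity, then compares it photometrically to the real view. A toy numpy sketch with nearest-neighbor sampling (real pipelines use differentiable bilinear sampling; names and the simple L1 photometric term are illustrative):

```python
import numpy as np

def synthesize_left(right_img, disparity):
    """Reconstruct the left view by sampling the right image at x - d(x).
    Grayscale (H, W) images; nearest-neighbor sampling for simplicity."""
    h, w = right_img.shape
    rows = np.repeat(np.arange(h)[:, None], w, axis=1)
    cols = np.clip(np.round(np.arange(w)[None, :] - disparity).astype(int), 0, w - 1)
    return right_img[rows, cols]

def photometric_loss(left_img, right_img, disparity):
    """L1 photometric proxy loss between real and synthesized left views."""
    return np.abs(left_img - synthesize_left(right_img, disparity)).mean()
```

Because the sampling is a function of the predicted disparity, the photometric error back-propagates into the depth network with no ground-truth depth ever needed.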
Self-Supervised Depth Learning Objective
● Photometric loss via view synthesis
● Occlusion regularization
● Depth regularization (edge-aware depth smoothing)
● Depth model parameters
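The edge-aware smoothing term penalizes disparity gradients everywhere except across strong image gradients, where depth discontinuities are legitimate. A numpy sketch of the common formulation (e.g. Godard et al.; the exact weighting in the paper may differ):

```python
import numpy as np

def edge_aware_smoothness(disp, img):
    """Edge-aware smoothness: penalize disparity gradients, downweighted
    where the image itself has strong gradients (likely object edges).
    disp: (H, W) disparities, img: (H, W) grayscale intensities."""
    dx_d = np.abs(np.diff(disp, axis=1))
    dy_d = np.abs(np.diff(disp, axis=0))
    wx = np.exp(-np.abs(np.diff(img, axis=1)))  # weight -> 0 at image edges
    wy = np.exp(-np.abs(np.diff(img, axis=0)))
    return (dx_d * wx).mean() + (dy_d * wy).mean()
```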
Self-Supervised Objective
Overall objective:
● Multi-scale photometric/appearance loss
● Disparity smoothness loss
● Occlusion regularization loss
Photometric Loss ++
● Multi-scale photometric loss is limited by resolution
● Super-resolve disparities → synthesize at high resolutions
Resolution Matters for View Synthesis!
SuperDepth: Depth Super-Resolution
A. Odena, V. Dumoulin, and C. Olah, "Deconvolution and checkerboard artifacts," Distill, vol. 1, no. 10, p. e3, 2016.
W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," CVPR 2016.
Depth Super-Resolution
● Sub-pixel convolutions for disparity super-resolution (SP)
○ Replace resize-convolutions with sub-pixel convolutions
Modified DispNet Architecture
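The sub-pixel (depth-to-space) rearrangement that follows the convolution can be sketched in numpy; it is equivalent to PyTorch's `pixel_shuffle` and avoids the checkerboard artifacts of transposed convolutions:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel rearrangement: (C*r*r, H, W) -> (C, H*r, W*r).
    This is the depth-to-space step of a sub-pixel convolutional layer
    (Shi et al., 2016): each group of r*r channels fills an r x r block."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)     # split channels into (c, r, r)
    x = x.transpose(0, 3, 1, 4, 2)   # interleave: (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)
```

The convolution itself stays at low resolution; only this cheap reshuffle produces the high-resolution disparity map.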
Sub-pixel convolutions (SP), Differentiable Flip Augmentation (FA)
Disparity Estimation Performance
Pose Estimation Performance
● Improved pose estimation from accurate disparities
● Scale recovery from a single camera
● Improved photometric loss (crisp boundaries)
Qualitative Comparison to State-of-the-Art
Dense Monocular 3D Reconstruction
PackNet-SfM: 3D Packing for Self-Supervised Monocular Depth Estimation
V. Guizilini, R. Ambrus, S. Pillai, A. Gaidon
https://arxiv.org/abs/1905.02693
Self-Supervised Monocular Depth
[Diagram: monocular video (frames t−1, t) → MonoDepth Network → depth; geometric constraints turn view synthesis into a proxy loss]
C. Godard, O. Mac Aodha, and G. J. Brostow, "Unsupervised monocular depth estimation with left-right consistency," CVPR 2017
PackNet for Self-Supervised Depth
● Series of residual blocks
● No downsampling: spatial information is packed
● No upsampling: representations are unpacked
● Super-resolution of intermediate depths
Packing/Unpacking blocks
Packing block Unpacking block
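The core of the packing block is an invertible space-to-depth rearrangement, so downsampling discards no spatial information (unlike max-pooling); PackNet follows it with 3D convolutions, but this sketch shows only the rearrangement:

```python
import numpy as np

def space_to_depth(x, r):
    """Packing step: fold each r x r spatial block into channels,
    (C, H, W) -> (C*r*r, H//r, W//r). The inverse rearrangement
    (depth-to-space) unpacks it exactly, losing nothing."""
    c, h, w = x.shape
    x = x.reshape(c, h // r, r, w // r, r)
    x = x.transpose(0, 2, 4, 1, 3)   # (c, r, r, h//r, w//r)
    return x.reshape(c * r * r, h // r, w // r)
```

Because the operation is a pure permutation of pixels, the decoder can recover fine detail that pooling-based encoders would have thrown away.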
Velocity Scaling
PackNet Experimental Results
Conclusion
Beyond Supervised Driving at TRI
● Why? Need all the data for true autonomy
● SPIGAN: unsupervised sim2real adaptation using privileged information from the simulator
● SuperDepth + PackNet-SfM: resolution is key for self-supervised monocular depth estimation
● Learning from structured unlabeled data
Newton + Hinton
Toyota Research Institute (TRI)
AI Research for Toyota's Future Products
SAFETY: Guardian · ACCESS: Chauffeur · QUALITY OF LIFE: Robots
Thank You!