Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud
Xinshuo Weng, Kris Kitani
Carnegie Mellon University
{xinshuow, kkitani}@cs.cmu.edu
Background & Motivation
• Goal (Monocular 3D Object Detection): estimate the object size (width, height, length), heading angle, and center location (x, y, z) in 3D space from a single input image.
• Modern methods for 3D object detection require a 3D sensor (e.g., LiDAR); methods based on a single image perform significantly worse.
• To bridge the performance gap between 3D sensing and 2D sensing for 3D object detection, we introduce an intermediate 3D point cloud representation of the data, referred to as "pseudo-LiDAR", obtained by lifting image pixels to 3D space based on the estimated depth.
• To handle the large amount of noise in the generated pseudo-LiDAR caused by inaccurate depth estimation, we propose two innovations: (1) use the instance mask instead of the bounding box as the representation of 2D proposals; (2) use a 2D-3D bounding box consistency (BBC) constraint.
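The lifting step above can be sketched in a few lines. This is a minimal illustration (not the authors' code), assuming a standard pinhole camera model with hypothetical KITTI-like intrinsics fx, fy (focal lengths) and cx, cy (principal point):

```python
import numpy as np

# Hypothetical KITTI-like intrinsics, for illustration only.
fx, fy, cx, cy = 721.5, 721.5, 609.6, 172.9

def lift_to_3d(depth):
    """Back-project every pixel of a depth map to 3D camera coordinates.

    depth: (H, W) array of estimated per-pixel depth in meters.
    Returns an (H*W, 3) pseudo-LiDAR point cloud.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx  # inverts the pinhole model u = fx * x / z + cx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy usage: a flat depth map at 10 m produces one 3D point per pixel.
cloud = lift_to_3d(np.full((375, 1242), 10.0))
print(cloud.shape)  # (465750, 3)
```

In practice the depth map comes from the monocular depth estimation network and the intrinsics from the camera calibration.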
Proposed Pipeline & Results
[Pipeline figure: (a) Pseudo-LiDAR Generation — the input image is passed through a monocular depth estimation network; the estimated depth and camera matrix produce the pseudo-LiDAR point cloud. (b) 2D Instance Mask Proposal Detection — an instance segmentation network yields instance mask proposals (2D proposal loss L_pp2d). (c) Amodal 3D Object Detection with 2D-3D Bounding Box Consistency — each point cloud frustum undergoes 3D point cloud segmentation (loss L_seg3d); the segmented point cloud feeds a 3D box estimation module that produces an initial estimate (center (x, y, z), size (h, w, l), heading angle θ; 3D bounding box loss L_box3d), which a 3D box correction module refines with a predicted correction (Δx, Δy, Δz, Δh, Δw, Δl, Δθ) to give the final estimate; projecting the box through the camera matrix yields the bounding box consistency loss (BBCL) L_bbc.]
(a) Every pixel of the input image is lifted to 3D coordinates given the estimated depth to generate the pseudo-LiDAR; (b) instance mask proposals are detected to extract point cloud frustums; (c) a 3D bounding box (blue) is estimated for each point cloud frustum and made consistent with the corresponding 2D proposal.
Qualitative Results
Quantitative Results
Method         | AP_BEV / AP_3D (%), IoU = 0.5             | AP_BEV / AP_3D (%), IoU = 0.7
               | Easy         Moderate     Hard            | Easy         Moderate     Hard
ROI-10D [1]    | 46.9 / 37.6  34.1 / 25.1  30.5 / 21.8     | 14.5 / 9.6    9.9 / 6.6    8.7 / 6.3
MonoGRNet [2]  |   -  / 50.5    -  / 37.0    -  / 30.8     |   -  / 13.9    -  / 10.2    -  / 7.6
MLF-MONO [4]   | 55.0 / 47.9  36.7 / 29.5  31.3 / 26.4     | 22.0 / 10.5  13.6 / 5.7   11.6 / 5.4
PL-MONO [3]    | 70.8 / 66.3  49.4 / 42.3  42.7 / 38.5     | 40.6 / 28.2  26.3 / 18.5  22.9 / 16.4
Ours           | 72.1 / 68.4  53.1 / 48.3  44.6 / 43.0     | 41.9 / 31.5  28.3 / 21.0  24.5 / 17.5
Analysis
Effectiveness of Instance Mask Proposal
• Conclusion: lifting only the pixels within the instance mask proposal, rather than all pixels within the 2D bounding box, significantly removes points that are not enclosed by the ground-truth box.
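The frustum extraction behind this comparison can be sketched as follows: since the pseudo-LiDAR cloud is lifted pixel by pixel, restricting it to an instance mask is a simple boolean selection (a minimal illustration, assuming the cloud is stored in row-major pixel order):

```python
import numpy as np

def extract_frustum(cloud, mask):
    """Keep only the points whose source pixel lies inside the instance mask.

    cloud: (H*W, 3) pseudo-LiDAR points in row-major pixel order.
    mask:  (H, W) boolean instance mask proposal.
    """
    return cloud[mask.reshape(-1)]

# Toy example: a 2x2 "image" whose mask selects a single pixel.
cloud = np.arange(12, dtype=float).reshape(4, 3)
mask = np.array([[True, False], [False, False]])
print(extract_frustum(cloud, mask))  # [[0. 1. 2.]]
```

Using a box-shaped mask here would instead keep every background point inside the rectangle, which is exactly the noise the instance mask avoids.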
Effect of Bounding Box Consistency
• Conclusion: by adjusting the 3D bounding box estimate in 3D space so that its 2D projection has a higher 2D IoU with the corresponding 2D proposal, the 3D IoU of the 3D bounding box estimate with its ground truth also increases.
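The consistency measure underlying this effect can be sketched as below. This is a simplified illustration, not the authors' module: the heading angle is omitted (axis-aligned box), the intrinsics are hypothetical, and the loss is written as one minus the 2D IoU between the projected 3D box and the 2D proposal:

```python
import numpy as np

def project(points, fx, fy, cx, cy):
    """Project Nx3 camera-frame points to pixels via the pinhole model."""
    u = fx * points[:, 0] / points[:, 2] + cx
    v = fy * points[:, 1] / points[:, 2] + cy
    return np.stack([u, v], axis=-1)

def box_corners(center, size):
    """Eight corners of a 3D box; heading angle omitted for brevity."""
    l, h, w = size
    offsets = np.array([[sx * l / 2, sy * h / 2, sz * w / 2]
                        for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    return np.asarray(center) + offsets

def iou_2d(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

# Project a car-sized box 20 m ahead (hypothetical KITTI-like intrinsics),
# then take the axis-aligned 2D box enclosing the projected corners.
uv = project(box_corners((0.0, 0.0, 20.0), (4.0, 1.5, 1.8)),
             721.5, 721.5, 609.6, 172.9)
proj_box = [uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max()]

# BBC loss: penalize disagreement between projected 3D box and 2D proposal.
proposal = [proj_box[0] + 5, proj_box[1], proj_box[2] + 5, proj_box[3]]  # 5 px off
loss_bbc = 1.0 - iou_2d(proj_box, proposal)
```

Minimizing this loss drives the correction module to shift the 3D box until its projection matches the 2D proposal.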
Take-Home Message
• With the proposed instance mask proposal and bounding box consistency, monocular 3D detection using the pseudo-LiDAR representation achieves much higher performance than direct regression on images.
[1] F. Manhardt, W. Kehl, and A. Gaidon. ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape. CVPR, 2019.
[2] Z. Qin, J. Wang, and Y. Lu. MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization. AAAI, 2018.
[3] Y. Wang, W.-L. Chao, D. Garg, B. Hariharan, M. Campbell, and K. Weinberger. Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving. CVPR, 2019.
[4] B. Xu and Z. Chen. Multi-Level Fusion based 3D Object Detection from Monocular Images. CVPR, 2018.
IEEE International Conference on Computer Vision (ICCV) Workshops, 2019.