Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud
Xinshuo Weng, Kris Kitani
Carnegie Mellon University
{xinshuow, kkitani}@cs.cmu.edu
Background & Motivation
• Goal (Monocular 3D Object Detection): estimate the object size (width, height, length), heading angle, and center location (x, y, z) in 3D space from a single input image.
• Modern methods for 3D object detection require a 3D sensor (e.g., LiDAR); methods based on a single image perform significantly worse.
• To bridge the performance gap between 3D sensing and 2D sensing for 3D object detection, we introduce an intermediate 3D point cloud representation of the data, referred to as "pseudo-LiDAR", obtained by lifting image pixels to 3D space based on the estimated depth.
• To handle the large amount of noise in the generated pseudo-LiDAR caused by inaccurate depth estimation, we propose two innovations: (1) use the instance mask instead of the bounding box as the representation of 2D proposals; (2) use a 2D-3D bounding box consistency (BBC) constraint.
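The lifting step above can be sketched in a few lines. This is a minimal illustration (not the authors' code), assuming a standard pinhole camera model with hypothetical KITTI-like intrinsics fx, fy (focal lengths) and cx, cy (principal point):

```python
import numpy as np

# Hypothetical KITTI-like intrinsics, for illustration only.
fx, fy, cx, cy = 721.5, 721.5, 609.6, 172.9

def lift_to_3d(depth):
    """Back-project every pixel of a depth map to 3D camera coordinates.

    depth: (H, W) array of estimated per-pixel depth in meters.
    Returns an (H*W, 3) pseudo-LiDAR point cloud.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx  # inverts the pinhole model u = fx * x / z + cx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy usage: a flat depth map at 10 m produces one 3D point per pixel.
cloud = lift_to_3d(np.full((375, 1242), 10.0))
print(cloud.shape)  # (465750, 3)
```

In practice the depth map comes from the monocular depth estimation network and the intrinsics from the camera calibration.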
Proposed Pipeline & Results
[Pipeline figure: (a) Pseudo-LiDAR Generation — the input image is passed through a monocular depth estimation network; the estimated depth and camera matrix produce the pseudo-LiDAR point cloud. (b) 2D Instance Mask Proposal Detection — an instance segmentation network yields instance mask proposals (2D proposal loss L_pp2d). (c) Amodal 3D Object Detection with 2D-3D Bounding Box Consistency — each point cloud frustum undergoes 3D point cloud segmentation (loss L_seg3d); the segmented point cloud feeds a 3D box estimation module that produces an initial estimate (center (x, y, z), size (h, w, l), heading angle θ; 3D bounding box loss L_box3d), which a 3D box correction module refines with a predicted correction (Δx, Δy, Δz, Δh, Δw, Δl, Δθ) to give the final estimate; projecting the box through the camera matrix yields the bounding box consistency loss (BBCL) L_bbc.]
(a) Every pixel of the input image is lifted to 3D coordinates given the estimated depth to generate the pseudo-LiDAR; (b) instance mask proposals are detected to extract point cloud frustums; (c) a 3D bounding box (blue) is estimated for each point cloud frustum and made consistent with the corresponding 2D proposal.
Qualitative Results
Quantitative Results
Method         | AP_BEV / AP_3D (%), IoU = 0.5             | AP_BEV / AP_3D (%), IoU = 0.7
               | Easy         Moderate     Hard            | Easy         Moderate     Hard
ROI-10D [1]    | 46.9 / 37.6  34.1 / 25.1  30.5 / 21.8     | 14.5 / 9.6    9.9 / 6.6    8.7 / 6.3
MonoGRNet [2]  |   -  / 50.5    -  / 37.0    -  / 30.8     |   -  / 13.9    -  / 10.2    -  / 7.6
MLF-MONO [4]   | 55.0 / 47.9  36.7 / 29.5  31.3 / 26.4     | 22.0 / 10.5  13.6 / 5.7   11.6 / 5.4
PL-MONO [3]    | 70.8 / 66.3  49.4 / 42.3  42.7 / 38.5     | 40.6 / 28.2  26.3 / 18.5  22.9 / 16.4
Ours           | 72.1 / 68.4  53.1 / 48.3  44.6 / 43.0     | 41.9 / 31.5  28.3 / 21.0  24.5 / 17.5
Analysis
Effectiveness of Instance Mask Proposal
• Conclusion: lifting only the pixels within the instance mask proposal, rather than all pixels within the 2D bounding box, significantly removes points that are not enclosed by the ground-truth box.
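The frustum extraction behind this comparison can be sketched as follows: since the pseudo-LiDAR cloud is lifted pixel by pixel, restricting it to an instance mask is a simple boolean selection (a minimal illustration, assuming the cloud is stored in row-major pixel order):

```python
import numpy as np

def extract_frustum(cloud, mask):
    """Keep only the points whose source pixel lies inside the instance mask.

    cloud: (H*W, 3) pseudo-LiDAR points in row-major pixel order.
    mask:  (H, W) boolean instance mask proposal.
    """
    return cloud[mask.reshape(-1)]

# Toy example: a 2x2 "image" whose mask selects a single pixel.
cloud = np.arange(12, dtype=float).reshape(4, 3)
mask = np.array([[True, False], [False, False]])
print(extract_frustum(cloud, mask))  # [[0. 1. 2.]]
```

Using a box-shaped mask here would instead keep every background point inside the rectangle, which is exactly the noise the instance mask avoids.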
Effect of Bounding Box Consistency
• Conclusion: by adjusting the 3D bounding box estimate in 3D space so that its 2D projection has a higher 2D IoU with the corresponding 2D proposal, the 3D IoU of the 3D bounding box estimate with its ground truth also increases.
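The consistency measure underlying this effect can be sketched as below. This is a simplified illustration, not the authors' module: the heading angle is omitted (axis-aligned box), the intrinsics are hypothetical, and the loss is written as one minus the 2D IoU between the projected 3D box and the 2D proposal:

```python
import numpy as np

def project(points, fx, fy, cx, cy):
    """Project Nx3 camera-frame points to pixels via the pinhole model."""
    u = fx * points[:, 0] / points[:, 2] + cx
    v = fy * points[:, 1] / points[:, 2] + cy
    return np.stack([u, v], axis=-1)

def box_corners(center, size):
    """Eight corners of a 3D box; heading angle omitted for brevity."""
    l, h, w = size
    offsets = np.array([[sx * l / 2, sy * h / 2, sz * w / 2]
                        for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    return np.asarray(center) + offsets

def iou_2d(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

# Project a car-sized box 20 m ahead (hypothetical KITTI-like intrinsics),
# then take the axis-aligned 2D box enclosing the projected corners.
uv = project(box_corners((0.0, 0.0, 20.0), (4.0, 1.5, 1.8)),
             721.5, 721.5, 609.6, 172.9)
proj_box = [uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max()]

# BBC loss: penalize disagreement between projected 3D box and 2D proposal.
proposal = [proj_box[0] + 5, proj_box[1], proj_box[2] + 5, proj_box[3]]  # 5 px off
loss_bbc = 1.0 - iou_2d(proj_box, proposal)
```

Minimizing this loss drives the correction module to shift the 3D box until its projection matches the 2D proposal.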
Take-Home Message
• With the proposed instance mask proposal and bounding box consistency, monocular 3D detection using the pseudo-LiDAR representation achieves much higher performance than direct regression on images.
[1] F. Manhardt, W. Kehl, and A. Gaidon. ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape. CVPR, 2019.
[2] Z. Qin, J. Wang, and Y. Lu. MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization. AAAI, 2018.
[3] Y. Wang, W.-L. Chao, D. Garg, B. Hariharan, M. Campbell, and K. Weinberger. Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving. CVPR, 2019.
[4] B. Xu and Z. Chen. Multi-Level Fusion based 3D Object Detection from Monocular Images. CVPR, 2018.
IEEE International Conference on Computer Vision (ICCV) Workshops, 2019.