Developments of 3D Computer Vision Since 2017 Yihong Wu Institute of Automation Chinese Academy of Sciences ACCV, Perth, 2018.12

Developments of 3D Computer Vision Since 2017vision.ia.ac.cn/Faculty/yhwu/papers/3D Vision Since 2017... · 2019. 1. 16. · 3D vision is very important in computer vision AR、VR、Robotics


Developments of 3D Computer Vision Since 2017

Yihong Wu

Institute of Automation, Chinese Academy of Sciences

ACCV, Perth, 2018.12


Contents

➢ Preface

➢ Image matching

➢ Visual localization: PnP, SLAM

➢ 3D reconstruction: SfM, learning, RGBD

➢ Trends


Preface

D. Marr (1945-1980): neuroscientist and physiologist. Computational visual process: Vision. Freeman and Company, Oxford, 1982.

Various academic awards and prizes are named in his honour, including the Marr Prize.

Three stages:

• A primal sketch: feature extraction (points, edges, regions)

• A 2.5D sketch: viewer-centered three-dimensional view of the environment

• A 3D model: a continuous, 3-dimensional map


3D vision is very important in computer vision

AR, VR, Robotics

Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000/2004.

Yi Ma, Stefano Soatto, Jana Kosecka, and Shankar Sastry. An Invitation to 3-D Vision: From Images to Geometric Models. Interdisciplinary Applied Mathematics #26, Springer, 2004.


• 2016: AR, VR

• 2017: Driverless car, robot, AGV, 3D camera

• 2017.6.5: Apple ARKit

• 2017.8: Google ARCore

• 2018: Driverless car, robot, AGV, 3D camera

• Boston Dynamics: SpotMini, 3D visual navigation


SpotMini is a small four-legged robot. It weighs 25 kg (30 kg if you include the arm). SpotMini is all-electric and can go for about 90 minutes on a charge, depending on what it is doing. The sensor suite includes stereo cameras, depth cameras, an IMU, and position/force sensors in the limbs. These sensors help with navigation and mobile manipulation.


Imaging Process

[Figure: camera coordinate system with origin O and axes X, Y, Z; a 3D point M is imaged as the point m on the image plane.]

Similar but Different: projective geometry
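The imaging process above can be illustrated with a minimal pinhole-projection sketch. It assumes an ideal pinhole camera without lens distortion; the focal length `f` and principal point `(cx, cy)` are hypothetical values chosen for illustration and do not come from the slides.

```python
def project_point(M, f, cx, cy):
    """Project a 3D point M = (X, Y, Z), given in camera coordinates,
    to an image point m = (u, v) with an ideal pinhole model."""
    X, Y, Z = M
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Perspective division by depth, then shift by the principal point.
    u = f * X / Z + cx
    v = f * Y / Z + cy
    return (u, v)


# Two different 3D points on the same viewing ray (hypothetical values).
near = project_point((1.0, 0.5, 2.0), f=500.0, cx=320.0, cy=240.0)
far = project_point((2.0, 1.0, 4.0), f=500.0, cx=320.0, cy=240.0)
print(near, far)
```

Both points land on the same pixel, which is one way to read "similar but different": the image resembles the scene, but depth along each viewing ray is lost under projection.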


Main tasks

• Image matching

• Localization

• 3D reconstruction


Image matching


Localization and 3D reconstruction

3D World → 2D Image: images captured by cameras

• Localization: to compute camera pose

• 3D reconstruction: to model the 3D environment structure


Contents

➢ Preface

➢ Image matching

➢ Visual localization: PnP, SLAM

➢ 3D reconstruction: SfM, learning, RGBD

➢ Trends


Image matching

Feature detection

Descriptor extraction

Matching

Evaluation and dataset


⚫ Overview

• Traditional hand-designed descriptor methods

• Learning descriptor methods: deep learning

• Feature detection: deep learning

• In practice: traditional methods

⚫ Learning feature detection

• CovDet: CNN learning covariant features. Zhang, Yu, Kumar, Chang, CVPR 2017

• AffNet: CNN learning affine regions. Mishkin, Radenovic, Matas, arXiv 2017

• Yang Liu, Zhaowen Wang, Hailin Jin, Ian Wassell. Multi-Task Adversarial Network for Disentangled Feature Learning. CVPR 2018.

• Haoliang Li, Sinno Jialin Pan, Shiqi Wang, Alex C. Kot. Domain Generalization with Adversarial Feature Learning. CVPR 2018.


⚫ Learning descriptors:

• L2Net: progressive sampling strategy, relative distance between descriptors, and extra supervision. Tian, Fan, Wu, CVPR 2017

• HardNet: improves L2Net. Mishchuk, Mishkin, Radenovic, Matas, NIPS 2017

• DeepCD: learns a pair of complementary descriptors, one binary and one float. Yang, Hsu, Lin, Chuang, ICCV 2017

• Spread-out: regularization term that maximizes the spread of feature descriptors, inspired by the property of the uniform distribution. Zhang, Yu, Kumar, Chang, ICCV 2017 (pairwise and triplet losses + regularization technique)

• PPFNet: Global Context Aware Local Features for Robust 3D Point Matching. Haowen Deng, Tolga Birdal, Slobodan Ilic. CVPR 2018 (N-tuple loss, 3D point cloud)

• Georgios Georgakis, Srikrishna Karanam, Ziyan Wu. End-to-End Learning of Keypoint Detector and Descriptor for Pose Invariant 3D Matching. CVPR 2018 (depth image)


L2Net:

• Input: 32×32 patch. Output: 128-D descriptor, directly matched by L2 distance

• Network structure:

  • Six 3×3 convolutional layers

  • One 8×8 convolutional layer

  • CIC (Convolution in Convolution), used only for training: similar features are matched, different features aren't matched

  • LRN (Local Response Normalization) to normalize the network output

• Training:

  • Cost function with multiple error terms:

    • Similarity error term: relative distance

    • Error term for descriptor compactness: decreases redundancy among different dimensions to reduce overfitting

    • Error term for intermediate feature maps: for generalization ability without more parameters

  • 2-4 hours on a GPU

• Speed of descriptor extraction: 21.3k patches/s
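"Directly matched by L2 distance" can be illustrated with a small nearest-neighbour matcher. This is only a sketch: the toy descriptors are 3-D rather than 128-D, and the mutual nearest-neighbour check is a common heuristic assumed here, not something stated on the slide.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, candidates):
    """Index and distance of the candidate closest to the query descriptor."""
    dists = [l2_distance(query, c) for c in candidates]
    best = min(range(len(candidates)), key=dists.__getitem__)
    return best, dists[best]


def mutual_matches(desc_a, desc_b):
    """Keep pairs (i, j, distance) that are each other's nearest neighbour."""
    matches = []
    for i, d in enumerate(desc_a):
        j, dist = nearest(d, desc_b)
        back, _ = nearest(desc_b[j], desc_a)
        if back == i:
            matches.append((i, j, dist))
    return matches


# Hypothetical toy descriptors standing in for learned 128-D outputs.
desc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
desc_b = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matches = mutual_matches(desc_a, desc_b)
print([(i, j) for i, j, _ in matches])
```

The brute-force search is quadratic in the number of descriptors; in practice an approximate nearest-neighbour index would likely replace `nearest`.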


ECCV 2016 workshop "Local Features: state of the art, open problems and performance evaluation": L2Net ranked No. 1 in all tasks (image matching, image retrieval, image classification)

HPatches dataset

[Figure panels: patch retrieval; patch classification]


⚫ Matching

1. J. Bian, W. Lin, Y. Matsushita, S. Yeung, T. Nguyen, M. Cheng. GMS: Grid-based Motion Statistics for Fast, Ultra-robust Feature Correspondence. CVPR 2017. (GMS (Grid-based Motion Statistics): uses grids and encapsulates motion smoothness as the statistical likelihood of a certain number of matches in a region; real time) NDL

2. Qianqian Wang, Xiaowei Zhou, Kostas Daniilidis. Multi-Image Semantic Matching by Mining Consistent Features. CVPR 2018. (low-rank constraint, geometric consistency, sparse points, for semantic correspondences) NDL

3. A. Seki, M. Pollefeys. SGM-Nets: Semi-global matching with neural networks. CVPR 2017. (KITTI Stereo 2012, rank 10/32; improves SGM by learning its parameters. SGM: H. Hirschmuller. Stereo Processing by Semiglobal Matching and Mutual Information, PAMI 2008.)

4. Johannes L. Schonberger, Sudipta N. Sinha, Marc Pollefeys. Learning to Fuse Proposals from Multiple Scanline Optimizations in Semi-Global Matching. ECCV 2018. (SGMForest, KITTI, 81)

5. Yan Huang, Qi Wu, Chunfeng Song, Liang Wang. Learning semantic concepts and order for image and sentence matching. CVPR 2018.

6. Andrei Zanfir, Cristian Sminchisescu. Deep Learning of Graph Matching. CVPR 2018. (Honorable Mention)


7. Fudong Wang, Nan Xue, Yipeng Zhang, Xiang Bai, and Gui-Song Xia. Adaptively Transforming Graph Matching. ECCV 2018. (With a linear representation map of the transformation, the pairwise edge attributes of graphs are converted to unary node attributes, which reduces the space and time complexity significantly.)

8. Mohammed E. Fathy, Quoc-Huy Tran, M. Zeeshan Zia, Paul Vernaza, and Manmohan Chandraker. Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences. ECCV 2018. (While a metric loss applied to the deepest layer of a CNN, is often expected to yield ideal features irrespective of the task, in fact the growing receptive field as well as striding effects cause shallower features to be better at high precision matching tasks)

9. Yiran Zhong, Hongdong Li, Yuchao Dai. Open-World Stereo Video Matching with Deep RNN. ECCV 2018. (RNN on a continuous stereo video: (1) Feature-Net (CNN), (2) Match-Net (encoder-decoder) → a depth map; without a pre-training process and without the need for ground-truth depth maps as supervision)
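Items 3 and 4 in the list above build on semi-global matching (SGM). As a rough illustration of what SGM aggregates, the sketch below runs the standard SGM recurrence along a single left-to-right scanline. The penalties `p1` and `p2` are the smoothness parameters that learning-based variants aim to set better; the cost values and penalty values here are hypothetical, and full SGM sums such paths over several directions.

```python
def aggregate_scanline(costs, p1, p2):
    """Aggregate matching costs along one scanline, left to right.

    costs[x][d] is the matching cost of pixel x at disparity d.
    p1 penalises disparity changes of one level, p2 larger jumps.
    """
    n_disp = len(costs[0])
    aggregated = [list(costs[0])]
    for x in range(1, len(costs)):
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < n_disp:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the values bounded along the path.
            row.append(costs[x][d] + min(candidates) - prev_min)
        aggregated.append(row)
    return aggregated


def winner_takes_all(cost_rows):
    """Pick the lowest-cost disparity for every pixel."""
    return [min(range(len(row)), key=row.__getitem__) for row in cost_rows]


# Hypothetical costs: the middle pixel has a noisy minimum at disparity 0.
costs = [[4, 0, 4], [1, 2, 4], [4, 0, 4]]
raw = winner_takes_all(costs)
smoothed = winner_takes_all(aggregate_scanline(costs, p1=2, p2=4))
print(raw, smoothed)
```

With these toy values the raw per-pixel minimum flips disparity at the noisy pixel, while the aggregated costs give a consistent disparity along the scanline.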


Evaluation and dataset

⚫ Johannes L. Schonberger, Hans Hardmeier, Torsten Sattler, Marc Pollefeys. Comparative Evaluation of Hand-Crafted and Learned Local Features. CVPR 2017.

⚫ Datasets: HPatches, Brown. HPatches: data quality, evaluation methods

HPatches: A benchmark and evaluation of handcrafted and learned local descriptors. V. Balntas, K. Lenc, A. Vedaldi, K. Mikolajczyk. CVPR 2017.


Contents

➢ Preface

➢ Image matching

➢ Visual localization: PnP, SLAM

➢ 3D reconstruction: SfM, learning, RGBD

➢ Trends


Visual localization

• Known 3D knowledge

  • PnP, SLAM relocalization

  • RGBD SLAM

  • SLAM/Pose with learning

  • Marker SLAM

• Unknown 3D knowledge

  • General SLAM or Point\line\edge\plane SLAM

  • Camera+IMU SLAM

  • Semantic SLAM

  • Event camera SLAM

  • Relative pose


• Known 3D knowledge

T. Sattler, A. Torii, J. Sivic, M. Pollefeys, H. Taira, M. Okutomi, T. Pajdla. Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization? CVPR 2017. (2D-based methods achieve the lowest localization accuracy; 3D-based methods offer more precise pose)

PnP, SLAM relocalization: 2D to 3D matching

RGBD SLAM: 3D to 3D matching

Learning SLAM; Marker SLAM

• Unknown 3D knowledge

SLAM, real-time and online

Complete overview for visual localization:

Yihong Wu, Fulin Tang, Heping Li. Image Based Camera Localization: an Overview. Invited paper, Visual Computing for Industry, Biomedicine and Art, 2018.


Known 3D knowledge: PnP, SLAM relocalization

1. Nathan Piasco, Désiré Sidibé, Cédric Demonceaux, Valérie Gouet-Brunet. A survey on Visual-Based Localization: On the benefit of heterogeneous data. Pattern Recognition, 2018. (known environment; two distinct families: indirect (retrieval) and direct (6D))

2. Dylan Campbell, Lars Petersson, Laurent Kneip and Hongdong Li. Globally-Optimal Inlier Set Maximisation for Simultaneous Camera Pose and Feature Correspondence, ICCV 2017. (Marr Prize Honorable Mention)

3. Liu Liu, Hongdong Li, and Yuchao Dai. Efficient Global 2D-3D Matching for Camera Localization in a Large-Scale 3D Map, ICCV 2017.

4. Youji Feng, Yihong Wu, and Lixin Fan. Real-time SLAM Relocalization with On-line Learning of Binary Feature Indexing. Machine Vision and Applications, 2017.

5. T. Qin, P. Li and S. Shen. Relocalization, Global Optimization and Map Merging for Monocular Visual-Inertial SLAM. ICRA 2018.

6. Linus Svarm, Olof Enqvist, Fredrik Kahl, Magnus Oskarsson. City-Scale Localization for Cameras with Known Vertical Direction. PAMI, 2018. CVPR 2014.

Large-scale environment, heterogeneous data


Globally-Optimal Inlier Set Maximisation for Simultaneous Camera Pose and Feature Correspondence, ICCV 2017. (Marr Prize Honorable Mention)

Since a large proportion of outliers are common for this problem, we instead propose a globally-optimal inlier set cardinality maximisation approach which jointly estimates optimal camera pose and optimal correspondences.

Branch-and-bound is used to search the 6D space of camera poses, guaranteeing global optimality without requiring a pose prior. Local optimisation is integrated to accelerate convergence.

4 Lemmas, 3 Theorems


Liu Liu, Hongdong Li, and Yuchao Dai. Efficient Global 2D-3D Matching for Camera Localization in a Large-Scale 3D Map, ICCV 2017

Step 1. Build a Map-Graph

• G(V, E): V graph nodes, E graph edges

• Node: a 3D point, visual words

• Edge: the two points can be seen simultaneously from the same viewing point

• Each edge is assigned a weight c_ij

• For the i-th 3D point, denote the set of database images that contain this point as A_i

• State transition matrix


Step 2. Compute query vector

• Given a 2D image, compute:

a 2D query feature f and a 3D map point i


Step 3. Random walk on map-graph

Given a map-graph G(V, E) along with a state transition matrix C: a Markov Network or Markov Random Field

Once the iteration converges, they sort this steady-state probability vector in descending order, which gives the final “matchability” of every 3D point to the set of 2D query features.
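Steps 1 to 3 can be sketched as a random walk with restart on a small graph. This is a toy illustration under stated assumptions: the edge weights, the query vector, and the `damping` factor are hypothetical, and the exact definitions of the transition matrix and query vector used in the paper are not reproduced on the slides.

```python
def column_normalise(weights):
    """Turn symmetric edge weights into a column-stochastic transition matrix."""
    n = len(weights)
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    return [
        [weights[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_matchability(transition, query, damping=0.85,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- damping * C p + (1 - damping) * q until convergence.

    Returns the steady-state vector and node indices sorted by descending score.
    """
    n = len(query)
    p = list(query)
    for _ in range(max_iter):
        new_p = [
            damping * sum(transition[i][j] * p[j] for j in range(n))
            + (1 - damping) * query[i]
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:
            break
    ranking = sorted(range(n), key=lambda i: p[i], reverse=True)
    return p, ranking


# Hypothetical co-visibility weights for four 3D points on a chain 0-1-2-3.
weights = [
    [0, 3, 0, 0],
    [3, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
query = [1.0, 0.0, 0.0, 0.0]  # the query features resemble point 0 only
scores, ranking = random_walk_matchability(column_normalise(weights), query)
print(ranking)
```

Point 1 receives a high score even though the query vector gives it no direct support, because it is strongly co-visible with point 0; this is the kind of global context the map-graph is meant to contribute.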


7. Federico Camposeco, Andrea Cohen, Marc Pollefeys, Torsten Sattler. Hybrid Camera Pose Estimation. CVPR 2018. (Hybrid Minimal Solvers , PnP, relative pose)

8. Pedro Miraldo, Tiago Dias, Srikumar Ramalingam. A Minimal Closed-Form Solution for Multi-Perspective Pose Estimation using Points and Lines. ECCV 2018.

9. Viktor Larsson, Zuzana Kukelova, Yinqiang Zheng. Camera Pose Estimation with Unknown Principal Point. CVPR 2018. (minimal solvers P4.5Pfuv, P5Pfuva)

10. Carl Toft, Erik Stenborg, Lars Hammarstrand, Lucas Brynte, Marc Pollefeys, Torsten Sattler, Fredrik Kahl. Semantic Match Consistency for Long-Term Visual Localization. ECCV 2018.

11. Carl Toft, Carl Olsson, Fredrik Kahl. Long-term 3D Localization and Pose from Semantic Labellings. ICCV, 2017. (A sparse 3D model +semantically labelled points and curves)

12. Hajime Taira, Masatoshi Okutomi, Torsten Sattler, Mircea Cimpoi, Marc Pollefeys, Josef Sivic, Tomas Pajdla, Akihiko Torii. InLoc: Indoor Visual Localization with Dense Matching and View Synthesis. CVPR 2018. (CNN based 2D image retrieval, CNN features and P3P combination to match between 3D and 2D, which can deal with textureless indoor scenes to some extent, indoor dataset)

13. Torsten Sattler, Will Maddern, Carl Toft, Akihiko Torii, Lars Hammarstrand, Erik Stenborg, Daniel Safari, Masatoshi Okutomi, Marc Pollefeys, Josef Sivic, Fredrik Kahl, Tomas Pajdla. Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions. CVPR 2018 (benchmark datasets for outdoor, long-term localization is far from solved)


RGBD SLAM

• Yousif, K.; Taguchi, Y.; Ramalingam, S. MonoRGBD-SLAM: Simultaneous Localization and Mapping Using Both Monocular and RGBD Cameras. ICRA 2017.

• Helder J. Araujo et al. A Non-Rigid Map Fusion-Based RGB-Depth SLAM Method for Endoscopic Capsule Robots, 2017.

• Pyojin Kim, Brian Coltin, and H. Jin Kim. Linear RGB-D SLAM for Planar Environments. ECCV 2018. (jointly estimates camera position and planar landmarks in the map within a linear Kalman filter framework, and solves for the rotational motion of the camera using structural regularities in the Manhattan world)

• Aditya Dhawale, Kumar Shaurya Shankar, Nathan Michael. Fast Monte-Carlo Localization on Aerial Vehicles Using Approximate Continuous Belief Representations. CVPR 2018.


SLAM/Pose determination with learning

Deep learning can improve the robustness of camera localization, giving better performance under weak textures, illumination changes, and seasonal changes.

1. K. Tateno, F. Tombari, et al. CNN-SLAM: Real-time Dense Monocular SLAM with Learned Depth Prediction. CVPR 2017.

2. B. Ummenhofer, H. Z. Zhou, et al. DeMoN: Depth and Motion Network for Learning Monocular Stereo. CVPR 2017.

3. T. H. Zhou, M. Brown, N. Snavely, D.G. Lowe. Unsupervised Learning of Depth and Ego-Motion from Video. CVPR 2017.

4. S. Vijayanarasimhan, S. Ricco, et al. SfM-Net: Learning of Structure and Motion from Video. arXiv:1704.07804, 2017.

5. R. Li, S. Wang, et al. UnDeepVO: Monocular Visual Odometry through Unsupervised Deep Learning. arXiv:1709.06841, 2017.

6. D. Detone, T. Malisiewicz, A. Rabinovich. Toward Geometric Deep SLAM. arXiv:1707.07410, 2017.

7. R. Clark, S. Wang, et al. VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem. AAAI 2017.

8. Xiang Gao, Tao Zhang. Unsupervised learning to detect loops using deep neural networks for visual SLAM system. Auton Robot (2017) 41:1–18.

9. Helder J. Araujo et al. Deep EndoVO: A Recurrent Convolutional Neural Network (RCNN) based Visual Odometry Approach for Endoscopic Capsule Robots. Neurocomputing, 2017.

10. Reza Mahjourian, Martin Wicke, Anelia Angelova. Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints. CVPR 2018.

11. Lang Wu, Yihong Wu, Heping Li. Similarity Hierarchy Based Place Recognition by Deep Supervised Hashing for SLAM. Submitted to ICRA 2019.


12. Nan Yang, Rui Wang, Jorg Stuckler, Daniel Cremers. Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry. ECCV 2018. (odometry)

13. R. Li, S. Wang, Z. Long, D. Gu. UnDeepVO: Monocular Visual Odometry Through Unsupervised Deep Learning. ICRA 2018. (odometry)

14. Huangying Zhan, Ravi Garg, Chamara Saroj Weerasekera, Kejie Li, Harsh Agarwal, Ian Reid. Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction. CVPR 2018. (odometry)

15. Yi Li, Gu Wang, Xiangyang Ji, Yu Xiang, and Dieter Fox. DeepIM: Deep Iterative Matching for 6D Pose Estimation. ECCV 2018. (object)

16. Fabian Manhardt, Wadim Kehl, Nassir Navab, and Federico Tombari. Deep Model-Based 6D Pose Refinement in RGB. ECCV 2018. (object)

17. Bugra Tekin, Sudipta N. Sinha, Pascal Fua. Real-Time Seamless Single Shot 6D Object Pose Prediction. CVPR 2018. (object)


18. Eric Brachmann and Carsten Rother. Learning Less is More – 6D Camera Localization via 3D Surface Regression. CVPR 2018. (fully convolutional neural network for densely regressing scene coordinates, defining the correspondence between the input image and the 3D scene space)

19. Jian Wu, Liwei Ma and Xiaolin Hu. Delving Deeper into Convolutional Neural Networks for Camera Relocalization. ICRA 2017.

20. Sixing Hu, Mengdan Feng, Rang M. H. Nguyen, Gim Hee Lee. CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization. CVPR 2018.

21. Michael Bloesch, Jan Czarnowski, Ronald Clark, et al. CodeSLAM — Learning a Compact, Optimisable Representation for Dense Visual SLAM. CVPR 2018. (Honorable Mention; conditions a depth autoencoder on intensity images; dense depth map estimate for a keyframe)

22. Peng Wang, Ruigang Yang, Binbin Cao, Wei Xu, Yuanqing Lin. DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map. CVPR 2018. (initial coarse camera pose from GPS/IMU → a label map from the 3D semantic map; 3D semantic map + the RGB image → pose CNN, to realize camera pose determination)

23. Johannes L. Schonberger, Marc Pollefeys, Andreas Geiger, Torsten Sattler. Semantic Visual Localization. CVPR 2018. (learns descriptors for a Bag of Semantic Words; semantic vocabulary for indexing and search)


Marker SLAM

1. Joseph DeGol, Timothy Bretl and Derek Hoiem. ChromaTag: A Colored Marker and Fast Detection Algorithm. ICCV, 2017.

2. R. Munoz-Salinas, M.J. Marin-Jimenez, E. Yeguas-Bolivar, R. Medina-Carnicer. Mapping and localization from planar markers. Pattern Recognition, Vol. 73, pp. 158-171, 2018.

3. Rafael Muñoz-Salinas, Manuel-Jesus Marin-Jimenez, Rafael Medina-Carnicer. SPM-SLAM: Simultaneous Localization and Mapping with Squared Planar Markers. Pattern Recognition, 2018.

4. Y. Wu. Circular marker SLAM. Patent, 2017. Pattern Recognition 2019. (without matching and without PnP)


• AR Toolkit (IWAR 1999)

• ARTag (CVPR 2005)

• AprilTag (ICRA 2011)

• AprilTag 2 (IROS 2016)

• ChromaTag (ICCV 2017)

3D-2D matching: PnP, RANSAC


The problem: Are correspondences necessary?

Can we localize a camera without correspondences?

The proposed method:

Circular marker SLAM


• AR/VR

• Robot grasping of textureless circular objects

• Machine docking


Visual localization

• Known 3D knowledge

  • PnP, SLAM relocalization

  • RGBD SLAM

  • SLAM/Pose with learning

  • Marker SLAM

• Unknown 3D knowledge

  • General SLAM or Point\line\edge\plane SLAM

  • Camera+IMU SLAM

  • Semantic SLAM

  • Event camera SLAM

  • Relative pose


Unknown 3D knowledge, SLAM

Reviews:

• C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J.J. Leonard. Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age. IEEE Transactions on Robotics, 32(6), 2016.

• G. Younes, D. Asmar, E. Shammas, J. Zelek. Keyframe-based monocular SLAM: design, survey, and future directions. Robotics and Autonomous Systems, 98: 67–88, 2017.

• R. Li, S. Wang, D. Gu. Ongoing evolution of visual SLAM from geometry to deep learning: challenges and opportunities. Cognitive Computation, 2018.


Point\line\edge\plane fusion SLAM/General SLAM

Weak texture, illumination changes, empty corridors, etc.: point, line, edge, and plane fusion to improve camera localization robustness

1. A. Pumarola, A. Vakhitov, et al. PL-SLAM: Real-Time Monocular Visual SLAM with Points and Lines. ICRA 2017. (point, line)

2. S. C. Yang and S. Scherer. Direct Monocular Odometry Using Points and Lines. ICRA 2017. (point, line)

3. S. C. Yang, Y. Song, et al. Pop-up SLAM: Semantic Monocular Plane SLAM for Low-texture Environments. IROS 2016. (plane)

4. P. F. Proenca and Y. Gao. Probabilistic RGB-D odometry based on points, lines and planes under depth uncertainty. arXiv.org, 2017. (line, plane)

5. K. Qiu, T. Liu and S. Shen. Model-based global localization for aerial robots using edge alignment. IEEE Robotics and Automation Letters, 2(3):1256-1263, 2017. (edge)

6. Y. Ling, M. Kuse and S. Shen. Edge alignment-based visual-inertial fusion for tracking of aggressive motions. Autonomous Robots, pages 1-16, 2017. (edge)

7. Yipu Zhao, Patricio A. Vela. Good Line Cutting: towards Accurate Pose Tracking of Line-assisted VO/VSLAM. ECCV 2018.


8. Y. Ling and S. Shen. Building maps for autonomous navigation using sparse visual SLAM features. IROS 2017. (incremental SLAM, real-time dense mapping, and free space extraction)

9. H. Wang, J. Lei, A. Li, and Y. Wu. A Geometry-based Point Cloud Reduction Method for Mobile Augmented Reality System. JSCT 2018.

10. Titus Cieslewski, Siddharth Choudhary and Davide Scaramuzza. Data-Efficient Decentralized Visual SLAM. ICRA 2018.

11. F. Tang, H. Li, Y. Wu. FMD Stereo SLAM: Fusing MVG and Direct Formulation Towards Accurate and Fast Stereo SLAM. Submitted to ICRA, 2019.

12. R. Wang, M. Schworer and D. Cremers. Stereo DSO: Large-Scale Direct Sparse Visual Odometry with Stereo Cameras. ICCV 2017. (stereo camera)


FMD Stereo SLAM: Fusing MVG and Direct Formulation Towards Accurate and Fast Stereo SLAM

A novel framework fuses the advantages of direct and feature methods.

Both accuracy and speed are considered.


FMD Stereo SLAM: Fusing MVG and Direct Formulation Towards Accurate and Fast Stereo SLAM

Public dataset EuRoC:

• FMD: 109 Hz

• ORB SLAM: 15 Hz


Stereo DSO: Large-Scale Direct Sparse Visual Odometry with Stereo Cameras


Camera+IMU, other sensor SLAM

Fusing multiple cameras and multiple kinds of sensors improves camera localization robustness.

1. K. Sun, K. Mohta, et al. Robust Stereo Visual Inertial Odometry for Fast Autonomous Flight. IEEE Robotics and Automation Letters, 2018. (stereo+IMU)

2. Yonggen Ling, Linchao Bao, Zequn Jie, Fengming Zhu, Ziyang Li, Shanmin Tang, Yongsheng Liu, Wei Liu, and Tong Zhang. Modeling Varying Camera-IMU Time Offset in Optimization-Based Visual-Inertial Odometry. ECCV 2018.

3. H. Rebecq, T. Horstschaefer and D. Scaramuzza. Real-time Visual-Inertial Odometry for Event Cameras using Keyframe-based Nonlinear Optimization. BMVC 2017.

4. Niclas Zeller, Franz Quint and Uwe Stilla. Scale-Awareness of Light Field Camera based Visual Odometry. ECCV 2018.

5. Luwei Yang, Feitong Tan, Ao Li, Zhaopeng Cui, Yasutaka Furukawa, Ping Tan. Polarimetric Dense Monocular SLAM. CVPR 2018.


Semantic SLAM

Semantic SLAM: combines geometry with content to improve SLAM accuracy and, at the same time, to understand the environment.

1. S.L. Bowman, N. Atanasov, et al. Probabilistic Data Association for Semantic SLAM. ICRA 2017. (One of the five Best Papers)

2. J. McCormac, A. Handa, et al. SemanticFusion: Dense 3D Semantic Mapping with Convolutional Neural Networks. ICRA 2017.

3. Saumitro Dasgupta. Object Detection for Semantic SLAM using Convolution Neural Networks. ICRA 2017.


Event Camera SLAM

Event camera: captures the change in light intensity at each pixel.

1. G. Gallego, Jon E. A. Lund, et al. Event-based, 6-DOF Camera Tracking from Photometric Depth Maps. PAMI, 2017.

2. T. Rosinol Vidal, H. Rebecq, et al. Ultimate SLAM? Combining Events, Images, and IMU for Robust Visual SLAM in HDR and High Speed Scenarios. IEEE Robotics and Automation Letters, 2018.

3. H. Rebecq, T. Horstschaefer and D. Scaramuzza. Real-time Visual-Inertial Odometry for Event Cameras using Keyframe-based Nonlinear Optimization. BMVC 2017.
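The per-pixel behaviour of an event camera described above can be sketched in a few lines. This is a simplified illustration, not the model used in any of the cited papers: it assumes two intensity frames given as nested lists, a hypothetical contrast `threshold`, and a hypothetical helper name `generate_events`. A real sensor fires events asynchronously rather than between frames.

```python
import math

def generate_events(prev_frame, next_frame, threshold=0.2):
    """Return (x, y, polarity) for every pixel whose log-intensity
    change between the two frames reaches the contrast threshold."""
    events = []
    for y, (row_prev, row_next) in enumerate(zip(prev_frame, next_frame)):
        for x, (i_prev, i_next) in enumerate(zip(row_prev, row_next)):
            # A small epsilon avoids log(0) for completely dark pixels.
            delta = math.log(i_next + 1e-6) - math.log(i_prev + 1e-6)
            if abs(delta) >= threshold:
                events.append((x, y, 1 if delta > 0 else -1))
    return events

# Hypothetical 2x2 intensity frames: one pixel brightens, one darkens.
prev_frame = [[0.5, 0.5], [0.5, 0.5]]
next_frame = [[0.5, 1.0], [0.25, 0.5]]
events = generate_events(prev_frame, next_frame)
print(events)
```

Pixels whose intensity does not change produce no output at all, which is why event data is sparse and suited to high-speed and HDR scenes.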


Relative pose

1. Ivan Eichhardt and Dmitry Chetverikov. Affine Correspondences between Central Cameras for Rapid Relative Pose Estimation. ECCV 2018.

2. Alexander Vakhitov, Victor Lempitsky, and Yinqiang Zheng. Stereo relative pose from line and point feature triplets. ECCV 2018. (point+line)

3. Pedro Miraldo, Tiago Dias, and Srikumar Ramalingam. A Minimal Closed-Form Solution for Multi-Perspective Pose Estimation using Points and Lines. ECCV 2018. (point+line)

4. Federico Camposeco, Andrea Cohen, Marc Pollefeys, Torsten Sattler. Hybrid Camera Pose Estimation. CVPR 2018. (PnP, relative pose)

5. Jesus Briales, Laurent Kneip, Javier Gonzalez-Jimenez. A Certifiably Globally Optimal Solution to the Non-Minimal Relative Pose Problem. CVPR 2018.
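As background for the relative pose papers above, the epipolar constraint that a relative pose must satisfy can be sketched as follows. This is a generic textbook illustration, not the method of any cited paper. It assumes the convention X2 = R X1 + t, normalized image coordinates, and hypothetical helper names.

```python
def skew(t):
    """Cross-product matrix [t]x of a 3-vector."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def essential_matrix(rotation, translation):
    """E = [t]x R for the convention X2 = R X1 + t."""
    return matmul(skew(translation), rotation)

def epipolar_residual(essential, x1, x2):
    """x2^T E x1, which is zero for a correct correspondence."""
    e_x1 = [sum(essential[i][k] * x1[k] for k in range(3)) for i in range(3)]
    return sum(x2[i] * e_x1[i] for i in range(3))

# Hypothetical stereo-like pose: no rotation, unit translation along x.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
E = essential_matrix(identity, (1.0, 0.0, 0.0))

# 3D point (0.5, 0.2, 4.0) seen in both cameras, in normalized coordinates.
x1 = (0.125, 0.05, 1.0)
x2 = (0.375, 0.05, 1.0)
good = epipolar_residual(E, x1, x2)
bad = epipolar_residual(E, x1, (0.375, 0.10, 1.0))  # wrong match
print(good, bad)
```

Minimal and non-minimal solvers differ in how they recover R and t, but a candidate pose is typically scored by residuals of this kind.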


Deep learning methods are a very active research area, but they are less accurate and generalize poorly to larger scenes outside the training data.

Traditional methods remain the main methods used in practice.


➢ Preface

➢ Image matching

➢ Visual localization: PnP,SLAM

➢ 3D reconstruction: SfM,learning,RGBD

➢ Trends

Contents


3D reconstruction

Structure from motion

Learning

Single image: learning depth

Multiple images: learning matching/disparity

From depth camera: limited field of view; for non-rigid objects; for large environments or an entire object by RGBD SLAM

Others


Camera pose and structure computation

Structure from motion

Point detection and matching

Bundle adjustment

Epipolar geometry computation

⚫ Incremental

⚫ Global

⚫ Hybrid

Point cloud processing

Onur Ozyesil, Vladislav Voroninski, Ronen Basri, Amit Singer. Survey on Structure from Motion. Acta Numerica, 2017.


Incremental

1. Hainan Cui, Shuhan Shen, Xiang Gao, Zhanyi Hu. Batched Incremental Structure-from-Motion. 3DV 2017. (speed-up by selecting 3D points and verifying camera poses)

2. Joseph DeGol, Timothy Bretl, Derek Hoiem. Improved Structure from Motion Using Fiducial Marker Matching. ECCV 2018. (puts markers in the scene; higher accuracy by limiting matches, changing the order of images, and enforcing new bundle adjustment constraints)

Global

1. Hainan Cui, Shuhan Shen, Zhanyi Hu. Global Fusion of Generalized Camera Model for Efficient Large-Scale Structure from Motion. Science China: Information Sciences, 2017. (multi-camera; initializes all cameras simultaneously)

2. Yi Zhou, Guillermo Gallego, Henri Rebecq, Laurent Kneip, Hongdong Li, Davide Scaramuzza. Semi-Dense 3D Reconstruction with a Stereo Event Camera. ECCV 2018.

3. Anders Eriksson, Carl Olsson, Fredrik Kahl, Tat-Jun Chin. Rotation Averaging and Strong Duality. CVPR 2018. (the role of duality principles within the problem of rotation averaging; no duality gap)


Hybrid

Hainan Cui, Xiang Gao, Shuhan Shen, Zhanyi Hu. HSfM: Hybrid Structure-from-Motion. CVPR 2017. (Rotation solving is cast as a linear L1-norm problem that is robust to outliers. Once the rotations are solved, the number of parameters decreases greatly; solving only the translations with an incremental method then reduces the errors as well.)

⚫ Translation and rotation are estimated individually

⚫ Cameras are grouped; each group is reconstructed incrementally, then all groups are reconstructed globally

1. Hainan Cui, Shuhan Shen, Xiang Gao, Zhanyi Hu. CSfM: Community-based Structure from Motion. ICIP 2017.

2. Siyu Zhu, Tianwei Shen, Lei Zhou, Runze Zhang, Jinglu Wang, Tian Fang, Long Quan. Parallel Structure from Motion from Local Increment to Global Averaging. ICCV 2017.

3. Siyu Zhu, Runze Zhang, Lei Zhou, Tianwei Shen, Tian Fang, Ping Tan, and Long Quan. Very Large-Scale Global SfM by Distributed Motion Averaging. CVPR 2018. (one PC, 1.2 m)
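The robustness claimed for the L1-norm rotation step in HSfM can be illustrated with a deliberately simplified one-dimensional toy case. This is not the HSfM solver. It assumes several noisy estimates of a single rotation angle (all values and function names are hypothetical) and ignores angle wrap-around. In this setting the L2-optimal estimate is the mean and the L1-optimal estimate is the median.

```python
import statistics

def l2_estimate(angles_deg):
    """Minimizer of the sum of squared deviations: the mean."""
    return statistics.mean(angles_deg)

def l1_estimate(angles_deg):
    """Minimizer of the sum of absolute deviations: the median."""
    return statistics.median(angles_deg)

# Four consistent estimates near 10 degrees and one gross outlier,
# standing in for a wrong relative rotation from a bad image match.
angle_estimates = [10.0, 10.2, 9.9, 10.1, 90.0]

l2_angle = l2_estimate(angle_estimates)
l1_angle = l1_estimate(angle_estimates)
print(f"L2 (mean):   {l2_angle:.2f} deg")
print(f"L1 (median): {l1_angle:.2f} deg")
```

The single outlier drags the L2 estimate far from the consistent measurements, while the L1 estimate stays among them.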


Bundle adjustment

1. Runze Zhang, Siyu Zhu, Tian Fang, Long Quan. Distributed Very Large Scale Bundle Adjustment by Global Camera Consensus. ICCV 2017, pp. 29-38. (global consensus based on ADMM; a general consensus framework regardless of the number of camera parameters)

2. Hainan Cui, Shuhan Shen, Zhanyi Hu. Tracks Selection for Robust, Efficient and Scalable Large-Scale Structure from Motion. Pattern Recognition, 2017. (formulates track selection as finding a subset of tracks that covers multiple spanning trees of the epipolar geometry graph; reduces the number of bundle adjustment constraints and speeds up bundle adjustment greatly)

3. Haomin Liu, Mingyu Chen, Guofeng Zhang. ICE-BA: Incremental, Consistent and Efficient Bundle Adjustment for Visual-Inertial SLAM. CVPR 2018. (10x speedup; source code released)
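For orientation, the objective that bundle adjustment minimizes, the sum of squared reprojection errors over all camera-point observations, can be sketched as below. This is a generic illustration under stated assumptions, not the formulation of any cited paper: a pinhole camera with a single hypothetical focal length, the principal point at the origin, and hypothetical function names. No optimizer is included.

```python
def project(point, rotation, translation, focal):
    """Pinhole projection of a 3D point into a camera (R, t)."""
    xc = [sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
          for i in range(3)]
    return (focal * xc[0] / xc[2], focal * xc[1] / xc[2])

def reprojection_cost(cameras, points, observations, focal=500.0):
    """Sum of squared reprojection errors.

    observations: list of (camera_index, point_index, (u, v)).
    """
    cost = 0.0
    for cam_idx, pt_idx, (u, v) in observations:
        rotation, translation = cameras[cam_idx]
        pu, pv = project(points[pt_idx], rotation, translation, focal)
        cost += (pu - u) ** 2 + (pv - v) ** 2
    return cost

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [(identity, (0.0, 0.0, 0.0)), (identity, (-1.0, 0.0, 0.0))]
true_points = [(0.0, 0.0, 5.0)]

# Synthetic, noise-free observations of the single point in both cameras.
observations = [(c, 0, project(true_points[0], *cameras[c], 500.0))
                for c in range(2)]

cost_true = reprojection_cost(cameras, true_points, observations)
cost_perturbed = reprojection_cost(cameras, [(0.1, 0.0, 5.0)], observations)
print(cost_true, cost_perturbed)
```

Each observation adds one term to the cost, which is why selecting fewer tracks or distributing the problem over camera blocks reduces the work per iteration.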


Fusion from large view angles

1. Lei Zhou, Siyu Zhu, Tianwei Shen, Jinglu Wang, Tian Fang, Long Quan. Progressive Large Scale-Invariant Image Matching in Scale Space. ICCV 2017.

2. Xiang Gao, Lihua Hu, Hainan Cui, Shuhan Shen, Zhanyi Hu. Accurate and Efficient Ground-to-Aerial Model Alignment. Pattern Recognition, 76(4): 288-302, 2018. (the ground point cloud is projected into images as viewed from aerial angles)

3. Yang Zhou, Shuhan Shen, Xiang Gao, Zhanyi Hu. Accurate Mesh-based Alignment for Ground and Aerial Multi-view Stereo Models. ICIP 2017.

A scale-invariant image matching approach tackles very large scale changes: following scale-space theory, an image's scale space is encoded into a compact multi-scale representation.


Point cloud/3D point processing

1. Nan. et al., PolyFit: Polygonal Surface Reconstruction from Point Clouds, ICCV 2017

2. Kelly. et al., BigSUR: Large-scale Structured Urban Reconstruction, TOG 2017

3. Zhu. et al., Variational Building Modeling from Urban MVS Meshes, 3DV 2017

4. Yanping Fu, Qingan Yan, Long Yang, Jie Liao, Chunxia Xiao. Texture Mapping for 3D Reconstruction with RGB-D Sensor. CVPR 2018.

5. Hang Su, Varun Jampani, Deqing Sun. SPLATNet: Sparse Lattice Networks for Point Cloud Processing. CVPR 2018. (Honorable Mention)

6. Angela Dai, Daniel Ritchie, Martin Bokeloh, Scott Reed, Jurgen Sturm, Matthias Nießner. ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans. CVPR 2018. (completion)

7. Wentao Yuan, Tejas Khot, David Held, Christoph Mertz, Martial Hebert. PCN: Point Completion Network. 3DV 2018, Best Honorable Mention. (completion)

Mesh/texture mapping; complete; fuse/match


8. Lei Zhou, Siyu Zhu, Zixin Luo, Tianwei Shen, Runze Zhang, Mingmin Zhen, Tian Fang, and Long Quan. Learning and Matching Multi-View Descriptors for Registration of Point Clouds. ECCV 2018.

9. Yinlong Liu, Chen Wang, Zhijian Song, Manning Wang. Efficient Global Point Cloud Registration by Matching Rotation Invariant Features Through Translation Search. ECCV 2018.

10. Haowen Deng, Tolga Birdal, Slobodan Ilic. PPFNet: Global Context Aware Local Features for Robust 3D Point Matching. CVPR 2018.

11. Georgios Georgakis, Srikrishna Karanam, Ziyan Wu. End-to-End Learning of Keypoint Detector and Descriptor for Pose Invariant 3D Matching. CVPR 2018.

12. A. Parra Bustos and T.-J. Chin. Guaranteed outlier removal for point cloud registration with correspondences. PAMI, 2018. (reduces the input to a smaller set, which removes outliers quickly and reliably)


Deep learning

1. Lei He, Guanghui Wang and Zhanyi Hu. Learning Depth from Single Images with Deep Neural Network Embedding Focal Length. IEEE Transactions on Image Processing, 2018.

2. Dan Xu, Wei Wang, Hao Tang, Hong Liu, Nicu Sebe, Elisa Ricci. Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation. CVPR 2018.

3. Jianbo Jiao, Ying Cao, Yibing Song, Rynson Lau. Look Deeper into Depth: Monocular Depth Estimation with Semantic Booster and Attention-Driven Loss. ECCV 2018.

4. Zhenyu Zhang, Zhen Cui, Chunyan Xu, Zequn Jie, Xiang Li, Jian Yang. Joint Task-Recursive Learning for Semantic Segmentation and Depth Estimation. ECCV 2018.

5. Dan Xu, Wanli Ouyang, Xiaogang Wang, Nicu Sebe. PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing. CVPR 2018.

6. Abhijit Kundu, Yin Li, James M. Rehg. 3D-RCNN: Instance-level 3D Object Reconstruction via Render-and-Compare. CVPR 2018.

Single image


7. Nikolaos Zioulis, Antonis Karakottas, Dimitrios Zarpalas, Petros Daras. OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas. ECCV 2018.

8. Grégoire Payen de La Garanderie, Amir Atapour Abarghouei, Toby P. Breckon. Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery. ECCV 2018.

9. Huangying Zhan, Ravi Garg, Chamara Saroj Weerasekera, Kejie Li, Harsh Agarwal, Ian Reid. Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction. CVPR 2018.

10. Guandao Yang, Yin Cui, Serge Belongie, Bharath Hariharan. Learning Single-View 3D Reconstruction with Limited Pose Supervision. ECCV 2018.

11. Zhichao Yin and Jianping Shi. GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose. CVPR 2018.


12. Jae-Han Lee, Minhyeok Heo, Kyung-Rae Kim, and Chang-Su Kim. Single-Image Depth Estimation Based on Fourier Domain Analysis. CVPR 2018.

13. Chaoyang Wang, José Miguel Buenaposada, Rui Zhu, et al. Learning Depth From Monocular Videos Using Direct Methods. CVPR 2018.

14. Zhao Chen, Vijay Badrinarayanan, Gilad Drozdov, Andrew Rabinovich. Estimating Depth from RGB and Sparse Sensing. ECCV 2018.

15. Dongwoo Lee, Haesol Park, In Kyu Park, Kyoung Mu Lee. Joint Blind Motion Deblurring and Depth Estimation of Light Field. ECCV 2018.

16. Despoina Paschalidou, Osman Ulusoy, Carolin Schmitt, Luc Van Gool, Andreas Geiger. RayNet: Learning Volumetric 3D Reconstruction With Ray Potentials. CVPR 2018.

17. Yukang Gan, Xiangyu Xu, Wenxiu Sun, Liang Lin. Monocular Depth Estimation with Affinity, Vertical Pooling, and Label Enhancement. ECCV 2018.

18. Xiaoyang Guo, Hongsheng Li, Shuai Yi, Jimmy Ren, Xiaogang Wang. Learning Monocular Depth by Distilling Cross-domain Stereo Networks. ECCV 2018.

19. Xinjing Cheng, Peng Wang, Ruigang Yang. Depth Estimation via Affinity Learned with Convolutional Spatial Propagation Network. ECCV 2018.

20. Minhyeok Heo, Jaehan Lee, Kyung-Rae Kim, Han-Ul Kim, Chang-Su Kim. Monocular Depth Estimation Using Whole Strip Masking and Reliability-Based Refinement. ECCV 2018.


21. Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Batmanghelich, Dacheng Tao. Deep Ordinal Regression Network for Monocular Depth Estimation. CVPR 2018.

22. Zhengqi Li, Noah Snavely. MegaDepth: Learning Single-View Depth Prediction From Internet Photos. CVPR, 2018.

23. Ayush Tewari, Michael Zollhöfer, Pablo Garrido, et al. Self-Supervised Multi-Level Face Model Learning for Monocular Reconstruction at Over 250 Hz. CVPR 2018.

24. Jogendra Nath Kundu, Phani Krishna Uppala, Anuj Pahuja, Venkatesh Babu. AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation. CVPR 2018.

25. Amir Atapour-Abarghouei, Toby P. Breckon. Real-Time Monocular Depth Estimation Using Synthetic Data With Domain Adaptation via Image Style Transfer. CVPR, 2018.

26. Chengjie Niu, Jun Li, Kai Xu. Im2Struct: Recovering 3D Shape Structure From a Single RGB Image. CVPR, 2018.

27. Pratul P. Srinivasan, Rahul Garg, Neal Wadhwa, Ren Ng, Jonathan T. Barron. Aperture Supervision for Monocular Depth Estimation. CVPR 2018.

28. Clement Godard, Oisin Mac Aodha, Gabriel J. Brostow. Unsupervised Monocular Depth Estimation With Left-Right Consistency. CVPR 2017.

29. Chen Liu, Jimei Yang, Duygu Ceylan, Ersin Yumer, Yasutaka Furukawa. PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image. CVPR 2018.


PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image. CVPR 2018.

An end-to-end DNN learns to directly infer a set of plane parameters and the corresponding plane segmentation masks from a single RGB image. More than 50,000 piece-wise planar depth maps are generated for training and testing from ScanNet, a large-scale RGB-D video database.
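To make the link between plane parameters and a piece-wise planar depth map concrete, the sketch below computes the depth at a pixel that belongs to a given plane. It is a geometric illustration under assumptions, not PlaneNet's actual parameterization or code: the plane is written as n·X = d in camera coordinates, the camera is a pinhole with hypothetical intrinsics `focal`, `cx`, `cy`, and the function name is made up for this example.

```python
def plane_depth(u, v, normal, offset, focal, cx, cy):
    """Depth z at pixel (u, v) for the plane n . X = offset.

    The 3D point is X = z * ray, with ray = K^-1 [u, v, 1]^T,
    so z = offset / (n . ray).
    """
    ray = ((u - cx) / focal, (v - cy) / focal, 1.0)
    denom = sum(n * r for n, r in zip(normal, ray))
    return offset / denom

# Hypothetical intrinsics for a 640x480 image.
focal, cx, cy = 500.0, 320.0, 240.0

# A fronto-parallel wall 3 m away: the depth is 3 at every pixel.
wall = plane_depth(100.0, 50.0, (0.0, 0.0, 1.0), 3.0, focal, cx, cy)

# A slanted plane through the 3D point (1, 0, 2): n = (0.6, 0, 0.8), d = 2.2.
# That point projects to u = cx + focal * 0.5 = 570, v = cy.
slanted = plane_depth(570.0, 240.0, (0.6, 0.0, 0.8), 2.2, focal, cx, cy)
print(wall, slanted)
```

Filling each segmentation mask with the depth of its plane in this way yields a piece-wise planar depth map of the kind used for training and evaluation.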


Stereo/binocular: matching, disparity

1. A. Kendall, H. Martirosyan, S. Dasgupta, P. Henry, R. Kennedy, A. Bachrach and A. Bry. End-to-End Learning of Geometry and Context for Deep Stereo Regression. ICCV 2017. (1. employs 3-D convolutions to regularize the cost volume; 2. uses a differentiable soft argmin function to regress sub-pixel disparity)

2. J. Pang, W. Sun, J. Ren, C. Yang and Q. Yan. Cascade Residual Learning: A Two-stage Convolutional Neural Network for Stereo Matching. ICCV 2017. (uses the color image and a residual network to refine the disparity estimate)

3. Z. Jie, P. Wang, Y. Ling, B. Zhao, Y. Wei, J. Feng and W. Liu. Left-Right Comparative Recurrent Model for Stereo Matching. CVPR 2018. (introduces a soft attention mechanism alongside recurrent learning to simultaneously check consistency and select proper regions for refinement)

Model accuracy: repeated patterns, occluded areas, textureless regions, reflective surfaces, and weak light remain challenging problems

Model generalization

Model speed
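The differentiable soft argmin mentioned for the first paper above can be sketched for a single pixel as follows. This is a minimal illustration, assuming a list of matching costs indexed by integer disparity; the function name and the cost values are hypothetical, and a real network applies the same operation to every pixel of a regularized cost volume.

```python
import math

def soft_argmin(costs):
    """Expected disparity under softmax(-cost).

    Unlike a hard argmin, the result varies smoothly with the costs
    and can fall between integer disparities.
    """
    lowest = min(costs)
    # Subtracting the minimum keeps the exponentials numerically stable.
    weights = [math.exp(-(c - lowest)) for c in costs]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total

# Costs with a clear minimum at disparity 2.
peaked = soft_argmin([10.0, 1.0, 0.0, 1.0, 10.0])

# Two equally good candidates: the estimate lands halfway between them.
tied = soft_argmin([5.0, 0.0, 0.0, 5.0])
print(peaked, tied)
```

Because the output is a weighted average, a multi-modal cost distribution gives an estimate between the modes, which is one reason the cost volume is regularized first.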


Stereo

4. J. Chang and Y. Chen. Pyramid Stereo Matching Network. CVPR 2018. (1. introduces a pyramid pooling module to incorporate hierarchical context information; 2. uses a stacked hourglass 3D CNN for regularization)

5. Zhengfa Liang, Yiliu Feng, Yulan Guo, Hengzhu Liu, Wei Chen, Linbo Qiao, Li Zhou, Jianfeng Zhang. Learning for Disparity Estimation through Feature Constancy. CVPR 2018. (uses feature constancy to refine the initial disparity)

6. Aashish Sharma, Loong-Fah Cheong. Into the Twilight Zone: Depth Estimation using Joint Structure-Stereo Optimization. ECCV 2018. (weak light in the twilight zone)

7. Eddy Ilg, Tonmoy Saikia, Margret Keuper, Thomas Brox. Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation. ECCV 2018.

8. Satoshi Ikehata. CNN-PS: CNN-based Photometric Stereo for General Non-Convex Surfaces. ECCV 2018.

9. Guanying Chen, Kai Han, Kwan-Yee K. Wong. PS-FCN: A Flexible Learning Framework for Photometric Stereo. ECCV 2018.

10. Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, Long Quan. MVSNet: Depth Inference for Unstructured Multi-view Stereo. ECCV 2018.


12. Yue Luo, Jimmy Ren, Mude Lin, Jiahao Pang, Wenxiu Sun, Hongsheng Li, Liang Lin. Single View Stereo Matching, CVPR 2018.

13. Po-Han Huang, Kevin Matzen, Johannes Kopf, Narendra Ahuja, Jia-Bin Huang. DeepMVS: Learning Multi-view Stereopsis. CVPR 2018.

14. Ke Xian, Chunhua Shen, Zhiguo Cao, Hao Lu, Yang Xiao, Ruibo Li, Zhenbo Luo. Monocular Relative Depth Perception With Web Stereo Data Supervision. CVPR 2018.

15. J. Pang, W. Sun, C. Yang, J. Ren, R. Xiao, J. Zeng and L. Lin. Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains. CVPR 2018. (formulates an iterative optimization problem with graph Laplacian regularization to generalize deep stereo matching to novel domains)


16. S. Khamis, S. Fanello, C. Rhemann, A. Kowdle, J. Valentin, S. Izadi. StereoNet: Guided Hierarchical Refinement for Real-Time Edge-Aware Depth Prediction. ECCV 2018. (60 fps on an Nvidia Titan X; obtains the initial disparity from a very low-resolution cost volume and refines it with a learned edge-aware upsampling function)

17. Alex Poms, Chenglei Wu, Shoou-I Yu, et al. Learning Patch Reconstructability for Accelerating Multi-View Stereo. CVPR 2018.

18. Changha Shin, Hae-Gon Jeon, Youngjin Yoon, In So Kweon, Seon Joo Kim. EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images. CVPR 2018.

Tuple

• Samarth Brahmbhatt, Jinwei Gu, Kihwan Kim, James Hays, Jan Kautz. Geometry-Aware Learning of Maps for Camera Localization. CVPR 2018.


1. Suryansh Kumar, Yuchao Dai, Hongdong Li. The 1st Winner of “Non-Rigid Structure from Motion Challenge 2017” @ CVPR 2017.

2. Suryansh Kumar, Yuchao Dai, Hongdong Li. Spatial-temporal union of subspaces for multi-body non-rigid structure-from-motion. Pattern Recognition, 2017. (a unified framework to jointly segment and reconstruct multiple non-rigid objects along both the temporal and spatial directions)

3. Suryansh Kumar, Yuchao Dai, Hongdong Li. Monocular Dense 3D Reconstruction of a Complex Dynamic Scene from Two Perspective Frames. ICCV 2017.

4. Kangkan Wang, Guofeng Zhang, Shihong Xia. Templateless Non-Rigid Reconstruction and Motion Tracking With a Single RGB-D Camera. IEEE Transactions on Image Processing, 26(12): 5966-5979, 2017. (local non-rigid bundle adjustment and global optimization)

5. Thiemo Alldieck, Marcus Magnor, Weipeng Xu, Christian Theobalt, Gerard Pons-Moll. Video Based Reconstruction of 3D People Models. CVPR 2018.

6. Chao Li, Zheheng Zhao, Xiaohu Guo. ArticulatedFusion: Real-time Reconstruction of Motion, Geometry and Segmentation Using a Single Depth Camera. ECCV 2018.

7. Antonio Agudo, Melcior Pijoan, Francesc Moreno-Noguer. 3D Reconstruction and Clustering of Rigid and Non-Rigid Categories. CVPR 2018.

Nonrigid or from RGBD:


8. Qianyi Wu, Juyong Zhang, Yu-Kun Lai, Jianmin Zheng, Jianfei Cai. Alive Caricature from 2D to 3D. CVPR 2018. (caricature, face)

9. Junho Jeon, Seungyong Lee. Reconstruction-based Pairwise Depth Dataset for Depth Image Enhancement Using CNN. ECCV 2018.

10. Shi Yan, Chenglei Wu, Lizhen Wang, Feng Xu, Liang An, Kaiwen Guo, Yebin Liu. DDRNet: Depth Map Denoising and Refinement for Consumer Depth Cameras Using Cascaded CNNs. ECCV 2018.

11. Miroslava Slavcheva, Maximilian Baust, Slobodan Ilic. SobolevFusion: 3D Reconstruction of Scenes Undergoing Free Non-Rigid Motion. CVPR 2018.

12. Yinda Zhang, Thomas Funkhouser. Deep Depth Completion of a Single RGB-D Image. CVPR 2018. (predicts dense surface normals and occlusion boundaries; shiny, bright, transparent)

13. Jianwei Li, Wei Gao, Yihong Wu. Elaborate Scene Reconstruction with a Consumer Depth Camera. International Journal of Automation and Computing, 2017. (higher accuracy)

14. Jianwei Li, Wei Gao, Heping Li, Fulin Tang, Yihong Wu. Robust and Efficient CPU-based RGB-D Scene Reconstruction. Sensors, 2018.


Others:

1. T. Schoeps, T. Sattler, C. Haene, M. Pollefeys. Large-scale outdoor 3D reconstruction on a mobile device. CVIU 2017. (filter-based method)

2. C. Haene, C. Zach, A. Cohen, M. Pollefeys. Dense Semantic 3D Reconstruction. PAMI, 2017. (volumetric)

3. Zhaopeng Cui, Jinwei Gu, Boxin Shi, Ping Tan and Jan Kautz. Polarimetric Multi-View Stereo. CVPR 2017. (reconstruction of textureless objects by a novel polarization method)

4. Michal Polic, Wolfgang Forstner, and Tomas Pajdla. Fast and Accurate Camera Covariance Computation for Large 3D Reconstruction. ECCV 2018.

5. N Dinesh Reddy, Minh Vo, Srinivasa G. Narasimhan. CarFusion: Combining Point Tracking and Part Detection for Dynamic 3D Reconstruction of Vehicles. CVPR 2018.

6. Chuhang Zou, Alex Colburn, Qi Shan, Derek Hoiem. Reconstructing the 3D Room Layout From a Single RGB Image. CVPR 2018.

7. Yang Yang, Shi Jin, Ruiyang Liu, Sing Bing Kang, Jingyi Yu. Automatic 3D Indoor Scene Modeling from Single Panorama. CVPR 2018.


8. Shiwei Li, Yao Yao, Tian Fang, Long Quan. Reconstructing Thin Structures of Manifold Surfaces by Integrating Spatial Curves. CVPR 2018.

9. Alessandro Vianello, Jens Ackermann, Maximilian Diebold. Robust Hough Transform Based 3D Reconstruction From Circular Light Fields. CVPR 2018.

10. Qilin Sun, Xiong Dun, Yifan Peng, Wolfgang Heidrich. Depth and Transient Imaging With Compressive SPAD Array Cameras. CVPR 2018. (overcomes the spatial resolution limit of SPAD arrays by employing a compressive sensing camera design; the depths are then reconstructed with TVAL3)


Development trends

• Traditional computation and learning fusion:

• Multi sensor fusion: camera, IMU etc.

• Combination with hardware: depth camera, 3D camera, event camera, light field camera…

M. Abouzahir, A. Elouardi, R. Latif, S. Bouaziz, A. Tajer. Embedding SLAM algorithms: Has it come of age? Robotics and Autonomous Systems 100 (2018) 14–26.

• Combination with applications:

AGV, driverless car, robotics, AR, VR

• Brain inspired:

Semantic + Matching, Localization, 3D Map; but still far from simulating human intelligence


D. Marr (1945-1980) Vision. Freeman and Company, Oxford, 1982.

A primal sketch A 2.5D sketch A 3D model

Feature extraction:

Points, edges, regions

Viewer-centered three dimensional view of the environment

A continuous, 3-dimensional map

Traditional computation and learning fusion

Learning is not limited to the highest level: fusing learning into the computation at each level is a trend. Marr’s vision framework is still the mainstream. ----Yihong Wu, NLPR of IA of CAS

Thirty years later, Tomaso Poggio adds one level above the computational level: learning.

I am not sure that Marr would agree, but I am tempted to add

learning as the very top level of understanding, above the

computational level. Only then may we be able to build

intelligent machines that could learn to see—and think—without

the need to be programmed to do it.

— Tomaso Poggio, Vision (2010, The MIT Press), Afterword,

P.367


[email protected]

vision.ia.ac.cn/Faculty/yhwu/index.htm

Produced by the Publicity Group and the General Office of the National Laboratory of Pattern Recognition, October 31, 2014