Practical Modeling and Recognition using RGB-D Cameras

Joint work with Liefeng Bo, Kevin Lai, Peter Henry, Evan Herbst, Mike Krainin, Hao Du and others @ University of Washington

Xiaofeng Ren, Dieter Fox
Intel Labs, University of Washington

June 27, 2011

RGB-D Camera: Color+Depth

04/19/2023

640×480 at 30 Hz: color + dense depth
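Dense depth at every pixel means each frame can be back-projected into a 3D point cloud with the pinhole camera model. The sketch below is illustrative only: it assumes depth is already in meters and registered to the pixel grid, and the intrinsics FX, FY, CX, CY are hypothetical placeholder values, not figures from these slides.

```python
import numpy as np

# Hypothetical intrinsics (focal lengths and principal point, in pixels) for a
# 640x480 depth camera; substitute the calibrated values of your own sensor.
FX, FY = 525.0, 525.0
CX, CY = 319.5, 239.5


def depth_to_points(depth_m: np.ndarray) -> np.ndarray:
    """Back-project an HxW depth image in meters (0 = no reading) to an Nx3 point cloud."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # column and row index of every pixel
    z = depth_m
    x = (u - CX) * z / FX  # inverts the pinhole projection u = FX * x / z + CX
    y = (v - CY) * z / FY  # inverts the pinhole projection v = FY * y / z + CY
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels without a depth measurement


# Usage: a synthetic flat wall 2 m away with one missing measurement.
depth = np.full((480, 640), 2.0)
depth[0, 0] = 0.0
cloud = depth_to_points(depth)
print(cloud.shape)  # (307199, 3)
```

The color image, once registered to the depth image, can be attached to the same pixel indices to obtain a colored point cloud.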

At the RGB-D 2010 Workshop:

3D modeling of indoor environments

RGBD-ICP matching + loop closure; fly-through visualization

3D modeling of everyday objects

Robot in-hand modeling through real-time registration and modeling

Robust recognition of everyday objects

Preliminary object dataset captured with RGB-D

Preliminary results on sparse distance learning

RGB-D Perception @ UW and Intel

3D modeling of objects & environments

Indoor Modeling: [Henry, Krainin, Herbst, Ren, Fox; ISER ’10]

Interactive Modeling: [Du, Henry, Ren, Fox, Goldman, Seitz; Ubicomp ’11]

Dynamic Scene Modeling: [Herbst, Ren, Fox; ICRA ’11, IROS ’11]

Object Manipulation: [Krainin, Henry, Ren, Fox; IJRR ’11]

Interactive 3D Visualization: [Cheng, Ren; ’11]

Robust recognition of everyday objects

Egocentric recognition: [Ren, Gu; CVPR ’10]

Joint object-pose recognition: [Gu, Ren; ECCV ’10]

Kernel Descriptors: [Bo, Ren, Fox; NIPS ’10, IROS ’11]

Hierarchical Kernel Descriptors: [Bo, Lai, Ren, Fox; CVPR ’11]

RGB-D Benchmark: [Lai, Bo, Ren, Fox; ICRA ’11]

Sparse distance learning: [Lai, Bo, Ren, Fox; ICRA ’11] (best vision paper)

Scalable and hierarchical recognition: [Lai, Bo, Ren, Fox; AAAI ’11]

RGB-D Mapping: Pipeline

[Henry-Krainin-Herbst-Ren-Fox]
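The RGBD-ICP step in the cited mapping work combines visual feature matches with dense shape alignment. The sketch below covers only a plain point-to-point ICP core: nearest-neighbor matching plus a closed-form (Kabsch/SVD) rigid fit, assuming two small overlapping point clouds stored as NumPy arrays. The function names are hypothetical, loop closure is not shown, and brute-force matching stands in for the spatial index a real system would use.

```python
import numpy as np


def best_rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rotation R and translation t with R @ src[i] + t ≈ dst[i] (Kabsch)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)  # 3x3 cross-covariance of the centered points
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t


def icp(src: np.ndarray, dst: np.ndarray, iters: int = 20):
    """Point-to-point ICP with brute-force nearest neighbors (fine for small clouds)."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)  # pairwise sq. distances
        matched = dst[d2.argmin(axis=1)]  # closest target point for each source point
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        # Compose the incremental update with the transform accumulated so far.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

Each iteration can only lower the summed squared nearest-neighbor distance, but plain ICP still needs a reasonable initial guess, which is one reason to bring in visual feature matches.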

Comparison with Laser-Based Mapping

From RGB-D to Interactive Modeling

[Du-Henry-Ren-Fox-Goldman-Seitz; Ubicomp 2011]

Discovering and Learning Objects

[Herbst-Henry-Ren-Fox; ICRA 2011]


• (Robot) capturing scenes in RGB-D over an extended period of time
• 3D scene reconstruction for efficient representation
• Proper sensor models for both color and depth
• Pairwise scene differencing with sensor models and MRF clean-up

[Herbst-Henry-Ren-Fox; ICRA 2011]
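To make "sensor models plus MRF clean-up" concrete, here is a minimal, depth-only sketch. It assumes two already-registered depth images of the same view and a hypothetical noise model in which the depth standard deviation grows quadratically with range; the constants SIGMA_K, SIGMA_MIN, MAX_DIFF and beta are illustrative, and the color sensor model from the slide is omitted. The binary Potts MRF is minimized with iterated conditional modes, which is one simple optimizer and not necessarily the one used in the cited work.

```python
import numpy as np

# Hypothetical sensor-model parameters (not taken from the slides):
SIGMA_K = 0.003   # depth noise std grows with range: sigma = SIGMA_K * z**2 + SIGMA_MIN (m)
SIGMA_MIN = 1e-3  # noise floor in meters
MAX_DIFF = 5.0    # a changed pixel's depth difference is uniform on [-MAX_DIFF, MAX_DIFF] m


def unary_costs(depth_a, depth_b):
    """Per-pixel negative log-likelihoods: index 0 = unchanged, 1 = changed."""
    z = np.maximum(depth_a, depth_b)
    sigma = SIGMA_K * z**2 + SIGMA_MIN
    var = 2.0 * sigma**2  # variance of the difference of two independent noisy readings
    diff = depth_a - depth_b
    nll_same = 0.5 * diff**2 / var + 0.5 * np.log(2.0 * np.pi * var)
    nll_changed = np.full_like(diff, np.log(2.0 * MAX_DIFF))
    return np.stack([nll_same, nll_changed])  # shape (2, H, W)


def _neighbor_sum(a):
    """Sum of the 4-connected neighbors of every pixel of a zero-padded array."""
    return a[:-2, 1:-1] + a[2:, 1:-1] + a[1:-1, :-2] + a[1:-1, 2:]


def mrf_cleanup(unary, beta=2.0, sweeps=5):
    """Binary Potts MRF, minimized with iterated conditional modes in checkerboard order."""
    labels = unary.argmin(axis=0)
    h, w = labels.shape
    n_neighbors = _neighbor_sum(np.pad(np.ones((h, w)), 1))  # 2, 3 or 4 inside the image
    checker = (np.add.outer(np.arange(h), np.arange(w)) % 2).astype(bool)
    for _ in range(sweeps):
        for color in (False, True):  # pixels of one color only have neighbors of the other
            ones = _neighbor_sum(np.pad(labels, 1))
            cost0 = unary[0] + beta * ones  # pay beta per neighbor labeled "changed"
            cost1 = unary[1] + beta * (n_neighbors - ones)  # ... or labeled "unchanged"
            labels = np.where(checker == color, (cost1 < cost0).astype(int), labels)
    return labels
```

With these settings, an isolated pixel whose depth differs by 7 cm at 2 m range is flagged by the per-pixel test but removed by the MRF, while a 10×10 region whose depth changed by 50 cm survives.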


[Herbst-Ren-Fox; IROS 2011]

• Handling changed detections in multiple visits with a multi-label MRF
• Matching potential objects by movement and appearance
  • ICP for shape matching
  • Color image recognition with kernel descriptors
• Spectral clustering for object discovery
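Spectral clustering groups candidate segments using only a pairwise similarity matrix. In the sketch below, the affinity matrix W is assumed to be precomputed (for example, from shape-matching and appearance scores like those listed above); the Ng-Jordan-Weiss normalization and the farthest-point k-means initialization are illustrative choices, not details taken from the cited paper.

```python
import numpy as np


def spectral_clustering(W: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """Normalized spectral clustering (Ng-Jordan-Weiss style) of a symmetric affinity matrix."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    X = vecs[:, -k:]  # embedding: the k leading eigenvectors
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # project rows onto the unit sphere
    # k-means on the embedded rows, with deterministic farthest-point initialization.
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels
```

The number of clusters k is assumed known here; choosing it automatically (for example, from the eigenvalue gap) is a separate design decision.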


[Herbst-Ren-Fox; IROS 2011]

Object Learning through Manipulation

[Krainin-Henry-Ren-Fox; IJRR 2011]

Next-Best-View Planning

[Krainin-Curless-Fox; ICRA 2011]

RGB-D Object Dataset

300 objects from 51 categories, 250,000 RGB-D views

Cluttered scenes

[Lai-Bo-Ren-Fox; ICRA 2011]

http://www.cs.washington.edu/rgbd-dataset/ (search “rgbd” + “dataset”)

Benchmarking RGB-D Recognition

Category-Level Recognition (51 categories), accuracy (%)

Classifier                  Shape (Depth)   Vision (RGB)   RGB-D
Linear SVM                  51.7 ± 1.8      72.7 ± 3.2     80.5 ± 2.9
Kernel SVM                  63.5 ± 2.3      72.9 ± 3.2     83.0 ± 3.7
Random Forest               65.5 ± 2.4      73.1 ± 3.7     78.5 ± 4.1
Kernel Desc. + Linear SVM   75.7 ± 2.2      76.1 ± 2.6     84.1 ± 2.2

Instance-Level Recognition (303 instances), accuracy (%)

Classifier                  Shape (Depth)   Vision (RGB)   RGB-D
Linear SVM                  29.4 ± 0.5      90.4 ± 0.5     89.6 ± 0.5
Kernel SVM                  50.1 ± 0.9      90.8 ± 0.5     90.4 ± 0.6
Random Forest               51.6 ± 1.1      89.6 ± 0.7     90.2 ± 0.3

[Lai-Bo-Ren-Fox; ICRA 2011]

RGB-D Object Recognition

Pipeline: image patches → patch features → image features → recognition

Patch features: SIFT (or HOG)?

Image features: Bag-of-Words, Sparse Coding (LLC, LCC), Spatial Pyramid Matching (SPM), Efficient Match Kernel (EMK), Feed-forward Networks

Recognition: your favorite model
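As one concrete instance of the image-feature stage, here is a sketch of hard-assignment bag-of-words pooled over a two-level spatial pyramid. It assumes patch descriptors, patch centers normalized to [0, 1), and a codebook are already available; the function name is hypothetical, and the per-level weights of the original spatial pyramid formulation are omitted for brevity.

```python
import numpy as np


def spatial_pyramid_bow(desc, xy, codebook, levels=(1, 2)):
    """Bag-of-words histograms pooled over a spatial pyramid, concatenated and L1-normalized.

    desc: (N, D) patch descriptors; xy: (N, 2) patch centers (x, y) in [0, 1);
    codebook: (K, D) visual words. Output length is K * sum(g * g for g in levels).
    """
    K = codebook.shape[0]
    # Hard assignment: index of the nearest codeword for every patch.
    words = ((desc[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
    feats = []
    for g in levels:
        cell = np.minimum((xy * g).astype(int), g - 1)  # (column, row) of the grid cell
        cell_id = cell[:, 1] * g + cell[:, 0]
        hist = np.zeros((g * g, K))
        np.add.at(hist, (cell_id, words), 1.0)  # count codewords per cell
        feats.append(hist.ravel())
    f = np.concatenate(feats)
    return f / max(f.sum(), 1e-12)
```

The resulting vector can be fed to any classifier ("your favorite model"); its quality is bounded by the patch features it pools, which is the question the next slide takes up.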

Kernel Descriptors: Generalizing SIFT

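Gradient match kernel:

$$K_{\mathrm{grad}}(P, Q) = \sum_{u \in P} \sum_{v \in Q} \tilde{m}_u \, \tilde{m}_v \, k_o(\tilde{\omega}_u, \tilde{\omega}_v) \, k_p(u, v)$$

where $P$ and $Q$ are image patches, $u$ and $v$ are pixel coordinates, $\tilde{m}$ is the normalized gradient magnitude, $\tilde{\omega}$ is the gradient orientation, and $k_o$, $k_p$ are kernels over orientations and pixel positions.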

Includes SIFT as a special case

Avoids the “binning” issues of histogram features

Linear kernel on SIFT descriptors = the inner product of two histograms = a product summed over all pairs of pixels

[Bo-Ren-Fox; NIPS 2010]
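The gradient match kernel can be evaluated directly for two small patches, which makes the "sum over all pairs of pixels" explicit. This sketch assumes Gaussian kernels for k_o and k_p, orientation vectors [sin θ, cos θ], positions scaled to [0, 1), magnitudes normalized to unit L2 norm over the patch, and hypothetical widths gamma_o and gamma_p. It costs O(|P|·|Q|) and is meant for illustration, not as the authors' implementation.

```python
import numpy as np


def gradient_attributes(patch, eps=1e-8):
    """Per-pixel normalized gradient magnitude, orientation vector [sin, cos], and position."""
    gy, gx = np.gradient(patch.astype(float))  # derivatives along rows, then columns
    mag = np.hypot(gx, gy)
    m_tilde = mag / np.sqrt((mag**2).sum() + eps)  # normalize magnitudes over the patch
    theta = np.arctan2(gy, gx)
    omega = np.stack([np.sin(theta), np.cos(theta)], axis=-1).reshape(-1, 2)
    h, w = patch.shape
    rows, cols = np.mgrid[0:h, 0:w]
    pos = np.stack([rows / h, cols / w], axis=-1).reshape(-1, 2)
    return m_tilde.ravel(), omega, pos


def gaussian_gram(a, b, gamma):
    """Matrix of exp(-gamma * ||a_i - b_j||^2)."""
    return np.exp(-gamma * ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))


def gradient_match_kernel(P, Q, gamma_o=5.0, gamma_p=3.0):
    """K_grad(P, Q) = sum_u sum_v m_u m_v k_o(omega_u, omega_v) k_p(u, v)."""
    mP, oP, pP = gradient_attributes(P)
    mQ, oQ, pQ = gradient_attributes(Q)
    pair_weights = gaussian_gram(oP, oQ, gamma_o) * gaussian_gram(pP, pQ, gamma_p)
    return float(mP @ pair_weights @ mQ)
```

Because k_o compares orientations smoothly instead of assigning them to bins, two nearly identical gradients never land on opposite sides of a bin boundary, which is the binning issue mentioned above.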

Kernel Descriptors: Image Recognition

Scene-15

KDES: 86.7% SIFT: 82.2%

Caltech-101

KDES: 76.4% CDBN[2]: 65.5% SPM[1]: 64.4% LCC[4]: 73.4%

CIFAR-10

KDES: 76.0% LCC[4]: 74.5% mcRBM-DBN[3]: 71.0% TCNN[5]: 73.1%

[1] Lazebnik, Schmid, Ponce; CVPR ’06.
[2] Lee, Grosse, Ranganath, Ng; ICML ’09.
[3] Ranzato, Hinton; CVPR ’10.
[4] Yu, Zhang; ICML ’10.
[5] Le, Ngiam, Chen, Chia, Koh, Ng; NIPS ’10.

Low-dimensional approximations of match kernels

Explicitly compute descriptors/features from patches

Easily generalize gradient features to color, binary shape, etc.

Outperform SIFT and sophisticated feature-learning techniques

[Bo-Ren-Fox; NIPS 2010]
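Kernel descriptors replace the pairwise sum with an explicit, low-dimensional feature vector. The sketch below shows one way to do this, in the spirit of a finite-basis approximation: each Gaussian kernel is projected onto a fixed set of basis points (a Nyström-style map), and the patch descriptor is the magnitude-weighted sum of outer products, so a plain dot product approximates the match kernel. The basis sizes, kernel widths and function names are hypothetical, and any further dimensionality reduction used in the paper is not shown.

```python
import numpy as np


def gauss(a, b, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a_i - b_j||^2)."""
    return np.exp(-gamma * ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))


def finite_map(x, basis, gamma):
    """phi(x) = k(x, basis) G^{-1/2}, so that phi(x) . phi(y) ≈ k(x, y) (Nyström-style)."""
    evals, evecs = np.linalg.eigh(gauss(basis, basis, gamma))
    keep = evals > 1e-10 * evals.max()  # drop numerically singular directions
    G_inv_sqrt = evecs[:, keep] @ np.diag(evals[keep] ** -0.5) @ evecs[:, keep].T
    return gauss(x, basis, gamma) @ G_inv_sqrt


def make_bases(n_orient=25, grid=5):
    """Hypothetical basis: unit vectors at evenly spaced angles and a grid over [0, 1]^2."""
    ang = np.linspace(0.0, 2.0 * np.pi, n_orient, endpoint=False)
    basis_o = np.stack([np.sin(ang), np.cos(ang)], axis=1)
    g = np.linspace(0.0, 1.0, grid)
    basis_p = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
    return basis_o, basis_p


def kernel_descriptor(m, omega, pos, bases, gamma_o=5.0, gamma_p=3.0):
    """F(P) = sum_u m_u * outer(phi_o(omega_u), phi_p(pos_u)), flattened to a vector."""
    phi_o = finite_map(omega, bases[0], gamma_o)  # (N, d_o)
    phi_p = finite_map(pos, bases[1], gamma_p)  # (N, d_p)
    return np.einsum("n,ni,nj->ij", m, phi_o, phi_p).ravel()


def exact_match_kernel(P, Q, gamma_o=5.0, gamma_p=3.0):
    """Exact pairwise sum, for comparison; P and Q are (m, omega, pos) tuples."""
    (mP, oP, pP), (mQ, oQ, pQ) = P, Q
    return float(mP @ (gauss(oP, oQ, gamma_o) * gauss(pP, pQ, gamma_p)) @ mQ)
```

The inner product of two outer products factors into a product of inner products, so kernel_descriptor(*P, bases) @ kernel_descriptor(*Q, bases) approximates exact_match_kernel(P, Q) at a cost that no longer grows with |P|·|Q|.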


Kernel Descriptors: RGB-D Recognition

Category-Level Recognition (51 categories), accuracy (%)

Classifier                  Shape (Depth)   Vision (RGB)   RGB-D
Linear SVM                  51.7 ± 1.8      72.7 ± 3.2     80.5 ± 2.9
Kernel SVM                  63.5 ± 2.3      72.9 ± 3.2     83.0 ± 3.7
Random Forest               65.5 ± 2.4      73.1 ± 3.7     78.5 ± 4.1
Kernel Desc. + Linear SVM   75.7 ± 2.2      76.1 ± 2.6     84.1 ± 2.2

Instance-Level Recognition (303 instances), accuracy (%)

Classifier                  Shape (Depth)   Vision (RGB)   RGB-D
Linear SVM                  29.4 ± 0.5      90.4 ± 0.5     89.6 ± 0.5
Kernel SVM                  50.1 ± 0.9      90.8 ± 0.5     90.4 ± 0.6
Random Forest               51.6 ± 1.1      89.6 ± 0.7     90.2 ± 0.3

[Bo-Lai-Ren-Fox; CVPR 2011; IROS 2011]

Toward Practical Recognition

• A mug?
• Kevin’s mug?
• A mug facing right?
• A mug with orientation (90, 15, 0)?

… …

Scalable and Hierarchical Recognition

[Lai-Bo-Ren-Fox; AAAI 2011]

8 discrete views

continuous angles

Joint Recognition with Object-Pose Tree

• Tree structure enables efficient joint recognition
• Object-pose tree outperforms nearest-neighbor and 1vsA baselines
• Joint tree-based learning outperforms separate learning
• Promising pose estimation results on generic objects

• The natural category-instance-pose tree structure works well in practice

[Lai-Bo-Ren-Fox; AAAI 2011]

RGB-D Dataset: 300 objects, 51 categories, 250,000 color-depth pairs
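A category-instance-pose tree can be queried top-down, scoring only the children of the node chosen at the previous level. The sketch below shows such greedy inference with hypothetical linear scorers per node; how the weights are learned jointly (the focus of the cited work) is not shown, and greedy descent is an illustrative choice rather than the paper's exact inference procedure.

```python
from dataclasses import dataclass, field
from typing import List

import numpy as np


@dataclass
class Node:
    name: str
    w: np.ndarray  # hypothetical linear scoring weights for this node
    children: List["Node"] = field(default_factory=list)


def descend(root: Node, x: np.ndarray) -> List[str]:
    """Greedy top-down inference: at each level, follow only the best-scoring child.

    The number of scores computed grows with depth times branching factor,
    not with the total number of leaves.
    """
    path, node = [], root
    while node.children:
        scores = [float(child.w @ x) for child in node.children]
        node = node.children[int(np.argmax(scores))]
        path.append(node.name)
    return path  # [category, instance, pose] for a three-level tree
```

For a three-level tree, descend(root, x) answers the mug questions above in one pass: the first entry is the category, the second the instance, and the third a pose bin.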

Application: Interactive LEGO

RGB-D used for object recognition and hand tracking

[Ziola-Harrison-Powledge-Lai-Bo-Ren-Fox]

Application: Chess Playing Robot

[Matuszek-Mayton-Aimi-Bo-Deisenroth-Chu-Kung-LeGrand-Smith-Fox]

RGB-D Perception: Summary

RGB-D cameras provide synchronized color and depth, making visual perception both robust and efficient.

RGB-D mapping generates detailed 3D maps in near real time and enables on-the-fly user interaction and feedback.

Kernel descriptors provide a principled way to extract rich features from pixel attributes, outperforming SIFT and leading to robust RGB-D recognition.

Robust RGB-D recognition and modeling enable interesting scenarios for object-aware interactions and applications.

RGB-D Perception: The Future?

Will RGB-D have a deep impact on vision applications?

YES! It’s already happening, faster than we can track.

Will RGB-D start a revolution in vision applications?

NO. We still need to solve recognition, segmentation, tracking, scene understanding, etc. etc.

YES! RGB-D helps address two BIG issues in computer vision: loss of 3D from projection; lighting conditions.

RGB-D helps “abstract away” many low-level problems.

Is RGB-D the future for smart vision-based systems?

Why not? At $50 today and $10 tomorrow.

THANK YOU
