
Efficient Region Search for Object Detection

Sudheendra Vijayanarasimhan and Kristen Grauman



Department of Computer Science, University of Texas at Austin

Motivation

Main Idea

Efficient Region Search (ERS)

1. A rectangle is imprecise

Results (code available at http://vision.cs.utexas.edu/projects/ers/ers-code.tar.gz)

Background: Linear SVM with BoW

2. Extra features in a window can mislead the detector

Goal: Identify the best-scoring region, i.e., the subset of spatially contiguous subregions whose features maximize a classifier’s score.

A naïve approach, enumerating every connected subset of superpixels, would require exponential time.
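To make the exponential cost concrete, here is a minimal Python sketch of the naïve approach (our illustration, not the authors' code). It tries every non-empty vertex subset of a small region-graph, keeps only the connected ones, and returns the subset with the largest summed vertex weight. The dict-of-neighbor-sets graph representation and the toy weights are assumptions.

```python
from itertools import combinations


def is_connected(nodes, adj):
    """True if `nodes` induces a connected subgraph of `adj` (dict: vertex -> set of neighbors)."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:  # depth-first search restricted to `nodes`
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def brute_force_mwcs(weights, adj):
    """Naive maximum-weight connected subgraph: try all 2^n - 1 non-empty vertex subsets."""
    best_set, best_score = None, float("-inf")
    verts = list(weights)
    for k in range(1, len(verts) + 1):
        for subset in combinations(verts, k):
            if is_connected(subset, adj):
                score = sum(weights[v] for v in subset)
                if score > best_score:
                    best_set, best_score = set(subset), score
    return best_set, best_score


# Toy region-graph: a chain of four superpixels 0 - 1 - 2 - 3 (hypothetical weights).
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
weights = {0: 0.4, 1: -0.1, 2: 0.3, 3: -0.5}
region, score = brute_force_mwcs(weights, adj)
# region is {0, 1, 2}; the disconnected set {0, 2} would score more but is not allowed.
```

The loop visits 2^n − 1 subsets, which is hopeless for an oversegmentation with hundreds of superpixels; this is the cost the branch-and-cut formulation avoids in practice.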

• Our optimal solution leads to significantly more accurate results on this challenging dataset.

• ERS search times are similar to those of ESS and orders of magnitude faster than sliding windows.

• Unlike ESS, ERS permits pixel-level detections of any shape.

Detection overlap accuracy on PASCAL 2008 compared to the global connectivity CRF [Nowozin et al. CVPR 2009]

Contour strengths

• Given a test image, we construct a region-graph on an oversegmentation:

Maximum-Weight Connected Subgraph (MWCS) Problem

Region-graph

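• Prize-collecting Steiner tree (PCST) problem: find the connected subgraph (a tree) that maximizes the sum of vertex weights minus the sum of (positive) edge costs

• Convert MWCS → PCST: subtract the smallest (negative) vertex weight from all vertex and edge weights, so that vertex prizes become non-negative and every edge costs a positive constant; the optimal region is unchanged.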

• Point feature words: SURF within the superpixel

• Shape feature words: HoG on whole superpixel
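The MWCS → PCST conversion above can be checked numerically. Below is a hedged Python sketch (the dict-based graph and toy weights are our assumptions). Let m be the smallest vertex weight. Every prize becomes w − m and every edge, which has weight 0 in the MWCS graph, gets cost −m. A tree with k vertices has k − 1 edges, so its PCST value is Σw − km + (k − 1)m = Σw − m: a constant offset, so the best region does not change. Any connected region has a spanning tree, and with positive edge costs the PCST optimum is a tree, so the two problems select the same vertices.

```python
def mwcs_to_pcst(weights, edges):
    """Shift MWCS vertex weights into PCST prizes (>= 0) and positive edge costs.

    Assumes the smallest vertex weight m is negative; every prize becomes w - m
    and every edge (weight 0 in the MWCS graph) gets cost 0 - m = -m > 0.
    """
    m = min(weights.values())
    prize = {v: w - m for v, w in weights.items()}
    cost = {e: -m for e in edges}
    return prize, cost, m


def pcst_objective(tree_nodes, tree_edges, prize, cost):
    """PCST objective of a tree: sum of vertex prizes minus sum of edge costs."""
    return sum(prize[v] for v in tree_nodes) - sum(cost[e] for e in tree_edges)


# Toy chain 0 - 1 - 2 - 3 (hypothetical weights).
weights = {0: 0.4, 1: -0.1, 2: 0.3, 3: -0.5}
edges = [(0, 1), (1, 2), (2, 3)]
prize, cost, m = mwcs_to_pcst(weights, edges)

# sum(prize) - sum(cost) = (sum(w) - k*m) - (k - 1)*(-m) = sum(w) - m for every tree.
for nodes, tree in [({0}, []), ({0, 1, 2}, [(0, 1), (1, 2)]), ({0, 1, 2, 3}, edges)]:
    mwcs_score = sum(weights[v] for v in nodes)
    assert abs(pcst_objective(nodes, tree, prize, cost) - (mwcs_score - m)) < 1e-9
```

For ERS-C, `cost` would instead hold class-specific edge costs derived from contour strengths. How those costs are computed is not specified here, so the sketch keeps the uniform cost.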

Branch-and-Cut Solution

We apply the branch-and-cut algorithm for PCST [Ljubic et al. ’06] to obtain the best-scoring region:

• Optimal solutions

• Efficient in practice (graphs with hundreds of nodes)

Efficient Region Search with Contours (ERS-C)

A variant of ERS to help exclude background regions

Training: Learning the Weights

• Vertex weights are obtained from SVM weights for:

Our goal is to determine the arbitrarily shaped region within a novel image that maximizes the classifier’s score.

[Figure: pipeline from oversegmentation and region-graph, through point and shape descriptors and a bag-of-features SVM, to an MWCS instance with per-superpixel weights (negative and positive features marked); the equivalent PCST instance, in which every vertex weight is shifted by 0.23 (the magnitude of the smallest weight, −0.23) and every edge costs 0.23, plus a variant with class-specific edge costs; and the branch-and-cut solution giving the best-scoring region.]

Our Approach


Main contribution: We show how to obtain the best-scoring region efficiently with a branch-and-cut solution.

Applicable to classifiers whose total score is a sum of localized feature scores (e.g., linear SVM, Naïve Bayes nearest neighbor, boosting).

• Visual word histogram weights – linear SVM on segmented examples

• Bag-of-contours histogram weights – structured SVM

Datasets

Baselines

• Efficient Subwindow Search (ESS) [Lampert et al. 2008]

• Global connectivity CRF [Nowozin et al. 2009]

Evaluation metrics

• Pixel-level AP, PASCAL bounding box metric, overlap scores
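For concreteness, here is a hedged sketch of overlap-style scores. This is our illustration: the poster does not give the exact formulas behind its metrics. `mask_iou` is pixel-level intersection-over-union between a predicted region mask and the ground-truth mask. `box_iou` is the box overlap used by the PASCAL criterion, where a detection counts as correct when IoU exceeds 0.5. The toy L-shaped mask also shows why a rectangle is imprecise: even the tightest box around it overlaps the object by only 7/16 at the pixel level.

```python
import numpy as np


def mask_iou(pred, gt):
    """Pixel-level intersection-over-union of two boolean masks of equal shape."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 0.0


def box_iou(a, b):
    """IoU of two boxes (x0, y0, x1, y1); max corners are exclusive (an assumption)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def tight_box_mask(mask):
    """Fill the tightest axis-aligned rectangle enclosing a boolean mask."""
    ys, xs = np.nonzero(mask)
    box = np.zeros(mask.shape, dtype=bool)
    box[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = True
    return box


# A "non-boxy" L-shaped object in a 4 x 4 image (7 pixels).
gt = np.zeros((4, 4), dtype=bool)
gt[:, 0] = True
gt[3, :] = True
window = tight_box_mask(gt)
pascal_hit = box_iou((0, 0, 4, 4), (1, 1, 4, 4)) > 0.5  # PASCAL box criterion: IoU > 0.5
```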

ETHZ Shapes: 5 classes

PASCAL 2008 seg: 20 classes

PASCAL 2007: cat, dog

PASCAL 2008 seg

• While windows over- or underestimate the object’s extent, ERS allows precise, arbitrarily shaped detections.

Pixel-level precision-recall curves on PASCAL 2007 (cat, dog) and ETHZ for our approach and ESS

• ERS is more accurate than ESS, even under the bounding box metric (19–70% better).

• Shape features excel on ETHZ; region-based detection is crucial for “non-boxy” objects.

Comparison with ESS

Comparison with CRF

Computation Time

Conclusions

• An efficient branch-and-cut method for region-based detection

• Demonstrated its advantages over both window-based detection and a CRF model

• In future work, we will examine other classifiers that our model accepts.

Object detection via exhaustive search is too expensive.

Branch-and-bound schemes can limit the search (Lampert et al. ’08, Lehmann et al. ’09, Yeh et al. ’09), but existing methods are restricted to rectangular or simple polygonal candidate windows.

Problem:

Divide image into superpixels and construct region-graph

Weight each superpixel vertex by classifier output on its features

Branch-and-cut to find best connected subgraph

Maximum-weight connected subgraph → Prize-collecting Steiner tree problem
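The first two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the released ERS code. Superpixels come as an integer label map, and two superpixels are adjacent when they share a horizontal or vertical pixel border (4-connectivity, our assumption). Each vertex weight is the sum of the SVM weights of the point-feature words falling inside the superpixel. The per-superpixel shape (HoG) word weight and the extra edges for spatial layout are omitted. `region_graph` and `vertex_weights` are hypothetical names.

```python
import numpy as np


def region_graph(labels):
    """Adjacency sets between superpixels that share a horizontal or vertical pixel border."""
    adj = {int(s): set() for s in np.unique(labels)}
    # Pair every pixel with its right neighbor, then with its lower neighbor.
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        border = a != b
        for u, v in zip(a[border].tolist(), b[border].tolist()):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def vertex_weights(labels, feat_xy, feat_word, word_weight):
    """Superpixel weight = sum of the SVM weights of the point-feature words inside it."""
    w = {int(s): 0.0 for s in np.unique(labels)}
    for (x, y), word in zip(feat_xy, feat_word):
        w[int(labels[y, x])] += word_weight[word]
    return w


# Toy 4 x 4 oversegmentation into four square superpixels (hypothetical).
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 3, 3],
                   [2, 2, 3, 3]])
adj = region_graph(labels)  # 0 and 3 touch only diagonally, so they are not adjacent
weights = vertex_weights(labels,
                         feat_xy=[(0, 0), (3, 0), (3, 3)],  # (column, row) of each feature
                         feat_word=[0, 1, 1],               # visual-word index of each feature
                         word_weight=[0.5, -0.2])           # learned SVM weight per word
```

The resulting `adj` and `weights` form exactly the MWCS input that the branch-and-cut step then solves.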

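As noted by Lampert et al. ’08, for a linear SVM and bag-of-words, the classifier response for a region R can be written as the sum of its N features’ word weights:

$$ f(R) = \sum_{j} w_j \, h_j(R) = \sum_{i=1}^{N} w_{c_i} $$

where $h_j(R)$ is the number of occurrences of the j-th word in R, $w_j$ is the SVM weight for the j-th word, and $w_{c_i}$ is the SVM weight for the i-th feature’s word.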
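The identity f(R) = Σ_j w_j h_j(R) = Σ_i w_{c_i} is what lets each superpixel carry its own vertex weight. Because the score is a sum over features, it decomposes over any partition of R into superpixels. A small NumPy check follows. The bias term is omitted, and the vocabulary size, weights and word lists are made-up toy values.

```python
import numpy as np


def score_by_histogram(words, w):
    """f(R) = sum_j w_j * h_j(R), where h_j(R) counts occurrences of word j in R."""
    h = np.bincount(np.asarray(words, dtype=int), minlength=len(w))
    return float(np.dot(w, h))


def score_by_features(words, w):
    """f(R) = sum_i w_{c_i}: add the weight of each feature's word c_i."""
    return float(sum(w[c] for c in words))


w = np.array([0.5, -0.2, 0.1])  # hypothetical SVM weights for a 3-word vocabulary
sp_a = [0, 1, 1]                # word indices of features in superpixel A
sp_b = [2, 0]                   # word indices of features in superpixel B
region = sp_a + sp_b            # region R = A united with B

# Both forms agree, and the score of R is the sum of the per-superpixel scores.
assert np.isclose(score_by_histogram(region, w), score_by_features(region, w))
assert np.isclose(score_by_features(region, w),
                  score_by_features(sp_a, w) + score_by_features(sp_b, w))
```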

Identify the connected subgraph R* whose summed vertex weights are maximal.

• Edges are set by superpixel adjacency, and to impose spatial layout.

• Class-specific edge weights via bag-of-contour strengths

Example Detections

PASCAL 2007 ETHZ (point/shape features)

- neg features - pos features