R-FCN: Object Detection with Really - Friggin’ Convolutional Networks

Or “Region-based Fully Convolutional Networks”

VGG Reading Group - Sam Albanie

Kaiming He (FAIR), Li Yi (Tsinghua Univ.), Jifeng Dai (Microsoft Research), Jian Sun (Microsoft Research)

NIPS, 2016




Object Detection


Some members of the postdeepluvian* object detection family tree

*Serge Belongie-ism

• R-CNN: arXiv Nov 2013, CVPR 2014
• SPP-Net: arXiv June 2014, ECCV 2014
• Fast R-CNN: arXiv Apr 2015, ICCV 2015
• Faster R-CNN: arXiv June 2015, NIPS 2015
• R-CNN minus R: arXiv June 2015, BMVC 2015
• YOLO: arXiv June 2015, CVPR 2016
• ProNet: arXiv Nov 2015, CVPR 2016
• G-CNN: arXiv Dec 2015, CVPR 2016
• SSD: arXiv Dec 2015
• R-FCN: arXiv June 2016, NIPS 2016
• YOLO 9000: arXiv Dec 2016
• SSD+ / DSSD: arXiv Jan 2017

Other labels in the diagram: “Fully connected bidirectional inspiration layer”, “Fully connected unidirectional inspiration layer”, “NOW WITH MORE LAYERS”


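Motivation: Sharing is Caring*

*Trademark: Salvation Army

[Diagram: ResNet-101 backbone (to scale), showing where the RoI step sits in each detector]

• R-CNN: UNSHARED CONVOLUTIONS, RoI at the input
• Faster R-CNN: SHARED CONVOLUTIONS, then RoI, then a per-RoI subnetwork
• R-FCN: SHARED CONVOLUTIONS throughout, RoI at the very end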


Problem: Location Invariance

• For image classification we want location invariance

• For object detection, we want location variance

In previous work, an RoI pooling layer has been inserted before the final convolutions to break the invariance, at the cost of reduced sharing.


Solution: Position-Sensitive Score Maps

Waffle explanation. Much like neural networks, it works on multiple layers.


Position-Sensitive Score Maps

Channels take responsibility for relative spatial locations
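As a rough illustration of what "taking responsibility" means, the sketch below assumes a k x k grid of relative positions and C object classes plus background, so the score-map layer has k*k*(C+1) channels. The channel ordering (grouped by bin, then by class) and the helper name `score_channel` are assumptions made for this example; a real implementation may lay the channels out differently.

```python
def score_channel(i, j, c, k, num_classes):
    """Channel holding the score of class c for relative position (i, j).

    Assumed layout (illustrative only): channels are grouped by bin,
    and each bin group holds num_classes + 1 channels (index 0 = background).
    """
    if not (0 <= i < k and 0 <= j < k):
        raise ValueError("bin index out of range")
    if not (0 <= c <= num_classes):
        raise ValueError("class index out of range")
    bin_index = j * k + i          # row-major over the k x k grid
    return bin_index * (num_classes + 1) + c


k, num_classes = 3, 20             # e.g. 3x3 bins, 20 VOC classes
total_channels = k * k * (num_classes + 1)
print(total_channels)                              # 189
print(score_channel(0, 0, 0, k, num_classes))      # 0   (top-left bin, background)
print(score_channel(2, 2, 20, k, num_classes))     # 188 (bottom-right bin, last class)
```

Every (bin, class) pair gets its own channel, so the "top-left of a person" score never shares a channel with the "bottom-right of a person" score.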


Efficient Sharing of Diagrams


Backbone: ResNet-101

Minor modifications:

• Remove the GAP (global average pooling) layer

• Add a 1×1 conv dimensionality reduction layer (1024-d)


Further Details

• Bbox regression under standard parameterisation

• Standard loss function

• Online Hard Example Mining during training

• Faster R-CNN-style alternating optimisation

• Dilation used at conv5 (RPN works from conv4) - gives a 2.6 mAP boost


Visualisation: Hit


Visualisation: Miss


Experiments


The Effect of Position Sensitivity on fully convolutional strategies

(“naive” Faster R-CNN still has FC layer after RoI pooling)

Without position sensitivity, Faster R-CNN takes a major performance hit when the RoI pooling is late in the network


Standard Benchmarks: VOC 2007


Standard Benchmarks: VOC 2012


The Effect of Depth

Saturates at ResNet-101


The Effect of Proposal Type

Works pretty well with any proposal method


Summary

• A little more efficient than Faster R-CNN

• Simpler

• Trades off efficiency against accuracy


Appendix/Details


Standard Benchmarks: MS COCO


The Effect of Proposal Numbers: VOC 2007


Position Sensitive RoI Pooling: for all the indexing fans

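Scores are averaged over bins inside regions:

$$r_c(i, j \mid \Theta) = \frac{1}{n} \sum_{(x, y) \in \mathrm{bin}(i, j)} z_{i,j,c}(x + x_0, y + y_0 \mid \Theta)$$

where $z_{i,j,c}$ is one of the $k^2(C+1)$ score maps, $(x_0, y_0)$ is the top-left corner of the RoI, $n$ is the number of pixels in the bin, and the (i,j)-th bin of a $w \times h$ RoI spans:

$$\left\lfloor i \frac{w}{k} \right\rfloor \le x < \left\lceil (i+1) \frac{w}{k} \right\rceil, \qquad \left\lfloor j \frac{h}{k} \right\rfloor \le y < \left\lceil (j+1) \frac{h}{k} \right\rceil$$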
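For the indexing fans, here is a minimal NumPy sketch of the pooling and voting step. It assumes integer RoI coordinates that lie inside the feature map, a k x k bin grid, and a bin-major channel layout; the function name `ps_roi_pool` and the layout are illustrative assumptions, not the authors' implementation.

```python
import math
import numpy as np


def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI pooling followed by average voting.

    score_maps: array of shape (k*k*(num_classes+1), H, W); assumed
                bin-major channel layout (illustrative, not canonical).
    roi:        (x0, y0, w, h) in feature-map pixels, integer valued
                and assumed to lie fully inside the map.
    Returns a (num_classes+1,) vector of voted scores for the RoI.
    """
    x0, y0, w, h = roi
    n_out = num_classes + 1
    pooled = np.zeros((k, k, n_out))
    for j in range(k):              # bin row
        for i in range(k):          # bin column
            # Bin (i, j) spans floor(i*w/k) <= x < ceil((i+1)*w/k),
            # and likewise for y with h.
            xs = x0 + math.floor(i * w / k)
            xe = x0 + math.ceil((i + 1) * w / k)
            ys = y0 + math.floor(j * h / k)
            ye = y0 + math.ceil((j + 1) * h / k)
            base = (j * k + i) * n_out
            # Each bin reads only from its own group of channels.
            patch = score_maps[base:base + n_out, ys:ye, xs:xe]
            pooled[j, i] = patch.mean(axis=(1, 2))
    return pooled.mean(axis=(0, 1))  # vote: average over the k*k bins


rng = np.random.default_rng(0)
maps = rng.normal(size=(3 * 3 * (2 + 1), 12, 12))
scores = ps_roi_pool(maps, roi=(2, 3, 6, 6), k=3, num_classes=2)
print(scores.shape)  # (3,)
```

Note that the loop contains no learned weights: once the score maps exist, each RoI costs only a handful of averages.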


Standard Object Detection Multitask Loss Function

Positive examples are formed from the RoIs that have intersection-over-union (IoU) overlap with a ground-truth box of at least 0.5, and negative otherwise
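A small sketch of this labelling rule, assuming boxes are given as (x1, y1, x2, y2) corners and that a positive RoI takes the class of its best-overlapping ground-truth box (the usual convention, but an assumption here). The names `iou` and `label_rois` are made up for the example.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_rois(rois, gt_boxes, gt_classes, threshold=0.5):
    """Assign each RoI the class of its best-overlapping ground-truth box,
    or 0 (background) when the best IoU is below the threshold."""
    labels = []
    for roi in rois:
        overlaps = [iou(roi, gt) for gt in gt_boxes]
        best = max(range(len(gt_boxes)), key=overlaps.__getitem__, default=None)
        if best is not None and overlaps[best] >= threshold:
            labels.append(gt_classes[best])
        else:
            labels.append(0)
    return labels


gt_boxes, gt_classes = [(0, 0, 10, 10)], [3]
rois = [(0, 0, 10, 8), (8, 8, 20, 20)]
print(label_rois(rois, gt_boxes, gt_classes))
```

The first RoI overlaps the ground-truth box with IoU 0.8 and is labelled with class 3; the second barely touches it and is labelled background (0).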

Class loss is computed by averaging the positional scores (i.e. voting) to produce a C+1 dim vector for each RoI, pushing through softmax and computing cross entropy.

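Regression loss is similar, producing a 4-dim vector which is passed into a Huber (smooth L1) loss. The two losses are combined in a weighted sum:

$$L(s, t) = L_{cls}(s_{c^*}) + \lambda \, [c^* > 0] \, L_{reg}(t, t^*)$$

where $c^*$ is the ground-truth label of the RoI ($c^* = 0$ means background), $t^*$ is the ground-truth box, and $[c^* > 0]$ is 1 for positive RoIs and 0 otherwise.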
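The weighted sum for a single RoI might be sketched as below. This assumes the voted class scores and the 4-dim box prediction are already available, uses a Huber loss with delta = 1, and defaults the weight `lam` to 1.0; the function names and the default are assumptions for illustration.

```python
import numpy as np


def softmax_cross_entropy(scores, label):
    """-log softmax(scores)[label], computed stably."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - np.max(scores)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]


def smooth_l1(pred, target):
    """Huber loss with delta = 1, summed over the 4 box coordinates."""
    diff = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    return np.sum(np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5))


def roi_loss(scores, label, box_pred, box_target, lam=1.0):
    """Classification loss plus lam * regression loss for positive RoIs only."""
    loss = softmax_cross_entropy(scores, label)
    if label > 0:                      # background RoIs get no box loss
        loss += lam * smooth_l1(box_pred, box_target)
    return loss


# Positive RoI (label 1) with uniform class scores over C+1 = 3 classes.
print(roi_loss([0.0, 0.0, 0.0], 1, [0, 0, 0, 0], [0.5, 0, 0, 2]))
```

In the example the class term is log(3) and the box term is 0.125 + 1.5, one coordinate falling in the quadratic region and one in the linear region of the Huber loss.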


Bounding Box Regression In Object Detection: R-CNN style

Predict bounding box updates with an additional 4*k*k-dim convolutional layer

$$\{(P^i, G^i)\}_{i=1,\dots,N}, \quad \text{where } P^i = (P^i_x, P^i_y, P^i_w, P^i_h)$$

Parameterise mapping with linear functions $d_x(P), d_y(P), d_w(P), d_h(P)$ such that:

$$\hat{G}_x = P_w d_x(P) + P_x \qquad \hat{G}_y = P_h d_y(P) + P_y \qquad \text{(scale invariant)}$$

$$\hat{G}_w = P_w \exp(d_w(P)) \qquad \hat{G}_h = P_h \exp(d_h(P)) \qquad \text{(log space)}$$
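The parameterisation on this slide can be sketched in a few lines, assuming boxes are stored as (centre x, centre y, width, height). `decode_box` applies predicted updates to a proposal; `encode_box` is its inverse and gives the regression targets for a (P, G) pair. Both names are hypothetical.

```python
import math


def decode_box(proposal, deltas):
    """Apply predicted updates (dx, dy, dw, dh) to a proposal (Px, Py, Pw, Ph).

    Px, Py are the box centre; Pw, Ph its width and height.
    """
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    gx = pw * dx + px          # centre shift, scaled by box size
    gy = ph * dy + py
    gw = pw * math.exp(dw)     # size change, predicted in log space
    gh = ph * math.exp(dh)
    return gx, gy, gw, gh


def encode_box(proposal, target):
    """Inverse of decode_box: the regression targets for a (P, G) pair."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = target
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


P = (10.0, 20.0, 4.0, 8.0)
G = (11.0, 18.0, 6.0, 8.0)
print(decode_box(P, encode_box(P, G)))   # recovers G up to rounding
```

Dividing the centre offsets by the proposal size and predicting sizes in log space means the same deltas describe the same relative correction for small and large proposals alike.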


OHEM: Online Hard Example Mining (bootstrapping)

Rank regions by loss and only use the top ranked

These “hard examples” will evolve as the network trains

OHEM is particularly efficient in R-FCN due to the (almost) free ranking of all region proposals
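The ranking step itself is tiny. The sketch below assumes per-RoI losses have already been computed in a forward pass and simply keeps the `batch_size` highest; it omits any de-duplication of overlapping RoIs, and `select_hard_examples` is a name invented for the example.

```python
import numpy as np


def select_hard_examples(losses, batch_size):
    """Return indices of the batch_size RoIs with the highest loss."""
    losses = np.asarray(losses, dtype=float)
    order = np.argsort(-losses, kind="stable")   # descending by loss
    return order[:batch_size]


losses = [0.1, 2.3, 0.7, 1.5, 0.05]
hard = select_hard_examples(losses, batch_size=2)
print(hard)   # indices of the two largest losses
```

Only the selected RoIs would contribute gradients in the backward pass; here those are RoIs 1 and 3.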


Alternating Optimisation:

You put your left boot in, your left boot out…

1. Train RPN

2. Use proposals to train Fast R-CNN

3. The resulting network is used to initialise RPN

4. Retrain Fast R-CNN with the updated RPN sharing convolutions


Dilated Convolutions

Figure 1 from Yu, Fisher, and Vladlen Koltun. “Multi-scale context aggregation by dilated convolutions.”
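For readers without the figure to hand, a 1-D toy version shows the idea: the kernel keeps the same number of weights, but its taps are spread `dilation` samples apart, so each output sees a wider window. This is an illustrative NumPy sketch (no padding, correlation rather than flipped convolution), not code from the cited paper.

```python
import numpy as np


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D correlation with gaps of (dilation - 1) between kernel taps."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = (len(kernel) - 1) * dilation + 1      # receptive field of one output
    n_out = len(signal) - span + 1
    if n_out <= 0:
        raise ValueError("signal shorter than the dilated kernel")
    out = np.empty(n_out)
    for t in range(n_out):
        taps = signal[t:t + span:dilation]       # every dilation-th sample
        out[t] = np.dot(taps, kernel)
    return out


x = np.arange(8.0)
print(dilated_conv1d(x, [1, 1, 1], dilation=1))  # window of 3 samples
print(dilated_conv1d(x, [1, 1, 1], dilation=2))  # same 3 weights, window of 5
```

With dilation 2 the three weights cover five input samples, which is how dilation at conv5 can keep a large receptive field without further downsampling.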


Dilated Convolutions