R-FCN: Object Detection with - imlab.postech.ac.krimlab.postech.ac.kr/dkim/class/csed514_2019s/rfcn.pdf · C+1 dim vector for each RoI, pushing through softmax and computing cross

R-FCN: Object Detection with Really-Friggin’ Convolutional Networks

Or “Region-based Fully Convolutional Networks”

VGG Reading Group - Sam Albanie

Kaiming He FAIR

Li Yi Tsinghua Univ.

Jifeng Dai Microsoft Research

Jian Sun Microsoft Research

NIPS, 2016

Object Detection

Some members of the postdeepluvian* object detection family tree

*Serge Bolongieism

[Figure: family tree of detectors, each node labelled with its arXiv date and publication venue, spanning ARXIV Nov, 2013 to ARXIV Jan, 2017. Nodes: R-CNN, SPP-Net, FAST R-CNN, R-CNN minus R, YOLO, Faster RCNN, ProNet, G-CNN, SSD, R-FCN, DSSD, YOLO 9000. Annotations: “Fully connected bidirectional inspiration layer”, “Fully connected unidirectional inspiration layer”, “NOW WITH MORE LAYERS”.]

Motivation: Sharing is Caring*

*Trademark: Salvation Army

[Figure: three detectors drawn on a ResNet-101 backbone (to scale), marking where the RoI stage sits. R-CNN: unshared convolutions. Faster R-CNN and R-FCN: shared convolutions.]

Problem: Location Invariance

• For image classification we want location invariance

• For object detection, we want location variance

In previous work, an RoI pooling layer was inserted before the final convolutions to break the invariance, at the cost of reduced sharing

Solution: Position-Sensitive Score Maps

Waffle explanation. Much like neural networks, it works on multiple layers.

Position-Sensitive Score Maps

Channels take responsibility for relative spatial locations
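To make "channels take responsibility for relative spatial locations" concrete, here is a minimal sketch of how the k*k*(C+1) score-map channels could be indexed. The bin-major channel ordering and the helper name `score_map_channel` are assumptions for illustration; the slides do not specify the ordering.

```python
def score_map_channel(i, j, c, k, num_classes):
    """Channel index for grid bin (i, j) and class c.

    Assumed layout (hypothetical): channels are grouped by bin, row-major
    over the k x k grid, with C+1 class channels (C classes plus
    background) inside each group.
    """
    return (i * k + j) * (num_classes + 1) + c


k, num_classes = 3, 20  # 3x3 grid, 20 object classes
total_channels = k * k * (num_classes + 1)

print(total_channels)                               # 189
print(score_map_channel(0, 0, 0, k, num_classes))   # 0
print(score_map_channel(2, 2, 20, k, num_classes))  # 188
```

Each of the k*k groups is only ever read from the matching bin of an RoI, which is what makes its channels position-sensitive.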

Efficient Sharing of Diagrams

Backbone: ResNet-101

Minor modifications:

• Remove the global average pooling (GAP) layer

• Add a dimensionality reduction layer (1024 channels)

Further Details

• Bbox regression under standard parameterisation

• Standard loss function

• Online Hard Example Mining during training

• Faster R-CNN-style alternating optimisation

• Dilation used at conv5 (RPN works from conv4) - gives a 2.6 mAP boost

Visualisation: Hit

Visualisation: Miss

Experiments

The Effect of Position Sensitivity on fully convolutional strategies

(“naive” Faster R-CNN still has an FC layer after RoI pooling)

Without position sensitivity, Faster R-CNN takes a major performance hit when the RoI pooling is late in the network

Standard Benchmarks: VOC 2007

Standard Benchmarks: VOC 2012

The Effect of Depth

Saturates at ResNet-101

The Effect of Proposal Type

Works pretty well with any proposal method

Summary

• A little more efficient than Faster R-CNN

• Simpler

• Makes a tradeoff with efficiency for accuracy

Appendix/Details

Standard Benchmarks: MS COCO

The Effect of Proposal Numbers: VOC 2007

Position-Sensitive RoI Pooling: for all the indexing fans

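Scores are averaged over bins inside regions:

$$r_c(i, j \mid \Theta) = \frac{1}{n} \sum_{(x, y) \in \text{bin}(i, j)} z_{i,j,c}(x + x_0, y + y_0 \mid \Theta)$$

where $z_{i,j,c}$ is the score map for bin $(i, j)$ and class $c$, $(x_0, y_0)$ is the top-left corner of the RoI, $n$ is the number of pixels in the bin, and the (i,j)-th bin of a $w \times h$ RoI spans:

$$\left\lfloor i \tfrac{w}{k} \right\rfloor \le x < \left\lceil (i+1) \tfrac{w}{k} \right\rceil, \qquad \left\lfloor j \tfrac{h}{k} \right\rfloor \le y < \left\lceil (j+1) \tfrac{h}{k} \right\rceil$$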
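The pooling and voting steps can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' implementation: it assumes a bin-major channel layout, an RoI given as integer `(x0, y0, w, h)` lying fully inside the feature map, and bins delimited by floor/ceil of multiples of w/k and h/k. The names `ps_roi_pool` and `vote` are hypothetical.

```python
import math
import numpy as np


def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI pooling for a single RoI.

    score_maps: array of shape (k*k*(C+1), H, W), bin-major layout (assumed).
    roi: (x0, y0, w, h) in feature-map pixels; (x0, y0) is the top-left corner.
    Returns an array of shape (k, k, C+1) with the averaged score per bin.
    """
    x0, y0, w, h = roi
    c1 = num_classes + 1
    pooled = np.zeros((k, k, c1))
    for i in range(k):        # bin index along x
        for j in range(k):    # bin index along y
            xs = x0 + math.floor(i * w / k)
            xe = x0 + math.ceil((i + 1) * w / k)
            ys = y0 + math.floor(j * h / k)
            ye = y0 + math.ceil((j + 1) * h / k)
            # Only the channel group assigned to bin (i, j) is read here.
            group = (i * k + j) * c1
            patch = score_maps[group:group + c1, ys:ye, xs:xe]
            pooled[i, j] = patch.mean(axis=(1, 2))
    return pooled


def vote(pooled):
    """Average the k*k bin scores into one (C+1)-dim vector for the RoI."""
    return pooled.mean(axis=(0, 1))
```

Because the pooling has no learned weights, the per-RoI cost is just these averages; all learned computation lives in the shared convolutions that produce `score_maps`.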

Standard Object Detection Multitask Loss Function

Positive examples are the RoIs whose intersection-over-union (IoU) overlap with a ground-truth box is at least 0.5; all other RoIs are negatives
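The IoU labelling rule can be sketched as below. The corner format `(x1, y1, x2, y2)` and the function names are assumptions for illustration, and boxes are assumed to have positive area.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def is_positive(roi, gt_boxes, threshold=0.5):
    """An RoI is positive if it overlaps any ground-truth box by >= threshold."""
    return any(iou(roi, gt) >= threshold for gt in gt_boxes)
```

For example, `(0, 0, 2, 2)` against the ground-truth box `(1, 0, 3, 2)` has IoU 1/3 and is labelled negative.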

The class loss is computed by averaging the position-sensitive scores (i.e. voting) to produce a (C+1)-dim vector for each RoI, passing it through a softmax and computing the cross-entropy.

Regression loss is similar, producing a 4-dim vector which is passed into Huber loss. The two losses are combined in a weighted sum:
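A minimal per-RoI sketch of this weighted sum follows. It assumes label 0 denotes background (so the regression term is switched off for negatives), uses the smooth L1 form of the Huber loss, and takes the weight `lam` as a free parameter; the function names are hypothetical.

```python
import numpy as np


def softmax_cross_entropy(scores, label):
    """Cross-entropy of softmax(scores) against an integer class label."""
    shifted = scores - scores.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


def smooth_l1(pred, target):
    """Huber loss with delta = 1, summed over the 4 box coordinates."""
    diff = np.abs(pred - target)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum()


def detection_loss(cls_scores, label, box_pred, box_target, lam=1.0):
    """Classification loss plus lam * regression loss for one RoI."""
    loss = softmax_cross_entropy(cls_scores, label)
    if label > 0:  # assumed convention: label 0 is background
        loss += lam * smooth_l1(box_pred, box_target)
    return loss
```

Here `cls_scores` is the (C+1)-dim voted vector and `box_pred` is the 4-dim voted regression output for the same RoI.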

Bounding Box Regression In Object Detection: R-CNN style

Predict bounding box updates with additional 4*k*k-dim convolutional layer

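$\{(P^i, G^i)\}_{i=1,\dots,N}$, where $P^i = (P^i_x, P^i_y, P^i_w, P^i_h)$

Parameterise mapping with linear functions $d_x(P), d_y(P), d_w(P), d_h(P)$ such that:

$$
\begin{aligned}
\hat{G}_x &= P_w d_x(P) + P_x && \text{(scale invariant)} \\
\hat{G}_y &= P_h d_y(P) + P_y && \text{(scale invariant)} \\
\hat{G}_w &= P_w \exp(d_w(P)) && \text{(log space)} \\
\hat{G}_h &= P_h \exp(d_h(P)) && \text{(log space)}
\end{aligned}
$$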
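The parameterisation on this slide can be sketched as a pair of inverse functions: one applies predicted offsets to a proposal, the other computes the regression targets from a proposal and a ground-truth box. Boxes are assumed to be `(centre_x, centre_y, width, height)` tuples, and the names `decode_box` and `encode_box` are hypothetical.

```python
import math


def decode_box(P, d):
    """Apply offsets d = (dx, dy, dw, dh) to proposal P = (x, y, w, h)."""
    Px, Py, Pw, Ph = P
    dx, dy, dw, dh = d
    return (
        Pw * dx + Px,       # centre shift, scaled by proposal width
        Ph * dy + Py,       # centre shift, scaled by proposal height
        Pw * math.exp(dw),  # width change in log space
        Ph * math.exp(dh),  # height change in log space
    )


def encode_box(P, G):
    """Regression targets that map proposal P onto ground-truth box G."""
    Px, Py, Pw, Ph = P
    Gx, Gy, Gw, Gh = G
    return (
        (Gx - Px) / Pw,
        (Gy - Py) / Ph,
        math.log(Gw / Pw),
        math.log(Gh / Ph),
    )
```

Dividing the centre shifts by the proposal size makes the targets scale invariant, and the log-space width and height keep the decoded sizes positive for any real-valued prediction.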

OHEM: Online Hard Example Mining (bootstrapping)

Rank regions by loss and only use the top ranked

These “hard examples” will evolve as the network trains

OHEM is particularly efficient in R-FCN due to the (almost) free ranking of all region proposals
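The selection step of OHEM can be sketched as a top-B pick over per-RoI losses. This is a minimal sketch assuming the losses for all proposals are already available from one forward pass; `select_hard_examples` is a hypothetical helper name.

```python
import numpy as np


def select_hard_examples(losses, batch_size):
    """Return indices of the batch_size RoIs with the highest loss."""
    losses = np.asarray(losses, dtype=float)
    order = np.argsort(-losses)  # indices sorted by descending loss
    return order[:batch_size]


hard = select_hard_examples([0.1, 2.0, 0.5, 1.2], batch_size=2)
print(hard.tolist())  # [1, 3]
```

Only the selected RoIs would contribute to the backward pass; the ranking is recomputed every iteration, which is why the set of hard examples evolves as the network trains.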

Alternating Optimisation:

You put your left boot in, your left boot out…

1. Train RPN

2. Use proposals to train Fast R-CNN

3. The resulting network is used to initialise RPN

4. Retrain Fast R-CNN with the updated RPN sharing convolutions

Dilated Convolutions

Figure 1 from Yu, Fisher, and Vladlen Koltun. “Multi-scale context aggregation by dilated convolutions.”

Dilated Convolutions
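As a small illustration of what dilation does, here is a 1-D sketch in NumPy. It assumes no padding and uses the cross-correlation convention common in deep learning; `dilated_conv1d` is a hypothetical helper, not part of any library.

```python
import numpy as np


def dilated_conv1d(signal, kernel, dilation=1):
    """1-D dilated convolution (cross-correlation form, no padding)."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Input span covered by one output value: grows with the dilation.
    span = (len(kernel) - 1) * dilation + 1
    out_len = len(signal) - span + 1
    out = np.zeros(out_len)
    for t in range(out_len):
        taps = signal[t:t + span:dilation]  # every dilation-th sample
        out[t] = np.dot(taps, kernel)
    return out


x = np.arange(7)
print(dilated_conv1d(x, [1, 1, 1], dilation=1).tolist())  # [3.0, 6.0, 9.0, 12.0, 15.0]
print(dilated_conv1d(x, [1, 1, 1], dilation=2).tolist())  # [6.0, 9.0, 12.0]
```

With the same three weights, dilation 2 covers five input samples instead of three, so the receptive field grows without adding parameters or reducing resolution through striding.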
