R-FCN: Object Detection with Really-Friggin' Convolutional Networks
Or "Region-based Fully Convolutional Networks"
VGG Reading Group - Sam Albanie
Kaiming He, FAIR
Yi Li, Tsinghua University
Jifeng Dai, Microsoft Research
Jian Sun, Microsoft Research
NIPS 2016
Object Detection
Some members of the postdeepluvian* object detection family tree
*Serge Belongie-ism
[Timeline figure: arrows of "inspiration" connect detectors, including R-CNN (arXiv Nov 2013, CVPR 2014), SPP-Net (arXiv June 2014, ECCV 2014), Fast R-CNN (arXiv Apr 2015, ICCV 2015), R-CNN minus R (arXiv June 2015, BMVC 2015), Faster R-CNN (arXiv June 2015, NIPS 2015), YOLO (arXiv June 2015, CVPR 2016), ProNet (CVPR 2016), G-CNN (arXiv Dec 2015, CVPR 2016), SSD (arXiv Dec 2015), R-FCN (arXiv June 2016, NIPS 2016), YOLO 9000 (arXiv Dec 2016) and DSSD ("now with more layers", arXiv Jan 2017), with joke edge labels such as "fully connected bidirectional inspiration layer"]
Motivation: Sharing is Caring*
*Trademark: Salvation Army
[Figure, ResNet-101 backbone drawn to scale: R-CNN computes unshared convolutions on every RoI; Faster R-CNN shares the convolutions up to the RoI pooling layer; R-FCN shares all convolutions, with the RoI layer at the very end]
Problem: Location Invariance
• For image classification, we want location invariance
• For object detection, we want location variance
In previous work, an RoI pooling layer was inserted before the final convolutions to break the invariance, at the cost of reduced sharing (standard RoI max pooling is sketched below)
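For reference, a minimal NumPy sketch of standard RoI max pooling (Fast R-CNN style); the function name, the 7x7 output size and the stride of 16 are illustrative assumptions rather than details from the slides:

import numpy as np

def roi_max_pool(feature_map, roi, output_size=7, stride=16):
    # Standard RoI max pooling for a single RoI.
    # feature_map: (C, H, W) conv features; roi: (x1, y1, x2, y2) in image
    # coordinates; stride: total downsampling factor of the backbone.
    x1, y1, x2, y2 = [int(round(v / stride)) for v in roi]  # project onto features
    x2, y2 = max(x2, x1 + 1), max(y2, y1 + 1)               # at least one cell
    C = feature_map.shape[0]
    out = np.zeros((C, output_size, output_size), dtype=feature_map.dtype)
    bin_h = (y2 - y1) / output_size
    bin_w = (x2 - x1) / output_size
    for i in range(output_size):
        for j in range(output_size):
            ys = y1 + int(np.floor(i * bin_h))
            ye = y1 + int(np.ceil((i + 1) * bin_h))
            xs = x1 + int(np.floor(j * bin_w))
            xe = x1 + int(np.ceil((j + 1) * bin_w))
            # Every output cell max-pools over ALL channels identically;
            # nothing here is position sensitive.
            out[:, i, j] = feature_map[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out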
Solution: Position-Sensitive Score Maps
Waffle explanation. Much like neural networks, it works on multiple layers.
Position-Sensitive Score Maps
A bank of k²(C+1) score map channels: each group of channels takes responsibility for one relative spatial location of the object (e.g. "top-left")
Efficient Sharing of Diagrams
Backbone: ResNet-101
Minor modifications (sketched below):
• Remove the global average pooling (GAP) and fully connected layers
• Add a 1x1 dimensionality reduction layer (2048 -> 1024 channels)
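A minimal PyTorch sketch of these head modifications, assuming a torchvision ResNet-101; the k = 3 grid and 21-class VOC setup are illustrative, and the dilation at conv5 used by the real model (see the final slide) is omitted for simplicity:

import torch
import torch.nn as nn
from torchvision.models import resnet101

resnet = resnet101(weights=None)

# Keep everything up to (but excluding) global average pooling and fc.
backbone = nn.Sequential(*list(resnet.children())[:-2])  # (N, 2048, H/32, W/32)

num_classes = 21   # e.g. PASCAL VOC: 20 classes + background (C+1)
k = 3              # k x k grid of position-sensitive bins

head = nn.Sequential(
    nn.Conv2d(2048, 1024, kernel_size=1),                 # dimensionality reduction
    nn.Conv2d(1024, k * k * num_classes, kernel_size=1),  # position-sensitive score maps
)

x = torch.randn(1, 3, 224, 224)
score_maps = head(backbone(x))   # (1, k*k*(C+1), 7, 7)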
Further Details
• Bbox regression under the standard parameterisation
• Standard multitask loss function
• Online Hard Example Mining (OHEM) during training
• Faster R-CNN-style alternating optimisation
• Dilation used at conv5 (the RPN works from conv4) - gives a 2.6 mAP boost
Visualisation: Hit
Visualisation: Miss
Experiments
The Effect of Position Sensitivity on Fully Convolutional Strategies
(“naive” Faster R-CNN still has FC layer after RoI pooling)
Without position sensitivity, Faster R-CNN takes a major performance hit when the RoI pooling is late in the network
Standard Benchmarks: VOC 2007
Standard Benchmarks: VOC 2012
The Effect of Depth
Saturates at ResNet-101
The Effect of Proposal Type
Works pretty well with any proposal method
Summary
• A little more efficient than Faster R-CNN
• Simpler
• Trades a small amount of accuracy for efficiency
Appendix/Details
Standard Benchmarks: MS COCO
The Effect of Proposal Numbers: VOC 2007
Position Sensitive RoI Pooling: for all the indexing fans
Scores are averaged over bins inside regions:

$$ r_c(i, j \mid \Theta) = \frac{1}{n} \sum_{(x, y) \in \text{bin}(i, j)} z_{i,j,c}(x + x_0,\; y + y_0 \mid \Theta) $$

where the $(i, j)$-th bin spans $\lfloor i \tfrac{w}{k} \rfloor \le x < \lceil (i{+}1) \tfrac{w}{k} \rceil$ and $\lfloor j \tfrac{h}{k} \rfloor \le y < \lceil (j{+}1) \tfrac{h}{k} \rceil$, $(x_0, y_0)$ is the top-left corner of the RoI and $n$ is the number of pixels in the bin
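A NumPy sketch of this pooling under an assumed channel layout (bin-major, then class); the names and the k = 3, 21-class setup are illustrative:

import numpy as np

def ps_roi_pool(score_maps, roi, k=3, num_classes=21, stride=16):
    # Position-sensitive RoI pooling, following the equation above.
    # score_maps: (k*k*num_classes, H, W); assumed channel layout:
    # channel index = (j*k + i)*num_classes + c for bin (i, j).
    # roi: (x1, y1, x2, y2) in image coordinates, assumed inside the image.
    x0, y0, x1, y1 = [int(round(v / stride)) for v in roi]
    w, h = max(x1 - x0, 1), max(y1 - y0, 1)
    maps = score_maps.reshape(k, k, num_classes, *score_maps.shape[1:])
    out = np.zeros((num_classes, k, k), dtype=score_maps.dtype)
    for j in range(k):               # bin row (y direction)
        for i in range(k):           # bin column (x direction)
            xs = x0 + int(np.floor(i * w / k))
            xe = x0 + int(np.ceil((i + 1) * w / k))
            ys = y0 + int(np.floor(j * h / k))
            ye = y0 + int(np.ceil((j + 1) * h / k))
            # Bin (i, j) reads ONLY its own group of channels: this is
            # what makes the pooling position sensitive.
            out[:, j, i] = maps[j, i, :, ys:ye, xs:xe].mean(axis=(1, 2))
    return out

# Per-class scores are then obtained by voting (averaging the k*k bins):
# scores = ps_roi_pool(z, roi).mean(axis=(1, 2))   # shape (num_classes,)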
Standard Object Detection Multitask Loss Function
Positive examples are the RoIs whose intersection-over-union (IoU) overlap with a ground-truth box is at least 0.5; the rest are negatives
The class loss is computed by averaging the positional scores (i.e. voting) to produce a (C+1)-dim vector for each RoI, pushing it through a softmax and computing the cross entropy.
The regression loss is similar: a 4-dim vector is produced and passed into a Huber (smooth L1) loss. The two losses are combined in a weighted sum:

$$ L(s, t_{x,y,w,h}) = L_{cls}(s_{c^*}) + \lambda\, [c^* > 0]\, L_{reg}(t, t^*) $$

where $c^*$ is the RoI's ground-truth label ($c^* = 0$ for background) and $\lambda = 1$
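A NumPy sketch of this combined loss for a batch of RoIs; the names are illustrative and lambda defaults to 1 as in the paper:

import numpy as np

def rfcn_loss(class_scores, labels, bbox_deltas, bbox_targets, lam=1.0):
    # class_scores: (N, C+1) voted scores per RoI; labels: (N,) ints,
    # 0 = background. bbox_deltas/bbox_targets: (N, 4).
    # Cross entropy over softmaxed class scores (numerically stable).
    z = class_scores - class_scores.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    cls_loss = -log_probs[np.arange(len(labels)), labels].mean()

    # Huber (smooth L1) loss on the box deltas, positives only.
    pos = labels > 0
    diff = np.abs(bbox_deltas[pos] - bbox_targets[pos])
    huber = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)
    reg_loss = huber.sum(axis=1).mean() if pos.any() else 0.0

    # Weighted sum, as in the equation above.
    return cls_loss + lam * reg_loss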
Bounding Box Regression in Object Detection: R-CNN Style
Predict bounding box updates with an additional 4k²-dim convolutional layer
Given training pairs $\{(P^i, G^i)\}_{i=1,\dots,N}$, where $P^i = (P^i_x, P^i_y, P^i_w, P^i_h)$, parameterise the mapping with linear functions $d_x(P)$, $d_y(P)$, $d_w(P)$, $d_h(P)$ such that:

$$ \hat{G}_x = P_w d_x(P) + P_x \qquad \hat{G}_y = P_h d_y(P) + P_y \qquad \text{(scale invariant)} $$
$$ \hat{G}_w = P_w \exp(d_w(P)) \qquad \hat{G}_h = P_h \exp(d_h(P)) \qquad \text{(log space)} $$
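A small sketch of applying predicted deltas to a proposal in centre/size form, following the equations above (the helper name is hypothetical):

import numpy as np

def apply_deltas(proposal, deltas):
    # proposal: (Px, Py, Pw, Ph) in centre/size form; deltas: predicted
    # (dx, dy, dw, dh). Returns the predicted box (Gx, Gy, Gw, Gh).
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    gx = pw * dx + px        # shift is scaled by box size: scale invariant
    gy = ph * dy + py
    gw = pw * np.exp(dw)     # width/height predicted in log space
    gh = ph * np.exp(dh)
    return gx, gy, gw, gh

print(apply_deltas((100.0, 80.0, 50.0, 40.0), (0.1, -0.05, 0.2, 0.0)))
# -> (105.0, 78.0, 61.07..., 40.0)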
OHEM: Online Hard Example Mining (bootstrapping)
Rank regions by loss and only use the top ranked
These “hard examples” will evolve as the network trains
OHEM is particularly efficient in R-FCN because ranking all region proposals is (almost) free: the per-RoI computation is negligible (a selection sketch follows)
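A minimal sketch of the selection step, assuming one scalar loss per proposal has already been computed (the names and batch size are illustrative; the original OHEM also applies NMS between hard examples, omitted here):

import numpy as np

def ohem_select(per_roi_losses, batch_size=128):
    # Keep only the indices of the RoIs with the highest loss; these
    # "hard examples" are the ones used in the backward pass.
    order = np.argsort(per_roi_losses)[::-1]   # highest loss first
    return order[:batch_size]

losses = np.random.rand(2000)   # one loss per region proposal
hard = ohem_select(losses)      # indices of the 128 hardest proposals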
Alternating Optimisation:
You put your left boot in, your left boot out…
1. Train the RPN
2. Use its proposals to train Fast R-CNN
3. Use the resulting network to re-initialise the RPN
4. Retrain Fast R-CNN with the updated RPN, sharing convolutions
Dilated Convolutions
Figure 1 from Yu and Koltun, "Multi-scale context aggregation by dilated convolutions", ICLR 2016
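A small PyTorch illustration of why dilation helps here: a 3x3 kernel with dilation 2 covers a 5x5 window with no extra weights and no loss of resolution (channel counts are illustrative):

import torch
import torch.nn as nn

# Stride 1 + padding 2 preserves spatial resolution despite the larger
# receptive field. R-FCN applies dilation at conv5 so the output stride
# stays at 16 rather than 32.
dilated = nn.Conv2d(1024, 1024, kernel_size=3, dilation=2, padding=2)

x = torch.randn(1, 1024, 40, 40)
print(dilated(x).shape)   # torch.Size([1, 1024, 40, 40]) -- resolution kept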