SSD: Single Shot MultiBox Detector
Wei Liu (1), Dragomir Anguelov (2), Dumitru Erhan (3), Christian Szegedy (3), Scott Reed (4), Cheng-Yang Fu (1), Alexander C. Berg (1)
(1) UNC Chapel Hill, (2) Zoox Inc., (3) Google Inc., (4) University of Michigan, Ann-Arbor
OVERVIEW
SSD discretizes the space of bounding boxes into a set of default box shapes per feature map location, and uses 3×3 convolution kernels to predict both the bounding box offsets and the object class probabilities at each location.
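As a rough sketch of what this implies for the size of the predictor output: with k default boxes per location and c class scores, each location yields k × (c + 4) values. The helper name `predictor_outputs` is hypothetical, and the setting of 4 boxes per location with 21 class scores (20 VOC classes plus background) is an illustrative assumption, not a figure taken from the poster.

```python
def predictor_outputs(height, width, boxes_per_location, num_classes):
    """Count the values predicted for one feature map.

    Each default box gets `num_classes` scores plus 4 box offsets.
    Returns (values per location, values for the whole map).
    """
    per_box = num_classes + 4
    per_location = boxes_per_location * per_box
    return per_location, height * width * per_location


# Hypothetical setting: an 8x8 feature map, 4 default boxes per location,
# 21 class scores (assumed: 20 VOC classes + background).
channels, total = predictor_outputs(8, 8, 4, 21)
print(channels)  # 100 output channels for the 3x3 predictor
print(total)     # 6400 values for the whole map
```

The per-location count is also the number of output channels the 3×3 predictor needs on that feature map.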
COMPARE STATE-OF-THE-ART METHODS
#1: MULTI-SCALE FEATURE MAPS
SSD uses multiple feature maps of decreasing resolution to output bounding boxes of increasing size.
Number of prediction    mAP, use boundary boxes?    # Boxes
source layers used      Yes         No
6                       74.3        63.4            8732
3                       70.7        69.2            9864
1                       62.4        64.0            8664
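How box size can grow from one feature map to the next may be sketched with the linear scale rule described in the SSD paper, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1) for k = 1..m. The defaults s_min = 0.2 and s_max = 0.9 come from that paper and are not stated on this poster; the function name is hypothetical.

```python
def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced box scales, one per feature map (finest map first).

    Scales are fractions of the input image size. The default s_min and
    s_max are assumptions taken from the SSD paper, not from the poster.
    """
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


scales = default_box_scales(6)
print([round(s, 2) for s in scales])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```

Under this rule the highest-resolution map handles the smallest boxes and the 1×1 map the largest.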
#2: MORE DEFAULT BOXES
[Figure: default boxes on an 8×8 feature map and on a 4×4 feature map]
SSD discretizes the bounding box space into many bins, which prevents box coordinates from being averaged when several likely hypotheses are present in the same default box.
                              SSD300
include {1/2, 2} box?         no      yes     yes
include {1/3, 3} box?         no      no      yes
number of boxes               3880    7760    8732
VOC2007 test mAP              71.6    73.7    74.3
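The three box counts in the table can be reproduced with a short calculation. This is a sketch under two assumptions that come from the SSD paper rather than from this poster: SSD300 predicts from feature maps of size 38, 19, 10, 5, 3, and 1, and the {1/3, 3} boxes are added only on the 19, 10, and 5 maps. The names below are illustrative.

```python
# Assumed SSD300 feature map sizes (from the SSD paper, not this poster).
FEATURE_MAP_SIZES = [38, 19, 10, 5, 3, 1]
# Assumption: the {1/3, 3} aspect ratios are used only on these maps.
MAPS_WITH_THIRDS = {19, 10, 5}


def count_default_boxes(use_halves, use_thirds):
    """Total default boxes over all feature maps for one configuration."""
    total = 0
    for size in FEATURE_MAP_SIZES:
        per_location = 2  # aspect ratio 1, at two different scales
        if use_halves:
            per_location += 2  # aspect ratios 1/2 and 2
        if use_thirds and size in MAPS_WITH_THIRDS:
            per_location += 2  # aspect ratios 1/3 and 3
        total += size * size * per_location
    return total


print(count_default_boxes(False, False))  # 3880
print(count_default_boxes(True, False))   # 7760
print(count_default_boxes(True, True))    # 8732
```

The 38×38 map dominates the count (5776 of the 8732 boxes), which is why most default boxes are small.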
SSD ARCHITECTURE
Method                  mAP    FPS   batch size   # Boxes   Input resolution
Faster R-CNN (VGG16)    73.2   7     1            ∼6000     ∼1000×600
Fast YOLO               52.7   155   1            98        448×448
YOLO (VGG16)            66.4   21    1            98        448×448
SSD300                  74.3   46    1            8732      300×300
SSD512                  76.8   19    1            24564     512×512
SSD300                  74.3   59    8            8732      300×300
SSD512                  76.8   22    8            24564     512×512
THE DEVIL IS IN THE DETAILS
1. Data augmentation
data augmentation                  SSD300
horizontal flip                    yes     yes     yes
random crop & color distortion     no      yes     yes
random expansion                   no      no      yes
VOC2007 test mAP                   65.5    74.3    77.2
2. Ground truth to default box matching
3. Hard negative mining
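Items 2 and 3 can be sketched as follows. This is a minimal illustration that assumes the strategy described in the SSD paper, which the poster does not spell out: each ground truth box is matched to its best-overlapping default box, any default box with Jaccard overlap above 0.5 is also matched, and negatives are kept in order of highest confidence loss up to a 3:1 negative-to-positive ratio. All function names are hypothetical.

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_default_boxes(default_boxes, gt_boxes, threshold=0.5):
    """Return, per default box, the index of its ground truth or -1."""
    matches = []
    # Match every default box whose best overlap exceeds the threshold.
    for dbox in default_boxes:
        best_g, best = -1, threshold
        for g, gbox in enumerate(gt_boxes):
            overlap = iou(dbox, gbox)
            if overlap > best:
                best_g, best = g, overlap
        matches.append(best_g)
    # Guarantee each ground truth its single best default box.
    for g, gbox in enumerate(gt_boxes):
        best_d = max(range(len(default_boxes)),
                     key=lambda d: iou(default_boxes[d], gbox))
        matches[best_d] = g
    return matches


def select_hard_negatives(conf_losses, matches, ratio=3):
    """Indices of the highest-loss negatives, at most ratio x positives."""
    num_pos = sum(1 for m in matches if m >= 0)
    negatives = [i for i, m in enumerate(matches) if m < 0]
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return negatives[: ratio * num_pos]


defaults = [(0.0, 0.0, 0.5, 0.5), (0.1, 0.1, 0.6, 0.6),
            (0.5, 0.5, 1.0, 1.0), (0.0, 0.5, 0.5, 1.0)]
ground_truth = [(0.05, 0.05, 0.55, 0.55)]
print(match_default_boxes(defaults, ground_truth))  # [0, 0, -1, -1]

# One positive and five negatives: keep the three hardest negatives.
losses = [0.1, 0.5, 0.9, 0.2, 0.7, 0.3]
print(select_hard_negatives(losses, [0, -1, -1, -1, -1, -1]))  # [2, 4, 1]
```

Capping the negatives this way keeps the training signal from being swamped by the large majority of default boxes that match no object.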
DETECTION EXAMPLES
MORE RESULTS
Method          VOC2007 test   VOC2012 test   MS COCO test-dev   ILSVRC2014 val2
Fast R-CNN      70.0           68.4           19.7               N/A
Faster R-CNN    73.2           70.4           21.9               N/A
YOLO            63.4           57.9           N/A                N/A
SSD300          74.3           72.4           23.2               43.4
SSD512          76.8           74.9           26.8               46.4
SSD300*         77.2           75.8           25.1               N/A
SSD512*         79.8           78.5           28.8               N/A