[Survey Paper] Research Trends in Pedestrian Detection Using Deep Learning

Histograms of Oriented Gradients (HOG)

[Survey Paper] Deep Learning, Machine Perception Robotics Group


Timeline: 2004, 2005 / 2009, 2010

Histograms of Oriented Gradients (HOG) + Support Vector Machine (SVM) [Dalal 2005]

SVM

[Dalal 2005] N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection", CVPR, 2005.
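The HOG descriptor of [Dalal 2005] fits in a few lines of code. The following is a minimal NumPy sketch under these assumptions: a grayscale window, 8×8-pixel cells, 9 unsigned orientation bins, and 2×2-cell blocks with L2 normalization. It leaves out the original's interpolated votes and Gaussian block weighting, and the function names are hypothetical. A linear SVM would then score the resulting vector x as w·x + b.

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Per-cell histograms of unsigned gradient orientation, weighted by gradient magnitude."""
    img = img.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    # Centered [-1, 0, 1] derivative masks (border pixels keep a zero gradient)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0                  # fold to unsigned orientation
    idx = np.minimum((ang * bins / 180.0).astype(int), bins - 1)   # hard binning (no interpolation)
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            ys = slice(cy * cell, (cy + 1) * cell)
            xs = slice(cx * cell, (cx + 1) * cell)
            hist[cy, cx] = np.bincount(idx[ys, xs].ravel(),
                                       weights=mag[ys, xs].ravel(), minlength=bins)
    return hist

def hog_descriptor(img, cell=8, bins=9, block=2, eps=1e-3):
    """Concatenate L2-normalized histograms of overlapping block x block groups of cells."""
    hist = hog_cell_histograms(img, cell, bins)
    n_cy, n_cx, _ = hist.shape
    feats = []
    for by in range(n_cy - block + 1):          # blocks overlap with a stride of one cell
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(feats)

window = np.random.default_rng(0).random((128, 64))   # 64x128 detection window (rows x cols)
x = hog_descriptor(window)
print(x.shape)   # (3780,) = 15 x 7 blocks x 36 values
```

Because the blocks overlap, every cell histogram enters several blocks, each time under a different local normalization. The original paper motivates this redundancy as a source of robustness to local contrast changes.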

Timeline: 2004, 2005 / 2009, 2010 / 2013, 2014 / 2015, 2016

2009: RGB, LIDAR

Deep Convolutional Neural Network


INRIA Dataset [Dalal 2005]

Caltech Pedestrian Dataset [Dollár 2009]
1,568 | - | 1,208 | 566 | 33,171 | - | 192,000 | 4,024 | 80,000 | - | 25,000 | - | LIDAR, GPS
KITTI Dataset [Andreas 2012]

[Andreas 2012] A. Geiger, P. Lenz and R. Urtasun, "Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite", CVPR, 2012.
[Dalal 2005] N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection", CVPR, 2005.
[Dollár 2009] P. Dollár, C. Wojek, B. Schiele and P. Perona, "Pedestrian Detection: A Benchmark", CVPR, 2009.


Toronto City Dataset (cf. KITTI Dataset): RGB images, LIDAR and GPS; 712 km² of land, 8,439 km of road, about 400,000 buildings


Deep Convolutional Neural Network: AlexNet [Krizhevsky 2012] (2012), 1000-class image classification

AlexNet

[Krizhevsky 2012] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", NIPS, 2012.

2

Region proposal

2

Region proposal

2: CNN, CNN, CNN

CNN

2

~2013: Color Self Similarity, HOG + SVM / 2014~

LUV, HOG: ICF [Dollár 2009], VeryFast [Benenson 2012], ACF [Dollár 2014], LDCF [Nam 2014], Checkerboard [Zhang 2015], SquaresChrFtrs [Benenson 2013]

Filtered Channel Feature

[Benenson 2013] R. Benenson, M. Mathias, T. Tuytelaars and L. Van Gool, "Seeking the strongest rigid detector", CVPR, 2013.
[Dollár 2009] P. Dollár, Z. Tu, P. Perona and S. Belongie, "Integral Channel Features", BMVC, 2009.
[Benenson 2012] R. Benenson, M. Mathias, R. Timofte and L. Van Gool, "Pedestrian detection at 100 frames per second", CVPR, 2012.
[Nam 2014] W. Nam, P. Dollár and J. H. Han, "Local Decorrelation For Improved Pedestrian Detection", NIPS, 2014.
[Zhang 2015] S. Zhang, R. Benenson and B. Schiele, "Filtered Channel Features for Pedestrian Detection", CVPR, 2015.
[Dollár 2014] P. Dollár, R. Appel, S. Belongie and P. Perona, "Fast feature pyramids for object detection", PAMI, 2014.

HOG + SVM & DPM

Integral Channel Features [Dollár 2009]: HOG and other channels + Boosted tree

[Dollár 2009] P. Dollár, Z. Tu, P. Perona and S. Belongie, "Integral Channel Features", BMVC, 2009.


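VeryFast [Benenson 2012]: instead of building a Feature pyramid (1 model applied to N scale images), it trains N/K models and applies them to 1 scale image, avoiding the cost of the Feature pyramid (cf. Fast Feature pyramid)

VeryFast: N/K models, 1 scale image / Feature pyramid: 1 model, N scale images

[Benenson 2012] R. Benenson, M. Mathias, R. Timofte and L. Van Gool, "Pedestrian detection at 100 frames per second", CVPR, 2012.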

Aggregate Channel Features [Dollár 2014]: ICF, VeryFast

[Dollár 2014] P. Dollár, R. Appel, S. Belongie and P. Perona, "Fast feature pyramids for object detection", PAMI, 2014.
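The Fast Feature pyramid idea behind [Dollár 2014] is to compute channels exactly only once per octave. A channel at a nearby scale is then approximated by resampling the exact one and correcting it with a power law, roughly C(s) ≈ resample(C(1), s) · s^(-λ). Below is a minimal NumPy sketch under several assumptions: gradient magnitude stands in for the channel, nearest-neighbour resampling replaces proper interpolation, and λ = 0.1 and all function names are illustrative placeholders, not values taken from the paper.

```python
import numpy as np

def resample_nn(a, s):
    """Nearest-neighbour resampling of a 2-D array by scale factor s."""
    h, w = a.shape
    nh, nw = max(1, int(round(h * s))), max(1, int(round(w * s)))
    ys = np.minimum((np.arange(nh) / s).astype(int), h - 1)
    xs = np.minimum((np.arange(nw) / s).astype(int), w - 1)
    return a[np.ix_(ys, xs)]

def grad_mag(img):
    """Example channel: gradient magnitude."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.hypot(gx, gy)

def fast_pyramid(img, channel_fn=grad_mag, lam=0.1, per_octave=4, octaves=2):
    """Channels at scales 2**(-i/per_octave): exact at octaves, power-law approximated elsewhere."""
    scales = [2.0 ** (-i / per_octave) for i in range(per_octave * octaves + 1)]
    exact = {}
    pyramid = []
    for i, s in enumerate(scales):
        ref = int(i / per_octave + 0.5) * per_octave    # index of the nearest octave level
        if ref not in exact:                              # expensive step: compute channel exactly
            exact[ref] = channel_fn(resample_nn(img, scales[ref]))
        r = s / scales[ref]                               # scale relative to that exact level
        pyramid.append(resample_nn(exact[ref], r) * r ** (-lam))   # cheap approximation
    return pyramid

img = np.random.default_rng(0).random((64, 64))
pyr = fast_pyramid(img)
print([p.shape for p in pyr])
```

With 4 scales per octave and 2 octaves, the sketch computes the channel exactly only 3 times for 9 pyramid levels. That saving is what makes this scheme attractive compared with a full Feature pyramid.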

Filtered Channel Features [Nam 2014] [Zhang 2015]: LDCF, Checkerboard

LDCF / Checkerboard

[Nam 2014] W. Nam, P. Dollár and J. H. Han, "Local Decorrelation For Improved Pedestrian Detection", NIPS, 2014.
[Zhang 2015] S. Zhang, R. Benenson and B. Schiele, "Filtered Channel Features for Pedestrian Detection", CVPR, 2015.

CNN / CNN

CNN / CNN

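Method              | Year | Miss rate (%) | Speed (fps) | Network
Joint Deep Learning | 2013 | 39.32         | --          | CNN + RBM
SDN                 | 2014 | 37.87         | 0.7         | CNN + RBM
EIN                 | 2015 | 37.77         | 1           | CNN
TACNN               | 2015 | 34.99         | --          | AlexNet
CCF                 | 2015 | 17.32         | --          | VGG
Deep Cascade        | 2015 | 26.21         | 15          | VGG
DeepParts           | 2015 | 11.89         | --          | GoogLeNet
CompACT             | 2015 | 11.75         | 2           | CNN, VGGNet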
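On the Caltech benchmark, miss rates like those in the table are most likely log-average miss rates. The metric samples the miss rate at nine FPPI (false positives per image) values evenly spaced in log space between 10^-2 and 10^0, then averages them geometrically. Below is a minimal sketch, assuming the detection curve is given as arrays sorted by increasing FPPI. One detail is our assumption and not necessarily what the official evaluation code does: when the curve never reaches a reference FPPI, the sketch falls back to a miss rate of 1.0.

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate, n_points=9):
    """Geometric mean of the miss rate sampled at n_points FPPI values in [1e-2, 1e0]."""
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)
    refs = np.logspace(-2.0, 0.0, n_points)       # 0.01 ... 1.0, evenly spaced in log space
    samples = np.ones(n_points)                    # assumption: 1.0 if no curve point reaches ref
    for k, r in enumerate(refs):
        idx = np.flatnonzero(fppi <= r)
        if idx.size:
            samples[k] = miss_rate[idx[-1]]         # miss rate at the largest FPPI <= ref
    samples = np.maximum(samples, 1e-10)          # avoid log(0)
    return float(np.exp(np.mean(np.log(samples))))

# Toy detection curve (hypothetical numbers): the miss rate falls as FPPI grows.
fppi = [0.005, 0.01, 0.1, 1.0]
mr = [0.60, 0.40, 0.20, 0.10]
print(round(100 * log_average_miss_rate(fppi, mr), 2))
```

The log-space averaging weights the low-FPPI regime as heavily as the high-FPPI one. This is why a lower number in the table reflects better behaviour across the whole curve.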


1. CNN: Joint Deep Learning, Switchable Deep Network, DeepParts


Joint Deep Learning [Ouyang 2013] - Level - RBM

Switchable Deep Network [Luo 2013] 3 - RBM

DeepParts [Tian 2015] - Caltech -

[Luo 2013] P. Luo, Y. Tian, X. Wang and X. Tang, "Switchable Deep Network for Pedestrian Detection", CVPR, 2014.
[Tian 2015] Y. Tian, P. Luo, X. Wang and X. Tang, "Deep Learning Strong Parts for Pedestrian Detection", ICCV, 2015.
[Ouyang 2013] W. Ouyang and X. Wang, "Joint deep learning for pedestrian detection", ICCV, 2013.

2. CNN features: Convolutional Channel Features [Yang 2015] feed CNN feature maps to a Boosted tree, in the manner of ACF

[Yang 2015] B. Yang, J. Yan, Z. Lei and S. Z. Li, "Convolutional Channel Features: Tailoring CNN to Diverse Tasks", ICCV, 2015.

3. Deep cascade, Complexity-Aware Cascade Training

CNN / CNN

[Cai 2015] Z. Cai, M. Saberian and N. Vasconcelos, "Learning Complexity-Aware Cascades for Deep Pedestrian Detection", ICCV, 2015.
[Angelova 2015] A. Angelova, A. Krizhevsky, V. Vanhoucke, A. Ogale and D. Ferguson, "Real-Time Pedestrian Detection With Deep Network Cascades", BMVC, 2015.

Deep Cascade [Angelova 2015]: VeryFast cascaded with CNNs (Deep Learning)

Cascade stages: VeryFast, then Tiny CNN, then Baseline CNN

[Angelova 2015] A. Angelova, A. Krizhevsky, V. Vanhoucke, A. Ogale and D. Ferguson, "Real-Time Pedestrian Detection With Deep Network Cascades", BMVC, 2015.
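A cascade is fast because cheap early stages reject most candidate windows, so the expensive CNN sees only a few. Below is a minimal, framework-free sketch of that early-rejection logic. The stage score functions and thresholds are hypothetical toy stand-ins, not the actual VeryFast, Tiny CNN, or Baseline CNN models.

```python
from typing import Any, Callable, List, Sequence, Tuple

Stage = Tuple[Callable[[Any], float], float]   # (score function, acceptance threshold)

def run_cascade(candidates: Sequence[Any], stages: List[Stage]) -> Tuple[List[Any], List[int]]:
    """Pass candidates through the stages in order, dropping any that score below a stage's threshold.

    Returns the surviving candidates and how many candidates each stage had to evaluate.
    """
    survivors = list(candidates)
    evaluated = []
    for score_fn, threshold in stages:
        evaluated.append(len(survivors))          # cost of this stage ~ number of inputs it sees
        survivors = [c for c in survivors if score_fn(c) >= threshold]
        if not survivors:
            break
    return survivors, evaluated

# Toy stand-ins for VeryFast -> Tiny CNN -> Baseline CNN (hypothetical scores on integer "windows").
stages = [
    (lambda c: c / 10, 0.3),        # cheap first stage, loose threshold
    (lambda c: float(c % 2), 1.0),  # mid-cost stage
    (lambda c: c / 10, 0.8),        # expensive final stage, strict threshold
]
survivors, evaluated = run_cascade(range(10), stages)
print(survivors, evaluated)   # [9] [10, 7, 4]
```

The `evaluated` counts show the intended effect. The most expensive stage runs on 4 of the 10 inputs instead of all of them.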

2

Region proposal

Region proposal1

Fast R-CNN [Girshick 2015], Faster R-CNN [Ren 2015], You Only Look Once [Redmon 2016], Single Shot Multi-box Detector [Liu 2016]

[Redmon 2016] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You only look once: Unified, real-time object detection", CVPR, 2016.
[Girshick 2015] R. Girshick, "Fast R-CNN", ICCV, 2015.
[Ren 2015] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", NIPS, 2015.
[Liu 2016] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu and A. C. Berg, "SSD: Single Shot MultiBox Detector", ECCV, 2016.

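Region proposal

Method        | Year | Miss rate (%) | Speed (fps) | Network
Fast R-CNN    | 2015 | 12.86         | 3           | Fast R-CNN
SA-Fast R-CNN | 2015 | 9.68          | 2.5         | Fast R-CNN
Faster R-CNN  | 2015 | 18.02         | 2           | RPN
MS-CNN        | 2016 | 10            | 2.5         | RPN
RPN+BF        | 2016 | 9.6           | 2           | RPN
SSD           | 2016 | 13.06         | 10          | SSD
Fused DNN     | 2016 | 8.2           | 0.5         | SSD + FCN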

R-CNN: Selective search + CNN. Regions proposed by Selective search are each fed to a CNN, and the CNN features are classified by an SVM.

Selective search + CNN

[Girshick 2014] R. Girshick, J. Donahue, T. Darrell and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation", CVPR, 2014.

Selective search

[Jasper 2013] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers and A. W. M. Smeulders, "Selective Search for Object Recognition", International Journal of Computer Vision, 2013.

R-CNN (1 / 2)

Selective search + CNN

CNN - about 2000 CNN passes for 1 image - CNN - Selective search, etc.

R-CNN (2 / 2)

Selective search + CNN

Selective search - CNN

R-CNN

Selective search + CNN

Faster R-CNN

Fast R-CNN

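Fast R-CNN [Girshick 2015] & Faster R-CNN [Ren 2015]: Fast R-CNN runs the CNN once (1 pass) per image instead of once per region; Faster R-CNN also generates the region proposals inside 1 CNN with a Region Proposal Network (RPN).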

Fast R-CNN / Faster R-CNN

[Girshick 2015] R. Girshick, "Fast R-CNN", ICCV, 2015.
[Ren 2015] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", NIPS, 2015.
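RoI pooling is what lets Fast R-CNN run the CNN only once per image. Each proposal is projected onto the shared feature map and max-pooled into a fixed-size grid. Below is a minimal NumPy sketch. The 7×7 output and the 1/16 feature stride are assumptions typical of a VGG-16 backbone, and the function name is hypothetical.

```python
import numpy as np

def roi_max_pool(feat, roi, out_h=7, out_w=7, spatial_scale=1 / 16):
    """Max-pool one region of interest into an out_h x out_w grid.

    feat: (C, H, W) feature map computed once for the whole image.
    roi:  (x1, y1, x2, y2) box in input-image pixels (inclusive corners).
    """
    C, H, W = feat.shape
    # Project the box onto the feature map and clip it to the map.
    x1, y1, x2, y2 = (int(round(v * spatial_scale)) for v in roi)
    x1, y1 = min(max(x1, 0), W - 1), min(max(y1, 0), H - 1)
    x2, y2 = min(max(x2, x1), W - 1), min(max(y2, y1), H - 1)
    rh, rw = y2 - y1 + 1, x2 - x1 + 1
    out = np.empty((C, out_h, out_w), dtype=feat.dtype)
    for i in range(out_h):
        # Bin i covers rows [floor(i*rh/out_h), ceil((i+1)*rh/out_h)) of the region, never empty.
        ys, ye = y1 + (i * rh) // out_h, y1 - ((-(i + 1) * rh) // out_h)
        for j in range(out_w):
            xs, xe = x1 + (j * rw) // out_w, x1 - ((-(j + 1) * rw) // out_w)
            out[:, i, j] = feat[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out

feat = np.random.default_rng(0).random((256, 38, 50))    # e.g. conv features of a ~600x800 image
proposals = [(100, 120, 180, 330), (400, 50, 470, 260)]   # hypothetical pedestrian boxes
pooled = np.stack([roi_max_pool(feat, r) for r in proposals])
print(pooled.shape)   # (2, 256, 7, 7)
```

Every proposal reuses the same `feat` array, and that shared computation is the saving over R-CNN's per-region CNN passes. Library implementations such as torchvision's `roi_pool` do the same on batches.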


Fast R-CNN: Scale Aware Fast R-CNN

[Li 2016] J. Li, X. Liang, S. Shen, T. Xu and S. Yan, "Scale-aware Fast R-CNN for Pedestrian Detection", ECCV, 2015.
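Scale-aware Fast R-CNN combines a large-scale and a small-scale sub-network and weights their outputs by a gate that depends on the proposal height. Below is a minimal sketch. It assumes a sigmoid gate of the form w_l = 1 / (1 + α·exp(-(h - h̄)/β)) with w_s = 1 - w_l, where h̄ is the mean pedestrian height. This is our reading of [Li 2016], not a verified reproduction. The values of α, β and h̄ below are hypothetical placeholders.

```python
import math

def scale_aware_weights(h, h_mean, alpha=1.0, beta=10.0):
    """Gate weights (w_large, w_small) for a proposal of height h pixels."""
    z = max(min(-(h - h_mean) / beta, 700.0), -700.0)   # clamp so exp() stays finite
    w_large = 1.0 / (1.0 + alpha * math.exp(z))
    return w_large, 1.0 - w_large

def fuse_scores(score_large, score_small, h, h_mean, alpha=1.0, beta=10.0):
    """Blend the two sub-networks' pedestrian scores according to the proposal height."""
    w_large, w_small = scale_aware_weights(h, h_mean, alpha, beta)
    return w_large * score_large + w_small * score_small

# Hypothetical numbers: mean height 100 px; a 40 px proposal relies mostly on the small-scale branch.
print(scale_aware_weights(40, 100))
print(fuse_scores(0.2, 0.9, 40, 100))
```

A soft gate lets the two branches specialize on small and large pedestrians without a hard height threshold between them.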

Faster R-CNN (RPN): Multi Scale CNN [Cai 2016]

[Cai 2016] Z. Cai, Q. Fan, R. S. Feris and N. Vasconcelos, "A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection", ECCV, 2016.

Faster R-CNN (RPN): RPN + Boosted Forest [Zhang 2016]. The proposals and convolutional features from the RPN are classified by a Boosted Forest.

[Zhang 2016] L. Zhang, L. Lin, X. Liang and K. He, "Is Faster R-CNN Doing Well for Pedestrian Detection?", arXiv:1607.07032, 2016.
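RPN+BF [Zhang 2016] adapts the RPN's anchors to the tall, narrow shape of pedestrians. Below is a minimal NumPy sketch of generating and tiling such anchors. It assumes a single width:height ratio of 0.41 and nine anchor heights starting at 40 px with a 1.3× step. These are values we recall from the paper; treat them as assumptions. It also assumes a feature stride of 16 px, and the function names are hypothetical.

```python
import numpy as np

def pedestrian_anchors(heights, aspect=0.41):
    """Base anchors centred at the origin as (x1, y1, x2, y2), with width = aspect * height."""
    h = np.asarray(heights, dtype=float)
    w = aspect * h
    return np.stack([-w / 2, -h / 2, w / 2, h / 2], axis=1)

def tile_anchors(base, feat_h, feat_w, stride=16):
    """Place every base anchor at the image-space centre of each feature-map cell."""
    cy = (np.arange(feat_h) + 0.5) * stride
    cx = (np.arange(feat_w) + 0.5) * stride
    yy, xx = np.meshgrid(cy, cx, indexing="ij")
    shifts = np.stack([xx, yy, xx, yy], axis=-1).reshape(-1, 1, 4)   # (H*W, 1, 4)
    return (shifts + base[None, :, :]).reshape(-1, 4)                # (H*W*A, 4)

heights = 40.0 * 1.3 ** np.arange(9)          # 40, 52, 67.6, ... px
base = pedestrian_anchors(heights)
anchors = tile_anchors(base, feat_h=30, feat_w=40)
print(base.shape, anchors.shape)   # (9, 4) (10800, 4)
```

Generic object detectors usually use several aspect ratios per anchor position. Fixing one pedestrian-shaped ratio spends all anchors on scale coverage instead.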


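Single shot: detection is performed by 1 CNN in 1 pass, whereas Faster R-CNN uses 1 CNN in two stages (RPN proposals, then classification)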

You Only Look Once, Single Shot Multi-box Detector

Faster R-CNN vs. YOLO vs. SSD

[Redmon 2016] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You only look once: Unified, real-time object detection", CVPR, 2016.
[Liu 2016] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu and A. C. Berg, "SSD: Single Shot MultiBox Detector", ECCV, 2016.

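You Only Look Once: the image is divided into a grid, and each cell predicts (bounding box + confidence) x 2 plus class probabilities; compared against Faster R-CNN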

[Redmon 2016] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You only look once: Unified, real-time object detection", CVPR, 2016.
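In YOLO the network output is a single S×S×(B·5 + C) tensor. Each grid cell holds B boxes of (x, y, w, h, confidence) plus C class probabilities; with S = 7, B = 2 and C = 20 (PASCAL VOC) the tensor is 7×7×30. Below is a minimal decoding sketch. It assumes a per-cell layout of the B boxes followed by the class probabilities, x and y as offsets within the cell, and w and h already relative to the image. The original network's memory layout and its square-root parameterization of w and h are not reproduced. The function name and the threshold are hypothetical.

```python
import numpy as np

def decode_yolo(out, S=7, B=2, C=20, thresh=0.2):
    """Turn an (S, S, B*5 + C) output tensor into (cx, cy, w, h, class_id, score) tuples.

    Assumed layout per cell: B x (x, y, w, h, confidence), then C class probabilities.
    x, y are offsets inside the cell; w, h are fractions of the image size.
    """
    assert out.shape == (S, S, B * 5 + C)
    detections = []
    for row in range(S):
        for col in range(S):
            cell = out[row, col]
            class_prob = cell[B * 5:]                  # Pr(class | object), shared by the B boxes
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                scores = conf * class_prob             # class-specific confidence
                k = int(np.argmax(scores))
                if scores[k] >= thresh:
                    cx, cy = (col + x) / S, (row + y) / S   # box centre relative to the image
                    detections.append((float(cx), float(cy), float(w), float(h), k, float(scores[k])))
    return detections

# Hypothetical output: one confident box in cell (row 3, col 4) for class 14.
out = np.zeros((7, 7, 30))
out[3, 4, 0:5] = [0.5, 0.5, 0.2, 0.4, 0.9]
out[3, 4, 10 + 14] = 1.0
print(decode_yolo(out))
```

The whole image is covered by a single forward pass and this cheap decoding step, with no separate proposal stage. In practice, non-maximum suppression would follow to merge duplicate boxes.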


Single Shot Multibox Detector113%

[Liu 2016] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu and A. C. Berg, "SSD: Single Shot MultiBox Detector", ECCV, 2016.

Fused DNN: SSD generates the pedestrian candidates, and their SSD scores are refined by further networks through Soft-rejection based Network Fusion; evaluated on the Caltech Pedestrian Dataset

[Du 2016] X. Du, M. El-Khamy, J. Lee and L. S. Davis, "Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection", arXiv:1610.03466, 2016.
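Soft rejection replaces each additional classifier's hard accept/reject decision with a rescaling of the SSD candidate's score. Below is a minimal sketch. It assumes each classifier's probability p becomes a factor max(p / a_c, b_c) and that these factors multiply the SSD score. This form and the constants a_c = 0.7 and b_c = 0.1 are assumptions for illustration, and the function name is hypothetical.

```python
from typing import Iterable

def soft_rejection_score(ssd_score: float, classifier_probs: Iterable[float],
                         a_c: float = 0.7, b_c: float = 0.1) -> float:
    """Rescale an SSD candidate score by every classifier instead of hard-rejecting it.

    A confident classifier (p > a_c) boosts the score and an unconfident one shrinks it,
    but never below a factor of b_c, so no single classifier can veto a candidate outright.
    """
    score = ssd_score
    for p in classifier_probs:
        score *= max(p / a_c, b_c)
    return score

print(soft_rejection_score(0.8, [0.9, 0.95]))   # both classifiers agree: score increases
print(soft_rejection_score(0.8, [0.0, 0.9]))    # one strongly disagrees: score damped, not zeroed
```

Keeping a floor of b_c means a candidate wrongly doubted by one network can still survive if the others are confident. This is the robustness argument for fusing softly rather than cascading hard rejections.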


2016

2: Faster R-CNN, SSD

2, 2, Region proposal, 1

Deep Learning: RGB

Toronto City Dataset: RGB images, LIDAR and GPS; 712 km² of land, 8,439 km of road, about 400,000 buildings

Region Proposal Network and SSD: in 2016, RPN- and SSD-based methods (as of 2016.12.12)

CG


Fused DNN [Du 2016]

[Du 2016] X. Du, M. El-Khamy, J. Lee and L. S. Davis, "Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection", arXiv:1610.03466, 2016.

[Dalal 2005] N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection", CVPR, 2005.
[Andreas 2012] A. Geiger, P. Lenz and R. Urtasun, "Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite", CVPR, 2012.
[Dollár 2009] P. Dollár, C. Wojek, B. Schiele and P. Perona, "Pedestrian Detection: A Benchmark", CVPR, 2009.
[Krizhevsky 2012] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", NIPS, 2012.
[Benenson 2013] R. Benenson, M. Mathias, T. Tuytelaars and L. Van Gool, "Seeking the strongest rigid detector", CVPR, 2013.
[Dollár 2009] P. Dollár, Z. Tu, P. Perona and S. Belongie, "Integral Channel Features", BMVC, 2009.
[Benenson 2012] R. Benenson, M. Mathias, R. Timofte and L. Van Gool, "Pedestrian detection at 100 frames per second", CVPR, 2012.
[Nam 2014] W. Nam, P. Dollár and J. H. Han, "Local Decorrelation For Improved Pedestrian Detection", NIPS, 2014.
[Zhang 2015] S. Zhang, R. Benenson and B. Schiele, "Filtered Channel Features for Pedestrian Detection", CVPR, 2015.
[Dollár 2014] P. Dollár, R. Appel, S. Belongie and P. Perona, "Fast feature pyramids for object detection", PAMI, 2014.
[Luo 2013] P. Luo, Y. Tian, X. Wang and X. Tang, "Switchable Deep Network for Pedestrian Detection", CVPR, 2014.
[Tian 2015] Y. Tian, P. Luo, X. Wang and X. Tang, "Deep Learning Strong Parts for Pedestrian Detection", ICCV, 2015.
[Ouyang 2013] W. Ouyang and X. Wang, "Joint deep learning for pedestrian detection", ICCV, 2013.
[Yang 2015] B. Yang, J. Yan, Z. Lei and S. Z. Li, "Convolutional Channel Features: Tailoring CNN to Diverse Tasks", ICCV, 2015.
[Cai 2015] Z. Cai, M. Saberian and N. Vasconcelos, "Learning Complexity-Aware Cascades for Deep Pedestrian Detection", ICCV, 2015.
[Angelova 2015] A. Angelova, A. Krizhevsky, V. Vanhoucke, A. Ogale and D. Ferguson, "Real-Time Pedestrian Detection With Deep Network Cascades", BMVC, 2015.
[Girshick 2014] R. Girshick, J. Donahue, T. Darrell and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation", CVPR, 2014.
[Girshick 2015] R. Girshick, "Fast R-CNN", ICCV, 2015.
[Ren 2015] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", NIPS, 2015.
[Zhang 2016] L. Zhang, L. Lin, X. Liang and K. He, "Is Faster R-CNN Doing Well for Pedestrian Detection?", arXiv:1607.07032, 2016.
[Redmon 2016] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You only look once: Unified, real-time object detection", CVPR, 2016.
[Liu 2016] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu and A. C. Berg, "SSD: Single Shot MultiBox Detector", ECCV, 2016.
[Du 2016] X. Du, M. El-Khamy, J. Lee and L. S. Davis, "Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection", arXiv:1610.03466, 2016.
