Upload
phungkhue
View
215
Download
0
Embed Size (px)
Citation preview
Towards perspective-free object counting with deep learningDaniel Oñoro-Rubio and Roberto J. López-SastreGRAM, University of Alcalá
This work is supported by the projects SPIP2014-1468, SPIP2015-01809, and MINECO TEC2013-45183-R.
Our models
Experiments
Conv1 Conv2 Conv3 Conv4 Conv5 Conv6
72x72x32 31x31x32 18x18x32 18x18x1000 18x18x400 18x18x1
7x7
3x37x7
1x1 1x1 1x1
18x18
72x72
ConvC 1 ConvC 22 Conv3 ConvC 4 Conv5 ConvC 66
72x72x32 31x31x32 18x18x32 18x18x1000 18x18x400 18x18x1
7x7
33xx337x7
1x1 1x1 1x1
Input ImagePatch
Density Prediction
S0 (72X72)
S1 (72X72)
Fc6
Fc7
Fc8
512 512 324
18X18
Sn (72X72)Sn
CCNN_Sn
CCNN_S0
CCNN_Sn
CCNN_S1
CCNN_S0
CCNN_Sn
Our Counting CNN is a fully convolutional model that maps an input image patch into its estimated density map.
Counting CNN Hydra CNN
TRANCOS
UCSD
UCF
METHOD GAME 0 GAME 1 GAME 2 GAME FIASCHI ET AL. 17.77 20.14 23.65 25.99LEMPITSKY ET AL. 13.76 16.72 20.72 24.36CCNN 12.49 16.58 20.02 22.41HYDRA 2S 11.41 16.36 20.89 23.67HYDRA 3S 10.99 13.75 16.69 19.32HYDRA 4S 12.92 15.54 18.45 20.96
PRED
ICTI
ONGR
OUN
D TR
UTH
73.8
75.2
30.0
31.6
28.2
27.4
20.3
70.1
29.3
92.5
Qualitative ResultsTable of Results
Database characteristics:o Target object: Carso Different sceneso No perspective mapo 1244 images
Data base characteristics:o Target object: Peopleo Single sceneo With perspective mapo 2000 images o Video sequence
Table of Results
Data base characteristics:o Target object: Peopleo Multiple sceneso No perspective mapo 50 imageso Count between 94-4543o Average of 1280 per image
PRED
ICTI
ONGR
OUN
D TR
UTH
1331.6
441.9
1334.1
2106.3
381.2
163.8
1196.5
1116.0
3008.9
3407.0
Qualitative ResultsTable of ResultsMETHOD MAE MSD
RODRIGUEZ ET AL. 655.7 697.8 LEMPITSKY ET AL. 493.4 487.1 ZHANG ET AL. 467.0 498.5 IDREES ET AL. 419.5 541.6 ZHANG ET AL. 377.6 509.1 CCNN 488.67 646.68 HYDRA 2S 333.73 425.26 HYDRA 3S 465.73 371.84
The Hydra model uses a pyramid of input patches cropped from the center of the target patch to provide multiscale information to the network.
The counting by regression model with deep learningNotation GT density
CNN REGRESSOR
PATCH EXTRACTION FORWARD PASS DENSITY MAP ASSEMBLY
Understanding the counting problem
Given an image and a target object, we want to guess the total number of objects.
TRANCOS
UCF
UCSD Perspective map
Perspective free
nº
nº
Single loss function
Find it!
Qualitative Results
METHOD MAXIMAL DOWNSCALE UPSCALE MINIMAL LEMPITSKY ET AL. 1.70 1.28 1.59 2.02 FIASCHI ET AL. 1.70 2.16 1.61 2.20 PHAM ET AL. 1.43 1.30 1.59 1.62 ARTETA ET AL. 1.24 1.31 1.69 1.49 ZHANG ET AL. 1.70 1.26 1.59 1.52 CCNN 1.65 1.79 1.11 1.50 HYDRA 2S 2.22 1.93 1.37 2.38 HYDRA 3S 2.17 2.99 1.44 1.92
Download us!
PRED
ICTI
ON
GRO
UND
TRUT
H
47.0
44.6
20.6
21.5
Sequence Prediction