nº - ECCV 2016 · 419.5 541.6 ZHANG ET AL. 377.6 509.1 CCNN 488.67 646.68 HYDRA 2S 333.73 425.26 HYDRA 3S 465.73 371.84 x The Hydra model uses a pyramid of input patches cropped

Towards perspective-free object counting with deep learningDaniel Oñoro-Rubio and Roberto J. López-SastreGRAM, University of Alcalá

This work is supported by the projects SPIP2014-1468, SPIP2015-01809, and MINECO TEC2013-45183-R.

Our models

Experiments

Conv1 Conv2 Conv3 Conv4 Conv5 Conv6

72x72x32 31x31x32 18x18x32 18x18x1000 18x18x400 18x18x1

7x7

3x37x7

1x1 1x1 1x1

18x18

72x72

ConvC 1 ConvC 22 Conv3 ConvC 4 Conv5 ConvC 66

72x72x32 31x31x32 18x18x32 18x18x1000 18x18x400 18x18x1

7x7

33xx337x7

1x1 1x1 1x1

Input ImagePatch

Density Prediction

S0 (72X72)

S1 (72X72)

Fc6

Fc7

Fc8

512 512 324

18X18

Sn (72X72)Sn

CCNN_Sn

CCNN_S0

CCNN_Sn

CCNN_S1

CCNN_S0

CCNN_Sn

Our Counting CNN is a fully convolutional model that maps an input image patch into its estimated density map.

Counting CNN Hydra CNN

TRANCOS

UCSD

UCF

METHOD GAME 0 GAME 1 GAME 2 GAME FIASCHI ET AL. 17.77 20.14 23.65 25.99LEMPITSKY ET AL. 13.76 16.72 20.72 24.36CCNN 12.49 16.58 20.02 22.41HYDRA 2S 11.41 16.36 20.89 23.67HYDRA 3S 10.99 13.75 16.69 19.32HYDRA 4S 12.92 15.54 18.45 20.96

PRED

ICTI

ONGR

OUN

D TR

UTH

73.8

75.2

30.0

31.6

28.2

27.4

20.3

70.1

29.3

92.5

Qualitative ResultsTable of Results

Database characteristics:o Target object: Carso Different sceneso No perspective mapo 1244 images

Data base characteristics:o Target object: Peopleo Single sceneo With perspective mapo 2000 images o Video sequence

Table of Results

Data base characteristics:o Target object: Peopleo Multiple sceneso No perspective mapo 50 imageso Count between 94-4543o Average of 1280 per image

PRED

ICTI

ONGR

OUN

D TR

UTH

1331.6

441.9

1334.1

2106.3

381.2

163.8

1196.5

1116.0

3008.9

3407.0

Qualitative ResultsTable of ResultsMETHOD MAE MSD

RODRIGUEZ ET AL. 655.7 697.8 LEMPITSKY ET AL. 493.4 487.1 ZHANG ET AL. 467.0 498.5 IDREES ET AL. 419.5 541.6 ZHANG ET AL. 377.6 509.1 CCNN 488.67 646.68 HYDRA 2S 333.73 425.26 HYDRA 3S 465.73 371.84

The Hydra model uses a pyramid of input patches cropped from the center of the target patch to provide multiscale information to the network.

The counting by regression model with deep learningNotation GT density

CNN REGRESSOR

PATCH EXTRACTION FORWARD PASS DENSITY MAP ASSEMBLY

Understanding the counting problem

Given an image and a target object, we want to guess the total number of objects.

TRANCOS

UCF

UCSD Perspective map

Perspective free

nº

nº

Single loss function

Find it!

Qualitative Results

METHOD MAXIMAL DOWNSCALE UPSCALE MINIMAL LEMPITSKY ET AL. 1.70 1.28 1.59 2.02 FIASCHI ET AL. 1.70 2.16 1.61 2.20 PHAM ET AL. 1.43 1.30 1.59 1.62 ARTETA ET AL. 1.24 1.31 1.69 1.49 ZHANG ET AL. 1.70 1.26 1.59 1.52 CCNN 1.65 1.79 1.11 1.50 HYDRA 2S 2.22 1.93 1.37 2.38 HYDRA 3S 2.17 2.99 1.44 1.92

Download us!

PRED

ICTI

ON

GRO

UND

TRUT

H

47.0

44.6

20.6

21.5

Sequence Prediction

Documents

nº - ECCV 2016 · 419.5 541.6 ZHANG ET AL. 377.6 509.1 CCNN 488.67 646.68 HYDRA 2S 333.73 425.26 HYDRA 3S 465.73 371.84 x The Hydra model uses a pyramid of input patches cropped