
SPECIAL ISSUE PAPER

Real-time vehicle type classification with deep convolutional neural networks

Xinchen Wang1 • Weiwei Zhang1 • Xuncheng Wu1 • Lingyun Xiao2 •

Yubin Qian1 • Zhi Fang1

Received: 6 January 2017 / Accepted: 6 August 2017 / Published online: 22 August 2017

© Springer-Verlag GmbH Germany 2017

Abstract Vehicle type classification plays an important role in intelligent transport systems. With the development of image processing, pattern recognition and deep learning, vehicle type classification based on deep learning has attracted increasing attention. In the last few years, convolutional neural networks, especially Faster Region-based convolutional neural networks (Faster R-CNN), have shown great advantages in image classification and object detection, outperforming traditional machine learning methods by a large margin. In this paper, a vehicle type classification system based on deep learning is proposed. The system uses Faster R-CNN to solve the task. Experimental results show that the method is not only fast but also more robust and more accurate. For cars and trucks, it reached 90.65% and 90.51% accuracy, respectively. Finally, we test the system on an NVIDIA Jetson TK1 board with 192 CUDA cores, which is envisioned as a forerunner computational brain for computer vision, robotics and self-driving cars. Experimental results show that the network embedded on the NVIDIA Jetson TK1 takes around 0.354 s to process an image while maintaining high accuracy.

Keywords Convolutional neural network · Vehicle type classification · Deep learning · Intelligent transportation system · Object detection

1 Introduction

The intelligent transportation system can not only use existing transportation facilities effectively, but also reduce environmental pollution, improve traffic safety and raise transport efficiency. It comprises three parts: intelligent vehicles, intelligent highway systems and intelligent drivers. Research on vehicle type classification is therefore of significant value to the development of the intelligent transportation system and of intelligent automobiles.

Vehicle type classification has important everyday applications, such as intelligent monitoring systems, automatic toll charging on highways and detection of illegal preemption of the right of way. In earlier times, sensors laid under the road were the main means of vehicle type classification: data collected from the sensors were analyzed to obtain information about passing vehicles. With the development of computer vision, detecting vehicle types by image processing and pattern recognition has become widely used [1–7]. A vehicle classification system based on machine vision can be embedded in existing traffic cameras. It has many advantages, such as convenient installation, easy maintenance and a small footprint, and the data obtained from the system can be reused for other research and processing purposes. With the rapid advancement of the graphics processing unit (GPU), image processing capacity has been greatly enhanced, which in turn has driven the fast advancement of deep learning. Compared with traditional feature extraction algorithms, deep learning has better adaptability and more general applicability. In recent years, deep learning has been successfully applied to segmentation, detection and recognition in images and videos, such as face recognition and pedestrian

Corresponding author: Weiwei Zhang

[email protected]

1 College of Automotive Engineering, Shanghai University of

Engineering Science, Shanghai, China

2 China National Institution of Standardization, Beijing, China


J Real-Time Image Proc (2019) 16:5–14

https://doi.org/10.1007/s11554-017-0712-5


detection [8, 9]. Over the last decades, vehicle type classification based on machine vision has mainly relied on traditional image processing methods for vehicle location and recognition, such as Histograms of Oriented Gradients combined with a Support Vector Machine (HOG + SVM). Recently, vehicle type classification based on deep learning has become increasingly popular among researchers. If deep learning can be successfully applied to vehicle location and recognition in natural scenes, it will be of great value for building intelligent traffic systems and driverless systems.

This paper uses the deep learning library Caffe together with GPU acceleration, and adopts the Faster R-CNN object detection framework. The backbone network, the ZF net, is a deep convolutional neural network with five shareable convolutional layers.

The rest of this paper is organized as follows. Section 2 introduces related work from two aspects: the background of vehicle type classification and the development of convolutional neural networks. Section 3 proposes the revised algorithm framework for vehicle type classification and discusses the structure of vehicle type classification based on Faster R-CNN and its application. Section 4 evaluates and analyzes the proposed method experimentally. Section 5 concludes the paper.

2 Related work

2.1 Background on vehicle type classification

During the last decade, various vehicle type classification methods have been proposed and successfully applied in transportation and military fields. Sarfraz et al. proposed a local shape-histogram feature computed on the vehicle's frontal area; the feature was then classified with a Bayesian prior model [10]. Ramnath et al. extracted 3D space curves of automobiles and classified the automobiles by appearance [11]. The method can classify vehicles from images taken from any angle, but it requires a huge amount of computation. Alonso et al. adopted multidimensional classification to detect vehicles on the road [12]. Chang and Cho proposed a method based on online boosting that addresses the difficult problem of vehicle detection across different scenes [13]. Zhang presented a vehicle detection method based on deep convolutional neural networks for fine-grained recognition of vehicles in natural scenes [14, 15].

2.2 Background on convolutional neural networks

A convolutional neural network (CNN) is a feed-forward neural network inspired by the cognitive mechanism of biological vision. In 1959, Hubel and Wiesel studied the neurons responsible for local sensitivity and direction selection in the cat cortex, and found that their particular network structure can effectively reduce the complexity of feedback neural networks. Fukushima proposed the Neocognitron, the predecessor of the CNN, in the 1980s [16]. In the 1990s, LeCun et al. [17] established the modern structure of the CNN. They designed a multilayer artificial neural network named LeNet-5, which classified handwritten digits and was applied to reading the numbers on checks in America. With the development of big data and GPU acceleration, Krizhevsky et al. [18] proposed the classic CNN architecture AlexNet and won the ILSVRC 2012 competition.

In recent years, owing to the success of region proposal techniques, object detection has developed rapidly, and detection systems such as R-CNN, SPP-net [19] and Fast R-CNN have emerged. However, the computing time of region proposal has limited the development of these systems. In 2015, Ren et al. proposed Faster R-CNN, a new object detection framework based on Fast R-CNN that generates region proposals with a region proposal network (RPN). By sharing convolutional layer parameters with the detection network, the RPN takes only about 10 ms per picture. Faster R-CNN can be viewed simply as a detection system that combines the RPN with the Fast R-CNN framework [20], using the RPN in place of selective search (SS). With a simple network (the ZF net [21]), detection runs at 17 FPS with an accuracy of 59.95% on the PASCAL VOC benchmark, while with a more complex network (VGG16) it runs at 5 FPS with an accuracy of 78.8% on the PASCAL VOC benchmark [22].

3 Network architecture

The camera installation angle differs from one traffic intersection to another, so the photos taken also differ in viewing angle. To address this problem, we adopt a data augmentation method that synthetically creates new training examples by applying transformations to the input data. Our method combines picture flipping with picture cropping. As shown in Fig. 1, one picture is extended to 10 pictures by the flip and crop operations.
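The flip-and-crop augmentation can be illustrated with a minimal sketch. The text only states that one picture becomes 10 through flipping and cropping; the particular choice below (four corner crops plus one centre crop, each with a mirrored copy) is an assumption borrowed from common practice, and the names `crop`, `hflip` and `ten_views` are hypothetical. Images are plain nested lists so that the example needs no external library.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values; a stand-in for a real image array


def crop(img: Image, top: int, left: int, h: int, w: int) -> Image:
    """Return the h x w window whose top-left corner is (top, left)."""
    return [row[left:left + w] for row in img[top:top + h]]


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def ten_views(img: Image, h: int, w: int) -> List[Image]:
    """Assumed scheme: 4 corner crops + 1 centre crop, then a mirrored copy of each."""
    height, width = len(img), len(img[0])
    offsets = [
        (0, 0),                                  # top-left
        (0, width - w),                          # top-right
        (height - h, 0),                         # bottom-left
        (height - h, width - w),                 # bottom-right
        ((height - h) // 2, (width - w) // 2),   # centre
    ]
    crops = [crop(img, top, left, h, w) for top, left in offsets]
    return crops + [hflip(c) for c in crops]


if __name__ == "__main__":
    picture = [[10 * r + c for c in range(6)] for r in range(4)]  # toy 4 x 6 image
    views = ten_views(picture, h=3, w=4)
    print(len(views))  # 10
```

For a detection dataset, the ground-truth bounding boxes would have to be cropped and mirrored with the same offsets; that bookkeeping is omitted here.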


The Faster R-CNN adopted in this paper is an advanced object detection method. The structure of the vehicle type classification algorithm is shown in Fig. 2. It consists of two components: the region proposal network (RPN) and the detection network, which share five convolutional layers.

3.1 Region proposal network for vehicle location

This paper adopts the region proposal network (RPN) as its region proposal algorithm. The RPN shares its convolutional layer parameters with the object detection network, which reduces the computation time of region proposal. Faster R-CNN is developed from the Fast R-CNN object detection system; it replaces selective search (SS) with the region proposal network.

Selective search (SS) is a typical region proposal technique. It takes 2 s on average to process an image on a CPU. The EdgeBoxes method takes only 0.2 s on average [22]. Although this is much faster, it still consumes considerable computing time. Unlike methods based on image pyramids or filter pyramids, the RPN represents region proposals of multiple scales and aspect ratios with anchor boxes.

As shown in Fig. 2, the proposed method builds the RPN on top of the last shareable convolutional layer (layer 5). A small network slides over the feature map generated by that layer, and the feature of each sliding window is mapped to 256 dimensions (for the ZF net). After a ReLU nonlinearity, the feature is fed to two sibling fully connected layers: a bounding box-regression layer (reg layer) and a box-classification layer (cls layer). The reg layer predicts the 4k coordinates of k proposals. The cls layer outputs 2k scores, which are the probabilities that the k proposals do or do not contain an object. The k proposals are parameterized relative to k anchor boxes. For the practical problem of vehicle type classification, the proposed method uses a 3 × 3 convolutional layer followed by two 1 × 1 convolutional layers (corresponding to the reg and cls layers, respectively). The RPN thus not only predicts the position of the vehicle but also outputs the two-category scores of each proposal.

Based on the actual aspect ratios of vehicle frontal-view images, each sliding window in this paper uses three scales and two aspect ratios. The anchor aspect ratios are 0.9 and 0.6, and the three scales correspond to box areas of 100², 160² and 410² pixels, so each sliding window has k = 6 anchor boxes. The detailed anchor sizes are shown in Table 1.
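The anchor configuration can be made concrete with a short sketch that derives a width and height for each of the six anchors and the resulting sizes of the reg and cls outputs. The scales and ratios come from the text; the convention that the aspect ratio means width divided by height is an assumption, since the paper does not state it, and the function name `anchor_shapes` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

SCALES = (100, 160, 410)  # square roots of the anchor areas given in the text
RATIOS = (0.9, 0.6)       # aspect ratios given in the text


def anchor_shapes(scales: Sequence[float] = SCALES,
                  ratios: Sequence[float] = RATIOS) -> List[Tuple[float, float]]:
    """Return (width, height) per anchor so that width * height == scale ** 2.

    Assumption: ratio = width / height. The source does not say which way
    round the ratio is defined.
    """
    shapes = []
    for scale in scales:
        for ratio in ratios:
            width = scale * math.sqrt(ratio)
            height = scale / math.sqrt(ratio)
            shapes.append((width, height))
    return shapes


if __name__ == "__main__":
    shapes = anchor_shapes()
    k = len(shapes)
    print(f"k = {k}, reg outputs = {4 * k}, cls outputs = {2 * k}")
    for width, height in shapes:
        print(f"{width:7.1f} x {height:7.1f}")
```

With k = 6, the reg layer produces 24 values and the cls layer 12 values per sliding-window position, which follows directly from the 4k and 2k counts in the text.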

Fig. 1 Method of data augmentation

Fig. 2 Vehicle type classification algorithm structure


3.2 Convolutional neural network for vehicle type classification

In the proposed method, the detection network is implemented with the Fast R-CNN detection method. For the convolutional layers shared by the RPN and the detection network, an improved ZF net, pre-trained on PASCAL VOC2012, is used as the backbone network. We improve the ZF net by adding two new convolutional layers and a new max pooling layer to the original network, giving a total of 7 shareable convolutional layers. Increasing the depth in this way improves the representational ability of the network. The detailed network structure is shown in Fig. 3.

As shown in Fig. 3, the backbone network uses 96 convolution kernels of size 5 × 5 in the first layer and 256 convolution kernels of size 5 × 5 in the second layer. In these two layers the convolutional stride is 2, so that more information is retained in the first and second convolutional layers. The third, fourth and seventh layers use 384 convolution kernels of size 3 × 3 with a stride of 1, and the fifth and sixth layers use 256 convolution kernels of size 3 × 3 with a stride of 1. The first, second and fifth layers are followed by max pooling with a 3 × 3 sliding window and a stride of 2. This reduces the data dimension and computation time and helps to avoid overfitting. The detailed parameters of the convolutional layers are shown in Table 2.
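One consequence of these strides is the overall downsampling factor between the input image and the final feature map. The sketch below multiplies the strides quoted in the text; it assumes that each max pooling layer directly follows the convolutional layer it is named with, and it ignores padding, which the text does not specify. The layer names are labels chosen for illustration.

```python
from typing import List, Tuple

# (label, stride) in forward order, using the strides quoted in the text.
# Assumption: pooling is applied immediately after conv 1, conv 2 and conv 5.
LAYERS: List[Tuple[str, int]] = [
    ("conv1", 2), ("pool1", 2),
    ("conv2", 2), ("pool2", 2),
    ("conv3", 1), ("conv4", 1),
    ("conv5", 1), ("pool5", 2),
    ("conv6", 1), ("conv7", 1),
]


def total_stride(layers: List[Tuple[str, int]]) -> int:
    """Product of all strides: how many input pixels one feature-map step spans."""
    product = 1
    for _, stride in layers:
        product *= stride
    return product


if __name__ == "__main__":
    print(total_stride(LAYERS))  # 32
```

Under these assumptions, neighbouring positions of the RPN sliding window are 32 input pixels apart.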

The training procedure is illustrated in Fig. 2. The feature map generated by the last convolutional layer is the input to both the RPN and the ROI pooling layer. The RPN generates high-quality region proposals from the feature map, which are then fed to the ROI pooling layer to train the detection network and the RPN. The trained network can detect vehicle frontal-view images over a wide range of scales and aspect ratios. For each region proposal it outputs a class label and a softmax score between 0 and 1. Each image is trained with the multitask loss function in formula (1).

$$
L(\{P_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(P_i, P_i^*) + \lambda \frac{1}{N_{reg}} \sum_i L_{reg}(t_i, t_i^*) \tag{1}
$$

where $i$ is the index of an anchor and $P_i$ is the predicted probability that anchor $i$ is an object. $P_i^*$ is 1 if the anchor is a positive sample and 0 otherwise. $t_i$ holds the 4 coordinates of the predicted bounding box, and $t_i^*$ is the label associated with a positive anchor. $N_{cls}$ and $N_{reg}$ are two normalization parameters. $\lambda$ is a balancing parameter, and it is set to 10.

Formula (2) describes the bounding box-regression loss.

$$
L_{reg}(t_i, t_i^*) = \mathrm{smooth}_{L1}(t_i - t_i^*) \tag{2}
$$

where $\mathrm{smooth}_{L1}$ is a robust regression loss function, as shown in formula (3).

$$
\mathrm{smooth}_{L1}(x) =
\begin{cases}
0.5x^2 & |x| < 1 \\
|x| - 0.5 & |x| \ge 1
\end{cases} \tag{3}
$$
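Formulas (1) to (3) can be checked numerically with a small sketch. It implements the smooth L1 function and the box-regression loss as written, summing over the 4 box coordinates (an assumption, since formula (2) applies the function to a vector without saying how the components are combined). The combined loss takes already-computed per-anchor classification and regression terms, because the paper does not define the classification loss itself. The function names are hypothetical, and the default weight of 10 is the balancing parameter value stated in the text.

```python
from typing import Sequence


def smooth_l1(x: float) -> float:
    """Formula (3): quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def l_reg(t: Sequence[float], t_star: Sequence[float]) -> float:
    """Formula (2), assuming the loss is summed over the 4 box coordinates."""
    return sum(smooth_l1(a - b) for a, b in zip(t, t_star))


def multitask_loss(cls_terms: Sequence[float], reg_terms: Sequence[float],
                   n_cls: float, n_reg: float, balance: float = 10.0) -> float:
    """Formula (1), given per-anchor classification and regression loss values."""
    return sum(cls_terms) / n_cls + balance * sum(reg_terms) / n_reg


if __name__ == "__main__":
    reg = l_reg([0.0, 0.0, 0.0, 0.0], [0.5, 2.0, 0.0, -1.0])
    print(reg)  # 0.125 + 1.5 + 0.0 + 0.5 = 2.125
    print(multitask_loss([0.2, 0.4], [reg, 0.0], n_cls=2, n_reg=2))
```

The linear branch keeps the gradient bounded for large coordinate errors, which is why the function is described as a robust regression loss.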

4 Experiment and results

The vehicle type classification system was evaluated on a purpose-built dataset. The experiments were run on an Intel Xeon E5-2630 v3 CPU at 2.40 GHz with 64 GB RAM and an NVIDIA GTX 1080 GPU, under 64-bit Ubuntu 14.04.

Table 1 Sizes of the anchors

k             1          2          3          4          5          6
Anchor sizes  100², 0.9  100², 0.6  160², 0.9  160², 0.6  410², 0.9  410², 0.6

Fig. 3 Structure of the backbone network


4.1 Datasets

The original data were collected from real images taken at crossroads and annotated in the standard PASCAL VOC label format, yielding a standard dataset for studying vehicle location and recognition. According to the vehicles that actually appear at the crossroads, there are four major vehicle types: cars, minivans, trucks and buses. The constructed dataset contains more than 60,000 labeled pictures, which vary in scale, illumination and angle.

The training set contains 37,578 selected pictures: 15,000 images of cars, 13,698 images of trucks, 4805 images of minivans and 4075 images of buses.

Table 2 Parameters of convolutional layers (the improved ZF net)

Layers        Input           Layer 1  Layer 2  Layer 3  Layer 4  Layer 5  Layer 6  Layer 7
Names         Original image  Conv 1   Conv 2   Conv 3   Conv 4   Conv 5   Conv 6   Conv 7
Kernel sizes  –               5 × 5    5 × 5    3 × 3    3 × 3    3 × 3    3 × 3    3 × 3
Strides       –               2        2        1        1        1        1        1
Channels      3               96       256      384      384      256      256      384

Fig. 4 Precision–recall curves of shared and unshared convolutional layers

Table 3 Classification results on the test set

Methods             Classes  Average precision (AP) (%)  mAP (%)
RPN + ZF, shared    Car      90.6560                     81.0553
                    Bus      66.3634
                    Minivan  76.6880
                    Truck    90.5138
RPN + ZF, unshared  Car      90.6096                     78.8245
                    Bus      60.3027
                    Minivan  74.0719
                    Truck    90.3138

Fig. 5 Selected examples of vehicle detection results on the test set using the proposed method

4.2 Training of RPN and detection network

The RPN is trained with stochastic gradient descent (SGD). New layers of the RPN are randomly initialized from a zero-mean Gaussian distribution with standard deviation 0.01. The other layers are initialized from a model pre-trained on the PASCAL VOC2012 benchmark. For the detection network, the method adjusts the parameters of all layers. The implementation uses Caffe, an advanced deep learning framework, with a momentum of 0.9, a weight decay of 0.0005 and a mini-batch size of 256. Each mini-batch samples multiple positive and negative anchors from each picture. To eliminate redundant region proposals, the method applies non-maximum suppression (NMS) to the proposals according to the scores generated by the cls layer, with the NMS threshold set to 0.7. The method uses 2000 region proposals in the training stage and no more than 300 in the test stage. After non-maximum suppression, the highest-scoring region proposals are selected to detect objects.
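The non-maximum suppression step can be sketched as a greedy loop over proposals sorted by score. This is a generic, pure-Python illustration rather than the Caffe implementation used in the paper; the box format `(x1, y1, x2, y2)` and the names `iou` and `nms` are assumptions. The defaults mirror the values in the text: an overlap threshold of 0.7 and at most 300 proposals kept at test time.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_thresh: float = 0.7, top_n: int = 300) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
            if len(keep) == top_n:
                break
    return keep


if __name__ == "__main__":
    proposals = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300)]
    cls_scores = [0.9, 0.8, 0.7]
    print(nms(proposals, cls_scores))  # [0, 2]
```

In the example, the second box overlaps the first with an IoU of about 0.82, above the 0.7 threshold, so it is suppressed, while the distant third box is kept.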

If the RPN and the detection network are trained separately, their convolutional layer parameters are modified in different ways. We therefore adopt a training method that lets the two networks share convolutional layers: the four-step alternating training method [23]. This method first trains the RPN and then uses the proposals generated in that step to train the detection network (Fast R-CNN). The detection network obtained in this step is then used to initialize the parameters

Fig. 6 Incomplete vehicles shown in the pictures of detection results on the test set using the proposed method

Fig. 7 SS and RPN computation time distribution curves

Fig. 8 Vehicle type classification hardware equipment based on NVIDIA Jetson TK1


for training the RPN in the next step. The process is iterated step by step.

4.3 Results analysis of training and test

In the experiment, 42,578 pictures in total are used for training and testing: 37,578 for training and 5000 for testing. The network is trained for 100,000 iterations, and with GPU acceleration the training finishes in 10 h. The precision–recall curves on the test set with shared and unshared convolutional layers are shown in Fig. 4a, b, respectively. The figures show that sharing the convolutional layers between the RPN and the detection network gives better results, with an mAP of 81.0553%. The method achieves higher average precision for cars and trucks, while the average precision for minivans and buses is lower. This may be caused by the smaller training sets for minivans and buses. The detailed detection precision is shown in Table 3.
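The mAP values in Table 3 are consistent with a plain unweighted mean of the four per-class average precisions, which the short check below reproduces. The per-class numbers are copied from Table 3; the assumption that mAP is an unweighted mean over classes is the usual convention and is not spelled out in the text.

```python
from typing import Dict

# Per-class average precision (%) copied from Table 3.
AP_SHARED: Dict[str, float] = {
    "car": 90.6560, "bus": 66.3634, "minivan": 76.6880, "truck": 90.5138,
}
AP_UNSHARED: Dict[str, float] = {
    "car": 90.6096, "bus": 60.3027, "minivan": 74.0719, "truck": 90.3138,
}


def mean_ap(per_class: Dict[str, float]) -> float:
    """Unweighted mean of the per-class average precisions."""
    return sum(per_class.values()) / len(per_class)


if __name__ == "__main__":
    print(f"shared:   {mean_ap(AP_SHARED):.4f}")
    print(f"unshared: {mean_ap(AP_UNSHARED):.4f}")
    print(f"gain:     {mean_ap(AP_SHARED) - mean_ap(AP_UNSHARED):.4f}")
```

Most of the gain of roughly 2.2 percentage points from sharing comes from the bus and minivan classes, the two with the smallest training sets.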

Selected examples of vehicle detection results on the test set using the proposed method are shown in Fig. 5a–h. The bounding boxes shown are the region proposals closest to the ground-truth box in each image. An NMS threshold of 0.7 was used to determine correctness, and each output bounding box is associated with a category label and a softmax score in [0, 1]. Only output boxes with a softmax score of 0.7 or more are shown. The results in Fig. 5a–h indicate that the proposed method detects vehicles accurately both by day and at night.

The method also detects vehicles that are only partially visible in the picture with high accuracy, as shown in Fig. 6a, b. This demonstrates an advantage of the proposed method: it remains stable under translation, scaling and deformation of the image.

The method meets the precision demand. The detection time must also be considered to decide whether the method meets the requirement of end-to-end detection. The detection time is therefore compared for different region proposal methods, and the test results are shown in Fig. 7. Figure 7a, b show the computation time distribution curves with SS and RPN, respectively. The average detection time per image is 2.124 s with SS but only 0.123 s with RPN, so RPN greatly reduces the detection time and the proposed method can meet the requirements of real-time detection in engineering. In the RPN test, normalization, the convolutional layers and region proposal take 0.101 s on average, and NMS and region detection take 0.022 s on average. RPN therefore saves a large amount of region proposal computation time.
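The timing figures can be turned into a speed-up factor and a throughput with a few lines of arithmetic. All four numbers are taken from the text; the helper names are hypothetical.

```python
SS_SECONDS = 2.124      # average detection time per image with selective search
RPN_SECONDS = 0.123     # average detection time per image with RPN
SHARED_STAGE = 0.101    # normalization, convolutional layers and region proposal
DETECT_STAGE = 0.022    # NMS and region detection


def speedup(baseline: float, improved: float) -> float:
    """How many times faster the improved method is than the baseline."""
    return baseline / improved


def images_per_second(seconds_per_image: float) -> float:
    """Throughput implied by an average per-image time."""
    return 1.0 / seconds_per_image


if __name__ == "__main__":
    # The two stage times add up to the reported RPN total.
    assert abs((SHARED_STAGE + DETECT_STAGE) - RPN_SECONDS) < 1e-9
    print(f"speed-up over SS: {speedup(SS_SECONDS, RPN_SECONDS):.1f}x")         # about 17.3x
    print(f"throughput with RPN: {images_per_second(RPN_SECONDS):.1f} images/s")  # about 8.1
```

These figures refer to the desktop GPU setup described in Sect. 4; the embedded board is slower.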

4.4 Realization based on NVIDIA Jetson TK1

After training, the Caffe-based network is deployed on the NVIDIA Jetson TK1, where it can then be run. The vehicle type classification hardware based on the NVIDIA Jetson TK1 is shown in Fig. 8. After the required software environment and the cameras have been set up, the system is used to detect the position and type of vehicles at the crossroad. Selected examples of vehicle detection results are shown in Fig. 9.

Fig. 9 Selected examples of vehicle

4.5 Detection results based on NVIDIA Jetson TK1

Figure 9a, b show selected examples of vehicle detection results obtained at the crossroad. The experimental results indicate that in a complex environment, even when several cars and trucks appear in the picture at the same time, the proposed method still classifies them accurately. In addition, the system takes around 0.354 s on average to process an image. According to the actual traffic flow, the passage time per vehicle is about 0.89 s, so the system meets the requirement of real-time classification.
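The real-time argument reduces to comparing the per-image processing time with the per-vehicle passage time. The sketch below states that comparison explicitly, reading the criterion "processing time no longer than passage time" from the text as an assumption; the function names are hypothetical.

```python
SECONDS_PER_IMAGE = 0.354   # average processing time on the Jetson TK1
SECONDS_PER_VEHICLE = 0.89  # approximate passage time per vehicle


def frames_per_passage(passage: float, per_image: float) -> float:
    """How many images can be processed while one vehicle passes."""
    return passage / per_image


def is_real_time(passage: float, per_image: float) -> bool:
    """Assumed criterion: at least one image is processed per passing vehicle."""
    return per_image <= passage


if __name__ == "__main__":
    ratio = frames_per_passage(SECONDS_PER_VEHICLE, SECONDS_PER_IMAGE)
    print(f"{ratio:.2f} images per vehicle passage")  # about 2.51
    print(is_real_time(SECONDS_PER_VEHICLE, SECONDS_PER_IMAGE))
```

Under this reading, each passing vehicle is seen in roughly two to three processed frames.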

5 Conclusion

This paper proposes a vehicle type classification method based on convolutional neural networks (CNN). The proposed method achieves high accuracy: over 90% for cars and trucks. It also achieves real-time classification in tests on the NVIDIA development board, taking around 0.354 s to process each image with the network embedded on the NVIDIA Jetson TK1 while maintaining high accuracy. In future work, the bus and minivan training sets will be enlarged to improve detection precision, and the ability to detect vehicles that occlude one another will be improved as well.

Acknowledgements This work was supported in part by National

Fund for Fundamental Research (No. 282017Y-5303), in part by the

Fund of National Automobile Accident In-depth Investigation System

(No. HT2016X-007), in part by National Natural Science Foundation

of China (No. 51675324), in part by Training and funding Program of

Shanghai College young teachers (No. ZZGCD15102), in part by

Scientific Research Project of Shanghai University of Engineering

Science (No. 2016-19) and in part by the Shanghai University of

Engineering Science Innovation Fund for Graduate Students (No.

16KY0602).

References

1. Hsieh, J.W., Chen, L.C., Chen, D.Y. et al.: Vehicle make and

model recognition using symmetrical SURF. In: 2013 10th IEEE

International Conference on Advanced Video and Signal Based

Surveillance (AVSS), pp. 472–477 (2013)

2. Dong, Z., Wu, Y., Pei, M. et al.: Vehicle type classification using

a semisupervised convolutional neural network. In: IEEE

Transactions on Intelligent Transportation Systems,

pp. 2247–2256 (2015)

3. Lai, A.H., Fung, G.S., Yung, N.H.: Vehicle type classification

from visual-based dimension estimation. In: Proceedings of the

IEEE Intelligent Transportation Systems Conference,

pp. 201–206 (2001)

4. Gupte, S., Masoud, O., Martin, R.F., et al.: Detection and classification of vehicles. IEEE Trans. Intell. Transp. Syst. 3(1), 37–47 (2002)

5. Saravi, S., Edirisinghe, E.A.: Vehicle make and model recogni-

tion in CCTV footage. In: 2013 18th International Conference on

Digital Signal Processing (DSP), pp. 1–6 (2013)

6. Foresti, G.L., Murino, V., Regazzoni, C.: Vehicle recognition and

tracking from road image sequences. IEEE Trans. Veh. Technol.

48(1), 301–318 (1999)

7. Jang, D.M., Turk, M.: Car-rec: a real time car recognition system.

In: 2011 IEEE Workshop on Applications of Computer Vision

(WACV), Kona, HI, USA, pp. 599–605 (2011)

8. Tong, B., Fan, B., Wu, F.: Convolutional neural networks with

neural cascade classifier for pedestrian detection. In: Chinese

Conference on Pattern Recognition 2016, pp. 243–257. Springer

Nature Singapore Pte Ltd

9. Tome, D., Monti, F., Baroffo, L., et al.: Deep convolution neural

networks for pedestrian detection. Signal Process. Image Com-

mun. 47, 482–489 (2016)

10. Sarfraz, S.M., Saeed, A., Khan, M.H. et al.: Bayesian prior models

for vehicle make and model recognition. In: Proceedings of the

7th International Conference on Frontiers of Information Tech-

nology, pp. 35:1–35:6. ACM, New York (2009)

11. Ramnath, K., Hsiao, E. et al.: Car make and model recognition

using 3D curve alignment. In: Winter Conference on Applica-

tions of Computer Vision, pp. 285–292 (2014)

12. Alonso, D., Salgado, L. et al.: Robust vehicle detection through

multidimensional classification for on board video based systems.

In: IEEE International Conference on Image Processing, pp. 4:

IV–321–IV–324 (2007)

13. Chang, W.C., Cho, C.W.: Online boosting for vehicle detection.

IEEE Trans. Syst. Man Cybernet. Part B (Cybernetics) 40(3), 892–902 (2010)

14. Zhang, F.: Car Detection and Vehicle Type Classification Based

on Deep Learning. Jiangsu University, Jiangsu (2016)

15. Zhang, F., Xu, X., Qiao, Y.: Deep classification of vehicle makers

and models: the effectiveness of pre-training and data enhance-

ment. In: 2015 IEEE International Conference on Robotics and

Biomimetics (ROBIO), pp. 231–236 (2015)

16. Gu, J., Wang, Z., Kuen, J. et al.: Recent Advances in Convolutional Neural Networks. arXiv preprint arXiv:1512.07108 [cs.CV] (2016)

17. LeCun, Y., Boser, B., Denker, J.S., et al.: Backpropagation

applied to handwritten zip code recognition. Neural Comput.

1(4), 541–551 (1989)

18. Krizhevsky, A., Sutskever, I., Hinton, G.: Imagenet classification

with deep convolutional neural networks. In: Neural Information

Processing Systems (NIPS), pp. 1097–1105 (2012)

19. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in

deep convolutional networks for visual recognition. In: European

Conference on Computer Vision (ECCV) (2014)

20. Girshick, R.: Fast R-CNN. In: Proceedings of IEEE International

Conference on Computer Vision (ICCV) (2015)

21. Zeiler, M.D., Fergus, R.: Visualizing and understanding convo-

lutional networks. In: Proceedings of Computer vision-ECCV

2014. Springer, pp 818–833 (2014)

22. Ren, S., He, K., Girshick, R. et al.: Faster R-CNN: towards real-

time object detection with region proposal networks. In: Pro-

ceedings of Advances in Neural Information Processing Systems,

pp. 91–99 (2015)

23. Ren, S.: Efficient Object Detection with Feature Sharing.

University of Science and Technology of China, Hefei (2016)

Xinchen Wang is a postgraduate student in Shanghai University of

Engineering Science, Shanghai, China. His research direction focuses

on the technology of intelligent vehicle. His current research interests

include the technology of image processing and deep learning

technology.

Weiwei Zhang received Ph.D. degree in Mechanical Engineering in

Hunan University in 2015. Now he is a lecturer in Shanghai

University of Engineering Science. His research direction is the

technology of intelligent vehicle. His current research interests

include the technology of image processing, intelligent vehicle and


power train of vehicle. Now his team undertakes several major

projects from renowned Chinese companies.

Xuncheng Wu received Ph.D. degree in Mechanical Engineering in

Xi’an Jiaotong University in the year of 2000. Now he is a professor

in Shanghai University of Engineering Science. His team has

designed three new transmissions for Shanghai Automobile Gear

Works, China, since 2010. His current research interests include the

nonlinear dynamics of gear system, electric control shifting of AMT

and the technology of intelligent vehicle.
