


Optics & Laser Technology 54 (2013) 232–241


Robust and accurate moving shadow detection based on multiple features fusion

Jiangyan Dai a,b, Miao Qi a,*, Jianzhong Wang a, Jiangkun Dai c, Jun Kong a,b,**

a School of Computer Science and Information Technology, Northeast Normal University, Key Laboratory of Intelligent Information Processing of Jilin Universities, Changchun, China
b School of Mathematics and Statistics, Northeast Normal University, Changchun, China
c College of Science, Northwest A&F University, Yangling, Shanxi, China

Article info

Article history:
Received 14 December 2012
Received in revised form 21 May 2013
Accepted 30 May 2013
Available online 26 June 2013

Keywords:
Shadow detection
Object segmentation
Feature fusion

0030-3992/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.optlastec.2013.05.033

* Corresponding author. Tel./fax: +86 431 84536331.
** Corresponding author at: School of Computer Science and Information Technology, Northeast Normal University, Key Laboratory of Intelligent Information Processing of Jilin Universities, Changchun, China.
E-mail addresses: [email protected] (M. Qi), [email protected] (J. Kong).

Abstract

In recent years, moving cast shadow detection has become a critical challenge in improving the accuracy of moving object detection in video surveillance. In this paper, we derive a robust moving cast shadow detection method based on multiple features fusion. First, several kinds of features, such as intensity, color and texture, are extracted for the foreground image by means of various measures. Then, a synthetic feature map is generated by linear combination of these features, from which moving cast shadow pixels are roughly distinguished from their moving objects. Finally, spatial adjustment is applied to correct misclassified pixels and acquire the refined shadow detection result. The effectiveness of our proposed method is evaluated on various scenes. The results demonstrate that the method achieves a high detection rate. In particular, extensive comparisons also indicate that it significantly outperforms several state-of-the-art methods.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Moving object detection is a fundamental and critical task in many applications such as object tracking, object recognition, video surveillance, video compression and so forth. Background subtraction is one of the common approaches for detecting moving objects. However, cast shadows always move with their corresponding objects, so many background subtraction methods cannot separate them accurately. This inaccurate separation might lead to object merging, object shape distortion, and even object loss. Therefore, detecting and eliminating shadow regions is highly desirable and necessary in video processing and motion analysis fields.

Many efficient methods have been put forward to detect moving shadows in recent years. In general, existing shadow detection methods can be classified into four categories based on the features they use [1]: chromaticity, physical, geometry and texture.

Chromaticity-based methods take advantage of the assumption that shadow regions are darker but keep their chromaticity almost invariant. For better separation between intensity and chromaticity, several color spaces such as HSV [2], c1c2c3 [3], HSL [4], RGB [5–8], or a combination of them [9] have been used to detect moving cast shadows robustly. Cucchiara et al. [2] exploited color information in the HSV color space for shadow detection to improve object segmentation. They showed that cast shadows were darker than the corresponding background in the luminance component, while the hue and saturation components were consistent with the corresponding background and, experimentally, changed within a certain range. Salvador et al. [3] described an efficient method to segment cast shadows in both still images and video sequences, which exploited the invariant c1c2c3 color and geometric properties of shadows. Grest et al. [4] discussed a similarity measure in HSL color space to separate color from intensity information, which improved the quality of shadow removal significantly. Hypothesizing that RGB ratios are constant, Song et al. [6] constructed a color ratio model following a Gaussian distribution to determine whether a pixel belongs to shadow in traffic images. Amato et al. [7] employed a local color constancy property to detect both achromatic and chromatic shadows from the foreground accurately. Choi et al. [8] proposed an adaptive shadow elimination method using a cascade of a chromaticity difference estimator, a brightness difference estimator and a local relation estimator, which can adapt to variations of illumination and environment. To make the models complement each other efficiently, Sun et al. [9] utilized combined models in both the HSI and c1c2c3 color spaces to distinguish shadows from foreground regions. As mentioned above, most of these methods are simple to implement and computationally inexpensive.


However, they are sensitive to noise and fail when shadow regions are darker or when moving objects have color information similar to the background.

Nadimi et al. [10] presented a physics-based model relying on a spatio-temporal albedo test and a dichromatic reflection model for moving cast shadow detection. They focused on outdoor scenes and incorporated multiple light sources with different spectral power distributions. Martel-Brisson et al. [11] described a pixel-based statistical approach to model and detect moving cast shadows by parameterizing probability density functions. To move beyond overly restrictive probabilistic models, Joshi et al. [12] introduced a semi-supervised learning technique with color and edge characteristics to identify shadows, implemented with Support Vector Machines (SVMs) and co-training on a small set of human-labeled data. Physical methods can adapt automatically to complex scene conditions but require timely updates of the shadow models and user interaction.

Geometry-based methods detect moving shadows from the predicted orientation, size and even shape of shadows, using proper prior knowledge of the illumination source, camera location and object geometry. For eliminating unwanted pedestrian-like shadows, Hsieh et al. [13] proposed a Gaussian shadow model parameterized by several features including the orientation, mean intensity, and center position of a shadow region. Exploiting the spectral and geometrical properties of shadows, and the relationship between points in the shadow region, spatial position and vehicle shape, Fang et al. [14] presented a moving vehicle cast shadow detection method carried out by an occluding function using the 1D wavelet transform. Geometry-based methods do not rely on a background reference but need more prior knowledge and impose scene limitations.

Generally, texture-based shadow detection methods hypothesize that the background image has texture similar to shadow regions but different from moving objects. Leone et al. [15] proposed a moving cast shadow detection method based on Gabor functions and a matching pursuit strategy. Zhang et al. [16] first employed the ratio edge, the ratio between the intensity of one pixel and its neighboring pixels, to detect shadows; their experimental results proved it to be illumination invariant. After confirming the existence of shadows, Xiao et al. [17] reconstructed coarse object shapes and then extracted cast shadows by subtracting the moving objects from a change mask. By creating a mask of candidate shadow pixels using chromaticity information, Sanin et al. [18] discriminated cast shadows from moving objects by means of gradient information. Bullkich et al. [19] assumed a nonlinear tone mapping between shadows and their corresponding background, and analyzed the structural content by tone mapping to separate shadows from suspected foregrounds. Meher and Murty [20] applied a statistical method, principal component analysis (PCA), to obtain search directions for moving shadow regions and then used the variance of regions to test homogeneity, separating shadow regions from vehicle regions. Methods based on texture similarity are independent of color information and robust against illumination changes. However, they fail when moving objects and shadow regions possess texture information similar to the corresponding background regions.

As mentioned above, methods based on only a single feature might misclassify moving cast shadows. Recently, multiple features fusion has become an active research area and exhibits a significant trade-off among features [21–27]. Grouping potential shadows into partitions in terms of a bluish effect and edges, Huerta et al. [22] analyzed temporal and spatial similarities of all these regions in order to detect umbra shadows. Lin et al. [23] presented a moving shadow removal algorithm combining texture and statistical models, realized via edge information and a gray level-based feature modeled with a Gaussian.

Hamad et al. [24] employed both color and texture information to identify cast shadow regions, using the intensity ratio and entropy to characterize the two features. Boroujeni et al. [25] proposed a semi-supervised classification method based on a hierarchical mixture of MLP experts to detect moving cast shadows; they constructed feature vectors including color intensity, average illumination, color distortion and light distortion to capture the environmental properties of frames. McFeely et al. [26] adopted a combination of color illumination invariance and texture analysis to identify shadows after tree-structured segmentation of digital imagery. Dai et al. [27] exploited color information in HSV color space, texture similarity by LBP, and local variance to detect moving cast shadows. Although the fusion of various features has been adopted in many works, different measures for the same type of feature have rarely been considered. Additionally, most of these methods detect shadow pixels in serial mode rather than in parallel.

Inspired by the existing methods, we propose a novel moving cast shadow detection method based on feature fusion. Instead of using a single feature or several features sequentially, three types of features are taken into account simultaneously in our work. First, intensity, color and texture features are extracted from the foreground image. In order to characterize these features as comprehensively as possible, we represent color information in terms of multiple color spaces and multi-scale images; meanwhile, texture information is described by entropy and local binary patterns. Second, a feature map corresponding to the foreground is generated by fusing these features, from which moving cast shadows can be roughly identified and separated from their moving objects. Finally, in order to obtain the refined shadow detection result, spatial adjustment is carried out to correct the misclassified pixels. Extensive experiments and comparative results demonstrate that the proposed method exhibits excellent performance and outperforms several well-known methods.

This paper is organized as follows. Section 2 presents the proposed shadow detection method, which consists of foreground segmentation, feature extraction, feature fusion for shadow detection, and spatial adjustment. Section 3 analyzes the experiments, and conclusions are given in Section 4.

2. The proposed shadow detection method

Shadows appear when objects partially or totally occlude direct light from a source of illumination. Shadows can be classified into two classes: self-shadow and cast shadow. The former occurs in the part of an object which is not illuminated by direct light; the latter is the area projected by the object in direct light. In particular, the latter is called a moving cast shadow if the object is moving.

In this section, we present a multiple features fusion method for robust shadow detection, called MFF. Without loss of generality, the proposed method is based on the assumptions that shadow regions are darker but retain similar chromaticity and texture information with respect to the background regions. Fig. 1 shows the flowchart of the proposed method, including foreground segmentation, feature extraction, feature fusion and spatial adjustment.

2.1. Foreground segmentation

Foreground extraction prior to moving cast shadow detection is very necessary, as it reduces computation time and improves detection accuracy. For the sake of simplicity, the analyzed video sequences are assumed to be captured by a stationary camera. In our study, the Gaussian Mixture Model (GMM) [28] is utilized to segment moving pixels in RGB color space. In this algorithm, each pixel is treated as independent, and the GMM models each pixel with a mixture of Gaussians.

After running the GMM, we obtain the foreground image, which may contain moving objects and their moving cast shadows; meanwhile, the estimated background is taken as the background image. Further processing is performed to detect shadows in the foreground image. For convenience, a frame I at time t is denoted as $I_t$, and its corresponding foreground and background images are denoted as $F_t$ and $B_t$, respectively. Besides, the binary mask of the foreground is $M_t$. The segmentation results of one sample frame are shown in Fig. 2.
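As a concrete illustration, this segmentation step could be realized with OpenCV's MOG2 background subtractor, a GMM variant in the spirit of [28]. This is a minimal sketch under that assumption, not the authors' implementation; the function name and parameter values are illustrative.

    # Sketch: GMM-based foreground/background segmentation with OpenCV.
    import cv2

    mog = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16,
                                             detectShadows=False)

    def segment(frame):
        """Return (binary mask M_t, foreground F_t, background B_t)."""
        m = (mog.apply(frame) > 0).astype("uint8")   # binary foreground mask M_t
        f = cv2.bitwise_and(frame, frame, mask=m)    # foreground image F_t
        b = mog.getBackgroundImage()                 # estimated background B_t
        return m, f, b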

Fig. 1. The flowchart of moving cast shadow detection.

Fig. 2. The segmentation results after GMM. (a) Current frame, (b) background, (c) binary foreground, and (d) foreground.

2.2. Feature extraction

For effective shadow detection, we need to extract useful features in which shadows differ from their moving objects. According to the assumptions about shadows, several types of features including intensity, color and texture are employed. In particular, to exploit the color feature as much as possible, three color spaces are considered to measure color information. We not only compute the feature differences between the foreground and its background, but also consider differences within the foreground itself at different spatial scales. Additionally, texture similarity is analyzed by both entropy and Local Binary Patterns (LBP).

2.2.1. Intensity feature

(1) Intensity constraints

Since shadow pixels are darker than the corresponding background pixels, we impose an intensity constraint for shadow detection. In other words, a moving pixel cannot be shadow if its intensity is higher in $F_t$ than in $B_t$. Besides, moving pixels near black may produce unstable feature values. Therefore, pixels whose intensities are below a certain value $T_h$ ($T_h$ differs for various videos) are regarded as moving objects. The strategy is formulated as:

$$M_t^{ob}(x,y)=\begin{cases}1 & \text{if }(F_t(x,y)>B_t(x,y)\text{ or }F_t(x,y)<T_h)\text{ and }M_t(x,y)=1\\ 0 & \text{otherwise},\end{cases}\qquad M_t^{sh}(x,y)=M_t(x,y)\wedge\neg M_t^{ob}(x,y), \quad (1)$$

where $M_t^{ob}$ and $M_t^{sh}$ are the binary masks of moving objects and candidate moving shadows, respectively. The pixels of the current frame marked in $M_t^{sh}$, which may include both moving object and shadow pixels, are analyzed further.
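A minimal NumPy sketch of Eq. (1) follows; the function name and the default value of Th are illustrative, since Th is scene-dependent.

    # Sketch of Eq. (1): split M_t into sure-object pixels and candidate shadows.
    import numpy as np

    def intensity_masks(F_gray, B_gray, M, Th=30):
        brighter = F_gray.astype(np.int32) > B_gray.astype(np.int32)
        too_dark = F_gray < Th                  # near-black pixels are unstable
        M_ob = (brighter | too_dark) & (M > 0)  # moving-object mask M_t^ob
        M_sh = (M > 0) & ~M_ob                  # candidate shadow mask M_t^sh
        return M_ob, M_sh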

(2) Normalized cross correlation

To measure the similarity between a foreground image $F_t$ and its background $B_t$, the normalized cross correlation (NCC) is taken into account [29]. For a pixel $p$ at location $(x,y)$ in $M_t^{sh}$, the NCC is calculated as follows:

$$NCC_t(x,y)=\begin{cases}\dfrac{\sum_{u\in\Omega_p}F_t(u)\,B_t(u)}{\sqrt{\sum_{u\in\Omega_p}F_t^2(u)\,\sum_{u\in\Omega_p}B_t^2(u)}} & \text{if }M_t^{sh}(x,y)=1\\ 0 & \text{otherwise},\end{cases} \quad (2)$$

where $\Omega_p$ denotes the neighborhood centered at the pixel $p$, and $F_t(u)$ and $B_t(u)$ are the intensities of the pixels at position $u$ in the current frame and background, respectively. NCC is the similarity map, in which the value should be large (close to 1) if the pixel $p$ is shadow.
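Since the three local sums in Eq. (2) share one neighborhood, the map can be computed densely with unnormalized box filters, as in the sketch below (assuming grayscale inputs); the small epsilon guarding the denominator is an added safeguard, not part of the original formulation.

    # Sketch of Eq. (2): dense NCC over a (2k+1)x(2k+1) neighborhood.
    import cv2
    import numpy as np

    def ncc_map(F_gray, B_gray, M_sh, k=2):
        F = F_gray.astype(np.float64)
        B = B_gray.astype(np.float64)
        ksize = (2 * k + 1, 2 * k + 1)
        s_fb = cv2.boxFilter(F * B, -1, ksize, normalize=False)  # sum of F*B
        s_ff = cv2.boxFilter(F * F, -1, ksize, normalize=False)  # sum of F^2
        s_bb = cv2.boxFilter(B * B, -1, ksize, normalize=False)  # sum of B^2
        ncc = s_fb / (np.sqrt(s_ff * s_bb) + 1e-12)
        return np.where(M_sh, ncc, 0.0)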

2.2.2. Color feature

(1) Chromaticity difference

Smith [30] described a triangle-based Hue-Saturation-Value (HSV) model, which is close to the human perception of color [31] and has been proven to detect shadows more accurately. Moreover, it is observed that the hue and saturation components change within certain ranges when a pixel is covered by shadow [2]. Inspired by this idea, the chromaticity difference is defined as:

$$Ch_t(x,y)=\begin{cases}(|F_t(x,y).S-B_t(x,y).S|+|F_t(x,y).H-B_t(x,y).H|)/2 & \text{if }M_t^{sh}(x,y)=1\\ 0 & \text{otherwise},\end{cases} \quad (3)$$

where $F_t(x,y).S$ and $B_t(x,y).S$ denote the saturation component values of the foreground and background in HSV space, respectively. Likewise, $F_t(x,y).H$ and $B_t(x,y).H$ are the hue component values. Obviously, a smaller chromaticity difference indicates a larger probability that the pixel belongs to shadow.
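A sketch of Eq. (3) using OpenCV's HSV conversion is shown below. Normalizing H and S to [0,1] is an added assumption made to keep the two channels commensurable (8-bit OpenCV images store H in [0,180) and S in [0,255]); the paper does not specify the channel scaling.

    # Sketch of Eq. (3): mean absolute H/S difference over candidate pixels.
    import cv2
    import numpy as np

    def chromaticity_diff(F_bgr, B_bgr, M_sh):
        hf = cv2.cvtColor(F_bgr, cv2.COLOR_BGR2HSV).astype(np.float64)
        hb = cv2.cvtColor(B_bgr, cv2.COLOR_BGR2HSV).astype(np.float64)
        dh = np.abs(hf[..., 0] - hb[..., 0]) / 180.0   # hue difference
        ds = np.abs(hf[..., 1] - hb[..., 1]) / 255.0   # saturation difference
        return np.where(M_sh, (dh + ds) / 2.0, 0.0)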

(2) Invariant photometric color

Considering photometric color invariants, the c1c2c3 color model [32] is adopted to measure the color information of moving shadows and objects. It has been demonstrated to be invariant to variable illumination conditions and to depend only on the sensors and the surface albedo. The c1c2c3 model is defined as:

$$c_1(x,y)=\arctan\!\left(\frac{R(x,y)}{\max(G(x,y),B(x,y))}\right),\quad c_2(x,y)=\arctan\!\left(\frac{G(x,y)}{\max(R(x,y),B(x,y))}\right),\quad c_3(x,y)=\arctan\!\left(\frac{B(x,y)}{\max(R(x,y),G(x,y))}\right), \quad (4)$$

where $R(x,y)$, $G(x,y)$ and $B(x,y)$ are the values of the red, green and blue components in RGB color space. To weigh the difference of a pixel covered by shadow or not, the invariant photometric color differences are computed:

$$D_t^{c_1}(x,y)=|F_t^{c_1}(x,y)-B_t^{c_1}(x,y)|,\quad D_t^{c_2}(x,y)=|F_t^{c_2}(x,y)-B_t^{c_2}(x,y)|,\quad D_t^{c_3}(x,y)=|F_t^{c_3}(x,y)-B_t^{c_3}(x,y)|, \quad (5)$$

where $F_t^{c_1}(x,y)$, $F_t^{c_2}(x,y)$ and $F_t^{c_3}(x,y)$ denote the values of the pixel at $(x,y)$ in c1c2c3 space for the foreground. Similarly, $B_t^{c_1}(x,y)$, $B_t^{c_2}(x,y)$ and $B_t^{c_3}(x,y)$ are the values in the background at the same position.

To cope with distortions caused by noise as much as possible, a synthetic operation is carried out and the invariant color map is established by linear combination:

$$CCC_t(x,y)=\begin{cases}(D_t^{c_1}(x,y)+D_t^{c_2}(x,y)+D_t^{c_3}(x,y))/3 & \text{if }M_t^{sh}(x,y)=1\\ 0 & \text{otherwise}.\end{cases} \quad (6)$$

In an ideal situation, shadow pixels should have a smaller difference $CCC_t(x,y)$ than moving object pixels.
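Eqs. (4)-(6) translate directly to NumPy, as sketched below; the epsilon that avoids division by zero at black pixels is an added safeguard.

    # Sketch of Eqs. (4)-(6): c1c2c3 coordinates and the invariant color map.
    import numpy as np

    def c1c2c3(img_bgr):
        b, g, r = [img_bgr[..., i].astype(np.float64) for i in range(3)]
        eps = 1e-12
        c1 = np.arctan(r / (np.maximum(g, b) + eps))
        c2 = np.arctan(g / (np.maximum(r, b) + eps))
        c3 = np.arctan(b / (np.maximum(r, g) + eps))
        return c1, c2, c3

    def invariant_color_map(F_bgr, B_bgr, M_sh):
        ccc = sum(np.abs(f - b)
                  for f, b in zip(c1c2c3(F_bgr), c1c2c3(B_bgr))) / 3.0
        return np.where(M_sh, ccc, 0.0)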

(3) Salient color information

Instead of calculating the difference in color information between the foreground and background, the salient color feature is exploited by taking the foreground itself into account at different scales. Assuming that the background is planar, a shadow pixel has properties similar to its surrounding pixels, while a moving object pixel may not, because of different surface albedos. Therefore, a pixel that stands out from its surrounding pixels is likely to belong to a moving object; otherwise it is shadow. These local spatial discontinuities are called the salient property. It is obtained by a center-surround operation, which denotes the across-scale difference between a fine scale f and a coarser scale s. In our study, four broadly tuned color channels (red, green, blue and yellow) are established to describe the salient color contrast [33]:

$$r=R-\frac{G+B}{2},\qquad g=G-\frac{R+B}{2},\qquad b=B-\frac{R+G}{2},\qquad y=\frac{R+G}{2}-\frac{|R-G|}{2}-B, \quad (7)$$

where R, G and B are the channels of RGB color space. The center-surround operation is implemented at different scales of the foreground image. Dyadic Gaussian pyramids [34] are adopted to create nine spatial scales, obtained by low-pass filtering and subsampling the input foreground image with horizontal and vertical image-reduction factors ranging from 1:1 (scale zero) to 1:256 (scale eight) in eight octaves. Consequently, four Gaussian pyramids $r(s)$, $g(s)$, $b(s)$ and $y(s)$ are created from these channels, where $s\in\{0,\dots,8\}$. In the human primary visual cortex [35], spatial and chromatic opponency exists for the red/green, green/red, blue/yellow and yellow/blue color pairs. Considering this double opponency of color, $rg_{F_t}(f,s)$ and $by_{F_t}(f,s)$ of a foreground $F_t$ are defined as:

$$rg_{F_t}(f,s)=|(r_{F_t}(f)-g_{F_t}(f))\otimes(g_{F_t}(s)-r_{F_t}(s))|,\qquad by_{F_t}(f,s)=|(b_{F_t}(f)-y_{F_t}(f))\otimes(y_{F_t}(s)-b_{F_t}(s))|, \quad (8)$$

where $f\in\{2,3,4\}$, $s=f+\delta$, $\delta\in\{3,4\}$, and $\otimes$ represents the across-scale difference.

After applying Eq. (8), 12 feature maps are generated. The following formula combines them at the same scale ($s=0$) and creates the final salient color map:

$$Sal_t=\bigoplus_{f=2}^{4}\bigoplus_{s=f+3}^{f+4}\big(rg_{F_t}(f,s)+by_{F_t}(f,s)\big), \quad (9)$$

where $\oplus$ denotes across-scale addition, which consists of reducing each map to scale zero and point-by-point addition. As mentioned above, moving object pixels possess greater salient color values.
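A compact sketch of Eqs. (7)-(9) follows, with cv2.pyrDown building the dyadic pyramid and cv2.resize standing in for the across-scale interpolation and the reduction to scale zero; the guard against degenerate pyramid levels is an added practical detail for low-resolution frames.

    # Sketch of Eqs. (7)-(9): RGBY channels, Gaussian pyramids, center-surround
    # differences, and across-scale addition at scale zero.
    import cv2
    import numpy as np

    def rgby(img_bgr):
        b, g, r = [img_bgr[..., i].astype(np.float64) for i in range(3)]
        return (r - (g + b) / 2, g - (r + b) / 2,
                b - (r + g) / 2, (r + g) / 2 - np.abs(r - g) / 2 - b)

    def pyramid(ch, levels=9):
        pyr = [ch]
        for _ in range(levels - 1):
            prev = pyr[-1]
            # stop shrinking once a dimension becomes degenerate
            pyr.append(cv2.pyrDown(prev) if min(prev.shape[:2]) > 1 else prev)
        return pyr

    def salient_color_map(F_bgr):
        r_p, g_p, b_p, y_p = [pyramid(ch) for ch in rgby(F_bgr)]
        h, w = F_bgr.shape[:2]
        sal = np.zeros((h, w), np.float64)
        for f in (2, 3, 4):
            for s in (f + 3, f + 4):
                size_f = (r_p[f].shape[1], r_p[f].shape[0])   # (width, height)
                rg = np.abs((r_p[f] - g_p[f])
                            - cv2.resize(g_p[s] - r_p[s], size_f))
                by = np.abs((b_p[f] - y_p[f])
                            - cv2.resize(y_p[s] - b_p[s], size_f))
                sal += cv2.resize(rg + by, (w, h))   # across-scale addition
        return sal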

2.2.3. Texture feature

(1) Entropy criterion

Entropy is a statistical measure of randomness that can be adopted to compute the texture difference. $E_t^c(x,y)$ is the entropy of pixel $q$ at position $(x,y)$ over color channel $c$ at time $t$, where $c\in\{R,G,B\}$, defined as

$$E_t^c(x,y)=-\sum_{u\in\Omega_q}p_t^c(I_t^c(u))\cdot\log(p_t^c(I_t^c(u))), \quad (10)$$

where $\Omega_q$ denotes the neighborhood centered at pixel $q$, $I_t^c(u)$ is the intensity level at position $u$ in the neighborhood $\Omega_q$ for RGB channel $c$, and $p_t^c(I_t^c(u))$ represents the probability of that intensity level in $\Omega_q$ for each RGB channel.

The texture difference between the foreground and the corresponding background can be formulated as

$$\Delta E_t(x,y)=\begin{cases}\min_{c\in\{R,G,B\}}|E_{F_t}^c(x,y)-E_{B_t}^c(x,y)| & \text{if }M_t^{sh}(x,y)=1\\ 0 & \text{otherwise},\end{cases} \quad (11)$$

where $E_{F_t}^c(x,y)$ and $E_{B_t}^c(x,y)$ are the entropy values of the pixel at position $(x,y)$ in the foreground $F$ and background $B$ over RGB channel $c$ at time $t$, respectively. Clearly, the smaller $\Delta E_t$ is, the higher the similarity.

Fig. 5. Average detection rates with different values for the parameter T.
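A straightforward, unoptimized sketch of Eqs. (10)-(11) is given below; the 32-bin quantization of intensity levels is an illustrative simplification, since the paper does not state the number of histogram levels.

    # Sketch of Eqs. (10)-(11): local entropy per channel and minimum difference.
    import numpy as np

    def local_entropy(channel, k=2, bins=32):
        h, w = channel.shape
        q = (channel.astype(np.int32) * bins) // 256   # quantized levels
        ent = np.zeros((h, w), np.float64)
        for y in range(h):
            for x in range(w):
                win = q[max(0, y - k):y + k + 1, max(0, x - k):x + k + 1]
                p = np.bincount(win.ravel(), minlength=bins) / win.size
                p = p[p > 0]                           # drop empty bins
                ent[y, x] = -np.sum(p * np.log(p))
        return ent

    def entropy_diff(F_bgr, B_bgr, M_sh):
        d = [np.abs(local_entropy(F_bgr[..., c]) - local_entropy(B_bgr[..., c]))
             for c in range(3)]
        return np.where(M_sh, np.minimum(np.minimum(d[0], d[1]), d[2]), 0.0)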

(2) LBP

The LBP is one of the most widely applied descriptors because of its illumination invariance and computational simplicity. A modified LBP feature proposed in [36] is employed to estimate texture information, which improves the robustness on flat image regions.

Given a center pixel $p$ with intensity value $g_p^c$, its LBP descriptor is defined as:

$$LBP_{Q,R}^c(x,y)=\sum_{q\in\Omega_p}s(g_q^c-g_p^c-T_{lbp})\,2^q,\qquad s(a)=\begin{cases}1 & a\ge 0\\ 0 & \text{otherwise},\end{cases} \quad (12)$$

where $Q$ is the number of pixels in the neighborhood $\Omega_p$, $R$ is the radius of the circle, $g_q^c$ denotes the values in the circular neighborhood, and $T_{lbp}$ is a relatively small threshold that enhances robustness. In particular, the correlation among pixels decreases as the radius $R$ increases; therefore, the radius $R$ of the LBP operator should generally be kept small.

Fig. 3. Shadow detection results. (a) Original frames, (b) masks of moving objects and (c) masks of cast shadows.

Fig. 4. The refined detection results. (a) Masks of moving objects, (b) masks of moving shadows, and (c) moving objects.

Table 1. Quantitative comparison results (%). The best value in each column was highlighted in bold in the original.

            Hallway              Highway              CAVIAR               Intelligent Room
Method      η      ζ      Mean   η      ζ      Mean   η      ζ      Mean   η      ζ      Mean
SNP1        83.77  45.77  64.77  39.28  66.68  52.98  61.39  87.87  74.63  84.07  74.62  79.35
DNM         83.73  79.61  81.67  86.84  58.31  72.58  93.24  78.80  86.02  86.67  75.45  81.06
ICF         95.32  83.01  89.17  84.54  60.93  72.73  92.76  88.56  90.66  85.25  87.81  86.53
SNP2        83.93  98.10  91.01  70.91  72.56  71.74  88.12  97.60  92.86  95.50  88.61  92.05
CCM         96.46  68.55  82.51  87.65  36.37  62.01  87.45  94.77  91.11  88.85  85.50  87.17
ASE         90.09  93.19  91.64  73.14  96.21  84.68  85.11  98.57  91.84  92.55  92.66  92.60
MFF         86.31  97.04  91.67  83.91  96.49  90.20  95.70  95.59  95.64  93.36  92.34  92.85

1 http://cvrr.ucsd.edu/aton/shadow/
2 http://vision.gel.ulaval.ca/~castShadows/
3 http://homepages.inf.ed.ac.uk/rbf/CAVIARDATA1/

Moreover, a Q-bit binary pattern of $p$ is derived from the neighborhood differences by Eq. (12). Therefore, a histogram consisting of $2^Q$ bins is introduced for texture description. The histogram intersection is chosen to measure the similarity:

$$\rho_p^c(h^c,h^{c'})=\sum_{n=0}^{N-1}\min(h_n^c,h_n^{c'}), \quad (13)$$

where $h^c$ and $h^{c'}$ are the two texture histograms, $N$ is the number of histogram bins, and $\rho_p^c(h^c,h^{c'})$ is the similarity coefficient reporting the common part of the two histograms corresponding to the pixel $p$. The texture similarity map is generated according to

$$LBP_t(x,y)=\begin{cases}\max_{c\in\{R,G,B\}}(\rho_p^c) & \text{if }M_t^{sh}(x,y)=1\\ 0 & \text{otherwise},\end{cases} \quad (14)$$

where $LBP_t(x,y)$ is the texture similarity of the pixel at position $(x,y)$. One can see that the greater the similarity, the larger $LBP_t(x,y)$ is.
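The sketch below illustrates Eqs. (12) and (13): thresholded LBP codes in the style of [36] for Q = 8, R = 1, and histogram intersection between two regions. Applying it per pixel over a sliding window and taking the per-channel maximum yields Eq. (14); the wrap-around of np.roll at image borders is a simplification.

    # Sketch of Eqs. (12)-(13): thresholded LBP codes and histogram intersection.
    import numpy as np

    OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]       # Q = 8, R = 1

    def lbp_codes(gray, t_lbp=3):
        g = gray.astype(np.int32)
        code = np.zeros_like(g)
        for bit, (dy, dx) in enumerate(OFFSETS):
            neigh = np.roll(np.roll(g, dy, axis=0), dx, axis=1)
            code |= ((neigh - g - t_lbp) >= 0).astype(np.int32) << bit
        return code

    def hist_intersection(codes_a, codes_b, n_bins=256):   # 2^Q bins
        ha = np.bincount(codes_a.ravel(), minlength=n_bins) / codes_a.size
        hb = np.bincount(codes_b.ravel(), minlength=n_bins) / codes_b.size
        return np.minimum(ha, hb).sum()                    # rho of Eq. (13)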

2.3. Feature fusion

After extracting six features from one foreground image, six feature maps are obtained. Instead of determining whether a pixel belongs to shadow in terms of one or more features sequentially, we identify the pixel in parallel by a linear combination of multiple features. In order to fuse the feature maps consistently, each map is normalized and the fused map $Map_t$ is established by

$$Map_t=\frac{1}{6}\big(N(1-NCC_t)+N(Ch_t)+N(CCC_t)+N(\Delta E_t)+N(Sal_t)+N(1-LBP_t)\big), \quad (15)$$

where $N(\cdot)$ is the normalization operation. Generally, a foreground image may consist of moving objects and their cast shadows. Therefore, the classification decision is made following this principle:

$$Ob_t(x,y)=\begin{cases}1 & \text{if }Map_t(x,y)>T\text{ or }M_t^{ob}(x,y)=1\\ 0 & \text{otherwise},\end{cases}\qquad Sh_t(x,y)=M_t(x,y)\wedge\neg Ob_t(x,y), \quad (16)$$

where $T$ is a constant threshold that is determined manually. $Ob_t(x,y)$ and $Sh_t(x,y)$ are the binary masks of the moving object image and the moving cast shadow image, respectively. $Ob_t(x,y)=1$ signifies that the pixel is labeled as moving object, whereas $Ob_t(x,y)=0$ implies that it is labeled as shadow, as does $Sh_t(x,y)=1$. Thus each pixel is discriminated using all of these features simultaneously through the fused map.
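A sketch of Eqs. (15)-(16) follows. Min-max normalization over the candidate-shadow pixels is an assumption, since the paper does not specify the form of N(·); the similarity maps NCC and LBP enter as 1 − map so that larger fused values indicate objects.

    # Sketch of Eqs. (15)-(16): normalize, fuse, and classify.
    import numpy as np

    def norm01(m, mask):
        v = m[mask]
        if v.size == 0 or v.max() == v.min():
            return np.zeros_like(m)
        return np.where(mask, (m - v.min()) / (v.max() - v.min()), 0.0)

    def fuse_and_classify(ncc, ch, ccc, de, sal, lbp, M, M_ob, M_sh, T=0.39):
        fmap = (norm01(1.0 - ncc, M_sh) + norm01(ch, M_sh) + norm01(ccc, M_sh) +
                norm01(de, M_sh) + norm01(sal, M_sh) + norm01(1.0 - lbp, M_sh)) / 6.0
        Ob = ((fmap > T) & M_sh) | M_ob     # Eq. (16): moving-object mask
        Sh = (M > 0) & ~Ob                  # moving cast shadow mask
        return Ob, Sh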

Fig. 3 shows the shadow detection results of two frames derived from two different scenes. The first column shows the original frames, the second column the binary masks $Ob_t$ of moving objects, and the last column the binary masks $Sh_t$ of moving cast shadows. From this figure, we can find that some pixels in the true moving objects and true moving cast shadows are misclassified. These misclassified pixels look like noise, and it is desirable to correct as many as possible for accurate moving object detection.

2.4. Spatial adjustment

As shown in Fig. 3, there are still some errors in both true shadow and object regions. As a result, spatial adjustment is applied to correct these errors and improve the accuracy of shadow detection in terms of geometric properties. After shadow detection, the detected shadow regions consist of many correctly classified regions and some small, incorrectly labeled object blobs; the same holds for the detected object regions. To wipe off these small misclassified blobs, a connected component labeling algorithm is adopted to label the different regions, and then a size filter is utilized to discard small misclassified blobs, first from the detected shadow regions and subsequently from the detected object regions. In this way, isolated errors are corrected.
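A sketch of this size filter using OpenCV's connected-component analysis is shown below; min_area is an illustrative threshold.

    # Sketch: connected-component labeling followed by a size filter.
    import cv2
    import numpy as np

    def size_filter(mask, min_area=50):
        n, labels, stats, _ = cv2.connectedComponentsWithStats(
            mask.astype(np.uint8), connectivity=8)
        keep = np.zeros(mask.shape, dtype=bool)
        for i in range(1, n):                        # label 0 is the background
            if stats[i, cv2.CC_STAT_AREA] >= min_area:
                keep |= labels == i
        return keep

    # Applied first to the shadow mask, then to the object mask:
    # Sh_refined = size_filter(Sh); Ob_refined = size_filter(Ob)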

After spatial adjustment, the refined shadow detection results corresponding to Fig. 3 are given in Fig. 4. As seen in Fig. 4(a), there are almost no erroneous pixels after spatial adjustment, which indicates that spatial adjustment plays a very important role in correcting error pixels. The results of object detection are given in Fig. 4(c), from which we can see that the shape distortion of moving objects caused by moving cast shadows has been handled effectively.

3. Experiments

In order to evaluate the performance of our MFF method effectively and systematically, we present extensive results on several well-known benchmarks for which ground truth data is available. The chosen benchmarks consist of indoor and outdoor scenes. Specifically, Intelligent Room,1 Hallway2 and CAVIAR3 are typical indoor environments, while Highway is an outdoor highway scene. In addition, we compare our method with several state-of-the-art methods to demonstrate its superiority both qualitatively and quantitatively, including the deterministic nonmodel-based (DNM [2]), invariant color features (ICF [3]), statistical nonparametric (SNP1 [5], SNP2 [6]), adaptive shadow estimator (ASE [8]), and combined color models (CCM [9]) methods.

3.1. Quantitative evaluation

For the purpose of obtaining a systematic and objective evaluation of the proposed MFF method, the performance is estimated quantitatively. Two standard metrics introduced by Prati et al. [37] are employed: the shadow detection rate η and the shadow discrimination rate ζ. Clearly, neither metric alone is sufficient to evaluate the performance, so the average of the shadow detection rate and shadow discrimination rate is also calculated. These metrics are defined as follows:

$$\eta=\frac{TP_S}{TP_S+FN_S}\times 100\%,\qquad \zeta=\frac{TP_O}{TP_O+FN_O}\times 100\%,\qquad mean=\frac{\eta+\zeta}{2}, \quad (17)$$

where the subscripts S and O represent shadow and object, respectively; $TP_S$ and $TP_O$ are the numbers of shadow and object pixels correctly detected, and $FN_S$ and $FN_O$ are the numbers of shadow and object pixels misclassified.

Fig. 6. Quantitative comparison results on the Highway and Intelligent Room sequences. (a) Shadow detection rate, (b) shadow discrimination rate and (c) average detection rate.
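A sketch of Eq. (17), counting pixels against ground-truth shadow and object masks; the max(..., 1) guards against empty masks are an added safeguard.

    # Sketch of Eq. (17): shadow detection and discrimination rates.
    import numpy as np

    def shadow_metrics(det_sh, det_ob, gt_sh, gt_ob):
        tp_s = np.sum(det_sh & gt_sh)    # shadow pixels correctly detected
        fn_s = np.sum(~det_sh & gt_sh)   # shadow pixels missed
        tp_o = np.sum(det_ob & gt_ob)    # object pixels correctly detected
        fn_o = np.sum(~det_ob & gt_ob)   # object pixels missed
        eta = 100.0 * tp_s / max(tp_s + fn_s, 1)
        zeta = 100.0 * tp_o / max(tp_o + fn_o, 1)
        return eta, zeta, (eta + zeta) / 2.0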

Table 2. Comparison results for F-measure (%). The best value in each column was highlighted in bold in the original.

Method   Hallway   Highway   CAVIAR   Intelligent Room
SNP1     51.98     65.36     83.14    79.29
DNM      83.22     69.63     85.99    80.48
ICF      89.12     70.63     91.59    87.42
SNP2     94.32     69.09     95.50    91.81
CCM      78.13     49.72     92.99    87.41
ASE      93.64     89.85     94.90    92.97
MFF      95.12     92.97     96.00    92.20

Fig. 7. The chart of comparison results.

In order to examine how the performance of the MFF method is affected by the parameter $T\in(0,1]$, the average detection rate is used to characterize the variation. Fig. 5 illustrates the average detection rates for the four benchmarks. It is worth noting that the average detection rates peak at different values of $T$. For Intelligent Room, the rate trends upward when $T<0.39$. As $T$ increases further, the average detection rate decreases rapidly on all benchmarks. Therefore, our method exhibits its best performance when $T$ is set to 0.36, 0.31, 0.43 and 0.39 for Highway, Hallway, CAVIAR and Intelligent Room, respectively.

We compute the quantitative metrics on the selected benchmarks in extensive experiments and list the comparison results in Table 1. As listed in Table 1, MFF has the highest shadow detection rate on CAVIAR and the highest shadow discrimination rate on Highway. However, neither the shadow detection rate nor the shadow discrimination rate alone can establish the superiority of a method. From the viewpoint of the average detection rate, our MFF method is superior to the existing methods on these benchmarks.

On the whole, the proposed method possesses the best detection performance, followed by ASE; the other methods provide relatively worse performance. Taking Highway as an example, our method achieves an average detection rate of 90.20%, followed by ASE at 84.68%. In particular, SNP1, which takes only one kind of color information into account, displays the worst detection rates, and CCM also gives poor detection rates even though it considers two color models. DNM has average detection rates lower than MFF on these benchmarks by about 10.00%, 17.62%, 9.62% and 11.79%; the reason is that DNM uses only color information and no other features. The excellent performance of MFF is attributed to the linear combination of multiple features. In particular, these features coordinate with each other to judge the class of every moving pixel in each foreground image simultaneously.

To compare the stability of the various methods, we calculate the shadow detection rate η and the shadow discrimination rate ζ of each frame on Highway and Intelligent Room, respectively. The visualized comparisons with several well-known methods are shown in Fig. 6(a) and (b). Note that, for clarity, we plot one point every 10 frames for Highway and every 4 frames for Intelligent Room. MFF has better shadow detection and discrimination rates in most frames. For instance, the shadow discrimination rates of our method on Highway and Intelligent Room are higher than the others in most frames, while the shadow detection rates are relatively lower. To give a fair evaluation across the two metrics, their average detection rates are calculated and illustrated in Fig. 6(c); the average detection rates of MFF always occupy the top points. Besides, the variance is analyzed so as to evaluate the stability quantitatively. The variances of the average detection rates of SNP1, DNM, ICF, SNP2, CCM, ASE and MFF on Intelligent Room are 0.85, 0.08, 0.46, 0.14, 0.68, 0.35 and 0.25, respectively. Obviously, DNM exhibits the best stability, followed by SNP2 and MFF. Meanwhile, the variances of the average detection rates on the Highway sequence are 0.41, 0.19, 0.34, 0.91, 0.62, 0.70 and 0.31, respectively, where a similar trend appears. In summary, our method retains good stability in the two different scenes.

Specifically, the goal of detecting moving cast shadows is to improve the accuracy of moving object detection. Based on this consideration, the F-measure [24] is adopted to evaluate the object detection performance comprehensively. The F-measure is defined as

$$F=\frac{2\times recall\times precision}{recall+precision}, \quad (18)$$

where recall measures the number of correctly classified moving object pixels as a percentage of the number of moving object pixels in the ground truth, and precision is the number of correctly classified moving object pixels as a percentage of the total number of pixels detected as moving objects.
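A sketch of Eq. (18) computed from the detected and ground-truth object masks; the guards against empty masks are an added safeguard.

    # Sketch of Eq. (18): F-measure of moving object detection.
    import numpy as np

    def f_measure(det_ob, gt_ob):
        tp = np.sum(det_ob & gt_ob)
        recall = tp / max(np.sum(gt_ob), 1)       # fraction of GT objects found
        precision = tp / max(np.sum(det_ob), 1)   # fraction of detections correct
        return 2 * recall * precision / max(recall + precision, 1e-12)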

The F-measure is computed on these benchmarks for the six well-known methods and MFF. The comparison results are summarized in Table 2 and, for clearer comparison, charted in Fig. 7. Overall, our method has the best F-measure of all methods, except for ASE on Intelligent Room. It is worth noting that the F-measure of our method there is merely about 0.77% lower than that of ASE, even though MFF has a higher average detection rate. In particular, for the Highway sequence, the proposed MFF has a higher F-measure than SNP1, DNM, ICF, SNP2, CCM and ASE by about 27.61%, 23.34%, 22.34%, 23.88%, 43.25% and 3.12%, respectively.

3.2. Qualitative evaluation

To demonstrate the effectiveness of the proposed method subjectively, visualized results are obtained with six state-of-the-art shadow detection methods and MFF for clear comparison.

The visual comparison results on the different benchmarks are illustrated in Fig. 8. In this figure, the first row displays the original frames, the second row the ground truths, and the remaining rows the detection results obtained by the various methods and MFF. As shown in Fig. 8, all methods can detect moving cast shadows correctly to a certain degree, but some methods struggle on certain scenes. For instance, SNP1 has the worst performance, especially on Highway and Intelligent Room. The detection results of DNM, ICF, SNP2 and CCM on Highway are relatively worse than on the indoor scenes. ASE has better detection performance than the aforementioned methods but is inferior to the proposed MFF.


Fig. 8. The visualized comparison results. (a) Original frames, (b) ground truths, (c) SNP1, (d) DNM, (e) ICF, (f) SNP2, (g) CCM, (h) ASE, (i) MFF.


This can be seen on Highway and Intelligent Room by comparing Fig. 8(h) with Fig. 8(i). In particular, moving cast shadows can be almost completely distinguished by MFF. The pedestrians and their weak shadows, which are projected on the ground surface, the cabinet and the wall in the indoor scenes, are detected accurately, as shown in Fig. 8(i). Moreover, the effectiveness in the outdoor scene is improved by the fusion of various features. The visual comparison results demonstrate that the proposed method has a good ability to discriminate moving cast shadows from their moving objects.

The proposed method is implemented on a computer with an Intel Core i3-2100 CPU at 3.2 GHz and 2 GB of physical memory running Windows. In the proposed MFF, the processing time depends strongly on the number of foreground pixels obtained by the foreground segmentation. On average, the processing time per frame for Hallway, Highway, CAVIAR and Intelligent Room is 0.76 s, 0.80 s, 1.10 s and 0.78 s, respectively. Although multiple features are extracted in MFF, this speed is acceptable; however, it is not fast enough yet, and an optimized implementation of MFF in a compiled language is one of the future research issues.

As the analysis above shows, MFF exhibits good performance compared with the several existing methods in terms of both qualitative and quantitative indicators.

4. Conclusions

In this paper, a robust moving cast shadow detection method has been presented on the basis of multiple features fusion, which improves the accuracy of moving object detection. In our study, three kinds of features, namely intensity constraints, color and texture, are exploited through various color spaces, multi-scale images and measurements. In particular, a feature map is derived by feature fusion, which lets these features determine in parallel whether a pixel belongs to shadow. The effectiveness of the proposed method is validated by comparisons with several well-known methods. Experiments also demonstrate that the proposed method not only has higher and relatively stable detection accuracy but is also robust to slight illumination changes.

Acknowledgments

This work is supported by the Young Scientific Research Foundation of Jilin Province Science and Technology Development Project (No. 201201070, No. 201201063), the Jilin Provincial Natural Science Foundation (No. 201115003), the Fund of Jilin Provincial Science & Technology Department (No. 20111804, No. 20110364), the Science Foundation for Post-doctor of Jilin Province (No. 2011274), and the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT).

References

[1] Sanin A, Sanderson C, Lovell BC. Shadow detection: a survey and comparative evaluation of recent methods. Pattern Recognition 2012;45(4):1684–95.

[2] Cucchiara R, Grana C, Piccardi M, Prati A. Detecting moving objects, ghosts and shadows in video streams. IEEE Transactions on Pattern Analysis and Machine Intelligence 2003;25(10):1337–42.

[3] Salvador E, Cavallaro A, Ebrahimi T. Cast shadow segmentation using invariant color features. Computer Vision and Image Understanding 2004;95(2):238–59.

[4] Grest D, Frahm J-M, Koch R. A color similarity measure for robust shadow removal in real time. In: Vision, modeling and visualization; 2003. p. 253–60.

[5] Horprasert T, Harwood D, Davis L. A statistical approach for real-time robust background subtraction and shadow detection. In: IEEE ICCV'99 frame-rate workshop; 1999.

[6] Song KT, Tai JC. Image-based traffic monitoring with shadow suppression. Proceedings of the IEEE 2007;95:413–26.

[7] Amato A, Mozerov MG, Bagdanov AD, González J. Accurate moving cast shadow suppression based on local color constancy detection. IEEE Transactions on Image Processing 2011;20(10):2954–66.

[8] Choi JM, Yoo YJ, Choi JY. Adaptive shadow estimator for removing shadow of moving object. Computer Vision and Image Understanding 2010;114(9):1017–29.

[9] Sun B, Li S. Moving cast shadow detection of vehicle using combined color models. In: Chinese conference on pattern recognition; 2010. p. 1–5.

[10] Nadimi S, Bhanu B. Physical models for moving shadow and object detection in video. IEEE Transactions on Pattern Analysis and Machine Intelligence 2004;26(8):1079–87.

[11] Martel-Brisson N, Zaccarin A. Learning and removing cast shadows through a multidistribution approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 2007;29(7):1133–46.

[12] Joshi AJ, Papanikolopoulos NP. Learning to detect moving shadows in dynamic environments. IEEE Transactions on Pattern Analysis and Machine Intelligence 2008;30(11):2055–63.

[13] Hsieh JW, Hu WF, Chang CJ, Chen YS. Shadow elimination for effective moving object detection by Gaussian shadow modeling. Image and Vision Computing 2003;21(6):505–16.

[14] Fang LZ, Qiong WY, Sheng YZ. A method to segment moving vehicle cast shadow based on wavelet transform. Pattern Recognition Letters 2008;29(16):2182–8.

[15] Leone A, Distante C, Buccolieri F. Shadow detection for moving objects based on texture analysis. Pattern Recognition 2007;40(4):1222–33.

[16] Zhang W, Fang XZ, Yang XK, Wu QMJ. Moving cast shadows detection using ratio edge. IEEE Transactions on Multimedia 2007;9(6):1202–14.

[17] Xiao M, Han CZ, Zhang L. Moving shadow detection and removal for traffic sequences. International Journal of Automation and Computing 2007;4(1):38–46.

[18] Sanin A, Sanderson C, Lovell B. Improved shadow removal for robust person tracking in surveillance scenarios. In: International conference on pattern recognition; 2010. p. 141–4.

[19] Bullkich E, Ilan I, Moshe Y, Hel-Or Y, Hel-Or H. Moving shadow detection by nonlinear tone-mapping. In: Proceedings of the 19th international conference on systems, signals and image processing (IWSSIP 2012), Vienna; April 2012.

[20] Meher SK, Murty MN. Efficient method of moving shadow detection and vehicle classification. International Journal of Electronics and Communications (AEU) 2013;67(8):665–70.

[21] Qin R, Liao S, Lei Z, Li S. Moving cast shadow removal based on local descriptors. In: International conference on pattern recognition; 2010. p. 1377–80.

[22] Huerta I, Holte M, Moeslund T, Gonzalez J. Detection and removal of chromatic moving shadows in surveillance scenarios. In: IEEE international conference on computer vision; 2009. p. 1499–506.

[23] Lin CT, Yang CT, Shou YW, Shen TK. An efficient and robust moving shadow removal algorithm and its applications in ITS. EURASIP Journal on Advances in Signal Processing 2010;4(2):1–20.

[24] Hamad AM, Tsumura N. Background updating and shadow detection based on spatial, color, and texture information of detected objects. Optical Review 2012;19(3):182–97.

[25] Boroujeni HS, Charkari NM. Robust moving shadow detection with hierarchical mixture of MLP experts. Signal, Image and Video Processing 2012.

[26] McFeely R, Glavin M, Jones E. Shadow identification for digital imagery using colour and texture cues. IET Image Processing 2012;6(2):148–59.

[27] Dai JY, Qi M, Yu XX, Kong J. Integrated moving cast shadows detection method for surveillance videos. Optical Engineering 2012;51(11):117005.

[28] Stauffer C, Grimson WEL. Adaptive background mixture models for real-time tracking. In: CVPR'99; June 1999. http://dx.doi.org/10.1109/CVPR.1999.784637.

[29] Jacques JCS, Jung CR, Musse SR. Background subtraction and shadow detection in grayscale video sequences. In: SIBGRAPI; 2005.

[30] Smith AR. Color gamut transform pairs. In: Proceedings of SIGGRAPH; 1978. p. 12–9.

[31] Herodotou N, Plataniotis KN, Venetsanopoulos AN. A color segmentation scheme for object-based video coding. In: Proceedings of the IEEE symposium on advances in digital filtering and signal processing; 1998. p. 25–9.

[32] Gevers T, Smeulders AWM. Color-based object recognition. Pattern Recognition 1999;32:453–64.

[33] Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 1998;20(11):1254–9.

[34] Greenspan H, Belongie S, Goodman R, Perona P, Rakshit S, Anderson CH. Overcomplete steerable pyramid filters and rotation invariance. In: Proceedings of IEEE computer vision and pattern recognition; June 1994. p. 222–8.

[35] Engel S, Zhang X, Wandell B. Colour tuning in human visual cortex measured with functional magnetic resonance imaging. Nature 1997;388(6637):68–71.

[36] Heikkila M, Pietikainen M. A texture-based method for modeling the background and detecting moving objects. IEEE Transactions on Pattern Analysis and Machine Intelligence 2006;28(4):657–62.

[37] Prati A, Mikic I, Trivedi M, Cucchiara R. Detecting moving shadows: algorithms and evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2003;25(7):918–23.