
Shadow Removal from Video


1. INTRODUCTION

1.1 Introduction

In recent years, research on intelligent video surveillance has increased noticeably. Foreground object detection is one of the fundamental and critical techniques in this field. Among conventional methods, background subtraction and temporal difference have been widely used for foreground extraction with stationary cameras. Continuing improvements in background modeling techniques have led to many new and fascinating applications such as event detection, object behavior analysis, suspicious object detection, and traffic monitoring. However, factors like dynamic backgrounds and moving shadows can degrade the results of foreground detection and complicate the problem. Dynamic background elements, such as escalators and swaying trees, may be detected and treated as foreground regions. Moving shadows, which occur when light is blocked by moving objects, are usually misclassified as foreground regions. This project focuses on the study of moving shadows and aims at developing an efficient and robust moving shadow removal algorithm, along with related applications.

Real-time extraction of moving objects from video sequences is an important topic for various applications of computer vision, including counting cars in traffic, observing traffic patterns, automatic detection of trespassers, video data compression, and analysis of non-rigid motion. When extracting moving objects, shadows cast by the objects are often regarded as moving objects themselves; this is one of the major problems to be solved. Methods for shadow detection and object extraction using chromatic information have been proposed. When a shadow detection process is added to a moving object extraction method, it becomes difficult to handle the whole process in real time. In contrast, object extraction methods based on chromatic information, to which the RGB color space is easily converted, can handle the whole process in real time. However, simple background subtraction using chromatic information is limited in its ability to handle intensity changes.


Objectives

The project has two objectives. The first is to eliminate shadows from the frames of videos, which will help machine vision systems count or detect objects separately from their shadows, so that shadows are not counted as moving objects.

The second objective is to remove shadows from live video, so that shadows are completely invisible to the user and no details are hidden in the background. This is important when a user has to monitor surveillance video for abnormal movement or trespassing.

1.2 Environment to be used

For this project we will be using the MathWorks MATLAB environment. It is one of the most suitable software packages for carrying out signal processing operations, especially on images, audio, and video, and for morphing. MATLAB® is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.

The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects. Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the state of the art in software for matrix computation. MATLAB has evolved over a period of years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis.

The platform for the project has the following specifications:

Intel Celeron laptop
Frequency: 1.5 GHz
760 MB RAM
Windows XP Professional
MATLAB Version 7.0


2. LITERATURE SURVEY

2.1 Literature Survey

In their study of the influence of moving shadows, Zhang et al. [1] classified shadow removal techniques into four categories: color models, statistical models, textural models, and geometric models. Color models use the difference in color between shaded and non-shaded pixels.

Cucchiara et al. [2] removed moving shadows using the observation, in HSV color space, that the hue component of shaded pixels varies within a smaller range while the saturation component decreases more noticeably. They used this to detect moving objects, shadows, and ghosts in video sequences. Their shadow detection was based on the observation that shadows change the lightness of an area significantly without greatly modifying its color information.

Some researchers proposed shadow detection methods based on the RGB and normalized-RGB color spaces. Yang et al. [3] observed that the ratio of intensities between a shaded pixel and a neighboring shaded pixel in the current image is close to the corresponding ratio in the background image. They also made use of the slight change of intensities on the normalized R and G channels between the current and background images.

Cavallaro et al. [4] found that when shadows occur, the color components do not change their order and the photometric invariant features vary only slightly. Besides color models, statistical models use probabilistic functions to determine whether or not a pixel belongs to a shadow.

Zhang et al. [1] introduced an illumination-invariant feature, analyzed and modeled shadows as a Chi-square distribution, and classified each moving pixel as shadow or foreground object by performing a significance test.

Song and Tai [5] applied a Gaussian model to represent the constant RGB color ratios, and determined whether a moving pixel belonged to the shadow or the foreground object using ±1.5 standard deviations as a threshold.

Martel-Brisson and Zaccarin [6] proposed the Gaussian Mixture Shadow Model (GMSM) for shadow detection, integrated into a GMM-based background detection algorithm. They tested whether the mean of a distribution could describe a shaded region,


and if so, they would select this distribution to update the corresponding Gaussian

mixture shadow model.

The texture model assumed that the texture of the foreground object would be totally different from that of the background, and that texture would be distributed uniformly inside the shaded region. Joshi and Papanikolopoulos [7, 8] proposed an algorithm that learns and detects shadows using a support vector machine (SVM). Support vector machines (SVMs) are a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis. The original SVM algorithm was invented by Vladimir Vapnik, and the current standard incarnation (soft margin) was proposed by Corinna Cortes and Vladimir Vapnik. The standard SVM takes a set of input data and predicts, for each given input, which of two possible classes the input belongs to, which makes the SVM a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other. Intuitively, an SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on. More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier. Although the original problem may be stated in a finite-dimensional space, it often happens that the sets to be discriminated are not linearly separable in that space. For this reason it was proposed that the original finite-dimensional space be mapped into a much higher-dimensional space, presumably making the separation easier there. SVM schemes use a mapping into a larger space so that dot products may be computed easily in terms of the variables in the original space, keeping the computational load reasonable. The dot products in the


larger space are defined in terms of a kernel function K(x, y), which can be selected to suit the problem. The hyperplanes in the larger space are defined as the set of points whose dot product with a vector in that space is constant. The vectors defining the hyperplanes can be chosen to be linear combinations, with parameters αi, of images of feature vectors occurring in the data base. In this way the sum of kernels above can be used to measure the relative nearness of each test point to the data points originating in one or the other of the sets to be discriminated. Note that the set of points x mapped into any hyperplane can be quite convoluted, allowing much more complex discrimination between sets which are far from convex in the original space. Joshi and Papanikolopoulos defined four

features of images: intensity ratio, color distortion, edge magnitude distortion, and edge gradient distortion. They introduced a co-training architecture that lets two SVM classifiers help each other in the training process, needing only a small set of labeled shadow samples before the SVM classifiers are trained for different video sequences. Leone et al. presented a shadow detection method using Gabor features.

Mohammed Ibrahim and Anupama [9] proposed a method using division image analysis and projection histogram analysis. An image division operation is performed on the current and reference frames to highlight the homogeneous property of shadows. They then eliminated the pixels left on the boundaries of shadows using both column- and row-projection histogram analyses. Geometric models attempt to remove shadowed regions, or the shadowing effect, by observing the geometric information of objects.

Hsieh et al. [10] used the histograms of vehicles and the calculated lane center to detect lane markings, and developed a horizontal and vertical line-based method to remove shadows using the characteristics of those lane markings. This method may become ineffective when there are no lane markings.

2.2 Related Work

Several techniques for background subtraction and shadow detection have been

proposed in the past years. Background detection techniques may use grayscale or color

images, while most shadow detection methods make use of chromaticity information.


Next, some of these techniques are described. The car tracking system of Koller et al. used an adaptive background model based on monochromatic images filtered with Gaussian and Gaussian-derivative (vertical and horizontal) kernels. McKenna et al. proposed a background model that combines pixel RGB and chromaticity values with local image gradients. In their W4 system, Haritaoglu and collaborators used grayscale images to build a background model, representing each pixel by three values: its minimum intensity value, its maximum intensity value, and the maximum intensity difference between consecutive frames observed during the training period.

Elgammal et al. used a nonparametric background

model based on kernel-based estimators that can be applied to both color and grayscale images. KaewTrakulPong and Bowden used color images for background representation; in their method, each pixel in the scene is modelled by a mixture of Gaussian distributions (different Gaussians are assumed to represent different colors). Cucchiara's group used temporal median filtering in the RGB color space to produce a background model. Shadow detection algorithms have also been widely explored by several authors, mostly based on invariant color features that are not significantly affected by illumination conditions. McKenna et al. used pixel and edge information of each channel of the normalized RGB color space (or rgb) to detect shadowed pixels. Elgammal et al. also used the normalized rgb color space, but included a lightness measure to detect cast shadows. Prati's and Cucchiara's groups used the HSV color space, classifying as shadows those pixels having approximately the same hue and saturation values as the background, but lower luminosity. KaewTrakulPong and Bowden used a chromatic distortion measure and a brightness threshold in the RGB space to determine foreground pixels affected by shadows. Salvador et al. adopted the c1c2c3 photometric invariant color model and explored geometric features of shadows. A few authors have studied shadow detection in monochromatic video sequences, with applications such as indoor video surveillance and conferencing in mind. Basically, they detect the penumbra of the shadow, assuming that edge intensity within the penumbra is


much smaller than the edge intensity of actual moving objects. Clearly, such a hypothesis does not hold for video sequences containing low-contrast foreground objects (especially in outdoor applications). A review of the literature indicates that several background models are available for color and/or grayscale video sequences, and there are several shadow detection algorithms to remove the undesired segmentation of cast shadows. However, in accordance with other authors, we chose to use a background model based on median filtering, because it is effective and requires less computational cost than Gaussian or other complex statistical models.


3. PERFORMANCE ANALYSIS

3.1 Block Diagram of the process

For the purpose of removing object shadows from video, we will broadly follow the block diagram below. The process shown here is mainly for a fixed reference background image. This is the most basic algorithm: it extracts the foreground by calculating the difference in gray-level values between the foreground and the background.

Fig. 3.1 Block diagram of the shadow removal process (Frame Acquisition → Background Image → Normalisation → Removal of Shadow → Morphological Operation → Reconstruction)

As stated above, the project will be undertaken in stages. First, experiments will be carried out on foreground extraction for the correct detection of moving objects in a video. Then the algorithm will be applied for the complete removal of shadows from a video for clarity of view. But first, it is important to understand the basic steps of the whole process. The block diagram above is explained briefly below.

Reading frames from a video sequence

The mmreader function constructs a multimedia reader object that can read video data from a multimedia file. This function is very useful because it can read video files in formats that cannot be read by simpler reader functions. The object created by mmreader is used to read the video file so that frames can be extracted from it and processing can be done on each frame. The function aviread can also be used to extract a frame from a video, where videoFile is the path of the movie file and Frameno is the number of the frame to be extracted; the extracted frame is stored in a movie structure.
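As a minimal sketch of both reading styles (the file name traffic.avi and the frame number are hypothetical), this might look like:

    videoFile = 'traffic.avi';            % hypothetical path to the input video
    Frameno   = 10;                       % number of the frame to extract
    obj   = mmreader(videoFile);          % construct the multimedia reader object
    frame = read(obj, Frameno);           % returns the frame as an HxWx3 uint8 array
    Movie  = aviread(videoFile, Frameno); % alternative for AVI files: a movie structure
    frame2 = Movie.cdata;                 % image data of the extracted frame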



Background Reference Image

A background reference image is required for the process of shadow removal. Every new frame is compared against this reference, which yields the outlines of objects and their shadows. The first frame of a video can be used as the background reference image, although the reference need not necessarily be the first frame of the sequence; it may be another frame. The background reference image is stored in a matrix variable that is used in the further processing of each frame.

Normalizing the RGB factor in each frame

First extract the background reference image, which is generally the first frame of the sequence. For normalization we compute, for each pixel,

N = R + G + B
R = R/N, G = G/N, B = B/N

where R, G, and B are the red, green, and blue values of the pixel and N is their sum.

In this step we preprocess the background reference image and each subsequent frame of the given video. This is done to remove intensity changes in any frame: converting an RGB image into normalized RGB removes the effect of intensity variations.
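A minimal MATLAB sketch of this normalization (assuming frame is an HxWx3 uint8 image read as above) could be:

    rgb = double(frame);          % convert to double to avoid integer division
    N   = sum(rgb, 3) + eps;      % N = R + G + B (eps guards against division by zero)
    r   = rgb(:,:,1) ./ N;        % normalized red channel
    g   = rgb(:,:,2) ./ N;        % normalized green channel
    b   = rgb(:,:,3) ./ N;        % normalized blue channel
    normFrame = cat(3, r, g, b);  % normalized RGB image, channel values in [0, 1]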

The light reflected from an object depends not only on object colours but also on

lighting geometry and illuminant colour. As a consequence the raw colour recorded by a

camera is not a reliable cue for object based tasks such as recognition and tracking. One

solution to this problem is to find functions of image colours that cancel out dependencies

due to illumination. While many invariant functions cancel out either dependency due to

geometry or illuminant colour, only the comprehensive normalisation has been shown

(theoretically and experimentally) to cancel both.


Removal of Shadow

For each frame we perform the RGB normalization described above and then continue processing the image. Each normalized frame of the video is subtracted from the normalized background frame; this is done for every frame in the sequence. A threshold is then applied to each difference frame, giving a shadow-free binary frame that shows only the moving object.
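As a minimal sketch (assuming normFrame and normBg are the normalized current frame and background from the previous step; the threshold value here is only an assumption to be tuned experimentally):

    diffImg = sum(abs(normFrame - normBg), 3); % pixel-wise difference of normalized channels
    T  = 0.08;                                 % threshold; assumed value, tune per sequence
    fg = diffImg > T;                          % binary mask of moving objects, largely shadow-free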

Morphological operation

Morphology is a broad set of image processing operations that process images

based on shapes. Morphological operations apply a structuring element to an input image,

creating an output image of the same size. In a morphological operation, the value of

each pixel in the output image is based on a comparison of the corresponding pixel in the

input image with its neighbors. By choosing the size and shape of the neighborhood, you

can construct a morphological operation that is sensitive to specific shapes in the input

image.

We apply dilation to obtain an accurate boundary of the moving object together with its shadow. Dilation adds pixels to the boundary of regions in an image; later in the process we apply this dilated image to the original image to obtain the exact boundary of the moving object.

The process of dilation is as follows. The dilation operator takes two pieces of data as inputs. The first is the image to be dilated. The second is a set of coordinate points known as a structuring element. It is this structuring element that determines the precise effect of the dilation on the input image. The mathematical definition of grayscale dilation is identical except for the way in which the set of coordinates associated with the input image is derived; in addition, these coordinates are 3-D rather than 2-D. The procedure for implementing dilation on a binary image is as follows.

The dilation is applied to the binary image in a single pass. During the pass, if the pixel at hand is equal to binary 1, the structuring element is applied to the image with that particular pixel as the origin. The sample project supposes that, in the


thresholded image, the background is black and the foreground is white. If the sample

image is not so, then you have to adjust accordingly.

Dilation can also be applied to gray-level images in a single pass. While passing through the image, the structuring element is applied to each pixel, with the origin of the structuring element placed on that particular pixel. The corresponding pixel of the output image then contains the maximum of the surrounding pixels, where only those pixels covered by the structuring element are compared with each other. Consider the example of the following image:

Fig. 3.2 A binary image    Fig. 3.3 A 3×3 structuring element (all ones)

The following image shows the effect of dilating the above binary image by the above structuring element:

Fig. 3.4 Effect of dilation

In these images, the boxes represent pixels; a white box indicates that the binary pixel contains 1, while a black box indicates that the corresponding pixel contains 0. It can be seen that the hole is removed from the image after applying the dilation. Dilations can be made directional by using less symmetrical structuring elements; for example, a structuring element that is 10 pixels wide and 1 pixel high will dilate in the horizontal direction only. Similarly, a 3×3 square structuring element with the origin in the middle of the top row, rather than the center, will dilate the bottom of a region more strongly than the top.
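A minimal sketch with the Image Processing Toolbox (assuming fg is the binary foreground mask from the thresholding step) shows both the symmetric and the directional cases described above:

    se    = strel('square', 3);           % 3x3 structuring element of ones (as in Fig. 3.3)
    fgDil = imdilate(fg, se);             % dilate the mask: region boundaries grow outward
    seH     = strel('rectangle', [1 10]); % 1-pixel-high, 10-pixel-wide element
    fgHoriz = imdilate(fg, seH);          % dilates in the horizontal direction only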

Reconstruction of original image and its boundary



After applying all five processes above, the image is reconstructed so that we obtain the original video with only the moving object detected and the shadows ignored. To obtain the reconstructed boundary of the object without the shadow region, the mask of the object plus its shadow is point-wise multiplied with the dilated object region. Having obtained the boundary of the moving object from this point-wise multiplication, we know the extent of the moving object and can apply this boundary to the original frames of the video to outline the moving object only.
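A minimal sketch of this reconstruction (objShadow, fgDil, and frame are hypothetical names for the object-plus-shadow mask, the dilated object region, and the original frame):

    objMask  = objShadow & fgDil;              % point-wise product of the two binary masks
    boundary = bwperim(objMask);               % boundary pixels of the reconstructed object
    outlined = frame;                          % copy of the original frame
    outlined(repmat(boundary, [1 1 3])) = 255; % overlay the object boundary in white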

3.2 Flow chart for foreground extraction based on Gaussian Mixture Model

In this section we introduce the overall architecture of our moving shadow removal algorithm, which consists of five blocks: foreground object extraction, foreground-pixel extraction by edge-based shadow removal, foreground-pixel extraction by gray-level-based shadow removal, feature combination, and practical applications.

Fig. 3.5 Flow Chart representing the process of Shadow removal

3.2.1 Foreground object extraction


The sequence of gray-level images is taken as the input of the foreground object extraction process, and the moving object with its minimum bounding rectangle is the output of the algorithm. The Gaussian Mixture Model (GMM) can be incorporated here to develop the background image, which is a representative approach to background subtraction. Considering the pros and cons of the two typical approaches, extracting foreground objects by background subtraction is more appropriate than by temporal difference: background subtraction does a better job of extracting all relevant pixels, and this project aims at tackling problems arising in traffic monitoring systems, where cameras are usually fixed. Previous studies have proposed a standard process for background construction, so we put a higher premium on the following two parts: foreground-pixel extraction by edge-based and by gray-level-based shadow removal.

Gaussian Mixture Model (GMM): In statistics, a mixture model is a probabilistic model for density estimation using a mixture distribution; that is, the observations in a mixture model are assumed to be distributed according to a mixture density. A mixture model can be regarded as a type of unsupervised learning or clustering. Mixture models should not be confused with models for compositional data, i.e., data whose components are constrained to sum to a constant value. A typical finite-dimensional mixture model is a hierarchical model consisting of the following components:

N random variables corresponding to observations, each assumed to be distributed according to a mixture of K components, with each component belonging to the same parametric family of distributions but with different parameters;

N corresponding random latent variables specifying the identity of the mixture component of each observation, each distributed according to a K-dimensional categorical distribution;

a set of K mixture weights, each of which is a probability (a real number between 0 and 1), all of which sum to 1;

a set of K parameters, each specifying the parameters of the corresponding mixture component. In many cases, each "parameter" is actually a set of parameters.

For example, observations distributed according to a mixture of one-


dimensional Gaussian distributions will have a mean and variance for each

component. Observations distributed according to a mixture of V-dimensional

categorical distributions (e.g. when each observation is a word from a vocabulary of

size V) will have a vector of V probabilities, collectively summing to 1. Typically a

Gaussian Model will have the following parameters.

K: number of mixture components
N: number of observations
θi, i = 1…K: parameters of the distribution of observations associated with component i
φi, i = 1…K: mixture weight, i.e., the prior probability of component i
φ: K-dimensional vector composed of all the individual φi; must sum to 1
µi, i = 1…K: mean of component i
σ²i, i = 1…K: variance of component i
F(x|θ): probability distribution of an observation, parameterized on θ

Table 1 Gaussian Mixture Model parameters
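With these parameters, the density that the mixture model assigns to a one-dimensional observation x can be written (the standard form, stated here for reference) as:

F(x | θ, φ) = Σi=1..K φi N(x; µi, σ²i), with Σi=1..K φi = 1

where N(x; µi, σ²i) denotes the Gaussian density with mean µi and variance σ²i. In the background-subtraction setting used here, such a mixture is typically maintained for each pixel and updated over time.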

3.2.2 Foreground pixel extraction by edge-based shadow removal

The main idea of extracting foreground pixels from the edge information of the detected object is that the edges inside a shadow region can be identified and removed much more easily when the homogeneity of the shadow region keeps its variance within a small range. Conversely, we can also obtain edge features for the non-shadow regions. The flowchart of foreground-pixel extraction by edge-based shadow removal is shown in Figure 3.6. Sobel operations can be used to extract the edges of both the GMM-based background images and the foreground objects.



Fig. 3.6 Flow chart showing the process of foreground pixel extraction

Sobel Operation

The Sobel operator performs a 2-D spatial gradient measurement on an image and

so emphasizes regions of high spatial frequency that correspond to edges. Typically it is

used to find the approximate absolute gradient magnitude at each point in an input

grayscale image. The operator consists of a pair of 3×3 convolution kernels as shown in

Figure 3.7. One kernel is simply the other rotated by 90°.

Figure 3.7 Sobel convolution kernels


Here Gx is the vertical Sobel kernel and Gy is the horizontal Sobel kernel.
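Since the figure itself is not reproduced here, the standard Sobel kernels in one common convention (sources differ in the naming and sign of Gx and Gy) are:

    Gx = [ -1  0  +1          Gy = [ +1  +2  +1
           -2  0  +2                  0   0   0
           -1  0  +1 ]               -1  -2  -1 ]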

These kernels are designed to respond maximally to edges running vertically and horizontally relative to the pixel grid, one kernel for each of the two perpendicular orientations. The kernels can be applied separately to the input image to produce separate measurements of the gradient component in each orientation (call these Gx and Gy). These can then be combined to find the absolute magnitude of the gradient at each point and the orientation of that gradient. The gradient magnitude is given by:

|G| = √(Gx² + Gy²)

Typically, an approximate magnitude is computed using:

|G| = |Gx| + |Gy|

which is much faster to compute.

The angle of orientation of the edge (relative to the pixel grid) giving rise to the spatial gradient is given by:

θ = arctan(Gy / Gx)

In this case, orientation 0 is taken to mean that the direction of maximum contrast from black to white runs from left to right on the image, and other angles are measured anti-clockwise from this.

Often, this absolute magnitude is the only output the user sees; the two components of the gradient are conveniently computed and added in a single pass over the input image using the pseudo-convolution operator shown in Figure 3.8.

Figure 3.8 Pseudo-convolution kernels used to quickly compute approximate gradient magnitude

Using this kernel, with the pixels of the 3×3 neighbourhood labelled P1…P9 as in Figure 3.8, the approximate magnitude is given by:

|G| = |(P1 + 2P2 + P3) − (P7 + 2P8 + P9)| + |(P3 + 2P6 + P9) − (P1 + 2P4 + P7)|


Image subtraction

One image can be subtracted from another, pixel by pixel. Where the pixels have the same value, the resulting pixel is 0; where the two pixels differ, the resulting pixel is the difference between them. Image subtraction is used in motion detection, i.e., to detect object motion between two or more images. Typically, two images with the same background scene are subtracted, yielding an image that shows only the difference (the object in motion).

Boundary Elimination

The outer borders have to be removed because of the following properties. The shadow region and the real foreground object have the same motion vectors, which makes them always adjacent to each other. Also, the interior of a shadowed region should be homogeneous (textureless or edgeless) while the interior of a foreground object should be non-homogeneous (dominant edges), which implies that the edges contributed by shadows appear at the outer borders of foreground objects. Considering these two properties, removing shadows can be treated as eliminating the outer borders while preserving the remaining edges, which belong to the real foreground objects. Note, however, that the latter property is not always satisfied.

We use a mask to achieve boundary elimination. The selection of the mask depends entirely on the size of the border of the foreground in question. If the region covered by the mask belongs completely to foreground objects, we keep the point; otherwise, we eliminate it. After applying the outer boundary elimination, we obtain the features for non-shadow pixels.

3.2.3 Gray level based shadow removal foreground pixel extraction

Figure 3.9 shows the flowchart of gray-level-based shadow removal foreground pixel extraction. Pixels belonging to shadow-potential regions are selected from the foreground objects. The darkening factors are calculated, and a Gaussian model is then built for each gray level. Once the Gaussian model is trained, we use it to determine whether each pixel inside the foreground objects belongs to the shadowed region or not.


Fig. 3.9 Gray level based shadow removal foreground pixel extraction

Gaussian Darkening Factor Model Updating

We model the darkening factor for each gray level with one Gaussian model, rather than one Gaussian model per pixel. First, we select the shadow-potential pixels as the updating data of the Gaussian models according to the three predefined conditions introduced below.

Pixels must belong to the foreground objects, since the shadowed pixels should ideally be part of the foreground.

The intensity of a pixel in the current frame should be smaller than that in the background frame, since shadowed pixels must be darker than background pixels.

The pixels obtained from the foreground-pixel extraction by edge-based shadow removal should be excluded, to reduce the number of pixels that might be classified as non-shadowed pixels.

After the pixels for updating are selected, we update the mean and standard deviation of the Gaussian model. Figure 3.10 displays the flowchart of the updating process of the Gaussian darkening factor model. After calculating the darkening factor, we update the Gaussian model.

Fig. 3.10 Gaussian darkening factor model updating procedure

A threshold has to be set as a minimum number of updating times, and the number of updates of each Gaussian model must exceed this threshold to ensure the stability of each model. Besides, in order to reduce the computational load of the updating procedure, we limit each Gaussian model to at most 200 updates per frame.

Determination of non-shadowed pixels

Here we introduce how to extract the non-shadowed pixels using the trained Gaussian darkening factor model. The rule is to calculate the difference between the mean of the Gaussian model and the darkening factor, and check whether the difference is smaller than 3 standard deviations. If so, the pixel is classified as shadowed; otherwise, it is considered a non-shadowed pixel and can be kept as a feature point. Figure 3.11 describes the tasks for determining the non-shadowed pixels. If the Gaussian model has not been trained, we check whether nearby Gaussian models are marked as trained; in our programs, we examine the 6 nearest Gaussian models and choose the nearest one if any trained model exists.
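A minimal MATLAB sketch of the 3-standard-deviation test (assuming the darkening factor is the ratio of the current to the background gray value; mu, sigma, trained, curGray, bgGray, r, and c are all hypothetical names) might be:

    I  = double(curGray);  B = double(bgGray); % current and background gray-level images
    g  = B(r, c) + 1;                          % gray level of pixel (r, c), as a 1-based index
    df = I(r, c) / max(B(r, c), 1);            % darkening factor of the pixel
    if trained(g) && abs(df - mu(g)) < 3*sigma(g)
        isShadow = true;                       % within 3 standard deviations: shadowed
    else
        isShadow = false;                      % otherwise: non-shadowed feature point
    end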


Fig. 3.11 Non shadowed pixel determination

3.2.4 Feature combination

The two kinds of features introduced in Sections 3.2.2 and 3.2.3 are combined in our algorithm to extract foreground objects more accurately. Figure 3.12 exhibits the flowchart of feature combination.


Fig. 3.12 Flow chart for feature combination

Noise filtering and dilation

Opening and Closing: The opening filter performs an erosion followed by a dilation. In images containing bright objects on a dark background, the opening filter smoothes object contours, breaks (opens) narrow connections, eliminates minor protrusions, and removes small bright spots. In images with dark objects on a bright background, the opening filter fills narrow gaps between objects. The closing filter is a morphological filter that performs a dilation followed by an erosion. In images containing dark objects on a bright background, the closing filter smoothes object contours, breaks narrow connections, eliminates minor protrusions, and removes small dark spots. In images with bright objects on a dark background, the closing filter fills narrow gaps between objects.
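As a brief sketch (assuming fg is the binary foreground mask from the thresholding step and the Image Processing Toolbox is available), noise filtering before dilation could be:

    se      = strel('square', 3);  % small structuring element
    fgOpen  = imopen(fg, se);      % opening: erosion then dilation, removes small specks
    fgClean = imclose(fgOpen, se); % closing: dilation then erosion, fills small gaps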

3.3 Proposed algorithm

In this section, we describe the W4 background model [11] given by I. Haritaoglu, D. Harwood, and L. Davis in "W4: Real-time surveillance of people and their activities." The proposed algorithm is a small improvement over that model. We also highlight a novel


method for shadow segmentation of foreground pixels, based on normalized cross-

correlations and pixel ratios.

3.3.1 Background scene modelling

W4 uses a model of background variation that is a bimodal distribution constructed from order statistics of background values during a training period, obtaining a robust background model even if there are moving foreground objects in the field of view, such as walking people and moving cars. It uses a two-stage method based on excluding moving pixels from the background model computation. In the first stage, a pixel-wise median filter over time is applied to several seconds of video (typically 20-40 seconds) to distinguish moving pixels from stationary pixels (our experiments showed, however, that 100 frames ≈ 3.4 seconds are typically enough for the training period if not too many moving objects are present). In the second stage, only those stationary pixels are processed to construct the initial background model. Let V be an array containing N consecutive images, Vk(i, j) be the intensity of pixel (i, j) in the k-th image of V, and σ(i, j) and λ(i, j) be the standard deviation and median of the intensities at pixel (i, j) over all images in V, respectively. The initial background model for a pixel (i, j) is formed by a three-dimensional vector: the minimum m(i, j) and maximum n(i, j) intensity values and the maximum intensity difference d(i, j) between consecutive frames observed during the training period, computed over the frames z satisfying

|Vz(i, j) − λ(i, j)| < 2σ(i, j) - (1)

This condition guarantees that only stationary pixels are used in the background model, i.e., that Vz(i, j) is classified as a stationary pixel. After the training period, an initial background model B(i, j) is obtained. Then, each input image It(i, j) of the video sequence is compared to B(i, j), and a pixel (i, j) is classified as a background pixel if:

|It(i, j) − m(i, j)| ≤ kµ or |It(i, j) − n(i, j)| ≤ kµ - (2)

where µ is the median of the largest interframe absolute difference image d(i, j), and k is a fixed parameter (the authors suggested the value k = 2). Note that if a certain pixel (i, j) has an intensity m(i, j) < It(i, j) < n(i, j) at a certain frame t, it should be classified as background (because it lies between the minimum and maximum values of the background model). However, Equation (2) may wrongly classify such a pixel as foreground, depending on k, µ, m(i, j) and n(i, j). For example, if µ = 5, k = 2, m(i, j) = 40, n(i, j) = 65 and It(i, j) = 52, Equation (2) would classify It(i, j) as foreground, even


though it lies between m(i, j) and n(i, j). To solve this problem, we propose an alternative test for foreground detection, and classify It(i, j) as a foreground pixel if:

It(i, j) < (m(i, j) − kµ) or It(i, j) > (n(i, j) + kµ) - (3)
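A compact MATLAB sketch of this training and test (a sketch under the assumption that V is an HxWxN double array of training frames and It is the current frame; both names are hypothetical) could be:

    m  = min(V, [], 3);                     % per-pixel minimum intensity over training
    n  = max(V, [], 3);                     % per-pixel maximum intensity
    d  = max(abs(diff(V, 1, 3)), [], 3);    % largest interframe absolute difference per pixel
    mu = median(d(:));                      % median of d over the whole image
    k  = 2;                                 % parameter value suggested by the authors
    fg = (It < m - k*mu) | (It > n + k*mu); % foreground test of Equation (3)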

3.3.2 Shadow identification

In shadowed regions, it is expected that a certain fraction α of the incoming light is blocked. Although several factors may influence the intensity of a pixel in shadow, we assume that the observed intensity of shadow pixels is directly proportional to the incident light; consequently, shadowed pixels are scaled (darker) versions of the corresponding pixels in the background model. As noticed by other authors, the normalized cross-correlation (NCC) can be useful for detecting shadow pixel candidates, since it can identify scaled versions of the same signal. In this work, we use the NCC as an initial step for shadow detection, and refine the process using local statistics of pixel ratios, as explained next.

Detection of shadow pixel candidates

Let B(i, j) be the background image formed by temporal median filtering, and I(i, j) be an image of the video sequence. For each pixel (i, j) belonging to the foreground, consider a (2N + 1) × (2N + 1) template Tij such that Tij(n, m) = I(i + n, j + m), for −N ≤ n ≤ N, −N ≤ m ≤ N (i.e., Tij corresponds to a neighborhood of pixel (i, j)). Then the NCC between template Tij and image B at pixel (i, j) is given by:

NCC(i, j) = ER(i, j) / √(EB(i, j) ETij) - (4)

where

ER(i, j) = Σn Σm B(i + n, j + m) Tij(n, m)
EB(i, j) = Σn Σm B(i + n, j + m)²
ETij = Σn Σm Tij(n, m)² - (5)

with all sums taken over −N ≤ n, m ≤ N.

For a pixel (i, j) in a shadowed region, the NCC in a neighboring region Tij should be large (close to one), and the energy ETij of this region should be lower than the


energy EB(i, j) of the corresponding region in the background image. Thus, a pixel (i, j) is

pre-classified as shadow if:

NCC(i, j) ≥ Lncc and ETij < EB(i, j) - (6)

where Lncc is a fixed threshold. If Lncc is low, several foreground pixels corresponding to moving objects may be misclassified as shadows. On the other hand, selecting a larger value for Lncc results in fewer false positives, but pixels related to actual shadows may not be detected.
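A direct (unoptimized) MATLAB sketch of this pre-classification for one foreground pixel (i, j), with the half-size N and the threshold Lncc assumed to be chosen beforehand, could be:

    Tij = I(i-N:i+N, j-N:j+N);  % template around pixel (i, j) in the current frame
    Bij = B(i-N:i+N, j-N:j+N);  % corresponding region of the background image
    ER  = sum(sum(Bij .* Tij)); % cross term ER(i, j)
    EB  = sum(sum(Bij .^ 2));   % energy of the background region
    ET  = sum(sum(Tij .^ 2));   % energy of the template
    ncc = ER / sqrt(EB * ET);   % normalized cross-correlation, Equation (4)
    isCandidate = (ncc >= Lncc) && (ET < EB); % pre-classification test of Equation (6)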

Shadow refinement

The NCC provides a good initial estimate of the location of shadowed pixels by detecting pixels whose surrounding neighborhood is approximately scaled with respect to the reference background. However, some pixels related to valid moving objects may still be wrongly classified as shadow pixels. To remove such false positives, a refinement stage is applied to all pixels that satisfy Equation (6). The proposed refinement consists of verifying that the ratio I(i, j)/B(i, j) in a neighborhood around each shadow pixel candidate is approximately constant, by computing the standard deviation of I(i, j)/B(i, j) within this neighborhood. More specifically, we consider a region R with (2M + 1) × (2M + 1) pixels centered at each shadow pixel candidate (i, j), and classify it as a shadow pixel if:

stdR(I(i, j)/B(i, j)) < Lstd and Llow ≤ I(i, j)/B(i, j) < 1 - (7)

where stdR(I(i, j)/B(i, j)) is the standard deviation of the quantities I(i, j)/B(i, j) over the region R, and Lstd, Llow are thresholds. More precisely, Lstd controls the maximum deviation within the neighbourhood being analyzed, and Llow prevents the misclassification of dark objects with very low pixel intensities as shadowed pixels.
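A vectorized MATLAB sketch of this refinement (assuming candidates is the binary mask of pixels passing Equation (6), M, Lstd, and Llow are chosen by the user, and the Image Processing Toolbox provides stdfilt) could be:

    ratio  = I ./ max(B, 1);              % pixel ratios I(i,j)/B(i,j), guarding against B = 0
    sd     = stdfilt(ratio, true(2*M+1)); % local standard deviation over (2M+1)x(2M+1) regions
    shadow = candidates & (sd < Lstd) & ...
             (ratio >= Llow) & (ratio < 1); % refined shadow mask, Equation (7)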


4. CONCLUSION

4.1 Conclusion

In this report we have presented some of the techniques that can be used for shadow removal from videos or image sequences. The actual project implementation will be carried out in the following steps:

Testing the algorithm on still gray-level images

Applying the algorithm to a gray-level image sequence

Extracting the foreground from a fixed background reference image

Applying the algorithm to remove shadows in real-time surveillance systems

The algorithm given in Section 3.1 is based on the RGB color model to detect shadows in the video sequence. When a shadow is compared with the corresponding area in the background of a video sequence, there is only a decrease in its red, green, and blue values, while its relative RGB proportions remain comparatively steady and constant. The proposed algorithm removes shadows simply and effectively.

We have also presented, in Section 3.2, a real-time and efficient moving shadow removal algorithm based on versatile uses of the GMM, including background removal and the development of features by Gaussian models. This algorithm innovatively uses the homogeneity property inside shadowed regions, and hierarchically detects foreground objects by extracting edge-based and gray-level-based features and combining them. Our approach is characterized by original procedures such as pixel-by-pixel maximization, subtraction of edges from background images in the corresponding regions, adaptive binarization, boundary elimination, the automatic selection mechanism for shadow-potential regions, and the Gaussian darkening factor model for each gray level.

Among all these proposed procedures, pixel-by-pixel maximization and the subtraction of edges from background images in the corresponding regions deal with the problems that result from shadowed regions containing edges. Adaptive binarization and


boundary elimination are developed to extract the foreground pixels of non-shadowed regions. Most significantly, we propose the Gaussian darkening factor model for each gray level to extract non-shadow pixels from foreground objects using gray-level information, and we integrate all the useful features to locate the real objects without shadows. Finally, in comparison with previous approaches, the experimental results show that the proposed algorithm can accurately detect and locate foreground objects in different scenes and with various types of shadows. We will apply the presented algorithm to vehicle counting to prove its capability and effectiveness; it indeed improves the results of vehicle counting, and it is also verified to be efficient, with a prompt processing speed.

4.2 Future Scope

The above algorithm works on videos with a normal background and on stable videos from a fixed camera. It can be modified to work on videos with complex backgrounds as well as on videos that are not stable. Unstable videos can first be stabilized using a video stabilizer and then processed; such a stabilizer can be implemented in the future. Also, by working on the real-time shadow removal algorithm, we can completely remove shadows from videos, helping a human operator monitoring a surveillance system to visualize the objects clearly.
