Multitarget Tracking with a Corner-based Particle Filter

Alessio Dore, Andrea Beoldo, Carlo S. Regazzoni

Department of Biophysical and Electronic Engineering, University of Genoa
Via Opera Pia 11 A, Genoa, Italy

{dore, beoldo, carlo}@dibe.unige.it

Abstract

This paper presents a multitarget tracking algorithm based on a particle filter framework that exploits a sparse distributed shape model to handle partial occlusions. The state vector is composed of a set of points of interest (i.e. corners), which makes it possible to jointly describe the position and shape of the target. An efficient importance sampling strategy is developed to limit the number of particles used; it is based on multiple Kanade-Lucas-Tomasi (KLT) feature trackers that estimate local motion. The importance sampling strategy adaptively handles KLT failures and partial occlusions. Particle weights are computed by combining a shape matching technique with the local appearance of the object, encoded in color histograms of patches centered on the points of interest that constitute the state. The proposed approach does not require background subtraction and overcomes several common difficulties in the tracking domain, such as partial occlusions, object deformations, scale changes, abrupt motion and non-static background. Extensive experimental results on challenging sequences demonstrate the robustness of the algorithm.

1. Introduction

Tracking is one of the fundamental processing steps in several applications that aim at understanding what is happening in a monitored scene, for instance video surveillance, activity analysis and human-computer interaction. A fairly complete overview of object tracking can be found in [23]. The many approaches found in the literature can be categorized into three main classes: feature-based tracking (e.g. [6], [12]), contour-based tracking (e.g. [25], [11]) and region-based tracking (e.g. [8], [15]). These tracking methods have been shown to be suitable for solving specific problems in certain scenarios, but a general solution is still a distant research target. In general, the performance of feature-based approaches depends on the discriminative capabilities of the chosen features and on the robustness of the feature extraction methods. Region-based tracking, on the other hand, usually loses shape information and therefore cannot provide important cues to higher-level analysis modules. Contour-based methods are often time consuming, above all when they handle non-rigid objects, which limits their use in real-time multi-target applications. Moreover, the performance of all of these approaches decreases in the presence of occlusions. Therefore, the partial or total lack of reliable observations, possibly replaced by distracting ones, needs to be handled with specific solutions that take into account the available information and any existing knowledge of the target dynamics, either defined a priori or inferred by the tracker.

In this work a feature-based tracking method is proposed in which a Kanade-Lucas-Tomasi (KLT) feature tracker [19] is used within a particle filter framework to predict local object motion. The shape and position of the object are described by a set of points of interest that constitutes the state vector of a Particle Filter. The state-space model is exploited to provide continuous and consistent tracking of the object sub-parts related to the points of interest, regardless of KLT tracking losses or inaccuracies. The transition model combines the motion information provided by the KLT tracker with a second-order autoregressive model and with re-association techniques, which are used when the feature tracker fails because of distractors or occlusions. The observation model combines a shape matching technique with the local color information associated with each point of interest in the state. In case of occlusions, the particle weights are computed taking into account only the visible part.

The proposed approach aims at tackling several major tracking problems. The part-based model provides a natural solution to partial occlusion, since it allows the tracker to rely only on the visible area of the object. Moreover, with this representation, knowledge about object shape changes can be inferred and exploited to improve scale adaptation and to provide relevant information for activity analysis tasks. In addition, the use of a feature matching technique such as the KLT tracker makes it possible to use the tracker in the presence of a time-varying background, as in video sequences captured by moving cameras.

The remainder of the paper is organized as follows. Section 2 describes state-of-the-art works dealing with multicue and part-based tracking algorithms. Section 3 presents an overview of the particle filter algorithm. Section 4 describes the proposed particle filter tracking method based on a sparse shape model. Results obtained on public and challenging data sets are shown in Section 5. Finally, Section 6 draws conclusions and comments on possible improvements.

2. Related Works

A relevant line of research has focused on feature-based tracking approaches that use a part-based model to describe the object shape. These approaches have been successfully exploited in several scenarios because of their suitability for handling partial occlusions and pose changes, and because they make it possible to derive interesting information on shape dynamics. In [4] an algorithm is proposed in which the template object is represented using multiple patches. A voting approach is used to estimate position and scale, and the integral histogram data structure [14] is used to compute the histograms of the multiple patches. This method, however, is not able to provide a shape description, since the patches are arbitrary and not related to a model identifying object parts. In [20] points of interest are tracked using a Multiple Hypothesis Tracking - Interacting Multiple Models (MHT-IMM) algorithm, and tracking correctness is confirmed by verifying their position against the object edge map. The work presented in [13] describes a non-rigid object tracking algorithm that makes use of a point distribution model to represent the object. This model is learned on several instances of the object; a point tracker based on a maximally-weighted path cover search on a directed graph [18] is then used to find the new position of the object and update the model. The initial model construction and the correct tracking of the points of interest are fundamental elements of this method.

In [6] and [7] the authors propose two approaches to improve feature point tracking by exploiting the motion estimates of other features. The paper presented in [5] proposes the use of partial linear Gaussian models in Particle Filter estimation, and the authors demonstrate the effectiveness of such models in point tracking. In [9] a particle filter based tracker is proposed in which shape is described by a set of points of interest and local motion is predicted using a Mean-Shift tracker. The global model of the object is then compared with the observation in the update phase using a shape matching approach.

Several works in the recent literature are specifically designed to handle the lack of observation information caused by partial or total occlusions. For example, in [24] a tracker is proposed that copes with occlusion by constructing online a shape (contour) prior model that encodes the object motion. When occlusions occur, the shape prior is exploited to recover the occluded part. The paper presented in [10] handles the parts of the object where the motion cannot be recovered through a collaborative approach that takes into consideration the motion estimates of other object sub-parts. Qu et al. [16] propose a Bayesian formulation for multiple interactive trackers that are able to cope efficiently with mutual occlusions and data association problems by means of a “magnetic-inertia potential” state transition model. In [22] long-term occlusions between targets moving in groups are solved by assuming that each individual object moves according to the motion of the other targets.

3. Particle Filter

Recursive Bayesian state estimation seeks, step by step, the estimate of the state vector $\mathbf{x}_k \in \mathbb{R}^{n_x}$, where $n_x$ is the vector dimension and $k \in \mathbb{N}$ is the time index, based on all the available observations $\mathbf{z}_{1:k} = \{\mathbf{z}_1, \ldots, \mathbf{z}_k\}$ up to time $k$. To do so, the posterior probability density function (pdf) $p(\mathbf{x}_k|\mathbf{z}_{1:k})$ is computed recursively using a two-step method based on a prediction-correction strategy. The Particle Filter [17] approximates Bayesian filtering by representing the posterior as a finite set of weighted samples $\chi_k = \{\mathbf{x}_k^m, w_k^m\}_{m=1}^{N_s}$. The principle of importance sampling (IS) consists in drawing a set of $N_s$ candidate samples (i.e. particles) $\{\mathbf{x}_k^m\}_{m=1}^{N_s}$ from a so-called proposal distribution (or importance distribution) $q(\mathbf{x}_{0:k}|\mathbf{z}_{1:k})$ instead of directly from the posterior $p(\mathbf{x}_{0:k}|\mathbf{z}_{1:k})$, which is usually impossible to model. The difference between the proposal and the posterior is handled by a correction (weighting or updating) procedure. Under certain independence assumptions, the importance distribution can be written as $q(\mathbf{x}_k|\mathbf{x}_{k-1}, \mathbf{z}_k)$ and, consequently, the weight can be computed sequentially as:

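$$
w_k^m \propto w_{k-1}^m \, \frac{p(\mathbf{z}_k|\mathbf{x}_k^m)\, p(\mathbf{x}_k^m|\mathbf{x}_{k-1}^m)}{q(\mathbf{x}_k^m|\mathbf{x}_{k-1}^m, \mathbf{z}_k)} \tag{1}
$$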

The Sequential Importance Sampling (SIS) algorithm consists in drawing $N_s$ samples $\mathbf{x}_k^m$ from the importance distribution $q(\mathbf{x}_k|\mathbf{x}_{k-1}, \mathbf{z}_k)$ (prediction procedure) and computing the corresponding weights $w_k^m$ using (1). After weight normalization, the posterior can be approximated as

$$
p(\mathbf{x}_k|\mathbf{z}_{1:k}) \approx \sum_{m=1}^{N_s} w_k^m \, \delta(\mathbf{x}_k - \mathbf{x}_k^m) \tag{2}
$$

where $\delta(\cdot)$ is the Dirac delta function. However, this algorithm has been shown to lead to a degeneracy phenomenon due to the continuous increase of the weight variance. To overcome this issue and to obtain a more accurate approximation of the posterior, so-called resampling procedures are used to redistribute the particles, so that those with low weights are eliminated and those with high weights are multiplied. From the approximated posterior, the MMSE (Minimum Mean Square Error) and MAP (Maximum A Posteriori) estimates can be easily obtained by computing the mean of the weighted particles and by taking the sample with the highest weight, respectively. The efficiency of the particle filter depends on an appropriate definition of the importance distribution $q(\mathbf{x}_k|\mathbf{x}_{k-1}, \mathbf{z}_k)$, which must represent the posterior well. If the distribution $q(\mathbf{x}_k|\mathbf{x}_{k-1}, \mathbf{z}_k)$ is not sufficiently representative of the posterior, a high number of particles is necessary to sample the state space more densely and compensate for the propagation inaccuracy. Likewise, if the state space has a large dimension, the same consideration holds, implying an increase in the number of particles and consequently in the computational load. In visual tracking, defining a reliable importance distribution is a hard task because of the wide variety of motions and deformations that the targets in a monitored scene can exhibit. For example, the non-rigidity of the object and the consequent local motion of its sub-parts, the (de)zooming effect produced by the relative motion of the object with respect to the camera, and appearance changes due to rotation are typical phenomena that are difficult to represent.
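The weight recursion of Eq. (1) and the two point estimates described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: particles are plain lists of floats, the likelihood, prior and proposal values are assumed to have been evaluated elsewhere, and the function names are hypothetical rather than taken from the original implementation.

```python
def normalize(weights):
    """Scale the weights so that they sum to one."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("all particle weights are zero")
    return [w / total for w in weights]


def sis_update(prev_weights, likelihoods, priors, proposals):
    """Apply Eq. (1) to every particle, then normalize.

    likelihoods[m] stands for p(z_k | x_k^m), priors[m] for
    p(x_k^m | x_{k-1}^m) and proposals[m] for q(x_k^m | x_{k-1}^m, z_k).
    """
    raw = [w * lik * pri / q
           for w, lik, pri, q in zip(prev_weights, likelihoods, priors, proposals)]
    return normalize(raw)


def mmse_estimate(particles, weights):
    """Weighted mean of the particles (MMSE estimate)."""
    dim = len(particles[0])
    return [sum(w * x[d] for x, w in zip(particles, weights)) for d in range(dim)]


def map_estimate(particles, weights):
    """Particle with the highest weight (MAP estimate as described in the text)."""
    best = max(range(len(weights)), key=weights.__getitem__)
    return particles[best]
```

For example, two particles with equal previous weights and likelihoods in a 1:3 ratio end up with normalized weights 0.25 and 0.75 when prior and proposal values are equal.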

4. Occlusion Handling Part-based Tracking using Particle Filter

4.1. Adaptive Transition Model

In the presented framework, particles are drawn from a proposal distribution that explicitly depends on the current observations $\mathbf{z}_k$. This is in contrast with common Sequential Importance Resampling (SIR) schemes, where samples are drawn from the prior probability $p(\mathbf{x}_k|\mathbf{x}_{k-1})$. Although this approach increases the computational complexity of the prediction, it improves the accuracy of the prediction procedure, which makes it possible to use a lower number of particles to describe the state. The state of the target at time instant $k$ is described by a vector $\mathbf{x}_k = [(x^{(1)}, y^{(1)}), \ldots, (x^{(n_x/2)}, y^{(n_x/2)})]_k^T$, constituted by the image plane positions $\mathbf{x}_{k(i)} = (x^{(i)}, y^{(i)})$ of $n_x/2$ corners. This representation jointly describes the position (e.g. the centroid of the corners) and the shape (e.g. the displacement of each corner with respect to the centroid) of the object to be tracked. The object bounding box can be defined as a rectangle surrounding the corners. Moreover, a local color histogram $H_{k(i)} = \{p_u(\mathbf{x}_{k(i)})\}_{u=1,\ldots,M}$ is constructed at each frame in a region centered on the corner location $(x^{(i)}, y^{(i)})$, where:

$$
p_u(\mathbf{y}) = C_h \sum_{i=1}^{n_h} k\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right) \delta[b(\mathbf{x}_i) - u] \tag{3}
$$

where $k(\cdot)$ is a convex and monotonically decreasing kernel profile with bandwidth $h$ that assigns smaller weights to pixels farther from the center $\mathbf{y}$; $n_h$ is the number of pixels $\mathbf{x}_i$ in the patch surrounding the corner; the function $b : \mathbb{R}^2 \to \{1, \ldots, M\}$ associates a pixel location with the corresponding bin index $u$; $C_h$ is a normalizing factor that does not depend on $\mathbf{y}$, the location to be estimated; and $M$ is the number of bins.

The observations $\mathbf{z}_k = [\mathbf{z}_{k(j)}, \mathbf{KLT}_{k(j)}, H_{k(j)}]_{j=1,\ldots,n_z}$ consist of a set of $n_z$ corners $\mathbf{z}_{k(j)}$, of their motion $\mathbf{KLT}_{k(j)}$ with respect to the previous frame, both computed in the image (or in the area surrounding the target) using a Kanade-Lucas-Tomasi (KLT) feature detector and tracker ([19], [21]), and of the local color appearance of the corners, described by the histogram $H_{k(j)}$. The KLT algorithm assumes that images taken at nearby time instants are usually similar to each other, and this property is exploited in the interrelated processes of feature detection and tracking, which aim at finding associations between patches of image information in two consecutive frames. The KLT algorithm therefore makes available both the points of interest and their motion with respect to the previous instant. In this work this property is exploited by associating each corner of the state $\mathbf{x}_{k(i)}$ with a KLT feature. The importance density function takes this information into account to predict the future position of the corner and to steer the sampling procedure towards an area with a high probability of containing the new state. The KLT tracker of the corner $\mathbf{z}_{k-1(p)}$, indicated as $KLT_i(\mathbf{z}_{k-1(p)})$, associated with the $i$-th state sub-space $\mathbf{x}_{k-1(i)}$, provides a motion vector $\mathbf{KLT}_{k(i)}$:

$$
\mathbf{KLT}_{k(i)} \rightarrow KLT_i(\mathbf{z}_{k-1(p)})\big|_{\mathbf{x}_{k-1(i)} \leftrightarrow \mathbf{z}_{k-1(p)}} \tag{4}
$$

that is used to compute the local motion of each object sub-part.
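The kernel-weighted histogram of Eq. (3) can be sketched as follows. This is a minimal sketch under several assumptions that are not stated in the paper at this point: the image is a nested list of 8-bit RGB tuples indexed as `image[row][column]`, the kernel profile is the Epanechnikov one (up to a constant factor), bins are indexed from 0 rather than from 1, and the normalizing factor $C_h$ is obtained by making the bins sum to one. All function names are hypothetical.

```python
def epanechnikov_profile(r2):
    """Kernel profile k(r^2): decreasing inside the unit ball, zero outside."""
    return 1.0 - r2 if r2 < 1.0 else 0.0


def bin_index(rgb, bins_per_channel=8):
    """Map an 8-bit RGB pixel to a 0-based bin index (the function b in Eq. 3)."""
    r, g, b = (c * bins_per_channel // 256 for c in rgb)
    return (r * bins_per_channel + g) * bins_per_channel + b


def local_histogram(image, center, h, bins_per_channel=8):
    """Kernel-weighted color histogram of the patch of radius h around center.

    center is given as (column, row); pixels close to it contribute more.
    """
    cx, cy = center
    hist = [0.0] * (bins_per_channel ** 3)
    height, width = len(image), len(image[0])
    for y in range(max(0, int(cy - h)), min(height, int(cy + h) + 1)):
        for x in range(max(0, int(cx - h)), min(width, int(cx + h) + 1)):
            r2 = ((x - cx) ** 2 + (y - cy) ** 2) / (h * h)
            weight = epanechnikov_profile(r2)
            if weight > 0.0:
                hist[bin_index(image[y][x], bins_per_channel)] += weight
    total = sum(hist)
    # Dividing by the total plays the role of the normalizing factor C_h.
    return [v / total for v in hist] if total > 0.0 else hist
```

With 8 bins per channel the histogram has 8×8×8 = 512 bins, which matches the configuration reported in the experimental section.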

However, KLT feature tracking can fail because of object deformation, illumination changes, pose changes, partial occlusions and so forth, which prevent a patch in the previous image from being associated with a corresponding one in the next frame. In this case the motion information to be provided to the importance density function cannot be derived from the KLT tracker. In some cases, however, the distracting phenomenon lasts only for a limited number of frames, and the object part corresponding to the lost feature has a motion not too different from that at the previous instant. In this situation a second-order autoregressive model is applied, using the last computed motion vector to maintain the local dynamics of that part. The same model is used under occlusion: the tracker checks whether two tracked targets are close to one another and, in particular, whether one or more sub-spaces of the state (corners of the state) $\mathbf{x}_{k-1(i)}$ lie within the bounding box of another object. When this situation occurs, the previous sub-part dynamics are kept fixed through the second-order autoregressive model.

Moreover, for those sub-spaces that are not affected by occlusions, at each step a feature in a close position (with respect to the estimated object center) and with similar motion is searched for within an area proportional to the transition model standard deviation $\sigma$, in order to re-associate the sub-space with another feature. This process is repeated for $T_{ra}$ frames to recover from momentary distracting problems. When the re-association procedure does not succeed for more than $T_{ra}$ instants, the transition model for that sub-space is a Gaussian centered on a KLT feature $\mathbf{z}_{k(s)}$ that lies within the object bounding box. In this way, once an occlusion has ended, the process tries to recover the association with a tracked KLT feature in order to estimate the local motion reliably. Different strategies can be followed to choose the substitute feature; for example, the feature farthest from all the other corners of the state can be selected, since it helps to represent the object shape well. The resulting mixed transition model therefore has the form $p(\mathbf{x}_k|\mathbf{x}_{k-1}, \mathbf{z}_k)$, and predicted particles are drawn from the following probability:

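$$
p(\mathbf{x}_k|\mathbf{x}_{k-1}, \mathbf{z}_k) =
\begin{cases}
\mathcal{N}(\mathbf{x}_{k-1(i)} + \mathbf{KLT}_{k(i)},\ \sigma^2_{k(i)}) \\
\mathcal{N}(\mathbf{x}_{k-1(i)} + \mathbf{v}_{k-1(i)} \ast T,\ \sigma^2_{k(i)}) \\
\mathcal{N}(\mathbf{z}_{k(s)},\ \sigma^2_{k(i)})
\end{cases} \tag{5}
$$

where $\mathbf{v}_{k-1(i)} \ast T$ is the last displacement of the tracked feature associated with the corner of the state, computed before tracking is lost at frame $k^{lost}_i$. To summarize, the transition model $p(\mathbf{x}_k|\mathbf{x}_{k-1}, \mathbf{z}_k)$ is adaptively selected as follows: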

1. $\mathcal{N}(\mathbf{x}_{k-1(i)} + \mathbf{KLT}_{k(i)}, \sigma^2_{k(i)})$ is used $\forall i : \mathbf{x}_{k-1(i)} \leftrightarrow \mathbf{z}_{k-1(p)}$, that is, for each sub-space of the state $\mathbf{x}_{k-1(i)}$ associated with a tracked feature $\mathbf{z}_{k-1(p)}$ (the symbol $\leftrightarrow$ indicates the association between a corner of the state $\mathbf{x}_{k-1(i)}$ and the tracked feature $\mathbf{z}_{k-1(p)}$).

2. $\mathcal{N}(\mathbf{x}_{k-1(i)} + \mathbf{v}_{k-1(i)} \ast T, \sigma^2_{k(i)})$ is used $\forall i : (\mathbf{x}_{k-1(i)} \nleftrightarrow \mathbf{z}_{k-1(p)} \wedge k - k^{lost}_i < T_{ra}) \vee (\mathbf{x}_{k-1(i)} \in O)$, that is, when the sub-space of the state $\mathbf{x}_{k-1(i)}$ cannot be associated with a tracked feature or when it lies within an occlusion area $O$. The symbol $\nleftrightarrow$ is used when tracking of the associated feature is lost. The variance $\sigma^2_{k(i)}$ of the importance density $\mathcal{N}(\mathbf{x}_{k-1(i)} + \mathbf{v}_{k-1(i)} \ast T, \sigma^2_{k(i)})$ related to those corners of the state whose associated feature has been lost for fewer than $T_{ra}$ frames is enlarged by a factor of 2 in order to take into account local deformations.

3. $\mathcal{N}(\mathbf{z}_{k(s)}, \sigma^2_{k(i)})$ is used $\forall i : (\mathbf{x}_{k-1(i)} \nleftrightarrow \mathbf{z}_{k-1(p)} \wedge k - k^{lost}_i > T_{ra}) \wedge (\mathbf{x}_{k-1(i)} \notin O)$, i.e. the sub-space of the state is re-associated with a feature $\mathbf{z}_{k(s)}$ that presents similar motion and is sufficiently close to $\mathbf{x}_{k-1(i)}$ to preserve the shape information. The re-association procedure is not performed for those corners of the state that lie in an occlusion area, to avoid re-association with a feature of the other object.
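The three-way selection listed above can be sketched per corner as follows. This is one possible reading rather than the authors' implementation: occlusion is assumed to take precedence over a valid KLT association, the boundary case $k - k^{lost}_i = T_{ra}$ (not specified in the text) is assigned to re-association, the autoregressive model is used as a fallback when no substitute feature is available, and the Gaussian is taken as isotropic. The names `select_mode` and `sample_corner` are hypothetical.

```python
import math
import random


def select_mode(tracked, occluded, frames_lost, t_ra, has_substitute):
    """Choose which component of the mixed transition model applies to a corner."""
    if occluded:
        return "autoregressive"   # case 2: corner lies in an occlusion area
    if tracked:
        return "klt"              # case 1: corner associated with a tracked feature
    if frames_lost < t_ra or not has_substitute:
        return "autoregressive"   # case 2: tracking lost only recently
    return "reassociated"         # case 3: re-associate with a new feature


def sample_corner(prev, mode, sigma, rng, klt_motion=(0.0, 0.0),
                  last_displacement=(0.0, 0.0), substitute=None,
                  tracking_lost=False):
    """Draw one predicted corner position from the selected Gaussian."""
    if mode == "klt":
        mean = (prev[0] + klt_motion[0], prev[1] + klt_motion[1])
        std = sigma
    elif mode == "autoregressive":
        mean = (prev[0] + last_displacement[0], prev[1] + last_displacement[1])
        # The variance is doubled when tracking was lost, so the standard
        # deviation grows by sqrt(2).
        std = sigma * math.sqrt(2.0) if tracking_lost else sigma
    elif mode == "reassociated":
        mean = substitute
        std = sigma
    else:
        raise ValueError("unknown mode: " + mode)
    return (rng.gauss(mean[0], std), rng.gauss(mean[1], std))
```

A particle is then obtained by calling `sample_corner` once for each of the $n_x/2$ corners, with an `rng` such as `random.Random(seed)`.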

4.2. Multiple Cue Observation Model

According to (1), the weight is directly proportional to the likelihood $p(\mathbf{z}_k|\mathbf{x}_k)$. In this work the likelihood takes into account both shape and color cues to evaluate the confidence of each predicted particle. The shape component of the likelihood $p_{shape}(\mathbf{z}_k|\mathbf{x}_k)$ is computed by matching the position/shape information of the state vector $\mathbf{x}_k$ with the corner observations $\mathbf{z}_{k(j)}|_{j=1,\ldots,n_z}$. A function $s(\mathbf{x}^m_{k(i)})$ can be defined to determine whether a corner of the model is close to an observed corner $\mathbf{z}_{k(j)}$, that is:

$$
s(\mathbf{x}^m_{k(i)}) = \sum_{j=1}^{M} \exp\left(-\left(d^m_{k(i,j)}\right)^2\right) \qquad \forall i : \mathbf{x}^m_{k(i)} \notin O \tag{6}
$$

where $M$ is the total number of extracted corners and $\left(d^m_{k(i,j)}\right)^2 = \left\|\mathbf{z}_{k(j)} - \mathbf{x}^m_{k(i)}\right\|^2$ is the squared Euclidean distance between the $i$-th predicted corner of the $m$-th particle, $\mathbf{x}^m_{k(i)} = (x_i, y_i)$, and the $j$-th extracted corner. The matching is not performed for those corners of the state $\mathbf{x}_{k(i)}$ that lie within an occlusion area, so that only the reliable observations are taken into account in the weight computation.

Then, in order to provide a one-to-one association between predicted and observed corners, $s(\mathbf{x}^m_{k(i)})$ is compared with a unit value, i.e.:

$$
V^m_k = \sum_{i=1}^{n_x/2} \left(s(\mathbf{x}^m_{k(i)}) - 1\right)^2 \qquad \forall i : \mathbf{x}^m_{k(i)} \notin O \tag{7}
$$

The shape likelihood between the detected corners and the $m$-th particle, which represents a possible position and configuration of the object, is:

$$
p_{shape}(\mathbf{z}_k|\mathbf{x}^m_k) \propto \exp(-V^m_k) \tag{8}
$$
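Equations (6) to (8) translate almost directly into code. The sketch below assumes that corners are `(x, y)` tuples in pixel units, that occluded corners are identified by their index in the particle, and that the proportionality constant of Eq. (8) is dropped; the function names are hypothetical.

```python
import math


def shape_score(corner, observed):
    """Eq. (6): sum of exp(-d^2) over all observed corners."""
    return sum(math.exp(-((zx - corner[0]) ** 2 + (zy - corner[1]) ** 2))
               for zx, zy in observed)


def shape_likelihood(particle, observed, occluded=()):
    """Eqs. (7) and (8): unnormalized shape likelihood of one particle.

    particle is a list of predicted corners; the indices listed in
    occluded are skipped, as the text prescribes for occlusion areas.
    """
    v = 0.0
    for i, corner in enumerate(particle):
        if i in occluded:
            continue
        v += (shape_score(corner, observed) - 1.0) ** 2
    return math.exp(-v)
```

A particle whose visible corners each coincide with exactly one well-separated observation obtains a value close to 1, while every unmatched visible corner adds roughly 1 to $V^m_k$ and so scales the likelihood by about $e^{-1}$.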

Local color information of the patches surrounding the corners of the state is used to compute the color component of the likelihood $p_{color}(\mathbf{z}_k|\mathbf{x}_k)$. For each particle $m$ at the current frame, the color histograms $H^m_{k(i)} = \{p_u(\mathbf{x}^m_{k(i)})\}_{u=1,\ldots,M}$ (see Eq. 3) are computed for the patches of linear dimension $h_i$ (i.e. radius) centered on the corners of the state $\mathbf{x}^m_{k(i)}|_{i=1,\ldots,n_x/2}$. Each of these predicted color models is then compared, through the Bhattacharyya coefficient, with the reference models given by the histograms $H^*_{k-1(i)} = \{q_u(\mathbf{x}^*_{k-1(i)})\}_{u=1,\ldots,M}$, computed at the previous frame with respect to the estimated state $\mathbf{x}^*_{k-1}$. The color likelihood is therefore:

$$
p_{color}(\mathbf{z}_k|\mathbf{x}^m_k) \propto \frac{\sum_{i=1}^{n_x/2} \rho(H^m_{k(i)}, H^*_{k-1(i)})}{n_x/2} \qquad \forall i : \mathbf{x}^m_{k(i)} \notin O \tag{9}
$$

where $\rho(H^m_{k(i)}, H^*_{k-1(i)}) = \sum_{u=1}^{M} \sqrt{p_u(\mathbf{x}^m_{k(i)})\, q_u(\mathbf{x}^*_{k-1(i)})}$ is the Bhattacharyya coefficient computed for the patches related to the $i$-th state sub-space. As for the shape measurement model, the observations in the occlusion areas $O$ are not considered, to prevent wrong weight evaluation.
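A minimal sketch of the color term of Eq. (9) follows. It assumes that the histograms are already normalized lists of equal length, one per corner, and that occluded corners are identified by index; the function names are hypothetical. As in Eq. (9), the sum runs over the visible corners while the denominator is the total number of corners.

```python
import math


def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def color_likelihood(predicted_hists, reference_hists, occluded=()):
    """Eq. (9): unnormalized color likelihood of one particle.

    predicted_hists[i] is the histogram around corner i of the particle,
    reference_hists[i] the one stored for the previous estimated state.
    """
    n_corners = len(predicted_hists)
    if n_corners == 0:
        raise ValueError("a particle needs at least one corner")
    total = sum(bhattacharyya(predicted_hists[i], reference_hists[i])
                for i in range(n_corners) if i not in occluded)
    return total / n_corners
```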

The multiple cue likelihood can then be computed as a weighted sum of the shape and color likelihoods:

$$
p(\mathbf{z}_k|\mathbf{x}^m_k) \propto \alpha\, p_{shape}(\mathbf{z}_k|\mathbf{x}^m_k) + \beta\, p_{color}(\mathbf{z}_k|\mathbf{x}^m_k) \tag{10}
$$

where the weights $0 \le \alpha \le 1$ and $0 \le \beta \le 1$, with $\alpha + \beta = 1$, balance the relevance of shape and color information.

The prior probability $p(\mathbf{x}_k|\mathbf{x}_{k-1})$ is a uniform windowed probability, since the predicted position is computed according to the KLT motion vector associated with each sub-space of the state $\mathbf{x}_{k(i)}$. The dimension of the window is the maximum displacement $\mathbf{KLT}_{k(w)}$, i.e. such that $\mathbf{KLT}_{k(w)} > \mathbf{KLT}_{k(i)}\ \forall i \neq w$. Finally, the weight of the $m$-th particle is updated by dividing by $q(\mathbf{x}^m_k|\mathbf{x}^m_{k-1}, \mathbf{z}_k)$, which assigns a higher weight to predicted particles far from the mean value of the Gaussian pdf $p(\mathbf{x}_k|\mathbf{x}_{k-1}, \mathbf{z}_k)$ (see (5)). This factor tends to boost the weight of the less probable particles, which can be useful to describe the posterior.
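The way Eq. (10), the windowed prior and the division by the proposal combine into the weight of Eq. (1) can be sketched for a single corner as follows. Several details here are assumptions made only for illustration: the window is treated as a square of half-width equal to the largest KLT displacement, the prior is left unnormalized (1 inside the window, 0 outside), and the proposal is an isotropic two-dimensional Gaussian. The function names are hypothetical.

```python
import math


def multicue_likelihood(p_shape, p_color, alpha=0.5):
    """Eq. (10) with beta = 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_shape + (1.0 - alpha) * p_color


def window_prior(displacement, window):
    """Unnormalized uniform windowed prior: 1 inside the window, 0 outside."""
    return 1.0 if all(abs(d) <= window for d in displacement) else 0.0


def gaussian_pdf(point, mean, sigma):
    """Isotropic 2-D Gaussian density used as the proposal q."""
    d2 = (point[0] - mean[0]) ** 2 + (point[1] - mean[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)


def updated_weight(prev_weight, likelihood, prior, proposal):
    """Unnormalized weight of Eq. (1)."""
    return prev_weight * likelihood * prior / proposal
```

Because the proposal density appears in the denominator, a particle drawn far from the Gaussian mean receives a larger weight than one drawn at the mean when their likelihood and prior values are equal, which is the boosting effect described above.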

4.3. Initialization and Resampling

To initialize the tracker, a prior probability from which the first particles are drawn has to be defined. A Gaussian is chosen, centered on a set of corners $\mathbf{x}_{0(i)}|_{i=1,\ldots,n_x/2}$ that are distributed so as to represent the shape of the object well. These corners are selected by dividing the initial bounding box into $n_x/2$ zones so that each of them contains an equal number of corners; one of the corners in each zone is then randomly chosen. In this work the first bounding box is manually selected. However, it is worth noting that the use of KLT features would allow the algorithm to be initialized automatically, using a clustering approach that groups features with similar motion and similar position.
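The zone-based selection of the initial corners can be sketched as follows. The text does not specify the geometry of the zones, so this sketch assumes vertical strips obtained by sorting the detected corners by their x coordinate and cutting the sorted list into groups of (nearly) equal size; the function name is hypothetical.

```python
import random


def initial_corners(corners, n_zones, rng):
    """Pick one corner at random from each of n_zones equal-count zones.

    corners is a list of (x, y) tuples detected inside the initial
    bounding box; zones are vertical strips (an assumption of this sketch).
    """
    if len(corners) < n_zones:
        raise ValueError("fewer detected corners than zones")
    ordered = sorted(corners)
    base, extra = divmod(len(ordered), n_zones)
    chosen, start = [], 0
    for zone in range(n_zones):
        size = base + (1 if zone < extra else 0)
        chosen.append(rng.choice(ordered[start:start + size]))
        start += size
    return chosen
```

For the 16-dimensional state used in the experiments, `n_zones` would be 8.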

After initialization, the particle filter algorithm is applied at each step, and a resampling step is performed every time the estimated covariance is above a predefined threshold, in order to discard the particles that do not represent the object model well.
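A covariance-triggered resampling step can be sketched as follows. The paper does not state how the covariance is summarized or which resampling scheme is used, so two choices here are assumptions: the trace of the weighted sample covariance serves as the scalar compared with the threshold, and systematic resampling is used as the redistribution scheme. The weights are assumed to be normalized, and the function names are hypothetical.

```python
import random


def weighted_spread(particles, weights):
    """Trace of the weighted covariance of the particle set."""
    dim = len(particles[0])
    mean = [sum(w * x[d] for x, w in zip(particles, weights)) for d in range(dim)]
    return sum(w * sum((x[d] - mean[d]) ** 2 for d in range(dim))
               for x, w in zip(particles, weights))


def systematic_resample(particles, weights, rng):
    """Systematic resampling: one random offset, then evenly spaced picks."""
    n = len(particles)
    step = 1.0 / n
    u = rng.random() * step
    resampled, cumulative, i = [], weights[0], 0
    for _ in range(n):
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(particles[i])
        u += step
    return resampled, [step] * n


def maybe_resample(particles, weights, threshold, rng):
    """Resample only when the spread of the particle set exceeds the threshold."""
    if weighted_spread(particles, weights) > threshold:
        return systematic_resample(particles, weights, rng)
    return particles, weights
```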

5. Results

In this section experimental results are presented on sequences taken from the public data sets PETS06 [3], AVSS iLIDS [1] and ETISEO [2], and on a moving camera sequence. These videos were chosen to show problems of interest in the feature-based tracking domain. In AVSS iLIDS, ETISEO and Moving Camera the target is a rigid object (a vehicle), and rotations, zooming and dezooming (due to the relative motion of the target with respect to the camera) are present. The video sequences of PETS06 present partial occlusions of a human target by other targets with similar appearance.

The performance of the proposed method is compared against two other trackers: a feature-based algorithm called MAPT [9] and the Mean Shift tracker [8]. The algorithms are initialized using the ground truth of the target, and no change detection technique is used during tracking. The Particle Filter state is 16-dimensional (i.e. 8 corners), the color histograms are computed in the RGB space with 8×8×8 bins, and the kernel is the Epanechnikov one. The proposed approach and MAPT use 160 particles to represent the posterior density function.

Table 1. Comparison of position error (in px on a 352x288 image) and average computational complexity (non-optimized C++ code; Pentium Dual Core 2.1 GHz, 3 GB RAM).

| Sequence | Proposed | MAPT | Mean Shift |
|---|---|---|---|
| PETS06 (209 frames) | 5.51 | 7.6 | 9.6 |
| AVSS07 (200 frames) | 2.7 | 3.7 | 3.5 |
| ETISEO (116 frames) | 4.7 | 10.7 | 7.7 |
| Mov. Cam. (430 frames) | 7.2 | 14.1 | 8.5 |
| Computational complexity | 7.2 f/s | 12.1 f/s | 22.2 f/s |

Table 1 shows the superior results of the proposed method with respect to the other approaches considered. These results evaluate the performance of the method while tracking a single target in each scene. It can be noticed that the computational complexity of the proposed method is the highest among the three algorithms. It must be pointed out that 50% of the processing time is due to KLT feature detection and tracking. However, if multiple targets are present this cost becomes less significant, because the corners are already available for all the targets. Therefore the complexity of the proposed method increases less than linearly with the number of targets.

The proposed algorithm is also very accurate in handling scale deformations and rotation, as demonstrated by the results in Figure 1. It can be noticed that the presented approach is able to manage the significant scale reduction and the 90 degree rotation that make MAPT and Mean Shift produce imprecise results. It is worth noting that this sequence is very difficult for corner-based tracking, since it is affected by camera motion that renders KLT tracking unstable. In Figure 2 the temporal evolution of the mean distance of the corners of the state $\mathbf{x}_{k(i)}$ from the center of the bounding box, given by the centroid of the corners, is plotted. The scale reduction is easy to detect in the first part of the video sequence, while the vehicle is moving away from the camera. From frame 169 onwards the horizontal motion is evident from the fact that the distance remains constant over time.

Figure 1. Results of the three considered algorithms for the iLIDS sequence. (a)(b)(c) proposed method; (d)(e)(f) MAPT; (g)(h)(i) Mean Shift

Figure 2. Mean distance of the corners of the state from the center for the iLIDS sequence. This graph indicates the capability of coping with target scale reductions

In the sequence from the ETISEO data set shown in Figure 3, the vehicle is moving towards the camera and turns slightly. This shows the capability of the proposed approach to handle scale changes also in the case of significant target enlargement. Figure 4 shows the mean distance of the corners of the state from the center of the bounding box. In this case too, the target enlargement is appropriately represented.
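The scale indicator plotted in Figures 2 and 4 is straightforward to compute from the estimated state. The sketch below assumes that the state is available as a list of `(x, y)` corner tuples; the function names are hypothetical.

```python
import math


def centroid(corners):
    """Centroid of the corners, used as the center of the target."""
    n = len(corners)
    return (sum(x for x, _ in corners) / n, sum(y for _, y in corners) / n)


def mean_corner_distance(corners):
    """Mean Euclidean distance of the corners from their centroid."""
    cx, cy = centroid(corners)
    return sum(math.hypot(x - cx, y - cy) for x, y in corners) / len(corners)
```

The ratio between the values obtained at two frames gives a rough estimate of the relative scale change of the target: a value that halves suggests that the target appears at about half its previous size.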

Figure 3. Results of the proposed algorithm in the ETISEO sequence. (a) frame 5; (b) frame 55; (c) frame 80; (d) frame 100

Figure 4. Mean distance of the corners of the state from the center for the ETISEO sequence. This graph indicates the capability of coping with target scale enlargement

The effectiveness of the transition model in handling partial occlusions is shown in Figure 5. In particular, Figures 5(a) and 5(b) show the particles drawn from the importance density function of Equation 5. The graph in Figure 5(c) shows which of the three importance density functions is used frame by frame. When a state corner is associated with a tracked feature (blue line), the first pdf of Equation 5 is employed. When the feature is not tracked, instead, the corner is either predicted with the second-order autoregressive model (purple line) or, when tracking has been lost for many frames, re-associated with a new feature (yellow line). It can be seen that when occlusions occur some of the features are not tracked, and the prediction process is therefore based on the other two models.


Figure 5. Transition model behavior during partial occlusions. (a) particle distributions in frame 32; (b) particle distributions in frame 67; (c) graph comparing the number of state corners associated with a tracked feature, predicted with the second-order autoregressive model, and re-associated with a new feature.

In Figure 6 multiple trackers are initialized with the ground truth. It can be noticed that the tracker is able to track the vehicles while handling camera shakes, partial occlusions, and object shrinking and enlargement. As already mentioned, the computational complexity of the proposed approach is not substantially affected by an increase in the number of targets: whereas in the single-target case (see Table 1) the tracker runs at 7.2 fps, in this case it runs at 5 fps.

Figure 6. Results of the proposed algorithm on the iLIDS sequence. (a) frame 5; (b) frame 15; (c) frame 40; (d) frame 65; (e) frame 105; (f) frame 175.

6. Conclusions

In this work a particle filter tracking approach based on a sparse point shape model is presented. A new mixed transition model is proposed to track sub-parts of the objects and to handle partial occlusions, exploiting point features extracted and tracked with the Kanade-Lucas-Tomasi (KLT) algorithm. The observation model is jointly characterized by shape and color cues and avoids considering misleading observations due to target superimpositions. Results on real-world sequences from challenging data sets demonstrate the robustness of the method against several typical tracking problems (rotations, scale changes, partial occlusions, etc.).

Acknowledgments

The work was partially supported by the following project: VIsion-based Computer Aided Safe Transportation, funded by the Italian Ministry of University and Research (FIRB Vicast Project).

References

[1] AVSS iLIDS Dataset. Available at: http://www.elec.qmul.ac.uk/staffinfo/andrea/avss2007_ss_challenge.html.

[2] ETISEO Dataset. Available at: http://www-sop.inria.fr/orion/ETISEO/.

[3] PETS 2006 Dataset. Available at: http://www.pets2006.net/.

[4] A. Adam, E. Rivlin, and I. Shimshoni. Robust fragments-based tracking using the integral histogram. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, CVPR’06, volume 1, pages 798–805, 2006.

[5] E. Arnaud and E. Memin. Partial linear Gaussian models for tracking in image sequences using sequential Monte Carlo methods. International Journal of Computer Vision, 74(1):75–102, 2007.

[6] S. Birchfield and S. Pundlik. Joint tracking of features and edges. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, CVPR’08, 2008.

[7] A. Buchanan and A. Fitzgibbon. Combining local and global motion models for feature point tracking. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, CVPR’07, 2007.

[8] D. Comaniciu, V. Ramesh, and P. Meer. Kernel based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):564–577, 2003.

[9] A. Dore, A. Beoldo, and C. S. Regazzoni. Multiple cue adaptive tracking of deformable objects with particle filter. In Proc. of International Conference on Image Processing, ICIP 2008, San Diego, CA, USA, October 2008.

[10] Z. Fan, M. Yang, and Y. Wu. Multiple collaborative kernel tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(7):1268–1273, 2007.

[11] M. Isard and A. Blake. Condensation - conditional density propagation for visual tracking. International Journal of Computer Vision, 29:5–28, 1998.

[12] N. K. Kanhere, S. J. Pundlik, and S. T. Birchfield. Vehicle segmentation and tracking from a low-angle off-axis camera. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, CVPR’05, volume 2, pages 1152–1157, 2005.

[13] T. Mathes and J. Piater. Robust non-rigid object tracking using point distribution models. In Proc. of British Machine Vision Conference, BMVC’05, pages 849–858, 2005.

[14] F. Porikli. Integral histogram: A fast way to extract histograms in cartesian spaces. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, CVPR’05, pages 829–836, 2005.

[15] F. Porikli, O. Tuzel, and P. Meer. Covariance tracking using model update based on lie algebra. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, CVPR’05, 2005.

[16] W. Qu, D. Schonfeld, and M. Mohamed. Real-time distributed multi-object tracking using multiple interactive trackers and a magnetic-inertia potential model. IEEE Transactions on Multimedia, 9(3):511–519, 2007.

[17] B. Ristic, S. Arulampalam, and N. Gordon. Beyond the Kalman Filter. Artech House Publishers, 2004.

[18] K. Shafique and M. Shah. A noniterative greedy algorithm for multiframe point correspondence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(1):51–65, 2005.

[19] J. Shi and C. Tomasi. Good features to track. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, CVPR’94, pages 593–600, 1994.

[20] P. Tissainayagam and D. Suter. Object tracking in image sequences using point features. Pattern Recognition, 38(1):105–113, January 2005.

[21] C. Tomasi and T. Kanade. Detection and tracking of point features. Technical Report CMU-CS-91-132, Carnegie Mellon University, 1991.

[22] T. Yang, S. Z. Li, Q. Pan, and J. Li. Real-time multiple objects tracking with occlusion handling in dynamic scenes. In IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2005, volume 1, pages 970–975, 2005.

[23] A. Yilmaz, O. Javed, and M. Shah. Object tracking: A survey. ACM Computing Surveys, 38(4), 2006.

[24] A. Yilmaz, X. Li, and M. Shah. Contour based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26:1531–1536, 2004.

[25] M. Yokoyama and T. Poggio. A contour-based moving object detection and tracking. In Proc. of IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, PETS’05, pages 271–276, 2005.