
Multimed Tools Appl
DOI 10.1007/s11042-012-0994-3

Efficient tracking using a robust motion estimation technique

Constantinos Lalos · Athanasios Voulodimos · Anastasios Doulamis · Theodora Varvarigou

© Springer Science+Business Media, LLC 2012

Abstract Camera based supervision is a critical part of event detection and analysis applications. However, visual tracking still remains one of the biggest challenges in the area of computer vision, although it has been studied extensively in previous years. In this paper we propose a robust tracking approach based on object flow, which is a motion model for estimating both the displacement and the direction of an object of interest. In addition, an observation model that utilizes a generative prior is adopted to tackle the pitfalls that derive from the appearance changes of the object under study. The efficiency of our technique is demonstrated using sequences captured in a complex industrial environment. The experimental results show that the proposed algorithm is sound, yielding improved performance in comparison with other tracking approaches.

Keywords Object tracking · Temporal inference · Motion estimation

1 Introduction

Humans mostly perceive the world by exploiting visual information to derive event outcomes and understand what is going on in the environment. For this reason, tools for automatic video analysis are nowadays considered one of the most challenging research issues in the computer vision community. Current trends include intelligent camera networks for extracting and interpreting events from the visual data, which can (i) survey public areas by focusing only on salient actions (e.g. avoid accidents, set alarms when safety rules are not strictly observed, signal production ineffectiveness when product assembly quality is poor), and (ii) improve the efficiency of industrial production lines by preventing erroneous construction work-flows.

C. Lalos (B) · A. Voulodimos · A. Doulamis · T. Varvarigou
School of Electrical and Computer Engineering, National Technical University of Athens,
9, Iroon Polytechniou Street, 157 80, Athens, Greece
e-mail: [email protected]


Intelligent vision systems are becoming more and more important in manufacturing industries. In that field significant progress has been made with regard to the interpretation of the information extracted from the moving objects inside an industrial environment (e.g. human operators, machines, etc.) [34, 37]. Early work on industrial vision systems targeted energy saving control strategies [29]. Over the years significant progress has been made with respect to the detection of production deviations to avoid waste [12]. In addition, human labour can be decreased, since human inspectors can be efficiently substituted by intelligent vision systems. In general, these systems are evolving with the objective to further ensure production scheduling [4, 10], the quality assurance of an automated control system [33] and the safety of human operators in human-robot collaborative production systems [30].

However, extracting specific information from video streams is generally a challenging task, since generic mathematical models cannot easily interpret complex data. For instance, tracking algorithms are often distracted by noisy visual information, so automatic tracking of salient objects in a video scene is still a challenging problem. In this paper we mainly consider an industrial environment, where both the appearance and the motion of the background and the targets alter significantly. Figure 1 depicts some frames from a real automobile industrial plant [36], where the complexity of the visual content is evident from a mere inspection of the frame sequence. Traditional motion analysis algorithms cannot be directly applied to this sequence since (i) moving persons have unpredictable motion and similar colour to the background (i.e. there is low discriminant information between the foreground and background pixel values), (ii) the background changes abruptly (e.g. sparks from welding machines, moving robots, etc.), and (iii) there are discontinuities in the object model due to occlusions between foreground objects and outliers in the background. Therefore, various state-of-the-art techniques may fail to track the object of interest in such a challenging environment. To this end, new tracking algorithms should be developed to address the aforementioned issues.

This paper introduces a robust object tracking approach to overcome the difficulties in image sequences captured from complex industrial environments like the one illustrated in Fig. 1. In more detail, the technique of object flow [16] is embedded in a particle filter framework. Object flow is a classification based approach for directly obtaining the displacement and the direction of an object, whereas other irrelevant movements inside the scene are ignored. Our tracking approach uses object flow both as a motion and as an observation model. However, to tackle pitfalls related to unreliable displacement detections due to the complexity of an industrial environment (see Fig. 1), an additional observation model is introduced.

Fig. 1 Examples of images captured at an automobile industrial plant [36]. Common tracking approaches cannot be directly applied due to the challenging environment


This observation model utilizes a generative prior (i.e. a learned model) based on the Gaussian Process Latent Variable Model (GPLVM). In our approach the object under study (i.e. moving humans) undergoes various appearance changes, thus GPLVM can provide an efficient description of its appearance state.

Tracking is by its nature a semi-automatic process, since a user is often required to define the initial region which contains the object of interest. In our case, the tracker is interwoven with the object localization algorithms that are used to compute the object flows. This constitutes a major innovation of this paper, since the derived flows are tailored to specific properties of an object. Incorporation of an automatic self-initialization procedure [7] can improve the automation of the algorithm by making it robust to dynamic changes of the environment.

The remainder of the paper is organized as follows. Firstly, the related work is presented in Section 2. Then, our tracking approach is described in detail in Section 3. Finally, the experimental results and the conclusions are elaborated in Sections 4 and 5 respectively.

2 Previous work

Object tracking is an active research topic in computer vision. In this section we present a detailed description of the related current state-of-the-art techniques. Many approaches in the literature [8, 14, 21, 23] employ particle filters for tracking a moving object in a scene. This technique heavily depends on the selection of the proposal distribution (i.e. motion model), because it guides the particles into regions where the measurements will be taken [2]. An inappropriate selection of a proposal distribution may reduce sampling efficiency, since many particles can be wasted. A common assumption is to sample from the motion estimates that are provided using a constant velocity model [5]. However, this approach may impact the performance of a particle filter, especially when the behaviour of a tracked object violates temporal continuity, which is the case for abrupt or unpredictable movements. This problem can be tackled by coupling an object detector with a tracker [7, 19, 40]. Robustness to temporal violations can also be achieved using a proposal distribution based on a gradient-based multi-resolution estimation method [13]. Furthermore, a geometrically defined proposal distribution can be adopted, as described in [15]. In our approach, the proposal distribution is based on the object flow method. In comparison with the aforementioned approaches, object flow has the ability to focus only on moving objects of interest in the scene, whereas other irrelevant movements are ignored.

As already mentioned, the selection of the motion model is of significant importance, but an accurate observation model also assists in the development of an efficient tracking system [1]. For instance, variations in appearance can be quantified by an observation model that utilizes a classification approach. Classification approaches are mainly divided into two categories: generative and discriminative. Discriminative algorithms [6] simplify the process of inference by focusing on building a model for predicting the state conditional directly (i.e. whether a training sample is the appropriate one or not). In particular, "tracking by detection" methods fall into this category, since a discriminative classifier is trained to separate the object from the background. Babenko et al. [3] use a discriminative learning algorithm based on Multiple Instance Learning (MIL). However, discriminative methods cannot provide a detailed description of the appearance state.

On the other hand, in generative approaches [20] inference is handled in a constructive manner. This is achieved by learning a model for the target appearance (i.e. a prior) to evaluate a candidate sample. Inference is handled by determining whether the current target estimate is still valid given this model. More specifically, an appearance model can be defined using generative approaches, such as Principal Components Analysis (PCA) [32]. Then, noisy features in the current estimate can be detected by projecting their data onto eigenimages. However, much more efficient approaches have been proposed in the literature. In particular, Leonardis et al. [20] introduce an alternative way for determining the coefficients of the eigenimages, and a subsampling method is proposed as a selection procedure. Woodley et al. [39] also use a generative model computed using local non-negative matrix factorization (LNMF) to avoid regions that contain outliers. These problems can also be handled by a two-stage sparse optimization [22], which jointly minimizes the target reconstruction error. However, the case of dramatic appearance changes cannot be handled by linear generative models. In this paper, we adopt a non-linear generative approach, GPLVM [17], which can efficiently discover a low dimensional manifold given only a small number of training samples.

Vision systems in industrial environments often aim at the automatic recognition of visually observable procedures and work-flows. These systems often rely on tracking algorithms, which are used as an input for the work-flow recognition modules. However, this is a challenging problem due to the difficulty of tracking moving targets for a long duration inside a complex scene. For this purpose, a framework for the classification of visual tasks in industrial environments has been proposed [37] that bypasses object tracking by employing a different feature extraction mechanism. More specifically, this method automatically segments the input stream and classifies the resulting segments using prior knowledge and hidden Markov models. In addition, the task of real-time work-flow recognition can be achieved by a simple scene descriptor and an efficient time series analysis method [34]. Finally, there are approaches [25, 26] that focus only on the recognition of assembly parts in manufacturing lines.

3 Tracking framework

In this section we describe a robust tracking approach that on one hand employs object flow as a motion and observation model, and on the other hand uses a generative prior as an additional observation model.

3.1 Particle filter

Particle filters have been proven to be a good solution for tracking. From a Bayesian perspective, the tracking problem is defined as the recursive calculation of some degree of belief of the hidden state, related to a data sequence of noisy observations (i.e. the posterior distribution). In the framework of particle filters, the posterior distribution $p(X_{0:t} \mid Y_{1:t})$ of an observation sequence $Y_{1:t}$ conditioned on a hidden state sequence $X_{0:t}$ can be estimated using Monte Carlo simulations for the prediction step and an appropriate update step. Random samples are simulated by a proposal function $X_t^i \sim q(\,\cdot\,)$ using the importance sampling technique. For each sample the weight update equation can be expressed as,

$$w_t^i = w_{t-1}^i \, \frac{p(Y_t \mid X_{0:t}^i, Y_{1:t-1}) \; p(X_t^i \mid X_{0:t-1}^i, Y_{1:t-1})}{q(X_t^i \mid X_{0:t-1}^i, Y_{1:t})} \qquad (1)$$

However, it is often assumed that the observations are independent and the state sequence follows a first order Markov chain model, i.e.,

$$p(Y_t \mid X_{0:t}^i, Y_{1:t-1}) = p(Y_t \mid X_t^i), \qquad p(X_t^i \mid X_{0:t-1}^i, Y_{1:t-1}) = p(X_t^i \mid X_{t-1}^i) \qquad (2)$$

One of the issues that arise with the implementation of the particle filter is the increase of the weight variance. This can be solved by choosing an appropriate proposal function. More specifically, a common approach is to choose the importance function to be the prior, i.e. $q(X_t^i \mid X_{0:t-1}^i, Y_{1:t}) = p(X_t \mid X_{t-1}^i)$. In addition, a solution that is often used is the introduction of an additional re-sampling with replacement step [24].
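With the prior chosen as proposal, the weight update in (1) reduces to $w_t^i \propto w_{t-1}^i \, p(Y_t \mid X_t^i)$. A minimal sketch of one such predict-update-resample cycle is given below; the Gaussian motion and observation models, the function name and the effective-sample-size threshold are illustrative assumptions, not the models used in this paper:

```python
import numpy as np

def particle_filter_step(particles, weights, observation,
                         motion_std=2.0, obs_std=5.0, rng=None):
    """One predict-update-resample cycle of a particle filter.

    With the prior as proposal, q(X_t | X_{0:t-1}, Y_{1:t}) = p(X_t | X_{t-1}),
    the weight update (1) reduces to w_t^i = w_{t-1}^i * p(Y_t | X_t^i).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    # Predict: sample each particle from the state transition density.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: weight by a Gaussian observation likelihood around the measurement.
    sq_err = np.sum((particles - observation) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * sq_err / obs_std ** 2)
    weights = weights / weights.sum()
    # Resample with replacement when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = rng.choice(len(weights), size=len(weights), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights
```

Sampling from the prior keeps the update simple but wastes particles under abrupt motion, which is exactly the weakness the object flow proposal of Section 3.4 addresses.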

3.2 Object flow

In this section, we present a classification-based method [16] for obtaining a motion representation known as object flow, which is useful for a variety of visual applications. This approach can also be used in object tracking, to estimate the displacement of a moving object of interest. In addition, it can offer improved estimation performance, since it is not affected by other irrelevant movements inside the scene. In this paper, object flow is integrated into a particle filter tracker and it can be used both for sampling a new set of particles and for taking measurements (i.e. observations).

A margin-based binary classifier can be trained on object displacement using a set of appropriate positive and negative samples. More specifically, the positive samples $X^+$ carry information about the way object appearance transforms through time. They are created by concatenating two samples that contain the object of interest at different time instances. These samples can be provided either manually or by an on-line object verifier. On the other hand, the negative training samples are divided into two subsets, $X^- = X^-_{back} \cup X^-_{obj}$. The first subset $X^-_{back}$ contains image pairs from the background. The second subset $X^-_{obj}$ contains samples pairing the object in the current frame with a patch that contains only a portion of it in a different frame; these assist the classifier in suppressing local maxima around the real object region. Examples of positive and negative samples are depicted in Fig. 2.

Let $C(p_t, p_{t-1})$ be the classifier response for a pair of patches, where $p_t$ is a patch in the current image and $p_{t-1}$ is a patch belonging to the neighbourhood region of local patches $\Omega$ in the previous image. We define the displacement $\delta x$ and $\delta y$ of an object of interest in the $x$ and $y$ directions respectively, as the weighted sum of distances within the local region $\Omega$, i.e.,

$$\begin{pmatrix} \delta x_{obj}(p_t) \\ \delta y_{obj}(p_t) \end{pmatrix} = \frac{1}{\sum_{p_{t-1} \in \Omega} C(p_t, p_{t-1})} \sum_{p_{t-1} \in \Omega} C(p_t, p_{t-1}) \begin{pmatrix} dx \\ dy \end{pmatrix} \qquad (3)$$

where $(dx, dy)$ is the offset between $p_t$ and $p_{t-1}$.


Fig. 2 Illustrative example of the typical training samples for training a classifier on object flow

Therefore, position estimates for a moving object in the current image can be calculated as:

$$(x, y)_t = \left(\delta x_{obj}(p_t),\, \delta y_{obj}(p_t)\right) + (x, y)_{t-1} \qquad (4)$$

In order to reduce outliers, local region displacements within the region $\Omega$ have to attain a significant positive classifier response, i.e.,

$$\bar{C}_{obj}(p_t) = \frac{1}{|\Omega|} \sum_{p_{t-1} \in \Omega} C^+(p_t, p_{t-1})^2, \quad \text{where } C^+(p_t, p_{t-1}) = \max\left(0,\, C(p_t, p_{t-1})\right) \qquad (5)$$
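As a sketch, (3)-(5) amount to a response-weighted average of candidate offsets over $\Omega$. The function name below is hypothetical, and the clipped responses $\max(0, C)$ are used as the weights of (3) as well, a simplification on top of the formulation above:

```python
import numpy as np

def object_flow_displacement(responses, offsets):
    """Estimate object displacement from classifier responses over Omega.

    responses: array of scores C(p_t, p_{t-1}), one per candidate patch
               p_{t-1} in the neighbourhood region Omega.
    offsets:   (len(Omega), 2) array of (dx, dy) offsets of each candidate.
    Returns the displacement of (3) and the mean squared clipped response
    of (5) as a confidence score.
    """
    c = np.maximum(0.0, responses)        # suppress negative responses
    total = c.sum()
    if total == 0.0:                      # no positive evidence in Omega
        return np.zeros(2), 0.0
    delta = (c[:, None] * offsets).sum(axis=0) / total   # eq. (3)
    confidence = np.mean(c ** 2)                         # eq. (5)
    return delta, confidence
```

The position update of (4) is then simply the previous position plus the returned displacement.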

3.3 Observation model based on a generative prior

We introduce an additional observation model, to measure the correlation between an input sample and a learned generative representation of the appearance model. This representation can be obtained using a Gaussian Process Latent Variable Model (GPLVM) [17] and it can handle dramatic appearance changes by discovering a low dimensional manifold given only a small number of training samples.

3.3.1 Gaussian process latent variable model

In this section we briefly describe the basics of the GPLVM. GPLVM fits a high dimensional data space on a lower dimensional latent space, using Gaussian processes. This technique is often used for dimensionality reduction [9], but it can also provide a powerful generative probabilistic prior in various visual applications [1, 38]. In contrast to PCA [31], GPLVM can also deal with cases where the assumption of identically distributed data points is violated.¹

For a non-linear GPLVM, an $N \times N$ radial basis function (RBF) kernel matrix with entries $k_{ij} = k(x_i, x_j)$ can be used to map the latent-space points $X \in \mathbb{R}^{N \times q}$ to data-space points $Y \in \mathbb{R}^{N \times d}$ ($q \ll d$). In detail, an entry $k_{ij}(\,\cdot\,,\,\cdot\,)$ can be defined as,

$$k_{ij}(x_i, x_j) = \beta_1 \exp\left(-\frac{\beta_2}{2} \| x_i - x_j \|^2\right) + \frac{\delta_{x_i, x_j}}{\beta_3} + \beta_4 \qquad (6)$$

1This happens when the data space contains different distributions on each output dimension.


where $x_i$ and $x_j$ are the $i$th and $j$th columns of the matrix $X^T$ and $\beta_{1,\dots,4}$ are the kernel hyperparameters. GPLVM minimizes the following log-likelihood (objective function) w.r.t. $X \in \mathbb{R}^{N \times q}$ and $\beta_{1,\dots,4}$,

$$L = -\left(\frac{d}{2} \ln |K_Y| + \frac{1}{2} \operatorname{tr}\left(K_Y^{-1} Y Y^T\right) + \frac{1}{2} \sum_{i=1}^{N} \|x_i\|^2 + \sum_{i=1}^{4} \ln \beta_i\right) \qquad (7)$$
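As a concrete sketch, the bracketed term of (7) (i.e. $-L$, the quantity to be minimized) can be evaluated with the kernel of (6); the function name and the toy hyperparameter values are assumptions:

```python
import numpy as np

def gplvm_objective(X, Y, betas):
    """Evaluate the bracketed term of (7) with K_Y built from the RBF
    kernel of (6).

    X: latent points (N x q), Y: data points (N x d),
    betas: kernel hyperparameters (beta1, beta2, beta3, beta4).
    """
    b1, b2, b3, b4 = betas
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = b1 * np.exp(-0.5 * b2 * sq) + np.eye(len(X)) / b3 + b4   # eq. (6)
    d = Y.shape[1]
    _, logdet = np.linalg.slogdet(K)                              # ln |K_Y|
    return (0.5 * d * logdet
            + 0.5 * np.trace(np.linalg.solve(K, Y @ Y.T))         # tr(K^-1 Y Y^T)
            + 0.5 * np.sum(X ** 2)                                # latent regularizer
            + np.sum(np.log(betas)))                              # sum of ln beta_i
```

In practice this objective is minimized jointly over the latent points and the hyperparameters, e.g. with the scaled conjugate gradient optimizer used in the experiments.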

Given a new test point $y' \in \mathbb{R}^{1 \times d}$, its low dimensional representation $x' \in \mathbb{R}^{1 \times q}$ can be estimated with an additional optimization step. However, this can be avoided by exploiting GPLVM with back-constraints [18]. This is achieved by minimizing (7) with respect to the following equation,

$$x_{ij} = g_j(y_i; a) = \sum_{m=1}^{N} a_{jm}\, k_{bc}(y_i, y_m) \qquad (8)$$

where $x_{ij}$ is the $j$th element of $x_i$ and $k_{bc}(\,\cdot\,,\,\cdot\,)$ is an RBF kernel, which can be defined as,

$$k_{bc}(y_i, y_m) = \exp\left(-\frac{\gamma}{2} \|y_i - y_m\|^2\right) \qquad (9)$$

Thus the low dimensional representation of a test point $y'$ can be calculated by evaluating $x' = g_j(y'; a)$.

3.3.2 Foreground likelihood modelling

As mentioned before, GPLVM can be exploited to develop a robust observation model. To train a simple GPLVM model (see (7)) in an off-line manner, one can use an image dataset $Y \in \mathbb{R}^{N \times d}$ consisting of a number of representative samples of the object under study at different poses. These representative samples can also be obtained using a key frame extraction technique [27].

The low dimensional representation $x'$ of a patch $p_t$ belonging to a particle $X_t^i$ (see Section 3.1) can be calculated by evaluating its inverse mapping (see (8)). Then, the Mahalanobis distance is used to compare $x'$ with the vector $\mu_X$, which contains the average of each column of the matrix $X \in \mathbb{R}^{N \times q}$. A similar approach has also been used in [28, 39]. This measure can be expressed as,

$$d(x', \mu_X) = \sqrt{(x' - \mu_X)\, S_X^{-1}\, (x' - \mu_X)^T} \qquad (10)$$

where $S_X$ is the covariance matrix of $X$.

However, since we are dealing with a probabilistic approach (i.e. particle filters), this distance measure is required to be expressed in terms of a probability distribution. To this end, it is assumed that the distance measure follows an exponential law, as in,

$$p(Y_t^a \mid X_t^i) = \exp\left(-d^2(x', \mu_X)\right) \qquad (11)$$
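Equations (10)-(11) can be sketched as follows, with the sample mean and covariance of the training latents standing in for $\mu_X$ and $S_X$; the function name is hypothetical:

```python
import numpy as np

def appearance_likelihood(x_new, X_latent):
    """Observation likelihood of (11) from the Mahalanobis distance of (10).

    X_latent: latent training points (N x q); x_new: latent representation
    of a candidate patch, obtained via the back-constrained mapping of (8).
    """
    mu = X_latent.mean(axis=0)                          # column-wise mean, mu_X
    S = np.atleast_2d(np.cov(X_latent, rowvar=False))   # covariance S_X
    diff = x_new - mu
    d2 = diff @ np.linalg.solve(S, diff)                # squared Mahalanobis distance
    return np.exp(-d2)                                  # eq. (11)
```

The likelihood is 1 when the candidate lies at the centre of the learned appearance manifold and decays exponentially with its Mahalanobis distance from it.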

3.4 Integrating object flow with motion particle filter

In this section we describe a framework for integrating a motion estimator into a particle filter tracker. As mentioned before, it is often convenient to choose the importance function to be the state transition density (see (2)). However, when the tracked object makes an abrupt or unpredictable movement, the state dynamics can be inefficient and many particles can be wasted. Therefore, the choice of a robust importance function is critical for the performance of a particle filter. In this paper, we use the motion estimates provided by the proposed object flow method as a proposal distribution (see Section 3.2). More specifically, importance sampling is performed using the position estimates that are calculated using (4). Then, the new particles are propagated using the normal distributions $N(x_t; \hat{x}_t, \sigma_x^2)$ and $N(y_t; \hat{y}_t, \sigma_y^2)$, with predefined variances $\sigma_x^2$ and $\sigma_y^2$ and with means equal to the predicted object displacement on the $x_t$ and $y_t$ axes respectively.

In our tracking algorithm, we define the object state as $X_t = \{x_t, y_t\}$, which consists of the 2D image position $(x_t, y_t)$ at time instance $t$. For each particle, we can calculate its weight using the classifier response defined in (5) and the observation likelihood term defined in (11). Therefore, the final weight update equation can be extended as,

$$w_t^i \propto w_{t-1}^i \; p\left(Y_t^{obj} \mid X_t^i\right) p\left(Y_t^a \mid X_t^i\right) \qquad (12)$$

where $p(Y_t^{obj} \mid X_t^i) = \bar{C}_{obj}(p_t)$. In addition, re-sampling is performed at each time step. More specifically, particles with negligible weight are eliminated and replaced by those whose weight contributes significantly to the posterior distribution. The approach of systematic re-sampling is adopted, since it is rather efficient and straightforward to implement.
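Systematic re-sampling draws a single uniform offset and sweeps $N$ evenly spaced pointers over the cumulative weights, which is why it is cheap and easy to implement. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Return indices of the particles kept after systematic re-sampling."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(weights)
    # One random offset, then n evenly spaced pointers in [0, 1).
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0          # guard against floating-point round-off
    return np.searchsorted(cumulative, positions)
```

Particles with negligible weight are unlikely to be hit by any pointer, while heavily weighted particles are duplicated, as the re-sampling step described above requires.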

4 Experimental results

In this section we present qualitative and quantitative experimental results of our tracking approach. In order to verify the robustness of the proposed method, we use image sequences captured in a very challenging real automobile industrial environment [36] (see Fig. 1) containing different conditions and scenarios. In case of tracker drifting, we may need to re-initialize the algorithm, either by having expert users re-set the tracker or by incorporating automatic self-initialization strategies like the one proposed in [7]. No on-line learning is performed during runtime, thus errors cannot accumulate over time leading to pronounced target drift (i.e. the appearance adapting to another moving object or the background). Furthermore, we compare our results to other methods, such as a particle filter with a color model (i.e. a condensation tracker), two different variations of the on-line AdaBoost (OAB) algorithm [11] and the MIL tracker [3]. All experiments are performed on an i7 2.67 GHz workstation with 4 GB of memory.

Our tracking approach is also capable of dealing with multiple moving objects. Object flow can provide motion information for all the moving objects of interest inside the scene. This fact is also evident in Fig. 3 (see first column, third row). However, problems may occur in very crowded conditions, especially when multiple objects of interest are moving in the same direction.

The proposed method can be used either with a static or with a moving camera configuration.

Fig. 3 This figure summarizes the benefits of the object flow technique (rows: image seq., optical flow, Object Flow). Whilst optical flow approaches (second row) may be disoriented in complex backgrounds, especially when other objects are moving in the same/different direction as the object of interest, object flow (third row) can simulate the motion field correctly by being able to focus only on the object under study

The approach of boosting for feature selection [35] is used for learning object flow, and as features we use the classical Haar-like features. For all the experiments, the object flow estimation is performed using a dense grid of 81 × 81 overlapping and equally sized cells, and we use a region $\Omega$ that comprises 5 × 5 cells (see Section 3.2). For the particle filter we use 30 particles of fixed width and height and variances $\sigma_x^2, \sigma_y^2 = 2.0$. For the observation model based on a generative prior, the scaled conjugate gradient (SCG) algorithm is iterated for 400 cycles and the parameter $\gamma$ is set to 0.0001 (see (9)). Finally, the matrix $X$ is initialised using PCA.

We captured three different datasets from a panoramic camera located in an automobile industrial plant [36]. These datasets, obtained at a resolution of 680 × 480, contain challenging scenarios with various moving objects in the background. For all the sequences, object flow is trained in an off-line manner using a pool of $|X^+| \approx 1{,}000$ positive samples, $|X^-| \approx 4{,}000$ negative object samples and numerous negative samples from the background. On the other hand, we model the appearance of the object under study using a dataset consisting of 400 samples digitized at 70 × 70.

Illustrative results are depicted in Fig. 4. As can be seen, our tracking approach (sixth row) has the ability to remain focused on the moving target, whereas the stability of the other approaches can be disrupted by the complex background and by other moving objects. The color tracker (fifth row) drifts away from the target, since it is influenced by objects of similar color inside the scene. Similarly, the OAB1 [11] and MIL [3] on-line tracking approaches (second and fourth rows) cannot adapt to significant appearance changes and are disorientated in complex backgrounds. Finally, whilst the OAB5 [11] on-line tracker (third row) performs better than the other two on-line trackers, it does not remain completely focused on the target during the entire image sequence.

Fig. 4 Illustrative results (rows: image seq., OAB1, OAB5, MIL, color tracker, our approach). OAB1 and MIL trackers (second and fourth rows) drift away from the target, since they are affected by the complex industrial background. The OAB5 tracker (third row) does not remain focused on the target for the whole image sequence. The colour tracker (fifth row) has unstable performance, since it is influenced by other colours (e.g., sparks from welding machines). Our approach (sixth row) remains focused on the moving target for the whole sequence

In addition, we perform a quantitative comparison of our approach with the aforementioned tracking techniques. For all the sequences, ground truth is created by manually labelling the bounding boxes for the object under study and, for all the frames in the sequence, we calculate the absolute displacement error (in pixels). As can be observed in Fig. 5, our approach has more stable performance than the other tracking methods. This can also be seen in Table 1, where the average position error for all the trackers involved is shown.

Fig. 5 Quantitative evaluation on (a) image sequence 1, (b) image sequence 2 and (c) image sequence 3: our tracking approach has stable performance in all sequences, whereas other approaches drift away from the target, since they are influenced by the complex industrial background

Table 1 The average position error (in pixels) on 3 different sequences, showing the performance of our approach compared to other tracking methods

Average position error (in pixels)   Sequence 1   Sequence 2   Sequence 3
OAB1                                     81.762       13.642       51.154
OAB5                                     8.9843       117.27       25.758
MIL                                      7.1575       16.509       64.905
Color tracker                            19.152       20.082       168.84
Our approach                             5.9877       11.233       13.862

Computational cost The average runtime of the unoptimized C++ implementation is ∼10 fps. More specifically, the computational complexity of the object flow tracker depends on the number of particles used. However, we use a small number of particles (i.e. 30 particles), thus the computational complexity of the object flow tracker remains very low, imposing no overhead against a real-time (or almost real-time) implementation. On the other hand, some computational effort is required to calculate the low dimensional representation of a test point (see Section 3.3.1). However, this issue can be resolved with appropriate code optimization.

5 Conclusions

In this paper we proposed a robust tracking approach suitable for complex industrial environments. We adopted object flow as a motion model, which is similar to optical flow but has the additional ability to ignore other irrelevant movements in the scene. Furthermore, we used a generative representation technique to deal with changes in appearance.

Experimental results demonstrate that, in comparison with other tracking approaches, the proposed algorithm achieves robust performance in challenging scenarios. More specifically, we conducted experiments on a complex real-world dataset captured at an automobile construction company [36], and compared the performance of our approach to other methods. The experiments indicate that the proposed methodology outperforms other approaches, and that the presented approach is reliable and robust in terms of visual complexity and long-time processing.

Future work will focus on the incorporation of new learning strategies such as active learning and transfer learning. The latter may significantly improve the robustness of the tracker, since knowledge of the environment can be exploited during the object monitoring process.

Acknowledgements This research was supported by the European Community Seventh Framework Programme under grant agreement no FP7-ICT-216465 SCOVIS.

References

1. Andriluka M, Roth S, Schiele B (2008) People-tracking-by-detection and people-detection-by-tracking. In: CVPR

2. Arulampalam MS, Maskell S, Gordon N (2002) A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans Signal Process 50:174–188

3. Babenko B, Yang MH, Belongie S (2009) Visual tracking with online multiple instance learning. In: CVPR

4. Balestrino A, Landi A, Pacini L (2006) Vision system for monitoring the production of corrugated cardboard. In: IEEE international conference on control applications, computer aided control system design, pp 626–631

5. Breitenstein MD, Reichlin F, Leibe B, Koller-Meier E, Gool LV (2009) Robust tracking-by-detection using a detector confidence particle filter. In: ICCV

6. Collins R, Liu Y, Leordeanu M (2005) Online selection of discriminative tracking features. PAMI 27:1631–1643

7. Doulamis A (2010) Dynamic tracking re-adjustment: a method for automatic tracking recovery in complex visual environments. Multimed Tools Appl 50:49–73

8. Gao T, Li G, Lian S, Zhang J (2011) Tracking video objects with feature points based particle filtering. Multimed Tools Appl 1–21. doi:10.1007/s11042-010-0676-y

9. Geiger A, Urtasun R, Darrell T (2009) Rank priors for continuous non-linear dimensionality reduction. In: CVPR

10. Gong Z, Ding W, Zou H (2006) Data-logging and monitoring of production auto-lines based on visual-tracking tech. In: 32nd annual conference on IEEE industrial electronics (IECON), 2006, pp 5468–5473

11. Grabner H, Bischof H (2006) On-line boosting and vision. In: CVPR

12. Heleno P, Davies R, Correia BAB, Dinis Ja (2002) A machine vision quality control system for industrial acrylic fibre production. EURASIP J Appl Signal Process 2002:728–735

13. Odobez J-M, Gatica-Perez D, Ba S (2006) Embedding motion in model-based stochastic tracking. IEEE Trans Image Process 15:3515–3531

14. Khan Z, Balch T, Dellaert F (2004) An MCMC-based particle filter for tracking multiple interacting targets. In: ECCV

15. Kwon J, Lee K, Park F (2009) Visual tracking via geometric particle filtering on the affine group with optimal importance functions. In: CVPR

16. Lalos C, Grabner H, Van Gool L, Varvarigou T (2010) Object flow: learning object displacement. In: Proceedings of the tenth international workshop on visual surveillance, ACCV

17. Lawrence ND (2005) Probabilistic non-linear principal component analysis with Gaussian process latent variable models. J Mach Learn Res 6:1783–1816

18. Lawrence ND, Quiñonero Candela J (2006) Local distance preservation in the GP-LVM through back constraints. In: Proceedings of the 23rd international conference on machine learning, ICML

19. Leibe B, Schindler K, Gool LV (2007) Coupled detection and trajectory estimation for multi-object tracking. In: ICCV

20. Leonardis A, Bischof H (2000) Robust recognition using eigenimages. CVIU 78:99–118

21. Li Y, Ai H, Yamashita T, Lao S, Kawade M (2007) Tracking in low frame rate video: a cascade particle filter with discriminative observers of different lifespans. In: CVPR

22. Liu B, Yang L, Huang J, Meer P, Gong L, Kulikowski C (2010) Robust and fast collaborative tracking with two stage sparse optimization. In: ECCV

23. Lu WL, Okuma K, Little JJ (2009) Tracking and recognizing actions of multiple hockey players using the boosted particle filter. Image Vis Comput 27:189–205

24. Maskell S, Gordon N (2001) A tutorial on particle filters for on-line nonlinear/non-Gaussian Bayesian tracking. IEEE Trans Signal Process 50:174–188

25. Mörzinger R, Sardis M, Rosenberg I, Grabner H, Veres G, Bouchrika I, Thaler M, Schuster R, Hofmann A, Thallinger G, Anagnostopoulos V, Kosmopoulos D, Voulodimos A, Lalos C, Doulamis N, Varvarigou T, Zelada RP, Soler IJ, Stalder S, Van Gool L, Middleton L, Sabeur Z, Arbab-Zavar B, Carter J, Nixon M (2010) Tools for semi-automatic monitoring of industrial workflows. In: Proceedings of the first ACM international workshop on analysis and retrieval of tracked events and motion in imagery streams, ARTEMIS '10. ACM, pp 81–86

26. Peña M, López I, Osorio R (2006) Invariant object recognition robot vision system for assembly. In: Electronics, robotics and automotive mechanics conference, vol 1, pp 30–36

27. Panagiotakis C, Doulamis A, Tziritas G (2009) Equivalent key frames selection based on iso-content principles. IEEE Trans Circuits Syst Video Technol 19:447–451

28. Phung SL, Bouzerdoum A, Chai D (2002) A novel skin color model in YCbCr color space and its application to human face detection. In: International conference on image processing, vol 1, pp 289–292

29. Santos-Victor J, Costeira J, Tome J, Sentieiro J (1993) A computer vision system for the characterization and classification of flames in glass furnaces. IEEE Trans Ind Appl 29:470–478

30. Tan J, Arai T (2011) Triple stereo vision system for safety monitoring of human-robot collaboration in cellular manufacturing. In: IEEE international symposium on assembly and manufacturing (ISAM), pp 1–6

31. Tipping ME, Bishop CM (1999) Probabilistic principal component analysis. J R Stat Soc Ser B 61:611–622

32. Turk M, Pentland A (1991) Face recognition using eigenfaces. In: CVPR, pp 586–591

33. Usamentiaga R, Molleda J, Garcia D, Bulnes F (2009) Machine vision system for flatness control feedback. In: Second international conference on machine vision, ICMV '09, pp 105–110. doi:10.1109/ICMV.2009.14

34. Veres G, Grabner H, Middleton L, Gool LV (2010) Automatic workflow monitoring in industrial environments. In: Asian conference on computer vision (ACCV)

35. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: CVPR, pp 511–518

36. Voulodimos A, Kosmopoulos D, Vasileiou G, Sardis E, Doulamis A, Anagnostopoulos V, Lalos C, Varvarigou T (2011) A dataset for workflow recognition in industrial scenes. In: IEEE international conference on image processing (ICIP)

37. Voulodimos A, Kosmopoulos D, Veres G, Grabner H, Gool LV, Varvarigou T (2011) On-line classification of visual tasks for industrial workflow monitoring. Neural Netw 852–860. doi:10.1016/j.neunet.2011.06.001

38. Wang JM, Fleet DJ, Hertzmann A (2007) Gaussian process dynamical models for human motion. IEEE Trans Pattern Anal Mach Intell 30:283–298

39. Woodley T, Stenger B, Cipolla R (2007) Tracking using online feature selection and a local generative model. In: BMVC

40. Yang M, Lv F, Xu W, Gong Y (2009) Detection driven adaptive multi-cue integration for multiple human tracking. In: ICCV

Constantinos Lalos was born in Athens, Greece. He received an Electronic Engineering Diploma in 2005 from the University of Bristol, UK. He is currently pursuing his PhD while working as a research associate in the Distributed Knowledge and Media Systems Laboratory of the faculty of Electrical and Computer Engineering (NTUA). His research interests are mainly focused on the fields of Computer Vision and Content Based Retrieval. He participates in the POLYMNIA, SCOVIS and SemVeillance projects.


Athanasios Voulodimos received his Dipl.-Ing. degree from the School of Electrical and Computer Engineering of the National Technical University of Athens (NTUA) in 2007, ranking among the top 2% of his class. His thesis, entitled “Quality of service and privacy protection in personalized context-aware mobile services”, was awarded the “Thomaidis” award for the best Diploma Thesis in NTUA in the year 2007. In 2010 he acquired an MSc degree in “TechnoEconomic Systems” from the National Technical University of Athens and the University of Piraeus. He is currently pursuing his PhD at the School of Electrical and Computer Engineering of NTUA, in the area of computer vision and machine learning. His research interests lie in the fields of machine learning, computer vision, and also pervasive and cloud computing. Working as a researcher at the Institute of Computer and Communication Systems of NTUA, he has been and is currently involved in national and European research projects, such as MAGNET Beyond, My-e-Director 2012, SCOVIS, and VISION Cloud.

Anastasios Doulamis received the Diploma degree in Electrical and Computer Engineering from the National Technical University of Athens (NTUA) in 1995 with the highest honors. In 2000, he received the PhD degree in electrical and computer engineering from the NTUA. From 1996 to 2000, he was with the Image, Video and Multimedia Lab of the NTUA as a research assistant. From 2001 to 2002, he served his mandatory duty in the Greek army, in the computer center department of the Hellenic Air Force, while in 2002 he joined the NTUA as a senior researcher. His PhD thesis was supported by the Bodosakis Foundation Scholarship. In 2006, he became a tenured Assistant Professor at the Technical University of Crete in the area of multimedia systems.

Dr. Doulamis has received several awards and prizes during his studies, including the Best Greek Student in the field of engineering at the national level in 1995, the Best Graduate Thesis Award in the area of electrical engineering with A. Doulamis in 1996, and several prizes from the National Technical University of Athens, the National Scholarship Foundation and the Technical Chamber of Greece. In 1997, he was given the NTUA Medal as Best Young Engineer. In 2000, he received the best PhD thesis award by the Thomaidion Foundation in conjunction with N. Doulamis.


In 2001, he served as technical program chairman of VLBV’01. He has also served on the program committee of several international conferences and workshops. He is a reviewer for IEEE journals and conferences as well as other leading international journals. He is the author of more than 200 papers in the above areas, in leading international journals and conferences.

His research interests include non-linear analysis, neural networks, multimedia content description, and intelligent techniques for video processing.

Theodora Varvarigou received the B.Tech degree from the National Technical University of Athens, Athens, Greece in 1988, the MS degrees in Electrical Engineering (1989) and in Computer Science (1991) from Stanford University, Stanford, California, and the PhD degree from Stanford University in 1991.

She worked at AT&T Bell Labs, Holmdel, New Jersey between 1991 and 1995. Between 1995 and 1997 she worked as an Assistant Professor at the Technical University of Crete, Chania, Greece. Since 1997 she has been working as an Assistant Professor at the National Technical University of Athens. Her research interests include parallel algorithms and architectures, fault-tolerant computation, optimisation algorithms and content management.