
TRIDIMENSIONAL PROBABILISTIC TRACKING FOR SPORTS

Fabio A. S. Dias∗, Neucimar J. Leite, Siome K. Goldenstein, Ricardo M. L. de Barros

University of Campinas (UNICAMP), P.O. Box 6176, 13084-971, Campinas, SP, Brazil

ABSTRACT

In this work we present a simple but powerful method for multi-camera people tracking, aimed at tracking in collective sports. We use tridimensional information to perform the tracking, a concept not widely explored in this context, and use the video images only to reconstruct the tridimensional information of the current scene. To deal with falsely reconstructed objects, we consider a generalization of the concept of visual rhythm, which transforms the tracking problem into a segmentation problem that is solved by a simple graph-based approach.

Index Terms: Sport Tracking, Tridimensional reconstruction, Probabilistic object identification, Generalized Visual Rhythm, Trajectory definition.

1. INTRODUCTION

This work introduces a new and simple method for people tracking in sports, which uses the tridimensional information of the scene as the base data for tracking. This approach is not widely explored in the general people tracking context and is even less explored in sports tracking.

However, in the sports context this approach can reveal its full potential to deal with occlusions, fast motion and unconstrained movement such as jumping.

We propose a method with few requirements: a reasonable frame rate, color images, and synchronized cameras with good coverage. With a good frame rate we can assume smooth transitions; color images provide more information for object identification; and camera coverage is needed to provide different points of view of the same object, which improves the reconstruction result. Each part of the considered field of action should be covered by at least two cameras.

The main contribution of this work is the generalization of the visual rhythm concept and its use to reduce the complexity of the tracking problem, transforming it into a segmentation problem that makes the spatio-temporal continuity explicit.

This paper is organized as follows: Section 2 places this work among other related works in the considered context; Section 3 introduces a probabilistic object detection method based on Bayes' Law; Section 4 introduces a probabilistic reconstruction method that uses the computed object probabilities; Section 5 introduces the generalization of the Visual Rhythm concept and its application to the tridimensional video; Section 6 describes the completion of small gaps in the sampled data; and Section 7 introduces the method used to estimate each trajectory. In Section 8 we present some experimental results, and in Section 9 we discuss the results and propose some future work.

∗Thanks to FAPESP for funding, project 06/59526-8.

2. RELATED WORK

Table 1 summarizes the related works. The Sport column gives the considered sport, 3D indicates whether the considered information is in the tridimensional domain, Stat.C. indicates the need for static cameras, Sup.C. indicates the need for superior cameras, and B.M. indicates the use of a background model.

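| # | Sport | 3D | Stat.C. | Sup.C. | B.M. |
|---|---|---|---|---|---|
| [1] | Handball | - | X | X | - |
| [2] | Tennis | - | X | - | X |
| [3] | Squash | - | X | X | X |
| [4] | Soccer | - | X | - | - |
| [5] | Soccer | - | X | - | X |
| [6] | Handball | - | X | X | - |
| [7] | Soccer | - | - | - | - |
| [8] | Soccer | - | X | - | X |
| [9] | Soccer | - | X | - | X |
| [10] | Soccer | - | X | - | X |
| [11] | Soccer | - | - | - | X |
| [12] | Soccer | - | X | - | - |
| [13] | People | X | X | - | X |
| [14] | People | X | X | - | X |
| This work | Handball | X | X | - | X |

Table 1. Relevant related works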

This table also includes related work in general people tracking, due to its similarity to the proposed work. In [13], dynamic programming is used to estimate the trajectories, and the moving subjects do not leave the floor. In [14], a very similar approach is used for the reconstruction step, but aimed at people counting.

3. OBJECT DETECTION

The first step of the proposed method is object detection. Assuming static cameras and a static background, we build a statistical background model, modelling the background as pixel-wise Gaussian distributions and the foreground using a histogram in RGB color space.

To improve the classification results and reduce processing time, this step uses only a small subset of the video frames to estimate these distributions, with the sampled frames equally spaced in time. The method then identifies pairs of frames with maximum difference, computed as follows:

$$\forall S_i \in S, \quad \max_{S_j \in S} \sum |S_i - S_j| \qquad (1)$$

Here S represents the subset of frames used to compute the model. These pairs are subtracted, and all pixels with a difference smaller than or equal to a threshold ε are used to model the background. The remaining pixels are used to compute a tridimensional color histogram of the foreground distribution. Additionally, more frames can be used for each comparison frame, improving results for videos with fast motion.
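The pairing and thresholding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: frames are reduced to a single intensity channel for brevity, the names `frame_difference`, `most_different_partner` and `split_pixels` are hypothetical, and the sketch takes the pixel values from the first frame of the pair, a detail the text does not specify.

```python
from typing import List, Tuple

# Assumption: a frame is a list of rows of single-channel intensities.
Frame = List[List[int]]
Sample = Tuple[int, int, int]  # (row, column, intensity)


def frame_difference(a: Frame, b: Frame) -> int:
    """Sum of absolute pixel differences between two frames (the sum in Eq. 1)."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def most_different_partner(frames: List[Frame], i: int) -> int:
    """Index j != i of the sampled frame that maximizes sum |S_i - S_j|."""
    candidates = [j for j in range(len(frames)) if j != i]
    return max(candidates, key=lambda j: frame_difference(frames[i], frames[j]))


def split_pixels(a: Frame, b: Frame, eps: int) -> Tuple[List[Sample], List[Sample]]:
    """Pixels with |a - b| <= eps feed the background model, the rest the foreground histogram."""
    background: List[Sample] = []
    foreground: List[Sample] = []
    for y, (ra, rb) in enumerate(zip(a, b)):
        for x, (pa, pb) in enumerate(zip(ra, rb)):
            target = background if abs(pa - pb) <= eps else foreground
            target.append((y, x, pa))
    return background, foreground


# Three sampled 2 x 2 frames; only the last one differs strongly from the first.
frames = [
    [[10, 10], [10, 10]],
    [[10, 12], [10, 10]],
    [[10, 200], [10, 10]],
]
j = most_different_partner(frames, 0)
bg, fg = split_pixels(frames[0], frames[j], eps=10)
```

In this toy input the first frame is paired with the third one, and only the pixel that changed by more than ε is routed to the foreground samples.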

Using the previously computed foreground and background models, we perform the object detection based on Bayes' Law. We define the following events, in a pixel-wise way:

• Ci: The pixel in question has chromatic information of class i. These classes are defined by the histogram used, with each bin defining a different class.

• B: The pixel belongs to the background.

• F : The pixel belongs to the foreground.

With these events, we formulate the following equation, based on Bayes' Law, to estimate the probability that a given pixel belongs to the foreground, given its chromatic information:

$$P(F \mid C_i) = 1 - \frac{P(C_i \mid B)\,P(B)}{P(C_i \mid B)\,P(B) + P(C_i \mid \neg B)\,(1 - P(B))} \qquad (2)$$

The value of $P(C_i \mid B)$ is obtained from the corresponding Gaussian and $P(C_i \mid \neg B)$ from the foreground histogram.

Despite its simplicity, this method has two major advantages over traditional background subtraction. First, the result is a probabilistic measure of the existence of a non-background object, which in this work is used as is for further exploration of the scene. Second, it can easily overcome small differences between foreground and background colors, the typical failure scenario of the traditional approach, occasionally leading to a result with almost no false negatives and some false positives, which is a desirable behavior in our context.

The threshold ε used in this step represents the expected color fluctuation of the scene. Its influence on the performance of the method can be reduced by using more frames for the estimation.
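As an illustration of Eq. (2), the following sketch evaluates the foreground probability of a single pixel. It is a simplified example under assumptions: the background Gaussian is one-dimensional rather than RGB, the prior P(B) = 0.5 is an arbitrary choice (the text does not state its value), and the function names are hypothetical.

```python
import math


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a one-dimensional Gaussian, standing in for the per-pixel background model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def foreground_probability(p_c_given_b: float, p_c_given_not_b: float, p_b: float) -> float:
    """Eq. (2): 1 - P(C|B)P(B) / (P(C|B)P(B) + P(C|not B)(1 - P(B)))."""
    numerator = p_c_given_b * p_b
    denominator = numerator + p_c_given_not_b * (1.0 - p_b)
    if denominator == 0.0:
        # Assumption: with no evidence from either model, stay undecided.
        return 0.5
    return 1.0 - numerator / denominator


# A pixel whose intensity matches the background mean and is rare in the foreground histogram.
p_bg_like = foreground_probability(gaussian_pdf(100.0, 100.0, 5.0), 0.001, p_b=0.5)
# A pixel far from the background mean but common in the foreground histogram.
p_fg_like = foreground_probability(gaussian_pdf(160.0, 100.0, 5.0), 0.05, p_b=0.5)
```

The first pixel receives a low foreground probability and the second a value close to one, so the output can be used directly as the per-pixel probability in the reconstruction step.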

4. TRIDIMENSIONAL RECONSTRUCTION

In order to use the tridimensional information as the basis for people tracking, we first need to reconstruct the current scene. We consider a reconstruction algorithm inspired by space carving. In our method, the probability of the existence of a moving object in a given voxel Vi is estimated by:

$$P(V_i) = \prod_{c=1}^{N_c} \frac{1}{N_p} \sum_{x_i \in \mathrm{proj}(V_i, c)} P(x_i^c) \qquad (3)$$

where $N_c$ is the number of cameras viewing the voxel $V_i$, $\mathrm{proj}(V_i, c)$ is the set of all pixels belonging to the projection of the voxel $V_i$ in camera $c$, $N_p$ is the number of pixels in this projection, and $P(x_i^c)$ is the probability that the pixel $x_i$ belongs to a foreground object, calculated in the object detection step, in camera $c$.

Put simply, the probability that a given voxel is occupied is a function of the foreground probabilities in the corresponding projections onto the camera images. We average the probabilities over each projected region and multiply these averages across the cameras, which represents a probabilistic AND operation. To reduce small noise, tridimensional morphological operations are applied to the scene.
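A minimal sketch of Eq. (3) for a single voxel is given below. It assumes that the projection of the voxel into each camera is already available as a list of pixel coordinates; the camera calibration and projection code are out of scope here, and the name `voxel_probability` and the data layout are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]      # (row, column)
ProbMap = List[List[float]]  # per-pixel foreground probability of one camera


def voxel_probability(
    prob_maps: Dict[int, ProbMap],
    projections: Dict[int, List[Pixel]],
) -> float:
    """Eq. (3): product over cameras of the mean foreground probability in the projection."""
    probability = 1.0
    cameras_seen = 0
    for camera, pixels in projections.items():
        if not pixels:
            continue  # this camera does not view the voxel
        mean = sum(prob_maps[camera][y][x] for y, x in pixels) / len(pixels)
        probability *= mean
        cameras_seen += 1
    # Assumption: a voxel seen by no camera is treated as empty.
    return probability if cameras_seen else 0.0


prob_maps = {0: [[1.0, 0.5]], 1: [[0.2]]}
projections = {0: [(0, 0), (0, 1)], 1: [(0, 0)]}
p_voxel = voxel_probability(prob_maps, projections)
```

Because the per-camera averages are multiplied, a single camera that sees only background in the projection drives the voxel probability towards zero, which is the probabilistic AND behavior described above.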

This reconstruction technique leads to a major problem, shared with visual hull reconstruction: the generation of false volumes, often called phantoms, which result from all possible combinations of identified blobs. We deal with this problem during the trajectory estimation step. However, we strongly recommend the use of as many cameras as possible, to reduce the number of generated phantoms.

5. INFORMATION SAMPLING BY GENERALIZED VISUAL RHYTHM

With the reconstructed tridimensional information, the final step of this work is to estimate the trajectories of all moving objects in the current scene. We consider a generalization of the visual rhythm concept, widely used for cut detection in video [15, 16, 17].

In the classic approach, each frame of a given video is sampled, producing unidimensional data, which is re-arranged with all the other samples into a bidimensional image. This resulting image contains the relevant information of the video, often making the relevant features explicit.

In this work, we sample each tridimensional frame to a bidimensional sample and re-arrange these samples to obtain tridimensional data. The sampling function is the maximum operation along the Z direction. To re-arrange the samples, we use the stacking operation, replacing the Z axis with the time axis.
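The sampling and stacking operations can be sketched as follows, assuming each reconstructed frame is stored as a nested list indexed as `volume[z][y][x]`. The function names are hypothetical and the sketch favors clarity over speed.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as volume[z][y][x]
Image = List[List[float]]         # indexed as image[y][x]


def max_projection_z(volume: Volume) -> Image:
    """Collapse one reconstructed frame along Z, keeping the maximum per (y, x)."""
    height, width = len(volume[0]), len(volume[0][0])
    return [
        [max(layer[y][x] for layer in volume) for x in range(width)]
        for y in range(height)
    ]


def generalized_visual_rhythm(frames: List[Volume]) -> Volume:
    """Stack the projections over time; the result is indexed as rhythm[t][y][x]."""
    return [max_projection_z(frame) for frame in frames]


# Two tiny frames (2 layers of 1 x 2 voxels each) with an object moving from x=0 to x=1.
frame_a = [[[0.0, 0.0]], [[0.9, 0.0]]]
frame_b = [[[0.0, 0.8]], [[0.0, 0.1]]]
rhythm = generalized_visual_rhythm([frame_a, frame_b])
```

The first index of the result is time, so a subject that moves smoothly across the floor traces a connected tube through the stacked volume.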

This simple operation reinforces one of the most reliable features in the considered videos: space-time continuity. At the considered frame rate, the trajectories of the moving objects correspond to connected volumes in the final volume, transforming the tracking problem into a segmentation problem, with the advantage of well-defined shapes to segment.

6. SMALL GAPS COMPLETION BY GENERALIZED MULTISCALE DIRECTIONAL INFORMATION

To improve the robustness of the method, we consider an approach to reconnect small gaps in the data. It generalizes the approach suggested in [18] for fingerprint images to a tridimensional domain, motivated by the abstract similarity between fingerprint images and the re-arranged sampled data considered here.

The tridimensional generalization of [18] is straightforward: we use two angles to define the orientation vector. For each considered direction, we also define two perpendicular vectors, chosen arbitrarily. The directional information for each voxel is the direction with maximum information. In this work, we define information as twice the sum of the sampled data in the considered direction minus the sum in the perpendicular directions, computed in a window centered on the current voxel whose size is based on the manual initialization. The multiscale approach is also considered, removing small noise in the orientation field.
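One possible reading of this information measure is sketched below. It is a simplification under several assumptions: only the three axis-aligned directions are tested instead of a set parameterized by two angles, the measure is read as 2·S_d − (S_p1 + S_p2) with each sum taken along a line through the voxel, and the multiscale step is omitted. The names `CANDIDATES`, `line_sum` and `dominant_direction` are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[int, int, int]
Volume = List[List[List[float]]]  # indexed as volume[t][y][x]

# Hypothetical candidate set: (direction, first perpendicular, second perpendicular).
CANDIDATES: List[Tuple[Vec, Vec, Vec]] = [
    ((1, 0, 0), (0, 1, 0), (0, 0, 1)),
    ((0, 1, 0), (1, 0, 0), (0, 0, 1)),
    ((0, 0, 1), (1, 0, 0), (0, 1, 0)),
]


def line_sum(vol: Volume, center: Vec, direction: Vec, radius: int) -> float:
    """Sum the samples at center + k * direction for k in [-radius, radius], skipping out-of-bounds."""
    total = 0.0
    for k in range(-radius, radius + 1):
        t, y, x = (c + k * d for c, d in zip(center, direction))
        if 0 <= t < len(vol) and 0 <= y < len(vol[0]) and 0 <= x < len(vol[0][0]):
            total += vol[t][y][x]
    return total


def dominant_direction(vol: Volume, center: Vec, radius: int) -> Vec:
    """Return the candidate direction with maximum information at the given voxel."""
    def information(entry: Tuple[Vec, Vec, Vec]) -> float:
        d, p1, p2 = entry
        along = line_sum(vol, center, d, radius)
        across = line_sum(vol, center, p1, radius) + line_sum(vol, center, p2, radius)
        return 2.0 * along - across

    return max(CANDIDATES, key=information)[0]


# A 3 x 3 x 3 volume containing a straight line along the time axis at (y, x) = (1, 1).
vol = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
for t in range(3):
    vol[t][1][1] = 1.0
direction = dominant_direction(vol, (1, 1, 1), radius=1)
```

For this toy volume the time axis is selected, since the data is concentrated along it and absent in the perpendicular directions.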

Instead of a watershed operation, as suggested in the original work, we simply apply a morphological dilation, using the corresponding directions as structuring elements.

7. TRAJECTORY ESTIMATION

With the re-arranged sampled data from the reconstruction step, we need to identify each connected volume in order to define the respective trajectories. Ideally, the only connected components are the real moving objects, but due to non-ideal camera positioning or illumination this assumption may fail. To increase robustness to noise, we perform a manual initialization of the method, providing a reliable count of the subjects in the scene.

We then binarize the data. Due to noise, however, we cannot simply apply a threshold. Instead, we compute the gradient of the tridimensional data, followed by a morphological closing whose structuring element size is inferred from the initial frame. This operation aims to capture the regions with higher values, even under constant noise. The morphological operation closes the tridimensional solid and filters some noise.

In this binarized data, we identify connected components. If a selected volume extends from the beginning to the end of the volume along the temporal Z axis, it must contain the trajectory of at least one object. Further, due to the manual initialization, the number of objects is equal to the number of connected areas in the first frame.

To identify the corresponding trajectories, we label each connected component in each temporal slice and build a directed graph, connecting a label X to a label Y if they share pixel locations in temporally consecutive frames. Next, we process this connectivity graph as follows:
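The construction of this graph can be sketched as below, before the processing rules are applied. The sketch assumes binary temporal slices, 4-connectivity for the per-slice labelling, and edges pointing from slice t to slice t + 1; none of these choices is stated in the text, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Slice = List[List[int]]  # binary mask of one temporal slice, 1 = occupied
Node = Tuple[int, int]   # (time index, label within that slice)


def label_slice(mask: Slice) -> Slice:
    """4-connected component labelling; 0 is background, labels start at 1."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    next_label = 0
    for sy in range(height):
        for sx in range(width):
            if mask[sy][sx] and not labels[sy][sx]:
                next_label += 1
                labels[sy][sx] = next_label
                queue = deque([(sy, sx)])
                while queue:  # breadth-first flood fill of the component
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def build_graph(slices: List[Slice]) -> Dict[Node, Set[Node]]:
    """Connect label X at time t to label Y at time t + 1 when they share a pixel location."""
    labelled = [label_slice(s) for s in slices]
    graph: Dict[Node, Set[Node]] = {}
    for t, lab in enumerate(labelled):
        for row in lab:
            for value in row:
                if value:
                    graph.setdefault((t, value), set())
    for t in range(len(labelled) - 1):
        current, following = labelled[t], labelled[t + 1]
        for y in range(len(current)):
            for x in range(len(current[0])):
                if current[y][x] and following[y][x]:
                    graph[(t, current[y][x])].add((t + 1, following[y][x]))
    return graph


# Two blobs that merge in the second slice and leave a single blob in the third.
slices = [[[1, 0, 1]], [[1, 1, 1]], [[0, 0, 1]]]
graph = build_graph(slices)
```

In this example both labels of the first slice point to the single merged label of the second slice, which is the kind of node the rules below partition.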

1. If a label does not point to another label and is not at the topmost level, it should be removed.

2. If a label is not pointed to by another label and is not at the bottom-most level, it should be removed.

3. If a label Y is pointed to only by a label X and X points only to Y, Y should be equal to X.

4. If a label X points only to a label Y, other labels which also point to Y should not point to Y.

5. If a label is pointed to by many labels, it must be partitioned.

6. If a partitioned label is much smaller than the known true size of the objects from the manual initialization, it should be removed.

Rules 1 and 2 eliminate phantoms that abruptly appear and disappear, enforcing the spatio-temporal constraint. Rule 3 makes the equivalence between labels explicit, merging labels which imply each other. Rule 4 deals with the case of only one possible path, removing that option from the possible paths of other labels. Rules 5 and 6 enforce the spatio-temporal continuity by breaking components wrongly connected in the reconstruction step, while removing small labels to cover the situation of phantoms smoothly appearing and disappearing along the trajectory of real subjects.

To process the information, we first split the labels using rules 4, 5 and 6, then remove the labels that violate rules 1 and 2, and then merge the equivalent labels using rule 3. All operations are repeated until convergence.
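The pruning and merging passes can be sketched as follows. The sketch covers rules 1 to 3 only; the splitting rules 4 to 6 depend on the label geometry and are omitted. It assumes that edges point forward in time, that the bottom-most level is the first temporal slice and the topmost level is the last one, and that nodes are `(time, label)` pairs. All names are hypothetical.

```python
from typing import Dict, Set, Tuple

Node = Tuple[int, int]  # (time index, label within that slice)
Graph = Dict[Node, Set[Node]]


def prune_dead_ends(graph: Graph, first: int, last: int) -> Graph:
    """Rules 1 and 2, repeated until no label is removed."""
    graph = {node: set(succ) for node, succ in graph.items()}
    changed = True
    while changed:
        changed = False
        pointed = {m for succ in graph.values() for m in succ}
        for node in list(graph):
            time = node[0]
            no_successor = not graph[node] and time != last     # rule 1
            no_predecessor = node not in pointed and time != first  # rule 2
            if no_successor or no_predecessor:
                del graph[node]
                for succ in graph.values():
                    succ.discard(node)
                changed = True
                break  # predecessor information is stale, start a new pass
    return graph


def merge_equivalent(graph: Graph) -> Dict[Node, int]:
    """Rule 3: Y inherits the trajectory id of X when X -> Y is the only edge out of X and into Y."""
    preds: Dict[Node, Set[Node]] = {node: set() for node in graph}
    for node, succ in graph.items():
        for m in succ:
            preds[m].add(node)
    track: Dict[Node, int] = {}
    next_id = 0
    for node in sorted(graph):  # time order, so predecessors are assigned first
        if len(preds[node]) == 1:
            (x,) = preds[node]
            if graph[x] == {node}:
                track[node] = track[x]
                continue
        track[node] = next_id
        next_id += 1
    return track


# One real trajectory plus a phantom that vanishes and another that appears mid-sequence.
graph: Graph = {
    (0, 1): {(1, 1)}, (1, 1): {(2, 1)}, (2, 1): set(),
    (0, 2): {(1, 2)}, (1, 2): set(),
    (1, 3): {(2, 2)}, (2, 2): set(),
}
pruned = prune_dead_ends(graph, first=0, last=2)
tracks = merge_equivalent(pruned)
```

After pruning, only the chain that spans the whole batch survives, and the merge step assigns a single trajectory id to its three labels.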

8. EXPERIMENTAL RESULTS

In the first considered video, we used four Basler A602fc cameras with an external hardware trigger, at 35 fps, and three moving persons, as shown in Figures 1, 2, 3 and 4, which include the corresponding object identification results. We used ε = 10, an RGB histogram with 128x128x128 bins, and 100 frames to model the objects, as described in Section 3. The sequence has 1000 frames and each voxel measures 10 cm x 10 cm x 10 cm.

The object identification method also detects cast shadows, moving background objects and some illumination noise. However, the reconstruction process eliminates a great part of this noise. Therefore, this step can produce false positives, but we should avoid methods that produce false negatives, because they remove existing volumes from the reconstructed scene. Figure 5 shows the result of the manual initialization and the reconstruction of the subsequent frame.

Fig. 1. Frame #1 from camera 1. (a) Original image. (b) Object identification result.

We proceed with the information sampling, considering batches of 200 frames, with the first batch shown in Figure 6(a). The corresponding result of the trajectory estimation step is shown in Figure 6(b). The multiscale directional filtering was not used in this example.

9. CONCLUSION AND FUTURE WORK

In this paper we presented a simple approach to tracking for sports, exploring the tridimensional nature of the scene. Despite its originality and potential, more work is needed to achieve the robustness necessary for practical applications, such as:

• Remove the binarization step and deal with the raw probabilistic data, exploring the sampling operation to reduce the complexity of the problem and avoiding approximations as in [13].

• Explore the use of the multiscale directional information to define a tridimensional skeleton of the considered trajectories.

• Additional processing of the resulting trajectories, removing noise and smoothing the resulting solids.

Fig. 5. Reconstruction results. (a) Reconstructed scene of the manual initialization. (b) Reconstructed scene of the subsequent frame.

Fig. 6. Trajectory estimation results. (a) Original sampled information. (b) Estimated trajectories.

10. REFERENCES

[1] J. Pers and S. Kovacic, “Computer vision system for tracking players in sports games,” in Proceedings of the First Int’l Workshop on Image and Signal Processing and Analysis, 2000, pp. 81–86.

[2] G. Pingali, A. Opalach, Y. Jean, and I. Carlbom, “Visualization of sports using motion trajectories: providing insights into performance, style, and strategy,” in Visualization, 2001. VIS ’01. Proceedings. October 2001, pp. 75–83, IEEE.

[3] J. Pers, G. Vuckovic, S. Kovacic, and B. Dezman, “A low-cost real-time tracker of live sport events,” in 2nd International Symposium on Image and Signal Processing and Analysis, June 2001, pp. 362–365.

[4] E.L. Andrade, E. Khan, J.C. Woods, and M. Ghanbari, “Player identification in interactive sport scenes using region space analysis prior information and number recognition,” in Visual Information Engineering, 2003. VIE 2003. International Conference on, July 2003, pp. 57–60.

[5] Bruno Müller and Ricardo de Oliveira Anido, “Distributed real-time soccer tracking,” in VSSN ’04: Proceedings of the ACM 2nd international workshop on Video surveillance & sensor networks, New York, NY, USA, 2004, pp. 97–103, ACM Press.

[6] M. Kristan, J. Pers, M. Perse, S. Kovacic, and M. Bon, “Multiple interacting targets tracking with application to team sports,” in Image and Signal Processing and Analysis, 2005. ISPA 2005. Proceedings of the 4th International Symposium on, September 2005, pp. 322–327.

[7] J.B. Hayet, T. Mathes, J. Czyz, J. Piater, J. Verly, and B. Macq, “A modular multi-camera framework for team sports tracking,” in Advanced Video and Signal Based Surveillance. AVSS, September 2005, pp. 493–498, IEEE.

[8] Wayne Chelliah Naidoo and Jules Raymond Tapamo, “Soccer video analysis by ball, player and referee tracking,” in SAICSIT ’06: Proceedings of the 2006 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries, Republic of South Africa, 2006, pp. 51–60, South African Institute for Computer Scientists and Information Technologists.

[9] Aye Pa Pa Mya, Thin Lai Lai Thein, and Myint Myint Sein, “Extracting the motion pattern of the players from a video stream of the football game,” in SICE-ICASE, 2006. International Joint Conference, October 2006, pp. 5624–5627.

[10] Pascual J. Figueroa, Neucimar J. Leite, and Ricardo M. L. Barros, “Tracking soccer players aiming their kinematical motion analysis,” Comput. Vis. Image Underst., vol. 101, no. 2, pp. 122–135, 2006.

[11] Guangyu Zhu, Qingming Huang, Changsheng Xu, Yong Rui, Shuqiang Jiang, Wen Gao, and Hongxun Yao, “Trajectory based event tactics analysis in broadcast sports video,” in MULTIMEDIA ’07: Proceedings of the 15th international conference on Multimedia, New York, NY, USA, 2007, pp. 58–67, ACM Press.

[12] Jacek Czyz, Branko Ristic, and Benoit Macq, “A particle filter for joint detection and tracking of color objects,” Image and Vision Computing, vol. 25, no. 8, pp. 1271–1281, August 2007.

[13] F. Fleuret, J. Berclaz, R. Lengagne, and P. Fua, “Multicamera people tracking with a probabilistic occupancy map,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 2, pp. 267–282, Feb. 2008.

[14] D.B. Yang, H.H. Gonzalez-Banos, and L.J. Guibas, “Counting people in crowds with a real-time network of simple image sensors,” Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on, pp. 122–129 vol.1, Oct. 2003.

[15] Hyeokman Kim, Jinho Lee, Jae-Heon Yang, Sanghoon Sull, Woonkyung M. Kim, and S. Moon-Ho Song, “Visual rhythm and shot verification,” Multimedia Tools and Applications, vol. 15, no. 3, pp. 227–245, December 2001.

[16] Chong-Wah Ngo, “A robust dissolve detector by support vector machine,” in MULTIMEDIA ’03: Proceedings of the eleventh ACM international conference on Multimedia, New York, NY, USA, 2003, pp. 283–286, ACM Press.

[17] Francisco N. Bezerra and Neucimar J. Leite, “Using string matching to detect video transitions,” Pattern Analysis & Applications, vol. 10, no. 1, pp. 45–54, October 2006.

[18] M.D. Oliveira and N.J. Leite, “Reconnection of fingerprint ridges based on morphological operators and multiscale directional information,” Computer Graphics and Image Processing, 2004. Proceedings. 17th Brazilian Symposium on, pp. 122–129, Oct. 2004.
