
Person Tracking with Partial Occlusion Handling

Xiaofeng Lu [1][2] ([email protected]), Junhao Zhang [2] ([email protected]), Li Song [1] ([email protected]), Rui Lei [2] ([email protected]), Hengli Lu [2] ([email protected]), Nam Ling [1] ([email protected])

[1] Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, China 200000
[2] Shanghai University, 99 Shangda Road, Shanghai, China 200436

Abstract—Occlusion is a challenge for tracking, especially in dynamic scenes, where it compounds the difficulty of background modeling: the tracker is influenced by both occlusions and the background. In this paper, we address the problem with a robust algorithm based on an improved particle filter that uses a discriminative model instead of background modeling. The discriminative model offers accurate templates for occlusion detection by alleviating the influence of background pixels. Since a particle filter cannot track effectively under heavy occlusion, blocking is introduced to solve the problem by abandoning unobservable parts of the target. Experimental results show that, compared with state-of-the-art methods, our algorithm works persistently and effectively under severe occlusion, even in dynamic scenes.

Keywords—particle filter; occlusion adaptation; tracking; dynamic scene

I. INTRODUCTION

Moving object tracking is an important branch of computer vision. Information acquired from tracking results is significant for surveillance systems. Several successful algorithms have been applied in the field of tracking, such as Mean-shift and the particle filter. Mean-shift finds the local maxima of a density distribution in a set of data, which amounts to searching for the center of gravity within a window [1]. The particle filter uses a set of random particles to approximate the posterior probability distribution of the state [2]. Recently, the particle filter has gained more attention due to its relatively strong robustness in complex environments. However, to ensure accurate tracking results, a large number of particles is needed as samples, which is extremely time-consuming [3]. In [4], Camshift is used to improve efficiency by guiding particles toward the density maxima, which tremendously lowers the number of particles required.

Occlusion is a challenging problem in object tracking; it happens frequently in crowded scenes. During occlusion, part of the object is covered and its characteristics are missing, which leads to a mismatch with the target template. Some papers try to solve the problem by improving the template-updating scheme [5]: they stop updating the template when occlusion occurs in order to maintain its accuracy [6]. However, this can only handle short-time occlusions. Multi-kernel tracking based on blocking has been proposed to track objects under severe long-time occlusion [7]. The target is divided into several blocks; since occluded kernels are not observable, they are given lower weights according to the template [8]. When the weight of a block falls below a certain value, the tracker abandons the block and tracks only the observable part [9]. This algorithm is effective when the weights of the blocks are reliable.

The traditional way to locate the target is to exploit its temporal relationship in the video sequence [10]. Many trackers extract features directly from a rectangular window, which inevitably contains many background pixels [11]. This results in inaccurate matching weights, and the problem worsens when heavy occlusion occurs. Background modeling is frequently applied to trackers to avoid this problem [12]: once the background model is set up, the tracker does not need to consider target pixels that belong to the background. In general, the background image is modeled by training, and there are successful updating algorithms such as the GMM. However, such tracking depends heavily on the background model, which makes the algorithm far less robust in dynamic scenes and limits its application fields [13]. Furthermore, it cannot handle the occlusion that arises when objects move together, which happens frequently in crowded scenes [14].

Figure 1 shows a challenging tracking example in a dynamic scene using our method. We introduce a discriminative model [15] to refine the target: the tracking algorithm provides the initial location for the discriminative model, and the model returns a precise characterization of the target. A background-similarity weakening algorithm is introduced when computing the similarity weight; it resolves background interference by weakening background pixels when calculating the target feature [16]. By calculating the ratio of pixels in the target and the background, we obtain another parameter besides the weight. Note that the background pixels are identified by the discriminative model using a segmentation algorithm rather than a background model. Occlusion can then be detected accurately according to the ratio and the weight. The experimental results below indicate that our algorithm is significantly accurate and robust under severe occlusion.

Fig. 1. Tracking in a dynamic scene.

978-1-4673-9604-2/15/.00 ©2015 IEEE

II. TRACKING ALGORITHM

The proposed tracking framework includes the steps illustrated in Fig. 2. First, we select the target in the video and initialize its blocks. The accurate target, without background pixels, is extracted in each block by a fast segmentation method according to a specific template feature such as the color histogram. After the discriminative model is created, the feature template of the target is built from the model. Then the tracking algorithm (particle filter) is applied to each block separately. In this process we obtain two parameters: the weight, which measures similarity to the template, and the ratio, which is computed from the numbers of foreground and background pixels according to the discriminative model. Finally, we judge from these two parameters whether an occlusion has happened, and the tracker handles occlusions by exploiting the characteristics of the particle filter.

Fig. 2. Block diagram of the tracking framework.

A. Particle Filter

The particle filter is formulated in the Bayesian framework using a Monte Carlo method, by modeling a dynamic probabilistic model and a systematic observation model [17]. In this paper, the dynamic model is Gaussian and predicts the location of the target by the following second-order recursive equation [18], which updates the location parameter:

x_t = 2x_{t-1} - x_{t-2} + v_t,  v_t ~ N(0, σ²)  (1)

where N(0, σ²) denotes a Gaussian distribution with expectation 0 and variance σ².
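As a sketch, this second-order recursive propagation can be written as follows; the noise scale `sigma` is an illustrative parameter, not a value specified in the paper:

```python
import numpy as np

def propagate(x_prev, x_prev2, sigma=2.0, rng=None):
    """Second-order (constant-velocity) Gaussian dynamic model:
    x_t = 2*x_{t-1} - x_{t-2} + Gaussian noise."""
    rng = rng or np.random.default_rng(0)
    mean = 2.0 * x_prev - x_prev2                      # linear extrapolation
    return mean + rng.normal(0.0, sigma, size=mean.shape)

# Propagate a small set of 2-D particle locations from two previous frames.
x1 = np.array([[10.0, 10.0], [12.0, 8.0]])   # locations at t-1
x0 = np.array([[8.0, 9.0], [10.0, 7.0]])     # locations at t-2
x2 = propagate(x1, x0, sigma=0.0)            # no noise: pure extrapolation
```

With sigma set to zero, each particle lands exactly at the linearly extrapolated position 2*x1 - x0; in practice a nonzero sigma spreads the particles around that prediction.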

The systematic observation model updates the weight parameter w using the Bhattacharyya distance between the target model and the candidate model, as shown in (2):

w ∝ exp(-d²),  d = sqrt(1 - Σ_u sqrt(p(u) q(u)))  (2)

where p and q denote the histogram features of the target and the candidate [19].
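A minimal sketch of this weight computation, assuming the common exponential mapping from Bhattacharyya distance to weight (the exponent scale `lam` is an illustrative choice, not taken from the paper):

```python
import numpy as np

def bhattacharyya_weight(p, q, lam=20.0):
    """Similarity weight between two histograms via the Bhattacharyya
    coefficient rho and distance d; higher weight means more similar."""
    p = np.asarray(p, float); q = np.asarray(q, float)
    p = p / p.sum(); q = q / q.sum()                 # normalize histograms
    rho = np.sum(np.sqrt(p * q))                     # coefficient in [0, 1]
    d = np.sqrt(max(0.0, 1.0 - rho))                 # Bhattacharyya distance
    return np.exp(-lam * d * d)

# Identical histograms give distance 0 and hence the maximal weight 1.
w_same = bhattacharyya_weight([1, 2, 3], [1, 2, 3])
w_diff = bhattacharyya_weight([1, 0, 0], [0, 0, 1])
```

The `max(0.0, ...)` guard absorbs tiny negative values caused by floating-point rounding when the two histograms are identical.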

According to the maximum a posteriori criterion, the location of the target in the next frame is chosen from the particles by weight.

B. Background Similarity Weakening

The weight w reflects the target feature. But if the main features of the target also belong to the background or to other objects, the target cannot be tracked well, and the scale of the selected target may be tremendously expanded. The background-similarity weakening algorithm is introduced to solve this problem.

Since the background features can be computed after the segmentation performed during occlusion detection, we rewrite the histogram feature of the target as follows:

q̂(u) = K(u) q(u)  (3)

where K(u) is a coefficient used to weaken specific pixels; q(u) here is the color-histogram feature, and K(u) is computed from the background histogram after normalization.
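The paper does not spell out the exact form of K(u); the sketch below assumes K(u) = 1 - b(u)/max(b), in the spirit of background-weighted histograms, so the color bin most prominent in the background is suppressed entirely:

```python
import numpy as np

def weaken_background(target_hist, background_hist):
    """Down-weight target histogram bins that are prominent in the
    background: K(u) = 1 - b(u)/max(b), so the strongest background
    bin is suppressed completely and rare background bins are kept."""
    q = np.asarray(target_hist, float)
    b = np.asarray(background_hist, float)
    b = b / b.sum()                        # normalized background histogram
    k = 1.0 - b / b.max()                  # weakening coefficients K(u)
    q_hat = k * q                          # weakened target feature
    return q_hat / q_hat.sum()             # renormalize

# A color bin shared heavily with the background (bin 1) is suppressed.
q_hat = weaken_background([4.0, 4.0], [1.0, 3.0])
```

Any monotonically decreasing K(u) in the background-bin mass would serve the same purpose; the choice above is just one simple candidate.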

C. Occlusion Detection

In order to detect which part of the target is occluded, we treat the target as a union of blocks. Once occlusion happens, the unobservable block receives a lower score, as demonstrated in Fig. 3.

Fig. 3. Tracking with blocks.

The weight w of a block is a similarity value between the candidate and the target template, calculated by the Bhattacharyya distance using the weakened feature in (3). Note that the similarity value is computed for the target after segmentation.

The ratio r is the percentage of pixels in the block that belong to the foreground after segmentation. It is one of the coefficients for the final score of the block, formulated in (4):

r = N_f / N  (4)

where N_f is the number of foreground pixels and N is the total number of pixels in the rectangular window. The final score s in (5) is obtained by adjusting the proportions of w and r, where m and n are normalization coefficients:

s = m·w + n·r  (5)

Compared with the traditional similarity value, the score s considers not only the temporal relationship but also the spatial relationship through the discriminative model. With this reliable score, occlusion detection is much more accurate.
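A minimal sketch of this scoring, assuming equal mixing coefficients m = n = 0.5 (the paper leaves the values unspecified):

```python
def foreground_ratio(n_foreground, n_total):
    """Ratio r: fraction of window pixels labeled foreground
    by the discriminative model's segmentation."""
    return n_foreground / n_total

def block_score(w, r, m=0.5, n=0.5):
    """Final block score s = m*w + n*r, combining the similarity
    weight w with the foreground ratio r (m + n = 1)."""
    return m * w + n * r

# A well-matched, mostly visible block scores higher than an occluded one.
s_visible = block_score(w=0.9, r=foreground_ratio(90, 100))
s_occluded = block_score(w=0.4, r=foreground_ratio(20, 100))
```

Because r drops sharply when an occluder covers the block, the combined score separates occluded from visible blocks more cleanly than the similarity weight alone.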

D. Occlusion Adaptation

Since the score s in (5) only asserts the optimal location of a single block, the location of the target in the current frame cannot rely on it completely. For instance, suppose the score s of one block is extremely high while the other is rather low. The tracker may then conclude that the target is right there, assert that an occlusion has occurred, and abandon the block with the lower score. However, there may be another candidate area in which the scores of both blocks are suboptimal, and that is in fact where the target is located in the current frame. Given this problem, extra constraints should be introduced according to the characteristics of the particle filter.

In the case of a person, for example, the target can be divided into two blocks according to clothes and pants. We then rewrite formula (5) as

S = s_upper + s_lower, subject to the constraints C  (6)

The concrete steps of occlusion handling are shown in Fig. 4. The particle filter returns several particles per block with high scores. Figure 5(a) shows that the tracker must first satisfy the constraints C: for an upper block, there must be a lower block in its neighborhood below; for a lower block, there must be an upper block in its neighborhood above. The candidate with the maximal score S among those meeting the constraints is then chosen as the target in the current frame. As shown in Fig. 5(b), if no candidate meets the constraints, an occlusion has occurred; the tracker abandons one block and lets it simply follow the observable block.
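The constraint check and candidate selection described above can be sketched as follows; the neighborhood thresholds are illustrative assumptions (not values from the paper), and image coordinates are taken with y increasing downward:

```python
def meets_constraint(upper, lower, dx_max=15.0, dy_min=5.0, dy_max=60.0):
    """Constraint C for a person split into an upper (clothes) and a lower
    (pants) block: the lower block must lie roughly below the upper one."""
    dx = abs(lower[0] - upper[0])
    dy = lower[1] - upper[1]            # positive when lower is below upper
    return dx <= dx_max and dy_min <= dy <= dy_max

def pick_target(upper_cands, lower_cands):
    """Pick the (upper, lower) candidate pair with maximal combined score S
    among pairs meeting C; return None when no pair qualifies (occlusion)."""
    best, best_score = None, float("-inf")
    for uc, us in upper_cands:
        for lc, ls in lower_cands:
            if meets_constraint(uc, lc) and us + ls > best_score:
                best, best_score = (uc, lc), us + ls
    return best

# One pair satisfies C; a lower block far to the side does not.
upper_cands = [((100.0, 50.0), 0.8)]
lower_cands = [((102.0, 90.0), 0.7), ((200.0, 90.0), 0.9)]
pair = pick_target(upper_cands, lower_cands)
```

Returning None is the signal for the occlusion branch: the tracker drops the missing block and follows the observable one until a qualifying pair reappears.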

Fig. 4. Steps of occlusion handling.

(a) (b)

Fig. 5. (a) Groups of particles meet with constraints. (b) None of particles meet with constraints.

III. EXPERIMENTAL RESULTS

The proposed algorithm has been tested on several video sequences with different challenges, including long-time occlusion and crowded and dynamic scenes. In the following experiments, we manually selected the tracking window in the first frame as the target. For comparison, the three algorithms we tested were: (a) the APG-L1 tracker proposed in [20]; (b) the Camshift-guided particle filter using the color histogram as feature, from [4]; and (c) our proposed method. Algorithm (a) is implemented in MATLAB; algorithms (b) and (c) are implemented in C.

The APG-L1 tracker combines trivial templates, which can represent occlusion and image noise accurately without background modeling. However, it tends to over-converge due to the quadratic convergence of APG, especially when the tracked target is occluded by another visually similar object.

A. Experiment on Long-time Occlusion Handling

The first sequence shows a person undergoing a long-time occlusion. In Fig. 6(a), the APG-L1 tracker fails to track the target when occlusion occurs because of its over-convergence. In Fig. 6(b), algorithm (b) adopts the traditional color histogram as feature without considering any spatial characteristics. Although the target feature is relatively distinctive, the tracking results of algorithm (b) are extremely rough; it fails to track the target and converges into the background due to the long-time occlusion. As shown in Fig. 6(c), the algorithm proposed in this paper, in contrast, withstands the long-time occlusion and performs well throughout by introducing the discriminative model and the idea of blocking.

B. Experiment on Crowded Scenes

The next two challenging sequences were captured in crowds with frequent occlusions and cluttered backgrounds. The first shows a woman in pink undergoing frequent occlusions on her way toward an escalator. The trackers of algorithms (a) and (b) lose the target when occlusion occurs, while our algorithm, as shown in Fig. 7(c), provides much more accurate and consistent tracking than algorithms (a) and (b) in Fig. 7. The second sequence shows a woman in cyan walking past another woman wearing the same clothes. The tracker of algorithm (b) produces a large tracking window that cannot accurately locate the target due to occlusion by the woman with the same color features, as shown in Fig. 8(b). Again, our tracker still provides stable results in Fig. 8(c).

C. Experiment on Dynamic Scenes

In the following two sequences, we focus on tracking targets in dynamic scenes. Both were captured by a mobile device moving along with the target. In Fig. 9(a), the APG-L1 tracker consistently locates the target when occlusion occurs, but its excessive convergence makes the result inaccurate. CAMSGPF can hardly catch the target when it is occluded, as Fig. 9(b) shows. Our proposed algorithm performs well throughout in Fig. 9(c). Figure 10 is a rather challenging sequence that involves not only occlusion and a dynamic scene but also a visually similar occluder and a relatively crowded scene. The APG-L1 tracker fails to track the target because of the convergence problem mentioned above. CAMSGPF does not consider the case in which a visually similar object is nearby, which leads to confusion about the target. Our proposed method, in contrast, weakens background pixels, which effectively suppresses this influence; it tracks the target accurately even under a visually similar occlusion.

D. Quantitative Results

To quantitatively evaluate the robustness and accuracy of the proposed method under the various challenging conditions, we manually annotated the target's location frame by frame for all the sequences mentioned above. The tracking error is defined as the relative position error (in pixels) between the center of the tracking result and that of the ground truth. As shown in Fig. 11, the position-difference curves in x and y should ideally coincide with those of the ground truth.
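A sketch of how such per-frame center-position errors can be computed (the sequence data here is made up for illustration):

```python
import numpy as np

def center_errors(tracked_centers, gt_centers):
    """Per-frame position errors (in pixels) between tracked window
    centers and ground-truth centers, separately for x and y."""
    t = np.asarray(tracked_centers, float)
    g = np.asarray(gt_centers, float)
    return t[:, 0] - g[:, 0], t[:, 1] - g[:, 1]

tracked = [[10.0, 20.0], [12.0, 22.0], [15.0, 25.0]]
truth = [[9.0, 20.0], [10.0, 25.0], [15.0, 24.0]]
ex, ey = center_errors(tracked, truth)
```

Plotting ex and ey against the frame index yields exactly the kind of position-difference curves shown in Fig. 11; curves that stay near zero indicate accurate tracking.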

The legend of each chart gives the correspondence between the curves and the methods: the ground truth, the result of APG-L1, the result of the Camshift-guided particle filter, and the result of our proposed method. Comparing the line charts by the colors in the legend, our proposed method performs much better than the other two methods in terms of accuracy and robustness on all the sequences we tested.

(a) APG-L1 tracker

(b) CAMSGPF

(c) The proposed method

Fig. 6. Tracking results on man sequence in front of trees for frames 55, 75, 95, 135.

(a) APG-L1 tracker

(b) CAMSGPF

(c) The proposed method

Fig. 7. Tracking results on Person2 sequence in crowds for frames 10, 95, 120, 295.

(a) APG-L1 tracker

(b) CAMSGPF

(c) The proposed method

Fig. 8. Tracking results on Person3 sequence in crowds for frames 10, 50, 106, 149.

(a) APG-L1 tracker

(b) CAMSGPF

(c) The proposed method

Fig. 9. Tracking results on Person4 sequence in a dynamic scene for frames 77, 81, 87, 99, 112, 134.

(a) APG-L1 tracker

(b) CAMSGPF

(c) The proposed method

Fig. 10. Tracking results on Person5 sequence in a dynamic scene for frames 23, 133, 216, 283, 311, 352.

Fig. 11. Errors of the target center against the ground truth for Fig.6-Fig.10 respectively.

IV. CONCLUSION

In this paper, we have presented an effective tracking algorithm based on a particle filter using a discriminative model and blocking. Our method considers the spatial relationships within the target itself with accurate features, which provides a reliable scoring mechanism for occlusion detection. By handling occlusion through blocking, in accordance with the characteristics of the particle filter, the proposed algorithm achieves accurate and consistent tracking in contrast with the Camshift-guided particle filter and the APG-L1 tracker. Experimental results on several challenging sequences demonstrate that our method performs favorably in dynamic scenes with long-time and frequent occlusions.

REFERENCES

[1] Comaniciu D, Meer P. Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(5): 603-619.

[2] Nummiaro K, Koller-Meier E, Van Gool L. An adaptive color-based particle filter. Image and Vision Computing, 2003, 21(1): 99-110, Elsevier.

[3] Yang C, Duraiswami R, Davis L. Fast multiple object tracking via a hierarchical particle filter. IEEE International conference on Computer Vision (ICCV), pp. 212-219, 2005.

[4] Wang Z, Yang X, Xu Y, et al. Camshift guided particle filter for visual tracking. Pattern Recognition Letters, 2009, 30(4): 407-413, Elsevier.

[5] Hu W, Zhou X, Hu M, et al. Occlusion reasoning for tracking multiple people. IEEE Transactions on Circuits and Systems for Video Technology (CSVT), 2009, 19(1): 114-121.

[6] Pan J, Hu B, Zhang J Q. Robust and accurate object tracking under various types of occlusions. IEEE Transactions on Circuits and Systems for Video Technology (CSVT), 2008, 18(2): 223-236.

[7] Shahed Nejhum, S. M., Jeffrey Ho, and Ming-Hsuan Yang. Visual tracking with histograms and articulating blocks. IEEE International conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-8, 2008.

[8] Chu C T, Hwang J N, et al. Robust video object tracking based on multiple kernels with projected gradients. IEEE International conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1421-1424, 2011.

[9] Shu, Guang, et al. Part-based multiple-person tracking with partial occlusion handling. IEEE International conference on Computer Vision and Pattern Recognition (CVPR), pp. 1815-1821, 2012.

[10] Wang Shu, Huchuan Lu, Fan Yang, and Ming-Hsuan Yang. Superpixel tracking. IEEE International conference on Computer Vision (ICCV), pp. 1323-1330, 2011.

[11] Sun X, Yao H, Zhang S. A novel supervised level set method for non-rigid object tracking. IEEE International conference on Computer Vision and Pattern Recognition (CVPR), pp. 3393-3400, 2011.

[12] Péteri R. Tracking dynamic textures using a particle filter driven by intrinsic motion information. Machine Vision and Applications, 2011, 22(5): 781-789, Springer.

[13] Ning J, Zhang L, Zhang D, et al. Robust mean-shift tracking with corrected background-weighted histogram. Computer Vision, 2012, 6(1): 62-69, IET.

[14] Danescu R, Oniga F, Nedevschi S. Modeling and tracking the driving environment with a particle-based occupancy grid. IEEE Transactions on Intelligent Transportation Systems, 2011, 12(4): 1331-1342.

[15] Chao G. C., Jeng S. K, and Lee, S. S. An improved occlusion handling for appearance-based tracking. IEEE International Conference on Image Processing (ICIP), pp. 465-468, 2011.

[16] Mei X, Ling H. Robust visual tracking and vehicle classification via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(11): 2259-2272.

[17] Del Moral P, Doucet A, Jasra A. An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, 2012, 22(5): 1009-1020, Springer.

[18] Whiteley N, Singh S, Godsill S. Auxiliary particle implementation of probability hypothesis density filter. IEEE Transactions on Aerospace and Electronic Systems, 2010, 46(3): 1437-1454.

[19] Chu H, Wang K. Research of kernel particle filtering target tracking algorithm based on multi-feature fusion. World Congress on Intelligent Control and Automation (WCICA), pp. 6189-6194, 2010.

[20] Bao C, Wu Y, Ling H, et al. Real time robust l1 tracker using accelerated proximal gradient approach. IEEE International conference on Computer Vision and Pattern Recognition (CVPR), 2012: 1830-1837.