A Dynamic Conditional Random Field Model for Object Segmentation in Image Sequences


Duke University Machine Learning Group

Presented by Qiuhua Liu

March 23, 2006

Paper by Yang Wang and Qiang Ji, Rensselaer Polytechnic Institute, CVPR 2005

Outline

• Markov Random Field (MRF)

• Conditional Random Field (CRF)

• The Spatial–temporal CRF Model for Video Sequence Segmentation

• Results & Conclusions

Markov Random Field

• Random Field: Let $F = \{F_1, F_2, \ldots, F_M\}$ be a family of random variables defined on the set $S$, in which each random variable $F_i$ takes a value $f_i$ in a label set $L$. The family $F$ is called a random field.

• Markov Random Field: $F$ is said to be a Markov random field on $S$ with respect to a neighborhood system $N$ if and only if the following two conditions are satisfied:

Positivity: $P(f) > 0, \quad \forall f \in \mathbb{F}$

Markovianity: $P(f_i \mid f_{S - \{i\}}) = P(f_i \mid f_{N_i})$
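The two conditions can be checked numerically on a toy model. The sketch below is only an illustration: the three-site chain, the binary label set, and the coupling strength `BETA` are assumptions introduced here, not part of the presentation. It builds the joint distribution by brute force and confirms that every configuration has positive probability and that the conditional at site 0 depends only on its single neighbor, site 1.

```python
import itertools
import math

BETA = 0.8                 # hypothetical coupling strength
EDGES = [(0, 1), (1, 2)]   # three sites in a chain: 0 - 1 - 2
LABELS = (0, 1)            # binary label set L


def unnormalized(f):
    """exp(BETA * number of edges whose two endpoints carry the same label)."""
    return math.exp(BETA * sum(f[i] == f[j] for i, j in EDGES))


CONFIGS = list(itertools.product(LABELS, repeat=3))
Z = sum(unnormalized(f) for f in CONFIGS)


def joint(f):
    """P(f) for a full configuration f = (f_0, f_1, f_2)."""
    return unnormalized(f) / Z


def conditional(site, value, rest):
    """P(f_site = value | all other sites); `rest` maps site index -> label."""
    def full(v):
        return tuple(v if s == site else rest[s] for s in range(3))
    return joint(full(value)) / sum(joint(full(v)) for v in LABELS)


# Site 0 has only site 1 as a neighbor, so changing f_2 must not matter.
p_a = conditional(0, 1, {1: 1, 2: 0})
p_b = conditional(0, 1, {1: 1, 2: 1})
```

Here `p_a` and `p_b` agree up to floating-point error, which is the Markovianity condition for site 0.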

Conditional Random Field

Figure 1: Graphical structure of a chain-structured CRF for sequences. The variables corresponding to unshaded nodes are not generated by the model.

• Conditional Random Field: a Markov random field (Y) globally conditioned on another random field (X).

Posterior Probability in CRF

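• Lafferty et al. [2] define the probability of a particular label sequence y given an observation sequence x to be a normalized product of potential functions, each of the form:

$$\exp\Big(\sum_j \lambda_j\, t_j(y_{i-1}, y_i, x, i) + \sum_k \mu_k\, s_k(y_i, x, i)\Big)$$

• where $t_j(y_{i-1}, y_i, x, i)$ is a transition feature function of the entire observation sequence and the labels at positions i and i−1 in the label sequence;

• $s_k(y_i, x, i)$ is a state feature function of the label at position i and the observation sequence; $\lambda_j$ and $\mu_k$ are parameters to be estimated from training data.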
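A small numerical example may make the normalized product concrete. The sketch below is an illustration under stated assumptions: the two feature functions, the weights `lam` and `mu`, and the observation sequence are all hypothetical, and the normalizer is computed by enumerating every label sequence, which is only feasible for very short chains.

```python
import itertools
import math

LABELS = (0, 1)


def transition_feature(y_prev, y_cur, x, i):
    """Hypothetical transition feature: 1 when adjacent labels agree."""
    return 1.0 if y_prev == y_cur else 0.0


def state_feature(y_cur, x, i):
    """Hypothetical state feature: 1 when the label matches x[i] thresholded at 0.5."""
    return 1.0 if y_cur == int(x[i] > 0.5) else 0.0


def score(y, x, lam=1.0, mu=2.0):
    """Weighted sum of state and transition features over all positions."""
    total = sum(mu * state_feature(y[i], x, i) for i in range(len(x)))
    total += sum(lam * transition_feature(y[i - 1], y[i], x, i)
                 for i in range(1, len(x)))
    return total


def posterior(y, x):
    """p(y | x): exp(score), normalized over every possible label sequence."""
    z = sum(math.exp(score(c, x))
            for c in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(y, x)) / z


x = [0.9, 0.8, 0.4, 0.1]   # toy observation sequence
candidates = list(itertools.product(LABELS, repeat=len(x)))
best = max(candidates, key=lambda y: posterior(y, x))
```

With these toy weights the most probable labeling is `(1, 1, 0, 0)`: the state features pull each label toward its thresholded observation, and the transition feature rewards the two runs of equal labels.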

The Spatial–temporal CRF for Video Segmentation

• Observation $z_k(x)$: intensity and motion information at point x.

• Segmentation label $s_k(x)$: one of the L independently moving objects composing the scene.

• The conditional random field:

The segmentation labels obey the Markov property given the observed data:

where $z_{1:k}$ is the sequence of the observed data up to time k.

The Posterior Probability, the State Transition Probability and the Observation Likelihood

• The posterior probability of the segmentation field is given by:

The one-pixel potential reflects the information from the observed data for a single position; the two-pixel potential imposes the spatial interaction (or pairwise constraint).

The State Transition Probability

• The temporal dependence between consecutive segmentation fields is modeled by the state transition probability:

The one-pixel potential models the label state transition for a single site; the two-pixel potential imposes the spatial connectivity constraint to form contiguous regions.

The State Transition Probability (cont.)

• To encourage a point to have the same segmentation label as those of its temporal neighborhood, the one-pixel potential is further expressed as:

• $M_x$: temporal neighborhood; $N_x$: spatial neighborhood.

Figure 2. The 5-pixel temporal neighborhood and the 4-pixel spatial neighborhood.
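The two neighborhoods can be written as offset lists. The sketch below is an illustration only: it assumes that the 5-pixel temporal neighborhood consists of the co-located pixel in the previous frame plus its four axis-aligned neighbors, which the figure caption does not spell out. The helper `temporal_agreement` is a hypothetical stand-in for the idea of rewarding a label that matches its temporal neighbors; it is not the potential used in the paper.

```python
# Offsets (d_row, d_col) of the 4-pixel spatial neighborhood N_x.
SPATIAL_OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
# Assumed layout of the 5-pixel temporal neighborhood M_x in the previous frame.
TEMPORAL_OFFSETS = [(0, 0)] + SPATIAL_OFFSETS


def _neighbors(row, col, height, width, offsets):
    """Sites reached by `offsets` that fall inside a height x width image."""
    return [(row + dr, col + dc) for dr, dc in offsets
            if 0 <= row + dr < height and 0 <= col + dc < width]


def spatial_neighbors(row, col, height, width):
    """Sites of the current frame in N_x."""
    return _neighbors(row, col, height, width, SPATIAL_OFFSETS)


def temporal_neighbors(row, col, height, width):
    """Sites of the previous frame in M_x."""
    return _neighbors(row, col, height, width, TEMPORAL_OFFSETS)


def temporal_agreement(prev_labels, row, col, label):
    """Fraction of temporal neighbors in the previous frame that carry `label`."""
    sites = temporal_neighbors(row, col, len(prev_labels), len(prev_labels[0]))
    return sum(prev_labels[r][c] == label for r, c in sites) / len(sites)


prev_labels = [[0, 1, 0],
               [1, 1, 0],
               [0, 0, 0]]
agreement = temporal_agreement(prev_labels, 1, 1, 1)
```

For the center pixel, three of the five temporal neighbors carry label 1, so `agreement` is 0.6. Pixels on the image border simply have fewer neighbors.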

The State Transition Probability (cont.)

• where the smoothness constraints are imposed by:

Spatial connectivity:

Temporal continuity:

The Observation Likelihood

• The observation model $p(z_k \mid s_k)$ is also formulated by a conditional random field:

• The observation data are represented as $z_k = \{g_k, m_k\}$, the intensity and motion information for the kth video frame. The two components are modeled as independent.

The Observation Likelihood (cont.)

• For the intensity likelihood, the one-pixel potential is set as:

where the probability density is modeled as a Gaussian mixture, whose parameters are estimated from segmented images at time k−1 via EM:

• For the intensity likelihood, the two-pixel potential is defined as non-zero only when two pixels belong to different objects:

• For the motion likelihood, the one-pixel potential:

where the motion density is modeled as a single Gaussian:

• For the motion likelihood, the two-pixel potential imposes smoothness of motion in one object.

• By adding the two-pixel potentials, the conditional independence assumption for the observed data, as made in HMMs and MRFs, is no longer required.
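The two densities behind the one-pixel potentials can be sketched as follows. This is an illustration under stated assumptions: the mixture parameters in `OBJECT_INTENSITY` are made-up values (in the slides they are estimated by EM from the segmentation at time k−1), the motion Gaussian is taken to be isotropic over a 2-D motion vector, and the negative log density is one assumed convention for turning a density into a potential.

```python
import math


def gaussian_pdf(value, mean, variance):
    """Density of a 1-D Gaussian N(mean, variance) at `value`."""
    norm = math.sqrt(2.0 * math.pi * variance)
    return math.exp(-0.5 * (value - mean) ** 2 / variance) / norm


def mixture_pdf(value, components):
    """Gaussian-mixture density; `components` holds (weight, mean, variance)."""
    return sum(w * gaussian_pdf(value, m, v) for w, m, v in components)


def motion_pdf(motion, mean, variance):
    """Single isotropic 2-D Gaussian over a motion vector (dx, dy); an assumption."""
    return (gaussian_pdf(motion[0], mean[0], variance)
            * gaussian_pdf(motion[1], mean[1], variance))


# Hypothetical intensity model of one object: (weight, mean, variance) per component.
OBJECT_INTENSITY = [(0.7, 60.0, 100.0), (0.3, 200.0, 400.0)]


def intensity_cost(gray, components=OBJECT_INTENSITY):
    """Negative log mixture density: low when `gray` fits the object's model."""
    return -math.log(mixture_pdf(gray, components))
```

A gray level near one of the component means, such as 60, receives a much lower cost than one between the two modes, such as 130, so the one-pixel term favors labels whose intensity model explains the pixel.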

The S-T CRF Filter

• Given the potentials of the distribution at time k, the posterior at time k+1 is recursively updated as:

• This update can be efficiently approximated by a CRF with the potentials in Eqs. (5a) and (5b) from Appendix A.
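The recursion has the familiar predict-then-update shape of a Bayesian filter. The sketch below is a deliberately reduced illustration: it filters the label of a single site and ignores the spatial two-pixel potentials entirely, so it is not the approximation referred to above. The transition matrix and likelihood values are hypothetical.

```python
def predict(prior, transition):
    """Prediction step: prediction[j] = sum_i transition[i][j] * prior[i]."""
    n = len(prior)
    return [sum(transition[i][j] * prior[i] for i in range(n)) for j in range(n)]


def update(prediction, likelihood):
    """Update step: weight the prediction by the likelihood and renormalize."""
    unnormalized = [p * l for p, l in zip(prediction, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def filter_step(prior, transition, likelihood):
    """Posterior over labels at time k+1 from the posterior at time k."""
    return update(predict(prior, transition), likelihood)


prior = [0.5, 0.5]                 # posterior over two labels at time k
transition = [[0.9, 0.1],          # hypothetical P(label at k+1 | label at k)
              [0.1, 0.9]]
likelihood = [0.8, 0.2]            # hypothetical p(z_{k+1} | label)
posterior = filter_step(prior, transition, likelihood)
```

Starting from a uniform prior, the prediction stays uniform and the observation moves the posterior to `[0.8, 0.2]`. In the full model the same two steps act on whole segmentation fields rather than on one site.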

Initialization and Optimization

• Initial segmentation field:

The initial frame is divided into small blocks, and the number of objects is determined by clustering the motion of each block, with one motion model per object.

• Optimization:

The segmentation is obtained by an iterative procedure.
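The slides do not say which iterative procedure is used. As a stand-in, the sketch below shows iterated conditional modes (ICM), one common local optimizer for label fields; it should not be read as the authors' method. The energy is an assumption as well: a per-pixel cost table `unary[row][col][label]` plus a penalty `beta` for each 4-neighbor carrying a different label.

```python
def icm(unary, beta=1.0, max_sweeps=10):
    """Greedy label updates until no pixel changes or max_sweeps is reached."""
    height, width = len(unary), len(unary[0])
    n_labels = len(unary[0][0])
    # Start from the labeling that minimizes each pixel's own cost.
    labels = [[min(range(n_labels), key=lambda l: unary[r][c][l])
               for c in range(width)] for r in range(height)]

    for _ in range(max_sweeps):
        changed = False
        for r in range(height):
            for c in range(width):
                def local_energy(label):
                    # Count 4-neighbors that disagree with the candidate label.
                    disagree = 0
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < height and 0 <= cc < width
                                and labels[rr][cc] != label):
                            disagree += 1
                    return unary[r][c][label] + beta * disagree

                best = min(range(n_labels), key=local_energy)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels


# Toy 3x3 field: every pixel prefers label 0 except a weakly dissenting center.
unary = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
unary[1][1] = [0.6, 0.4]
result = icm(unary)
```

The center pixel starts with label 1 but is outvoted by its four neighbors, so the result is a uniform field of label 0. ICM only finds a local minimum, which is why a good initial segmentation matters.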

Results and Conclusions

• 24-pixel spatial neighborhood

• 15-pixel temporal neighborhood

• Implemented in C

• Speed: two 320×240 frames per second on a 2.8 GHz Pentium 4 PC.

Mother-Daughter Sequence

The sofa behind the mother and the daughter’s shoulder are segmented more accurately.

Coastguard Sequence

The ship’s bottom and the boat’s trail are segmented more accurately.

References

[1] H. M. Wallach. “Conditional random fields: An introduction.” Technical Report MS-CIS-04-21, University of Pennsylvania, 2004.

[2] J. Lafferty, A. McCallum, and F. Pereira. “Conditional random fields: Probabilistic models for segmenting and labeling sequence data.” Proc. Int’l Conf. Machine Learning, pp. 282-289, 2001.

[3] Y. Wang, K. F. Loe, and J. K. Wu. “A dynamic conditional random field model for foreground and shadow segmentation.” IEEE Trans. Pattern Analysis and Machine Intelligence, 28(2), pp. 279-289, 2006.
