
Page 1: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Segmentation-Based Stereo

Michael Bleyer, LVA Stereo Vision

Page 2: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

What happened last time?

Once again, we looked at our energy function:

E(D) = Σp∈I m(p, dp) + Σ(p,q)∈N s(dp, dq)

We investigated the matching cost function m():
• Standard measures:
- Absolute/squared intensity differences
- Sampling-insensitive measures
• Radiometrically insensitive measures:
- Mutual information
- ZNCC
- Census
• The role of color
• Segmentation-based aggregation methods

Page 3: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

What is Going to Happen Today?

• Occlusion handling in global stereo
• Segmentation-based matching
• The matting problem in stereo matching

Page 4: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Michael Bleyer, LVA Stereo Vision

Occlusion Handling in Global Stereo

Page 5: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

There is Something Wrong with our Data Term

Recall the data term:

E_data(D) = Σp∈I m(p, dp)

We compute the pixel dissimilarity m() for each pixel of the left image.

As we know, not every pixel has a correspondence, i.e. there are occluded pixels.

It does not make sense to compute the pixel dissimilarity for occluded pixels.

Page 6: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

We Should Modify the Data Term

In a more correct formulation, we incorporate occlusion information:

E_data(D) = Σp∈I [ m(p, dp) · (1 − O(p)) + Pocc · O(p) ]

where
- O(p) is a function that returns 1 if p is occluded and 0 otherwise.
- Pocc is a constant penalty for occluded pixels (the occlusion penalty).

Idea:
• We measure the pixel dissimilarity if the pixel is not occluded.
• We impose the occlusion penalty if the pixel is occluded.

Why do we need the occlusion penalty?
• Without it, declaring all pixels occluded would represent a trivial energy optimum. (Data costs would be equal to 0.)
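A minimal NumPy sketch of this term, assuming a precomputed per-pixel dissimilarity map `match_cost` and a boolean occlusion mask `occluded` (both names and the value of `p_occ` are illustrative):

```python
import numpy as np

def data_term(match_cost, occluded, p_occ=10.0):
    """Occlusion-aware data term:
    E_data = sum_p m(p, d_p) * (1 - O(p)) + P_occ * O(p).
    match_cost: per-pixel dissimilarity m(p, d_p) under the current
    disparity map; occluded: boolean occlusion mask O(p)."""
    occ = occluded.astype(float)
    return float(np.sum(match_cost * (1.0 - occ) + p_occ * occ))
```

Note that with p_occ > 0 the trivial all-occluded labeling is no longer free: every occluded pixel contributes the penalty instead of zero cost.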

Page 7: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision


How can we define the occlusion function O()?

Page 8: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Occlusion Function

Let us assume we have two surfaces in the left image. We know their disparity values.

[Figure: disparity plotted over x-coordinates for the left image]

Page 9: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Occlusion Function

We can use the disparity values to transform the left image into the geometry of the right image. (We say that we warp the left image.) The x-coordinate in the right view, x'p, is computed by x'p = xp − dp.

[Figure: disparity profiles over x-coordinates for the left image and the warped right image]

Page 10: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision


Small disparity => Small shift

Page 11: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision


Large disparity => Large shift

Page 12: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Occlusion Function

There are pixels that project to the same x-coordinate in the right view (see p and q). Only one of these pixels can be visible (uniqueness constraint).

[Figure: pixels p and q of the left image warp to the same x-coordinate in the right image]

Page 13: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision


Which of the two pixels is visible – p or q?

Page 14: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision


q has a higher disparity => q is closer to the camera => q has to be visible. p is occluded by q.

Page 15: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision


Visibility Constraint: A pixel p is occluded if there exists a pixel q such that p and q have the same matching point in the other view and q has a higher disparity than p.

Page 16: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

The Occlusion-Aware Data Term

We have already defined our data term:

E_data(D) = Σp∈I [ m(p, dp) · (1 − O(p)) + Pocc · O(p) ]

The function O(p) is defined using the visibility constraint:

O(p) = 1 if ∃q ∈ I : xp − dp = xq − dq and dq > dp,
       0 otherwise.
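Under the assumption of an integer disparity map for the left view, O() can be sketched per scanline as follows (`occlusion_map` and its interface are illustrative, not the graph-cut construction discussed later):

```python
import numpy as np

def occlusion_map(disp):
    """O(p) = 1 if some pixel q on the same scanline has the same
    matching point x - d in the right view and a higher disparity
    (visibility constraint); 0 otherwise."""
    h, w = disp.shape
    occ = np.zeros((h, w), dtype=bool)
    for y in range(h):
        best = {}  # right-view column -> highest disparity mapping there
        for x in range(w):
            d = int(disp[y, x])
            best[x - d] = max(best.get(x - d, -1), d)
        for x in range(w):
            d = int(disp[y, x])
            occ[y, x] = d < best[x - d]  # a closer pixel wins the match
    return occ
```

For example, on the scanline [0, 0, 2, 2] the first two pixels warp to the same right-view columns as the last two, which have higher disparity, so the first two are marked occluded.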


Page 18: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

How Can We Optimize That?

• I just give a rough sketch for using graph cuts.
• Works for α-expansions and fusion moves.
• I follow the construction of [Woodford,CVPR08].

Page 19: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

How Can We Optimize That?

The trick is to add an occlusion node for each node representing a pixel:

• Node representing pixel q: has two states, active/inactive. (Active means that the pixel takes a specific disparity.)
• Occlusion node Oq for q: has two states, visible/occluded.

Page 20: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

How Can We Optimize That?

Data costs are implemented as pairwise interactions between q and Oq:
• If q is active and Oq is visible, we impose the pixel dissimilarity as costs.
• If q is active and Oq is occluded, we impose the occlusion penalty as costs.
• 0 costs if q is inactive. (I am simplifying here.)

[Figure: pairwise edge between pixel node q and occlusion node Oq, labeled with the occlusion penalty and the pixel dissimilarity]

Page 21: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

How Can We Optimize That?

We have another pixel p. If p is active, it will map to the same pixel in the right image as q. The disparity of p is smaller than that of q. => We have to prohibit that the occlusion node of p is visible if q is active (visibility constraint). How can we do that?

Page 22: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision


We define a pairwise term. The term gives infinite costs if q is active and Op is visible. => This case will never occur as the result of energy minimization.

Page 23: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Result

I show the result of Surface Stereo [Bleyer,CVPR10] used in conjunction with the presented occlusion-aware data term.

I will speak about the energy function of Surface Stereo next time.

Red pixels are occlusions

Page 24: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Result

Our occlusion term works well, but it is not perfect. It detects occlusions on slanted surfaces where there should not be any.

Page 25: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Uniqueness Constraint Violated by Slanted Surfaces

A slanted surface is sampled differently in the left and right images.

In the example on the right, the slanted surface is represented by 3 pixels in the left image and by 6 pixels in the right image.

For slanted surfaces, a pixel can have more than one correspondence in the other view. => uniqueness assumption violated

We will see how we can tackle this problem with Surface Stereo next time. (Image taken from [Ogale,CVPR04])

Page 26: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Michael Bleyer, LVA Stereo Vision

Segmentation-Based Stereo

Page 27: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Segmentation-Based Stereo

• Has become very popular over the last couple of years.
• Most likely because it gives high-quality results.
• This is especially true on the Middlebury set: the top positions are clearly dominated by segmentation-based approaches.

Page 28: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Key Assumptions

We assume that
1. Disparity inside a segment can be modeled by a single 3D plane.
2. Disparity discontinuities coincide with segment borders.

We apply a strong over-segmentation to make it more likely that our assumptions are fulfilled.

[Figures: Tsukuba left image; result of color segmentation (segment borders are shown); disparity discontinuities in the ground-truth solution]

Page 29: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision


We no longer use pixels as the matching primitive, but segments. Our goal is to assign each segment to a “good“ disparity plane.

Page 30: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

How Do Segmentation-Based Methods Work?

Two-step procedure:
• Initialization: assign each segment to an initial disparity plane.
• Optimization: optimize the assignment of segments to planes to improve the initial solution.

Segmentation-based methods basically differ in how they implement these two steps. I will explain the steps using the algorithm of [Bleyer,ICIP04].

Page 31: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Initialization Step (1)

Two preprocessing steps:
• Apply color segmentation to the left image.
• Compute an initial disparity map via a window-based method (block matching).

[Figures: Tsukuba left image; color segmentation (pixels of the same segment are given identical colors); initial disparity map (obtained by block matching)]
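Block matching itself can be sketched as a winner-takes-all search over a SAD window (a toy implementation; the window radius and the SAD measure are illustrative choices, not necessarily those used in [Bleyer,ICIP04]):

```python
import numpy as np

def block_match(left, right, max_disp, radius=1):
    """Winner-takes-all block matching: for each left-image pixel,
    pick the disparity whose window in the right image has the
    smallest sum of absolute differences (SAD)."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            best, best_d = np.inf, 0
            for d in range(min(max_disp, x - radius) + 1):
                lw = left[y-radius:y+radius+1, x-radius:x+radius+1]
                rw = right[y-radius:y+radius+1, x-d-radius:x-d+radius+1]
                sad = np.abs(lw.astype(float) - rw.astype(float)).sum()
                if sad < best:
                    best, best_d = sad, d
            disp[y, x] = best_d
    return disp
```

On a synthetic pair where the right image is the left image shifted by one pixel, interior pixels should receive disparity 1.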

Page 32: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Initialization Step (2)

Plane fitting:
• Fit a plane to each segment using the initial disparity map.
- This is accomplished via least-squares fitting.

A plane is defined by 3 parameters a, b and c. Knowing the plane, one can compute the disparity of pixel <x,y> by d(x,y) = ax + by + c.

[Figures: color segmentation (pixels of the same segment are given identical colors); plane-fitting result]
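The least-squares plane fit can be sketched with NumPy (`fit_plane` is a hypothetical helper; it receives the pixel coordinates of one segment and their initial disparities):

```python
import numpy as np

def fit_plane(xs, ys, ds):
    """Least-squares fit of the plane d = a*x + b*y + c to one
    segment's pixel coordinates and their initial disparities."""
    A = np.column_stack([xs, ys, np.ones(len(xs))])
    (a, b, c), *_ = np.linalg.lstsq(A, np.asarray(ds, float), rcond=None)
    return a, b, c
```

Given the plane parameters, the disparity of any pixel of the segment is recovered as a*x + b*y + c.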

Page 33: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision


We now try to refine the initial plane fitting result in the optimization step.

Page 34: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Optimization Step

We use energy minimization:
• Step 1: design an energy function that measures the goodness of an assignment of segments to planes.
• Step 2: minimize the energy to obtain the final solution.

Page 35: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Idea Behind the Energy Function

We use the disparity map to warp the left image into the geometry of the right view. If the disparity map were correct, the warped view should be very similar to the real right image.

[Figures: reference image + disparity map = warped view; warped view vs. real right view => minimize the difference]

Page 36: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Visibility Reasoning and Occlusion Detection

[Figure: disparity [pixels] over x-coordinates [pixels] for the left view; three segments S1, S2, S3]

Page 37: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Visibility Reasoning and Occlusion Detection

[Figure: the segments S1, S2, S3 of the left view and their disparity profiles after warping into the right view]

Page 38: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision


Visibility Reasoning and Occlusion Detection

If two pixels of the left view map to the same pixel in the right view, the one with the higher disparity is visible.

Page 39: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision


Visibility Reasoning and Occlusion Detection

If there is no pixel of the left view that maps to a specific pixel of the right view, we have detected an occlusion.
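The visibility reasoning on one scanline can be sketched as forward warping with a disparity z-buffer (`warp_scanline` is an illustrative name, not code from the lecture):

```python
import numpy as np

def warp_scanline(left_row, disp_row):
    """Forward-warp one scanline of the left view into the right
    view (x' = x - d). When two pixels land on the same column,
    the one with higher disparity is kept (visibility reasoning);
    columns that no pixel reaches are detected occlusions (holes)."""
    w = len(left_row)
    warped = np.full(w, np.nan)
    zbuf = np.full(w, -1)      # highest disparity seen per column
    for x in range(w):
        d = int(disp_row[x])
        xr = x - d
        if 0 <= xr < w and d > zbuf[xr]:
            warped[xr] = left_row[x]
            zbuf[xr] = d
    holes = np.isnan(warped)   # occlusions in the right view
    return warped, holes
```

The NaN columns of the warped scanline are exactly the right-view pixels no left-view pixel maps to, i.e. the detected occlusions.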

Page 40: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Overall Energy Function

• Measures the pixel dissimilarity between the warped and real right views for visible pixels.
• Assigns a fixed penalty for each detected occluded pixel.
• Assigns a penalty for neighboring segments that are assigned to different disparity planes (smoothness).
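Assuming the warped view and its holes have already been computed (e.g. by scanline warping) and that segment adjacency is given, the three components might be combined as follows (all names and penalty values are illustrative):

```python
import numpy as np

def overall_energy(warped, right, holes, plane_of, neighbors,
                   p_occ=10.0, p_disc=5.0):
    """Sketch of the overall energy:
    - dissimilarity between warped and real right view (visible pixels),
    - fixed penalty p_occ per detected occluded pixel,
    - penalty p_disc per neighboring segment pair on different planes."""
    visible = ~holes
    data = np.abs(warped[visible] - right[visible]).sum()
    occ = p_occ * np.count_nonzero(holes)
    smooth = p_disc * sum(1 for s, t in neighbors
                          if plane_of[s] != plane_of[t])
    return float(data + occ + smooth)
```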

Page 41: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision


How can we optimize that?

Page 42: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Energy Optimization

Start from the plane-fitting result of the initialization step.

Optimization algorithm (Iterated Conditional Modes, ICM):
• Repeat a few times:
- For each segment s:
- For each segment t that is a spatial neighbor of s:
- Test if assigning s to the plane of t reduces the energy ("plane testing").
- If so, assign s to t's plane.
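ICM itself is simple enough to sketch generically (the data structures and the `energy` callback are illustrative; a real cost evaluation would re-warp the affected segments):

```python
def icm(segments, planes, neighbors, energy, iters=3):
    """Iterated Conditional Modes over a segment -> plane assignment.
    planes: dict segment -> plane; neighbors: dict segment -> list of
    adjacent segments; energy(planes): scores the full assignment."""
    best = energy(planes)
    for _ in range(iters):
        for s in segments:
            for t in neighbors[s]:
                old = planes[s]
                planes[s] = planes[t]   # test t's plane for s
                e = energy(planes)
                if e < best:
                    best = e            # keep the improvement
                else:
                    planes[s] = old     # revert
    return planes, best
```

Like all ICM variants, this only makes greedy moves and can get stuck in local minima, which is one reason the paper starts from a reasonable plane-fitting initialization.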

Page 43: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Results

Ranked second in the Middlebury benchmark at the time of submission (2004).

[Figures: computed disparity map; absolute disparity errors]

Page 44: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Disadvantages of Segmentation-Based Methods

• If segments overlap a depth discontinuity, there will definitely be a disparity error. (Segmentation is a hard constraint.)
• A planar model is oftentimes not sufficient to model the disparity inside a segment correctly (e.g. rounded objects).
• Leads to a difficult optimization problem:
- The set of all 3D planes is of infinite size (label set of infinite size).
- Cannot apply α-expansions or BP (at least not in a direct way).

[Figures: Map reference frame (color segmentation generates segments that overlap disparity discontinuities); ground truth; result of [Bleyer,ICIP04]]

Page 45: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Michael Bleyer, LVA Stereo Vision

The Matting Problem in Stereo

Page 46: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

The Matting Problem

Let us do a strong zoom-in on the Tsukuba image. At depth discontinuities, there occur pixels whose color is a mixture of fore- and background colors. These pixels are called mixed pixels.

[Figure: zoomed-in view with mixed pixels highlighted]

Page 47: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Single-Image Matting Methods

• Do a foreground/background segmentation.
• Bright pixels represent foreground; dark pixels represent background.
• This is not just a binary segmentation! The grey value expresses the percentage to which a mixed pixel belongs to the foreground. (This is the so-called alpha value.)

[Figures: input image; alpha matte]

Page 48: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

[Figures: zoomed-in view; alpha matte]

Page 49: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

How Can We Compute the Alpha Matte?

We have to solve the compositing equation:

C = α · F + (1 − α) · B

More precisely, given the color image C, we have to compute:
• the alpha value α
• the foreground color F
• the background color B

These are 3 unknowns in one equation => a severely under-constrained problem. Hence matting methods typically require user input (scribbles).
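The compositing equation itself is a one-liner (per pixel, with α, F and B as arrays or scalars):

```python
import numpy as np

def composite(alpha, fg, bg):
    """Compositing equation C = alpha * F + (1 - alpha) * B,
    applied per pixel; alpha lies in [0, 1]."""
    return alpha * fg + (1.0 - alpha) * bg
```

For example, a pixel that is 25% foreground over a black background keeps 25% of the foreground color. The matting problem is the inverse direction: recovering alpha, F and B from C alone.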

Page 50: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Why Do We Need It? For Photomontage!

• We give an image as well as scribbles as input to the matting algorithm:
- Red scribbles mark the foreground.
- Blue scribbles mark the background.
• The matting algorithm computes α and F.
• Using α and F, we can paste the foreground object against a new background.

[Figures: input image; novel background]

Page 51: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Why Bother About This When Doing Stereo?

I will now go through the presentation slides for the paper [Bleyer,CVPR09].

You can find them here: http://www.ims.tuwien.ac.at/publication_detail.php?ims_id=262

Page 52: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

Summary

• Occlusion handling in global stereo
• Segmentation-based methods
• The matting problem in stereo

Page 53: Segmentation- Based Stereo Michael Bleyer LVA Stereo Vision

References

[Bleyer,ICIP04] M. Bleyer, M. Gelautz, A Layered Stereo Matching Algorithm Using Global Visibility Constraints, ICIP 2004.

[Bleyer,CVPR09] M. Bleyer, M. Gelautz, C. Rother, C. Rhemann, A Stereo Approach that Handles the Matting Problem via Image Warping, CVPR 2009.

[Bleyer,CVPR10] M. Bleyer, C. Rother, P. Kohli, Surface Stereo with Soft Segmentation, CVPR 2010.

[Ogale,CVPR04] A. Ogale, Y. Aloimonos, Stereo correspondence with slanted surfaces: critical implications of horizontal slant, CVPR 2004.

[Woodford,CVPR08] O. Woodford, P. Torr, I. Reid, A. Fitzgibbon, Global stereo reconstruction under second order smoothness priors, CVPR 2008.