Stereo Vision in Structured Environments by Consistent ... ?· Stereo Vision in Structured Environments…

Embed Size (px)

Text of Stereo Vision in Structured Environments by Consistent ... ?· Stereo Vision in Structured...

  • Stereo Vision in Structured Environments by Consistent Semi-Global Matching

    Heiko HirschmullerInstitute of Robotics and Mechatronics Oberpfaffenhofen

    German Aerospace Center (DLR), 82230 Wessling,


    This paper considers the use of stereo vision in struc-tured environments. Sharp discontinuities and large un-textured areas must be anticipated, but complex or natu-ral shapes of objects and fine structures should be handledas well. Additionally, radiometric differences of input im-ages often occur in practice. Finally, computation time isan issue for handling large or many images in acceptabletime. The Semi-Global Matching method is chosen as itfulfills already many of the requirements. Remaining prob-lems in structured environments are carefully analyzed andtwo novel extensions suggested. Firstly, intensity consis-tent disparity selection is proposed for handling untexturedareas. Secondly, discontinuity preserving interpolation issuggested for filling holes in the disparity images that arecaused by some filters. It is shown that the performance ofthe new method on test images with ground truth is compa-rable to the currently best stereo methods, but the complex-ity and runtime is much lower.

    1. IntroductionStructured environments can contain many difficulties

    for stereo vision. There are often sharp depth discontinu-ities and large untextured areas. However, there may also bemore natural, rounded or fine structured objects like chairsor plants. All of these situations need to be handled forcomputing accurate disparity images for applications likereconstruction. Additionally, radiometric differences of in-put images can often not be avoided in practice, especiallyif individual images are taken at different times and auto-calibrated later. Finally, processing time is often an impor-tant issue for processing either large or many images in ac-ceptable time.

    Section 2 reviews some stereo algorithms that currentlyproduce the best results. The Semi-Global Matching (SGM)method is selected as it addresses already most of the re-quirements. SGM is reviewed and its behavior analyzed onimages in structured environments in Section 3. Two novel

    refinements are suggested for tackling specific problems inSection 4. Finally, Section 5 shows the performance of thenew method on standard test images as well as images oftypical indoor scenes.

    2. Literature reviewSince the publication of Scharstein and Szeliskis tax-

    onomy of stereo algorithms [9], many authors have partic-ipated in an on-line evaluation [8]. The addition of morecomplex test images [10] has lead to the new Tsukuba,Venus, Teddy and Cones data set. Especially the last twoimage pairs are quite complex and realistic.

    Almost all of the currently top-ranked algorithms [11,13, 2, 5, 7, 14] on this data set define a global energy func-tion that is minimized for finding the disparities. This en-ergy function always includes a data term and a smooth-ness term. The former evaluates the matching of individ-ual pixels, while the latter supports piecewise smooth dis-parity selections. Some methods use more terms for pe-nalizing occlusions [2, 7] or alternatively treating visibility[11, 13]. Furthermore, some methods [11, 13, 14, 5] enforcethe consistency of the disparity of all used stereo images(e.g. left/right consistency or symmetry).

    The strategies for finding the minimum of the global en-ergy function differ. The classical approach is Graph Cuts[7], which casts the problem into finding the minimum cutin a graph. Belief Propagation [11] iteratively sends mes-sages between neighboring nodes in the four connected im-age grid for minimizing the global cost. One of the bestranked variant forces symmetrical matching and uses seg-mentation as soft constraints. Layered approaches [2, 14]perform image segmentation and use the assumption thatdisparities vary smoothly (e.g. planar) within each segment.An initialization by a simple method like correlation is typi-cally complemented by an iterative refinement of the dispar-ity selection. Furthermore, there are also methods [13] thatcombine Belief Propagation with segmentation and planefitting in an iterative loop. Finally, the Semi-Global Match-ing [5] method sums for each pixel the costs along 1D pathsfrom several directions. In contrast to all other methods,

    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, NY, USA, June 17-22, 2006.

  • its pixelwise matching cost is not based on comparing in-tensities directly, but on Mutual Information [12] betweenthe stereo images. This makes it very robust against radio-metric differences and some violations of the assumption oflambertian surfaces, e.g. reflections.

    The complexity of a global algorithm is usually high andcan depend on the complexity of the scene [2]. Conse-quently, most of these methods [11, 13, 2, 7] have a runtimeof at least a minute on typical images. In contrast, the Semi-Global Matching method [5] has a complexity of O(WHD)(i.e. number of pixels times disparity range) and a runtimeof just 1-2s under similar conditions.

    The Semi-Global Matching method has been selected forthe discussed problem, due to its accuracy at depth discon-tinuities, robustness of matching in the presence of radio-metric differences and execution speed.

    3. Semi-Global Matching (SGM)A stereo algorithm uses a base image Ib and a match im-

    age Im for calculating a disparity image D that correspondsto the base image. It is assumed that the epipolar geometrybetween the images is known. An epipolar line ebm(p,d) inthe match image is defined by the pixel p in the base imageand the disparity d as line parameter. For rectified imagesebm(p,d) = [pxd,py]T . It is noteworthy that certain cam-era geometries (e.g. pushbroom cameras that do not moveon a straight path) do not allow an exact rectification of theresulting images [6]. Therefore, a general definition usingarbitrarily defined epipolar lines is preferred.

    3.1. Review

    The Semi-Global Matching (SGM) method [5] aims todetermine the disparity image D, such that the global energyE(D) is a minimum.

    E(D) = p

    (C(p,Dp) + qNp

    P1 T[|DpDq|= 1]

    + qNp

    P2 T[|DpDq|> 1])(1)

    The first term of equation (1) calculates the sum of apixelwise matching cost C(p,Dp) for all pixels p at theirdisparities Dp. The cost function can either be Birchfieldand Tomasis sampling insensitive intensity difference [1]or Mutual Information [5]. The latter one has the advantagethat it takes complex relations between corresponding inten-sities into account. This has been shown to be very robustagainst radiometric differences that often occur in practice[5, 6]. The function T[] is defined to return 1 if its argumentis true and 0 otherwise. Thus, the second term of the energyfunction penalizes small disparity differences of neighbor-ing pixels Np of p with the cost P1. Similarly, the third term

    penalizes larger disparity steps (i.e. discontinuities) with ahigher penalty P2.

    The value of P2 does not depend on the size of the dis-parity step, which preserves discontinuities. However, it hasbeen found advantageous to adapt P2 to the local intensitygradient, because discontinuities are often visible as inten-sity changes. Thus, the penalty should be reduced whereintensities differ, which is expressed as P2 =

    P2|IbpIbq| .

    x, y




    (a) Minimum Cost Path Lr(p, d)



    (b) 16 Paths from all Directions r

    Figure 1. Aggregation of costs.

    Finding the global minimum of equation (1) for thewhole 2D image is known to be an NP-complete Problem.SGM calculates E(D) efficiently along 1D paths from either8 or 16 directions towards each pixel as shown in Figure 1.The cost to reach a pixel p at the disparity d from the direc-tion r is defined according to (1) recursively as,

    Lr(p,d) = C(p,d) + min(Lr(p r,d),Lr(p r,d1) + P1,Lr(p r,d + 1) + P1,min

    iLr(p r, i) + P2)min

    kLr(p r,k).


    The first term is the pixelwise matching cost for the cur-rent pixel. The second term adds the minimum of the costat the previous pixel on the path, including the appropri-ate penalty. According to equation (1), there is no penaltyadded to the cost at the same disparity. The penalty P1 isadded to costs at the next lower or higher disparity and P2is added, if a cost at another disparity is the minimum. Thelast term of function (2) does not have any influence on thesubsequent calculation, but it guarantees that LCmax +P2.Without this term, L would always increase along each pathand its value could exceed the used data type. The calcula-tion of this function can be done in O(ND) steps, where Nis the number of pixels along the path and D is the numberof disparities.

    The costs along the paths from all directions r aresummed S(p,d) = r Lr(p,d). For each pixel p the dispar-ity d is chosen that corresponds to the minimum cost, i.e.Dp = argmind S(p,d). For sub-pixel estimation, a quadraticcurve is fitted through the neighboring costs (i.e. at the nexthigher and lower disparity) and the position of the mini-mum is calculated. The result is a disparity image Db thatcorresponds to the base image Ib.

    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, NY, USA, June 17-22, 2006.

  • (a) Office Image (b) Teddy Image

    Figure 2. SGM results on images with untextured background areas. Black represents filtered (i.e. invalid) disparities.

    The disparity image Dm that corresponds to the matchimage Im can either be derived from the same cost array S()or calculated from scratch. A consistency check comparesthe disparities of Db with Dm and invalidates those that dif-fer, i.e.

    Dp =

    {Dbp if |DbpDmq| 1, q = ebm(p,Dbp),Dinv otherwise.




View more >