Iterated dynamic programming and quadtree subregioning for fast stereo matching


Image and Vision Computing 26 (2008) 1371–1383


Carlos Leung a,*,1,2, Ben Appleton a,1,3, Changming Sun b

a Electromagnetics and Imaging, ITEE, The University of Queensland, Brisbane, Qld 4072, Australia
b CSIRO Mathematical and Information Sciences, Locked Bag 17, North Ryde, NSW 1670, Australia

Received 18 February 2005; received in revised form 12 November 2007; accepted 23 November 2007

Abstract

The application of energy minimisation methods for stereo matching has been demonstrated to produce high quality disparity maps. However, the majority of these methods are known to be computationally expensive, requiring minutes of computation. In this paper, we propose a fast minimisation scheme that produces high quality stereo reconstructions in significantly reduced running time, requiring only a few seconds of computation. The minimisation scheme is carried out using our iterated dynamic programming algorithm, which iterates over entire rows and columns for fast stereo matching. A quadtree subregioning process is also used for efficient computation of the matching cost volume on which iterated dynamic programming operates.
© 2008 Published by Elsevier B.V.

Keywords: Stereo matching; Energy minimisation; Iterated dynamic programming; Quadtree subregioning

1. Introduction

The study of computational stereo has undergone intensive research since its inception in the 1970s. Stereo matching is the main step for the recovery of the 3D structure of the scene given a pair of images. By matching primitives such as points, curves and regions between the stereo pair, such that the matched primitives are projections of the same 3D identity in the scene, a disparity map of the scene can be computed.

While simple local correspondence methods have the advantage of low computational complexity, they suffer from high sensitivity to matching ambiguity and the choice of matching metric. Reconstructions based on simple local correspondence can be improved by incorporating global constraints and structural information into the stereo matching process. One such class of global correspondence methods comprises those based on dynamic programming (DP). Since the dynamic programming framework allows efficient optimal solutions, it has been applied to locate the path of minimum matching cost for each scanline of the image [1–3]. However, since DP is typically applied independently to each scanline, methods that employ this technique suffer from interscanline inconsistencies. Several studies have addressed this issue by applying postprocessing to iteratively improve the reconstruction, enforcing interscanline constraints. These techniques include minimising the number of horizontal and vertical discontinuities [4], estimating vertical slopes [5], and using edge maps [6]. These methods attempt to retain the computational benefits of a dynamic programming formulation while avoiding the problem of horizontal streaking. However, while these heuristics improve interscanline consistency, they do not entirely solve the problem.

doi:10.1016/j.imavis.2007.11.013

* Corresponding author. Tel.: +61 7 3836 1606. E-mail addresses: [email protected] (C. Leung), appleton@google.com (B. Appleton), [email protected] (C. Sun).
1 Carlos Leung and Ben Appleton are supported by the Australian Postgraduate Award and CSIRO Mathematical and Information Sciences.
2 Present address: Suncorp, P.O. Box 1453, Brisbane, Qld 4001, Australia.
3 Present address: Google Inc., 201 Sussex Street, Sydney, NSW 2000, Australia.


Another class of global correspondence techniques comprises those that formulate the stereo matching problem in a two-dimensional energy minimisation framework. By designing an energy functional whose minima will correspond to good stereo reconstructions, the aim is to compute a disparity function that minimises this energy. Geman and Geman [7] applied simulated annealing to minimise the energy function. Sun [8] proposed a two-stage dynamic programming technique to compute disparity surfaces of maximum total correlation. In recent years algorithms based on graph cuts and iterated graph cuts have been proposed to solve the optimisation problem [9–14]. Graph cut methods produce excellent results at the cost of orders of magnitude greater computation than dynamic programming techniques.

In many stereo matching algorithms, there is a need to evaluate the similarity or other metric values for matching points. Metric values evaluated for all overlapping regions of interest between the stereo pair, defined by the window size and disparity range, can be assembled into a matching cost volume. For fast stereo matching, efficient computation of a matching cost volume is essential. Faugeras et al. [15] developed a recursive technique for calculating correlation coefficients whose cost is invariant to the size of the correlation window. Sun [16] applied the box-filtering technique to achieve fast cross correlation computations. Efficient algorithms have previously been proposed to compute the full matching cost volume by processing the entire image simultaneously.

Sun [8] proposed a rectangular subregioning algorithm in order to reduce the computation cost when constructing the matching cost volume. By subregioning the images into rectangular regions optimised for minimal computational load, the reevaluation of the banded cost volume at each finer scale can be computed efficiently and quickly. However, there are situations where the rectangular subregioning process is not optimal.

In this paper, we present two new techniques for fast stereo matching. We propose an iterated dynamic programming (IDP) algorithm which minimises an energy function that incorporates both intrascanline and interscanline regularity. Although our energy minimisation method is also based on DP, the use of a multi-directional smoothing energy prevents streaking. This framework overcomes the interscanline inconsistency problem inherent to DP techniques and produces results comparable to those of existing energy minimisation algorithms. Hence, we propose IDP as a new alternative for minimising general energy functions of the form to be described in Section 2. We explain theoretically and demonstrate with results that the proposed algorithm is an efficient and competitive energy minimisation scheme. We also propose a quadtree subregioning (QSR) algorithm that segments stereo images into subregions for the fast computation of the matching cost volume in a multiscale framework. We improve on previous techniques and present a quadtree partitioning scheme that efficiently evaluates a banded cost volume. In Section 4, we present timings and results to demonstrate the quality and speed of the proposed techniques. Initial work by the same authors was presented in [17].

2. Energy function

The goal of dense two-view 3D reconstruction is to recover the depth of each pixel from a stereo image pair. The stereo pair in this paper is assumed to be rectified such that corresponding horizontal scanlines in the two images lie in the same epipolar plane. A disparity function $d(\vec{x})$ represents the horizontal displacement for each point $\vec{x}$ of the reference image and is related to the depth of that point in the scene. We formulate the stereo problem in an energy minimisation framework, such that good reconstructions correspond to minima of the energy function. To each disparity function $d(\cdot)$ we associate an energy $E[d]$ quantifying the matching quality. We minimise an energy function that includes terms for data fidelity and regularisation:

$$E[d] = \sum_{\vec{x}} c(\vec{x}, d(\vec{x})) + \sum_{\vec{x}_1 \sim \vec{x}_2} e(|d(\vec{x}_1) - d(\vec{x}_2)|). \tag{1}$$

The first term in Eq. (1) accounts for the matching cost $c(\cdot)$ of pixel correspondences. Many matching metrics have been proposed in the literature, measuring either similarities or dissimilarities between corresponding primitives [18]. Stereo reconstruction based solely on matching criteria, however, is an ill-posed problem which has many solutions. The second term of Eq. (1) imposes the assumption of regularity onto the disparity function to obtain solutions which are considered likely from prior knowledge. Here the edge function $e(\cdot)$ is selected to penalise discontinuities in $d(\cdot)$ and $\sim$ is the neighbourhood relation between points. A variety of edge functions have been proposed in the literature, including quadratic functions, discontinuity-preserving functions, and terms dependent on intensity differences [19]. Discontinuity-preserving edge functions have the property of giving bounded penalties to very large disparity jumps. Minimising such functions therefore allows large jumps in the disparity estimate at object boundaries, thus avoiding the oversmoothing which is common to other energy functions.
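As an illustration of how the two terms of Eq. (1) combine, the following minimal sketch evaluates the energy of a given integer disparity map. It assumes a 4-connected neighbourhood relation and a cost volume stored as `cost[y, x, k]`; the function name `energy`, the array layout and the Potts-style penalty in the usage lines are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def energy(cost, d, edge):
    """Evaluate Eq. (1) for disparity map d on a 4-connected grid (assumed layout).

    cost : float array of shape (Y, X, D), cost[y, x, k] = c((x, y), k)
    d    : integer array of shape (Y, X) with values in [0, D)
    edge : vectorised edge function e(.) applied to disparity differences
    """
    ys, xs = np.indices(d.shape)
    data = cost[ys, xs, d].sum()                      # first term: matching costs
    horizontal = edge(d[:, 1:] - d[:, :-1]).sum()     # edges between (x, y) and (x+1, y)
    vertical = edge(d[1:, :] - d[:-1, :]).sum()       # edges between (x, y) and (x, y+1)
    return float(data + horizontal + vertical)

# Usage with a Potts-style penalty (1 for any jump, 0 otherwise), chosen only for illustration.
potts = lambda delta: (delta != 0).astype(float)
cost = np.zeros((2, 2, 3))
cost[0, 0, 0] = 5.0
d = np.array([[0, 1], [0, 2]])
total = energy(cost, d, potts)
```

Each neighbouring pair is counted once, which matches a sum over unordered pairs in the second term of Eq. (1).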

Although discontinuity-preserving energy functions are desirable in stereo reconstruction, Boykov et al. showed that the minimisation of such an energy function is NP-hard by analogy to the Potts model [10]. They proposed a multiway graph cut framework which can compute a strong local minimum of such energy functions. Kolmogorov and Zabih [12] extended their graph cut framework and minimised an energy function with an extra visibility term in order to model occlusions. In this paper we do not include a visibility term and just minimise the energy function described in Eq. (1). While these optimisation schemes rely on an iterated application of minimum cuts, we propose a fast alternative using IDP.

3. Fast stereo matching

3.1. Iterated dynamic programming

Iterated dynamic programming utilises dynamic programming to compute the optimal multi-labelling of a one-dimensional energy function, that is, to optimise a function of one variable. Our energy minimisation takes advantage of this, optimally relabelling entire rows and columns of the disparity function at once.

In the case of matching 2D images for stereo reconstruction, the coordinate $\vec{x}$ in Eq. (1) may be described by the components $\vec{x} = (x, y)$. IDP proceeds iteratively by optimising a disparity function $d(x, y)$, relabelling the pixels along a single row or column if the relabelling results in a lower energy. While doing so, it leaves the remainder of the disparity function untouched. In this way, we may reduce a multi-dimensional energy minimisation problem to a sequence of one-dimensional subproblems. The global minimum of each of these one-dimensional subproblems may be obtained using dynamic programming. As the algorithm proceeds the energy decreases monotonically, converging when no row or column remains which may be relabelled to further reduce the energy.

IDP iteratively optimises lines of the disparity function, such as $d(x, \cdot)$ and $d(\cdot, y)$, until convergence. At each optimisation step, the corresponding plane in the 3D matching cost volume for each scanline or column is considered. An example of the operation of IDP is depicted in Fig. 1. In the case of a column, fixing $x = x_0$, the energy function in Eq. (1) becomes:

$$E[d(x_0, \cdot)] = \kappa + \sum_{y} e(|d(x_0, y) - d(x_0, y + 1)|) + \sum_{y} \big( c(x_0, y, d) + e(|d(x_0, y) - d(x_0 \pm 1, y)|) \big) \tag{2}$$

The first term of Eq. (2), $\kappa$, absorbs components of the energy function which are unrelated to the current line being optimised. The second term accounts for the energy of the active edges, which are the interactions between neighbouring points along the vertical line $d(x_0, \cdot)$. The third term depends on the values of $d(x_0, \cdot)$ at individual points along the current line, absorbing the matching costs and passive edges. Observe that we have converted the multi-dimensional minimisation problem posed by Eq. (1) into a one-dimensional subproblem whose global minimum may be efficiently computed using DP. The algorithm monotonically reduces the energy of the disparity estimate and converges when every horizontal and vertical line is optimal. In the case of a row, fixing $y = y_0$, the energy function in Eq. (1) becomes:

$$E[d(\cdot, y_0)] = \kappa + \sum_{x} e(|d(x, y_0) - d(x + 1, y_0)|) + \sum_{x} \big( c(x, y_0, d) + e(|d(x, y_0) - d(x, y_0 \pm 1)|) \big) \tag{3}$$

Recall that the minimisation of a discontinuity-preserving energy function is NP-hard. Whilst the computation of the global minimum cannot be guaranteed, multiway graph cuts have been shown to compute a strong local minimum of such energy functions [10]. Similar to the use of multiway graph cuts, which iteratively compute the optimal binary relabelling or the global minimum of two-dimensional subproblems, IDP monotonically reduces the energy function of Eq. (2) by decomposing the NP-hard minimisation problem into a sequence of one-dimensional subproblems which may be optimally relabelled by dynamic programming. Whereas multiway graph cuts have the advantage of relabelling all pixels at once by considering only two disparities during each optimisation, IDP has the advantage of considering all available disparities at once during each optimisation step.

In this description, we have neglected the treatment of the image borders for brevity. A range of boundary conditions may be considered in practice. Zero padding, image extension and reflecting the images have been experimented with in order to obtain a matching cost for the border pixels. It was found, however, that the best method was to resolve border cases by allowing the optimiser to extrapolate.

Fig. 1. (a) An instance of a column $d(x_0, \cdot)$ of the disparity function to be optimised. Solid lines denote active edges, dashed lines denote passive edges. Dotted lines are not considered by the current step of optimisation. (b) The corresponding $(x_0, \cdot, \cdot)$ plane in the cost volume to be optimised during the iteration process. Depicted are the neighbouring planes whose disparity values affect the current optimisation. (c) The planar trellis on which dynamic programming is performed. Arrows denote the active edges (in the $x_0$ plane) while dashed lines denote the passive edges.

It is worth noting that IDP differs from the framework of Roy and Cox [13], which, although it takes a graph cut approach, similarly considers all disparities at once. Their framework models the matching cost volume as a graph and computes the minimum cut that corresponds to the maximum flow through this graph. The energy function proposed by Roy and Cox does not use discontinuity-preserving regularisation. However, as a result of its simplicity they are able to obtain the global optimum solution.

Here, we summarise the steps of our IDP algorithm:

Iterated dynamic programming

(1) Compute the matching cost volume.
(2) Initialise $d(x, y)$ with 'greedy' matching (minimisation of Eq. (1) considering only data fidelity, e.g. selecting the disparity which gives maximum or minimum matching cost at position $(x, y)$).
(3) For each column and row in sequence:
    (a) For each column of the image.
        (i) Form Eq. (2) along the current column.
        (ii) Minimise Eq. (2) using dynamic programming.
    (b) For each row of the image.
        (i) Form Eq. (3) along the current row.
        (ii) Minimise Eq. (3) using dynamic programming.
(4) If convergence has not been reached, repeat Step 3 for those rows or columns which contain neighbours of a relabelled pixel.

Fig. 2. Given (a) the coarse scale disparity estimate, the stereo matching can be restricted to a banded range at the finer scale. Depicted in (b) is the cross section of a corresponding slice of the narrow band (denoted by the red line in (a)).

It may not be necessary to relabel every line in an optimisation sweep. In Step 3, when we are considering the relabelling of a line which has previously been optimised, a different result can only be obtained if one or more of the neighbouring pixels has changed disparity. Therefore, as the algorithm proceeds we note any changes in pixel disparities and flag the rows and columns which may be affected by this change. In the next optimisation sweep (Step 3), only those rows or columns which have been flagged need to be considered. As the disparity function converges this can substantially reduce the computation time of Step 3. Please refer to [20] [Section 2] and [21] for details on the implementation of the dynamic programming algorithm on a 2D slice of the 3D matching cost volume for Step 3.
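The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cost volume is stored as `cost[y, x, k]` with lower values being better, the edge function is supplied as a symmetric lookup table `pair[p, q] = e(|p - q|)`, border lines simply have fewer passive edges, and the row and column flagging of Step 4 is omitted so that every line is revisited in each sweep. All function names are hypothetical.

```python
import numpy as np

def edge_table(num_disp, k1, k2):
    """pair[p, q] = e(|p - q|): 0 for no jump, k1 for a jump of 1, k2 otherwise."""
    labels = np.arange(num_disp)
    jump = np.abs(labels[:, None] - labels[None, :])
    return np.where(jump == 0, 0.0, np.where(jump == 1, float(k1), float(k2)))

def line_energy(unary, pair, labels):
    """Per-pixel terms (matching cost plus passive edges) plus active edges of one line."""
    idx = np.arange(len(labels))
    return unary[idx, labels].sum() + pair[labels[:-1], labels[1:]].sum()

def solve_line(unary, pair):
    """Globally minimise line_energy over all labellings by dynamic programming."""
    n, num_disp = unary.shape
    best = unary[0].copy()
    back = np.zeros((n, num_disp), dtype=int)
    for i in range(1, n):
        total = best[:, None] + pair          # total[p, q]: best path ending in p, then q
        back[i] = total.argmin(axis=0)
        best = total.min(axis=0) + unary[i]
    labels = np.empty(n, dtype=int)
    labels[-1] = best.argmin()
    for i in range(n - 1, 0, -1):             # backtrack along the stored argmins
        labels[i - 1] = back[i, labels[i]]
    return labels

def relabel_line(cost_line, neighbour_lines, current, pair):
    """Return a strictly better labelling of one row or column, or None."""
    unary = cost_line.copy()
    for nb in neighbour_lines:                # passive edges to the fixed neighbouring lines
        unary += pair[nb]
    candidate = solve_line(unary, pair)
    if line_energy(unary, pair, candidate) < line_energy(unary, pair, current) - 1e-9:
        return candidate
    return None

def idp(cost, pair, max_sweeps=50):
    """Sweep over all columns, then all rows, until no line can be improved."""
    cost = np.asarray(cost, dtype=float)
    height, width, _ = cost.shape
    d = cost.argmin(axis=2)                   # Step 2: greedy initialisation
    for _ in range(max_sweeps):
        changed = False
        for x in range(width):                # Step 3(a): columns, Eq. (2)
            nbs = [d[:, n] for n in (x - 1, x + 1) if 0 <= n < width]
            new = relabel_line(cost[:, x], nbs, d[:, x], pair)
            if new is not None:
                d[:, x], changed = new, True
        for y in range(height):               # Step 3(b): rows, Eq. (3)
            nbs = [d[n] for n in (y - 1, y + 1) if 0 <= n < height]
            new = relabel_line(cost[y], nbs, d[y], pair)
            if new is not None:
                d[y], changed = new, True
        if not changed:                       # Step 4: converged
            break
    return d
```

Accepting a new labelling only when it strictly lowers the line energy keeps the total energy non-increasing and prevents ties from causing endless relabelling. The dynamic programme costs O(N D^2) per line of N pixels with D disparities in this naive form.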

Interested readers can view a video simulation of the IDP optimisation process at our website [22]. The video presents the progressive optimisation of the disparity function from initially zero disparity until its convergence, whereupon the relabelling of any line cannot further decrease the energy $E[d(x, y)]$. The video also demonstrates how IDP reconstructs the general structure of the 3D scene within the first few iterations and progressively refines details in subsequent iterations.

3.2. Coarse-to-fine scheme

The application of a coarse-to-fine scheme can also be useful both as a tool to achieve fast stereo matching and to obtain more reliable disparity maps. By utilising a multi-resolution data structure, the search range at each pyramid level can be limited to a small range of disparity values. While the upper levels of the pyramid provide an overview of the 3D structure of the scene, the lower levels of the pyramid, which operate at higher resolutions, provide the finer details. However, in a multiscale approach, in the transition from a coarse estimate to the initialisation of the optimisation at the next finer level, it is necessary to recompute the matching costs for the ranges of interest for all pixels at the new resolution. Since the disparity range to search at the finer scale is a narrow band around the coarse estimate, only a narrow band of the new matching cost volume needs to be evaluated. A cross section of the restricted search range is depicted in Fig. 2.

Beginning with the original image, a pyramid of coarser resolution images is constructed by sequentially downsampling the image. Amongst the different downsampling methods, we consider the typical implementation where the coarser resolution image is obtained by taking the average values of each pixel's corresponding $r \times r$ neighbourhood in the finer resolution image in the previous pyramid level, where $r$ is the reduction ratio and is usually 2.
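A minimal sketch of this pyramid construction is given below. It assumes a single-channel image and crops any rows or columns that do not fill a complete $r \times r$ block; the cropping rule and the function names are assumptions for illustration.

```python
import numpy as np

def downsample(image, r=2):
    """Average each r x r block of the finer image to obtain one coarser pixel."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    h, w = h - h % r, w - w % r               # assumption: drop incomplete border blocks
    blocks = image[:h, :w].reshape(h // r, r, w // r, r)
    return blocks.mean(axis=(1, 3))

def build_pyramid(image, levels, r=2):
    """Return [level 0 (original), level 1, ..., level levels-1 (coarsest)]."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1], r))
    return pyramid

# Usage: a 4 x 4 image reduced twice with r = 2.
pyramid = build_pyramid(np.arange(16).reshape(4, 4), levels=3)
```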

Let $d_c(x, y)$ be the disparity map computed at the coarse scale and $d_f(x, y)$ the disparity function to be computed at the next finer scale. From $d_c(x, y)$ we may derive bounds $d_{\min}(x, y) \le d_f(x, y) \le d_{\max}(x, y)$ to limit the range of disparities to consider in the computation of $d_f$ to a banded search range. Here, we use power of two downsampling, such that the fine scale has twice the resolution of the coarse scale. Then the coarse scale estimate localises spatial discontinuities in the disparity function at half of the resolution of the finer scale. Likewise, the disparity values themselves are twice as heavily quantised in the coarse scale. Since the disparities in $d_c$ are quantised at half the precision of $d_f$, the disparity function at the finer scale may be bounded by $2d_c - 1 \le d_f \le 2d_c + 1$ in a region of locally constant disparity. This is coupled with an erosion and dilation in the $x$ and $y$ directions to account for spatial upsampling, so that discontinuities are searched for within a small neighbourhood of the coarse disparity estimate. We obtain the bounds $d_{\min}(x, y)$ and $d_{\max}(x, y)$ by the following procedure:


Disparity search range estimation

(1) Upsample $d_c$ to the finer resolution and scale by a factor of 2 to obtain $\bar{d}_f$, i.e. $\bar{d}_f = 2d_c$.
(2) Set $d_{\min} = \bar{d}_f - 1$, $d_{\max} = \bar{d}_f + 1$.
(3) Erode $d_{\min}$ and dilate $d_{\max}$ by 1 pixel.

These bounds greatly reduce the work required to compute the matching costs and the disparity range within which the IDP algorithm must search at a finer scale. An example of the disparity search range construction is presented in Fig. 3.
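The three-step procedure above can be sketched as follows. The sketch assumes nearest-neighbour upsampling by pixel replication, a fine scale exactly twice the size of the coarse scale, and erosion and dilation implemented as 3 x 3 minimum and maximum filters with replicated borders; these choices and the function names are assumptions, since the text does not fix them.

```python
import numpy as np

def _filter3(a, reduce_fn):
    """Apply a 3 x 3 min or max filter with replicated borders."""
    padded = np.pad(a, 1, mode="edge")
    h, w = a.shape
    shifts = [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return reduce_fn(np.stack(shifts), axis=0)

def search_range(d_coarse):
    """Return (d_min, d_max) at the finer scale from the coarse disparity map."""
    # Step 1: upsample by replication and double the disparity values.
    d_bar = 2 * np.repeat(np.repeat(d_coarse, 2, axis=0), 2, axis=1)
    # Step 2: allow one disparity level either side of the upsampled estimate.
    d_min, d_max = d_bar - 1, d_bar + 1
    # Step 3: erode the lower bound and dilate the upper bound by one pixel.
    return _filter3(d_min, np.min), _filter3(d_max, np.max)

# Usage: a coarse map with a disparity jump in the bottom-right corner.
d_min, d_max = search_range(np.array([[1, 1], [1, 3]]))
```

Near the jump the band widens to cover both surfaces, while in regions of constant disparity it stays three levels wide.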

3.3. Quadtree subregioning

In order to perform fast stereo matching, the matching cost volume $c(\cdot)$ needs to be efficiently computed. Sun [16] applied box-filtering and a recursive cross correlation scheme to obtain the matching cost volume of an image pair of size $(X, Y)$ with $D$ disparities in $O(XYD)$ computation.
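To illustrate why box-filtering gives a cost that is independent of the window size, the sketch below builds a SAD cost volume using an integral image, so that each window sum takes four lookups. It is not the recursive cross correlation scheme of [16]; the SAD metric, zero padding at the borders, the convention that a left pixel at $x$ matches a right pixel at $x - k$, and the function names are all assumptions for this example.

```python
import numpy as np

def box_sum(a, radius):
    """Sum of a over a (2*radius+1)^2 window around each pixel, zero padded."""
    k = 2 * radius + 1
    h, w = a.shape
    padded = np.pad(a, radius, mode="constant")
    # s[i, j] holds the sum of padded[:i, :j] (integral image with a zero border).
    s = np.pad(padded, ((1, 0), (1, 0)), mode="constant").cumsum(axis=0).cumsum(axis=1)
    return s[k:k + h, k:k + w] - s[:h, k:k + w] - s[k:k + h, :w] + s[:h, :w]

def sad_cost_volume(left, right, num_disp, radius=1):
    """volume[y, x, k] = windowed SAD between left (x, y) and right (x - k, y)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    h, w = left.shape
    volume = np.full((h, w, num_disp), np.inf)    # inf where no match exists (x < k)
    for k in range(num_disp):
        diff = np.abs(left[:, k:] - right[:, :w - k])
        volume[:, k:, k] = box_sum(diff, radius)  # O(XY) per disparity
    return volume
```

Each disparity slice costs O(XY) regardless of `radius`, giving O(XYD) overall.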

Sun [8] proposed a rectangular subregioning scheme to obtain a set of $R$ rectangles for computing the matching cost volume with minimised total computational cost. Rectangular subregioning proceeds by dividing the image into horizontal stripes, merging neighbouring stripes of similar disparity ranges. This procedure is also carried out in the vertical direction on each resulting stripe. The disadvantage of this method is that the merging of horizontal stripes must operate over the full width of the image and therefore may not adapt well to scenes with a wide range of disparities in the horizontal direction. By merging rectangles rather than stripes, a more adaptive partitioning may be obtained. Our aim is to obtain large rectangles with small disparity ranges and small rectangles with large disparity ranges, so that the overall cost of computing the matching costs within $R$ rectangles, $\sum_{i=0}^{R-1} (X_i Y_i D_i)$, may be reduced. We propose a subregioning algorithm which takes a divide-and-conquer approach to merging rectangles, eliminating the problem experienced when using rectangular subregioning mentioned earlier.

Fig. 3. Given the coarse scale estimate, $\bar{d}_f$ may be processed in order to obtain the narrow band $d_{\min}$ and $d_{\max}$. Depicted is the cross section of a slice of the narrow band after: (a) setting $d_{\min} = \bar{d}_f - 1$ and $d_{\max} = \bar{d}_f + 1$, and (b) erosion and dilation of $d_{\min}$ and $d_{\max}$, respectively.

Given a box or a subregion of the input image in which a narrow band of matching costs must be evaluated, we may either compute the matching costs for the full box or subdivide this box into quarters. If we already knew the most efficient way to compute the quarters of a region, it would be simple to evaluate whether merging them may reduce the computation cost. Since it is possible to estimate the time required to evaluate these subregions, the previous level of subregions, which were split in order to obtain this new set of smaller subregions, can also determine whether it is more cost efficient to merge or split. Recursively applying this splitting will therefore produce a quadtree subregioning (QSR). The input image is conceptually represented by a quadtree structure. QSR operates on this quadtree in order to partition the image so that the computational time of the required matching costs may be minimised. Fig. 4 illustrates an example of the partitioning problem that the QSR is solving. Depicted is a cross section of the narrow band that needs to be updated. Whilst considering the full box offers the computational benefits of box-filtering, unnecessary matching costs are also evaluated. QSR considers this tradeoff and seeks to obtain a partitioning that efficiently evaluates the narrow band with minimum computational load.

The QSR algorithm begins by iteratively dividing the image into quadrants until a predefined minimum box width is reached, resulting in a tree structure. Proceeding recursively up the tree structure, each parent region computes the partitioning of its four children, deciding whether merging its children into one node will decrease the computational load. From the computed partition tree, a backtracking algorithm, similar to that of the recursive dynamic programming framework, can then proceed down the tree structure, storing all of the subregions to be processed. These subvolumes form the set of quadtree partitions corresponding to the minimum computation load. Whilst applying this sparse computation of the cost volume can decrease the computational complexity, we note that it is also possible to decrease the space complexity, minimising memory usage by storing only a sparse matching cost volume. Fig. 5 shows examples of the segmentations obtained by applying QSR on different images.

Fig. 4. Given the partitioning of the narrow band (top row), the corresponding cross section (along the red lines) of the boxes are shown (bottom row). (a) Without any subregioning, the full box is considered, maximising the benefits of box filtering. (b and c) The resulting subregions by partitioning the narrow band into quads. QSR will seek to merge or further divide the subregions to obtain the optimal partitioning for minimal computation.

Fig. 5. The results of quadtree subregioning on three real images: (a) Parking meter. (b) Pentagon. (c) Fruit. Depicted are the boxes overlaid on the computed disparity function. Observe how small boxes form near discontinuities while large boxes form in regions of similar disparities.

Beginning with the input image, here are the steps of our QSR algorithm:

Quadtree subregioning

(1) Given a box $B$ of dimensions $(X_i, Y_i)$ with disparity search range $D_i$, partition in $x$ and $y$ to give four children $B_{mn}$ where $m, n \in \{1, 2\}$. Each $B_{mn}$ has its own disparity search range when evaluating the computational cost.
(2) Compute the minimal computation cost $C(B)$:
    (a) Compute the merge cost for window size $(x_w, y_w)$: $C_{\mathrm{merge}}(B) = (X_i + x_w)(Y_i + y_w) D_i + C_{\mathrm{overhead}}$.
    (b) Compute the split cost $C_{\mathrm{split}}(B) = \sum_{m,n} C(B_{mn})$. When calculating each $C(B_{mn})$, the formula used is similar to the one for $C_{\mathrm{merge}}(B)$, but with different box size and disparity search range.
    (c) Determine the minimal computation cost $C(B) = \min(C_{\mathrm{merge}}(B), C_{\mathrm{split}}(B))$.
    (d) Label the box as split or merge accordingly.
(3) Beginning with the largest box $B$, extract quadtree subregions.
    (a) If this box is labelled merge, add box to list.
    (b) If this box is labelled split, recurse to children $B_{mn}$.

Notice that a large $C_{\mathrm{overhead}}$ implies that the work required to administer and set up the boxes is large. Since each created box, whether it is created by merging or because it has the minimum width, incurs this additional overhead penalty, a large $C_{\mathrm{overhead}}$ will drive the QSR process towards minimising the number of boxes created. An infinite $C_{\mathrm{overhead}}$ results in a single partition. The choice of $C_{\mathrm{overhead}}$ will hence adjust the density of the boxes.
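The merge-or-split decision can be sketched with a single recursion that returns both the minimal cost $C(B)$ and the corresponding list of boxes, folding the labelling of Step 2 and the extraction of Step 3 into one pass. The sketch assumes that the search range of a box is taken from per-pixel bounds `d_min` and `d_max` as $D_i = \max d_{\max} - \min d_{\min} + 1$; that definition, the default parameter values and the function name are assumptions for illustration.

```python
import numpy as np

def quadtree_subregions(d_min, d_max, window=(3, 3), overhead=100.0, min_size=4):
    """Return (total_cost, boxes); each box is (x0, y0, width, height, d_lo, d_hi)."""
    x_w, y_w = window

    def visit(x0, y0, width, height):
        lo = int(d_min[y0:y0 + height, x0:x0 + width].min())
        hi = int(d_max[y0:y0 + height, x0:x0 + width].max())
        # Step 2(a): cost of evaluating the whole box over its full search range.
        merge_cost = (width + x_w) * (height + y_w) * (hi - lo + 1) + overhead
        merged = (merge_cost, [(x0, y0, width, height, lo, hi)])
        if width <= min_size or height <= min_size:
            return merged                          # minimum box size reached
        half_w, half_h = width // 2, height // 2
        children = [
            (x0, y0, half_w, half_h),
            (x0 + half_w, y0, width - half_w, half_h),
            (x0, y0 + half_h, half_w, height - half_h),
            (x0 + half_w, y0 + half_h, width - half_w, height - half_h),
        ]
        # Step 2(b): split cost is the sum of the children's minimal costs.
        split_cost, split_boxes = 0.0, []
        for child in children:
            child_cost, child_boxes = visit(*child)
            split_cost += child_cost
            split_boxes += child_boxes
        # Steps 2(c) and 2(d): keep whichever alternative is cheaper.
        if merge_cost <= split_cost:
            return merged
        return split_cost, split_boxes

    height, width = d_min.shape
    return visit(0, 0, width, height)
```

For a 16 x 16 image with a uniform three-level band this returns a single box, whereas a band that jumps by 40 levels between the left and right halves is split into four quadrants, each with its own narrow range.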

3.4. Implementation

The construction of the energy function involves the selection of an appropriate matching metric and edge function such that its minima correspond to good stereo reconstructions of the scene.

The matching cost term relates to local correspondences and will drive the energy function towards similarities in matching regions. In this paper, we consider the zero mean normalised cross correlation (ZNCC) and the sum of absolute differences (SAD) metrics. Empirically, we have found that ZNCC is well suited for real scenes, because it is independent of differences in brightness and contrast due to the normalisation with respect to the mean and standard deviation. For synthetic images and scenes we have found SAD, which is easy to implement and requires fewer computations, to be well suited.

The edge penalty term controls the spatial smoothness of the reconstruction and monitors the likelihood of discontinuities. Whilst a variety of edge penalty functions are available, discontinuity-preserving energy functions are desirable since they do not have a tendency to oversmooth object boundaries. However, not all energy minimisation schemes are capable of minimising a discontinuity-preserving energy function. The IDP energy minimisation framework we propose is capable of minimising such energy functions.

An additional advantage of our energy minimisation scheme is that there are no restrictions on the smoothness constraint, therefore allowing arbitrary selections of edge functions. While the multiway cut framework of Kolmogorov and Zabih [12] requires that the smoothness term should be a metric or semi-metric, IDP has no constraints on the choice of edge penalty. In our implementation, a variety of edge functions have been applied, such as the Potts energy functional, which is completely discontinuity preserving, and edge penalties which are linear functions of disparity differences.

We have also experimented with energy functions which include an extra term to model detected edge boundaries in order to guide the minimisation with a priori knowledge of the scene. However, we have found that when discontinuity-preserving edge functions are used, the effect of including this extra term is negligible. We believe that object boundaries are already modelled by the discontinuity-preserving edge term.

Empirically, we have found that the following discontinuity-preserving edge function, as illustrated in Fig. 6, produces good results:

$$e(\Delta d) = \begin{cases} 0 & |\Delta d| = 0 \\ K_1 & |\Delta d| = 1 \\ K_2 & |\Delta d| \ge 2 \end{cases} \tag{4}$$

Fig. 6. The edge function described by Eq. (4).

Here, $K_1$ and $K_2$ are regularisation parameters $(K_1 \le K_2)$. As we are dealing with integer disparities, $\Delta d$ will also be an integer. Note that this edge function is a semi-metric.
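A direct transcription of Eq. (4) is sketched below. The default values $K_1 = 200$ and $K_2 = 1000$ are those reported for the fixed parameter experiment in Section 4; the function name is an assumption. With these values the penalty is symmetric and vanishes only for a zero jump, but it does not satisfy the triangle inequality, which is why it is a semi-metric rather than a metric.

```python
def edge(delta_d, k1=200, k2=1000):
    """Edge penalty of Eq. (4) for an integer disparity difference delta_d."""
    jump = abs(delta_d)
    if jump == 0:
        return 0
    return k1 if jump == 1 else k2

# The penalty is bounded: every jump of two or more levels costs the same k2.
penalties = [edge(delta) for delta in range(-3, 4)]
```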

3.5. Algorithm steps

The steps of our proposed algorithm, which uses thecombination of QSR and IDP, for fast stereo matchingare given below. For details of each step of the algorithm,please see relevant sections of [21].

(1) Build image pyramids with $L$ levels (from 0 to $L - 1$) from the original left and right images.
(2) Initialise the disparity map as zero for level $k = L - 1$ and start stereo matching at this level.
(3) Perform stereo matching using the method described in Sections 3.1–3.3, which includes:
    (a) Segment images into subregions using QSR based on the current disparity map as described in Section 3.3.
    (b) Perform fast zero mean normalised correlation to obtain the correlation coefficients for each subregion and build a 3D matching cost volume for the whole image.
    (c) Use the IDP algorithm to find the disparity map as described in Section 3.1.
(4) If $k \neq 0$, propagate the disparity map to the next level in the pyramid, set $k = k - 1$ and then go back to Step 3; if $k = 0$, go to Step 5.
(5) Save or display disparity map.

4. Experimental results

In this section, we present experimental results and timings to demonstrate the high quality of the stereo reconstructions computed using our IDP and QSR algorithms. We compare our new algorithm with existing methods and with ground truth on a variety of synthetic and real images. All experiments have been performed on a 1.8 GHz Pentium IV under the Windows operating system. The algorithms have been implemented in C++ and compiled with standard optimisation flags.

A recent comprehensive survey compared the performance and reconstruction quality of all major stereo correspondence algorithms [19]. The algorithms were compared on four stereo data sets under fixed parameters, and associated ground truths are available for evaluation. Here, we present the results of IDP on the same data sets. In our fixed parameter experiment, we used the SAD metric with $3 \times 3$ comparison windows and applied the edge function described in Eq. (4) with $K_1 = 200$ and $K_2 = 1000$. Figs. 7–10(c) demonstrate our reconstructions on the Tsukuba, Sawtooth, Venus and Map data sets, respectively.

In order to compare the quality of IDP to the state of the art, we have included the reconstructions computed using the following methods considered in the survey paper: the α-expansion graph cut method of Kolmogorov and Zabih [12] (GC+occl.); the α-β swap moves algorithm proposed by Boykov et al. [10] and implemented by [19] (GC); a scanline optimisation algorithm which solves the same energy functional as the one described in this paper, except that vertical smoothness terms are ignored (SO); and a dynamic programming algorithm which considers occlusions (DP). Their reconstructions are presented in Figs. 7–10(d, f, g, and h) for qualitative comparisons.

Fig. 7. Tsukuba data set (a and b). Presented are (e) the ground truth and stereo reconstructions obtained under fixed parameters using (c) our IDP algorithm, (d) the α-expansion algorithm of [12] (GC+occl.), (f) the graph cut (GC), (g) scanline optimisation (SO), and (h) dynamic programming (DP) solution of [19].

Fig. 8. Sawtooth data set (a and b). Presented are (e) the ground truth and stereo reconstructions obtained under fixed parameters using (c) our IDP algorithm, (d) the α-expansion algorithm of [12] (GC+occl.), (f) the graph cut (GC), (g) scanline optimisation (SO), and (h) dynamic programming (DP) solution of [19].

Fig. 9. Venus data set (a and b). Presented are (e) the ground truth and stereo reconstructions obtained under fixed parameters using (c) our IDP algorithm, (d) the α-expansion algorithm of [12] (GC+occl.), (f) the graph cut (GC), (g) scanline optimisation (SO), and (h) dynamic programming (DP) solution of [19].

Fig. 10. Map data set (a and b). Presented are (e) the ground truth and stereo reconstructions obtained under fixed parameters using (c) our IDP algorithm, (d) the α-expansion algorithm of [12] (GC+occl.), (f) the graph cut (GC), (g) scanline optimisation (SO), and (h) dynamic programming (DP) solution of [19].

Table 1
A quantitative comparison of the reconstruction quality of IDP against four other algorithms presented in Figs. 7–10: the α-expansion algorithm of [12] (GC+occl.), the graph cut (GC), scanline optimisation (SO) and dynamic programming (DP) solution of [19]

            Tsukuba         Sawtooth        Venus           Map
            Fixed   Best    Fixed   Best    Fixed   Best    Fixed   Best
GC+occl.    1.27    –       0.36    –       2.79    –       1.79    –
GC          1.94    1.94    1.30    0.98    1.79    1.48    0.31    0.09
SO          5.08    4.66    4.06    3.47    9.44    8.31    1.84    1.04
DP          4.12    3.82    4.84    3.70    10.10   9.13    3.33    1.21
IDP         3.27    2.94    1.83    1.01    1.52    1.34    0.17    0.11

Quoted are the percentages of mismatching pixels in unoccluded regions, using both fixed and optimised parameters for each image.

The stereo survey also defined a set of error measures to quantitatively evaluate the overall quality of the computed correspondences, one of which is the percentage of mismatched pixels in unoccluded regions. If the difference between the computed and the true disparity at an image position is larger than a threshold, say 1, we say it is a mismatched pixel. The percentage of mismatched pixels in unoccluded regions is then the total number of mismatches divided by the total number of pixels in the unoccluded regions. Here, we compare the results of our method to these competing algorithms using the same error metric. Table 1 summarises the error percentages using a fixed set of parameters across all images, as well as using parameters optimised for each image.

Fig. 11 showcases additional stereo reconstructions obtained using IDP as the energy minimisation scheme. The stereo data in Fig. 11(a, b, c, e, and f) are real images of the Parking meter, Pentagon, Fruit, Shrubs and Trees scenes, computed using the ZNCC cost metric. Fig. 11(d) is the stereo reconstruction of additional image data obtained from the recent studies of [19]. The stereo reconstructions are obtained by applying QSR to efficiently compute the matching costs required.

To demonstrate the efficiency of our stereo matching scheme, we present the running times for stereo reconstructions which benefited from the use of QSR in Table 2. The running times for each component of our algorithm are provided in order to highlight the effectiveness and computational savings of QSR.

Table 3 lists the running times of each component of the algorithm for the stereo reconstructions computed using our IDP minimisation scheme. Table 4 compares the running time of IDP against the four algorithms compared quantitatively in Table 1. The quality and speed of the reconstructions establish IDP as a competitive energy minimisation scheme for stereo matching.

Fig. 11. Stereo reconstructions obtained using our IDP framework on the (a) Parking meter, (b) Pentagon, (c) Fruit, (d) Cone, (e) Shrubs, and (f) Trees scenes. A coarse-to-fine approach combined with QSR was used with ZNCC as the matching metric.


5. Discussion

5.1. Reconstruction quality

The stereo images of Figs. 7–10 qualitatively appear to be well reconstructed by our IDP framework. We further analyse the accuracy of our stereo reconstructions by comparing them to the ground truth of these standard data sets. To quantitatively evaluate our reconstructions, we compare our algorithm with competing methods which similarly take an energy minimisation approach and have been shown to produce good results. The percentage of mismatched pixels in unoccluded regions is used as an error metric, and the error percentages for all of the computed disparity functions presented in Figs. 7–10 for the four data sets are summarised in Table 1.
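The error metric can be written down in a few lines. The sketch below is an illustration under stated assumptions: the function name and the representation of the disparity maps and the unoccluded mask as nested lists are hypothetical, and the default threshold of 1 is the example value quoted in Section 4.

```python
def mismatch_percentage(computed, truth, unoccluded, threshold=1.0):
    """Percentage of unoccluded pixels whose disparity error exceeds threshold.

    computed and truth are disparity maps of equal size; unoccluded is a
    mask of the same size that is truthy where the pixel is visible in both
    images.
    """
    mismatched = 0
    counted = 0
    for row_c, row_t, row_m in zip(computed, truth, unoccluded):
        for d_c, d_t, visible in zip(row_c, row_t, row_m):
            if not visible:
                continue                      # occluded pixels are ignored
            counted += 1
            if abs(d_c - d_t) > threshold:
                mismatched += 1
    if counted == 0:
        raise ValueError("no unoccluded pixels to evaluate")
    return 100.0 * mismatched / counted
```

Note that an error of exactly the threshold is not counted as a mismatch, since the definition requires the difference to be larger than the threshold.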

Table 2
The computation times of each component of the algorithm for reconstructions presented in Fig. 11

Timings                            Parking meter        Pentagon             Fruit
                                   (512 × 480 × 31)     (512 × 512 × 21)     (512 × 512 × 46)
                                   No QSR    QSR        No QSR    QSR        No QSR    QSR
Matching cost evaluations (×10^6)  5.7       1.7        4.1       1.0        11.8      3.0
Matching cost time (s)             1.97      1.00       2.47      1.43       4.49      2.50
Optimisation time (s)              0.81      0.81       1.44      1.44       1.86      1.86
QSR computation time (s)           –         0.05       –         0.07       –         0.05
Total time (s)                     2.78      1.86       3.91      2.94       6.35      4.41

Timings                            Cone                 Shrubs               Trees
                                   (450 × 375 × 65)     (512 × 480 × 20)     (256 × 233 × 9)
                                   No QSR    QSR        No QSR    QSR        No QSR    QSR
Matching cost evaluations (×10^6)  8.6       2.5        4.2       1.1        0.5       0.3
Matching cost time (s)             2.55      1.50       1.62      0.89       0.18      0.16
Optimisation time (s)              2.38      2.38       1.36      1.36       0.22      0.22
QSR computation time (s)           –         0.02       –         0.04       –         0.01
Total time (s)                     4.93      3.90       2.98      2.29       0.40      0.39

Observe that QSR reduces the computational load of the matching cost computation. The combination of IDP and QSR produces a very efficient algorithm for stereo correspondence.

Table 3
The computation times of each component of the algorithm for reconstructions presented in Figs. 7–10

Timings                  Tsukuba             Sawtooth            Venus               Map
                         (384 × 288 × 10)    (434 × 383 × 21)    (434 × 380 × 21)    (284 × 216 × 31)
Matching cost time (s)   0.21                0.61                0.65                0.17
Optimisation time (s)    0.49                3.28                3.59                0.90
Total time (s)           0.70                3.89                4.24                1.07

Table 4
A comparison of the running time of IDP against the four algorithms presented in Figs. 7–10: the α-expansion algorithm of [12] (GC+occl.), the graph cut (GC), scanline optimisation (SO) and dynamic programming (DP) solution of [19]

Time (seconds)   Tsukuba   Sawtooth   Venus    Map
GC+occl.         69.8      154.4      239.9    64.0
GC               662.0     735.0      829.0    480.0
SO               1.1       2.2        2.3      1.3
DP               1.0       1.8        1.9      0.8
IDP              0.7       3.9        4.2      1.1

The running times of the competing algorithms are courtesy of [19]. Although the algorithms were run on different platforms, the comparison indicates that IDP is an efficient and fast algorithm for stereo reconstruction.

One of the benefits of IDP is that, despite utilising dynamic programming, it does not suffer from the classical problem of interscanline inconsistency, avoiding the horizontal streaking typical of algorithms which optimise one scanline at a time. This streaking effect, which can be observed in Figs. 7–10(g, h), is the main contributing factor to the high error percentages recorded for these techniques. IDP avoids the interscanline inconsistency of these methods through the use of a multi-directional smoothing energy function and minimisation scheme. From the computed reconstructions, it can be observed that IDP is free of dimensional bias. IDP retains the efficiency of dynamic programming whilst being immune to the problems of individual scanline optimisation techniques.

Although the scanline optimisation technique minimises an energy function which is similar to Eq. (1), the absence of regularising edge costs between scanlines causes interscanline inconsistencies. Iterated graph cuts were subsequently introduced and studied in order to compute high quality reconstructions through their large-move approach, which considers the binary labelling of all image pixels at each iteration. In this paper, we propose IDP as a fast alternative to iterated graph cuts. Where iterated graph cuts have the advantage of relabelling all pixels at once, IDP has the advantage of considering all available disparities at once. The two methods therefore minimise the energy function from different perspectives and have different advantages and disadvantages. The error percentages for these two approaches are summarised in Table 1. From these results, we observe that IDP is a good optimisation algorithm and that stereo reconstructions produced by IDP are competitive with those computed using graph cut techniques.
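The ability to consider all available disparities at once comes from the one-dimensional dynamic programming step. The sketch below shows only that building block, the exact minimisation of a chain energy along a single row or column. It is not the full IDP algorithm of Section 3.1, which also accounts for the neighbouring rows and columns, and the function name and the generic `penalty` argument are illustrative assumptions.

```python
def optimal_scanline_labelling(data_cost, penalty):
    """Exact minimiser of sum_i data_cost[i][d_i] + sum_i penalty(d_i, d_{i+1}).

    data_cost[i][d] is the matching cost of assigning disparity d to pixel i.
    Runs in O(n * D^2) time for n pixels and D disparities.
    """
    n = len(data_cost)
    num_labels = len(data_cost[0])
    best = list(data_cost[0])      # best[d]: minimum energy of a path ending in d
    back = []                      # back[i - 1][d]: best predecessor of d at pixel i
    for i in range(1, n):
        new_best = []
        pointers = []
        for d in range(num_labels):
            prev = min(range(num_labels),
                       key=lambda p: best[p] + penalty(p, d))
            new_best.append(best[prev] + penalty(prev, d) + data_cost[i][d])
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace the optimal path backwards from the cheapest final label.
    label = min(range(num_labels), key=lambda d: best[d])
    energy = best[label]
    labels = [label]
    for pointers in reversed(back):
        label = pointers[label]
        labels.append(label)
    labels.reverse()
    return labels, energy
```

Every disparity is a candidate at every pixel, so a single pass can move a whole row or column to any labelling, whereas one graph cut iteration relabels all pixels but only with respect to a binary choice.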

The variations observed in the error percentages for the Tsukuba, Sawtooth, Venus and Map data sets arise because IDP tends to be more suited to scenes with larger disparity ranges. The relatively high error percentage in the Tsukuba data set is due to the small disparity range in the scene. Our energy minimisation scheme computes the optimal multi-labelling of a row or column and generally performs better on scenes with large disparity ranges, since each move then covers a larger set of labels. By contrast, graph cut methods prefer images with fewer disparities, converging to the optimum on images with two disparities. This difference is evident in the results on the Sawtooth and Venus scenes, which have twice the disparity range of the Tsukuba scene, and the Map scene, which has three times the disparity range of the Tsukuba scene. IDP performs most strongly on these reconstructions, with the lowest fixed-parameter error of the compared methods on the Venus and Map scenes (Table 1). Extra examples of reconstructions on scenes with larger disparity ranges are presented in Fig. 11.

Fig. 12. The resulting effects of altering the regularisation parameters K1 and K2 of Eq. (4). (a) K1 = K2, (b) K1 = 0.2 K2, (c) K1 ≪ K2.

The IDP framework proposed does not explicitly model occlusions by including a visibility term in the energy function. Occluded pixels are resolved by the IDP optimiser. Since the matching costs in occluded regions are approximately constant and uninformative, which in our framework equates to a higher matching metric, the smoothing term will dominate the energy minimisation. Border regions similarly have uninformative matching costs since they do not have well-defined neighbours. Border pixels are therefore similarly resolved by the IDP optimiser, with the smoothness term becoming more influential in directing the energy minimisation.

Varying the regularisation parameters K1 and K2 of the edge penalty described in Eq. (4) adjusts the smoothness of the reconstruction. Fig. 12 demonstrates the different effects of altering the regularisation parameters. For example, in the case where K1 = K2 the edge function becomes the Potts energy functional [10], whereas in the case where K1 ≪ K2 the neighbourhood system is effectively limited to within ±1, which is similar to [8]. The choice of regularisation is a tradeoff between favouring depth discontinuities, which encourages sharper object edges, and favouring surface smoothness. Using Eq. (4) as the smoothness term, we have found a ratio of 0.2 between K1 and K2 to be a reasonable tradeoff. The IDP framework is robust to the choice of model parameters K1 and K2, and also parameters such as disparity ranges and window sizes. The choice of the disparity range and quantisation is dependent upon the scene, while the choice of the window size is related to the smoothness and level of detail desired in the reconstruction. In our reconstructions, we have consistently used 3 × 3 and 5 × 5 windows. When SAD is chosen as the matching metric, IDP is capable of computing the disparity function without the need for windowing, which is equivalent to the use of 1 × 1 windows.
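The two limiting cases can be made concrete with a small sketch. Eq. (4) is defined earlier in the paper and is not reproduced in this section, so the form below is an assumption: a two-level penalty that is zero for equal disparities, K1 for a change of one disparity level and K2 for any larger jump. This form is consistent with the behaviour described above, but the function name and signature are hypothetical. The default values are the fixed parameters quoted in Section 4.

```python
def edge_penalty(d_a, d_b, k1=200, k2=1000):
    """Assumed two-level smoothness penalty between neighbouring disparities.

    Returns 0 for equal disparities, k1 for a step of one disparity level
    and k2 for any larger jump. This is a guess at the shape of Eq. (4),
    not a transcription of it.
    """
    step = abs(d_a - d_b)
    if step == 0:
        return 0
    if step == 1:
        return k1
    return k2
```

With `k1 == k2` every change of disparity costs the same, which is the Potts model. With `k1` much smaller than `k2`, jumps of more than one level become so expensive that neighbouring disparities are in practice confined to within ±1 of each other.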

5.2. Running times

While methods based on dynamic programming typically require only a few seconds of computation, graph cut methods require 1–10 min [19]. The advantage of IDP is that it retains the computational benefits of a dynamic programming formulation while producing results competitive with slower graph cut approaches. Tables 2 and 3 present the running times of our fast stereo framework to demonstrate the computational efficiency of IDP. The effectiveness of adding a QSR process is also demonstrated by presenting a breakdown of the running times for each component of the algorithm.

Quadtree subregioning was introduced in Section 3.3 to minimise the amount of computation required when evaluating the matching cost volume in a coarse-to-fine approach. Note that the use of QSR does not affect the quality of the stereo reconstruction but merely speeds up the computation of the matching costs. Examples of QSR were presented in Fig. 5. Observe that regions of constant disparity tend to form large partitions while regions of highly varying disparity such as object boundaries tend to have smaller boxes.
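The partitioning behaviour described above can be reproduced with a simple cost model. The sketch below is not the QSR algorithm of Section 3.3. It assumes that evaluating a box costs its area multiplied by its local disparity range (widened by a margin), and it splits a box into four only when the children need fewer evaluations in total. The function name, the margin, the minimum box size and the restriction to square maps with power-of-two side lengths are all assumptions.

```python
def quadtree_subregions(disparity, margin=1, min_size=2):
    """Split a square, power-of-two disparity map into boxes so that the
    number of matching cost evaluations, area * (local disparity range),
    is minimised over all quadtree partitions.

    Returns (evaluations, boxes) with boxes as (x, y, size, d_min, d_max).
    """
    def solve(x, y, size):
        values = [disparity[j][i]
                  for j in range(y, y + size)
                  for i in range(x, x + size)]
        d_min, d_max = min(values) - margin, max(values) + margin
        leaf_cost = size * size * (d_max - d_min + 1)
        leaf = (leaf_cost, [(x, y, size, d_min, d_max)])
        if size <= min_size:
            return leaf
        half = size // 2
        split_cost, split_boxes = 0, []
        for oy in (0, half):
            for ox in (0, half):
                cost, boxes = solve(x + ox, y + oy, half)
                split_cost += cost
                split_boxes += boxes
        # Keep the four children only if they need fewer evaluations.
        return (split_cost, split_boxes) if split_cost < leaf_cost else leaf

    return solve(0, 0, len(disparity))
```

Under this model a region of constant disparity is never split, because its children would need exactly as many evaluations as the parent, while a box straddling a depth discontinuity is split because each child has a much narrower local range.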

Table 2 lists the total number of metric evaluations computed in a coarse-to-fine matching with and without the use of QSR for the stereo reconstructions in Fig. 11. For these examples, QSR reduces the number of metric evaluations by a factor of approximately three to four in all but the Trees scene, and approximately halves the time required to compute the matching costs. The computational savings of QSR clearly outweigh the small overhead of the QSR algorithm. The relatively smaller gain in the case of the Trees scene is due to its small disparity range, which results in only a small amount of redundant computation even if no subregioning is applied.
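The reduction factors can be checked directly from Table 2. The short script below copies the table values and computes the ratio without QSR to with QSR for each scene; the variable and function names are illustrative only.

```python
# Values copied from Table 2 as (without QSR, with QSR) for each scene.
EVALUATIONS_MILLIONS = {
    "Parking meter": (5.7, 1.7),
    "Pentagon": (4.1, 1.0),
    "Fruit": (11.8, 3.0),
    "Cone": (8.6, 2.5),
    "Shrubs": (4.2, 1.1),
    "Trees": (0.5, 0.3),
}
MATCHING_COST_SECONDS = {
    "Parking meter": (1.97, 1.00),
    "Pentagon": (2.47, 1.43),
    "Fruit": (4.49, 2.50),
    "Cone": (2.55, 1.50),
    "Shrubs": (1.62, 0.89),
    "Trees": (0.18, 0.16),
}


def reduction_factors(table):
    """Return the ratio (without QSR) / (with QSR) for every scene."""
    return {scene: without / with_qsr
            for scene, (without, with_qsr) in table.items()}


if __name__ == "__main__":
    evaluations = reduction_factors(EVALUATIONS_MILLIONS)
    seconds = reduction_factors(MATCHING_COST_SECONDS)
    for scene in EVALUATIONS_MILLIONS:
        print(f"{scene:14s} evaluations / {evaluations[scene]:.1f}, "
              f"matching cost time / {seconds[scene]:.1f}")
```

The time ratio is smaller than the evaluation ratio in every scene, so the cost of computing the matching costs does not scale purely with the number of evaluations.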

Table 3 lists the timings for the reconstructions of Figs. 7–10, which are computed using SAD. In Table 4, we compare the running time of IDP for the reconstructions of Figs. 7–10 against the four algorithms we used to compare the reconstruction quality of IDP in Table 1. The timings for the algorithms we are comparing to are obtained from [19]. The dramatic differences in the running times of the α-expansion algorithm of [12] (GC+occl.) and the graph cut (GC) implementation of [19] are due to the different implementations and specialisations of the graph cut methods. These results were obtained on different platforms; however, Table 4 indicates that IDP retains the computational advantage of algorithms that optimise one scanline at a time. While remaining competitive with graph cut techniques in terms of reconstruction quality, IDP is able to minimise a discontinuity-preserving energy function in seconds rather than minutes. From the quality and speed of our reconstructions, we have demonstrated that IDP is a competitive and efficient energy minimisation scheme for fast stereo reconstruction.

6. Conclusion

We have presented a new algorithm, iterated dynamic programming, for fast stereo reconstruction. Taking advantage of dynamic programming’s ability to compute the optimal multi-labelling of a one-dimensional energy function, we proposed an iterated dynamic programming scheme that can minimise a discontinuity-preserving energy function. We have also presented an algorithm for the fast computation of the cost volume by computing an optimal quadtree subregioning of the disparity image.

Results have been presented and compared to existing stereo energy minimisation algorithms. These results demonstrate that iterated dynamic programming is strongly competitive in terms of quality with graph cut techniques while maintaining the computational advantages of dynamic programming techniques. Combined with quadtree subregioning, iterated dynamic programming computes high quality reconstructions in seconds rather than minutes. The quality and speed of our reconstructions establish iterated dynamic programming as a competitive energy minimisation scheme for stereo matching.

Acknowledgements

We thank Dr. Daniel Scharstein and Dr. Richard Szeliski for the stereo data sets, reconstructions and ground truths. We thank Dr. Michael Buckley of CSIRO Mathematical and Information Sciences, Australia for his comments on this paper. Carlos Leung and Ben Appleton would like to thank Professor Brian C. Lovell for his supervision.

References

[1] D. Geiger, B. Ladendorf, A.L. Yuille, Occlusions and binocular stereo, International Journal of Computer Vision 14 (3) (1995) 211–226.

[2] G.L. Gimel’farb, V.M. Krot, M.V. Grigorenko, Experiments with symmetrized intensity-based dynamic programming algorithms for reconstructing digital terrain model, International Journal of Imaging Systems and Technology 4 (1) (1992) 7–21.

[3] S.A. Lloyd, A dynamic programming algorithm for binocular stereo vision, GEC Journal of Research 3 (1) (1985) 18–24.

[4] I.J. Cox, S. Hingorani, S. Rao, B. Maggs, A maximum likelihood stereo algorithm, Computer Vision and Image Understanding 63 (3) (1996) 542–567.

[5] P.N. Belhumeur, A Bayesian approach to binocular stereopsis, International Journal of Computer Vision 19 (3) (1996) 237–262.

[6] Y. Ohta, T. Kanade, Stereo by intra- and inter-scanline search using dynamic programming, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-7 (2) (1985) 139–154.

[7] S. Geman, D. Geman, Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (6) (1984) 721–741.

[8] C. Sun, Fast stereo matching using rectangular subregioning and 3D maximum-surface techniques, International Journal of Computer Vision 47 (1/2/3) (2002) 99–117.

[9] S. Birchfield, C. Tomasi, Multiway cut for stereo and motion with slanted surfaces, in: Proceedings of International Conference on Computer Vision, 1999, pp. 489–495.

[10] Y. Boykov, O. Veksler, R. Zabih, Fast approximate energy minimization via graph cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (11) (2001) 1222–1239.

[11] H. Ishikawa, D. Geiger, Occlusions, discontinuities, and epipolar lines in stereo, in: Proceedings of European Conference on Computer Vision, Freiburg, Germany, 1998, pp. 232–248.

[12] V. Kolmogorov, R. Zabih, Multi-camera scene reconstruction via graph cuts, in: Proceedings of European Conference on Computer Vision, Vol. 3, London, UK, 2002, pp. 82–96.

[13] S. Roy, I.J. Cox, A maximum-flow formulation of the N-camera stereo correspondence problem, in: Proceedings of International Conference on Computer Vision, IEEE, Bombay, India, 1998, pp. 492–499.

[14] O. Veksler, Efficient graph-based energy minimization methods in computer vision, Ph.D. thesis, Cornell University (1999).

[15] O. Faugeras, B. Hotz, H. Mathieu, T. Vieville, Z. Zhang, P. Fua, E. Theron, L. Moll, G. Berry, J. Vuillemin, P. Bertin, C. Proy, Real time correlation-based stereo: algorithm, implementations and applications, Tech. Rep. RR-2013, INRIA (1993).

[16] C. Sun, A fast stereo matching method, in: Digital Image Computing: Techniques and Applications, Massey University, Auckland, New Zealand, 1997, pp. 95–100.

[17] C. Leung, B. Appleton, C. Sun, Fast stereo matching by iterated dynamic programming and quadtree subregioning, in: A. Hoppe, S. Barman, T. Ellis (Eds.), British Machine Vision Conference, vol. 1, Kingston University, London, 2004, pp. 97–106.

[18] M.Z. Brown, D. Burschka, G.D. Hager, Advances in computational stereo, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (8) (2003) 993–1008.

[19] D. Scharstein, R. Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, International Journal of Computer Vision 47 (1/2/3) (2002) 7–42.

[20] C. Sun, S. Pallottino, Circular shortest path in images, Pattern Recognition 36 (3) (2003) 711–721.

[21] C. Leung, Efficient methods for 3D reconstruction from multiple images, Ph.D. thesis, University of Queensland (2006). Available from: <http://www.itee.uq.edu.au/~iris/ComputerVision/Leung/index.html> (October 2005).

[22] C. Leung, B. Appleton, C. Sun, Iterated dynamic programming and quadtree subregioning for fast stereo matching. Available from: <http://extra.cmis.csiro.au/IA/changs/idp> (February 2005).
