CENTER FOR MACHINE PERCEPTION Representation of …cmp.felk.cvut.cz/ftp/articles/martinec/Tylecek-TR-2008-02.pdf · Representation of Geometric Objects for 3D Photography ... Representation

CENTER FOR

MACHINE PERCEPTION

CZECH TECHNICAL

UNIVERSITY IN PRAGUE

MAST

ER

TH

ESIS

Representation of GeometricObjects for 3D Photography

Radim Tylecek

[email protected]

CTU–CMP–2008–02

January 18, 2008

Available atftp://cmp.felk.cvut.cz/pub/cmp/articles/martinec/Tylecek-TR-2008-02.pdf

Thesis Advisor: Daniel Martinec

This work has been supported by the Czech Academy of Sciencesunder Project 1ET101210406.

Research Reports of CMP, Czech Technical University in Prague, No. 2, 2008

Published by

Center for Machine Perception, Department of CyberneticsFaculty of Electrical Engineering, Czech Technical University

Technicka 2, 166 27 Prague 6, Czech Republicfax +420 2 2435 7385, phone +420 2 2435 7637, www: http://cmp.felk.cvut.cz

Abstract

This work describes a method of 3D model reconstruction from images that takes provideddisparity maps as an input and fuses them into a consistent 3D model. Camera parametersare re-estimated in the procedure as well. Occlusion boundary artifacts are reduced andholes in disparity maps interpolated over.The result of stereo matching is a disparity map created from correspondences betweenimages. Every map holds information only from the overlapping regions of source imagepair. If all maps from the view of one camera are compared, it shows that for some pointsin the image the values from different pairs are not consistent. This is caused by noisein the images, inaccurately estimated parameters of the cameras and errors in the stereoalgorithm.The goal of this work is to make use of the information fusion and find the optimal depthmaps. These depth maps are chosen as the representation of surface for the output 3Dphotography. On real-world scenes is shown that this representation outperforms thecurrently used “fish-scales” in continuous and more detailed surface.

Anotace

Tato prace popisuje metodu rekonstrukce 3D modelu z obrazu, ktera obdrzı na vstupudisparitnı mapy a spojı je do konzistentnıho 3D modelu. Parametry kamer jsou behem to-hoto postupu tez zpresnovany. Artefakty na hranicıch prekryvu jsou castecne odstranenya dıry v disparitnıch mapach interpolovany.Vysledkem stereo parovanı je disparitnı mapa zıskana z korespondencı mezi obrazy. Kazdamapa obsahuje informace pouze z prekryvajıcıch se oblastı zdrojovych obrazu paru. Pokudse vsechny mapy z pohledu jedne kamery porovnajı, ukaze se, ze v nekterych obrazovychbodech nejsou hloubky prepoctene z ruznych paru konzistentnı. To je zpusobeno sumemv obrazech, nepresne odhadnutymi parametry kamer a chybami ve stereo algoritmu.Cılem teto prace je vyuzıt fuze dat k nalezenı konzistentnıch hloubkovych map. Tytohloubkove mapy predstavujı reprezentaci povrchu pro vyslednou 3D fotografii. Na realnychscenach je ukazano, ze tato reprezentace prekonava soucasne pouzıvane “supiny” spojityma vıce detailnım povrchem.

v

vi

Prohlasenı

Prohlasuji, ze jsem svou diplomovou praci vypracoval samostatne a pouzil jsempouze podklady (literaturu, projekty, SW atd.) uvedene v prilozenem seznamu.

V Praze dne 18. ledna 2008

vii

viii

Acknowledgements

I would like to thank Daniel Martinec for leadership of this diploma thesis.Additionally I would like to thank Radim Sara for valuable advice.Finally I thank my family for support during my studies.

This work has been supported by the Czech Academy of Sciences under projectNo. 1ET101210406.

ix

x

Representation of Geometric Objectsfor 3D Photography

Radim Tylecek

January 18, 2008

ii

Contents

1 Introduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 State of the art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2.1 Computer Graphics methods . . . . . . . . . . . . . . . . . . . . . . 2

1.2.2 Computer Vision methods . . . . . . . . . . . . . . . . . . . . . . . 4

1.3 Our goal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.4 Reconstruction pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.5 Fish-scales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Proposed solution 11

2.1 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.1.1 Projection geometry . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.1.2 Input data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.1.3 Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.1.4 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.2 Estimation of depths Λ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.2.1 Projection constraints . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.2.2 Surface model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.2.3 Energy minimisation . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.2.4 System of equations for the depth task . . . . . . . . . . . . . . . . 23

2.3 Estimation of visibility V . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.3.1 Discontinuity detection . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.3.2 Handling three labels . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.3.3 The Max-Flow algorithm . . . . . . . . . . . . . . . . . . . . . . . . 28

2.4 Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3 Implementation 31

iii

3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.2 Camera selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.3 Initialisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.3.1 Initial data projection . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.3.2 Initial visibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.3.3 Initial depths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.3.4 Image contrast calculation . . . . . . . . . . . . . . . . . . . . . . . 34

3.4 Discontinuity estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.4.1 Depth interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.4.2 Local depth error calculation . . . . . . . . . . . . . . . . . . . . . . 38

3.4.3 Discontinuity detection . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.5 Depth estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3.5.1 Projection of 3D points to images . . . . . . . . . . . . . . . . . . . 39

3.5.2 Equations from projective constraints . . . . . . . . . . . . . . . . . 39

3.5.3 Equations for the depth model . . . . . . . . . . . . . . . . . . . . . 40

3.5.4 Solving system of equations . . . . . . . . . . . . . . . . . . . . . . 41

3.5.5 Update of camera centres . . . . . . . . . . . . . . . . . . . . . . . 44

3.5.6 Repairing of correspondences . . . . . . . . . . . . . . . . . . . . . 44

3.6 Visibility estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.6.1 Image error calculation . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.6.2 Local depth error calculation . . . . . . . . . . . . . . . . . . . . . . 46

3.6.3 Invisible colour histogram calculation . . . . . . . . . . . . . . . . . 48

3.6.4 Running max-flow and data hiding . . . . . . . . . . . . . . . . . . 49

3.7 Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.8 Visualisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.8.1 Redundancy removal . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.8.2 Back-projection of depth maps to 3D points . . . . . . . . . . . . . 51

3.8.3 Triangulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

3.8.4 Texturing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

3.8.5 Export to file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

3.9 Computational cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3.9.1 Computational complexity . . . . . . . . . . . . . . . . . . . . . . . 53

3.9.2 Memory requirements . . . . . . . . . . . . . . . . . . . . . . . . . 53

iv

4 Experiments 55

4.1 Used hardware and software . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.2 Scenes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

Scene 1: Daliborka . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

Scene 2: Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

Scene 3: St. John of Nepomuk . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

Scene 4: St. Martin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5 Conclusion 73

5.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5.2 Open problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

6 Appendix 81

6.1 Contents of attached data disk . . . . . . . . . . . . . . . . . . . . . . . . . 81

6.1.1 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

6.1.2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

6.1.3 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

v

vi

Chapter 1

Introduction

1.1 Motivation

A 3D photography can be defined as a visualised 3D model of an object. In contraryto classical photography, it requires special hardware, such as computer or glasses, to beviewed. The realisation of 3D photography we will use in this work is a data file viewedon a computer screen with use of rendering software. It is interactive as the spectatorcan change the point of view in the object space, or the view changes automatically in apre-defined fly-through virtual tour.

The 3D model can be acquired by means of 3D reconstruction, what is one of the classicalcomputer vision problems [23]. Its aim is to automatically create a 3D model from a setof uncalibrated images.For example, a tourist can take a number of photos of a castle and later at home he canreconstruct its 3D model which can be viewed from various angles. The same thing can doa salesman to present a product on his website in an attractive and visual way. Anotheruse is in the visualisation of scenes in architecture, movies, computer games and manyother applications of computer graphics.

The 3D photography acquisition works on the principles similar to human vision trans-formed into methods of computer vision. The first step is to obtain sparse correspondencesbetween the input images, with which camera auto-calibration is performed. In the nextstep 3D points are reconstructed from computational stereo vision. Finally the 3D surfaceis reconstructed by fusion of these points.

Before a more detailed description of this procedure will be given, the related work in thelast decades challenging the problem of 3D surface representation is described in the nextsection.

1

2

1.2 State of the art

The problem of representation of geometric objects in 3D is traditionally studied in thetwo fields: computer graphics (CG) and computer vision (CV). The difference in approachcomes from primary goals, the treatment and sources of the input data.

In CG, the source of the data is usually from range scanning, typically with use of laserrange finder (LRF). The data acquired using this method are well sampled and moderatelyaccurate thanks to the calibration of devices. For these reasons, the data are more treatedas correct, possibly with some noise but without outliers sticking out, and the methodsused focus on the representation itself and qualities of the surface.

On the other side, in CV, the input data are treated more as measurements of underlingtrue surfaces. This point of view is necessary as the input data also come as the result ofstereo algorithms, another discipline of CV. Such data have a lower level of certainty ascorrespondence search, calibration, light conditions and more factors have effect on them.Generally, CV more takes into account that the data are not perfect and often involvealso the process of data acquisition.

However, it is difficult to draw a sharp line between CG and CV as the fields overlap inthe studied problem.

1.2.1 Computer Graphics methods

The problem of 3D geometric model reconstruction has long been restricted to the problemof surface reconstruction from measured 3D data. Following methods were proposed inCG in the 90’s:

Edelsbrunner [16] introduced the concept of weighted alpha shapes, polytopes determinedby input points with weights and parameter α to control the level of detail. They canbe computed from existing triangulation of the points. Hoppe [25] proposed surfacereconstruction with all-round range data provided. After initial estimation of surface by adense mesh and its optimisation, he performs piecewise smoothing on surfaces subdividedby different discontinuities (creases, corners and darts).

Delingette [13] used simplex meshes as a representation of deformable models with alter-able topology. He is iteratively fitting the mesh from manually assigned initial shape onthe input data, minimising a given functional. Similarly, Poli [34] described a methodbased on model of close elastic thin surface, in analogy to thin-plate model. Also Lin [30]adapted progressive shell model as a 3D extension of progressive contour model, with useof Finite Element Method (FEM). The limitation of these methods is in the assumptionof closed surface being modelled.

A modern (2001) representation for variational reconstruction was developed by Zhao [46].Level sets can handle complicated topology and deformations as well as noisy and non-uniform data. They are based on a continuous formulation using Partial DifferentialEquations (PDE) to deform an implicit surface. The surface ΓΓΓ is represented as the zeroisocontour of a scalar (level set) function φ(x):

ΓΓΓ = {x : φ(x) = 0} . (1.1)

1.2. STATE OF THE ART 3

Figure 1.1: Result of reconstruction of statue of David by [29]

In practice they prefer signed distance function in numerical computations. After a fastinitial reconstruction, they use the convection model and the gradient flow to finish finalreconstruction.

Another approach to work with surface in CG is smoothing. Taubin [41] analysed thesignal processing approach where smoothing corresponds to low-pass filtering. He studiedFinite Impulse Response (FIR) filters with different windows and pass-band frequencies.Later Schneider [37] presented geometric fairing, an algorithm for smoothing arbitrarytriangle meshes, based on high-order PDE. An irregular triangle mesh is used for the rep-resentation of the otherwise independent solution, with the flexibility of topology changesnot affecting the geometric shape.

Some of the recent big projects of reconstruction can be shown to illustrate the achieve-ments of CG methods. Levoy et al. [29] performed reconstruction of a set of Michelangelo’sstatues. For example, the statue of David was scanned with LRF, producing 8 billion ofraw polygons. After manual alignment this was merged with a dense volumetric grid into2 billion polygons. The rendered result is shown if Figure 1.1. A similar technique wasused in The Great Buddha of Kamakura project by Ikeuchi et al. [26]. The scanningpipeline is shown in Figure 1.2.

4

Figure 1.2: Process of reconstruction of statue of The Great Buddha of Kamakura,adapted from [26]

1.2.2 Computer Vision methods

The efforts to reconstruct surface with methods of computer vision begun with formulationof variational principles for surface reconstruction in the 80’s. Terzopoulos [42] designedalgorithm for multi-level reconstruction with relaxation between fine and coarse grids.Using FEM, he transforms the continuous variational principle of a thin flexible plateinto discrete problem in the form of a large system of linear equations. Computationalmolecules are used here to address the conditions near borders of surface. The detailedmathematical model is based on the surface function in explicit form of

z = f(x, y). (1.2)

In the similar framework Grimson [20] interpolates surfaces with quadratic variation asthe optimal functional. In his case the problem is still well-defined, as a perfect calibra-tion and data on input are assumed. Grimson also opened the problem of detection ofdiscontinuities, which was extended by Marroquin [31]. He used non-convex variationalprinciples to reconstruct piece-wise smooth surfaces. These problems have been improvedby Szeliski [39] especially in the mean of speed with use of conjugate gradient descent andhierarchical set of basis functions as an alternative to multigrid relaxation. Summary ofthese and related methods can be found in [6].

However, the limitations of (1.2) and intentions to model more complex objects resulted


in the need for further research in the 90’s. An original approach to the problem wasdeveloped by Guy and Medioni [21]. They collect votes from input data in a large 3Darray of n×n×n voxels covering the model space to construct 3-D saliency maps based onspherical extension field. Their framework allows them to infer surfaces and detect discon-tinuities at the same time. Unfortunately, both computational and memory complexityis high (O(n4)).

Szeliski [40] presented a dynamic model of elastic surfaces based on interacting particlesystems, which can be used to split, join or extend the free-form surface. The particles canbe rendered as axes, discs, wire-frame or flat-shaded triangulations. Fua [19] proposed anapproach based on fitting local surfaces. The local surfaces are iteratively grouped intomore global while points are smoothed and outliers removed. Typical problems of dataacquired from stereo algorithms are taken into account and the majority of them is alsoaddressed.

In his following work, Fua [18] focused on object-centered representation with use of tri-angulated 3D mesh. The novel combination of multi-image stereo and shading broughthim to reconstruction of both the shape and reflectance properties of surfaces. The objec-tive function is then a weighted sum of stereo data, shading and smoothing components.The weights vary over the surface so each component has its greatest influence where itsaccuracy is likely to be greatest, such as stereo data from highly textured surface.

To show that range data were also in the interest of CV, Hilton [24] can be given as anexample. He presented an algorithm based on continuous implicit surface representationto integrate multiple range images.

Range scanning is possible only for objects up to a certain size and complexity, and theCG projects described in the previous section are reaching these limits. With the use ofstereo algorithms and adapted reconstruction algorithms, CV methods can be used forbigger objects, such as buildings or whole cities. Zach et al. [45] reconstructs the innercity of Graz. The global model comes from geographic information system (GIS) dataused for the blocks of houses. The facades of the houses are modelled from terrestrialphotographs with 30 cm geometric detail and 2 cm texture resolution. The size of thedata set (320 GB) requires on-line database system and level-of-detail based visualisation.Another project of Urban 3D Reconstruction From Video [33] is using GPS navigationand INS measurements to align the models reconstructed from moving car.

Finally, a review of two recent articles [28, 38] dealing closely with our specification of theproblem is given here. The difference in approach comes from the orientation on videosequence by Koch [28] or unordered set of images by Strecha [38]. The latter is closer toour project, but still it works with a small image set while in this work large sets of inputimages are dealt with.

Koch [28] fuses stereoscopic disparity maps using strong constraints derived from the factthe input camera sequence order is known and baselines are small. Errors are probabilis-tically modelled.

Our situation is different, because we have unordered images with potentially wide base-lines, so methods from [28] (uncertainty threshold, Kalman filter) cannot be used. Theresulting modelled surface is approximated by splines for interpolation of gaps and smooth-ing. However, this leads to loss of accuracy.

6

Figure 1.3: Four input images for City Hall scene [38].

The approach in [38] is Bayesian and can be described as maximising the posterior prob-ability

(Λj∗, Ij∗) = arg maxΛj ,Ij

P (Λj, Ij | I) (1.3)

where Λj is depth map in camera j, Ij is the estimate of true image in camera j andI = {I1, . . . , In} are input images. EM algorithm [14, 15] is used to compute the estimate,introducing visibility maps V as hidden variables (occlusion v 0 / visible v 1). It iteratesbetween

1. estimation of values for V , given the current estimate of Λj, Ij and

2. estimation of Λj, Ij given V .

The implementation incorporates data-driven ‘regularizer’ for adaptive smoothing andhistogram of colours for visibility estimation. Their algorithm is very universal, it con-structs both depths and true image directly from input images. Also the view constructedcan be a new camera that is not present in the input set.

An example of results of this algorithm applied on input images from Figure 1.3 is givenin Figures 1.4.


Figure 1.4: Textured reconstructed surface of City Hall in [38] (upper) and its detail(lower).

8

1.3 Our goal

One of the recent methods [38] considered the 3D model reconstruction problem in almostfull complexity. Still, these methods rely on accurate camera autocalibration and thecalibration parameters (camera pose, focal length, etc.) are considered given and fixedin the reconstruction problem. This is where our method described in this thesis differs.Although we assume fairly good initial guess of the parameters, we will relax some ofthem in the reconstruction procedure.

1.4 Reconstruction pipeline

There is a running project of 3D reconstruction [2], in which an automatic pipeline isbeing developed. Many state-of-the-art algorithms are used here, it can be given as anexample of typical modern reconstruction pipeline. A brief description of the current stateof the project follows:

1. The input is a number of unorganised photographs of the scene to be reconstructed.There are only few assumptions on the input: they were taken by perspective cam-eras with radial distortion and approximate focus lengths are known.

2. The first step in the pipeline is the region matching which is later used for esti-mation of a consistent system of cameras [32, 10, 11, 9]. This includes cameraauto-calibration as well as the radial distortion correction.

3. Image pair rectification must be carried out to facilitate the dense matching [22].

4. The dense matching [8] is performed as growth from seed correspondences using arobust algorithm.

5. The output, one disparity map per image pair admitted for dense matching, isupgraded to sub-pixel resolution [35].

6. Corresponding 3D points are reconstructed from the disparity maps, forming a densepoint cloud.

7. Fish-scales in the form of local covariance ellipsoids [36] are used to represent thisdistribution.

8. The final step is the mapping of textures from the input images on the fish-scales.

The result is however suffering from errors caused by some of the steps in the pipeline,like noise in the input images and inaccurately estimated parameters of the cameras. Theerrors show as noise or discontinuity in the reconstructed surfaces. A human can identifythese errors because he has a-priori knowledge of how should the surface look like fromhis real-life experience with the modelled object.

In this work we propose a solution which diminishes such errors.

1.5. FISH-SCALES 9

1.5 Fish-scales

The current implementation ends with fish-scales [36], which serve as a 3D sketch of thesurface. Fish-scales are local covariance ellipsoids that are fit to the points. They canbe visualised as small round discs. A collection of fish-scales approximates the spatialdensity function of the measurement in 3D space. The most important parameter of thefitting is the fish-scale size. A too small value results in a noisy and sparse model anda too large value does not model fine structures well [36]. The appropriate size is foundautomatically based on the median point density [12]. Two sizes are used, a larger onefor modelling the overall structure, and a smaller one for details. The two resulting setsof fish-scales are fused together by removing redundant large-size fish-scales from theirunion [12].

Fish scales are a very effective representation, the size of the data can be reduced 200×.However, they lack the visual experience of a continuous surface, as can be observed inFigure 1.5. There are holes between the discs even in the surfaces with enough support ofcorrespondences. Straight edges on the objects are wavy or saw-like as they are formed bythe discs. Also note that because of their planar nature, they fit better to represent planarsurface rather than non-planar. When a non-planar surface is observed from detail, thediscrete nature of fish-scales becomes apparent.

The drawback of such representation is that it is not a continuous surface in the form of atriangulated mesh. Transformation into the mesh using some of the known methods like [3]is not possible because the individual neighbouring fish-scales are not very compatibledue to effect of inaccurate calibration. It is not so easy to decide which fish-scales areneighbouring with each other, even with the method proposed for this purpose [36].

After study of available methods (see Section 1.2) we have come to the conclusion thata suitable formulation of the problem is a variant of procedure proposed by Strecha [38].The main modification lies in more effective data structures and in the need for furthercalibration of the cameras.

Our proposed solution improves the visual experience of fish-scales by

• higher apparent detail,

• covering the holes,

• preserving edges,

• smoothing out noise,

as can be seen later seen in Figure 4.3.

10

Figure 1.5: Fish-scale 3D sketch of a section of a modern replica of Langweil Model [4]

Chapter 2

Proposed solution

In this chapter we will choose an appropriate representation and describe the reconstruc-tion algorithm.

2.1 Problem statement

In this section will be introduced the concepts, which will be used and explained theirmeaning in the task of reconstruction. Also notation will be given.

Our task is to transform the reconstructed point cloud into an object representation for3D photography. A point cloud is a set of points in space specified by their coordinates(x, y, z), see Figure 2.1. Objects are entities visible in the captured scene.

Figure 2.1: Input point cloud and cameras

11

12 CHAPTER 2. PROPOSED SOLUTION

2.1.1 Projection geometry

Some principles from theory of 3D geometry [23] will be presented in this section.

Forward projection of world point X into image point x with projective camera P isexpressed by

λx = PX (2.1)

where λ 6= 0 is the depth at point x, that means distance on the ray between cameracentre C and point X. It is scaled so that distance between camera centre and imageplane is always 1, as can be seen in Figure 2.2.

This expression is possible when the points x,X are in homogeneous coordinates, wherea new scaling coordinate is added, so that

x =[x1 x2 1

]>=[λx1 λx2 λ

]>(2.2)

are equal expressions of the same 2D point (x1, x2), similarly

X =[X1 X2 X3 1

]>=[λX1 λX2 λX3 λ

]>(2.3)

are equal expressions of the same 3D point (X1, X2, X3) . The equation (2.1) can be thenwritten as

λ[x1 x2 1

]>= P

[X1 X2 X3 1

]>. (2.4)

Inversely, φ(x, λ,C) will be back-projecting function, assigning a point in space X to thedepth λ, image point x in the projective camera with centre C:

X = φ(x, λ,C) =

[C + λR>K−1x

1

], (2.5)

where R is rotation of the camera and K are camera parameters.

........................................................................................................................................................................................................................................................................................................................................

................................................................

................................................................

................................................................

................................................................

................................................................

........

..........................................................................................................................................................................................................................................................

..........................

sssssssssssssssssssss

sssssssssssssss

qqqqqqqqqqqqqqqqqqqqqqqqqqqqq

........

........

........................................................................................................................................................................................................................................

...................................................

......................................................................................................................................................................................................................................................

...............................................................................................................................................................................

.................................................................................................................................................................................................

pppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp



pppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp

x1

x

x2

image plane

λ

λ = 1

Xworld space

optical ray

C

Figure 2.2: Projection geometry

2.1. PROBLEM STATEMENT 13

2.1.2 Input data

Input data are represented by a set of colour images I. Images will be indexed with indexi = 1, . . . , c, where c is the number of images (cameras). Pixels in the images will beindexed with index p = 1, . . . , ni, where ni is the number of pixels in the image i. Forreasons of simplification, the pixel index p is unique only within a given image i. Thepixel index p expresses the linear index of pixel as if all image pixels are arranged intoone long vector by rows.

With this indices, vector Iip ∈ R3 is the RGB value of pixel p in image i, and the inputimage set can be defined as

I ={Iip | i = 1, . . . , c; p = 1, . . . , ni

}. (2.6)

Let xip = (x1, x2) ∈ Z2 be the 2D coordinates of pixel p in image i, obtained by inverse ofthe linear index described above.

Additionally, set of computed disparity maps M will be described as

M = {Mij | i, j = 1, . . . , c} , (2.7)

where Mij is disparity map computed from image pair ij.

Set of all corresponding points X (measurement) coming from disparity maps M will beparametrised as

X ={X ip | i = 1, . . . , c; p = 1, . . . , ni

}, (2.8)

where X ip is a set of correspondences of pixel p in image i from all disparity maps Mij

X ip =

{Xijpq | j ∈ {1, . . . , c}; q ∈ {1, . . . , nj}; (j, q) ∈ χip

}, (2.9)

where Xijpq ∈ R3 is a a point in space computed from correspondence between pixel p in

camera i and pixel q in camera j from disparity map Mij. χip defines the set of indices (j, q)

of all correspondences collected from 3D neighbourhood N3(xip,Cip) of pixel p in image

i. Such corresponding points X ∈ N3(xip,Ci) lie in the vicinity of image ray defined by

image point xip and camera centre Ci, as shown in Figure 2.3. This space has the shapeof pyramid.

X ip is an empty set when there is no correspondence. The correspondence set X will

modified during correspondence repair with two operations: (i) update of values and (ii)split of a correspondence into two.

Parameters of previously calibrated cameras P are in the form of camera matrices Pi ∈R3×4 with decomposition to

Pi = KiRi[

I | −Ci], (2.10)

where I is 3 × 3 unity matrix, the notation[· | ·

]composes a new matrix by adding a

vector as a new column on the right,

• Ki ∈ R3×3 are internal parameters of the camera i,

• Ri ∈ R3×3 is rotation of the camera i and

• Ci ∈ R3 is position of the camera centre i.


.................................................................................

..........................................................

..............................................

.......................................................................................................

..........................................................................................................................................

.....................................................................................................................................................

.....................................................................................................................................................

............................................................................................................................................................................

.......................................................................................................................................................................................

............................................................................................................................................................................

.................................................................................................................................................................

...................................

................................................................................................................................................................

.......................................................................................................................................................................................

..........................................................................................................................................................................................................................................................................

................................................................

.........................................

..........................

..........................................................................................................................................................................................................................................................

qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq




qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqs

........................................................................................................................................................................................................................................................................................................................................

................................................................

................................................................

................................................................

................................................................

................................................................

........

sssssssssssssssssssssssssssssssssss s

qqqqqqqqqqqqqqqqqqqqqqqqqqqqq

....

....

....

....

....

....

....

....

....

....

....

....

....

....

....

....

....

. .. .

. .. .

. .. .

. .. .

. .. .

. .. .

. .. .

. .. .

. .. .

. .. .

. .. .

. .. .

. .. .

. .. .

. .. .

............................................................

....................................................

ssssssssssssssssssssssssssssssss

s...................................

s

camera center

x1

x

x2

Ci image plane

X

3D neighbourhood

pixel area

Figure 2.3: 3D neighbourhood. Black bold line: image ray. Blue: pixel area in imagegrid. Red lines: 3D neighbourhood Red dots: matching correspondences

More specifically, C = {Ci | i = 1, . . . , c} will be the set of all camera centres, which willbe also modified during the process.

From the set of all cameras T = {i; i = 1, . . . , c} we will pick a subset of cameras whichwill be marked as main, Tm ⊆ T , while the others will be supporting, Ts ⊂ T, Ts = T \Tm.As a result, disparity maps (if any) between the supporting cameras are not used for thefusion in order to save computational resources.

2.1.3 Output

The output of the algorithm will be the set of all reconstructed depth maps Λ and set ofall visibility maps V . The set of depth maps Λ will be defined as

Λ ={λip | i = 1, . . . , c; p = 1, . . . , ni

}, (2.11)

where λip ∈ R is a reconstructed depth in pixel p in image i.

The set of visibility maps V will be defined as

V ={vip | i = 1, . . . , c; p = 1, . . . , ni

}, (2.12)

where vip ∈ {0, 1, 2} is visibility of pixel p in image i. Visibility defines if a given pixel andits estimated depth will be used for reconstruction of the surface or not, and also if thereis data support from projected correspondences at this point. The respective values of vipare assigned according to:

• 0 . . . pixel is not visible, will be not reconstructed here,

• 1 . . . pixel is visible, but without data support, surface will be interpolated here,

2.1. PROBLEM STATEMENT 15

........

..............................................

...................................

............................................................................................................................................................................................................................................................................................................................

........

..............................................

...................................

....................................................................................................................................................................................................................................................................................................................................

..............................................

...................................

............................................................................................................................................................................................................................................................................................................................

rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr

rrrrrrrrrrrrrr rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr

rrrrrrrrrrrrrr....................................................................................................................................................................................................................................................................... ..........................

..........................

.....................................................................................................................................................................................................................................................................................................................................

..................................................................................................................................................................................................................................................................................................................................

..........................

.......................................................................................................................................................................................................................................................................................................

....................................................................................................................................................................................................

....................................................................................................................................................................................... ..........................

pppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp VΛ

I

image differencecontrast

visibility

shiftcorrespondence

reprojectionXC

projection

update

depth differenceresidual

Figure 2.4: Information flow in the system

• 2 . . . pixel is visible with data support (X ip must be an non-empty set), surface will

be reconstructed here from data.

2.1.4 Solution

The proposed representation are depth maps Λ together with visibility maps V . A depthmap in camera i can be defined as X = λi(x) for selected set of visible image pointsx ∈

{xip | vip ≥ 1; p = 1, . . . , n

}. Point X is then given by optical ray and distance (depth)

λip from centre of the camera Ci. The final model is the union of surfaces constructedfrom depth maps of all used cameras.

We will use the information from disparity maps M to create a set of consistent depthmaps. The available image data I will be used to discover some errors and controlsmoothing.

The algorithm can be described as depth map fusion, as it fuses the information fromindividual depth maps together.

As long as we only fuse points from disparity maps, we do not have to deal with occlusionsas this has been already done by the stereo algorithm.

The goal is to get Bayesian estimate of depth maps Λ and visibility V in all cameras given


estimated points X and images I as the input. It leads to maximisation of the posteriorprobability, what can be formally written as

(X ∗,Λ∗, V ∗, C∗) = arg maxX ,Λ,V,C

P (X ,Λ, V, C | I). (2.13)

The intended output is (Λ∗, V ∗) while the estimation of (C∗,X ∗) is secondary and shouldbe interpreted as an effort to repair the input data. Because of the presence of aggre-gate probabilities it is necessary to decompose the problem. It will be divided in twocyclically dependent subproblems of estimation of (Λ, C,X ) and V – the output of thefirst subproblem will be used as input to the second, and vice versa. Subsequently, thesubproblem of estimation of (Λ, C,X ) will be also divided, so that the result of the op-timisation task of (Λ, C) will be used to repair the corresponding points X . This is onlyinternal representation.

This proposal is a modification of EM algorithm [14, 15], inspired by [38].

If we denote ·(k) as the value of a variable in iteration k, the overall iterative procedurecan be described as

1. Initialise visibility V (0) in all cameras i = 1, 2, . . . , c from input data X .Initialise depths Λ(0) in all cameras i = 1, 2, . . . , c from input data X .

2. Solve the visibility estimation task (2.52) and get V ∗.Update the value of V (k) with the new value V (k + 1) = V ∗.

3. Solve the depth estimation task (2.14) and get (Λ∗, C∗).Update the value of Λ(k), C(k) with the new values Λ(k+1) = Λ∗ and C(k+1) = C∗.

4. Repeat steps 2-3 until a convergence criterion is reached.

The information flow between variables of the system is displayed in Figure 2.4. The ex-change between Λ and V is emphasised as they represent the major tasks in this problem.

The theoretical proposal of the solution is given in the following two sections. The detailsrelated more closely to implementation are explained in Chapter 3.

2.2. ESTIMATION OF DEPTHS Λ 17

2.2 Estimation of depths Λ

Given measurements X and visibility V , we search for the estimate

(Λ∗, C∗) = arg maxΛ,C

P (Λ, C | X , V ), (2.14)

where

P (Λ, C | X , V ) =P (X | Λ, C, V )P (Λ, C, V )

P (X , V ). (2.15)

The solution of the problem does not depend on P (X , V ).

2.2.1 Projection constraints

Probability P (X | Λ, C, V ) from (2.15) can be expressed as

P (X | Λ, C, V ) =c∏i=1

n∏p=1

p(X ip | λip, C, V ), (2.16)

where c is the number of cameras, n is the number of pixels in image, X ip is the set of

correspondences projecting to pixel p in image i and λip is estimated depth at this point.

Let us choose

p(X ip | λip, C, V ) =

1Tλe−

(λ(χip,C)−λip)

2

2σ2λ if vip = 2,

1 otherwise,

(2.17)

where Tλ = σλ√

2π and λ(χip, C) is a depth estimating function from set of all correspon-dences χip. It is computed as the result of least squares minimisation subtask

λ(X ip, C) = arg min

λip

∑(j,q)∈χip; vjq≥1

‖φ(xip, λip,C

i)− φ(xjq, λjq,C

j)‖2, (2.18)

where j, q are all correspondences visible also in the corresponding cameras j and φ is theback-projecting function given in (2.5).

Let’s build a system of projective equations for all such correspondences in (X ip | V ). The

part ‖ · ‖2 in (2.18) minimises the distance between back-projected points. This resultsinto the constraint assuming these two points are equal (2.19). Now will be presentedhow to build an equation for one correspondence pair (j, q) ∈ χip represented by pointXijpq. The notation will be simplified according to Figure 2.5 as follows:

• P = KR[

I | −C]

. . . reference camera matrix for i

• P′ = K′R′[

I | −C′]

. . . corresponding camera matrix for j

• C,C′ ∈ R3 . . . position of cameras

• R,R′ ∈ R3×3 . . . rotation of cameras


......................................... . . . . . . . ......................

estimated surface

X

R′>x′

R>x

C′

Xijpq

......................................................................................................................................................................................................................................................................................................................................

.........................................................................................................................................................................................................................................................................................................................................................................................

....................

....................

..........................................

X′

C

λ

λ′

rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr

..................................................................

.............. .............

........................................................................................

........................................... ........

....................................................

rrrrrrrrrrrrrrrrrrrrrr

Figure 2.5: Situation with one correspondence

• K,K′ ∈ R3×3 . . . internal parameters of cameras

• x ∈ R3 . . . coordinates of an image point p in camera i in homogeneous coordinates

• x′ ∈ R3 . . . coordinates of an image point q in camera j in homogeneous coordinates

• λ ∈ R . . . optimised depth λip

• λ′ ∈ R . . . depth λjq in corresponding camera

• X,X′ ∈ R4 . . . back-projected points X = φ(xip, λip,C

i),X′ = φ(xjq, λjq,C

j) in homo-geneous coordinates

• A(3) . . . third row of matrix A

Let’s build constraints on the system to force X′ to be close to X:

X w X′ (2.19)

so that λ′ should be close to forward projection of X in camera P′

λ′ = P′(3)

X, (2.20)

what is a simplification of (2.1).

Camera matrix P′ can be decomposed and simplified

K′(3)

= [ 0 0 1 ] (2.21)

P′(3)

= K′(3)

R′[

I | −C′]

= R′(3)[

I | −C′]

(2.22)


Point X can be back-projected according to (2.5):

X =

[C + λR>K−1x

1

]. (2.23)

Put together

λ′ = R′(3)

(I | −C′)

[C + λR>K−1x

1

](2.24)

λ′ = R′(3) (

C + λR>K−1x)−R′

(3)C′ (2.25)

The resulting constraint is

R′(3)

C + λR′(3)

R>K−1x− λ′ = R′(3)

C′ (2.26)

where λ, λ′,C are considered unknowns. This gives us one of equation from the set ofequations of geometric constraints to λ, forming an over-defined system.1 Note that theuse of geometric constraints allowed us to include re-estimation of camera centres C inthe task. The solver will be used to find the least-squares solution for λ, λ′,C in (2.18).

2.2.2 Surface model

Probability P (Λ, V, C) from (2.15) can be written under the assumption of the Markovproperty as

P (Λ, V, C) =c∏i=1

∏(p,p)∈N2(i)

p(λip, λip | vip, vip) p(vip, vip), (2.27)

where N2(i) is the set of all neighbouring pixel pairs (p, p) in the image i (2D neighbour-hood). The pairs are defined for all edges of the image grid, as can be seen in Figure 2.6.

1Only if there are more correspondences for the given pixel.

qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq




qqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqq

qqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqq

Figure 2.6: 2D neighbourhood on the image grid, pixels are at the intersection of gridlines. Blue: horizontal pairs in one column. Red: vertical pairs in one row.


The solution of task (2.14) does not depend on p(vip, vip), as it is fixed. The choice of

p(λip, λip | vip, vip) depends on the used model of surface.

The probability distribution of general surface model can be described as

p(λip, λip | vip, vip) =

1Tλe−

(εip)2

2 (σ ic,p)2 if vip ≥ 1, vip ≥ 1,

1 otherwise,(2.28)

Tλ = σ ic,p

√2π, (2.29)

The value of σ ic,p is proportional to local image contrast at pixel p in image i, based on

variation of neighbouring pixels of I ip. This comes from the assumption that image regionswith higher variance contain more information and the belief in the results of the stereoalgorithm is higher, therefore we can penalise differences in λ less allowing the solutionto follow data more closely.

The difference ε2 will be defined as

(εip)2 = (Φi

p − Φip)

2 (2.30)

where Φ is a surface property, such as depth, normal or curvature, and Φip is the value of

the property at given point p in image i.

We will now discuss the properties representing different orders of surface model.

Order 0

For Φ ≡ λ, the equation (2.28) describes continuity of order 0, this means all depths areideally equal. This leads to a flat planar surface of average depth, perpendicular to thecamera axis. The difference (2.30) becomes

ε2 = (λip − λip)2. (2.31)

Order 1

In order to improve the quality of our model, the order of continuity can be increased.For order 1 this means the normals of surface in neighbouring points should be equal,Φ ≡ N. The difference (2.30) becomes

ε2 = (Nip −Ni

p)2. (2.32)

Surface normal N at point x0, y0 can be computed as the cross product of partial deriva-tives of the surface function X along x and y:

N(x0, y0) =∂X(x0, y0)

∂x× ∂X(x0, y0)

∂y(2.33)

The computation of the normal is complex, so we will reduce this problem to the constancyof the first derivative of the depth function λ along the image axes. This choice reduces


the number and complexity of equations for the surface model to reasonable level, butloses the intrinsicity of the normal constancy. The assumption becomes

ε2 =

(∂λ(x0, y0)

∂x− ∂λ(x0 + 1, y0)

∂x

)2

+

(∂λ(x0, y0)

∂y− ∂λ(x0, y0 + 1)

∂y

)2

. (2.34)

We must transform the continuous derivatives into discrete differences for use with ourdata. The estimation of the first derivative by finite difference can be expressed for x as

∂λ(x0, y0)

∂xw λ(x0 − 1, y0)− λ(x0, y0). (2.35)

Putting (2.34) and (2.35) together we get

ε2 =[(λ(x0 − 1, y0)− λ(x0, y0)

)−(λ(x0, y0)− λ(x0 + 1, y0)

)]2

+

+[(λ(x0, y0 − 1)− λ(x0, y0)

)−(λ(x0, y0)− λ(x0, y0 + 1)

)]2

.(2.36)

After modifications we obtain the simplest approximation of the first order:

ε2 =[λ(x0 − 1, y0)− 2λ(x0, y0) + λ(x0 + 1, y0)

]2+

+[λ(x0, y0 − 1)− 2λ(x0, y0) + λ(x0, y0 + 1)

]2.

(2.37)

As a result, constancy of depth is replaced by constancy of gradient.

Order 2

The second order means the constancy of curvatures along the surface, Φ ≡ K. Thedifference (2.30) becomes

ε2 = (Kip −Kip)2. (2.38)

Following the same simplification as for order 1, it leads to the constancy of the secondderivatives:

ε2 =

(∂2λ(x0, y0)

∂x2− ∂2λ(x0 + 1, y0)

∂x2

)2

+

(∂2λ(x0, y0)

∂y2− ∂2λ(x0, y0 + 1)

∂y2

)2

. (2.39)

The estimation of the second derivative by finite differences can be expressed for x as

∂2λ(x0, y0)

∂x2w(λ(x0 − 1, y0)− λ(x0, y0)

)−(λ(x0 y0)− λ(x0 + 1, y0)

). (2.40)

After modifications we obtain the simplest approximation of the second order:

ε2 =[λ(x0 − 1, y0)− 3λ(x0, y0) + 3λ(x0 + 1, y0)− λ(x0 + 2, y0)

]2+

+[λ(x0, y0 − 1)− 3λ(x0, y0) + 3λ(x0, y0 + 1)− λ(x0, y0 + 2)

]2.

(2.41)

It shows that depending on the chosen estimate by differences, the resulting equations’coefficients follow the Pascal’s triangle.


2.2.3 Energy minimisation

The problem of (2.14) after decomposition to cameras can be expressed as

(Λ∗, C∗) = arg maxΛ,C

c∏i=1

P (Λi, C | X , V ). (2.42)

After application of negative logarithm we get

(Λ∗, C∗) = arg minΛ,C

c∑i=1

E(Λi, C | X , V ). (2.43)

In this way the problem of maximisation of probability P has been transformed intoproblem of minimisation of energy E.

The full expression comes from modifications of (2.16), (2.17), (2.27), (2.28):

(Λ∗, C∗) = arg minΛ,C

c∑i=1

1

2 (σ iλ)2

n∑p=1

Eip +

∑(p,p)∈N2(i|V )

(εip)2

2 (σ ic,p)

2

(2.44)

where

Eip =

{(λ(χip, C)− λip

)2if vip = 2,

0 otherwise,(2.45)

and N2(i | V ) is the set of all neighbouring pixel pairs (p, p) in the image i, so that theyboth have visibility vip ≥ 1, vip ≥ 1.

Energy of depths in camera i, E(Λi, C | X , V ), from (2.43) in a more clear expression forvisible pixels only and with use of the first-order surface model (2.31) is

E(Λi, C | X , V ) =1

2σ2λ

n∑p=1

(λp − λp)2 +∑

(p,p)∈N2(i|V )

1

2σ2c,p

(λp − λp)2, (2.46)

where the first part expresses the data energy, the second part expresses the surface modelenergy and coefficients σ define their mutual weights.

Minimum sought in (2.43) turns up when the first derivative of (2.46) is equal to zero forgiven p:

∂Ep(Λi, C | X , V )

∂λp= 0 (2.47)

which is after modification

1

σ2λ

(λp − λp

)+

∑(p,p)∈N2(i|V )

1

σ2c,p

(λp − λp) = 0. (2.48)


2.2.4 System of equations for the depth task

In this section will be given summary of equations proposed in the previous two sections,enumerating for all points in all images.

The system of equations builds up from projective equations (2.26) and energy minimi-sation equations (2.48).

After return to the original indices, (2.26) turns into

Rj(3)Ci + Rj(3) Ri>Ki−1xip λip − λjq = Rj(3)Cj (2.49)

for all disparity maps Mij, where i is a main camera and j is a supporting or maincamera, and p, q are all corresponding visible image points in the respective cameras sothat vip = 2, vjq = 2. Depths λ represent least-square solution of projective constraints anddepths λ represent the solution of the surface model.

From (2.48) in full indices we get after modifications

1

σ2λ

(λip − λip

)+∑p∈Np

1

(σ ic,p)2

(λip − λip) = 0 (2.50)

for all main cameras i. Np = {p− 1, p+ 1, p−h, p+h | V } are visible pixels neighbouringto p so that vip ≥ 1 and h is the height of the input image. With N = |Np|, this results ina form of weighted average of the neighbouring depths

1

σ2λ

(λip − λip

)+

N

(σ ic,p)

2λip −

1

(σ ic,p1

)2λip1− . . .− 1

(σ ic,pN )2

λipN = 0. (2.51)

This equation with indices for all images i and all visible points p in them is added to thesystem of equations. The approach to solve this system is given in Section 3.5.4.


2.3 Estimation of visibility V

Given depths Λ, correspondences X and image values I, it is searched for estimate

V ∗ = arg maxV

P (V | I,Λ,X ), (2.52)

where

P (V | I,Λ,X ) =P (I | V,Λ,X )P (V,Λ,X )

P (I,Λ,X ). (2.53)

The solution does not depend on P (I,Λ,X ). The conditional probability P (I | V,Λ,X )can be expressed as

P (I | V,Λ,X ) =c∏i=1

n∏p=1

∏(q,j)∈χip; vjq≥1

p(Ijq, Iip | vjq , vip), (2.54)

where (q, j) is a list of visible corresponding pixels j from camera q. Let’s choose

p(Ijq , Iip | vjq , vip) =

1TIe−

(Iip−Ijq)2

2σ2I if vjq = vip = 2,

h(Iip) otherwise,(2.55)

where TI = σI√

2π and where h(Iip) is probability of observing an invisible colour (likecolour of the sky), based on currently invisible regions.

Probability P (V,Λ) can be rewritten as

P (V,Λ) = P (V ) · P (Λ | V ). (2.56)

It is assumed that the surface is locally flat, and big local changes in depth are eitherdiscontinuities due to occlusion or errors in stereo. The discontinuities should be repre-sented as an invisible line and erroneous data should be hidden. This assumption can beexpressed as

P (Λ | V ) =c∏i=1

∏(p,p)∈N2(i|V )

1

Tλe−

(λip−λip)2

2(σ iλ,p

)2 , (2.57)

where σ iλ,p is estimated from residual of depth at given point, derived from the depth

optimisation task (2.14). The residual is higher at the points where the surface modelcould not be fitted on the data, indicating possible outliers. Expression (p, p) ∈ N2(i | V )are visible neighbouring pixels, as defined in (2.27) where additionally vip ≥ 1, vip ≥ 1.

Compactness of visible and invisible regions is assumed:

P (V ) =c∏i=1

∏(p,p)∈N2(i)

1

Tve−

(vip−vip)2

2σ2v . (2.58)

The expression (vip − vip)2 means that pixels neighbouring to a visible pixel with data

support (v = 2) are more likely to be visible (v = 1) rather than invisible (v = 0).

2.3. ESTIMATION OF VISIBILITY V 25

After applying negative logarithm on (2.52) and some manipulations we get

V ∗ = arg minV

c∑i=1

E(V i) (2.59)

E(V i) =n∑p=1

E(vip) +1

2σ2v

∑(p,p)∈N2(i)

(vip − vip)2, (2.60)

where

E(vip) =∑

(q,j)∈χip; vjq≥1

E(vip, vjq) +

∑(p,p)∈N2(i|V )

(λip − λip)2

2(σ iλ,p)

2(2.61)

E(vip, vjq) =

{(Iip−Ijq)

2

2σ2I

if vip = vjq = 2

− log h(Iip) otherwise.(2.62)

In order to solve this task using the Maximum graph flow algorithm [43], as described inSection 2.3.3, the task has to be modified. There are two problems:

1. The algorithm basically works only with two labels, while we have three values forvisibility. This brings a need for reduction to two values (visible/invisible, v ∈{0, 1}), while the third v = 2 will be treated specially. The solution is proposed inSection 2.3.2.

2. It treats discontinuity for visible and invisible regions in the same way, while wewould need a slightly different approach to the invisible regions representing thedepth discontinuity, in the form of lines. This leads to the subtask of discontinuitydetection in Section 2.3.1.

2.3.1 Discontinuity detection

Let’s introduce a new variable, discontinuity map Di ={dip | p = 1, . . . , ni

}, where dip ∈

{0, 1} is presence of discontinuity in pixel p in camera i. Set of all discontinuity maps willbe D = {Di | i = 1, . . . , c}.

The subtask of discontinuity detection can be formally expressed as a search for estimateD∗:

D∗ = arg maxD

P (D | V,Λ) (2.63)

According to Bayes, P (D | V,Λ) can be rewritten to

P (D | V,Λ) =P (Λ | D, V )P (D, V )

P (V,Λ)(2.64)

In this task the solution does not depend on P (V,Λ). Similarly to (2.57) we choose

P (Λ | D, V ) =c∏i=1

∏(p,p)∈N2(i|V ); dip=dip=0

1

Tλe−

(λip−λip)2

2σ2λ . (2.65)


The probability P (D, V ) can be rewritten as

P (D, V ) = P (V | D)P (D). (2.66)

The prior probability P (D) should express the property of discontinuity that it formslines rather then regions or dots. Formally this can be written as

P (D) =c∏i=1

n∏p=1

l(N2(p, i), Di) (2.67)

where l(N2(p, i), Di) is a line similarity function on neighbouring pixels. This could bethought of as a function assigning probability to every possible combination of disconti-nuity flags in a 9× 9 window.

The relation between V and D is chosen as

P (V | D) =c∏i=1

n∏p=1

p(vip | dip) (2.68)

p(vip | dip) =

0 if vip = 0, dip = 1,

1 if vip = 1, dip = 1,

0.5 otherwise.

(2.69)

This expresses the fact that the discontinuity has sense only in the visible regions.

Probability P (D | V,Λ) is difficult to calculate explicitly. The proposed solution solvesthe task indirectly in every camera i:

1. Local depth error is calculated on the visible data. If the depth is unknown at themoment, it is interpolated as the median of the closest known depths. Note this isan edge-preserving operation.

2. Local depth error is normalised to the equivalent of local depth error in depth λ = 1.

3. A local depth error threshold is chosen together with σλ in (2.65).

4. Initial discontinuity map Di0 is obtained by thresholding the local depth error.

5. Di0 is processed using binary morphological operations to reduce regions to lines toobtain Di∗.

2.3.2 Handling three labels

In this section there will be described how to use 2-label max-flow optimiser to estimate3-label visibility we work with v ∈ {0, 1, 2}, because the max-flow algorithm basicallyworks only with two labels. Visibility V represents where in the image should be theoutput surface reconstructed and where it is supported by data. The chosen coding ofvisibility in a pixel is v = 0 for no surface, v = 1 for interpolated surface and v = 2 forsurface with data support. This requires an analysis of the available inputs and possibleoutput.


qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqreal surface

C

C ′

..........................................................................................................................................................................................................................................................................................................................................................................................................................................

.....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr

overlap

.....................................................................................................................................................................................................................................................................

.................................................................................................................................................................................................. ............................................................................ ..........................................................................

.......................................................

...................................................

............................................................

..........................................

.............................................................................

......................................................................................

.....................................................................................................

...................................................................................................................

..................................................................................................................................

..............................................................................................................................................

......................................

qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqestimated surface

background

............................................................

..................................................................

.............. .............

........................................................................................

...........................................

Figure 2.7: Image colour error helps remove overlapping borders.

The basic fact is that in this task the data visibility vip = 2 cannot be assigned withoutan existing support of at least one correspondence Xij

p . We will not create a new corre-spondence, because it is currently beyond the scope of this work. The principal problemsare: (i) to which camera pair ij should the new correspondence be added and (ii) if it iscorrect to do so, with respect to occlusions. As a result only some data can be hidden bythis task (visibility set to v = 0), and otherwise the data visibility is rigid.

The interpolation with vip = 1 in the terms of (2.58) should be done only in regions nearvisible data (vip = 2). This corresponds to filling holes in the projected data, but alsoshould not cause interpolation of depths far from data, as the probability of guessing suchdepth correctly is low.

According to (2.61), the principal inputs (image colour error, invisible histogram match,local depth error) should be minimised, that is the higher the value of the input is at agiven point, the higher is the demand to hide it.

The image difference and invisible histogram match mostly indicate errors on borders ofsurfaces, where the results of stereo tend to overlap the real surface and continue a fewpixels “into the air”. These overlaps then have the colour of the background (surface inhigher depth). This is displayed in the Figure 2.7, where the false overlapping surface ismarked blue. Green and red illustrate different colours of the background projected tothe different cameras.

Such overlapping regions should be made completely invisible. This effect can be observedin the Figure 2.8. Compare the difference with Figures 3.10 and 3.13.

The interpretation of local depth error on input should be different, with respect to thefact the discontinuities are estimated separately. Together with residual in (2.57) it pointsto places where data significantly do not match the surface model. Such data should beignored and the depth interpolated instead. Consequently, the data can be repaired withthe interpolated value and made visible again, as described in Section 3.5.6. In order tomaintain visible regions and speed up the convergence of the algorithm, the data visibilityshould be in this case changed to v = 1 and interpolation enforced.


a) initial visibility b) visibility after first iteration of theproposed algorithm

Figure 2.8: Overlapping borders. Detail from Daliborka scene.

This different interpretation of inputs to max-flow induces the dividing of inputs for twooptimisation tasks:

1. Based on image difference and invisible histogram match, hide some data and pixels(set v = 0).

2. Based on local depth error, make some pixels visible, but hide the underlying data(set v = 1).

2.3.3 The Max-Flow algorithm

This section describes the algorithm used to minimise the energy in the visibility estima-tion task. It is designed to solve the problem of Minimum Cut [43]: Given a connectedgraph G = (V,E), a capacity c : E → R+, and two nodes s and t, find a maximum s− tflow. An s− t cut C on a graph with two terminals is a partitioning of the nodes in thegraph into two disjoint subsets S and T such that source s is in S and sink t is in T . Thecost of a flow C = {S, T} is defined as the sum of the costs c of “boundary” edges (p, q)where p ∈ S and q ∈ T . The maximum flow problem then becomes to find a partitioningthat has the maximum flow among all partitionings.

In the application to images in computer vision, capacity function c is usually derivedfrom the following energy function:

E(L) =∑p∈P

Dp(Lp) +∑p,q∈N

Vp,q(Lp, Lq), (2.70)

where L = {Lp | p ∈ P} is a labelling of image P , Dp(·) is a data penalty function,Vp,q is an interaction potential, and N is a set of all pairs of neighbouring pixels. Theapplication of E(L) on the graph is done according to Figure 2.9.

Because the visibility problem in the form of (2.61) follow the form of (2.70) we can useit to solve the visibility task (2.52).

We will later use the following notation for the solution of the max-flow problem:

C = maxflow(D′, ν) (2.71)


..........................................................................................................................

.......................................................................................................................... .........

................................................................................................................. .........

.................................................................................................................

.........................................................................................

.........................................................................................

.........................................................................................

.........................................................................................

..........................................................................................................................

.......................................................................................................................... .........

................................................................................................................. .........

.................................................................................................................

.........................................................................................

.........................................................................................

.........................................................................................

.........................................................................................

..........................................................................................................................

.......................................................................................................................... .........

................................................................................................................. .........

.................................................................................................................

qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

....................

....................

....................

....................

....................

....................

....................

....................

....................

..

....................

....................

....................

....................

....................

....................

....

........

..

........

..

........

..

........

..

........

..

........

..

........

..

........

..

........

..

......

........................................................................................................................

.......................................................................................................................................................

.........................................................................................................................................................

..........................................................................................................................................................................................................................

.........................................................................................................................................................................................................................

........................................................................................................................

..........

..........

..........

..........

..........

..........

..........

..........

.....

..............................................................................................................

.......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... ..........

...................................................................................................................................................

............................................................................................................................................

....................

....................

....................

....................

....................

....................

....................

....................

....................

....................

..........

...............................................................................................................................................................................................................

.......................................................................................................................................................................................

...................................................................................................................................................................................

..........................................................................................................................................................................................................................................................

t

s

Dp(p ∈ T )

q

p

Dp(p ∈ S)

Vp,q

Figure 2.9: Graph for maximum flow search

where D′ ∈ 〈0, 1〉h×w is a penalty function in the form of the matrix of image size h× wand ν is the interaction potential. The assignment of the D′ according to the labelling Lpis defined as

Dp(Lp) =

{D′p for p ∈ S,1−D′p for p ∈ T.

(2.72)

While S, T represents output Cp = 1, Cp = 0, respectively, the higher value of D′p repre-sents the higher probability of Cp = 0. The value of the interaction potential ν is assignedequally to all edges Vp,q = ν; p = 1, . . . , n; q = 1, . . . , n; p 6= q, because the image gridis uniform. The resulting cut C ∈ {0, 1}h×w is a binary matrix of the same size as D,indicating the classification of the image pixels either to S or T .


2.4 Correctness

It should be proven that the iteration scheme will converge to the expected state. Oneof the possible solutions to the energy-minimisation equations for visibility (2.60) anddepths (2.43) is that nothing is visible in all cameras, so that the energy will reach itsminimum, E(V ) = E(Λ) = 0.

This could happen if the input does not match the assumptions and parameters of theused model, i.e. the surface is not locally simple or the exposure of input images is verydifferent. This should be avoided by the robust estimation of the parameters, althoughup to a certain limit.

Since the visibility according to Section 2.3.2 builds on the data visibility and the datacan be only hidden by the visibility task and propagated to other cameras, the iterationcould gradually hide all data. This is avoided by the repair of correspondences, so thatthe data visibility of previously hidden points is restored. Because the correspondence isrestored at the point determined by the model, it minimises the energy E(V ) and E(Λ).Since the visibility task keeps data that match the model, repaired correspondences willnot be later hidden again with high probability.

Chapter 3

Implementation

3.1 Overview

Here will be presented a scheme of iteration steps necessary to implement the solutionproposed in the previous section. An overview is given for start here and then every stepwill be described in detail.

1. Camera selection:

(a) Break-up into main and supporting cameras.

2. Initialisation:

(a) Initialisation of data visibility (projection of 3D points to images).

(b) Initialisation of visibility (”blind” max-flow on initial data visibility).

(c) Initialisation of depths (mean of projected depths per pixel).

(d) Image contrast calculation (it never changes).

3. Discontinuity D estimation:

(a) Depth interpolation for newly visible points (edge-preserving median filter).

(b) Local depth error calculation (on visible 4-pixel neighbourhood).

(c) Discontinuity detection (thresholding of local depth error).

4. Depth Λ estimation:

(a) Projection of 3D point correspondences to images.

(b) Equations from projective constraints (2.49).

(c) Equations for the depth smoothing model (2.51).

(d) Solving system of equations (QMR).

(e) Update of camera centres C.(f) Repair of correspondences X .

31

32 CHAPTER 3. IMPLEMENTATION

5. Visibility V estimation:

(a) Image colour error calculation (across correspondences).

(b) Local depth error calculation (on visible 4-pixel neighbourhood).

(c) Invisible histogram calculation.

(d) Solving energy-minimisation (max-flow).

(e) Invisible data hiding.

6. Iteration: Repeat steps 3-5 for given number of times, terminate with step 4.

7. Visualisation:

(a) Redundancy removal (choose cameras with highest resolution of the surface).

(b) Back-projection of depth maps to 3D points.

(c) Triangulation (uniform).

(d) Texturing (use data from camera of origin).

(e) Export to file (VRML).

The implementation was done in MATLAB with use of it’s toolboxes and of the externallibrary for Max-flow, as described later.

Results in this chapter will be demonstrated on image in Figure 3.1 from Daliborka scene.

Figure 3.1: Input image

3.2. CAMERA SELECTION 33

3.2 Camera selection

The main cameras Tm are chosen so that every main camera T im is present in at leasttwo input disparity maps Mij,Mik, j 6= k. There should also exist disparity maps Mij

connecting the main cameras i, j together in order to enable the information flow betweenthem. However, this condition depends on the choice of previous mechanism in the recon-struction pipeline, which selects appropriate pairs. The estimate of (Λ∗, C∗,X ∗, V ∗) from(2.13) is only done in the main cameras Tm.

There is no fusion when camera has only data from one pair, and such cameras Ts onlysupport the main cameras with data. However, there can be no support camera when allcameras are connected.

3.3 Initialisation

In the initialisation phase of the implementation the mutual dependency between V andΛ must be resolved. A start solution that is close to the estimated results is proposed inorder to speed up the overall convergence as well as the convergence of partial tasks.

3.3.1 Initial data projection

The set of input points X is projected to the image planes using forward projection. Fora given corresponding point Xij this is computed as

xip = Pi[Xij 1

]>(3.1)

xjq = Pj[Xij 1

]>, (3.2)

where xip ∈ R3 is the projected image point in homogeneous coordinates:

xip = λip[x1 x2 1

]>. (3.3)

The image coordinates x1, x2 are extracted from the homogeneous vector by normali-sation of the third coordinate. The pixel index p is obtained by rounding the real co-ordinates (x, y) to the closest integer values and then converting them to linear index,p = round(x) + round(y) · hi, similarly for xjq.

Initial data visibility Vi(0) for values v = 2 is simply equal to the presence of projectedcorrespondences in data:

vip(0) =

{2 if ∃ Xij : xip = Pi [Xij 1]

>,

0 otherwise.(3.4)

The argument (0) denotes initial estimate, or more generally (k) denotes iteration k.


3.3.2 Initial visibility

In this section the visibility maps containing only data visibility will be extended withregions to be interpolated (v = 1). Initial visibility is obtained from data visibility (v = 2)from the previous step by connecting the visible and invisible regions using max-flowalgorithm, as described in Section 2.3.3. The penalty function Di

data is defined as

Didata = G(σ2)⊗

(1− α

2+α

2Vi(0) + β

), (3.5)

where α is a constant adjusting the weight of initial data visibility V(0), it was chosenexperimentally to the value of α = 1

3. The expression G(σ2) ⊗ (·) denotes convolution

with 2-D Gaussian filter mask with variance σ2 for support of hole covering. The positivevalue of β allows to shift the distribution towards v = 1, resulting in stronger interpolationof data. The higher value allows to fill more holes, but on the other side can result inproblems of overlaps on the borders, in the places where the mechanism described inFigure 2.7 does not work. The value used was in the range β ∈ (0.05, 0.1). The outputof (2.71) with Di

data as the input

C = maxflow(Didata, ν) (3.6)

is applied to the visibility Vi in addition to (3.4):

vi∗p (0) =

2 if vip = 2,

1 if Cp = 1 and vip < 2

0 otherwise,

(3.7)

in order to obtain the initial estimate of visibility Vi∗(0).

The ideal value of ν = 0.15 was determined experimentally to balance between fillingholes (higher penalty, more compact regions) and more exact localisation of errors (lowerpenalty, less compact regions).

3.3.3 Initial depths

Initial depths are computed as a simple mean of all depths projected to the image pixelin (3.1) and (3.3):

λi∗p (0) =1

K

K∑j=1

λip(Xij) (3.8)

where K is the number of projected correspondences and λip(Xij) are the projected depths.

This can be easily implemented with two variables per pixel only: (i) the counter of Kand (ii) the sum of depths. The result is shown in Figure 3.2.

3.3.4 Image contrast calculation

The image contrast is calculated directly from the input image, which never changes. Thismeans that it can be calculated only once during the initialisation phase. The algorithmwas implemented for colour images in the following way:

3.3. INITIALISATION 35

100 200 300 400 500 600 700 800 900 1000 1100

100

200

300

400

500

600

700

8001

1.05

1.1

1.15

1.2

1.25

1.3

1.35

1.4

1.45

1.5

Figure 3.2: Initial mean of depths

100 200 300 400 500 600 700 800 900 1000 1100

100

200

300

400

500

600

700

800

0

0.05

0.1

0.15

0.2

0.25

0.3

Figure 3.3: Image contrast


1. Separate colour channels of the image, Ii = {IiR, IiG, IiB}.

2. For every pixel p in a colour channel, collect set of 9 neighbouring values.

3. Calculate standard deviation on the collected set.

4. Repeat for all colour channels.

5. Calculate the resulting contrast in pixel p as the mean of the deviations in all colourchannels.

6. Normalise to range 〈0, 1〉.

The result is shown in Figure 3.3.

3.4. DISCONTINUITY ESTIMATION 37

3.4 Discontinuity estimation

3.4.1 Depth interpolation

The visibility estimation task usually marks some new regions to be visible without datasupport (vp = 1), this involves holes in data, artifacts due to rounding coordinates toimage grid, hidden erroneous data and areas near discontinuities. However the estimateof the actual depth values λp is not known at this time and it should be calculated inorder to (i) detect discontinuities and (ii) get the initial solution for the solver of depthestimating equations. The used method to do this is the median filtering of the visibleand known depths in the neighbourhood of the unknown depth.

The value of unknown λp(k) in iteration k is calculated as

λp(k) = median (λq(k) | q ∈ N ∗2 (p,Nmin, rmax); vq(k − 1) ≥ 1) (3.9)

where N ∗2 (p,Nmin, rmax) is wider image neighbourhood which contains at least Nmin

known visible depths from previous iteration from distance up to rmax pixels from pixelp. This allows to limit the engineering of incorrect surface far from the data and model inthe cases when the visibility was estimated too generously in some areas. In such cases,when interpolation fails to calculate the new value, the visibility is forced back to vp = 0.The result is shown in Figure 3.4.

100 200 300 400 500 600 700 800 900 1000 1100

100

200

300

400

500

600

700

8001

1.05

1.1

1.15

1.2

1.25

1.3

1.35

1.4

1.45

1.5

Figure 3.4: Interpolated depths


3.4.2 Local depth error calculation

The local depth error ∆i = {δip | p = 1, . . . , ni} is calculated as the sum of squares ofdifferences between depths on visible 4-pixel neighbourhood N2(i | V ), as proposed in(2.65):

δip =1

2σ2λ

∑p∈N2(i,p|V )

(λip − λip)2. (3.10)


3.4.3 Discontinuity detection

The discontinuity is obtained by thresholding of the normalised local depth error ∆′ andmorphological processing of the result. The normalisation is calculated as

δ′ ip =δipλip

(3.11)

to obtain local depth error equivalent to depth λ = 1.

σλ is chosen together with threshold δ0. While σλ = 1 is fixed, the threshold for givencamera is initialised in the first iteration to the value δ0 = KD δm, where δm is the medianof depth error and KD is a constant determining the level of threshold over the median.

Then the raw discontinuity map Di is constructed as

100 200 300 400 500 600 700 800 900 1000 1100

100

200

300

400

500

600

700

800

0.5

1

1.5

2

2.5

3

3.5

x 10−11

Figure 3.5: Depth error with discontinuity

3.5. DEPTH ESTIMATION 39

dip =

{1 if δ′ ip ≥ δ0,

0 if δ′ ip < δ0.(3.12)

The later morphological processing with use of Matlab function bwmorph of the binary Di

involves following operations:

1. Cleaning of solitaire pixels (’clean’)

2. Building of “bridges” between close regions (’bridge’)

3. Region thinning to obtain lines from regions (’thin’)

4. Enhancing of diagonal pixel connections to axial (’diag’)

In the brackets is appended the Matlab name of the operation.

As a result we obtain estimate of Di∗. It is then applied to the visibility so that vip = 0for di∗p = 1.

3.5 Depth estimation

3.5.1 Projection of 3D points to images

The projection is done using the equations (3.1) for every correspondence pair Xij:

λip[x y 1

]>= Pi

[Xijp 1

]>(3.13)

The pixel index p is assigned according to Section 3.3.1. As the correspondences arerepaired and the camera centres are moved during the iteration, the points may projectoutside of the boundaries of the image. This must be taken into account and such pointsignored.

3.5.2 Equations from projective constraints

The equations for (2.49) are expressed in a large sparse matrix A. There is a specialsupport for work with such matrices in MATLAB, they are expressed as a list of non-zerovalues amn together with list of its indices m,n. The form of projective equations dependson the cameras of the correspondence X ij. There are two possibilities:

1. Both cameras are main, i, j ∈ Tm. Then depths from both are unknowns as well asone camera centre and the equation is (2.26).

2. One camera is main and the other is supporting, i ∈ Tm, j ∈ Ts. Here only depthsin the main camera are unknowns and (2.26) is modified into

R′3C + λR′3 R>K−1x = R′3C′ + λ′, (3.14)

where λ′ was moved to the right side.


100 200 300 400 500 600 700 800 900 1000 1100

100

200

300

400

500

600

700

800

Figure 3.6: Depth model order. Red: order 2; Yellow: order 1; Light blue: order 0; Darkblue: not visible.

3.5.3 Equations for the depth model

The depth model is expressed by equations (2.51) depending on the chosen order, asdescribed in Section 2.2.2. However, the higher order models require support of the widerimage pixel neighbourhood N2 | V . This brings problems near borders of the visibleregions and image borders. The solution is to use models of lower order at these points.The procedure can be described for desired order O as

1. Apply model of order O on all unresolved visible pixels with necessary support.

2. Label those pixels as resolved.

3. If there are any unresolved pixels, decrease the model order O by 1and repeat from step 1.

The result is shown in Figure 3.6. An efficient implementation of detection of pixels withnecessary support in step 1 is by morphological erosion of visibility map with appropriatestructure element.

Also the design of zero-order model should allow to resolve all remaining pixels, down toequality of two neighbouring depths.


3.5.4 Solving system of equations

Considering (2.51), let’s build a matrix equation in the form of

Ane×nv Λnv×1 = Bne×1 (3.15)

where ne is the total number of equations and nv is the number of unknowns (image pixelsin all cameras and main camera centres). The dimensions can be computed or estimatedby

nv = 3|Tm|+c∑i=1

n∑p=1

[vip ≥ 1

](3.16)

ne ≤∑ij∈Dp

(|X ij|+Mnv

)(3.17)

where M is the number of surface model equations for a visible point. While only visiblepixels form the unknowns vector, there must be established a transformation of imageindices p into induces of unknowns in Λ. Similarly some indices in Λ are reserved forcoordinates of camera centres C, three unknowns for every main camera.

(3.15) can be reduced by multiplying both sides by A> from left:

A>A Λ = A>B (3.18)

to getAnv×nv Λnv×1 = Bnv×1 (3.19)

Now it is possible to solve the system of equations expressed square matrix A using somenumerical method, such as Quasi-minimal residual method. It is suitable for solving largeand sparse systems of linear equations in the form of Ax = b, where A is a squarecoefficient matrix, b is the right-side vector and x is the unknown vector. An estimate ofthe solution can be supplied to the solver in order to reduce the computation time. Theprevious or initial estimated depths after interpolation of missing depths is used for thispurpose. As our estimate is close the optimum, the run-time is reduced significantly. Theresults are shown in Figure 3.8.

According to [17], this method is more numerically stable than other gradient methods.Since implementation of this method is available in MATLAB, it was chosen as the mostappropriate for our problem.

Example. For a scene with 6 cameras of which 3 are main, the system of equations hasfollowing properties:

• Number of unknowns: 490 000

• Number of non-zero items: 12 450 000

• Number of equations: 1 500 000

• Sparsity: 0.00005

The structure of matrix A can be observed in Figure 3.7.


Figure 3.7: Matrix for system of equations with non-zero items (blue)


100 200 300 400 500 600 700 800 900 1000 1100

100

200

300

400

500

600

700

8001

1.05

1.1

1.15

1.2

1.25

1.3

1.35

1.4

1.45

1.5

100 200 300 400 500 600 700 800 900 1000 1100

100

200

300

400

500

600

700

800

0

1000

2000

3000

4000

5000

6000

7000

Figure 3.8: Depths after QMR (upper) and their residual (lower)


3.5.5 Update of camera centres

The main camera centres C are estimated along with depths Λ by solving the systemof projective equations (2.49). As a result, the camera centres can be move to the newposition that better fits data. The camera matrices P are recalculated to be used inprojection (3.1) using

P = KR[

I | −C]. (3.20)

As the result of the update of camera centres and the repair of correspondences, the imagecoordinates p of the correspondence changes to p′. The data visibility vp = 2 should movewith the data to the new position of vp′ = 2. This is accomplished by storing the datavisibility for each correspondence (q, j) ∈ χip and its two cameras before the projection inSection 3.5.1 and afterwards setting it back to the newly projected positions.

3.5.6 Repairing of correspondences

The correspondences, X , can be repaired after depths are properly estimated, and this isafter each iteration. A correspondence, Xij

pq, is to be repaired when it meets the followingcondition: it must be visible in at least one camera without data support, that is eithervip = 1 or vjq = 1, or both. The update depends on these two situations, that is:

1. If both vip = 1 and vjq = 1, then the correspondence Xijpq is updated to the value

back-projected in camera i as in previous case, and a new correspondence Xijp′q′ is

created according to the value back-projected in camera j.

2. If either vip = 1 or vjq = 1, then the correspondence is a back-projection of thecurrently estimated depth, for vip = 1 it is

Xijp = Ci + λip (Ri)>(Ki)−1xip, (3.21)

and for vjq = 1 it is

Xijq = Cj + λjq (Rj)>(Kj)−1xjq. (3.22)

The visibility of the repaired correspondence is enforced by setting the visibility to vip =vjq = 2, where the indices p, q are obtained after the new projection of Xij.

The repair of correspondences’ positions is shown in Figure 3.9.


−0.4

−0.3

−0.2

−0.1

0

0.1

0.2

0.3

−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4

−1.1

−1

−0.9

−0.8

−0.7

−0.6

−0.5

−0.4

−0.3

−0.2

Figure 3.9: Repaired correspondences. Stars represent new positions, lines point to pre-vious positions. Triangles on the left represent camera centres. Correspondences visiblein one camera are red, in the other are green, or blue if visible in both cameras.


3.6 Visibility estimation

The visibility estimation task, as described in Section 2.3, can be divided into three stepsfor every camera:

1. Calculation of penalty function Di for use with maxflow in (2.71).

2. Run of the max-flow optimiser.

3. Interpretation of the results.

The implementation of max-flow search available from [7] was used after modificationsfor use with MATLAB done by [44] and the author of this work. These modifications arereferred to as “mexification” of the original C++ code.

3.6.1 Image error calculation

The calculation of image colour error as defined in (2.55) is calculated as the sum ofdeviations across all colour channels. The collection of the values itself is done togetherwith the projection of correspondences in order to optimise the process, using one collectormatrix and a counter matrix. The result is normalised to the range of 〈0, 1〉.

The penalty function DiIerr = {dip; p = 1, . . . , n} is calculated as

dip =1

2σ2I |χip|

∑(q,j)∈χip

(Iip − Ijq)2 (3.23)

Coefficient σ2I is one for all cameras and was chosen experimentally to the value of σ2

I =0.16 so that 1

2σ2I

= 30. The result is shown in Figure 3.10.

3.6.2 Local depth error calculation

The local depth error calculation as defined in (2.57) is calculated as the sum of deviationsacross visible 8-pixel neighbouring depths in Figure 3.11.

The penalty function DiΛerr = {dip; p = 1, . . . , n} is calculated as

dip =1

2(σ iλ,p)

2 |N2(i | V )|∑

(p,p)∈N2(i|V )

(λip − λip)2, (3.24)

where Np(i | V ) = {p−1, p+1, p−h−1, p−h, p−h+1, p+h−1, p+h, p+h+1 | V } arevisible pixels from 8-neighbourhood of p so that vip ≥ 1 and h is the height of the inputimage. The coefficients (σ i

λ,p)2 are estimated from the residual corresponding to depth λip.

It is calculated in the terms of (3.19):

ΛΛΛres =1

Λrel

(AΛΛΛ− B

)(3.25)

3.6. VISIBILITY ESTIMATION 47

Figure 3.10: Image colour error

qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr..............................

....................................

........

..

........

..

..............................

....................

..............................................

.......... .......... ..........................................................................................

....

..........

..........

..............................

..................................................................

........................ .......................... p

Figure 3.11: The 8 neighbouring pixels (red) of pixel p (black)


100 200 300 400 500 600 700 800 900 1000 1100

100

200

300

400

500

600

700

800

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Figure 3.12: Depth error with residual

where Λrel = ‖AΛΛΛ−B‖‖B‖ is the total relative residual achieved, used for normalisation. The

value of ΛΛΛres is then used as a correction to constant σ2λ0:

ΣΣΣλ =σ2λ0

ΛΛΛres

. (3.26)

The combined result is shown in Figure 3.12.

3.6.3 Invisible colour histogram calculation

The colour-based similarity to invisible pixels is measured with dynamic colour his-togram based on the current visibility, according to (2.55). The 3-D RGB colour space{1, . . . , 256}3 of input images is divided into bins of selected width wbin = 4. The his-togram is then represented as hinv ∈ {1, . . . , 256

wbin}3. After it is fed with all invisible pixel

colours, the response of all pixels in the input image is calculated for penalty functionDiIhist = {dip; p = 1, . . . , n}:

dip = hinv(Iip), (3.27)

where dip is the number of pixels in the bin of Iip. It is then post-processed by medianfilter of size 3× 3 and thresholded by value of h0 to suppress small occurrences to obtainthe final penalty function Di∗

Ihist:

di∗p =

{dip if dip ≥ h0

0 otherwise.(3.28)

3.6. VISIBILITY ESTIMATION 49

100 200 300 400 500 600 700 800 900 1000 1100

100

200

300

400

500

600

700

800

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Figure 3.13: Invisible colour histogram match

The value h0 = 0.02 was chosen experimentally so that the result for images, where theinvisible area is not homogeneous in colour, is that the response is zero and invisiblehistogram matching is automatically suppressed. The result is shown in Figure 3.13.

3.6.4 Running max-flow and data hiding

The max-flow algorithm is run in two passes with different interpretation of the results.The penalty functions from previous sections are used as the input here

1. Run max-flow on image properties CiI = maxflow(Di∗

Ihist + DiIerr + Di

data, ν).

2. Assign vip = cip,I .

3. Run max-flow on depth properties CiΛ = maxflow(Di

Λerr + Didata, ν).

4. Assign vip = 1 where cip,Λ = 0 and vip = 2.


Short summary from previous sections:

• The motivation for step 2 is it should hide errors on borders completely and fillholes in the data.


100 200 300 400 500 600 700 800 900 1000 1100

100

200

300

400

500

600

700

800

Figure 3.14: Visibility. Red: v = 2; Yellow: v = 1; Blue: v = 0; Dark blue: discontinuity.

• The motivation for step 4 is it should hide errors in data and force the surface modelto interpolate the ideal value. Later the erroneous correspondence can be repairedusing this depth which matches the model.

3.7 Iteration

The iteration is terminated after reaching given number of iterations. It shows that usuallythe number required to reach the convergence, after which the results do not significantlychange, is 5 to 10 iterations, depending on the number of outliers and noise in the data.

Because of the rather low number of iterations, the design of a more complicated termi-nation criterion does not seem to be necessary.

3.8 Visualisation

Visualisation is not the main objective of this work, but it is necessary in order to presentthe results.

3.8. VISUALISATION 51

3.8.1 Redundancy removal

A simple union of all depth maps to build the output is possible, but has some disadvan-tages.

The first is the output size. Not only the output file is physically bigger, but the rendererused for visualisation needs to load more data in the memory and the rendering itself ismore computationally intensive. For scenes with many cameras, this problem becomesmore important.

Although the depth maps are consistent as long as there are enough correspondences, localvariations between the alternative representation of a surface in different depth maps mayappear. Also the surface observed by cameras with different resolutions can vary.

This leads to the choice of a camera with highest resolution to represent the given part ofsurface in the final output. Higher resolution R is equivalent to lower grid size g = 1

R. Grid

size is distance between two world points, which are back-projections of two neighbouringdepths in the image. The grid size of point in depth λ in camera with focal length f is

gip =λipf i. (3.29)

A comparative map is constructed for every camera i and visible point p. The bestviewing resolution from all corresponding cameras j, ij ∈ Dp, is compared with the viewingresolution in the current camera i:

rip =gip

gjq(3.30)

for the best correspondence Xijpq. The relative value rip > 1 then indicates the camera

i should represent the surface at the point p, otherwise there is a better camera (j) torepresent it.

For a set of cameras which are close to each other, the resulting relative value is oscillatingaround 1.0 in every one of them. To resolve this case, simply the camera with highernumber is chosen.

In order to keep continuous surface, the procedure to obtain the final visibility map is:

1. Collect best rip for all data.

2. Remove visible data where rip < 0.9 by setting vip = 1.

3. Remove visible data where rip ∈ 〈0.9, 1.1〉 and camera number is lower by settingvip = 1.

4. Run max-flow with the new data visibility.

3.8.2 Back-projection of depth maps to 3D points

After the depth maps Λ are cleared against redundancy, they can be back-projected to3D points. This simply performed by equation (2.5) to get point set Xi for every maincamera i ∈ Tm representing the surface.


qqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqq

qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

λ1 λ2

λ4λ3

a)

qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

qqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqq

qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqλ1 λ2

λ4λ3

b)

Figure 3.15: Two possible triangulation schemes

3.8.3 Triangulation

A uniform triangulation of surface points Xi projected from a depth map Λi can beobtained on the image grid, which defines the neighbourhood.

There are two possible triangulations of the primitive square of grid shown in Figure 3.15.A simple rule can be defined to choose between these two possibilities:

If |λ1 − λ4| < |λ2 − λ3|, then use triangulation scheme a).Otherwise use triangulation scheme b).

On the borders of the surface there can be only one possible choice when one of the depthsλ1,...,4 is not visible.

3.8.4 Texturing

Only simple texturing is implemented as it is not in the focus of this work. Texturing ofsurface back-projected from depths uses image data from camera of origin Ii to assign acolour to every visible point in the grid. Simply the point back-projected from λip getscolour of Iip.

3.8.5 Export to file

The format chosen for the output is VRML [1]. Every depth map i in the output file isan instance of entity Shape with IndexedFaceSet in the geometry field. It consists of

• Coordinate – coordinates of the points Xi,

• coordIndex – indices of the points for triangulation,

• Color – colour for the points.

3.9. COMPUTATIONAL COST 53

3.9 Computational cost

3.9.1 Computational complexity

The time necessary to run the algorithm depends on the complexity of the scene and runparameters. The time critical part is the solving of the system of depth equations usingQMR (see Section 3.5.4). While other tasks, such as visibility estimation, take secondsto minutes to complete, typical run time of the the solver function qmr is tens of minutesto hours.1 The function qmr has two important parameters for control of termination ofits run: the number of iterations and the requested minimal relative residual (see (3.25)).The typical values used are 200 × |Tm| iterations and residual Λrel = 10−6. The scenecomplexity can be measured by the total number of correspondences nc and the totalnumber of visible points nv. As a reference, a simpler 4-camera scene with nc = 3 · 106

and nv = 1.5 · 106 takes around 1 hour for an iteration to complete with typical values asstated above. 90% of this time is the run of the QMR solver.

3.9.2 Memory requirements

Similarly to the computational complexity, the memory critical task is again the solving ofsystem of depth equations, where the matrix A for (3.19) is constructed in the operatingmemory. Every non-zero item occupies 48 bytes. According to Section 3.5.4, for theexample given in the previous section the amount of memory occupied by the matrix isaround 1 GB.

The size of the output VRML files, where the texture information is not compressed, is120 MB for the previous example, with redundancy reduction to n′v = 1.0 · 106. The viewof this file requires a modern graphic card with 256 MB or more of dedicated graphicmemory.

1Run time corresponding to current CPUs (2 GHz).


Chapter 4

Experiments

The proposed algorithm was tested on several scenes. The input point cloud in corre-spondences, camera pairs and camera parameters were obtained from the reconstructionpipeline of [27] as described in Section 1.4.

Since the results are 3D models, they are the best to be observed in a VRML viewer. Allof the experiments’ results are available on the attached the data disk as well as somesoftware to view them. The structure is described in Appendix. It is recommended toobserve the results in 3D while reading this chapter.

4.1 Used hardware and software

Experiments were run on machines in computing grid of Center for Machine Perception,Prague, with following configuration:

• CPU: 4x K8 2400 MHz 64-bit

• RAM: 16 GB

• OS: Linux 2.6.22-gentoo-r9 #1 SMP Thu Nov 15 13:01:34 CET 2007 x86 64

• MATLAB: Version 7.3.0.298 (R2006b)

The implementation utilises only one CPU at run-time.

4.2 Scenes

Results on four different scenes and their analysis will be presented in this section. Eachscene has its specifics, which have effect on the process of reconstruction.

55

56 CHAPTER 4. EXPERIMENTS

Scene 1: Daliborka

This scene shows a part of paper model of the Prague castle with Daliborka tower [5]. Thescene is simple as the most of surfaces are planar and regular, except the round bastion.Some surfaces have no texture, therefore they were not reconstructed by the pipeline.There is a uniform background around the model, hence, it was also not reconstructed.The input images were taken in three layers all round (see Figure 2.1), but only thebottom layer cameras were used as main. Only the procedure described in Section 3.2was used to select the main cameras.

Scene parameters

• Camera used: Canon EOS-1Ds, 11.1 MPx

• Resolution of input images: 1100× 850 (resized)

• Number of cameras: 27

• Number of main cameras: 8

• Number of pairs: 18

• Number of iterations: 10

• Total running time: 10h 7min

• Total number of input correspondences: nc = 1.3 · 106

• Total number of visible pixels: nv = 1 · 106

• Surface model: order 2, σ = 2

Figure 4.2 shows the noise and outliers present in the input data. Figure 4.3 comparesthe new result with fish-scales. It is apparent that the new model representation is morecontinuous and it is almost not possible to see-through. For example, the holes betweenwindows are covered. On the other hand some lonely fish-scales are now not represented,as the data points are not dense enough. Some of them were errors, some were parts ofbackground. Generally the new method discards the tiny surfaces in the same way as itcovers holes, with the favour of the second.

The composition of the complete surface from partial surfaces is given in Figure 4.4. Thesurfaces links to each other, without visible steps. In some places like the grass under thewalls multiple surfaces overlap, the redundancy was not completely resolved, presumablybecause of lack of correspondences. This is also visible in the textured model because thesource of texturing varies. (Although the choice for texturing is not the goal of this work,it could be solved together with improved redundancy removal.)

Figure 4.5 shows the ability to represent details. The stairs are now apparent, but theyare also smoothed into “waves”. More precise texturing improves the visual experiencesignificantly, hiding such drawbacks from the observer. The chimneys now have a compact

4.2. SCENES 57

Figure 4.1: Daliborka scene. All 27 input images used.

Figure 4.2: Daliborka scene. Noisy input point cloud (10% of all points).


Figure 4.3: Daliborka scene. Previous (fish-scales, upper) and new result (depth mapfusion, lower).

4.2. SCENES 59

Figure 4.4: Daliborka scene. Composition of surface from different cameras, one colourper camera.

a) previous result b) new result

Figure 4.5: Daliborka scene, detail with stairs and chimney.



Figure 4.6: Daliborka scene, detail with bastion.

shape, but the discontinuity between them and the roof is not sufficient, this is describedbelow.

Figure 4.6 demonstrates the ability to represent non-planar surfaces. Unfortunately thehigher noise in the area of the bastion causes bigger holes.

Finally Figure 4.7 shows some problems of the two compared algorithms.

The line of white fish-scales floating above the roof (a) are errors in the stereo algorithm.The lower picture shows they were successfully removed by the new algorithm.

Surface near chimneys is difficult for both algorithms (b). Fish-scales give only a basicidea of presence of some object, the new chimney has a compact form. The problem of thenew algorithm in this camera is the erroneous surface as the result of missing discontinuitynear connection of chimney and roof. The problem is caused by continual smoothing ofthe depths near the end of discontinuity line. The median interpolation, preceding thedepth error calculation and thresholding for discontinuity, reduces the local depth error.

The surface of the back roof (c) is wavy with the new method, while fish-scales are morerobust in this noisy area. After the apparent outliers are filtered out here, the model ofthe second order used accepts the noisy surface. Also the texture here has higher contrast(see Figure 3.3), which reduces the level of smoothing, putting more weight on the noisydata. Setting the σ for smoothing to higher value could reduce this effect at the cost ofloss of accuracy for details in other places.

4.2. SCENES 61

Figure 4.7: Daliborka scene, detail of roof. Previous (fish-scales, upper) and new result(depth map fusion, lower, selected camera only). Labels: a - floating errors, b - undetecteddiscontinuity, c - noisy surface.


Scene 2: Tree

This scene shows a trunk of a tree. The surface of the trunk is irregular as it is composedof the bark. The background consists of uniform sky in the upper part and of grass androad in the bottom.

The selection of cameras was performed manually to cover the surface of trunk all round.The four selected cameras’ views almost do not overlap. The bark is rich for texture whatresults in almost complete surface being reconstructed. With pixel level of detail, thistogether produces a big output file, what is difficult to display with current hardware.For this reason only one camera surface is presented.

Scene parameters

• Camera used: Canon EOS 300D, 6 MPx







• Total number of input correspondences: nc = 5 · 106



The detail in Figure 4.10 shows that the reconstructed surface of bark looks very naturally.In this case some of the errors still present in the output can be confused with irregularsurface of the birch.

Figure 4.11 shows how the choice of 4 main cameras is sufficient to cover the trunk surfaceall-round.

4.2. SCENES 63

Figure 4.8: Tree scene. All 23 input images used.

a) previous result b) new result (selected camera)

Figure 4.9: Tree scene.


Figure 4.10: Tree scene, detail in selected camera.

Figure 4.11: Tree scene, top view. Composition of surface from different cameras, onecolour per camera.

4.2. SCENES 65

Scene 3: St. John of Nepomuk

This scene shows the statue of St. John of Nepomuk at Novy Dvur near Krivoklat castle.It was erected in 1724, the author is unknown (possibly Mathias Braun). The surfaceof the statue is curved, with intricate detail. The lightning conditions are poor as theshadows of trees are cast on the statue.

The main cameras were again selected manually in the same way as in the previousexperiment.

Scene parameters

• Camera used: Canon S50, 5 MPx







• Total number of input correspondences: nc = 2.3 · 106



Because of presence of self-occlusions in the selected cameras, the reconstructed surfaceis not complete, but can be viewed together. Some of the holes in the front view are theeffect of big image colour error caused by cast shadows. The accuracy in the back viewis higher as it is not affected by this problem. The bumps on the folds of the cloak (rearview in Figure 4.13) are true, what illustrates the fine level of detail achieved.


Figure 4.12: St. John of Nepomuk scene. All 28 input images used.

front view rear view

Figure 4.13: St. John of Nepomuk scene.

4.2. SCENES 67

Figure 4.14: St. John of Nepomuk scene, front view. Composition of surface from differentcameras, one colour per camera.


Scene 4: St. Martin

This scene shows the rotunda of St. Martin at Vysehrad, Prague, with some trees around.The surface of the building is simple from larger perspective, but in detail it is morecomplicated (stones in the walls, tiles on the roof).

In this case four cameras close to each other were selected, resulting in dense reconstructionof the surface visible in them.

Scene parameters

• Camera used: Canon S50, 5 MPx







• Total number of input correspondences: nc = 10 · 106

• Total number of visible pixels: nv = 0.6 · 106


The input correspondences are moderately noisy, as shown in Figure 4.16. Correspon-dences from three pairs form three layers of the wall surface, making the decision of wherethe true surface is difficult.

Fish-scales deal with this situation as they estimate their normal from a bigger set ofinput points. However, the stronger noise like near the windows is captured as lightfloating artifacts. This kind of errors is not present in the new results at the cost of morebumpy surface. Figure 4.19 shows even sharp peaks on the moulding under the roof.They are caused by noisy correspondences at pixels with high contrast, where smoothingis suppressed. Similar situation is on the edges of door portal in Figure 4.20 whereotherwise significant details are captured. The images were taken from the ground, sothe roof is always viewed under sharp angle. Noise on the surface with this slope thenresults in peaks heading towards the camera centre. This is the result of choice of extrinsicrepresentation of the surface in depth maps.

4.2. SCENES 69

Figure 4.15: St. Martin scene. All 34 input images used.

Figure 4.16: St. Martin scene, top view of wall cut. Input correspondences in selectedcamera, one colour per corresponding camera.



Figure 4.17: St. Martin scene.

4.2. SCENES 71

Figure 4.18: St. Martin scene. Composition of surface from different cameras, one colourper camera.



Figure 4.19: St. Martin scene, detail of roof.


Figure 4.20: St. Martin scene, detail of door.

Chapter 5

Conclusion

5.1 Summary

We have designed and implemented a surface reconstruction algorithm based on depthmap fusion. The representation of the modelled objects was chosen to be depth mapsin existing cameras. The proposed algorithm is novel while some ideas were inspired byexisting solutions. It was in theory deducted on the Bayesian framework and its stabilitywas also proven in experiments.

We have tested the proposed algorithm on many scenes to discover its characteristicsand limitations. The major improvement to existing method is the continuity of surface.It was shown that the results bring more detail, but on the other hand they are stillsensitive to noise. The control of smoothing allows to deal with the noise at the cost oflower accuracy.

Although the evaluation of the visual experience is always subjective, this can be sum-marised as a successful improvement of the existing method. Still there are possibilitiesfor further improvement, like the detection of discontinuity.

The complexity of the results is reaching the limits of current computer hardware. Toavoid this problem, a compromise between detail and output size must be chosen. Themechanism available in our implementation to tackle this is the choice of main cameras.

5.2 Open problems

Here are given unresolved problems connected with the existing implementation. Theycould not be completed because of limited time for this work.

Termination criterion. The simple fixed number of iterations should be improved toa better one, which will determine when the convergence is reached. This could be, forexample, when the number of repaired correspondences does not change over a numberof iterations.

73

74 CHAPTER 5. CONCLUSION

Vanishing discontinuity. The discontinuity is vanishing in some cases due to inter-polation near end of discontinuity line, as shown in Chapter 4. Interpolation should beimproved to preserve discontinuities here with the use of image edges.

Scenes with high complexity. Big number of main cameras result in a lot of un-known depths being optimised in the depth estimation task. Depending on the numberof correspondences, the size of matrix for this optimisation can cause memory problemson current computers when the number exceeds certain limit. A solution could be to de-compose the problem to subsets of cameras. While also time requirements arise, anothernumerical solver in place of QMR could reduce the computational time.

5.3 Future work

Here are given some proposals for further improvement and extension of the algorithm.

Dynamic parameters. The values of the internal parameters of the proposed algo-rithm are now fixed during iteration. Some of them could change during the processin order to improve the overall performance. For example, the target residual for QMRsolver could decrease as the errors are continually removed and the result could be moreaccurate.

Support covering of holes. There are often holes without data support in the middleof larger uniform surfaces. The visibility estimator could favour interpolation of such holesin the interior direction. To avoid incorrect interpolation, a condition should be definedwhen this is possible. For example, to cover a hole in a planar surface, the normals onthe borders should be equal.

Transfer of interpolated depths to other cameras. The interpolated depths couldbe transfered to other cameras by creating a new correspondences. The problems behindthis are to which camera pair(s) should be the new correspondence added and if it iscorrect to do so, with respect to occlusions.

Solve visibility all-at-once. The max-flow optimiser has capacity for solving the vis-ibility task in all cameras at once. This would allow to add edges directly betweencorresponding image points.

Local invisible histogram matching. Current histogram of invisible colours is global,and so it works only if the invisible background is uniform enough. Building a histogramfor a window around every pixel could bring more effect on hiding of overlapping edges.

Selective run. After some parts of the surface are estimated correctly, they do notfurther change in the iterations. Such regions could be removed from the set of unknown

5.3. FUTURE WORK 75

depths in order to save computational resources, as the solver can focus on the problematicparts. Depth residual could be used to identify such regions.

Output mesh optimisation. The final triangulation is a uniform mesh, what resultsin a lot of points. In order to obtain a more effective representation, certain points canbe removed without significant effect on the output. This applies particularly to planarsurfaces.

76 CHAPTER 5. CONCLUSION

Bibliography

[1] The Virtual Reality Modeling Language (VRML). Standard ISO/IEC 14772-1:1997.

[2] Automatic 3D virtual model builder from photographs, 2004–2008. CMP FEE CVUTPrague, project supported by the Czech Academy of Sciences.

[3] Nina Amenta, Marshall Bern, and Manolis Kamvysselis. A new Voronoi-based surfacereconstruction algorithm. In Proceedings of SIGGRAPH’98, Computer GraphicsProceedings, Annual Conference Series, pages 415–421, Orlando, Florida, July 1998.ACM SIGGRAPH.

[4] Katerina Beckova. Svedectvı Langweilova modelu Prahy. ISBN 80-900668-8-7. Scholaludus Pragensia, first edition, 1996.

[5] Betexa ZS, s.r.o. The Prague castle, paper model of the biggest castle complex inthe world, 2006. Scale 1:450.

[6] R. M. Bolle and B. C. Vemuri. On three-dimensional surface reconstruction methods.13(1):1–13, January 1991.

[7] Yuri Boykov and Vladimir Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in computer vision. In IEEE Trans-actions on Pattern Analysis and Machine Intelligence, September 2004.

[8] Jan Cech and Radim Sara. Efficient sampling of disparity space for fast and accuratematching. In BenCOS 2007: CVPR Workshop Towards Benchmarking AutomatedCalibration, Orientation and Surface Reconstruction from Images, page 8, Madison,USA, June 2007. IEEE, Omnipress. CD-ROM.

[9] Ondrej Chum. Two-view Geometry Estimation by Random Sample and Consensus.Phd thesis, Center for Machine Perception, K13133 FEE Czech Technical University,Prague, Czech Republic, September 2005.

[10] Ondrej Chum, Jirı Matas, and Josef Kittler. Locally optimized RANSAC. InJ. van Leeuwen G. Goos, J. Hartmanis, editor, DAGM 2003: Proceedings of the25th DAGM Symposium, number 2781 in LNCS, pages 236–243, Heidelberger Platz3, 14197, Berlin, Germany, September 2003. Springer-Verlag.

[11] Ondrej Chum, Tomas Werner, and Jirı Matas. Two-view geometry estimation unaf-fected by a dominant plane. In Cordelia Schmid, Stefano Soatto, and Carlo Tomasi,editors, Proc. of Conference on Computer Vision and Pattern Recognition (CVPR),volume 1, pages 772–780, Los Alamitos, USA, June 2005. IEEE Computer Society.

77

78 BIBLIOGRAPHY

[12] Hugo Cornelius, Radim Sara, Daniel Martinec, Tomas Pajdla, Ondrej Chum, andJirı Matas. Towards complete free-form reconstruction of complex 3D scenes from anunordered set of uncalibrated images. In D. Comaniciu, R. Mester, and K. Kanatani,editors, Proc ECCV Workshop Statistical Methods in Video Processing, volume LNCS3247, pages 1–12, Heidelberg, Germany, May 2004. Springer-Verlag.

[13] Herve Delingette. Simplex meshes: a general representation for 3D shape reconstruc-tion. Research Report 2214, INRIA, Sophia Antipolis, 1994.

[14] Dempster, A.P., Laird, N.M., and Rubin, D.B. Maximum likelihood from incompletedata via the em algorithm. J. R. Statist. Soc., (39):1–38, 1977.

[15] Duda, R.O., Hart, P.E., and Stork, D.G. Pattern Classification. Wiley Interscience,second edition, 2000.

[16] Herbert Edelsbrunner and Ernst P. Mucke. Three-dimensional apha shapes. ACMTransaction on Graphics, 13(1):43–72, January 1994.

[17] Freund, R. and Nachtigal, N. A quasi-minimal residual method for non-hermitianlinear systems. Numer. Math., (60):315–339, 1991.

[18] P. Fua and Y. G. Leclerc. Object-centered surface reconstruction: Combining multi-image stereo and shading. 16:35–56, 1995.

[19] P. Fua and P. Sander. Reconstructing surfaces from unstructured 3D points. In Proc.ARPA Image Understanding Workshop, pages 615–625, San Diego, CA, January1992.

[20] W. E. L. Grimson. A computational theory of visual surface interpolation. Phil.Trans. R. Lond. B, 298:395–427, 1982.

[21] Gideon Guy and Gerard Medioni. Inference of surfaces from sparse 3-D points. 1994.

[22] Richard I. Hartley. Theory and practice of projective rectification. InternationalJournal of Computer Vision, 35(2):115–127, 1999.

[23] Hartley, R. and Zisserman, A. Multiple View Geometry in Computer Vision. Cam-bridge University Press, 2000.

[24] A. Hilton, A. J. Stoddart, J. Illingworth, and T. Windeatt. Reliable surface re-construction from multiple range images. volume 1 of LNCS 1065, pages 117–126,Cambridge, UK, April 1996. Springer.

[25] Hugues Hoppe. Surface Reconstruction from Unorganized Points. PhD thesis, Uni-versity of Washington, 1994.

[26] Katsushi Ikeuchi, Takeshi Oishi, Jun Takamatsu, Ryusuke Sagawa, AtsushiNakazawa, Ryo Kurazume, Ko Nishino, Mawo Kamakura, and Yasuhide Okamoto.The great buddha project: Digitally archiving, restoring, and analyzing cultural her-itage objects. Int. J. Comput. Vision, 75(1):189–208, 2007.

BIBLIOGRAPHY 79

[27] George Kamberov, Gerda Kamberova, Ondrej Chum, Stepan Obdrzalek, Daniel Mar-tinec, Jana Kostkova, Tomas Pajdla, Jirı Matas, and Radim Sara. 3d geometry fromuncalibrated images. In G. Bebis et al., editor, ISVC ’06: Proceedings 2nd Interna-tional Symposium on Visual Computing, number 4292 in Lecture Notes in ComputerScience, pages 802–813, Berlin, Germany, November 2006. Springer-Verlag.

[28] Reinhard Koch, Marc Pollefeys, and Luc J. Van Gool. Multi viewpoint stereo fromuncalibrated video sequences. In ECCV ’98: Proceedings of the 5th European Confer-ence on Computer Vision-Volume I, pages 55–71, London, UK, 1998. Springer-Verlag.

[29] Marc Levoy, Kari Pulli, Brian Curless, Szymon Rusinkiewicz, David Koller, LucasPereira, Matt Ginzton, Sean Anderson, James Davis, Jeremy Ginsberg, JonathanShade, and Duane Fulk. The digital michelangelo project: 3D scanning of largestatues. In Proceedings Conference SIGGRAPH 2000, New Orleans, Louisiana, July2000.

[30] Remin Lin and Wei-Chung Lin. Recovery of 3-D closed surfaces using progressiveshell model. In Proc. International Conference on Pattern Recognition, volume 1,pages 95–98, Vienna, Austria, August 1996. IEEE Computer Society Press.

[31] J. L. Marroquin. Surface reconstruction preserving discontinuities. A.I. Memo 792,Artificial Intelligence Lab, Massachusetts Institute of Technology, August 1984.

[32] Stepan Obdrzalek and Jirı Matas. Object recognition using local affine frames ondistinguished regions. In Paul L. Rosin and David Marshall, editors, Proceedingsof the British Machine Vision Conference, volume 1, pages 113–122, London, UK,September 2002. BMVA.

[33] A. Akbarzadeh B. Clipp C. Engels D. Gallup P. Merrell C. Salmi S. Sinha B. TaltonL. Wang Q. Yang H. Stewenius H. Towles G. Welch R. Yang M. Pollefeys P. Mor-dohai, J.-M. Frahm and D. Nister. Real-time video-based reconstruction of urbanenvironments. In 3D-ARCH 2007: 3D Virtual Reconstruction and Visualization ofComplex Architectures, ETH Zurich, Switzerland, July 2007. ISPRS Working GroupV/4 Workshop.

[34] Riccardo Poli, Giuseppe Coppini, and Guido Valli. Recovery of 3D closed surfacesfrom sparse data. 60(1):1–25, July 1994.

[35] Radim Sara. Sub-pixel disparity correction. Working Paper 98/01, Center for Ma-chine Perception, Faculty of EE, Czech Technical University, 1998.

[36] Radim Sara and Ruzena Bajcsy. Fish-scales: Representing fuzzy manifolds. In SharatChandran and Uday Desai, editors, Proc. 6th International Conference on ComputerVision, pages 811–817, New Delhi, India, January 1998. IEEE Computer Society,Narosa Publishing House.

[37] Robert Schneider and Leif Kobbelt. Geometric fairing of irregular meshes for free-form surface design. Computer-Aided Geometric Design, 18(4):359–379, May 2001.

[38] Christoph Strecha, Rik Fransens, and Luc Van Gool. Wide-baseline stereo frommultiple views: A probabilistic account. cvpr, 01:552–559, 2004.

80 BIBLIOGRAPHY

[39] Richard Szeliski. Fast surface interpolation using hierarchical basis functions.12(6):513–528, June 1990.

[40] Richard Szeliski and David Tonnesen. Surface modeling with oriented particle sys-tems. Computer Graphics (SIGGRAPH ’92), 26(2):185–194, July 1992.

[41] Gabriel Taubin, Tong Zhang, and Gene Golub. Optimal surface smoothing as filterdesign. volume 1 of LNCS 1065, pages 283–292, Cambridge, UK, April 1996. Springer.

[42] Demetri Terzopoulos. Multi-level reconstruction of visual surfaces: Variational prin-ciples and finite element representations. A.I. Memo 671, MIT, 1982.

[43] Ronald L. Rivest Thomas H. Cormen, Charles E. Leiserson and Clifford Stein. Intro-duction to Algorithms. MIT Press and McGraw-Hill, second edition, 2001. Chapter26: Maximum Flow, pages 643–700. ISBN 0-262-03293-7.

[44] Tomas Werner. A linear programming approach to max-sum problem: A review.Technical Report CTU–CMP–2005–25, Center for Machine Perception, Czech Tech-nical University, Prague, Czech Republic, December 2005.

[45] Christopher Zach, Andreas Klaus, Joachim Bauer, Konrad Karner, and MarkusGrabner. Modeling and visualizing the cultural heritage data set of Graz. In ProcInt Symp on Virtual Reality, Archaeology, and Cultural Heritage, 2001.

[46] Hong-Kai Zhao, Stanley Osher, and Ronald Fedkiw. Fast surface reconstruction usingthe level set method. In 1st IEEE Workshop on Variational and Level Set Methods,pages 194–202, Vancouver, Canada, July 2001. IEEE Computer Society Press.

Chapter 6

Appendix

6.1 Contents of attached data disk

6.1.1 Code

The attached disk contains the source code of the proposed algorithm developed by theauthor of this work in the directory ’Code’.

The main Matlab script used to run the implementation is depfusion.m. It starts withsetting of the configurable parameters of the algorithm.

The code for the external library function maxflow with modifications by author of thiswork is provided in the subdirectory ’Code/maxflow’.

6.1.2 Experiments

The data for experiments, including input, intermediate results and final models are inthe directory ’Experiments’. Every scene from the Chapter 4 has its own subdirectory.

6.1.3 Software

The free software which can be used to view the experiments’ results in VRML format isin the directory ’Software’.

81

82 CHAPTER 6. APPENDIX

List of Figures

1.1 Result of reconstruction of statue of David by [29] . . . . . . . . . . . . . 3

1.2 Process of reconstruction of statue of The Great Buddha of Kamakura,adapted from [26] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3 Four input images for City Hall scene [38]. . . . . . . . . . . . . . . . . . 6

1.4 Textured reconstructed surface of City Hall in [38] (upper) and its detail(lower). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.5 Fish-scale 3D sketch of a section of a modern replica of Langweil Model [4] 10

2.1 Input point cloud and cameras . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2 Projection geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.3 3D neighbourhood. Black bold line: image ray. Blue: pixel area in imagegrid. Red lines: 3D neighbourhood Red dots: matching correspondences . 14

2.4 Information flow in the system . . . . . . . . . . . . . . . . . . . . . . . . 15

2.5 Situation with one correspondence . . . . . . . . . . . . . . . . . . . . . . 18

2.6 2D neighbourhood on the image grid, pixels are at the intersection of gridlines. Blue: horizontal pairs in one column. Red: vertical pairs in one row. 19

2.7 Image colour error helps remove overlapping borders. . . . . . . . . . . . . 27

2.8 Overlapping borders. Detail from Daliborka scene. . . . . . . . . . . . . . 28

2.9 Graph for maximum flow search . . . . . . . . . . . . . . . . . . . . . . . 29

3.1 Input image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.2 Initial mean of depths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.3 Image contrast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.4 Interpolated depths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.5 Depth error with discontinuity . . . . . . . . . . . . . . . . . . . . . . . . 38

3.6 Depth model order. Red: order 2; Yellow: order 1; Light blue: order 0;Dark blue: not visible. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.7 Matrix for system of equations with non-zero items (blue) . . . . . . . . . 42

83

84 LIST OF FIGURES

3.8 Depths after QMR (upper) and their residual (lower) . . . . . . . . . . . . 43

3.9 Repaired correspondences. Stars represent new positions, lines point toprevious positions. Triangles on the left represent camera centres. Corre-spondences visible in one camera are red, in the other are green, or blue ifvisible in both cameras. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.10 Image colour error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.11 The 8 neighbouring pixels (red) of pixel p (black) . . . . . . . . . . . . . . 47

3.12 Depth error with residual . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.13 Invisible colour histogram match . . . . . . . . . . . . . . . . . . . . . . . 49

3.14 Visibility. Red: v = 2; Yellow: v = 1; Blue: v = 0; Dark blue: discontinu-ity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.15 Two possible triangulation schemes . . . . . . . . . . . . . . . . . . . . . . 52

4.1 Daliborka scene. All 27 input images used. . . . . . . . . . . . . . . . . . 57

4.2 Daliborka scene. Noisy input point cloud (10% of all points). . . . . . . . 57

4.3 Daliborka scene. Previous (fish-scales, upper) and new result (depth mapfusion, lower). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

4.4 Daliborka scene. Composition of surface from different cameras, one colourper camera. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4.5 Daliborka scene, detail with stairs and chimney. . . . . . . . . . . . . . . . 59

4.6 Daliborka scene, detail with bastion. . . . . . . . . . . . . . . . . . . . . . 60

4.7 Daliborka scene, detail of roof. Previous (fish-scales, upper) and new result(depth map fusion, lower, selected camera only). Labels: a - floating errors,b - undetected discontinuity, c - noisy surface. . . . . . . . . . . . . . . . . 61

4.8 Tree scene. All 23 input images used. . . . . . . . . . . . . . . . . . . . . 63

4.9 Tree scene. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

4.10 Tree scene, detail in selected camera. . . . . . . . . . . . . . . . . . . . . . 64

4.11 Tree scene, top view. Composition of surface from different cameras, onecolour per camera. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

4.12 St. John of Nepomuk scene. All 28 input images used. . . . . . . . . . . . 66

4.13 St. John of Nepomuk scene. . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.14 St. John of Nepomuk scene, front view. Composition of surface from dif-ferent cameras, one colour per camera. . . . . . . . . . . . . . . . . . . . . 67

4.15 St. Martin scene. All 34 input images used. . . . . . . . . . . . . . . . . . 69

4.16 St. Martin scene, top view of wall cut. Input correspondences in selectedcamera, one colour per corresponding camera. . . . . . . . . . . . . . . . . 69

4.17 St. Martin scene. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

LIST OF FIGURES 85

4.18 St. Martin scene. Composition of surface from different cameras, one colourper camera. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

4.19 St. Martin scene, detail of roof. . . . . . . . . . . . . . . . . . . . . . . . . 72

4.20 St. Martin scene, detail of door. . . . . . . . . . . . . . . . . . . . . . . . 72

Documents

CENTER FOR MACHINE PERCEPTION Representation of …cmp.felk.cvut.cz/ftp/articles/martinec/Tylecek-TR-2008-02.pdf · Representation of Geometric Objects for 3D Photography ... Representation