Rapid Creation of Large-scale Photorealistic Virtual Environments
Charalambos Poullis, CGIT/IMSC, USC
Suya You, CGIT/IMSC, USC
Ulrich Neumann, CGIT/IMSC, USC
The rapid and efficient creation of virtual environments has become a crucial part of virtual reality applications. In particular, civil and defense applications often require and employ detailed models of operations areas for training, simulations of different scenarios, planning for natural or man-made events, monitoring, surveillance, games and films. A realistic representation of the large-scale environments is therefore imperative for the success of such applications, since it increases the immersive experience of its users and helps reduce the difference between physical and virtual reality. However, the task of creating such large-scale virtual environments still remains a time-consuming, manual task.
In this work we propose a novel method for the rapid reconstruction of photorealistic large-scale virtual environments. First, a novel parameterized geometric primitive is presented for the automatic detection, identification and reconstruction of building structures. In addition, buildings with complex roofs containing non-linear surfaces are reconstructed interactively using a non-linear primitive. Second, we present a rendering pipeline for the composition of photorealistic textures which, unlike existing techniques, can recover missing or occluded texture information by integrating information captured from multiple optical sensors (ground, aerial and satellite).
Index Terms: Large-scale Modeling, Texturing, Multiple Sensory Input, Photorealistic Virtual Environments
1 INTRODUCTION
Virtual reality technologies are becoming increasingly popular and are being widely used in a range of different applications. In particular, civil and defense applications employ such technologies for the simulation of real-world operations areas. In such cases the models of urban buildings are of significant value since they facilitate planning, response, and real-time situational awareness in highly-occluded urban settings [7, 10, 17]. The personnel that simulate, plan, monitor, and execute responses to natural or man-made events can gain insight and make better decisions if they have a comprehensive view of the structures and activity occurring at an operational scene. The models are essential components of such a view, helping people comprehend spatial and temporal relationships. In addition, a photorealistic appearance is essential for the enhancement of the visual richness of the models and the immersive experience of the users [16, 6].
While models are important assets, the creation of photorealistic large-scale models remains at best a difficult, time-consuming manual task. The science of rapidly sensing and modeling wide-area urban sites and events, in particular, poses difficult or unsolved problems. Over the years, a wealth of research, employing a variety of sensing and modeling technologies, has been conducted to deal with the complex modeling problem. Different types of techniques, drawing on computer vision, computer graphics, photogrammetry, and remote sensing, have been proposed and developed; each has unique strengths and weaknesses, and each performs well for a particular dataset but may fail on another.
Similarly, the generation of high-resolution photorealistic textures has been extensively investigated and many algorithms have already been proposed. However, the complexity of these algorithms increases considerably when dealing with large-scale virtual environments. In addition, these texturing techniques for large-scale environments are limited to using a single image per building, which makes it impossible to recover textures for missing or occluded areas and requires even more time-consuming manual work. These problems impose a serious limitation on achieving a realistic appearance for the models and, by extension, diminish the immersive experience of the users.
In this work we address the problem of rapid creation of photorealistic large-scale virtual environments. Firstly, we address the 3D model reconstruction problem and propose a novel parameterized geometric primitive for the automatic detection, identification and reconstruction of building models. We leverage the symmetry constraints found in man-made structures to reduce the number of unknown parameters needed during model reconstruction, thereby considerably reducing the computational time required. In addition, complex buildings containing non-linear surfaces, such as domes and stadiums, are interactively reconstructed using a non-linear primitive.
Secondly, we address the problem of texturing and propose a rendering pipeline for the composition of photorealistic textures. A significant advantage of this pipeline is that textures for missing or occluded areas in one image can be recovered from another image, thereby eliminating the need for any manual editing work. Images captured from multiple sensors (ground, aerial, satellite) are integrated together to produce a set of view-independent, seamless textures.
We have extensively tested the proposed approach with a wide range of sensing data including satellite, aerial and ground photographs and LiDAR (Light Detection And Ranging), and present our results.
The paper is organized as follows. In the next section, we discuss previous work related to modeling and texturing. Section 3 provides an overview of the system. Section 4 describes the two main components for the reconstruction of large-scale models from LiDAR, namely the data pre-processing in section 4.1 and the modeling in section 4.2. In particular, section 4.2.2 introduces the novel parameterized geometric primitive used to automatically identify and reconstruct buildings with the most commonly occurring linear roof-types. Section 5 describes the texturing pipeline employed for the generation of photorealistic textures. The two main components of the texturing pipeline are the registration of imagery from various optical sensors, described in section 5.1, and the texture composition, described in section 5.2.
2 RELATED WORK
A good survey of large-scale modeling techniques can be found in . In  the authors present a method for reconstructing large-scale 3D city models by merging ground-based and airborne LiDAR data. The elevation measurements are used to recover the geometry of the roofs. Facade details are then incorporated by the high-resolution capture of a ground-based system, which has the advantage of also capturing texture information. The textures aid in the creation of a realistic appearance of the model. However, at the cost of having detailed facades, they neglect to deal with the complexities and wide variations of the buildings' roof types. The same authors later extended their method to incorporate texture information from oblique aerial images. Although they combine multiple aerial images to determine the models' textures, their method is restricted to traditional texture mapping rather than combining all available texture information to generate a composite texture, i.e., blending. Therefore, a significant color difference between images will cause visible, non-smooth transitions between neighbouring polygons of different texture images.

IEEE Virtual Reality 2008, 8-12 March, Reno, Nevada, USA. 978-1-4244-1971-5/08/$25.00 ©2008 IEEE
In , You et al. present an interactive primitive-based modeling system for the reconstruction of building models from LiDAR data. Using the user input, their system automatically segments the building boundary, performs model refinement, and assembles the complete building model. However, user input is required for the detection as well as the identification of buildings and their roof-types.
In , they develop an interactive system for reconstructing geometry using non-sequential views from uncalibrated cameras. The calculation of all 3D points and camera positions is performed simultaneously as the solution of a set of linear equations, by exploiting the strong constraints obtained by modeling a map as a single affine view. However, due to the considerable user interaction required by the system, its application to large-scale areas is very limited.
In , the proposed system can deal with uncalibrated image sequences acquired with a hand-held camera. Based on tracked or matched features, the relations between multiple views are computed. From this, both the structure of the scene and the motion of the camera are retrieved. Although the reconstructed 3D models are visually impressive, they consist of complex geometry (as opposed to simple polygonal models) which requires further processing and limits their applications.
In , Nevatia et al. propose a user-assisted system for the extraction of 3D polygonal models of buildings from aerial images. Low-level image features are initially used to build high-level descriptions of the objects. Using a hypothesize-and-verify paradigm, they are able to extract impressive models from a small set of aerial images. The authors later extended their work in  to automatically estimate camera pose parameters from two or three vanishing points and three 3D-to-2D correspondences.
In , a ground-based LiDAR scanner is used to record a rather complex ancient structure of significant cultural heritage importance. Multiple scans were aligned and merged together using a semi-automatic process, and a complete 3D model of the outdoor structure was created. The reconstructed model is shown to contain a high level of detail; however, the complexity of the geometry (90 million polygons for one building) limits this approach to the reconstruction of single buildings rather than large-scale areas.
In a different approach,  proposed an interactive system which can reconstruct buildings using ground imagery and a minimal set of geometric primitives. More recently,  extended this system to incorporate pointcloud support as part of the reconstruction; however, the required user interaction increases considerably for large-scale areas. Moreover, the user interaction depends on the desired level of detail of the reconstructed models, which may vary considerably according to the application.
One of the most popular and successful techniques in this area is the one introduced by , which uses a small set of images to reconstruct a 3D model of the scene. View-dependent texture mapping
(VDTM) is then performed for the computation of the texture maps of the model. By interpolating the pixel color information from different images, new renderings of the scene can be produced. The contribution of each image to a pixel's color is weighted based on the angle difference between the camera's direction and the novel viewpoint's direction. The authors then extended their work and showed how VDTM can be efficiently implemented using projective texture mapping, a feature available in most computer graphics hardware. Although this technique is sufficient to create realistic renderings of the scene from novel viewpoints, its computation is still too expensive for real-time applications like games or virtual reality.
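The angle-based weighting at the heart of VDTM can be sketched as follows. The cosine-based weight function, the clamping of back-facing cameras to zero, and the normalization are illustrative assumptions, not the exact formulation of the original work:

```python
import numpy as np

def vdtm_weights(cam_dirs, novel_dir):
    """Weight each source image by how closely its camera viewing
    direction agrees with the novel viewing direction."""
    novel_dir = novel_dir / np.linalg.norm(novel_dir)
    weights = []
    for d in cam_dirs:
        d = d / np.linalg.norm(d)
        # Smaller angle between directions -> larger weight;
        # back-facing cameras (cos < 0) contribute nothing.
        cos_angle = np.clip(np.dot(d, novel_dir), -1.0, 1.0)
        weights.append(max(cos_angle, 0.0))
    w = np.array(weights)
    s = w.sum()
    return w / s if s > 0 else w

def blend_pixel(colors, weights):
    """Blend the per-image colors of one pixel with the per-image weights."""
    return np.average(np.asarray(colors, dtype=float), axis=0, weights=weights)
```

A camera looking along the novel view direction receives full weight, while an orthogonal one receives none, which reproduces the smooth fall-off the text describes.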
In , they order geometry into optimized visibility layers for each photograph. The layers are subsequently used to create standard 2D image-editing layers, which become the input to a layered projective texture rendering algorithm. However, this approach chooses the best image for each surface rather than combining the contributions of all the images to minimize information loss.
In , reflectance properties of the scene were measured and lighting conditions were recorded for each image taken. An inverse global illumination technique was then used to determine the true colors of the model's surfaces, which could then be used to relight the scene. The generated textures are photorealistic; however, for large-scale areas this approach requires a considerable amount of manual work for the capturing and processing of the data.
A method for creating renderings from novel viewpoints without the use of geometric information is presented in , where densely and regularly sampled images are blended together. The image capture for small objects is easily achieved; however, this task is close to impossible for large-scale areas.
A slightly different approach for texture generation and extraction is proposed in . Given a texture sample in the form of an image, they create a similar texture over an irregular mesh hierarchy that has been placed on a given surface; however, this approach cannot capture the actual appearance of the models.
3 SYSTEM OVERVIEW
The system overview is summarized in Figure 1. Firstly, the airborne LiDAR data is preprocessed as described in section 4.1. This step involves resampling (section 4.1.1) the data into a regular grid structure, filtering, i.e., hole filling and smoothing (section 4.1.2), and segmenting the points into vegetation or ground (section 4.1.3). The preprocessing plays an essential role in the success of the modeling since it reduces the noise and inconsistencies in the data while ensuring that important features such as discontinuities are preserved.
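The resampling step can be illustrated with a minimal sketch that bins scattered LiDAR returns into a regular elevation grid. Keeping the highest return per cell and the simple nearest-cell binning are assumptions for illustration, not necessarily the paper's exact scheme:

```python
import numpy as np

def resample_to_grid(points, cell_size):
    """Resample scattered (x, y, z) LiDAR returns onto a regular grid.
    Each cell keeps the highest return; empty cells are NaN (holes),
    to be filled by the subsequent filtering step."""
    pts = np.asarray(points, dtype=float)
    xmin, ymin = pts[:, 0].min(), pts[:, 1].min()
    cols = int(np.floor((pts[:, 0].max() - xmin) / cell_size)) + 1
    rows = int(np.floor((pts[:, 1].max() - ymin) / cell_size)) + 1
    grid = np.full((rows, cols), np.nan)
    for x, y, z in pts:
        r = int((y - ymin) / cell_size)
        c = int((x - xmin) / cell_size)
        grid[r, c] = z if np.isnan(grid[r, c]) else max(grid[r, c], z)
    return grid
```

The regular grid produced here is what makes the later hole filling, smoothing, and boundary extraction tractable as standard image-processing operations.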
Secondly, the roof boundaries are extracted from the preprocessed data (section 4.2.1) and 3D models are generated based on the automatic identification of the roof-types (section 4.2.2). A major advantage of this approach is that it can automatically identify the roof-types, which can be a very difficult task even for a human operator. In addition, an interactive non-linear primitive is employed for the reconstruction of complex non-linear roof-types (section 4.2.3).
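The spirit of automatic roof-type identification can be illustrated, in a much-simplified form, as model selection among parameterized primitives: fit each candidate to the roof heights and keep the one with the lowest complexity-penalized residual. The flat and shed primitives and the penalty value below are toy assumptions; the paper's parameterized primitive is far more general:

```python
import numpy as np

def flat_residual(z):
    """Flat roof: one parameter (height); mean squared deviation."""
    return np.mean((z - z.mean()) ** 2)

def shed_residual(xs, z):
    """Shed roof: plane sloping along x; two parameters (slope, offset)."""
    A = np.c_[xs, np.ones_like(xs)]
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    return np.mean((A @ coef - z) ** 2)

def identify_roof(xs, z, penalty=1e-6):
    """Pick the primitive with the lowest penalized fit residual.
    The small per-parameter penalty favours simpler primitives
    when two candidates fit the heights equally well."""
    scores = {
        "flat": flat_residual(z) + 1 * penalty,
        "shed": shed_residual(xs, z) + 2 * penalty,
    }
    return min(scores, key=scores.get)
```

A real system would include the other common linear roof-types (gabled, hipped, and so on) as additional candidates in the same competition.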
Finally, the camera poses for the ground, aerial and satellite images are calculated from registered correspondences to the reconstructed 3D models (section 5.1), and are used by the rendering pipeline to generate the composite photorealistic textures (section 5.2). A key aspect of this pipeline is the integration of multiple sources of information for the efficient and effective handling of textures for missing or occluded areas.
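Recovering a camera from registered 2D-3D correspondences can be sketched with a generic Direct Linear Transform (DLT); this is a standard stand-in for this kind of registration, not necessarily the paper's exact pose-estimation method:

```python
import numpy as np

def dlt_projection(points3d, points2d):
    """Estimate a 3x4 projection matrix (up to scale) from six or more
    registered 2D-3D correspondences via the Direct Linear Transform."""
    A = []
    for (X, Y, Z), (u, v) in zip(points3d, points2d):
        # Each correspondence contributes two linear equations in the
        # twelve entries of the projection matrix.
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    return vt[-1].reshape(3, 4)

def project(P, X):
    """Project a 3D point with projection matrix P (perspective divide)."""
    x = P @ np.append(np.asarray(X, dtype=float), 1.0)
    return x[:2] / x[2]
```

Once a projection matrix is known for each ground, aerial, or satellite image, model polygons can be projected into that image to sample texture, which is exactly what the rendering pipeline needs.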
4 LARGE-SCALE MODEL RECONSTRUCTION
4.1 Pointcloud Data Pre-processing
The preprocessing of the raw 3D pointcloud data is performed in three steps: resampling, filtering and segmentation. The result is a