
Augmented reality and photogrammetry: A synergy to visualize physical and virtual city environments



ISPRS Journal of Photogrammetry and Remote Sensing 65 (2010) 134–142


Cristina Portalés*, José Luis Lerma, Santiago Navarro

Photogrammetry and Laser Scanning Research Group (GIFLE), Department of Cartographic Engineering, Geodesy and Photogrammetry, Universidad Politécnica de Valencia, Camino de Vera s/n, Building 7i, 46022 Valencia, Spain

* Corresponding author. E-mail addresses: [email protected], [email protected] (C. Portalés), [email protected] (J.L. Lerma), [email protected] (S. Navarro).

Article info

Article history: Received 14 July 2008. Received in revised form 16 September 2009. Accepted 5 October 2009. Available online 17 October 2009.

Keywords: Augmented reality, Model, Multisensor, Tracking, Navigation, Real time

Abstract

Close-range photogrammetry is based on the acquisition of imagery to make accurate measurements and, eventually, three-dimensional (3D) photo-realistic models. These models are a photogrammetric product per se. They are usually integrated into virtual reality scenarios where additional data such as sound, text or video can be introduced, leading to multimedia virtual environments. These environments allow users both to navigate and interact on different platforms such as desktop PCs, laptops and small hand-held devices (mobile phones or PDAs). In very recent years, a new technology derived from virtual reality has emerged: Augmented Reality (AR), which is based on mixing real and virtual environments to boost human interactions and real-life navigations. The synergy of AR and photogrammetry opens up new possibilities in the field of 3D data visualization, navigation and interaction, far beyond the traditional static navigation and interaction in front of a computer screen.

In this paper we introduce a low-cost outdoor mobile AR application to integrate buildings of different urban spaces. High-accuracy 3D photo-models derived from close-range photogrammetry are integrated into real (physical) urban worlds. The augmented environment presented herein requires a see-through video head mounted display (HMD) for visualization, whereas the user's navigation in the real world is achieved with the help of an inertial navigation sensor. After introducing the basics of AR technology, the paper deals with real-time orientation and tracking in combined physical and virtual city environments, merging close-range photogrammetry and AR. There are, however, some software and complexity issues, which are discussed in the paper.

© 2009 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.

1. Introduction

Close-range photogrammetry is currently an efficient tool to derive geometrical information from digital imagery in a fast and economic way. Nowadays, the derived photogrammetric product is most often a 3D photo-realistic model, and two different techniques for modelling objects coexist with success: 'virtual' reality and 'visual' reality (Grussenmeyer et al., 2002). While in the former the texture and object shapes do not need to correspond to real objects, the 3D models in the latter are a replica of physical objects, and they help one to understand their corresponding real objects. Derived photogrammetric models can provide real measurements, as well as virtual and visual realities, depending on the images used to finally texture both objects and environments.

In the last few years, with the increasing capabilities of standard personal computers, new tools have appeared that allow the addition of multimedia content to 3D models, such as sound, text and video. Moreover, navigation in real time is possible via the Internet, and some kind of user interaction is usually possible. The high potential value of interactive 3D photo-models in many different areas is undeniable. There are many good examples of photo-models in different fields, such as the study and conservation of cultural heritage sites (Behan and Moss, 2006; Ogleby, 1999), visualization and dissemination over the Internet (Dorffner and Forkert, 1998; Lerma and García, 2004) or scene reconstruction (Fraser et al., 2005).

Nowadays, there are new approaches that can bring further potential to models acquired by close-range photogrammetry, as is the case of augmented reality. This emerging technology is leading to new kinds of visualization, navigation and user interaction, opening new possibilities for the understanding of virtual photorealistic or non-photorealistic models. Due to its novelty, research interest has so far focused on tracking and registration, display technology, rendering, interaction devices and techniques, presentation and/or authoring (Bimber and Raskar, 2005).


Nevertheless, research is now also focusing on the applications that can be derived from this technology for issues related to the final user, such as physical human-to-human or human-to-computer interactions (Cheok et al., 2006).

This article sets up a synergy between conventional close-range photogrammetry and innovative AR approaches in order to explore enhanced visualization, spatial data management, user navigation and/or interaction. We introduce a low-cost outdoor mobile AR system to allow the visualization of virtual building models, acquired by multi-convergent close-range photogrammetry, integrated into physical environments. 3D models can be achieved via terrestrial laser scanners or computed via image-based modelling. Although the former approach is nowadays becoming a standard source of input data in many applications, the latter remains the most widely used in general 3D applications; its hardware requirements are less demanding, and low-cost implementations are possible with existing technology.

Image-based modelling is applied herein. The images were taken with a low-cost conventional digital camera, following the well-known CIPA 3 × 3 rules (Waldhäusl and Ogleby, 1994). Afterwards, a self-calibration bundle block adjustment was performed to build up the photo-model. This paper depicts a scenario where virtual photo-realistic 3D models are combined into different real (physical) scenarios to figure out how different architectonic constructions might change citizens' identity and behaviour in response to governmental decisions. The case study presented below is fictional, but it points out the power of combining two technologies with different purposes, AR and photogrammetry. The former is more concerned with real-time processing, high speed and continuous visualization over rough models, while the latter is more concerned with high accuracy, perfect geometry and matching between 3D models and imagery. The photogrammetric technique contributes the visual reality environment to the augmented 3D world.

Section 2 presents a review of AR technology and points out the general benefits it can bring in this field, giving some hints on the composition of an AR environment. Section 3 presents a case study in which a historical building of the city of Valencia is placed on our campus, and topics such as the AR system configuration, the reference frame and problems due to occlusions are tackled. Section 4 reviews the programming environment, implemented within the Max/MSP Jitter (Cycling '74, 2008) software. Finally, Section 5 presents a discussion and Section 6 draws the conclusions.

2. AR technology

2.1. Description

AR is a relatively new technology based on mixing computer generated stimuli (visual, sound or haptic) with real ones, keeping a spatial relationship between synthetic and physical data and allowing user interaction in real time, as described in Azuma (1997). In recent years it has been introduced in many different areas, mainly to visualize virtual data and real environments all together. Such emerging areas include education (Kaufmann and Schmalstieg, 2003; Portalés Ricart et al., 2007), entertainment (Cheok et al., 2003; Wagner et al., 2004), GIS (King et al., 2005), media arts (Benford et al., 2006; Levin, 2006), psychology (Juan et al., 2005), robotics (Stilman et al., 2005), surgery (Glossop and Wang, 2003; Wacker et al., 2005) and urban planning (Ben-Joseph et al., 2001; Portalés Ricart et al., 2005), amongst others. The dizzily increasing number of AR applications in the last few years is due to the new potential that this technology brings. According to Billinghurst and Kato (2002), augmented reality provides:

• Seamless interaction between real and virtual environments. The notion of an interface seam was introduced by Ishii et al. (1994); it can be described as a spatial, temporal or functional constraint that forces the user to change between a variety of spaces or ways of working. In AR, communication between users is produced in a natural way, as users can still work with traditional tools and are able to see each other at the same time as the virtual data, thus allowing face-to-face collaboration. For example, a study based on an AR system designed to minimize interaction seams is described in Kiyokawa et al. (2000).

• The ability to enhance reality. With AR systems, multimodal computer generated stimuli can be added to the physical environment. Moreover, some parts of reality can be modified or even deleted (e.g. with virtual phantoms that occlude real objects). This is the case presented in Fischer et al. (2005), where different filters are applied to the resulting augmented scene of a video composition.

• The presence of spatial cues for face-to-face and remote collaboration. Computer generated objects can be spatially distributed in real time according to the physical environment. For instance, in Kato et al. (2001) an AR videoconference system is introduced, where users can distribute the video images of other people around a physical table.

• Support of a tangible interface metaphor for object manipulation. In AR systems there exists a close relationship between virtual and real objects, as physical objects can be augmented through computer generated data, allowing a dynamic superimposition of those elements. Therefore, physical objects can be used to directly manipulate virtual data in such an intuitive manner that people with no computer background can still have a rich interactive experience. For example, in Cheok et al. (2006) a review of several applications in the field of entertainment is made, highlighting the natural human-to-human and human-to-computer interactions that can arise with this technology.

• The ability to transition smoothly between reality and virtuality. According to Milgram's continuum (Milgram and Kishino, 1994), also known as the virtuality continuum (Fig. 1), interfaces can be classified depending on the amount of synthetic data in proportion to the real environment, thus discerning between augmented reality (closer to a real environment) and augmented virtuality (closer to a virtual environment). Interactive 3D photo-models would therefore be close to the right side of the continuum, inside the area of augmented virtuality, because of the photorealistic texture applied to the virtual models of real objects.

Fig. 1. Simplified schema of the virtuality continuum. From Milgram and Kishino (1994).

These issues, together with the increasing computational capacities of standard personal computers, encourage scientists, researchers and other members of the public to improve, develop and use AR systems.

2.2. Composition of augmented reality environments

Compatibility between real and virtual data is an important issue in AR (Wang and Dunston, 2006). To properly combine virtual and real objects in such a way that the augmented scene appears plausible to the user, the real camera should be mapped to the virtual one so that the perspectives of both environments match. Other issues that may be considered to increase the realism of the rendered augmented scenes are occlusion between real and virtual objects, lighting, shadowing and reflections.
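As a concrete illustration of this camera mapping, the interior parameters of a calibrated pinhole camera can be converted into an OpenGL-style projection matrix so that the virtual perspective reproduces the real one. The following is a generic sketch, not code from the application described here; the function name is ours, and the principal-point sign conventions depend on the chosen image origin:

```python
import numpy as np

def gl_projection_from_intrinsics(fx, fy, cx, cy, width, height,
                                  near=0.1, far=5000.0):
    """4x4 projection matrix built from pinhole intrinsics (fx, fy:
    focal lengths in pixels; cx, cy: principal point), so that objects
    rendered with the virtual camera line up with the live video of
    the real camera. Row-major as written; transpose for OpenGL upload."""
    return np.array([
        [2.0 * fx / width, 0.0, 1.0 - 2.0 * cx / width, 0.0],
        [0.0, 2.0 * fy / height, 2.0 * cy / height - 1.0, 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2.0 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ])
```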


To correctly match the real and virtual worlds, computer generated objects need to be accurately registered to the real world, so that they appear to the user as fixed in the environment. According to Bimber and Raskar (2005), accurate tracking is one of the most significant challenges in AR research today, as accuracy is frequently crucial and depends essentially on the type and resolution of the sensors (e.g. GPS, INS, vision based). If absolute tracking within a global coordinate system is required, one can distinguish between outside-in and inside-out tracking. The first case refers to systems where the sensors are fixed in the environment and register a set of emitters on mobile objects; the second type makes use of sensors fixed to the mobile objects themselves. Nevertheless, acquiring the exterior camera orientation in real time over a wide area is not always possible with a single tracking technology, due to limitations of the sensors. For example, magnetic and radiofrequency sensors are influenced by metallic interferences; GPS receivers suffer from the multipath problem and cannot be used inside buildings; vision-based tracking depends strongly on lighting conditions and visibility. Furthermore, it is not always possible with a single technology to either register the camera's six degrees of freedom (DOFs) or add additional user interaction. This is why some authors integrate different tracking technologies. For example, in Kiyokawa et al. (2000) a collaborative design AR system is presented where camera orientation and user interaction are achieved via a combination of several magnetic sensors and push buttons. In Piekarski (2006) a mobile outdoor AR system is described where camera tracking is achieved via a combination of a GPS receiver and an inertial sensor, whereas user interaction is fulfilled with a data glove and an optical system.

One has to point out the importance that mobile technology (mobile phones and PDAs) is achieving in the field of outdoor AR-based applications. The increasing power of these devices, with broadband network access, integrated high-resolution cameras and low-cost GPS receivers, leads to small equipment and high performance. Many authors have used this technology in collaborative AR environments, where user communication with other users and/or remote resources is crucial (Benford et al., 2006; Díez-Díaz et al., 2007).

On the other hand, occlusion is a well-known problem within AR research. When doing a video composition of real and virtual scenes, virtual objects are always mapped on top of the images of the physical environment, so undesirable occlusions can occur. Several authors have tried to solve this problem in different ways. For example, in Lepetit and Berger (2000) a method is developed based on semi-automatic occluding boundary reconstruction from different camera points of view, whereas in Fischer et al. (2003) a method is presented based on detecting occlusions in front of static backgrounds. To further increase realism, the lighting conditions of the virtual scene should coincide with those of the real environment; furthermore, reflections and shadows of virtual objects onto real ones can also be considered. Some of these issues have been implemented by various authors, such as Gibson et al. (2003), Jacobs and Loscos (2006), Stauder (1999), Tatham (1999) and Wang and Samaras (2003). Nevertheless, some of these techniques are complex and require high processing speed, which makes them too cumbersome to be applied in real time. Therefore, in our implementation, only occlusions and lighting on the Serrano Towers are considered (see Sections 3 and 4).

3. Case study

We have developed an AR application for urban visualization based on building models acquired by close-range photogrammetry. In this section we show a test carried out at the campus of the Universidad Politécnica de Valencia (UPV), where a 3D model of a landmark gate of the city centre, the Serrano Towers ('Torres de Serranos'), dating back to the XIVth century, is spatially integrated with the physical modern buildings (late XXth and early XXIst century) on the university campus.

Fig. 2. User carrying the devices needed for our AR mobile application.

The ancient Serrano Towers were part of the old city walls and are located on the river banks delimiting the city centre. The distance from the Northeast suburbs, where the UPV is situated, to the Serrano Towers is approximately 3 km. User orientation and positioning in real time is acquired with a combination of an inertial sensor and a vision-based tracking procedure, applying basic photogrammetric rules. Other issues, such as the lighting of virtual models and occlusions, are also considered.

3.1. AR configuration

Our application can be classified as an outdoor mobile AR system where the user wears the devices needed for the tracking, generation and rendering of the augmented environment. These devices are (Fig. 2):

(a) A standard laptop inside a backpack or carry bag.
(b) A display system. Specifically, we used an I-glasses SVGA video-based HMD, with a resolution of 800 × 600 pixels, a 26° diagonal field of view and modifiable brightness; it is 2D based, i.e., the same image of the scene is rendered for both eyes.

(c) A standard web cam, with 640 × 480 resolution and USB 2.0 connection.

(d) An inertial sensor. We used the MT9 Xsens miniature inertial sensor, which operates at frequencies up to 512 Hz, provides angular resolutions of 0.05° and accuracies better than 1° RMS in the three axes.

(e) Batteries for the HMD and INS.

The user carries the backpack in which the laptop and batteries are integrated. The laptop processes all the received data in real time and performs target oriented processing to simulate an augmented urban environment through an application written in Max/MSP Jitter. The displays used in outdoor mobile AR applications are usually see-through HMDs or small hand-held devices (mobile phones or PDAs). In the second case, there is the disadvantage that these devices are not yet powerful enough to manage complex augmented environments. Some authors also use screen-based displays (tablet-PCs or standard monitors), although these have to be shaded from the incoming sunlight due to the usually limited luminosity of the screen (Wilde et al., 2003).


Fig. 3. Local terrestrial reference frame.

In our application, we can adjust the brightness and contrast of the see-through video-based HMD according to the user's needs. Attached to the HMD there is a webcam whose projection centre approximates the user's point of view. We found that the optimum camera resolution is 640 × 480 pixels, which allows real-time processing at 30 fps. The inertial sensor is also attached to the HMD, in such a way that its axes are aligned with those of the camera.

3.2. Campus reference frame and tracking

To geometrically combine the virtual and the physical scenes, the exterior orientation of the virtual camera has to be equal to that of the real one, so a common reference frame has to be established. To that purpose, we considered a Cartesian local reference frame, with the Y-axis pointing North and the X-axis pointing East. The Z-axis is perpendicular to the XY-plane and points to the zenith. The origin is placed at the corner of a building (Fig. 3).

The user's positioning and orientation in real time is achieved via a combination of an inertial sensor and vision-based tracking. Both sensors – inertial and optical – are calibrated and aligned. The observations are integrated into a system of central projection equations; in our application, the rotations are directly measured by the inertial sensor.
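For reference, the central projection (collinearity) equations take the following standard form for a control point with object coordinates $(X, Y, Z)$ and measured image coordinates $(x, y)$, assuming the usual photogrammetric conventions:

\[
x = x_0 - c\,\frac{r_{11}(X - X_O) + r_{21}(Y - Y_O) + r_{31}(Z - Z_O)}{r_{13}(X - X_O) + r_{23}(Y - Y_O) + r_{33}(Z - Z_O)}, \qquad
y = y_0 - c\,\frac{r_{12}(X - X_O) + r_{22}(Y - Y_O) + r_{32}(Z - Z_O)}{r_{13}(X - X_O) + r_{23}(Y - Y_O) + r_{33}(Z - Z_O)},
\]

where $(x_0, y_0)$ and $c$ are the principal point and principal distance of the calibrated camera, $r_{ij}$ are the elements of the rotation matrix delivered by the inertial sensor, and $(X_O, Y_O, Z_O)$ is the unknown camera position. With the rotations fixed, each control point contributes two equations that are linear in $(X_O, Y_O, Z_O)$; this is why two control points suffice, yielding the four-equation system solved in Section 4.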

Certain topographical measurements were needed in order to compute the 3D coordinates of the control points. These coordinates can be determined making use of GNSS, urban maps or architectonic façade plans; herein, a reflectorless total station was used. The points were introduced into the physical environment as circular flat elements of different colours (Fig. 4). The minimum number of control points needed to calculate the exterior orientation of the moving camera is two, thanks to the rotation outputs of the inertial sensor. Nevertheless, extra points were measured and kept in reserve for further work. It must be pointed out that the control points should be far apart in order to increase the accuracy, although it must also be ensured that they remain inside the field of view of the camera. If one of the two control points is out of the field of view, the virtual photo-model will not be successfully overlaid and tracking will fail.

3.3. Photo-model generation

In this application two kinds of virtual model are needed: the virtual model of the Serrano Towers, and the virtual model of the buildings belonging to the area where the application takes place, the UPV campus. The latter is necessary in order to solve occlusions (see Section 3.4).

Fig. 4. Circular coloured elements acting as control points for the tracking in real time.

To acquire the set of imagery for modelling the Serrano Towers, the so-called CIPA 3 × 3 rules were applied. These rules have been described for the simple photogrammetric documentation of architecture in cases where non-metric cameras are used. They are structured in three triplets (Waldhäusl and Ogleby, 1994):

• Three geometrical rules, concerning the preparation of control information, the photographic coverage and stereo partners.

• Three photographic rules, regarding the inner camera geometry, illumination and camera format.

• Three organizational rules, consisting of making proper sketches, protocols and a final check.

The control points for the whole building were measured according to a pre-defined local coordinate system. Additional tie points were measured in each image. A bundle block adjustment was applied in order to both calibrate the cameras and generate the points that would be used to build up the 3D model. A total of 76 images were used to generate the 3D model (Fig. 5). All the computations were carried out with FotograUPV, an in-house photogrammetric software package based on multi-convergent imagery. However, a uniform texture, based on the real one, was applied to the whole model, keeping in mind that the textured model had to be rendered in response to the user's movements in real time, and complex photorealistic models are difficult to manage with standard laptops. Afterwards, the model was exported into obj format in order to be readable by the software (see Section 4).

Regarding the model of the university campus, a detailed VRML model of the Higher Technical School of Geodesy, Cartography and Surveying can be found at Muñoz Santamaría (2006). This 3D model was achieved by photo-tacheometry due to the high number of planar features. Nevertheless, as this model was only used as a 3D mask (i.e. it was not visualized), its geometry was simplified and its textures were not considered, in order to avoid extra computational processing (Fig. 6).

3.4. Solving occlusion

Real and virtual objects are all integrated together inside the campus reference frame. In Fig. 7, the Serrano Towers are virtually placed and seen together on the university campus. The Towers are placed in the middle of a square garden in front of the main entrance of the School.

As mentioned in Section 2.2, occlusion is an issue to be solved in AR scenarios because virtual objects are always placed in the foreground of the incoming video image.


Fig. 5. The Serrano Towers: (a) images; (b) 3D model.


Fig. 6. Simplified VRML model of the Higher Technical School of Geodesy, Cartography and Surveying: (a) with photorealistic textures; (b) simplified model without textures.

Fig. 7. The Serrano Towers integrated in the University campus.

However, a user is able to walk around the considered study area without constraints. Therefore, depending on the user's real-time spatial position, it might happen that the visual model should appear behind the real buildings, and vice versa. If occlusions were not analytically solved, the resulting augmented image would not be plausible (Fig. 8): the Serrano Towers are placed in the foreground of the image when they should be behind the physical buildings according to the user's point of view. This situation is called a 'false' augmented environment.

In the application under study, the occlusion issue was solved by means of 3D masks. These masks are in fact simplified 3D models of the buildings, containing texture and an alpha channel, which are blended with the visual model. If the visual model is spatially located in front of the 3D masks (according to the user's point of view), the masks do not affect its visualization; in contrast, if the 3D masks are placed in front of the visual model, they visually delete parts of it, showing instead the video image behind. This process is described in Fig. 9, whereas Fig. 10 shows other scenes of the same augmented environment (note that in Fig. 10b no occlusion is produced).

Fig. 8. False augmented environment due to wrong occlusion analysis.
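To make the mask mechanism concrete, the following is a minimal per-frame compositing sketch, assuming the virtual scene has already been rendered with the building masks textured pure black (as described in Section 4); the array layout and function name are ours, not part of the actual Jitter patch:

```python
import numpy as np

def composite_frame(video, rendered):
    """Blend one augmented frame. `rendered` is an HxWx3 uint8 rendering
    of the virtual scene in which the photo-model keeps its real texture
    and the 3D building masks (and the empty background) are pure black;
    `video` is the HxWx3 uint8 camera image of the real environment.
    Black pixels are replaced by the live video, so the mask areas show
    the physical buildings in front of the occluded parts of the model."""
    is_masked = (rendered == 0).all(axis=2)  # pure-black pixels
    out = rendered.copy()
    out[is_masked] = video[is_masked]
    return out  # assumes the model texture itself contains no pure black
```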

4. Software environment

The presented AR system is self-implemented in Max/MSP/Jitter (Cycling '74, 2008), a multipurpose software environment. Jitter is basically a set of video, matrix and 3D graphics objects for the Max graphical programming environment, especially suited for real-time video processing, custom effects, 2D/3D graphics, audio/visual interaction, data visualization and analysis. The architecture of the application is shown in Fig. 11. The data acquired by the camera in real time is managed through the jit.qt.grab object. Afterwards, it is analysed by a colour tracking algorithm that extracts the image coordinates of the control points on site. This information, together with the camera interior parameters, the known spatial position of the control points and the inertial sensor data, is introduced into the central projection equations to determine the spatial camera position. Both position and attitude are assigned to the jit.gl.render object to build the virtual scene according to the real one. On the other hand, the 3D virtual models are managed by the jit.gl.model object, where transparency is assigned to the mask models of the surrounding buildings. This processing is managed from a main patch and a set of sub-patches.


Fig. 9. Steps to solve occlusion: (a) 3D masks; (b) addition of the visual model; (c) addition of the video image; (d) resulting augmented environment.


Fig. 10. Augmented environment from different points of view.

The sub-patches are p camori, p leastsquares, p trackpoint, p model and p light. Within the p camori and p leastsquares sub-patches, the mathematical processes to achieve the camera exterior orientation are carried out. p camori contains the rotations of the inertial sensor (roll, pitch, yaw) calculated from the mt9 object, which delivers the acquired data in the form of a rotation matrix. p leastsquares holds the least-squares procedure to solve the system of four equations given by the two control points. The solution of these equations gives the 3D coordinates of the moving camera (XO, YO, ZO), which are sent to the pak camera object of the main patch. Furthermore, considering that the ZO coordinate of the camera varies little (as it coincides with the user's height), an additional constraint can be added to the system.

In the p trackpoint sub-patch, the colour tracking of the control points is achieved with the tap.jit.colortrack object of Tap Tools 1.5 (Electrotap, 2008). This object registers simultaneously up to four different colours based on a hue, saturation and brightness analysis of the image pixels. The resulting values are the image coordinates of the window enclosing the coloured area; the centre of the window corresponds to the image coordinates of the control point, which are sent to the p camori sub-patch.

Sub-patch p model contains the properties of each virtual model, which should be kept in obj format. Several attributes can be modified, such as lighting, shading, texturing, mapping or blending. The textures applied to the 3D models that act as masks are of black colour. These models are blended with the Serrano Towers according to the user's point of view in the following way: for those areas of the 3D models containing black colour information in the final scene, the pixel information is replaced by the video image of the real environment. Therefore, those areas of the Serrano Towers that lie behind the 3D masks are not rendered in the final scene. Finally, in sub-patch p light, the general light conditions of the virtual models can be controlled: ambient, diffuse and specular light, as well as the spatial position of the light source, can be approximated to the incoming sunlight direction depending on the time of day, in order to correctly shadow the virtual objects. These properties are sent to jit.gl.render in the main patch.
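To make the p leastsquares step concrete, the following NumPy sketch solves the same system, using the collinearity form given in Section 3.2: with the rotation matrix R fixed by the inertial sensor, each control point yields two equations that are linear in the camera position. The function name and argument layout are our own illustration, not the Jitter patch:

```python
import numpy as np

def camera_position(R, c, pts_xy, pts_XYZ, x0=0.0, y0=0.0):
    """Least-squares camera position (XO, YO, ZO) from n >= 2 control
    points, given the rotation matrix R from the inertial sensor, the
    principal distance c and the principal point (x0, y0) in pixels.
    pts_xy : (n, 2) measured image coordinates of the control points
    pts_XYZ: (n, 3) known coordinates in the campus reference frame"""
    A, d = [], []
    for (x, y), P in zip(pts_xy, np.asarray(pts_XYZ, dtype=float)):
        u, v = x - x0, y - y0
        # Cross-multiplied collinearity equations: each row a satisfies
        # a . (P - O) = 0, i.e. a . O = a . P, linear in the position O.
        a = np.array([u*R[0,2] + c*R[0,0], u*R[1,2] + c*R[1,0], u*R[2,2] + c*R[2,0]])
        b = np.array([v*R[0,2] + c*R[0,1], v*R[1,2] + c*R[1,1], v*R[2,2] + c*R[2,1]])
        A += [a, b]
        d += [a @ P, b @ P]
    # The height constraint mentioned above could be appended here as an
    # extra row [0, 0, 1] with right-hand side equal to the user's height.
    O, *_ = np.linalg.lstsq(np.array(A), np.array(d), rcond=None)
    return O
```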


Fig. 11. Program architecture.

5. Discussion

5.1. Augmented reality and photogrammetry

The time required to generate a photo-model is crucial because it determines the final costs. In Dorffner and Forkert (1998) a cost-effective method, based on assumptions about the object shape, is introduced in order to generate accurate photo-models. Nevertheless, compared with traditional photo-model generation, AR technology can save more than 50% of the time needed to build up the 3D model, and accordingly the same proportion of the production costs. This 50% saving is relative to a complete virtual modelling of all the objects involved. Furthermore, if occlusions are not considered, the real part does not need to be modelled at all, and the time saving can increase to 80% or 90%, depending on the characteristics of the work. Taking this into account, another great advantage emerges with AR: as the ''real'' part of the augmented scene is truly seen in real time through the image acquired by the camera, it keeps the rich level of real-life detail. Therefore, all the related costs for high-resolution photo-models can be saved for both the object and the environment (there is no need to model benches, street lamps, signals, trees, etc.).

On the other hand, if occlusions are considered – as in our case – 3D models of the real part may be required. Nevertheless, only rough 3D models are usually needed. This means that they can be produced either automatically or semi-automatically, also saving both time and cost. There exist many examples in the literature showing different approaches to reconstruct and build up 3D virtual cities. For instance, in Rottensteiner and Jansa (2002) an automatic method for building extraction based on aerial imagery and LIDAR data is developed; in Lerma García et al. (2004) 3D city models are quickly and economically produced from existing GIS data and height information, with 3D photo-models also included; whereas in Lafarge et al. (2008) a method to extract building footprints from DEMs is proposed, followed by a simple 3D city reconstruction process. Starting from any of these automatic approaches, further development and enhancement can be achieved if AR is actually applied to large urban areas.

5.2. AR software and tracking

There exists some software specially developed for the generation of AR environments, for example ARToolKit (Human Interface Technology Laboratory, 2007), MXRToolKit (Mixed Reality Lab, 2004) or AMIRE (AMIRE, 2004). These packages are well known and extensively used for several reasons.

First of all, they are open source, distributed under the GPL license. Second, they acquire the exterior orientation of the camera in real time. Third, they provide all the necessary routines to generate the augmented scene (including 3D models, sounds, texts and image augmentations). These software packages are vision based, and tracking is achieved upon registration and identification of at least one artificial marker. Nevertheless, registration success is highly dependent on lighting conditions, and thus these packages are not recommended for applications that run outdoors. Therefore, they would have failed in the study presented herein.

Several authors dealing with outdoor AR applications propose registration combining different sensor devices (recall Section 2.2), better known as hybrid systems. In the present application, a combination of vision-based and direct inputs has been used (the latter coming from a low-cost inertial sensor), both having pros and cons. Inertial sensors provide dead-reckoning orientation and are processor-independent devices, despite sometimes suffering from undesired time-dependent drifts. Vision-based tracking, on the other hand, can be achieved very economically, although it depends highly on visibility and lighting conditions. Nevertheless, the use of hybrid tracking technologies can overcome these problems, as the components compensate for each other's weaknesses (Azuma et al., 1999). For example, during fast motions the visual colour tracking may fail due to blur and large changes in the images, so the system relies upon the inertial sensor; alternatively, visual tracking can be used to correct the inertial sensor drifts. The accuracy of the whole system depends on both the INS accuracy and the camera resolution (recall Section 3.1). The latency is highly dependent on the hardware and the number of processes running at a time; in our application we keep the latency below 1 s.
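As an illustration of this complementary behaviour (not the exact patch logic, which takes its rotations directly from the MT9), a minimal orientation-fusion sketch might look as follows; the blending factor and function name are hypothetical:

```python
def fuse_yaw(ins_yaw, vision_yaw, alpha=0.98):
    """Simple complementary filter for one angle (degrees, unwrapped).
    The high-rate inertial estimate dominates in the short term, while
    the slower vision-based estimate gradually removes accumulated
    drift. vision_yaw is None whenever colour tracking fails (e.g. due
    to motion blur), in which case the system falls back on inertial
    dead reckoning alone."""
    if vision_yaw is None:
        return ins_yaw
    return alpha * ins_yaw + (1.0 - alpha) * vision_yaw
```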

5.3. Limitations

The implementation of the approach presented in this paper is technically complex, and it requires the integration of different devices and methodologies for real-time tracking. Furthermore, the modelled buildings range from basic shapes to complex 3D geometries, and some of them have been used as 3D masks to solve the occlusion problem. All these issues require powerful algorithms and fast analyses that can exceed the processing capabilities of standard laptops, leading to, e.g., visualization gaps. Some problems related to the user's mobility were also found.

For instance, the HMD and the inertial sensor need extra batteries that must be carried by the user, apart from the laptop.


The weight of all these devices, although not excessive, can decrease ergonomics and mobility. Furthermore, there are some well-known drawbacks related to HMDs, such as lack of resolution, limited field of view and discomfort (Bimber and Raskar, 2005), and thus some users may experience a degree of insecurity when moving around.

On the other hand, the image captured by the camera should always contain at least two control points to correctly generate the augmented scene; otherwise the virtual and real worlds do not match. Therefore, in this scenario the user is forced to walk inside a limited area with a specific viewing direction. To mitigate this problem, the number of control points should be increased, although in our experience it is not advisable to track more than three (the object used by the Jitter program allows tracking of up to four different coloured targets, but this results in extra processing that slows down the system).

The colour tracking algorithm proved fast and easy to implement. Still, the system can slow down when working in real time with all the procedures active, e.g. colour tracking to acquire the position, occlusions and lighting of virtual objects. There is also a drawback related to the need to artificially place control points in the real environment, which might look unnatural.
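For comparison, an equivalent of the hue/saturation/brightness analysis performed by tap.jit.colortrack can be written in a few lines with OpenCV; the threshold bounds and function name below are illustrative assumptions, not the values used in the application:

```python
import cv2
import numpy as np

def track_colour_target(frame_bgr, hsv_lo, hsv_hi):
    """Return the image centre (x, y) of one coloured control point,
    or None if the target is not visible. hsv_lo/hsv_hi are HSV bounds
    tuned to the target colour, e.g. (100, 120, 80) and (130, 255, 255)
    for a saturated blue disc."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_lo), np.array(hsv_hi))
    m = cv2.moments(mask)
    if m["m00"] < 1e-3:  # no pixels matched the colour range
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])  # blob centroid
```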

6. Conclusions

Augmented reality is an emerging technology that is gaining prominence in several fields, such as education, entertainment, medicine, robotics and engineering. In this paper we propose a synergy between traditional close-range photogrammetry and AR, by means of an outdoor mobile AR application to visually build up hybrid (virtual and real) urban spaces that users can walk through on site. The user's positioning and orientation in real time is achieved by combining an inertial sensor and a vision-based tracking technique. We present a case study where a virtual model of the Serrano Towers – an emblematic building of the city centre of Valencia dating from the XIVth century – is spatially placed on our campus (located in the suburbs), building an augmented environment that is not physically possible.

Despite the difficulties found in developing the application, this work reflects the potential of the synergy between close-range photogrammetry and augmented reality, joining the high accuracy of the former in generating 3D visual models with the real-time user interaction and new kinds of visualization of the latter. Our application can be used for several purposes, ranging from the mere visualization of mixed urban spaces to advanced urban planning, on-site reconstruction of buildings that have disappeared, and guided tours. Last but not least, it is much more cost efficient than stand-alone virtual reality modelling, in which both true (physical) objects and untrue (virtual) objects have to be sequentially and accurately reconstructed.

Acknowledgements

Our acknowledgements go to Dr. Francisco Giner and Dr. Francisco Sanmartín for their valuable contributions in the initial stage of the application.

References

AMIRE, 2004. AMIRE. http://www.amire.net/index.html (accessed 30.09.2009).

Azuma, R., Weon Lee, J., Jiang, B., Park, J., You, S., Neumann, U., 1999. Tracking in unprepared environments for augmented reality systems. Computers & Graphics 23 (6), 787–793.

Azuma, R.T., 1997. A survey of augmented reality. Presence: Teleoperators and Virtual Environments 6 (4), 355–385.

Behan, A., Moss, R., 2006. Close-range photogrammetric measurement and 3D modelling for Irish medieval architectural studies. In: Ioannides, M., Arnold, D., Niccolucci, F., Mania, K. (Eds.), Proceedings 7th International Symposium on Virtual Reality, Archaeology and Cultural Heritage, Nicosia, Cyprus, 30 October–4 November. 6 p.

Ben-Joseph, E., Ishii, H., Underkoffler, J., Piper, B., Yeung, L., 2001. Urban simulation and the luminous planning table: Bridging the gap between the digital and the tangible. Journal of Planning Education and Research 21 (2), 196–203.

Benford, S., Crabtree, A., Flintham, M., Drozd, A., Anastasi, R., Paxton, M., Tandavanitj, N., Adams, M., Row-Farr, J., 2006. Can you see me now? ACM Transactions on Computer–Human Interaction 13 (1), 100–133.

Billinghurst, M., Kato, H., 2002. Collaborative augmented reality. Communications of the ACM 45 (7), 64–70.

Bimber, O., Raskar, R., 2005. Spatial Augmented Reality: Merging Real and Virtual Worlds. A K Peters, Ltd., Wellesley, Massachusetts.

Cheok, A.D., Fong, S.W., Yang, X., Liu, W., Farzbiz, F., 2003. Human Pacman: A sensing-based mobile entertainment system with ubiquitous computing and tangible interaction. In: Proceedings 2nd Workshop on Network and System Support for Games, California, USA, 22–23 May, pp. 106–117.

Cheok, A.D., Teh, K.S., Nguyen, T.H.D., Qui, T.C.T., Lee, S.P., Liu, W., Li, C.C., Diaz, D., Boj, C., 2006. Social and physical interactive paradigms for mixed reality entertainment. Computers in Entertainment 4 (2). On ACM Digital Library.

Díez-Díaz, F., González-Rodríguez, M., Vidau, A., 2007. An accessible and collaborative tourist guide based on augmented reality and mobile devices. In: Universal Access in Human–Computer Interaction. Ambient Interaction. Lecture Notes in Computer Science, Springer-Verlag, Berlin, Heidelberg, pp. 353–362.

Dorffner, L., Forkert, G., 1998. Generation and visualization of 3D photo-models using hybrid block adjustment with assumptions on the object shape. ISPRS Journal of Photogrammetry and Remote Sensing 53 (6), 369–378.

Electrotap, 2008. Software and hardware for innovative music, media, and art. http://www.electrotap.com/taptools/ (accessed 30.09.2009).

Fischer, J., Bartz, D., Straßer, W., 2005. Artistic reality: Fast brush stroke stylization for augmented reality. In: Proceedings ACM Symposium on Virtual Reality Software and Technology, Monterey, CA, USA, 7–9 November, pp. 155–158.

Fischer, J., Regenbrecht, H., Baratoff, G., 2003. Detecting dynamic occlusion in front of static backgrounds for AR scenes. In: Proceedings Workshop on Virtual Environments, Zurich, Switzerland, 22–23 May, pp. 153–161.

Fraser, C., Hanley, H., Cronk, S., 2005. Close-range photogrammetry for accident reconstruction. In: Gruen, A., Kahmen, H. (Eds.), Optical 3D Measurements VII, Technical University of Vienna, pp. 115–123.

Gibson, S., Cook, J., Howard, T., Hubbold, R., 2003. Rapid shadow generation in real-world lighting environments. In: Proceedings Eurographics Workshop on Rendering, Leuven, Belgium, 25–27 June, pp. 219–229.

Glossop, N.D., Wang, Z., 2003. Laser projection augmented reality system for computer-assisted surgery. International Congress Series 1256, 65–71.

Grussenmeyer, P., Hanke, K., Streilein, A., 2002. Architectural photogrammetry. In: Kasser, M., Egels, Y. (Eds.), Digital Photogrammetry. Taylor & Francis, London, pp. 332–334.

Human Interface Technology Laboratory, 2007. ARToolKit. http://www.hitl.washington.edu/artoolkit/ (accessed 30.09.2009).

Ishii, H., Kobayashi, M., Arita, K., 1994. Iterative design of seamless collaboration media. Communications of the ACM 37 (8), 83–97.

Jacobs, K., Loscos, C., 2006. Classification of illumination methods for mixed reality. Computer Graphics Forum 25 (1).

Juan, M.C., Alcañiz, M., Monserrat, C., Botella, C., Baños, R.M., Guerrero, B., 2005. Using augmented reality to treat phobias. IEEE Computer Graphics and Applications 25 (6), 31–37.

Kato, H., Billinghurst, M., Morinaga, K., Tachibana, K., 2001. The effect of spatial cues in augmented reality video conferencing. In: Proceedings 9th International Conference on Human–Computer Interaction, New Orleans, 5–10 August, pp. 478–481.

Kaufmann, H., Schmalstieg, D., 2003. Mathematics and geometry education with collaborative augmented reality. Computers & Graphics 27 (3), 339–345.

King, G.R., Piekarski, W., Thomas, B.H., 2005. ARVino — outdoor augmented reality visualisation of viticulture GIS data. In: Proceedings 4th IEEE and ACM International Symposium on Mixed and Augmented Reality, Vienna, Austria, 5–8 October, pp. 52–55.

Kiyokawa, K., Takemura, H., Yokoya, N., 2000. SeamlessDesign for 3D object creation. IEEE MultiMedia 7 (1), 22–33.

Lafarge, F., Descombes, X., Zerubia, J., Pierrot-Deseilligny, M., 2008. Automatic building extraction from DEMs using an object approach and application to the 3D-city modeling. ISPRS Journal of Photogrammetry and Remote Sensing 63 (3), 365–381.

Lepetit, V., Berger, M.-O., 2000. A semi-automatic method for resolving occlusion in augmented reality. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head Island, South Carolina, USA, 13–15 June, pp. 225–230.

Lerma García, J.L., Vidal, J., Portalés Ricart, C., 2004. Three-dimensional city model visualisation for real-time guided museum tours. The Photogrammetric Record 19 (108), 360–374.

Lerma, J.L., García, A., 2004. 3D city modelling and visualization of historical centers. In: CIPA International Workshop on Vision Techniques Applied to the Rehabilitation of City Centres, Lisbon, Portugal, 25–25 October, 4 p.

Levin, G., 2006. Computer vision for artists and designers: Pedagogic tools and techniques for novice programmers. Journal of Artificial Intelligence and Society 20 (4), 462–482.

Milgram, P., Kishino, F., 1994. A taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems E77-D (12), 1321–1329.

Mixed Reality Lab, 2004. MXRToolKit. http://mxrtoolkit.sourceforge.net (accessed 1.06.2008).

Muñoz Santamaría, A., 2006. http://personales.alumno.upv.es/anmusan/TONI/Base.htm (accessed 30.09.2009).

Ogleby, C.L., 1999. From rubble to virtual reality: Photogrammetry and the virtual world of ancient Ayutthaya, Thailand. The Photogrammetric Record 16 (94), 651–670.

Piekarski, W., 2006. 3D modeling with the Tinmith mobile outdoor augmented reality system. IEEE Computer Graphics and Applications 26 (1), 14–17.

Portalés Ricart, C., Giner Martínez, F., Sanmartín Piquer, F., 2005. Back to the 70's. In: Proceedings International Conference on Advances in Computer Entertainment Technology, Valencia, Spain, 15–17 June, pp. 209–212.

Portalés Ricart, C., Perales Cejudo, C.D., Cheok, A., 2007. Exploring social, cultural and pedagogical issues in AR-gaming through the Live LEGO House. In: Proceedings International Conference on Advances in Computer Entertainment Technology, Salzburg, Austria, 13–15 June, pp. 238–239.

Rottensteiner, F., Jansa, J., 2002. Automatic extraction of buildings from LIDAR data and aerial images. In: Proceedings ISPRS Commission IV Symposium, Ottawa, Canada, 8–12 July. International Archives of Photogrammetry and Remote Sensing 34 (4), 569–574.

Stauder, J., 1999. Augmented reality with automatic illumination control incorporating ellipsoidal models. IEEE Transactions on Multimedia 1 (2), 136–143.

Stilman, M., Michel, P., Chestnutt, J., Nishiwaki, K., Kagami, S., Kuffner, J., 2005. Augmented reality for robot development and experimentation. Technical Report CMU-RI-TR-05-55, Robotics Institute, Carnegie Mellon University.

Tatham, E.W., 1999. Optical occlusion and shadows in a 'see-through' augmented reality display. In: Proceedings 3rd International Conference on Information Visualization, London, England, 14–16 July, p. 128.

Wacker, F.K., Vogt, S., Khamene, A., Sauer, F., Wendt, M., Duerk, J.L., Lewin, J.S., Wolf, K.J., 2005. MR image-guided needle biopsies with a combination of augmented reality and MRI: A pilot study in phantoms and animals. International Congress Series 1281, 424–428.

Wagner, D., Pintaric, T., Schmalstieg, D., 2004. The Invisible Train: A multi-player handheld augmented reality game. http://studierstube.icg.tu-graz.ac.at/invisible_train/ (accessed 30.09.2009).

Waldhäusl, P., Ogleby, C., 1994. 3 × 3 rules for simple photogrammetric documentation of architecture. The International Archives of Photogrammetry and Remote Sensing 30 (Part 5), 426.

Wang, X., Dunston, P.S., 2006. Compatibility issues in augmented reality systems for AEC: An experimental prototype study. Automation in Construction 15 (3), 314–326.

Wang, Y., Samaras, D., 2003. Estimation of multiple directional light sources for synthesis of augmented reality images. Graphical Models 65 (4), 185–205.

Wilde, D., Harris, E., Rogers, Y., Randell, C., 2003. The Periscope: Supporting a computer enhanced field trip for children. Personal and Ubiquitous Computing 7 (3–4), 227–233.