
GPS-denied Indoor and Outdoor Monocular Vision Aided Navigation and Control of Unmanned Aircraft


Girish Chowdhary, Eric N. Johnson, Daniel Magree, and Allen Wu
Georgia Institute of Technology, Atlanta, Georgia 30332
e-mail: [email protected], [email protected], [email protected]
Andy Shein
Cyphy Works, Inc., Danvers, Massachusetts 01923

Received 31 August 2012; accepted 8 February 2013

GPS-denied closed-loop autonomous control of unstable Unmanned Aerial Vehicles (UAVs) such as rotorcraft using information from a monocular camera has been an open problem. Most proposed Vision aided Inertial Navigation Systems (V-INSs) have been too computationally intensive or do not have sufficient integrity for closed-loop flight. We provide an affirmative answer to the question of whether V-INSs can be used to sustain prolonged real-world GPS-denied flight by presenting a V-INS that is validated through autonomous flight-tests over prolonged closed-loop dynamic operation in both indoor and outdoor GPS-denied environments with two rotorcraft unmanned aircraft systems (UASs). The architecture efficiently combines visual feature information from a monocular camera with measurements from inertial sensors. Inertial measurements are used to predict frame-to-frame transition of online selected feature locations, and the difference between predicted and observed feature locations is used to bound in real time the inertial measurement unit drift, estimate its bias, and account for initial misalignment errors. A novel algorithm to manage a library of features online is presented that can add or remove features based on a measure of relative confidence in each feature location. The resulting V-INS is sufficiently efficient and reliable to enable real-time implementation on resource-constrained aerial vehicles. The presented algorithms are validated on multiple platforms in real-world conditions: through a 16-min flight test, including an autonomous landing, of a 66 kg rotorcraft UAV operating in an uncontrolled outdoor environment without using GPS and through a Micro-UAV operating in a cluttered, unmapped, and gusty indoor environment. © 2013 Wiley Periodicals, Inc.

1. INTRODUCTION

Unmanned Aerial Vehicles (UAVs) and Micro UAVs (M-UAVs) have enjoyed several successes in the past decade. The role of UAVs in modern air force, civilian disaster management, and homeland security applications is expected to grow. To enable closed-loop feedback control for autonomous operation, UAVs need an accurate estimate of their position and attitude with respect to the surrounding environment. Traditionally, this has been reliably enabled through a sensor fusion architecture that integrates position information from a GPS receiver and acceleration and angular rate measurements from inertial measurement units (IMUs) [see, e.g., Kayton and Fried (1997)]. In this inertial navigation system (INS) architecture, accelerations and angular rates from the accelerometers and gyroscopes of the IMU can be integrated forward in time, and position updates from the GPS can be fused with the solution to bound the errors that result from misalignment and bias errors of the IMU [see, for example, Christophersen et al. (2006); Wendel et al. (2006) and Kayton and Fried (1997)]. The GPS position update is critical to the functioning of such an architecture. However, GPS may not be available indoors and can suffer from obstruction and multipath errors, and it is also prone to the possibility of being jammed. Therefore, the problem of GPS-denied navigation is of great interest for M-UAVs that operate indoors and for outdoor UAVs that may need to operate in an environment where GPS may be jammed or unavailable [such as under bridges, in urban jungles, or in riverine environments (Scherer et al., 2012)].

Direct correspondence to: Girish Chowdhary, e-mail: [email protected].

Several researchers have been tackling the problem of GPS-denied flight using monocular vision and inertial measurements (see Section 1.1). The risks and challenges in this field of robotics come from the need to develop a reliable set of algorithms that generate a high-integrity navigation solution that can be used to close the control loop to enable autonomous flight. In general, there have been three key open questions:

Journal of Field Robotics 30(3), 415–438 (2013) © 2013 Wiley Periodicals, Inc. View this article online at wileyonlinelibrary.com • DOI: 10.1002/rob.21454


Q1: How can we update an online library of reference visual features to enable inclusion/removal of features online to accommodate rapidly changing visual scenes in closed-loop autonomous flight without compromising the integrity of the navigation solution?

Q2: Is it possible to create a sufficiently fast and high-integrity monocular camera aided navigation solution to enable closed-loop autonomous flight, including autonomous landing, of resource-constrained unstable rotorcraft UAVs using off-the-shelf hardware?

Q3: Is it possible to use monocular vision-based navigation for indoor autonomous flight of an unstable M-UAV with fast dynamics and with a controller in the loop?

The main contribution of this paper is to answer the above three open questions. In particular, it contributes to answering Q1 by presenting a monocular vision-aided inertial navigation system for providing accurate position and attitude estimates in a GPS-denied environment that does not require a premapped feature database. Rather, a monocular vision-aided INS is presented in which feature locations are simultaneously estimated and tracked to provide position, velocity, and attitude estimates for real-time closed-loop control of unstable UAV platforms performing real-world missions, including waypoint flight in unmapped environments and landing on an unknown terrain. A Harris corner detector extracts feature points from each captured image, and a Mahalanobis distance is used to match features from frame to frame. The main contribution here is the real-time reference feature management algorithm. In particular, extracted features are stored in an online managed database of feature locations that ranks the features based on how often their predicted locations match the measured location. Features that are not seen are dropped and replaced with new features, and the threshold at which features are dropped is dynamically changed. This enables the UAV to navigate while autonomously flying between waypoints in a completely GPS-denied unmapped dynamic environment without knowing feature locations a priori.

The paper also provides answers to Q2 and Q3 through extensive testing of the presented methods on real-world platforms operating in their intended domains. To stress the vision-aided navigation system in real-world GPS-denied situations, multiple test runs have been performed on two very different platforms in their expected operating environments. The first is a medium-sized UAV (66 kg, see Figure 1) designed to operate in unmapped outdoor environments, and the second is a small M-UAV (1 kg, see Figure 1) designed to operate in cluttered and confined unmapped indoor and outdoor environments. The data presented from an autonomous landing of the 66 kg rotorcraft are, to our knowledge, the first demonstration of GPS-denied autonomous landing with an entirely self-contained vision-based navigation system for a vehicle of this class.

The work presented here is on a path toward enabling a seamless transition of M-UAVs from indoor to outdoor environments without relying on GPS (see Q3). This is a challenging problem, as the lighting and visual feature structures in indoor and outdoor environments vary greatly, and it may be difficult for the integrated hardware or software to rapidly accommodate such changes. To our knowledge, there are currently no UAVs capable of performing such a transition without the aid of GPS. Therefore, as a first step to enable such transitions, the presented system is validated in closed-loop autonomous waypoint-following flight and landing of unstable UAV platforms in both indoor and outdoor GPS-denied environments.

Figure 1. The GTMax helicopter (a) weighs 66 kg and has a rotor diameter of 3 m. For vision-based navigation, a downward-facing camera is mounted on the nose of the vehicle. The EASE (b) weighs 1 kg and has a duct diameter of 25 cm. The vehicle is designed for indoor and outdoor use. The camera is mounted to the rotor axle bearing underneath the vehicle.

1.1. Related Work

Several authors have tackled the problem of navigation and control in GPS-denied environments. Researchers have investigated scanning range sensors for mapping the surrounding environment and used this information for navigation [see, for example, Achtelik et al. (2009a); Achtelik et al. (2009b); Bachrach et al. (2011); Chowdhary et al. (2011a); Chowdhary et al. (2011b); Grzonka and Burgard (2011); Grzonka et al. (2012); Pears (2000) and Shen and Kumar (2011)]. However, these sensors typically rely on the emission and reception of a signal to determine range, which is sometimes undesirable if the vehicle needs to remain undetected. Vision sensors have tremendous potential for localization and mapping applications since they provide data about features such as landmarks, discontinuities (corners), and patterns in the surrounding environment, which can be used to infer information about vehicle motion. Furthermore, a camera is a standard payload on any surveillance and reconnaissance UAV. However, with the exception of a few [notably Achtelik et al. (2011); Ahrens et al. (2009); Blosch et al. (2010); Conte and Doherty (2009); Strelow and Singh (2004); Weiss et al. (2011) and Weiss et al. (2012)], the majority of results presented in these areas have been applied to ground robots, where size and payload considerations are often not a limitation and odometry may be used. These algorithms for extracting information from the two-dimensional (2D) images of an image sensor are often computationally intensive, and therefore they may not be feasible for M-UAV applications. On the other hand, a navigation system that integrates an IMU with a vision system can alleviate the computational burden by allowing frame-to-frame prediction of camera motion, and it can help to resolve scale ambiguity and improve overall accuracy. An integrated vision-aided INS (V-INS) navigation system is passive, and hence not easily prone to jamming or detection. For these reasons, vision-aided INS systems have received a great deal of attention for closed-loop control of UAVs.

The use of vision sensors for simultaneous localization and mapping (SLAM) is a very active research area, and several paradigms have emerged in recent years: keyframe-based SLAM (Klein and Murray, 2007), EKF-SLAM (Davison, 2003; Davison et al., 2007), and particle filter SLAM (Eade and Drummond, 2006). The algorithms presented in these papers each provide a method of estimating the pose of a monocular camera and the location of features in the environment, and they do so in real time. Additional extensions to these paradigms improve robustness and scalability, such as the inverse depth parametrization presented in Civera et al. (2008) and the relative bundle adjustment presented in Sibley et al. (2010). The results in these papers present open-loop data from cameras carried or driven by an operator; in particular, the SLAM-provided state estimates are not used for closed-loop control on vehicles with fast dynamics. Additionally, the focus of Klein and Murray (2007) and Eade and Drummond (2006) was to highlight SLAM techniques for mapping; therefore, they place a priority on the accuracy, correctness, and size of the map. Dille et al. (2009) and Kitt et al. (2011) showed that visual odometry techniques can result in navigation solutions comparable to high-end GPS-INS systems on ground robots, establishing the feasibility of this approach for ground robotics.

1.1.1. Monocular Vision-aided Inertial Navigation for GPS-denied Operation of Aerial Platforms

The problem of UAV vision-aided navigation in aerial robotics, however, requires that the system have different priorities. Aerial vehicles have fast dynamics, are resource-constrained, and rotorcraft/VTOL UAVs are unstable in position and/or attitude. Therefore, closed-loop control requires accurate and fast vehicle state estimation. Furthermore, since cameras are often rigidly mounted to the vehicle, errors in the navigation algorithm output can be exacerbated by the movement of the vehicle in response to the erroneous output, leading to eventual instability and loss of the vehicle. This results in a deep, potentially destabilizing coupling between the navigation and control problems, in which a poor navigation solution can rapidly destabilize the entire system. This coupling makes it desirable to incorporate inertial measurements from the IMU rather than relying on purely camera-based monocular SLAM, such as MonoSLAM (Davison et al., 2007). Additionally, UAVs typically have strict payload and power constraints that limit the computational power available onboard, limiting the class of vision-based navigation techniques that can be practically applied. In the presence of these limitations, and given the current state of available computational power onboard UAVs, the best course of action seems to be to first reliably solve the problem of closed-loop control of UAVs with vision-aided inertial navigation before delving deeper into improving the global accuracy of the SLAM-built map using advanced SLAM techniques. Therefore, the problem of vision-aided inertial navigation for closed-loop control is seen as a special subset of the overall vision-based SLAM problem.

Because of the difficulty of simultaneously mapping features while providing accurate state feedback to the vehicle controller, these steps have been separated in the past. This method of first-mapping-then-localizing is useful for vehicles needing to maintain stability during periodic GPS loss; however, it does not easily generalize to situations in which both of these tasks need to be done simultaneously. Madison et al. presented an Extended Kalman Filter (EKF)-based design in which the vehicle states were estimated along with the inertial locations of features in 3D space (Madison and Bottkol, 2007). They presented a method in which a vehicle with a traditional GPS-aided INS estimates the locations of features in flight when GPS is active, and then uses these features to aid the INS in the absence of GPS. Features were selected and tracked using a Lucas-Kanade feature tracker. Simulation results for a vehicle that momentarily loses GPS were provided.

More flexible navigation systems have been developed using SLAM techniques. Langelaan used an unscented Kalman filter for simultaneously estimating vehicle states as well as feature locations in inertial space (Langelaan, 2007). A Mahalanobis norm was used as a statistical correspondence for data association from frame to frame for each estimated feature. New feature locations were initialized using a projection onto the ground plane. Simulation results for a UAV navigating through a 2D environment were presented; however, no flight test results were reported.

Fowers et al. implemented a Harris feature detector along with a random sample consensus (RANSAC) algorithm for the correspondence of features on an FPGA processor (Fowers et al., 2008). Their architecture provided drift corrections to an INS for stabilizing a quad-rotor vehicle for short-term hover in an indoor flight environment. By assuming that the vehicle remains relatively level, the RANSAC algorithm was used to provide estimates of the translation, yaw rotation, and change in scale. RANSAC is a model-fitting algorithm in which a collection of points is fitted against a model, and the points are sorted into groups of inliers and outliers. These estimated values were then used to provide IMU-drift correction measurements.

Ahrens et al. (2009) presented a vision-aided navigation system for drift-free navigation in indoor environments. They combined emulated inertial measurements (using a motion capture system) with monocular vision sensors. Flight-test results in an indoor environment were presented.

Recently, navigation in the presence of intermittent GPS signals was explored by Scherer et al. (2012). This work uses a stereo camera sensor along with a laser range finder and intermittent GPS to map riverines in the presence of obstacles such as trees, bridges, and embankments. The results are presented from a surrogate data collection system mounted on a small boat, and they reveal an impressive ability to navigate in adverse conditions. While their goals are similar to those presented in this paper, our work is distinguished by our use of a monocular camera sensor, which is a more common payload and can be more practical for lightweight vehicles. Furthermore, the flight-test results of the stereo system, presented by Haines (2011), do not demonstrate the extended flight distances and durations demonstrated in our paper.

Wu et al. developed a method for fusing feature information from a monocular vision sensor in an extended Kalman filter framework (Wu and Johnson, 2010; Wu et al., 2012). The approach in those papers relied on tracking features whose locations were estimated when GPS was active. When GPS was inactive, they demonstrated through flight-test results that this method ensures bounded hover of a rotorcraft UAV. However, this method could not sustain waypoint-based flight without a priori knowledge of feature locations, and it could not estimate the position of new feature points without GPS. The key missing element was an algorithm for safely selecting and removing features in an online feature-reference database. This is remedied here by creating a real-time reference feature management algorithm that ranks the features based on how often their predicted locations match the measured location. Features that are not seen consistently over a threshold number of images are dropped and replaced with new features. This enables the UAV to navigate while autonomously flying between waypoints in a completely GPS-denied unmapped dynamic environment without knowing feature locations a priori.

Weiss et al. (2011) described an impressive monocular SLAM implementation and LQR controller. The SLAM algorithm used is based on the parallel tracking and mapping (PTAM) algorithm (Klein and Murray, 2007), a keyframe-based method using periodic global map optimization and localization from the resulting map. The results presented show that the vehicle is capable of waypoint-following flight and accurate position hold. These results are expanded in Achtelik et al. (2011) and Weiss et al. (2012). Several important modifications are made to the original PTAM algorithm, including discarding distant keyframes to allow the vehicle to cover greater distances and the implementation of a speed estimation system for minimizing drift during map initialization. The resulting waypoint-following results demonstrate a highly capable autonomous system able to operate in indoor and outdoor environments. However, it is reported in Weiss et al. (2011) that the system results in overshoots during waypoint-following flight. Furthermore, it was reported that the algorithm could overwhelm available computational resources in the bundle adjustment phase, resulting in limited computational power available for other onboard activities in resource-constrained environments onboard UAVs. In addition, their approach results in a slow drift in attitude, which can be a concern for long-duration flights. Hence, while the results in this work are impressive, this approach does not conclusively provide a comprehensive solution to the vision-aided navigation and control problem.

The V-INS system proposed in this paper differs from that of Weiss and collaborators (Achtelik et al., 2011; Weiss et al., 2011, 2012) in that a fundamentally different SLAM approach is used: we use EKF-SLAM. The benefits of EKF-SLAM are that it makes it relatively straightforward to integrate measurements from inertial sensors, leading to our V-INS architecture, and that it fits well with commonly used aerospace navigation and filtering frameworks, which tend to be INS-based (Kayton and Fried, 1997). There is an ongoing debate in the SLAM community about the superiority of EKF-SLAM versus PTAM and keyframe-based algorithms. Attempts have been made to resolve which approach is better (Strasdat et al., 2010), but the verdict is far from conclusive (IMU integration is not considered, and filtering is acknowledged as a possible option for low-computational-power systems, of which our system is an example). We believe that this successful demonstration of an EKF filtering-based approach will constructively contribute to this debate, especially considering how few successful implementations of vision-aided SLAM navigation there are for rotorcraft UAVs (Kendoul, 2012). Furthermore, we feel that a combination of keyframe-based techniques and EKF-based multisensor fusion implementations, such as that used in this paper, can potentially result in even more powerful algorithms as available onboard computational power increases.

1.1.2. UAV Vision-aided Autonomous Landing in GPS-denied Environments

Several previous works have tackled the challenge of autonomous landing of a UAV in unknown GPS-denied environments. Leading techniques for autonomous landing site selection and landing, such as in Chamberlain et al. (2011), rely on using GPS-INS navigation solutions and scanning range sensors for landing site selection. Frietsch et al. presented an application of vision to the hovering and landing of a quad-rotor UAV (Frietsch and Trommer, 2008). In that paper, it was assumed that the camera's motion was dominated by rotation and that the scene was planar. Simulation results of the method were provided. Autonomous landing of rotorcraft in a GPS-denied environment was achieved by Merz et al. (2004). The system required the use of a specially marked landing pad, and a pan-tilt camera was used to maintain a view of the pad over long distances.

A related application for vision-aided navigation is that of spacecraft descent and landing. Mourikis et al. presented an application of vision-aided navigation to the problem of planetary landing (Mourikis et al., 2009). Images from a downward-facing camera were processed with a Harris corner detector to generate trackable features. These features were then matched to features with known 3D positions, generated from a topographical map. In addition, features were matched frame-to-frame to generate velocity measurements, which extended the system to handle lower altitudes than was possible with map-matching alone. The position and velocity measurements were integrated with IMU data in an EKF framework. The system was validated with image sequences and IMU data from a sounding rocket, which were post-processed to estimate the vehicle location during descent.

Theodore et al. (2006) presented a flight-tested vision-aided autonomous landing system for landing at unprepared sites. Their work probably comes closest to what has been achieved here, and it highlights the key difficulty in performing vision-aided landing: ensuring a consistent vision-aided navigation solution from flight altitude [e.g., 50 m above ground level (AGL)] to the ground (0 m AGL). However, they relied on GPS until reaching an altitude of 12 m AGL, after which monocular SLAM-based pseudo-GPS was used until 2 m AGL, from where the landing was performed using only inertial measurements until the weight-on-wheels switch was triggered. In contrast to their work, the vision-aided navigation system presented in our work overcomes the difficulties in using a vision-aided solution in closed-loop control to perform landings from hover altitude (50 m AGL) to landing (0 m AGL).

1.2. Contributions and Overview of Our Work

In summary, most of the works discussed above were limited in that (1) vision-aided navigation was not used in GPS-denied closed-loop waypoint-following flight and/or GPS-denied landing, (2) they were validated through simulation only, or (3) the results were implemented on ground vehicles, which have slower and stable dynamics, and any closed-loop flight tests were conducted in controlled environments. This paper contributes to the state of the art in vision-aided GPS-denied navigation by lifting these limitations and providing answers to the three open questions posed in Section 1. An overview of the main contributions is as follows:

• A monocular V-INS for GPS-denied navigation and real-time closed-loop control of unstable UAVs in both outdoor and indoor environments is presented.

• A novel feature database management algorithm is presented, which ranks features based on a confidence index and removes features if they fall below a dynamic threshold related to the number of feature correspondences.

• Flight-test results are presented in which the navigation solution from the presented V-INS is used for real-time, long-term closed-loop control of unstable, resource-constrained rotorcraft platforms in real-world situations. The algorithms are validated on two classes of rotorcraft UAVs in closed-loop control: a 66 kg helicopter UAV operating outdoors and a 1 kg small ducted-fan UAV operating in cluttered unmapped environments.

We begin by describing a monocular camera model in Section 2. The reference frames utilized are discussed in Section 3. In Section 4, some preliminary details of the EKF-based implementation are provided. In Section 5, the V-INS algorithm is presented along with the feature location database management algorithm. Flight-test results on an outdoor 66 kg rotorcraft UAV and an indoor M-UAV are presented in Section 6. The paper is concluded in Section 7.


Figure 2. Camera perspective projection model used for relating the 3D position to the position in 2D images. The point (A, B, C) is projected onto the camera image plane to the point (b, c).

2. PINHOLE CAMERA MODEL

A perspective projection model of a pinhole camera allows the position in a 2D camera image to be inferred from the 3D position as shown in Figure 2 (Wu and Johnson, 2008). The model projects an arbitrary point (A, B, C) to a pixel point (b, c) on the image plane (the camera image) according to b = fB/A and c = fC/A, where f is the focal length of the camera. However, in reality, the focal lengths in the horizontal and vertical directions may be different. So these expressions will be rewritten as

b = f_x \frac{B}{A},    (1)

c = f_y \frac{C}{A},    (2)

where f_x and f_y are the focal lengths in the horizontal and vertical directions, respectively. These focal lengths can be computed from knowledge of the width and height of the camera image plane (w and h) and from the horizontal and vertical fields of view (γ_x and γ_y) according to

f_x = \frac{w}{2\tan\left(\frac{\gamma_x}{2}\right)}, \qquad f_y = \frac{h}{2\tan\left(\frac{\gamma_y}{2}\right)}.    (3)

Additionally, a camera calibration is performed to remove the effects of distortion.

3. REFERENCE FRAMES

In the following, two standard aerospace right-handed coordinate systems are defined: the inertial frame (denoted by the superscript i) is a local frame that is aligned to the local north, east, and down directions, and the body frame (denoted by the superscript b) is aligned such that the x axis points forward and the z axis points downwards (Kayton and Fried, 1997). Let q = [q_1, q_2, q_3, q_4]^T denote the rotation quaternion, where q_1 defines the rotation angle and [q_2, q_3, q_4] represents the axis of rotation. Then the orthonormal rotation matrix operator L_{i\to b} transports vectors from the inertial to the body frame:

L_{i\to b} = \begin{bmatrix}
q_1^2 + q_2^2 - q_3^2 - q_4^2 & 2(q_2 q_3 + q_1 q_4) & 2(q_2 q_4 - q_1 q_3) \\
2(q_2 q_3 - q_1 q_4) & q_1^2 - q_2^2 + q_3^2 - q_4^2 & 2(q_3 q_4 + q_1 q_2) \\
2(q_2 q_4 + q_1 q_3) & 2(q_3 q_4 - q_1 q_2) & q_1^2 - q_2^2 - q_3^2 + q_4^2
\end{bmatrix}.    (4)

It follows that the orthonormal rotation matrix for transporting vectors from the body to the inertial frame is given by L_{b\to i} = L_{i\to b}^T.

The camera reference frame (denoted by superscript c) has its origin at the camera's principal point with the x^c axis along the camera's optical axis and the z^c axis pointing downward. The vector components can be transformed from the body frame to the camera frame by defining the pan (ψ_c), tilt (θ_c), and roll (φ_c) angles, and using the rotation matrix:

L_{b\to c} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\phi_c & \sin\phi_c \\ 0 & -\sin\phi_c & \cos\phi_c \end{bmatrix}
\begin{bmatrix} \cos\theta_c & 0 & -\sin\theta_c \\ 0 & 1 & 0 \\ \sin\theta_c & 0 & \cos\theta_c \end{bmatrix}
\begin{bmatrix} \cos\psi_c & \sin\psi_c & 0 \\ -\sin\psi_c & \cos\psi_c & 0 \\ 0 & 0 & 1 \end{bmatrix}.    (5)

It is usually the case that the camera is not located at the origin of the body reference frame. In this case, an appropriate translation needs to be applied.
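As a worked illustration of Eqs. (4) and (5), the sketch below (hypothetical helper functions, assuming the scalar-first quaternion convention defined above and a particular sign convention for the tilt angle) builds the two rotation matrices and chains them to carry a vector from the inertial frame into the camera frame.

```python
import numpy as np

def L_i2b(q):
    """Inertial-to-body rotation matrix from a unit quaternion q = [q1, q2, q3, q4], Eq. (4)."""
    q1, q2, q3, q4 = q
    return np.array([
        [q1*q1 + q2*q2 - q3*q3 - q4*q4, 2*(q2*q3 + q1*q4),             2*(q2*q4 - q1*q3)],
        [2*(q2*q3 - q1*q4),             q1*q1 - q2*q2 + q3*q3 - q4*q4, 2*(q3*q4 + q1*q2)],
        [2*(q2*q4 + q1*q3),             2*(q3*q4 - q1*q2),             q1*q1 - q2*q2 - q3*q3 + q4*q4],
    ])

def L_b2c(pan, tilt, roll):
    """Body-to-camera rotation from pan, tilt, and roll angles (radians), Eq. (5)."""
    cps, sps = np.cos(pan), np.sin(pan)
    cth, sth = np.cos(tilt), np.sin(tilt)
    cph, sph = np.cos(roll), np.sin(roll)
    R_roll = np.array([[1, 0, 0], [0, cph, sph], [0, -sph, cph]])
    R_tilt = np.array([[cth, 0, -sth], [0, 1, 0], [sth, 0, cth]])
    R_pan  = np.array([[cps, sps, 0], [-sps, cps, 0], [0, 0, 1]])
    return R_roll @ R_tilt @ R_pan

# Example: level vehicle, camera tilted -90 deg so that the body z (down) axis maps onto
# the camera optical axis under this sign convention (an assumption of this sketch).
q = np.array([1.0, 0.0, 0.0, 0.0])          # identity attitude
r_i = np.array([0.0, 0.0, 10.0])             # a point 10 m below the vehicle, inertial frame
r_c = L_b2c(0.0, -np.pi / 2, 0.0) @ L_i2b(q) @ r_i
print(r_c)                                   # the point lies 10 m along the optical axis
```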

4. THE EXTENDED KALMAN FILTER

The EKF is a widely used Bayesian filtering tool for obtaining suboptimal solutions to nonlinear filtering problems [see, for example, Gelb (1974) and Kayton and Fried (1997)]. A brief review of the EKF is provided here for completeness. The EKF employs instantaneous linearization at each time step to approximate the nonlinearities; the propagation of the error covariance matrix is assumed to be linear. The architecture presented here is of the mixed continuous-discrete type. Let x denote the state of the system, u denote an input to the system, y denote the output, z denote the sampled output, w denote additive process noise, v denote additive measurement noise, and consider the following generic representation of the system dynamics:

\dot{x}(t) = f(x(t), u(t)) + F w(t), \qquad x(t_0) = x_0,    (6)

y(t) = g(x(t), u(t)),    (7)

z(k) = y(k) + G v(k).    (8)


In the above equation, k denotes the discrete sampling time when a measurement is available. Note that different sensors may update at different rates. Furthermore, it is assumed that f and g are differentiable functions. The matrices F and G determine the variance of the process and measurement noise, which are assumed to be Gaussian. The EKF algorithm consists of a prediction and a correction step. In the prediction step, the nonlinear process model is integrated forward from the last state estimate \hat{x}(k-1) to yield the predicted state \bar{x}(k) until a measurement is available at time k. The error covariance matrix P is also propagated to yield the predicted error covariance matrix \bar{P} by linearizing the process model using the Jacobian matrix A = \left[\frac{\partial f}{\partial x}\right]_{x=\hat{x}(k-1)} around the instantaneous state estimate from the previous step \hat{x}(k-1). This results in the following approximation for the error covariance, which is also integrated forward from P(k-1) to yield \bar{P}(k):

\dot{\bar{P}} \approx A\bar{P} + \bar{P}A^T + Q,    (9)

where Q = FF^T. The EKF correction step occurs when a sensor measurement has been received. The expected system output \bar{y}(k) is predicted using the predicted state estimate \bar{x}(k) and the nonlinear measurement model g(x, u). The measurement model Jacobian matrix C(k) = \left[\frac{\partial g}{\partial x}\right]_{x=\bar{x}(k)} is computed around the predicted state \bar{x}(k) and used to yield the following correction equations:

K(k) = \bar{P}(k) C^T(k) \left[C(k)\bar{P}(k)C^T(k) + R\right]^{-1},    (10)

\hat{x}(k) = \bar{x}(k) + K(k)\left[z(k) - \bar{y}(k)\right],    (11)

P(k) = \left[I - K(k)C(k)\right]\bar{P}(k).    (12)

In the above equations, R = GG^T; it is assumed that G is such that R is positive-definite (Gelb et al., 1974).
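A minimal sketch of this mixed continuous-discrete EKF cycle is given below; the model functions and Jacobians are user-supplied placeholders, and a simple Euler integrator stands in for whatever integration scheme the onboard implementation uses.

```python
import numpy as np

def ekf_predict(x, P, f, A_jac, Q, dt):
    """Propagate state and covariance between measurements (Euler step of Eqs. 6 and 9)."""
    A = A_jac(x)
    x_pred = x + f(x) * dt
    P_pred = P + (A @ P + P @ A.T + Q) * dt
    return x_pred, P_pred

def ekf_correct(x_pred, P_pred, z, g, C_jac, R):
    """Measurement update, Eqs. (10)-(12)."""
    C = C_jac(x_pred)
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    x = x_pred + K @ (z - g(x_pred))
    P = (np.eye(len(x_pred)) - K @ C) @ P_pred
    return x, P

# Toy example: constant-velocity motion along one axis with a position measurement.
f = lambda x: np.array([x[1], 0.0])                  # [pos_dot, vel_dot]
A_jac = lambda x: np.array([[0.0, 1.0], [0.0, 0.0]])
g = lambda x: np.array([x[0]])
C_jac = lambda x: np.array([[1.0, 0.0]])

x, P = np.zeros(2), np.eye(2)
Q, R = 0.01 * np.eye(2), np.array([[0.1]])
for z in [np.array([0.1]), np.array([0.22]), np.array([0.35])]:
    for _ in range(10):                               # 10 prediction steps between measurements
        x, P = ekf_predict(x, P, f, A_jac, Q, 0.01)
    x, P = ekf_correct(x, P, z, g, C_jac, R)
print(x)
```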

5. VISION-AIDED INERTIAL NAVIGATION ALGORITHM

In this section, we describe the vision-aided inertial navigation algorithm presented in Algorithm 1 and Figure 3. The algorithm simultaneously estimates the location of visual feature points and the state of the vehicle. The idea is to use a monocular camera to detect feature points using a Harris corner detection algorithm, predict frame-to-frame transitions of the feature points, use the estimated feature point locations to populate and update online a database of feature point locations, and track the movement of the feature points in the image plane to bound the IMU drift, estimate the IMU bias, and correct for initial misalignment errors. Therefore, the navigation algorithm fuses measurements from inertial sensors (accelerometers and gyroscopes), a monocular downward-facing camera, and an altitude sensor (such as a sonar altimeter) to form an accurate estimate of the vehicle's state to enable stable hover and waypoint-following flight.

Figure 3. A block diagram describing the navigation system. Pixel locations of features correspond to predicted locations of database points. The resulting residuals are fused with inertial data and sonar altimeter data to estimate the vehicle state, feature locations, and associated covariance matrices. Uncorresponded features can be used to update the database as current points leave the field of view. The resulting architecture simultaneously estimates feature point locations and vehicle states without requiring the vehicle to maintain view of particular landmarks.

For this purpose, it is necessary to estimate the following states: the vehicle's position {}^i p = [x, y, z]^T, velocity {}^i v = [v_x, v_y, v_z]^T, attitude quaternion q = [q_1, q_2, q_3, q_4]^T, and the IMU acceleration and gyroscope biases b_a, b_g ∈ ℝ^3. It is beneficial to use a minimal representation of the attitude quaternion for describing the orientation of the aircraft (Lefferts et al., 1982). This is achieved by defining an error quaternion as δq ≈ [1, s]^T such that δq ⊗ \hat{q} = q, where ⊗ is the quaternion product operator. The error quaternion is propagated during EKF updates, and it is used to correct the attitude estimates after each measurement update. Note that in practice the error quaternion takes on very small values; therefore, the update equation for the error quaternion can be approximated by \dot{s} = 0. The vector s ∈ ℝ^3 should be tracked as the minimal representation of the attitude error. The accelerometer measurements {}^b a = [a_x, a_y, a_z]^T and the gyroscope measurements {}^b ω = [p, q, r]^T from the IMU are modeled to account for sensor biases as follows: {}^b a = a_{raw} - b_a and {}^b ω = ω_{raw} - b_g. The navigation filter is designed to estimate the biases b_a ∈ ℝ^3 and b_g ∈ ℝ^3 of the accelerometer and gyroscope, respectively. This is important, as measurements from low-cost off-the-shelf IMUs are often subject to an unknown and variable bias.

Along with the vehicle states, the filter also needs to simultaneously estimate the position of the jth feature point in the inertial frame, {}^i p_{f_j} = [x_{f_j}, y_{f_j}, z_{f_j}]^T. The tracked and corresponded feature points are stored in a database that contains a maximum of N slots.

In combination, the state vector to be estimated by the navigation filter is \hat{x}(t) ∈ ℝ^{15+3N} and is given as

\hat{x}(t) = [\hat{s}\;\; \hat{p}\;\; \hat{v}\;\; \hat{b}_a\;\; \hat{b}_g\;\; \hat{p}_f]^T,    (13)

where a caret denotes estimated variables. Next, the process models and measurement models are described.
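As a concrete (and purely illustrative) way to organize Eq. (13) in software, the index map below packs the 15 vehicle states and the 3N feature states into one vector; the slice names and the value of N are assumptions of this sketch, not the authors' code.

```python
import numpy as np

N = 12                     # maximum number of database feature points (example value)

# Index slices into the filter state vector of Eq. (13): [s, p, v, b_a, b_g, p_f].
S   = slice(0, 3)          # attitude error (minimal representation)
POS = slice(3, 6)          # inertial position
VEL = slice(6, 9)          # inertial velocity
BA  = slice(9, 12)         # accelerometer bias
BG  = slice(12, 15)        # gyroscope bias

def feat(j):
    """Slice for the inertial position of feature j (0-based)."""
    return slice(15 + 3 * j, 15 + 3 * (j + 1))

x = np.zeros(15 + 3 * N)
P = np.eye(15 + 3 * N)
x[feat(0)] = [5.0, -2.0, 0.0]     # example: first feature 5 m north, 2 m west, on the ground
print(x.shape, x[feat(0)])
```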

5.1. Navigation Algorithm: Process Model

The following nonlinear process model is used for predicting state estimates:

\dot{\hat{b}}_a = 0, \quad \dot{\hat{b}}_g = 0, \quad \dot{\hat{s}} = 0, \quad {}^i\dot{\hat{p}}_{f_j} = 0, \;\; j = 1, \ldots, N,    (14)

{}^i\dot{\hat{p}} = \hat{v}, \qquad {}^i\dot{\hat{v}} = L_{b\to i}\left({}^b a_{raw} - \hat{b}_a\right).    (15)

The following equation is used to predict the quaternion estimate using bias-corrected measurements from the gyroscopes:

\dot{\hat{q}} = \frac{1}{2}\begin{bmatrix} 0 & -p & -q & -r \\ p & 0 & r & -q \\ q & -r & 0 & p \\ r & q & -p & 0 \end{bmatrix}\hat{q}.    (16)

The error covariance P is propagated using the following equation:

\dot{P} = AP + PA^T + Q,    (17)

where A is the Jacobian of the process model with respect to the states, and Q is the process covariance matrix. Appropriately initializing the covariance matrix is important; this topic has been dealt with in detail in Wu et al. (2011). The following closed-form equations are the nonzero components of the Jacobian matrix A. These well-known equations are repeated here to facilitate real-time implementation:

\frac{\partial\, {}^i\dot{p}}{\partial\, {}^i v} = I_{3\times 3}, \quad
\frac{\partial\, {}^i\dot{v}}{\partial s} = -\widetilde{L_{b\to i}\, {}^b a}, \quad
\frac{\partial\, {}^i\dot{v}}{\partial b_a} = -L_{b\to i}, \quad
\frac{\partial \dot{s}}{\partial s} = -\widetilde{{}^b\omega}, \quad
\frac{\partial \dot{s}}{\partial b_g} = -I_{3\times 3},    (18)

where the tilde symbol denotes a skew-symmetric matrix composed of the vector components such that for the vector a = [a_1\; a_2\; a_3], \tilde{a} is defined by

\tilde{a} = \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix}.    (19)

5.2. Navigation Algorithm: Measurement Model

The measurement model g(x) here is used to predict the expected measurement using the states propagated through the process model. In the following, the relative position of the feature point j in the inertial frame will be given as {}^i r_j = {}^i p_{f_j} - {}^i p. The relative vector r_j will be denoted in the camera frame as {}^c r_j = [X_{f_j c}, Y_{f_j c}, Z_{f_j c}]^T, in the body frame as {}^b r_j = [X_{f_j b}, Y_{f_j b}, Z_{f_j b}]^T, and in the inertial frame as {}^i r_j = [X_{f_j i}, Y_{f_j i}, Z_{f_j i}]^T.

For each feature point, the expected measurement is computed as z = [X, Y]^T, where from the relations given in Eqs. (1) and (2) we have

X_j = f_x \frac{Y_{f_j c}}{X_{f_j c}}, \qquad Y_j = f_y \frac{Z_{f_j c}}{X_{f_j c}}.    (20)

This nonlinear measurement model needs to be linearized at every time step. This is achieved by computing the Jacobian of the measurement model using the following closed-form equations (given here explicitly to facilitate real-time implementation):

\frac{\partial z}{\partial\, {}^i p} = \left(\frac{\partial z}{\partial\, {}^c r}\right)\left(\frac{\partial\, {}^c r}{\partial\, {}^i p}\right),    (21)

\frac{\partial z}{\partial\, {}^c r} = \begin{bmatrix} \frac{\partial X}{\partial X_{f_j c}} & \frac{\partial X}{\partial Y_{f_j c}} & \frac{\partial X}{\partial Z_{f_j c}} \\ \frac{\partial Y}{\partial X_{f_j c}} & \frac{\partial Y}{\partial Y_{f_j c}} & \frac{\partial Y}{\partial Z_{f_j c}} \end{bmatrix},    (22)

\frac{\partial X}{\partial X_{f_j c}} = -f_x \frac{Y_{f_j c}}{X_{f_j c}^2} = -\frac{X}{X_{f_j c}}, \quad \frac{\partial X}{\partial Y_{f_j c}} = \frac{f_x}{X_{f_j c}}, \quad \frac{\partial X}{\partial Z_{f_j c}} = 0,    (23)

\frac{\partial Y}{\partial X_{f_j c}} = -f_y \frac{Z_{f_j c}}{X_{f_j c}^2} = -\frac{Y}{X_{f_j c}}, \quad \frac{\partial Y}{\partial Y_{f_j c}} = 0, \quad \frac{\partial Y}{\partial Z_{f_j c}} = \frac{f_y}{X_{f_j c}}.    (24)

Inserting Eqs. (23) and (24) into Eq. (22) gives

\frac{\partial z}{\partial\, {}^c r} = \frac{1}{X_{f_j c}} \begin{bmatrix} -X & f_x & 0 \\ -Y & 0 & f_y \end{bmatrix}.    (25)

The derivative of the camera-frame relative location of the feature point with respect to the vehicle position is given by

\frac{\partial\, {}^c r}{\partial\, {}^i p} = \frac{\partial\left({}^c p_{f_j} - {}^c p\right)}{\partial\, {}^i p} = \frac{\partial\, {}^c p_{f_j}}{\partial\, {}^i p} - \frac{\partial\, {}^c p}{\partial\, {}^i p} = 0 - \frac{\partial\left(L_{i\to c}\, {}^i p\right)}{\partial\, {}^i p} = -L_{i\to c}.    (26)

The partial derivatives of the measurement vector with respect to the attitude quaternions are similarly computed as

\frac{\partial z}{\partial R} = \left(\frac{\partial z}{\partial\, {}^c r}\right)\left(\frac{\partial\, {}^c r}{\partial R}\right) = \left(\frac{\partial z}{\partial\, {}^c r}\right)\frac{\partial\left(L_{b\to c}\, {}^b r\right)}{\partial R} = \left(\frac{\partial z}{\partial\, {}^c r}\right) L_{b\to c} \left(\frac{\partial\, {}^b r}{\partial R}\right),    (27)

where \frac{\partial\, {}^b r}{\partial R} = \widetilde{{}^b r}.

The partial derivatives with respect to the feature point locations are calculated as

\frac{\partial z}{\partial\, {}^i p_{f_j}} = \left(\frac{\partial z}{\partial\, {}^c r}\right)\left(\frac{\partial\, {}^c r}{\partial\, {}^i p_{f_j}}\right),    (28)

\frac{\partial z}{\partial\, {}^c r} = \frac{1}{X_{f_j c}} \begin{bmatrix} -X & f_x & 0 \\ -Y & 0 & f_y \end{bmatrix},    (29)

\frac{\partial\, {}^c r}{\partial\, {}^i p_{f_j}} = \frac{\partial\left({}^c p_{f_j} - {}^c p\right)}{\partial\, {}^i p_{f_j}} = \frac{\partial\, {}^c p_{f_j}}{\partial\, {}^i p_{f_j}} - \frac{\partial\, {}^c p}{\partial\, {}^i p_{f_j}} = \frac{\partial\left(L_{i\to c}\, {}^i p_{f_j}\right)}{\partial\, {}^i p_{f_j}} - 0 = L_{i\to c}.    (30)

The nonlinear and linearized measurement models are used in the Kalman filter update equations presented in Section 4. Note that the feature point update is performed sequentially, which drastically reduces computational load by requiring N inversions of 2 × 2 matrices, rather than a single inversion of a 2N × 2N matrix, and assumes statistical independence of the feature points.
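A sketch of this sequential, per-feature update is shown below (illustrative Python; the per-feature measurement functions and 2 × n Jacobians are placeholders standing in for Eqs. (20)-(30)). Because each feature contributes a 2-vector residual that is fused on its own, only 2 × 2 innovation matrices are inverted.

```python
import numpy as np

def sequential_vision_update(x, P, measurements, h_funcs, H_funcs, R_feat):
    """Fuse each corresponded feature measurement one at a time (N inversions of 2x2).

    measurements: list of 2-vectors (pixel X, Y); h_funcs/H_funcs: per-feature
    measurement model and 2xn Jacobian built from Eqs. (20)-(30); R_feat: 2x2 noise."""
    for z, h, H_of in zip(measurements, h_funcs, H_funcs):
        H = H_of(x)                                   # 2 x n Jacobian for this feature
        S = H @ P @ H.T + R_feat                      # 2 x 2 innovation covariance
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - h(x))
        P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Toy usage with a 2-state vector and one feature observed directly:
x0, P0 = np.zeros(2), np.eye(2)
h = lambda x: x.copy()
H = lambda x: np.eye(2)
x1, P1 = sequential_vision_update(x0, P0, [np.array([1.0, -0.5])], [h], [H], 0.01 * np.eye(2))
print(x1)
```

Treating the features one at a time in this way keeps the per-frame cost roughly linear in the number of database points, at the price of the stated independence assumption.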

5.3. Feature Point Detection and Tracking

In the literature, there are many methods for the detection of suitable features in an image and for tracking those features across multiple frames. Examples of combined feature detectors and trackers are the Kanade-Lucas-Tomasi (KLT) tracker (Tomasi, 1991) and Lowe's Scale-Invariant Feature Transform (SIFT) (Lowe, 1999). Additionally, algorithms are available for feature detection alone, such as the Harris and Stephens combined corner and edge detector (Harris and Stephens, 1988). Of course, feature tracking must then be performed by some other means.

The system presented here uses a Harris and Stephens corner detector for feature point detection, and tracking is performed by corresponding new measurements to the database of stored points using a Mahalanobis distance. The advantages of this method are its computational efficiency and its natural relationship to the statistical information available through the use of the Kalman filter.

The Harris and Stephens corner detector uses a local second-order moment matrix of the intensity gradients to determine the directions of large intensity gradients. Large gradients in two directions indicate a corner, while small gradients indicate a region of uniform intensity. Let I(u, v) be the intensity of the image at pixel location (u, v). The image gradients in the x and y directions at (u, v) are given by

I_x(u, v) = \frac{I(u+1, v) - I(u-1, v)}{2},    (31)

I_y(u, v) = \frac{I(u, v+1) - I(u, v-1)}{2}.    (32)

The moment matrix for pixel location (X, Y) is calculated as

M(X, Y) = \sum_{(u,v)\in W} w(u, v, X, Y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}(u, v),    (33)

where w(u, v, X, Y) is a Gaussian kernel around image point (X, Y) given by

w(u, v, X, Y) = \exp\left(-\frac{(u - X)^2 + (v - Y)^2}{2\sigma^2}\right)    (34)

and W defines the window around (X, Y) over which the summation occurs. In this system, the summation occurred over a 3 × 3 window.

The corner score is calculated from this matrix,

M_c = \det(M) - \kappa\,[\mathrm{Tr}(M)]^2    (35)
    = \alpha\beta - \kappa\,[\alpha + \beta]^2,    (36)

where α and β are the eigenvalues of M. Therefore, the corner detector gives high scores for locations with both eigenvalues large without performing an explicit eigenvalue decomposition.
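The following NumPy sketch re-implements Eqs. (31)-(36) for illustration; it is not the onboard detector, and the example image, the 3 × 3 windowing details, and the value of κ are assumptions of the sketch.

```python
import numpy as np

def harris_scores(img, kappa=0.04, sigma=1.0):
    """Harris/Stephens corner score M_c = det(M) - kappa * Tr(M)^2 at every pixel.
    kappa is an empirical constant (a typical value; the paper does not fix it here)."""
    I = img.astype(float)
    Ix = np.zeros_like(I)
    Iy = np.zeros_like(I)
    Ix[:, 1:-1] = (I[:, 2:] - I[:, :-2]) / 2.0        # central differences, Eqs. (31)-(32)
    Iy[1:-1, :] = (I[2:, :] - I[:-2, :]) / 2.0

    # Gaussian weights over a 3x3 window, Eq. (34)
    offs = [-1, 0, 1]
    w = np.array([[np.exp(-(du*du + dv*dv) / (2.0*sigma*sigma)) for du in offs] for dv in offs])

    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    Mxx = np.zeros_like(I); Myy = np.zeros_like(I); Mxy = np.zeros_like(I)
    for i, dv in enumerate(offs):                      # accumulate the moment matrix, Eq. (33)
        for j, du in enumerate(offs):
            shift = lambda A: np.roll(np.roll(A, -dv, axis=0), -du, axis=1)
            Mxx += w[i, j] * shift(Ixx)
            Myy += w[i, j] * shift(Iyy)
            Mxy += w[i, j] * shift(Ixy)
    det = Mxx * Myy - Mxy * Mxy
    tr = Mxx + Myy
    return det - kappa * tr * tr                       # Eqs. (35)-(36)

# Example: locate the strongest corner response of a synthetic white square.
img = np.zeros((32, 32)); img[8:24, 8:24] = 1.0
print(np.unravel_index(np.argmax(harris_scores(img)), img.shape))
```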

During feature extraction, the image is divided into bins and a maximum number of features per bin is set to ensure that features are spread over the image. In addition, a minimum distance between features is required. The extracted Harris corners are sorted by corner score before being used in the navigation algorithm.

Feature tracking is performed by corresponding measured features to the predicted location of features in the database. The measurement is considered to correspond to a stored point if its Mahalanobis distance to the stored point is minimal with respect to all other pairings and is below a threshold. The Mahalanobis distance is a Euclidean distance weighted by the covariance of the predicted measurement. The Mahalanobis distance between measurement z_i and database point p_{f_j} is given by

Z_{ij} = e_{ij}^T \left(C_j P C_j^T + R\right)^{-1} e_{ij},    (37)

e_{ij} \triangleq z_i - g_j(\hat{x}),    (38)

where g_j(\hat{x}) is the nonlinear measurement model for p_{f_j}, C_j is the linearized measurement matrix for point p_{f_j}, P is the state covariance matrix, and R is the measurement noise covariance.

If a measurement z_i does not meet the maximum Mahalanobis distance threshold for any database point, it is considered to be a measurement of no currently stored point and is uncorresponded. Uncorresponded points can be used to initialize new database points, as described in the following section.
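A sketch of this correspondence test, Eqs. (37)-(38), is given below; the gate value and the one-to-one matching policy are assumptions made for illustration rather than details taken from the paper, and the per-feature measurement functions are the same placeholders used in the sequential-update sketch above.

```python
import numpy as np

def correspond(z_meas, x, P, h_funcs, H_funcs, R, gate=9.21):
    """Assign each measured feature to the database point with the smallest
    Mahalanobis distance, Eqs. (37)-(38), provided it is below the gate threshold.

    gate: example chi-square-like threshold on the squared distance (2 d.o.f., ~99%).
    Returns a dict {measurement index: database index}; unmatched measurements are
    left out and can seed new database points."""
    pairs = {}
    for i, z in enumerate(z_meas):
        best_j, best_d = None, gate
        for j, (h, H_of) in enumerate(zip(h_funcs, H_funcs)):
            if j in pairs.values():            # each database point matched at most once
                continue
            H = H_of(x)
            e = z - h(x)
            S = H @ P @ H.T + R
            d = float(e @ np.linalg.solve(S, e))
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            pairs[i] = best_j
    return pairs
```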

5.4. Adding and Deleting Elements of the Feature Point Location Database

Typical operational scenarios for UAVs, such as aggressive maneuvers and following position commands, make it impossible for a fixed camera to maintain the view of a fixed set of points. Therefore, it is necessary to develop a method for initializing new database points online and removing points from the database that are no longer in view.

New points are initialized from uncorresponded features, and features with the highest corner score are initialized first. Because of the lack of depth information from a single monocular image, additional information is necessary to estimate the measurements' three-dimensional coordinates from their two-dimensional image plane location. In many common environments for vehicles with a camera aligned with the body z-axis, it is a reasonable assumption that the points lie on a plane at a distance of the altitude of the vehicle. This assumption would be valid, for instance, for a vehicle flying indoors over an uncluttered floor, or an outdoor vehicle flying over a plain. By initializing points with a suitable uncertainty in altitude, points that are in fact not on the ground plane will converge to the correct altitude if they remain in view (Wu and Johnson, 2010). Inverting the camera model presented in Section 2, we get

L_{c\to i} \begin{bmatrix} 1 \\ b/f_x \\ c/f_y \end{bmatrix} X_{f_j i} = \begin{bmatrix} X_{f_j i} \\ Y_{f_j i} \\ Z_{f_j i} \end{bmatrix} = {}^i r_j.    (39)

Distance X_{f_j i} is the altitude of the camera, and image plane locations b and c and focal lengths f_x and f_y are known. The covariance of the new feature is set to a fixed value scaled by the distance of the vehicle to that feature:

P_{f_j} = L_{c\to i} \begin{bmatrix} R_0\,\|{}^i r_j\|^2 & 0 & 0 \\ 0 & R_0\,\|{}^i r_j\|^2/f_x^2 & 0 \\ 0 & 0 & R_0\,\|{}^i r_j\|^2/f_y^2 \end{bmatrix} L_{c\to i}^T,    (40)

where P_{f_j} is the 3 × 3 block within the full covariance matrix P corresponding to the covariance of point f_j with itself, and R_0 is a scalar initialization value. All other cross-correlation terms in P are set to zero.
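The ground-plane initialization of Eqs. (39)-(40) can be sketched as follows (illustrative Python; the depth argument is the camera-frame range to the assumed ground plane, which for a downward-facing camera equals the camera altitude as discussed above).

```python
import numpy as np

def init_feature(b, c, fx, fy, L_c2i, cam_pos_i, depth, R0):
    """Initialize a new feature on the assumed ground plane, Eqs. (39)-(40).

    b, c: pixel coordinates of the uncorresponded measurement; depth: scale applied to
    the unit-depth camera ray (the camera altitude under the ground-plane assumption)."""
    ray_c = np.array([1.0, b / fx, c / fy])           # unit-depth ray in the camera frame
    r_i = L_c2i @ ray_c * depth                       # relative position, Eq. (39)
    p_f = cam_pos_i + r_i                             # inertial feature position

    d2 = float(r_i @ r_i)                             # squared range to the feature
    P_f = L_c2i @ np.diag([R0 * d2,
                           R0 * d2 / fx**2,
                           R0 * d2 / fy**2]) @ L_c2i.T   # covariance block, Eq. (40)
    return p_f, P_f

# Example: a pixel near the image center seen by a downward-facing camera at 10 m altitude.
p_f, P_f = init_feature(5.0, -3.0, 500.0, 500.0, np.eye(3), np.zeros(3), 10.0, 0.01)
print(p_f)
```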

One of the key contributions of this work is to present a novel method for online management of the database of feature locations. In particular, the addition of new features to the database based on the number of corresponded features ensures that the database is quickly refilled with new points when a large number of features are lost.

Figure 4. The number of feature correspondences over time (average and min/max, plotted against t in seconds). The left plot data were gathered using the new dynamic thresholding presented in this paper. The right plot uses a fixed threshold and is shown for comparison. After the change in image texture at t = 1, the proposed method more quickly removes old points from the database.

Feature removal from the database is determined by comparing a "confidence index" of the current database features with the initialization confidence index of a new feature. The confidence index for current database features is limited to a minimum and maximum of 0 and 100, and it is incremented for every iteration when a feature is corresponded to a measurement, and decremented if it is not corresponded. The index can be thought of as a measure of how often a database feature is used (corresponded) for navigation in the recent past. The initialization confidence index is related to the number of database features that were corresponded on the current iteration. A database feature is replaced if its confidence index falls below the initialization confidence index. When few features are corresponded, the initialization confidence index is high, and features are replaced quickly. When many features are corresponded, the initialization confidence index is low and features are replaced slowly. The initialization confidence index represents a dynamic point removal threshold modified by the number of correspondences in the current iteration.
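The bookkeeping described above can be summarized by the sketch below; the step size, the particular mapping from the number of correspondences to the initialization confidence index, and the restriction of replacement to uncorresponded slots are assumptions made for illustration, since the exact constants are not listed here.

```python
def update_confidence(confidence, corresponded, n_slots, step=5):
    """One iteration of the dynamic-threshold database management.

    confidence: list of per-feature confidence indices (0..100);
    corresponded: set of feature indices matched to a measurement this frame.
    Returns the updated indices and the set of slots to refill with new features."""
    n_corr = len(corresponded)
    # Initialization confidence index: high when few features corresponded, low when many.
    init_index = 100.0 * (1.0 - n_corr / float(n_slots))

    to_replace = set()
    for j in range(n_slots):
        if j in corresponded:
            confidence[j] = min(100, confidence[j] + step)
        else:
            confidence[j] = max(0, confidence[j] - step)
            if confidence[j] < init_index:
                to_replace.add(j)      # slot freed for the best uncorresponded measurement
    return confidence, to_replace

# Example: with only 2 of 10 features corresponded, the threshold is high (80),
# so the eight stale features are flagged for replacement.
conf, drop = update_confidence([50] * 10, {0, 1}, 10)
print(drop)
```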

To understand the effect of this initialization method, let us consider a few cases. First, consider a slow translation of the vehicle when there are strong features from the image, and most of the database features are matched. As the vehicle translates, some features will leave the frame and their confidence index will drop. Since most features are still matched, the features leaving the frame are replaced slowly, because the confidence index takes time to fall below the initialization confidence index.

Now consider the case when a large change in pitch attitude has occurred, and for a short time no database features can be seen. Therefore, the initialization confidence index is very high, and it does not take long for some features to fall below this value and be replaced. As these new features are seen in later iterations, however, the initialization confidence index lowers, and features are not replaced as quickly. The best of the old features are maintained longer, and can be used again if the vehicle returns to the original orientation.

In this way, the feature database is quickly replaced if the features visible through the camera are not being matched with measurements, but the features are replaced in such a way that the most historically reliable features are saved longest and can be used if they come back into view. This property of the algorithm improves reliability for platforms subject to abrupt attitude changes or changes in the environment, such as helicopters.

To demonstrate the effectiveness of the proposed method, it is compared to a fixed-threshold feature removal method. The fixed-threshold method is implemented by setting the initialization confidence index to 50 regardless of the number of features corresponded. This fixed-threshold method is equivalent to the method proposed in Davison (2003) and Davison et al. (2007), with the number of iterations equal to 50 and the percent correspondence equal to 50%.

Figure 5. The ground station visualization of a full 16 min (40% of aircraft endurance) flight trajectory while using vision-based navigation. GTMAX flight test data presented in this paper, except for the landing maneuver, are excerpted from this flight. The navigation output is shown in red, and GPS data from the flight are shown in blue for comparison. Note that the image of the vehicle is not drawn to scale.

The simulated vehicle was flown over a highly texturedsurface that was instantaneously changed to a low texturedsurface. This is similar to the effect of instantaneous light-ing changes, such as a light being switched off in a room.The texture removal was performed eight times for eachmethod, and the average, minimum, and maximum num-ber of feature correspondences is shown in Figure 4. It can beseen that the dynamic threshold does a better job maintain-ing a steady number of consistent features throughout the

Journal of Field Robotics DOI 10.1002/rob

Page 13: GPS-denied Indoor and Outdoor Monocular Vision Aided Navigation and Control of Unmanned Aircraft

Chowdhary et al.: GPS-denied autonomous flight using vision aided inertial navigation • 427

0 20 40 60 80 100 120 140 160 180 200

−80

−60

−40

−20

0

time (s)

x (m

)

gps positionnav positioncommand

0 20 40 60 80 100 120 140 160 180 200

−80

−60

−40

−20

0

y (m

)

time (s)

Figure 6. Comparison of vision-based navigation position states (red dashed line) with GPS location data (blue solid line) of theGTMAX during translational commands. GPS data were used in navigation solution for altitude only. The commands were givenas indicated by the free dash-dot line. The vehicle tracked the commanded positions with 7 m drift of the vision-based navigationsolution from GPS position data. These results are part of a 16 min flight.

When the texture is removed, the fixed threshold takes a longer time to throw out the now unobservable features, as the confidence index for these points falls. In the dynamic thresholding case, the threshold is raised when few features correspond, and so unseen features are removed quickly. The dynamic threshold method allows the navigation algorithm to rapidly adapt to a dynamic environment.

6. RESULTS

Validation of the Kalman filter vision-based navigation system was conducted on multiple unmanned aerial vehicles of different sizes, operating in indoor and outdoor conditions. The use of different vehicles illustrates the robustness and portability of the system design. The larger of the vehicles is the GTMAX (Johnson et al., 2004), a 66 kg modified Yamaha RMAX helicopter UAV with custom avionics and flight software designed by the Georgia Institute of Technology Unmanned Aerial Vehicle Research Facility (UAVRF). The navigation system was also tested on the EASE, a 1 kg ducted-fan vehicle, in an indoor environment. Both vehicles used the same baseline adaptive controller described in Johnson and Kannan (2005).

Simulation results presented below were generated using the Georgia Tech UAV Simulation Tool (GUST). The GUST software package combines a high-fidelity vehicle and environment model, onboard flight control software, and ground station software. The vehicle model is a six-rigid-body-degree-of-freedom model with additional engine and rotor dynamics. The vehicle model also simulates sensor noise, delay, location, and orientation, as well as actuator dynamics and saturation.
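As a rough illustration of two of the effects such a simulation models, the sketch below delays a sensor reading by a fixed number of simulation steps and saturates an actuator command. The limits, delay length, and structure are placeholders and do not reflect the actual GUST implementation, which is not detailed here.

```python
from collections import deque

ACTUATOR_LIMIT = 1.0    # assumed normalized actuator travel limit
SENSOR_DELAY_STEPS = 5  # assumed sensor latency in simulation steps

def saturate(command, limit=ACTUATOR_LIMIT):
    """Clip an actuator command to its assumed physical travel."""
    return max(-limit, min(limit, command))

class DelayedSensor:
    """Return measurements delayed by a fixed number of simulation steps."""
    def __init__(self, delay_steps=SENSOR_DELAY_STEPS, initial=0.0):
        # Pre-fill the buffer so the first reads return the initial value.
        self.buffer = deque([initial] * delay_steps, maxlen=delay_steps + 1)

    def read(self, true_value):
        self.buffer.append(true_value)
        return self.buffer[0]   # value from delay_steps steps ago
```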

6.1. GTMAX Helicopter

The GTMAX helicopter UAV is a modified Yamaha RMAX helicopter, weighing 66 kg and with a 3 m rotor diameter. Figure 1 shows a picture of the GTMAX. The helicopter is outfitted with a variety of sensors, including an Inertial Science IMU sampled at 100 Hz, short-range sonar, magnetometer, and differential GPS (Johnson et al., 2004).


Figure 7. Comparison of vision-based navigation velocity states (red dashed line) with GPS velocity data (blue solid line) of the GTMAX during translational input commands; vx and vy (m/s) are plotted against time (s). These results are part of a 16 min flight.

The vehicle was equipped with a Prosilica GC 1380 camera for vision-based navigation, which features progressive scanning and gigabit Ethernet connectivity. Two onboard computers with 1.6 GHz Intel Pentium M CPUs process information from these sensors for guidance, navigation, and control of the vehicle. For the vision-based navigation described in this paper, the secondary computer performs image acquisition and feature detection, while the primary computer performs the navigation update and handles vehicle guidance and control. The computational power onboard was found to be sufficient to support real-time operation. In particular, the EKF described in Section 4 processes vision data and makes updates based on vision at 20 Hz. It also features a sequential update (Christophersen et al., 2006); hence the updates are performed as measurements arrive, allowing flexibility in handling sensors with different update rates. In combination, the fused navigation solution is available to the controller at 100 Hz, which is sufficient to support real-time operation. All flight results presented below employ the GTMAX's baseline adaptive flight controller, which was described in detail in Johnson and Kannan (2005).
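To illustrate the sequential-update idea referenced above (applying each measurement as it arrives rather than stacking all measurements into one batch update), the following sketch applies updates one at a time to a generic EKF state. The measurement models and matrices are placeholders; the vision and altitude models actually flown are not reproduced here.

```python
import numpy as np

def sequential_update(x, P, measurements):
    """Apply EKF measurement updates one at a time as measurements arrive.

    x : (n,) state estimate; P : (n, n) covariance.
    measurements : iterable of (z, h, H, R), where z is the measurement,
    h(x) the predicted measurement, H the Jacobian of h at x, and R the
    measurement noise covariance. These are generic placeholders.
    """
    for z, h, H, R in measurements:
        z = np.atleast_1d(z)
        H = np.atleast_2d(H)
        R = np.atleast_2d(R)
        y = z - np.atleast_1d(h(x))        # innovation
        S = H @ P @ H.T + R                # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
        x = x + K @ y                      # state correction
        P = (np.eye(len(x)) - K @ H) @ P   # covariance update
    return x, P
```

With such a routine, a 20 Hz vision update and a 5 Hz altitude update could simply be interleaved by calling it whenever either measurement becomes available.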

6.1.1. GTMAX Flight Test Results

The data presented below show excerpts from a single flight during which there were 16 min of continuous operation of the vision-based navigation system (16 min is about 40% of the typical endurance of the GTMAX). Figure 5 shows the navigation solution during the flight, and the top-down view also shows the location of the vehicle as given by GPS for comparison. Note that the vehicle is not drawn to scale. Speeds as high as 6 m/s were reached, and the vehicle flew at multiple altitudes. While this test was performed in an area approximately 150 m by 50 m, the results were solid, and there is no reason why GPS-denied closed-loop autonomous flight with our V-INS could not be performed over greater distances or for longer times. The flight was carried out over a grassy area with a fence occasionally in view.


Figure 8. Comparison of vision-based navigation position states (red dashed line) with GPS location data (blue solid line) of the GTMAX during a 90 s excerpt of hovering flight approximately 100 ft above ground; x and y positions (m) are plotted against time (s). GPS data were used in the navigation solution for altitude only, and this can easily be replaced with a barometer or active range sensor. The vehicle remained within approximately 1.6 m of the commanded position despite a small amount of drift in the navigation solution from 22 s to 60 s. These results are part of a 16 min flight.

The satellite imagery in Figure 5 gives a rough indication of the flight area.

For all tests, the camera was mounted to the nose of the aircraft pointing downward, and monochrome images were gathered at approximately 20 Hz and at 320 × 240 pixel resolution. The best features in each image, with a maximum of 20, were output for use in the navigation algorithm. A maximum of 50 feature locations were stored in the database. GPS data were gathered at 5 Hz and used for the altitude measurement in the navigation solution. This measurement is analogous to using a barometric pressure sensor, in that both measurements are relative to an initial datum rather than the altitude above ground. The GPS position and velocity output is also presented below for comparison to the vision-based navigation in the horizontal plane.

Figures 6 and 7 show the results of a series of translational commands given to the GTMAX while using vision-based navigation. Figure 6 shows that the navigation solution corresponds well with the GPS output, despite the vehicle traveling over 90 m each way and having to replace online the database of feature locations. At time 100–120 s, it can be seen that a velocity error causes the vehicle to drift. This drift is possibly caused by miscorresponded measurement and database features, which can occur over areas with poor texture and poor feature distribution over the image. In poorly textured areas, features are less repeatable from frame to frame, and periodic dropouts of features could cause a database point to be incorrectly matched to a neighboring feature. Reducing the occurrence of feature miscorrespondences, and navigation drift in general, is the subject of ongoing research. An absolute position error of 7 m is incurred over the course of the maneuver.
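One common way to suppress such miscorrespondences, though not necessarily the scheme used in the flight software described here, is statistical gating: a detected feature is accepted as a match for a database feature only if its innovation lies within a chi-square bound of the predicted projection. A minimal sketch, with hypothetical names:

```python
import numpy as np

# Illustrative chi-square gating of feature correspondences. The 9.21 gate
# is the 99% bound for a 2D (pixel) innovation; the predicted pixel location
# and innovation covariance S would come from the filter's measurement
# model. Names and structure are hypothetical.
CHI2_GATE_2DOF_99 = 9.21

def gate_match(z_measured, z_predicted, S):
    """Return True if a measured pixel location is statistically consistent
    with a database feature's predicted projection."""
    innovation = np.asarray(z_measured) - np.asarray(z_predicted)
    d2 = innovation @ np.linalg.solve(S, innovation)  # squared Mahalanobis distance
    return d2 < CHI2_GATE_2DOF_99
```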

Figures 8 and 9 show the results of a 90 s hover of the GTMAX. The vehicle remains within approximately a 1.6 m radius circle around the commanded position.

Figure 10 shows the performance of the V-INS solution during autonomous descent and landing.


Figure 9. x-y position of the vehicle as given by vision-based navigation position states (red dashed line) and GPS location data (blue solid line) of the GTMAX during a 90 s excerpt from extended hovering flight approximately 100 ft above ground. These results are part of a 16 min flight.

To the authors' knowledge, this is the first time an autonomous landing has been performed under vision-based navigation for a vehicle of this class in a completely unmapped environment. This maneuver demonstrates that the navigation solution is robust to large altitude changes and corresponding changes in the surface texture beneath the vehicle, and that the solution can be trusted during maneuvers with low velocity error tolerance.

6.1.2. GTMAX Simulation Results

A 350 s simulation was performed to analyze the performance of the navigation system over an area with a nonflat ground plane, and to provide further validation of long-distance flight. During this simulation, the vehicle traveled over 560 m at an altitude of 50 m over a group of buildings up to three stories (approximately 10 m) tall. Images were captured from a simulated camera located on the vehicle. An image from the simulation environment, along with the true path of the vehicle and the waypoints of the trajectory, is shown in Figure 11.

Figure 12 shows the GPS and navigation output in the x-y plane and with respect to time. The final navigation error over the 560 m flight path was 5 m.

Figure 10. Time history of the position states (x, y, and altitude) of the GTMAX navigation solution during autonomous descent and landing, with the touchdown point marked. The red dashed line gives the navigation solution position, and the blue solid line gives the GPS position. These results are part of a 16 min flight.


Figure 11. Flight path during a simulated flight over a group of buildings. The vehicle flew at an altitude of 50 m, and the buildings were as tall as 10 m. The vehicle flew four times around the path indicated by the purple waypoints, for a total linear distance of 560 m. The true path of the vehicle is shown in blue, and the final error between the true position and the navigation solution was 5 m.

Figure 12. Plots illustrating the navigation solution (red) and GPS position (blue) during a simulated flight over nonflat ground: (a) x-y plane, (b) x and y positions versus time.

Note that no loop closure was performed, indicating that similar performance can be expected over nonrepeating paths. It can be seen that the navigation solution provides data that ensure the stability of the vehicle during the entire trajectory.

6.1.3. Comparison with Existing V-INS Techniques for GTMAX Class UAVs

The approach presented here has significant improvements over that in Wu and Johnson (2010) and Wu et al. (2012), in which comparable flight times were reported for a GTMAX-class UAS with premapped features.


Figure 13. Typical images from the EASE camera used in the navigation algorithm. The green crosses are features output from the Harris corner detector, and the blue crosses are database feature locations projected onto the image plane.

Figure 14. Position data from an excerpt of a 40 s hover of the EASE: (a) x-y plane, (b) x, y, and z positions versus time. The plots show that the navigation solution (red dashed line) is in good agreement with the motion capture system output (blue solid line).

This is because our approach does not assume premapped features and uses the computationally efficient feature-database management algorithm to evaluate and accommodate new features online. The approach is also more general than that presented by Amidi and Fujita (1999), who assumed flat ground, and more computationally efficient than that presented by Langelaan (2007). Furthermore, these two approaches were not flight-tested. The autonomous landing demonstrated in this paper was performed without any artificial features, as were used in Merz et al. (2004), and without GPS, as was used in Chamberlain et al. (2011). The results presented in Scherer et al. (2012) demonstrate impressive navigation performance in intermittent GPS conditions with a stereo camera sensor. However, a stereo camera sensor is not as standard a payload as a monocular camera, and may not always be practical. Additionally, the closed-loop flight results using vision-aided navigation presented in Achtelik et al. (2011), Weiss et al. (2011), and Weiss et al. (2012) are limited to about 8 min, whereas the vision-aided INS results presented in our work extend to well over double that time. Finally, our results are also unique because there are no similar results for RMAX-class UAVs over the time and distance demonstrated in this paper.

6.2. EASE

The EASE is a ducted-fan electric vehicle with a mass of approximately 1 kg, designed for use in both indoor and outdoor environments. The EASE has two independently controlled coaxial counter-rotating propellers as well as four independent control surfaces in the duct for generating control moments. The sensors on the vehicle are an Analog Devices 16405 IMU sampled at 100 Hz, a sonar for altitude, and a video camera. The video camera is mounted underneath the vehicle, with the optical axis aligned with the body z-axis.


Figure 15. Attitude data (roll φ, pitch θ, and yaw ψ versus time) from a 40 s hover of the EASE.

The camera outputs analog video at 30 frames per second and 640 × 480 resolution. The images are down-sampled to 320 × 240 resolution for use in the navigation algorithm. The camera field of view is 41 degrees, and distortion coefficients were calculated in MATLAB using the Camera Calibration Toolbox. The vehicle receives power and communicates with the ground control station over a microfilament, which currently can be extended up to 50 m. The microfilament also serves as an analog video link. The main reason for the microfilament is to overcome flight endurance limitations (fewer than 8 min) imposed by state-of-the-art batteries. The power transferred through the cable allows the EASE to operate over an indefinite interval.

Because this cable is already in place, the vision-aided processing is performed offboard using an analog video link bootstrapped onto this cable. Since the video link is analog, it does not induce any more delay in transmitting the images than if the data were processed onboard. The V-INS and the position control algorithms are run on a computer on the ground, and actuator commands are sent to the vehicle. The vehicle itself runs a high-frequency (>600 Hz) inner-loop rate controller with a washout filter for attitude feedback. The specifications of the GTMAX computers, two single-core 1.6 GHz Pentium M machines, are comparable to the latest dual-core Atom processors, which could easily be carried by the EASE. Hence, the ability of the platform to carry the required computational resources is not a limiting factor in implementing our algorithms onboard the EASE. Figure 1 shows a picture of the EASE. The baseline controller for the EASE consists of an adaptive attitude and velocity controller that is based on the GTMAX adaptive flight controller [see Johnson and Kannan (2005)] and is implemented offboard. In addition, it features a high-bandpass rate damping loop implemented onboard. As on the GTMAX, the EKF described in Section 4 processes vision data and makes updates based on vision at 20 Hz, and also features a sequential update.
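As a rough illustration of how the camera parameters stated above translate into the pinhole model needed to project database features into the image, a nominal focal length in pixels can be derived from the field of view and the working resolution. The exact calibrated focal length and distortion coefficients come from the MATLAB toolbox and are not given in the paper, and whether the 41 degree figure is the horizontal or diagonal field of view is an assumption here.

```python
import math

# Nominal pinhole intrinsics from the published numbers: 41 deg field of
# view at the 320 x 240 working resolution. The principal point at the
# image center and the horizontal interpretation of the field of view are
# assumptions; the true calibration values are not reproduced in the paper.
WIDTH, HEIGHT = 320, 240
FOV_DEG = 41.0

f_px = (WIDTH / 2.0) / math.tan(math.radians(FOV_DEG) / 2.0)  # ~428 px
cx, cy = WIDTH / 2.0, HEIGHT / 2.0                            # assumed principal point

def project(p_cam):
    """Project a 3D point in the camera frame (x right, y down, z forward)
    onto the image plane using the nominal intrinsics above."""
    x, y, z = p_cam
    return (f_px * x / z + cx, f_px * y / z + cy)
```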

Potential applications of the EASE include indoor reconnaissance in unfriendly environments, first-response operations, and counterterrorism operations. Therefore, the intended operating environment for the EASE consists of gusty, unmapped, cluttered indoor environments without any external sensor support. In the results presented here, the EASE was evaluated under these conditions. In the following tests, the EASE performed autonomous hover and translations in still air, and hover in 2.4 m/s wind.

Figure 16. Position data from an excerpt of a 57 s hover of the EASE in 2.4 m/s wind: (a) x-y plane, (b) x, y, and z positions versus time.


Figure 17. Attitude data (roll φ, pitch θ, and yaw ψ versus time) from an excerpt of a 57 s hover of the EASE in 2.4 m/s wind. The wind causes considerably more variation in the attitude than was seen in still air. Note the commanded yaw at 37 s, and the corresponding transfer of the pitch angle into a roll angle.

The vehicle flew over a carpeted floor, and typical camera images are shown in Figure 13. Up to 16 points were stored in the database of feature locations, and the feature detector output up to 20 features for each image. For comparison, data were collected using a motion capture system (VICON) that gives submillimeter position and subdegree attitude accuracy at 100 Hz. Note that for indoor operation, there is no direct way to check whether a feature is on the ground or on a wall when initializing a feature using Algorithm 1. However, since our approach incorporates the estimation of the altitude of each feature, it can converge to the correct location of features if they remain in view.

Figures 14 and 15 show the results of a 40 s hover without wind. The vehicle remains within a 16 cm diameter circle during the test, and the navigation position and motion capture position coincide closely throughout. This test is useful in establishing a baseline for the closed-loop control performance with the V-INS.

Figures 16 and 17 show the hovering performance of the vehicle in the presence of external disturbances (wind). The additional vehicle motion caused by the wind makes vision-based navigation challenging; hence both algorithms have to work in unison for the system to keep hovering. The wind was generated indoors using a 3 m diameter fan, with a wind speed of approximately 2.4 ± 0.4 m/s. The navigation solution is again in agreement with the motion capture output, despite more vehicle and hence camera motion. As expected, the vehicle holds position less precisely, having a diameter of motion of approximately 0.76 m. Note in Figure 17 the approximately −10 degree pitch angle needed to hold position in the wind, and the smooth transition to a 10 degree roll angle as the vehicle follows the yaw command.

Figure 18. Attitude and position data illustrating the response of the EASE to a commanded translation of 1.2 m: (a) roll, pitch, and yaw versus time, (b) x, y, and z positions versus time.

Figure 18 illustrates the response of the vehicle to a commanded translation. The 1.2 m command moved the vehicle such that no points in the original database were visible, and the entire database was replaced with points initialized in real time during the maneuver. As can be seen in the plots, the navigation solution incurs a small amount of offset in the y direction and in yaw due to accumulated error.

Figure 19 shows a sequence of images from a flight of the EASE in a cluttered indoor environment.


Figure 19. Sequence of images from a flight of the EASE in a cluttered hallway. Note the extension cord on the floor in (b), the change in carpet textures between (b) and (c), and the narrow corridors throughout.

The vehicle is flown by an operator who gives position commands using a joystick, and the vehicle autonomously tracks these position commands. It can be seen that the navigation solution is robust to ground clutter, such as the extension cord on the floor in (b), and to the change in carpet texture as the vehicle enters the room between (b) and (c). Furthermore, this demonstrates the ability of the EASE to be controlled in confined spaces such as narrow hallways and doorways. The vehicle traveled approximately 30 m over the course of this flight.

6.2.1. Comparison with Existing V-INS Techniques for EASE Class UAVs

The results presented on the EASE demonstrate several important advances over other work in the literature. Work such as that of Ahrens et al. (2009) and Madison et al. (2007) demonstrates comparable hover results, but in a laboratory setting (IMU data are simulated with a motion capture system), and it does not demonstrate flight over the distances we have shown. The impressive work done in the robotics community on SLAM (Civera et al., 2008; Davison et al., 2007; Eade and Drummond, 2006; Klein and Murray, 2007; Sibley et al., 2010) promises to greatly increase the capability of vision-based navigation by generating accurate maps in addition to localization, but none of these papers demonstrate closed-loop control on a vehicle with fast dynamics. The visual odometry system presented in Dille et al. (2009) shows results of extended range and duration; however, the system is implemented on a ground vehicle.


Frietsch and Trommer (2008) presented a visual odometry system that enables a vehicle to maintain stability in the absence of GPS, but their system does not demonstrate the long-distance flight capability that we show in this paper. Furthermore, they did not present flight-test results.

7. CONCLUSION

An unmanned aerial vehicle vision-aided inertial navigation system (V-INS) for sustained dynamic operation in GPS-denied environments was presented and validated in autonomous flight. Visual feature locations were estimated using a Harris corner detector; their frame-to-frame transitions were predicted using inertial measurements and compared with observed locations to bound the INS drift and correct for inertial sensor biases and misalignment errors using an extended Kalman filter. A novel and efficient algorithm to manage a library of features online was presented, designed to add or remove features based on a measure of relative confidence in each feature location. The ability of the system to maintain a database of trustworthy navigable features allowed it to sustain dynamic operation during waypoint-following autonomous flight in a completely GPS-denied dynamic environment without any a priori known feature locations. The presented algorithms were validated through flight tests on an M-UAV operating indoors and a 66 kg rotorcraft operating outdoors in simulated GPS-denied environments. The method presented here was shown to overcome the shortcomings of existing V-INS algorithms, such as computational inefficiency, lack of reliability, or lack of proven flight-test results. Therefore, these results establish the feasibility of using a V-INS for sustained autonomous operation in both indoor and outdoor GPS-denied environments.

ACKNOWLEDGMENTS

This work was performed under NIST Grant No. 70NANB10H013. The authors wish to thank Eohan George and the team at CyPhy Works, including Helen Greiner, Jason Walker, Alexey Zaparoavvny, and Felipe Bohorquez.

REFERENCES

Achtelik, M., Achtelik, M., Weiss, S., & Siegwart, R. (2011). Onboard IMU and monocular vision based control for MAVs in unknown in- and outdoor environments. In Robotics and Automation (ICRA), 2011 IEEE International Conference (pp. 3056–3063).

Achtelik, M., Bachrach, A., He, R., Prentice, S., & Roy, N. (2009a). Autonomous navigation and exploration of a quadrotor helicopter in GPS-denied indoor environments. In Proceedings of the 1st Symposium on Indoor Flight, International Aerial Robotics Competition.

Achtelik, M., Bachrach, A., He, R., Prentice, S., & Roy, N. (2009b). Stereo vision and laser odometry for autonomous helicopters in GPS-denied indoor environments. Proceedings of SPIE, 7332.

Ahrens, S., Levine, D., Andrews, G., & How, J. (2009). Vision-based guidance and control of a hovering vehicle in unknown, GPS-denied environments. In IEEE International Conference on Robotics and Automation (pp. 2643–2648).

Amidi, O., Kanade, T., & Fujita, K. (1999). A visual odometer for autonomous helicopter flight. Robotics and Autonomous Systems, 29, 185–193.

Bachrach, A., Prentice, S., He, R., & Roy, N. (2011). RANGE: Robust autonomous navigation in GPS-denied environments. Journal of Field Robotics, 28(5), 644–666.

Blosch, M., Weiss, S., Scaramuzza, D., & Siegwart, R. (2010). Vision based MAV navigation in unknown and unstructured environments. In IEEE International Conference on Robotics and Automation (pp. 21–28).

Chamberlain, L. J., Scherer, S., & Singh, S. (2011). Self-aware helicopters: Full-scale automated landing and obstacle avoidance in unmapped environments. In M. Whaley (Ed.), AHS Forum 67.

Chowdhary, G., Sobers, D. M., Pravitra, C., Christmann, C., Wu, A., Hashimoto, H., Ong, C., Kalghatgi, R., & Johnson, E. N. (2011a). Integrated guidance navigation and control for a fully autonomous indoor UAS. In Guidance Navigation and Control Conference, Portland, OR: AIAA.

Chowdhary, G., Sobers, D. M., Pravitra, C., Christmann, C., Wu, A., Hashimoto, H., Ong, C., Kalghatgi, R., & Johnson, E. N. (2011b). Self contained ranging sensor aided autonomous guidance, navigation, and control for indoor flight. Portland, OR: AIAA.

Christophersen, H. B., Pickell, W. R., Neidoefer, J. C., Koller, A. A., Kannan, S. K., & Johnson, E. N. (2006). A compact guidance, navigation, and control system for unmanned aerial vehicles. Journal of Aerospace Computing, Information, and Communication, 3.

Civera, J., Davison, A., & Montiel, J. (2008). Inverse depth parametrization for monocular SLAM. IEEE Transactions on Robotics, 24(5), 932–945.

Conte, G., & Doherty, P. (2009). Vision-based unmanned aerial vehicle navigation using geo-referenced information. EURASIP Journal of Advanced Signal Processing, 10.

Davison, A. (2003). Real-time simultaneous localisation and mapping with a single camera. In Computer Vision, 2003, Proceedings, Ninth IEEE International Conference (vol. 2, pp. 1403–1410).

Davison, A., Reid, I., Molton, N., & Stasse, O. (2007). MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6), 1052–1067.

Dille, M., Grocholsky, B. P., & Singh, S. (2009). Outdoor downward-facing optical flow odometry with commodity sensors. In Proceedings of Field & Service Robotics (FSR '09).

Eade, E., & Drummond, T. (2006). Scalable monocular SLAM. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference (vol. 1, pp. 469–476).

Fowers, S. G., Tipets, B. J., Lee, D. J., & Archibald, J. (2008). Vision-guided autonomous quad-rotor helicopter flight stabilization and control. In Unmanned Systems North America Conference. AUVSI.

Frietsch, N., Meister, O., Schlaile, C., Seibold, J., & Trommer, G. (2008). Vision based hovering and landing system for a VTOL-MAV with geo-localization capabilities. In Guidance Navigation and Control Conference. AIAA.

Gelb, A. (1974). Applied optimal estimation. Cambridge: MIT Press.

Gelb, A., Kasper, J. F., Nash, R. A., Price, C. F., & Sutherland, A. A. (1974). Applied optimal estimation. The M.I.T. Press.

Grzonka, S., Grisetti, G., & Burgard, W. (2009). Towards a navigation system for autonomous indoor flying. In Robotics and Automation, 2009, Proceedings of the IEEE International Conference (pp. 2878–2883).

Grzonka, S., Grisetti, G., & Burgard, W. (2012). A fully autonomous indoor quadrotor. IEEE Transactions on Robotics, 28(1), 90–100.

Haines, J. (2011). Vision-based control of an aerial vehicle. Master's thesis, Carnegie Mellon University.

Harris, C., & Stephens, M. (1988, Aug.–Sep.). A combined corner and edge detector. Proceedings of the Fourth Alvey Vision Conference, Manchester, UK (pp. 147–151).

Johnson, E., & Kannan, S. (2005). Adaptive trajectory control for autonomous helicopters. Journal of Guidance, Control, and Dynamics, 28(3), 524–538.

Johnson, E. N., Proctor, A. A., Ha, J., & Tannenbaum, A. R. (2004). Development and test of highly autonomous unmanned aerial vehicles. AIAA Journal of Aerospace Computing, Information and Communication, 1(12), 485–501.

Kayton, M., & Fried, W. R. (1997). Avionics navigation systems. John Wiley and Sons.

Kendoul, F. (2012). Survey of advances in guidance, navigation, and control of unmanned rotorcraft systems. Journal of Field Robotics, 29(2), 315–378.

Kitt, B. M., Rehder, J., Chambers, A. D., Schonbein, M., Lategahn, H., & Singh, S. (2011). Monocular visual odometry using a planar road model to solve scale ambiguity. In Proceedings of the European Conference on Mobile Robots.

Klein, G., & Murray, D. (2007). Parallel tracking and mapping for small AR workspaces. In Mixed and Augmented Reality, 2007, ISMAR 2007, 6th IEEE and ACM International Symposium (pp. 225–234).

Langelaan, J. (2007). State estimation for autonomous flight in cluttered environments. AIAA Journal of Guidance, Navigation, and Control, 30(5), 1414–1426.

Lefferts, E., Markley, F., & Shuster, M. (1982). Kalman filtering for spacecraft attitude estimation. Journal of Guidance, Control, and Dynamics, 5(5), 417–429.

Lowe, D. (1999). Object recognition from local scale-invariant features. In Computer Vision, 1999, The Proceedings of the Seventh IEEE International Conference (vol. 2, pp. 1150–1157).

Madison, R., Andrews, G., DeBitetto, P., Rasmussen, S., & Bottkol, M. (2007). Vision-aided navigation for small UAVs in GPS-challenged environments. In Infotech@Aerospace Conference. AIAA.

Merz, T., Duranti, S., & Conte, G. (2004). Autonomous landing of an unmanned helicopter based on vision and inertial sensing. In Proceedings of the 9th International Symposium on Experimental Robotics.

Mourikis, A., Trawny, N., Roumeliotis, S., Johnson, A., Ansar, A., & Matthies, L. (2009). Vision-aided inertial navigation for spacecraft entry, descent, and landing. IEEE Transactions on Robotics, 25(2), 264–280.

Pears, N. E. (2000). Feature extraction and tracking for scanning range sensors. Robotics and Autonomous Systems, 33(1), 43–58.

Scherer, S., Rehder, J., Achar, S., Cover, H., Chambers, A., Nuske, S., & Singh, S. (2012). River mapping from a flying robot: State estimation, river detection, and obstacle mapping. Autonomous Robots, 33, 189–214.

Shen, S., Michael, N., & Kumar, V. (2011). Autonomous multi-floor indoor navigation with a computationally constrained micro aerial vehicle. In Robotics and Automation, 2011, Proceedings of the IEEE International Conference.

Sibley, G., Mei, C., Reid, I., & Newman, P. (2010). Vast-scale outdoor navigation using adaptive relative bundle adjustment. The International Journal of Robotics Research, 29(8), 958–980.

Strasdat, H., Montiel, J., & Davison, A. (2010). Real-time monocular SLAM: Why filter? In Robotics and Automation (ICRA), 2010 IEEE International Conference (pp. 2657–2664).

Strelow, D., & Singh, S. (2004). Motion estimation from image and inertial measurements. The International Journal of Robotics Research, 23(12), 1157–1195.

Theodore, C., Rowley, D., Ansar, A., Matthies, L., Goldberg, S., Hubbard, D., & Whalley, M. (2006). Flight trials of a rotorcraft unmanned aerial vehicle landing autonomously at unprepared sites. In American Helicopter Society 62nd Annual Forum, Phoenix, AZ: American Helicopter Society.

Tomasi, C. (1991). Detection and tracking of point features. Technical Report CMU-CS-91-132.

Weiss, S., Achtelik, M., Lynen, S., Chli, M., & Siegwart, R. (2012). Real-time onboard visual-inertial state estimation and self-calibration of MAVs in unknown environments. In Robotics and Automation (ICRA), 2012 IEEE International Conference (pp. 957–964).

Weiss, S., Scaramuzza, D., & Siegwart, R. (2011). Monocular-SLAM-based navigation for autonomous micro helicopters in GPS-denied environments. Journal of Field Robotics, 28(6), 854–874.

Wendel, J., Meister, O., Schlaile, C., & Trommer, G. F. (2006). An integrated GPS/MEMS-IMU navigation system for an autonomous helicopter. Aerospace Science and Technology, 10(6), 527–533.

Wu, A., & Johnson, E. (2008). Methods for localization and mapping using vision and inertial sensors. In Proceedings of AIAA GNC.


Wu, A., & Johnson, E. N. (2010). Autonomous flight in GPS-denied environments using monocular vision and inertial sensors. In Infotech@AIAA, Atlanta, GA: AIAA.

Wu, A. D., Johnson, E. N., Kaess, M., Dellaert, F., & Chowdhary, G. (2012). Autonomous flight in GPS-denied environments using monocular vision and inertial sensors. AIAA Journal of Aerospace Computing, Information, and Communication (to be published).

Wu, A. D., Johnson, E. N., Kaess, M., Dellaert, F., & Chowdhary, G. (2011). Autonomous flight in GPS-denied environments using monocular vision and inertial sensors. Journal of Aerospace Computing, Information, and Communication (submitted for publication).
