

PASP: Policy Based Approach for Sensor Planning

Sankalp Arora¹ and Sebastian Scherer¹

Abstract— The capabilities of mobile autonomous systems are often limited by sensory constraints. Range sensors moving in a fixed pattern are commonly used as sensing modalities on mobile robots. The performance of these sensors can be augmented by actively controlling their configuration to minimize the expected cost of the mission. The related information gain problem is NP-hard. Current methodologies are either computationally too expensive to run online or make simplifying assumptions that fail in complex environments.

We present a method to create and learn a policy that maps features calculated online to sensory actions. The policy developed in this work actively controls a nodding lidar to keep the vehicle safe at high velocities and focuses the sensor bandwidth on gaining information relevant to the mission once safety is ensured. It is validated and evaluated on an autonomous full-scale helicopter (Boeing Unmanned Little Bird) equipped with an actively controlled nodding laser. It is able to keep the vehicle safe at its maximum operating velocity, 56 m/s, and reduce the landing zone evaluation time by 500% as compared to passive nodding. The structure of the policy and the efficient learning algorithm should generalize to provide a solution for actively controlling a sensor to keep a mobile robot safe while exploring regions of interest to the robot.

I. INTRODUCTION

Navigation through partially known environments, localization, mapping, and manipulation of objects are all tasks for which the robot is expected to detect objects, free space, or features in the environment. The rate and accuracy of detection of information of interest dictate the performance of the robotic system. Sensors to detect such features of the environment are often limited by properties like field of view (FOV), resolution, sampling rate, and signal to noise ratio. These sensors are either fixed [1], [2], [3] with respect to the vehicle or in some cases move in constant pre-computed patterns [4], [5]. The capabilities of such sensors and the robotic system can be augmented by actively controlling the sensor configuration to minimize the cost of completing the task assigned to the robot.

The problem of active perception is a well studied one and finds its roots in sequential hypothesis testing [6]. Active perception and adaptive sampling problems have since been extended to account for mobile sensing within the framework of Bayesian reasoning [7]. Later works have resulted in a variety of solutions that incorporate information theoretic measures for problems like object recognition, mapping, and scene reconstruction [8]. Gradient based methods for next best view and belief space planning have improved [9], [10]. While such algorithms have been shown to be useful in

¹The Robotics Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA asankalp, [email protected]

Fig. 1: Application Scenario: Top Left – Test Vehicle, Boeing Unmanned Little Bird. Top Right – Near Earth Autonomy M3 sensor suite, equipped with an actively controllable long-range laser. Bottom – Example mission scenario: the vehicle is supposed to autonomously navigate to a pre-defined landing zone at high speed, evaluate it, and decide whether or not to land while ensuring safety.

their respective applications, they typically rely on restrictive assumptions on the representations and objective functions, and do not have guarantees on global optimality. Finite-horizon model predictive control methods [11], [12] provide an improvement over myopic techniques, but they do not have performance guarantees beyond the horizon depth or the run times to operate online in large state-spaces. POMDP based solvers [13] suffer from the same curse of dimensionality. The recursive greedy algorithm [14] and branch and bound [15] use the budget to restrict the search space, but require computation exponential in the size of the problem instance due to the large blowup in the search space with increasing budget.

We overcome the curse of dimensionality by searching for a policy to actively control the sensor configuration. The policy function is learnt such that it maximizes the gain of information that is important for the completion of the mission of the robot while ensuring its safety as it navigates through unknown environments. We implement the policy on a full-scale autonomous helicopter (Boeing Unmanned Little Bird) equipped with an actively controlled nodding lidar, using an occupancy grid map as the world representation. The policy optimizes the use of sensor bandwidth to enable safe, high speed navigation and fast detection of landing zones. The main contributions of the paper are as follows:


• Computational complexity analysis for calculating expected information gain for a range sensor and an occupancy grid map representation.

• Policy function and feature design to allow for online active sensor control that allows the vehicle to stay safe at high speeds.

• Results from online evaluation on an autonomous full-sized helicopter that show guaranteed safe autonomous navigation at speeds of 56 m/s and straight-in approaches to landing zones, eliminating the need to hover.

The task of enabling safe, high speed flight enforces tight temporal constraints on the sensor controller, motivating the reactive, online approach. In the next section we formally set up the problem; section III presents the approach overview. Section IV and section V cover the implementation details, including the construction and learning of the policy. Results of the evaluation of the performance of the algorithm are discussed in section VII.

II. PROBLEM SETUP

In this section we formulate the problem as that of minimization of the expected cost of traversal from the initial to the goal state, through optimization of the sensor trajectory. This problem is then shown to be the same as maximizing the gain of information that minimizes the cost of traversal. This problem formulation motivates and guides our approach.

Let the robot's state space be $X \subset \mathbb{R}^n$. Let $\xi : [0, T] \to X$ be the state space trajectory and $C : X \to \mathbb{R}$ be the cost function, where $T$ is the time horizon. The boundary values are $\xi(0) = \xi_0$ and $\xi(T) = \xi_f$. Let the dynamics constraints on the robot be given by $h(\xi(t), \dot{\xi}(t), \ddot{\xi}(t)) \le 0$. The cost of the trajectory in a fully deterministic environment is $\int_0^T C(\xi(t))\,dt$.

Operating in partially known, unstructured environments, the robot has to decide on its next action based on its current belief. Let the high dimensional state space of the world be $W \subset \mathbb{R}^m$, where the world includes the uncertainty of the robot about its environment and its pose. Let belief be a probability distribution over the state of the world, $b : W \to [0, 1]$. Let us assume the belief of the robot at the start of the mission is $b_0$. The belief of the robot changes as observations are made using a sensor. Let the sensor's state space be $S \subset \mathbb{R}^s$. Let $\xi_s : [0, T] \to S$ be the sensor trajectory.

The cost function changes as the belief of the robot evolves. To highlight this fact we represent the cost functional as $C_b : X \times W \to \mathbb{R}$. Due to the stochasticity of the belief of the robot, it can only reason about the expected cost of its policy, $\mathbb{E}_{p(b|\xi,\xi_s,b_0)}\, C_b(.)$. The distribution of belief trajectories $p(b|\xi,\xi_s,b_0)$ at any given time is dependent on both the state and the sensor trajectory. Let the dynamics constraint on the motion of the sensor be given by $h_s(\xi_s(t), \dot{\xi}_s(t), \ddot{\xi}_s(t)) \le 0$. The full optimization problem can then be defined as follows.

$$\operatorname*{argmin}_{\xi(t),\,\xi_s(t)} \; \mathbb{E}_{p(b|\xi,\xi_s,b_0)} \int_0^T C_b(\xi(t), \xi_s(t), b(t))\,dt \quad (1)$$

$$h(\xi(t), \dot{\xi}(t), \ddot{\xi}(t)) \le 0$$

$$h_s(\xi_s(t), \dot{\xi}_s(t), \ddot{\xi}_s(t)) \le 0$$

In this paper we study the problem of optimizing the sensor trajectory given a fixed vehicle trajectory $\xi(.)$. The problem is then reduced to

$$\operatorname*{argmin}_{\xi_s(t)} \; \mathbb{E}_{p(b|\xi,\xi_s,b_0)} \int_0^T C_b(\xi(t), \xi_s(t), b(t))\,dt \quad (2)$$

The constraints on the sensor actuation specified in eq. 1 also apply to eq. 2. Notice that in this formulation the sensor motion can only result in gaining information about the environment. Therefore, the minimization of the cost function with $\xi_s(t)$ as the only variable results in the sensor gaining information that is important for minimizing the required cost function. We call this information contextually important information, as it is important to sense this information to reduce the cost function. Let $IG(x, \xi(t), \xi_s(t), b(t))$ be the information gain at $x \in W$ and time $t$. Let $M(x, \xi(t), b(t)) \in \{0, 1\}$ return 1 if the information is contextually important and 0 otherwise. The optimization problem then reduces to eq. 3. Proof in [16].

$$\operatorname*{argmax}_{\xi_s(t)} \; \mathbb{E}_{p(b|\xi,\xi_s,b_0)} \int_0^T \int_{x \in W} M(x, \xi(t), b(t))\, IG(x, \xi(t), \xi_s(t), b(t))\,dx\,dt \quad (3)$$

In our application, the robot is a full scale autonomous helicopter and the sensor is a nodding lidar with an actively controlled pitch axis. The trajectory $\xi_s(t)$ of the nodding sensor relative to the helicopter can be completely defined by its pitch angle profile in time $\rho(t)$ and the laser configuration $\rho_{conf}$, which consists of the laser's fast axis resolution $\rho_{frr}$ and its pulse rate $\rho_{prr}$. The pose of the robot is given by a high accuracy GPS/INS system, and the world representation is an occupancy grid map [17]. The information gain evaluation for an occupancy grid map is presented in detail in [16]. $M(x, \xi(t), b(t))$, the contextual importance function, is discussed in detail in section IV. In the next section we present an overview of our approach to solve eq. 3 for a mobile autonomous robot.

III. APPROACH OVERVIEW

Information gain based path planning is an NP-hard problem [14]. Moreover, the calculation of information gain itself is computationally expensive. In this work, we use an occupancy grid map based representation. For a laser sensory action with $R$ rays interacting with $N$ cells of an occupancy grid map, the computational complexity of calculating information gain is $O(3RN)$ [16]. As a result, the computational cost involved in calculating information gain for a large number of rays over a large range makes it non-trivial to conduct gradient based optimizations.
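To make the per-ray part of this cost concrete, one such evaluation can be sketched as follows. This is a simplified stand-in for the exact occupancy grid computation in [16], assuming an idealized sensor whose measurement fully resolves any cell the ray reaches; the function and argument names are ours, not the authors' implementation.

```python
import math

def H(p):
    """Binary entropy in bits; H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def ray_expected_ig(cell_occ_probs):
    """Expected information gain of casting one ray through the ordered
    list of occupancy probabilities of the cells it traverses. Each cell's
    entropy is weighted by the probability that the ray actually reaches it,
    i.e. that no earlier cell returned a hit."""
    p_reach = 1.0
    ig = 0.0
    for p_occ in cell_occ_probs:
        ig += p_reach * H(p_occ)   # resolving a reached cell removes its entropy
        p_reach *= (1.0 - p_occ)   # the ray continues only if the cell was free
    return ig
```

Evaluating $R$ such rays over $N$ cells each gives the linear-in-$RN$ flavor of the complexity above; the constant factor comes from the per-cell measurement cases.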

To overcome the computational complexity issues and solve the problem online, we propose learning a reactive policy function that maps from features to actions (see


Fig. 2: Approach Overview: The sensor is actively controlled through a policy that takes in features extracted from the robot's belief. The features take the expected information gain and the contextual importance function into account.

Fig. 2). The policy learnt is designed to maximize the information gain in the next time step. For the application considered in the paper, the policy learnt keeps the robot safe at high speeds and enables quick landing zone evaluation, allowing for low mission times. The data flow in the block diagram (Fig. 2) is explained in the rest of this section to provide an intuitive understanding of the suggested approach and its application.

The point cloud generated by the laser is used by the perception representation to form the robot's belief about the world; in our case an occupancy grid map is used as the world representation. The belief of the robot, along with the mission description and robot trajectory, is used to infer the contextual importance function. The contextual importance function presents the locations from which it is important for the robot to gain information to complete its task safely. The construction of this function is further described in section IV. The contextual importance function along with the belief of the robot is used to calculate features. The policy maps these features to sensory actions. The features are the only input to the policy. Therefore, they should capture the saliencies of the interaction between sensory actions, the representation, and the environment. The construction of one such set of features is described in section V-A.

The policy takes the features as input and provides the trajectory for the sensor, $\xi_s(.)$. For the application considered in the paper, the policy is designed for a long-range laser nodding in the pitch axis with respect to the vehicle. The construction of the policy is described in section V. The parameters of the constructed policy function need to be learnt to maximize the contextually important information gain. The large partially known state space in which the robot operates makes learning the correct policy parameters intractable. We reduce the scenarios for which the policy is to be learnt to overcome this problem. The procedure to learn the policy parameters is presented in section VI. In the next section we describe the construction of the contextual importance function for our application.

IV. CONTEXTUAL IMPORTANCE FUNCTION

Finding the contextual importance function is as hard as solving the original optimization problem [16]. We approximate this function by defining $M(x, \xi(t), b(t)) = 1$ for all $x$ that might be important to the vehicle given the belief $b(t)$ and the robot state $\xi$. In this work, we consider the structure of the contextual importance function to have two components – safety $M_{safe}(.)$ and landing zone evaluation $M_{lz}(.)$.

A. Safe Navigation

We use the emergency maneuver library [18] to enforce the safety constraint. It suggests using an offline computed emergency maneuver library to ensure vehicle safety. If there exists a trajectory in the emergency maneuver library that starts from the current state of the vehicle and stays within the known obstacle-free region for infinite time, the vehicle can be considered safe. Therefore, it is important to sense the volume occupied by the emergency maneuver trajectories corresponding to the planned trajectory. Given a vehicle state $\xi(t)$ at time $t$, let there be a function $\psi(x, \xi(t)) \in \{0, 1\}$ which is one for the points inside the volume occupied by the emergency maneuver library for state $\xi(t)$ if it is unsafe, and zero otherwise. The contextual importance function for safety is then given as

$$M_{safe}(x, \xi(t), b(t)) = \psi(x, \xi(t)) \quad (4)$$

B. Landing Zone

The mission objective of the helicopter is to navigate safely from start to goal and then land. Therefore, it is important to evaluate landing zones before the helicopter commits to a landing. The landing zone is represented as a 2D grid, with each cell storing the probability of that cell being a safe landing site. Let the landing zone grid be represented by $LZG$. Each cell inside the landing zone is allocated a contextual importance of one.

$$M_{lz}(x, \xi(t), b(t)) = 1 \quad \forall x \in LZG \quad (5)$$

The combined contextual importance function of safety and landing zone evaluation is given as

$$M(.) = \max(M_{safe}(.), M_{lz}(.)) \quad (6)$$

Fig. 3 illustrates the contextual importance function for a helicopter navigating towards a landing zone to land.
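The combination in eq. 6 amounts to a pointwise maximum of two indicator functions. A minimal sketch, where the two predicate arguments are hypothetical stand-ins for the emergency maneuver volume check of section IV-A and the landing zone grid membership of section IV-B:

```python
def contextual_importance(x, in_emergency_volume, in_landing_zone):
    """M(.) = max(M_safe(.), M_lz(.)) from eq. 6, over {0, 1} indicators.
    `in_emergency_volume` and `in_landing_zone` are illustrative predicates."""
    m_safe = 1 if in_emergency_volume(x) else 0   # eq. 4
    m_lz = 1 if in_landing_zone(x) else 0         # eq. 5
    return max(m_safe, m_lz)                      # eq. 6
```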

V. POLICY PARAMETRIZATION

We have discussed that the problem of directly optimizing the cost function is NP-hard and that the cost function evaluation is computationally expensive, making local optimization intractable. Therefore, we develop a mapping from features to actions that can be used online.

We learn a policy $\pi : \mathbb{R}^f \to \mathbb{R}^a$, where $\mathbb{R}^f$ is the feature space and $\mathbb{R}^a$ is the action space. The policy is developed such that it can guarantee the safety of the vehicle at high speeds if the environment allows it. We restrict the action space to constant velocity nods to make the search computationally feasible. We learn a policy offline that maximizes


Fig. 3: Contextual Importance Function – The figure illustrates the region where the contextual importance function equals one in orange, and regions which were contextually important in the past in grey. As the robot navigates, the regions known in the past become irrelevant, as there is no more information to be gained there that might affect its future actions. The volume around the trajectory, bounded by the emergency maneuver library for the future, as yet unsafe, states, and the volume inside the landing zone are contextually important.

the information gain in the worst case scenario. The worst case scenario and the process of learning the policy function are presented in section VI.

We want to optimize the use of sensor bandwidth and avoid wasting the sensor's time looking at regions that are already known, inaccessible, or unimportant to the mission. Therefore the features have to cover visibility, information gain, and the contextual importance function. In the next subsection we cover the feature design and then present the policy function.

A. Features

To calculate the relevant field of view for the sensor, we calculate and store the contextually important expected information gain along a grid of pitch ($\theta$) and yaw ($\phi$) directions, with rays originating from the laser's center. This forms a 2D map $\zeta_m(\theta, \phi) \to \mathbb{R}$, mapping view direction to contextual information gain. We generate this map by tracing rays through the 3D occupancy grid representation and calculating the contextually weighed expected information gain along the rays. The algorithm for calculating the expected information gain along the rays is presented in [19]. The average range at which information is gained, $r_e$, is also used as a feature for the policy; $r_e$ can be calculated while calculating $\zeta_m$. The algorithms to calculate $\zeta_m$ and $r_e$ are presented in detail in [16]. The policy function can then be represented as eq. 7 below.

$$[\dot{\rho}, \rho_{conf}, \rho_{max}, \rho_{min}] = \pi(\zeta_m, r_e) \quad (7)$$

The FOV to scan, $[\rho_{min}, \rho_{max}]$, is inferred using $\zeta_m$. $\rho_{max}$ is given by the maximum pitch in $\zeta_m$ for which the information gain is greater than 0. $\rho_{min}$ is given by the minimum pitch in $\zeta_m$ for which the information gain is greater than 0. The velocity of the nod is stored in a lookup table against the range $r_e$ at which the information is to be sensed. We now look at how we learn the mapping (lookup table) between range and velocity that maximizes the information gain at that range.

VI. POLICY SEARCH

We want to find a nodding speed at which to scan the volume to maximize the gain of contextually important information. The standard method to find the optimal policy is to run the system either in simulation or in the field and search the parameter space for values that minimize the average cost over a range of scenarios. But given that we are designing a policy to keep the vehicle safe in all conditions, we search for the optimal policy for the worst case scenario. In the worst case scenario, the time available to the sensor for scanning is minimal and the volume to be covered is maximal.

The following subsections develop the worst case scenario for a guaranteed safe autonomous mobile robot navigating through unknown environments. This scenario is used to learn the optimal nodding action for a given query range. The optimal action maximizes contextually important information gain, where information gain is a function of the reduction in uncertainty about the presence of obstacles of interest.

In the case of a rotorcraft the smallest obstacles of interest are wires. This section describes how information gain is related to the probability of detection of obstacles of interest. We further describe algorithms to efficiently calculate the probability of detection given sensory actions, enabling a search of the policy parameters ($\dot{\rho}$) in reasonable time.

A. Policy Search Scenario

Proposition 1 (Worst Case Scenario for Safety). Let the average range at which contextually important information is requested be $r$. The sensor has to strive to gain maximum information to ensure vehicle safety while the vehicle follows the planned trajectory if the vehicle is moving at the maximum speed that allows it to stay safe at range $r$, assuming the vehicle is flying in an unknown environment and the environment allows for safe following of the nominal trajectory.

The proof of the proposition [16] relies on the fact that the volume occupied by emergency maneuvers monotonically increases with the vehicle's speed. We optimize the nodding speed for this worst case scenario. Assuming the world representation takes $t_r$ seconds to refresh, we search for a nodding speed that maximizes the gain of information in the relevant volume $V_r$ in $t_r$ seconds, where $V_r$ is given by equation 8.

$$V_r = \max(S_V(r)) \quad (8)$$

where $S_V(r)$ returns a vector of bounding volumes of emergency maneuver libraries that lie completely within a sphere of radius $r$ around the vehicle. In the next subsection we develop the definition of information gain for safety, and show how to optimize for it efficiently.

B. Information gain as probability of detection of the worst case obstacle

Let us assume the smallest obstacle against which the autonomous system has to guarantee safety is given by $o$, and the probability that an object is present at $x$ is given by $p_x(o)$. For brevity we use $p(o)$ instead of $p_x(o)$. Let the probability of detection of the obstacle given there exists an obstacle at $x$ be given by $p(d|o)$, and the probability of detecting an obstacle given there is no obstacle at $x$ be given by $p(d|o')$. We express the expected information gain given $p_x(o)$ as a function of $p(d|o)$ and $p(d|o')$.

$$\begin{aligned}
\mathbb{E}\,IG(o) = {} & p(o)\big[p(d|o)\,(H(p(o)) - H(p(o|d))) + p(d'|o)\,(H(p(o)) - H(p(o|d')))\big] \\
{} & + p(o')\big[p(d|o')\,(H(p(o)) - H(p(o|d))) + p(d'|o')\,(H(p(o)) - H(p(o|d')))\big]
\end{aligned} \quad (9)$$

where $H(.)$ is the entropy [20].

Proposition 2 (Monotonicity of Information Gain). Assuming there are no false positives, $p(d|o') = 0$ and $p(d'|o') = 1$. For a given $p(o)$, $\mathbb{E}\,IG$ can be completely expressed as a function of $p(d|o)$ and monotonically increases in $p(d|o)$.

Proposition 2 is proved in [16], and Fig. 4 illustrates it.

Fig. 4: Information Gain – The expected information gain given a prior $p(o)$ is monotonic in the probability of detecting the event if the event occurred, $p(d|o)$. The assumption is that there are no false positives, $p(d|o') = 0$. This implies that to maximize information gain, one may maximize $p(d|o)$.

Hence, to find the effectiveness of an action we evaluate $p(d|o)$ for a given $p(o)$ and calculate the information gain in the mission relevant region. The action that provides the maximum information gain is selected as the policy output for the given feature inputs. In the next section we present an efficient algorithm to calculate $p(d|o)$ that allows the creation of the policy lookup table tractably.
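Under the no-false-positive assumption of Proposition 2, eq. 9 collapses to a function of $p(o)$ and $p(d|o)$ alone, and its monotonicity can be checked numerically. A hedged sketch (the function and variable names are ours):

```python
import math

def H(p):
    """Binary entropy in bits; H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def expected_ig(p_o, p_d_given_o):
    """Expected information gain of one sensing action (eq. 9), assuming
    no false positives: p(d|o') = 0 and p(d'|o') = 1."""
    p_no = 1.0 - p_o
    p_nd_given_o = 1.0 - p_d_given_o
    # With no false positives, a reported hit is conclusive.
    p_o_given_d = 1.0
    # Posterior after a miss, by Bayes' rule.
    p_miss = p_nd_given_o * p_o + 1.0 * p_no
    p_o_given_nd = (p_nd_given_o * p_o) / p_miss
    h0 = H(p_o)
    gain_d = h0 - H(p_o_given_d)
    gain_nd = h0 - H(p_o_given_nd)
    # eq. 9 with p(d|o') = 0 and p(d'|o') = 1:
    return (p_o * (p_d_given_o * gain_d + p_nd_given_o * gain_nd)
            + p_no * gain_nd)

# Monotonicity check in the spirit of Proposition 2.
vals = [expected_ig(0.3, pd) for pd in (0.1, 0.5, 0.9)]
assert vals[0] < vals[1] < vals[2]
```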

C. Efficient calculation of probability of detection

We describe a method that exploits the symmetry and the structure of the problem to calculate information gain for a range sensor efficiently. The naive method of calculating $p(d|o)$ is to sample configurations of the object from $p(o)$ and compute ray intersections with the object given the sensor nodding motion. This algorithm has a complexity of $O(RM)$, where $R$ is the total number of rays generated by the laser and $M$ is the number of samples drawn from $p(o)$. Let us assume an object exists in a 6D space, and for each dimension we sample $N$ particles. The computational complexity is then $O(RN^6)$. We introduce an algorithm that reduces this computational complexity to $O(R^3N^4)$ by reasoning about ray-object intersections in the object's configuration space in spherical coordinates. We further exploit the symmetry and structure of the problem of sensing for safety to reduce the complexity to $O(R_l^3 N)$, where $R_l$ is a small fraction of the actual number of rays.

The algorithm for fast calculation of $P(d|o)$ is presented in Alg. 1. The inputs to the algorithm are the sensor trajectory $\xi_s$ whose information gain is to be calculated, and an object's partial pose $x_{partial}$, stating only the distance of the object from the sensor and its relative orientation. We assume that the size of the object is small compared to the query range, which results in negligible change in the projection of the object onto the $[\theta, \phi]$ plane of the spherical coordinate frame centered at the laser. The algorithm returns a function $p(d|\theta, \phi, x_{partial})$ that provides the probability of detection of the object if it is in the $x_{partial}$ configuration at any $[\theta, \phi]$. Let us assume the object is detected if $n$ point hits are detected on the object. Each laser beam is represented as a polygon in the $[\theta, \phi]$ space of spherical coordinates. It is assumed that if the ray and the object intersect, a hit will be reported with a probability of $p_h$.

The Intersect(.) function (Alg. 2) calculates the intersections of a set of unique polygons amongst themselves, stores them as $Y_{int}$, and also returns the list of members of its input set that intersected to form polygon $i$, $\forall i \in Y_{int}$, as $\lambda^i_{int} \in \Lambda_{int}$. The outputs of Alg. 2 are used by the Probability(.) function (Alg. 3) to calculate the probability of detection given an object at $x_{partial}$, $\forall [\theta, \phi]$. It is obvious that given a fixed

Algorithm 1: EfficientCalculationOfP(d|o)
Input : $\xi_s$, $x_{partial}$
Output: $p(d|\theta, \phi, x_{partial})$
  $S_o \leftarrow$ ProjSpherical($x_{partial}$)
  $S_L \leftarrow$ GeneratePolygons($\xi_s$)
  $S_{L,o} = S_L \oplus S_o$  // The Minkowski sum projects polygons in $S_L$ to the space of $S_o$.
  $[Y_{int}, \Lambda_{int}] \leftarrow$ Intersect($S_{L,o}$)
  $p(d|\theta, \phi, x_{partial}) \leftarrow$ DetectionProbability($Y_{int}$, $\Lambda_{int}$)
Return: $p(d|\theta, \phi, x_{partial})$

angular motion profile for a range sensor, the number of rays intersecting an object decreases with increasing distance. This implies that the probability of detection of the object decreases with increasing distance [16]. We maximize the information gain at the range at which the information is requested, maximizing the minimum information gain. This results in maximizing the information gain on a 2D spherical manifold $[\theta, \phi]$. The configuration space of the object restricted to a 2D plane is then $x, y, \theta$. Assuming we use $N$ particles for each dimension to approximate $p(o)$,


Algorithm 2: Intersect(.)
Input : $S_{L,o}$
Output: $Y^{x_{partial}}_{int}$, $\Lambda_{int}$
Initialize: $Y^{x_{partial}}_{int} = S_{L,o}$, $\lambda^i_{int} = i \;\; \forall i \in Y^{x_{partial}}_{int}$, $Y_{prev} = S_{L,o}$
  while $|Y_{prev}| > 1$ do
    $Y_{new} = \emptyset$
    for $j \in Y_{prev}$ do
      for $k \in Y_{prev} - [1 : j]$ do
        $Y_{new} = Y_{new} \cup (j \cap k)$
        $\lambda^{end+1}_{int} = \lambda^j_{int} \cup \lambda^k_{int}$
      end
    end
    $Y^{x_{partial}}_{int} = $ Remove($Y^{x_{partial}}_{int}$, $Y_{new}$)
    $Y^{x_{partial}}_{int} = Y^{x_{partial}}_{int} \cup Y_{new}$
    $Y_{prev} = Y_{new}$
  end
Return: $Y_{int}$, $\Lambda_{int}$

Algorithm 3: Probability(.)
Input : $Y^{x_{partial}}_{int}$, $\Lambda^{x_{partial}}_{int}$, $\phi$, $\theta$
Output: $p(d|\phi, \theta, x_{partial})$
  $i \leftarrow$ ReturnPolygonContaining($\theta$, $\phi$, $Y^{x_{partial}}_{int}$)
  $p(d|\phi, \theta, x_{partial}) = \sum_{k=n}^{|\lambda^i_{int}|} \binom{|\lambda^i_{int}|}{k}\, p_h^k\, (1 - p_h)^{|\lambda^i_{int}| - k}$
Return: $p(d|\phi, \theta, x_{partial})$

the computational cost of evaluating an action is given byO(R3N).

We restricted the action space to constant velocity nods, hence the laser points lie at a constant distance from each other in pitch and yaw space, forming a grid. This symmetry can be exploited by finding p(d|o) for every configuration in a single grid cell, as p(d|o) is the same for every cell. Also, the intersection with the object need only be checked against the points within an l2 radius of the grid cell. Let the number of points within this l2 radius be R_l. The complexity of evaluating an action is then O(R_l³N). For further details about the reduction in computational complexity, please refer to [16].
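The neighbourhood restriction can be sketched as follows; the grid spacings and radius here are illustrative parameters, and the function only shows how the R_l candidate points around a cell would be enumerated:

```python
import math

def neighbors_within_radius(d_pitch, d_yaw, radius):
    """Enumerate grid offsets (in pitch/yaw steps) whose Euclidean (l2)
    distance from a cell centre is within `radius`; a constant-velocity nod
    spaces laser points on such a grid, so intersection tests only need to
    consider these R_l points."""
    max_i = int(radius // d_pitch)
    max_j = int(radius // d_yaw)
    return [
        (i, j)
        for i in range(-max_i, max_i + 1)
        for j in range(-max_j, max_j + 1)
        if math.hypot(i * d_pitch, j * d_yaw) <= radius
    ]
```

With unit spacing and unit radius this returns the five offsets (0, 0), (±1, 0), and (0, ±1).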

Alg. 1 is evaluated exhaustively over the complete nodding velocity space at finite discrete intervals to create the lookup table of range vs. nodding velocity. The slowest nodding speed is restricted such that one complete nod occurs in at most t_r = 1.4 s. The action providing maximum information gain is stored as the nodding time for that range. Fig. 5 presents plots of the expected probability of detection of a wire, given a uniform distribution. This gives a more intuitive evaluation of the nodding trajectories.
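The offline tabulation described above can be sketched as an exhaustive search; `expected_detection` stands in for the full evaluation of Alg. 1 and is an assumed callable, not the paper's implementation:

```python
def build_lookup_table(ranges, nod_velocities, expected_detection):
    """For each range, score every candidate nodding velocity with the
    supplied expected-detection function and keep the best, producing the
    range -> nodding velocity lookup table used online."""
    return {
        r: max(nod_velocities, key=lambda v: expected_detection(r, v))
        for r in ranges
    }
```

Online, the policy then simply indexes this table with the range at which information is requested.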

Fig. 5: Expected Probability of Detection: The expected probability of detection of a wire, if it exists, E_{p(o)}[p(d|o)], for a given action is an intuitive indicator of how good an action is. The plot shows E_{p(o)}[p(d|o)] versus nodding time period for the sensor's worst-case scenario at vehicle velocity v = 45 m/s (range 650 m, yaw resolutions 0.04 to 0.54). Each sensor velocity corresponding to a different nodding time period is evaluated up to t_r = 1.4 seconds. The evaluation shows that scans with slower scanning speed are better; this is also intuitively the correct behavior, as slower nodding speeds mean a more uniform point distribution on the [θ, φ] manifold.

The policy for scanning for landing zone evaluation may also be calculated by the method presented above, but for our application, continuously scanning the region of the landing zone to get a uniform point distribution on it sufficed as an adequate policy. The sampling rate of the laser is fixed at 29,000 points per second for safety scans, as changing the sampling rate means switching off the laser for 7 seconds and reducing the maximum range of the laser from 1400 m to 650 m, which would render the vehicle unsafe. Once the safety of the entire mission is ensured, the sampling rate of the sensor is increased for faster landing zone evaluation.

VII. RESULTS

We test our approach on an autonomous full-sized helicopter which uses the Near Earth Autonomy Mark 3 active nodding range sensor. We tested our approach in Mesa, AZ; Manassas, VA; Pittsburgh, PA; and Quantico, VA over more than 30 autonomous missions in varying conditions. Fig. 6 shows a typical collection of missions. All the missions require the vehicle to navigate from the start to the landing zone at high speed, evaluate the landing zone, and land without hovering if the landing zone is safe. The sensor had to clear enough of the region to keep the vehicle safe, even at speeds of 56 m/s, and sample a 100 m diameter landing zone with enough points to evaluate it for landing the vehicle on it. On average, the time available to the laser to sample the landing zone (LZ) was below 20 seconds. Such missions were impossible to conduct with a passive scanning approach. Passive scans designed to ensure safety and evaluate the landing zone are only able to keep the vehicle safe up to a maximum


speed of 18 m/s and evaluate the complete landing zone in 112 seconds. Fig. 7 shows a detailed analysis of the sensor

Fig. 6: Mission Definition: The helicopter navigates from the loiter point to the landing zone in less than 210 seconds. It has to navigate the environment while being provably safe and touch down at the LZ without hovering over it to look for potential sites.

angles during a mission in Mesa, AZ. The policy scans for safety at the beginning, when the vehicle is navigating towards its destination. The focus of the sensor is moved to LZ evaluation once safety is ensured, until the end of the mission. Notice that the laser configuration is changed only after confirming the safety of the vehicle for the rest of the mission. This mitigates the problem of reduced sensor range at high sampling rates affecting vehicle safety. Fig. 8 shows

Fig. 7: Sensor Angles and Configuration: The top figure shows the sensor angles. In the initial part of the mission, the sensor plans to keep the vehicle safe (oscillation of the blue line). It switches to focusing on the landing zone when safety for the remainder of the mission has been guaranteed. This focusing of the laser is shown by the narrow peak-to-peak of the nodding angles. At the very end, once enough points on the landing zone have been collected, the sensor reverts to ensuring that the vehicle can be safe should it desire to wave off. The bottom figure shows the configuration switch. Since switching configuration reduces range, the policy triggers this event at the appropriate distance to the landing zone, while making sure that the safety of the rest of the mission is ensured.

the performance of the policy in keeping the vehicle safe. The sensor always keeps the vehicle safe and only switches to scanning the landing zone when it has guaranteed safety for the entire trajectory leading to touchdown. Notice that the executed vehicle speed is always well below the safe speed of the vehicle. Fig. 9 shows the performance of the system

[Plots vs. time to land: vehicle speed against maximum admissible safe speed; safety buffer, i.e. the time for which the vehicle is safe if it keeps following its current planned trajectory without sensing; and distance from the LZ.]

Fig. 8: Safety analysis for a single mission. In each figure, two events are marked: the first (blue) is when a manual pilot makes an aggressive maneuver; the second (yellow) is when the sensor switches to scanning the landing zone. The top figure shows that in autonomous mode the vehicle flies slower than the safe speed limit. The middle figure shows the time after which the vehicle will be unsafe given its current belief. As the vehicle approaches the landing, this time increases until it is high enough for the sensor to switch to scanning the landing zone. The bottom figure shows the progress of the mission.

over 12 missions executed in Quantico, VA. The vehicle is always safe and evaluates the landing zone in time for all the missions; this demonstrates the reliability of the policy in the real world.

VIII. CONCLUSION

The main contribution of this paper is an algorithm for actively controlling a sensor mounted on a vehicle to maximize the vehicle's performance. We reduce the problem of actively controlling the sensor to minimize trajectory cost to the problem of maximizing contextually important information gain. We suggest a policy based approach to solve the NP-hard problem of maximizing contextually


Fig. 9: Multiple Mission Safety Analysis: The safety analysis shown over several missions. The worst-case performance of the system is still shown to be guaranteed safe.

important information gain. The method is evaluated through application on a full-scale autonomous helicopter. We have shown clear benefits of actively controlling the sensor using our method as opposed to passive nodding. The method enabled the helicopter to perform autonomous high-speed, guaranteed-safe flights and land on landing sites without the need to hover.

In the future, we want to explore the possibility of inferring the contextual importance function given a cost function. We are currently working towards applying the method on micro aerial vehicles and extending the policy search space to allow for richer sensory trajectories.

IX. ACKNOWLEDGEMENT

This work would not have been possible without the dedicated efforts of the entire AACUS TALOS team and was supported by ONR under contract N00014-12-C-0671.

REFERENCES

[1] A. S. Huang, A. Bachrach, P. Henry, M. Krainin, D. Maturana, D. Fox, and N. Roy, "Visual odometry and mapping for autonomous flight using an RGB-D camera," in International Symposium on Robotics Research (ISRR), 2011, pp. 1–16.

[2] P. Furgale and T. D. Barfoot, "Visual teach and repeat for long-range rover autonomy," Journal of Field Robotics, vol. 27, no. 5, pp. 534–560, 2010.

[3] C. Bills, J. Chen, and A. Saxena, "Autonomous MAV flight in indoor environments using single image perspective cues," in Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 2011, pp. 5776–5783.

[4] S. Scherer, J. Rehder, S. Achar, H. Cover, A. Chambers, S. Nuske, and S. Singh, "River mapping from a flying robot: state estimation, river detection, and obstacle mapping," Autonomous Robots, vol. 33, no. 1-2, pp. 189–214, 2012.

[5] S. Scherer, L. Chamberlain, and S. Singh, "Autonomous landing at unprepared sites by a full-scale helicopter," Robotics and Autonomous Systems, vol. 60, no. 12, pp. 1545–1562, 2012.

[6] A. Wald, "Sequential tests of statistical hypotheses," The Annals of Mathematical Statistics, vol. 16, no. 2, pp. 117–186, 1945.

[7] A. Cameron and H. Durrant-Whyte, "A Bayesian approach to optimal sensor placement," The International Journal of Robotics Research, vol. 9, no. 5, pp. 70–88, 1990.

[8] S. Chen, Y. Li, and N. M. Kwok, "Active vision in robotic systems: A survey of recent developments," The International Journal of Robotics Research, vol. 30, no. 11, pp. 1343–1377, 2011.

[9] F. Bourgault, A. A. Makarenko, S. B. Williams, B. Grocholsky, and H. F. Durrant-Whyte, "Information based adaptive robotic exploration," in Intelligent Robots and Systems, 2002. IEEE/RSJ International Conference on, vol. 1. IEEE, 2002, pp. 540–545.

[10] J. Van Den Berg, S. Patil, and R. Alterovitz, "Motion planning under uncertainty using iterative local optimization in belief space," The International Journal of Robotics Research, vol. 31, no. 11, pp. 1263–1278, 2012.

[11] F. Bourgault, T. Furukawa, and H. F. Durrant-Whyte, "Coordinated decentralized search for a lost target in a Bayesian world," in Intelligent Robots and Systems (IROS 2003). Proceedings. 2003 IEEE/RSJ International Conference on, vol. 1. IEEE, 2003, pp. 48–53.

[12] A. Ryan and J. K. Hedrick, "Particle filter based information-theoretic active sensing," Robotics and Autonomous Systems, vol. 58, no. 5, pp. 574–584, 2010.

[13] V. Myers and D. Williams, "A POMDP for multi-view target classification with an autonomous underwater vehicle," in OCEANS 2010. IEEE, 2010, pp. 1–5.

[14] A. Singh, A. Krause, C. Guestrin, and W. J. Kaiser, "Efficient informative sensing using multiple robots," Journal of Artificial Intelligence Research, vol. 34, no. 2, p. 707, 2009.

[15] J. Binney, A. Krause, and G. S. Sukhatme, "Optimizing waypoints for monitoring spatiotemporal phenomena," The International Journal of Robotics Research, vol. 32, no. 8, pp. 873–888, 2013.

[16] S. Arora and S. Scherer, "Policy based approach for sensor planning," http://www.frc.ri.cmu.edu/~sankalp/publications/pasp.pdf.

[17] S. Thrun, "Learning occupancy grid maps with forward sensor models," Autonomous Robots, vol. 15, no. 2, pp. 111–127, 2003.

[18] S. Arora, S. Choudhury, S. Scherer, and D. Althoff, "A principled approach to enable safe and high performance maneuvers for autonomous rotorcraft," in American Helicopter Society 70th Annual Forum. AHS, 2014.

[19] B. J. Julian, S. Karaman, and D. Rus, "On mutual information-based control of range sensing robots for mapping applications," The International Journal of Robotics Research, vol. 33, no. 10, pp. 1375–1392, 2014.

[20] C. E. Shannon, "Prediction and entropy of printed English," Bell System Technical Journal, vol. 30, no. 1, pp. 50–64, 1951.