Anytime Online Novelty Detection for Vehicle Safeguarding


Boris Sofman, J. Andrew Bagnell, and Anthony Stentz

Robotics Institute, Carnegie Mellon University

Pittsburgh, PA 15213

{bsofman,dbagnell,axs}@ri.cmu.edu

ABSTRACT

Novelty detection is often treated as a one-class classification problem: how to segment a data set of examples from everything else that would be considered novel or abnormal. Almost all existing novelty detection techniques, however, suffer from diminished performance when the number of less relevant, redundant or noisy features increases, as is often the case with high-dimensional feature spaces. Many of these algorithms are also not suited for online use, a trait that is highly desirable for many robotic applications. We present a novelty detection algorithm that is able to address this sensitivity to high feature dimensionality by utilizing prior class information within the training set. Additionally, our anytime algorithm is well suited for online use when a constantly adjusting environmental model is beneficial. We apply this algorithm to online detection of novel perception system input on an outdoor mobile robot and argue such abilities could be key in increasing the real-world applications and impact of mobile robotics.¹

I. INTRODUCTION

Many autonomous unmanned ground vehicles (UGVs) have advanced to a level where they are competent and reliable a high percentage of the time in many environments [1], [2], [3]. Most of these systems, however, are heavily engineered for the domains they are intended to operate in. Any deviation from these domains often results in sub-optimal performance or even complete failure. Given the cost of such systems and the importance of safety and reliability in many of the tasks that they are intended for, even a relatively rare rate of failure is unacceptable. In many domains that are prime candidates for mobile robotic applications such as space exploration, transportation, military reconnaissance, and agricultural tasks, the risk of catastrophic failure, however small, is a primary reason why autonomous systems are still under-utilized despite already demonstrating impressive abilities.

One approach to addressing this limitation is for a UGV to be able to identify situations that it is likely untrained to handle before it experiences a major failure. This problem therefore becomes one of novelty detection: how a robot can identify when perception system inputs differ from prior inputs seen during training or operation. With this ability, the system can either avoid novel locations to minimize risk or stop and enlist human help via supervisory control or tele-operation (see Figure 1).

Fig. 1. Sample result from online novelty detection algorithm onboard a large UGV. Chain-link fence was detected as novel (top and left, novelty shown in red) with respect to the large variety of terrain and vegetation previously encountered. After an initial stretch being identified as novel, subsequent portions of the fence are no longer flagged (right) due to the algorithm’s online training ability. As with all future similar images, insets within the top image show a first-person view (left inset) and the classification of the environment by the perception system into road, vegetation, and solid obstacle in blue, green and red respectively (right inset).

¹ Most figures in this paper are best viewed in color.

Two common limitations of novelty detection systems are particularly relevant to the mobile robotics domain. Autonomous systems often need to learn from their experiences and continually adjust their models of what is normal and what is novel. For example, if human feedback were to confirm that a certain type of environment selected as novel is actually safe to handle with the existing autonomy system or demonstrate to the system the proper way to handle the situation, the model no longer needs to identify such inputs as novel. Most novelty detection approaches, however, build a model of the normal set of examples a priori in batch in order to detect novel examples in the future but are unable to update that model online without retraining.

Furthermore, existing novelty detection techniques see diminished performance when using high-dimensional feature spaces, particularly when some features are less relevant, redundant, or noisy. These qualities are particularly common in features from many UGV perception systems due to the variety of sensors and uncertainty about how these features relate to novelty. For example, a large variety of camera-based features from color and texture might be computed for use in various components of the perception system. While these features are potentially powerful, subsets of these features often contain redundant or irrelevant information. It is therefore important for novelty detection techniques to be resilient to such feature properties.

We present an online approach that addresses these common problems with novelty detection techniques. We approach the problem of novelty detection as one of online density estimation where seen examples generate an influence of familiarity in feature space toward future examples. When prior class information is available, we show how using Multiple Discriminant Analysis (MDA) for generating a reduced dimensional subspace to operate in rather than other common techniques such as Principal Components Analysis (PCA) can make the novelty detection system more robust to issues associated with high-dimensional feature spaces. In effect, this creates a lower dimensional subspace that truly captures what makes things novel. Additionally, our algorithm can be framed as a variant of the NORMA algorithm, an online kernelized Support Vector Machine (SVM) optimized through stochastic gradient descent, and therefore shares its favorable qualities [4]. Along with its anytime properties, this allows our algorithm to better deal with the real-time demands of online tasks.

While this work was targeted toward mobile robotics applications, the approaches here are more generally applicable to any domain which can benefit from online novelty detection.

The next section presents background on novelty detection techniques and some example applications. Section III presents our novelty detection algorithm, followed by an explanation of how this technique can be applied to mobile robotics in Section IV, results from field testing on a large UGV in Section V and concluding remarks in Section VI.

II. NOVELTY DETECTION

Novelty detection techniques (also referred to as anomaly or outlier detection) have been applied to a wide range of domains such as detecting structural faults [5], abnormal jet engine operation [6], computer system intrusion detection [7], and identifying masses in mammograms [8]. In the robotics domain some have incorporated novelty detection systems within inspection robots [9], [10].

Novelty detection is often treated as a one-class classification problem. In training the system sees a variety of “normal” examples (and corresponding features) and later the system tries to identify input that does not fit into the trained model in order to separate novel from non-novel examples. Instances of abnormalities or novel situations are often rare during the training phase so a traditional classifier approach cannot be used to identify novelty in most cases.

Most novelty detection approaches fall into one of several categories. Statistical or density estimation techniques model the “normal” class in order to identify whether a test sample comes from the same distribution or not. Such approaches include Parzen window density estimators, nearest neighbor-based estimators, and Gaussian mixture models [11]. These techniques often use a lower-dimensional representation of the data generated through techniques such as PCA.

Other approaches attempt to distinguish the class of instances in the training set from all other possible instances in the feature space. Schölkopf et al. [12] show how an SVM can be used for specifically this purpose. A hyper-plane is constructed to separate the data points from the origin in feature space by the maximum margin. One application of this technique was document classification [13]. A noticeable drawback of this approach is that it makes an inherent assumption that the feature space origin is a suitable prior for the novel class. This limitation was addressed by [14] by attracting the decision boundary toward the center of the data distribution rather than repelling it from the origin. A similar approach encloses the data in a sphere of minimal radius, using kernel functions to deal with non-spherically distributed data [15]. These techniques all require solutions to linear or quadratic programs with slack variables to handle outliers.

Another class of techniques attempts to detect novelty by compressing the representation of the data and measuring reconstruction error of the input. The key idea here is that instances of the original data distribution are expected to be reconstructed accurately while novel instances are not. A simple threshold can then be used to detect novel examples. The simplest method of this type uses a subset of the eigenvectors generated by PCA to reconstruct the input. An obvious limitation here is that PCA will perform poorly if the data is non-linear. This limitation was addressed by using a kernel PCA based novelty detector [16]. Benefits of more sophisticated auto-encoders, neural networks that attempt to reconstruct their inputs through narrow hidden layers, have been studied as well [17].

Online novelty detection has received significantly less attention than its offline counterpart. Since it is often important to be able to adjust the model of what is considered novel in real-time, many of the above techniques are not suitable for online use as they require significant batch training prior to operation. While Neto et al. [9] replaced the use of PCA for novelty detection with an implementation of iterative PCA, performance was still largely influenced by the initial data set used for training. Marsland proposed a unique approach that models the phenomenon of habituation where the brain learns to ignore repeated stimuli [10]. This is accomplished through a clustering network called a Grow When Required (GWR) network. This network keeps track of firing patterns of nodes and allows the insertion of new nodes to allow online adaptation.

Markou and Singh have written a pair of extensive survey articles detailing many additional novelty detection applications and techniques [18], [19].

The performance of the above-mentioned novelty detection approaches, however, quickly deteriorates as the number of less relevant or noisy features grows. The disproportionately high variance of many of these features makes it difficult for many of these algorithms to capture an adequate model of the training data, and their effects quickly begin to dominate more relevant features in making predictions. Our algorithm addresses this crucial limitation in cases where class information is available within the training set while still being suitable for online use.

III. APPROACH

A. Formalization

The goal of novelty detection can be stated as follows: given a training set $D = \{x_i\}_{i=1 \ldots N} \in \mathcal{X}$ where $x_i = \{x_i^1, \ldots, x_i^k\}$, learn a function $f : \mathcal{X} \to \{\text{novel}, \text{not-novel}\}$. In the online scenario, each time step $t$ provides an example $x_t$ and a prediction $f_t(x_t)$ is made.

We perform online novelty detection using the online density estimation technique shown in Algorithm 1. All possible functions $f$ are elements of a reproducing kernel Hilbert space $\mathcal{H}$ [20]. All $f \in \mathcal{H}$ are therefore linear combinations of kernel functions:

$$f_t(x_t) = \sum_{i=1}^{t-1} \alpha_i k(x_i, x_t) \qquad (1)$$

We make the assumption that proximity in feature space is directly related to similarity. Observed examples deemed as novel are therefore remembered and have an influence of familiarity on future examples through the kernel function $k(x_i, x_j)$. A novelty threshold, $\gamma$, and a learning rate, $\eta$, are initially selected. For each example $x_t$, the algorithm accumulates the influence of all previously seen novel examples (line 5). If this sum does not exceed $\gamma$ then the example is identified as novel and is remembered for future novelty prediction (line 7).² Non-novel examples are not stored as they are assumed to have minimal impact on future novelty computations (even though a coefficient of 0 is assigned in line 9 for clarity, these examples are not stored). We suggest simply using the Gaussian kernel with an appropriate variance $\sigma^2$:

$$k(x_i, x_j) = e^{-\frac{\|x_i - x_j\|^2}{\sigma^2}} \qquad (2)$$
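As an illustration, the online density estimation loop described above can be sketched in a few lines of Python. This is a minimal sketch rather than the authors' implementation: the class name `OnlineNoveltyDetector`, its method names, and the toy two-dimensional inputs are assumptions introduced here. The sketch stores only novel examples, each with coefficient $\eta$, as the text describes.

```python
import math


def gaussian_kernel(a, b, sigma2):
    """Gaussian kernel of Equation (2): exp(-||a - b||^2 / sigma^2)."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / sigma2)


class OnlineNoveltyDetector:
    """Hypothetical wrapper around the loop of Algorithm 1."""

    def __init__(self, gamma, eta=1.0, sigma2=1.0):
        self.gamma = gamma      # novelty threshold
        self.eta = eta          # learning rate (coefficient of stored examples)
        self.sigma2 = sigma2    # kernel variance
        self.examples = []      # only examples flagged as novel are kept

    def familiarity(self, x):
        """Equation (1): accumulated influence of all stored examples."""
        return sum(self.eta * gaussian_kernel(xi, x, self.sigma2)
                   for xi in self.examples)

    def observe(self, x):
        """Return True if x is novel; novel examples are remembered."""
        is_novel = self.familiarity(x) < self.gamma
        if is_novel:
            self.examples.append(tuple(x))
        return is_novel


detector = OnlineNoveltyDetector(gamma=0.5, eta=1.0, sigma2=1.0)
first = detector.observe((0.0, 0.0))    # nothing stored yet, so novel
repeat = detector.observe((0.1, 0.0))   # close to the stored example
far = detector.observe((5.0, 5.0))      # far from everything stored
print(first, repeat, far, len(detector.examples))
```

Because every stored coefficient equals $\eta$, scaling $\gamma$ and $\eta$ by the same factor leaves every decision in this sketch unchanged, which is consistent with the paper's remark that the two parameters represent a single degree of freedom.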

B. Improved Dimensionality Reduction

Algorithm 1 Online novelty detection algorithm

 1: given: A sequence of features $S = (x_i)_{1 \ldots T}$; a novelty threshold $\gamma$; a learning rate $\eta$
 2: outputs: A sequence of hypotheses $f = (f_1(x_1), f_2(x_2), \ldots)$
 3: initialize: $t \leftarrow 1$
 4: loop
 5:   $f_t(x_t) \leftarrow \sum_{i=1}^{t-1} \alpha_i k(x_i, x_t)$
 6:   if $f_t(x_t) < \gamma$ then
 7:     $\alpha_t \leftarrow \eta$
 8:   else
 9:     $\alpha_t \leftarrow 0$
10:   end if
11:   $t \leftarrow t + 1$
12: end loop

² While both the novelty threshold, $\gamma$, and the learning rate, $\eta$, are included for clarity, they only represent one degree of freedom since the equations can be simply rescaled to be functionally equivalent with $\eta = 1$.

Especially if the number of features is large, it may first be necessary to project the high-dimensional input $x_t$ into a lower-dimensional subspace more suitable for novelty detection using distance metrics. PCA, the most common approach for this purpose among dimensionality reduction (and novelty detection) techniques, finds a linear transformation that minimizes the reconstruction error in a least-squares sense. If subsets of the features are redundant, noisy or are dominated disproportionately by a subset of the training set, however, applying techniques such as PCA, or any unsupervised dimensionality reduction technique for that matter, may yield disappointing results as precisely the most relevant directions for differentiation may be discarded in order to reduce reconstruction error of a less relevant portion of the feature space.

Rather than optimizing for reconstruction error, discriminant analysis seeks transformations that are efficient for discriminating between different classes within the data. Multiple Discriminant Analysis (MDA), a generalization of Fisher’s linear discriminant for more than two classes, computes the linear transformation that maximizes the separation between the class means while keeping the class distributions themselves compact, making it useful for classification tasks [11].

We argue that when prior class information for the training set is available, using MDA to construct a lower dimensional subspace using labeled classes not only optimizes for known class separability but likely leads to separability between known classes and novel classes. In cases described earlier that result in poor performance when using PCA, MDA will largely ignore features that do not aid in class discrimination, instead focusing on the strongly differentiating features. Novelty detection is about encountering new classes, so by using discriminating ability as the metric for constructing a subspace, one can capture the combinations of features that make known classes novel with respect to each other and likely generalize to previously unseen environments, in effect capturing what makes things novel. While this algorithm could be performed using the raw features, from this point forward $x_t$ will refer to the projection of the raw features into the lower dimensional space rather than the raw features themselves.
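The contrast between the two projections can be illustrated with a small NumPy sketch. It is not the paper's implementation: the function name `mda_projection`, the use of a pseudo-inverse to guard against a singular within-class scatter matrix, and the synthetic two-feature data set (one low-variance informative feature, one high-variance noise feature) are all assumptions made for illustration. MDA is computed here in the textbook way, from the leading eigenvectors of $S_W^{-1} S_B$, where $S_W$ and $S_B$ are the within-class and between-class scatter matrices.

```python
import numpy as np


def mda_projection(X, y, n_components):
    """Return a (d, n_components) projection from within/between-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - mean_all).reshape(-1, 1)
        S_b += len(Xc) * (diff @ diff.T)
    # Assumption: the pseudo-inverse stands in for the inverse if S_w is singular.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real


# Synthetic data: feature 0 separates three classes, feature 1 is loud noise.
rng = np.random.default_rng(0)
n = 200
labels = np.repeat([0, 1, 2], n)
informative = labels * 2.0 + rng.normal(scale=0.1, size=3 * n)
noisy = rng.normal(scale=10.0, size=3 * n)
X = np.column_stack([informative, noisy])

W = mda_projection(X, labels, n_components=1)

# First principal direction of the same data, for comparison.
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
pca_dir = Vt[0]

print("MDA direction:", W[:, 0])
print("PCA direction:", pca_dir)
```

In this toy setting the MDA direction lies almost entirely along the informative feature, while the first principal direction follows the noisy, high-variance feature, which is the failure mode of unsupervised reduction that the text describes.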

Additionally, labeled prior class data allows one to choose an appropriate $\sigma$ by finding the value that optimizes the ratio of inter-class contribution to outer-class contribution for a random subset of examples.

Experimental validation of this theory within the domain of mobile robotics is presented in Sections IV and V.

C. Framing as Instance of NORMA

The NORMA algorithm is a stochastic gradient descent algorithm that allows the use of kernel estimators for online learning tasks [4]. As with our algorithm, $f$ is expressed as a linear combination of kernels (1). NORMA uses a piecewise differentiable convex loss function $l$ such that at each step $t$ we add a new kernel centered at $x_t$ with the coefficient:

$$\alpha_t = -\eta\, l'(x_t, y_t, f_t) \qquad (3)$$

Our algorithm can easily be framed as an online SVM instance of NORMA using a hinge loss function as follows:

$$y_t = \gamma \qquad (4)$$

$$l(x_t, y_t, f_t) = \max(0,\; y_t - f_t(x_t)) \qquad (5)$$

Taking the derivative of (5), we get:

$$l'(x_t, y_t, f_t) = \begin{cases} -1 & \text{if } f_t(x_t) < \gamma \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

As before, the gradient of our loss is non-zero only when the accumulated contributions from stored examples are less than the novelty threshold $\gamma$, signifying that the example is novel. From (3) and (6) we then get:

$$\alpha_t = \begin{cases} \eta & \text{if } f_t(x_t) < \gamma \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$

This is equivalent to the update steps in lines 7 and 9 of Algorithm 1, showing that our algorithm can be framed as a specific instance of the NORMA algorithm.
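The equivalence can also be checked numerically. The short sketch below, with function names chosen here purely for illustration, evaluates the coefficient given by Equation (3) with the hinge-loss derivative of Equation (6) and compares it with the coefficient assigned in lines 7 and 9 of Algorithm 1 for a handful of familiarity values.

```python
def hinge_loss(f_value, gamma):
    """Equation (5) with y_t = gamma."""
    return max(0.0, gamma - f_value)


def hinge_loss_derivative(f_value, gamma):
    """Equation (6): derivative of the hinge loss with respect to f_t(x_t)."""
    return -1.0 if f_value < gamma else 0.0


def norma_coefficient(f_value, gamma, eta):
    """Equation (3): alpha_t = -eta * l'."""
    return -eta * hinge_loss_derivative(f_value, gamma)


def algorithm1_coefficient(f_value, gamma, eta):
    """Lines 7 and 9 of Algorithm 1."""
    return eta if f_value < gamma else 0.0


gamma, eta = 0.5, 1.0
familiarities = [0.0, 0.2, 0.49, 0.5, 0.8, 3.0]
pairs = [(norma_coefficient(f, gamma, eta), algorithm1_coefficient(f, gamma, eta))
         for f in familiarities]
print(all(a == b for a, b in pairs))
```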

NORMA produces a variety of useful bounds on the expected cumulative loss [4]. For novelty detection this directly relates to the number of examples that are expected to be flagged as novel. This means we are competitive with respect to the best $f \in \mathcal{H}$ in terms of representing our sample distribution with the fewest number of examples. This is to our advantage both from a computational perspective, since memory and prediction costs scale with the number of remembered examples, as well as performance since we want to minimize false positives that may be costly to handle.

D. Query Optimization

Without further measures, the potential number of basis functions stored by Algorithm 1 could grow without bound. NORMA deals with this issue by decaying all coefficients $\alpha_i$ and dropping terms when their coefficients fall below some threshold. We found that to prune in this way to an effective fixed capacity, the decay rate has to be relatively high which leads to a forgetting effect. The qualitative effect is that examples that were previously encountered are re-flagged as novel which would result in unnecessary human interventions. Instead, we propose a modified anytime version of our algorithm that better utilizes a fixed buffer size while ensuring efficient and bounded computation (see Algorithm 2).

Algorithm 2 Online novelty detection algorithm with query optimization

 1: given: A sequence of features $S = (x_i)_{1 \ldots T}$; a novelty threshold $\gamma$; a learning rate $\eta$; a maximum example storage capacity $N$
 2: outputs: A sequence of hypotheses $f = (f_1(x_1), f_2(x_2), \ldots)$
 3: initialize: $t \leftarrow 1$; $n \leftarrow 0$
 4: loop
 5:   $i \leftarrow 1$
 6:   $f_t(x_t) \leftarrow 0$
 7:   while $f_t(x_t) < \gamma$ and $i \le n$ do
 8:     $f_t(x_t) \leftarrow f_t(x_t) + \alpha_i k(x_i, x_t)$
 9:     $i \leftarrow i + 1$
10:   end while
11:   if $f_t(x_t) < \gamma$ then
12:     $\alpha_{n+1} \leftarrow \eta$
13:     $x_{n+1} \leftarrow x_t$
14:     $n \leftarrow n + 1$
15:   else $i \leftarrow i - 1$ // i was incremented one extra time
16:   end if
17:   optimize sequence: Move $(\alpha_i, x_i)$ to front
18:   if $n > N$ then
19:     Delete $(\alpha_i, x_i)_{i > N}$
20:     $n \leftarrow N$
21:   end if
22:   $t \leftarrow t + 1$
23: end loop

At line 17, if $f_t(x_t)$ = not-novel, $i$ indexes the example that broke the novelty threshold. Otherwise, $i$ indexes $x_t$.

This algorithm takes advantage of the fact that familiarity contribution to new queries is often dominated by only a few examples. First, we can easily gain some efficiency by only processing stored examples until we have reached the novelty threshold (line 7). The key performance improvement, however, comes from the sequence optimization in line 17. For each prediction, the stored example that breaks the novelty threshold $\gamma$, or the new novel example itself, is moved to the front of the list as it is more likely to impact future queries. This is a slight variation of the traditional problem of dynamically maintaining a linear list for search queries for which the move-to-front approach was proven to be constant-competitive, meaning no algorithm can beat this approach by more than a constant factor [21].
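The early exit and the move-to-front step can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' code: the class name `BoundedNoveltyDetector` and the toy inputs are introduced here, and a plain Python list (index 0 is the front) stands in for the ordered buffer of stored examples. Since only novel examples are stored, every stored coefficient equals $\eta$.

```python
import math


def gaussian_kernel(a, b, sigma2):
    """Gaussian kernel of Equation (2)."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / sigma2)


class BoundedNoveltyDetector:
    """Hypothetical sketch of the query-optimized loop of Algorithm 2."""

    def __init__(self, gamma, capacity, eta=1.0, sigma2=1.0):
        self.gamma = gamma
        self.capacity = capacity   # maximum number of stored examples (N)
        self.eta = eta
        self.sigma2 = sigma2
        self.examples = []         # front of the list is index 0

    def observe(self, x):
        """Return True if x is novel, updating the ordered buffer."""
        total = 0.0
        for i, xi in enumerate(self.examples):
            total += self.eta * gaussian_kernel(xi, x, self.sigma2)
            if total >= self.gamma:
                # Not novel: stop early and move the example that broke
                # the threshold to the front.
                self.examples.insert(0, self.examples.pop(i))
                return False
        # Novel: the new example itself goes to the front, and anything
        # beyond the capacity falls off the back of the list.
        self.examples.insert(0, tuple(x))
        del self.examples[self.capacity:]
        return True


det = BoundedNoveltyDetector(gamma=0.5, capacity=2)
a, b, c = (0.0, 0.0), (10.0, 0.0), (0.0, 10.0)
results = [det.observe(a), det.observe(b)]   # both novel, buffer is [b, a]
results.append(det.observe((0.1, 0.0)))      # familiar via a, buffer is [a, b]
results.append(det.observe(c))               # novel, b is evicted
results.append(det.observe(b))               # b was forgotten, novel again
print(results, det.examples)
```

The last query shows the cost of a bounded buffer: an example evicted from the back of the list is flagged as novel again, so the ordering matters because it decides which examples are kept.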

Fig. 2. Robot used for novelty detection testing (left) and a high-level illustration of perception system data flow (right). Features for novelty detection are taken from the steps highlighted in red.

Fig. 3. Example raw engineered features from the UGV’s perception system used by the novelty detection algorithm. NDVI (normalized difference of vegetation index) is a useful metric for detecting vegetation.

In addition to allowing us to bound the number of stored examples (line 19) and perform favorably with respect to NORMA (see Figure 5), this gives our algorithm an anytime property by enabling it to quickly classify as much of the environment as possible as not novel. When this algorithm is unable to run to completion due to time constraints, it will fail intelligently by generating false positives but never potentially dangerous false negatives.
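One way to picture this anytime behavior is a query that is cut off after a fixed budget and then falls back to the conservative answer. The sketch below is an assumption-laden illustration: the function `anytime_is_novel` and its `max_evals` parameter are hypothetical, and a count of kernel evaluations stands in for a wall-clock deadline so that the example is deterministic.

```python
import math


def gaussian_kernel(a, b, sigma2=1.0):
    """Gaussian kernel of Equation (2)."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / sigma2)


def anytime_is_novel(x, stored, gamma, eta=1.0, sigma2=1.0, max_evals=None):
    """Return (is_novel, completed) for a query with an evaluation budget.

    If the budget (a stand-in for a time limit) runs out before the
    familiarity reaches gamma, the query is reported as novel. An
    interrupted query can therefore only add false positives.
    """
    total = 0.0
    for count, xi in enumerate(stored):
        if max_evals is not None and count >= max_evals:
            return True, False   # interrupted: conservative answer
        total += eta * gaussian_kernel(xi, x, sigma2)
        if total >= gamma:
            return False, True   # familiar: safe to stop early
    return True, True            # every stored example was checked


query = (0.1, 0.0)
poorly_ordered = [(10.0, 0.0), (0.0, 0.0)]
well_ordered = [(0.0, 0.0), (10.0, 0.0)]

full = anytime_is_novel(query, poorly_ordered, gamma=0.5)
cut_short = anytime_is_novel(query, poorly_ordered, gamma=0.5, max_evals=1)
cut_short_mtf = anytime_is_novel(query, well_ordered, gamma=0.5, max_evals=1)
print(full, cut_short, cut_short_mtf)
```

With the relevant example at the front of the list, the same budget is enough to recognize the query as familiar, which is why the ordering step and the anytime property work together.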

IV. APPLICATION TO MOBILE ROBOTICS

A natural application of our algorithm is to online novelty detection for a mobile robot. The large UGV shown in Figure 2 that was used throughout our tests is intended for operation in complex, outdoor environments, performing local sensing using a combination of ladar and camera sensors [22]. The perception system assigns traversal costs by analyzing the color, position, density, and point cloud distributions of the environment [23], [24]. A large variety of engineered features that could be useful for this task are computed in real-time (see Figure 3) and the local environment is segmented into columns of 20 cm³ voxels in order to capture all potentially relevant information. Each voxel (tagged with its corresponding features) is passed through a series of classifiers and combined with additional density-related features to create a more compact set of intermediate features more suitable for traversal cost computation. The system then interprets these features through hand-tuned or learned methods to create a final traversal cost for that location in the world to be used for path planning purposes.

To perform novelty detection we used subsets of the initial raw features as well as the intermediate classification and density features for each voxel. This vertical voxelization approach is effective for mobile robots since the presence of specific features at certain vertical positions is highly relevant to their impact on traversal cost. For example, solid objects at wheel height are likely to be small rocks while similar features higher off of the ground are more likely to be trees or man-made objects. Similarly, such spatial information is vital to effective novelty detection. This forced us to deal with a relatively high-dimensional feature space (49 features) as well as with the associated issues described earlier.

Fig. 4. All training examples projected onto the subspace defined by the first three basis vectors computed by PCA (top) and MDA (bottom). Only the first four classes were used to construct the subspaces (’other man-made’ class was withheld as a test class). The MDA-based projection clearly shows significantly more separation between the new man-made class and the known classes, implying a more suitable subspace for novelty detection.

We deal with this problem by using MDA with a library of hand-labeled examples across many environments and conditions to compute a lower dimensional subspace more suitable for density estimation as described in the previous section. Of the available classes, four were used to construct a three-dimensional subspace: road/grass/dirt, rocks, bushes and barrels. A fifth class of examples corresponding to various non-barrel man-made objects (various man-made structures, barriers, vehicles, poles, human-sized plastic figures, etc.) was withheld for testing purposes. It is important to note that class labels are only used initially for generating a lower-dimensional feature space and are unnecessary for later processing.³

Figure 4 shows the projection of all five classes onto the first three basis vectors computed by PCA and MDA using the first four classes.⁴ The MDA projections clearly show better separation between the new set of man-made examples and the original four classes. As expected, the most overlap occurs with the barrel class as barrels share common properties with other man-made objects such as smooth surfaces, colors, etc. Since we would desire these new examples to be identified as novel relative to the rest of the classes, this separation implies that this is a more suitable subspace for use as a similarity metric within a novelty detection system. This benefit is clearly visible in the ROC curves in Figure 5.

³ For many robust autonomy systems including ours, such data is required regardless for perception system development (for example, for training and validation of onboard classifier systems).

⁴ All features were initially rescaled to zero-mean, unit-variance.

Fig. 5. ROC curves showing the false positive rate vs. the true positive rate for novelty detection under various configurations while varying $\gamma$, the threshold for novelty. A random 1500-example subset of each training category was used as the familiar set with respect to which novelty was detected in a held-out test set coming from the same training categories (to detect false positives) as well as the man-made category (to test for true positives). Performance using lower-dimensionality spaces created through MDA and PCA (shown in Figure 4) are shown in solid red and dotted red, respectively. When storage for only 150 examples is available, performances under the query optimization approach described in Algorithm 2 as well as the NORMA truncation approach described in [4] are shown in solid and dotted blue, respectively. For all tests, the learning rate, $\eta$, was set to 1, and $\sigma^2$ was computed to be 0.9 as described earlier. Novelty detection performance using the MDA-based space was shown to uniformly outperform the PCA-based space and the query optimization approach for dealing with limited memory was shown to uniformly outperform the suggested NORMA approach.
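ROC curves of the kind shown in Figure 5 can be traced by sweeping the novelty threshold over familiarity scores computed against a fixed familiar set. The following sketch shows only that bookkeeping. The function name `roc_points` and the toy score lists are assumptions for illustration and are not data from the paper; an example is counted as flagged when its familiarity falls below the threshold.

```python
def roc_points(familiar_scores, novel_scores, thresholds):
    """Return (gamma, false_positive_rate, true_positive_rate) per threshold.

    familiar_scores: familiarity f(x) for held-out examples of known classes
    novel_scores:    familiarity f(x) for examples of the withheld class
    An example is flagged as novel when its familiarity is below gamma.
    """
    points = []
    for gamma in thresholds:
        fpr = sum(s < gamma for s in familiar_scores) / len(familiar_scores)
        tpr = sum(s < gamma for s in novel_scores) / len(novel_scores)
        points.append((gamma, fpr, tpr))
    return points


# Toy familiarity scores, invented for this illustration only.
familiar_scores = [0.9, 1.5, 2.0, 0.4]
novel_scores = [0.1, 0.3, 0.6, 1.0]
curve = roc_points(familiar_scores, novel_scores, [0.0, 0.5, 1.2, 3.0])
for gamma, fpr, tpr in curve:
    print(f"gamma={gamma:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

A subspace in which novel examples sit far from the familiar set yields low familiarity for them at thresholds that still leave familiar examples unflagged, which pushes the curve toward the top-left corner.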

Because our algorithm is efficient for online use, the novelty model can start uninitialized or can be seeded with a sampling of examples used during training so that it can identify areas that are novel and potentially unsafe to handle with the current perception system.

V. EXPERIMENTAL RESULTS

Our novelty detection algorithm (with query optimization) was tested using our large UGV in a natural outdoor environment to evaluate its online novelty detection performance (the algorithm ran in real-time on logged data). The test environment traversed by the robot consisted of combinations of road, grass and dirt, a large variety of vegetation, a series of small barrels, several ditches, large heavily-sloped piles of rocks and a long chain-link fence.

Fig. 6. Shortly after initialization with no prior novelty model, various small vegetation was detected as novel (identified in red).

We projected all examples into the three-dimensional subspace generated by MDA as described in the previous section from the first four hand-labeled classes (examples from the man-made objects category were not used). To best exhibit the online novelty detection abilities of our algorithm, the model was initialized to contain no prior examples. As the environment was explored, perception system features were averaged into 0.8 cm² grid locations for use as online batches of examples. Those that were identified as novel relative to the current model (composed of everything previously identified as novel) were incorporated into the model as described earlier.
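The averaging of features into grid locations can be sketched as a simple binning step. The function name `average_features_by_cell`, the `(x, y, features)` input layout, and the `cell_size` parameter are hypothetical choices made for this illustration; the paper does not specify how the averaging is implemented.

```python
import math
from collections import defaultdict


def average_features_by_cell(points, cell_size):
    """Average feature vectors that fall into the same square grid cell.

    points: iterable of (x, y, features) with features a list of floats
    Returns a dict mapping (cell_x, cell_y) to the mean feature vector.
    """
    sums = {}
    counts = defaultdict(int)
    for x, y, feats in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        if key not in sums:
            sums[key] = [0.0] * len(feats)
        for j, value in enumerate(feats):
            sums[key][j] += value
        counts[key] += 1
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}


# Toy input: two points share a cell, the third lands in a neighbor.
points = [
    (0.1, 0.1, [1.0, 2.0]),
    (0.3, 0.2, [3.0, 4.0]),
    (1.1, 0.1, [5.0, 6.0]),
]
cells = average_features_by_cell(points, cell_size=1.0)
print(cells)
```

Each averaged cell vector would then be passed to the novelty detector as one example of the online batch.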

The vehicle’s initial environment consisted of fairly open terrain with some light vegetation scattered on both sides. As expected, instances of such vegetation were detected as novel the first few times they were seen (see Figure 6).

The vehicle then encountered areas of much denser, larger vegetation. Initially, a majority of such vegetation was identified as novel with respect to previous inputs (see Figure 7). As the vehicle continued navigating through similar vegetation, the model adapted and no longer identified such stimuli as novel (see Figure 8). Figure 9 demonstrates this learning process through a series of overhead images of this initial environment, identifying all future locations that are novel with respect to the current model. Output is shown at three points in time: near the beginning of navigation, just before initial encounters with dense vegetation and after sensing a small amount of dense vegetation. It is clearly visible how the system adapts quickly, causing future similar instances to no longer be flagged as novel.

Proceeding through the environment, the vehicle then encounters a series of plastic barrels (see Figure 10). As desired, the first several appear as novel with respect to the large variety of vegetation previously seen while later barrels are no longer novel due to their strong similarity to the initially seen barrels. Similarly, a long stretch of a chain-link fence is identified as novel late in the course (see Figure 1). Again, the initial portions of the fence triggered the novelty detection algorithm while later portions were no longer novel due to the algorithm’s adaptation. The value of such an approach is apparent when considering the UGV perception system’s interpretation of the fence compared to more familiar vegetation (see Figure 11). Additional examples of novel instances identified during traversal appear in Figure 12.

Overall, the novelty detection algorithm was able to identify all major unique objects (vegetation, barrels, fence, etc.) with a relatively small number of false positives due to effective adaptation to the environment. When PCA was used to create the feature subspace, errors increased due to the lack of separability between classes. As with any algorithm, the success of this approach is heavily dependent on the quality of features.

Fig. 7. Initial encounter with larger and denser vegetation results in a significant amount of detected novelty (identified in red).

Fig. 8. Similar vegetation as that shown in Figure 7 encountered a short time later. Notice how almost all vegetation is no longer novel due to similarity to previous stimuli.

Fig. 9. Novelty of all future perception input using current novelty model on vegetation-heavy terrain shown in Figures 6, 7 and 8 at three points throughout traversal. Robot’s past and future path is shown in light and dark green respectively and novelty of terrain is indicated by a gradient from yellow (moderate novelty) to red (high novelty). Robot is initialized without a prior novelty model.

Fig. 10. Series of barrels encountered later in the course. The initial barrels are detected as novel (red shade) even after significant exposure to a large variety of vegetation (top and left). Later barrels are no longer identified as novel due to online training.

Fig. 11. The perception system’s interpretation of the chain-link fence from Figure 1 (left) and the dense bushes from Figure 7 (right). Lower and higher traversal cost regions appear in darker and brighter shades of white respectively, with areas that are considered impassable appearing in purple. Even though the fence is significantly more hazardous to the vehicle than any of the vegetation, it is not in the perception system’s experience base and its traversal cost is therefore significantly underestimated.

Fig. 12. Additional examples of novel instances identified during later traversal (red shade): first encounter with a ditch (left) and a large, heavily-sloped pile of rocks (right).

Fig. 13. Average computation in milliseconds per novelty query on a 3.2 GHz CPU for Algorithm 1 (dashed red line) and Algorithm 2 (solid blue line) over the previous 5 seconds throughout navigation. Computational complexity of Algorithm 2 remains bounded due to the order optimization step (line 17). These timings do not include feature computation and projection costs as they are identical under both algorithms.

Computation time comparisons between the two algorithms on this course highlight the effectiveness of query optimization (see Figure 13). While the average computation time required per novelty query using Algorithm 1 grows with the number of stored examples, Algorithm 2 experiences temporary spikes in computation time as novel areas are encountered but query optimization allows the algorithm to quickly adapt its ordering of examples in order to maintain a bounded computation throughout navigation and allow effective anytime novelty prediction.

VI. CONCLUSION

Our algorithm addresses two significant limitations of most novelty detection approaches. By using MDA for supervised dimensionality reduction rather than unsupervised techniques such as PCA, this algorithm operates on a subspace that is more conducive to viewing novelty as a distance metric and is therefore more resistant to many of the issues associated with high-dimensional feature spaces. Additionally, this algorithm’s adaptive abilities, computational bounds, and anytime properties make it a logical choice for many online novelty detection tasks. As robotic systems continue to improve, such approaches can help capitalize on their abilities by acting as a safeguard against the inevitable dangers of unfamiliar situations.

ACKNOWLEDGMENT

This work was partially sponsored by DARPA under contract Unmanned Ground Combat Vehicle - PerceptOR Integration (contract number MDA972-01-9-0005) and by the U.S. Army Research Laboratory under contract Robotics Collaborative Technology Alliance (contract number DAAD19-01-2-0012). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. Boris Sofman is partially supported by a Sandia National Laboratories Excellence in Engineering Fellowship. We would also like to thank all the members of the UPI and CTA project teams, without whom this work would not be possible.

REFERENCES

[1] C. Urmson, J. Anhalt, D. Bagnell, et al., “Autonomous driving in urban environments: Boss and the urban challenge,” Journal of Field Robotics, vol. 25, no. 8, pp. 425–466, 2008.

[2] A. Kelly, A. Stentz, O. Amidi, et al., “Toward reliable off road autonomous vehicles operating in challenging environments,” The International Journal of Robotics Research, vol. 25, no. 5-6, pp. 449–483, 2006.

[3] S. Thrun, M. Montemerlo, H. Dahlkamp, et al., “Stanley: The robot that won the DARPA Grand Challenge,” Journal of Field Robotics, vol. 23, no. 9, pp. 661–692, June 2006.

[4] J. Kivinen, A. J. Smola, and R. C. Williamson, “Online learning with kernels,” in NIPS, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds. MIT Press, 2001, pp. 785–792.

[5] K. Worden, “Structural fault detection using a novelty measure,” Journal of Sound and Vibration, vol. 201, no. 1, pp. 85–101, 1997.

[6] P. Hayton, B. Schölkopf, L. Tarassenko, et al., “Support vector novelty detection applied to jet engine vibration spectra,” in NIPS, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds. MIT Press, 2000, pp. 946–952.

[7] J. Ryan, M.-J. Lin, and R. Miikkulainen, “Intrusion detection with neural networks,” in Advances in Neural Information Processing Systems, M. I. Jordan, M. J. Kearns, and S. A. Solla, Eds., vol. 10. The MIT Press, 1998.

[8] L. Tarassenko, P. Hayton, N. Cerneaz, and M. Brady, “Novelty detection for the identification of masses in mammograms,” in Proceedings of the Fourth International IEEE Conference on Artificial Neural Networks, vol. 409, 1995, pp. 442–447.

[9] H. V. Neto and U. Nehmzow, “Visual novelty detection with automatic scale selection,” Robotics and Autonomous Systems, vol. 55, no. 9, pp. 693–701, 2007.

[10] S. Marsland, U. Nehmzow, and J. Shapiro, “On-line novelty detection for autonomous mobile robots,” Robotics and Autonomous Systems, vol. 51, no. 2-3, pp. 191–206, 2005.

[11] R. O. Duda and P. E. Hart, Pattern Classification. John Wiley and Sons, 2000.

[12] B. Schölkopf, R. C. Williamson, A. J. Smola, et al., “Support vector method for novelty detection,” in NIPS, S. A. Solla, T. K. Leen, and K.-R. Müller, Eds. The MIT Press, 1999, pp. 582–588.

[13] L. M. Manevitz and M. Yousef, “One-class SVMs for document classification,” Journal of Machine Learning Research, vol. 2, pp. 139–154, 2001.

[14] C. Campbell and K. P. Bennett, “A linear programming approach to novelty detection,” in NIPS, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds. MIT Press, 2000, pp. 395–401.

[15] D. M. J. Tax and R. P. W. Duin, “Support vector domain description,” Pattern Recognition Letters, vol. 20, no. 11-13, pp. 1191–1199, 1999.

[16] H. Hoffmann, “Kernel PCA for novelty detection,” Pattern Recognition, vol. 40, no. 3, pp. 863–874, Mar. 2007.

[17] N. Japkowicz, C. Myers, and M. A. Gluck, “A novelty detection approach to classification,” in IJCAI, 1995, pp. 518–523.

[18] M. Markou and S. Singh, “Novelty detection: a review - part 1: statistical approaches,” Signal Processing, vol. 83, no. 12, pp. 2481–2497, 2003.

[19] M. Markou and S. Singh, “Novelty detection: a review - part 2: neural network based approaches,” Signal Processing, vol. 83, no. 12, pp. 2499–2521, 2003.

[20] B. Schölkopf and A. J. Smola, Learning with Kernels. Cambridge, MA: The MIT Press, 2002.

[21] D. Sleator and R. Tarjan, “Amortized efficiency of list update and paging rules,” CACM: Communications of the ACM, vol. 28, 1985.

[22] A. Stentz, J. Bares, T. Pilarski, and D. Stager, “The crusher system for autonomous navigation,” in AUVSI’s Unmanned Systems North America, August 2007.

[23] D. M. Bradley, R. Unnikrishnan, and J. Bagnell, “Vegetation detection for driving in complex environments,” in ICRA. IEEE, 2007, pp. 503–508.

[24] J.-F. Lalonde, N. Vandapel, D. Huber, and M. Hebert, “Natural terrain classification using three-dimensional ladar data for ground robot mobility,” Journal of Field Robotics, vol. 23, no. 1, pp. 839–861, November 2006.
