
EXPECTATION-BASED, MULTI-FOCAL, SACCADIC (EMS) VISION FOR DYNAMIC SCENE UNDERSTANDING

Ernst D. Dickmanns

Universitaet der Bundeswehr Munich, LRT, ISF, D-85577 Neubiberg, Germany, [email protected]

ABSTRACT

A new vision system for vehicles moving on uneven ground under strong perturbation inputs has been realized which exploits both inertial and visual data for dynamic scene understanding. Active pointing of the viewing direction for a set of cameras with various focal lengths allows mimicking vertebrate-type vision. The spatio-temporal models of the 4-D approach to dynamic vision provide the background for data fusion and situation assessment. The system has been realized on four Dual-Pentium PCs with high-bandwidth data exchange through a Scalable Coherent Interface (SCI). A 'Dynamic Object dataBase' (DOB) represents the collection of objects of interest and their states; objects are defined according to generic classes. Results of mission performance with the 5-ton van VaMoRs on a network of minor roads are shown. Saccadic vision is selected when two regions of attention cannot be mapped by a single viewing direction.

INTRODUCTION

EMS-vision has been designed to cope with many different aspects of mission performance when a wide field of view and good resolution in a smaller (central) area are required simultaneously. Due to the dynamic motion of the vehicle (especially in the rotational degrees of freedom) on uneven ground, joint inertial and visual perception is required. Efficient data interpretation for this sensor arrangement, with several (3 to 4) video cameras and half a dozen inertial sensors (measuring time derivatives of pose and velocity components of the own body), requires spatio-temporal models in 3-D space. The 4-D approach to dynamic vision [0, 1] lends itself to realizing this vertebrate type of vision.

THE ‘MARVEYE’ CAMERA CONFIGURATION

A wide field of view (f.o.v., > ~100°) nearby, realized by two wide-angle cameras with divergent optical axes, makes it possible to avoid obstacles at low speed and to negotiate tight curves. (Trinocular) stereo vision in a small central f.o.v. yields depth estimates in the near range from even a single, well-recognizable feature. The additional tele camera, covering the vertical center of the area of overlap between the two wide-angle cameras, is a high-resolution 3-chip color camera; its focal length is 3 to 4 times that of the two wide-angle cameras (alternatively, a zoom lens may be used).

Active gaze control allows (1) shifting this central f.o.v. to where it is needed most, and (2) inertially stabilizing the viewing direction to eliminate motion blur. Figure 1 shows one of the arrangements realized. This approach provides advantages for the detection and recognition of objects or landmarks far away; with a fourth camera and a strong tele-lens, a factor of ten in focal length as compared to the wide-angle lenses has been realized. This combination of cameras, mounted in fixed positions relative to each other on a single pan & tilt platform, acts like a vertebrate eye and requires quick gaze control to achieve good resolution in areas which show interesting features in the wide f.o.v. The concept trades about two orders of magnitude in video data rate for a short time delay (a few tenths of a second) until a high-resolution image of this part of the scene is available. According to its major constituents (a Multi-focal camera set with active/reactive gaze control, mounted on and pointed relative to the Vehicle carrying it, acting as an Eye), it has been dubbed 'MarVEye'.
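As a rough illustration of this resolution trade-off (not taken from the paper; the pixel pitch below is an assumed value), the ground footprint of one pixel grows linearly with range and shrinks with focal length, so the range at which a given resolution is reached scales with focal length:

# Back-of-the-envelope sketch only; the pixel pitch is an assumption, not a figure from the paper.
PIXEL_PITCH_M = 7.5e-6   # assumed pixel pitch of the cameras

def range_for_footprint(footprint_m, focal_length_m, pixel_pitch_m=PIXEL_PITCH_M):
    # Small-angle approximation: footprint = range * pitch / f  =>  range = footprint * f / pitch
    return footprint_m * focal_length_m / pixel_pitch_m

for label, f in [("wide-angle, f = 6 mm", 0.006), ("mild tele, f = 24 mm", 0.024)]:
    print(label, "->", round(range_for_footprint(0.05, f)), "m for 5 cm per pixel")

With these assumed values the sketch yields lookahead ranges of the same order of magnitude as those quoted in Figure 1 (a few tens of meters for the wide-angle cameras, well above 100 m for the mild tele camera).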

VISION SYSTEM DESIGN

This concept requires distributed processors for handling the huge video data streams.

Figure 1: 'MarVEye' camera arrangement on a high-bandwidth two-axis (pan/tilt) platform: fields of view with resolution better than 5 cm per pixel (normal to the optical axis, top), and one of several implementations investigated (bottom). Key data from the figure: two single-chip color cameras (1/2”), f = 6 mm, mounted with divergent optical axes for a wide field of view of about 100°; a trinocular stereo region of about 44°; a 3-chip color camera (1/3”), f = 24 mm, with a field of view of about 11°; lookahead ranges L5 of about 40 m and 200 m.

Hardware concept

Three commercial-off-the-shelf (COTS) dual-processor systems with multiple frame-grabbing capability (see left part of Figure 2) have been selected for image sequence processing at video rate (25 Hz, i.e. 40 ms cycle time). EPC is the embedded PC demon process developed for handling all PC control commands from just one human-machine interface (HMI) situated on a fourth dual-processor system. This fourth system is called the 'Behavior PC' (right) since it also takes care of mission performance, situation assessment, and decision making for control actuation on the upper system level. CD stands for central decision, BDGA for behavior decision for gaze and attention, BDL for behavior decision for locomotion (including its own situation assessment), and MP for mission planning. It directly receives localization data from the Global Positioning System (GPS). Other measurement data input and the actual control output to the gaze platform and to the vehicle are handled by separate transputer systems remaining from the previous generation of processors (gaze subsystem and vehicle subsystem, lower right in the figure).

Figure 2: Realization of EMS-Vision on a distributed COTS-PC network with transputer subsystems for control of gaze and of the vehicle as well as for conventional sensing (lower right).

In order to keep the required communication bandwidth low, the specialists for perceiving objects of certain classes are located on the PC receiving the corresponding video signals. For example, local road detection and tracking is done by PC1 based on the video signals from the two wide-angle cameras with divergent optical axes for a larger f.o.v. (RDTL); this process yields the relative position and orientation of the own vehicle with respect to the road. Other objects in this viewing range may also be detected and tracked by a different specialist (not shown). More distant road detection and tracking is done from the color video data of the camera with the mild tele-lens on PC2; this process yields road curvature data and other objects on the road (special process not shown here). Images of even higher resolution may be analyzed by PC3. Gaze control is mostly driven by the cameras with small f.o.v.; these also need inertial stabilization first in order to reduce motion blur when driving on rough ground.

The network for communicating results among the distributed copies of the global Dynamic Knowledge Representation database (DKR) is a Scalable Coherent Interface (SCI) with close to 100 MB/s bandwidth. Note that, due to the transition from video data to symbols and vectors of numbers for the shape parameters and state variables of 3-D objects, the data rate has been reduced by several (2 to 3) orders of magnitude.
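To make the quoted reduction of 2 to 3 orders of magnitude concrete, a rough back-of-the-envelope calculation can be sketched; the image format and per-object record size below are assumptions chosen for illustration, not figures from the paper.

# Illustrative only: image format and per-object record size are assumed values.
images_per_cycle = 3            # assumed: three camera images evaluated per 40 ms video cycle
bytes_per_image = 768 * 572     # assumed grey-value image of roughly PAL resolution
raw_rate = images_per_cycle * bytes_per_image * 25           # bytes per second of raw video

objects_tracked = 12            # "up to about a dozen objects ... in parallel"
bytes_per_object = 400          # assumed: a few dozen state variables / shape parameters plus confidences
symbolic_rate = objects_tracked * bytes_per_object * 25      # bytes per second of symbolic DKR updates

ratio = raw_rate / symbolic_rate
print(f"raw video: ~{raw_rate/1e6:.0f} MB/s; symbolic updates: ~{symbolic_rate/1e3:.0f} kB/s; ratio ~{ratio:.0f}x")

Under these assumptions the raw video stream is a few hundred times larger than the symbolic object representation, i.e. between two and three orders of magnitude, in line with the statement above.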

Scene representation

Scene representation is done in a dynamic scene tree exploiting homogeneous coordinate transformations as in computer graphics [2]; in computer vision, however, the variables entering these transformations, and those describing the shape of the objects seen, are the unknowns of the problem. Up to about a dozen objects may be handled in parallel at present. Adaptation of shape parameters and of state variables is done by prediction-error feedback, exploiting the information in the Jacobian matrices as linear approximations of the relationship between image features and 4-D models (Extended Kalman Filter for perspective projection [1]). This core process also takes care of intelligent image feature extraction by specifying parameters in the proper algorithms; orders of magnitude in computing efficiency may be gained by doing this carefully.
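A minimal sketch of this prediction-error-feedback idea is given below. It is not the EMS-vision implementation: the state layout, measurement model and noise levels are placeholders, and only the generic Kalman measurement update with a numerically evaluated Jacobian is shown.

import numpy as np

def numerical_jacobian(h, x, eps=1e-6):
    """Linearize the measurement model h (state -> predicted image features) around x."""
    y0 = h(x)
    J = np.zeros((y0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x); dx[i] = eps
        J[:, i] = (h(x + dx) - y0) / eps
    return J

def ekf_measurement_update(x_pred, P_pred, z_meas, h, R):
    """Correct the predicted state with the prediction error (innovation) of the image features."""
    H = numerical_jacobian(h, x_pred)
    innovation = z_meas - h(x_pred)                 # prediction error in the image plane
    S = H @ P_pred @ H.T + R                        # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)             # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(x_pred.size) - K @ H) @ P_pred
    return x_new, P_new

# Toy example: estimate lateral offset and yaw relative to a road edge from the measured
# image columns of that edge at two lookahead distances (small-angle pinhole model).
f_pix = 750.0                                        # assumed focal length in pixels
def h(x):                                            # x = [lateral offset (m), yaw (rad)]
    lookaheads = np.array([8.0, 20.0])
    return f_pix * (x[1] + x[0] / lookaheads)

x, P = np.array([0.0, 0.0]), np.eye(2)
z = np.array([35.0, 18.0])                           # synthetic measured feature columns
x, P = ekf_measurement_update(x, P, z, h, R=np.eye(2) * 4.0)
print("updated state estimate:", x)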

Dynamic Object dataBase

The Dynamic Object dataBase (DOB, as part of the DKR, see below), containing the scene tree representation, is the central (vertical) layer separating the 'systems engineering' lower part of the overall cognitive system from the more 'Artificial Intelligence'-oriented upper part (see Figure 4).

Figure 3 visualizes the information stored in the DOB. Each node (circle) represents an object of relevance for the vision process; this may even be a virtual frame of reference such as a coordinate frame parallel to the road surface (or the Sun as the source of daylight determining shadow directions). The edges between nodes represent 'Homogeneous Coordinate Transformations' (HCT) as used in computer graphics. Components for translation and rotation (around a single axis), for scaling, and for perspective projection are available. Most of these operations are strongly nonlinear, so that direct inversion of the relations is intractable. This constitutes the major problem in vision: the parameters describing shape and the state variables describing the aspect conditions are the unknowns to be determined. These unknowns are spread over a chain of matrices that has to be multiplied out for obtaining image features (in 2-D) from object features (in 3-D). In the lower left, Figure 3 shows the steps necessary for correctly capturing the effects of three cameras mounted at different positions under different angles on the gaze platform of a rigid vehicle rotating around its center of gravity and moving relative to the local road. In the wide-angle image, the local road is connected to the more distant road by typical collections of features from road boundaries and lane markings.

Figure 3: Scene tree for visual interpretation in EMS-vision, showing the individual transformations from an object in space to the features in the image.

By using terms of differential geometry, the corresponding models are rather simple (curvature and its change over arc length). The simultaneous change of aspect conditions and the movement from frame to frame over time, according to vehicle speed and control input, is represented in the 4-D model used. The currently best estimates for the system parameters describing the image seen and for the relative state of the own vehicle are represented in the DOB, together with confidence measures derived from the prediction errors observed.

The 'where'-part of the problem is solved essentially by summing feature positions in certain regions; for objects it is of interest both where they are relative to the camera and where they are relative to the local road (or other objects). The 'what'-part of the problem is essentially solved by taking differences of local feature positions; in connection with the basic object class hypothesis, this yields the currently best estimates for the shape parameters. An early jump to object hypotheses allows tapping background knowledge on objects of certain classes and thus a much richer framework for hypothesis testing; the danger of a combinatorial explosion of feature aggregations can be reduced.

Each specialist provides its results in the DOB and does not care about those of the others, except when the same region of the image is referenced. In this case, 'Central Decision' is notified and (probably in an exchange process with BDGA) has to come up with some attention control scheme for disambiguating the situation.
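The forward direction of such a chain of homogeneous transformations (the part that is easy to compute; the inverse is what the estimator has to recover) can be sketched as follows. All numbers, frame names and the pinhole convention are illustrative placeholders, not the actual VaMoRs calibration.

import numpy as np

def rot_z(angle):
    """Homogeneous rotation about the z-axis (one of the single-axis HCT components)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

def trans(x, y, z):
    """Homogeneous translation."""
    T = np.eye(4); T[:3, 3] = [x, y, z]
    return T

# Chain: point on road -> vehicle cg -> platform base -> camera (all poses assumed for illustration)
road_to_vehicle = rot_z(np.deg2rad(2.0)) @ trans(-12.0, 1.5, 0.0)   # vehicle pose relative to the road
vehicle_to_base = trans(-1.8, 0.0, -1.9)                            # assumed platform mounting offset
base_to_camera  = rot_z(np.deg2rad(-8.0))                           # current pan angle of the platform
point_on_road   = np.array([20.0, 2.5, 0.0, 1.0])                   # 3-D point in homogeneous coordinates

p_cam = base_to_camera @ vehicle_to_base @ road_to_vehicle @ point_on_road

# Perspective projection onto the image plane (pinhole model; x forward, y left, z up assumed)
f_pix = 750.0                                                       # assumed focal length in pixels
u, v = -f_pix * p_cam[1] / p_cam[0], -f_pix * p_cam[2] / p_cam[0]
print(f"image feature at (u, v) = ({u:.1f}, {v:.1f}) pixels")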

Overall system architecture

On the higher levels, the situation is assessed by looking at collections of objects represented in the DOB and by monitoring their behavior on a larger time scale (not just 'here and now'). For this reason, the last n measurement values (estimates) of object states may be stored. In addition, by assuming intended maneuvers of other vehicles (like the start of a lane change maneuver after observing a systematic motion to one side of the lane), predictions over some time into the future may be derived by adopting standard parameters for the maneuver hypothesized.

Based on these results, behavioral decisions are taken for control of the own vehicle in the mission context. Here, knowledge about the effects of maneuvers and of the application of feedback control laws has to be available. This is shown in a coarse manner on the highest level in Figure 4. Actual maneuver realization and control computations are done on the lower levels with dedicated processors in the distributed overall system; the arrow downward to the right in Figure 4 symbolizes this.

The realization part of this abstract decision on the 'mental' (upper) level is done on the more process-oriented lower (second) level, where knowledge about process dynamics is available, similar to that used for recursive estimation in the left part of the figure (upward-oriented for visual perception). On this level, differential models are widely used for shape and motion, while on the higher level 'quasi-static' integral models are preferred, disregarding actual perturbation effects. By linking state variables of certain objects on this level directly to control actuation (regulatory feedback control), perturbations are counteracted without involving higher system levels. This is indicated by the curved arrow on the 4-D level running through the regions of the perception specialists towards the control part at the right.
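A minimal sketch of the kind of maneuver-based prediction described above is given here for a hypothesized lane change. The maneuver duration, lane width, lateral profile and constant-speed assumption are placeholder 'standard parameters', not values from the paper.

import math

def predict_lane_change(x0, y0, v, t, duration=4.0, lane_width=3.5):
    """Predict position t seconds into a hypothesized lane change (smooth sinusoidal lateral profile)."""
    s = min(max(t / duration, 0.0), 1.0)                  # normalized maneuver progress
    y = y0 + lane_width * (s - math.sin(2 * math.pi * s) / (2 * math.pi))
    return x0 + v * t, y                                   # longitudinal position, lateral offset

# Example: a vehicle 30 m ahead, driving 20 m/s, hypothesized to start a lane change now
for t in (0.0, 1.0, 2.0, 3.0, 4.0):
    x, y = predict_lane_change(30.0, 0.0, 20.0, t)
    print(f"t = {t:.0f} s: predicted x = {x:.0f} m, lateral offset y = {y:.2f} m")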


The lowest level represents the vehicle hardware, including sensors and actuators both for perception and for locomotion. All signals are processed 'here and now' as they come, without in-depth interpretation; there is no explicit dependence on time on this level. The full 4-D framework with differential and integral models is concentrated on level 2, dubbed the '4-D level' for this reason. It contains the perception part, geared to classes of objects (subjects), on the left; the actuation part, with both feed-forward and (superimposed) feedback components, is shown on the right-hand side.

Figure 4: Overall cognitive system architecture in EMS-vision (4 layers). RDT = Road Detection and Tracking; ODT = Obstacle Detection and Tracking; LDT = Landmark Detection and Tracking; 3DS = 3D Surface Recognition; IBES = Inertially-Based Ego State; NN = future additional capabilities.

Each perception specialist (some may prefer to call them agents) consists of software packages for feature grouping as well as for hypothesis generation (in the detection phase) and for tracking by feature matching and prediction-error feedback. Five object classes are mentioned in the legend, one of which (IBES) is not based on vision but on inertial measurement data for the own body. This allows almost lag-free pose determination by temporal integration of the measured accelerations and rotation rates, once the initialization problem has been solved. Note that these signals (the time derivatives of pose and velocity components) contain the influence of unpredictable disturbances directly, while in vision these effects have to be derived from models of the motion process. Space does not allow going into more details of EMS-vision here; the interested reader is referred to a series of six papers in [3].
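A highly simplified, planar sketch of the kind of dead-reckoning integration performed by IBES (integration of rotation rate and body-frame accelerations into pose) is given below. Gravity compensation, sensor bias estimation and the full 3-D attitude treatment of the real system are omitted, and all sample values are invented.

import math

def integrate_imu_planar(pose, accel_body, yaw_rate, dt):
    """Propagate a planar pose (x, y, yaw, vx, vy) by one IMU step from body-frame accelerations."""
    x, y, yaw, vx, vy = pose
    yaw += yaw_rate * dt                                  # integrate rotation rate into heading
    ax = accel_body[0] * math.cos(yaw) - accel_body[1] * math.sin(yaw)
    ay = accel_body[0] * math.sin(yaw) + accel_body[1] * math.cos(yaw)
    vx, vy = vx + ax * dt, vy + ay * dt                   # integrate accelerations into velocity
    return (x + vx * dt, y + vy * dt, yaw, vx, vy)        # integrate velocity into position

# Example: 1 s of driving with slight forward acceleration while turning gently
pose = (0.0, 0.0, 0.0, 10.0, 0.0)                         # start at 10 m/s straight ahead
for _ in range(100):                                      # 100 steps of 10 ms
    pose = integrate_imu_planar(pose, accel_body=(0.5, 0.0), yaw_rate=0.05, dt=0.01)
print(f"pose after 1 s: x = {pose[0]:.2f} m, y = {pose[1]:.2f} m, yaw = {pose[2]:.3f} rad")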

EXPERIMENTAL RESULTS

Experimental results with the test vehicle 'VaMoRs' (a 5-ton van) are given for the task of detecting, recognizing and determining the parameters of a crossroad, such as intersection angle, road width and number of lanes if marked. Results of driving on unmarked dirt roads are presented here. First, an experimental result is given for saccadic viewing direction control with MarVEye.

Gaze control

Figure 5 shows typical saccadic viewing direction changes of one VaMoRs system. For each new payload on the platform, a system identification test is run in order to determine the actual inertial parameters; the control parameters are then optimized automatically for these conditions. It can be seen from Figure 5, for a sequence of saccades, that there is only very little overshoot. A 20° saccade may be performed in less than 200 ms; for a step of 40°, about 300 ms are needed. This is not far from the performance levels of biological systems.

Figure 5: Saccadic gaze control for attention focussing with tele-cameras (tilt set angle and measured tilt angle in degrees over time).
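As an illustration of what such tuned step responses look like, the sketch below simulates a saccade with a simple critically damped second-order model; the bandwidth is an assumed value chosen to land near the reported 200 ms for a 20° step, not an identified parameter of the real platform.

import math

def saccade_response(step_deg, omega_n=30.0, dt=0.001, t_end=0.5):
    """Critically damped 2nd-order response: angle(t) = step * (1 - (1 + w*t) * exp(-w*t))."""
    t, angle = 0.0, []
    while t <= t_end:
        angle.append(step_deg * (1.0 - (1.0 + omega_n * t) * math.exp(-omega_n * t)))
        t += dt
    return angle

resp = saccade_response(20.0)
settle = next(i for i, a in enumerate(resp) if abs(a - 20.0) < 0.02 * 20.0) * 0.001
print(f"time to within 2% of a 20 deg saccade: {settle*1000:.0f} ms (no overshoot by construction)")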

Road recognition

Figure 6 illustrates the new multi-camera road recognition method with separate models for the near and the far range. In the near range (up to about 15 m), covered by the skewed wide-angle cameras, parallel straight lines are fitted to edge features and region boundaries on high-speed roads. On minor roads, the full clothoid model may be used. This separation of variables leads to more efficient estimation processes; from this step, road and lane widths as well as the position and orientation of the own vehicle relative to the road are obtained.

Search regions and the edge positions found are marked in the images (lower part of Figure 6). Note that this 4-D interpretation is not a direct inversion of perspective mapping (which could also be done). It is a smoothing recursive estimation process taking the temporal history of the measurements, their variances, a dynamical model and the locomotion of the vehicle measured by odometry into account [0, 4].
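A minimal sketch of the clothoid road model mentioned above (curvature c0 and curvature change rate c1 over arc length), used here only to generate centerline points; the numerical values and the simple Euler integration are illustrative choices.

import math

def clothoid_centerline(c0, c1, length, step=1.0):
    """Sample the centerline for curvature c(l) = c0 + c1 * l by integrating heading over arc length."""
    x, y, heading, l, points = 0.0, 0.0, 0.0, 0.0, []
    while l <= length:
        points.append((x, y))
        heading += (c0 + c1 * l) * step          # d(heading)/dl = curvature
        x += math.cos(heading) * step
        y += math.sin(heading) * step
        l += step
    return points

# Example: gentle right-hand curve tightening ahead (c0 = 0.002 1/m, c1 = 1e-5 1/m^2)
for x, y in clothoid_centerline(0.002, 1e-5, 100.0, step=20.0):
    print(f"x = {x:6.1f} m, y = {y:5.2f} m")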


Figure 6: Recognition of a dirt road from multiple images (MarVEye); bottom: images from the divergent wide-angle cameras; top: image from the tele-camera (the same landmark is marked across images).

Based on this actual information from nearby, the mild tele-image is then analyzed on a different processor for road curvatures (top part of Figure 6) with proven methods. In the far range, region-based information is exploited through the 'triangle' algorithm, evaluating one-dimensional cuts through intelligently selected image regions [1, 5]. On high-speed roads, this makes it possible to disregard lane markings in the far range and yet achieve good curvature estimates.

Autonomous turn-off maneuvers

Turn-off maneuvers onto crossroads have been conducted with the experimental vehicle VaMoRs both on unmarked campus roads and on dirt roads on a military proving ground. Recognition of the intersection has been fully integrated into the mission context, controlling both vehicle locomotion and the perception tasks. The viewing direction of the active pan-tilt camera head has been controlled dynamically by the gaze control unit as described in [6, 7]. Regions of attention (RoAs) are defined for each object tracked. The angular range in both pan and tilt in which an RoA is visible is called the visibility range of this RoA for each camera. Gaze control is performed by an optimization process such that the gain in information is maximized, taking alternating gaze directions over time (saccades) into account [7].

Figure 7 visualizes the results of this optimization of viewing behavior. During road running (before second 91 in Figure 7) two objects are relevant (see Figure 7e): the local road segment (object-ID 2355) is to be imaged in the wide f.o.v. of the wide-angle cameras, and the distant road segment (ID 2356) is to be imaged in the mild tele camera. A sub-module for Visibility Analysis for Gaze & Attention (VAGA) calculates the visibility range for every combination of RoAs specified in a two-layer hierarchical logic. If the visibility range is not zero, it is pushed to the list of visibility ranges of the appropriate VAGA object. Figures 7i and 7j (bottom right) show the first elements of the lists of visibility ranges of the distant and local road segments (pan angle). For every planning phase a dot appears in the plots.

Before second 91, a gaze-control object GC-object 0 (Figure 7c) containing the 'Best Single' solution and a GC-object 1 (Figure 7d) describing a 'Best Pair' solution can be found. GC-object 0 yields a higher information input (~0.5) than GC-object 1 (~0.09); therefore GC-object 0 is performed. Figure 7a shows the pan angle of the active camera head during the whole turn-off maneuver. After 91 seconds, an object hypothesis for the crossroad segment (ID 2359) is instantiated, triggered by the mission plan and the actual GPS localization data (see Figure 7e), so that three objects are relevant for gaze control;

additional RoAs are specified. Figures 7g and 7h show the best and second-best visibility ranges of the crossroad segment resulting from the algorithm developed in [7]. In Figure 7f the distance between the vehicle center of gravity and the crossroad (center of intersection) is plotted. While approaching the intersection, the RoAs of the crossroad segment move along the skeleton line of the crossroad in order to improve the aspect conditions for perception. Due to the decreasing distance to the crossroad, the pan angle of the visibility ranges of the crossroad segment and the amplitude of the saccades increase (see Figures 7a, 7g and 7h, central part). According to Figure 7c, the visibility ranges of the distant road and the crossroad segment do not overlap between seconds 91 and 109, so that no 'Best Single' solution can be found and GC-object 0 describes a gaze sequence with one saccade; the information input of this sequence can also be seen in Figure 7c. During the smooth pursuit with a pan angle of about 10°, the distant and the local road segments are imaged; during the second smooth pursuit, the local road and the crossroad segments are imaged. Figure 7b shows the saccade bit denoting the execution of a saccade to all perception experts. While this bit is on, visual measurements are interrupted and the experts continue their (virtual) perception process with pure prediction from the spatio-temporal models; confidence in the predictions decreases with time during these periods of a few video cycles.

After 109 seconds the distant road segment is no longer visible, and the crossroad as well as the local road segment are fixated without performing a saccade. After 116 seconds the vehicle enters the intersection and a reorganization of the scene tree takes place. From this time on, the new distant and local road segments (IDs 2360 and 2361) are relevant for gaze control. The best visibility ranges of these two objects intersect, a 'Best Single' solution exists, and saccades are not necessary (see Figure 7c).

Figure 7: Viewing direction control for MarVEye in VaMoRs during approach (including saccades; sub-figures a and b) and turn-off at an intersection (sub-figures e to j; see text) [7].
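The choice between a 'Best Single' and a 'Best Pair' gaze solution can be sketched as a small optimization over pan-angle intervals; the interval-intersection logic and the information-input weights below are illustrative simplifications of the scheme in [7], with invented numbers.

def intersect(ranges):
    """Intersect pan-angle visibility ranges (lo, hi); return None if they do not overlap."""
    lo, hi = max(r[0] for r in ranges), min(r[1] for r in ranges)
    return (lo, hi) if lo <= hi else None

def choose_gaze(roas):
    """roas: name -> ((pan_lo, pan_hi) visibility range, information value).
    Prefer one fixation covering all RoAs ('Best Single'); otherwise pick a split into two
    alternating fixations linked by saccades ('Best Pair')."""
    common = intersect([r for r, _ in roas.values()])
    if common is not None:
        return "Best Single", common, sum(v for _, v in roas.values())
    names, best = list(roas), None
    for k in range(1, len(names)):                                   # try splits into two groups
        r1 = intersect([roas[n][0] for n in names[:k]])
        r2 = intersect([roas[n][0] for n in names[k:]])
        if r1 and r2:
            info = sum(v for _, v in roas.values()) * (1.0 - 0.2)    # assumed penalty for saccade downtime
            if best is None or info > best[2]:
                best = ("Best Pair", (r1, r2), info)
    return best

# Example: the crossroad RoA no longer overlaps the distant-road RoA, so a saccade pair is chosen
roas = {"local road": ((-15.0, 25.0), 0.2), "distant road": ((-5.0, 5.0), 0.3), "crossroad": ((10.0, 30.0), 0.4)}
print(choose_gaze(roas))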

CONCLUSIONS

Driving a vehicle in natural, unprepared environments requires perceiving many objects of different size, orientation and position relative to the vehicle. The simple-minded way of handling the various perception tasks is to mount many sensors with different optical properties and orientations onto the vehicle. An alternative approach has been chosen here, similar to vertebrate vision in biology, the most efficient sense of vision known. Using an elaborately designed vehicle eye, called MarVEye, and a high-performance pointing platform, the autonomous system adapts the sensor properties to the actual situation and to the perception task. The image processing modules in the system announce their gaze requirements in the form of regions of attention and a two-layer hierarchical logic. The gaze control unit calculates an optimal sequence of smooth pursuits linked by saccades to meet all requirements. The interaction between perception and gaze control has been tested in real vehicles performing complex driving missions, and results of an autonomous turn-off maneuver have been presented. A survey of the development of the sense of vision for ground vehicles is given in [8].

REFERENCES

[0] Dickmanns E.D., Graefe V.: (a) Dynamic monocular machine vision. Machine Vision and Applications, Springer International, Vol. 1, 1988, pp. 223-240; (b) Applications of dynamic monocular machine vision. (ibid.), 1988, pp. 241-261.

[1] Dickmanns E.D., Wünsche H.-J.: Dynamic Vision for Perception and Control of Motion. In: B. Jaehne, H. Haußecker, P. Geißler (eds.), Handbook of Computer Vision and Applications, Vol. 3, Academic Press, 1999, pp. 569-620.

[2] Dickmanns D.: Rahmensystem für visuelle Wahrnehmung veränderlicher Szenen durch Computer. Dissertation, Universität der Bundeswehr München, Fakultät für Informatik, 1997.

[3] Proc. of the Symp. on 'Intelligent Vehicles', Dearborn, MI, USA, Oct. 2000, with the following contributions on EMS-Vision:
(a) Gregor R., Lützeler M., Pellkofer M., Siedersberger K.-H., Dickmanns E.D.: EMS-Vision: A Perceptual System for Autonomous Vehicles, pp. 52-57.
(b) Gregor R., Dickmanns E.D.: EMS-Vision: Mission Performance on Road Networks, pp. 140-145.
(c) Hofmann U., Rieder A., Dickmanns E.D.: EMS-Vision: An Application to Intelligent Cruise Control for High Speed Roads, pp. 468-473.
(d) Lützeler M., Dickmanns E.D.: EMS-Vision: Recognition of Intersections on Unmarked Road Networks, pp. 302-307.
(e) Maurer M.: Knowledge Representation for Flexible Automation of Land Vehicles, pp. 575-580.
(f) Pellkofer M., Dickmanns E.D.: EMS-Vision: Gaze Control in Autonomous Vehicles, pp. 296-301.
(g) Siedersberger K.-H., Dickmanns E.D.: EMS-Vision: Enhanced Abilities for Locomotion, pp. 146-151.

[4] Dickmanns E.D., Zapp A.: A Curvature-based Scheme for Improving Road Vehicle Guidance by Computer Vision. Proc. SPIE, 1986, pp. 161-168.

[5] Gregor R., Lützeler M., Pellkofer M., Siedersberger K.-H., Dickmanns E.D.: A Vision System for Autonomous Ground Vehicles with a Wide Range of Maneuvering Capabilities. Proc. ICVS, Vancouver, July 2001.

[6] Pellkofer M., Lützeler M., Dickmanns E.D.: Interaction of Perception and Gaze Control in Autonomous Vehicles. Proc. SPIE: Intelligent Robots and Computer Vision XX, Newton, USA, Oct. 2001, pp. 1-12.

[7] Pellkofer M., Dickmanns E.D.: Behavior Decision in Autonomous Vehicles. Proc. of the Int. Symp. on 'Intelligent Vehicles '02', Versailles, June 2002.

[8] Dickmanns E.D.: Vision for ground vehicles: history and prospects. Int. J. of Vehicle Autonomous Systems, Vol. 1, No. 1, 2002, pp. 1-44.
