IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-8. NO. 4. JULY 1986

Dynamic Stereo: Passive Ranging to Moving Objects from Relative Image Flows

ALLEN M. WAXMAN, MEMBER, IEEE, AND SARVAJIT S. SINHA, MEMBER, IEEE

Abstract-A new concept in passive ranging to moving objects is described which is based on the comparison of multiple image flows. It is well known that if a static scene is viewed by an observer undergoing a known relative translation through space, then the distance to objects in the scene can be easily obtained from the measured image velocities associated with features on the objects (i.e., motion stereo). But in general, individual objects are translating and rotating at unknown rates with respect to a moving observer whose own motion may not be accurately monitored. The net effect is a complicated image flow field in which absolute range information is lost. However, if a second image flow field is produced by a camera whose motion through space differs from that of the first camera by a known amount, the range information can be recovered by subtracting the first image flow from the second. This "difference flow" must then be corrected for the known relative rotation between the two cameras, resulting in a divergent relative flow from a known focus of expansion. This passive ranging process may be termed Dynamic Stereo, the known difference in camera motions playing the role of the stereo baseline. We present the basic theory of this ranging process, along with some examples for simulated scenes. Potential applications are in autonomous vehicle navigation (with one fixed and one movable camera mounted on the vehicle), coordinated motions between two vehicles (each carrying one fixed camera) for passive ranging to moving targets, and in industrial robotics (with two cameras mounted on different parts of a robot arm) for intercepting moving workpieces.

Index Terms-Binocular vision, optical flow, passive ranging, time-varying imagery.

I. INTRODUCTION

PASSIVE ranging by triangulation methods, employed so successfully by humans, has received much attention in the machine vision literature in recent years [1]. It is obvious that the ability to recover absolute range to objects in a scene would be important in a variety of robotic applications. And passive techniques to do so are certainly of interest. To date, only two basic methods of passive ranging have been discussed, "static stereo" (employing two cameras separated by a known baseline) and "motion stereo" (utilizing a single camera moving in a known way through a stationary scene). In this paper we

Manuscript received March 14, 1985; revised November 25, 1985. Recommended for acceptance by W. B. Thompson. This work was supported by the Defense Advanced Research Projects Agency and the U.S. Army Night Vision Laboratory under Contract DAAK70-83-K-0018.

A. M. Waxman was with the Computer Vision Laboratory, Center for Automation Research, University of Maryland, College Park, MD 20742. He is now with the Department of Electrical, Computer, and Systems Engineering, Boston University, Boston, MA 02215.

S. S. Sinha was with the Computer Vision Laboratory, Center for Automation Research, University of Maryland, College Park, MD 20742. He is now with General Motors Research Laboratories, Warren, MI 48083.

IEEE Log Number 8608016.

introduce a new concept in passive ranging to moving objects, termed "dynamic stereo," which is based on the comparison of multiple image flows.

By far, most of the literature on passive ranging has been concerned with the difficult "correspondence problem" associated with the assignment of stereo disparities (see the many references cited in [1]). In addition to the traditional method of intensity correlation between images, much interest has been paid to the theory of Marr and Poggio [2], with its implementation by Grimson [3] and recent insights of Nishihara [4], as well as the theory of Mayhew and Frisby [5]. The use of more than two camera locations, to aid in solving the correspondence between images, has been approached in different ways by Moravec [6] and Tsai [7]. Nevertheless, solution of this correspondence problem remains a computationally expensive and slow process. Moreover, a maximum ranging distance is implied by the finite resolution of the cameras and the statically configured baseline between cameras.

In principle, the difficulties encountered with "static stereo" can be overcome using "motion stereo." Here, a single camera is moved through space in a known way, while imaging a stationary scene. The result is that, over a period of time, the camera traverses a known physical baseline of arbitrary length. In addition, correspondence is established by tracking features over the small interframe distances to build up effective stereo disparities, which then yield range values [8], [9]. In practice, problems can arise from inaccuracies in the camera motion parameters. However, a more fundamental limitation is the necessity of a stationary scene. If, while the camera is moving from one end of the baseline to the other, objects in the scene are also moving but with unknown velocities, then the relative motions between camera and objects would not be known and hence, the physical baseline would be undetermined. In such a case, absolute range information is lost, although relative range (such as object surface slopes) and scaled motion parameters can be recovered, although not always uniquely [10], [11]. Actually, once unknown object motions are admitted, we are really in the realm of the more general "structure from motion" problem, which also has a long history, but has recently been addressed in the context of "image flow" theory [10]-[14].

Dynamic Stereo can be viewed as an extension of motion stereo, applicable to scenes containing moving objects. It employs two cameras in known relative motion,


both imaging a scene containing independently moving objects (which are assumed rigid). The relative rigid body motions between objects and cameras generate image flows (of feature points and contours) at each camera. Differences between the two flow fields are mainly due to the known relative motion between the two cameras. This fact will be exploited below in order to recover absolute range to the objects in an evolving scene. Dynamic stereo can then be used in conjunction with the image flow derived from a single camera in order to recover surface shape as well as absolute motion parameters for objects in the scene [11].

We expect that dynamic stereo can be utilized in a variety of configurations. In the context of autonomous land vehicles, one can mount on the vehicle one fixed camera and one sliding camera in order to range to moving vehicles in the scene. The larger distances typically associated with flight would require two cameras, each mounted on separate aircraft moving with respect to each other at known relative speeds. This kind of coordinated flight could enable passive ranging to moving targets. There is also potential use in industrial robotics for handling moving objects, by configuring two cameras on different parts of a robot arm such that they experience a relative motion. It should be noted, however, that any application will require that at some point in their relative motion, the two cameras become sufficiently close so that their respective images can be easily brought into correspondence.

This paper presents the basic theory of Dynamic Stereo along with several simulated examples. The concepts associated with "relative image flows" are described in Section II, which follows. Section III then addresses the recovery of range to moving objects using these relative flows. Filtering techniques to reduce the effects of noise on the required image velocities are discussed as well. Concluding remarks are presented in Section IV.

II. RELATIVE IMAGE FLOWS

The flow fields measured at each camera correspond to the time-varying projection of object surface texture, due to the relative rigid body motions between objects in the scene and the cameras. The equations relating image velocity to relative space motion and distance to points in the scene have been derived in other studies (e.g., [10], [14]); they are given by

v_x = \frac{x V_Z - V_X}{Z} + \left[ xy\,\Omega_X - (1 + x^2)\,\Omega_Y + y\,\Omega_Z \right], \qquad (1a)

v_y = \frac{y V_Z - V_Y}{Z} + \left[ (1 + y^2)\,\Omega_X - xy\,\Omega_Y - x\,\Omega_Z \right]. \qquad (1b)

Fig. 1. Spatial coordinates moving with the observer, and image coordinate system.

Fig. 1 illustrates the coordinate systems of the observer (X, Y, Z), to whom the relative translations (V_X, V_Y, V_Z) and rotations (Ω_X, Ω_Y, Ω_Z) are ascribed (and which may differ for each rigid object in the scene), and his image plane (x, y), which has been reinverted and scaled to a focal length of unity. As the directions to points (or objects) in the scene are specified by their image coordinates (x, y), their absolute range is determined by the component of distance Z along the observer's line of sight. It is seen from (1) that the distance Z appears in ratio with the translational motion parameters. Clearly, if the motion parameters were all known (e.g., a camera moving through a stationary scene), the distance Z could be obtained directly from measured image velocities; this is simply "motion stereo." But if the objects are also moving, then these relative motion parameters are unknown, as is the distance Z to points (or "scale factor" Z_0, slopes p and q, and curvatures in the case of surface patches). The theory of single image flows addresses this problem of recovering both surface structure and space motion [10]-[14]; however, solutions are obtained in a form which is scaled by the factor Z_0. That is, absolute range is not recoverable from single image flows.

The mathematical basis of dynamic stereo is simple;

one need only note from (1) that image velocities are linearly proportional to the parameters of space motion. We can make this more explicit by rewriting (1) symbolically as

\mathbf{v}(x, y) = \frac{1}{Z(x, y)}\, T(x, y) \cdot \mathbf{V} + R(x, y) \cdot \mathbf{\Omega}, \qquad (2)

where the elements of the translation and rotation matrices, T and R, are functions only of the image coordinates, and may be read directly from (1).
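To make (1) and (2) concrete, the matrices T and R can be written out and evaluated numerically. The following minimal sketch (in Python with NumPy; the function names are ours, purely for illustration, not the authors' code) assembles them for a camera of unit focal length:

import numpy as np

def flow_matrices(x, y):
    # Translation and rotation matrices T(x, y) and R(x, y) of eq. (2),
    # with rows read directly from eqs. (1a)-(1b).
    T = np.array([[-1.0,  0.0, x],
                  [ 0.0, -1.0, y]])
    R = np.array([[x * y,      -(1.0 + x**2),  y],
                  [1.0 + y**2, -x * y,        -x]])
    return T, R

def image_velocity(x, y, Z, V, Omega):
    # Image velocity v = (1/Z) T.V + R.Omega, eq. (2); V and Omega are
    # length-3 arrays of translation and rotation parameters.
    T, R = flow_matrices(x, y)
    return T @ V / Z + R @ Omega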

In a dynamic stereo configuration, each camera has its own set of relative motion parameters (for each object in the scene), and generally, the two image planes are not coincident. However, we can configure the two cameras such that they come very close to each other at some time during their own relative motion. At that instant, the two image planes are nearly coincident and correspondence


should be easily established. If we refer the measured image velocities to that moment in time, then we may treat the matrices T(x, y) and R(x, y) for the two cameras as identical, as if the two image planes were indeed coincident momentarily. (More will be said of the residual effects of a small but finite baseline in Section III-D below.) Now if the relative motion between cameras is given by ΔV ≡ (ΔV_X, ΔV_Y, ΔV_Z) and ΔΩ ≡ (ΔΩ_X, ΔΩ_Y, ΔΩ_Z), then the second camera sees an augmented image flow of v + Δv, where from (2), the "difference flow" Δv is given by

\Delta\mathbf{v}(x, y) = \frac{1}{Z(x, y)}\, T(x, y) \cdot \Delta\mathbf{V} + R(x, y) \cdot \Delta\mathbf{\Omega}. \qquad (3)

As the distance information is contained in the translational term in (3), we can bring the known rotational term to the left-hand side and define a "relative flow" or "modified difference flow" as

\Delta\mathbf{w} \equiv \Delta\mathbf{v} - R \cdot \Delta\mathbf{\Omega} = \frac{1}{Z}\, T \cdot \Delta\mathbf{V}. \qquad (4)

This relative flow takes the form of a divergent flow field with a known focus of expansion (as long as ΔV_Z ≠ 0), as is seen from writing the individual components of (4),

\Delta w_x = \frac{\Delta V_Z}{Z}\, (x - x_{foe}), \qquad (5a)

\Delta w_y = \frac{\Delta V_Z}{Z}\, (y - y_{foe}), \qquad (5b)

where

x_{foe} \equiv \frac{\Delta V_X}{\Delta V_Z} \quad \text{and} \quad y_{foe} \equiv \frac{\Delta V_Y}{\Delta V_Z}. \qquad (5c, d)
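As an illustration (again a hypothetical sketch, continuing the helpers above), the difference flow, its rotational correction, and the focus of expansion follow directly from (3)-(5):

def relative_flow(v1, v2, x, y, dOmega):
    # Relative flow of eq. (4): the difference flow of eq. (3), corrected
    # for the known relative rotation dOmega between the two cameras.
    dv = v2 - v1
    _, R = flow_matrices(x, y)
    return dv - R @ dOmega

def focus_of_expansion(dV):
    # FOE of the relative flow, eqs. (5c)-(5d); requires dV[2] != 0.
    return dV[0] / dV[2], dV[1] / dV[2]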

Fig. 2(a), (b), and (c) illustrates examples of such divergent relative flows. In each case, a simulated scene is constructed and feature points are marked. The two sets of space motion parameters are selected, corresponding to the relative motion between objects and each of the two cameras (the difference between these sets of parameters being the relative motion between cameras). The individual flows of the feature points are displayed along with the divergent relative flow. No noise effects were introduced into this simulation. Fig. 2 was created with the Image Flow Simulator [15].

III. RECOVERING RANGE TO MOVING OBJECTS

Equations (4) and (5) form the basis of the method for recovering range to points (on moving objects in the scene) and planes (surface patches on moving objects). If it were not for the effects of noise on measured image velocities, the method would be simple and straightforward, as will be presented in Section III-A below. The inevitable effects of digitization error and noise have led us to explore two filtering methods; "radial flow filtering" as motivated by the divergent nature of the relative flow noted in (5), and "second-order flow filtering" which stems from the Velocity Functional Method developed by Waxman and Wohn [12]. These techniques are described, along with examples, in Sections III-B and III-C, respectively. Section III-D considers the effects of the finite baseline between cameras, at their closest approach.

Fig. 2. Examples of the relative flow fields for (a) a complex scene with feature points marked, (b) a planar surface with a grid of feature points, (c) a small planar patch with few feature points. For each case the simulated scene is shown, along with the ideal flows for each set of camera motion parameters (as indicated), and the divergent relative flow.


A. Ranging to Points and Planes

Given the measured image velocities of corresponding features on both image planes (determined when the two cameras are at closest approach), we form the measured difference flow values Δv. Then, according to the definition in (4), we can compute the relative flows by correcting these difference values for the known relative rotation between cameras,

\Delta w_x = \Delta v_x - \left[ xy\,\Delta\Omega_X - (1 + x^2)\,\Delta\Omega_Y + y\,\Delta\Omega_Z \right], \qquad (6a)

\Delta w_y = \Delta v_y - \left[ (1 + y^2)\,\Delta\Omega_X - xy\,\Delta\Omega_Y - x\,\Delta\Omega_Z \right]. \qquad (6b)

According to (5a)-(5d), the ideal relative flow diverges from (or converges to) a known focus located at (x_{foe}, y_{foe}). Thus, we can define the "radial relative flow" as simply

\Delta w_r = \left\{ (\Delta w_x)^2 + (\Delta w_y)^2 \right\}^{1/2}. \qquad (7a)

Then according to (5), we can solve for the range Z,

Z(x, y) = \frac{\Delta V_Z}{\Delta w_r} \left\{ (x - x_{foe})^2 + (y - y_{foe})^2 \right\}^{1/2}. \qquad (7b)

This result applies to individual feature points in the scene whose relative flow has been derived from measured image velocities on both cameras. All terms on the right-hand side of (7b) are either known or measured.
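A minimal sketch of this pointwise ranging step, using the helpers above (our own naming, not the authors' code), might read:

import numpy as np

def range_to_point(dw, x, y, dV):
    # Range Z from eqs. (7a)-(7b) for a single feature point, given its
    # relative flow vector dw and the known camera motion difference dV.
    xf, yf = focus_of_expansion(dV)
    dw_r = np.hypot(dw[0], dw[1])        # radial relative flow, eq. (7a)
    r = np.hypot(x - xf, y - yf)         # image distance from the FOE
    return dV[2] * r / dw_r              # eq. (7b)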

If a number of feature points are believed to be the images of points on a planar surface in space, then their individual range values Z(x, y) can be used to fit a planar surface. Alternatively, the parameters of the surface can be obtained collectively from (7b). A planar surface in space, Z = Z_0 + pX + qY, can be written exactly in image coordinates as Z = Z_0 (1 - px - qy)^{-1}. Inverting (7b) then yields

\frac{1}{Z_0} (1 - px - qy) = \frac{\Delta w_r(x, y)}{\Delta V_Z} \left\{ (x - x_{foe})^2 + (y - y_{foe})^2 \right\}^{-1/2}. \qquad (7c)

This equation can serve as the basis for a linear least-squares approach to determine the parameters of the plane, Z_0^{-1}, p/Z_0, and q/Z_0.
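Since (7c) is linear in Z_0^{-1}, p/Z_0, and q/Z_0, an ordinary least-squares solve suffices. A sketch under that reading (hypothetical names, continuing the earlier snippets):

import numpy as np

def fit_plane(points, dw_r, dV):
    # Least-squares fit of (1/Z0, p/Z0, q/Z0) from eq. (7c).
    # points: (N, 2) image coordinates; dw_r: (N,) radial relative flows.
    xf, yf = focus_of_expansion(dV)
    x, y = points[:, 0], points[:, 1]
    r = np.hypot(x - xf, y - yf)
    b = dw_r / (dV[2] * r)               # right-hand side of eq. (7c)
    A = np.column_stack([np.ones_like(x), -x, -y])
    (inv_Z0, p_Z0, q_Z0), *_ = np.linalg.lstsq(A, b, rcond=None)
    Z0 = 1.0 / inv_Z0
    return Z0, p_Z0 * Z0, q_Z0 * Z0      # recover (Z0, p, q)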

B. Radial Flow Filtering

In the absence of noise and digitization effects, (7) is exact and yields perfect results, putting aside for the moment the issue of finite separation between cameras at all times. In our simulation, we are able to explore the effects of noise by perturbing the individual flow values before forming the difference flow. We considered a uniform distribution of noise, up to a specified percentage, superposed on the individual components of image velocity. The effects of noise on estimated image velocities v are

amplified by the differencing procedure in obtaining Δv. Since the order of magnitude of v is |V|/Z, while that of Δv is |ΔV|/Z, the noise effects are amplified by the ratio |V|/|ΔV| when referred to Δv. Clearly it is important to have as large a relative velocity between cameras ΔV as is possible. This translates over time to building up as large a separation between cameras as is practical.

Fig. 3. Effects of 10 percent noise on feature point velocities for the planar surface example: (a) the relative flow obtained directly from the noisy velocities, (b) the relative flow after "radial flow filtering," (c) the relative flow obtained from "second-order flow filtering," (d) the ideal relative flow. In each case, the value found for Z is indicated.

One simple thing that can be done to reduce the effects of noise is to perform a "radial filtering" on the relative flow. This notion stems from the divergent nature of the ideal relative flow discussed in Section II. As definition (7a) is meant to imply that the ideal relative flow should consist of vectors emanating radially from a known focus of expansion, we can impose this constraint on the relative flow derived from noisy velocity measurements. That is, we consider only that component of the relative flow which points radially from a known focus. The orthogonal component (or "azimuthal relative flow") can be used to ascertain the magnitude of the noise; it vanishes in the limit of ideal velocity measurements.
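A sketch of this radial projection (illustrative only; names as in the earlier snippets):

import numpy as np

def radial_filter(dw, x, y, dV):
    # Keep only the component of the relative flow that points radially
    # away from the known focus of expansion; the azimuthal residual
    # serves as an estimate of the noise level.
    xf, yf = focus_of_expansion(dV)
    r = np.hypot(x - xf, y - yf)
    u = np.array([(x - xf) / r, (y - yf) / r])   # radial unit vector
    dw_r = dw @ u                                # signed radial component
    azimuthal = dw - dw_r * u                    # noise indicator
    return dw_r * u, azimuthal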

Fig. 3 illustrates the simulated effects of noise (10 percent) on the image velocities and the relative flow which results, in the case of a planar surface. By using only the radial component of the relative flow, one essentially reduces the noise by a factor of 2. As one might expect, the errors in range should scale like {(percent noise) × |V|/|ΔV|}. The results of our simulations bear this out. For a ratio |ΔV|/|V| of about 1/10, reasonable range estimates could be found only when the noise imposed was less than a few percent. In the case of planar surfaces, the same is true for recovery of the scale factor Z_0. The slopes of the surface, which describe the differential changes in range, are extremely sensitive to noise, as might be expected. Their determination requires noise below 1 percent. In practice, one could not expect surface slope recovery from dynamic stereo; the individual image flows seem more appropriate for this task [10], [12]. However, qualitative recovery of range from relative flows does appear feasible. For the case of ranging to surfaces, one can do better still, by using the following filtering technique.

C. Second-Order Flow Filtering

The drawback of "radial flow filtering" is that it operates on the relative flow rather than on the individual flows preceding the differencing operation. As this differencing procedure tends to amplify noise, the radial


filtering is not sufficient to recover entirely from the process. What is needed is a filtering process which reduces the noise effects on the individual flows, before their difference is taken. When the image velocities utilized arise from features on objects at various depths moving with different space motions, no smoothing of the individual flows can be performed, as the image velocities themselves are unrelated. However, when features (points or contours) arising from single objects are isolated, their image velocities constitute a locally second-order flow field [11]-[13], and this can be used to filter out the effects of noise on the individual image flows, before differencing.

The representation of image flows in terms of second-order flow fields has been termed the Velocity Functional Method by Waxman and Wohn [12]. It is a globally valid representation in the case of planar surfaces [12], and a locally valid one for curved surfaces [13]. The coefficients of the second-order polynomials are simply related to the derivatives of the flows via their truncated Taylor series about a local origin. Let v^k(x, y) be the flow in image k (k = 1, 2), and v^k_{[ij]} be its ith partial derivative with respect to x and jth partial with respect to y (where i + j ≤ 2), evaluated at a local origin in an image neighborhood. We have, locally

v^k(x, y) = \sum_{\substack{i, j \ge 0 \\ i + j \le 2}} v^k_{[ij]}\, \frac{x^i y^j}{i!\, j!}. \qquad (8)

The coefficients are obtained from a linear least-squares fit to measured velocities in an image neighborhood. For point features, both components of image velocity are available, and (8) then provides two constraints. In the case of contours evolving in geometry over time, only a normal velocity (perpendicular to the contour) is perceptible, and (8) provides only one constraint in the form of a linear combination of the two velocity components (cf. [12] for details). When only evolving contours are used, a minimum structure is required of a single contour if it is to yield all 12 coefficients in (8). It is generally more robust to use several contours in a given neighborhood [12]. The flow field, as reconstructed from (8), is typically much cleaner than the measured image velocities themselves, and is therefore more suitable for forming the difference flow.
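For point features, the fit of (8) reduces to two independent linear least-squares problems in the six monomial coefficients of each velocity component. A sketch under that assumption (point features only; the contour-based normal-flow constraints treated in [12] are omitted here, and the names are our own):

import numpy as np

def fit_second_order_flow(points, velocities):
    # Fit the 12 coefficients of the second-order flow field, eq. (8),
    # from measured point-feature velocities in one image neighborhood.
    # points: (N, 2) image coordinates; velocities: (N, 2) image flow.
    x, y = points[:, 0], points[:, 1]
    # monomials x^i y^j / (i! j!) with i + j <= 2
    M = np.column_stack([np.ones_like(x), x, y,
                         x**2 / 2.0, x * y, y**2 / 2.0])
    cx, *_ = np.linalg.lstsq(M, velocities[:, 0], rcond=None)
    cy, *_ = np.linalg.lstsq(M, velocities[:, 1], rcond=None)
    return cx, cy

def eval_flow(cx, cy, x, y):
    # Reconstruct the filtered (noise-reduced) flow at (x, y).
    m = np.array([1.0, x, y, x**2 / 2.0, x * y, y**2 / 2.0])
    return np.array([cx @ m, cy @ m])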

With expressions in the form of (8) for corresponding neighborhoods in the two images, we can form the difference flow simply by subtracting the coefficients of the respective representations. Then, correcting for the relative rotations as in (6), we have for the components of relative flow

\Delta w_x = \sum_{\substack{i, j \ge 0 \\ i + j \le 2}} \left[ v^{(2)}_{x[ij]} - v^{(1)}_{x[ij]} \right] \frac{x^i y^j}{i!\, j!} - \left[ xy\,\Delta\Omega_X - (1 + x^2)\,\Delta\Omega_Y + y\,\Delta\Omega_Z \right], \qquad (9a)

\Delta w_y = \sum_{\substack{i, j \ge 0 \\ i + j \le 2}} \left[ v^{(2)}_{y[ij]} - v^{(1)}_{y[ij]} \right] \frac{x^i y^j}{i!\, j!} - \left[ (1 + y^2)\,\Delta\Omega_X - xy\,\Delta\Omega_Y - x\,\Delta\Omega_Z \right]. \qquad (9b)

Fig. 4. The use of "normal flow" along contours on a planar surface Z = Z_0 + 0.577X + 1.0Y: (a) and (b) the ideal normal flows for each set of camera motion parameters, (c) the full flow for (a) recovered by the Velocity Functional Method, (d) the relative flow found from the recovered second-order flows.

Equations (9a) and (9b) for the relative flow are, themselves, second-order polynomials in the image coordinates. They can be equated term-for-term with the expanded form of (5), written for a surface Z(x, y). In the case of a planar surface, (5) yields

\Delta w_x = \frac{1}{Z_0} \left\{ -\Delta V_X + (p\,\Delta V_X + \Delta V_Z)\, x + (q\,\Delta V_X)\, y - (p\,\Delta V_Z)\, x^2 - (q\,\Delta V_Z)\, xy \right\}, \qquad (10a)

\Delta w_y = \frac{1}{Z_0} \left\{ -\Delta V_Y + (p\,\Delta V_Y)\, x + (q\,\Delta V_Y + \Delta V_Z)\, y - (q\,\Delta V_Z)\, y^2 - (p\,\Delta V_Z)\, xy \right\}. \qquad (10b)

By comparing the coefficients of (10) to those determined in (9), one sees that the parameters of the plane (Z_0, p, q) can be determined from the zero-order and first-order terms directly. Nonplanar surfaces will lead to modifications of the second-order terms (and generate higher-order terms as well). As one would expect the zero-order terms of (9) to be the most precise, it should be clear from (10) that Z_0 can be recovered most accurately if ΔV_X and ΔV_Y are nonzero. This is borne out by our simulations in which reasonable results for Z_0 could be obtained even at 10 percent noise levels, with |ΔV|/|V| ≈ 0.1.
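Either identification can be automated. The sketch below (hypothetical, built on the earlier snippets) follows the sparse-grid route described in the next paragraph: it samples the fitted second-order flows of (9) on a grid of image points and recovers (Z_0, p, q) through the least-squares fit of (7c):

import numpy as np

def plane_from_fitted_flows(cx1, cy1, cx2, cy2, dV, dOmega, grid):
    # Sample the rotation-corrected relative flow of eq. (9) on a sparse
    # grid of image points, then fit the plane parameters via eq. (7c).
    pts, dwr = [], []
    for x, y in grid:
        dv = eval_flow(cx2, cy2, x, y) - eval_flow(cx1, cy1, x, y)
        _, R = flow_matrices(x, y)
        dw = dv - R @ dOmega                 # eq. (9)
        pts.append((x, y))
        dwr.append(np.hypot(dw[0], dw[1]))   # radial magnitude, eq. (7a)
    return fit_plane(np.array(pts), np.array(dwr), dV)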

In our simulations, once the coefficients in (9) have been determined, we have found it more convenient to return to (7) and solve for the parameters of the plane by a least-squares procedure (having sampled (9) over a sparse grid), rather than literally identify coefficients with equations (10). In general, it is best to have all components of ΔV nonzero. By filtering the individual flows before differencing, one is able to recover surface scale factors Z_0 even in the presence of noisy image velocities. Fig. 3 also shows the relative flow for a plane after second-order filtering; it is clearly a divergent flow. Fig. 4 illustrates the case of two elliptic contours on a planar surface, with their individual normal flows, one of the full flows recovered by the Velocity Functional Method, and the relative flow from which Z_0 is recovered. This same example, with normal velocities perturbed by 50 percent, is shown in Fig. 5.

Fig. 5. Same as Fig. 4, but with 50 percent noise superposed on the normal flows.

In our simulations we have tried to ascertain the effects of noise and field of view on recovery of Z_0. In general, when utilizing second-order flow filtering, we have found recovery of Z_0 from point feature velocities to be possible at 10 percent noise levels, while utilizing evolving contours allows recovery of Z_0 even with 50 percent perturbations to the normal flow around the contour. Moreover, ranging is possible down to fields of view of about 3°, below which variations in image velocity become so small that the method becomes unstable at noise levels exceeding about 5 percent. Typically, surface slopes cannot be reliably determined from dynamic stereo in the presence of noise; however, they can be found from the analysis of single image flows [10], [12] (and then utilized with dynamic stereo to recover Z_0 somewhat more accurately). Finally, we attempted to recover Z_0 to curved surfaces, treating them as if they were planar. As one might expect, when variation in range over a surface is small compared to absolute range to that surface, the method succeeds in recovering the scale factor Z_0. The slopes obtained can be used to find the approximate distance to individual feature points on the curved surface. However, when range variations are large (as compared to a planar surface at comparable Z_0), then the recovery of Z_0 can be rather sensitive to noise. This result would favor small fields of view, where substantial range variation is unlikely; although too small a field of view (below about 3°) leads to noise sensitivity as well.

D. Effects of the Finite Baseline

The underlying basis of Dynamic Stereo is the comparison of image flows obtained from two cameras in known relative motion. This notion is implicit in the "relative flow" discussed in Section II. But in forming the required "difference flow," we treated the translation and rotation matrices of the two cameras as identical, which is equivalent to treating the two image planes as coincident at the time of closest approach between cameras [see the comments preceding (3)]. Since the two cameras must be physically separated by some small but finite baseline, even at their closest approach, it is really only a simplification to treat the two image planes as momentarily registered. That is, the image coordinates of a given feature are not exactly the same on the two image planes. The remaining disparity between coordinates is a function of the distance to the feature, exactly like the case of "static stereo." However, this disparity should not be a real problem since keeping this baseline small will lead to an insignificant disparity, except for objects which are extremely close.

In theory, we can account for the finite baseline at closest approach by incorporating a small correction to the image coordinates of features imaged by the second camera. This correction, being the range-dependent disparity, can be treated as a small perturbation to the ranging formulas derived above. This would lead to a "successive approximation" scheme for recovering Z_0, in which the theory developed here would serve as the lowest-order approximation, to which corrections can be applied. However, given the inaccuracies associated with noisy flow values, such an iteration scheme hardly seems warranted.

We have investigated the effects of a finite baseline at closest approach with our Image Flow Simulator [15]. An artificial scene is constructed and motion parameters are assigned to an observer, yielding the first image flow field (obtained from second-order flow filtering of point feature velocities). The second flow field is obtained using the same scene shifted by a small amount in a direction parallel to the image plane (in order to simulate camera separation), and different motion parameters are assigned to the observer. The two flow fields are then differenced and corrected for relative rotation in order to recover range. In the case of a planar surface patch, in the absence of noise, a baseline-to-range ratio of 1/1000 led to a 2 percent range error. A ratio of 5/1000 yields about a 10 percent error, which is comparable to the accuracy obtainable with 10 percent noise. It would seem that this ratio is reasonable when ranging to objects at several hundred feet or more.

IV. CONCLUDING REMARKS

We have introduced a new concept in passive ranging, termed Dynamic Stereo, which is based on the comparison of image flow fields obtained from two cameras in known relative motion. The method is designed for ranging to moving objects in an evolving scene. For stationary scenes, this technique reduces to conventional "motion stereo." In this paper we have developed the basic theory and studied the effects of noisy image velocities and field of view, with the aid of an Image Flow Simulator [15]. These simulations suggest that ranging to points, planes, and curved surfaces is feasible, even in the presence of 10 percent noise. Two methods for filtering the noise were also introduced, "Radial Flow Filtering" of the corrected difference flow and "Second-Order Flow Filtering" prior

to differencing. The second method is preferred, but may only be applied to image velocities generated by a single surface patch.

It is anticipated that Dynamic Stereo may have a number of interesting applications. First, it may be used in conjunction with single image flow analysis, providing the scale factor required for the complete recovery of object structure and space motion from time-varying imagery [11]. Second, it may prove useful in industrial robotics for handling moving workpieces in an evolving workplace. Two cameras would have to be configured on different parts of the robot arm so that they will experience a relative motion. Third, Dynamic Stereo has potential application to autonomous vehicle navigation for ranging to other moving objects in the scene. Possible configurations are two cameras in relative motion on a land vehicle, or one camera on each of two aircraft in known relative motion.

To appreciate the distance scales involved with this approach to passive ranging, we can try to scale up our simulation examples to the case of land vehicles. A vehicle traveling at 50 km/h moves about 50 feet in one second. Upon the vehicle there is mounted a fixed camera and a sliding camera which moves about 5 feet during the one-second interval. The scale factors recovered would then correspond to a distance of the order of 500 feet. Our simulations indicate that best results are obtained when the sliding camera moves at an angle with respect to the direction of vehicle motion.

Recently, we have become aware of the important psychophysical and neurophysiological experiments of Regan and Beverley [16] and their colleagues. In particular, they have proposed that "there are neural organizations in the human visual pathway that act as filters sensitive to the relative velocity of the left and right retinal images." These experiments, in addition to the work presented here on Dynamic Stereo, have motivated us to explore further the structure of binocular image flows [17].

ACKNOWLEDGMENT

The help of G. Reynolds in preparing this paper is gratefully acknowledged.

REFERENCES

[1] R. A. Jarvis, "A perspective on range finding techniques for computer vision," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-5, pp. 122-139, 1983.

[2] D. Marr and T. Poggio, "A computational theory of human stereo vision," Proc. Roy. Soc. London, vol. B204, pp. 301-328, 1979.

[3] W. E. L. Grimson, From Images to Surfaces. Cambridge, MA: MIT Press, 1981.

[4] H. K. Nishihara, "PRISM: A practical real-time imaging stereo matcher," Massachusetts Inst. Technol., Cambridge, Artificial Intell. Memo 780, 1984.

[5] J. E. W. Mayhew and J. P. Frisby, "Psychophysical and computational studies towards a theory of human stereopsis," Artificial Intell., vol. 17, pp. 349-385, 1981.

[6] H. P. Moravec, Robot Rover Visual Navigation. Ann Arbor, MI: UMI Research Press, 1981.

[7] R. Y. Tsai, "Multiframe image point matching and 3-D surface reconstruction," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-5, pp. 159-173, 1983.

[8] R. Nevatia, "Depth measurement from motion stereo," Comput. Vision, Graphics, Image Processing, vol. 9, pp. 203-214, 1976.

[9] T. D. Williams, "Depth from camera motion in a real world scene," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-2, pp. 511-516, 1980.

[10] A. M. Waxman and S. Ullman, "Surface structure and 3-D motion from image flow: A kinematic analysis," Center Automation Res., Univ. Maryland, Tech. Rep. 24, Oct. 1983; also Int. J. Robotics Res., vol. 4, no. 3, pp. 72-94, 1985.

[11] A. M. Waxman, "An image flow paradigm," Center Automation Res., Univ. Maryland, Tech. Rep. 45, Feb. 1984; also in Proc. 2nd IEEE Workshop Comput. Vision: Representation and Control, Apr. 1984, pp. 49-57.

[12] A. M. Waxman and K. Wohn, "Contour evolution, neighborhood deformation and global image flow: Planar surfaces in motion," Center Automation Res., Univ. Maryland, Tech. Rep. 58, Apr. 1984; also Int. J. Robotics Res., vol. 4, no. 3, pp. 95-108, 1985.

[13] K. Wohn and A. M. Waxman, "Contour evolution, neighborhood deformation and local image flow: Curved surfaces in motion," Center Automation Res., Univ. Maryland, Tech. Rep. 134, July 1985.

[14] H. C. Longuet-Higgins and K. Prazdny, "The interpretation of a moving retinal image," Proc. Roy. Soc. London, vol. B208, pp. 385-397, 1980.

[15] S. Sinha and A. M. Waxman, "An image flow simulator," Center Automation Res., Univ. Maryland, Tech. Rep. 71, July 1984.

[16] D. Regan and K. I. Beverley, "Binocular and monocular stimuli for motion in depth: Changing-disparity and changing-size feed the same motion-in-depth stage," Vision Res., vol. 19, pp. 1331-1342, 1979.

[17] A. M. Waxman and J. H. Duncan, "Binocular image flows: Steps toward stereo-motion fusion," Center Automation Res., Univ. Maryland, Tech. Rep. 119, May 1985; also IEEE Trans. Pattern Anal. Machine Intell., 1986, to be published.

Allen M. Waxman (M'84) received the Bachelor's degree in physics from the City College of New York in 1973 and the Doctorate degree in astrophysics from the University of Chicago, Chicago, IL, in 1978 for his work on spiral wave instabilities in galaxies.

He spent 1979-1980 as an Instructor of Applied Mathematics at the Massachusetts Institute of Technology, Cambridge, and 1981 as a Visiting Fellow in Applied Mathematics at the Weizmann Institute of Science, Israel, where he began developing his theoretical approach to image flows. He served on the research faculty of the University of Maryland's Computer Vision Laboratory until 1985, where he continued his work on time-varying image analysis, as well as exploring vision for autonomous vehicle navigation. Currently, he is Associate Professor of Electrical, Computer, and Systems Engineering at Boston University, Boston, MA.

Sarvajit S. Sinha (S'82-M'82) was born in Patna, India, on September 22, 1960. He received the B.Tech. degree in electrical engineering from the Indian Institute of Technology, Kanpur, India, in 1982 and the M.S. degree in computer science from the University of Maryland, College Park, in 1984.

Since then he has been a Research Scientist at the General Motors Research Laboratories, Warren, MI, working in the areas of geometric modeling and planning. His research interests are in the areas of geometric modeling, vision, and robotics.

Mr. Sinha is a member of the Association for Computing Machinery.
