Velocity determination in scenes containing several moving objects

Velocity Determination in Scenes Containing Several Moving Objects *

A method is described whicah qllantifies thca speed and direction of several moving objects in a sequence of digital images. A relationship between the time variation of intensity, the spatial gradient, and velocity has been devrloped which allows the determination of motion using clustering techniyllos. This paper describes these relationships, the clustering techniqrlr, and providrs c~samplrs of thr, t,cac*hniqrlo 011 real images u)ntnin- ing srveral moving obj(~c%s.

Information about the velocity of regions in an imaga can be important to the scene analysis process. This information can contribute significantly t,o scene interpretation : Object trajectories may bc calculatad, struct’ural ambiguities may be resolved, important, wcnts notc>d, and future situations predicted. At lowr lwcls, information about mot’ion of small st’rwtures in an image scquenrc can bn used as an important cue for image wgmentation. X discontinuity in t,hc> wlocity of small image areas strongly impliw the c>xistoncc of an object boundary. Visually dissimilar image regions with the samcl wloc+t,y may Gthw b(> part, oi tho same objwt or wmant’ically linked in the owrall swnc’ intcrprct8ation. If an obscrwr is moving, p(wciwd motion rc~prcwnts an important d(>pth w(~.

Because of the valw of motion information, incwasing at)tention is being paid to methods analyzing t,imc-varying visual c~nvironmcnts. Howwer, many of thtl methods for motion analysis dewlopt:d to datth aw comput’ationally expt&w, require already scbgmentcxd picturw, or arc’ ineffwtiw whc~n object, boundaries aw

014M64X/7Y/040301-15302.00/0 Copyright 0 1979 by Academic. l’re~~. Inc.

\II righter uf reproduction in any form reserved.

hidden by occlusions or picture horders. In this paper we dtrscribe an efficient m&hod for computing the speed and diwction of one: or more moving objcbcts in a sccnc. Our mc%hod rcyuiwa neit’hcr previously wgmcntcd images nor accrss to complete boundaries within tho images. In following sections, we briefly survq the r&want literature, describe our now mclthod, and dcmonstratc the techniqw on a number of examples.

Swcral approaches have been used to dctermino vclocitics from image sc- qucnccs. Most methods arc based on a matching procedure. A pat&n in one image frame is sear&cd for in a succeeding image framr. If the pattern is found, the velocity is calculated from the positional shift. An alternate approach is to dirxtly calculate the velocity information from the spatial gradiwt of the images and t,hc local intwsity changes ovw time due t.o motion. Matching tcchniyws have been pwformed at a variety of picture intcrpret’ation levels. Badler [l] has dcscrib(sd a method whjch matches objwts in a series of images in which the objwts have bcon located and idwtificd. Hcrc th(b matching task is relatiwly modest and the major computational effort is in the mot’ion intwpr(%ation. Aggarwal and Duda [2] dcvclop a twhniquc for matching vcrtiws in a domain of highly occluded objwts. Their approach rctquircls accurate boundary d&crmina- tion in c>ach framct but dew not prwupposc any static interpretation. Potter [3] suggest’s matching certain low-level prop(hrt8icbs of wgions. Oth(>r rcwarchrlrs have pclrformc>d the mat’ching on th(l unprowswd image using cross correlation [4, 51. A rcwnt approach diffcrencw succcssivtt framw and th(ln tracks the arcas of large diffcrcnccs to dcterminc wloc4tics [C;, 71. .1 mow complctc surwy of motion analysis is found in [S].

VELOCITY I~)I’TEI~MISATIO? :cos

IGach of these methods has limitations. Obviously, procedures which rcquiw an accurate segmentation cannot bc used to assist in the segmentation prows it’sclf. Methods dependent, on cross correlation arc computationally expclnsivcb. Whilo some efficiencies are possible [9, lo], much effort is still required to prrform the matching at a dense enough sampling of points to be useful for tasks such as ;scgmentation. Techniques using frame-to-frame differences provide cfficicncs?., but. require sophisticated higher-lewl processing to deal rffcctivcly with occlusions and situations in which object boundaries arc not, completely within thcl imagcl frame [7].

h nonmatching approach to motion analysis of simpl(l scenes has been dwolopc~d by Limb and Murphy [ll]. Thry rclatc int’cnsity changes ovw time at a point, to t’hc: spatial intc:nsity gradient in the neighborhood of t,hc point. The framcl ratcb is a,ssurned to bc sufficiently rapid such t’hat the value of the gradient at) a point dots not change significantlJr bctwwn frames. Motions along the s and !/ axw arc twated independently. If AI is the> int’twity change at’ a paint dw to .P asis motion As, then it is claimed t’hat

whcrcl G, is the spatial gradient of the intwsity along tht> .f axis. (The minus sign is ncccssary because t’he surfaw, rathc>r than the obwrvation point), is moving.) ,2 similar relationship holds for motion only along the y axis. Figure 1 illustratw the effect. Unfortunately, AZ is affected by motion along both axes and so s-axis motion contributes to the Ay estimate w-hilt y-axis motion contributes t’o the A.1

term. Thus the values ef the Ax and Ay may be inaccurate for any single images point. However, if the direction of th(‘ gradiclnt is randomly dist,ributcd, the avcragcb values of Ac and Ay give an accurate> wtimatc of the true velocity. The> dcpchndcncy on average value limits the mc+hod to the analysis of a singlt: motion.

I’ENNI~XA AND THOMPSON

FIG. 3a. Toy problem

FIG. 3b. Blurred pair.

In the next section, WC describe a significant extension to the Limb and Murphy approach: Our m&hod allows the analysis of several moving objects, does not require randomly distributed gradients, and can identify individual pixels bc- longing to a moving object with a particular velocity. Moreover, there is no need for any image segmentation prior to application of the method.

3. THE GRADIENT-INTENSITY TRANSFORM METHOD

Our technique, which we call the Gradient Intensity Transform Method (GITM), is a nonmatching approach which develops velocity information from

th b tinw variation of intjcmsit>. at a piwl duck to t 1~1 motion and from the spaCal vaI *iat’ion of th: intmsity functioll. It is applicak)l(b for rigid I)ody translation not i nv .solving rotation. In thr follo\ving paragraphs, \VC \\-ill show that, the tinub var -iation of intmsity and the spatial gradicmt, at, clacah point place constraints on the possible speed and dircLc,tion of the, srwfau~ or ol)jwt at that, point. 7’li(w~

V!ALOCITY 1~15T~~I:~21S.4TION 305

306 FENNlMA ANI, THOiVPSON

const xaints do not’, hawcvrr, uniquely dotermimi the vtrlocity vector; but, suffic mient number of image points correspond to a surface with a common velol the T relocity vector can bc determined using clustering techniques. WC desc a mo ‘dified Hough t’ransform clustering approach which is used to determine veloc :ity.

FIG. 3e. Pixel classification for Object 2.

FIG. 3f. Filtered rlasuificntion for Object 1.

if a city, :ribe this

\ l~;I,o~‘I’r\- I~l~:‘l’l~:I~\lIN.~‘I’IoS :1Oi

30s PENNEMA AND THOMPSON


FIG. 4~. Unfiltered pixel classification for Object 1.

In our case, however, it is the underlying surface that is moving, not the sampling point. Thus, if ds is the incremental translation of the surface

di = -G.ds. (1)

It will be useful to represent both velocities and gradients in a magnitude-phase (polar coordinate) notation

G = (IGI, 0~1, v = ([VI, 8,).

We can relate translational differences to velocity and time by

ds = vdt. Combining (1) and (2)

di = -G.vcZt = - [GI /VI cos (0, - 8,)dt.

(2)

(3)

VELOCITY I~)I’TEI~MISATIO? 3 )!I

310 FENNEMA AND THOMPSON


FIG. 5c. Unfiltered pixel classification for Object 1.

For digital imagery, we must adapt the above result to a situation in which the image functions are known only at discrete sample points. Gradients are estimated using a Sobel operator [12] extended to provide gradient direction as well as magnitude. Sampling over time is also discrete. Since the above analysis is only valid if the gradient at a point remains constant over the time between samples, we only consider points where the gradients at time 11 and tz satisfy the restrictions

~Ml

H(;(tl) - N(:(lJ ; < (I.:

or

zT - IfhG(t,j - o,;it?j, < (I!.

Furthtrrmoro, XV-~ only prowss points I’or \vhic.ll thcb intcwsity charyy b~~1ww11 sampl(~d framw, AI, is sufficicwtly larg(~ :

I AI > (I:(

This xcrws t’cl eliminatcb point,s whew thcb int~cwsity c~hangc~ i!: duct primarily to scannc~r noise.

‘I% number of image points satisfying t~lww constraints is gwatly incrc~awtl if th(x image function is first lo\v-pass filttwd. ,4 smoothing operation roduccs noise an,d increases the c>xtclnt ovw \vhich gradicwts wmain approximat’cly constant while suppressing thosr areas whrrc tc>xtural variat)ions haw a short1 enough period to cau:<c confusion. In addition, smoothing allow uw of thtx tc~chniqw on edges which would othrrwise be discontinuous pointa in the image function. The dcsirahlc amount of blurring wlatw to thcl wloc~itks cxpwtcd in thcl imagcq~. The extent of the blurring funct’ion (i.cl., the c#wtivc rvidt’h of the point spwad function of the blur) should b(, at, lcast as grciat as t tic> maximum c~xpcc’tcd velocity. The value is not critical. Blurring can tw ctakly impl~~mc~nt~c~tl by dvforusi~~g, as WP have done here, and thus involvw no wtra wmputation.

To dctctrminc actual wlocity (Iv 1, 0,). WC USC’ a modification of t,hcz Hough transform [13. 141. At) c>ach pointj, A1 i< usc~d J, * S an wtimatc~ of r/i/(/t. From (A),

FENNEM8 AND THOMPSON

FIG. 5e. Unfiltered pixel classification for Object 3.

the possible values of v that could give rise to AI at that point are constrained to lie along the curve

Iv1 = -2- = - -- AI

cos (f&3 - e,) IGr’lcos (0, - 6,) ’

If the direction of the gradient varies at different points of the moving object, then values at those points will map into different curves in ( j v j,0,) space. The intersection of these curves will uniquely specify v.

Solving for this intersection analytically is a complex task, but the Hough transform approach allows a significant computational simplification. ( 1 v I, 0,) space is treated as a discrete array of accumulators. For each image point, AI and G are used in Eq. (5) to trace a curve through ( 1 v 1 , 0,) space corresponding to possible velocities. This curve is digitized onto an accumulator grid and then

TABLR 1

Toy Problem

Object Computed Computed direction speed

Measured direction

Measured speed

Charlie 0 2 0 2 Snoopy 3.141 2 3.141 2

:: 1 :i

isach accumulator corrctsponding to a point’ on t hr: wrvc is incwmc9tcd. .2n arbit,rary uppc’r bound for / v : is c~hown to d(lal with situat,ions \~hc~rc~ (‘OS (& - 0,) 7~: 0. Whc~~ t’hc (wrvos cwwsponding t)o all that points in the pic*tuw haw bcwn ontcwd, a ptlak in the awumulator array corresponds t,o a wlocil y.

In situations involving mow than orw moving objwt, diffwcntJ moving surfaws \vill usually b(b manifcstckd as dist,inct pchaks in t#ho ( iv 1, 0,) accumulat’ors. The sin;glc most, prominc>nt, pclak in the accumulat’or array is found. Tho corrwponding vt~locit y is labc~lod vl, and all points sat isf!.ing thcl wlationship

for some appropriat,c thwshold t are wsigwd wlocit’y cl. The> analysis is rqwatcxd using the origmal imagch data minus those points already assigned a veloc+it>>*. This prowdurc continues until no sufficic>ntly wc4-defined peaks are found in the accumulator array. The assignment of wlocitirs by this twhniyuc: is subjwt to (‘rror dc>pclndcnt on the order in which peaks arc found and the magnitudfl of the t hrcshold t. This error may b(> rcduwd by a final pass in which rach image 1 joint is assigned to t.hc1 clowst of the candidate wlocit~~ vwtors.

Tht> GITM approach has hwn applied t)o a variclty of sccrw scqwncw with good rwults. Figuwt 3 t’o 5 are tJyica1 of tho kind of sc~~nw \\-hich have been sucwssfull~ ana,lyzcbd. All imagw ww digitized from wal 3D swnw t)o a 128 X 128 array of &bit monochromcl pixels. From this data G I’l’M \vas abk to dctcrminc t,ht> vclocit\. magnitude t’o witIhin 1 pixc4/framcL (Tablcxs 1, 2, and 3) and was able t,o classif! pixl>ls belongiILg t,o rach wlocit,y fairI!- awuratc4y. Mowovw, thcl t8echniqw works ~~11 with SWI~PS c~ontaining wvc,ral moving objwts. The swna in Fig. 3

314 FENNEMA AND THOMPSON

contains t’wo objects moving in oppositBe directions : Snoopy to the left and Charlie to the right,. Table 1 she\\-s how t’hc computer output of GITM compares with the speed and angle measurements obtainc>d from the blurred images on a CRT using a track ball (to nearest pixel/frame). Thcsc two results correspond to the two clearly evident cosine curves in (oG, 0,) space shown in Fig. 3c. Using the velocity measurements from GITM the original picture points were then classified according to velocity in the manner explained in the last paragraph of the previous section. Figure 3d shows those points associated with velocity (2, 0) and Fig. 3e those which correspond to velocity (2, r). Figures 3f and g result from Figs. 3d and e through the use of a filter which assigns to each pixel the value which most often occurs among the neighboring pixels. This classification of pixels based on their velocity is good enough to suggest a segmentation procedure based on motion.

Figure 4 demonstrates that GITM can distinguish between two velocities which have the same direction. As shown in Table 2, Snoopy and Charlie move in the same direction, but at about 1 pixel/frame and 4 pixels/frame, respectively. The GITM computed results are in close agreement. Figures 4c and d show how well the pixels are classified.

The performance of GITM on three moving objects is shown in Fig. 5 and Table 3. Note that in the figure the lens partially occludes the hammer. The local nature of the method makes it relatively insensitive to changes along the occlusion boundary. GITM classifies the pixels fairly well, and the velocity calculations are reasonably close to those measured with the exception of the hammer velocity angle. This error is due to blurring of peaks in (v, 0,) space and suggests a possible weakness of the procedure as a measurement method for a large number of moving objects. However, for scene analysis purposes the accuracy is sufficient, and the classification capability will prove quite useful.

5. CONCLUSIONS

This paper has described a nonmatching procedure (GITM) which can detect and quantify the velocities present in a sequence of digital images containing several moving objects. Results have shown that this procedure can handle occlusion, works on blurred images, can determine the velocity angle within r/8, and can determine the velocity magnitude within about 1 pixel/frame. Furthermore, a method was described which uses the velocities obtained from GITM to classify the pixels in the image according to velocity. This classification is good enough to suggest a possible motion-based segmentation of the scene into objects.

REFERENCES

1. N. Badler, “Temporal Scene Analysis: Conceptual descriptions of object movements,” Tech. rep. 80, Dept. of Computer Science, Univ. of Toronto, February 1975.

2. J. K. Aggarwal and R. 0. Duda, Computer analysis of moving polygonal images, IEEE Trans. Computers C-24, 1975, 966-976.

3. J. L. Potter, Velocity as a cue to segmentation, IEEE Trans. Syst. Man. Cybern. SMC-5 1975, 390-394.

VELOCITY I~)I’TEI~MISATIO? 1il.i

Documents

Velocity determination in scenes containing several moving objects