12
The Journal of Neuroscience, December 1994, 14(12): 7381-7392 Transparent Motion Perception as Detection of Unbalanced Motion Signals. III. Modeling Ning Qian, a Richard A. Andersen, and Edward H. Adelson Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 In the preceding two companion articles we studied the conditions under which transparent motion perception occurs through psychophysical experiments, and investigated the underlining neural mechanisms through physiological recordings. The main finding of our perceptual experiments was that whenever a display has finely balanced motion signals in all local areas, it is perceptually nontransparent, and that transparent displays always contain motion signals in different directions that are either spatially unbalanced, or unbalanced in their disparity or spatial frequency contents. In the physiological experiments, we found two stages in the processing of transparent stimuli. The first stage is located primarily in area V1. At this stage motion measurements are made and V1 cells respond well to both the balanced, nontransparent stimuli and the unbalanced, perceptually transparent stimuli. The second stage is located primarily in area MT. MT cells show strong suppression between opposite directions of motion. The suppression for the unbalanced, transparent stimuli is significantly less than that for the balanced, nontransparent stimuli. Therefore, the activity in the second, MT stage correlates better with the perception of motion transparency than the first, V1 stage, which does not distinguish reliably between transparent and nontransparent motion. The above experiments suggest a two-stage model of motion perception with a motion measurement stage in V1 and an opponent-direction suppression stage in area MT. In this article we explicitly test this model through analysis and computer simulations, and compare the response of the model to the perceptual and physiological results using the same balanced and unbalanced stimuli we used in the experiments. In the first stage of the computational model, motion energies in different spatial frequency and disparity ranges are extracted from each local region. Similar to V1, this stage does not distinguish between the balanced and unbalanced stimuli. In the subsequent stage motion energies of opposite directions but with same spatial frequency and disparity contents suppress each Received Aug. 18, 1993; revised Apr. 18, 1994; accepted Apr. 27, 1994. We are grateful to Eero Simoncelli and John Wang for their help with OBVIUS program. We would also like to thank Bard Geesaman and two anonymous reviewers for their comments on earlier versions of the manuscript. The research is supported by Office of Naval Research Contract N00014-89-J1236 and NIH Grant EY07492, both to R.A.A. N.Q. was supported by a McDonnell-Pew postdoctoral fellowship during the early phase of this work. Correspondence should be addressed to Dr. Richard A. Andersen, Division of Biology, 217-76, California Institute of Technology, Pasadena, CA 91125. aPresent address: Center for Neurobiology and Behavior, Columbia University, New York, NY 10032. Copyright © 1994 Society for Neuroscience 0270-6474/94/147381-12$05.00/0 other using subtractive or divisive inhibition. This stage responds significantly better to the transparent stimuli than to the nontransparent ones, in agreement with MT activity. [Key words: motion transparency, motion energy models, stereopsis, motion-stereo integration, spatial frequency, com- puter modeling] Motion is a rich source of various types of useful information. For example, it allows us to determine three-dimensional structures of moving objects, and to segment a complex scene into its meaningful parts (see Nakayama, 1985, for a review). The task of motion detection is relatively easy when there is only a single point-like object moving on a blank background. Our visual system, however, has to handle (and is able to handle) much more complicated situations. For instance, when an object with differently oriented boundaries is moving in a certain direction, it generates local motion vectors that are perpendicular to the boundaries (the "aperture problem"). These vectors may not be pointing in the true direction of motion. Even more complex is the situation where there are partial occlusions and translucent surfaces in a scene with moving objects. In these cases the visual system has to represent more than one motion i n the same part of space—the problem of transparent motion perception. A laboratory demonstration of motion transparency uses two independent sets of random dots moving in opposite directions in the same part of a video monitor. Two transparent surfaces, one defined by each set of dots, are seen as continuously and independently moving across each other. In the first of this set of three studies, we performed psycho- physical experiments for determining the conditions under which transparent motion perception occurs (see preceding companion article, Qian et al., 1994). We found that displays with locally well-balanced motion signals in opposite directions are percep- tually nontransparent. The transparent displays, on the other hand, contain locally unbalanced motion signals in different directions. Furthermore, if the two components of the spatially balanced displays are at different depths, or contain very different spatial frequency contents, the displays appear transparent. These displays contain motion signals that are unbalanced in binocular disparity or spatial frequency. Based on these results, we proposed that local suppression among different directions of motion within each disparity and spatial frequency channel could be the mechanism for distinguishing transparent displays from nontransparent ones. Nontransparent displays presumably maxi- mize the suppression in all frequency and disparity channels and therefore evoke relatively weak responses. We also investigated motion transparency physiologically

Transparent Motion Perception as Detection of Unbalanced ...nq6/publications/motion-mod.pdf · transparent motion perception occurs (see preceding companion article, Qian et al.,

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Transparent Motion Perception as Detection of Unbalanced ...nq6/publications/motion-mod.pdf · transparent motion perception occurs (see preceding companion article, Qian et al.,

The Journal of Neuroscience, December 1994, 14(12): 7381-7392

Transparent Motion Perception as Detection of Unbalanced MotionSignals. III. Modeling

Ning Qian,a Richard A. Andersen, and Edward H. Adelson

Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

In the preceding two companion articles we studied theconditions under which transparent motion perception occursthrough psychophysical experiments, and investigated theunderlining neural mechanisms through physiologicalrecordings. The main finding of our perceptual experiments wasthat whenever a display has finely balanced motion signals in alllocal areas, it is perceptually nontransparent, and thattransparent displays always contain motion signals in differentdirections that are either spatially unbalanced, or unbalanced intheir disparity or spatial frequency contents. In the physiologicalexperiments, we found two stages in the processing oftransparent stimuli. The first stage is located primarily in area V1.At this stage motion measurements are made and V1 cellsrespond well to both the balanced, nontransparent stimuli andthe unbalanced, perceptually transparent stimuli. The secondstage is located primarily in area MT. MT cells show strongsuppression between opposite directions of motion. Thesuppression for the unbalanced, transparent stimuli issignificantly less than that for the balanced, nontransparentstimuli. Therefore, the activity in the second, MT stage correlatesbetter with the perception of motion transparency than the first,V1 stage, which does not distinguish reliably betweentransparent and nontransparent motion.

The above experiments suggest a two-stage model of motionperception with a motion measurement stage in V1 and anopponent-direction suppression stage in area MT. In this articlewe explicitly test this model through analysis and computersimulations, and compare the response of the model to theperceptual and physiological results using the same balancedand unbalanced stimuli we used in the experiments. In the firststage of the computational model, motion energies in differentspatial frequency and disparity ranges are extracted from eachlocal region. Similar to V1, this stage does not distinguishbetween the balanced and unbalanced stimuli. In thesubsequent stage motion energies of opposite directions but withsame spatial frequency and disparity contents suppress each

Received Aug. 18, 1993; revised Apr. 18, 1994; accepted Apr. 27, 1994.We are grateful to Eero Simoncelli and John Wang for their help with OBVIUS

program. We would also like to thank Bard Geesaman and two anonymousreviewers for their comments on earlier versions of the manuscript. The research issupported by Office of Naval Research Contract N00014-89-J1236 and NIH GrantEY07492, both to R.A.A. N.Q. was supported by a McDonnell-Pew postdoctoralfellowship during the early phase of this work.

Correspondence should be addressed to Dr. Richard A. Andersen, Division ofBiology, 217-76, California Institute of Technology, Pasadena, CA 91125.

aPresent address: Center for Neurobiology and Behavior, Columbia University,New York, NY 10032.

Copyright © 1994 Society for Neuroscience 0270-6474/94/147381-12$05.00/0

other using subtractive or divisive inhibition. This stage respondssignificantly better to the transparent stimuli than to thenontransparent ones, in agreement with MT activity.

[Key words: motion transparency, motion energy models,stereopsis, motion-stereo integration, spatial frequency, com-puter modeling]

Motion is a rich source of various types of useful information.For example, it allows us to determine three-dimensionalstructures of moving objects, and to segment a complex sceneinto its meaningful parts (see Nakayama, 1985, for a review).The task of motion detection is relatively easy when there i sonly a single point-like object moving on a blank background.Our visual system, however, has to handle (and is able to handle)much more complicated situations. For instance, when an objectwith differently oriented boundaries is moving in a certaindirection, it generates local motion vectors that are perpendicularto the boundaries (the "aperture problem"). These vectors maynot be pointing in the true direction of motion. Even morecomplex is the situation where there are partial occlusions andtranslucent surfaces in a scene with moving objects. In thesecases the visual system has to represent more than one motion inthe same part of space—the problem of transparent motionperception. A laboratory demonstration of motion transparencyuses two independent sets of random dots moving in oppositedirections in the same part of a video monitor. Two transparentsurfaces, one defined by each set of dots, are seen as continuouslyand independently moving across each other.

In the first of this set of three studies, we performed psycho-physical experiments for determining the conditions under whichtransparent motion perception occurs (see preceding companionarticle, Qian et al., 1994). We found that displays with locallywell-balanced motion signals in opposite directions are percep-tually nontransparent. The transparent displays, on the otherhand, contain locally unbalanced motion signals in differentdirections. Furthermore, if the two components of the spatiallybalanced displays are at different depths, or contain very differentspatial frequency contents, the displays appear transparent.These displays contain motion signals that are unbalanced inbinocular disparity or spatial frequency. Based on these results,we proposed that local suppression among different directions ofmotion within each disparity and spatial frequency channel couldbe the mechanism for distinguishing transparent displays fromnontransparent ones. Nontransparent displays presumably maxi-mize the suppression in all frequency and disparity channels andtherefore evoke relatively weak responses.

We also investigated motion transparency physiologically

Page 2: Transparent Motion Perception as Detection of Unbalanced ...nq6/publications/motion-mod.pdf · transparent motion perception occurs (see preceding companion article, Qian et al.,

7382 Qian et al. • Motion Transparency. III. Modeling

Figure 1. The motion energy model used in most of our simulations forexplaining the perceptual difference of transparent and nontransparentdisplays. Motion sensitivity is generated by the spatiotemporally orientedfilters shown schematically in the square boxes. There are two stages in themodel. The first stage computes unidirectional motion energies by squaringand then summing the outputs of a quadrature pair of oriented filters. In thesecond stage, motion energies from opposite directions suppresses eachother. Subtractive inhibition is indicated in the figure, but we have alsoconsidered divisive inhibition (see text).

through single-unit recordings from V1 and MT (see precedingcompanion article, Qian and Andersen, 1994). We found that thedirectionally selective V1 cells responded quite well to bothtransparent and the nontransparent stimuli. On the average, theycould not reliably distinguish the two types of displays. On theother hand, MT cells' responses to displays with twocomponents moving in opposite directions are stronglysuppressed in comparison to their preferred direction responses.More importantly, the suppression is significantly higher (orthe response weaker) for the nontransparent stimuli than for theperceptually transparent ones.

Our physiological experiments indicate a two-stage model ofmotion perception with a motion measurement stage in V1 andan opponent-direction suppression stage in area MT. Ourpsychophysical experiments further suggest that the suppressionin the second stage should be spatial frequency and disparityspecific. In this article, we present analysis and computersimulations using such a two-stage model in order to demonstratemore quantitatively that a disparity- and spatial frequency-specific suppressive mechanism can indeed account for thedifference in the perceptual transparency of our displays.

We use motion energy models (Adelson and Bergen, 1985;Watson and Ahumada, 1985) and their extension to disparitysensitivity (Qian, 1994) for motion measurements in the firststage. Many models for biological motion processing have beenproposed. We choose motion energy models because of theirbiological plausibility. Although certain versions of this classof models have been shown to be equivalent to the Reichardtmotion detectors (Reichardt, 1961; Adelson and Bergen, 1985;van Santen and Sperling, 1985), the unidirectional motionenergy stage of the model is not equivalent to any stage in theReichardt detector (Emerson et al., 1992). There is physiologicalevidence suggesting that directionally selective cells in theprimary visual cortex behave as if they compute unidirectionalmotion energies (Reid et al., 1987; McLean and Palmer, 1989;Snowden et al., 1991; Emerson et al., 1992). For the second,suppression stage of the model, we consider both subtractive(Adelson and Bergen, 1985) and divisive (Snowden et al., 1991;Heeger, 1992) types of inhibition among different directions ofmotion.

We first apply the model to those displays without disparitycues. The spatial frequency effect can be explained by the energymodels since energy detectors naturally contain frequencyselectivity. We then consider the effect of binocular disparity.Since standard energy models do not contain disparity tuning, wehave recently developed a model for biological stereopsis andcombined it with motion energy models into a commonframework (Qian, 1994). We show here through computersimulations that the extended model can account for thecontribution of disparity to motion transparency.

Preliminary versions of the results presented here haveappeared previously (Qian et al., 1991, 1992).

Analysis and SimulationsWe used the version of the energy models proposed by Adelsonand Bergen (1985) in our analysis and simulations. An essentialidea behind the model is that motion of an object through spaceover time can be described by an orientation in thespatiotemporal space (Fatale and Poggio, 1981; Adelson andBergen, 1985). The model uses spatiotemporally oriented linearfilters to detect motion. The linear mechanism for generatingmotion sensitivity is supported by recent intracellular studies ofdirectionally selective cells in cat visual cortex (Jagadeesh et al.,1993). The outputs of two linear filters with 90° phase differenceare squared and then summed to form the phase-independentunidirectional motion energy detector. Such a detector simplymeasures the Fourier power within a certain spatiotemporalfrequency window specified by its parameters.

We will examine whether our psychophysical and physiologi-cal results can be explained by assuming a suppressive stage inthe motion pathway at which motion signals in differentdirections from each small region inhibit each other. Differentsuppression mechanisms have been proposed in the past. Theyinclude subtractive opponency (Adelson and Bergen, 1985) anddivisive normalization (Adelson and Bergen, 1986; Snowden etal., 1991; Heeger, 1992). While there are important differencesbetween these two mechanisms (see Divisive inhibition, below),our psychophysical experiments are not designed to differentiatebetween them. In fact, both mechanisms can explain our resultswell. A schematic drawing of the motion energy model withsubtractive inhibition is shown in Figure 1.

We now give an explicit description of our model for motiontransparency. We assume that for each spatial location there is apopulation of motion energy detectors tuned to different rangesof spatiotemporal frequency (and thus to different directions andspeeds of motion) and disparity. To the first approximation, thisstage can be identified with V1 cells. At the second (suppressive)stage either the opponent energy (with subtractive inhibition) orthe normalized energy (with divisive inhibition) is computedwithin each spatial frequency and disparity channel from theinitial motion energy measurements. We propose that thissecond stage is performed by subunits in MT cells' receptivefields. We hypothesize that if a visual display generates largeenergies for both left and right directions at the suppressionstage and if these energies in different directions are spatiallymixed, the pattern is perceptually transparent. Note that forpatterns with a richer spectrum of frequencies, moreunidirectional energy detectors will be activated. Other thingsbeing equal, these patterns will activate more opponent ornormalized energy detectors and will therefore be more likely toappear transparent. This is consistent with our psychophysicalobservation that random dot patterns look transparent over a

Page 3: Transparent Motion Perception as Detection of Unbalanced ...nq6/publications/motion-mod.pdf · transparent motion perception occurs (see preceding companion article, Qian et al.,

The Journal of Neuroscience, December 1994, 14(12) 7383

wider range of parameters than randomly spaced parallel linepatterns and that counterphase gratings and square wave gratingslook nontransparent over a wider range of parameters than equallyspaced line patterns (Qian et al., 1994).

In the above description, we assume that motion suppressionoccurs at the subunit level of MT cells. For completeness we furtherpropose that all subunits of an MT cell have similar tuningproperties and that the overall response of an MT cell is equal tothe summation of the thresholded output of all its subunits. Thisstep could be viewed as a part of the spatial integration processthat may be responsible for the coherent percept of transparentmotion (see Discussion), and will be explored in our futurepublications. For our present purpose of explaining the perceptualdifferences of our psychophysical stimuli, the activity at the MTsubunit level is sufficient.

For simplicity, we first consider the case of one spatialdimension and no disparity, and use subtractive inhibition forsuppression. We use Gabor filters for spatiotemporal orientationdetection. These filters correspond to V1 simple cell. A quadraturepair of such filters with even and odd phases tuned to leftward (-)and rightward (+) directions of motion are given by

gx t

x tex t x t

x t± = − −

±1

2 2 2

2

2

2

2πσ σ σ σω ωexp cos( ) (1)

gx t

x tox t x t

x t± = − −

±1

2 2 2

2

2

2

2πσ σ σ σω ωexp sin( ) (2)

where σx and σt determine the widths of the filters in the spatial and

temporal domain, respectively, and ωx and ωt are the central angularspatial and temporal frequencies. The ratio of the latter twoparameters determines the orientation of the filters in thespatiotemporal space. All these variables are assumed to bepositive. Also note that the areas under the Gaussian envelopes ofthe filters are normalized. This is important for comparing resultsfrom filters of different scales. The responses of these filters tovisual stimuli are given by the convolution operation.

The phase-insensitive leftward and rightward motion energiesfor a spatiotemporal pattern f(x,t) are defined as

E+(x,t) = [f ∗ ge+]2 + [f ∗ go

+]2, (3)

E-(x,t) = [f ∗ ge-]2 + [f ∗ go

-]2, (4)

where ∗ denotes convolution. Responses at this stage correspondto directionally selective complex cells. The opponent motionenergy is defined as the difference of the two:

E(x,t) = E+(x,t) – E-(x,t) (5)

Under these definitions, positive (negative) opponent energyindicates the rightward (leftward) motion and a value around zeromeans that no motion is detected. Of course, neurons can only firepositively. In reality, the leftward and rightward opponent motionenergies have to be carried by two different populations of MTcells with subunits having threshold nonlinearity. It is obviousthat the introduction of opponency will cause substantialcancellation of motion energies from opposite directions. What weare interested in here, however, is whether the residual responsesafter the cancellation would indeed be quite different for ourtransparent and nontransparent patterns. We now make someexplicit calculations and computer simulations for some of thesepatterns.

Counterphase gratings

We start with counterphase gratings, which are composed of twoidentical sine wave gratings moving across each other in oppositedirections. Counterphase gratings are perceptually nontransparent.They can be represented mathematically as

fcp(x,t) = sin(Ωxx – Ωt) + sin(Ωxx + Ωt) (6)

where Ωx and ±Ωt are the spatial and temporal frequencies. Assumethat both Ωx and Ωt are positive, then the two terms on therighthand side of Equation 6 represents two identical sine wavegratings moving to the right and left, respectively. As is shown inthe Appendix, the opponent energy for the counterphase grating i sexactly

Ecp(x,t) = 0, (7)

for all (x,t) (i.e., for filters located at any position at any time),independent of the parameters for the Gabor filters and the grating.This result explains the lack of transparent motion perception forcounterphase gratings. Note that while it is intuitively obviousthat opponent energy should be small for the counterphasegratings, the exact null result in Equation 7 is a consequence ofusing quadrature Gabor filters and subtractive inhibition.

Spatial frequency specificity

While two identical sine wave gratings moving across each otherlook like flicker, two sine gratings with very different spatialfrequencies are perceptually transparent (Qian et al., 1994). Thissuggests that the suppression between different directions ofmotion is limited within each spatial frequency channel. Themotion detectors in energy models already have frequencyselectivity built into it. From the Fourier transformation of theGabor filters in Equations 1 and 2, it can be shown that thefrequency responses of these filters are centered around (±ωx, ±ωt),with the bandwidth (defined at halfpeak amplitude) along eachdimension equal to

bw = +−

logln

ln2

2 2

2 2

ωσωσ

(8)

where ω and σ represent the central angular frequency and Gaussianwidth of the given dimension and ln stands for natural logarithm.To model frequency specificity of directional suppression, wetherefore apply opponency only to the two unidirectional motionenergies computed with filters of identical frequency selectivity buttuned to opposite directions of motion.

We performed computer simulations on a display composed oftwo different sine wave gratings with spatial frequencies equal to1.5 cycles/degree and 6 cycles/degree, respectively. We alsoconsidered a counterphase grating with spatial frequency equal to 3cycles/degree for comparison. We used three sets of filters withspatial frequencies centered around 1.5, 3, and 6 cycles/degree.They represent three spatial frequency channels. The Gaussianwidths (σs) of the filters were 1/3, 1/6, and 1/12 of a degree,respectively (the actual widths of the filters, defined at the half-amplitude of the Gaussian envelope, are 0.78°, 0.39°, and 0.20°,respectively). Under these parameters, the bandwidths of all filtersare 1.14 octaves according to Equation 8. The results of oursimulations are shown in Figure 2. The spatiotemporalrepresentations of the display composed of two different sine wavegratings and that of the counterphase grating are shown in Fig- ure 2, a and b. Figures 2c-h represents the opponent energies from

Page 4: Transparent Motion Perception as Detection of Unbalanced ...nq6/publications/motion-mod.pdf · transparent motion perception occurs (see preceding companion article, Qian et al.,

7384 Qian et al. • Motion Transparency. III. Modeling

Figure 2. Simulation for a counterphase grating and two sine wave gratingswith different spatial frequencies. All figures are shown in spatiotemporalspace. a and b are the spatiotemporal representations of the two types ofpatterns. c-h show the opponent energies in three different frequencychannels for the two patterns, all shown with the same gray scale. Grayindicates little motion energy and white and black code for rightward andleftward opponent energies, respectively. See text for the details of theparameters used.

the three frequency channels for the two types of displays. In thisfigure gray indicates little motion energy, and white and blackcode for rightward and leftward opponent energies, respectively.It is clear from Figure 2c-h that while the counterphase gratinggives no opponent motion energy in any of the three channels,the display composed of two different sine wave gratingsgenerates rightward opponent motion energy in the high-frequency channel and leftward energy in the low-frequencychannel. These results correlate well with the observation thatthe former is not perceptually transparent while the latter is. Ifthe inhibition were between the low- and high-frequencychannels, the motion energies in opposite directions wouldstrongly cancel each other, and the pattern would appear non-transparent, just like counterphase gratings.

Line patterns

We next turn to the line patterns. Our psychophysicalexperiments indicate that over a wide range of parameters,displays composed of two sets of randomly spaced parallel linesmoving across each other in opposite directions are perceptuallytransparent. If the spacings between every two adjacent lines are

made equal, however, the resulting equally spaced line patternsare nontransparent.

Two sets of N parallel lines with initial positions xn+ and xn

– (n= 1, 2, . . ., N), respectively, moving in the opposite directionswith speed v, can be represented by

f x t c x x vt x x vtL n nn

N

( , ) [ ( ) ( )]= − − + − ++ −

=∑ δ δ

1

, (9)

where δ() is the Dirac function and c is a constant with thedimension of the inverse of length. Assuming that v is positive,the two terms in the square bracket on the right-hand siderepresent lines moving to the right and left, respectively. It canbe shown (see Appendix) that under appropriate assumptions, theopponent energy for such a line pattern is given by

E cx x vt

v

x x vt

vLn

t x

n

t xn

N

≈ −− −( )

+

− −− +( )

+

+ −

−∑' exp exp

2

2 2 2

2

2 2 21 σ σ σ σ

+ −− −( ) + − −( )

+

+ +

≠∑∑c

x x vt x x vt

vn m

t xn m

N

n m

N

' exp( ),

2 2

2 2 22 σ σ

⋅ −[ ]+ +cos (ω x m nx x

− −− +( ) + − +( )

+

− −

exp( )

x x vt x x vt

vn m

t x

2 2

2 2 22 σ σ

⋅ −[ ]− −cos (ω x m nx x , (10)

where the constant c' is equal to c2/2π(v2σt2 + σx

2).The second summation is unlikely to achieve large magnitude

because x – xn+ – vt and x – xm

+ – vt (and similarly, x – xn– + v t

and x – xm– + vt) cannot both be zero at a given (x,t) for n ≠ m .

For randomly spaced line patterns the second summation is muchsmaller in magnitude than the first for the additional reason thatthe cosine terms are equally likely to be positive or negative. Wetherefore only need to consider the contribution of the firstsummation. The first term in the summation will generate a largepositive contribution to the opponent energy around thosepositions in the x-t plane such that x – xn

+ – vt is close to 0, thatis, around the trajectory of each right-going line in thespatiotemporal space. Similarly, the second term in the firstsummation will have a large negative contribution to theopponent energy when x – xn

+ + vt is very small in magnitude.These two terms will not sufficiently cancel each other due to therandom location of the lines. There will be both large positiveand large negative opponent energies across the pattern at anyinstance of time, indicating both the rightward and leftwardmotion (the presence of transparent motion).

For equally spaced line patterns we have

x x x x m n xm n m n+ + − −− + = − = −( )∆ , (11)

where ∆x is the spacing between two adjacent lines, so the twocosine terms in the second summation of Equation 10 are equal.At periodic time intervals, each line in one set is spatially veryclose to one, and only one, line in the other set and the twocorresponding terms in both summations of Equation 10 willstrongly cancel each other. This periodic loss of motion signalhelps to explain the oscillatory perception for these patterns.

The above argument is based on the assumption that filters ofreasonable sizes (σx and σt) are used. The difference between theopponent energy of an equally spaced line pattern and that of the

Page 5: Transparent Motion Perception as Detection of Unbalanced ...nq6/publications/motion-mod.pdf · transparent motion perception occurs (see preceding companion article, Qian et al.,

The Journal of Neuroscience, December 1994, 14(12) 7385

corresponding randomly spaced line pattern certainly depends onthe choice of σx and σt. We consider here two extreme cases. First,let σx → ∞ and σt → ∞, that is, use extremely wide filters. It i sclear from Equation 10 that the opponent energy EL → 0 nomatter whether the lines are equally spaced or randomly spaced,indicating that both types of patterns should appearnontransparent. This agrees with our informal observation that apattern always become less transparent or nontransparent whenit is viewed with more eccentric part of retina, where the filtersizes are presumably larger. We next consider the case when thefilters are extremely narrow, that is, σx → 0 and σt → 0. Under thiscondition, Equation 10 reduces to

E x t c x x vt x x vtL n nn

N

( , ) = − −( ) − − +( )[ ]+ −

=∑ δ δ

1

. (12)

We basically recover the original line patterns with one set oflines contributing positive opponent energy, and the othernegative energy. Thus, there are motion signals in bothdirections even for the equally spaced line patterns, except at themoment when the two opposite-going components are about tosuperimpose. At that time, xn

+ – vt = xn- + vt for all n, and the

opponent energy in Equation 12 is 0. This corresponds closelyto what we observed for the equally spaced line patterns when thenumber of lines was small and therefore the spacing betweenlines was large (increasing the spacing between lines i smathematically equivalent to reducing σx and σx). Physio-logically, the motion sensitive filters are not arbitrarily large orsmall. As a result, we see randomly spaced line patterns as muchmore transparent than the corresponding equally spaced linepatterns over a range of parameters.

We did computer simulations with randomly spaced and equallyspaced parallel line patterns. In Figure 3, a and b are thespatiotemporal representations of two such line patterns. Eachfigure represents 5° in the spatial dimension and 2.5 sec in thetemporal dimension, respectively. The speed of all the lines i s2°/sec. Since each line is wrapped around when it moves out ofthe spatial window, there are 15 1ines moving in each of the twoopposite directions at any instance of time for both linepatterns. With these parameters the randomly spaced line patternis perceptually transparent while the equally spaced line patternis not, according to our psychophysical observations. We usedthe same three sets of Gabor filters as in Figure 2. They representlow-, medium-, and high-frequency channels, or equivalently,wide, medium, and narrow spatial scales. The central spatial andtemporal frequencies of the medium channel are equal to thefundamental spatial and temporal frequencies of the equallyspaced line patterns. The opponent energies from the threechannels for the two line patterns are shown in Figure 3c-h.Again, in this figure gray indicates little motion energy andwhite and black code for rightward and leftward opponentenergies, respectively. It is clear that the opponent energy forthe randomly spaced line pattern contains both rightward andleftward motion signals while that for the equally spaced linepatterns is much weaker. This corresponds well with the presenceand absence of perceptual motion transparency in the twodisplays. Also, the way opponent motion energies change acrossthe three scales of the filters follows what we predicted based onEquation 10: as the filters get wider, the opponent energiesbecome weaker. Likewise, as the filters get narrower, the diff-erence between the two types of patterns becomes smaller, andthe equally spaced line patterns produce some opponent energy.

Figure 3. Simulation for an equally spaced and the corresponding randomlyspaced parallel line patterns. All figures are shown in spatiotemporalspace. a and b are the spatiotemporal representations of the two patterns.The total number of black pixels is exactly the same in these two patterns.c-h show the opponent energies in three different frequency channels forthe two patterns, all shown with the same gray scale. Gray indicates littlemotion energy and white and black code for rightward and leftwardopponent energies, respectively. See text for the details of the parametersused.

This corresponds well with the perception that as the number oflines in an equally spaced line patterns decreases, the patternsbecome a little more transparent.

Dot patterns

We now consider the paired and the unpaired dot patterns, whichare perceptual nontransparent and transparent, respectively.Since these dot patterns have contrast variations along the y-dimension as well as along the x- and t-dimensions, we needthree-dimensional Gabor filters for computing their opponentenergies. As is shown in the Appendix, the results of analysis arerather similar to those for the line patterns described in theprevious section. The introduction of the y-dimension simplyadds an amplitude term and a phase term to the response for eachdot. There is an additional amplitude term due to the limitedlifetime of the dots. The presence and absence of motiontransparency in paired and unpaired dot patterns could beexplained in a similar way as for the line patterns. For the paireddot patterns with two dots in each pair having different signs of contrast (one black, the other white), the results are similar

Page 6: Transparent Motion Perception as Detection of Unbalanced ...nq6/publications/motion-mod.pdf · transparent motion perception occurs (see preceding companion article, Qian et al.,

7386 Qian et al. • Motion Transparency. III. Modeling

Figure 4. Simulation for a paired and the corresponding unpaired dotpatterns. The two columns show the results for the unpaired and the paireddot patterns, respectively. The three rows represent the spatial distributionof the opponent energies for the two types of patterns at three successivetime frames. All figures are displayed with the same gray scale. Grayindicates little motion energy and white and black code for rightward andleftward opponent energies, respectively. See text for the details of theparameters used.

because the unidirectional energies do not depend on the signs ofcontrast due to the squaring nonlinearity. For paired dot patternswith large vertical offsets, the two dots in each pair will generatequite different motion energies in opposite directions due to they-dependent amplitude term. Their contributions will not cancelwell locally and the patterns will thus appear transparent.

We have carried out computer simulations for the paired and theunpaired dot patterns. An example is shown in Figure 4. Thesimulation was done with a set of filters with central frequencyequal to 2.23 cycles/degree, and Gaussian width 0.1°. Note that inFigure 4 the two axes of each figure represent two spatial dimen-sions, instead of one spatial and one temporal dimension as inFigures 2 and 3. The temporal dimension in Figure 4 is repre-sented by showing three successive time frames in three rows. Itis clear from the figure that the unpaired dot pattern containsmuch stronger leftward and rightward opponent motion energiesthan the paired dot patterns, in agreement with our perception.

It is important to note that the Fourier power spectra of thepaired and the unpaired dot patterns are rather similar. Thisindicates that the perceptual difference of the two types ofpatterns is unlikely to be explained by the motion energymeasurements alone without the introduction of the suppressionstage. We have carried out the Fourier transformation on thepaired and the unpaired dot patterns. For simplicity, weconsidered dots moving in the x-dimension and ignored the y-di-

Figure 5. Fourier transforms of a paired and an unpaired dot patterns. aand b are the spatiotemporal representations of an unpaired dot pattern andits corresponding paired dot pattern. The y-dimension is not shown. c and dare the amplitudes of the Fourier transforms of the two patterns displayedwith the same gray scales. The zero spatiotemporal frequency points arelocated at the centers of the diagrams.

mension of the patterns. In Figure 5, a and b, show thespatiotemporal representations of a paired dot pattern and itscorresponding unpaired dot pattern. The amplitudes of the Fouriertransforms of the two patterns are shown in Figure 5, c and d. Thepoints with zero spatial and temporal frequencies are located atthe centers of both diagrams. It is clear from these figures thatboth patterns have their main Fourier power concentrated alongthe two diagonal lines going through the origin. These two linesare generated by the dots moving in the two opposite directionsof motion (Watson and Ahumada, 1985). If one tries to detectmotion by doing something equivalent to fitting lines (throughorigin) to these two spectra (Heeger, 1988; Shizawa and Mase,1990), then the paired and the unpaired dot patterns will both beconsidered transparent by such a procedure. Since perceptuallythe unpaired dot patterns are much more transparent than thepaired ones, we conclude that the suppression stage is essentialfor determining the perceptual transparency of a display.

Disparity specificity

We showed that the paired dot patterns can be made perceptuallytransparent if a certain amount of binocular disparity i sintroduced between the dots in each pair (Qian et al., 1994). Tomodel this interaction between motion and stereo vision, weneed to incorporate disparity sensitivity into the motion energymodel. We have recently developed a model of stereo visionbased on known properties of binocular cells in the visual cortexand have shown that this model can be naturally combined withmotion energy models (Qian, 1994). Here we show throughcomputer simulations how the combined model can be used toexplain the effect of disparity on perceptual motiontransparency. Our model assumes that the left and right receptivefields of a binocular cell are given by

Page 7: Transparent Motion Perception as Detection of Unbalanced ...nq6/publications/motion-mod.pdf · transparent motion perception occurs (see preceding companion article, Qian et al.,

The Journal of Neuroscience, December 1994, 14(12) 7387

Figure 6. Simulation for a paired dot pattern with one set of dots moving inone direction with positive disparity, and the other set of dots in the oppositedirection with negative disparity. The three rows show the spatialdistribution of the opponent energies for the pattern at three successivetime frames. The three columns correspond to the opponent energies inthree disparity channels. All figures are displayed with the same grayscale. Gray indicates little motion energy and white and black code forrightward and leftward opponent energies, respectively. See text for thedetails of the parameters used.

f x y tx y t

lx y t

( , , ) exp= − − −

2

2

2

2

2

22 2 2σ σ σ⋅ + + +cos( )ω ω ω φx y t lx y t , (13)

f x y tx y t

rx y t

( , , ) exp= − − −

2

2

2

2

2

22 2 2σ σ σ⋅ + + +cos( )ω ω ω φx y t rx y t , (14)

where σ and ω are the widths of the Gaussians and angularfrequencies along the spatial and temporal dimensions, and φl andφr are the phase parameters. For a stimulus with a constantdisparity D and moving at speed vx and vy along the horizontaland vertical directions, it can be shown that the energy computedwith a quadrature pair of such filters is approximately

E v vD

t x x y yl r x≈ + +

−+

4

2 22 2 2ρ δ ω ω ω

φ φ ω( ) cos , (15)

where δ() is the delta function, and ρ is the amplitude of theFourier transformation of the stimulus (Qian, 1994). Equation l5indicates that the cell is indeed sensitive to both motion andstereo disparity, similar to some real cortical cells. While ωx, ωy,and ωt determine the motion selectivity, (φl – φr) determines thedisparity sensitivity. The width of disparity tuning (defined atthe half peak amplitude) is equal to

∆Dx

= πω

. (16)

Figure 7. Opponent energies for the same paired and the unpaired dotpatterns in Figure 4, and the corresponding flicker pattern. Only one timeframe for each spatial energy distribution is shown. All figures aredisplayed with the same gray scale. Gray indicates little motion energy andwhite and black code for rightward and leftward opponent energies,respectively. See text for the details of the parameters used.

We can thus explain the effect of both frequency and disparitycues in our psychophysical experiments by restrictingopponency to unidirectional energies computed with filters withidentical ωx, ωy, and ωt ,and (φl – φr), but tuned to oppositedirections of motion.

We carried out a simulation on a paired dot pattern with one setof dots moving in one direction with disparity 0.11°, and theother set of dots in the opposite direction with disparity -0.11°.The results are shown in Figure 6. Again, the three rows hererepresent three successive time frames. The three columnsrepresent the opponent energies computed with three sets offilters with their (φl – φr) equal to -π/2, 0, and π//2, respectively.The central spatial frequency in horizontal dimension, (ωx/2π),was equal to 2.23 cycles/degree (or the angular frequency ωx equalto 14.0 radians/degree). With this choice of ωx, the three sets offilters have their peak disparity tuning around 0.11°, 0°, and -0.11°, respectively, and the widths of tuning are 0.22°. We seefrom Figure 6 that the leftward and rightward opponent motionenergies are now segregated into different disparity channels.Unlike the paired dot pattern without disparity shown in Figure4, b, d, and f; these energies do not cancel each other out. Thisaccounts for the perceptual transparency of the pattern.

Flicker responses

Since nontransparent patterns such as counterphase gratings andpaired dot patterns look rather like flicker, we also generated aflicker pattern for comparison. The pattern was derived from theunpaired dot pattern in Figure 4 by setting the dot speed to zero.Other parameters including dot lifetime are the same. Thecomputed opponent motion energy of the flicker pattern,together with the opponent energies for the paired and theunpaired dot patterns from Figure 4, are shown in Figure 7a-c .The opponent motion energy of the paired dot pattern is onlyslightly stronger than that of the flicker pattern, and both aresignificantly weaker than the opponent energy of the unpaireddot pattern. These simulations conform to our finding that MTcells, responses to the paired dot patterns and to the flicker noiseare not significantly different from each other, and that bothresponses are significantly weaker than the response to theunpaired dot patterns (Qian et al., 1994).

Divisive inhibition

We have shown above that simple subtractive inhibitionfollowing motion energy computation can account for our psy-

Page 8: Transparent Motion Perception as Detection of Unbalanced ...nq6/publications/motion-mod.pdf · transparent motion perception occurs (see preceding companion article, Qian et al.,

7388 Qian et al. • Motion Transparency. III. Modeling

chophysical and physiological results on transparent andnontransparent patterns. Single-unit recordings from behavingmonkeys, on the other hand, indicate that the directionalinhibition in area MT has a divisive component (Snowden et al.,1991). Heeger (1992) also demonstrated that divisivenormalization can account for a large amount of psychophysicaland physiological results, some of which are inconsistent withpure subtractive inhibition. For example, the opponent energycomputed with subtractive inhibition is always proportional tothe square of stimulus contrast while real visual cells exhibitcontrast saturation. It is therefore important to demonstrate thatour psychophysical and physiological experiments can also beexplained by divisive suppression.

We have repeated all our simulations described above usingdivisive suppression and similar results (not shown) have beenobtained. We considered two different normalization schemes.The first one normalizes the opponent energy by the staticenergy (Adelson and Bergen, 1986):

NE x tE x t E x t

E x t( , )

( , ) ( , )

( , )= −

+

+ −

0 ε, (17)

where E+(x,t) and E-(x,t) are the left and the right motionenergies, respectively, E0(x,t) is the static energy, and ε is asmall number for stabilizing the division. This approach i sclosely related to the least-squares method (Lucas and Kanade,1981) and gradient method (Horn and Schunck, 1981) forvelocity estimation (Adelson and Bergen, 1986; Simoncelli,1993). It is not surprising that Equation 17 can also explain theperceptual difference between the transparent and thenontransparent patterns since it is proportional to the opponentenergy.

The second approach normalizes the output of eachunidirectional energy detector by the sum of the outputs of alldetectors tuned to the same spatial frequency range. For displayscontaining only opposite directions of motion, the normal- ized left and right motion energies within a given frequency bandare

NE x tE x t

E x t E x t c±

±

+ −=+ +

( , )( , )

( , ) ( , ), (18)

where c represents contributions from filters tuned to directionsother than left and right. This approach is modified from Heeger(1992), who used the total output of detectors tuned to allfrequency ranges as the normalization factor. We restrictednormalization to be within each frequency channel because ourpsychophysical experiments suggest that the suppression i sspatial frequency specific (Qian et al., 1994). For theconvenience of the description we let c be equal to zero. For thenontransparent patterns with locally well-balanced motionsignals, E+(x,t) is approximately equal to E-(x,t) in Equation 18so that the normalized motion energies are very close to 0.5. Forthe unbalanced patterns that are perceptually transparent, on theother hand, there are many locations at which E+(x,t) is much lar-ger than E-(x,t) or vice versa. The normalized left or right motionenergies at these locations will be close to 1.0. This twofolddifference in normalized energies between the balanced and theunbalanced patterns could account for the difference in theirperceptual transparency. Notice that this difference cannot bereduced or reversed by changing the contrast of the patternsbecause the normalized energy is independent of contrast. Alsonote that the normalized energies of nontransparent patterns aresimilar to those of flicker patterns. The latter are also close to

0.5 since flicker patterns contain equal amounts of left and rightmotion energy.

DiscussionWe have shown in this article that a motion energy computationfollowed by disparity- and spatial frequency-specific suppressionamong different directions of motion can indeed explain theperceptual difference of the transparent and nontransparentdisplays used in our psychophysical experiments. Specifically,we found that the nontransparent displays generate relativelyweak opponent or normalized energies at the suppression stage.In fact, these energies are not higher than those generated byflicker patterns. On the other hand, the perceptually transparentdisplays generate much stronger opponent or normalized motionenergies along more than one direction of motion. Theseenergies in different directions are located either in different butmixed small areas (as in the randomly spaced line pattern and theunpaired dot pattern), or in different disparity or spatial frequencychannels over the same spatial regions (as in the display made oftwo different sine wave gratings, and in the paired dot patternwith binocular disparity). We hypothesize that a later stage couldintegrate these energies in different directions separately to formtwo over-lapping transparent surfaces. Note that a patternmoving in a single direction will not appear transparent becauseit will only generate strong opponent or normalized motionenergies in one direction. A display containing two widelyseparated objects moving in opposite directions will not appeartransparent either. Although such a display will generate strongopponent or normalized motion energies in two differentdirections, these energies are not spatially mixed and thereforecannot be integrated into two overlapping surfaces.

Previous physiological experiments indicate that MT cellsshow strong suppression among different directions of motion(Snowden et al., 1991). MT could therefore be the physiologicalequivalent of the suppression stage in our simulation, wheretransparent and nontransparent displays can be distinguished. Infact, our physiological recordings in a preceding companionarticle demonstrate that average MT activity to the transparentunpaired dot patterns is significantly higher than that to thenontransparent paired dot patterns (Qian and Andersen, 1994). Inaddition, our psychophysical experiments with the paired and theunpaired dot patterns indicate that under the foveation condition,an alignment of opposing motion signals on the scale of 0.4°can generate large difference in perceptual transparency. Wetherefore propose that the suppression stage for differentiatingtransparent and nontransparent occurs at the subunit level of MTreceptive fields. V1 cells, on the other hand, show relativelyweak suppression among different directions of motion(Snowden et al., 1991; Qian and Andersen, 1994). Many of thembehave rather like unidirectional motion energy detectors. Theyrespond quite well to both transparent and nontransparentpatterns, and the average V1 activity could not reliably tell thetwo types of patterns apart (Qian and Andersen, 1994). V1 andMT therefore approximately correspond to the energycomputation and suppression stages in our simulations.

We used Gabor filters along both spatial and temporaldimensions for motion detection in our simulations. While thespatial receptive field structures of simple cells are known to bedescribed by Gabor functions well (Jones and Palmer, 1987), it i snot physiologically plausible to use Gabor functions for thetemporal responses. Temporal Gabor filters are nonzero on thenegative time axis. They are thus noncausal. This is, however,

Page 9: Transparent Motion Perception as Detection of Unbalanced ...nq6/publications/motion-mod.pdf · transparent motion perception occurs (see preceding companion article, Qian et al.,

The Journal of Neuroscience, December 1994, 14(12) 7389

not a major problem because these filters decay to zeroexponentially due to the Gaussian envelopes. We couldpractically make the filters causal by shifting them toward thepositive time direction by 3σt. Such a shift will only generate aphase term and will not affect energy measures. A more seriousproblem with using temporal Gabor filters is that the temporalresponse of real simple cells is skewed, with its envelope havinga longer decay time than rise time. Also, zero-crossing intervalsin the temporal dimension are not equally spaced (DeAngelis, etal., 1993). In order to make our model more biologicallyrelevant, we have repeated our simulations using more realistictemporal response functions given in Equation 1 of Adelson andBergen (1985), and obtained the similar results. We believe thatour results do not depend on the details of the receptive fieldshapes of the energy detectors as long as they approximatelymeasure the local Fourier power within a certain spatiotemporalfrequency window specified by their parameters.

The suppression among different directions of motion used inour simulation makes it impossible for a stimulus to generatestrong responses along more than one direction of motion ineach small spatial area at the opponent stage when there are noother cues in the stimulus, such as disparity or spatial frequency(the weak opponent energies for the paired dot pattern in Fig. 4h,d-f give an example). The size of the small area is determined bythe size of the front-end filters. In this regard, the suppressionstage is rather like the pooling or regularization step commonlyused in machine vision systems (Horn and Schunck, 1981;Hildreth, 1984; Heeger, 1987; Wang et al., 1989; Grzywacz andYuille, 1990). Such a step is required to solve the apertureproblem and to average out noise while at the same time i tprevents those models from having more than one velocityestimation over each area covered by the pooling operator. Inthis connection, it is interesting to note that some versions ofthe pooling procedures for combining local gradient constraints(Horn and Schunck, 1981; Lucas and Kanade, 1981) areequivalent to a mixture of subtractive and divisive types ofsuppres-sion (Simoncelli, 1993). The agreement between oursimulations and the psychophysical observations implies thatmachine vision systems can be made more consistent withtransparent motion perception if the pooling operation in thesesystems is restricted to small areas and to each frequency anddisparity channel. We suggest that the difficulty most machinevision systems have with motion transparency can be partlyattributed to the fact that these systems typically apply poolingoperations over a relatively large region and that they usually donot explore other cues such as disparity and spatial frequency torestrict the scope of pooling.

The displays we used in our psychophysical and physiologicalexperiments and computer simulations are highly artificial andare unlikely to be found in the natural environment. What, then,is the advantage of having a suppression stage, such as MT, inthe motion pathway if it is not just for making counterphasegratings or paired dot patterns appear nontransparent? In fact, ifa subpopulation of Vl cells act like unidirectional motion energydetectors, why doesn't the brain use a family of these cells tunedto different directions of motion to represent the perception ofmultiple motions? Why instead should perception be derivedfrom the suppression stage in MT, which reduces the system'sacuity to transparent motion (Snowden, 1989) and at the sametime makes the well-balanced patterns appear nontransparent?The identification of the suppression stage with the poolingoperation in machine vision systems discussed above provides

an answer. Since the function of the pooling operation is tosolve the aperture problem and to average out noise, we suggestthat directional suppression has similar functions. Indeed,unidirectional energy detectors like V1 cells suffer the apertureproblem; that is, they seem to respond only to the component ofthe motion that is perpendicular to the local spatial orientationsof the stimulus contrast. Also, V1 cells are very responsive todynamic noise patterns made of flickering dots (Qian andAndersen, 1994). The suppression stage in the motion pathwaycould help to solve these problems, just as the poolingoperations do in machine vision systems. In fact, we found thatthe noise response of MT cells is much lower than that of V1cells (Qian and Andersen, 1994). There is also evidence that thehuman visual system may solve the aperture problem byaveraging local motion measurements (Ferrera and Wilson,1990, 1991; Yo and Wilson, 1992; Rubin and Hochstein, 1993).Suppression among different directions of motion could beviewed as a kind of averaging operation and thus could be used tosolve the aperture problem. A negative effect of the suppressionis the reduced acuity for transparent motion. This problem i sminimized, however, by applying suppression locally and byrestricting it within each disparity and spatial frequency channel,since multiple motions in the real world are usually not preciselybalanced in each local area and different objects tend to havedifferent disparity and spatial frequency distributions. While theinhibition among the cells within each disparity and spatialfrequency channel could help to combine V1 outputs into a singlemotion signal at each location in order to solve the apertureproblem and to reduce noise, different disparity and spatialfrequency channels could represent multiple motions at the samespatial location.

It also seems reasonable to assume that at each spatial locationof the visual field, cells at the suppression stage that are tuned tothe same direction of motion, but different spatiotemporalfrequency bands, should facilitate each other. These cells all carryconsistent motion signals from different frequency ranges at thesame spatial location and these signals are likely generated bythe same moving object.

The model we have proposed for motion transparency i sincomplete in several ways. One problem is that while weperceive each transparent surface as a coherent whole the outputof the model at the suppression stage contains many isolatedpatches (see Figs. 4, 6). This problem can be solved with theintroduction of spatial integration. We can assume that cellstuned to similar directions of motion at nearby spatial locationshave excitatory connections between them. When enough of thespatially mixed cells tuned to a given direction are active, theactivity will spread across the whole layer of cells tuned to thesame direction of motion. Transparency will then correspond to amultipeaked distribution of the direction of motion at all spatiallocations. Spatial summation could also occur within individualMT cells through facilitation between subunits in a cell'sreceptive field. Another issue we have not addressed is whetherexplicit speed estimation has an important role in motion trans-parency. In fact, the outputs of our model are motion energies indifferent frequency and disparity channels instead of explicitvelocity estimations. In grouping together elements to formsurfaces, does the visual system consider the magnitudes of thevelocity vectors of these elements, or just their directions? Toexplore this question we used transparent random dot patterns andassigned the speeds of the dots composing each surface accordingto a uniform probability distribution. We found that when the

Page 10: Transparent Motion Perception as Detection of Unbalanced ...nq6/publications/motion-mod.pdf · transparent motion perception occurs (see preceding companion article, Qian et al.,

7390 Qian et al. • Motion Transparency. III. Modeling

width of the distribution was as large as half of the average speedof the dots, motion transparency could still be observed withoutmuch difficulty. This suggests that transparent motionperception does not require precise knowledge of the relativespeed of the individual dots. This observation is consistent witha recent result by Watamaniuk and Duchon (1992), who foundthat human visual system tends to average speed information.

We can now sketch a more complete model for biologicalmotion detection based on the above discussions. At each spatiallocation there is a population of unidirectional motion energydetection cells tuned to different disparity and different three-dimensional spatiotemporal frequency bands. Each cell receivesinput from the moving pattern in its receptive field according tothe amount of Fourier power of the pattern falling on its disparityand frequency window. At this stage families of V1 cells tuned todifferent directions of motion may be activated, but this could becaused by the aperture problem of a single moving object, bynoise in the environment, or by true transparent motion ofdifferent objects. Further processing has to be carried out byconnections among the cells. At each spatial location the cellswith similar disparity and frequency tuning, but differentdirection preferences, are mutually inhibitory while those withdifferent frequency tuning but similar directional preference aremutually excitatory. This is a generalization of the suppressionbetween opposite directions of motion described in this articleand it combines raw measurements within each disparity andfrequency channel at each spatial location into a single motionrepresentation (or more accurately a unimodal distribution). Theprocess solves the noise and the aperture problem and stillallows multiple motions to be represented in different small areasor among different disparity and frequency channels. Finally, thecells at nearby spatial locations with similar directionalpreferences have excitatory connections between them tofacilitate spatial integration. This process combines consistentlocal measurements into coherent surfaces. We are currentlyimplementing this more complete model of biological motionprocessing.

AppendixWe derive the opponent energy expressions for the counterphasegratings, line patterns, and dot patterns in this section. For theconvenience of subsequent calculations and presentation, weintroduce the complex Gabor filters,

g x tx t

i x tcx t x t

x t± = − − + ±( )

( , ) exp

1

2 2 2

2

2

2

2πσ σ σ σω ω , (19)

whose real and imaginary parts are the even and odd Gaborfunctions in Equations 1 and 2. Since the convolution of any realfunction f(x,t) with the even and odd Gabor filters is equal to thereal and imaginary parts of the convolution of that function withthe complex filters, we will only give responses of the complexfilters for brevity. Using the complex Gabor function, theunidirectional and opponent motion energies can be written as

E x t f gc+ += ∗( , )

2, (20)

E x t f gc− −= ∗( , )

2, (21)

and,

E x t f g f gc c( , ) = ∗ − ∗+ −2 2, (22)

respectively.

Counterphase gratings

A single sine wave grating with spatial frequency Ωx and

temporal frequencies Ωt and drifting to the right is represented by

f x t x ts x t+ = −( , ) sin( )Ω Ω , (23)

where both Ωx and Ωt are assumed to be positive. It can be shownthat the responses of the complex Gabor filters in Equation 19 tothe sine wave grating are given by

f gi

u i x ts c x t x t+ ±∗ = − ±( ) −( )[ ]1

2ω ω, exp Ω Ω

− + ±( ) − +( )[ ]u i x tx t x tω ω, exp Ω Ω , (24)

where u(ωx,ωx) are defined as

u x tx

x xt

t t( , ) expω ωσ

ωσ

ω= − +( ) − +( )

22

22

2 2Ω Ω . (25)

Note that fs+ S gc

+ can be obtained from fs+ S gc

- by replacing ωt

with -ωt. Since Ωx, Ωt, ωx ,and ωt are all assumed to be positive, u(-ωx,-ωt) is usually much larger than u(ωx,-ωt), u(-ωx,ωt), and u(ωx,ωt). It achieves its maximum value when the central spatial andtemporal frequencies of the filters match those of the counter-phase grating. It is clear from Equation 24 that the rightwardmotion detector (gc

+) responds to the right-going sine wave muchbetter than the leftward motion detector (gc

–) does.The sine wave grating drifting to the left with the same spatial

and temporal frequencies as in Equation 23 can be obtained byreplacing Ωt with -Ωt in Equation 23. The responses of thecomplex Gabor filters to this grating can be obtained by thesame replacement in Equation 24. Using these expressions, theopponent energy for the counterphase grating (composed of theabove two sine wave gratings moving in the opposite directions)can then be shown to equal to 0 independent of the parameters forthe filters and the grating.

Line patterns

A single line with initial position x0 and moving to the rightwith speed v can be represented by

f x t c x x vtl+ = − −( , ) ( )δ 0 , (26)

where δ is the Dirac function and c is a constant with thedimension of the inverse of length. The responses of thecomplex Gabor filters to the line can be shown to be

f gc

vl c

t x

+ ±∗ =+( )2 2 2 2π σ σ

⋅ −− −( ) + ±( )

+( )

expx x vt v

vx t t x

t x

0

2 2 2 2

2 2 22

σ σ ω ωσ σ

⋅ − −±

+

exp ( )i x x vt

v

vx x t t

t x0

2 2

2 2 2

ω σ ω σσ σ

. (27)

To simplify subsequent calculations, we assume that

ωt = ωxv. (28)

That is, the spatial and temporal frequencies of the filters are suchthat they are tuned to the speed of the line. Also, fl

+ S gc+ i s

usually much smaller in magnitude then fl+ S gc

– because thepreferred direction of the filter is opposite to that of the linemotion. Then, Equation 27 can be simplified to

Page 11: Transparent Motion Perception as Detection of Unbalanced ...nq6/publications/motion-mod.pdf · transparent motion perception occurs (see preceding companion article, Qian et al.,

The Journal of Neuroscience, December 1994, 14(12) 7391

f gc

vl c

t x

+ +∗ =+2 2 2 2π σ σ(

⋅ −− −( )

+( )

− −[ ]exp exp ( )x x vt

vi x x vt

t x

x0

2

2 2 2 02 σ σ

ω , (29)

f gl c+ −∗ ≈ 0 . (30)

We can now consider two sets of N lines represented byEquation 9. Using Equations 29 and 30, we obtain the opponentenergy for the line pattern as shown in Equation 10.

Dot patterns

A dot with initial position (x0,y0) and moving in the +x directionwith speed v from time t1 to t2 (t2 > t1) can be represented by

f x t c x x vt y y H t t H t td+ = − − − − −( , ) ( ) ( ) ( ) ( )δ δ0 0 2 2 , (31)

where H is the step function and c is a constant with thedimension of the inverse of length squared. With the three-dimensional extension of the Gabor filters in Equation 19, it canbe shown that

f gx x vt v

v

y yd c

x t t x

t x y

+ +∗ = −− − + ±

+−

exp( ) ( )

( )

( )02 2 2 2

2 2 20

2

22 2

σ σ ω ωσ σ σ

⋅ − − +±

++ −

exp ( ) ( )i x x vt

v

vi y yx x t t

t xy0

2 2

2 2 2 0

ω σ ω σσ σ

ω

⋅ −− +

− +

∫c

ax dxx y t

t t b a

t t b a

( )exp( )

/ /

/

2 3 22

2

2

1

2

π σ σ σ, (32)

where

av

x t

= +

1

2

12

2 2σ σ, (33)

bx x vt v

i vx

t x=− −

+ −( )

( )02σ

ω ω . (34)

Again, if we assume Equation 28 and neglect fd+ S gc-, Equation 32

can be simplified to

f gc

v

x x vt

v

y yd c

t x t x y

+ +∗ =+

−− −( )

+( ) −−( )

2 2 22 2 2

0

2

2 2 2

0

2

2π σ σ σ σ σexp

⋅ − − + −[ ]exp ( ) ( )i x x t i y yx yω ω0 0

⋅ − +( )[ ]er f a t t b a2 2' /

− − +( )[ ]er f a t t b a1 2' / , (35)

f gd c+ −∗ ≈ 0 , (36)

where

bx x vt v

x

'( )

=− −0

2σ. (37)

Comparing Equation 35 with Equation 29, we see that the twoexpressions are rather similar. The introduction of the y-dimension simply adds an amplitude term and a phase term to theresponse. The term containing two error functions is due to thelimited lifetime of the dot. We could derive the opponent energyexpression for the paired and unpaired dot patterns usingEquation 35 but we omit it here because it would be very similarto Equation 10 for the line patterns.

References

Adelson EH, Bergen JR (1985) Spatiotemporal energy models for theperception of motion. J Opt Soc Am [A] 2:284-299.

Adelson EH, Bergen JR (1986) The extraction of spatiotemporal energy inhuman and machine vision. In: Proceedings ofthe IEEE Workshop onVisual Motion, pp 151-156.

DeAngelis GC, Ohzawa I, Freeman RD (1993) Spatiotemporalorganization of simple-cell receptive fields in the cat’s striate cortex. I.General characteristics and postnatal development. J Neurophysiol69:1091-1117.

Emerson RC, Bergen JR, Adelson EH (1992) Directionally selectivecomplex cells and the computation of motion energy in cat visual cortex.Vision Res 32:203-218.

Fahle M, Poggio T (1981) Visual hyperacuity: spatiotemporal interpolationin human vision. Proc R Soc Lond [Biol] 213:451-477.

Ferrera VP, Wilson HR (1990) Perceived direction of moving two-dimensional patterns Vision Res 30.273-287.

Ferrera VP, Wilson HR (1991) Perceived speed of moving two-dimensional patterns. Vision Res 31:877-894.

Grzywacz NM, Yuille AL (1990) A model for the estimate of local imagevelocity by cells in the visual cortex. Proc R Soc Lond [A] 239: 129-161.

Heeger DJ (1987) Model for the extraction of image flow. J Opt Soc Am[A] 4:1455-1471.

Heeger DJ (1988) Optical flow using spatiotemporal filters. Int J Comp Vis279-302.

Heeger DJ (1992) Normalization of cell responses in cat striate cortex. VisNeurosci in press.

Hildreth EC (1984) Computations underlying the measurement of visualmotion. Artif Intell 23:309-355.

Horn BKP, Schunck BG (1981) Determining optical flow. Artif Intell17:185-203.

Jagadeesh B, Wheat HS, Ferster D (1993) Linearity of summation ofsynaptic potentials underlying direction selectivity in simple cells of thecat visual cortex. Science 262:1901-1904.

Jones JP, Palmer LA (1987) The two-dimensional spatial structure ofsimple receptive fields in the cat striate cortex. J Neurophysiol 58: 1187-1211.

Lucas BD, Kanade T (1981) An iterative image registration technique withan application to stereo vision. In: Proceedings of the 7th InternationalJoint Conference on Artificial Intelligence, pp 674-679.

McLean J, Palmer LA (1989) Contribution of linear spatiotemporalreceptive field structure to velocity selectivity of simple cells in area 17of cat. Vision Res 29:675-679.

Nakayama K (1985) Biological image motion processing: a review. VisionRes 25:625-660.

Qian N (1994) Computing stereo disparity and motion with knownbinocular cell properties. Neural Comput, in press.

Qian N, Andersen RA (1994) Transparent motion perception as detectionof unbalanced motion signals. II. Physiology. J Neurosci 14: 7367-7380.

Qian N, Andersen RA, Adelson EH (1991) Vl responses to two-surfacetransparent and non-transparent motion. Soc Neurosci Abstr 17:177.

Qian N, Andersen RA, Adelson EH (1992) MT responses to transparentand non-transparent motion. Soc Neurosci Abstr 18:1101.

Qian N, Andersen RA, Adelson EH (1994) Transparent motion perceptionas detection of unbalanced motion signals. I. Psychophysics. J Neurosci14:7357-7366.

Reichardt W (1961) Autocorrelation, a principle for the evaluation ofsensory information by the central nervous system. In Sensorycommunication (Rosenblith WA, ed). New York: Wiley.

Reid RC, Soodak RE, Shapley RM (1987) Linear mechanisms of directionalselectivity in simple cells of cat striate cortex. Proc Natl Acad Sci USA84:8740-8744.

Rubin N, Hochstein S (1993) Isolating the effect of one-dimensional motionsignals on the perceived direction of moving two-dimensional objects.Vision Res 33:1385-1396.

Shizawa M, Mase K (1990) Simultaneous multiple optical flow estimation.Paper presented at the IEEE Conference on Computer Vision andPattern Recognition, Atlantic City.

Simoncelli EP (1993) Distributed analysis and representation of visualmotion. PhD thesis Department of Electrical Engineering and ComputerScience Massachusetts Institute of Technology.

Page 12: Transparent Motion Perception as Detection of Unbalanced ...nq6/publications/motion-mod.pdf · transparent motion perception occurs (see preceding companion article, Qian et al.,

7392 Qian et al. • Motion Transparency. III. Modeling

Snowden RJ (1989) Motions in orthogonal directions are mutuallysuppressive. J Opt Soc Am [A] 7:1096-1101.

Snowden RJ, Treue S, Erickson RE, Andersen RA (I 991) The response ofarea MT and Vl neurons to transparent motion. J Neurosci 11: 2768-2785.

van Santen JPH, Sperling G (1985) Elaborated Reichardt detectors. J OptSoc Am [A] 2:300-321.

Wang HT, Mathur M, Koch C (1989) Computing optical flow in theprimate visual system. Neural Comput 1:92-103.

Watamaniuk SNJ, Duchon A (1992) The human visual system averagesspeed information. Vision Res 32:931-941.

Watson AB, Ahumada AJ (1985) Model of human visual-motion sensing. JOpt Soc Am [A] 2:322-342.

Yo C, Wilson HR (l992) Perceived direction of moving two-dimensionalpatterns depends on duration, contrast and eccentricity. Vision Res32:135-147.