Preattentive grouping and Attentive selection for early visual computation

7/29/2019 Preattentive grouping and Attentive selection for early visual computation


Figure 1. a) Open disk scene; b) Summed ON- and OFF-channel responses; c) Phase image after 100 iterations.

2. The Edge Detection Stage

Since the early days of Computer Vision, many operators for edge detection have been proposed, ranging from gradient and template-matching operators to parametric edge models. Because most of these operators are designed to detect particular kinds of edges, it soon became obvious that a general-purpose edge detector would be a compromise between certain performance criteria. In [4] a linear operator was derived for the detection of step edges which optimises the joint criteria of good localisation and reliability. Being an odd-symmetric operator, it mislocalises line and roof edges. The detection of zero crossings (ZC) in the second derivative of the intensity function using an even-symmetric Laplacian-of-Gaussian filter [19] faces similar problems at line edges. A nonlinear combination using the summed squares of both even- and odd-symmetric filters has proven to be a good detector of edges composed of steps, peaks, and roofs [22, 23]. Using both the local energy and the phase, it is possible to reconstruct the generating edge, which shows its applicability to image coding. However, to localise the edge exactly, a search for the maximum response is necessary, in contrast to the detection of zero crossings, which are by definition dimensionless (infinitely thin). We will show that the presented relaxation phase labeling (RPL) procedure is able to detect ZCs in phase space, thereby sharpening the edge response while at the same time performing the perceptual grouping of arranged dots into contours and closed objects. In the presented system we use six pairs of oriented Gabor filters [6] in quadrature phase to extract the local energy, followed by a differentiation step using odd-symmetric Gabor filters to rectify the oriented responses:

(2)

(3)

where the even-symmetric function is paired with its odd-symmetric Hilbert transform, two constants specify the envelope of the oriented Gaussian, a third sets the frequency of the modulating sinusoid, and a fourth is a normalisation factor. Figure 2 shows the edge detection stage and the model hypercolumns, which are the initial data for the relaxation labeling process presented in the next section.

Figure 2. a) Hierarchical extraction of direction-specific contour lines; b) Hypercolumnar structure with parametric phase for the relaxation phase labeling process.
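Equations (2) and (3) are not recoverable from this copy, so the following is only a sketch of this stage under stated assumptions. It builds a quadrature pair (a cosine-phase even kernel and a sine-phase odd kernel under the same oriented Gaussian envelope, the sine carrier being approximately the Hilbert transform of the cosine carrier). It then sums the squared responses into a local energy map for each of six orientations, and half-wave rectifies an odd-symmetric derivative of each energy map into ON and OFF channels. The kernel size, the envelope widths `sigma_u`/`sigma_v`, the carrier frequency `freq`, the DC removal and the border clamping are hypothetical choices, not the paper's values.

```python
import math


def gabor_pair(size, theta, sigma_u=1.5, sigma_v=2.0, freq=0.15):
    """Even (cosine) and odd (sine) Gabor kernels sharing one oriented envelope.

    Hypothetical parameterisation: u runs along the carrier (across the edge),
    v along the edge; sigma_u/sigma_v set the envelope, freq is in cycles/pixel.
    """
    half = size // 2
    even, odd = [], []
    for y in range(-half, half + 1):
        row_e, row_o = [], []
        for x in range(-half, half + 1):
            u = x * math.cos(theta) + y * math.sin(theta)
            v = -x * math.sin(theta) + y * math.cos(theta)
            env = math.exp(-u * u / (2 * sigma_u ** 2) - v * v / (2 * sigma_v ** 2))
            row_e.append(env * math.cos(2 * math.pi * freq * u))
            row_o.append(env * math.sin(2 * math.pi * freq * u))
        even.append(row_e)
        odd.append(row_o)
    # Remove the DC component of the even kernel so flat regions give ~0.
    mean = sum(map(sum, even)) / (size * size)
    even = [[v - mean for v in row] for row in even]
    return even, odd


def filter_at(img, kernel, r, c):
    """Correlate kernel with img at (r, c), clamping coordinates at the border."""
    half = len(kernel) // 2
    h, w = len(img), len(img[0])
    acc = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            rr = min(max(r + dy, 0), h - 1)
            cc = min(max(c + dx, 0), w - 1)
            acc += kernel[dy + half][dx + half] * img[rr][cc]
    return acc


def local_energy(img, n_orient=6, size=7):
    """One map per orientation: even response squared plus odd response squared."""
    h, w = len(img), len(img[0])
    maps = []
    for k in range(n_orient):
        even, odd = gabor_pair(size, k * math.pi / n_orient)
        maps.append([[filter_at(img, even, r, c) ** 2 + filter_at(img, odd, r, c) ** 2
                      for c in range(w)] for r in range(h)])
    return maps


def on_off(energy, theta, size=7):
    """Odd-symmetric derivative of an energy map, half-wave rectified into ON/OFF."""
    _, odd = gabor_pair(size, theta)
    h, w = len(energy), len(energy[0])
    d = [[filter_at(energy, odd, r, c) for c in range(w)] for r in range(h)]
    on = [[max(v, 0.0) for v in row] for row in d]
    off = [[max(-v, 0.0) for v in row] for row in d]
    return on, off
```

For a vertical step edge, `local_energy(img)[0]` is largest near the step and close to zero in flat regions, and `on_off` splits the derivative of that map so that each pixel is active in at most one of the two channels, giving the ON/OFF direction-selective responses that feed the relaxation stage.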

3. The Relaxation Phase Labeling Process

Twenty years ago a mechanism for scene labeling was proposed [26] which reduces the ambiguity among objects in a scene by means of an iterated relaxation procedure performed in parallel on the data array. Since then, numerous approaches to parallel relaxation operations have been described. We have adopted the general strategy of relaxing labels corresponding to observed properties in the scene, using parametric phase labels which group into coherent objects in phase space, giving the relaxation procedure a new degree of freedom to accomplish a consistent labeling.

As revealed by the Gestalt school in the first half of the twentieth century, visual perception is governed by certain simple rules which group parts into wholes, employing laws such as grouping by proximity, similarity, closure, symmetry and good continuation [18]. Although these principles are easy to investigate in psychophysical experiments, their underlying neuronal computations are largely unknown. It has been speculated that the synchronisation of visual cortical neurons, revealed by recent electrophysiological studies [8, 12], may serve as the carrier of the observed perceptual grouping phenomenon. In principle, the differences in oscillator phase between spatially neighbouring spiking cells could be used to label different objects in the scene for their intrinsic segmentation. The proposed grouping criteria of spatial contiguity and coherence in particular feature domains indeed show similarities to the Gestalt laws. However, the law of good continuation, which plays a central role in many edge grouping and linking schemes in computer vision, is able to override both proximity and similarity. This emphasises the role of oriented edges in the implementation of both perceptual grouping and synchronisation mechanisms.

Figure 3. Scheme for relaxation and diffusion of phase labels. The intensity distribution (a) is filtered to extract intensity gradients (b) corresponding to perceived edges in the image. The smoothed derivative of the edge map is rectified into ON and OFF channels (c), allowing simple compatibility constraints between channels to modify an initially uniform phase map (d); (e) intermediate and (f) final distribution of the phase image evolving in parallel over time.

The emergent formation of a perceptual group, including both edge- and region-based information, is depicted in Figure 1: the dots forming an incomplete circle are grouped into a synchronised round disk, with a discontinuity at the upper right indicating the missing dot in phase space. Figure 4 shows the results of the proposed segmentation scheme for a scene with three simple objects. Although the objects are defined by different boundary types, ranging from intensity discontinuities to lines and dots, the phase gradient shows a common interpretation of all contour types. Figure 3 depicts the general idea of the proposed phase relaxation and diffusion mechanism. The principal processing is as follows [10]: we defined smoothly varying constraints on the interaction strength between all direction-selective responses of the second preprocessing stage. These constraints support orientation continuity through positive interactions between similar directions, and decouple both sides of the contour through negative interactions between opposite directions. The spreading of labels into regions is introduced by synchronising phase oscillators at the contours with oscillators in the interior of objects. This filling-in is similar to brightness diffusion [5, 24], which allows the separation of figure and ground [16], but instead uses the coherence of cyclic phases to label the whole scene. The proposed labeling process can be formulated as the minimisation of an explicit functional that depends on the basic compatibility relations, using results developed in [14]. The phase of each hypercolumnar vector at position (i, j) is updated according to a Gauss-Seidel procedure, using a sigmoid nonlinearity to sum up the individual activations, and a shifted cosine to calculate the contributions of neighbouring elements depending on their phase difference:

Figure 4. a) Scene with edge- and line-defined objects; b) Phase image after 28 iterations; c) Phase gradient of b.

(4)

(5)

Notation

Phase at position (i,j)

Random variable

Activity in m-th feature map

Contribution of n-th feature map

Compatibility constraints

Connectivity matrix

Sigmoid nonlinearity ( )

Periodic function of phase difference

Phase difference

Constants

Set of discrete directions

The compatibility function, depicted in Fig. 5a), is modelled as a shifted Gaussian. A sparse horizontal connectivity scheme was chosen to improve the synchronisation behaviour. Figure 5b) depicts the qualitative convergence properties of the system, showing the average phase change and the normalised average energy over the iteration steps. The periodic function of the phase difference could be chosen to resemble the Kuramoto oscillator; we instead used formulation (5) to speed up convergence. The zero-mean random variable introduces noise into the decision process, thereby resolving ambiguous situations and forcing the process away from the initial equilibrium state, in which all phases are equal, towards a global solution in phase space. As can be seen from the process equation (4), the change in phase at each location is governed by correlated activity in at least one feature map at neighbouring positions. To allow phase labels to spread into the regions formed by the oriented contours, a uniform activity is added to an additional feature map, resembling spontaneous neuronal activity.
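Because equations (4) and (5) are not recoverable here, the following sketch only illustrates the mechanism described above, under explicit assumptions. It uses a compatibility matrix built as a Gaussian of the angular difference between directions, shifted down so that similar directions cooperate and opposite directions compete; an in-place (Gauss-Seidel) sweep over a 4-neighbourhood; a sigmoid (`tanh`) of the summed pairwise support; a uniform "spontaneous" map; and zero-mean Gaussian noise. In place of the paper's shifted-cosine formulation (5) it uses the Kuramoto-style coupling `sin(phi_k - phi_i)`. All names and constants (`sigma`, `shift`, `spont`, `r_spont`, `gain`, `noise`) are hypothetical.

```python
import math
import random

TWO_PI = 2.0 * math.pi


def compat_matrix(n_dir=12, sigma=0.6, shift=0.5):
    """r[m][n]: shifted Gaussian of the circular angle between directions m and n.

    Hypothetical form: r > 0 for similar directions (cooperation),
    r < 0 for opposite directions (competition).
    """
    r = []
    for m in range(n_dir):
        row = []
        for n in range(n_dir):
            d = abs(m - n) * TWO_PI / n_dir
            d = min(d, TWO_PI - d)  # circular distance in [0, pi]
            row.append(math.exp(-d * d / (2.0 * sigma ** 2)) - shift)
        r.append(row)
    return r


def relax_sweep(phase, act, r, spont=0.1, r_spont=0.2, gain=0.3, noise=0.02,
                rng=random):
    """One Gauss-Seidel sweep: phases are overwritten in place, in raster order,
    so later sites already see their neighbours' new phases.

    phase[i][j]: phase in [0, 2*pi); act[i][j][m]: activity of direction m.
    Returns the mean absolute phase step (a simple convergence measure).
    """
    h, w, n_dir = len(phase), len(phase[0]), len(r)
    total = 0.0
    for i in range(h):
        for j in range(w):
            pull = 0.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                k, l = i + di, j + dj
                if not (0 <= k < h and 0 <= l < w):
                    continue
                # Support between two hypercolumns, summed over direction pairs,
                # plus a hypothetical uniform "spontaneous" map term.
                s = sum(act[i][j][m] * act[k][l][n] * r[m][n]
                        for m in range(n_dir) for n in range(n_dir))
                s += spont * spont * r_spont
                # Sigmoid of the support times a periodic function of the phase
                # difference: positive support attracts, negative support repels.
                pull += math.tanh(s) * math.sin(phase[k][l] - phase[i][j])
            step = gain * pull + rng.gauss(0.0, noise)
            phase[i][j] = (phase[i][j] + step) % TWO_PI
            total += abs(step)
    return total / (h * w)
```

With `noise=0.0` and all phases equal, every `sin` term is zero and the sweep leaves the phases unchanged, which is the initial equilibrium that the noise term is meant to break. Sites responding to the same direction are pulled into synchrony, while sites responding to opposite directions (the two sides of a contour) are pushed apart in phase.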

Figure 9b)-c) shows the extracted direction-selective edges of the test image Paolina, using only odd-symmetric Gabor filters to half-wave rectify the oriented responses into ON and OFF channels. The result of the constraint-satisfaction relaxation procedure is shown in Figure 9d), from which the phase gradient has been computed. To compare the performance of the segmentation, the binarised gradient of the phase image and the edges detected by a Canny edge detector are shown. The contours of the binarised phase gradient in Figure 9e) resemble the Canny edges in Figure 9f), although no postprocessing such as edge linking or maximum detection was necessary. Figure 10 shows the same maps for a boat image.

[Figure 5b) plot axes: average phase change / compatibility energy vs. iteration steps (0 to 200)]

Figure 5. a) Competitive/cooperative interaction constraints between direction-selective responses; b) Qualitative convergence behaviour of the relaxation process (continuous: average phase change; dashed: average energy).
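The text does not say how the phase gradient was computed or binarised. One point any such computation has to respect is that the phase is cyclic: phases just above 0 and just below 2π belong to the same group and must not produce an edge. The following minimal sketch assumes forward differences, wraps every difference onto [−π, π] with `atan2`, and uses a hypothetical fixed threshold:

```python
import math


def wrap(d):
    """Map a phase difference onto [-pi, pi], so eps and 2*pi - eps count as close."""
    return math.atan2(math.sin(d), math.cos(d))


def phase_gradient(phase):
    """Magnitude of the forward-difference gradient of a cyclic phase image."""
    h, w = len(phase), len(phase[0])
    grad = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            gx = wrap(phase[i][j + 1] - phase[i][j]) if j + 1 < w else 0.0
            gy = wrap(phase[i + 1][j] - phase[i][j]) if i + 1 < h else 0.0
            grad[i][j] = math.hypot(gx, gy)
    return grad


def binarise(grad, thresh):
    """1 where the phase gradient exceeds a (hypothetical) threshold, else 0."""
    return [[1 if g > thresh else 0 for g in row] for row in grad]
```

Without the wrap, two pixels with phases 0.05 and 2π − 0.05 would show a gradient of almost 2π and be marked as a boundary although they carry the same label.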

4. Selective Attention

Two types of theories have been suggested to explain how attention is allocated to perform visual tasks. According to region-based theories, an attentional spotlight of circular shape and variable diameter is directed to spatial positions in the visual field. Object-based theories, on the other hand, propose that attention is directed to perceptual groups and not just to locations. In either case, the main advantage of an attentional mechanism is the information reduction achieved by spatially selecting salient portions of the visual field, together with the possible simplification of the binding problem by linking together the outputs of cells coding different features of the attended object. Recent research reveals evidence for object-based theories of attention [29], with objects acting as wholes in a slow, competitive process working in parallel across the visual field [7], although spatial selection and top-down control are also part of the attentional system. Figure 6 shows a simplified sketch of the brain maps involved in segmenting objects from a complex scene by applying a cortical grouping mechanism and an attentional focus to the early representation of the scene. Both processes are part of early vision mechanisms [15] which operate bottom-up, whereby attentive control couples the data-driven and cognitive processing streams, both of which contain cycles and feedback loops. The visual information of an image is decomposed into sets of features in multiple feature maps (V1-V5), which interact through excitatory and inhibitory connections between locations (horizontal) and between features (vertical). The preattentively grouped visual information is further processed by an attention mechanism (pulvinar), which chooses the most salient perceptual group and selectively enhances the responsiveness of neurons to this location at the expense of information from other groups or locations. The target selection map (SC) precomputes the expected saccade in a retinotopic coordinate frame, which is transformed into a spatial attentional map in viewer-centred (environmental) coordinates (PP). The spatial modulation map (FEF) integrates information about attentionally relevant locations from PP with recently visited locations (IOR) and with cognitive information such as expected locations and overall scanning behaviour (compare with [30]).

[Figure 6 diagram labels: Attention Engagement (Pulvinar, Thalamus); Spatial Map, engage Attention (Posterior Parietal Cortex); Spatial Modulation (IOR, FEF); Target selection (Superior Colliculus); Object Recognition (IT); Feature Maps (V1 - V5); Preattentive Segmentation; Synchronization; Image Plane; Retina]

Figure 6. Sketch of the maps involved in the process of segmenting and extracting objects from a scene.

5. The Object-Based Attention Process

The phase image of the preattentive stage was used for the sequential extraction of objects by a selective attention mechanism [9]. This stage of processing applies an object-based attention filter to the presegmented early visual information by selectively enhancing and inhibiting regions corresponding to preattentively synchronised perceptual groups in the earlier visual maps. The attentional filter is computed by a global winner-take-all (WTA) mechanism in a separate attentional map, which integrates the information from all feature- and scale-specific earlier visual maps and the temporally decaying memory map (IOR) representing recently attended objects. The dynamics of the system have been adapted from the shunting feedback network proposed by S. Grossberg [13], and have been rewritten for discrete simulation on a computer:

(6)

where the state variable corresponds to the map element at position (i, j), one term equals the squared sum over all activations, and another corresponds to the normalised result of convolving the map with a kernel at (i, j). The remaining two inputs denote the excitatory input and the inhibitory input from the IOR map. Two arbitrarily chosen constants bound the activation between a lower limit and the upper limit B. For reasons of simplicity we have chosen and . In the presented simulations, the constants have been set to , and . Critical for the overall performance of the network are the size and form of the convolution kernel, for which we have chosen a Gaussian with a diameter of five, and a parameter which influences the size of the variable attentional spotlight. In the presented simulations the excitatory input consists of two arrays holding the phase and the activity at each spatial location. In the last processing stage, the selected visual information from the feature maps is integrated in a target selection map (SC), which executes a saccade by applying a nonlinear model of local lateral interactions for saccade averaging [28], based on ensemble coding and linear vector addition of movement contributions [20].
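Equation (6) and its constants are not recoverable here. As a rough illustration of discrete shunting dynamics in the style of Grossberg's on-centre off-surround networks [13], the sketch below applies explicit Euler steps of dx/dt = −A x + (B − x)(E + L) − (x + C)(I + S). Here E is the excitatory input, I the inhibitory IOR input, L a Gaussian-weighted local sum of the squared activations (the spotlight), and S the summed squared activations of all other elements (global competition). The exact terms, the faster-than-linear signal x², the values of A, B, C and dt, the clamp and the helper `run` are assumptions for the example; only the 5×5 Gaussian follows the text's "diameter five".

```python
import math


def gaussian_kernel(size=5, sigma=1.0):
    """Normalised size x size Gaussian (sigma is a hypothetical choice)."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma ** 2))
          for x in range(-half, half + 1)] for y in range(-half, half + 1)]
    s = sum(map(sum, k))
    return [[v / s for v in row] for row in k]


def shunting_step(x, exc, ior, kernel, A=1.0, B=1.0, C=0.0, dt=0.02):
    """One explicit Euler step of a shunting on-centre/off-surround network."""
    h, w = len(x), len(x[0])
    half = len(kernel) // 2
    f = [[max(v, 0.0) ** 2 for v in row] for row in x]  # faster-than-linear signal
    total = sum(map(sum, f))
    new = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Local self-excitation: Gaussian-weighted neighbourhood of the signal.
            local = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    k, l = i + dy, j + dx
                    if 0 <= k < h and 0 <= l < w:
                        local += kernel[dy + half][dx + half] * f[k][l]
            inhib = total - f[i][j]  # global competition from all other elements
            v = x[i][j]
            dv = -A * v + (B - v) * (exc[i][j] + local) - (v + C) * (ior[i][j] + inhib)
            # Clamp to [-C, B] as a numerical safeguard for the Euler step.
            new[i][j] = min(max(v + dt * dv, -C), B)
    return new


def run(exc, ior, steps=200):
    """Hypothetical driver: iterate from zero activity towards a steady state."""
    h, w = len(exc), len(exc[0])
    x = [[0.0] * w for _ in range(h)]
    kern = gaussian_kernel()
    for _ in range(steps):
        x = shunting_step(x, exc, ior, kern)
    return x
```

With two input blobs, the stronger one ends up with the higher activation; adding strong IOR inhibition at that location hands the competition to the weaker blob, which is the behaviour the memory map is meant to produce.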

Figure 7 shows the sequence of attentional foci computed from an objects image, overlaid on its phase image. Figure 8 shows the phase and activity maps of the excitatory input and the sequence of inhibitory maps that prevent the system from revisiting recently attended locations. As can be seen, the selected regions are a compromise between spatial and phasic coherence, allowing perceptual groups and objects to be extracted from the input.
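The following sketch shows, under simplifying assumptions, how a decaying inhibition-of-return memory turns a single winner-take-all choice into a sequence of foci, and how a focus position can be read out as an activity-weighted mean of positions (a linear vector average in the spirit of the ensemble coding cited above). It replaces the shunting dynamics by a direct argmax and uses a square window of hypothetical `radius` as the spotlight. It also gates that window by phase similarity to the winner, so that only the winner's perceptual group contributes, and uses hypothetical `ior_decay` and `ior_gain` values for the memory map.

```python
import math


def attend_sequence(salience, phase, n_fix=3, radius=1, ior_decay=0.8, ior_gain=1.0):
    """Return a list of (row, col) foci chosen one after another.

    salience[i][j]: activity (e.g. summed edge energy); phase[i][j]: group label.
    All parameters are hypothetical illustration values.
    """
    h, w = len(salience), len(salience[0])
    ior = [[0.0] * w for _ in range(h)]
    foci = []
    for _ in range(n_fix):
        # Winner-take-all on salience attenuated by inhibition of return.
        _, bi, bj = max((salience[i][j] * (1.0 - ior[i][j]), i, j)
                        for i in range(h) for j in range(w))
        rows = range(max(0, bi - radius), min(h, bi + radius + 1))
        cols = range(max(0, bj - radius), min(w, bj + radius + 1))
        # Activity-weighted mean position inside the spotlight; cells whose phase
        # differs from the winner's by more than pi/2 get zero weight.
        sw = sy = sx = 0.0
        for i in rows:
            for j in cols:
                a = salience[i][j] * (1.0 - ior[i][j])
                a *= max(0.0, math.cos(phase[i][j] - phase[bi][bj]))
                sw += a
                sy += a * i
                sx += a * j
        foci.append((sy / sw, sx / sw) if sw > 0 else (float(bi), float(bj)))
        # Old memory decays; the newly attended window is inhibited.
        ior = [[v * ior_decay for v in row] for row in ior]
        for i in rows:
            for j in cols:
                ior[i][j] = min(1.0, ior[i][j] + ior_gain)
    return foci
```

With three isolated peaks of decreasing strength, the foci visit them in order of salience; a previously attended peak only becomes competitive again once its inhibition has decayed below the salience of the remaining candidates.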

6. Conclusion

A four-stage processing model for object segmentation and selection has been proposed, which combines neurophysiological and psychological data to support its biological plausibility. We have described a relaxation phase labeling procedure for the preattentive grouping and perceptual segmentation of objects in phase space, and an attention mechanism which sequentially extracts perceptual groups in a cluttered scene, consistent with an object-based theory of visual attention. The original contribution of the presented biological framework for the perceptual segmentation and selection of objects in a real-world scene is the transformation of the grouping process into phase space using a simple relaxation labeling procedure. By introducing directional responses and local constraints on them, which serve the grouping of similar directions and the decoupling of both sides of a contour line, the proposed mechanism is able to detect zero crossings in phase space without an explicit and biologically implausible search. The gradient in phase space is sharper than the edge response or the intensity discontinuity, and the whole scene is labelled into objects and background. Furthermore, the relaxation phase labeling (RPL) process is able to extract the most salient contour lines of perceptual groups in phase space, suppressing false responses generated by the preprocessing stage. The RPL process can therefore be used to link edges into object boundaries by closing small gaps in the contour lines of the intensity image, or to group perceptual primitives such as dots, points or dashes into perceptual wholes using the grouping principles originally proposed by Gestalt psychology. For a more complete segmentation scheme involving both different spatial frequencies and multiple feature domains, the system could be extended by a scale-space approach [3, 24] and by the integration of parallel texture-, motion- and colour-specific processing channels [25, 1]. An extension at the feature level will be the integration of distinctive maps for two-dimensional features such as direction of motion, texture, curvature, end-stopping and junctions.

Figure 7. Sequence of attentional foci (white) using both edge energy and phase, overlaid on the phase image of Figure 8a).

Figure 8. a) Phase image of objects scene; b) Summed activity of edge maps; c) Sequence of inhibitory memory.


Figure 9. a) Paolina image (size 200x200); b) Summed responses of six ON channels; c) Summed responses of six OFF channels; d) Phase image after 21 iteration steps; e) Binarised phase gradient of d; f) Canny detector with , and threshold (0.3, 0.9).

Figure 10. a) Boat scene (size 200x200); b) Summed responses of six ON channels; c) Summed responses of six OFF channels; d) Phase image after 51 iteration steps; e)-f) same as in Fig. 9.

References

[1] J. Aloimonos and D. Shulman. Integration of Visual Modules: An Extension to the Marr Paradigm. Academic Press, 1989.

[2] A. Blake and A. Zisserman. Invariant surface reconstruction using weak continuity constraints. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition, pages 62–67. IEEE, 1986.

[3] P. J. Burt and E. H. Adelson. The Laplacian pyramid as a compact image code. IEEE Trans. on Communications, 31(4):532–540, 1983.

[4] J. F. Canny. A computational approach to edge detection. IEEE Trans. on Pattern Analysis and Machine Intelligence, 8(6):679–698, 1986.

[5] M. A. Cohen and S. Grossberg. Neural dynamics of brightness perception: Features, boundaries, diffusion, and resonance. Perception and Psychophysics, 36:428–456, 1984.

[6] J. G. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A, 2:1160–1168, July 1985.

[7] R. Desimone and J. Duncan. Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 18:193–222, 1995.

[8] R. Eckhorn, R. Bauer, W. Jordan, M. Brosch, W. Kruse, M. Munk, and H. J. Reitboeck. Coherent oscillations: A mechanism of feature linking in the visual cortex? Biol. Cybern., 60:121–130, 1988.

[9] W. A. Fellenz. A sequential model for attentive object selection. In Proc. 39th IWK, Sept. 27–30, vol. II, pages 109–116, TU Ilmenau, 1994.

[10] W. A. Fellenz and G. Hartmann. Image segmentation by phase label diffusion. In Proc. of the Int. Conference on Artificial Neural Networks, ICANN-95, Paris, vol. II, pages 309–314, 1995.

[11] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721–741, 1984.

[12] C. M. Gray, P. König, A. K. Engel, and W. Singer. Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature, 338:334–336, 1989.

[13] S. Grossberg. Nonlinear neural networks: Principles, mechanisms, and architectures. Neural Networks, 1:17–61, 1988.

[14] R. A. Hummel and S. W. Zucker. On the foundations of relaxation labeling processes. IEEE Trans. on Pattern Analysis and Machine Intelligence, 5:267–287, 1983.

[15] B. Julesz. Foundations of Cyclopean Perception. University of Chicago Press, 1971.

[16] P. K. Kienker, T. J. Sejnowski, G. E. Hinton, and L. E. Schumacher. Separating figure from ground with a parallel network. Perception, 15:197–216, 1986.

[17] C. Koch, J. Marroquin, and A. Yuille. Analog neuronal networks in early vision. Proceedings of the National Academy of Science, 83:4263–4267, 1986.

[18] K. Koffka. Principles of Gestalt Psychology. Harcourt, Brace & World, New York, 1935.

[19] D. Marr and E. Hildreth. Theory of edge detection. Proceedings of the Royal Society of London B, 207:187–216, 1980.

[20] James T. McIlwain. Distributed spatial coding in the superior colliculus: A review. Visual Neuroscience, 6:3–13, 1991.

[21] J.-M. Morel and S. Solimini. Variational Methods in Image Segmentation. Birkhäuser, Boston, 1995.

[22] M. C. Morrone and D. C. Burr. Feature detection in human vision: a phase-dependent energy model. Proceedings of the Royal Society of London, B 235:221–245, 1988.

[23] P. Perona and J. Malik. Detecting and localizing edges composed of steps, peaks and roofs. In Proc. of the 3rd Int. Conf. on Computer Vision, pages 52–57. IEEE Comp. Soc., Osaka, 1990.

[24] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7):629–639, 1990.

[25] T. Poggio, E. B. Gamble, and J. J. Little. Parallel integration of visual modules. Science, 242:436–440, 1988.

[26] A. Rosenfeld, R. A. Hummel, and S. W. Zucker. Scene labeling by relaxation operations. IEEE Transactions on Systems, Man and Cybernetics, 6:420–433, 1976.

[27] D. Terzopoulos. Regularization of inverse visual problems involving discontinuities. IEEE Trans. on Pattern Analysis and Machine Intelligence, 8(4):413–424, 1986.

[28] A. J. Van Opstal and J. A. M. Van Gisbergen. A nonlinear model for collicular spatial interactions underlying the metrical properties of electrically elicited saccades. Biol. Cybern., 60:171–183, 1989.

[29] S. Yantis. Multielement visual tracking: Attention and perceptual organization. Cognitive Psychology, 24:295–340, 1992.

[30] A. L. Yarbus. Eye movements and vision. Plenum, New York, 1967.
