7/29/2019 Preattentive grouping and Attentive selection for early visual computation
Figure 1. a) Open disk scene; b) Summed ON and OFF channel responses; c) Phase image after 100 iterations.
2. The Edge Detection Stage
Since the early days in Computer Vision many operators
for edge detection have been proposed ranging from gradi-
ent and template matching operators to parametric edge
models. Because most of these operators are designed to
detect special kinds of edges, it soon became obvious that
a general purpose edge detector would be a compromise
between certain performance criteria. In [4] a linear oper-
ator was derived for the detection of step edges, which min-
imises the joint criteria of good localisation and reliability.
Being an odd-symmetric operator it suffers from false local-
isation of line and roof edges. The detection of zero cross-
ings (ZC) in the second derivative of the intensity function
using an even symmetric gradient filter [19] faces similar
problems at line edges. A nonlinear combination using the
summed squares of both even and odd symmetric filters has
proven to be a good detector of edges composed of steps,
peaks, and roofs [22, 24]. Using both the local energy and
phase, it is possible to reconstruct the generating edge, which shows the applicability of this approach to image coding. However, to localise the edge exactly a search for the maximum response is necessary, in contrast to the detection of zero crossings, which are by definition dimensionless. We will show that the presented relaxation phase labeling (RPL) procedure is able to detect ZCs in phase space, thereby sharpening the edge response, while at the same time performing the perceptual grouping of arranged dots into contours and closed objects. In the presented system we use six pairs of oriented Gabor filters [6] in quadrature phase to extract the local energy, followed by a differentiation step using odd symmetric Gabor filters to rectify the oriented responses:
(2)
(3)
Here the filter pair consists of an even symmetric function and its odd symmetric Hilbert transform. Two constants specify the envelope of the oriented Gaussian, a further constant sets the appropriate frequency of the modulating sinusoid, and a final constant is a normalisation factor. Figure 2 shows the edge detection stage and the model hyper-columns, which are the initial data for the relaxation labeling process presented in the next section.
Figure 2. a) Hierarchical extraction of direction specific contour lines; b) Hyper-columnar structure with parametric phase for the relaxation phase labeling process.
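The quadrature-pair idea of this stage can be illustrated with a minimal one-dimensional sketch. It is not the filter bank used in the paper (which has six oriented two-dimensional pairs): the function names and the parameter values `sigma`, `omega` and `half_width` are assumptions chosen only for illustration. The sketch computes the local energy as the summed squares of the even and odd responses and half-wave rectifies the odd response into ON and OFF channels.

```python
import math

def gabor_pair(sigma, omega, half_width):
    """Sampled even (cosine) and odd (sine) 1-D Gabor kernels (illustrative)."""
    xs = range(-half_width, half_width + 1)
    env = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in xs]
    even = [e * math.cos(omega * x) for e, x in zip(env, xs)]
    odd = [e * math.sin(omega * x) for e, x in zip(env, xs)]
    # Remove the mean of the even kernel so that flat regions give no response.
    mean_even = sum(even) / len(even)
    even = [v - mean_even for v in even]
    return even, odd

def correlate(signal, kernel):
    """Same-size correlation; border samples are replicated."""
    h = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - h, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def local_energy(signal, sigma=2.0, omega=0.8, half_width=8):
    """Return (energy, ON channel, OFF channel) for a 1-D signal."""
    even, odd = gabor_pair(sigma, omega, half_width)
    resp_even = correlate(signal, even)
    resp_odd = correlate(signal, odd)
    energy = [a * a + b * b for a, b in zip(resp_even, resp_odd)]
    on = [max(v, 0.0) for v in resp_odd]    # rising intensity
    off = [max(-v, 0.0) for v in resp_odd]  # falling intensity
    return energy, on, off

# A step edge between samples 19 and 20.
step = [0.0] * 20 + [1.0] * 20
energy, on, off = local_energy(step)
peak = max(range(len(energy)), key=energy.__getitem__)
```

In this toy setting the energy peaks at the step and the rising edge appears only in the ON channel; reversing the signal moves the response to the OFF channel.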
3. The Relaxation Phase Labeling Process
Twenty years ago a mechanism for scene labeling was
proposed [26], which reduces the ambiguity among objects
in a scene in terms of an iterated relaxation procedure, per-
formed in parallel on the data array. Since then numerous
approaches to parallel relaxation operations have been de-
scribed. We have adopted the general strategy of relaxing
labels corresponding to observed properties in the scene, us-
ing parametric phase labels, which group into coherent ob-
jects in phase space, giving the relaxation procedure a new
degree of freedom to accomplish a consistent labeling. As revealed by the Gestalt school in the first half of the century, visual perception is governed by certain simple rules which group parts into wholes, employing laws like grouping by proximity, similarity, closure, symmetry and good continuation [18]. Although these principles are easy to investigate in psychophysical experiments, their underlying neuronal computations are mainly unknown. It has been speculated that synchronisations of visual cortical neurons, revealed by recent electrophysiological studies [8, 12], may serve as the carrier for the observed perceptual grouping phenomenon. The differences in oscillator phase between spatially neighbouring spiking cells could in principle be used to label different objects in the scene for their intrinsic segmentation. The proposed grouping criteria of spatial contiguity and coherence of particular feature domains indeed show similarities to the proposed Gestalt laws. However, the law of good continuation, which plays a central role in many edge grouping and linking schemes in computer vision, is able to override both proximity and similarity. This underlines the role of oriented edges both in the implementation of perceptual
grouping and synchronisation mechanisms. The emergent
formation of a perceptual group, including both edge and region based information, is depicted in figure 1: the dots forming an incomplete circle are grouped into a synchronised round disk with a discontinuity at the upper right, indicating the missing dot in phase space.
Figure 3. Scheme for relaxation and diffusion of phase labels. The intensity distribution (a) is filtered to extract intensity gradients (b) corresponding to perceived edges in the image. The smoothed derivative of the edge map is rectified into ON and OFF channels (c), allowing simple compatibility constraints between channels to modify an initially uniform phase map (d); (e) intermediate and (f) final phase distribution of the phase image evolving in parallel over time.
In figure 4 the results
of the proposed segmentation scheme for a scene with three simple objects are shown. Although the objects are defined by different boundary types, ranging from intensity discontinuities over lines to dots, the phase gradient shows a common interpretation of all contour types. In figure 3 the general idea of the proposed phase relaxation and diffusion mechanism is depicted. The principal processing is as follows [10]: we defined smoothly varying constraints on the interaction strength between all direction selective responses of the second preprocessing stage. These constraints support orientation continuity by positive interactions between similar directions, and decouple both sides of the contour by negative interactions between opposite directions. The spreading of labels into regions is introduced by synchronising phase oscillators at the contours with oscillators in the interior of objects. This filling in is similar to brightness diffusion [5, 23], allowing the separation of figure and ground [16], but instead uses the coherency of cyclic phases to label the whole scene. The proposed labeling process can be formulated in terms of minimising an explicit functional depending on the basic compatibility relations, using results developed in [14]. The phases of each hypercolumnar vector at position are updated according to a Gauss-Seidel procedure, using a sigmoid nonlinearity for summing up the individual activations, and a shifted cosine for calculating the contributions of neighbouring elements depending on their phase difference:
Figure 4. a) Scene with edge and line defined objects; b) Phase image after 28 iterations; c) Phase gradient of b.
(4)
(5)
Notation
Phase at position (i,j)
Random variable
Activity in m-th feature map
Contribution of n-th feature map
Compatibility constraints
Connectivity matrix
Sigmoid nonlinearity ( )
Periodic function of phase difference
Phase difference
Constants
Set of discrete directions
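The qualitative behaviour of such a phase relaxation can be illustrated with a small sketch. It is not the process of equations 4 and 5: it assumes a one-dimensional chain of sites, a plain Kuramoto-style sine coupling instead of the shifted cosine, no sigmoid, and hypothetical names and parameters (`coupling`, `rate`, `noise`). Positive couplings stand for cooperative interactions, a single negative coupling stands for the decoupling across a contour, and the updates are applied in place in Gauss-Seidel fashion.

```python
import math
import random

def wrap(angle):
    """Map an angle to its principal value in [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def relax_phases(coupling, sweeps=600, rate=0.3, noise=0.01, seed=0):
    """Gauss-Seidel relaxation of phases on a 1-D chain (illustrative).

    coupling[i] is an assumed compatibility between site i and site i + 1:
    positive values pull the two phases together, negative values push
    them apart.
    """
    rng = random.Random(seed)
    n = len(coupling) + 1
    phase = [0.0] * n  # initial equilibrium: all phases equal
    for _ in range(sweeps):
        for i in range(n):  # in-place update: later sites see new values
            drive = 0.0
            if i > 0:
                drive += coupling[i - 1] * math.sin(phase[i - 1] - phase[i])
            if i < n - 1:
                drive += coupling[i] * math.sin(phase[i + 1] - phase[i])
            # Small zero-mean noise moves the system off the initial equilibrium.
            phase[i] = wrap(phase[i] + rate * drive + rng.uniform(-noise, noise))
    return phase

# Twelve sites; a contour between sites 5 and 6 decouples the two sides.
coupling = [1.0] * 5 + [-1.0] + [1.0] * 5
phase = relax_phases(coupling)
jumps = [abs(wrap(phase[i + 1] - phase[i])) for i in range(len(phase) - 1)]
```

In this toy chain the phases on each side of the contour synchronise, while the phase difference across the contour settles near pi, so the contour shows up as a sharp jump in `jumps`.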
The compatibility function, depicted in Fig. 5a), is modelled as a shifted Gaussian. A sparse horizontal connectivity scheme was chosen to improve the synchronisation behaviour. In Figure 5b) the qualitative convergence properties of the system are depicted, showing average phase change and normalised average energy over iteration steps. The periodic function can be set to the sine of the phase difference to resemble the Kuramoto oscillator; we instead used formulation 5 to speed up convergence. The zero mean random variable introduces noise into the decision process, thereby resolving ambiguous situations and forcing the process to move from the initial equilibrium state, with all phases being equal, to a global solution in phase space. As can be seen from the process equation 4, the change in phase at each location is governed by a correlated activity in at least one feature map at neighbouring positions. To allow the spreading of phase labels into regions formed by the oriented contours, a uniform activity is added to an additional feature map, to resemble spontaneous neuronal activity.
Figure 5. a) Competitive/cooperative interaction constraints between direction selective responses; b) Qualitative convergence behaviour of relaxation process (x-axis: steps; y-axis: average phase change / compatibility energy), continuous: average phase change - dashed: average energy.
Figure 9b)-d) shows the extracted direction selective edges of the test image Paolina, using only odd-symmetric Gabor
filters to half-wave rectify the oriented responses into ON and OFF channels. The result of the constraint satisfaction relaxation procedure is shown in 9e), from which the phase gradient 9f) has been computed. To compare the performance of the segmentation, the binarised gradient of the phase image and the edges detected by a Canny edge detector are shown. It can be seen that the contours of the binarised phase gradient in Figure 9g) resemble the Canny edges, although no postprocessing like edge linking and maximum detection was necessary. In figure 10 the same maps are shown for a boat image.
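How a binarised phase gradient can be obtained from a phase image is sketched below. The paper does not state its gradient operator or threshold, so the forward differences, the circular distance and the `threshold` value are assumptions for illustration only. The one point worth making explicit is that phases are cyclic, so differences must be taken on the circle rather than on the real line.

```python
import math

def circular_diff(a, b):
    """Smallest absolute difference between two phases, in [0, pi]."""
    d = (a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def phase_gradient(phase):
    """Forward-difference gradient magnitude of a 2-D phase image (list of rows)."""
    rows, cols = len(phase), len(phase[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # At the last column or row the neighbour is the pixel itself.
            dx = circular_diff(phase[r][min(c + 1, cols - 1)], phase[r][c])
            dy = circular_diff(phase[min(r + 1, rows - 1)][c], phase[r][c])
            grad[r][c] = math.hypot(dx, dy)
    return grad

def binarise(grad, threshold):
    """Mark pixels whose gradient magnitude exceeds the threshold."""
    return [[1 if g > threshold else 0 for g in row] for row in grad]

# Toy phase image: left half at phase 0.1, right half shifted by pi.
phase = [[0.1] * 4 + [0.1 + math.pi] * 4 for _ in range(4)]
edges = binarise(phase_gradient(phase), threshold=1.0)
```

For the toy image the only marked pixels are those in column 3, directly at the phase discontinuity, which mirrors the observation that the contour is obtained without edge linking or a maximum search.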
4. Selective Attention
Two types of theories have been suggested to explain how attention is allocated to perform visual tasks. According to region based theories, an attentional spotlight of circular shape and varying diameter is directed to spatial positions in the visual field. Object based theories, on the other hand, propose that attention is directed to perceptual groups and not just locations. However, the main advantage of an attentional mechanism is the information reduction achieved by spatially selecting salient portions of the visual field, and the possible simplification of the binding problem by linking together the output of cells coding different features of the attended object. Recent research reveals evidence for object-based theories of attention [29], with objects acting as wholes in a slow, competitive process working in parallel across the visual field [7], although spatial selection and top-down control are part of the attentional system. Figure 6 shows a simplified sketch of the brain maps involved in the segmentation of objects from a complex scene by applying a cortical grouping mechanism and an attentional focus to the early representation of the scene. Both processes are part of early vision mechanisms [15], which operate bottom-up, whereby the attentive control serves the coupling of data driven and cognitive processing streams
both possessing cyclic and feedback loops.
Figure 6. Sketch of the maps involved in the process of segmenting and extracting objects from a scene. Diagram labels: Attention Engagement; Pulvinar, Thalamus; Spatial Map, engage Attention; Posterior Parietal Cortex; Spatial Modulation; IOR, FEF; Target selection; Superior Colliculus; Object Recognition, IT; Feature Maps, V1 - V5; Preattentive Segmentation; Synchronization; Image Plane; Retina.
The visual information of an image is decomposed into sets of features of multiple feature maps (V1-V5) which interact by excitatory and inhibitory connections between locations (horizontal) and features (vertical). The pre-attentively grouped visual information is further processed by an attention mechanism (pulvinar) which chooses the most salient perceptual group and selectively enhances the responsiveness of neurons to this location at the expense of information from other groups or locations. The target selection map (SC) precomputes the expected saccade in a retinotopic coordinate frame, which is transformed into a spatial attentional map in viewer centred (environmental) coordinates (PP). The spatial modulation map (FEF) integrates information about attentionally relevant locations from PP with recently visited locations (IOR) and cognitive information like expected locations and overall scanning behaviour (compare with [30]).
5. The object based attention process
The phase image of the preattentive stage was used
for the sequential extraction of objects by a selective at-
tention mechanism [9]. This stage of processing applies
an object-based attention filter to the presegmented early
visual information by selectively enhancing and inhibiting
regions corresponding to preattentively synchronised per-
ceptual groups in the earlier visual maps. The attentional
filter is computed by a global winner-take-all (WTA) mech-
anism in a separate attentional map integrating the informa-
tion from all feature and scale specific earlier visual maps
and the temporally decaying memory map (IOR) representing recently attended objects. The dynamics of the system have been adapted from the shunting feedback network proposed by S. Grossberg [13] and rewritten for discrete simulation on a computer:
(6)
where corresponds to the map element at position , equals the squared sum over all activations, and corresponds to the normalised result from convolving with kernel at . denotes the excitatory and the inhibitory input for IOR. and are arbitrarily chosen constants for bounding the activation of between and B. For reasons of simplicity we have chosen and . In the presented simulations, the constants have been set to , and . Critical for the overall performance of the network are the size and form of the convolution kernel , for which we have chosen a Gaussian with diameter five, and the parameter which influences the size of the variable attentional spotlight. In the presented simulations the excitatory input consists of two arrays for the phase and activity at each spatial location. In the last processing stage the selected visual information from the feature maps is integrated in a target selection map (SC) which executes a saccade by applying a nonlinear model of local lateral interactions for saccade averaging [28], based on ensemble coding and linear vector addition of movement contributions [20].
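The interplay of a shunting competition with an inhibition-of-return memory can be sketched as follows. This is not equation 6: it assumes a generic Grossberg-style shunting network with a squaring feedback signal and no spatial kernel, a one-dimensional list of locations, and hypothetical parameter names and values (`dt`, `decay`, `bound`, `ior_strength`, `ior_decay`). The winner is read out as the location with the largest final activation.

```python
def shunting_wta(excite, inhibit, steps=500, dt=0.02, decay=1.0, bound=1.0):
    """Euler-integrate an illustrative shunting feedback network.

    Activations start at zero and stay within [0, bound] for the
    parameter ranges used here.
    """
    x = [0.0] * len(excite)
    for _ in range(steps):
        total = sum(v * v for v in x)  # summed squared activation
        x = [
            v + dt * (
                -decay * v                    # passive decay
                + (bound - v) * (e + v * v)   # excitation: input plus self-feedback
                - v * (j + total - v * v)     # inhibition: IOR plus competitors
            )
            for v, e, j in zip(x, excite, inhibit)
        ]
    return x

def attention_sequence(saliency, fixations, ior_strength=20.0, ior_decay=0.8):
    """Select locations one after another, suppressing recently attended ones."""
    ior = [0.0] * len(saliency)
    visited = []
    for _ in range(fixations):
        x = shunting_wta(saliency, ior)
        winner = max(range(len(x)), key=x.__getitem__)
        visited.append(winner)
        ior = [ior_decay * v for v in ior]  # older memory fades
        ior[winner] += ior_strength         # the newest focus is inhibited
    return visited

saliency = [0.2, 0.9, 0.5, 0.7]
order = attention_sequence(saliency, fixations=4)
# order == [1, 3, 2, 0]: locations are visited in decreasing saliency
```

Because the inhibitory memory decays only slowly in this sketch, every location is visited once before any is revisited; a faster decay would allow earlier returns to highly salient locations.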
In Figure 7 the sequence of attentional foci computed from an objects image, overlaid on its phase image, is shown. Figure 8 shows phase and activity maps of the excitatory input and the sequence of inhibitory maps that prevent the system from revisiting recently attended locations. As can be seen, the selected regions are a compromise between spatial and phasic coherence, allowing perceptual groups and objects to be extracted from the input.
6. Conclusion
A four stage processing model for object segmentation
and selection has been proposed which combines neuro-
physiological and psychological data to account for its bio-
logical plausibility. We have described a relaxation phase
labeling procedure for the preattentive grouping and percep-
tual segmentation of objects in phase space and an attention
mechanism which sequentially extracts perceptual groups in
a cluttered scene consistent with an object based theory of visual attention.
Figure 7. Sequence of attentional foci (white) using both edge energy and phase, overlaid on the phase image of 8a).
Figure 8. a) Phase image of objects scene; b) Summed activity of edge maps; c) Sequence of inhibitory memory.
The original contribution of the presented
biological framework for perceptual segmentation and se-
lection of objects in a real world scene is the transformation
of the grouping process into phase space, using a simple relaxation labeling procedure. By introducing directional responses and local constraints thereupon, serving the grouping of similar directions and the decoupling of both sides of a contour line, the proposed mechanism is able to detect zero-crossings in phase space without an explicit and biologically implausible search. The gradient in phase space is sharpened compared to the edge response or the intensity discontinuity, and the whole scene is labelled into objects and background. Furthermore, the relaxation phase labeling (RPL) process is able to extract the most salient contour lines of perceptual groups in phase space, suppressing false responses generated from the preprocessing stage. Therefore the RPL process can be used to link edges into object boundaries by closing small gaps in the contour lines of the intensity image, or to group perceptual primitives like dots, points or dashes into perceptual wholes using grouping principles originally proposed by Gestalt psychology. For
a more complete segmentation scheme involving both dif-
ferent spatial frequencies and multiple feature domains, the
system could be expanded by a scale space approach [3, 23]
and the integration of parallel texture-, motion-, and colour
specific processing channels [25, 1]. An extension on the
feature level will be the integration of distinctive maps for
two dimensional features like direction of motion, texture,
curvature, endstoppings and junctions.
Figure 9. a) Paolina image (Size 200x200); b) Summed responses of six ON channels; c) Summed responses of six OFF channels; d) Phase image after 21 iteration steps; e) Binarised phase gradient of d; f) Canny detector with , and threshold (0.3,0.9).
Figure 10. a) Boat scene (Size 200x200); b) Summed responses of six ON channels; c) Summed responses of six OFF channels; d) Phase image after 51 iteration steps; e)-f) same as in Fig. 9.
References
[1] J. Aloimonos and D. Shulman. Integration of Visual Modules: An Extension to the Marr Paradigm. Academic Press, 1989.
[2] A. Blake and A. Zisserman. Invariant surface reconstruction using weak continuity constraints. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition, pages 62-67. IEEE, 1986.
[3] P. J. Burt and E. H. Adelson. The Laplacian pyramid as a compact image code. IEEE Trans. on Communications, 31(4):532-540, 1983.
[4] J. F. Canny. A computational approach to edge detection. IEEE Trans. on Pattern Analysis and Machine Intelligence, 8(6):679-698, 1986.
[5] M. A. Cohen and S. Grossberg. Neural dynamics of brightness perception: Features, boundaries, diffusion, and resonance. Perception and Psychophysics, 36:428-456, 1984.
[6] J. G. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A, 2:1160-1168, July 1985.
[7] R. Desimone and J. Duncan. Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 18:193-222, 1995.
[8] R. Eckhorn, R. Bauer, W. Jordan, M. Brosch, M. Kruse, W. Munk, and H. J. Reitboeck. Coherent oscillations: A mechanism of feature linking in the visual cortex? Biol. Cybern., 60:121-130, 1988.
[9] W. A. Fellenz. A sequential model for attentive object selection. In Proc. 39th IWK, Sept. 27-30, vol. II, pages 109-116, TU Ilmenau, 1994.
[10] W. A. Fellenz and G. Hartmann. Image segmentation by phase label diffusion. In Proc. of the Int. Conference on Artificial Neural Networks, ICANN-95, Paris, vol. II, pages 309-314, 1995.
[11] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.
[12] C. M. Gray, P. König, A. K. Engel, and W. Singer. Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature, 338:334-336, 1989.
[13] S. Grossberg. Nonlinear neural networks: Principles, mechanisms, and architectures. Neural Networks, 1:17-61, 1988.
[14] R. A. Hummel and S. W. Zucker. On the foundations of relaxation labeling processes. IEEE Trans. on Pattern Analysis and Machine Intelligence, 5:267-287, 1983.
[15] B. Julesz. Foundations of Cyclopean Perception. University of Chicago Press, 1971.
[16] P. K. Kienker, T. J. Sejnowski, G. E. Hinton, and L. E. Schumacher. Separating figure from ground with a parallel network. Perception, 15:197-216, 1986.
[17] C. Koch, J. Marroquin, and A. Yuille. Analog neuronal networks in early vision. Proceedings of the National Academy of Science, 83:4263-4267, 1986.
[18] K. Koffka. Principles of Gestalt Psychology. Harcourt, Brace & World, New York, 1935.
[19] D. Marr and E. Hildreth. Theory of edge detection. Proceedings of the Royal Society of London B, 207:187-216, 1980.
[20] James T. McIlwain. Distributed spatial coding in the superior colliculus: A review. Visual Neuroscience, 6:3-13, 1991.
[21] J.-M. Morel and S. Solimini. Variational Methods in Image Segmentation. Birkhäuser, Boston, 1995.
[22] M. C. Morrone and D. C. Burr. Feature detection in human vision: a phase-dependent energy model. Proceedings of the Royal Society of London, B 235:221-245, 1988.
[23] P. Perona and J. Malik. Detecting and localizing edges composed of steps, peaks and roofs. In Proc. of the 3rd Int. Conf. on Computer Vision, pages 52-57. IEEE Comp. Soc., Osaka, 1990.
[24] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7):629-639, 1990.
[25] T. Poggio, E. B. Gamble, and J. J. Little. Parallel integration of visual modules. Science, 242:436242, 1988.
[26] A. Rosenfeld, R. A. Hummel, and S. W. Zucker. Scene labeling by relaxation operations. IEEE Transactions on Systems, Man and Cybernetics, 6:420-433, 1976.
[27] D. Terzopoulos. Regularization of inverse visual problems involving discontinuities. IEEE Trans. on Pattern Analysis and Machine Intelligence, 8(4):413-424, 1986.
[28] A. J. Van Opstal and J. A. M. Van Gisbergen. A nonlinear model for collicular spatial interactions underlying the metrical properties of electrically elicited saccades. Biol. Cybern., 60:171-183, 1989.
[29] S. Yantis. Multielement visual tracking: Attention and perceptual organization. Cognitive Psychology, 24:295-340, 1992.
[30] A. L. Yarbus. Eye movements and vision. Plenum, New York, 1967.