Preattentive grouping and Attentive selection for early visual computation


  • 7/29/2019 Preattentive grouping and Attentive selection for early visual computation


    Figure 1. a) Open disk scene; b) Summed ON and OFF channel responses; c) Phase image after 100 iterations.

    2. The Edge Detection Stage

    Since the early days of computer vision, many operators for edge
    detection have been proposed, ranging from gradient and template
    matching operators to parametric edge models. Because most of these
    operators are designed to detect special kinds of edges, it soon became
    obvious that a general purpose edge detector would be a compromise
    between certain performance criteria. In [4] a linear operator was
    derived for the detection of step edges which optimises the joint
    criteria of good localisation and reliability. Being an odd-symmetric
    operator, it suffers from false localisation of line and roof edges. The
    detection of zero crossings (ZC) in the second derivative of the
    intensity function using an even-symmetric gradient filter [19] faces
    similar problems at line edges. A nonlinear combination using the summed
    squares of both even- and odd-symmetric filters has proven to be a good
    detector of edges composed of steps, peaks, and roofs [22, 24]. Using
    both the local energy and phase, it is possible to reconstruct the
    generating edge, which shows its applicability for image coding.
    However, to localise the edge exactly a search for the maximum response
    is necessary, in contrast to the detection of zero crossings, which are
    by definition dimensionless. We will show that the presented relaxation
    phase labeling (RPL) procedure is able to detect ZCs in phase space,
    thereby sharpening the edge response while at the same time performing
    the perceptual grouping of arranged dots into contours and closed
    objects. In the presented system we use six pairs of oriented Gabor
    filters [6] in quadrature phase to extract the local energy, followed by
    a differentiation step using odd-symmetric Gabor filters to rectify the
    oriented responses:

    (2)

    (3)

    Where represents the even symmetric function and its odd symmetric
    Hilbert-transform. The constants and specify the envelope of the
    oriented Gaussian, sets the appropriate frequency of the modulating
    sinusoid, and is a normalisation factor. Figure 2 shows the edge
    detection stage and the model hyper-columns, which are the initial data
    for the relaxation labeling process described in the next section.

    Figure 2. a) Hierarchical extraction of direction specific contour lines; b) Hyper-columnar structure with parametric phase for the relaxation phase labeling process.
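
    The local energy idea used in this stage can be illustrated in one
    dimension. The following is a minimal sketch, not the authors'
    implementation: the kernel size, `sigma`, and `freq` are hypothetical
    values, the filters are 1-D rather than oriented 2-D Gabor filters, and
    the function names are made up for illustration. It shows that the
    summed squares of an even and an odd filter response peak at both a
    step edge and a line edge.

```python
import math

def gabor_pair(sigma, freq, half_width):
    """Return (even, odd) 1-D Gabor kernels sharing one Gaussian envelope."""
    xs = range(-half_width, half_width + 1)
    env = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in xs]
    even = [e * math.cos(2.0 * math.pi * freq * x) for e, x in zip(env, xs)]
    odd = [e * math.sin(2.0 * math.pi * freq * x) for e, x in zip(env, xs)]
    # Remove the mean of the even kernel so constant regions give no response.
    mean_even = sum(even) / len(even)
    even = [v - mean_even for v in even]
    return even, odd

def correlate(signal, kernel):
    """Same-size correlation; samples beyond the border repeat the edge value."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def local_energy(signal, sigma=3.0, freq=0.1, half_width=9):
    """Sum of squared even and odd responses (assumed parameter values)."""
    even, odd = gabor_pair(sigma, freq, half_width)
    resp_even = correlate(signal, even)
    resp_odd = correlate(signal, odd)
    return [a * a + b * b for a, b in zip(resp_even, resp_odd)]

# A step between samples 31 and 32, and a one-sample line at sample 32.
step = [0.0] * 32 + [1.0] * 32
line = [0.0] * 32 + [1.0] + [0.0] * 31

step_energy = local_energy(step)
line_energy = local_energy(line)
step_peak = max(range(len(step_energy)), key=step_energy.__getitem__)
line_peak = max(range(len(line_energy)), key=line_energy.__getitem__)
print(step_peak, line_peak)
```

    In this toy setting the energy maximum sits at the step (one of the two
    samples adjacent to it) and exactly on the line, whereas the odd filter
    alone responds with zero at the line centre.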

    3. The Relaxation Phase Labeling Process

    Twenty years ago a mechanism for scene labeling was proposed [26] which
    reduces the ambiguity among objects in a scene by means of an iterated
    relaxation procedure performed in parallel on the data array. Since then
    numerous approaches to parallel relaxation operations have been
    described. We have adopted the general strategy of relaxing labels
    corresponding to observed properties in the scene, using parametric
    phase labels which group into coherent objects in phase space, giving
    the relaxation procedure a new degree of freedom to accomplish a
    consistent labeling. As revealed by the Gestalt school in the first half
    of the century, visual perception is governed by certain simple rules
    which group parts into wholes, employing laws like grouping by
    proximity, similarity, closure, symmetry and good continuation [18].
    Although these principles are easy to investigate in psychophysical
    experiments, their underlying neuronal computations are mainly unknown.
    It has been speculated that synchronisations of visual cortical neurons,
    revealed by recent electrophysiological studies [8, 12], may serve as
    the carrier for the observed perceptual grouping phenomenon. The
    differences in oscillator phase between spatially neighbouring spiking
    cells could in principle be used to label different objects in the scene
    for their intrinsic segmentation. The proposed grouping criteria of
    spatial contiguity and coherence of particular feature domains indeed
    show similarities to the proposed Gestalt laws. However, the law of good
    continuation, which plays a central role in many edge grouping and
    linking schemes in computer vision, is able to override both proximity
    and similarity. This emphasises the role
    of oriented edges both in the implementation of perceptual

    grouping and synchronisation mechanisms. The emergent formation of a
    perceptual group, including both edge- and region-based information, is
    depicted in Figure 1: the dots forming an incomplete circle are grouped
    into a synchronised round disk, with a discontinuity at the upper right
    indicating the missing dot in phase space. In Figure 4 the results of
    the proposed segmentation scheme for a scene with three simple objects
    are shown. Although the objects are defined by different boundary types,
    ranging from intensity discontinuities over lines to dots, the phase
    gradient shows a common interpretation of all contour types.

    Figure 3. Scheme for relaxation and diffusion of phase labels. The intensity distribution (a) is filtered to extract intensity gradients (b) corresponding to perceived edges in the image. The smoothed derivative of the edge map is rectified into ON and OFF channels (c), allowing simple compatibility constraints between channels to modify an initially uniform phase map (d); (e) intermediate and (f) final phase distribution of the phase image evolving in parallel over time.

    In Figure 3 the general idea of the proposed phase relaxation and
    diffusion mechanism is depicted. The principal processing is as follows

    [10]: we defined smoothly varying constraints on the interaction
    strength between all direction selective responses of the second
    preprocessing stage. These constraints support orientation continuity by
    positive interactions between similar directions, and decouple both
    sides of the contour by negative interactions between opposite
    directions. The spreading of labels into regions is introduced by
    synchronising phase oscillators at the contours with oscillators in the
    interior of objects. This filling-in is similar to brightness diffusion
    [5, 23], allowing the separation of figure and ground [16], but instead
    uses the coherency of cyclic phases to label the whole scene. The
    proposed labeling process can be formulated in terms of minimising an
    explicit functional depending on the basic compatibility relations,
    using results developed in [14].

    Figure 4. a) Scene with edge and line defined objects; b) Phase image after 28 iterations; c) Phase gradient of b.

    The phases of each hypercolumnar vector at position (i,j) are updated
    according to a Gauss-Seidel procedure, using a sigmoid nonlinearity for
    summing up the individual activations and a shifted cosine for
    calculating the contributions of neighbouring elements depending on
    their phase difference:

    (4)

    (5)

    Notation

    Phase at position (i,j)

    Random variable

    Activity in m-th feature map

    Contribution of n-th feature map

    Compatibility constraints

    Connectivity matrix

    Sigmoid nonlinearity ( )

    Periodic function of phase difference

    Phase difference

    Constants

    Set of discrete directions
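
    The behaviour of such a phase update can be sketched on a toy problem.
    The code below is an illustration under stated assumptions, not a
    reconstruction of equations (4) and (5): it uses a plain sine of the
    phase difference (Kuramoto-style coupling) instead of the shifted
    cosine, omits the sigmoid and the feature-map activities, and works on
    a 1-D chain. The names `relax_phases`, `rate`, and `noise` are
    hypothetical. Positive couplings stand for cooperative constraints
    within an object, and the single negative coupling stands for the
    decoupling across a contour.

```python
import math
import random

def wrapped_diff(a, b):
    """Difference a - b mapped into the interval (-pi, pi]."""
    d = a - b
    return math.atan2(math.sin(d), math.cos(d))

def relax_phases(weights, steps=400, rate=0.2, noise=0.01, seed=0):
    """Gauss-Seidel phase relaxation on a chain with signed couplings.

    weights[i] couples elements i and i + 1. A positive weight pulls the
    two phases together, a negative weight pushes them apart.
    """
    rng = random.Random(seed)
    n = len(weights) + 1
    phase = [0.0] * n  # initial equilibrium: all phases equal
    for _ in range(steps):
        for i in range(n):  # updating in place makes this a Gauss-Seidel sweep
            drive = 0.0
            if i > 0:
                drive += weights[i - 1] * math.sin(phase[i - 1] - phase[i])
            if i < n - 1:
                drive += weights[i] * math.sin(phase[i + 1] - phase[i])
            # Zero-mean noise moves the system off the unstable start state.
            phase[i] = wrapped_diff(phase[i] + rate * drive + rng.gauss(0.0, noise), 0.0)
    return phase

# Two groups of five elements separated by one negative (contour) coupling.
weights = [1.0] * 4 + [-1.0] + [1.0] * 4
phase = relax_phases(weights)
jump = abs(wrapped_diff(phase[5], phase[4]))
within = max(abs(wrapped_diff(phase[i + 1], phase[i]))
             for i in (0, 1, 2, 3, 5, 6, 7, 8))
print(round(jump, 2), round(within, 2))
```

    In this sketch the phases inside each group stay nearly equal while the
    difference across the negative coupling approaches pi, so the contour
    shows up as a sharp discontinuity in phase.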

    The compatibility function, depicted in Fig. 5a), is modelled as a
    shifted Gaussian. A sparse horizontal connectivity scheme was chosen to
    improve the synchronisation behaviour. In Figure 5b) the qualitative
    convergence properties of the system are depicted, showing average phase
    change and normalised average energy over iteration steps. The periodic
    function can be set to the sine of the phase difference to resemble the
    Kuramoto oscillator; we instead used formulation (5) to speed up
    convergence. The zero-mean random variable introduces noise into the
    decision process, thereby resolving ambiguous situations and forcing the
    process to move from the initial equilibrium state, with all phases
    being equal, to a global solution in phase space. As can be seen from
    the process equation (4), the change in phase at each location is
    governed by a correlated activity in at least one feature map at
    neighbouring positions. To allow the spreading of phase labels into
    regions formed by the oriented contours, a uniform activity is added to
    an additional feature map to resemble spontaneous neuronal activity.


    Figure 5. a) Competitive/cooperative interaction constraints between direction selective responses; b) Qualitative convergence behaviour of relaxation process, continuous: average phase change, dashed: average energy (vertical axis: Average Phase Change / Compatibility Energy, x 10^3; horizontal axis: Steps, 0 to 200).

    Figure 9b)-d) shows the extracted direction selective edges of the test
    image Paolina, using only odd-symmetric Gabor filters to half-wave
    rectify the oriented responses into ON and OFF channels. The result of
    the constraint satisfaction relaxation procedure is shown in 9e), from
    which the phase gradient 9f) has been computed. To compare the
    performance of the segmentation, the binarised gradient of the phase
    image and the edges detected by a Canny edge detector are shown. It can
    be seen that the contours of the binarised phase gradient in Figure 9g)
    resemble the Canny edges, although no postprocessing like edge linking
    and maximum detection was necessary. In Figure 10 the same maps are
    shown for a boat image.
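
    Because phase is a cyclic quantity, a phase gradient has to be computed
    from wrapped differences; otherwise two nearly identical phases on
    either side of the wrap-around point would appear as a strong edge. The
    following is a small sketch of that step and of the binarisation, under
    assumptions: forward differences, a hypothetical threshold of 1.0, and
    a made-up toy phase image. The source does not state which gradient
    operator or threshold was used.

```python
import math

def wrapped_diff(a, b):
    """Difference a - b mapped into the interval (-pi, pi]."""
    d = a - b
    return math.atan2(math.sin(d), math.cos(d))

def phase_gradient(phase):
    """Gradient magnitude of a 2-D phase image from wrapped forward differences."""
    rows, cols = len(phase), len(phase[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            right = phase[r][min(c + 1, cols - 1)]  # repeat last column at the border
            below = phase[min(r + 1, rows - 1)][c]  # repeat last row at the border
            dx = wrapped_diff(right, phase[r][c])
            dy = wrapped_diff(below, phase[r][c])
            grad[r][c] = math.hypot(dx, dy)
    return grad

def binarise(grad, threshold):
    """Mark pixels whose gradient magnitude exceeds the threshold."""
    return [[1 if g > threshold else 0 for g in row] for row in grad]

# Toy phase image: 3.1 and -3.1 are neighbours on the circle, 0.0 is far from both.
row = [3.1, 3.1, -3.1, -3.1, 0.0, 0.0]
phase = [list(row) for _ in range(4)]
edges = binarise(phase_gradient(phase), threshold=1.0)
print(edges[0])
```

    Only the boundary between the -3.1 region and the 0.0 region is marked.
    The transition from 3.1 to -3.1 has a wrapped difference of about 0.08
    and is correctly treated as one coherent region.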

    4. Selective Attention

    Two types of theories have been suggested to explain how attention is
    allocated to perform visual tasks. According to region based theories,
    an attentional spotlight of circular shape and varying diameter is
    directed to spatial positions in the visual field. Object based
    theories, on the other hand, propose that attention is directed to
    perceptual groups and not just locations. In either case, the main
    advantage of an attentional mechanism is the information reduction
    achieved by spatially selecting salient portions of the visual field,
    and the possible simplification of the binding problem by linking
    together the output of cells coding different features of the attended
    object. Recent research reveals evidence for object-based theories of
    attention [29], with objects acting as wholes in a slow, competitive
    process working in parallel across the visual field [7], although
    spatial selection and top-down control are part of the attentional
    system. Figure 6 shows a simplified sketch of the brain maps involved in
    the segmentation of objects from a complex scene by applying a cortical
    grouping mechanism and an attentional focus to the early representation
    of the scene. Both processes are part of early vision mechanisms [15],
    which operate bottom-up, whereby the attentive control serves the
    coupling of data-driven and cognitive processing streams

    both possessing cyclic and feedback loops.

    Figure 6. Sketch of the maps involved in the process of segmenting and extracting objects from a scene. Maps shown: Attention Engagement (Pulvinar, Thalamus); Spatial Map, engage Attention (Posterior Parietal Cortex); Spatial Modulation (IOR, FEF); Target selection (Superior Colliculus); Object Recognition (IT); Feature Maps (V1 - V5); Preattentive Segmentation, Synchronization; Image Plane (Retina).

    The visual information of an image is decomposed into sets of features
    of multiple feature maps (V1-V5) which interact by excitatory and
    inhibitory connections between locations (horizontal) and features
    (vertical). The pre-attentively grouped visual information is further
    processed by an attention mechanism (pulvinar) which chooses the most
    salient perceptual group and selectively enhances the responsiveness of
    neurons to this location at the expense of information from other groups
    or locations. The target selection map (SC) precomputes the expected
    saccade in a retinotopic coordinate frame, which is transformed into a
    spatial attentional map in viewer centred (environmental) coordinates
    (PP). The spatial modulation map (FEF) integrates information about
    attentionally relevant locations from PP with recently visited locations
    (IOR) and cognitive information like expected locations and overall
    scanning behaviour (compare with [30]).

    5. The Object Based Attention Process

    The phase image of the preattentive stage was used for the sequential
    extraction of objects by a selective attention mechanism [9]. This stage
    of processing applies an object-based attention filter to the
    presegmented early visual information by selectively enhancing and
    inhibiting regions corresponding to preattentively synchronised
    perceptual groups in the earlier visual maps. The attentional filter is
    computed by a global winner-take-all (WTA) mechanism in a separate
    attentional map integrating the information from all feature and scale
    specific earlier visual maps and the temporally decaying memory map
    (IOR) representing recently attended objects. The dynamics of the system
    have been adapted from the shunting feedback network proposed by
    S. Grossberg [13], and have been rewritten for discrete simulation on a
    computer:

    (6)

    where corresponds to the map element at position , equals the squared
    sum over all activations, and corresponds to the normalised result from
    convolving with kernel at . denotes the excitatory and the inhibitory
    input for IOR. and are arbitrarily chosen constants for bounding the
    activation of between and B. For reasons of simplicity we have chosen
    and . In the presented simulations, the constants have been set to ,
    and . Critical for the overall performance of the network are the size
    and form of the convolution kernel , for which we have chosen a Gaussian
    with diameter five, and the parameter which influences the size of the
    variable attentional spotlight. In the presented simulations the
    excitatory input consists of two arrays for the phase and activity at
    each spatial location. In the last processing stage the selected visual
    information from the feature maps is integrated in a target selection
    map (SC) which executes a saccade by applying a nonlinear model of local
    lateral interactions for saccade averaging [28], based on ensemble
    coding and linear vector addition of movement contributions [20].
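
    The interplay of a shunting winner-take-all competition with a decaying
    inhibitory memory can be sketched as follows. This is a simplified
    illustration, not equation (6): it omits the convolution kernel and the
    phase input, uses a squared feedback signal, and all constants (`dt`,
    `decay`, `ceiling`, the memory decay of 0.9, and the saliency values)
    are assumptions chosen for the example. The function names are
    hypothetical.

```python
def shunting_wta(drive, steps=200, dt=0.05, decay=1.0, ceiling=1.0):
    """Discrete-time shunting competition; activations stay in [0, ceiling]."""
    x = [0.0] * len(drive)
    for _ in range(steps):
        total = sum(v * v for v in x)  # squared sum over all activations
        new_x = []
        for xi, inp in zip(x, drive):
            excite = (ceiling - xi) * (inp + xi * xi)  # input plus self-excitation
            inhibit = xi * (total - xi * xi)           # feedback from all other units
            new_x.append(xi + dt * (-decay * xi + excite - inhibit))
        x = new_x
    return x

def attend_sequence(saliency, fixations, memory_decay=0.9):
    """Pick one winner per fixation; recently attended units are inhibited."""
    memory = [0.0] * len(saliency)  # inhibition-of-return map
    foci = []
    for _ in range(fixations):
        drive = [max(s - m, 0.0) for s, m in zip(saliency, memory)]
        activity = shunting_wta(drive)
        winner = max(range(len(activity)), key=activity.__getitem__)
        foci.append(winner)
        memory = [m * memory_decay for m in memory]  # older entries fade
        memory[winner] = 1.0                         # mark the newly attended unit
    return foci

saliency = [0.2, 0.9, 0.5, 0.7]
foci = attend_sequence(saliency, fixations=4)
print(foci)  # [1, 3, 2, 0]
```

    With these assumed values the units are visited in order of decreasing
    saliency, because each winner is suppressed by the memory map before
    the next competition starts.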

    In Figure 7 the sequence of attentional foci computed from an objects
    image, overlaid on its phase image, is shown. Figure 8 shows phase and
    activity maps of the excitatory input and the sequence of inhibitory
    maps that prevent the system from revisiting recently attended
    locations. As can be seen, the selected regions are a compromise between
    spatial and phasic coherence, allowing perceptual groups and objects to
    be extracted from the input.

    6. Conclusion

    A four stage processing model for object segmentation and selection has
    been proposed which combines neurophysiological and psychological data
    to account for its biological plausibility. We have described a
    relaxation phase labeling procedure for the preattentive grouping and
    perceptual segmentation of objects in phase space, and an attention
    mechanism which sequentially extracts perceptual groups in a cluttered
    scene, consistent with an object based theory of visual attention. The
    original contribution of the presented biological framework for
    perceptual segmentation and selection of objects in a real world scene
    is the transformation of the grouping process into phase space, using a
    simple relaxation labeling procedure. By introducing directional
    responses and local constraints thereupon, serving the grouping of
    similar directions and the decoupling of both sides of a contour line,
    the proposed mechanism is able to detect zero-crossings in phase space
    without an explicit and biologically implausible search. The gradient in
    phase space is sharpened compared to the edge response or the intensity
    discontinuity, and the whole scene is labelled into objects and
    background. Furthermore, the relaxation phase labeling (RPL) process is
    able to extract the most salient contour lines of perceptual groups in
    phase space, suppressing false responses generated by the preprocessing
    stage. Therefore the RPL process can be used to link edges into object
    boundaries by closing small gaps in the contour lines of the intensity
    image, or to group perceptual primitives like dots, points or dashes
    into perceptual wholes using grouping principles originally proposed by
    Gestalt psychology. For a more complete segmentation scheme involving
    both different spatial frequencies and multiple feature domains, the
    system could be expanded by a scale space approach [3, 23] and the
    integration of parallel texture-, motion-, and colour-specific
    processing channels [25, 1]. An extension on the feature level will be
    the integration of distinctive maps for two-dimensional features like
    direction of motion, texture, curvature, endstoppings and junctions.

    Figure 7. Sequence of attentional foci (white) using both edge energy and phase, overlaid on the phase image of 8a).

    Figure 8. a) Phase image of objects scene; b) Summed activity of edge maps; c) Sequence of inhibitory memory.


    Figure 9. a) Paolina image (size 200x200); b) Summed responses of six ON channels; c) Summed responses of six OFF channels; d) Phase image after 21 iteration steps; e) Binarised phase gradient of d; f) Canny detector with , and threshold (0.3, 0.9).

    Figure 10. a) Boat scene (size 200x200); b) Summed responses of six ON channels; c) Summed responses of six OFF channels; d) Phase image after 51 iteration steps; e)-f) same as in Fig. 9.

    References

    [1] J. Aloimonos and D. Shulman. Integration of Visual Modules: An
        Extension to the Marr Paradigm. Academic Press, 1989.
    [2] A. Blake and A. Zisserman. Invariant surface reconstruction using
        weak continuity constraints. In Proceedings IEEE Conf. on Computer
        Vision and Pattern Recognition, pages 62-67. IEEE, 1986.
    [3] P. J. Burt and E. H. Adelson. The Laplacian pyramid as a compact
        image code. IEEE Trans. on Communications, 31(4):532-540, 1983.
    [4] J. F. Canny. A computational approach to edge detection. IEEE Trans.
        on Pattern Analysis and Machine Intelligence, 8(6):679-698, 1986.
    [5] M. A. Cohen and S. Grossberg. Neural dynamics of brightness
        perception: Features, boundaries, diffusion, and resonance.
        Perception and Psychophysics, 36:428-456, 1984.
    [6] J. G. Daugman. Uncertainty relation for resolution in space, spatial
        frequency, and orientation optimized by two-dimensional visual
        cortical filters. J. Opt. Soc. Am. A, 2:1160-1168, July 1985.
    [7] R. Desimone and J. Duncan. Neural mechanisms of selective visual
        attention. Annual Review of Neuroscience, 18:193-222, 1995.
    [8] R. Eckhorn, R. Bauer, W. Jordan, M. Brosch, M. Kruse, W. Munk, and
        H. J. Reitboeck. Coherent oscillations: A mechanism of feature
        linking in the visual cortex? Biol. Cybern., 60:121-130, 1988.
    [9] W. A. Fellenz. A sequential model for attentive object selection. In
        Proc. 39th IWK, Sept. 27-30, vol. II, pages 109-116, TU Ilmenau,
        1994.
    [10] W. A. Fellenz and G. Hartmann. Image segmentation by phase label
        diffusion. In Proc. of the Int. Conference on Artificial Neural
        Networks, ICANN-95, Paris, vol. II, pages 309-314, 1995.
    [11] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions
        and the Bayesian restoration of images. IEEE Transactions on Pattern
        Analysis and Machine Intelligence, 6:721-741, 1984.
    [12] C. M. Gray, P. König, A. K. Engel, and W. Singer. Oscillatory
        responses in cat visual cortex exhibit inter-columnar
        synchronization which reflects global stimulus properties. Nature,
        338:334-336, 1989.
    [13] S. Grossberg. Nonlinear neural networks: Principles, mechanisms,
        and architectures. Neural Networks, 1:17-61, 1988.
    [14] R. A. Hummel and S. W. Zucker. On the foundations of relaxation
        labeling processes. IEEE Trans. on Pattern Analysis and Machine
        Intelligence, 5:267-287, 1983.
    [15] B. Julesz. Foundations of Cyclopean Perception. University of
        Chicago Press, 1971.
    [16] P. K. Kienker, T. J. Sejnowski, G. E. Hinton, and L. E. Schumacher.
        Separating figure from ground with a parallel network. Perception,
        15:197-216, 1986.
    [17] C. Koch, J. Marroquin, and A. Yuille. Analog neuronal networks in
        early vision. Proceedings of the National Academy of Sciences,
        83:4263-4267, 1986.
    [18] K. Koffka. Principles of Gestalt Psychology. Harcourt, Brace &
        World, New York, 1935.
    [19] D. Marr and E. Hildreth. Theory of edge detection. Proceedings of
        the Royal Society of London B, 207:187-216, 1980.
    [20] James T. McIlwain. Distributed spatial coding in the superior
        colliculus: A review. Visual Neuroscience, 6:3-13, 1991.
    [21] J.-M. Morel and S. Solimini. Variational Methods in Image
        Segmentation. Birkhäuser, Boston, 1995.
    [22] M. C. Morrone and D. C. Burr. Feature detection in human vision: a
        phase-dependent energy model. Proceedings of the Royal Society of
        London, B 235:221-245, 1988.
    [23] P. Perona and J. Malik. Detecting and localizing edges composed of
        steps, peaks and roofs. In Proc. of the 3rd Int. Conf. on Computer
        Vision, pages 52-57. IEEE Comp. Soc., Osaka, 1990.
    [24] P. Perona and J. Malik. Scale-space and edge detection using
        anisotropic diffusion. IEEE Transactions on Pattern Analysis and
        Machine Intelligence, 12(7):629-639, 1990.
    [25] T. Poggio, E. B. Gamble, and J. J. Little. Parallel integration of
        visual modules. Science, 242:436-440, 1988.
    [26] A. Rosenfeld, R. A. Hummel, and S. W. Zucker. Scene labeling by
        relaxation operations. IEEE Transactions on Systems, Man and
        Cybernetics, 6:420-433, 1976.
    [27] D. Terzopoulos. Regularization of inverse visual problems involving
        discontinuities. IEEE Trans. on Pattern Analysis and Machine
        Intelligence, 8(4):413-424, 1986.
    [28] A. J. Van Opstal and J. A. M. Van Gisbergen. A nonlinear model for
        collicular spatial interactions underlying the metrical properties
        of electrically elicited saccades. Biol. Cybern., 60:171-183, 1989.
    [29] S. Yantis. Multielement visual tracking: Attention and perceptual
        organization. Cognitive Psychology, 24:295-340, 1992.
    [30] A. L. Yarbus. Eye movements and vision. Plenum, New York, 1967.