Cognitive and Perceptual Processes in Visual Recognition Thesis submitted for the degree of “Doctor of Philosophy” By Michal Jacob Submitted to the Senate of the Hebrew University of Jerusalem June 2009

Cognitive and Perceptual Processes in Visual Recognition

Cognitive and Perceptual Processes in Visual Recognition

Thesis submitted for the degree of

“Doctor of Philosophy”

By

Michal Jacob

Submitted to the Senate of the Hebrew University of Jerusalem

June 2009

This work was carried out under the supervision of

Prof. Shaul Hochstein

Cognitive and Perceptual Processes in Visual Recognition

Thesis submitted for the degree of Doctor of Philosophy

By

Michal Jacob

Submitted to the Senate of the Hebrew University of Jerusalem

Tammuz 5769

This work was carried out under the supervision of

Prof. Shaul Hochstein

Acknowledgements

My thanks go to Prof. Shaul Hochstein for the insights he offered during the research, for creative ideas, and for fruitful discussions.

I thank him for his gracious manner. I also learned a great deal from him about the ways of people.

I thank Prof. Daphna Weinshall and Prof. Leon Deouell, the members of my advisory committee, for the help they gave me while the research was being shaped and planned, and for useful conversations.

Thanks to the Center for Neural Computation for being a center for the enrichment of knowledge, and for enabling me to make the most of my studies in the program.

I thank Yoel Yaari, my teacher in the art of karate, for enlightening conversations.

I want to thank my friends, who supported me along the way.

Thanks to the members of the lab for the mutual enrichment.

I wish to thank my parents, Rachel and Yitzhak Jacob, for believing in me.

ABSTRACT

What aspects of visual perception lead us to recognition? I investigated elements

influencing the process leading to recognition. These include elements in the visual scene

(Chapter I, Jacob & Hochstein, 2008), and the influence of fixations – their number and

their sequence – on recognition (Chapter II, Jacob & Hochstein, 2009; Chapter III).

In the first experiment, I used the Set game (®Set Enterprises Inc.). One of the main

findings was preference for similarity within the set (rather than span), suggesting

presence of a basic similarity-perceiving mechanism, even though the brain specializes in

identifying difference. The preference for similarity may evolutionarily result from the

diversity of the environment, over which the unique and different are the few points of

similarity. An additional main finding was that RT decreased with the number

of sets in the display, consistent with a horse-race model (Corballis, 1998; Egeth &

Mordkoff, 1991; Garner & Lee, 1962; Graham, 1989; Miller, 1982, 1986; Monnier, 2006;

Raab, 1962; Townsend & Ashby, 1983), implying independence of simultaneous

searches. I further found influences of perceptual as well as conceptual elements of the

task on detection. These findings strengthen the assumption that perceptual and cognitive

functions are linked together (Goldstone & Barsalou, 1998; Landy & Goldstone, 2007).

My next step was to compare detected and undetected stimuli, in order to

learn about the sequence of events leading to detection. This was done in the second

experiment (Chapter II, Jacob & Hochstein, 2009). For this purpose, I formulated the

Identity Search Task. The Identity Search Task display contains computer screen "cards",

each with a square array of scrambled black and white square units. The subjects’ task is

to find two exactly identical cards – an identical pair. Configured for the designated goal,

each display contained two identical pairs, allowing one pair to be detected, and leaving

one pair undetected in each trial.

The main approach of this research was recording subjects’ eye movements while

performing the task. Eye movements were tracked using the SR Research Ltd. (Ontario,

Canada) EyeLink I eye-tracker.

Having found more fixations on the eventually detected target than on

the undetected one, together with a different pattern in the sequence of fixations, I came

to attribute a greater role to fixations.

Analysis of the backward dynamics, i.e. alignment of the accumulated fixations on

each of the pairs relative to the detection point, averaged over trials, revealed a

bifurcation point between the detected and the undetected target, where the differential

viewing properties begin. The patterns of fixations on the detected and undetected pairs

were nearly identical up to a point where the number of fixations on the detected pair

showed a sharp upturn, as a result of an earlier event – the slope of the detected pair rose

above the slope of the undetected pair. I therefore conclude that not only the absolute

number of fixations has an influence, but also their proximity in time, and suggest that

the important factor may be the change in the accumulated number of fixations, i.e. the

slope, rather than the number itself. The point where the slope exceeds a pre-determined

threshold may be regarded as the bifurcation point where there is a change of state in the

search process. This bifurcation point may reflect a transition between a first stage of

“search-in-the-dark” to a second stage of "early implicit recognition". This suggests that

there is an early recognition stage, followed by more fixations, leading to full recognition.
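
The slope criterion described above can be illustrated with a minimal sketch. It is not the analysis code used in the thesis: the thesis aligns trial-averaged curves backward from the detection point, whereas this sketch works forward on a single made-up fixation sequence. The function names, the window of 3 fixations and the threshold of 0.5 are all assumptions chosen for illustration.

```python
def cumulative_counts(fixation_sequence, target):
    """Running count of fixations that landed on `target`."""
    counts, total = [], 0
    for region in fixation_sequence:
        if region == target:
            total += 1
        counts.append(total)
    return counts


def bifurcation_index(counts, window=3, threshold=0.5):
    """First index at which the slope over the last `window` fixations
    exceeds `threshold` (both values are hypothetical).

    Returns None if the threshold is never exceeded.
    """
    for i in range(window, len(counts)):
        slope = (counts[i] - counts[i - window]) / window
        if slope > threshold:
            return i
    return None


# Hypothetical trial: "d" = eventually detected pair,
# "u" = undetected pair, "x" = any other card.
trial = ["x", "d", "x", "x", "u", "x", "d", "d", "x", "d", "d"]
detected = cumulative_counts(trial, "d")
undetected = cumulative_counts(trial, "u")
```

In this toy sequence the detected pair crosses the slope threshold partway through the trial, while the undetected pair never does, which is the qualitative pattern the paragraph describes.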

I further theorize that several fixations specifically on the target are required for its

recognition. To test this hypothesis, I devised a novel paradigm –

exposing the subjects to varying numbers of fixations on the target, after

which the display was terminated and a response was obtained (Chapter III). This was

done by retrieving the fixations in real-time and controlling the display accordingly. I

again used the Identity Search Task (Jacob & Hochstein, 2009), only that this time, half

of the displays included just one target, and half included no target at all.

I found that an increase in number of target fixations leads to better recognition,

reflected by a decrease in response time and increase in performance and accuracy,

measured by hit-rate and detectability, d’.
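
For readers unfamiliar with the detectability measure, d’ is conventionally the difference between the z-transformed hit rate and false-alarm rate. The sketch below shows that standard computation. The log-linear correction (adding 0.5 to each count) is an assumption made here to keep the rates away from 0 and 1; this section does not say which correction, if any, was applied in the thesis.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Detectability d' = z(hit rate) - z(false-alarm rate).

    Counts are corrected log-linearly (an assumption of this sketch)
    so that rates of exactly 0 or 1 do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: target-present trials give hits and misses,
# target-absent trials give false alarms and correct rejections.
chance = d_prime(50, 50, 50, 50)
good = d_prime(80, 20, 20, 80)
```

With equal hit and false-alarm rates d’ is zero, and it grows as hits rise relative to false alarms.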

Do fixations lead to detection or does unconscious pre-recognition guide the eyes to

more fixations? Analysis of the hit-rates as a function of number of fixations on the

target, together with confidence level responses of the subjects, suggests that a few

fixations lead to an early implicit recognition state, which in turn leads to more fixations

on the target, and thus ultimately to full explicit recognition.

Evidence points to gathering and integration of information over fixations directed

specifically at the target, that is – local fixations, demonstrated by the several fixations

needed for detection. Information is gathered gradually over a number of fixations to a

scene region. On the other hand, the finding that the sequential distance between fixations

also plays a role points to memory decay. The bottom line is that a single fixation is

definitely not enough for gathering all the information about a certain region.

Finally, following these insights, I suggest a mathematical model for spatially

available visual information (Chapter IV). The model design has two prevailing

components: An increment of information, due to the accumulated number of fixations

on the region (the incremental component), and the decay of information, influenced by

the sequential distance from the last fixation on the region (the memory decay

component).

I conclude that recognition is graded, as a consequence of the graded information that

is gathered, fixation after fixation on the same scene region. Several target fixations are

needed for processing visual information to achieve full recognition.

CONTENTS

ABSTRACT

INTRODUCTION

METHODOLOGY, RESULTS and FOCUSED DISCUSSION

I. Set Recognition as a Window to Perceptual and Cognitive Processes

II. Comparing Eye Movements to Detected vs. Undetected Target Stimuli in an Identity Search Task

III. Graded Recognition as a Function of Target Fixations

IV. A Model for Visual Information

DISCUSSION

I. Summary

II. General Discussion

III. Future Directions

i. Looking into the Eyes

ii. New response methodology (relating to Chapter III)

iii. Experimental test of the visual information model (Chapter IV)

CONCLUSIONS

REFERENCES

APPENDICES

I. (Appendix to Chapter II) Predictive value of number of target fixations and their sequential distance

II. (Appendix to Chapter III) After-Search

INTRODUCTION

How does consciousness arise? What in the brain is responsible for our subjective

experience of the world? What physical processes in the brain allow this amazing

phenomenon? Chalmers (1995a,b) differentiates between the “easy problems” and the

“hard problem”. The easy problems, though not at all easy, concern the objective

mechanisms of the cognitive system, whereas the hard problem is the question of how

physical processes in the brain give rise to subjective experience, or the qualia.

An example of the difference between the easy and hard problems relies on a thought

experiment devised by Jackson (1986). Suppose that Mary is a neuroscientist who is an

expert on brain processes responsible for color vision, but she has never seen colors and

therefore has never experienced a color such as red. However complete her knowledge of the

underlying mechanisms, it does not include what seeing red is like. This subjective experience, also

occurring in recognition (Farah & Aguirre, 1999; Schyns, Bonnar & Gosselin, 2002) and

accompanied by awareness, which we are all so familiar with, constitutes the hard and

most intriguing problem – how does this experience arise?

I believe that what is needed now is a new theory, maybe a new physics theory, to

explain this amazing phenomenon called consciousness. It always amazes me how we

have awareness and perceive things – so simple to perform, such that we sometimes take

it for granted – and yet, we don’t know how we do it. We cannot describe the process that

leads to conscious perception, to recognition. Maybe some kind of phase transition is

involved. As I see things, humanity has not yet reached the breakthrough needed for

understanding this phenomenon.

I tried to get a grip on a tiny portion of what happens in the brain during this process

which leads eventually to our conscious subjective sense of recognition – guided by and

guiding eye movements (Chapters II and III). I chose this tool, because eye movements

are known to reflect the state of mind (Henderson & Hollingworth, 1999; Liversedge &

Findlay, 2000; Rayner, 1998; Ringach, Hawken & Shapley, 1996; Stone, Miles & Banks,

2003).

What aspects of visual perception lead us to recognition? Because I cannot put my

finger on what leads to the sense of recognition – neither locate it in the brain nor point to

the exact moment in the process leading to it, I chose to investigate elements influencing

this process. These include elements in the visual scene (Chapter I, Jacob & Hochstein,

2008), and the influence of fixations – their number and their sequence – on recognition

(Chapter II, Jacob & Hochstein, 2009; Chapter III).

At first I aimed at simulating an everyday “automatic” or perceptual process

(Schneider & Shiffrin, 1977; Shiffrin & Schneider, 1977). The “primary criteria” for

automaticity, defined by Neumann (1984), concern interference, intentionality, and

awareness. Principles of automaticity (Carr, 1992; Cohen, Servan-Schreiber &

McClelland, 1992; Treisman, Vieira & Hayes, 1992) are: An increase in speed of

performance with practice, involuntariness, diminishing requirements of attention, and

immunity from interference. Automaticity is characterized by “direct” processing,

referring to a direct pathway from stimulus to response, as opposed to “indirect” or

cognitive processing which demands explicit consideration or analytic reasoning (Cohen,

Dunbar & McClelland, 1990). Another test that has been used for automatic processing is

its independence of set size (Treisman et al., 1992).

I tried to mimic an automatic process under lab conditions, i.e. use a process that is

not already “built-in” to the brain system. This way I hoped to “slow-down” the process,

so I could follow it. But apparently, when the process is slowed down, it is not the same

type of process.

Chapter I reports results from investigating the Set game (®Set Enterprises Inc.;

homepage: www.setgame.com). The task is to detect, among the 12 displayed cards, a set

consisting of three cards, being either all alike or all different along each of the 4

dimensions (color, shape, filling and number) with 3 possible values for each. The Set

game, which was invented by Marsha Jean Falco in 1974, is a complex visual perception

game, or so it is called. Actually, it is not only a perceptual game, but rather turned out to

be more conceptual, at least without very extensive training. In the Set game, subjects did

not reach automaticity.

Another requirement of the experimental setup was that it elicit an ‘aha!’ experience (see

Ahissar & Hochstein, 1997; Bowden & Jung-Beeman, 2003), or Insight (Jung-Beeman,

Bowden, Haberman, Frymiare, Arambel-Liu, Greenblatt, Reber, & Kounios, 2004; Luo

& Niki, 2003; Rubin, Nakayama & Shapley, 1997, 2002; Bowden & Jung-Beeman, 2007;

Maier, 1931; Smith & Kounios, 1996; Sternberg & Davidson, 1995), also termed Eureka.

(See also Holyoak, 1990, on problem solving). The ‘aha!’ experience is the subjective

feeling of experiencing the solution as sudden and surprising. Solvers experience insight

when they suddenly overcome an impasse as a result of unconscious processing (Bowden

& Jung-Beeman, 2003). Recent studies indicate distinct patterns of performance and

suggest differential hemispheric involvement for insight versus non-insight solutions.

Using fMRI studies, Bowden & Jung-Beeman (2003) found activation below the

threshold of awareness prior to producing the solution. When solving a problem with an

insight, fMRI revealed increased activity in the right hemisphere anterior superior

temporal gyrus, associated with making connections across distantly related information

during comprehension (Jung-Beeman et al., 2004). Insight is a momentary conscious

recognition: conscious perception emerges suddenly, after a period of search, rather

than incrementally (as it does, e.g., when solving certain math problems, in which the solution is

reached step by step). The long process leading to insight allows investigation of the

process itself through its duration.

The benefit of the set game is that the moment of insight of set detection does not

come immediately, but takes a few seconds. Another advantage is that the insight-like

discovery of a set is not limited to a one-time occurrence – after revealing a set, there is

still a possibility to generate that effect in another display of the cards.

The research involved characterization of the influence of different parameters on the

detection of a set: Similarity vs. span; competition between processes and independence

of simultaneous searches when more than one set is present in the display, according to a

horse-race model (Corballis, 1998; Egeth & Mordkoff, 1991; Garner & Lee, 1962;

Graham, 1989; Miller, 1982, 1986; Monnier, 2006; Raab, 1962; Townsend & Ashby,

1983); dimensional salience; most abundant value in the display; eccentricity of cards and

proximity between them; learning and generalization.
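
The horse-race prediction mentioned above can be made concrete with a small simulation. This is a minimal sketch, not the analysis used in the thesis. It assumes that each set in the display triggers an independent search whose finishing time is exponentially distributed, and that the response is triggered by whichever search finishes first. The exponential form, the mean of 10 (arbitrary time units) and the function name are all assumptions made for illustration.

```python
import random


def simulate_mean_rt(n_sets, n_trials=20000, mean_search_time=10.0, seed=0):
    """Mean response time when `n_sets` independent searches race to finish.

    Each search's finishing time is drawn from an exponential distribution
    (an assumption of this sketch); the fastest search determines the
    response time on that trial.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        total += min(rng.expovariate(1.0 / mean_search_time) for _ in range(n_sets))
    return total / n_trials


# Mean RT for displays containing one, two or three sets.
mean_rts = {n: simulate_mean_rt(n) for n in (1, 2, 3)}
```

Under these assumptions the expected time of the winner is the single-search mean divided by the number of racers, so mean RT falls as more sets are present, even though no individual search becomes faster.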

The Set game turned out to be too complex a task for my purposes – it involves many

dimensions and parameters, which distract from reaching the core of the process that I

was looking for. I needed a simpler task, which would neutralize all the irrelevant

factors and leave only the important and essential properties.

The main requirement of the experimental task was that it allow a comparison between

detected and undetected targets (Chapter II). What makes us detect and recognize some

objects, while others remain undetected?

A long list of requirements had to be fulfilled by the task:

i. Each display includes one eventually detected and one undetected target pair

for comparison.

ii. Displays are divided into distinct search and eye-fixation regions.

iii. A process that requires several seconds of search (allowing one to follow the

dynamics of eye movement patterns during the course of the search).

iv. The search process culminates in a momentary (though not immediate)

recognition (an insight or ‘aha’ experience; abruptly ending the search

process).

v. An enormous number of novel displays may be created, allowing one to

repeat the task with a new search and a new "aha" experience each time.

vi. The task is easy to master, so that learning is rapid, and performance stabilizes

to a steady level.

vii. Task complexity may be controlled.

Further details on these requirements can be found in Chapter II (Jacob & Hochstein,

2009).

Then the Identity Search Task was formulated (Chapter II, Jacob & Hochstein,

2009). The Identity Search Task display contains computer screen "cards", each with a

square array of scrambled black and white square units. The subjects’ task is to find two

exactly identical cards – an identical pair. This task also makes use of the earlier finding

from the Set game of a preference for similarity.
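
To make the structure of such a display concrete, here is a minimal sketch of a display generator. It is not the stimulus code used in the experiments. The number of cards (12), the array size (3 x 3), the use of independent random units and the function names are assumptions for illustration. Only the overall structure follows the text: cards made of black and white square units, with two identical pairs per display, as in the experiment of Chapter II.

```python
import random
from collections import Counter


def make_card(size, rng):
    """One card: a size x size array of white (0) and black (1) units."""
    return tuple(tuple(rng.randint(0, 1) for _ in range(size)) for _ in range(size))


def make_display(n_cards=12, size=3, n_pairs=2, seed=None):
    """Cards for one display containing exactly `n_pairs` identical pairs.

    All parameter values are hypothetical defaults.
    """
    rng = random.Random(seed)
    patterns = []
    while len(patterns) < n_cards - n_pairs:
        card = make_card(size, rng)
        if card not in patterns:  # keep every pattern distinct
            patterns.append(card)
    cards = patterns + patterns[:n_pairs]  # duplicate n_pairs of the patterns
    rng.shuffle(cards)  # randomize card positions
    return cards


display = make_display(seed=1)
pair_patterns = [card for card, n in Counter(display).items() if n == 2]
```

Because every display is drawn afresh, an effectively unlimited number of novel displays can be produced, which is one of the task requirements listed above.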

In this experiment, I used eye movement tracking. There are several types of eye-

movements and modes. There are fixations, which are used for gathering information on

the environment. Fixation locates a particular part of the visual scene on the fovea, and

this input, covering a radius of approximately 2° of the visual field (Anstis, 1974; Riggs,

1965), is the only one which is processed with sharpness, clarity and accuracy. Fixation

duration is ~250ms (Liversedge & Findlay, 2000), and depends on several factors. The

saccades are fast ballistic movements (i.e. the target is decided before initiation), which

relocate the point of fixation. Saccades typically have an amplitude of up to 20° and last

~40 ms; their peak angular velocity is proportional to their amplitude, but cannot exceed

1000°/sec. In addition there is a mode of smooth pursuit of a moving object. The

fixations themselves are not completely stable, to avoid fading of visual perception as a

result of neural adaptation, and include three modes of movements: tremor, drift and

micro-saccades (Martinez-Conde, Macknik & Hubel, 2004; Martinez-Conde, 2006). Out

of all these types of eye-movements, I concentrated on fixations, and analyzed

specifically their location, duration, and sequence.

Analysis of fixations during performance of the task revealed that there are

consistently more fixations on the detected target (as opposed to more fixations on the

undetected target, or an equal number on both).

This led to investigating the role of fixations. There is a debate as to the function of

fixations. Fixations are occasionally regarded as an “inter-saccadic phenomenon”

(Findlay, Brown & Gilchrist, 2001), serving as the latency for the computation of a

saccade (Gersch, Kowler & Dosher, 2004; Irwin, 2001; Kowler, Anderson, Dosher &

Blaser, 1995), so that their sole purpose is to provide time for saccadic programming.

Such claims raise the importance of saccade planning (Vergilino & Beauvillain, 2001;

Vergilino-Perez & Findlay, 2006). This is indeed a very interesting and important issue,

but I will not deal with it here. In contrast, without diminishing the importance of saccadic

programming, fixations can be regarded as having a function of their own.

A most significant example of the essentiality of fixations comes from an

inattentional blindness demonstration by Simons & Chabris (1999). They instructed the

observer to follow the white team basketball players, and to carefully count their passes.

This forces concentration on their moves, so that attention and fixations are shifted to

them. Eventually, it turns out that a gorilla passed in the frame, and was completely

unnoticed. My interpretation of the reason the gorilla was not perceived is that there were

no fixations on it.

This happens daily, though less dramatically, because we sample the outside world

with our fixations, and we do not usually concentrate exclusively on certain locations or

objects. Yet, sampling of the visual scene is not uniform, and is indeed affected by the

informativeness of the scene, and by our current goal of observation.

Classical studies concluded that more informative scene regions receive more

fixations (Buswell, 1935; Yarbus, 1967; Mackworth & Morandi, 1967; Antes, 1974;

Loftus & Mackworth, 1978) and that regions that receive more fixations are eventually

identified (Nodine, Carmody, & Kundel, 1978, “Searching for Nina”). The question of

“What aspect of looking distinguishes a successful from an unsuccessful search when

looking for a target?” was raised before by Nodine, Carmody, & Kundel (1978, p.245).

Their answer relied on the properties of the scene – noisy surrounds adversely influenced

search performance. Moreover, they found a positive relationship between duration of the

first fixation on the target and detection, and that hits were preceded by examination-type

sampling (long duration fixations), while misses were preceded by survey-type sampling

(short duration fixations).

The finding of more fixations on the detected target also raises the question of what

comes first – does an arbitrary extensive observation of the pair cards lead to perception,

or, the opposite, does detection of the pair lead to more fixations on it? In other words,

does the larger number of fixations give rise to detection or is it a result of detection?

This question is addressed in Chapter III.

There is another debate – whether memory is obtained and preserved after a fixation,

that is with the shift of attention (attention withdrawal from one location) when fixating

another location (McCarley, Wang, Kramer, Irwin, & Peterson, 2003).

There are several types of defined memories (Atkinson & Shiffrin, 1968, multi-store

model): Short-Term Memory (STM; also referred to as Working Memory, see Baddeley

& Hitch, 1974), Long Term Memory (LTM), and Iconic Memory, defined as a sensory

memory. Visual Short-Term Memory (vSTM) is a limited-capacity,

long-lasting store, which is non-maskable and not tied to spatial position (Irwin, 1991). Iconic

Memory, on the other hand, is a sensory memory related to visual persistence and

temporary storage of sensory input, dependent on the retinal or spatial position of the stimulus

(Becker, Pashler & Anstis, 2000; Phillips, 1974; Sperling, 1960), and is characterized by

high capacity and rapid decay, vanishing within ~0.2 sec.

The recent trend is to argue for very limited capacity of Working Memory (Horowitz

& Wolfe, 1998), or even no memory at all (O’Regan, 1992). Ballard, Hayhoe, Pook &

Rao (1997) suggest a tradeoff between working memory load and the number of required

fixations, which according to them have a deictic role.

Attention also plays a role. Visual attention and eye position are linked during normal

viewing, with attention automatically preceding the eyes to the next saccade target

(Deubel & Schneider, 1996; Henderson, Pollatsek & Rayner, 1989; Hoffman &

Subramanian, 1995; Kowler, Anderson, Dosher & Blaser, 1995; Rayner, McConkie &

Ehrlich, 1978; Shepherd, Findlay & Hockey, 1986).

The debate also extends to whether information is retained, accumulated and

integrated over fixations (Hollingworth et al., 2001; or in another view – over saccades,

Irwin, 1991; O’Regan & Levy-Schoen, 1983). Two opposing theories are the coherence

theory (Rensink, 2000b), which suggests that visual object representations disintegrate

immediately upon the withdrawal of attention, and the visual memory theory

(Hollingworth et al., 2001), which suggests that visual representations accumulate to

form a relatively detailed representation of a scene.

Even the supporters of information integration refer to gathering of information about

the whole scene, constructed from the contribution of all local objects. They consider

multiple fixations distributed over the entire visual scene, which integrate visual information

from several local objects and form a large-scale representation of that scene.

Here, on the other hand, I theorize about the specific number of fixations on a

certain region, that is, on the target (Chapter III), i.e. integration of information over a

specific region of the scene. For testing this hypothesis, I use a novel paradigm –

exposing the observers to varying numbers of fixations on the target.

Finally, a mathematical model for spatial visual information is suggested in Chapter

IV. The objective was to construct a model which defines the available information for

each unit in the visual scene in any given moment. The model is designed to be affected

by two components: An increment of information, due to the growing number of

fixations on the region (the incremental component), and the decay of information,

influenced by the sequential distance from the last fixation on the region (the memory

decay component). Hopefully, this available-information measure will be able to predict

the probability of noticing a change to a visual unit, that is, to predict the change

detection probability.
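
The two components can be illustrated with a toy computation. The functional form below is an assumption made purely for illustration, and the actual equations are those given in Chapter IV. Here the incremental component is a saturating function of the number of fixations on a region, and the memory decay component is an exponential function of the number of fixations made since the region was last fixated. The parameters `gain` and `decay` and the function name are hypothetical.

```python
import math


def available_information(fixation_sequence, region, gain=0.5, decay=0.2):
    """Toy estimate (between 0 and 1) of the information available on `region`.

    increment: grows with the number of fixations on the region and saturates
    retention: shrinks with the sequential distance from the last such fixation
    Both functional forms and both parameters are assumptions of this sketch.
    """
    count = 0
    last_index = None
    for i, fixated in enumerate(fixation_sequence):
        if fixated == region:
            count += 1
            last_index = i
    if count == 0:
        return 0.0  # a region that was never fixated contributes nothing
    distance = len(fixation_sequence) - 1 - last_index
    increment = 1.0 - math.exp(-gain * count)
    retention = math.exp(-decay * distance)
    return increment * retention


# Hypothetical fixation sequences over regions "A" and "B".
recent = available_information(["A", "B", "A"], "A")
stale = available_information(["A", "B", "B"], "A")
```

In this sketch a region fixated twice and most recently retains more information than one fixated once and then left for two fixations, which is the qualitative behavior the two components are meant to capture.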

METHODOLOGY, RESULTS and FOCUSED DISCUSSION

The methods and results are detailed in the following chapters, accompanied by a

particular discussion:

I. Set Recognition as a Window to Perceptual and Cognitive Processes.

Jacob, M. & Hochstein, S. (2008). Perception & Psychophysics, 70, 1165-1184.

II. Comparing Eye Movements to Detected vs. Undetected Target Stimuli in an

Identity Search Task.

Jacob, M. & Hochstein, S. (2009). Journal of Vision, 9(5):20, 1-16.

III. Graded Recognition as a Function of Target Fixations.

IV. A Model for Visual Information.

Chapter I

Set Recognition as a Window to Perceptual and Cognitive Processes

Jacob, M. & Hochstein, S. (2008). Perception & Psychophysics, 70, 1165-1184

Often, a complex task is challenging, cognitive, analytical and effort demanding at first but later, with extensive training, becomes more immediate and perceptual and does not require focused attention and conscious processing. One example of such a process that has received an enormous degree of study is categorization (Ashby & Maddox, 2005). Another such process is reading, where the original, almost painful working out of a word or sentence becomes automatic (Carr, 1992), even when contraindicated, such as in the Stroop effect (Stroop, 1935). The transition between these modes of processing requires further study. A related issue that has not received much attention is the visual-perceptual influences on these processes when they are still cognitive, conscious, and analytic. We chose as the substrate for our study the game called Set. For the novice, it requires slow cognitive and conscious analysis of the display. Although these processes gradually become easier, quicker, and more perceptual, here we study conscious and unconscious perceptual influences on performance while the game is in its cognitive stage. Our ultimate goal is to understand better the perceptual mechanisms underlying set recognition as an instance of perceptual processes influencing cognitive tasks in general.

The Set visual perception game, demonstrated in Figure 1A, was invented in the U.S. by Marsha Jean Falco in 1974 (Set Enterprises Inc.; homepage: www.setgame.com). In the original game, each card has four dimensions, or attributes, and one out of three possible values for each, as follows: shape (diamond, ellipse, or wave), number (one, two, or three), filling (empty, striped, or filled), and color (originally, red, purple, and green; in our version, red, blue, and yellow). On each round, 12 cards are displayed on the table or, in our case, on the computer screen.

The goal is to identify a set, defined as 3 cards, being all different or all alike within each dimension, independently of the other dimensions. That is, along each and every dimension, the set either spans all values or has only one value (similarity within that dimension). In general, a valid set will span some dimensions and have similarity for others. Note that for any 2 cards, there is exactly 1 card that completes the set. As an example, in Figure 1A, the rectangles around 3 cards identify a set. There are two other sets present in these 12 cards. Can you find them?
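
The set rule and the uniqueness of the completing card can be written out as a short sketch. The encoding is an assumption made for illustration: each card is a 4-tuple of values 0 to 2, one per dimension (shape, number, filling, color). The function names are likewise hypothetical and are not taken from the study's software.

```python
from itertools import combinations, product

DIMENSIONS = 4  # shape, number, filling, color
VALUES = 3      # three possible values per dimension, encoded 0..2


def is_set(a, b, c):
    """True if, within every dimension, the cards are all alike or all different."""
    # Exactly two distinct values in a dimension means "two alike, one different".
    return all(len({x, y, z}) != 2 for x, y, z in zip(a, b, c))


def completing_card(a, b):
    """The unique card that forms a set with cards `a` and `b`."""
    # With values 0..2, "all alike or all different" is equivalent to the
    # three values summing to 0 modulo 3, which fixes the third value.
    return tuple((-(x + y)) % VALUES for x, y in zip(a, b))


def find_sets(display):
    """All 3-card sets among the displayed cards."""
    return [trio for trio in combinations(display, 3) if is_set(*trio)]


deck = list(product(range(VALUES), repeat=DIMENSIONS))  # the full 81-card deck
third = completing_card((0, 1, 2, 0), (0, 2, 2, 1))
```

In the usage line, the two cards agree in the first and third dimensions and differ in the other two, so the completing card repeats the shared values and supplies the missing value elsewhere.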

Is playing the Set game a perceptual or a cognitive task? At first thought, since the game depends on viewing colorful geometric elements, it might be thought to be a perceptual task. In fact, the commercial version is called “The Family Game of Visual Perception.” But this is a naive view, since the determination of whether cards form a set is a conceptual matter. Thus, the real question is whether the task is purely cognitive or whether there is also a perceptual element (other than the trivial aspect that we need to perceive the cards before we can perform the cognitive processing required to determine which constitute a set). That is, do perceptual characteristics influence the conceptual detection of a set (Goldstone & Barsalou, 1998)?

Copyright 2008 Psychonomic Society, Inc.

Set recognition as a window to perceptual and cognitive processes

Michal Jacob and Shaul Hochstein, Hebrew University, Jerusalem, Israel

The Set visual perception game is a fertile research platform that allows investigation of perception, with gradual processing culminating in a momentary recognition stage, in a context that can be endlessly repeated with novel displays. Performance of the Set game task is a play-off between perceptual and conceptual processes. The task is to detect (among the 12 displayed cards) a 3-card set, defined as containing cards that are either all similar or all different along each of four dimensions with three possible values. We found preference and reduced response times (RTs) for perceiving set similarity (rather than span) and for including cards sharing the most abundant value in the display, suggesting that these are searched preferentially (perhaps by mutual enhancement). RT decreases with number of sets in the display according to a horse race model, implying independence of simultaneous searches. Central cards are included slightly more often, but set card proximity seems irrelevant. A supplementary experiment determining dimensional salience showed consistent but individual preferences, yet these seemed not to affect set identification. Training induced gradual improvement, which generalized to a new version of the game, suggesting high-level learning. We conclude that elements of perception such as similarity detection are basic for finding sets in this task, as in other real-world perceptual and cognitive tasks, suggesting the presence of basic similarity-perceiving mechanisms. The findings confirm the conclusion that conceptual processes are affected by perception.

Perception & Psychophysics
2008, 70 (7), 1165-1184
doi: 10.3758/PP.70.7.1165

M. Jacob, [email protected]


As has already been pointed out, the Set game depends on detection of a mixture or combination of similarity and span. In the dichotomy or continuum between conceptual and perceptual processes, it would seem that similarity is more perceptual and spanning is more conceptual. That is, as the Gestalt psychologists pointed out, similar elements are naturally and quite automatically grouped together perceptually. One of the questions that we address here is whether there is an equivalent mechanism that automatically groups elements that span a dimension.

Another aspect of the Set game is its relationship to the general task of categorization. Categorization is the relating of objects or elements that differ in irrelevant ways, because they are similar in those features that are deemed relevant. Rosch and Mervis (1975) based their very definition of basic-level categories on maximizing similarity among category members, together with maximal dissimilarity between members of different categories, but some differences must remain among members of the same category or they would be identical. Bower and Trabasso (1963) analyzed performance of a concept identification task (where, again, one or more dimensions are relevant and others are present but irrelevant) and showed that in this rule-based task, subjects show all-or-none learning of potential rules (i.e., they “try” one rule at a time, sequentially testing the relevance of each dimension), rather than using all the information presented to them (by experimenter feedback).

Another well-studied example is the Wisconsin card sorting task (WCST; Berg, 1948; Grant & Berg, 1948; Heaton, Chelune, Talley, Kay, & Curtiss, 1993), a very simple categorization task in that only one dimension (of three possible) is considered relevant at any time (with the subject’s task being to find that dimension as it changes occasionally). In the Set game, too, cards must be associated on the basis of their similarity in some dimensions, despite their dissimilarity in others. The tasks are also similar in that they have common dimensions, with several possible values for each: The WCST has three dimensions (color, shape, and number, with four values along each dimension); the Set game has these same dimensions plus the additional dimension of filling (bringing the total to four dimensions, but here with three values along each). The Set game includes an added complication in that the number of dimensions for which there is similarity is not announced in advance and may change from display to display. But variability is inherent in the WCST, too, since the choice of which dimension is relevant changes from time to time without prior notification. For both tasks, there is a binary rule for each dimension and for each trial: Subjects decide which dimension is currently relevant versus irrelevant to the WCST (only one is relevant at each time) or which reflects similarity versus span for the Set game. Thus, in both cases, the subject must be ready for change and must not stick to old habits; both tasks require flexibly adjusting which cues are important in the environment. However, in the Set game, feedback is not essential for playing correctly, whereas in the WCST, the task relies on feedback. Another important difference between the WCST and the Set game is that the goal in the first is to match a card (a sorting task) and the goal in

Figure 1. (A) The four-dimensional three-value Set game. The goal is to identify a set, defined as three cards, all of which are different (span) or alike within each dimension, independently of the other dimensions. Class is defined as the number of dimensions spanned within a set. The marked cards form a set of class 3; that is, they are all blue and span the dimensions of shape, number, and filling. There are two more sets in the display. Can you find them? The numbers above the cards are for identifying their location in the article; they did not appear in the actual task. The most abundant values (MAVs) in this display are red, wave, two, three, and empty, all with a group size (MAV-GS) of five. (B) Display demonstrating the four classes. Class 1 Cards 3, 5, and 11 span the colors, but all contain two (number) filled (filling) ellipses (shape). Class 2 Cards 1, 5, and 12 span the colors and shapes, but all contain two filled items. Class 3 Cards 4, 6, and 12 span the colors, shapes, and filling, but all contain two items. Class 4 Cards 4, 8, and 10 span all four dimensions (i.e., the cards contain exactly one each of all the possible values of each dimension). In this display, the MAV is the number two (cards with two items), with a MAV-GS of eight. (Note that in all, nine cards are included in the four sets, due to overlapping sets, i.e., cards belonging to more than one set.) (C) Generalization version of the game with new values. Sets in the display: Cards 1, 4, and 9 (class 2), 4, 5, and 11 (class 2), 1, 2, and 3 (class 3), and 1, 5, and 10 (class 4). If one of the first two sets seems to you to contain more similarity than the other, this may hint at your individual dimensional salience (see below, Experiment 2). The most abundant value here is three, with a group size of 7. If turquoise (6) or circle (6) seems to you more abundant than the number three, this may hint that the color or shape dimension is more salient than the number dimension.
The additional sets in panel A are Cards 6, 7, and 8 (class 3) and 3, 6, and 9 (class 4).

[Figure 1, panels A, B, and C: Set Game Examples. Each panel is a display of 12 cards, numbered 1 to 12.]


Ahissar, 2002; Oliva & Torralba, 2006; Schyns & Oliva, 1994) and preattentively let the sets pop out. Thus, the Set game provides a window to a “solved” problem, one with a creative solution, that is, for the time being, beyond our comprehension. Our approach includes psychophysical experiments and combinatorial analysis. In Box 1, we present general combinatorics for the Set game.

In Experiment 1, the subjects played the game, and we recorded their choice of cards and timing of performance. We found that the subjects preferred sets of lower class, and we analyzed set perceptual parameters that affected set cognitive search strategy, which set was detected (when more than one was present), and the speed of its being detected, in terms of response time (RT). These included number of sets present in the display, abundance of different values, place effects, and influence of previous card locations.

In Experiment 2, we used another experimental paradigm to determine the relative salience of different dimensions present in the Set game, in a subject-by-subject manner. The results were compared with the prevalence of these dimensions or values in sets found by the same subjects when they played the Set game.

In Experiment 3, we tested the dynamics of learning effects following experience with the game. Training improved performance dramatically in terms of speed of detecting sets. We also tested the effect of training with one version of the game on performance with a different version (new values along the original dimensions; see Figure 1C), that is, the degree of learning generalization and transfer of learning effects. That is, after training has improved performance, will changing stimulus values start learning all over again, or is the ability to identify a set now established, involving higher cortical level mechanisms regardless of specific lower level stimuli (Ahissar & Hochstein, 1997, 2004; Hochstein & Ahissar, 2002), so that search with new values is performed immediately at the lower, posttraining RT?

EXPERIMENT 1 Playing the Set Game

Method

We implemented the Set game with an interactive computer program, allowing us to record subject moves and RTs. On every round of the game, 12 cards were displayed, always including at least one set. The subjects were instructed to mark (via mouse clicks) three cards that formed a set as quickly as possible but to try to avoid mistakes. If the three chosen cards indeed formed a set, a Continue button appeared. Upon pressing of the Continue button, three new cards were dealt in place of the three marked set cards, and a new round began. The replacement cards were taken from the remaining “deck,” with the used set cards being excluded. During the round, if a player changed his or her mind after marking fewer than three cards, the player could “unmark” the cards by a second mouse click. In case the player chose three cards that did not form a set, a “try again” message appeared, along with an explanation of why the chosen cards did not form a set (e.g., “Two are blue and one is red”). If 10 such mistakes occurred, the set was revealed, and the round was counted as unsuccessful. RT was measured from the pressing of the Continue button until the third card of a set was chosen; the subjects were informed of this timing procedure. This process continued, going from round to round, until the entire deck had been used, com-

the second is to detect three cards according to the given rule. The most important aspect of similarity in the two tasks, however, may be the one described in the preceding paragraph—namely, that both are inherently conceptual tasks with a large degree of perceptual influence.

Thus, the Set game may be seen as a categorization task in that subjects have to find three cards that belong together, with similarity and dissimilarity along different dimensions. But Set qualification requires adherence to another rule that is added to the usual similarity requirement: In every dimension for which there is not full similarity, there must be full spanning; that is, the set cards must all be the same, or all different, for each and every dimension. In no case may there be two cards that share one value (say, red) and a third card that differs from them (say, blue). Set recognition is a difficult task because the number of relevant dimensions for which similarity must be found is unknown (and may even be zero) and the other dimensions are not simply irrelevant but need to span the possible values. Adding this simple requirement changes the nature of the task, and the ways in which it does so are the theme of this article.

In regard to detection of similarity versus spanning along the different dimensions, we introduce the term class as the number of dimensions spanned within a set. To elaborate, there are different types of sets, having different numbers of dimensions imposing similarity. We regard the complementary number, the number of dimensions fulfilling the criterion of difference (i.e., spanning all the values along that dimension), as the class of the set. Therefore, the class of a set represents the degree of dissimilarity within the set. For example, in Figure 1A, the marked set belongs to class 3, as does also the set of Cards 6, 7, and 8; the remaining set (Cards 3, 6, and 9) is of class 4. In Figure 1B we present a display that is designed especially to include one set of each class (see the figure caption). One of our goals is to determine whether players have a preference for sets of lower or higher class, that is, whether the ease of detecting a set depends on its degree of similarity (Tversky, 1977).
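The set rule and the class definition can be sketched in a few lines of Python. This is only an illustration, not the program used in the experiments: the encoding of a card as a tuple with one value (1, 2, or 3) per dimension and the names `is_set` and `set_class` are assumptions introduced here.

```python
from typing import Sequence, Tuple

Card = Tuple[int, ...]  # one value (1, 2, or 3) per dimension; this encoding is an assumption


def is_set(cards: Sequence[Card]) -> bool:
    """True if, on every dimension, the cards are all alike or all different."""
    n = len(cards)
    return all(len(set(values)) in (1, n) for values in zip(*cards))


def set_class(cards: Sequence[Card]) -> int:
    """Class = number of dimensions spanned (all values different) within a set."""
    if not is_set(cards):
        raise ValueError("cards do not form a set")
    return sum(len(set(values)) > 1 for values in zip(*cards))


# Hypothetical cards: alike on the first dimension, spanning the other three.
example = [(1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3)]
print(is_set(example), set_class(example))
```

A triple with two cards sharing a value and a third differing on the same dimension fails `is_set`, which matches the "two are blue and one is red" case described in the text.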

For a general game of n dimensions, there are n different classes, 1 . . . n. Thus, in the original game, there are four classes, numbered 1–4. There are no sets of class zero, because, by definition, this would mean that they are similar on all dimensions, or identical, which is not possible, since there are no repeat cards in a pack.

Various strategies may be used to find sets (see Holyoak, 1990, on problem solving). There are the obvious, exhaustive search strategies, of choosing each possible pair of cards and checking whether a complementary third card is present in the display or choosing each possible group of three cards and checking whether they form a set. These strategies require 220 (= $\binom{12}{3}$) operations, which a computer program performs easily but which are not realistic for humans (see an alternative strategy in Box 1).
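As an illustration of the exhaustive strategy, the following sketch tests every 3-card combination in a 12-card display. The tuple encoding of cards, the function names, and the random seed are assumptions made for this example; it is not the authors' program.

```python
from itertools import combinations, product
from math import comb
import random


def is_set(cards):
    # All alike or all different on every dimension.
    return all(len(set(values)) in (1, 3) for values in zip(*cards))


def find_sets(display):
    """Exhaustively test every 3-card combination in the display."""
    return [triple for triple in combinations(display, 3) if is_set(triple)]


deck = list(product((1, 2, 3), repeat=4))  # 81 cards; the encoding is an assumption
random.seed(0)                              # arbitrary seed, for repeatability only
display = random.sample(deck, 12)

n_checked = sum(1 for _ in combinations(display, 3))
print(n_checked, comb(12, 3))               # both equal 220
print(len(find_sets(display)), "set(s) found in this random display")
```

Unlike the experimental displays, a random 12-card sample is not guaranteed to contain a set.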

Obviously, there must be other strategies as well. Thus, although the task is easy from an algorithmic point of view and is readily solvable by a computer program, the exhaustive search strategy is not the way that the human brain computes and reaches a solution (see Ullman, 1984). Ultimately, the best strategy may be not to use a strategy but to catch the gist of the scene (Hochstein &


(9 in one session with 1 or 2 games, 1 in a session with 3 games, and 1 in two sessions). In total, 22 subjects played 125 games, including 2,844 rounds (that is, displays and sets chosen). We eliminated the 1st round in each game because of its exceptional conditions (the subjects had not previously seen any of the cards, whereas on subsequent rounds only 3 cards were new). We also disregarded outlier rounds with exceptionally long RTs (setting a bound at the mean RT plus three times the global standard deviation); 2,664 rounds remained. In 9% of the 2,844 rounds played, the subjects marked 3 wrong cards and were informed they did not form a set; in 11%, they marked 2 cards and then “unmarked” them (including overlapping rounds).
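The outlier criterion (mean RT plus three times the global standard deviation) could be applied as in the sketch below. The text does not say whether the population or the sample standard deviation was used; `pstdev` (population) is an assumption here, as are the function name and the made-up RT values.

```python
from statistics import mean, pstdev


def drop_slow_outliers(rts, n_sd=3.0):
    """Keep rounds whose RT is at most mean + n_sd * SD, computed over all rounds."""
    bound = mean(rts) + n_sd * pstdev(rts)  # population SD is an assumption
    return [rt for rt in rts if rt <= bound]


# Made-up RTs in seconds: twenty ordinary rounds and one very slow round.
rts = [10.0] * 20 + [1000.0]
kept = drop_slow_outliers(rts)
print(len(rts), len(kept))
```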

MATLAB was used for experimental control, data collection, and analysis. We measured RT from display presentation to choice of the third card completing a set, tracking cards, the detected set, and other available sets and their positions, deriving the dependence of preference on the following parameters: similarity (set class), number of sets present in the display, the most abundant value present, the distance of set cards from each other and their location in the display, and the relationship of cards chosen to cards just revealed (i.e., locations used in the preceding set).

pleting a game. Note that in our version of the game (but not in the original), we never presented a display without any set.

Before beginning to play the game, the subjects read an explanation of the task and the definition of a set. They were shown a sample display with one set marked (as in Figure 1) and were asked to find another set in the display (as you may do in Figure 1). Incorrect responses were explained, so that the subjects would understand the nature of a set and that a set might not contain two cards that were similar along any one of the four dimensions (e.g., two cards with striped elements) and only one card that was different on this dimension (e.g., one card that had filled elements). The experimenter verified that they understood the rule before starting the first session.

The subjects participated in several sessions, each lasting about an hour, in which they played several games, as time permitted (according to their individual performance level). Each game included up to 24 rounds (the maximal number in an 81-card pack, displaying 12 cards and replacing 3 each round). Sometimes there were fewer rounds, to avoid occurrences of displays without a set. Eleven subjects participated in three sessions, each playing 3–6 games per session (following the first, slower session); another 11 participated in fewer sessions

Box 1
Combinatorics: General

Combinatorial analysis of the Set game is presented in the relevant parts of the article, revealing a number of game characteristics and explaining individual performance heuristics by comparing behavior with task parameter distributions. Here, we will give only a preface to the combinatorics.

We denote the number of dimensions as d, and the number of values along each dimension as v. Note that v is the size of a set, and d may be considered the complexity of the game.

The total number of (different) cards is $v^d = 3^4 = 81$ in the regular case of a four-dimensional three-valued game. Each combination of two cards uniquely defines a third card for a set, and there is a third card that constitutes the set for any two: There is one and only one card completing a set. This can be elaborated by a vector representation, as follows: Each card is represented by a four-dimensional vector, and in each dimension it receives one of the three possible values. Two cards, for instance, can be [3, 1, 2, 1] and [2, 1, 2, 3]. The vector of the third card of the set is constructed by the following rule: If the two cards have the same value for a certain dimension, use this value also for the third card; if the two cards have different values, use the complementary value. This will always lead to a unique and existent card. In the case of the two cards mentioned above, the third is [1, 1, 2, 2]. The number of all possible sets is $(81 \times 80)/3! = 1{,}080$, since there are 81 possibilities for choosing the first card, one less (80) for choosing the second, and only one way of completing the set. Division by (3!) excludes repetitions.
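The completion rule can be written compactly when values are coded 1, 2, 3, because the complementary value of two different values x and y is then 6 − x − y. The sketch below is an illustration only; the function name `third_card` and the pair-counting check are assumptions added here.

```python
from itertools import combinations, product


def third_card(a, b):
    """Return the unique card completing a set with cards a and b.

    Values are coded 1, 2, 3 on each dimension. If a and b agree, the third
    card takes the same value; otherwise it takes the remaining value, which
    is 6 - x - y because 1 + 2 + 3 = 6.
    """
    return [x if x == y else 6 - x - y for x, y in zip(a, b)]


print(third_card([3, 1, 2, 1], [2, 1, 2, 3]))  # [1, 1, 2, 2], as in the text

# Counting sets: every unordered pair of cards defines exactly one set,
# and each set contains three such pairs.
deck = list(product((1, 2, 3), repeat=4))
n_pairs = sum(1 for _ in combinations(deck, 2))
print(n_pairs // 3)  # 1080
```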

Number of Triplets in the Display

As was mentioned above, the number of possible choices of 3 cards out of 12 is $\binom{12}{3} = 220$. An alternative calculation, choosing 2 cards at a time and verifying that the complementary 3rd is present, sums to the same number of operations, as follows. Choose the first 2 cards and verify whether any of the other 10 cards will complete a set (10 operations). Then, in the inner loop, increment the 2nd card to the 3rd in the display (thus now choosing the 1st and 3rd), leaving 9 cards for verification. Continuing the inner loop results in a decreasing arithmetic series (10 . . . 1) of operations. Now for the outer loop, increment the 1st card to the 2nd in the display, repeating again the inner loop, this time from the 3rd card, giving a decreasing arithmetic series (9 . . . 1), and so on. Thus, the entire process yields $\sum_{m=1}^{10} \sum_{n=1}^{m} n = 220$ operations.
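The operation count for the pairwise procedure can be checked in closed form. The inner sum over n is the triangular number m(m + 1)/2, and summing the triangular numbers up to m = 10 gives the same value as the number of 3-card combinations:

```latex
\sum_{m=1}^{10} \sum_{n=1}^{m} n
  = \sum_{m=1}^{10} \frac{m(m+1)}{2}
  = \frac{10 \cdot 11 \cdot 12}{6}
  = 220
  = \binom{12}{3}
```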

An Alternative Strategy for Finding a Set

In the introduction, we referred to possible exhaustive search strategies for finding sets. An alternative strategy is called dimension reduction. The idea is to choose a dimension (preferably the one that is perceived as most dominant for the player, or one with the most abundant value). On this dimension, denote the three values as x, y, and z. Look at all the cards along this dimension that have the value x, and among them search for a set. This dimension is thus set to be fixed, reducing the number of dimensions by one; that is, the set will have similarity on this dimension with the value x assigned to it. If no such set exists, move on to y and then to z. If, in all cases, no set is found, the conclusion is that the set must span this dimension. So choose the next salient dimension and repeat the process, now looking always for cards spanning the previously chosen dimension(s). If the game is four-dimensional, it is usually enough to reduce one dimension (no need to set two dimensions at a time to be fixed), because it is perceptually quite easy to find a set in three dimensions, especially within the smaller group of cards. This strategy is especially relevant to the impact of the most abundant value, described below.
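A rough algorithmic rendering of dimension reduction is sketched below. It is an interpretation under stated assumptions rather than a model of what players do: dimensions are ordered by the size of their most abundant value (perceived dominance cannot be computed), the search within each value group is a plain combination test, and a final pass for sets spanning every dimension is added here because the text does not describe that last step. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_set(cards):
    # All alike or all different on every dimension.
    return all(len(set(values)) in (1, 3) for values in zip(*cards))


def dimension_order(display):
    """Order dimensions by the size of their most abundant value, largest first."""
    def top_count(d):
        return max(Counter(card[d] for card in display).values())
    return sorted(range(len(display[0])), key=top_count, reverse=True)


def find_set_by_dimension_reduction(display):
    """Return one set (a tuple of three cards) or None."""
    must_span = []  # dimensions already shown to hold no same-value set
    for d in dimension_order(display):
        counts = Counter(card[d] for card in display)
        for value, _ in counts.most_common():  # most abundant value first
            group = [card for card in display if card[d] == value]
            for triple in combinations(group, 3):
                # Prune triples that do not span the previously chosen dimensions.
                spans_earlier = all(len({c[k] for c in triple}) == 3 for k in must_span)
                if spans_earlier and is_set(triple):
                    return triple
        must_span.append(d)
    # Added fallback (assumption): any remaining set spans every dimension.
    for triple in combinations(display, 3):
        if is_set(triple):
            return triple
    return None


display = [(1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3), (2, 1, 2, 3)]
print(find_set_by_dimension_reduction(display))
```

In this small display, three cards share the value 1 on the first dimension, so the search starts within that group and finds the set there.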


pirical distribution of rounds (gray bars) and occurrences (white bars) of sets of different classes, and in Figure 2C, in which we show the number of detected sets as a fraction of these distributions.

Another way of testing preference for sets of different classes is to analyze the relative place of chosen sets among available sets on that round. We plot the probabilities that a set would be chosen when there were other sets present on that round of either lower (Figure 2D) or higher (Figure 2E) class (the abscissa reflects the number of sets present on each round from classes that were lower or higher than the one chosen, and the ordinate is the number of occurrences of such choices, normalized to the number of opportunities for such a choice, i.e., the number of rounds on which it was possible to choose a set and leave that number of sets of lower or higher class). Note that Figure 2E is flat, reflecting an equal probability of leaving any number of higher class sets: The players were indifferent to the presence of higher class sets. In contrast, Figure 2D declines with the number of lower class sets. Taken together, these graphs demonstrate a preference for lower class sets, that is, for sets with more similarity. We conclude that sets are found on the basis of a similarity-detecting process, presumably a basic perceptual mechanism.

Recent results in the realm of categorization also point to the use of underlying mechanisms that perceive simi-

Results and Discussion

Similarity

As was mentioned in the introduction, we were interested in the impact of the class of a set on the speed with which it was detected, as well as the choice of it over sets of other classes when more than one class was present. Figure 2A shows mean RT (±SE) by class. On average, the lower the class, the shorter were the RTs. Figure 2B (black bars) shows the number of sets detected from each class. Clearly, more sets were detected from higher classes (up to class 3). Does this trend reflect a true preference on the basis of set class, or does it simply reflect their abundance? To answer this question, we compute in Box 2 how many sets of each class are available.

Consider the results of Figure 2B, looking at the sets detected from each class, but now taking into consideration the relationships among the number of sets available of each class. Only twice as many sets were detected from class 2 as from class 1, even though combinatorially there are 3 times as many sets from class 2. Class 3 sets were detected just slightly more often than those of class 2, although there are 1.33 times as many class 3 sets as class 2 sets (4 times the number of class 1 sets). Class 4 sets were detected slightly less often than those of class 1, although there are twice as many such sets. Thus, there seems to have been a preference for lower class sets. This is also demonstrated in Figure 2B, in which we show the em-

Box 2
Combinatorics: Division Into Classes

The general equation for the total number of sets of class i, with d dimensions and three values, is

$$\frac{3^d \cdot \binom{d}{i} \cdot 2^i}{3!};$$

there are $3^d$ possible choices of the first card, $\binom{d}{i}$ possibilities to choose which $i$ dimensions to change for the second card, and $2^i$ variations of changing the $i$ dimensions. The third card is determined uniquely by the first two, and $3!$ again excludes repetitions.

Summing over $i$, this expression gives the total number of sets, $[3^d (3^d - 1)]/3!$. In particular, for $d = 4$,

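Number of class 1 sets:

$$\frac{81 \cdot \binom{4}{1} \cdot 2^1}{3!} = 108 \equiv x;$$

Number of class 2 sets:

$$\frac{81 \cdot \binom{4}{2} \cdot 2^2}{3!} = 324 = 3x;$$

Number of class 3 sets:

$$\frac{81 \cdot \binom{4}{3} \cdot 2^3}{3!} = 432 = 4x;$$

Number of class 4 sets:

$$\frac{81 \cdot \binom{4}{4} \cdot 2^4}{3!} = 216 = 2x.$$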

Note that these numbers sum to $1{,}080 = (81 \times 80)/3!$.
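The class counts can be cross-checked by brute force, enumerating every 3-card combination of the 81-card deck and comparing the tallies with the general formula. This is a verification sketch added for illustration; the card encoding and names are assumptions.

```python
from collections import Counter
from itertools import combinations, product
from math import comb, factorial

d = 4
deck = list(product(range(3), repeat=d))  # 81 cards; the encoding is an assumption


def spanned_dims(triple):
    """Number of spanned dimensions if the triple is a set, else None."""
    sizes = [len(set(values)) for values in zip(*triple)]
    if any(size == 2 for size in sizes):
        return None  # two alike and one different on some dimension
    return sizes.count(3)


enumerated = Counter()
for triple in combinations(deck, 3):
    c = spanned_dims(triple)
    if c is not None:
        enumerated[c] += 1

formula = {i: 3**d * comb(d, i) * 2**i // factorial(3) for i in range(1, d + 1)}
print(dict(sorted(enumerated.items())))  # {1: 108, 2: 324, 3: 432, 4: 216}
print(formula)                           # the same counts from the general formula
```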


[Figure 2: Set Class Impact on RT, Detection, and Preference. Panel A, Mean RT vs. Classes: mean RT (sec) by class. Panel B, Set Detections: occurrences by class (detected, rounds, appearances). Panel C, Fraction of Detections: fraction by class (detections from rounds, detections from appearance, chance level). Panels D and E, Order of Chosen Set’s Class Among Available Sets: occurrences (fraction) by number of lower class sets (D) and number of higher class sets (E).]

Figure 2. Influence of similarity on set detection, with data for 22 subjects, 125 games, and 2,664 rounds: The effect of the class of a set (number of dimensions with span, rather than similarity) on detection speed, as well as on its preference over sets of other classes when more than one set of different classes was present. (A) Mean response times (RTs) for detecting a set by class number; error bars here and in the other figures are the standard errors of the mean (SEMs). (B) Distribution by class of detected sets (black bars), number of rounds in which such sets appeared (gray), and total number of appearances (white), including possibility of more than one set in a display. (C) Set detections by class as a fraction of rounds (solid curve; black/gray bars in panel B) and as a fraction of total appearances (dashed curve; black/white bars in panel B), both showing a decrease as the class increases. For comparison, we show the chance probability (dotted curve) of choosing each class (which follows the pattern calculated by the combinatorics in Box 1). (D and E) Order of chosen sets among all available sets, demonstrating preference for sets of lower classes. Only (1,866) cases with at least two sets from different classes were included. (D) Number of choices of a set as a function of the number of lower class sets present on that round, normalized to the number of opportunities for such a choice. (E) Normalized number of choices as a function of higher class sets present. The decreasing distribution for lower class sets and the independence of higher class sets support a preference for finding lower class sets.


to synergistic neural summation (rather than their being independent), the result will be an enhanced redundancy gain, which is greater than simply probability summation. On the other hand, there can be interference between the processes. These properties will be true both for performance accuracy (Graham, 1989) and for RT (Raab, 1962). We extend the horse race model to N processes and apply this model to set detection; thus, we view the different sets present in a display as competing with each other to reach detection.

We wish to derive a theoretical graph describing mean RT versus the number of sets present in the display, as expected by probability summation (that is, the horse race model), and to compare it with the empirically measured RT dependence. As a first step, we measure the empirical (binned) distribution of RTs, p(t), for detecting a set when only one set is present in the display. This is shown in the upper left curve of Figure 3B. The horse race model procedure is then to assume that the time for detecting either of the sets when there are two available is the minimum of the times for detecting each. Thus, the new RT distribution for two sets is formed by taking a pair of time bins, t1 and t2, having individual probabilities, p(t1) and p(t2), with a joint probability of p(t1) · p(t2), and assigning it to the time bin of the minimum of t1 and t2. The mean RT according to the horse race model, when there are two sets, will then be

$$\sum_{t_1 = 0} \sum_{t_2 = 0} p(t_1) \cdot p(t_2) \cdot \min(t_1, t_2).$$

By induction, the same procedure is used to derive the RT distribution when there are three or more sets present and, from these, the expected mean RT for each number of sets. If the processes are indeed independent, the observed RT distribution will fit this theoretically created distribution; if there is neural summation and, therefore, enhanced redundancy gain, there would be a shift to the left in the observed distribution (shorter RTs); if there is interference, there would be a shift to the right (slower responses).
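The procedure described above can be sketched as follows: the joint probability of each pair of time bins is assigned to the bin of the minimum, and the step is repeated to add further sets. The single-set distribution and bin centers below are invented for illustration and are not the measured data; the function names are assumptions.

```python
from typing import List


def min_distribution(p: List[float], q: List[float]) -> List[float]:
    """Distribution of min(T1, T2) for independent binned RTs T1 ~ p and T2 ~ q.

    Both lists are assumed to use the same time bins.
    """
    out = [0.0] * len(p)
    for t1, p1 in enumerate(p):
        for t2, p2 in enumerate(q):
            out[min(t1, t2)] += p1 * p2  # joint probability goes to the earlier bin
    return out


def horse_race(p_single: List[float], n_sets: int) -> List[float]:
    """Predicted RT distribution with n_sets independent, identically distributed racers."""
    dist = list(p_single)
    for _ in range(n_sets - 1):
        dist = min_distribution(dist, p_single)
    return dist


def mean_rt(dist: List[float], bin_centers: List[float]) -> float:
    return sum(prob * t for prob, t in zip(dist, bin_centers))


# Hypothetical single-set RT distribution over four bins (centers in seconds).
p_single = [0.1, 0.2, 0.3, 0.4]
bin_centers = [5.0, 15.0, 25.0, 35.0]
for n in range(1, 5):
    print(n, round(mean_rt(horse_race(p_single, n), bin_centers), 2))
```

Under independence, the predicted mean RT falls as the number of sets grows; a faster observed decline would point to neural summation and a slower one to interference, as described in the text.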

Experimental results are compared with predictions of the horse race model in Figures 3B and 3C. We find a good fit between the empirical results and the model predictions, suggesting that the model can account for our results. The implication of the success of the horse race model is independence (rather than synergy or interference) of the processes of detecting each set when there is more than one present.

Most Abundant Value

The relative abundance of the values along each dimension within the array of 12 cards presented on each round may influence which set is chosen and the RT for finding it. We are particularly interested in the cards belonging to the largest of these groups of values, which we call the most abundant value (MAV), and the number of cards in this group, called the MAV group size (MAV-GS). Since similarity was shown to be easier to perceive, it may be that subjects search among the cards that belong to the largest group with the same value along one dimension, that is, the cards that share the MAV.
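The MAV and MAV-GS of a display can be computed by counting how many cards carry each value on each dimension. The sketch below is illustrative; the card encoding, the function name, and the small example display are assumptions. Ties are returned together, since a display may have several values sharing the largest group size (as in Figure 1A).

```python
from collections import Counter


def most_abundant_values(display):
    """Return (MAV-GS, [(dimension, value), ...]) for a display of cards."""
    counts = Counter((d, v) for card in display for d, v in enumerate(card))
    group_size = max(counts.values())
    mavs = sorted(key for key, n in counts.items() if n == group_size)
    return group_size, mavs


# Hypothetical 4-card display; a real display has 12 cards.
display = [(1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3), (2, 1, 2, 3)]
print(most_abundant_values(display))  # (3, [(0, 1)])
```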

larity. It was found that the learning of categories is easier and more natural when one learns from exemplar pairs that belong to the same category than when one learns from pairs that belong to different categories (Hammer, Hertz, Hochstein, & Weinshall, 2005, 2007, in press), even when the pairs are preselected to contain the same amount of information. In addition, children are even more biased toward learning from same-class pairs (Hammer, Diesendruck, Weinshall, & Hochstein, 2008). Similarity is also the basis of Gestalt principles of grouping (Koffka, 1935; Köhler, 1929). Although the Set game is not a usual categorization task or a standard grouping phenomenon, it is interesting to speculate that the same basic mechanisms may underlie all these processes. In the introduction, we compared Set detection with the WCST and the concept identification task. Despite the differences mentioned there, the many similarities between these tasks in terms of the perceptual dimensions used and, more significantly, the need to choose which dimensions are relevant for any particular trial and to repress this choice for the following trial suggest that the same similarity-detecting mechanism may play an important role in all these tasks. This congruence of findings suggests that despite the cognitive nature of the tasks, perceptual mechanisms play an important and essential role.

Finally, it is worthwhile noting that the preference for similarity was not learned during the game, because, in the game itself, there was actually a greater abundance of higher class sets (see Figure 2B). Thus, the bias toward detecting lower class sets must reflect an innate or prior preference, perhaps stemming from the tendency for real-world categories to be organized around similarities, rather than around differences.

In summary, we found that sets from lower classes were detected more quickly and recognized more often (relative to their availability), that is, with priority when more than one set was present, suggesting that greater similarity is a factor in set detection.

RT by Number of Sets Present

An obvious parameter that might have influenced the speed of detecting a set is the number of sets simultaneously present in the display, although the subjects were not informed of this number. Calculation of the expected distribution of number of sets is nontrivial, so we determined the empirical distribution in our experiment, as displayed in Figure 3A. Recall that in our experiment, we rejected displays without sets, although the original game does not.

Horse race model (theoretical analysis). The horse race model (Miller, 1982, 1986; Raab, 1962; Townsend & Ashby, 1983) deals with processes that compete with each other. When there is more than one stimulus competing for our attention, the result can be performance facilitation according to probability summation (Graham, 1989; Monnier, 2006), resulting from the processes being independent (Corballis, 1998; Monnier, 2006). Another term for that same phenomenon is redundancy gain (Corballis, 1998; Egeth & Mordkoff, 1991; Garner & Lee, 1962). Alternatively, on the one hand, if the processes give rise


[Figure 3 graphic: RT Dependence on Number of Sets Present. (A) Distribution of Number of Sets: Fraction vs. Number of Sets Present. (B) Distribution of RTs, Observed and Expected, for 1 to 6 sets: Probability vs. RT Bins (sec); legend: model predictions, actual distribution. (C) Mean RT vs. Number of Sets in the Display: Mean RT (sec) vs. Number of Sets; legend: horse race model predictions, actual mean RTs.]

Figure 3. Influence of number of sets simultaneously present in the display, comparing experimental results with predictions of the horse race model. (A) Empirical distribution of the number of sets in the display (for the 2,664 rounds; recall that in the experiment, displays contained one or more sets). (B) Observed (solid blue) and expected (dashed red) response time (RT) distributions according to the horse race model for one to six sets present. As a first step, we measure the empirical distribution of RTs for detecting a set when only one set is present in the display (upper left curve). From this, we derive the expected distribution of RTs for two sets (see the text). By induction, the same procedure is used to derive the expected RT distribution when there are three or more sets present. (C) Mean RT by number of sets in the display, observed and expected. Compare the experimental results with the horse race model predictions, that is, performance facilitation according to probability summation, indicating independent processes. The alternatives, neural summation (enhanced redundancy gain) or interference, would have resulted in a shift of the graphs to the left or the right, respectively. The results show neither of these effects but, rather, a good fit to the model predictions, suggesting independence of search for different sets.
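The induction described in the caption can be sketched in Python. This is only a minimal illustration, assuming that RTs are binned into a discrete probability vector and that the searches for different sets are independent, so that the observed RT is the fastest of several independent single-set RTs. The function names and the toy single-set distribution are hypothetical; they are not the authors' code or data.

```python
def min_rt_pmf(single_pmf, n_sets):
    """PMF of the fastest of n_sets independent detection times.

    single_pmf[i] is the probability that a lone set is detected in RT bin i.
    Under independence, P(min > bin i) = P(single > bin i) ** n_sets.
    """
    survival = []
    remaining = 1.0
    for p in single_pmf:
        remaining -= p
        survival.append(max(remaining, 0.0))  # guard against float round-off
    pmf = []
    previous = 1.0
    for s in survival:
        current = s ** n_sets
        pmf.append(previous - current)
        previous = current
    return pmf


def mean_bin(pmf):
    """Mean RT expressed in bin indices."""
    return sum(i * p for i, p in enumerate(pmf))


# Hypothetical single-set RT distribution over five bins (not measured data).
single = [0.10, 0.20, 0.30, 0.25, 0.15]
two_sets = min_rt_pmf(single, 2)
three_sets = min_rt_pmf(single, 3)
print(mean_bin(single), mean_bin(two_sets), mean_bin(three_sets))
```

With more sets in the race, the predicted distribution shifts toward shorter RTs, which is the probability-summation facilitation against which the observed curves are compared.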


in accord with the computed distribution in Figure 4A). The most common MAV groups are of six and seven cards. Within this MAV-GS distribution, those groups that included sets (2,251 of 2,664 rounds; 84.5%) are distributed as plotted in Figure 4B (dashed curve). The distribution of MAV-GS within which a set was actually detected by the player (1,619 of 2,251 rounds; 72%) is shown in Figure 4B (solid curve).

Now we average the fraction of rounds for each game in which the subjects detected a set among the MAV cards, out of the number of rounds in which such a set was present (Figure 4C, solid curve; error bars indicate standard errors). An almost monotonic increase is observed. That is, there was a trend for the number of discovered sets to increase within the MAV group as its size increased. This finding seems to suggest that the presence of more cards in the MAV group leads to a better probability of detecting a set within it. However, this may be misleading, since we should consider only cases in which there was a choice. Perhaps when the MAV group is large, that value is so dominant that there is no set without similarity in it. In this spirit, we add the corresponding chance level of finding a

For example, in Figure 1B, the MAV is the number two (i.e., cards with two items) with a MAV-GS of eight (i.e., there are eight cards with two items; Cards 3, 5, and 11, Cards 1, 5, and 12, and Cards 4, 6, and 12 form sets from within the MAV group; Cards 4, 8, and 10 form a set outside the MAV group); in Figure 1C the MAV is the number three, with a MAV-GS of seven (Cards 4, 5, and 11 form a set from within the MAV group; Cards 1, 2, and 3, Cards 1, 4, and 9, and Cards 1, 5, and 10 form sets outside the MAV). In Figure 1A, the MAV-GS is five, and there are several groups with this size sharing some value: red (Cards 6, 7, and 8 form a set within this MAV group), wave, two, three, and empty.

Reviewing these terms, MAV refers to the most abundant value itself, a MAV group is the group of cards with that (most abundant) value, and MAV-GS is the actual size of the group.

In Figure 4, we demonstrate the stages in determining whether sets among the most abundant cards are preferred. In Box 3 we derive the theoretical MAV-GS distribution shown in Figure 4A (solid curve). Figure 4B shows the empirical MAV-GS distribution (dotted curve;

Box 3 Combinatorics: Distribution of Most Abundant Value Group Size (MAV-GS)

In any dimension, the cards on display may include one, two, or all three values, so that the group of cards with a particular value of a particular dimension may include anywhere from 0 to 12 cards. Calculation of the combinatorial statistics of occurrences of each MAV-GS is done according to the following equation, stating the probability of having three groups of sizes k, l, and (n − k − l) out of n cards:

$$\frac{n!}{k!\, l!\, (n-k-l)!}\cdot\left(\frac{1}{3}\right)^{n}.$$

This is a probability, so that the sum of all possible distributions is 1. To satisfy this constraint, the first part of the following function (the factorials) must sum to 3^n. For example, for n = 12, the sum must be 3^12 = 531,441, as it is.

$$\sum_{k=0}^{n}\sum_{l=0}^{n-k}\frac{n!}{k!\, l!\, (n-k-l)!}\cdot\left(\frac{1}{3}\right)^{n} = 1.$$

We construct all possible series of three groups in the display, with the total number in the three groups summing to n, the number of cards in the display; for example, for 12 cards, the following series would exist: (12, 0, 0), (11, 1, 0), (10, 2, 0), (10, 1, 1), and so on. Each series can appear, in a permutation, one, three, or six times (when there are three, two, or no repeated values, out of three, respectively); for example, the series above would appear three, six, six, and three times, respectively. The probability of each series is calculated by the equation above (with n being 12, and k and l representing two out of the three group sizes), multiplied by the number of permutations. Taking the largest value in each series and summing the probability of the related series yields the probability of each MAV-GS (the number of cards in the most abundant group), but only in one dimension (Figure 4A, dashed curve).

To calculate the probability for each MAV-GS when there are four dimensions, we construct a series of largest values for each dimension. Then we calculate the probability of each such series, according to the previously calculated probability of each value, and accumulate this to the probability of the largest value in the series (similar to the calculation for the shortest RT in the horse race model). Now we have the probability vector of each MAV-GS for four dimensions (Figure 4A, solid curve). The graph shows a shift to the right when the number of dimensions is raised, because the probability of having a low value as the most abundant in all dimensions becomes less likely the more dimensions there are. Each probability vector of course sums to 1.

A group of three of the same value cannot be the largest such group, because then there will be a group of at least (12 − 3)/2 = 4.5, meaning of at least five. Although the most abundant value can be a group of four cards, because the cards in a certain dimension can be divided into 4–4–4 groups of values, the probability of this occurring actually approaches zero, since this 4–4–4 division would have to apply to all four dimensions.
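The calculation in Box 3 can be sketched in Python as a check on the reasoning. This is a minimal illustration under the same simplifying assumptions the box appears to make: each card takes one of three values with equal probability, independently of the other cards, and the four dimensions are independent of one another. The function names are hypothetical.

```python
from math import factorial


def one_dimension_pmf(n=12):
    """P(largest value group has size g), g = 0..n, for a single dimension."""
    pmf = [0.0] * (n + 1)
    for k in range(n + 1):
        for l in range(n - k + 1):
            m = n - k - l
            # Multinomial count of card arrangements with group sizes (k, l, m).
            ways = factorial(n) // (factorial(k) * factorial(l) * factorial(m))
            pmf[max(k, l, m)] += ways / 3 ** n
    return pmf


def max_over_dimensions(pmf, dims=4):
    """PMF of the largest of `dims` independent draws from `pmf`."""
    result = []
    cdf = 0.0
    previous = 0.0
    for p in pmf:
        cdf += p
        current = cdf ** dims  # P(all dimensions have MAV-GS <= g)
        result.append(current - previous)
        previous = current
    return result


one_dim = one_dimension_pmf(12)
four_dim = max_over_dimensions(one_dim, 4)
for size in range(4, 13):
    print(size, round(one_dim[size], 4), round(four_dim[size], 4))
```

Looping over every ordered (k, l) pair counts the permutations of each series automatically, so no separate multiplication by one, three, or six is needed. Under these assumptions a MAV-GS of four has probability 34,650/531,441 (about .065) in one dimension and roughly 2 × 10^-5 in four dimensions, in line with the remark that it approaches zero.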


[Figure 4 graphic: Dependence on Most Abundant Value Group Size (MAV-GS). (A) MAV-GS Distribution (Theory): Probability vs. MAV-GS; legend: 1 dimension, 4 dimensions. (B) MAV-GS Distributions (Actual): Occurrences vs. MAV-GS; legend: MAV, MAV with set, MAV with detected set. (C) Detections vs. Chance Level: Fraction vs. MAV-GS; legend: detections from existence, chance level. (D) RT Analysis: Mean RT (sec) vs. MAV-GS; legend: No MAV, Only MAV, Miss MAV, Hit MAV (+alternative).]

Figure 4. Most abundant values (MAVs). (A) Theoretical distribution of MAV group size (MAV-GS) of values, in one dimension only (dashed line) and in four dimensions (solid line); see the text (Box 3) for derivation. With more dimensions present, there are more chances for larger group sizes to appear. (B) Empirical MAV-GS distribution (dotted curve, similar to the theoretical distribution in panel A), distribution for largest groups including a set (as occurred on 2,251 of 2,664 rounds, 85%; dashed curve), and distribution for largest groups including a detected set (as occurred on 1,619 of the 2,251 rounds above, 72%; solid curve). (C) Mean fraction of rounds over all games in which subjects detected a set among the MAV cards, out of the rounds with such a set (solid curve), as compared with chance level (dashed curve), calculated (for each MAV-GS separately) by averaging the fraction of MAV sets from the total number of sets in the display, over all rounds. Actual findings are somewhat above the chance level, implying preference for detecting sets within the MAV group. (D) Mean response times (RTs) for detecting sets for four cases: when there was a set within the MAV and outside it (solid curves) and the detected one was within (black squares) or outside (gray diamonds); when there was a set only within the MAV (dashed black curve, squares) or only outside it (dashed gray curve, diamonds). Shorter RTs for sets within the MAV suggest that subjects show a preference for these more salient cards. Declining RTs with increases in MAV-GS suggest that MAV salience increases with group size.


MAV group (solid black curve, squares) or outside it (solid gray curve, diamonds). Note that RTs are shorter for sets within the MAV group and that they decrease with increasing size, implying that the task becomes somewhat easier as the MAV-GS increases. We interpret these results as deriving from a preference in searching within the MAV group, so that sets there are detected more quickly. There are longer RTs for detecting sets outside the MAV group, not influenced by group size. Presumably, subjects waste time looking for a set within the MAV (so a set is detected

set within the MAV group (dashed curve). Experimental findings follow the chance-level increase with increased MAV-GS but are always somewhat above the chance level, suggesting that subjects indeed look preferentially within the MAV group, no matter what its size.

We also examined mean RT of detecting sets for different MAV-GSs, as plotted in Figure 4D. We examined four cases and compared them pairwise. When there is a set within the MAV group and another outside it, we can compare RTs when the detected set is entirely within the

[Figure 5 graphic: (A) Supplementary Experiment: Detection of Most Abundant Value. (B) Distribution of Cases; Average Results: Occurrences vs. MAV-GS; legend: presence, hits, chance. (C) Normalized Hits Fractions: Fraction vs. MAV-GS.]

Figure 5. Supplementary task asking subjects to detect, “What is the most abundant value?” (A) Presentation of values for subject choice, following presentation of regular display of 12 cards (as in Figure 1). Each subject performed 40 rounds of this task. (B) Average results for 4 subjects as a function of most abundant value group size (MAV-GS): Number of correct answers (solid curve), as compared with number of occurrences (dashed curve) and with chance level (dotted curve), taking into consideration that several values can be MAVs in the same display. (C) Correct answers as fraction of distance from chance to number of occurrences of this MAV-GS, calculated by (x − x_0)/(x_max − x_0), where x = hits, x_0 = chance, and x_max = presence. Thus, subjects had a good notion as to which was the MAV and could have used this information in playing the real game.


displays. Nevertheless, even though the MAV might not have been declared correctly (in the supplementary experiment), this still does not mean that this information was not used (even if unconsciously) for directing and speeding up set detection, as was found above. There is a difference between using information and being able to report it.

Place Effect

Do subjects find sets with cards close to each other more quickly and more often than they find sets with cards that are far from each other? As the total distance between the three cards of a set, we used the sum of the three Euclidean distances between each of the three pairs in the set (i.e., the triangle perimeter), as demonstrated in Figure 6A, in terms of the (vertical or horizontal) unit distance between adjacent display locations. There are 24 discrete distances.
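The distance measure can be illustrated with a short Python sketch. It assumes that the 12 cards lie on a 3 × 4 grid with unit spacing (the layout is an assumption here, although it is consistent with the center distances shown in Figure 7B) and that collinear triples count as degenerate triangles. The variable names are hypothetical.

```python
from itertools import combinations
from math import hypot

ROWS, COLS = 3, 4  # assumed display layout: 3 rows by 4 columns


def perimeter(a, b, c):
    """Sum of the three pairwise Euclidean distances between card locations."""
    return (hypot(a[0] - b[0], a[1] - b[1])
            + hypot(b[0] - c[0], b[1] - c[1])
            + hypot(a[0] - c[0], a[1] - c[1]))


locations = [(row, col) for row in range(ROWS) for col in range(COLS)]
perimeters = [perimeter(*triple) for triple in combinations(locations, 3)]

# Round before de-duplicating so float noise does not split equal perimeters.
distinct = sorted({round(p, 9) for p in perimeters})
print(len(perimeters), len(distinct), distinct[0], distinct[-1])
```

Under these assumptions the 220 possible card triples yield 24 distinct perimeters, ranging from 2 + √2 (three mutually adjacent cards forming an L shape) to 5 + √13 (three corners of the display).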

Figure 6B shows the combinatorial (dashed line) versus the actual detection (solid line) probability of appearance of each distance. They show a very good fit. When they are plotted one versus the other (Figure 6C), the regression line has a slope of approximately 1 (m = 1.04). We conclude that there was not much effect of the distance between cards (or the frequency of a particular distance). In addition, we find that there was no dependence of mean RT on distance (not shown). Taken together, these results suggest that subjects are able to perceive many cards at a single glance or that the order of scanning was hopping from place to place (perhaps on the basis of an attribute) and not necessarily to contiguous regions.

Influence of Location in the Display

Are there favored locations, which are more easily perceived by subjects? We analyzed the influence of a set including one (or both) of the two central locations. Figure 7A shows the frequency with which subjects actually chose a set with a card in each of the 12 locations (when there was more than one set in the display). As can be seen in the figure, the two central locations were slightly favored.

Figure 7B shows the frequency of choosing a set with a card as a function of the card’s distance from the center of the display. There was a small, although nonsignificant, dependence, so that locations further from the center were favored less. We conclude that only the very central locations are somewhat favored.

Influence of Previous Set Card Locations

We wished to ensure that the locations of the cards included in the previously detected set did not affect triggering of the next set. More attention may have been paid to these locations because, here, the cards were replaced with new ones, while others remained from the preceding trial. On the other hand, other cards were already familiar, so that less attention may have been paid to the new cards. On the basis of 2,664 rounds, in .89 of the cases (2,372 rounds), there was a set that included one or more of these replaced card locations. This resembles the theoretical probability of .88, which we derive in Box 4.

Of these 2,372 rounds, in .506 of the cases, there existed at least one additional set, not including any previous location. We will consider only these 1,201 cases in which the subjects had a choice.

outside the MAV only when it “can match the competition” of those inside). The MAV group itself may be more salient when it is larger. These suggestions are reinforced by comparing RTs when there is a set only within the MAV (dashed black curve, squares) or only outside it (dashed gray curve, diamonds). Again, for sets within the MAV group, RT decreases with increases in group size, that is, with MAV salience. As for sets outside the MAV group, RT increases with increases in group size (rather than being flat, like the solid gray curve). This increase may hint at search tactics, implying that the MAV cards distracted the search, by attracting attention to them, and only when a set was not found among them was a further search made.

We conclude that subjects preferentially search for sets within the MAV group, especially when the MAV-GS is large. Detecting a set within the MAV is faster than detecting one outside the MAV. Since a MAV group is a group of cards that are similar in a certain value, the larger the MAV-GS, the greater the degree of shared similarity, in the sense that there are more possible triplets with this dimensional similarity. This result therefore confirms the preference for similarity.

Perception of the MAV. In order to determine whether using the MAV is at all a feasible strategy for finding a set, we performed a supplementary experiment, testing the accuracy of detecting and reporting what is the MAV. This experiment was performed only after the subjects had played the regular game, in order not to influence the way in which they would play the game.

Twelve cards were displayed, as in the Set experiment, except that here, the display lasted only 5 sec. The task was to state which value was the most abundant. After the display disappeared, the 12 possible values were presented (as shown in Figure 5A), and the subject chose one, without time limitation. After the choice was made, another 12 cards were shown, with all the cards replaced (and not only 3, as in the regular game), so that the abundance of the values was entirely refreshed. Each subject performed 40 rounds of this task.

Average results for the 4 subjects are shown in Figure 5B. For each MAV-GS, we plot the average number of correct answers (solid curve), the average number of occurrences of this MAV-GS (dashed curve), and the chance level for correct answers (dotted curve), taking into account that several values could be the most abundant in the same display (thereby increasing the chance that one of them would be chosen). The average over subjects of correct answers is 22.25 (56%), nearly double the chance level of performance, which is ~12.5% (5/40) in total. Figure 5C shows correct answers as a fraction of the distance between the other two, by calculating (x − x_0)/(x_max − x_0), where x, x_0, and x_max are the measured correct answers, chance level, and total occurrences for each MAV-GS, respectively. There is a major increase from a MAV-GS of 5 to a MAV-GS of 6, and then it is pretty stable up to a MAV-GS of 9, with a value of ~0.5, that is, halfway between chance and the maximal possible number of correct answers.
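The normalization used for Figure 5C can be written out as a small Python sketch. The function name is hypothetical, and the per-group counts in the example call are invented purely to show the scale of the measure; only the overall figures (22.25 correct answers out of 40 rounds) come from the text.

```python
def normalized_hits(hits, chance, presence):
    """(x - x_0) / (x_max - x_0): 0 at chance level, 1 when every occurrence is a hit."""
    return (hits - chance) / (presence - chance)


# Overall proportion correct reported in the text: 22.25 of 40 rounds.
overall_correct = 22.25 / 40

# Hypothetical counts for one MAV-GS: 10 occurrences, 2 expected by chance, 6 hits.
example = normalized_hits(hits=6.0, chance=2.0, presence=10.0)
print(round(overall_correct, 3), example)
```

A value of 0.5 on this scale, as in the invented example, corresponds to performance halfway between chance and the maximal possible number of correct answers.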

Thus, the subjects had some notion of what was the most abundant value in the display, even though they were able to declare what it was in only a bit above half of the


[Figure 6 graphic: Dependence on Distance Between Set Cards. (A) Calculation of Distance Between Cards. (B) Distribution According to Inner Distance: Fraction vs. Euclidean Distance Within the Set (Triangle Perimeter); legend: combinatorial, detections. (C) Actual Findings vs. Combinatorial Probability: Detection Fraction vs. Combinatorial Probability; regression annotation: m = 1.04, y = −0.002.]

Figure 6. Effect of distance between cards. (A) Measurement technique. Calculation of distance between cards is done by counting as one unit the distance between two adjacent cards and summing all three distances, the triangle perimeter. There are 24 discrete distances, each appearing with a different probability. (B) Combinatorial distribution of distances (dashed lines) and actual distribution of detections (solid lines). Only the (2,194) cases with at least two sets are considered. (C) Probabilities of detecting each distance versus combinatorial probabilities of presence of this distance. Note the good fit with a slope of 1, suggesting that there is no influence of distance among set cards on the probability of finding the set.


tive to their availability)—that is, with priority when more than one set was present simultaneously—suggesting that sets of lower classes are detected more easily and that greater perceptual similarity is a factor in set detection.

We found that the larger the number of sets present, the shorter the RTs. There was a good fit between the horse race model predictions and the actual results, suggesting that the model can account for our results. The implication of the success of the horse race model is independence (rather than synergy or interference) of the processes of finding each set when there is more than one present.

The MAV group may have been more salient when it was larger, reflected by decreasing RTs with increasing MAV-GS and by more distraction when there was no set there. There was some evidence of a preference in searching within the MAV group, supported by detections from the MAV group above chance level and by shorter RTs for sets within the MAV; presumably, the MAV cards distract the search, by attracting attention to them, so a set is detected outside of the MAV only when it “can match the competition” of those inside; or, in cases in which there is no set within the MAV cards, only when a set is not found among them is a further search made. This is the probable search strategy used. In detecting and reporting what was the MAV, the subjects’ responses were halfway between chance and the maximal number of correct answers.

There was not much effect of the distance between the cards, suggesting that the subjects were able to perceive many cards at a single glance or that the order of scan was hopping from place to place, and not necessarily to contiguous regions. On the other hand, there was a slight preference for sets including (one of) the two central cards, those in the middle of the display.

There was some triggering by newly placed cards (previous set card locations); in a situation of choice, random behavior would predict a probability of 58% of choosing a set including a previous location, versus an actual occurrence of 67% of the cases. But this was still much less than it could have been (i.e., up to 100%).

EXPERIMENT 2 Dimensional Salience

If subjects find sets by first identifying similar cards (as is suggested by the results of Experiment 1), we might expect that sets with similarity in a more salient dimension (say, color) will be chosen over similarity in a less salient dimension (e.g., shape). We first ask whether there are more salient dimensions and then how their preference affects set identification. To this end, we performed a supplementary experiment to determine dimensional preference, in a subject-by-subject manner, expecting that the results may aid in understanding strategies used to detect sets, whether intentionally or not.

Method

We compare dimensional salience, using a graph theory algorithm to determine the ordering of the dimensions, and then examine how this ordering influences set detection. The salience of different dimensions may be compared in a straightforward manner by judg-

Averaging (over the 1,201 rounds) the empirical ratio of the number of sets including a previous location to the total number of sets present leads to .58 as the random choice level of choosing a set with a previous location. In practice, such sets were chosen in .67 of the cases (808 of 1,201 rounds).

In summary, in .89 of the rounds, there was a set including a previous location, similar to the theoretically expected value of .88. Looking only at rounds including both a set with a previous location and another set without such card, we found that in 67.28% (confidence interval, 67.0%–67.4%) of the cases, the chosen set was one including a previous location, as compared with a chance level of 58% (SD, ±14.16%; SE, ±0.41%). We conclude that there was some triggering by the newly placed cards (although not as much as there could have been, i.e., 100% of the 1,201 cases considered).
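The proportions quoted in this subsection follow directly from the round counts given in the text, as the short Python check below shows. The variable names are hypothetical, and the standard error is recomputed on the assumption that it equals the reported SD divided by the square root of the 1,201 rounds.

```python
from math import sqrt

rounds_total = 2664        # all rounds
rounds_with_prev = 2372    # rounds with a set that includes a replaced location
rounds_with_choice = 1201  # ...that also contain a set without a replaced location
chosen_prev = 808          # choice rounds in which the chosen set used a replaced location

frac_prev = rounds_with_prev / rounds_total
frac_choice = rounds_with_choice / rounds_with_prev
frac_chosen = chosen_prev / rounds_with_choice

# Assumed relation: SE = SD / sqrt(N), in percentage points.
se_chance = 14.16 / sqrt(rounds_with_choice)

print(round(frac_prev, 2), round(frac_choice, 3),
      round(frac_chosen, 4), round(se_chance, 2))
```

The recomputed values (.89, .506, .6728, and 0.41) agree with those reported above.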

Conclusions for Experiment 1

Relating to similarity, we found that sets from lower classes were detected more quickly and more often (rela-

[Figure 7 graphic: (A) Location Impact on Detection Preference. (B) Appearance in Detected Sets: Fraction vs. Distance From the Center (1, sqrt(5), 3, sqrt(13)); regression slope m = −0.007.]

Figure 7. Influence of location in the display. (A) The fraction of appearances of each location in the detected sets, taking into consideration only the 2,313 rounds with more than one set present. Note that the fractions sum up to 3 (and not to 1), because of the three cards in a set. Note also that the value of .25 is the chance level. (B) Frequency of appearance of cards in the detected sets as a function of their distance from the center of the display, where distance is measured in units equal to half the vertical or horizontal distance between adjacent cards. R² = .3.


requiring a directed acyclic graph (DAG). To fulfill this condition and find the ordering, the out-degree (d_out) of the nodes is sorted in descending order, from (d − 1) to 0. When this forms a DAG, its path indicates the dimensional ordering. We wish to know whether this dimensional salience ordering relates to the detection of sets.
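The ordering procedure can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name, the weight matrix, and its values are hypothetical. It accepts the edge i → j whenever the normalized weight exceeds .5 and relies on the fact that a full directed graph is acyclic exactly when its sorted out-degrees are d − 1, d − 2, ..., 0.

```python
def salience_order(weights, names):
    """Return dimension names from most to least salient, or None if inconsistent.

    weights[i][j] is the fraction of comparisons in which dimension i was
    judged more salient than dimension j.
    """
    d = len(names)
    out_degree = [
        sum(1 for j in range(d) if j != i and weights[i][j] > 0.5)
        for i in range(d)
    ]
    # Consistency check: the out-degrees must be exactly d-1, d-2, ..., 0.
    if sorted(out_degree, reverse=True) != list(range(d - 1, -1, -1)):
        return None
    ranked = sorted(range(d), key=lambda i: out_degree[i], reverse=True)
    return [names[i] for i in ranked]


# Hypothetical weights for one subject (illustrative values only).
names = ["color", "filling", "shape", "number"]
weights = [
    [0.0, 0.7, 0.8, 0.9],
    [0.3, 0.0, 0.7, 0.8],
    [0.2, 0.3, 0.0, 0.9],
    [0.1, 0.2, 0.1, 0.0],
]
print(salience_order(weights, names))

# A cyclic preference (a over b, b over c, c over a) has no consistent ordering.
cyclic = [[0.0, 0.6, 0.4], [0.4, 0.0, 0.6], [0.6, 0.4, 0.0]]
print(salience_order(cyclic, ["a", "b", "c"]))
```

Returning None for a cyclic preference pattern mirrors the consistency requirement: only when the accepted edges form a DAG is there a single path through all the dimensions.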

Results and Discussion

We tested 6 subjects following their playing several sessions of the usual Set game. The experimental results show within-subjects preference consistency but different orderings for different subjects. An example for 1 subject is shown in Figure 8B, and the average over all subjects in Figure 8C. Note that different people may regard different dimensions as salient, so before averaging, the dimensions were sorted according to preference for each subject, termed d1 . . . d4. Relative salience can also vary for the same person, without implying inconsistency, if two dimensions are compared on different background values of other dimensions. For example, if the reference shape is filled, the color may be more important than the shape, but if it is empty, it can be the reverse.

To measure the influence of dimensional salience on set detection, the dimensions were sorted according to pre-

ing which of two test cards seems more similar to a reference card, implying that the dimensional change between the reference and the other card is more salient (Medin, 1973). The example of Figure 8A illustrates the method. We display three cards (each with only one element, and all of the same color): a filled oval reference card and empty oval and filled wave test cards, asking which seems more similar to the reference. If the filled wave is declared more similar, filling is more salient (since this is the changed dimension in the other stimulus), and vice versa.

The test was performed for all combinations of dimensions and values. Therefore, the total number of such comparisons is the product of the number of reference cards, v^d, the number of ways of choosing two dimensions, and the number of values in each of these two (leaving out the value of the reference card itself), that is,

$$v^{d}\cdot\binom{d}{2}\cdot(v-1)^{2} = 3^{4}\cdot\binom{4}{2}\cdot(3-1)^{2} = 1{,}944.$$

The outcome of these comparisons is translated into a fully directed graph with d nodes, representing the dimensions. Weights (w_ij) of the directed edges (d_i d_j), representing the salience, are assigned by the number of times dimension i is found to be salient over j, normalized to the number of comparisons made between them. Then edges for which w > .5 are accepted, as demonstrated in Figure 8B.

We then require that the graph have a path through all nodes (a Hamiltonian path without closure). This condition ensures consistency. Because it is a full graph, this requirement is equivalent to

Box 4 Combinatorics: Probability of at Least One of the Three Cards at the Replaced Locations Being Included in a Set

Instead of looking at the probability that these locations will be included in a set, we will look at the probability that the cards included in sets will include one of these cards, which is combinatorically the same. The latter is preferred because there are always exactly three previous locations, but if there is more than one set, the number of cards included in a set varies.

If there is one set in the 12-card array, the probability that all of the 3 cards in the set are not in any of the three locations of the newly placed cards is just

$$p = \frac{9}{12}\cdot\frac{8}{11}\cdot\frac{7}{10} = .38.$$

If there is more than one set in the array, involving x cards (five or more), the probability of the newly placed cards not overlapping any of these set-including cards is

$$p = \frac{12-x}{12}\cdot\frac{11-x}{11}\cdot\frac{10-x}{10}.$$

Probability (1 − p) by number of cards involved in sets (x) is shown in Table 1. Clearly, there is quite a high probability that one of the new cards will be included in an existing set. Of course, this is true for any group of three cards in the display, and players could as well concentrate on any convenient group, and not specifically on the replaced cards.

The empirical distribution P (based on the played games) of the number of cards in the display belonging to any present set is shown in the second row of the table.

Looking at the combinatorics and taking the dot product of the two rows of the table (the probability of there being x cards in the sets and their probability of including at least one of the three replaced cards) yields the weighted average probability that one of the present sets includes one of the three replaced card locations or, equivalently, that one of the three new cards is included in a set, namely, .88.

Table 1
Probability of at Least One Card of the Set Being New

x         3     4     5     6     7     8     9      10    11    12
(1 − p)   .62   .75   .84   .91   .95   .98   .9955  1.0   1.0   1.0
P         .18   0     .17   .19   .13   .15   .10    .06   .02   .005

Note. x, number of cards in the display involved in set(s). (1 − p), probability that at least one of x cards included in set(s) is among three changed cards. P, empirical probability that x cards in the display were involved in some set.
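The entries of Table 1 and the weighted average of .88 can be reproduced with a few lines of Python. This is a sketch of the Box 4 arithmetic only; the function and variable names are hypothetical, and the empirical row P is copied from the table as reported.

```python
def prob_no_overlap(x, n=12, replaced=3):
    """Probability that none of the `replaced` new cards is among the x set cards."""
    p = 1.0
    for i in range(replaced):
        # (12 - x)/12 * (11 - x)/11 * (10 - x)/10, clamped at zero for large x.
        p *= max(n - x - i, 0) / (n - i)
    return p


# Empirical probability that x cards in the display belong to some set (Table 1).
empirical_P = {3: .18, 4: 0, 5: .17, 6: .19, 7: .13,
               8: .15, 9: .10, 10: .06, 11: .02, 12: .005}

at_least_one = {x: 1 - prob_no_overlap(x) for x in empirical_P}
weighted = sum(empirical_P[x] * at_least_one[x] for x in empirical_P)

for x in sorted(at_least_one):
    print(x, round(at_least_one[x], 4))
print(round(weighted, 2))
```

The dot product of the two rows comes to approximately .88, matching the theoretical value quoted in the text.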


ferred order for each subject separately, and the weights were averaged, in that order, over all subjects. This allowed analysis for all the subjects at once, even though each had a different ordering—for example, of the ith preferred dimension.

We then compared several parameters of detected and missed sets for each dimension, including the following: number of times there was similarity (or span) in each dimension, within the sets detected by the subject, in cases in which there was a choice among several sets (Figure 9A); for detected sets with similarity in each dimension, the number of sets present from lower classes (Figure 9B) or from the same class (Figure 9C), where we might expect dimensional salience to overcome preference for lower classes.

If dimensional salience significantly influences set detection, these parameters should vary systematically. However, we found no such monotonic dependence, so we infer that there is no apparent effect of dimension salience on set detection.

Relating the results of the two sections, so far, the three characteristics—class, MAV, and dimensional preference—are different and, as may be expected, play different roles in set detection. Class determines search procedure, and it turns out that finding sets of lower classes is easier than finding those of higher classes, per-haps because there is a natural preference for perceiving similarity. MAV is a characteristic of the cards in the display and, in accord with the preference for similarity, plays a role in the finding of sets (in the MAV group, thus sharing similarity in its value). In contrast to these characteristics, dimensional preference is a personal preference (which we found varies from subject to sub-ject) and has no bearing, on average, on set detection success. In addition, there is usually a conflict between detection based on MAV and that based on dimensional preference, and MAV wins. Thus, it may not be surpris-ing that personal preferences do not determine perfor-mance in the long run.

This division between different levels of influence of different perceptual aspects of the stimulus array may reflect the special status of the Set game: As was described in the introduction, Set is very basically a perceptual game, in that it depends on perceiving combinations of values among the 12 presented cards. On the other hand, the task of the game is conceptual, in that specific cognitive rules must be followed. Nevertheless, as has been mentioned, conceptual processes may also, in turn, derive from and be influenced by perceptual attributes (Goldstone & Barsalou, 1998). As such, it is natural that different perceptual aspects will have different levels of influence on the processes underlying performance of this complex task.
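The ordering procedure summarized in the Figure 8 caption (dimensions as graph nodes, edge weights as the fraction of comparisons in which one dimension was more salient, and ranking by out-degree) can be illustrated with a minimal sketch. The function name `salience_order`, the input layout, the rule that an edge points from the dimension that won more than half of the comparisons, and the example fractions are all assumptions made for illustration; they are not taken from the study.

```python
from itertools import combinations


def salience_order(win_fraction):
    """Rank dimensions by out-degree in a pairwise-salience graph.

    win_fraction maps an ordered pair (a, b) to the fraction of comparisons
    in which dimension a was judged more salient than dimension b.
    Each unordered pair must be supplied in exactly one direction.
    """
    dims = sorted({d for pair in win_fraction for d in pair})
    out_degree = {d: 0 for d in dims}
    for a, b in combinations(dims, 2):
        if (a, b) in win_fraction:
            frac_a = win_fraction[(a, b)]
        else:
            frac_a = 1.0 - win_fraction[(b, a)]
        # Assumption: the edge points from the dimension that was more
        # salient in over half of the comparisons to the other dimension.
        winner = a if frac_a > 0.5 else b
        out_degree[winner] += 1
    # Descending out-degree gives the preference order d1 ... d4.
    return sorted(dims, key=lambda d: out_degree[d], reverse=True)


# Hypothetical fractions for one subject (illustrative, not study data).
example = {
    ("color", "filling"): 0.70,
    ("color", "shape"): 0.80,
    ("color", "number"): 0.90,
    ("filling", "shape"): 0.75,
    ("filling", "number"): 0.85,
    ("shape", "number"): 0.65,
}
order = salience_order(example)
print(order)
```

With a consistent set of pairwise preferences the out-degrees are 3, 2, 1, and 0, so sorting them yields a single path through the graph. Averaging over subjects would then be done after relabeling each subject's dimensions as d1 to d4 in this order.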

[Figure 8 panels: (A) "Supplementary Experiment: Dimensional Salience", showing a reference card and two test cards; (B) graph with nodes color, number, shape, and filling, edge weights .72, .88, .92, .94, .79, .97; (C) graph with nodes d1, d2, d3, and d4, edge weights .69, .8, .88, .9, .72, .83.]

Figure 8. Dimensional salience and algorithm for determining order of individual dimensional preference. (A) The task. The subjects were shown a reference card and were asked to judge which of two test cards seemed more similar to it. Each test card differed from the reference on one dimension. If a certain card was chosen, this meant that the dimensional change between the reference and the other test card was more salient. In the example shown, if the left card seems more similar to the reference, filling is more salient than shape. (B) Demonstration of resulting directed acyclic graph (DAG) for 1 subject. Nodes represent dimensions, and directed edge weights represent fraction of times that one dimension was salient over the other. The path, indicating dimensional ordering, is created by sorting the out-degree (dout) of the nodes in descending order. In this case, salience order was color, filling, shape, and number. (C) Average DAG over all the subjects. Since each subject had an individual preferred order, before averaging we sorted the dimensions according to each subject’s preferred order, termed d1 . . . d4. Note that edges to less preferred dimensions have larger numbers (greater preferences).

[Figure 9 panels: (A) Number of Found Sets With Similarity in Each Dimension; (B) Lower Classes; (C) Same Class. x-axis: Dimensional Ordering (Preferred Dimensions 1 to 4); y-axis: Number.]

Figure 9. Influence of dimensional preference. Dimensions were sorted according to individual preference, as in Figures 8B and 8C. Only cases in which there was a choice among several sets were considered here. (A) Total number of times for all the subjects that the set detected had similarity on each of the four dimensions, ordered according to the preference of each subject. (B and C) Number of sets present from lower classes (B) or the same class (C), when the detected set had similarity in each dimension ordered by individual preference.

EXPERIMENT 3 Learning and Generalization

In Experiment 3, we tested the dynamics of learning the Set game. We also asked whether training-induced learning would generalize to playing the game with changed stimulus values (see, e.g., Figure 1C). That is, after training improved performance, would changing stimulus values start learning all over again, or would the ability to identify a set be now established for all stimuli? We were interested in whether learning this task is high or low level. If there is generalization and transfer of learning effects when playing with new stimuli, the implication is that learning is a high-level effect, whereas if training is specific to trained stimuli, learning may be a low-level effect (Ahissar & Hochstein, 1997, 2004; Hochstein & Ahissar, 2002).

Method

Learning experiments included three sessions with 9–12 games. Ten subjects participated in three complete sessions, with 1 or 2 games in the first session, 3–5 in the second, and 3–6 in the third. Following three sessions with the original cards, transfer was tested for 6 of the subjects for cards with shapes changed to a circle, a triangle, and a square and with changed colors, as demonstrated in Figure 1C.

Results and Discussion

The example learning curve in Figure 10 shows mean RT for each class as the games proceed. There is a gradual improvement, seen as a decreasing RT. Again, there are class-dependent characteristics (see above), with more stabilization and lower RTs for lower classes. The arrow points to the time when the different version was applied. There is not much difference in the RTs after this point.

We tested the average across all 6 subjects who played the generalized game, taking their first three games, the last three before generalization, and those just after the new version was applied (as shown in Figure 11). There was highly significant learning from the first to the third game (one-tailed paired t test over subjects, between the two games, with p < .005); also highly significant is the difference between performance on the first game and that on the third-to-last game played before the subjects switched tasks, that is, the (L-2) game (one-tailed paired t test, p < .001). There was no increase in RT when the subjects moved to the new version (one-tailed paired t test yields p = .3).

We conclude that the training effect generalized to playing the game with new values. This training generalization may have resulted from the fact that the subjects did not yet reach a stabilized “automatic” level (Goldstone, 1998; Treisman, Vieira, & Hayes, 1992), an assumption supported by the mean RT, and that as long as performance of this task depended on a cognitive process, training generalized to different values.

SUMMARY AND GENERAL DISCUSSION

Summary

We found that when subjects played the Set game, several parameters influenced set detection (Experiment 1), including the following: similarity in values (within a


dimension); number of existing sets in the display, with RT acting according to the horse race model, implying independence of simultaneous searches; and the MAV and its group size, which was searched preferentially, also confirming the preference for similarity.

We used an algorithm for determining dimensional salience (Experiment 2) on the basis of direct comparisons, using graph theory. The subjects showed a consistent but individual order of preference for dimensions, but this seems not to have affected set identification preference.

There was a gradual improvement in speed of playing the Set game with experience, with class-dependent characteristics (Experiment 3). Training-induced learning generalized to new versions of the game with new stimuli, suggesting a high-level learning effect.

These results were enabled by “complications” inherent in the Set game but not present in other categorization tasks (see the introduction), such as the possibility for having more than one set present in each display (allowing the study of competition among the processes seeking them) and the game’s including a span rule for dimensions on which similarity is not found. This forced the subjects to intermix conceptual and perceptual aspects in the search for the rules applying to the current display, or even to each of the sets present in it. It also allowed us to find perceptual influences in this conceptual task. These implications will be discussed below.

An interesting question, regarding perception in general, is what happens in the brain during the very recognition of a set, at the exact moment of conscious perception, often called the moment of insight (Ahissar & Hochstein, 1997; Bowden & Jung-Beeman, 2003; Rubin, Nakayama, & Shapley, 1997; Smith, Gosselin, & Schyns, 2006). Future studies using the Set game as an interface may address this issue.

[Figure 10 plot: "Learning Dynamics: 1-Subject Example". x-axis: Game Number (0 to 14); y-axis: Mean RT (sec) (0 to 140); series: Class 1, Class 2, Class 3, Class 4; arrow labeled "New Values".]

Figure 10. Learning and generalization: Example data for 1 subject. Learning dynamics are shown by mean response time (RT) for each class as the games proceed. The arrow points to the place where the different version (Figure 1C) was used.

[Figure 11 plot: "Learning Dynamics: Average Over Subjects". x-axis: Game Number (1, 2, 3, . . . , L-2, L-1, L, G1, G2, G3); y-axis: Mean RT (sec) (0 to 50); label: "New Values".]

Figure 11. Learning and generalization: Average results for 6 subjects. Mean response times (RTs) for the first and last three games with the original version (regardless of number played), and for the first three games with a different version (gray bars). Note the significant learning from the first to the second game and the lack of increase in RT when the subjects moved to a new version. The SD for RT over games changes in the same manner (not shown).

General Discussion

The Set game task is complex because it involves both perceptual and cognitive features (cf. Pomerantz, 2002; Schyns, Bonnar, & Gosselin, 2002). Subjects must perceive the values and the relationships among the cards on the basis of four visual dimensions, but they must decide which three of the present cards form a set on the basis of cognitive rules. Improvement may come from improved or faster perception of the dimensional values present, from better understanding and application of the cognitive rules, or from a combination of these.

Even though the Set game is not a game of categorization, it may be of value to compare these two tasks. Regarding our finding that sets with more similarity were found more often and more rapidly than others, we note that categorization, too, may depend on finding similarities among different elements (Goldstone, 1994). For example, categorization has been seen as the finding of a prototype or group of exemplars and the other objects that are more similar to these than to its competitors (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976). This includes similarities along a number of dimensions (referred to as family resemblance when there is similarity along many but not all dimensions; Medin, Wattenmaker, & Hampson, 1987; Regehr & Brooks, 1993; Rosch & Mervis, 1975), although subjects may base their categorization on a single salient dimension (Ashby, Queller, & Berretty, 1999; Bower & Trabasso, 1963). It is possible that the same basic mechanisms that underlie categorization are also used in playing the Set game. If this is the case, the bias that we found toward lower class sets may reflect the tendency for real-world categories to be organized around similarities, rather than around differences (Ashby & Maddox, 2005; Hammer et al., 2005, 2007, in press; Medin et al., 1987;


Rosch et al., 1976). Indeed, Hammer and colleagues found a preference for using common feature regularities, rather than distinctive feature irregularities, supporting the conclusion that similarity is more beneficial and, ultimately, more natural than is dissimilarity for use in categorization. Similarity perception also develops earlier than learning from differences (Hammer et al., 2008). A between-object similarity-detecting mechanism developed for categorization may serve set detection as well. Such a mechanism may involve, for example, mutual enhancement between local feature detectors. (We built a neural network model containing a mechanism working on this principle, and it, too, found mainly low-class sets.) We conclude that the preference for finding sets with more similarity provides supporting evidence for the presence of a similarity-detecting mechanism.

Returning to the issue of the perceptual and cognitive aspects of playing the Set game, we now ask what role perception plays in this cognitive task. Is the task purely conceptual, or does it contain a perceptual element? There are at least two factors that suggest that perception does play an important role in set recognition, as follows. We found that seemingly inconsequential perceptual features influenced which sets were detected (when more than one set was present) and the speed of detecting a set (even when there was only one). For example, there was an important dependence on the MAV among the 12 possible values (3 per dimension for four dimensions) in the display. The second factor that indicates a perceptual element in the cognitive task of set detection relates to the issue of similarity versus span. Finding a set depends on detecting a mixture of both the more perceptual similarity and the more conceptual span. Although the rule of similarity can be seen as just as cognitive a rule as the rule underlying a span, nevertheless, only similarity detection can be seen also as a basic, immediate, and automatic perceptual mechanism. This may be the source of the bias that we found for detecting sets with more similarity, rather than with more span. (In fact there would be an inherent advantage to detecting spans, since on average, there is much more spanning than similarity in the possible sets.) Thus, the present results confirm that even in a task such as Set, which inherently depends on both similarity and span detection (and thus, on conceptual processes), the overwhelming influence of the perceptual determines that priority will be given to sets with greater similarity.

The experimental result that subjects find sets of lower classes preferentially and more quickly suggests that people may have built-in mechanisms for finding similarities, but not for finding spans, that is, groups of items that do not include two that are similar along the relevant dimension. In this case, finding a span may be a cognitive, analytic, or abstract reasoning task. Our result that even in this case, subjects are sensitive to perceptual features of an array (such as the MAV) suggests that perceptual and cognitive functions may not be totally separate (see Goldstone & Barsalou, 1998; Landy & Goldstone, 2007).

Overall, one of our major findings is that people perceive similarity within a dimension better than difference, that is, a span. This might seem, at first, to contradict the fact that the visual system detects change. The brain specializes in identifying difference. This is why we are so good at detecting an object that differs from its surround. Why, then, are we so good at identifying three similar objects among a mixture of items? This could be because, among such a diverse collection, what is unique is similarity. When the environment is unified, we detect difference; when the environment is diverse and colorful, we detect the few points of similarity within it.

AUTHOR NOTE

This research was supported by grants from the U.S.–Israel Binational Science Foundation (BSF) and the Israel Science Foundation (ISF). We thank Anne Treisman for inspiring discussions during the course of this study. We thank Daphna Weinshall and Leon Deouell for constructive support and Ron Katz for helpful conversations and review. We thank reviewer James Pomerantz for pointing out the double meaning of the term similarity. We thank him and other (anonymous) reviewers for useful comments. Correspondence concerning this article should be addressed to M. Jacob, Department of Neurobiology, Institute of Life Sciences, Hebrew University, Jerusalem 91904, Israel (e-mail: [email protected]).

REFERENCES

Ahissar, M., & Hochstein, S. (1997). Task difficulty and the specificity of perceptual learning. Nature, 387, 401-406.

Ahissar, M., & Hochstein, S. (2004). The reverse hierarchy theory of visual perceptual learning. Trends in Cognitive Sciences, 8, 457-464.

Ashby, F. G., & Maddox, W. T. (2005). Human category learning. Annual Review of Psychology, 56, 149-178.

Ashby, F. G., Queller, S., & Berretty, P. M. (1999). On the dominance of unidimensional rules in unsupervised categorization. Perception & Psychophysics, 61, 1178-1199.

Berg, E. A. (1948). A simple objective technique for measuring flexibility in thinking. Journal of General Psychology, 39, 15-22.

Bowden, E. M., & Jung-Beeman, M. (2003). Aha! Insight experience correlates with solution activation in the right hemisphere. Psychonomic Bulletin & Review, 10, 730-737.

Bower, G., & Trabasso, T. (1963). Reversals prior to solution in concept identification. Journal of Experimental Psychology, 66, 409-418.

Carr, T. H. (1992). Automaticity and cognitive anatomy: Is word recognition “automatic”? American Journal of Psychology, 105, 201-237.

Corballis, M. C. (1998). Interhemispheric neural summation in the absence of the corpus callosum. Brain, 121, 1795-1807.

Egeth, H. E., & Mordkoff, J. T. (1991). Redundancy gain revisited: Evidence for parallel processing of separable dimensions. In G. R. Lockhead & J. R. Pomerantz (Eds.), The perception of structure: Essays in honor of Wendell R. Garner (pp. 131-143). Washington, DC: American Psychological Association.

Garner, W. R., & Lee, W. (1962). An analysis of redundancy in perceptual discrimination. Perceptual & Motor Skills, 15, 367-388.

Goldstone, R. L. (1994). The role of similarity in categorization: Providing a groundwork. Cognition, 52, 125-157.

Goldstone, R. L. (1998). Perceptual learning. Annual Review of Psychology, 49, 585-612.

Goldstone, R. L., & Barsalou, L. W. (1998). Reuniting perception and conception. Cognition, 65, 231-262.

Graham, N. V. S. (1989). Visual pattern analyzers. New York: Oxford University Press.

Grant, D. A., & Berg, E. (1948). A behavioral analysis of degree of reinforcement and ease of shifting to new responses in a Weigl-type card sorting problem. Journal of Experimental Psychology, 38, 404-411.

Hammer, R., Diesendruck, G., Weinshall, D., & Hochstein, S. (2008). The development of category learning strategies: What makes the difference? Manuscript submitted for publication.

Hammer, R., Hertz, T., Hochstein, S., & Weinshall, D. (2005). Category learning from equivalence constraints. Cognitive Sciences, 27(Suppl.), 893-898.


Hammer, R., Hertz, T., Hochstein, S., & Weinshall, D. (2007). Classification with positive and negative equivalence constraints: Theory, computation and human experiments. In F. Mele, G. Ramella, S. Santillo, & F. Ventriglia (Eds.), Brain, vision, and artificial intelligence (pp. 264-276). Berlin: Springer.

Hammer, R., Hertz, T., Hochstein, S., & Weinshall, D. (in press). Category learning from equivalence constraints. Cognitive Processing.

Heaton, R. K., Chelune, G. J., Talley, J. L., Kay, G. C., & Curtiss, G. (1993). Wisconsin Card Sorting Test manual, revised and expanded. Odessa, FL: Psychological Assessment Resources.

Hochstein, S., & Ahissar, M. (2002). View from the top: Hierarchies and reverse hierarchies in the visual system. Neuron, 36, 791-804.

Holyoak, K. J. (1990). Problem solving. In D. N. Osherson & E. E. Smith (Eds.), Thinking: An invitation to cognitive science (Vol. 3, pp. 117-146). Cambridge, MA: MIT Press.

Koffka, K. (1935). Principles of Gestalt psychology. New York: Harcourt Brace.

Köhler, W. (1929). Gestalt psychology. New York: Liveright.

Landy, D., & Goldstone, R. L. (2007). How abstract is symbolic thought? Journal of Experimental Psychology: Learning, Memory, & Cognition, 33, 720-733.

Medin, D. L. (1973). Measuring and training dimensional preferences. Child Development, 44, 359-362.

Medin, D. L., Wattenmaker, W. D., & Hampson, S. E. (1987). Family resemblance, conceptual cohesiveness, and category construction. Cognitive Psychology, 19, 242-279.

Miller, J. (1982). Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology, 14, 247-279.

Miller, J. (1986). Timecourse of coactivation in bimodal divided attention. Perception & Psychophysics, 40, 331-343.

Monnier, P. (2006). Detection of multidimensional targets in visual search. Vision Research, 46, 4083-4090.

Oliva, A., & Torralba, A. (2006). Building the gist of a scene: The role of global image features in recognition. Progress in Brain Research, 155, 23-36.

Pomerantz, J. R. (2002). Perception: Overview. In R. A. Wilson & F. C. Keil (Eds.), Encyclopedia of cognitive science (pp. 527-537). London: Nature Publishing.

Raab, D. H. (1962). Statistical facilitation of simple reaction times. Transactions of the New York Academy of Sciences, 24, 574-590.

Regehr, G., & Brooks, L. R. (1993). Perceptual manifestations of an analytic structure: The priority of holistic individuation. Journal of Experimental Psychology: General, 122, 92-114.

Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7, 573-605.

Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.

Rubin, N., Nakayama, K., & Shapley, R. (1997). Abrupt learning and retinal size specificity in illusory contour perception. Current Biology, 7, 461-467.

Schyns, P. G., Bonnar, L., & Gosselin, F. (2002). Show me the features! Understanding recognition from the use of visual information. Psychological Science, 13, 402-409.

Schyns, P. G., & Oliva, A. (1994). From blobs to boundary edges: Evidence for time and spatial scale dependent scene recognition. Psychological Science, 5, 195-200.

Smith, M. L., Gosselin, F., & Schyns, P. G. (2006). Perceptual moments of conscious visual experience inferred from oscillatory brain activity. Proceedings of the National Academy of Sciences, 103, 5626-5631.

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643-662.

Townsend, J. T., & Ashby, F. G. (1983). Stochastic modeling of elementary psychological processes. Cambridge: Cambridge University Press.

Treisman, A., Vieira, A., & Hayes, A. (1992). Automaticity and preattentive processing. American Journal of Psychology, 105, 341-362.

Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.

Ullman, S. (1984). Visual routines. Cognition, 18, 97-159.

(Manuscript received April 10, 2007; revision accepted for publication March 17, 2008.)


Chapter II

Comparing Eye Movements to Detected vs. Undetected Target Stimuli

in an Identity Search Task

Jacob, M. & Hochstein, S. (2009). Journal of Vision, 9(5):20, 1-16


Comparing eye movements to detected vs. undetected target stimuli in an Identity Search task

Michal Jacob
Department of Neurobiology, Institute of Life Sciences, and Interdisciplinary Center for Neural Computation, The Hebrew University, Jerusalem, Israel

Shaul Hochstein
Department of Neurobiology, Institute of Life Sciences, and Interdisciplinary Center for Neural Computation, The Hebrew University, Jerusalem, Israel

Why do we perceive some elements in a visual scene, while others remain undetected? To learn about the sequence of events leading to detection, we directly compared fixations on detected vs. undetected items. Our novel Identity Search task display comprised twelve cards, all different except for two pairs of identical cards. Participants search for one pair. Task properties allow us to monitor fixations on distinct card regions and study search dynamics. We find that detected pair cards were fixated more often and for longer times than undetected pair cards. Within the search sequence, there are fewer intervening fixations between detected than undetected pair cards. Only at an advanced stage of the search do fixations on pair cards become closer. We suggest that both the absolute number of fixations and their temporal proximity influence detection. In the dynamics of search, a bifurcation point is observed, when these differential characteristics begin. Analysis of the break point in the sequence of fixations on to-be-detected cards suggests that there is an early (perhaps unconscious) recognition stage, followed by more fixations and only later by detection. We suggest that several target fixations are needed for processing visual information to achieve recognition.

Keywords: eye movements, detection, Identity Search task, fixations, visual awareness, search dynamics, fixation sequence, recognition

Citation: Jacob, M., & Hochstein, S. (2009). Comparing eye movements to detected vs. undetected target stimuli in an Identity Search task. Journal of Vision, 9(5):20, 1–16, http://journalofvision.org/9/5/20/, doi:10.1167/9.5.20.

doi: 10.1167/9.5.20. Received January 15, 2008; published May 19, 2009. ISSN 1534-7362 © ARVO

Introduction

When viewing a natural scene, there are some elements that we consciously perceive, while others escape our notice. That we do not perceive all the details in a scene has been eminently demonstrated by the phenomena of change blindness (Rensink, O’Regan, & Clark, 1997; Simons & Levin, 1997), repetition blindness (Kanwisher, 1987, 1991), and the attentional blink (Potter, Chun, Banks, & Muckenhoupt, 1998; Raymond, Shapiro, & Arnell, 1992; Shapiro, Raymond, & Arnell, 1994). A possible source for this conscious/non-conscious perception dichotomy has been detailed in Reverse Hierarchy Theory (Hochstein & Ahissar, 2002; see also Ahissar & Hochstein, 1997, 2004). Some elements in a scene attract what has been called exogenous attention (Jonides, 1981; Muller & Rabbitt, 1989; Posner, 1980), leading to such phenomena as feature search pop-out (Treisman & Gelade, 1980). Attention may be overt (with the eyes fixating the attended element) or covert (without such eye movements). Similarly, even with attention, detection may be conscious and reportable or unconscious, leading to positive and negative priming effects without observer awareness (DeSchepper & Treisman, 1996; Treisman, 2006).

It is well known that eye movements reflect cognitive processes (Henderson & Hollingworth, 1999; Liversedge & Findlay, 2000; Rayner, 1998; Ringach, Hawken, & Shapley, 1996; Stone, Miles, & Banks, 2003). Fixation locates a particular part of the visual scene on the fovea, and this input, including approximately a radius of 2° of visual field (Anstis, 1974; Riggs, 1965), undergoes more major processing. Thus, the sequence of fixations, which is controlled by high cortical levels (Bruce, Goldberg, Bushnell, & Stanton, 1985; Chen & Zelinsky, 2006; Schall, 1991), also affects the subsequent high-level processing that will occur in our brains.

Nevertheless, much remains unknown concerning the dynamics of eye movements and the causal relationship between eye fixations and perceptual cognition. How do fixations influence perception and how does perception influence fixations?

Classical studies concluded that more informative scene regions receive more fixations (Antes, 1974; Buswell, 1935; Loftus & Mackworth, 1978; Mackworth & Morandi, 1967; Yarbus, 1967), but does the visual system know which regions are informative before boosting fixations upon them? In contrast, our (non-natural scene) displays contained two target pairs, which were equally informative. We do not compare local visual features of objects or the semantics of the objects in the scene. Rather, we examine how two stimuli compete for perceptual and cognitive processing, and how fixations influence (and especially how they are influenced by) this processing.

In contrast to change blindness, which deals with detection over time, we devised a task that deals with search and detection over space. We ask what determines which elements will be foveated. In particular, when observers perform a search task requiring conscious perception and comparison among a number of target elements, what will be their sequence of saccades and fixations on the different elements within the scene? Will there be a phase of concentrated fixations on the targets that will ultimately be found, even before they have been (consciously) found? When there is more than one target in a scene, what determines which target is found and which remains unknown? Does conscious target detection come before or after concentrated target fixation? What role do fixations play in the search process? In particular, what is the relationship between repeated fixations on the same scene region and limited Working Memory capacity? What in the sequence of fixations reflects or influences ultimate conscious perception?

We tracked eye movements to study the sequence of perceptual events leading to conscious recognition and detection. Similar studies have been performed with the change-blindness paradigm (Droll, Gigone, & Hayhoe, 2007), investigating which changes in a scene are perceived and what is the sequence of eye movements that precede change detection. As mentioned, unlike the change-detection task, in our study, comparisons are made across space rather than over time. Furthermore, we include two target pairs in each task display and study perceptual effects by comparing fixations on detected vs. undetected stimuli, in the same display. For this purpose, we use a novel task that we introduce here: the Identity Search task.

The Identity Search task display contains computer screen “cards”, each with a square array of scrambled black and white square units. The subjects’ task was to find two exactly identical cards. Try to find such a pair in Figure 1. You may have noticed that there are two such pairs in this example. This is not by chance; it is exactly the point. The displays are specially designed to suit our purpose of differentiating between detected and undetected target pairs. Within each display, all cards are different, except for exactly two pairs of identical cards. The subjects’ task was to find one identical pair in each display, not being informed that two target pairs were present. Thus, in all cases one pair is detected and one is not, allowing us to compare properties of the detected and the undetected pairs and the pattern of eye movements to each.

When a pair is found and marked, the entire display is replaced for the following trial. The number of cards, array size, and number of black square units on each card are parameters under experimenter control but, in our experiments, were always set at 12 cards, 16 units per card (in a 4 × 4 array), and half of the units on each card were black, and half were white.

Figure 1. The Identity Search task. A number of “cards” are shown to the subject, whose task is to find two identical cards. For our experiments, we used the version shown, including 12 cards, each marked with a 4 × 4 scrambled array of black and white squares. Can you find two identical cards? Actually, there are two such pairs in the demonstrated figure, as there were in all displays in the experiment. (Here, card numbers 4 and 9 and card numbers 3 and 8 are each an identical pair; card numbering starts from the top left and goes row by row.)

The Identity Search task includes the following characteristics that are important for the goals of our study:

1. Each display includes one detected and one unde-tected target pair for comparison.

2. Displays are divided into distinct search and eye-fixation regions (the different cards).

3. Identical card pairs do not pop out. Rather, the pre-recognition process requires several seconds of search(allowing us to follow the dynamics of eye movementpatterns during the course of the search process).

4. The search process culminates in a momentary(though not immediate) recognition (an insight or“aha” experience; see Ahissar & Hochstein, 1997;Rubin, Nakayama, & Shapley, 1997), abruptlyending the search process.

5. An enormous number of novel displays may be created, allowing us to repeat the task with a new search and a new “aha” experience each time. Elaborating, there are

$$\binom{16}{8} = 12{,}870 \qquad (1)$$

combinations for each card (denoted as x), and therefore

$$\prod_{i=0}^{11} (x - i) \approx 2 \times 10^{49} \qquad (2)$$

combinations of different displays.

6. The task is easy to master, so that learning is rapid (requiring less than an hour of practice with the task), and performance stabilizes to a steady level.

7. Task complexity may be controlled (by varying the number of cards and the size of the array on each), to reach a desired average detection time. (For the present experiment, this determination was done in a preliminary pilot experiment and we report here results with a constant set of parameters.)
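The counts in Equations 1 and 2 can be checked numerically. The following is a minimal sketch, not part of the original study; the variable names are illustrative, and it simply reproduces the two formulas as written (Equation 2 counts ordered selections of 12 mutually different cards).

```python
from math import comb, prod

CARD_UNITS = 16   # units in a 4 x 4 array
BLACK_UNITS = 8   # half of the units on each card are black
N_CARDS = 12      # cards per display

# Equation 1: number of distinct cards, x = C(16, 8).
cards_possible = comb(CARD_UNITS, BLACK_UNITS)

# Equation 2: product of (x - i) for i = 0..11.
displays_possible = prod(cards_possible - i for i in range(N_CARDS))

print(cards_possible)
print(f"{displays_possible:.1e}")
```

The second value is on the order of 10^49, in agreement with Equation 2.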

We previously studied performance on a computerized version of the Set game (Set Enterprises Inc.; Jacob & Hochstein, 2008). This game also has many of the characteristics listed above. The Identity Search task preserves the benefits of the Set game and has the advantage that it is less complex and more easily learned. This new task was devised based on the finding that Set search depends on similarity detection.

We examine fixations on the detected pair compared to those on the undetected pair in the same display. Regarding the number of fixations, one could expect one of two effects of cards belonging to an ultimately detected pair: either there will be fewer fixations on the detected pair, or more fixations on the detected pair (see also Nodine, Carmody, & Kundel, 1978). For each effect, several scenarios can explain the result. For example, if detection is a result of an inherent property of some of the pairs, then those pairs would be detected immediately upon being observed, and the answer will be fewer fixations on the ultimately detected pair cards. If, on the other hand, detection of a target requires several stages until culminating in explicit recognition, then repeated observations might take place in earlier stages of the perceptual process, and the answer will be more fixations on the ultimately detected pair cards. Our current study aims at differentiating between these possible scenarios.

Methods

Display images

We implemented the Identity Search task with a Matlab GUIDE (Graphical User Interface Design Environment). Each display had 12 cards (3 rows by 4 columns), each represented as a clickable button, with an image of a scrambled 4 × 4 square array of black and white square units on each card. Half (8) of these squares were black and half were white. The background between the cards was gray (RGB: 153, 153, 153). Within each display, all cards are different, except for exactly two pairs of identical cards (each pair is unique and different from the other pair), located randomly. Each card occupied 2.7° × 2.7° of visual field. The space between the cards was 2.7° vertically and 2.9° horizontally.
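The display constraints described above can be illustrated with a short sketch. This is not the Matlab GUIDE implementation used in the experiment; it is a hypothetical Python reconstruction, and the function name `make_display` and its parameters are assumptions introduced only for illustration. Each card is represented by the set of positions (0–15) of its black units.

```python
import random

def make_display(n_cards=12, side=4, n_black=8, n_pairs=2, rng=None):
    """Return a list of cards; each card is a frozenset of black-unit positions.

    All cards differ, except for exactly `n_pairs` pairs of identical cards,
    placed at random positions in the display.
    """
    rng = rng or random.Random()
    n_unique = n_cards - n_pairs  # 10 distinct patterns for 12 cards
    patterns = set()
    while len(patterns) < n_unique:
        # Choose which of the side*side units are black.
        patterns.add(frozenset(rng.sample(range(side * side), n_black)))
    patterns = list(patterns)
    rng.shuffle(patterns)
    # Duplicate the first n_pairs patterns to create the identical pairs.
    cards = patterns + patterns[:n_pairs]
    rng.shuffle(cards)
    return cards

display = make_display(rng=random.Random(0))
```

Representing each card as a set makes the uniqueness check trivial; a real implementation would additionally render each pattern as an image on a button.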

The task

Subjects were instructed to mark (by mouse click) two identical cards. They were asked to mark these cards as quickly as possible but to try to avoid mistakes. They were not informed that two such pairs were present. During each trial, if a subject changed his or her mind after marking one card, he or she could “unmark” the card by a second mouse click on it. Once two cards were marked, a “continue” button appeared if they formed a valid identical pair; otherwise a “try again” message appeared at the top of the screen (and the Response Time clock continued running until correct identification). Upon pressing the “continue” button, the entire display was replaced for the following trial. Response Time (RT) was measured from pressing the “continue” button until choosing the second card in the pair; subjects were informed of this timing procedure.

Each subject first performed the task without eye-movement tracking for a training session of 100 trials (with a pseudorandom distribution of 1–5 card pairs). Then their performance was measured with eye tracking in a single session of 50 trials (with 2 pairs per display; each trial lasted 8.3 s, on average).

Participants

Eight university students participated in the full course of 100 training trials followed by 50 trials with tracked eye movements (two subjects had also participated in other variations of the game, as a part of a pilot study). Thus, we gathered for our analysis a total of 400 trials with eye movements. Subjects were remunerated for participation. Sessions lasted about an hour, including eye-tracker adjustment, calibration, and validation. In only 3 of the 400 trials, subjects marked 2 wrong cards and were informed that they did not form a pair; in another 6 trials, they marked one card and then “unmarked” it.

Equipment

We used the Eyelink eye tracker (SR Research Ltd., Ontario, Canada), based on two infrared light-emitting diodes (IR-LEDs) in front of each eye and a 250-Hz camera that records the LED reflections of the corneas. Head-movement compensation works on the same principle: 4 IR-LEDs at the display monitor corners are detected by a cyclopean camera. Pupil position is sampled once every 4 ms, for each eye. Subjects sat 80 cm from a Samsung SyncMaster 19″ (18″ viewable) CRT monitor, with 4:3 format and 800 × 600 pixel resolution. Thus, the foveal visual field of 2° occupies 2.8 cm or 67 monitor pixels. The monitor was surrounded by a black screen.

Calibration was done using the built-in 9-point calibration grid and was followed by validation. We corrected for equipment drift by verifying gaze points every 10 trials, using this verification to correct preceding and following trials.

The EyeLink was run with the Matlab Psychophysics and Eyelink Toolboxes (Brainard, 1997; Cornelissen, Peters, & Palmer, 2002). Analysis of fixations and saccades was done with the EyeLink program.
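The conversion from visual angle to on-screen size at the 80 cm viewing distance follows from standard trigonometry. The sketch below is an illustration only; the function name is hypothetical, and the conversion to pixels is omitted because it depends on the physical width of the visible screen area, which is not stated here.

```python
import math

def visual_angle_to_cm(angle_deg, distance_cm):
    """Size on a flat screen (cm) subtending `angle_deg` at `distance_cm`."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

# Foveal field of 2 degrees viewed from 80 cm.
foveal_cm = visual_angle_to_cm(2.0, 80.0)
print(round(foveal_cm, 1))  # 2.8
```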

Analysis

The relevant data for this study were the eye fixations. Subjects mark the first pair that they find, resulting in a detected target pair and one that is left undetected. We compare fixations on this detected pair and those on the undetected pair. Of course there were also many fixations on the other 8 cards, but these are all “dead-ends” that cannot lead to pair detection.

We recorded from both eyes, but the analysis was performed according to the dominant eye, which has priority in visual processing (Shneor & Hochstein, 2006). The dominant eye was determined by the Porta Test (a sighting test in which observers position a near stimulus, such as a finger, so that it appears collinear with a distant stimulus; Porta, 1953).

Figure 2. Two examples of eye-movement records during search for an identical pair of cards. Identical card pairs are indicated by different colors: the eventually detected pair in red, the undetected pair in blue. (a) In this example, there were 7 fixations on the detected pair and 3 on the undetected, out of a total of 31 fixations (see Figure 6; excluding from the count the final period when the subject marked the cards, which was excluded from the analyses and appears here in a different color). The sequential number of each fixation is indicated in the circle surrounding the fixation center. (b) Five fixations on detected pair; 2 fixations on undetected pair; a total of 17 fixations (see Figure 7). The duration of each fixation is indicated near the fixation point.

For eye-position analysis, the display was divided into 12 equal regions, each including one of the cards and its surrounding region (up to the border halfway between it and the adjacent card or the screen edge). Card regions are 176 × 170 pixels (~5.7° × 5.5° of visual field). For each pair (detected and undetected), we analyzed the total number of fixations (on the 2 card regions of the pair), the total viewing time of the two card regions, and the sequence of fixations (the sequential distance between fixations on the 2 card regions of the pair). We excluded from these and all following analyses the period during which subjects marked the cards, so that this period does not bias the results, i.e., all fixations ending after the first click were excluded. Note that those fixations are shown in Figure 2 in order to provide a complete picture, but they were not included in the analyses in any way. We studied the means of each parameter as reflecting global results over all trials and also analyzed temporal variations of the parameters to determine search dynamics within the trial.

Trials with exceptionally long RTs (8) were discarded (setting a bound at mean RT + 2 times the global SD). We also disregarded (7) trials with too few successfully registered fixations, which could have resulted from blinking by the subject or incorrect equipment readings (e.g., fixations beyond the monitor range), perhaps due to extreme subject changes of head or body posture; 385 trials remained. From these, we excluded trials without any fixations on the undetected pair, leaving 294 trials for full analysis. When two successive fixations were to the same card region, we combined them and counted them as a single fixation with duration equal to the sum of their durations. While there may be good reasons for not making the above two data analysis choices, we took this route since it is the more conservative; the trends we find would be enhanced by making the opposite choices.
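Two of the preprocessing steps described above, combining successive fixations on the same card region and bounding RTs at the mean plus 2 SD, can be sketched as follows. This is an illustrative reconstruction, not the analysis code used in the study; the function names and the `(region, duration_ms)` data layout are assumptions, and the use of the sample standard deviation is a guess, since the text does not say which SD estimator was applied.

```python
from statistics import mean, stdev

def merge_successive(fixations):
    """Combine consecutive fixations on the same card region.

    `fixations` is a list of (region, duration_ms) tuples in temporal order.
    Merged fixations get the summed duration.
    """
    merged = []
    for region, duration in fixations:
        if merged and merged[-1][0] == region:
            merged[-1] = (region, merged[-1][1] + duration)
        else:
            merged.append((region, duration))
    return merged

def rt_upper_bound(rts, n_sd=2.0):
    """Upper RT bound: mean + n_sd * SD (sample SD assumed)."""
    return mean(rts) + n_sd * stdev(rts)

example = merge_successive([(3, 200), (3, 150), (7, 300), (3, 100)])
print(example)  # [(3, 350), (7, 300), (3, 100)]
```

Note that only immediately successive fixations are merged; a later return to the same region counts as a new fixation.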

Results

Mean Response Time (RT) for pair detection and mouse click marking for the 294 trials was 8.7 ± 0.3 s (mean ± SE); range 2.2–24.7 s; median 7.3 s.

Figure 2 demonstrates eye-movement records of two different trials. They are typical in that, in each case, there are fixations on various cards including on the ultimately detected pair of identical cards (framed in red) and the alternative target pair that was not detected (framed in blue), followed by the subject marking a pair of target cards (fixations during the marking period are shown in green). What can we learn about the detection process from the eye-movement records?

Total number of fixations on pair cards

We compare the total number of fixations on the 2 pair cards, for detected vs. undetected pairs, using only the 294 trials where there was at least one fixation also on an undetected pair card. The overall number of fixations on the detected pair cards was greater than that on the undetected pair cards. Average results for all 294 trials with at least one fixation on both pairs (combining successive fixations to the same regions) were 4.6 (SE: 0.16) fixations on the detected pair cards and 3.2 (SE: 0.13) on the undetected pair cards. A two-way ANOVA with detected/undetected and subject as main factors showed significance for both: detected/undetected: F = 53.9, p < 0.001; subjects (as random factor): F = 4.66, p < 0.05; interaction effect not significant: F = 0.8, p = 0.59. Thus, there were significantly more fixations on the detected pair cards than on the undetected pair cards, and this preference is consistent across subjects.

Comparing the number of fixations on a trial-by-trial basis also showed that there were generally more fixations on the detected pair cards than on the undetected pairs for each subject, and for the entire range of 50 trials for each subject, as follows: Figure 3a shows a scatter plot by subject comparing average number of fixations on the detected pair and on the undetected pair. The points for every subject fall above the diagonal line of equality. (The regression line anchored to the origin is y = 1.44x; 95% confidence intervals for the slope are 1.33–1.55.) Figure 3b shows that the same is true on a trial-by-trial basis for all subjects (y = 1.21x, C.I. 1.15–1.28). (Note that the number of fixations cannot be zero since we include only trials with at least one fixation on each pair.)

There were more trials (68.4%) with a greater number of fixations on the detected pair cards than trials with a greater number of fixations on the undetected cards (15%), as shown in Figure 3c. Here we plot the normalized difference between the numbers of fixations on the two pairs, i.e., the difference between number of fixations on the detected and undetected pairs, divided by their sum. Red dots represent trials of more fixations on the detected pair and blue dots represent trials of more fixations on the undetected pair. There are more red dots than blue dots and the red dots are further from the zero line of equality (average normalized difference +0.34 vs. −0.24). As shown in the figure, in 68% of the trials there were more fixations on the detected pair cards; in 15% of the trials there were more fixations on the undetected pair cards; and in the remaining (17%) there were equal numbers of fixations on the two pairs.

We conclude that between the two possibilities of fewer or more fixations on the to-be-detected cards, we find that there were significantly more fixations on the detected cards.
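The two summary measures used in this comparison, the normalized difference and the slope of a regression line anchored to the origin, are simple to compute. The sketch below is illustrative only; the function names are hypothetical, and the slope formula is the standard least-squares estimate for a line with no intercept, which is assumed to correspond to the fit reported here.

```python
def normalized_difference(n_detected, n_undetected):
    """(detected - undetected) / (detected + undetected); ranges from -1 to +1."""
    return (n_detected - n_undetected) / (n_detected + n_undetected)

def slope_through_origin(x, y):
    """Least-squares slope of y = b * x (no intercept term)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Trial of Figure 2a: 7 fixations on the detected pair, 3 on the undetected pair.
nd = normalized_difference(7, 3)
print(nd)  # 0.4
```

In the scatter plots, x is taken to be the value for the undetected pair and y the value for the detected pair, so a slope above 1 indicates more fixations on the detected pair.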

Total viewing time of pair cards

In addition to the above comparison regarding the number of fixations per pair, we also compared the total viewing time (that is, the total duration of all fixations) on the detected and undetected pair cards. The rationale for performing this comparison in addition to the previous one is that fixation durations vary and may independently show a preference for the detected or undetected pair cards.

Average results for all 8 subjects and 294 trials were total viewing time of 1,361 ms (SE: 50) on the detected pair cards and 831 ms (SE: 38) on the undetected pair. A two-way ANOVA with detected/undetected and subject as main factors showed significance for both: detected/undetected: F = 44.3, p < 0.001; subjects: F = 3.86, p < 0.05; interaction effect not significant: F = 1.76, p = 0.09. Thus, there was significantly longer total viewing time on the detected pair cards than on the undetected pair cards, and this preference is consistent across subjects.

Comparing the total viewing time on a trial-by-trial basis also showed that total viewing time was generally longer on the detected pair cards than on the undetected pairs for each subject and for the entire range of 50 trials for each subject. Figure 4a shows a scatter plot by subject comparing average total viewing time on the detected pair and on the undetected pair. The points for every subject fall above the diagonal of equality. (The origin-anchored regression line is y = 1.66x; 95% C.I. 1.47–1.85.) Figure 4b shows that the same is true on a trial-by-trial basis for all subjects (y = 1.3x; C.I. 1.21–1.39).

Figure 4c shows for each trial the normalized difference between total viewing time on detected and undetected pair cards; there are many more trials of longer total viewing time on detected pair cards than on undetected pair cards (more red than blue dots; 82.65% vs. 17%) and the red dots are further from the zero line of equality (average normalized difference +0.36 vs. −0.19).

Thus, total viewing time was also consistently longer for the detected pair.

Figure 3. Analysis of the number of fixations on detected vs. undetected pair cards. The top row (a, b, c) presents data for the entire trials of each subject, while the second and third rows (d, e, f and g, h, i) present data separately for the first and second halves of each trial, respectively. (a, d, g) Scatter plots by subject comparing average number of fixations on the detected pair and on the undetected pair. The points for every subject fall above the diagonal of equality, both for the total data (a) and especially for the second half of each trial (g). (b, e, h) Color-scaled plots comparing trial-by-trial (294 trials) average number of fixations on the detected and undetected pairs. Again, most points lie above the line of equality for the total data and especially for the second half of each trial. (c, f, i) Normalized difference between numbers of fixations on the two pairs on a trial-by-trial basis. Red dots represent trials of more fixations on the detected pair and blue dots represent trials of more fixations on the undetected pair. Red–blue colors are used consistently also in following figures for detected and undetected pairs, respectively. Note that there are more red dots than blue dots. Furthermore, the red dots are farther from the zero line of equality. The first half of each search does not show a difference between detected and undetected pairs. The effect of more fixations on detected pair cards is seen in the second half of each search.

Average fixation duration

The average duration per fixation on detected pair cards was 305 ms (SE: 5.6), and the average duration per fixation on undetected pair cards was 260 ms (SE: 5.3; over the 294 trials with fixations on both the detected and undetected pairs, combining successive fixations to the same regions and accumulating their duration). A two-way ANOVA with detected/undetected and subject as main factors showed significance for both: detected/undetected: F = 23.02, p < 0.005; subjects: F = 5.28, p < 0.05; interaction effect not significant: F = 1.67, p = 0.11, confirming that this difference was consistent across subjects. We may therefore conclude that the difference in total viewing time reflects not only the difference in number of fixations but also a difference in individual fixation durations.

Search sequence (intervals between fixations on pair cards)

Figure 4. Analysis of total viewing time on detected vs. undetected pair cards. Data are presented in the same format as in Figure 3, with columns presenting scatter plots by subject comparing average total viewing times on the detected pair vs. on the undetected pair (left; a, d, g); scatter plots comparing trial-by-trial total viewing times on the detected and undetected pairs (middle; b, e, h); and the normalized difference between total viewing time on the two pairs, on a trial-by-trial basis (right; c, f, i). The top, middle, and bottom rows present data for the entire trials of each subject, and for the first and second halves of each trial, respectively. The points for every subject fall above the diagonal of equality, both for the total data and especially for the second half of each trial. The effect of longer viewing time on detected pair cards is seen in the second half of each search.

Does the proximity of fixations on the two cards of the pair influence pair detection and/or is it influenced by (prior) perception? We looked at the sequence of fixations and analyzed the sequential distance (i.e., number of fixation steps between successive fixations on the cards of the pair). An illustration of part of a search sequence, showing successive fixations to the different card regions, is presented schematically in Figure 5a. Note that after we combined successive fixations to the same card region, a sequential distance of 1 must reflect fixations on 2 different pair cards. This is not the case for fixations that are 2 or more steps apart, which can be on the same card region. We ask if sequential distance is a differentiating factor between detected and undetected pairs.

We measured the occurrences of each sequential distance, both for detected and undetected pair cards, over all 208 trials (with at least 2 fixations on each pair, so that at least one interval is obtained). Division by the number of trials yields the results shown in Figure 5b, the average number of times that each sequential distance appeared per trial. Surprisingly, none of the averages reaches even 1, i.e., no sequential distance appeared in every trial. Note that summing the averages for each pair type (detected or undetected) gives the average number of fixations on that type, minus 1, because we are counting the intervals between fixations.

The results show more occurrences with smaller sequential distances for the detected pair cards. That is, even though there are generally more fixations on the detected pair, the probabilities do not all rise equally; rather, there is an increase only for smaller sequential distances. This difference is also expressed by the means of the distributions, indicated by arrows in Figure 5b: a sequential distance of 4.1 between fixations on detected pair cards and a sequential distance of 4.8 for undetected pair cards. These are significantly different (t-test over trials: p < 0.001). This result suggests that detected pair cards are not only viewed more often, and for longer durations, but are also viewed in closer proximity, at least in parts of the search. We address the interdependence of these measures in the Discussion section.
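The sequential-distance measure can be made concrete with a short sketch. This is an illustrative reading of the definition given above, not the original analysis code: the function name is hypothetical, and the input is assumed to be the sequence of fixated card regions after successive fixations on the same region have been combined.

```python
def sequential_distances(regions, pair):
    """Number of fixation steps between successive fixations on the pair cards.

    `regions` is the ordered list of fixated card regions for one trial;
    `pair` is the set of the two card regions forming the pair.
    """
    indices = [i for i, region in enumerate(regions) if region in pair]
    return [later - earlier for earlier, later in zip(indices, indices[1:])]

# Hypothetical sequence in which cards 3 and 7 form a pair.
regions = [1, 3, 5, 7, 2, 3, 7]
distances = sequential_distances(regions, {3, 7})
print(distances)  # [2, 2, 1]
```

With four fixations on the pair, three intervals are obtained, which is why summing the per-distance averages yields the number of fixations minus 1.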

Division of each search trial into two halves

Figure 5. (a) Schematic illustration of a sequence of successive fixations on different card regions within a trial. In this example, card numbers 3 and 7 form a pair and fixations on their regions within this sequence are marked in red. (b) Average occurrences per trial (over 208 trials with at least 2 fixations on each pair) of each sequential distance between fixations on detected (red) and undetected (blue) pair cards. Note that there are more occurrences with fewer intervening fixations for the detected pair cards, i.e., the red plot is shifted upward for the lower sequential-distance values compared to the blue plot. Arrows indicate mean sequential distance between fixations on the same pair.

We examined whether the effects of more and longer fixations on detected pairs are constant over different search stages. For this purpose, we divided each search trial into its first and second halves and repeated the above analyses of total number of fixations and total viewing time for each half trial. Results are shown in the second and third rows of Figures 3 and 4.

The first half of the search, over all 294 trials, does not show a difference between detected and undetected pairs in the number of fixations. Average results were 1.59 (SE: 0.09) fixations on the detected pair cards and 1.77 (SE: 0.08) on the undetected pair cards.

Figure 3d shows a scatter plot by subject comparing average number of fixations on the detected pair and on the undetected pair. The points for every subject are gathered around the diagonal of equality, and even fall a little below it (y = 0.91x; C.I. 0.84–0.98). The image in Figure 3e shows that the same is true on a trial-by-trial basis for all subjects (y = 0.91x; C.I. 0.85–0.98). Note that here (as opposed to Figure 3b) the number of fixations can be 0, because there need not be a fixation on each pair in this half of the trial. Figure 3f shows the normalized difference between the numbers of fixations on the two pairs. Overall, considering only the first half of each trial, in 28.9% of the trials there were more fixations on the detected pair cards; in 39.8% there were more fixations on the undetected pair cards; in 26.5% there were equal numbers of fixations on the detected and undetected pair cards; in the remaining 4.8% there were no fixations on either of the pairs (and they were fixated only in the second half of the trial).

In contrast to this result, there is a large effect on the number of fixations in the second half of each trial: Average results were 2.98 (SE: 0.09) fixations on the detected pair cards vs. 1.45 (SE: 0.08) on the undetected pair cards. Thus, in Figure 3g, the points for every subject fall above the diagonal of equality (y = 2.07x, C.I. 1.82–2.33). The image in Figure 3h shows that the same is true on a trial-by-trial basis for all subjects (y = 1.32x, C.I. 1.23–1.42). Overall, in 77.2% of the trials there were more fixations on the detected pair cards; in 8.5% there were more fixations on the undetected pair cards; and in 14.3% there were equal numbers of fixations on the detected and undetected pair cards, as shown in Figure 3i.

The same difference between the two halves of the trial is seen for the total viewing time measure. The first half of the search, over all trials, does not show a difference between detected and undetected pairs. Average results were total viewing time of 410 ms (SE: 25) on the detected and 472 ms (SE: 24) on the undetected pair cards.

The points in the scatter plots of Figures 4d and 4e are gathered around the diagonal of equality and even fall a little below it (y = 0.88x; C.I. 0.81–0.96, and y = 0.7x; C.I. 0.62–0.77, respectively). Figure 4f shows the normalized difference between the total viewing times on the two pairs. Overall, longer viewing times were found in 40.1% of the trials for the detected and in 54.4% for the undetected pair cards; in 0.7% there were equal viewing times on the two pairs and in the remaining 4.8% there were no fixations on either.

In the second half of each trial, on the other hand, large differences were found. Average total viewing times were 950 ms (SE: 32) on the detected and 359 ms (SE: 21) on the undetected pair cards. In Figures 4g and 4h, the points for every subject fall above the diagonal of equality (y = 2.66x; C.I. 2.23–3.1, and y = 1.57x, C.I. 1.4–1.74, respectively). Overall, in 89.1% of the trials there was a longer viewing time on the detected pair cards; in 10.9% there was a longer viewing time on the undetected pair cards (average normalized difference +0.6 vs. −0.27).

Figure 6. Search Dynamics. (a) Accumulated number of fixations on detected (red) and undetected (blue) pair cards for one trial (shown in Figure 2a) as a function of the successive search fixations. Each increment corresponds to a fixation on a pair card. Fixations on the other 8 card regions leave both the pairs without an increment. The vertical line toward the end of the trial represents the point where the first card was marked (fixations after this line were not included in analyses and results). (b) Dynamics of fraction of fixations on pair cards, as a function of number of search fixations. (c) Fraction of fixations, averaged over a running boxcar with a width of 6 fixations. (d) Sequential distance from the previous pair card, for each fixation on a pair card. The first fixation for each pair is arbitrarily plotted at zero. Numbers inside circles represent the card region on which the fixation fell.

Figure 7. Search Dynamics. Another example, as in Figure 6, corresponding to the example shown in Figure 2b.


The fact that there is such a difference between the first and second halves of the trial, even though trial length varies over a large range, is a first strong indication that there is a correlation (and we will suggest a causal dependence) between fixation number (and duration) and detection of the identical pair.

Search dynamics

Number and sequence of fixations

Encouraged by the results of the division into halves of each trial, we followed the dynamics of search through each trial. Results for two sample trials are shown in Figures 6 and 7. In Figures 6a and 7a, we plot the accumulated number of fixations on pair cards as a function of the successive fixations within the trial. Upward slopes reflect a fixation on a pair card and the lines remain horizontal for fixations on non-pair cards. Note that there are similar numbers of fixations on the to-be-detected (red) and the undetected (blue) pairs, up to a point where the number on the detected pair increases above the level for the undetected pair cards.

In Figures 6d and 7d, we plot the sequential distance between each two fixations on the pair cards (as appears in Figure 5). The trend here is opposite to that of the number of fixations on the pair cards: As the search progresses, the number of intervening fixations between the detected pair card fixations becomes smaller, i.e., there is a gradually decreasing sequential distance between fixations on the to-be-detected pair cards.

Fraction and running boxcar average

To further quantify the search dynamics, we analyzed two measures of the fixations on the pair cards. As in the examples of Figures 6b and 7b, we studied the fraction of fixations on the pair cards, for each successive fixation number. We also ran boxcar averaging over the number of fixations on the detected and undetected cards, and then calculated the fraction for this box size. Examples are shown in Figures 6c and 7c, using a 6-fixation boxcar width. Note that using a running average may more closely reflect the influence of memory for the most recent fixations.

In both analyses, a clear point is seen where the curves for the to-be-detected and the undetected pair cards diverge, and fixations on the detected pair rise above those on the undetected pair. This divergence reflects a change in the relative fraction of fixations on the two pairs during the course of the search, indicating that a specific event has taken place in the detection process.
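The two dynamic measures can be sketched as follows. This is one plausible reading of the description above rather than the original analysis code: the function names are hypothetical, the cumulative fraction is assumed to be the number of pair fixations so far divided by the number of fixations so far, and the boxcar fraction is assumed to be the number of pair fixations among the most recent `width` fixations divided by `width`.

```python
def running_fraction(regions, pair):
    """Cumulative fraction of fixations that fell on the pair cards."""
    fractions, hits = [], 0
    for n, region in enumerate(regions, start=1):
        hits += region in pair
        fractions.append(hits / n)
    return fractions

def boxcar_fraction(regions, pair, width=6):
    """Fraction of pair fixations within a running window of `width` fixations."""
    hits = [1 if region in pair else 0 for region in regions]
    return [
        sum(hits[i - width + 1 : i + 1]) / width
        for i in range(width - 1, len(hits))
    ]

# Hypothetical trial in which cards 3 and 7 form the pair of interest.
regions = [3, 1, 7, 2, 3, 7, 7, 4]
cumulative = running_fraction(regions, {3, 7})
windowed = boxcar_fraction(regions, {3, 7}, width=6)
```

Computing both curves separately for the detected and the undetected pair and plotting them against fixation number gives traces analogous to panels (b) and (c) of Figures 6 and 7.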

Bifurcation point

To further investigate the issue of dynamics, we wish tosystematically study the point where a bifurcation occurs

between fixations on the detected and the undetected pairsin the search dynamicsVthe point where the ultimatelydetected pair “overpowers” the other potential target. Suchan observation can be seen for single trials in Figures 6a,b, c and 7a, b, c, for the accumulated number of fixationson the detected pair in comparison to the undetectedpairVor its running boxcar averageVduring progress ofthe search process. Since different trials have differentsearch times and a wide range of numbers of fixations, wecannot simply average over trials of different subjects oreven of a single subject.To average over trials and subjects, we measured the

backward dynamics: the accumulated number of fixations on pair cards (as in Figures 6a and 7a) but now relative to the actual detection point (i.e., we look at fixations on the target pairs as a function of the number of fixations before the first mouse click on a card).

Results are shown in Figure 8a. We plot the accumulated fixations on the detected and undetected pair cards, averaging over different trials and subjects, aligning the results according to the end of the trial, the time of marking detected cards. Shown on the same graph are the results for two trial-length ranges: short trials of 8–19 fixations and longer trials with 20–80 fixations. Figure 8b plots the slopes of the accumulated number of fixation graphs of Figure 8a. Again, the different windows are superimposed on the same graph.

Methodological note: In order to perform a backward dynamics analysis for a particular number of fixations before initial card marking, there have to be at least that number of fixations in the trial. Thus, very short trials will be excluded from this analysis. For this reason, to compare short and long trials, we applied two different “window” sizes: a window of 8 preceding fixations (trials in the range of 8–19 fixations in total), and a window of 20 preceding fixations (trials with 20–80 fixations in total).

Three characteristics are immediately apparent: (1) The patterns of the fixations on the detected and undetected pairs are nearly identical up to a point approximately 4 fixations before the end of the trial. At this point, there is a marked change in the pattern for the detected pair, with the number of fixations showing a sharp upturn (Figures 8a and 8b). This is seen most clearly as a point where the slope of the detected pair rises above the slope of the undetected pair (Figure 8b). (2) This divergence between the detected and undetected pairs occurs a while before initiation of the card-marking process, ~5 fixations or about 1.5 s before the first mouse click (Figure 8b). (3) The point of divergence in slopes is only slightly shifted to the right for the short search sequences. Surprisingly, these characteristics are the same for short and long trials, i.e., there is no dependence on history. We conclude that something important occurs at this point.

To elaborate, the two different time windows show a very good fit, except for an upward shift for the longer trials. This shift is due to a side effect of using a longer window: the total number of fixations increases, therefore slightly shifting up the number of fixations on the pairs. Yet, the average increases by only about 3 fixations on the detected pair, in the whole range between trials with 8–19 fixations and trials with 20–80 fixations. This relatively small increase implies that no matter how many fixations there are in total, the number of fixations on pair cards needed for identification is more or less fixed (within a certain range); see Figure 9.

Journal of Vision (2009) 9(5):20, 1–16 Jacob & Hochstein 10
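The backward dynamics analysis described in this section can be illustrated with a minimal Python sketch. It is not the analysis code used in the study. It assumes that each trial is represented as a list of booleans (one per fixation, up to the first mouse click, `True` when the fixation landed on a card of the pair of interest) and that accumulation starts at the beginning of the window; the function names are hypothetical.

```python
from typing import List, Sequence


def backward_cumulative(on_pair: Sequence[bool], window: int) -> List[int]:
    """Accumulated pair-card fixations over the last `window` fixations
    before the first click (assumed representation, see lead-in)."""
    if window <= 0 or len(on_pair) < window:
        raise ValueError("trial must contain at least `window` fixations")
    total = 0
    curve = []
    for hit in on_pair[-window:]:  # align to the end of the trial
        total += int(hit)
        curve.append(total)
    return curve


def mean_backward_curve(trials: Sequence[Sequence[bool]], window: int,
                        min_len: int, max_len: int) -> List[float]:
    """Average the aligned curves over trials whose total number of
    fixations lies in [min_len, max_len], e.g. 8-19 with window 8."""
    curves = [backward_cumulative(t, window) for t in trials
              if min_len <= len(t) <= max_len and len(t) >= window]
    if not curves:
        raise ValueError("no trial falls in the requested length range")
    return [sum(c[i] for c in curves) / len(curves) for i in range(window)]


def slope(curve: Sequence[float]) -> List[float]:
    """Step-to-step increase of the accumulated curve, i.e. the estimated
    probability that the next fixation falls on a pair card."""
    return [b - a for a, b in zip(curve, curve[1:])]
```

Calling `mean_backward_curve` once with `(window=8, min_len=8, max_len=19)` and once with `(window=20, min_len=20, max_len=80)` would give the two trial-length ranges compared in the text.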

Discussion

We compared fixations on detected and undetected target items and found that there are more fixations on the ultimately detected pair cards than on the undetected ones. This is reflected in the proportion of trials in which there are more fixations on detected pairs (68.4% vs. 15%) and the average number of fixations on detected pair cards (4.6 vs. 3.2). The increase in number of fixations is confirmed by an increase in total viewing time on detected pair cards, and we found that the average duration of each fixation was also longer for the detected pair.

In addition, the results also show fewer intervening fixations on other cards between fixations on the detected pair cards in the search sequence. The average sequential distance between fixations on card regions is smaller for the detected than for the undetected pairs.

In fact, if there are more fixations on the detected pair cards, then automatically the average interval or sequential distance between them will be smaller; these are two sides of the same coin. Still, it is of interest to examine both, since we do not know, a priori, which one affects perception: the number of fixations on the cards (together with their duration, affecting total viewing time) or the temporal adjacency of the fixations. Further study is required to differentiate between these interdependent parameters, e.g., by varying the number of cards in the display, which can separately affect the number of fixations on the detected pair or the sequential distance between such fixations. As discussed below, we believe that these two effects work together to bring about conscious perception of the target pair.

Figure 8. Backward Dynamics. (a) Average over all trials of the backward dynamics: the accumulated number of fixations on pair cards (as in Figures 6 and 7a) aligned to the actual detection point (i.e., the first mouse click on a card); dashed line: trials with 8–19 fixations; solid line: trials with 20–80 fixations; red: detected pairs; blue: undetected pairs. Curves for the two ranges show a very good fit, besides an upward shift for the range of longer trials (see text). Error bars indicate SE between trials; arrow indicates where the slope (b) changes significantly, a change that is regarded as the bifurcation point. (b) Slope of the accumulated number of fixations (as in a). The slope indicates the probability that the next fixation will be on a pair card. Error bars indicate 95% C.I. There is a clear bifurcation point where the slope of the detected pair rises above that of the undetected one and its slope exceeds a pre-determined threshold. The time of this bifurcation in (b) is indicated by an arrow in (a).

As mentioned in the Methods section, we combined successive saccades that fell on the same card region. Is this the correct choice? If the two-saccade sequence is pre-planned, it may be correct to consider them as two separate fixations on the card, rather than as a single fixation with a “correction” in the middle. Recent studies of saccades to long or short words (Vergilino & Beauvillain, 2001) or to elongated or separated shorter objects that were displaced during the first saccade (Vergilino-Perez & Findlay, 2006) suggest that successive saccades to the same object may be very different from saccades to different objects, perhaps reflecting pre-planning in the within-object case. For example, “between-object saccades compensated for the displacement to aim for a target position on the new object whereas within-object saccades did not show compensation”. If there is separate planning and intentionality for both saccades, then each fixation may be informative for processing the observed object. Nevertheless, we chose the conservative analysis route and combined successive fixations on the same region. The results would have been strengthened if we had not combined fixations.

An increase in number of fixations is of course inconsistent with the presence of a mechanism of the type proposed to explain the alternative outcome (of fewer fixations on detected cards), i.e., that detection is a result of perceiving an inherent property of the detected target and that once this property is perceived detection is immediate. Rather, our findings suggest that several stages are involved in the perceptual process leading ultimately to target pair detection.

Similarly, our results rule out the hypothesis that fixations are irrelevant for detection. This alternative might stem from a need for perceiving a particular card aspect, triggered in random fashion by a single fixation. On average, the number of fixations on the two pairs would be equal, and chance would determine which pair is detected first.

An inherent requirement for performance of the Identity Search task is the use of Working Memory to enable comparison of different cards. Ballard, Hayhoe, Pook, and Rao (1997) suggest a tradeoff between working memory load and the number of required fixations. The well-known limitations of Working Memory (Horowitz & Wolfe, 1998; McCarley, Wang, Kramer, Irwin, & Peterson, 2003) may make repeated fixations to the same card advantageous for this comparison process. Beyond their deictic role suggested by Ballard et al. (1997), repeated fixations can be used to gather information and integrate it.

The greater number of fixations on the ultimately detected pairs raises an essential issue: What comes first? Does an arbitrary intensive observation of the pair cards lead to perception, or, the opposite, does detection of the pair lead to more fixations on it? In other words, does the larger number of fixations give rise to detection or is it a result of detection?
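The two fixation measures discussed above, the merging of successive fixations on the same card region and the sequential distance between fixations on a pair, can be made concrete with a short Python sketch. It assumes a trial is recorded as a list of card indices, one per fixation, and it defines sequential distance as the difference in position between consecutive pair-card fixations in the merged sequence (1 means one right after the other). This definition and the function names are illustrative assumptions, not taken from the original analysis.

```python
from typing import Collection, List, Sequence


def merge_repeats(cards: Sequence[int]) -> List[int]:
    """Combine successive fixations that fall on the same card region
    into a single fixation (the conservative choice described above)."""
    merged: List[int] = []
    for card in cards:
        if not merged or merged[-1] != card:
            merged.append(card)
    return merged


def sequential_distances(cards: Sequence[int],
                         pair: Collection[int]) -> List[int]:
    """Distances, in fixation steps, between consecutive fixations on
    either card of `pair` (assumed definition, see lead-in)."""
    positions = [i for i, card in enumerate(cards) if card in pair]
    return [b - a for a, b in zip(positions, positions[1:])]
```

Under these assumptions, a smaller mean of `sequential_distances` for the detected pair than for the undetected pair would correspond to the closer temporal adjacency reported in the text.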

Figure 9. (a) Histogram of number of fixations for each trial on the detected pair cards, over all trials. (b) Psychometric curve of percent detection as a function of the number of fixations on the detected pair cards.

To discriminate between these possibilities and determine which, if any, is the actual relationship between fixation and detection, we analyzed the sequence of fixations to determine when the differentiation between detected and undetected cards occurs. We argued that if fixations were randomly organized, and chance led to more fixations on one pair than on the other, and thus to its ultimate detection, then there should be no regular pattern of fixations on the detected cards until the moment of detection. Any regular pattern should derive from a change of state in the search sequence, that is, from some perceptual detection mechanism.

On the other hand, if it is detection that leads to increased fixations, the period of increased fixations on the detected cards should be very brief. Once there is conscious recognition of the existence of the pair, there should be very few fixations before the ultimate marking of these cards. Note that the time taken for confirmation that detected cards indeed form an identical pair is probably very brief. We conclude this from another experiment that we performed, with a much more complex task, the Set game (Jacob & Hochstein, 2008 and unpublished results). We asked subjects to verify whether the few cards presented (for only 300 ms) form a three-card Set. They achieved 96% accuracy. This speed should be at least matched in the easier two-card Identity Search task. Thus, the confirmatory check period should have required no more than the time taken for 1–2 fixations. Another indication of the ease of confirmation is that sometimes pairs are detected after very few fixations (Figure 9). An additional indication that pure confirmation of cards was not conducted during the period of increased fixations on the detected pair comes from the slope in Figure 8b. If final fixations reflected consistent confirmation, the slope during these last fixations would have had the value of 1 (indicating that the next fixation was always on a pair card), which is not the case. We conclude that subjects are not yet consciously aware of the target pair at this point, though the visual system may already have some information in this direction, leading to the preponderance of such fixations.

There is additional evidence of a mismatch between fixation and detection, that is, that fixation on the target is not always accompanied by explicit detection (Barlasov-Ioffe & Hochstein, 2008; Motter & Belky, 1998; Rutishauser & Koch, 2007; Sheinberg & Logothetis, 2001). Recognition often requires a “double-take” saccade, i.e., one or more fixations away from the target (Rutishauser & Koch, 2007), during which conscious recognition presumably occurs, leading the eyes back to the target. We suggest that in the period between return fixations, even when awareness of the target is at most unconscious (since the response is still not initiated), one should not consider the mind as being completely ignorant, but as having implicit pre-recognition. This implicit pre-recognition stage may be similar to that found in the persistent activity of temporal cortex (Sheinberg & Logothetis, 2001) preceding target acquisition, and without target awareness. We suggest that this implicit recognition guides the eyes and saccadic planning.

We turn to the backward dynamics, the accumulated number of fixations on pair cards, averaged over trials, with respect to the moment of card marking, and demonstrated in Figure 8. The results show that the patterns of fixations on the detected and undetected pairs are nearly identical up to a point a few fixations before marking. At this point, there is a sharp upturn in the number of fixations on the detected pair.

The dynamics are not very different for short compared to long trials (see Figure 9). The small increase in cumulative number of fixations on detected pair cards in the long search sequences (6.4 vs. 3.2 fixations), relative to the total number of fixations in the sequence, implies that the number of fixations on pair cards needed for identification is defined within a certain period of time; that is, there is a dying-memory type of recall and comparison.

Nevertheless, longer searches, i.e., longer sequences of fixations, naturally raise the total number of fixations, and therefore in particular also raise the number of fixations on the pair cards. Thus, the important factor determining detection must take into account also the proximity of fixations. We conclude that not only the absolute number of fixations has an influence but also their proximity in time or number of intervening fixations.

We suggest that since fixations on each pair are sparse during the initial part of the search, the fixation slope (Figure 8b) is still low. Only at a later stage do the fixations on the pair card become closer to each other and the slope rises.

We found that the sequential distance between fixations was different for detected and undetected pair cards, suggesting that not only the number of fixations but also the time between them is a significant factor for determining pair detection. This factor may be related to the limited capacity of Working Memory (Horowitz & Wolfe, 1998; McCarley et al., 2003; Phillips & Christie, 1977a, 1977b). It may be difficult for subjects to keep many cards in Working Memory at the same time, so that fixations need to be close to each other to associate place with identity. The average sequential distance (Figure 5) perhaps puts an upper bound on average working memory at 4 cards. The sequential distance decreases when approaching detection (Figures 6d and 7d), so that this may even be an overestimate (see also the recency effect with free viewing in Phillips & Christie, 1977a, who used a similar stimulus pattern but different paradigm). Perhaps a necessary condition for detection is that two cards be represented concurrently in Working Memory.

Still, even viewing two cards one right after the other does not ensure that they be recognized as an identical pair (Figure 5b). We conclude that no single sequential-distance fixation pattern (fixating the pair cards one after the other, with one card intervening, etc.) is sufficient by itself to bring about detection, nor is any one pattern necessary. Nevertheless, very close patterns (with a sequential distance of 1–3) are more prevalent for the detected cards, implying that perhaps one or the other of these is a necessary condition for detection. Perhaps a conjunction of a few patterns is sufficient for determining detection.

We suggest that the important factor may be the change in the accumulated number of fixations, i.e., the slope, rather than the number itself. Detection may depend on the slope rising above some threshold. The slope does not rely on the intersection point between the two graphs (of the detected and undetected pairs; Figure 8a). The slope is equivalent to the probability that the next fixation will be on a pair card. Its rise reflects an end of random, unproductive search. The point where the slope exceeds a pre-determined threshold may be regarded as the bifurcation point where there is a change of state in the search process.

This bifurcation point may reflect a transition between a first stage of “search in the dark” to a second stage of “early implicit recognition”. This suggests that there is an early recognition stage, which is followed by more fixations.

The finding that there is a bifurcation point a while before the time of marking the cards suggests that there might have been implicit perception of the pair before its entering into conscious awareness (Mitroff, Simons, & Franconeri, 2002; Rensink, 2004). This implicit perception may direct eye movements to the pair cards; however, note that not all fixations are on these cards even at this point. During this stage of concentrating on the pair cards, the unconscious discovery is brought to awareness.

In summary, we suggest a 3-stage model of the perceptual recognition process, as follows: Stage 1: Initial search, with random fixations on the different cards in arbitrary order (a “search in the dark”). Stage 2: Implicit (unconscious) recognition of the target pair, perhaps controlling and guiding eye movements to the relevant sensed location of these target cards. The transition from Stage 1 to Stage 2 is seen in the bifurcation point of the fixation slope in Figure 8. Stage 3: Insight: Explicit detection with conscious knowledge of target presence and its location, followed by rapid marking of the two cards.
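The bifurcation criterion described above (the detected-pair slope rising above the undetected-pair slope and above a pre-determined threshold) can be sketched as follows. The threshold value is not stated in the text, so it is left as a parameter. The requirement that the condition hold continuously until the end of the trial is an added assumption, and `bifurcation_index` is a hypothetical name.

```python
from typing import Optional, Sequence


def bifurcation_index(detected_slope: Sequence[float],
                      undetected_slope: Sequence[float],
                      threshold: float) -> Optional[int]:
    """Earliest step from which, up to the end of the trial, the slope for
    the detected pair stays above both `threshold` and the slope for the
    undetected pair. Returns None if the final step does not qualify."""
    if len(detected_slope) != len(undetected_slope):
        raise ValueError("slope curves must have the same length")
    start = None
    # Walk backward from the moment of card marking.
    for i in range(len(detected_slope) - 1, -1, -1):
        if (detected_slope[i] > threshold
                and detected_slope[i] > undetected_slope[i]):
            start = i
        else:
            break
    return start
```

With slope curves aligned to the first mouse click, `len(detected_slope) - bifurcation_index(...)` gives the number of steps between the bifurcation point and card marking.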

Conclusions

In a search for a pair of identical cards, where two such pairs were present in each display of twelve cards, the cards of the pair that was ultimately detected were observed more frequently than cards of the undetected pair; there are more fixations and longer fixations on the ultimately detected pair, and the average sequential distance between fixations on card regions is smaller for the detected pairs.

A bifurcation point is observed along the dynamics of search, in which the to-be-detected pair overpowers the undetected one. This suggests an early, implicit, recognition stage in the process of perception, which is followed by more fixations, leading, ultimately, to the point of explicit target pair recognition.

Acknowledgments

We thank Anne Treisman and Robert Shapley for helpful discussions throughout this study and Daphna Weinshall and Leon Deouell, members of the MJ doctoral committee, for comments and suggestions. We thank Michael Wagner and Guy Goldner for assistance with the eye-movement recording equipment.

This study was supported by grants from the Israel Science Foundation (ISF) and the US–Israel Binational Science Foundation (BSF). We are grateful to the National Institute for Psychobiology in Israel and Prof. Micha Spira for use of the Charles E. Smith and Joel Elkes Laboratory for Collaborative Research in Psychobiology for the eye-movement recording reported here.

Commercial relationships: none.
Corresponding author: Michal Jacob.
Email: [email protected]
Address: Department of Neurobiology, Institute of Life Sciences, Hebrew University, Givat Ram, Jerusalem 91904, Israel.

References

Ahissar, M., & Hochstein, S. (1997). Task difficulty and the specificity of perceptual learning. Nature, 387, 401–406.

Ahissar, M., & Hochstein, S. (2004). The reverse hierarchy theory of visual perceptual learning. Trends in Cognitive Sciences, 8, 457–464.

Anstis, S. M. (1974). Letter: A chart demonstrating variations in acuity with retinal position. Vision Research, 14, 589–592.

Antes, J. R. (1974). The time course of picture viewing. Journal of Experimental Psychology, 103, 62–70.

Ballard, D. H., Hayhoe, M. M., Pook, P. K., & Rao, R. P. (1997). Deictic codes for the embodiment of cognition. Behavioral and Brain Sciences, 20, 723–767.

Barlasov-Ioffe, A., & Hochstein, S. (2008). Perceiving illusory contours: Figure detection and shape discrimination. Journal of Vision, 8(11):14, 1–15, http://journalofvision.org/8/11/14/, doi:10.1167/8.11.14.

Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 433–436.

Bruce, C. J., Goldberg, M. E., Bushnell, C., & Stanton, G. B. (1985). Primate frontal eye fields. II. Physiological and anatomical correlates of electrically evoked eye movements. Journal of Neurophysiology, 54, 714–734.

Buswell, G. T. (1935). How people look at pictures. Chicago: University of Chicago Press.

Chen, X., & Zelinsky, G. J. (2006). Real-world visual search is dominated by top-down guidance. Vision Research, 46, 4118–4133.

Cornelissen, F. W., Peters, E. M., & Palmer, J. (2002). The Eyelink Toolbox: Eye tracking with MATLAB and the Psychophysics Toolbox. Behavior Research Methods, Instruments & Computers, 34, 613–617.

DeSchepper, B., & Treisman, A. (1996). Visual memory for novel shapes: Implicit coding without attention. Journal of Experimental Psychology: Learning, Memory and Cognition, 22, 27–47.

Droll, J. A., Gigone, K., & Hayhoe, M. M. (2007). Learning where to direct gaze during change detection. Journal of Vision, 7(14):6, 1–12, http://journalofvision.org/7/14/6/, doi:10.1167/7.14.6.

Henderson, J. M., & Hollingworth, A. (1999). High-level scene perception. Annual Review of Psychology, 50, 243–271.

Hochstein, S., & Ahissar, M. (2002). View from the top: Hierarchies and reverse hierarchies in the visual system. Neuron, 36, 791–804.

Horowitz, T. S., & Wolfe, J. M. (1998). Visual search has no memory. Nature, 394, 575–577.

Jacob, M., & Hochstein, S. (2008). Set recognition as a window to perceptual and cognitive processes. Perception & Psychophysics, 70, 1165–1184.

Jonides, J. (1981). Voluntary versus automatic control over the mind’s eye movement. In J. B. Long & A. D. Baddeley (Eds.), Attention performance (vol. IX, pp. 187–203). Hillsdale, NJ: Erlbaum.

Kanwisher, N. (1987). Repetition blindness: Type recognition without token individuation. Cognition, 27, 117–143.

Kanwisher, N. (1991). Repetition blindness and illusory conjunctions: Errors in binding visual types with visual tokens. Journal of Experimental Psychology: Human Perception and Performance, 17, 404–421.

Liversedge, S. P., & Findlay, J. M. (2000). Saccadic eye movements and cognition. Trends in Cognitive Sciences, 4, 6–14.

Loftus, G. R., & Mackworth, N. H. (1978). Cognitive determinants of fixation location during picture viewing. Journal of Experimental Psychology: Human Perception and Performance, 4, 565–572.

Mackworth, N. H., & Morandi, A. J. (1967). The gaze selects informative details within pictures. Perception & Psychophysics, 2, 547–552.

McCarley, J. S., Wang, R. F., Kramer, A. F., Irwin, D. E., & Peterson, M. S. (2003). Psychological Science, 14, 422–426.

Mitroff, S. R., Simons, D. J., & Franconeri, S. L. (2002). The siren song of implicit change detection. Journal of Experimental Psychology: Human Perception and Performance, 28, 798–815.

Motter, B. C., & Belky, E. J. (1998). The guidance of eye movements during active visual search. Vision Research, 38, 1805–1815.

Muller, H. J., & Rabbitt, P. M. (1989). Reflexive and voluntary orienting of attention: Time course of activation and resistance to interruption. Journal of Experimental Psychology: Human Perception and Performance, 15, 315–330.

Nodine, C. F., Carmody, D. P., & Kundel, H. L. (1978). Searching for Nina. In J. W. Senders, D. F. Fisher, & R. A. Monty (Eds.), Eye movements and the higher psychological functions (pp. 241–257). Hillsdale, NJ: Erlbaum.

Phillips, W. A., & Christie, D. F. (1977a). Components of visual memory. Quarterly Journal of Experimental Psychology, 29, 117–133.

Phillips, W. A., & Christie, D. F. (1977b). Interference with visualization. Quarterly Journal of Experimental Psychology, 29, 637–650.

Porta, J. B. (1953). De refractione optices parte: Libri novem. Naples, Italy: Carlinum & Pacem.

Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3–25.

Potter, M. C., Chun, M. M., Banks, B. S., & Muckenhoupt, M. (1998). Two attentional deficits in serial target search: The visual attentional blink and an amodal task-switch deficit. Journal of Experimental Psychology: Learning, Memory and Cognition, 24, 979–992.

Raymond, J. E., Shapiro, K. L., & Arnell, K. M. (1992). Temporary suppression of visual processing in an RSVP task: An attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18, 849–860.

Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372–422.

Rensink, R. A. (2004). Visual sensing without seeing. Psychological Science, 15, 27–32.

Rensink, R. A., O’Regan, J. K., & Clark, J. J. (1997). To see or not to see: The need for attention to perceive changes in scenes. Psychological Science, 8, 368–373.

Riggs, L. A. (1965). Visual acuity. In C. H. Graham (Ed.), Vision and Visual Perception (pp. 321–349). New York: Wiley.

Ringach, D. L., Hawken, M. J., & Shapley, R. (1996). Binocular eye movements caused by the perception of three-dimensional structure from motion. Vision Research, 36, 1479–1492.

Rubin, N., Nakayama, K., & Shapley, R. (1997). Abrupt learning and retinal size specificity in illusory-contour perception. Current Biology, 7, 461–467.

Rutishauser, U., & Koch, C. (2007). Probabilistic modeling of eye movement data during conjunction search via feature-based attention. Journal of Vision, 7(6):5, 1–20, http://journalofvision.org/7/6/5/, doi:10.1167/7.6.5.

Schall, J. D. (1991). Neuronal activity related to visually guided saccades in the frontal eye fields of rhesus monkeys: Comparison with supplementary eye fields. Journal of Neurophysiology, 66, 559–579.

Shapiro, K. L., Raymond, J. E., & Arnell, K. M. (1994). Attention to visual pattern information produces the attentional blink in rapid serial visual presentation. Journal of Experimental Psychology: Human Perception and Performance, 20, 357–371.

Sheinberg, D. L., & Logothetis, N. K. (2001). Noticing familiar objects in real world scenes: The role of temporal cortical neurons in natural vision. The Journal of Neuroscience, 21, 1340–1350.

Shneor, E., & Hochstein, S. (2006). Eye dominance effects in feature search. Vision Research, 46, 4258–4269.

Simons, D. J., & Levin, D. T. (1997). Change blindness. Trends in Cognitive Sciences, 7, 261–267.

Stone, L. S., Miles, F. A., & Banks, M. S. (2003). Linking eye movements and perception [Abstract]. Journal of Vision, 3(11):i, i–iii, http://journalofvision.org/3/11/i/, doi:10.1167/3.11.i.

Treisman, A. (2006). How the deployment of attention determines what we see. Visual Cognition, 14, 411–443.

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.

Vergilino, D., & Beauvillain, C. (2001). Reference frames in reading: Evidence from visually and memory-guided saccades. Vision Research, 41, 3547–3557.

Vergilino-Perez, D., & Findlay, J. M. (2006). Between-object and within-object saccade programming in a visual search task. Vision Research, 46, 2204–2216.

Yarbus, A. L. (1967). Eye movements and vision. New York: Plenum.


Chapter III

Graded Recognition as a Function of Target Fixations


Abstract

Target recognition stages were studied by exposing observers to varying controlled numbers of target fixations. The target, present in half the displays, consisted of two identical cards (Identity Search Task; Jacob & Hochstein, 2009). Following more fixations, targets are better recognized, as indicated by decreased Response Time, increased Hit rate and detectability, and growing confidence, reflecting the current stage in the recognition process. Do fixations lead to detection or does unconscious recognition guide the eyes to more fixations? Results suggest that a few fixations lead to early implicit recognition, which in turn leads to more fixations, and thus ultimately to full explicit recognition.

1. Introduction

When searching for a complex target, what leads to detection? What are the stages and mechanisms in the process of target recognition? To study this issue, we examine the dynamics of detection and recognition as a function of the number of fixations on the target. We previously devised a novel Identity Search Task, where we found that target detection depends on the sequence of fixations (Jacob & Hochstein, 2009). We now use this task, again tracking eye movements, but halt the display after a certain (varied) number of fixations, specifically on the target, to catch different stages in the process of recognition.

There is a cognitive plan behind eye movements: Fixation patterns or scan-paths are influenced by cognitive processes on the basis of task demands and the task-specific information available at different parts of the scene (Antes, 1974; Brandt, 1945; Buswell, 1935; Hochberg, 1970; Henderson & Hollingworth, 1999; Neisser, 1976; Ringach, Hawken & Shapley, 1996; Yarbus, 1967).

In a recent example, eye movements were studied in the context of “change blindness”. When viewing two alternating pictures of a scene, where a small or even a large difference has been introduced between them, observers are often “blind” to the change if it is not in the scene's focus of interest (Rensink, O’Regan & Clark, 1995; Rensink, O’Regan & Clark, 1997; Simons & Levin, 1997, 1998). It was found that even when the changing region has been fixated in both its states, detection rate is only 25% (Hollingworth, Williams & Henderson, 2001). The possible need for several fixations per region was not examined.

Classical studies concluded that more informative scene regions receive more fixations (Buswell, 1935; Yarbus, 1967; Mackworth & Morandi, 1967; Antes, 1974; Loftus & Mackworth, 1978; Nodine, Carmody, & Kundel, 1978; Over, Hooge, Vlaskamp, & Erkelens, 2007) and that regions that receive more fixations are eventually identified (Nodine et al., 1978; they also found that Hits were preceded by examination-type, long-duration fixations, while Misses were preceded by survey-type, short-duration fixations; see also Over et al., 2007, who suggested that eye movements may follow a compulsory coarse-to-fine strategy). Are these extra fixations essential for recognizing targeted objects? Subjects in the experiments of Nodine et al. (1978) knew their target (the word Nina), so that some regions of the display were more likely than others to be concealing the target. In our display, a priori, each region is as likely as any other to be the target. The only characteristic that renders one card a target is the presence of another, identical card. This makes target detection quite complicated in our task, adding to task difficulty and trial duration. We do not compare local visual features in the scene, nor the semantics of the object (for example, consistent/inconsistent; Hollingworth, Williams & Henderson, 2001). Rather, we examine how the number of fixations on the target influences the process of recognition.

We ask whether conscious target detection comes before or after extended target fixation. Do arbitrary multiple observations of the pair cards lead to detection, or, on the contrary, does detection of the pair lead to more fixations on it? In other words, does the larger number of fixations give rise to detection or does unconscious pre-recognition guide the eyes to more fixations?

There is evidence of a mismatch between fixation and detection, that is, that target fixation is not always accompanied by explicit detection (Motter & Belky, 1998; Rutishauser & Koch, 2007; Sheinberg & Logothetis, 2001; Barlasov-Ioffe & Hochstein, 2008). Recognition often requires a “double-take” saccade, i.e., one or more fixations away from the target (Rutishauser & Koch, 2007), during which conscious recognition presumably occurs, leading the eyes back to the target.

We suggest that in the intervening period there is implicit perception (Mitroff, Simons, & Franconeri, 2002; Nodine, Carmody, & Kundel, 1978; Rensink, 2004), which guides the eyes and saccadic planning. We further suggest that the following stage of boosted target fixations brings this unconscious discovery to conscious awareness.

We proposed a 3-stage model of the perceptual recognition process during visual search (Jacob & Hochstein, 2009): Stage 1: An initial “search in the dark”, consisting of random fixations in arbitrary order; Stage 2: Implicit (unconscious) detection of the target, guiding further eye movements to the target location, i.e., boosting fixations on it; Stage 3: Explicit detection with conscious knowledge of target presence and its location, dependent on crucial fixations on the target.

The Identity Search Task, which we first introduced in Jacob & Hochstein (2009), is a spatial recognition task, in which subjects are instructed to detect two identical cards (Figure 1). The display contains computer screen “cards”, each with a square array of scrambled black and white square units. The task is to detect two exactly identical cards, regarded as the target. Two characteristics of the Identity Search Task are important for our current research. First, the identical card pairs do not pop out; rather, the recognition process requires several fixations on the target. Second, displays are divided into distinct search and eye-fixation regions (the different cards), allowing us to count fixations on the different regions. An enormous number of novel displays may be created, allowing us to repeat the task with a new search each time.

In our previous study we used twelve-card displays, each with two pairs of identical

cards, in order to compare the eventually detected target with the undetected one

– as we had done previously in a study of a more complex search task (Jacob &

Hochstein, 2008). We found that the cards of the pair that was ultimately detected were

observed more frequently than cards of the undetected pair. There were more fixations

and longer fixations on the ultimately detected pair, and the average sequential distance

between fixations on these card regions was smaller for the detected pairs. A bifurcation

point was observed along the dynamics of search, in which the to-be detected pair

overpowered the undetected one.

Figure 1. The Identity Search task. Twelve “cards” are displayed, each with a 4x4 scrambled array of black and white squares; the subject's task is to state whether they include an identical pair. Three examples are shown of displays with a target (shown here with a dark frame) with superimposed eye-movement records during search. For these three trials, the same subject had different numbers of target fixations before the display was turned off. The sequential number of each fixation is indicated in the circles. The subject responded “Yes” in all these examples, but reported different confidence levels. a. Pre-determined number of target fixations was 2, but incremented to 3 to obtain at least one fixation on each pair card. Confidence level: ‘Don’t know’. b. Pre-determined and actual number of target fixations was 5. Confidence level: ‘Maybe’. c. Pre-determined target fixations: 7; actual: 6, i.e. an early response was given. Confidence level: ‘Sure’.

In the main experiment of the current study there is only one identical pair, or none at

all, and the task is not to actively detect the identical pair, but to state whether such a pair

exists in the display at all. Eye movements and fixations were recorded in real time to

allow us to count the number of fixations on the pair cards, and to abort the display after

a certain number. In this way, we controlled not the time of the display, but the more

relevant parameter – the number of target fixations. We then analyze precision of

detection response and degree of participant response certitude as a function of the

number of target fixations achieved before the display is turned off. In this way we hope

to measure the contribution of multiple fixations in the process of target detection.

2. Methods

2.1. The task

The experiment included three stages with the first two serving as training for the

third, which included measurement of eye fixations and served as the central task of the

experiment. Each stage included 100 trials and lasted about 30 minutes, with the third

taking a bit more time to include calibration and drift correction of the eye-movement

monitor. For each Identity Search trial we presented a display of twelve “cards” and

subjects were instructed to find two identical cards. Two identical cards never appeared in

adjacent locations (i.e., one above the other or side by side), but they could appear

diagonally. The three stages were as follows:

1. Active detection of an identical pair in displays with exactly one identical pair,

giving subjects practice and a sense of such displays. Subjects marked cards with

mouse clicks. They could un-mark a card, as long as only one was chosen.

Marking two different cards was considered a mistake, and a message ‘Try again’

appeared at the top of the screen. Response Time (RT) was measured from the

appearance of the display until the click on the second correct card; subjects were

informed of this timing procedure and that speed was important. At the end of a

trial, a ‘Continue’ message appeared, and the next trial began when the subject

mouse-clicked it. Then the entire display was replaced for the following trial.

2. Target / No-Target training. In half of the trials there was an identical pair, and in

half there wasn’t. Fixations were not recorded and display times were randomized

to approximately match a pre-determined number of fixations for each trial (see

Section 2.5, Design of the main task). Subjects replied yes | no according to presence | absence of

a target, and they reported their confidence in this response, choosing from three

possible levels (don’t know | maybe | sure).

3. Target / No-Target + tracking eye-movements (the core of the experiment). This

stage was identical to the preceding one except that here we recorded eye-movements

and received real-time fixation data, allowing us to count target

fixations (when a target was present) and to stop the trial after a pre-determined

number of target fixations.

Nine subjects participated in the experiment. Two performed the 3rd experimental

stage twice, once with and once without reporting their confidence level (in either order).

This was added to determine whether confidence level reporting caused a delay or change

in initial yes | no response.

The dominant eye of each subject was determined before the experiment, using the

“hole-in-the-card” test (Durand & Gould, 1910; see review in Shneor & Hochstein,

2006). Fixations were analyzed according to the dominant eye.

2.2. Experimental routine for the central 3rd stage

At the beginning of each trial, subjects were prompted with a message, “Press ‘space’

when ready”. When the ready signal was given, we performed a drift-correction using the

center-of-screen point of the EyeLink built-in drift correction. Subjects were instructed to

fixate the dot carefully before and while pressing the ‘space’ key, and were told that

failing to do so could harm the experimental results. If the dot did not disappear after two

presses of the ‘space’ bar, meaning that the drift was unacceptably large (which rarely

occurred), we repeated the calibration, validation, and drift correction, and the

experiment continued from that point. After successful disappearance of the drift

correction point, the display was shown.

For trials with a target, the display was turned off after a pre-determined number of

fixations on it, and the mask was presented. As per prior instructions, subjects responded

by pressing the ‘g’ or ‘h’ key, for ‘no’ or ‘yes’, respectively. (Red and green stickers

were attached to these keys.) Subjects were instructed that response correctness was

primary and speed secondary. After responding ‘g’ or ‘h’, subjects were prompted with

the message, “Confidence: 1 – Don’t know; 2 – Maybe; 3 – Sure”, and were given as

much time as needed for replying. They were allowed an early yes | no response before

the display disappeared and they were then prompted for a confidence level reply. They

pressed the ‘space’ bar to begin the next trial.

2.3. Payment procedure

Subjects received a basic payment of NIS 50 for the whole session (~$13), plus, for

the third stage, a bonus of NIS 0.50 for each correct response (Hit or Correct Rejection)

above chance level and for each correct early response in excess of incorrect early

responses (net negative bonuses were disregarded).

Subjects were informed of this payment procedure, were given examples, and were

told that the optimal tactic would be to try to give a correct response, and only then,

to attempt to give it quickly. The objective of this payment procedure was to encourage

as early a response as possible, once subjects knew the answer, given that they couldn't

know when the display would disappear. In this way we obtained information regarding

the number of fixations required for explicit recognition.
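
The bonus rule can be illustrated with a short calculation. The following Python sketch is one reading of the procedure, not the code used in the experiment: the function name and its arguments are hypothetical, chance level is taken as 50 correct responses out of 100 yes/no trials, and we assume that each of the two bonus components is floored at zero separately.

```python
BONUS_PER_RESPONSE_NIS = 0.50  # bonus per counted response, from the text
CHANCE_LEVEL = 50              # assumed: expected correct responses in 100 yes/no trials


def stage3_bonus(n_correct: int, early_correct: int, early_incorrect: int) -> float:
    """Third-stage bonus in NIS under the assumed reading of the rule."""
    # Correct responses (Hits + Correct Rejections) above chance level.
    above_chance = max(0, n_correct - CHANCE_LEVEL)
    # Correct early responses in excess of incorrect early responses;
    # a net negative value is disregarded (treated as zero).
    early_net = max(0, early_correct - early_incorrect)
    return BONUS_PER_RESPONSE_NIS * (above_chance + early_net)


# Hypothetical subject: 67 correct responses, 11 correct and 4 incorrect early responses.
example_bonus = stage3_bonus(67, 11, 4)
```

For this hypothetical subject, 17 responses above chance and a net of 7 correct early responses give a bonus of NIS 12.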

2.4. Analysis of number of target fixations

We combined successive fixations on the same card region (see discussion in Jacob

& Hochstein, 2009), whether on target or non-target card regions, if: 1. the

distance between them was less than 2º (67 pixels), and 2. one of them lasted <130ms or

the two together lasted <330ms. More than two fixations could be combined if each pair

obeyed these conditions. The durations of the combined fixations were summed.

(Nevertheless, eye movement records in Figure 1 reflect uncombined fixations.)
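
The combination rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used here: the `Fixation` record and function names are hypothetical, "each pair" is read as each pair of successive raw fixations, and the combined fixation is assumed to keep the position of the first fixation in the run.

```python
from dataclasses import dataclass
from math import hypot

MAX_DISTANCE_PX = 67   # 2 degrees of visual angle, from the text
SHORT_FIXATION_MS = 130
SHORT_PAIR_MS = 330


@dataclass
class Fixation:
    x: float            # horizontal position in pixels
    y: float            # vertical position in pixels
    duration_ms: float
    card: int           # index of the card region containing the fixation


def should_combine(a: Fixation, b: Fixation) -> bool:
    """True if two successive fixations satisfy both combination conditions."""
    same_card = a.card == b.card
    close = hypot(a.x - b.x, a.y - b.y) < MAX_DISTANCE_PX
    brief = (min(a.duration_ms, b.duration_ms) < SHORT_FIXATION_MS
             or a.duration_ms + b.duration_ms < SHORT_PAIR_MS)
    return same_card and close and brief


def combine_fixations(fixations: list[Fixation]) -> list[Fixation]:
    """Merge runs of successive fixations, summing their durations."""
    combined: list[Fixation] = []
    previous_raw = None
    for fix in fixations:
        if previous_raw is not None and should_combine(previous_raw, fix):
            # Extend the current run: only the duration is accumulated.
            combined[-1].duration_ms += fix.duration_ms
        else:
            combined.append(Fixation(fix.x, fix.y, fix.duration_ms, fix.card))
        previous_raw = fix
    return combined
```

Comparing each raw fixation with its immediate predecessor, rather than with the already merged fixation, keeps the duration conditions tied to individual fixations, which is how we read "each pair obeyed these conditions".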

2.5. Design of the main task

In the target/no-target eye-movement task, an identical pair target appeared pseudo-

randomly in half of the trials. The number of target fixations was randomized in advance,

from the range of 2-7 and 10 fixations (as determined by the distribution of fixations on

detected pair cards in our earlier experiment; Jacob & Hochstein, 2009). When this number of

fixations was reached, the trial was aborted and a mask was presented. In displays with a

target pair, we required also that there be at least one fixation on each of the pair cards.

Otherwise, the trial was continued until the subject made a fixation on the other card.

Subjects were not aware that when we tracked eye movements we also aborted trials

following a certain number of fixations; they were told that termination time was random.
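
The termination rule for target-present trials can be sketched as follows. This Python example is only an illustration of the logic described above: the function name is hypothetical, and the stream of real-time fixation events is simplified to a list of fixated card indices.

```python
from typing import Optional, Sequence, Tuple


def abort_point(fixated_cards: Sequence[int],
                target_pair: Tuple[int, int],
                n_required: int) -> Optional[Tuple[int, int]]:
    """Return (fixation index, target fixation count) at which the display
    would be turned off, or None if the criterion is never met."""
    seen_targets = set()
    target_count = 0
    for index, card in enumerate(fixated_cards):
        if card in target_pair:
            target_count += 1
            seen_targets.add(card)
        # Stop once the pre-determined count is reached AND each card of the
        # pair has been fixated at least once.
        if target_count >= n_required and len(seen_targets) == 2:
            return index, target_count
    return None


# Hypothetical trial: cards 5 and 2 form the pair, 2 target fixations requested.
# Both early target fixations fall on card 5, so the trial continues until
# card 2 is fixated, ending with 3 target fixations instead of 2.
example = abort_point([3, 5, 5, 7, 2, 5], (5, 2), 2)
```

The example reproduces the kind of situation shown in Figure 1a, where the pre-determined number of target fixations was incremented so that each pair card received at least one fixation.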

Four considerations guided determination of the distribution of number of fixations:

1. Combinatorial calculations: For each number of fixations n, there are 2^n possible

divisions of the fixations between the two cards (counting also opposite

scenarios), but two of these have all fixations on the same card. To compensate

for these, we multiply the number of target fixations desired by 2^n/(2^n–2).

2. As mentioned, if the pre-determined number of fixations was reached with all of

them on one card, we waited for at least one fixation on the second card of the

pair. Thus, the number of target fixations was higher than intended (See example

in Figure 1a).

3. If an early response was given, then the number of target fixations was lower than

intended (See example in Figure 1c).

4. We combined successive fixations on the same card, as elaborated above. This led

to a reduction in actual number of target fixations.
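
The combinatorial factor in the first consideration is easy to tabulate. The Python sketch below (function name hypothetical) counts the 2^n ways of dividing n fixations between two cards, removes the two divisions in which all fixations land on the same card, and returns the resulting compensation factor.

```python
def same_card_compensation(n: int) -> float:
    """Factor 2^n / (2^n - 2) for n target fixations on a two-card pair."""
    if n < 2:
        raise ValueError("at least two fixations are needed to cover both cards")
    divisions = 2 ** n          # each fixation falls on one of the two cards
    both_cards = divisions - 2  # exclude the two all-on-one-card divisions
    return divisions / both_cards


# Tabulate the factor over the analyzed range of target fixations.
factors = {n: same_card_compensation(n) for n in range(2, 8)}
```

The factor is 2 for n = 2 and falls quickly toward 1 as n grows, so the compensation matters mainly for the smallest numbers of target fixations.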

Taking all of these factors into consideration led to the approximation used and

shown in Table 1 (pre-determined) together with the average actual resulting number – to

which we relate in all the following analyses. Due to the low number of trials with 10

target fixations, the results were analyzed only for the target fixation range of 2-7.

Table 1 – Distribution of average number of trials for each number of target fixations, when present, or equivalent duration when target absent.

| Present: # fixations (Absent: time ±1 sec) | 1 | 2 (4.1) | 3 (5.3) | 4 (6.5) | 5 (7.7) | 6 (8.9) | 7 (10.1) | 8 | 9 | 10 (13.7) | Sum |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Target absent: planned & actual # | 0 | 8 | 7 | 7 | 7 | 7 | 7 | 0 | 0 | 7 | 50 |
| Target present: pre-determined # | 0 | 14 | 5 | 6 | 6 | 6 | 7 | 0 | 0 | 6 | 50 |
| Target present: actual # | 0.4 | 7.7 | 9.1 | 8.9 | 7.9 | 6 | 5.8 | 1 | 0.8 | 2.3 | 50 |
| Target present: actual # (with combined fixations) | 0.8 | 9.6 | 10.3 | 10.1 | 7.3 | 5.9 | 3.3 | 1.1 | 0.9 | 0.7 | 50 |

For trials without a target, we obviously could not count target fixations; therefore,

each trial was assigned a duration (t), which was randomized according to the matching

number of fixations (n) and using a linear regression of the means (t = [1.2*n+1.67]sec)

calculated from a previous experiment. This was jittered randomly in the range of t±1s to

avoid biasing subjects. False Alarm rates were related to this equivalent target fixation

number.
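
The assignment of display durations to no-target trials can be illustrated as follows. This Python sketch uses the regression coefficients given in the text; the function names are hypothetical, and drawing the jitter from a uniform distribution is an assumption, since only its range (±1 s) is specified.

```python
import random

SLOPE_S_PER_FIXATION = 1.2  # regression slope, from the text
INTERCEPT_S = 1.67          # regression intercept, from the text
JITTER_S = 1.0              # half-width of the jitter range, from the text


def mean_duration_s(n_fixations: int) -> float:
    """Mean display duration matched to n target fixations: t = 1.2*n + 1.67."""
    return SLOPE_S_PER_FIXATION * n_fixations + INTERCEPT_S


def no_target_duration_s(n_fixations: int, rng=random) -> float:
    """Jittered duration for a no-target trial (uniform jitter is assumed)."""
    return mean_duration_s(n_fixations) + rng.uniform(-JITTER_S, JITTER_S)


# Mean durations for the matched fixation numbers used in the experiment.
mean_durations = {n: round(mean_duration_s(n), 1) for n in (2, 3, 4, 5, 6, 7, 10)}
```

Rounded to one decimal place, the mean durations agree with the equivalent times listed in the header of Table 1 (for example, 4.1 s for 2 fixations and 13.7 s for 10).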

When a display with a target resulted in no target fixations at all, or fixations on just

one of the cards, it follows that the subject gave an early response. This could happen in

two ways: 1. The subject did not respond correctly, i.e. responded ‘no’, when actually a

target was present. 2. The subject considered two other cards as a pair (otherwise, the

subject must have perceived the cards with peripheral vision, which is not very likely).

2.6. Implementation

The experiment was implemented using the GUI of Matlab 7.0.4. Cards were

represented as clickable buttons, uniformly distributed over 3 rows and 4 columns. Each card

occupied 85x85 pixels (~2.7°x2.7° of visual field) with a 92 (horizontal) and 86 (vertical) pixel

space between cards. The borders of the regions used for analysis were taken at half this distance,

including for peripheral cards (a radius of <2°; Anstis, 1974; Riggs, 1965). The mouse

pointer was moved off screen.

2.7. Equipment

Dominant eye fixations were recorded with an SR Research Ltd. (Ontario, Canada)

EyeLink I eye-tracker. Subjects sat constrained by a chin-rest, 80cm from a Samsung

SyncMaster 19" CRT monitor, with 4:3 format and screen resolution of 800x600 pixels

so that the foveal field of 2° occupies 2.8cm or 67 monitor pixels. The monitor was

surrounded by a black screen.
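
The relation between visual angle, viewing distance and screen extent can be checked with a short calculation. In the Python sketch below, the viewing distance comes from the text, the function names are hypothetical, and the pixel density is not measured independently but is assumed from the stated equivalence of 2.8cm and 67 pixels.

```python
from math import radians, tan

VIEWING_DISTANCE_CM = 80.0    # from the text
PIXELS_PER_CM = 67 / 2.8      # assumed from the stated 2.8cm = 67 pixels


def angle_to_cm(angle_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Screen extent (cm) subtended by a visual angle centered on the line of sight."""
    return 2.0 * distance_cm * tan(radians(angle_deg) / 2.0)


def angle_to_pixels(angle_deg: float) -> float:
    """Screen extent in pixels for the same visual angle."""
    return angle_to_cm(angle_deg) * PIXELS_PER_CM


# Extent of a 2-degree field at the 80cm viewing distance.
foveal_cm = angle_to_cm(2.0)
foveal_px = angle_to_pixels(2.0)
```

At 80cm, a 2° field comes out at approximately 2.8cm, in agreement with the value given above.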

We used the SR-supplied (binocular) 9-point calibration and validation grid,

repeating as necessary (effective radial resolution was 0.6°). Drift correction was

performed before each trial. The EyeLink was controlled by Matlab Psychophysics and

Eyelink Toolboxes (Brainard, 1997; Cornelissen, Peters & Palmer, 2002). Fixation and

saccade analyses were performed with the EyeLink program. ‘End-fixation’ events were

retrieved in real-time to count the number of target fixations.

3. Results

3.1. Performance

For 100-trial tests, the average total number of correct answers (Hits plus Correct

Rejections, CR) was 67 (range 61-74), where 50 was chance level. On average, subjects

responded early in 15 of 100 trials (range 4-46; after an average of 5 fixations), of which

11 (44-100%) were correct. Average bonus payment was NIS 12 (range NIS 6-19).

Table 2 shows the across-subject mean distribution of response types – Hits, Misses,

Correct Rejection (CR) and False Alarms (FA). For 50 target and 50 no-target displays

per subject, there were only 29 ‘yes’ answers, on average, and 71 ‘no’ replies, indicating

conservative strategies overall. This is to be expected since we stopped the trials – and

turned off the display – often at quite early stages of search. Subjects may well be saying

“No, I did not detect the target” rather than “No, I am sure there is no target.” This is to

our advantage, since we are interested in the process of detecting the target. The fact that

subjects were all quite conservative means that they answered "No Target" by default and

needed to be convinced (even if implicitly) that there was a target. We follow this process

of "being convinced" as it proceeds with display time and number of target fixations.

Table 2 – Distribution of response types; mean of all / conservative (Cons.) / daring subjects.

| Response | Target: Yes (All / Cons. / Daring) | Target: No (All / Cons. / Daring) | Total, subject control (All / Cons. / Daring) |
|---|---|---|---|
| Yes | Hits: 23 / 18 / 30 | FAs: 6 / 1 / 13 | 29 / 19 / 43 |
| No | Misses: 27 / 32 / 20 | CRs: 44 / 49 / 37 | 71 / 81 / 57 |
| Total (predetermined) | 50 | 50 | 100 trials/subject |

Still, some subjects were more conservative (Cons. in Table 2) than others (called

‘Daring’). We divided subjects into these two groups according to their number of 'yes'

answers: The 5 more conservative subjects gave 14-21 ‘yes’ answers, including 0-2 FAs;

the 4 more daring subjects gave 35-46 ‘yes’ answers, including 7-14 FAs. (Note that

‘daring’ subjects also have fewer than 50 ‘yes’ answers.) Interestingly, the two strategies

led to the same average number of correct responses (18 Hit + 49 CR vs. 30 Hit + 37

CR), yielding the same mean correct response bonus payment for the two groups.

3.2. Number of Target Fixations

The core of this research is to investigate the influence of the number of target

fixations on target recognition. We show the dependence on number of target fixations of

correct responses (Hits and CRs; Figure 2a), mean RT (Figure 2b) and Confidence level

(Figure 2c), as well as the effect on the resulting ROC curves (Figure 4).

Performance (Hit, Miss, FA and CR) is plotted in Figure 2a as a function of actual

number of target fixations (or equivalent for no-target displays; see Methods). Note the

trend for increasing Hit rate and decreasing FA rate with number of target fixations.

Going from 2-3 fixations to 6-7 fixations, there are more Hits (39% → 57%) and

more CRs (77% → 90%). The Hit rate increase was mainly in more conservative subjects, the FA

rate decrease mainly in more daring subjects (not shown).

Mean response time (RT) from display disappearance to yes|no response was

1.3±0.2s (between subject mean±SD) for both target-present and target-absent trials, but of

course excluding trials with an early response when the display was not extinguished by

the experimenter. Post-display-disappearance RT falls with the number of fixations, as

shown in Figure 2b, probably reflecting the increased information that subjects have

concerning target presence or absence.

It would seem that a fixation, which lasts ~200ms, ‘saves’ only 40ms in response time,

which might seem not ‘worthwhile’, but, actually, each fixation raises the amount of

information available, as seen in the performance and confidence results in Figures 2a,c;

RT serves only as an indicator of information, not as the goal in itself. There was no

difference in RT between conservative and daring subjects.

[Figure 2, panels a-c. Horizontal axis in all panels: number of target fixations (2-7). a. Performance (CR, Hit, Miss, FA). b. Mean RT (sec), with linear fit y = 1.42 − 0.04x, R² = 0.42. c. Fraction of responses at each confidence level (Sure, Maybe, Don’t know).]

Figure 2. Impact of target fixations. a. Performance (Hit-, Miss-, FA- and CR-rates) vs. number of target fixations; mean over all 9 subjects (900 trials). Error-bars are between subject SE. b. Mean RT (between subjects) after disappearance of display vs. number of target fixations, not including trials with early responses. Error bars represent SE between subjects. 9 subjects, 765 trials (excluding 135 early responses). c. Fraction of each Confidence Level as a function of number of target fixations (for 450 displays with a target). Note sharp increase in 'sure' rate at the expense of decreasing 'don't knows' (and constant 'maybe' responses) with increasing target fixations.

We now look at confidence level as a function of target fixations, demonstrated in

Figure 2c. Confidence level responses turned out not to be 100% reliable in the sense that

subjects were often "sure" but wrong. Nevertheless, there is still a consistent increase in

surety with number of fixations. Figure 2c shows the fraction of each of the three

confidence levels as a function of the number of target fixations for target present trials.

‘Maybe’ responses are constant at ~45% while 'Sure' responses rise from 15% to ~40% at

the expense of decreasing ‘Don’t know’ responses. Going from 2-3 target fixations

to 6-7 target fixations, the Surety index (see Table 3) increased (Hits surety 0.52→0.73;

CR surety 0.36→0.56; Miss surety 0.33→0.56; FA surety 0.32→0.36). There were

almost no ‘Sure’ responses following very few target fixations (in the CRs, Misses, and

FAs: 6-8%, for 2-3 target fixations; not shown). As the number of target fixations

increases, ‘Sure’ responses rise for Hits, and with further increase in the number of

fixations, ‘Sure’ responses rise also for CRs and Misses (not shown). We conclude that

with more target fixations, subjects become more confident of their answers.

[Figure 3, two panels: 2−3 fixations (left) and 6−7 fixations (right). Vertical axis: Performance. Bars for Hits, Misses, CRs and FAs, each subdivided by confidence level (Sure, Maybe, Don’t know).]

Figure 3. Histogram representing division of performance (Hits, Misses, CRs and FAs) into confidence levels. Performance in displays with a target (Hits, Misses) sum to 1, as does performance without a target (CRs, FAs). Shown is the progress in performance and surety from 2-3 target fixations (left) to 6-7 target fixations (right).

[Figure 4, panels a-d. a, b. Hit rate vs. FA rate, with one marker per number of target fixations (2-7) and reference curves labeled d’=0.5 and d’=1.5 (panel a) and d’=1, d’=1.5 and d’=2 (panel b). c. d-prime vs. number of target fixations. d. Criterion vs. number of target fixations. Panels c and d show Conservative and Daring subjects separately.]

Figure 4. a&b. Receiver Operating Characteristics for each number of target fixations (indicated by gray scale and marker) for a. Daring and b. Conservative subjects; mean over all trials. c. Detectability, d’ and d. Criterion vs. number of target fixations; (solid line: daring subjects, dashed line: conservative subjects).

In Figure 4a,b we show Receiver Operating Characteristics (ROC) curves (Green &

Swets, 1966) for more daring and more conservative subjects, respectively, for each

number of target fixations. We separated the ROC curves of the daring and conservative

subjects for obvious reasons – the conservative rarely replied ‘yes’, therefore their results

are grouped along the ordinate, while the others are spread out. Figure 4c,d shows

detectability d' and subject criterion, again as a function of target fixations. For

conservative subjects, there is a consistent increase in d’ with number of target fixations,

but only a slight decrease from the conservative adopted criterion. By definition, the

criterion for more conservative subjects is higher than for more daring subjects, and in

fact the more conservative subjects have almost no FAs (all points are near the ordinate).

Both groups become less conservative going from mid-level to high numbers of target

fixations. Interestingly, for daring subjects, d' increases mainly from lowest to mid-level

number of target fixations; there is initially an increase in d’ and then stabilization (on

d’=1.5). Detectability does not improve from 4 to 7 fixations, i.e. the information is

already in their systems, though they are only gradually becoming aware of it and they

are still improving in their criterion from quite conservative to optimal. At the same

time, subject criterion decreases toward optimal for higher numbers of target fixations.
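
For readers unfamiliar with these measures, the following Python sketch shows how detectability and criterion are commonly obtained from Hit and FA rates. It assumes the standard equal-variance Gaussian signal detection model (d' = z(Hit) − z(FA), criterion = −[z(Hit) + z(FA)]/2); the text does not state the exact formulas used here, and the function name is hypothetical. The example rates are the group means of Table 2, pooled over all numbers of target fixations, so the resulting values are illustrative and need not match the per-fixation values plotted in Figure 4.

```python
from statistics import NormalDist

# Inverse of the standard normal cumulative distribution (z-transform).
_z = NormalDist().inv_cdf


def dprime_and_criterion(hit_rate: float, fa_rate: float) -> tuple[float, float]:
    """Equal-variance Gaussian d' and criterion (assumed formulas).

    Rates must lie strictly between 0 and 1; inv_cdf raises otherwise.
    """
    z_hit = _z(hit_rate)
    z_fa = _z(fa_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion


# Group means from Table 2 (Hits and FAs out of 50 trials each).
conservative = dprime_and_criterion(18 / 50, 1 / 50)
daring = dprime_and_criterion(30 / 50, 13 / 50)
```

Under this model a higher criterion corresponds to a stronger tendency to answer 'no', which is the sense in which the conservative group's criterion exceeds that of the daring group.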

Figure 5 shows average d’ and criterion vs. confidence level, for different ranges of

target fixations. Detectability increases with confidence level, and only then criterion

decreases.

[Figure 5, two panels. Horizontal axis: Confidence level (Don’t know, Maybe, Sure). Vertical axes: d-prime and Criterion. Separate series for 2−3, 4−5 and 6−7 target fixations.]

Figure 5. d’ and Criterion vs. confidence level, for combined 2-3, 4-5 and 6-7 target fixations (indicated by gray scale and marker), averaged over all subjects.

3.3. Number of Fixations as Indicator of the Stage in the Process of Recognition

To learn about the effect of the number of fixations on the process of recognition, we

use confidence level as an indicator of the stage in this process. ‘Don’t know’ may reflect

a situation of ‘searching in the dark’, where the response is just a guess; when responding

‘Maybe’, subjects already have a vague idea (conscious or unconscious), but they are still

not sure – this is a situation of implicit recognition without confidence; ‘Sure’ is

equivalent to the stage of full perception. We show how the number of fixations dictates

the stage in the process of recognition in Figure 6.

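We plot, for each level of confidence, the normalized average number of target

fixations that led to a Hit. That is, for each confidence level (c), we average the number

of fixations (f) that led to a Hit, weighted by the relative number of Hits at that

confidence level, or

$$\bar{f}_c = \frac{\sum_f f \cdot \left( \mathrm{Hit}_{f,c} / \mathrm{Hit}_f \right)}{\sum_f \left( \mathrm{Hit}_{f,c} / \mathrm{Hit}_f \right)}$$

Here $\mathrm{Hit}_{f,c}$ denotes the Hits with $f$ target fixations reported at confidence level $c$, and $\mathrm{Hit}_f$ all Hits with $f$ target fixations. This is the same as calculating the center

of gravity of the surface below the plots in Figure 2c.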
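
The weighted average described above can be sketched in Python as follows. This is one reading of the calculation, in which each number of fixations is weighted by the fraction of Hits at that number that were reported at the given confidence level; the function name and the Hit counts in the example are hypothetical and are not data from this study.

```python
def normalized_weighted_fixations(hits: dict[int, dict[str, int]], level: str) -> float:
    """Center of gravity of fixation number for one confidence level.

    hits[f][c] is the number of Hits with f target fixations reported at
    confidence level c.
    """
    numerator = 0.0
    denominator = 0.0
    for n_fixations, by_level in hits.items():
        total_hits = sum(by_level.values())
        if total_hits == 0:
            continue  # no Hits at this fixation number, so no weight
        weight = by_level.get(level, 0) / total_hits
        numerator += n_fixations * weight
        denominator += weight
    if denominator == 0:
        raise ValueError(f"no Hits reported at confidence level {level!r}")
    return numerator / denominator


# Hypothetical Hit counts, for illustration only.
example_hits = {
    2: {"dont_know": 3, "maybe": 2, "sure": 0},
    4: {"dont_know": 1, "maybe": 2, "sure": 2},
    6: {"dont_know": 0, "maybe": 2, "sure": 4},
}
centers = {c: normalized_weighted_fixations(example_hits, c)
           for c in ("dont_know", "maybe", "sure")}
```

Normalizing by the number of Hits at each fixation number prevents fixation numbers with many trials from dominating the average.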

[Figure 6. Horizontal axis: Confidence Level (Don’t know, Maybe, Sure). Vertical axis: Normalized weighted target fixations (axis range 2.5-5.5).]

Figure 6. Normalized weighted average number of target fixations for trials with Hits for each confidence level. There is an increase in number of fixations needed to reach each level of confidence. Data for 5 subjects (3 daring; 2 conservative) who showed more reliability in confidence level responses, i.e. at least 80% Hits when declaring ‘Sure’; (reliability demonstrated in Figure 8c&d).

There is an increase in number of fixations from level to level of confidence. This

result is even more significant for those subjects whose confidence reports were more

reliable, and these are shown in Figure 6. After an average of 3 fixations they are still in

the stage of “searching in the dark”. After an average of 5 fixations, they already have

full perception of the target. Somewhere in between they are in the implicit recognition

stage. Subjects' report of a 'Maybe' confidence level seems to reflect a sense of the

presence of a target, without explicit knowledge. We infer that progress along the process

of recognition requires added fixations.

3.4. Confidence Level

Figure 7 shows the mean confidence level distribution separated for target and no-

target displays. Subjects responded ‘Don’t know’ in about 13 of 50 trials in either

case. The main shift was from the most common response, “maybe”, to the less frequent,

“sure”, for displays with a target, as might be expected since one can be sure of having

seen a target, but only lengthy systematic scanning can make one sure of target absence.

[Figure 7. Horizontal axis: Target vs. No-Target displays. Vertical axis: Occurrences (0-50). Bars divided into Sure, Maybe and Don’t know responses.]

Figure 7. Confidence level vs. target/no-target.

Conservative and daring subjects have similar confidence level distributions for target

present and absent trials. Recall that the terms daring and conservative do not relate to

declared confidence level, but to the frequency of ‘yes’ (target present) responses.

Though it may seem counter-intuitive, daring subjects responded 'Don't know' a bit more

frequently (31% vs. 23%). At first glance, one might expect that they would 'dare' also in

terms of confidence level. But the opposite is the case: Their daring is expressed in their

willingness to give a 'yes' response even when not sure, and even when aware of not

knowing. The extra 'Don't know' responses come on account of fewer responses of

'Maybe' (for displays with no-target) and 'Sure' (for target present displays).

The main difference between daring and conservative subjects is of course in their

yes and no answers, but this difference is not uniform across confidence levels, as

demonstrated in Figure 8. Daring subjects have on average more ‘Don’t know’ and fewer

'Sure' responses, and the frequency of yes responses is monotonically increasing for

conservative subjects and bell-shaped for daring subjects. Conservative subjects nearly

always say 'no' when they 'Don't know' and in 90% of the 'Maybe' cases. Daring subjects

are much closer to half yeses for all confidence levels.

[Figure 8, panels a-d. Horizontal axis in all panels: confidence level (Don’t know, Maybe, Sure). a. Daring subjects and b. Conservative subjects: Fraction of Yes and No responses. c, d. Performance (CR, Hit, Miss, FA).]

Figure 8. a&b. Positive and negative responses as a function of confidence level. c&d. Performance (Hit-, Miss-, FA- and CR-rates) vs. confidence level. Left: daring subjects; right: conservative subjects. Note rising number of 'Yes' responses with confidence level for more conservative subjects and rising correct responses for both daring and conservative subjects.

In Figure 8c,d we plot performance (Hit-, Miss-, FA- and CR-rate) as a function of

confidence level for daring and conservative subjects, respectively. As mentioned,

conservative subjects had lower Hit-rates, and almost no false-alarms. For daring

subjects, the Hit-rate increases with confidence level, and the FA-rate decreases – as

shown in the ROC curve and the increasing d' presented in Figures 4&5.

RT also decreases as confidence level increases, as demonstrated in Figure 9. Early

responses were excluded; data are for 7 subjects because 2 did not give any ‘don’t know’

or any non-early ‘Sure’ responses. The clear RT decrease confirms the relevance of

subjects’ self-reported confidence level.

[Figure 9. Horizontal axis: Confidence level (Don’t know, Maybe, Sure). Vertical axis: Mean RT (sec).]

Figure 9. Mean RT (between subjects) after display disappearance vs. confidence level.

To find whether the need to declare confidence level influenced the yes | no responses

themselves or their RTs, we tested two subjects (one of them turned out to be more

conservative, and the other more daring) twice, once with and once without reporting

their confidence level (counter-balancing the order of the two runs). We found no change

in their pattern of responses, consistent with their own strategy, (number of yes | no

responses; number of Hits, Misses, FA and CRs), but there was a major speeding of their

RTs from 1.5s with confidence level reporting to 1.0s without. This was true for both

subjects and irrespective of order, suggesting it was not a learning effect. (Again, there

was no difference between target- and no-target displays). Perhaps the need to declare

one’s confidence level caused hesitation and delay in giving the ‘yes’ | ‘no’ response, but

did not affect the content of that response. There was also no influence on number of

early responses.

Individual subject confidence level distributions are shown in Table 3. The order of

the rows is according to weighted average Surety (with weights: Don't know: 0; Maybe:

0.5; Sure: 1). It would seem that different strategies for declaring confidence level were

adopted by different subjects. Note again that confidence level is not directly linked to

criterion, i.e. to degree of conservatism of subject.
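
The weighted average Surety index is a simple calculation, sketched below in Python. The weights are those given in the text; the function name is hypothetical, and the two example calls use the confidence distributions of subjects Y.S. and D.K. from Table 3.

```python
# Weights from the text: Don't know = 0, Maybe = 0.5, Sure = 1.
SURETY_WEIGHTS = {"dont_know": 0.0, "maybe": 0.5, "sure": 1.0}


def surety_index(dont_know: float, maybe: float, sure: float) -> float:
    """Weighted average confidence, from 0 (all 'Don't know') to 1 (all 'Sure')."""
    total = dont_know + maybe + sure
    if total == 0:
        raise ValueError("no confidence responses given")
    weighted = (SURETY_WEIGHTS["dont_know"] * dont_know
                + SURETY_WEIGHTS["maybe"] * maybe
                + SURETY_WEIGHTS["sure"] * sure)
    return weighted / total


# Confidence distributions (Don't know, Maybe, Sure) taken from Table 3.
surety_ys = surety_index(50, 46, 4)   # subject Y.S.
surety_dk = surety_index(2, 35, 63)   # subject D.K.
```

Rounded to two decimal places, these calls give 0.27 and 0.81, the lowest and highest Surety values in Table 3.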

Table 3 – Distribution of confidence level across subjects.

| Subject | Don’t know | Maybe | Sure | ‘Yes’ Responses | Hits | Correct | Weighted average – Surety |
|---|---|---|---|---|---|---|---|
| Y.S. | 50 | 46 | 4 | Conservative | 13 | 63 | 0.27 |
| O.H. | 52 | 38 | 10 | Daring | 30 | 64 | 0.29 |
| T.G. | 55 | 28 | 17 | Conservative | 17 | 67 | 0.31 |
| A.A. | 38 | 31 | 31 | Daring | 25 | 61 | 0.47 |
| J.H. | 12 | 78 | 10 | Daring | 32 | 69 | 0.49 |
| N.L. | 8 | 67 | 25 | Conservative | 20 | 70 | 0.59 |
| E.T. | 20 | 40 | 40 | Daring | 31 | 74 | 0.60 |
| I.G. | 0 | 70 | 30 | Conservative | 16 | 64 | 0.65 |
| D.K. | 2 | 35 | 63 | Conservative | 20 | 69 | 0.81 |

Table is sorted by weighted average surety index. Interleaving of conservative and daring subjects suggests there is no correlation between conservatism and surety.
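
The Surety index in the last column can be reproduced from the three confidence values using the weights given in the text (Don't know: 0; Maybe: 0.5; Sure: 1). The following is a minimal Python sketch of that arithmetic only; the function name `surety` and the dictionary layout are illustrative choices and not part of the original analysis.

```python
# Weights as stated in the text: Don't know = 0, Maybe = 0.5, Sure = 1.
WEIGHTS = (0.0, 0.5, 1.0)

def surety(dont_know, maybe, sure):
    """Weighted average confidence from the three confidence-level values."""
    values = (dont_know, maybe, sure)
    total = sum(values)
    return sum(w * v for w, v in zip(WEIGHTS, values)) / total

# Three rows copied from Table 3 (Don't know, Maybe, Sure).
rows = {"Y.S.": (50, 46, 4), "E.T.": (20, 40, 40), "D.K.": (2, 35, 63)}
for subject, values in rows.items():
    print(subject, surety(*values))
```

Dividing by the row total means the same function works whether the inputs are percentages or raw response counts.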

There is no apparent correlation between Conservative-Daring (in giving ‘yes’

responses) and the ‘daringness’ in declaration of confidence in the response. One might

expect that a daring subject would also be more confident in his or her response, but it

turned out not to be so. For example, one of the conservative subjects is the one who gave

the highest number of ‘Sure’ responses. In almost every confidence distribution pattern

there are subjects of both types. But it should be noted that two of the daring subjects

were the ones who gave approximately equally distributed confidence responses. Daring

subjects, by definition, indeed gave more ‘yes’ answers, but apparently they were aware

that they were ‘gambling’, and declared they were not sure. The confidence level

declaration allowed subjects to respond ‘yes’ even if they were not sure, or even did not

know at all.

3.5. Impact of non-target fixations

Is there a relationship between detection and the fraction of fixations that are on the

target? Figure 10 shows the Hit rate as a function of the total number of fixations for a

fixed number of target fixations. That is, as the total number increases, the fraction on the

target decreases. Clearly, the Hit rate decreases with decreasing target fixation fraction –

even though the number of target fixations is fixed. We conclude that target fixations

need to be closer together and/or not disturbed by distracting fixations on non-target

cards. The main effect of many display fixations is not a contribution to familiarity; it is

distraction from the target.

[Figure 10 plot: Hit rate (0 to 1) as a function of total fixations (0 to 50), for 5 target fixations.]

Figure 10. Hit-rate as a function of the total number of fixations on the display. Shown is a representative example with 5 target fixations; histograms for other numbers of target fixations are similar.

4. Discussion

Our aim was to understand better different stages in the process of detecting and

recognizing a target and to determine the effect of the number of target fixations on the

transition from stage to stage. To this end, we exposed subjects to controlled numbers of

target fixations – by stopping the display at different points – and inferred subject

recognition stage from their performance and confidence levels.

The target, if present, comprised two identical cards in the Identity Search Task.

There were two responses: a timed yes|no response relating to target presence|absence

and a report of the subject's level of confidence in that response, as Don’t-know | Maybe |

Sure.

Evidence for the influence of the number of target fixations comes from a number of

our experimental findings, as follows:

1- Faster response – shorter RT – with increasing target fixations, implying that an

increase in target fixations leads to progress in the recognition process (Figure 2b).

2- Improved performance. Hit-rate increases as a function of the number of target

fixations (Figure 2a). There is an increase in d’ and a decrease in the criterion towards

optimality (Figure 4). This is consistent with earlier research, which revealed that when

two targets are present, the one that is detected is that with more fixations on it (Jacob &

Hochstein, 2009). Subjects were not very good at recognizing the target after very few

fixations on it.

3- More confident responses. As the number of target fixations increased, subjects

became more confident about their responses (Figure 2c).
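
The sensitivity (d’) and criterion measures referred to in item 2 are, under the usual equal-variance Gaussian signal detection model, computed from the Hit and False Alarm rates. The sketch below assumes those standard definitions, which this chapter does not restate; the function names and the example rates are hypothetical and are not data from the experiment.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF

def d_prime(hit_rate, fa_rate):
    """Sensitivity: z(Hit rate) - z(False Alarm rate)."""
    return _z(hit_rate) - _z(fa_rate)

def criterion(hit_rate, fa_rate):
    """Criterion c: positive for a conservative observer, negative for a liberal one."""
    return -0.5 * (_z(hit_rate) + _z(fa_rate))

# Hypothetical rates, strictly between 0 and 1 (inv_cdf is undefined at 0 and 1).
print(d_prime(0.30, 0.05), criterion(0.30, 0.05))
```

With these definitions a subject with few Hits and very few FAs yields a positive criterion, which matches the informal sense of ‘conservative’ used here.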

There were two types of subjects – more conservative, with very few ‘yes’ responses,

and less conservative, called ‘daring’, who gave more ‘yes’ responses (though still less

than 50%). Conservative subjects had very few FAs, but also few Hits (i.e., a lot of

Misses). Both strategies yielded the same average rate of correct answers – Hits plus

Correct Rejections – pointing to a tradeoff between those two response types.

Misses may occur despite many target fixations when target fixations are interspersed

with many non-target fixations, which may be both distracting (that is, cause a larger

sequential distance between target fixations), and give the sense that if the target has not

been spotted by now, it might not be present. Thus, there is an inverse influence of total

number of fixations on the entire display on detection (Results section 3.5 and Figure 10).

This rules out the option that the described effects are due just to search time, and not to

number of target fixations. Performance improves with more target fixations, but for a

fixed number of target fixations, performance diminishes with more fixations on the

entire display in total, that is, with increased search time.

We are more interested, obviously, in displays which included a target, because we

are trying to put our finger on what leads to detection. At the ‘Sure’ level we are

interested in how many fixations led to this firm response. We also tried to catch the

weaker sense of knowledge, perhaps without awareness, which was latent in the ‘Maybe’

response – and the number of fixations that led to this hunch. Finally, if the subject felt

that he or she had no clue as to presence or absence of the target – the ‘Don’t know’

level, but replied above chance level, their response was still affected by the number of

target fixations.

To learn the effect of the number of fixations on the process of recognition, we use

confidence level as an indicator of the stage in the process. The ‘Don’t know’ level is

regarded as equivalent to a ‘search in the dark’, the ‘Maybe’ level to the stage of implicit

recognition, and the ‘Sure’ – to the stage of full perception (explicit recognition). There is

an increase in number of fixations from level to level of confidence.

Clearly, the number of target fixations necessary for recognition depends also on the

type of target itself. In our experiment, because the target consisted of two separate cards,

at least one (or even two) fixations on each card were necessary for recognition, hence

the poor results for total of only two and three fixations.

Returning to the question of what comes first – more fixations or recognition – we

will now discuss the two scenarios. If only a large number of fixations led to recognition, there wouldn’t

be a rise in the Hit-rate after fewer (but more than two) fixations. So we can conclude that

even a few fixations lead to some recognition. This partial recognition, reflected in the

Hit-rate which is above chance level but not very high, is then followed by more

fixations, until reaching complete recognition. Uniting the two, we can therefore

conclude that a few fixations lead to an early recognition state, which in turn leads to

more fixations, leading eventually to full explicit recognition.

Evidence for the 3-stage process comes from: 1. Different levels of confidence

(‘Don’t know’, ‘Maybe’ and ‘Sure’), each accompanied by different performance, i.e.

Hit-rate (Figure 8c&d); 2. Change in d’, from ‘Don’t know’ to ‘Maybe’, and only then a

change in criterion from ‘Maybe’ to ‘Sure’ (at 4-5 fixations; Figure 5); 3. Decrease in RT

when moving from one confidence level to another (Figure 9).

A remark regarding the Aha! experience in detection (Insight: Ahissar & Hochstein,

1997; Bowden & Jung-Beeman, 2007; Maier, 1931; Rubin, Nakayama & Shapley, 1997;

Smith & Kounios, 1996; Sternberg & Davidson, 1995): It might seem that if the process

of recognition is gradual, as the title of this chapter suggests, then there is no momentary

experience of discovery. This is not true, because the process of recognition includes also

the stage of implicit recognition, in which there is no awareness of the discovery (See

Ahissar & Hochstein, 1997, 2004; Hochstein & Ahissar, 2002). Therefore, even though a

long process is taking place, transfer from unconscious to conscious recognition can

emerge momentarily.

It has recently been claimed that there is no complete representation of the visual

scene built up and remembered across visual fixations, but that instead memory depends

on return fixations to recall what is in previously visited sites (O’Regan, 1992).

Similar claims were based on phenomena such as change blindness – not

noticing that details are changed between fixations (e.g. Rensink, 2000; see also

Hollingworth et al., 2001) or slow (serial) visual search being independent of fixed

element position (Horowitz & Wolfe, 1998). Hollingworth et al. (2001) indeed tested if

change blindness would prevail following single fixations on the changing region, once

before and once following the change. They found that in only 25% of the cases did this

suffice for change detection. However, the conclusion that there is no inter-saccade

memory would not follow if information were gathered gradually over a number of

fixations to a scene region – as we now find. Thus, it would be of interest to repeat

Hollingworth et al.'s (2001) experiment, but testing change detection following multiple

fixations on the changing region. We would predict that detection will rise with the

number of fixations to the region before and after the change – and with the temporal

proximity of such fixations.

We suggest that it may be more economical for the visual system to scan the visual

scene and gather partial information from each sampled region, instead of expending all

its resources on one location to gather full information from one site at a time. Attention

can be spread uniformly at first, gathering minimal information in minimal time, and only

then repeatedly to already observed regions, (perhaps emphasizing more important

locations), to accumulate information gradually. The advantage of the evolutionary

development of this scanning method comes from creation of parallel partial

representations for all locations, rather than full knowledge about one location at the

expense of having none about the others.

We conclude that graded recognition derives from graded information gathered

fixation after fixation on the same scene region(s).

5. Conclusions

We exposed the subjects to controlled stages along the recognition process by varying

the number of target fixations. We found that with increasing numbers of target fixations

there is a decrease in RT, an increase in Hit-rate and in detectability (d’), and an increase

in confidence level. Taken together, these results imply improved recognition with more

target fixations. That is, an increase in number of target fixations leads to progress along

the recognition process, and thus to faster, more accurate and more confident reactions.

In order to learn about the effect of the number of fixations in the process of

recognition, we used subject confidence level as an indicator of the stage in the process.

We found an increase in the number of target fixations when moving from level to level

of confidence.

In addition, these findings add support to the 3-stage theory of the perceptual

recognition process (Jacob & Hochstein, 2009): Stage 1: A “search in the dark”, reflected

by the ‘Don’t know’ response; Stage 2: Implicit (unconscious) recognition of the target,

reflected by the ‘Maybe’ response; Stage 3: Explicit detection with conscious knowledge

of target presence and its location. We found evidence that these stages are separated

from each other by the number of target fixations.

Analysis of the Hit-rates as a function of number of target fixations, led us to the

conclusion that a few fixations lead to the stage of early, implicit, recognition in the

process of perception, which in turn leads to more fixations, perhaps controlled and

guided to the relevant sensed location of the target cards, leading eventually to full

explicit recognition.

These findings support the conclusion that gathering of information over a specific

region of the scene results from a growing number of fixations on that particular region.

This leads to the conclusion that several fixations on a scene location are necessary for

achieving recognition.

Acknowledgments

This study was supported by grants from the Israel Science Foundation (ISF) and the US-

Israel Binational Science Foundation (BSF). We thank the Charles E. Smith Family and

Prof. Joel Elkes Laboratory for Collaborative Research in Psychobiology for use of its

eye-movement monitoring equipment without which this study could not have been

accomplished. Special thanks to Reut Avinun for her contribution to the experiment. We

thank SR Research Ltd. for their advice and support in programming.

Chapter IV

A Model for Visual Information

A Model for Visual Information

Abstract

We construct a mathematical model indicating the available information for each

spatial unit in the visual scene, at any given moment, dependent on previous fixations and

eye movement scanpath. The visual scene is therefore divided into discrete distinct units

(regions). Two processes affect the amount of information available from each region at

each point in time: an incremental component and a memory decay component. The

decay in information is affected by the sequential distance between fixations on each unit.

The increment is due to the number of fixations on the specific region. This available-

information measure might predict the probability of perceiving a change in stimulation

at a corresponding visual unit, or its inverse, the probability of change blindness.

1. Introduction

Do we have a memory trace of specific scene regions that we fixate? According to

Feldman (1985), a traditional assumption is that the visual system constructs a global

sensory image of the external world, for example by integrating sensory information over

multiple eye fixations. Another point of view is that repeated fixations serve as a pointer

to refresh our memory (Ballard, Hayhoe, Pook & Rao, 1997), that is, they are essential

for retrieving memories, which otherwise would not be available for use. This view

suggests a tradeoff between working memory load and the number of required fixations.

A completely opposite alternative to the traditional assumption of a global sensory

image is that the world serves as an external memory (O’Regan, 1992), required since our

own memory is very limited (Horowitz & Wolfe, 1998) and the “visual representation of

a scene is both local and transient, limited almost exclusively to the currently attended

object” (O’Regan, 1992; O’Regan, Rensink & Clark, 1999; Rensink, 2000a, 2000b;

Rensink, O’Regan & Clark, 1997; Simons & Levin, 1997; Wolfe, 1999).

A related proposal by Irwin (1992) is that while the representation of an object may

be retained in Visual Short Term Memory (VSTM) after withdrawal of attention from the

object, the representation is rapidly replaced. VSTM is characterized as a limited-

capacity, long-lasting store, which is non-maskable and not tied to spatial position (Irwin,

1991; see Atkinson & Shiffrin, 1968, for the “multi-store model”). Therefore, due to

limitations of VSTM, there will be little or no accumulation and integration of visual

information from previously attended regions (Irwin, 1992; Irwin & Andrews, 1996;

Irwin, Yantis & Jonides, 1983; O’Regan & Levi-Schoen, 1983; see Hollingworth,

Williams & Henderson, 2001, for review).

Additional support for this approach comes supposedly from change blindness – the

failure of observers to notice large changes in the visual scene (Rensink, O’Regan &

Clark, 1997; Simons & Levin, 1997, 1998). Studies of eye-movements in conditions of

change blindness found that even when the changing region has been fixated in both its

states, detection rate is only 25% (Hollingworth et al, 2001). This raises the issue of

whether several fixations per location are required to form a full and stable

representation. However, change detection experiments were not used widely to study the

effect of the information gathered from the fixations.

We discuss two opposing theories concerning memory resulting from fixations, and

suggest a combination of the two. One theory, suggested by Rensink (2000a; see also

Hollingworth et al, 2001, for review) is termed Coherence Theory. It states that, when

attention is withdrawn from an object, the visual representation of that object decays

immediately from VSTM, and is overwritten by new visual input. We only have an

impression of scene continuity. Because visual representations of local objects do not

persist after the withdrawal of attention, the visual system is unable to accumulate

information from previously attended regions. Therefore, the visual system does not rely

on memory. Instead, objects can be sampled when necessary.

An alternative – the visual memory theory – suggests that “As a consequence of being

attended, the higher level visual representation of an object is consolidated into a more

stable long-term memory representation. Thus, over multiple fixations on a scene, visual

information from local objects accumulates in memory, forming a large scale

representation of that scene” (Hollingworth et al, 2001).

Note that visual memory theory refers to the entire visual scene, not to information

available about each specific scene region, which may differ from region to region. The

accumulation in Hollingworth et al’s description refers to the gathering of information

about the whole scene, constructed from the contribution of all local objects, thus

integrating visual information from several local objects, and forming a large scale

representation of the scene.

To summarize, coherence theory (Rensink, 2000a) states that “visual object

representations disintegrate immediately upon withdrawal of attention”, while visual

memory theory (Hollingworth et al, 2001) claims that “visual representations accumulate

to form a relatively detailed representation of a scene”.

One challenge to coherence theory is the question of what happens the second, third,

and so on, time that the same object is viewed. Does the processing of visual information

really start all over again, or is there an amount of information already “there”? Some

memory decay surely takes place – and we have found that target recognition is

influenced not only by the number of target fixations, but also by the temporal distance

between target fixations (Jacob & Hochstein, 2009, Chapter II). Thus we accept this

aspect of coherence theory.

According to the coherence theory view, however, there would be no retained

information at all. This is certainly counter-intuitive, but it would also imply that a single

fixation would suffice for retrieving all the required information about a certain region.

We showed that this is apparently not the case, when examining how the number of

target fixations influences recognition (Chapter III). Moreover, classical studies

concluded that more informative scene regions receive more fixations (Antes, 1974;

Buswell, 1935; Loftus & Mackworth, 1978; Mackworth & Morandi, 1967; Yarbus, 1967)

and that regions that receive more fixations are eventually identified (Nodine, Carmody,

& Kundel, 1978), suggesting that regional information is not an all-or-none effect (all:

following a fixation; none: before or soon thereafter). We therefore also partly accept

visual memory theory, emphasizing the importance of several fixations. We suggest an

extension of this theory related to gathering information over fixations at each scene

region.

Here we suggest a theoretical model that describes mathematically the combination of

these two theories (coherence theory and visual memory theory), with additional regard

to local scene regions. We account for available information decay as deriving from

short-term memory decay, together with the strengthening of a long-term memory

representation, dependent on the number of target fixations. Our model simulates the

available information for each unit in the visual scene.

2. Methods

2.1. The Model

The available information, calculated for each region of the scene, depends on

previous fixations and the eye movement scan-path. For the sake of quantification, the

visual scene is divided into discrete distinct units or regions. The model indicates for each

region, the amount of information available about it at any given moment (at time t). Two

processes affect this available information: a memory decay component, which is due to

the limitations of VSTM, and in our model is affected by the sequential distance from the

last fixation on the unit, and an incremental component affected by the number of

fixations on the specific region.

The incremental component is composed of a sigmoid function, $I = \frac{1}{1 + e^{-a(x_i - c)}}$,

where a represents the slope of the sigmoid, c, its midpoint (the $x_i$ value for which the

Information, I=0.5), and $x_i$ is the number of fixations on the specific unit i; (on a target:

Chapter II, Jacob & Hochstein, 2009; Chapter III; or on a token: Hollingworth et al,

2001). Both parameters, a and c, can vary, and result in different graded information

levels as a function of the number of fixations (Figure 1a). An alternative to this sigmoid

function is the “nucleation and growth kinetics” function (Avrami, 1939, 1940, 1941;

Erofe'ev, 1946; Johnson & Mehl, 1939), $I = 1 - e^{-k' \cdot x^{n}}$, where n gives the dimensionality

of growth and indicates the rate of nucleation changes, and k’ is proportional to the initial

number of nucleation sites and the degree (power) of linear velocity of growth (Figure

1a).

The memory decay component is defined by the exponential function $e^{-k s_i}$ (Figure

1b), where k represents the decay rate (the smaller k is, the slower the memory decay),

and $s_i$ represents the sequential distance from the previous fixation on unit i (see Jacob &

Hochstein, 2009, Chapter II).
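
The three component functions described above can be written down directly. The following Python sketch is illustrative only (the original implementation was in MatLab); the function names are assumptions, and the default parameter values are those listed in the Figure 1 legends.

```python
import math

def sigmoid_increment(x, a=1.0, c=2.0):
    """Incremental information after x fixations: 1 / (1 + exp(-a * (x - c)))."""
    return 1.0 / (1.0 + math.exp(-a * (x - c)))

def nucleation_increment(x, k_prime=0.2, n=1.7):
    """Alternative 'nucleation and growth' form: 1 - exp(-k' * x**n)."""
    return 1.0 - math.exp(-k_prime * x ** n)

def decay_factor(s, k=0.02):
    """Fraction of information remaining at sequential distance s: exp(-k * s)."""
    return math.exp(-k * s)

# Tabulate both incremental forms for 0..6 fixations, then one decay value.
for x in range(0, 7):
    print(x, round(sigmoid_increment(x), 3), round(nucleation_increment(x), 3))
print(round(decay_factor(30), 3))
```

By construction the sigmoid returns exactly 0.5 at x = c, which is the midpoint property stated in the text.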

Combining these components gives the full model: Upon fixation on a unit,

information is increased according to a sigmoidal function. Then, while other units are

observed, decay occurs according to the exponential function. When another fixation is

made on the same unit, information is increased by incrementing the current

“equivalent number of fixations” on the unit (corresponding to the current information;

see Figure 2c for details). The resulting available information for each unit is expressed

by a value ranging from 0 to 1. For each model iteration, a randomly selected unit is

fixated and the available information concerning this unit is increased and concerning the

others, decayed.
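
The update rule just described can be sketched in a few lines of Python. This is not the original MatLab implementation, and several details are assumptions made for the sake of a runnable example: a unit with no information is treated as having zero equivalent fixations, the equivalent number of fixations is clamped at zero, decay is applied as a multiplication by exp(-k) for every fixation made elsewhere, and the random seed is arbitrary. All function and parameter names are illustrative.

```python
import math
import random

def sigmoid(x, a=1.0, c=2.0):
    """Information corresponding to x (equivalent) fixations."""
    return 1.0 / (1.0 + math.exp(-a * (x - c)))

def inverse_sigmoid(info, a=1.0, c=2.0):
    """Equivalent number of fixations for an information value in (0, 1)."""
    return c - math.log(1.0 / info - 1.0) / a

def simulate(n_units=30, n_fixations=150, a=1.0, c=2.0, k=0.02, seed=0):
    """Return the available information per unit after n_fixations random fixations."""
    rng = random.Random(seed)
    info = [0.0] * n_units
    per_step_decay = math.exp(-k)  # one unit of sequential distance
    for _ in range(n_fixations):
        fixated = rng.randrange(n_units)  # uniform choice of the fixated unit
        # Every other unit decays while this unit is being observed.
        for u in range(n_units):
            if u != fixated:
                info[u] *= per_step_decay
        # Fixated unit: map current information to equivalent fixations, add one.
        current = min(info[fixated], 1.0 - 1e-12)  # guard the logarithm
        if current <= 0.0:
            x_eq = 0.0  # assumption: no information means zero equivalent fixations
        else:
            x_eq = max(0.0, inverse_sigmoid(current, a, c))  # assumption: clamp at 0
        info[fixated] = sigmoid(x_eq + 1.0, a, c)
    return info

snapshot = simulate()
print(sum(snapshot) / len(snapshot))
```

Because fixations are drawn at random, the average printed here depends on the seed and on the assumptions above, so it should not be expected to match the averages reported for the original simulations exactly.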

[Figure 1 plots. a: Incremental Information (0 to 1) vs. Fixations on Target (0 to 8); curves: 1/(1+exp(−a*(x−c))) with a=1.5, c=1.5 and with a=1, c=2; 1−exp(−k*x^n) with k=0.2, n=1.7 and with k=0.1, n=2. b: Decaying Information (0 to 1) vs. Sequential Distance (0 to 100); curves: exp(−0.02*x) and exp(−0.01*x).]

Figure 1. a. Sigmoid functions (red) expressing the incremental component, dependent on the number of regional fixations. (Note that immediately following a first fixation to a region, ~25% of the total information from this region becomes available; compare observed detection-rate in Henderson et al., 2001.) Also shown are nucleation and growth functions (brown); See Methods. b. Exponential decay functions expressing the memory decay component as a function of the sequential distance; See Methods.

There are several model variables: total number of fixations (N) on the scene (taken

usually as 150); number of units in the scene (taken as 5x6 or 4x5 units). Model

parameters can be varied until a good fit is found for the available information as

evidenced by performance in a particular task. The model was implemented using

MATLAB.

3. Results

We demonstrate model simulations with a variety of parameters. In Figure 2a we

show an example of the available information for each unit, at the conclusion of N=150

iterations (fixations) of the described dynamics. Figure 2b shows the maximum

information for each unit, during the entire dynamics of the same simulation. Figure 2e

demonstrates the distribution of these 150 fixations, i.e. number of fixations on each unit.

Figure 2f shows the mean sequential distance (time) between fixations, for each unit.

Note that dark colors in Figure 2e represent fewer fixations (and therefore less

information gathered), but smaller sequential distances in Figure 2f (and therefore more

remaining information). As demonstrated by the simulation dynamics (not shown), the

image changes gradually from no information (bluish in Figure 2a) to substantial

information (reddish). Nevertheless, the model does not reach full information for any

scene region.

To demonstrate the increase and decay of information according to the model, Figure

2c presents the dynamics of available information for the most observed unit. Figure 2d

shows the accumulation and decay of information at all 30 units, as well as the average

available information. Note that decreasing the number of fixations in the simulation

would result in a decrease of available information. However, an increase in the number

of fixations for the current chosen parameters would not result in an increase of available

information for the entire scene, because, as observed in Figure 2d, the mean information

had already reached saturation. Saturation is reached because the available information

for each unit decreases rapidly, and more fixations contribute to the fixated units, but at

the expense of other units.

[Figure 2 panels. a: Information (color scale 0 to 1). b: Max Information. c: Information (0 to 1) vs. Equivalent Fixations (0 to 7), with annotations (1) End of decay, (2) Equivalent fixations, (3) Increment by 1, (4) New information, and End of simulation. d: Information (0 to 1) vs. Total Fixations (0 to 150). e: Fixations (color scale 0 to 9). f: Sequential Distance (color scale 10 to 70).]

Figure 2. Model dynamics and snapshot after N=150 iterations (fixations); a=1; c=2; k=0.02; units=30; (see Methods). a. Available information for each visual scene unit after N fixations; average information across units = 0.34. b. Maximum information reached for each unit during the simulation dynamics; average = 0.62. c. Dynamics of available information for the most observed unit (with 9 fixations; marked with white border in a). When a fixation falls on this unit (red dots), information is raised to the corresponding point in the sigmoid function. While fixations to other regions are made (blue dots), information for this unit decays. Upon a new fixation to this unit, calculation of the updated information is as follows (arrows): (1) End of current decay – a new fixation has occurred. (2) Equivalent fixations of current information value is calculated from the inverse sigmoid. (3) Equivalent fixation number is incremented by 1. (4) Corresponding information value to new number of equivalent fixations is derived from sigmoid. Then decay starts again, until next fixation on this unit. d. Dynamics of information over all units. Most observed unit is indicated by thick black line; red line indicates average information over all units. e. Number of fixations on each unit. f. Mean sequential distances for each unit.

To increase the total available information, three variations can be considered:

i. Fewer units (Figure 3a,b); ii. Different sigmoid parameters - larger a and smaller c,

each in turn raising the information: a by increasing the slope, and c by decreasing the

information midpoint (Figure 3c,d); iii. Smaller decay rate k (Figure 3g,h). Any

combination of the above can be applied; for instance, fewer units and different sigmoid

parameters are employed in Figure 3e,f.

Figure 3a,b shows a simulation of the model with the same parameters as in Figure 2

but with fewer scene units (20 instead of 30). More information is apparent (average of

0.53, compare average of 0.34 in Figure 2, at the snapshot of N=150). The fewer the

units, the more fixations land on each of them, and the smaller is the average sequential

distance between them. When raising the number of fixations to 200, the average

snapshot information does not change (0.53), but the average maximum information

increases from 0.76 to 0.81 (not shown).
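
A rough way to see why fewer units leave more information available: assuming fixations are drawn uniformly at random, as in these simulations, the gap between successive fixations on a given unit follows a geometric distribution whose mean equals the number of units. The Python sketch below (the function name is illustrative) evaluates the decay factor exp(-k*s) at that mean gap. It is a back-of-envelope estimate under that assumption, not the full model, and it ignores any off-by-one difference in how sequential distance is counted.

```python
import math

def decay_at_mean_gap(n_units, k=0.02):
    """Decay factor exp(-k * s) evaluated at s = n_units, the mean gap under uniform sampling."""
    mean_gap = n_units  # mean of a geometric distribution with success probability 1/n_units
    return math.exp(-k * mean_gap)

for n_units in (30, 20):
    print(n_units, round(decay_at_mean_gap(n_units), 2))
```

With k=0.02, a typical refixation on a 20-unit scene finds more of the earlier information still present than on a 30-unit scene, in line with the higher averages reported for Figure 3a,b.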

We further changed parameters of the model, starting with the sigmoid slope and

mid-point, a and c (Figure 3c,d), and then altering the exponential decay rate, k (Figure

3g,h). Figure 3e,f includes a combination model of fewer units and different sigmoid

parameters. In all these examples, more information is apparent (averages of 0.47, 0.62,

0.56, Figures 3c,d, 3e,f and 3g,h, respectively; compare average of 0.34 in Figure 2, at the

snapshot of N=150). Average information in Figure 3e,f (0.62), which is a combination of

the parameters in Figures 3a,b and 3c,d is higher than both (0.53, 0.47, respectively).

Note that when changing the sigmoid parameters, not only is the mean final information

larger, but also the number of fixations needed to reach saturation is smaller – in Figure

3e,f even ~100 fixations suffice.

[Figure 3 panels. a, c, e, g: Information (color scale 0 to 1). b, d, f, h: Information (0 to 1) vs. Total Fixations (0 to 150).]

Figure 3. Model simulations. a,c,e&g. Available information, at snapshot of N=150. b,d,f&h. Information dynamics for all units; red line indicates average information. a&b. Parameters as in Figure 2, except with only 20 scene units; a=1; c=2; k=0.02; units=20. Average information = 0.53; average maximum information = 0.76 (not shown). c&d. Changed sigmoid parameters: a=1.5; c=1.5; k=0.02; units=30. Average information = 0.47; average max-information = 0.81. e& f. Both fewer units and changed sigmoid parameters: a=1.5; c=1.5; k=0.02; units=20. Average information = 0.62; average max-information = 0.88. g&h. Lower exponential decay rate, k: a=1; c=2; k=0.01; units=30. Average information = 0.56; average max-information = 0.74.

Figure 4 demonstrates the model with two variations of the nucleation and growth

function for the incremental information (See Methods and Figure 1a).

[Figure 4 panels. a, c: Information (color scale 0 to 1). b, d: Information (0 to 1) vs. Total Fixations (0 to 150).]

Figure 4. Model simulation; nucleation and growth; units=30; N=150; a & b: k’=0.2; n=1.7; (compare Figure 2); c & d: k’=0.1; n=2. a. Available information; average = 0.57. Average max-information = 0.73. c. Available information; average = 0.52. Average max information = 0.68. b & d. Information dynamics.

Until now, we controlled the simulation such that there was the same probability of

fixation on all units. In the following example, we set a higher probability of observing

two arbitrarily determined central spots of the visual scene. The probability of observing

the favored regions was set at 0.6, and 0.4 for observing the surrounding regions. In the

example shown in Figure 5, the favored regions span 8 units, resulting in a probability of

0.075 for observing each one; the surrounding regions include 22 units, giving a

probability of 0.018 for each. This yields a factor of >4 for the probability of observing a

preferred unit relative to a non-preferred unit, as is seen in the model outcome (Figure

5c). The average information displays a ratio of 3.5 in favor of the preferred units. The smaller ratio may be due to the smaller contribution of additional fixations, as the decay reduces the information; another explanation may be the different locations of the preferred and the other units on the sigmoid (the other units lie before the midpoint where the slope is larger; see Figure 1a). Clearly, there is a tradeoff between the amount of available information on the preferred regions and the amount of available information on the other regions.
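The per-unit probabilities quoted above follow from dividing each region's total probability by its number of units. The sketch below reproduces that arithmetic and draws one random fixation sequence; the function names, the labeling of units 0 to 7 as the favored ones, and the random seed are illustrative assumptions, not part of the original simulation.

```python
import random

def unit_probabilities(n_units=30, n_preferred=8, p_preferred=0.6):
    """Per-unit fixation probabilities when a subset of units is favored."""
    p_pref_unit = p_preferred / n_preferred
    p_other_unit = (1.0 - p_preferred) / (n_units - n_preferred)
    return p_pref_unit, p_other_unit

def simulate_fixation_counts(n_fixations=150, n_units=30, n_preferred=8,
                             p_preferred=0.6, seed=0):
    """Draw fixation targets at random; units 0..n_preferred-1 are taken
    as the favored ones (an arbitrary labeling chosen for this sketch)."""
    p_pref, p_other = unit_probabilities(n_units, n_preferred, p_preferred)
    weights = [p_pref] * n_preferred + [p_other] * (n_units - n_preferred)
    rng = random.Random(seed)
    counts = [0] * n_units
    for unit in rng.choices(range(n_units), weights=weights, k=n_fixations):
        counts[unit] += 1
    return counts

p_pref, p_other = unit_probabilities()
print("per-unit probability:", p_pref, p_other, "ratio:", p_pref / p_other)
print("expected fixations per unit:", 150 * p_pref, 150 * p_other)
print("one simulated run:", simulate_fixation_counts())
```

With N=150, the expected number of fixations is 150 × 0.075 = 11.25 per preferred unit and 150 × 0.4/22 ≈ 2.73 per other unit, in line with the averages of 11.3 and 2.7 reported for Figure 5.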

[Figure 5, panels a–c. Panel a: Information scale, 0 to 1. Panel b: Information (0 to 1) against Total Fixations (0 to 150). Panel c: Fixations scale, 0 to 17.]

Figure 5. Model simulation with higher probability of observing eight favored units (shown with frame) than surrounding units; a=1; c=2; k=0.02; units=30; N=150. a. Available information; average = 0.33; preferred units: 0.70; others: 0.20. Average max-information = 0.52; preferred units: 0.90; others: 0.38 (not shown). b. Information dynamics; preferred units indicated in black; others in gray. c. Fixations on the scene; average 5; preferred: 11.3; others: 2.7.

It is important to measure the available information during the entire dynamics of the

simulation, and not only at its endpoint. Following the dynamics (not shown), it was

apparent that the available information on the preferred regions was greater during the

whole process of viewing the “scene”, and not just after the arbitrary last fixation. The

result, obviously, is more information regarding these spots.

Another way of constructing higher information for preferred regions but without

interfering in their sampling rate is by using a different memory decay rate, k. A smaller

decay rate would apply for the two central spots of the visual scene (Figure 6), but with

equal observation probability for all units. Such a preference could result from these regions being regarded as more semantically meaningful.

In this example (Figure 6), compared with the previous one (Figure 5), we actually find less available information on average (0.25 vs. 0.33). Here we did not change the probability of observation, so the number of fixations on a preferred unit is typically the same as on any other unit, and in this example it is even slightly smaller (4.4 on average on preferred units vs. 5.2 on other units). Even though the number of fixations is smaller, there is on average more available information on the preferred units (0.47 vs. 0.17 on other units). This difference is significant, showing the effect of memory decay on the available information.

[Figure 6, panels a–c. Panel a: Information scale, 0 to 1. Panel b: Information (0 to 1) against Total Fixations (0 to 150). Panel c: Fixations scale, 0 to 9.]

Figure 6. Model simulation, with smaller memory decay rate for favored regions (k=0.01) than for the surrounding regions (k=0.05); a=1; c=2; units=30; N=150. a. Available information; average = 0.25; preferred units: 0.47; others: 0.17. Average max-information = 0.50; preferred: 0.68; others: 0.43 (not shown). Most observed unit (with 10 fixations) is marked with a white border. b. Information dynamics; preferred units indicated in black; others in gray. Red thick line indicates average information over all units; top thin line indicates average information over preferred units; bottom thin line indicates average information over other units. c. Distribution of fixations. Note that unlike the example of Figure 5, preferred units are not necessarily the most observed. This is also apparent in the number of fixations per unit: average 5; preferred: 4.4; others: 5.2. Yet, the preferred units hold more information due to the smaller decay rate.

Continuing with alteration of the memory decay rate, but now in general rather than differentiating between regions, we assign k values dependent on the number of regional fixations. That is, each unit has a different decay rate at different times (Figure 7). The mapping between the number of fixations and k is such that more fixations lead to a smaller decay rate, starting from k=0.05 and decreasing k by 0.05 for each additional fixation, down to k=0.01. With such dependence, the number of regional fixations needs to cross a barrier in order to pass some threshold of available information. The theory behind this determination of k is that after a certain amount of examination, the resulting knowledge (i.e. the information) might be more affected by the already-conducted fixations than by the decaying memory. This may imply, though not necessarily, that Long Term Memory (LTM) is involved.
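One possible reading of this mapping is a linear decrement clamped at a floor. The sketch below assumes that form; the function name `decay_rate` and the clamping are assumptions made for illustration, and the step is left as a parameter because the exact mapping used in the simulation is not spelled out beyond the values quoted above.

```python
def decay_rate(n_fixations, k_start=0.05, step=0.05, k_min=0.01):
    """Decay rate k for a unit that has received n_fixations so far.

    Assumed form: k starts at k_start after the first fixation, drops by
    `step` for each additional fixation, and never goes below k_min.
    """
    additional = max(n_fixations - 1, 0)
    return max(k_min, k_start - step * additional)

for n in range(1, 5):
    print(n, decay_rate(n))
```

Under this reading, with the quoted values, a unit decays quickly (k=0.05) after a single fixation and slowly (k=0.01) once it has been fixated again; a smaller `step` would give a more gradual transition.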

[Figure 7, panels a–b. Panel a: Information scale, 0 to 1. Panel b: Information (0 to 1) against Total Fixations (0 to 500).]

Figure 7. Model simulation with memory decay rates that are a function of the number of regional fixations. That is, more fixations lead to a smaller decay rate. a=1; c=2; units=30; N=500. a. Available information; average = 0.59. Average max-information = 0.86 (not shown). b. Information dynamics. Note two phases in rise of information; first is to pass the information threshold of one fixation, due to the rapid decay (large k) after only one fixation on the unit. After this threshold is passed, the decay becomes smaller, allowing a more rapid increase of information in the second phase. The number of fixations for reaching half of the asymptotic average information is 176 (somewhere in between the two phases).

Table 1. Average results over 10 runs of each simulation ± S.E.

                                              Parameters                              Results
Description (Figure)                          a     c     k               units  N    Mean Information  Mean Max Information
Basic, Sigmoid (2)                            1     2     0.02            30     150  0.35±0.003        0.60±0.008
Units (3a,b)                                  1     2     0.02            20     150  0.51±0.004        0.77±0.006
Sigmoid (3c,d)                                1.5   1.5   0.02            30     150  0.47±0.004        0.77±0.009
Combination (3e,f)                            1.5   1.5   0.02            20     150  0.61±0.007        0.90±0.006
k (3g,h)                                      1     2     0.01            30     150  0.55±0.006        0.71±0.007
Nucleation (4)                                k’=0.2, n=1.7; k’=0.1, n=2  30     150  0.58±0.005        0.75±0.006
Preferred Regions – Fixating probability (5)  1     2     0.02            30     150  0.32±0.003        0.54±0.005
Preferred Regions – different k (6)           1     2     0.05; 0.01      30     150  0.26±0.003        0.52±0.004
k depends on N (7)                            1     2     mapping         30     500  0.59±0.008        0.77±0.009

A summary of the influence of the different parameters on the available information

is presented in Table 1. Note the robustness of the results regarding the available

information in each simulation condition. The implication of such robustness is that,

assuming that the model indeed describes the viewing process, the average information one

has about the scene at each moment is determined by the number and sequence of

conducted fixations and the rate of memory decay. Yet, the amount of information differs

between the scene regions.

4. Discussion

We offer a model for visual information, depending on two mechanisms – an

incremental component and a memory decay component – affected, respectively, by the

number of and sequential distance between regional fixations. The aim of the model is to

explain change blindness results, which might be due to different levels of available

information regarding a scene region. This available-information measure might predict

the probability of noticing a change to the corresponding visual unit, that is, to predict the

change detection probability (p; or change blindness probability = 1-p), given that the

next fixation will be to that unit.

There is a fundamental difference between a trial which is controlled by the

experimenter and stopped at a pre-determined moment, and a trial which is controlled by

the subject, that is, can be stopped at any moment. In the former, the relevant model of

information would be the available information after the last snapshot; in the latter the

relevant model is the max-information at each unit, because the subject could have

stopped the trial at any point in the middle, when the regional information might have

been higher, before a decrease has occurred.
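The distinction between the two trial types can be sketched as follows, for a single unit whose available information has been recorded after each fixation. The function names and the example trajectory are hypothetical, and treating the information value directly as a detection probability is the model's suggestion rather than an established result.

```python
def relevant_information(trajectory, subject_controlled):
    """Pick the information value relevant for a trial.

    trajectory: available information of one unit after each fixation.
    Experimenter-controlled trials use the value at the last snapshot;
    subject-controlled trials use the maximum reached at any point.
    """
    if not trajectory:
        raise ValueError("trajectory must contain at least one value")
    return max(trajectory) if subject_controlled else trajectory[-1]

def change_blindness_probability(p_detect):
    """Change blindness probability is 1 - p, with p the detection probability."""
    return 1.0 - p_detect

# Hypothetical trajectory: information rises, then decays between fixations.
trajectory = [0.0, 0.3, 0.55, 0.5, 0.42]
print(relevant_information(trajectory, subject_controlled=False))
print(relevant_information(trajectory, subject_controlled=True))
```

In this made-up trajectory the subject-controlled value exceeds the endpoint value, which is the situation described above: the information was higher at some earlier moment, before a decrease occurred.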

We showed a few variations to the model. We set a higher probability of observing

two arbitrarily determined regions of the visual scene (Figure 5). The average

information was higher for the preferred units. We used another method for constructing

higher information for preferred regions, without interfering in their sampling rate, by

using a smaller memory decay rate, k, for the two determined regions of the visual scene

(Figure 6). Even though the number of fixations is essentially equal, there is on average more available information for the preferred units. Furthermore, in Figure 7 we define the decay according to the actual number of fixations on the unit.

As a continuation of the variation in Figure 7, where we define the decay according to the actual number of fixations on the unit, another suggested variation of the model would be to define the decay rate in accordance with the equivalent number of fixations. The equivalent number of fixations corresponds to the current amount of available information: the more information is present, the slower its decay. The rationale is that once a certain level of information is established, this information would persist longer, and therefore more time would be required to diminish it. The greater the available information, the stronger its stability.

This model’s outcome might be checked in a change detection experiment involving eye movement tracking: instead of making the change during the first saccade that leaves the designated region (i.e., after the first fixation; Hollingworth et al., 2001), the change can be made after a varying number of fixations on the region (i.e., during the ith saccade leaving the region). Upon a return fixation to the changed region, the display will be terminated and observers will be asked for a response. Then, in retrospect, the simulation can be run on the data, using the observer’s sequence of fixations to the different regions, into which the display would be divided in advance. The outcome of the simulation is a probability of detection at a given moment for each region, therefore yielding the change detection probability for the specific applied change. However, the outcome of one trial is either detection or miss. Therefore, many trials need to be conducted (probably on different subjects or with different pictures), and then grouped according to the probability yielded by the simulation. Only then can the rate of detection be compared to the probabilities resulting from the simulation, to examine whether the responses of the subjects fit the predictions of the model. We assume, in accord with the suggested model, that the detection rate will rise with the number of fixations conducted to the region before the change, and with the temporal proximity of such fixations. After gathering the results, the parameters of the model can be adjusted until reaching those which supply the best fit to the data.
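The grouping step of this proposed analysis might look like the sketch below: trials are binned by the detection probability the simulation assigned to them, and the observed detection rate is computed per bin. The function name, the equal-width binning and the example data are all assumptions made for illustration; no experimental data are implied.

```python
def detection_rate_by_bin(predicted, detected, n_bins=5):
    """Group trials by predicted detection probability (assumed in [0, 1]).

    Returns, for each non-empty bin, a tuple of
    (mean predicted probability, observed detection rate, number of trials).
    """
    if len(predicted) != len(detected):
        raise ValueError("predicted and detected must have the same length")
    bins = [[] for _ in range(n_bins)]
    for p, d in zip(predicted, detected):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[index].append((p, d))
    summary = []
    for trials in bins:
        if trials:
            mean_p = sum(p for p, _ in trials) / len(trials)
            rate = sum(1 for _, d in trials if d) / len(trials)
            summary.append((mean_p, rate, len(trials)))
    return summary

# Made-up example: six trials with simulated probabilities and outcomes.
predicted = [0.1, 0.15, 0.5, 0.55, 0.9, 0.95]
detected = [False, False, True, False, True, True]
for mean_p, rate, n in detection_rate_by_bin(predicted, detected):
    print(f"predicted ~{mean_p:.2f}: observed rate {rate:.2f} over {n} trials")
```

If the model fits, the observed rate in each bin should lie close to the mean predicted probability of that bin; systematic deviations would indicate which parameters to adjust.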

Positive results will support the theory that gathering of information over a specific

region of the scene is derived from a growing number of fixations on that particular

region, and will strengthen the conclusion that several fixations on a scene location are

necessary for achieving recognition.

5. Conclusions

In this chapter, we propose a model that may explain why we are poor at detecting changes in the visual scene, even when the change is major, including semantically (though limited to a specific region).

The theory that stands behind this model is that a few fixations are needed to gain

enough information on a scene data unit. This alone is not enough, because as the eyes

move aside and new data are presented to the visual field, the intervening data act as a

mask, and memory decays. Here the sequential distance between fixations on a region

comes into play.

The result of the model, in terms of the dynamics, information distribution among the

units, the average available information and the maximum information in each unit, may

reflect an actual scenario. There are many parameters that can be adjusted, as shown in

the Results section. Moreover, combinations of the parameter variations can be applied.

This chapter shows the potential entailed in this model. Further investigation of real-

world data should follow, for comparison and determination of the model parameters

empirically. The application, the fit and the predictive power of the model still need to be tested.

Acknowledgments

I thank Isaac Jacob for suggesting the nucleation and growth kinetics.

DISCUSSION

Summary

I investigated aspects of visual perception leading to recognition, using

psychophysical methods (Chapters I-III) and eye movement tracking (Chapters II-III).

In my first experiment (Chapter I, Jacob & Hochstein, 2008) I used the Set game,

containing four dimensions and rich, abstract stimuli. A preference for similarity within

the set over span was repeatedly observed, pointing to the existence of a basic similarity-

perceiving mechanism. Development of such a mechanism may be explained by the

general principle that the visual system specializes in detecting spatial change, so that it

spots identical objects, which are unique in the otherwise rich and diverse everyday environment. Another finding was that perceptual elements, such as similarity and the MAV (most abundant value) of the card display, influence set detection. Therefore, we

may conclude that even in cognitive tasks, there are perceptual elements that affect

performance. A third major finding, derived from the results of the horse-race model

analysis, is that visual search processes may take place independently and

simultaneously. These results (the preference for similarity, the impact of perceptual elements on performance of cognitive tasks, and the carrying out of separate processes independently and in parallel) are general rules that would be expected to apply to a variety of tasks and perception scenarios.

Secondly, I researched the influence of fixations – their number and their sequence –

on recognition. For this purpose I formulated the Identity Search Task, and conducted a

comparison between detected and undetected targets (Chapter II, Jacob & Hochstein,

2009). I found more fixations on the eventually detected target than on the undetected

one, together with a different pattern in the sequence of fixations. In a backward

dynamics analysis a bifurcation point between the detected and the undetected target was

revealed, where the differential viewing properties begin.

With those conclusions, I moved on to the next experiment, controlling the display

according to a varying number of fixations on the target (Chapter III). The findings were that an increase in the number of target fixations leads to improved performance and accuracy, measured by hit-rate and detectability, d’, to a decrease in response time, and to a higher subjective confidence level.

Last but not least, I designed a mathematical model for simulating gain of information

during observation of the visual scene (Chapter IV).

General Discussion

Number of fixations

A major finding of my research is that, in the comparison between two

simultaneously presented targets, there are more fixations on the eventually detected

target than on the undetected one (Chapter II, Jacob & Hochstein, 2009). The opposite

scenario – more fixations on the undetected target – would have implied that only a few

fixations, or even just one on each region, are needed for detection, so that when a target

is fixated it will either be detected immediately (or at least very quickly) or perhaps

never. This hypothesis is thus rejected. The third potential scenario, which was also not

found, is of an equal number of fixations on the detected and undetected targets on

average. The experimental result of the numbers not being equal allows us to rule out the

hypothesis that the number of fixations is irrelevant for recognition. Instead, we may

conclude that more fixations are needed for processing visual information in order to

achieve recognition. This can also be clearly observed in Chapter III, where I found that

an increase in number of target fixations leads to better recognition, reflected by a

decrease in response time and increase in hit-rate and detectability, d’.

Multiple fixations can play one (or both) of two roles: a deictic role, as suggested by Ballard, Hayhoe, Pook & Rao (1997), pointing to the card whose properties are currently being processed, so that there is a tradeoff between Working Memory and the number of fixations; or an integrating (or incremental) role, if more than one fixation is needed for completing the entire process of perception of one card. I suggest that fixations are used to gather and integrate information.

Sequential distance

I found that not only the number of fixations on the target is influential, but also the

sequential distance between target fixations (Chapter II, Jacob & Hochstein, 2009).

Additional results (Appendix I, referring to Chapter II) show that detection does not

always follow even after viewing two cards one right after the other (i.e. in quick

succession). Even when detection does follow, it does not always follow immediately,

but can occur many fixations later. These results imply that the requirement is not for just

one pattern of sequential distance that suffices for detection, but rather a combination of

patterns might be necessary. I suggest two options for such a combination: a combination of several patterns, which can also include sequential distances larger than 3 (each by itself less prevalent in the detected cards’ scan path); alternatively, only sequential distances of 1-3 are influential, but a few of them are needed in order to cross the “barrier” to conscious recognition.

In the Identity Search Task, the sequential distance refers to the distance between

fixations on the two distinct target cards, but the sequential distance can be generalized to

the distance between two fixations on the same region, in case the target consists of one

item only. Note that when quick succession is close to detection, it may be a result (not

the cause) of early detection. In this case we cannot simply answer the question of what

comes first, more fixations or recognition.

What comes first?

Returning to the question of what comes first – more fixations or recognition – I will now discuss the two scenarios. If only a large number of fixations led to recognition, there would not be a rise in the hit-rate after fewer fixations (Chapter III). So I can conclude that even a few fixations lead to some recognition. This partial recognition, reflected in a hit-rate which is above chance level but not very high, is then followed by more fixations, until reaching complete recognition. Uniting the two, I can therefore conclude that a few fixations lead to an early recognition state, which in turn leads to more fixations, leading eventually to full explicit recognition. Graded recognition derives from the graded information that is gathered fixation after fixation on the same scene region.

Information debate

I presented in the Introduction a debate regarding the information received during

fixation. Two opposing standpoints were presented, one suggesting that visual object

representations disintegrate immediately upon withdrawal of attention (coherence theory,

Rensink, 2000b), and the other suggesting that visual representations accumulate to form

a relatively detailed representation of a scene (visual memory theory, Hollingworth et al.,

2001).

As I previously pointed out, the literature refers to accumulation and integration of

information across the entire visual scene. I want to take the same debate, and apply it to

local regions. To elaborate, I ask whether information obtained from fixations to a certain local region is accumulated, or whether it vanishes as a result of memory decay as soon as a fixation to the local region is over, so that another fixation, accompanied by a shift of attention to that region, is needed for retrieving the information again. (Then, what happens on the second, third, and later viewings of the same object – does the processing of visual information start all over again, or is there an amount of information already “there”?)

I believe that the truth lies somewhere in the middle. My results show that several

fixations are needed for detection on the one hand, pointing to accumulation of

information, and on the other hand, the sequential distance between fixations also plays a

role, implying memory decay. In Chapter IV I presented a model of available visual

information that accounts for a combination between accumulation of information and

memory decay.

Previous hidden assumption: Is one fixation indeed enough?

Another thing worth noting is that a hidden assumption of the coherence theory

(Rensink, 2000) and of the theory of the world as an outside memory (O’Regan, 1992) is

that one fixation per region is enough for retrieving full information regarding a certain

location. Therefore, the inference regarding limitation of memory capacity across a saccade, and likewise regarding the stability of the visual representation, stems from a fundamental logical error. The conclusion that only a few items can be remembered across a saccade (Rensink, 2000) was drawn from experiments guaranteeing only one fixation on the region. This was not taken one step further. The problem with this logic is the assumption that one fixation must be sufficient, making it legitimate to expect that after one fixation, if memory can be preserved, it will be reflected in performance. But consider the alternative: that the reason for the low performance is that the representation of the region was not built completely in the first place after only one fixation, because a few fixations on the target are necessary for a full and stable perception.

This theory might be checked in a change detection experiment: instead of making

the change during the first saccade that leaves the region (i.e., after the first fixation;

Hollingworth, Williams & Henderson, 2001), the change can be made after a few fixations

on the region (i.e., during the ith saccade leaving the region). I assume, in accord with the

suggested theory, that the detection rate will rise with the number of fixations conducted

to the region before the change, and with the temporal proximity of such fixations.

Theoretical reasoning regarding evolutionary advantage

Returning to the unstated assumption that one fixation per region is enough for

retrieving full information regarding a certain location, I claim that besides experimental

evidence there is also an evolutionary advantage in requiring several fixations.

I suggest that it may be more economical for the visual system to scan the visual

scene and gather partial information from each sampled region, instead of expending all

the resources on one location to gather full information. Attention can be spread

uniformly at first, gathering minimal information in minimal time, and only then can

attention be paid repeatedly to the already observed regions, or just to the more important

locations, to accumulate information about them gradually (and not necessarily equally).

The advantage of the evolutionary development of this scanning method comes from the creation of parallel partial representations for all locations, rather than full knowledge about one location at the expense of none about others.

According to this theory, if 250ms (an average fixation duration) were sufficient for

gathering all the information about a particular 2º field, then fixations should be shorter

(in order to act according to the model's parallel gathering of information). We therefore

conclude that 250ms do not suffice. Another implication of this theory is that the

duration of a fixation is not determined solely by the time needed for saccadic

programming (Findlay, Brown & Gilchrist, 2001), and not by the time needed for

gathering all the information present at a location, but by the time needed for gathering a

minimal amount of functional and practical information.

What is this minimal information? It changes from the first fixation to later ones.

After the first fixation, the minimal information may be such that it is “sufficient to direct

attention and the eyes to whatever object is required” (Rensink, 2000, p. 1476). But the

increase in the amount of information following each additional fixation on the same

region is not necessarily linear.

Moreover, the amount of time required for gathering full information about a spot

depends on the informativeness of the region. More complex and informative regions

require more time for processing.

The duration of an optimal fixation, adopted by an ideal observer, can theoretically be

calculated as the equilibrium state between a long enough fixation for gathering full

information about a location, and the necessity to build spatial and parallel representation

of the whole scene. The calculation should also consider the time and energy consumed

by programming and conducting the saccade.

A concluding remark

Recall the demonstration of the passing gorilla (Simons & Chabris, 1999), which was

described in the Introduction. This example teaches us that we sometimes do not detect

even major changes in the scene (a passing gorilla), not because of lack of memory as to

what appeared before (certainly the gorilla was not there before), but because of absence

of information about the specific region. The absence of information can result from no

fixations or from not enough fixations, which are essential for acquiring the information.

Rensink (2000b) started his paper with, “Once upon a time it was widely believed

that human observers built up a complete representation of everything in their visual

field”. I might say that, “Once upon a time there was a belief that one fixation is enough

for gathering all the information about a certain region”.

Future Directions

i. Looking into the Eyes

As a next step in my research concerning eye-movements in visual search leading to

recognition, continuing the comparison between detected and undetected targets (Chapter

II, Jacob & Hochstein, 2009) and the graded recognition as a function of target fixations

(Chapter III), I seek an array of objects in which a target is embedded, with the following

characteristics:

1- A target which consists of only one item (not a pair or even three cards, as in my previous studies, and therefore not involving a comparison between items).

2- A natural target which subjects will search for – with instructions, but even without training – for example due to a semantically meaningful difference from the distractors. The experiment should not involve memorizing and recall of an object, because I am interested in (isolating) the pure process of recognition.

3- A target which does not differ much from the distractors, so that it does not pop out (since I want the subjects to scan the different items in the display), and is generally not identified at first glance – even at first glance at the item (so that several fixations on the target will be conducted for the integration of information over multiple fixations).

One idea for such a target relies on research showing that people are very good at

identifying eyes that are looking directly at them (Baron-Cohen, Campbell, Karmiloff-

Smith, Grant & Walker, 1995; Jenkins, Beaver & Calder, 2006; Kleinke, 1986), i.e.

straight ahead; (below, “direct gaze”). Observers are highly accurate at discerning direct

gaze from gaze averted to the left or right by 10º (Jenkins et al., 2006). Therefore the

target will be the directly gazing eyes, and the distractors will be averted eyes. Physically, the stimuli are not too different, but semantically they are very different.

This research meets another interest of mine. Besides what we perceive with the eyes, there is the question of what the eyes reflect. Why is it a common belief that the eyes reflect our state of mind (Calder, Lawrence, Keane, Scott, Owen, Christoffels & Young, 2002; Perrett & Emery, 1994)? Is it something in the angle of the eyes, pupil size, eye-lid opening, around-eye wrinkles and the expression on the face, or something deeper within the eyes?

Why do specifically the eyes, which serve for vision, among all the five senses, also

reflect something from the inside of the person, a state of the mind? What is so unique

about the eyes? I will go over the five senses, maybe trivially, in search of an answer.

Hearing is done with the ears. Although very complex, the ears do not have to move

or to be directed towards the source in order to receive the sound (aside from some

movement of the head). Smell is processed with the help of the nose; it, too, does not

have to change location for the process to occur (aside from the extreme of sniffing).

Taste is processed via the tongue, which does move, but is an inner organ, not exposing

its movement. The somatosensory sense is our ability to feel any sensation, mediated by

the skin. It includes exploration of objects with our hands, finger tips, etc.

It turns out that the eyes, which serve vision, are the only specifically exposed organs

that have to move in order to sense accurately. This is precisely because the foveal field,

which occupies ~2º (Anstis, 1974; Riggs, 1965), is the only region in which information

is perceived with sharpness, clarity and accuracy. This forces us to fixate different

locations over and over again. Moreover, this process occurs during most of our waking hours.

Therefore, movements of the eyes reflect processes underlying vision, the information we

are concentrating on, and the cognitive states of our minds.

There is an additional value to this research: In order to learn more about what is

reflected by the eyes, a comparison will be made between non-practitioners

and karate practitioners, or practitioners of any other art involving intense perception

of messages from the eyes: practitioners of other martial arts, therapists (e.g. shiatsu or

conventional therapists such as psychologists), yogis, and meditation practitioners.

The principles of karate are both physical and mental. Karate practice demands holding

the body in a certain posture; using not the force of the limbs but the whole weight of the

body; and advancing in a single burst, reaching the target with one movement (“being there”),

using one’s kime, the energy coming from the center of the body.

The mental side of the practice demands awareness of the surrounding environment

and focusing of the mind. Good practice also involves detecting in advance an intention

of the opponent to initiate a move, so that the reaction would precede the opponent’s

movement and reaching. This is done by looking into the opponent’s eyes (probably

because the movements do not start in the limbs, but in the brain, and the eyes reflect its

state). Another demand is not showing signs of intention to initiate a move: not by small

movements of the body, not by an expression on the face, and not by allowing the eyes to

reflect this intention. Thus, these principles entail, on the one hand, perceiving others’

intentions from what is reflected in their eyes, and, on the other hand, avoiding reflecting

one’s own intentions with the eyes.

The experiment will be divided into two parts. The design of the first part will be

similar to the one described in Chapter II (Jacob & Hochstein, 2009), to allow

comparison between detected and undetected eyes. The tested parameter in this part will

be the RT. The design of the second part will be similar to the one described in Chapter

III. Half of the displays will include a target (just one) and half will not, exposing the

observers to a varying number of fixations on the target. The tested parameter will be the

correctness of the response, as a function of the number of fixations on the target. In both parts, I

am of course interested in the eye movements and the sequence of fixations leading to

recognition.

Expected outcome and potential significance of research:

The experiment would investigate whether a few fixations on an object are necessary to

achieve recognition, even when comparison between items is not needed. I expect better

detection of direct gaze by the practitioners, in terms of search time, correctness of

responses, and number of fixations on the target. I would attempt to find an additional

component reflected through the eyes, of which not everyone is aware in everyday life, and

which can be learned with persistent practice.

ii. New response methodology (relating to Chapter III)

In Chapter III, I exposed subjects to varying numbers of fixations on the target, after

which the display was terminated and a response was obtained. Half of the displays

included a target, and half not, and observers were asked to respond accordingly. In

addition to a yes/no response, they also reported their confidence level (“Don’t know”,

“Maybe”, or “Sure”) in regard to their given answer. They were given a monetary reward

only according to the correctness of the answer (Hits and Correct Rejections).

A problem arose from the conservativeness of some of the subjects in their responses:

they gave fewer ‘yes’ responses, replying positively only when sure. I would like to adopt a

method which would encourage giving the correct answer, even when not sure.

To overcome this problem, I suggest combining the two responses: subjects would

respond on a scale of 1-5. The middle number, 3, will indicate a complete ‘Don’t know’

situation, 4 and 5 ‘Yes’, and 2 and 1 ‘No’. Responses of 4 and 2 will indicate ‘Maybe’ (yes and no,

respectively), and 5 and 1 ‘Sure’, as follows:

Response     1       2       3            4       5

Answer       No      No                   Yes     Yes

Confidence   Sure    Maybe   Don’t know   Maybe   Sure

Subjects will be informed that they will not be rewarded for “3” responses. Therefore,

when a subject has a hunch, it is monetarily worthwhile to be “daring” and go in either

direction. ‘Maybe’ and ‘Sure’ responses may still vary between subjects, but the ‘Don’t

know’ response will be avoided unless they really have no idea, thus forcing subjects to

choose between ‘yes’ and ‘no’, instead of giving a ‘no’ response by default.
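
The combined scale and its reward rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `SCALE`, `decode_response` and `is_rewarded` are hypothetical, and the rule assumed here is that a response is rewarded only when its yes/no direction matches target presence, with no reward for “3”. Reward amounts are not specified in the text and are left out.

```python
from typing import Optional, Tuple

# Hypothetical encoding of the proposed 1-5 scale as (answer, confidence).
SCALE = {
    1: ("no", "sure"),
    2: ("no", "maybe"),
    3: (None, "don't know"),
    4: ("yes", "maybe"),
    5: ("yes", "sure"),
}


def decode_response(response: int) -> Tuple[Optional[str], str]:
    """Split a 1-5 response into its yes/no answer and its confidence level."""
    if response not in SCALE:
        raise ValueError(f"response must be an integer from 1 to 5, got {response!r}")
    return SCALE[response]


def is_rewarded(response: int, target_present: bool) -> bool:
    """Assumed rule: reward Hits and Correct Rejections, never a '3' response."""
    answer, _confidence = decode_response(response)
    if answer is None:  # the "Don't know" response is never rewarded
        return False
    return (answer == "yes") == target_present
```

Under this sketch a hesitant but correct “4” on a target-present trial is rewarded exactly like a “5”, which is what makes a guess worthwhile compared with a “3”.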

iii. Experimental test of the visual information model (Chapter IV)

Hollingworth, Williams & Henderson (2001) conducted a change detection

experiment, tracking eye movements, and applying the change when the eyes first left the

designated region, i.e. when a saccade was initiated from it. This experiment can be

repeated with one modification: upon a return fixation to the changed region, the display would be aborted and the

observer asked for a response. Then, in retrospect, the simulation described in Chapter IV will

be run on the data, using the observer’s sequence of fixations to the different regions, into

which the display would be divided in advance. The outcome of the simulation is a

probability of detection for a current moment and for each region, therefore yielding the

change detection probability for the specific applied change. However, the outcome of

one trial is either detection or a miss. Therefore, many trials need to be conducted, and

then grouped according to the probability yielded by the simulation. Only then

can the rate of detection be compared to the probabilities resulting from the simulation, to

examine whether the responses of the subject fit the predictions of the model.
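
The grouping step described above can be sketched as follows. This is a minimal illustration, not the analysis code of the thesis: the function name `detection_rate_by_bin`, the use of equal-width probability bins and the default of five bins are all assumptions made for the example. Each trial is given as the probability yielded by the simulation together with the observed outcome (detected or missed).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def detection_rate_by_bin(
    trials: List[Tuple[float, bool]], n_bins: int = 5
) -> Dict[int, Tuple[float, float, int]]:
    """Group trials into equal-width bins of predicted probability.

    Returns, per non-empty bin index, a tuple of
    (mean predicted probability, observed detection rate, number of trials).
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    grouped = defaultdict(list)
    for predicted, detected in trials:
        if not 0.0 <= predicted <= 1.0:
            raise ValueError(f"predicted probability out of range: {predicted}")
        # A prediction of exactly 1.0 falls into the last bin.
        index = min(int(predicted * n_bins), n_bins - 1)
        grouped[index].append((predicted, detected))

    summary = {}
    for index in sorted(grouped):
        items = grouped[index]
        mean_predicted = sum(p for p, _ in items) / len(items)
        observed_rate = sum(1 for _, d in items if d) / len(items)
        summary[index] = (mean_predicted, observed_rate, len(items))
    return summary
```

If the model fits, the observed rate in each bin should lie close to the mean predicted probability of that bin; bins with few trials give noisy rates, which is why many trials are needed.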

In the experimental design described above, there will always be exactly two

fixations on the changed region, one before and one after the change; that is, only

one fixation before the change. This will test only the component of the model which is

influenced by memory, related to the sequence of fixations. To overcome this limitation,

the number of fixations before applying the change can be varied (as already suggested

earlier in the Discussion). After gathering the results, the parameters of the model can be

adjusted until reaching the values which supply the best fit to the data.

Another examination can be made of existing data collected in change

detection experiments which involved eye-movement tracking, but in which the change was not necessarily

applied according to the fixations or saccades. In such data, the fixations on the region

of the change prior to its application can be counted, and the detection rate can be plotted

against this count. I predict that the more fixations there were on the changed region before the

change was applied, the higher the rate of detection. This would strengthen my

theory that information is gathered and integrated across fixations to the same region.
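
The re-analysis of existing data described above can be sketched as follows. This is an illustrative sketch under assumptions: the data layout (a sequence of fixated region labels per trial, the label of the changed region, the index of the first fixation after the change, and the outcome) and the function names are hypothetical, and each fixation is counted separately even when consecutive fixations fall on the same region.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# One trial: (fixated regions in order, changed region, index of the first
# fixation made after the change was applied, whether the change was detected).
Trial = Tuple[Sequence[str], str, int, bool]


def count_prechange_fixations(
    fixated_regions: Sequence[str], changed_region: str, change_index: int
) -> int:
    """Count fixations on the changed region that occurred before the change."""
    return sum(
        1 for region in fixated_regions[:change_index] if region == changed_region
    )


def detection_rate_by_count(trials: List[Trial]) -> Dict[int, float]:
    """Detection rate as a function of the number of pre-change fixations."""
    outcomes = defaultdict(list)
    for fixated_regions, changed_region, change_index, detected in trials:
        n = count_prechange_fixations(fixated_regions, changed_region, change_index)
        outcomes[n].append(detected)
    return {n: sum(hits) / len(hits) for n, hits in sorted(outcomes.items())}
```

The returned mapping from fixation count to detection rate is what would be plotted; the prediction is that the rate rises with the count.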

CONCLUSIONS

I introduced a method of comparing detected and undetected targets, using eye

movement tracking. For this purpose, I used a computerized version of the Set game and

also devised the Identity Search Task, drawing on the preference for similarity found when

investigating components of the Set game.

Analysis of fixations during performance of the task revealed that there are

consistently more and longer fixations on the ultimately detected target, ruling out the

hypothesis that the number of fixations is irrelevant for detection. Furthermore, the

pattern of sequential distance between the pair cards changes when approaching

detection. A bifurcation point is observed along the dynamics of search, at which the

to-be-detected pair overpowers the undetected one. I can therefore conclude that more

fixations are needed for processing the visual information in order to achieve recognition.

I introduced another novel method, of exposing subjects to varying numbers of

fixations specifically on the target. I found a decrease in RT with an increasing number of

fixations on the target, an increase in hit-rate and in detectability (d’), and an increase in

confidence level. These findings point to gathering of information over a specific region

of the scene, resulting from a growing number of fixations on that certain region. This

strengthens my conclusion that several fixations on a scene location are necessary for

achieving recognition.

These findings suggest that in visual search, several stages are involved in the

perceptual process leading ultimately to recognition and detection. Stage 1: A “search in

the dark” – random fixations on the different cards in arbitrary order; Stage 2: Implicit

(unconscious) recognition of the target, controlling and guiding eye movements to the

relevant sensed location of the target; Stage 3: Explicit detection with conscious

knowledge of target presence and its location.

This 3-stage model of the perceptual recognition process is supported both by the

observed bifurcation point (Chapter II, Jacob & Hochstein, 2009) and by the experiment

described in Chapter III. The bifurcation point – the sharp upturn in the slope of

accumulated fixations derived from the backward dynamics alignment – suggests the

early, implicit, recognition stage in the process of perception, which is followed by more

fixations, leading, ultimately, to the point of explicit target recognition. The experiment

described in Chapter III supplies evidence that these stages are separated from each other

by the number of fixations on the target.

Do fixations lead to detection, or does unconscious pre-recognition guide the eyes to

more fixations? Analysis of the hit-rates as a function of number of fixations on the

target, together with the confidence level responses of the subjects, suggests that a few

fixations lead to an early implicit recognition state, which in turn leads to more fixations,

and thus eventually to full explicit recognition.

I also constructed a mathematical model for spatially available information,

depending on two prevailing components: The increment of information, due to the

growing number of fixations on the region, and the decay of information, influenced by

the sequential distance from the last fixation on the region, due to memory decay. The

model is compatible with my conclusion that several fixations on the target are essential

to obtain recognition.

REFERENCES

Ahissar, M. & Hochstein, S. (1997). Task difficulty and the specificity of perceptual

learning. Nature, 387, 401-406.

Ahissar, M. & Hochstein, S. (2004). The reverse hierarchy theory of visual perceptual

learning. Trends in Cognitive Sciences, 8, 457-464.

Altmann, G. T. (2004). Language-mediated eye movements in the absence of a visual

world: The ‘blank screen paradigm’. Cognition, 93, B79–B87.

Anstis, S. M. (1974). Letter: A chart demonstrating variations in acuity with retinal

position. Vision Research, 14, 589-592.

Antes, J. R. (1974). The time course of picture viewing. Journal of Experimental

Psychology, 103, 62– 70.

Ashby, F. G. & Maddox, W. T. (2005). Human category learning. Annual Review of

Psychology, 56, 149-178.

Ashby, F. G., Queller, S. & Berretty, P. M. (1999). On the dominance of unidimensional

rules in unsupervised categorization. Perception & Psychophysics, 61, 1178-1199.

Atkinson, R. C. & Shiffrin, R. M. (1968). Human memory: A proposed system and its

control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning

and motivation (Vol. 2, pp. 89-195). New York: Academic Press.

Avrami, M. (1939). Kinetics of Phase Change. I. General Theory. Journal of Chemical

Physics, 7, 1103–1112.

Avrami, M. (1940). Kinetics of Phase Change. II. Transformation-Time Relations for

Random Distribution of Nuclei. Journal of Chemical Physics, 8, 212–224.

Avrami, M. (1941). Kinetics of Phase Change. III. Granulation, Phase Change, and

Microstructure. Journal of Chemical Physics, 9, 177–184.

Baddeley, A. D. & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The

psychology of learning and motivation: Advances in research and theory (Vol. 8, pp. 47-

89). New York: Academic Press.

Ballard, D. H., Hayhoe, M. M., Pook, P. K., & Rao, R. P. (1997). Deictic codes for the

embodiment of cognition. Behavioral and Brain Sciences, 20, 723-767.

Barlasov-Ioffe, A. & Hochstein, S. (2008). Perceiving illusory contours: Figure detection

and shape discrimination. Journal of Vision, 8(11):14, 1-15.

Baron-Cohen, S., Campbell, R., Karmiloff-Smith, A., Grant, J., & Walker, J. (1995). Are

children with autism blind to the mentalistic significance of the eyes? British Journal of

Developmental Psychology, 13, 379–398.

Becker, M. W., Pashler, H. & Anstis, S. M. (2000). The role of iconic memory in change

detection tasks. Perception, 29, 273-286.

Berg, E. A. (1948). A simple objective technique for measuring flexibility in thinking.

The Journal of General Psychology, 39, 15–22.

Bowden, E. M. & Jung-Beeman, M. (2003). Aha! Insight experience correlates with

solution activation in the right hemisphere. Psychonomic Bulletin & Review, 10, 730-737.

Bowden, E. M. & Jung-Beeman, M. (2007). Methods for investigating the neural

components of insight. Methods, 42, 87-99.

Bower, G. & Trabasso, T. (1963). Reversals prior to solution in concept identification.

Journal of Experimental Psychology, 66, 409-418.

Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 433–436.

Brandt, H. F. (1945). The Psychology of Seeing. New York: Philosophical Library, Inc.

Bridgeman, B., Hendry, D. & Stark, L. (1975). Failure to detect displacement of the

visual world during saccadic eye movements. Vision Research, 15, 719-722.

Bruce, C. J., Goldberg, M. E., Bushnell, C., & Stanton, G. B. (1985). Primate frontal eye

fields. II. Physiological and anatomical correlates of electrically evoked eye movements.

Journal of Neurophysiology, 54, 714–734.

Buswell, G. T. (1935). How People Look at Pictures. Chicago: University of Chicago Press.

Calder, A. J., Lawrence, A. D., Keane, J., Scott, S. K., Owen, A. I., Christoffels, I. &

Young, A. W. (2002). Reading the mind from eye gaze. Neuropsychologia, 40, 1129–

1138.

Carr, T. H. (1992). Automaticity and cognitive anatomy: Is word recognition

“automatic”? American Journal of Psychology, 105, 201-237.

Chalmers, D. J. (1995a). Facing up to the problem of consciousness. Journal of

Consciousness Studies, 2, 200-219.

Chalmers, D. J. (1995b). The puzzle of conscious experience. Scientific American, 273,

80-86.

Chen, X., & Zelinsky, G. J. (2006). Real-world visual search is dominated by top-down

guidance. Vision Research, 46, 4118–4133.

Cohen, J. D., Dunbar, K. & McClelland, J. L. (1990). On the control of automatic

processes: A parallel distributed processing model of the Stroop effect. Psychological

Review, 97, 332-361.

Cohen, J. D., Servan-Schreiber, D. & McClelland, J. L. (1992). A parallel distributed

processing approach to automaticity. American Journal of Psychology, 105, 239-269.

Corballis, M. C. (1998). Interhemispheric neural summation in the absence of the corpus

callosum. Brain, 121, 1795-1807.

Cornelissen, F. W., Peters, E. M., & Palmer, J. (2002). The Eyelink Toolbox: Eye

tracking with MATLAB and the Psychophysics Toolbox. Behavior Research Methods,

Instruments & Computers, 34, 613–617, http://cornelis.med.rug.nl/pub/EyelinkToolbox.

DeSchepper, B., & Treisman, A. (1996). Visual memory for novel shapes: Implicit

coding without attention. Journal of Experimental Psychology: Learning, Memory and

Cognition, 22, 27–47.

Deubel, H. & Schneider, W. X. (1996). Saccade target selection and object recognition:

Evidence for a common attentional mechanism. Vision Research, 36, 1827-1837.

Droll, J. A., Gigone, K., & Hayhoe, M. M. (2007). Learning where to direct gaze during

change detection. Journal of Vision, 7(14):6, 1–12.

Durand, A. C. & Gould, G. M. (1910). A method of determining ocular dominance.

Journal of the American Medical Association, 55, 369–370.

Egeth, H. E. & Mordkoff, J. T. (1991). Redundancy gain revisited: Evidence for parallel

processing of separable dimensions. In G. R. Lockhead & J. R. Pomerantz (Eds.), The

perception of structure: essays in honor of Wendell R. Garner (pp. 131–143).

Washington DC: American Psychological Association.

Erofe'ev, B. V. (1946). Generalized equation of chemical kinetics and its Application in

reactions involving solids. Comptes Rendus de L’Academie des Sciences de L’URSS, 52,

511-514.

Farah, M. J. & Aguirre, G. K. (1999). Imaging visual recognition: PET and fMRI studies of

the functional anatomy of human visual recognition. Trends in Cognitive Sciences, 3,

179-186.

Feldman, J. A. (1985). Four frames suffice: A provisional model of vision and space.

Behavioral & Brain Sciences, 8, 265-289.

Findlay, J. M., Brown, V., & Gilchrist, I. D. (2001). Saccade target selection in visual

search: The effect of information from the previous fixation. Vision Research, 41, 87–95.

Garner, W. R. & Lee, W. (1962). An analysis of redundancy in perceptual discrimination.

Perceptual and Motor Skills, 15, 367-388.

Gersch, T. M., Kowler, E., & Dosher, B. (2004). Dynamic allocation of visual attention

during the execution of sequences of saccades. Vision Research, 44, 1469–1483.

Goldstone, R. L. (1994). The role of similarity in categorization: providing a

groundwork. Cognition, 52, 125-157.

Goldstone, R. L. (1998). Perceptual learning. Annual Review of Psychology, 49, 585-612.

Goldstone, R. L. & Barsalou, L.W. (1998). Reuniting perception and conception.

Cognition, 65, 231-262.

Graham, N. V. S. (1989). Visual Pattern Analyzers. New York: Oxford University Press.

Grant, D. A. & Berg, E. A. (1948). A behavioural analysis of degree of reinforcement

and ease of shifting to new responses in a Weigl-type card sorting problem. Journal of

Experimental Psychology, 38, 404–411.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New

York: Wiley.

Hammer, R., Diesendruck, G., Weinshall, D. & Hochstein, S. (2009). The development

of category learning strategies: what makes the difference? Cognition, 112, 105-119.

Hammer, R., Hertz, T., Hochstein, S. & Weinshall, D. (2005). Category learning from

equivalence constraints. Cognitive Sciences Suppl. 27, 893-8.

Hammer, R., Hertz, T., Hochstein, S. & Weinshall, D. (2007). Classification with

positive and negative equivalence constraints: Theory, computation and human

experiments. In F. Mele, G. Ramella, S. Santillo, & F. Ventriglia, (Eds.), Brain, Vision,

and Artificial Intelligence (pp.264-276). Berlin Heidelberg: Springer-Verlag Press.

Hammer, R., Hertz, T., Hochstein, S. & Weinshall, D. (2008). Category learning from

equivalence constraints. Cognitive Processing. In press.

Heaton, R. K., Chelune, G. J., Talley, J. L., Kay, G. C. & Curtiss, G. (1993). Wisconsin

Card Sorting Test manual revised and expanded. Odessa, FL: Psychological Assessment

Resources.

Henderson, J. M. & Hollingworth, A. (1999). High-level scene perception. Annual

Review of Psychology, 50, 243-271.

Henderson, J. M., Pollatsek, A. & Rayner, K. (1989). Covert visual attention and

extrafoveal information use during object identification. Perception & Psychophysics, 45,

196-208.

Hochberg, J. (1970). Attention, organization, and consciousness. In D. I. Mostofsky

(Ed.), Attention: Contemporary theory and analysis (pp. 99-124). New York: Appleton-

Century-Crofts.

Hochstein, S. & Ahissar, M. (2002). View from the top: hierarchies and reverse

hierarchies in the visual system. Neuron, 36, 791-804.

Hollingworth, A., Williams, C. C. & Henderson, J. M. (2001). To see and remember:

Visually specific information is retained in memory from previously attended objects in

natural scenes. Psychonomic Bulletin & Review, 8, 761-768.

Hoffman, J. E. & Subramanian, B. (1995). The role of visual attention in saccadic eye

movements. Perception & Psychophysics, 57, 787-795.

Holyoak, K. J. (1990). Problem Solving. In D. N. Osherson & E. E. Smith (Eds.),

Thinking - An invitation to cognitive science, Vol. 3 (pp. 117-146). Cambridge, MA: MIT

Press.

Horowitz, T. S. & Wolfe, J. M. (1998). Visual search has no memory. Nature, 394, 575-

577.

Irwin, D. E. (1991). Information integration across saccadic eye movements. Cognitive

Psychology, 23, 420-456.

Irwin, D. E. (1992). Visual memory within and across fixations. In K. Rayner (Ed.), Eye

movements and visual cognition: Scene perception and reading (pp. 146-165). New

York: Springer-Verlag.

Irwin, D. E. (1996). Integrating information across saccadic eye movements. Current

Directions in Psychological Science, 5, 94-100.

Irwin, D. E. & Andrews, R. (1996). Integration and accumulation of information across

saccadic eye movements. In T. Inui & J. L. McClelland (Eds.), Attention and

Performance XVI: Information integration in perception and communication (pp. 125-

155). Cambridge, MA: MIT Press, Bradford Books.

Irwin, D. E., Yantis, S. & Jonides, J. (1983). Evidence against visual integration across

saccadic eye movements. Perception & Psychophysics, 34, 49-57.

Jackson, F. (1986). What Mary Didn't Know. Journal of Philosophy, 83, 291-295.

Jacob, M. & Hochstein, S. (2008). Set Recognition as a window to perceptual and

cognitive processes. Perception & Psychophysics, 70, 1165-1184.

Jacob, M. & Hochstein, S. (2009). Comparing eye movements to detected vs. undetected

target stimuli in an Identity Search Task. Journal of Vision, 9(5):20, 1-16.

Jenkins, R., Beaver, J. D. & Calder, A. J. (2006). I thought you were looking at me:

direction-specific aftereffects in gaze perception. Psychological Science, 17, 506-513.

Johnson, W. A. & Mehl, R. F. (1939). Reaction kinetics in processes of nucleation and

growth. Transactions of the American Institute of Mining and Metallurgical Engineers,

135, 416-442.

Jonides, J. (1981). Voluntary versus automatic control over the mind’s eye movement. In

J. B. Long & A. D. Baddeley (Eds.), Attention performance (vol. IX, pp. 187–203).

Hillsdale, NJ: Erlbaum.

Jung-Beeman, M., Bowden, E. M., Haberman, J., Frymiare, J. L., Arambel-Liu, S.,

Greenblatt, R., Reber, P. J. & Kounios, J. (2004). Neural activity when people solve

verbal problems with insight. PLoS Biology, 2, 500-510.

Kanwisher, N. (1987). Repetition blindness: Type recognition without token

individuation. Cognition, 27, 117–143.

Kanwisher, N. (1991). Repetition blindness and illusory conjunctions: Errors in binding

visual types with visual tokens. Journal of Experimental Psychology: Human Perception

and Performance, 17, 404–421.

Kleinke, C. (1986). Gaze and eye-contact: a research review. Psychological Bulletin, 100,

78–100.

Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt Brace.

Kohler, W. (1929). Gestalt Psychology. New York: Liveright.

Kowler, E., Anderson, E., Dosher, B. & Blaser, E. (1995). The role of attention in the

programming of saccades. Vision Research, 35, 1897-1916.

Landy, D. & Goldstone, R. L. (2007). How abstract is symbolic thought? Journal of

Experimental Psychology: Learning, Memory & Cognition, 33, 720-733.

Liversedge, S. P. & Findlay, J. M. (2000). Saccadic eye movements and cognition.

Trends in Cognitive Sciences, 4, 6-14.

Loftus, G. R. & Mackworth, N. H. (1978). Cognitive determinants of fixation location

during picture viewing. Journal of Experimental Psychology: Human Perception and

Performance, 4, 565–572.

Mackworth, N. H. & Morandi, A. J. (1967). The gaze selects informative details within

pictures. Perception & Psychophysics, 2, 547–552.

Maier, N. R. F. (1931). Reasoning in humans. II. The solution of a problem and its

appearance in consciousness. Journal of Comparative Psychology, 12, 181–194.

Martinez-Conde, S. (2006). Fixational eye movements in normal and pathological vision.

Progress in Brain Research, 154, 151-176.

Martinez-Conde, S., Macknik, S. L. & Hubel, D. H. (2004). The role of fixational eye

movements in visual perception. Nature Reviews Neuroscience, 5, 229-240.

McCarley, J. S., Wang, R. F., Kramer, A. F., Irwin, D. E., & Peterson, M. S. (2003). How

much memory does oculomotor search have? Psychological Science, 14, 422-426.

McConkie, G. W. & Zola, D. (1979). Is visual information integrated across successive

fixations in reading? Perception & Psychophysics, 25, 221-224.

Medin, D. L. (1973). Measuring and training dimensional preferences. Child

Development, 44, 359-362.

Miller, J. (1982). Divided attention: evidence for coactivation with redundant signals.

Cognitive Psychology, 14, 247–279.

Miller, J. (1986). Time course of coactivation in bimodal divided attention. Perception &

Psychophysics, 40, 331–343.

Mitroff, S. R., Simons, D. J. & Franconeri, S. L. (2002). The siren song of implicit

change detection. Journal of Experimental Psychology: Human Perception and

Performance, 28, 798–815.

Monnier, P. (2006). Detection of multidimensional targets in visual search. Vision

Research, 46, 4083-4090.

Motter, B. C. & Belky, E. J. (1998). The guidance of eye movements during active visual

search. Vision Research, 38, 1805-1815.

Müller, H. J. & Rabbitt, P. M. (1989). Reflexive and voluntary orienting of attention:

time course of activation and resistance to interruption. Journal of Experimental

Psychology: Human Perception and Performance, 15, 315–330.

Neisser, U. (1976). Cognition and reality. San Francisco: Freeman.

Neumann, O. (1984). Automatic processing: A review of recent findings and a plea for an

old theory. In W. Prinz & A. F. Sanders (Eds.) Cognition and Motor Processes, Springer-

Verlag, Berlin.

Nodine, C. F., Carmody, D. P., & Kundel, H. L. (1978). Searching for Nina. In J. W.

Senders, D. F. Fisher, & R. A. Monty (Eds.), Eye movements and the higher

psychological functions (pp. 241-257). Hillsdale, NJ: Erlbaum.

Oliva, A. & Torralba, A. (2006). Building the gist of a scene: the role of global image

features in recognition. Progress in Brain Research, 155, 23-36.

O’Regan, J. K. (1992). Solving the “real” mysteries of visual perception: The world as an

outside memory. Canadian Journal of Psychology, 46, 461-488.

O’Regan, J. K. & Levy-Schoen, A. (1983). Integrating visual information from

successive fixations: Does trans-saccadic fusion exist? Vision Research, 23, 765-768.

O’Regan, J. K., Rensink, R. A. & Clark, J. J. (1999). Change blindness as a result of

“mudsplashes.” Nature, 398, 34.

Over, E. A. B., Hooge, I. T. C., Vlaskamp, B. N. S., & Erkelens, C. J. (2007). Coarse-to-

fine eye movement strategy in visual search. Vision Research, 47, 2272-2280.

Perrett, D. I., & Emery, N. J. (1994). Understanding the intention of others from visual

signals – neurophysiological evidence. Cahiers de Psychologie Cognitive, 13, 683–694.

Phillips, W. A. (1974). On the distinction between sensory storage and short-term visual

memory. Perception & Psychophysics, 16, 283-290.

Phillips, W. A. & Christie, D. F. M. (1977a). Components of visual memory. Quarterly

Journal of Experimental Psychology, 29, 117-133.

Phillips, W. A. & Christie, D. F. M. (1977b). Interference with visualization. Quarterly

Journal of Experimental Psychology, 29, 637-650.

Pomerantz, J. R. (2002). Perception: Overview. Encyclopedia of Cognitive Science.

Hampshire, England: MacMillan/Nature Publishing Group.

Porta, J. B. (1593). De refractione optices parte: Libri novem, Carlinum & Pacem,

Naples.

Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental

Psychology, 32, 3–25.

Potter, M. C., Chun, M. M., Banks, B. S. & Muckenhoupt, M. (1998). Two attentional

deficits in serial target search: the visual attentional blink and an amodal task-switch

deficit. Journal of Experimental Psychology: Learning, Memory and Cognition, 24, 979-

992.

Raab, D. H. (1962). Statistical facilitation of simple reaction times. Transactions of the

New York Academy of Sciences, 24, 574-590.

Raymond, J. E., Shapiro, K. L. & Arnell, K. M. (1992). Temporary suppression of visual

processing in an RSVP task: an attentional blink? Journal of Experimental Psychology:

Human Perception and Performance, 18, 849-860.

Rayner, K. (1998). Eye movements in reading and information processing: 20 years of

research. Psychological Bulletin, 124, 372-422.

Rayner, K., McConkie, G. W. & Ehrlich, S. (1978). Eye movements and integrating

information across fixations. Journal of Experimental Psychology: Human Perception &

Performance, 4, 529-544.

Regehr, G. & Brooks, L. R. (1993). Perceptual manifestations of an analytic structure: the

priority of holistic individuation. Journal of Experimental Psychology: General, 122, 92-

114.

Rensink, R. A. (2000a). The dynamic representation of scenes. Visual Cognition: Special

Issue on Change Detection & Visual Memory, 7, 17-42.

Rensink, R. A. (2000b). Seeing, sensing, and scrutinizing. Vision Research, 40, 1469–

1487.

Rensink, R. A. (2004). Visual sensing without seeing. Psychological Science, 15, 27–32.

Rensink, R. A., O'Regan, J. K. & Clark, J. J. (1995). Image flicker is as good as saccades

in making large scene changes invisible. Perception (Suppl), 24, 26-28.

Rensink, R. A., O'Regan, J. K. & Clark, J. J. (1997). To see or not to see: the need for

attention to perceive changes in scenes. Psychological Science, 8, 368–373.

Richardson, D. C., & Spivey, M. J. (2000). Representation, space and Hollywood

Squares: Looking at things that aren't there anymore. Cognition, 76, 269–295.

Riggs, L. A. (1965). Visual acuity. In C. H. Graham (Ed.), Vision and Visual Perception

(pp. 321–349). New York: Wiley.

Ringach, D. L., Hawken, M. J., & Shapley, R. (1996). Binocular eye movements caused

by the perception of three-dimensional structure from motion. Vision Research, 36, 1479-

1492.

Rosch, E. & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure

of categories. Cognitive Psychology, 7, 573-605.

Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M. & Boyes-Braem, P. (1976). Basic

objects in natural categories. Cognitive Psychology, 8, 382-439.

Rubin, N., Nakayama, K. & Shapley, R. (1997). Abrupt learning and retinal size

specificity in illusory-contour perception. Current Biology, 7, 461-467.

Rubin, N., Nakayama, K. & Shapley, R. (2002). The role of insight in perceptual

learning: evidence from illusory contour perception. In M. Fahle & T. Poggio (Eds.),

Perceptual learning (pp. 235-251). Cambridge, MA: MIT Press.

Rutishauser, U. & Koch, C. (2007). Probabilistic modeling of eye movement data during conjunction search via feature-based attention. Journal of Vision, 7(6):5, 1-20.

Schall, J. D. (1991). Neuronal activity related to visually guided saccades in the frontal

eye fields of rhesus monkeys: comparison with supplementary eye fields. Journal of

Neurophysiology, 66, 559-579.

Schneider, W. & Shiffrin, R. M. (1977). Controlled and automatic human information

processing: I. Detection, search, and attention. Psychological Review, 84, 1-66.

Schyns, P. G., Bonnar, L. & Gosselin, F. (2002). Show me the features! Understanding

recognition from the use of visual information. Psychological Science, 13, 402-409.

Schyns, P. G. & Oliva, A. (1994). From blobs to boundary edges: evidence for time and

spatial scale dependent scene recognition. Psychological Science, 5, 195-200.

Shapiro, K. L., Raymond, J. E. & Arnell, K. M. (1994). Attention to visual pattern

information produces the attentional blink in rapid serial visual presentation. Journal of

Experimental Psychology: Human Perception and Performance, 20, 357-371.

Sheinberg, D. L. & Logothetis, N. K. (2001). Noticing familiar objects in real world

scenes: The role of temporal cortical neurons in natural vision. The Journal of

Neuroscience, 21, 1340-1350.

Shepherd, M., Findlay, J. M. & Hockey, R. J. (1986). The relationship between eye

movements and spatial attention. Quarterly Journal of Experimental Psychology, 38A,

475-491.

Shiffrin, R. M. & Schneider, W. (1977). Controlled and automatic human information

processing: II. Perceptual learning, automatic attending and a general theory.

Psychological Review, 84, 127-190.

Shneor, E. & Hochstein, S. (2006). Eye dominance effects in feature search. Vision

Research, 46, 4258-4269.

Simons, D. & Chabris, C. (1999). Gorillas in our midst: sustained inattentional blindness

for dynamic events. Perception, 28, 1059-1074.

Simons, D. J. & Levin, D. T. (1997). Change blindness. Trends in Cognitive Sciences, 1(7), 261–267.

Simons, D. J. & Levin, D. T. (1998). Failure to detect changes to people in real-world

interaction. Psychonomic Bulletin & Review, 5, 644-649.

Simons, D. J., & Rensink, R. A. (2005). Change blindness: Past, present, and future.

Trends in Cognitive Sciences, 9, 16–20.

Smith, M. L., Gosselin, F., & Schyns, P. G. (2006). Perceptual moments of conscious visual experience inferred from oscillatory brain activity. Proceedings of the National Academy of Sciences of the USA, 103, 5626-5631.

Smith, R. W. & Kounios, J. (1996). Sudden insight: All-or-none processing revealed by

speed-accuracy decomposition. Journal of Experimental Psychology: Learning, Memory,

and Cognition, 22, 1443-1462.

Sperling, G. (1960). The information available in brief visual presentations.

Psychological Monographs, 74, 1–29.

Sternberg, R. J. & Davidson, J. E. (1995). The Nature of Insight. Cambridge, MA:

Bradford Books/MIT Press.

Stone, L. S., Miles, F. A., & Banks, M. S. (2003). Linking eye movements and

perception. Journal of Vision, 3(11):i, i-iii.

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of

Experimental Psychology, 18, 643-662.

Townsend, J. T. & Ashby, F. G. (1983). Stochastic modeling of elementary psychological processes. Cambridge: Cambridge University Press.

Treisman, A. (2006). How the deployment of attention determines what we see. Visual Cognition, 14, 411-443.

Treisman, A. M. & Gelade, G. (1980). A feature-integration theory of attention.

Cognitive Psychology, 12, 97-136.

Treisman, A., Vieira, A. & Hayes, A. (1992). Automaticity and preattentive processing.

American Journal of Psychology, 105, 341-362.

Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.

Ullman, S. (1984). Visual Routines. Cognition, 18, 97-159.

Vergilino, D. & Beauvillain, C. (2001). Reference frames in reading: evidence from

visually and memory-guided saccades. Vision Research, 41, 3547–3557.

Vergilino-Perez, D. & Findlay, J. M. (2006). Between-object and within-object saccade

programming in a visual search task. Vision Research, 46, 2204–2216.

Wolfe, J. M. (1999). Inattentional amnesia. In V. Coltheart (Ed.), Fleeting memories (pp.

71-94). Cambridge: MIT Press.

Yarbus, A. L. (1967). Eye Movements and Vision. New York: Plenum.

Zhaoping, L. (2008). After-search—visual search by gaze shifts after input image

vanishes. Journal of Vision, 8(14):26, 1-11.

APPENDIX I

Appendix to Chapter II

Predictive value of number of target fixations and their sequential distance

To further investigate whether the number of target fixations and their sequential distance indeed play a crucial role in target recognition, I analyzed their predictive value. If the number of target fixations is influential, it should signal approaching detection; the same holds for the sequential distance (below, “interval patterns”).

Figure 1 shows histograms of the time to detection for each number of fixations on the detected cards. The histograms shift to the left and narrow (i.e., the variance decreases) at 2-4 fixations, suggesting that after a few fixations detection is imminent. Thus there is indeed good predictive value, although having a few fixations is a necessary but not a sufficient condition, since a similar number of fixations is sometimes found on the cards of the undetected pair.

Figure 1. Time until detection, for each number of target fixations. Top-right insets give the number of trials with at least that number of fixations on the detected pair, the same count for the undetected pair, and the latter as a percentage of the former.
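The grouping behind such histograms can be sketched in Python. This is a minimal illustration, not the analysis code used in this work. The function name `summarize_time_to_detection`, the input format (pairs of fixation count and seconds until detection), and the toy numbers are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pvariance

def summarize_time_to_detection(samples):
    """Group times until detection by number of target fixations.

    samples: iterable of (n_target_fixations, seconds_until_detection).
    Returns {n_target_fixations: (mean_time, population_variance)}.
    The input format is hypothetical, chosen only for this sketch.
    """
    groups = defaultdict(list)
    for n_fixations, seconds in samples:
        groups[n_fixations].append(seconds)
    return {n: (mean(ts), pvariance(ts)) for n, ts in sorted(groups.items())}

# Toy data (invented): times shrink and tighten as fixations accumulate.
toy_samples = [(1, 10.0), (1, 6.0), (3, 2.0), (3, 1.0)]
toy_summary = summarize_time_to_detection(toy_samples)
```

A leftward shift of the histogram corresponds to a smaller mean, and a narrowing corresponds to a smaller variance, so both trends can be read from the two values returned per group.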

A similar analysis was conducted for the interval patterns, that is, the pattern of sequential distances between the fixations on the target cards. The distribution of the number of fixations until detection, counted from the first appearance of a sequential distance pattern, is shown in Figure 2 (left and right panels). For each sequential distance between fixating the two members of a pair, Figure 3 plots the mean number of fixations from the first viewing of that pattern until detection. This is shown for the detected pair (red) and for the undetected pair (blue; error bars indicate SE; numbers are the number of trials in which such a pattern occurred). Detection does not always follow immediately, but can occur many fixations later.
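The counting described above can be sketched in Python. This is an illustration under stated assumptions, not the analysis code used in this work. The function name `fixations_until_detection` and the list-of-card-ids input are hypothetical. Sequential distance is assumed here to be the number of fixation steps separating a fixation on one pair member from the most recent fixation on the other member.

```python
def fixations_until_detection(fixated_cards, pair, detection_index):
    """For each sequential distance, count fixations from its first
    appearance until detection.

    fixated_cards: card ids, one per fixation, in viewing order.
    pair: the two card ids forming the pair of interest.
    detection_index: index of the fixation at which detection occurred.
    Assumed definition: distance = index of the current fixation on one
    member minus index of the latest fixation on the other member.
    """
    card_a, card_b = pair
    last_index = {}   # latest fixation index on each pair member
    first_seen = {}   # sequential distance -> index of first appearance
    for i, card in enumerate(fixated_cards[:detection_index + 1]):
        if card not in (card_a, card_b):
            continue
        other = card_b if card == card_a else card_a
        if other in last_index:
            distance = i - last_index[other]
            first_seen.setdefault(distance, i)
        last_index[card] = i
    return {d: detection_index - i for d, i in first_seen.items()}

# Toy scan path (invented): "A" and "B" are the pair, others are distractors.
toy_path = ["A", "x", "y", "B", "A", "z", "B"]
toy_result = fixations_until_detection(toy_path, ("A", "B"), detection_index=6)
```

In this toy path the distance-3 pattern first appears three fixations before detection, while the distance-1 pattern appears two fixations before it, which illustrates that detection need not follow a pattern immediately.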

[Figure 2 image. Panels: Detected (left), Undetected (right). Horizontal axis: sequential distance (1-7). Vertical axis: number of fixations until detection (2 to >19). Scale: 0-30.]

Figure 2. Distribution of number of fixations until detection, counted from the first appearance of a sequential distance pattern. Left: Detected pair. Right: Undetected pair.

For trials in which only one pair was seen with a given sequential distance pattern, I plot the probability of detection of that pair, rather than of the pair without that pattern (Figure 4). Note that only 75% of the pairs observed in quick succession are detected. Note also that when quick succession occurs close to detection, it may be a result, and not necessarily the cause, of early detection.

I suggest two options regarding the sequential distance pattern. One is that detection actually requires a combination of several sequential distances (rather than a single specific one), which can also include sequential distances larger than 4 (see Discussion, Chapter II, Jacob & Hochstein, 2009). The second is that only smaller distances indeed have an impact on detection, but a few of them are needed in order to cross the threshold for conscious recognition.

[Figure 3 image. Title: Predictions according to intervals between target fixations. Horizontal axis: sequential distance between target cards (0-8). Vertical axis: number of fixations to detection, from first appearance (6-20). Legend: Detected (mean), Undetected (mean). Trial-count labels: 72, 29, 73, 47, 58, 50, 44, 35, 40, 34, 43, 35, 29, 26.]

Figure 3. Mean number of fixations until detection, after viewing a sequential distance pattern for the first time; detected pair in red, undetected pair in blue. Error bars indicate SE; numbers are number of trials in which such a pattern occurred. Detection does not always follow immediately, but can occur many fixations later.

[Figure 4 image. Title: Probability of detection following each pattern. Horizontal axis: sequential distance (1-7). Vertical axis: probability of detection (0.45-0.8). Trial-count labels: 89, 88, 86, 59, 60, 60, 41. Fitted lines: y = −0.1x + 0.84 (R² = 1) and y = −0.003x + 0.57 (R² = 0.08).]

Figure 4. Probability of detection, when only one pair was seen with the specified sequential distance pattern.
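A detection probability per sequential distance, with a straight-line fit and its R², can be computed along the following lines. This is a hedged sketch in Python. The function names `detection_probability` and `fit_line`, the (distance, detected) record format, and the toy data are assumptions for illustration only. Ordinary least squares is assumed as the fitting method, which the text does not state.

```python
from collections import defaultdict

def detection_probability(trials):
    """trials: iterable of (sequential_distance, detected) records,
    where detected is True if the pair with that pattern was detected.
    Returns {distance: fraction of such trials ending in detection}."""
    counts = defaultdict(lambda: [0, 0])  # distance -> [detected, total]
    for distance, detected in trials:
        counts[distance][0] += int(detected)
        counts[distance][1] += 1
    return {d: hit / total for d, (hit, total) in sorted(counts.items())}

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # R^2 is undefined when all y values are equal.
    r_squared = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return slope, intercept, r_squared

# Toy records (invented), four trials at each of three distances.
toy_trials = ([(1, True)] * 3 + [(1, False)]
              + [(2, True)] * 2 + [(2, False)] * 2
              + [(3, True)] + [(3, False)] * 3)
toy_probs = detection_probability(toy_trials)
toy_slope, toy_intercept, toy_r2 = fit_line(list(toy_probs), list(toy_probs.values()))
```

A fit through very few points will report an R² close to 1 regardless of how well the trend generalizes, so an R² value is best read together with the number of points it was fitted to.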

APPENDIX II

Appendix to Chapter III

After-Search

Another interesting issue is the fixation of a location when the stimulus is no longer there. Sometimes, after masking, the next fixation or fixations are directed to the location of the second card of the pair, or they wander between the locations of both cards. These fixations occur in the absence of the target cards. Why should such fixations be made? After all, no additional information can be retrieved from that location.

Zhaoping (2008) found the same phenomenon of visual search with gaze shifts after the input image vanishes, and reported that performance was enhanced by gaze to the target. This raises an immediate question of feasibility, since it is accepted that saccadic planning takes place during the preceding fixation, or at most during the one before that (Findlay, Brown & Gilchrist, 2001; Gersch, Kowler & Dosher, 2004; Kowler, Anderson, Dosher & Blaser, 1995). Therefore, contrary to what is known about visual persistence and iconic memory, which vanish within ~0.2 sec (Sperling, 1960), there must be memory and information to rely on after mask onset. Zhaoping suggested that the continued fixations result from a bottom-up saliency map.

I take this one step further and offer an explanation not of its feasibility but of its utility. This extra fixation (provided it was not initiated before the masking, or after the masking but before the subject realized that a mask had been applied) may imply that fixating a location aids in retrieving or processing the information that previously resided there. This is in line with Zhaoping’s explanation that oculomotor coordinates may be used by the brain as pointers to memories of objects or events. Perhaps allocating attention to a certain area helps to process information that was previously presented there, even if it is no longer present. The same may be concluded from the tendency to look back at the locations of previously viewed visual objects, even after these have vanished (Altmann, 2004; Richardson & Spivey, 2000). This might also support my suggestion that several fixations on the target are essential for recognition.

ABSTRACT (Hebrew version)

Which aspects of visual perception lead to recognition? I examined various elements that influence the process leading to recognition. These include elements of the visual scene (Chapter I, Jacob & Hochstein, 2008) and the influence of fixations, their number and their sequence, on recognition (Chapter II, Jacob & Hochstein, 2009; Chapter III).

In the first experiment I used the game Set® (Set Enterprises Inc.), in which I presented on a computer screen “cards” containing abstract shapes with four visual dimensions: color, shape, number and filling. Each of the four dimensions can take one of three possible values. The task is to find three cards for which, in each dimension separately, the values are either entirely the same or entirely different. One of the main findings in this part of the research is a preference for similarity within the set (over spanning different values), which raises the possibility of a basic similarity-perception mechanism, even though the brain specializes in detecting differences. The preference for similarity may confer an evolutionary advantage precisely because of the diversity of the environment, in which the isolated points of similarity are what is exceptional and different. Another significant finding is a decrease in reaction time as the number of sets in the display increases, in accordance with the “horse race” model (Corballis, 1998; Egeth & Mordkoff, 1991; Garner & Lee, 1962; Graham, 1989; Miller, 1982, 1986; Monnier, 2006; Raab, 1962; Townsend & Ashby, 1983), which indicates independence between searches performed simultaneously. In addition, I found influences of both perceptual and conceptual elements on recognition. These findings strengthen the claim that perceptual and cognitive functions are interrelated (Goldstone & Barsalou, 1998; Landy & Goldstone, 2007).

The next step in my research was to compare detected items with undetected items, in order to learn about the sequence of events leading to recognition. This paradigm served me in the second experiment (Chapter II, Jacob & Hochstein, 2009). For this purpose I designed the “Identity Search” task, a search for completely identical items. The task includes “cards” presented on the computer screen, each containing a square array of intermixed black and white square units. The participants’ task is to find two completely identical cards, that is, an identical pair. By a design tailored specifically to this purpose, each display of cards contained two identical pairs, so that one pair could be detected while the second pair remained undetected.

The central approach taken in the research relied on recording participants’ eye movements during the experiment. Eye movements are known to reflect a person’s cognitive state. Fixations are gaze holds lasting about 250 milliseconds, which place a certain region with a radius of about 2° at the center of the visual field. Information from this region is therefore received at the fovea (the center of the retina), and it is the only region that receives detailed processing. I concentrated on fixations, on which human vision is based, and on the information gathered through them. Eye movements were tracked using an EyeLink I device manufactured by SR Research Ltd.

The finding obtained is that there are more fixations on the detected target than on the undetected one. In addition, a fixation-sequence pattern emerged in which fixations on the detected target are closer together, but this proximity does not appear in the initial stage of the search. Following these findings, the role of fixations became even clearer.

חד מהזוגות ביחס לנקודות הזיהוי כלומר הצבת הפיקסציות המצטברות על כל א, ניתוח הדינאמיקה לאחור

בין המטרה נקודת הסתעפותחשף , תוך מיצוע על פני כל המיצגים, )נקודה אחת ויחידה בתום כל מיצג בניסוי(

תבניות הפיקסציות . אשר בה מסתמנת התחלה של שינוי במאפייני ההתבוננות, שזוהתה לבין זו שלא זוהתה

עד לנקודה בה מספר הפיקסציות על הזוג המזוהה החל , דומות מאודמזוהים היו -על הזוגות המזוהים והבלתי

השיפוע של הזוג המזוהה גבר על השיפוע של –כתוצאה מאירוע מקדים , להראות שבירה חדה כלפי מעלה

כי אם גם של קרבת , מכך אני גוזרת השפעה על הזיהוי לא רק של מספר הפיקסציות. מזוהה-הזוג הלא

, במספר המצטבר של הפיקסציותשינוימציעה כי הגורם המשמעותי יכול להיות האני . הפיקסציות בזמן

הנקודה בה השיפוע עובר ערך . ולא רק מספר הפיקסציות עצמו, כלומר שיפוע והסמיכות הפיקסציות בזמן

נקודת . כלשהו הנקבע מראש יכולה להיחשב כנקודת ההסתעפות שבה יש שינוי מצב בתהליך החיפוש

". זיהוי מרומז-קדם"לשלב שני של " חיפוש באפלה"כולה לשקף מעבר בין שלב ראשוני של הסתעפות זו י

דבר המוביל בסופו של , בעקבותיו מתבצעות עוד פיקסציות, זיהוי-מכך עולה האפשרות שיש שלב של טרום

.דבר לזיהוי
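The idea of a slope crossing a predetermined value can be illustrated with a short Python sketch. It is not the procedure used in this work: the function name `bifurcation_index`, the window length, the threshold and the toy series are hypothetical choices made for the example.

```python
def bifurcation_index(cumulative, threshold, window=3):
    """Return the first index at which the local slope of a cumulative
    fixation count exceeds `threshold`, or None if it never does.

    cumulative: cumulative number of fixations on a pair, one value per
    fixation in the trial.
    The slope is estimated over `window` steps (an assumed choice).
    """
    for i in range(len(cumulative) - window):
        slope = (cumulative[i + window] - cumulative[i]) / window
        if slope > threshold:
            return i
    return None

# Toy series (invented): a slow start followed by a sharp upward break.
toy_cumulative = [0, 0, 1, 1, 1, 2, 3, 4, 5]
toy_break = bifurcation_index(toy_cumulative, threshold=0.5, window=3)
```

A longer window gives a smoother slope estimate but locates the break less precisely, so the window and the threshold would need to be fixed before looking at the data.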

I further argue that several fixations directed specifically at the target are required for recognition. To test this hypothesis, I designed a new, dedicated paradigm: participants were exposed to a variable, controlled number of fixations on the target, after which the display was removed and a response was requested from the participant (Chapter III). This was done by examining the fixation data in real time. I again used the identical-item search task, except that this time half of the displays contained exactly one target (a single identical pair) and half contained no target at all.

I found that an increase in the number of fixations leads to better recognition, expressed as an increase in performance and accuracy, that is, in the rate of correct “hits” (Hits) and in detectability (d’), as well as a decrease in reaction time.
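For a design in which half of the displays contain a target and half do not, detectability is commonly computed as d' = z(hit rate) − z(false-alarm rate), where z is the inverse of the standard normal cumulative distribution. The Python sketch below shows this standard formula. The function name `d_prime` and the toy counts are hypothetical. The log-linear correction (adding 0.5 to each count and 1 to each total) is an assumed convention for avoiding rates of exactly 0 or 1, not necessarily the one used in this work.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity d' = z(H) - z(FA).

    hits, misses: responses on target-present displays.
    false_alarms, correct_rejections: responses on target-absent displays.
    A log-linear correction is applied (an assumption of this sketch) so
    that the inverse normal CDF is never evaluated at 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(fa_rate)

# Toy counts (invented): 50 target-present and 50 target-absent displays.
toy_d = d_prime(hits=40, misses=10, false_alarms=10, correct_rejections=40)
```

Unlike the hit rate alone, d' also takes false alarms into account, so it does not rise merely because a participant says "present" more often.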

Do fixations lead to recognition, or does unconscious pre-detection guide the eyes to further fixations? An analysis of the hit rate as a function of the number of fixations, together with the participants’ level of confidence in their response, raises the possibility that a few fixations lead to a state of implicit pre-detection, which in turn leads to more fixations on the target and finally to full, explicit recognition.

On the basis of my results, I conclude that information is gathered and integrated across fixations directed specifically at the target, that is, local fixations, as demonstrated by the number of fixations required for detecting the target. The information is gathered gradually over several fixations, not necessarily consecutive, on the relevant region of the visual field. On the other hand, the finding that the search sequence and the sequential distance between fixations also matter points to memory decay. I argue that a single fixation is definitely not sufficient for gathering full information about a given region.

I offer an explanation, in effect an evolutionary advantage, for the development of the visual system in this way. I claim that it is more economical for the visual system to scan the visual scene and gather partial information about each sampled region than to allocate all resources to a single region in order to gather full information about it. Only after the initial scan is attention directed to the more significant foci in order to accumulate additional information about them. The advantage of a system that developed in this manner lies in creating partial representations in parallel for different regions of the scene, as opposed to establishing full information about one single region at the expense of other regions.

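Following these findings and ideas, I propose a mathematical model for available spatial information (Chapter IV). The model is built of two opposing components: on the one hand, growth in the amount of information following an increase in the number of fixations on the region (the additive component), and on the other hand, decay of information, which depends on the sequential distance from the time of the last fixation on the region (the memory-decay component).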

I conclude by inferring that recognition is gradual, as a result of the gradual nature of the information gathered, fixation after fixation, on a given region of the visual scene. Several fixations on a given region are required in order to process the visual information it contains and to reach full recognition and awareness.