

Cross-modal perceptual organization

Professor Charles Spence, Oxford University

To appear in: Oxford Handbook of Perceptual Organization, Oxford University Press, edited by Johan Wagemans.

1. Introduction

The last quarter of a century or so has seen a dramatic resurgence of research interest in the question of how sensory inputs from different modalities are combined, merged, and/or integrated, and, more generally, come to affect one another in perception (see Bremner et al. 2012; Stein 2012; Stein et al. 2010, for reviews). Until very recently, however, the majority of this research, inspired as it often has been by neurophysiological studies of orienting responses in model brain systems, such as the superior colliculus, has tended to use simple stimuli (e.g., a single beep, flash, and/or tactile stimulus) on any given trial (see Stein & Meredith 1993, for a review). As a result, to date, problems of perceptual organization have generally taken something of a back seat in the world of multisensory perception research.

That said, there has recently been a surge of scientific interest in trying to understand how the perceptual system (normally in humans) deals with, or organizes, more complex streams/combinations of multisensory inputs into meaningful perceptual units, and how ambiguous (often bistable) inputs are interpreted over time. In trying to answer such questions, it is natural that researchers look for inspiration in the large body of empirical research that has been published over the last century on the Gestalt grouping principles identified within the visual (Beck 1982; Kimchi et al. 2003; Kubovy & Pomerantz 1981; Wagemans et al. 2012; Wertheimer 1923/1938; see also the many other chapters in this volume), auditory (Bregman 1990; Wertheimer 1923/1938; see also Denham & Winkler this volume), and, occasionally, tactile systems (Gallace & Spence 2011; see also Kappers & Tiest this volume). One might reasonably imagine that those classic grouping principles, such as common fate, binding by proximity, and binding by similarity, that have been shown to influence perceptual organization when multiple stimuli are presented within the same sensory modality should also operate when combinations of stimuli originating from different sensory modalities are presented instead.

In this review, the evidence concerning the existence of general principles of cross-modal perceptual organization and multisensory Gestalt grouping is summarized. The focus here is primarily on cross-modal perceptual organization and multisensory Gestalten for the spatial (some would say ‘higher’) senses of audition, vision, and touch. Given the space constraints, this review will focus primarily on the results of research that has been published more recently.1 The main body of the text is arranged around a review of the evidence that is relevant to answering four key questions that run through the literature on cross-modal perceptual organization.

1 Researchers interested in more of a historical perspective should see Spence et al. (2007) and/or Spence and Chen (2012).

Page 2: Cross-modal perceptual organization Professor Charles Spence, …gestaltrevision.be/pdfs/oxford/Spence-Cross-modal... · 2013. 3. 15. · To date, the only studies that have attempted

2

2. Four key questions in the study of cross-modal perceptual organization

Q1: Does the nature of the perceptual organization (or interpretation) of stimuli taking place in one sensory modality influence the perceptual organization (or interpretation) of stimuli presented in another modality?

Researchers have typically addressed this first question by investigating whether there is any correlation between the perceptual organization/interpretation of an ambiguous (typically bistable) stimulus (or stream of stimuli) in one modality and the perceptual organization/interpretation of an ambiguous (typically bistable) stimulus (or stream of stimuli) presented simultaneously in a different sensory modality (e.g., Hupé et al. 2008; O’Leary & Rhodes 1984).

In what is perhaps the most-often cited early paper on this topic, O’Leary and Rhodes (1984) presented participants with a six-element bistable auditory display and/or with a six-element bistable visual display. The auditory display consisted of a sequence of tones alternating in pitch, while the visual display consisted of an alternating sequence of dots presented from one of two sets of elevations on a monitor (see Figure 1). The onsets of the auditory and visual stimuli were synchronized. The spacing (in pitch and elevation) and the interstimulus interval between the successive stimuli in these displays were manipulated until participants’ perception of whether there appeared to be a single stream of stimuli, alternating in either pitch (audition) or elevation (vision), versus two distinct streams (presented at different pitches and/or elevations) itself alternated on a regular basis over time. The specific question that O’Leary and Rhodes wanted to address was whether their participants’ perception of one vs. two streams in a given sensory modality (say, audition) would influence their judgments regarding the number of streams perceived in the other modality (e.g., vision). Confirming their predictions, the results did indeed demonstrate that the number of streams that participants reported in one modality was sometimes influenced by the number of streams that they were currently experiencing (or at least reported experiencing) in the other modality.

Figure 1. (A, B) Schematic illustration of the sequence of auditory and visual stimuli presented by O’Leary and Rhodes (1984) in their study of cross-modal influences on perceptual organization. T1–T6 indicate the temporal order (from first to last) in which the six stimuli were presented in each sensory modality. Half of the stimuli were from an upper group (frequency in sound, spatial location in vision), the rest from a lower group. The stimuli were presented in sequence, alternating between events from the upper and lower groups, either delivered individually (unimodal condition) or else together in synchrony (in the cross-modal condition). (C, D) Perceptual correlates associated with different rates of stimulus presentation. In either sensory modality, at slow rates of stimulus presentation (C), a single stream (auditory or visual) was perceived, as shown by the continuous line connecting the points. At faster rates of stimulus presentation (D), however, two separate streams were perceived concurrently, one in the upper range (frequency or spatial position, for sound or vision, respectively) and the other in the lower range. In the cross-modal condition, at intermediate rates of stimulus presentation, participants’ reports of whether they perceived one stream versus two in a given sensory modality were influenced by their perception of there being one or two streams in the other modality. O’Leary and Rhodes took these results to show that the nature of the perceptual organization in one sensory modality can influence how the perceptual scene may be organized (or segregated) in another modality. (Reprinted with permission from Spence & Chen 2012, Figure 1.)
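To make the structure of such displays concrete, the following minimal sketch (in Python) generates the kind of six-element, synchronized audiovisual sequence illustrated in Figure 1. The specific frequencies, elevations, and timing values here are illustrative assumptions rather than the parameters that O’Leary and Rhodes actually used.

```python
# Sketch of a six-element bistable audiovisual streaming display,
# loosely modelled on O'Leary and Rhodes (1984). All parameter values
# are illustrative assumptions, not those of the original study.

def make_streams(soa_ms, n_events=6,
                 high_hz=800, low_hz=400,    # hypothetical tone frequencies
                 high_px=300, low_px=100):   # hypothetical dot elevations
    """Return synchronized auditory and visual event lists.

    Events alternate between an 'upper' and a 'lower' group, and the
    auditory and visual onsets coincide, as in the cross-modal
    condition of the original study.
    """
    auditory, visual = [], []
    for i in range(n_events):
        onset = i * soa_ms
        upper = (i % 2 == 0)                 # alternate upper/lower groups
        auditory.append((onset, high_hz if upper else low_hz))
        visual.append((onset, high_px if upper else low_px))
    return auditory, visual

# At slow presentation rates a single alternating stream tends to be
# perceived; at fast rates the upper and lower elements segregate.
slow_a, slow_v = make_streams(soa_ms=400)    # likely heard/seen as one stream
fast_a, fast_v = make_streams(soa_ms=100)    # likely heard/seen as two streams
print(slow_a)
```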

O’Leary and Rhodes (1984) interpreted their findings as providing some of the first empirical evidence to support the claim that the perceptual organization in one sensory modality affects the perceptual organization of any (plausibly-related) stimuli that may happen to be presented in another modality.2 However, most researchers writing since seem convinced that an alternative non-perceptual explanation (in terms of response bias) might explain the findings just as well (e.g., Cook & Van Valkenburg 2009; Kubovy & Yu 2012; Spence & Chen 2012; Spence et al. 2007; Vroomen & de Gelder 2000).

What is more, in one of the only other studies to have directly addressed this first question, a negative result was obtained. The participants in a study by Hupé et al. (2008) were presented with bistable auditory and visual displays either individually or at the same time. These researchers examined the statistics of the perceptual alternations that took place in each modality stream when presented individually (that is, unimodally) and compared them to the pattern of reversals seen when the stimuli were presented in both modalities simultaneously. The idea was that if the perceptual organization of the stimuli in one sensory modality were to carry over and influence the perceptual organization in the other modality, then the statistics of perceptual reversals should change, and/or be correlated, under conditions of multisensory stimulation. However, Hupé et al. found no such evidence in two experiments.

The visual stimuli in Hupé et al.’s (2008) first experiment consisted of a network of crossing lines (square-wave gratings) viewed through a circular aperture. This display could either be perceived as two gratings moving in opposite directions or as a single plaid moving in an intermediate direction. Meanwhile, pure tones alternating in frequency in the pattern High (pitch)/Low/High-High/Low/High were presented over headphones. The participants either heard two segregated streams (High-High-High, and --Low---Low--) or a single stream with the pitch alternating from item to item. While the statistics of switching between alternative perceptual interpretations were similar for the two modalities, there was absolutely no correlation between the perceptual switches taking place in audition and vision.
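The logic of that null result can be captured in a few lines: treat each modality’s reported percept as a binary time series and ask whether the reversals line up. The sketch below uses simulated data purely for illustration; Hupé et al.’s (2008) actual analyses were more sophisticated.

```python
import numpy as np

# Sketch of the logic of Hupé et al.'s (2008) analysis: compare binary
# percept time series from two modalities and test whether reversals
# are correlated. Data are simulated for illustration only.
rng = np.random.default_rng(0)

def simulate_percepts(n_samples, p_switch=0.02):
    """Binary percept series (1 = 'one stream', 0 = 'two streams'),
    switching with probability p_switch at each sample."""
    switches = rng.random(n_samples) < p_switch
    return np.cumsum(switches) % 2

auditory = simulate_percepts(5000)
visual = simulate_percepts(5000)

# Correlation between the two percept time series; values near zero,
# as produced here by construction, are what Hupé et al. observed.
r = np.corrcoef(auditory, visual)[0, 1]

# Switch statistics (number of reversals) for each modality.
switches_a = int(np.abs(np.diff(auditory)).sum())
switches_v = int(np.abs(np.diff(visual)).sum())
print(f"r = {r:.3f}; switches: auditory {switches_a}, visual {switches_v}")
```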

2 Note that the stimulus displays capitalized on the crossmodal correspondence between pitch and elevation (see Spence 2011, for a review).


This first experiment can, though, be criticized on the grounds that the participants would have had no particular reason to treat the auditory and visual stimuli as belonging to the same object or event (that is, they were completely unrelated). Hence, the fact that Hupé et al. (2008) obtained a null result is perhaps not so surprising. In a second experiment, the auditory and visual stimuli were therefore spatiotemporally correlated: The auditory stimuli were as in Experiment 1, but were now presented in an alternating sequence from one of a pair of loudspeaker cones, one placed on either side of central fixation. The visual stimuli consisted of the illumination of an LED placed in front of either loudspeaker; these could be perceived either as two lights flashing independently, or else could give rise to the perception of horizontal visual apparent motion. Once again, however, there was no evidence of any correlation between the perceptual switches taking place in the two modalities. Therefore, despite the fact that the spatiotemporal presentation of the auditory and visual stimuli was correlated in this study, the participants would presumably still not have had any particularly good reason to bind the contents of their visual and auditory experience.

One other study that is worth mentioning here comes from Sato et al. (2007), who investigated the auditory and visual verbal transformation effects. In the auditory version of this phenomenon (see Warren & Gregory 1958), a participant listens to a speech stimulus, such as the word ‘life’, that is played repeatedly; after a number of repetitions, the percept alternates, and the listener will likely hear it as ‘fly’ instead. As time passes, the percept alternates back and forth. Sato et al. discovered that the same thing happens if one looks at moving lips repeatedly uttering the same syllable (this is known as the visual verbal transformation effect). Sato and his colleagues presented auditory-alone, visual-alone, and audiovisual stimulus combinations (either congruent or incongruent); either /psә/ or /sәp/ were used as the speech stimuli. The participants were instructed to report their initial auditory ‘percept’, and whenever it changed over the course of the 90 seconds of each trial. The results of their first experiment revealed that the incongruent audiovisual condition, in which the visual stimulus alternated between being congruent and incongruent with what was heard, resulted in a higher rate of perceptual alternations than any of the other three conditions. Note here that what is seen and what is heard may be taken by participants to refer to the same phonological entity. In fact, Kubovy and Yu (2012) have recently argued that this (speech) may constitute a unique case when it comes to multisensory multistability.3

To date, the only studies that have attempted to investigate the question of whether the perceptual organization taking place in one modality affects the perceptual organization taking place in another have involved the presentation of audiovisual stimuli (Hupé et al. 2008; O’Leary & Rhodes 1984; Sato et al. 2007).
It is interesting to speculate, then, on whether a similar conclusion would also have been reached on the basis of visuotactile studies.4 There is currently surprisingly little unequivocal support for the view that the perceptual organization (or interpretation) of an ambiguous, or bistable, stimulus (or stimuli) in one sensory modality will necessarily, and automatically, affect the perceptual organization (or interpretation) of a stimulus (or stimuli) that happens to be presented in another modality at around the same time (even when the auditory and visual stimuli can plausibly be related to one another – e.g., as a result of their cross-modal correspondence, see O’Leary & Rhodes 1984, or due to their spatiotemporal patterning, Hupé et al. 2008; see also Kubovy & Yu 2012).

3 One final thing to note here is that it is unclear from Sato et al.’s (2007) study whether their participants ever experienced the audiovisual stimulus stream as presenting one stimulus auditorily and another visually, as sometimes happens in McGurk-type experiments.

4 One way to test this possibility would be to look for correlations in the changing interpretation of bistable spatial displays such as the Ternus display (Harrar & Harris 2007; cf. Shi et al. 2010), or in simultaneously presented visual and tactile apparent motion quartets (Carter et al. 2008). Suggestive evidence from Harrar and Harris, not to mention one’s own intuition, would appear to suggest that if the appropriate stimulus timings could be established, such that synchronous stimulus presentation was maintained while both modality inputs retained their individual bistability, then any switch in the perceptual interpretation of the visual display would likely also trigger a switch in the interpretation of the tactile display (one might certainly frame such a result in terms of visual dominance).

Q2: Does intramodal perceptual grouping modulate cross-modal perceptual grouping?

One of the best-known studies to have addressed the question of whether intramodal perceptual grouping modulates cross-modal interactions was reported by Watanabe and Shimojo (2001). The participants in their studies had to report whether two disks that started each trial moving directly towards each other on a screen looked as though they streamed through each other (the more common percept when the display is viewed in silence) or else bounced off one another. This is known as the stream/bounce illusion (Metzger 1934; Michotte 1946/1963). Previously, it had been demonstrated that if a sound is presented at the moment when the two disks meet, the likelihood of participants reporting bouncing increases (Sekuler et al. 1997). The innovative experimental manipulation in Watanabe and Shimojo’s study was to demonstrate that the magnitude of this cross-modal effect was modulated by the strength of any intramodal grouping taking place within the auditory modality. More specifically, these researchers found that if the sound presented at the moment of ‘impact’ happened to be embedded within a stream of similar, regularly temporally-spaced tones, then participants reported fewer bounce percepts. However, the incidence of bounce percepts increased once again if the other tones in the auditory sequence had a markedly different frequency from the ‘impact’ tone.

Further support for the claim that the cross-modal effect of an auditory stimulus on visual perception can be modulated by the strength of any intramodal auditory perceptual grouping has been provided by a number of other studies, utilizing a variety of experimental paradigms (e.g., Ngo & Spence 2010; Vroomen & de Gelder 2000). Additionally, other researchers have reported that the magnitude of the temporal ventriloquism effect5 is modulated by any perceptual grouping that happens to be taking place in the auditory modality (Keetels et al. 2007; see also Cook & Van Valkenburg 2009).

But what about any cross-modal effects operating in the reverse direction? Does the perceptual grouping taking place within the visual modality also modulate the cross-modal influence of vision on auditory perception? The answer would appear to be in the affirmative. The majority of the work on this particular issue has been conducted using variations of ‘the cross-modal dynamic capture task’. In a typical study, participants try to discriminate the direction in which an auditory apparent motion stream moved (i.e., judging whether a pair of sequentially-presented sounds appeared to move from left-to-right, or vice versa; see Herzog & Ogmen this volume, on the topic of apparent motion). At the same time, the participants are instructed to ignore any cues delivered by the simultaneous presentation of an irrelevant visual (or, on occasion, tactile) apparent motion stream (see Soto-Faraco et al. 2004b, for a review).
The results of numerous studies have now demonstrated that people simply cannot ignore the visual apparent motion (even though it may be entirely task-irrelevant), and will often report that they perceived the sound as moving in the same direction as the visual stream, even if the opposite was, in fact, the case (e.g., Soto-Faraco et al. 2002). Similar cross-modal dynamic capture effects have also been reported in experiments involving the presentation of tactile stimuli, both when tactile apparent motion acts as the target modality and when it acts as the to-be-ignored distractor modality (Lyons et al. 2006; Sanabria et al. 2005b; Soto-Faraco et al. 2004a).
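For readers unfamiliar with how such capture effects are scored, here is a minimal sketch of the typical logic: the capture (or congruency) effect is simply the difference in direction-discrimination accuracy between trials on which the distractor moved in the same versus the opposite direction to the target. The trial data below are invented for illustration only.

```python
# Minimal sketch of scoring the cross-modal dynamic capture effect as
# a congruent-minus-incongruent accuracy difference. The trial list is
# invented for illustration only.

trials = [
    # (distractor_congruent, response_correct)
    (True, True), (True, True), (True, False), (True, True),
    (False, False), (False, True), (False, False), (False, False),
]

def accuracy(trials, congruent):
    """Proportion of correct direction judgments for one congruency."""
    subset = [correct for cong, correct in trials if cong == congruent]
    return sum(subset) / len(subset)

congruent_acc = accuracy(trials, congruent=True)      # 0.75 on these toy data
incongruent_acc = accuracy(trials, congruent=False)   # 0.25 on these toy data
print(f"capture effect = {congruent_acc - incongruent_acc:.2f}")
```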

5 The temporal ventriloquism effect has most frequently been demonstrated between pairs of auditory and visual stimuli. It occurs when the perceived timing of an event in one modality (normally vision) is pulled toward temporal alignment with a slightly asynchronous event presented in another modality (e.g., audition; see Morein-Zamir et al. 2003; Vroomen et al. 2004).


One other area of research that is relevant to the question of cross-modal perceptual organization relates to the local versus global perceptual grouping taking place within a given modality and its effect on perceptual organization within another sensory modality. For instance, Sanabria et al. (2004) demonstrated the dominance of global field effects over local visual apparent motion when the two were pitted directly against each other in the setting of the cross-modal dynamic capture task (see Figure 2). In this particular experiment, the four-lights display (see Figure 2B) induced the impression of two pairs of lights moving in one direction, while the central pair of lights (if considered in isolation) appeared to move in the opposite direction. In other words, if the local motion of the two central lights was from right-to-left, the global motion of the four-light display was from left-to-right instead. However, Sanabria et al.’s results revealed that it was the direction of global visual motion that ‘captured’ the perceived direction of auditory apparent motion (see also Sanabria et al. 2005a).

Figure 2. Schematic illustration of the different trial types presented in Sanabria et al.’s (2004) study of the effect of local versus global visual perceptual grouping on the cross-modal dynamic capture effect. The horizontal arrows indicate the (global) direction of visual apparent motion. The magnitude of the cross-modal dynamic capture effect was significantly greater in the 2-lights displays (A) than in the 4-lights displays (B). More importantly for present purposes though, the results also revealed that the modulatory cross-modal effect of visual apparent motion on the perceived direction of auditory apparent motion was determined by the global direction of visual apparent motion rather than by the local motion of the central pair of lights (which appeared to move in the opposite direction).
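The dissociation between global and local motion signals that this display exploits can be expressed numerically. The toy geometry below is my own construction for illustration (it is not the layout that Sanabria et al. actually used): the configuration’s centroid shifts rightward between frames, while the two central lights, considered in isolation, shift leftward.

```python
import numpy as np

# Light x-positions across the two frames of apparent motion; a toy
# geometry in the spirit of Sanabria et al.'s (2004) 4-lights display.
frame1 = np.array([0.0, 3.0, 4.0, 6.0])
frame2 = np.array([2.0, 2.5, 3.5, 12.0])

def direction(dx):
    return "rightward" if dx > 0 else "leftward"

# Global motion: displacement of the configuration's centroid.
global_dx = frame2.mean() - frame1.mean()

# Local motion: order-preserving correspondence between the two
# central (middle) lights of each frame.
central1 = np.sort(frame1)[1:3]
central2 = np.sort(frame2)[1:3]
local_dx = float(np.mean(central2 - central1))

print(f"global motion: {global_dx:+.2f} ({direction(global_dx)})")  # +1.75 rightward
print(f"local motion:  {local_dx:+.2f} ({direction(local_dx)})")    # -0.50 leftward
```

On an arrangement of this kind, Sanabria et al.’s (2004) finding was that the perceived direction of the concurrent auditory stream followed the global sign, not the local one.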

Elsewhere, Rahne et al. (2008) used an alternating high/low tone sequence, similar to that used by O’Leary and Rhodes (1984), to demonstrate the effect of visual segmentation cues on auditory stream segregation. The participants in their study either saw a circle presented in synchrony with every third tone (thus being paired successively with a high tone, then with a low tone, then with a high tone, etc.) or else they saw a square that appeared in synchrony with just the low-pitched tones. The likelihood that the participants would perceive the auditory sequence as a single stream was significantly higher in the former (circle) condition than in the latter (square) condition (see also Kubovy this volume).

In terms of visuotactile interactions, Yao et al. (2009) investigated whether the presentation of visual information would affect the cutaneous rabbit illusion (Geldard & Sherrick 1972). They placed tactile stimulators at either end of a participant’s arm. LEDs were also placed at the same locations, as well as at the ‘illusory’ locations where the tactile stimuli are generally perceived to have been presented following the activation of the tactors (in this case, at the intervening positions along the arm). Yao et al. reported that the activation of the lights that mimicked the hopping percept strengthened the tactile illusion, while the activation of the lights at the veridical locations of tactile stimulation weakened it. This result shows that the tactile grouping underlying the cutaneous rabbit illusion can be modulated by concurrently presented visual information, even when that information is not relevant to the participant’s task.

At this point, it is worth noting that the majority of the studies reported thus far have involved situations in which the conditions for intramodal perceptual grouping were established prior to the presentation of the critical cross-modal stimuli (e.g., see Ngo & Spence 2010; Vroomen & de Gelder 2000; Watanabe & Shimojo 2001; Yao et al. 2009). However, it turns out that even when the situation is temporally reversed, and the strength of intramodal perceptual grouping is modulated by stimuli that happen to be presented after the critical cross-modal stimuli, the story remains unchanged (e.g., see Sanabria et al. 2005b). Thus, it would appear that intramodal perceptual grouping normally takes precedence over cross-modal perceptual grouping (see also Cook & Van Valkenburg 2009, for a similar conclusion).

In summary, then, a relatively large body of empirical evidence involving a range of different behavioural paradigms has by now convincingly demonstrated that as the strength of intramodal perceptual grouping increases, the magnitude of any cross-modal effects on visual, auditory, or tactile perception is reduced. Thus, the answer to the second of the questions posed in this article would appear to be unequivocally in the affirmative: The strength of intramodal perceptual grouping can indeed modulate the strength/magnitude of cross-modal interactions (at least when the stimuli can be meaningfully related to one another; cf. Cook & Van Valkenburg 2009).

Before moving on, it should be noted that a large body of research shows that the rate of stimulus presentation in one sensory modality can influence the perceived rate of presentation of stimuli delivered in another modality (e.g., Gebhard & Mowbray 1959; Recanzone 2003; Wada et al. 2003; Welch et al. 1986). However, as highlighted by Spence et al. (2007), given the high rates of stimulus presentation used in the majority of studies in this area, it could plausibly be argued that most of the results that have been published to date actually tell us more about cross-modal influences on the perception of a discrete stimulus attribute (e.g., the flicker or flutter rate) than about the cross-modal constraints on perceptual organization. An argument could certainly be made that only when the stimuli are presented at rates slow enough to allow for the individuation of the elements within the relevant stimulus streams, and thus the matching of those elements across sensory modalities, will the results of such research really start to say anything interesting about cross-modal perceptual organization (rather than just being relevant to researchers interested in multisensory integration).
Relevant to this discussion is research by Fujisaki and Nishida (e.g., Fujisaki & Nishida 2010). They conducted a number of studies demonstrating that people can only really pair (or bind) auditory, visual, and/or tactile stimulus streams cross-modally (i.e., in order to make in-phase/out-of-phase judgments) when the stimuli in those streams are presented at rates that do not exceed 4 Hz.6 If we take this as a legitimate argument (and I am the first to flag up that some may find it controversial), then the majority of research on cross-modal influences on rate perception and on flicker/flutter thresholds may, ultimately, turn out not to be relevant to the topic of cross-modal perceptual organization (see also Benjamins et al. 2008).

6 The one modality pairing where this limit did not apply was for crossmodal interactions between auditory and tactile stimuli. There, phase judgments are possible at stimulus presentation rates as high as 12 Hz (Fujisaki & Nishida 2010).
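As a concrete illustration of what such an in-phase/out-of-phase judgment involves, the sketch below constructs two pulse trains at a common rate that are either phase-aligned or in antiphase. The duty cycle, sampling rate, and durations are illustrative assumptions; the empirical point is that observers can only make this discrimination perceptually at rates up to roughly 4 Hz (12 Hz for audiotactile pairings).

```python
import numpy as np

def pulse_train(rate_hz, duration_s, phase=0.0, sr=1000):
    """Binary pulse train sampled at sr Hz (True = stimulus on)."""
    t = np.arange(0.0, duration_s, 1.0 / sr)
    return ((t * rate_hz + phase) % 1.0) < 0.1   # hypothetical 10% duty cycle

rate = 4.0                                        # near the reported binding limit
visual       = pulse_train(rate, 2.0, phase=0.0)
auditory_in  = pulse_train(rate, 2.0, phase=0.0)  # in phase with the visual train
auditory_out = pulse_train(rate, 2.0, phase=0.5)  # in antiphase

# The two cases are trivially separable physically; what is limited is
# observers' ability to judge the phase relation perceptually.
print("in-phase overlap samples: ", int(np.sum(visual & auditory_in)))
print("antiphase overlap samples:", int(np.sum(visual & auditory_out)))
```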

Q3: Do intersensory Gestalten exist?

The first question to address here is: What exactly are intersensory Gestalten? Well, the terminology is certainly muddled/confusing, with different researchers using different terms for what may well turn out to be the same underlying concept. Gilbert (1938, 1941) was perhaps the first to introduce the notion when he wrote: “…we must also reckon with the total field properties. This involves the superimposition of one pattern of stimulation upon a heteromodal pattern, with a resulting new complex “inter-sensory Gestalt” in which the properties of the original patterns are modified.” (Gilbert 1941, p. 401). Several decades later, Allen and Kolers (1981, p. 1318) talked of a “common or suprasensory organizing principle”. More recently still, Kubovy and Yu (2012, p. 963) have introduced the notion of ‘trans-modal Gestalts’. What is common to all of these various suggestions, however, is the idea that there may be some sort of multisensory (or supramodal) organization (or structure) which, importantly, isn’t present in any of the constituent sensory modalities when considered individually (see Spence & Chen 2012; Spence et al. 2007).

However, over and above any problem of terminology, the key issue is that, despite occasional claims that such intersensory Gestalten exist (e.g., Harrar et al. 2008; Zapparoli & Reatto 1969), there is surprisingly little concrete (i.e., uncontroversial) evidence in their favour (Allen & Kolers 1981; Sanabria et al. 2005b; Spence & Bayne in press). To give but one example of the sort of approach that has been adopted in recent times, take the study reported by Huddleston et al. (2008; Experiment 3). These researchers presented a series of auditory and visual stimuli from four locations arranged on a virtual clock face (e.g., with visual stimuli at 12 and 6 o’clock, and auditory stimuli at 3 and 9; see Figure 3). The visual and auditory stimuli were presented sequentially at a range of temporal rates. At the appropriate timings, the participants were clearly able to perceive visual apparent motion vertically and auditory apparent motion horizontally. That said, the participants never reported any circular cross-modal (or intermodal) apparent motion (despite being able to determine whether the stimuli were being presented in a clockwise or counter-clockwise sequence). Huddleston et al.’s results therefore provide evidence against the existence of intermodal Gestalten.

Figure 3. Schematic illustration of the stimulus displays used to investigate the possibility of an intersensory motion Gestalt (i.e., supramodal apparent motion) by Huddleston et al. (2008). When the interstimulus intervals were adjusted appropriately, participants reported visual apparent motion (vertically), auditory apparent motion (horizontally), but there were no reports of any circular supramodal (or intermodal) apparent motion, thus providing evidence against the existence of an intersensory Gestalt, at least in this case of audiovisual apparent motion.
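To make the geometry of this display concrete, here is a minimal sketch of such a clockwise event sequence; the coordinates, SOA, and cycle count are illustrative assumptions rather than Huddleston et al.’s actual parameters.

```python
import math

# Sketch of a Huddleston et al. (2008, Experiment 3)-style display:
# visual stimuli at 12 and 6 o'clock, auditory stimuli at 3 and 9,
# presented one after another around a virtual clock face.
ANGLE_DEG = {12: 90, 3: 0, 6: 270, 9: 180}   # clock hour -> angle (degrees)
MODALITY  = {12: "visual", 3: "auditory", 6: "visual", 9: "auditory"}

def clockwise_sequence(n_cycles=2, soa_ms=150):   # hypothetical SOA
    """Return (onset_ms, modality, x, y) events in clockwise order."""
    order = [12, 3, 6, 9]                          # clockwise from the top
    events = []
    for i in range(4 * n_cycles):
        hour = order[i % 4]
        theta = math.radians(ANGLE_DEG[hour])
        events.append((i * soa_ms, MODALITY[hour],
                       round(math.cos(theta), 2), round(math.sin(theta), 2)))
    return events

# Within-modality pairs (12<->6 visual, 3<->9 auditory) support vertical
# and horizontal apparent motion, respectively, yet observers report no
# circular cross-modal motion around the clock face.
for event in clockwise_sequence(n_cycles=1):
    print(event)
```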


By contrast, a somewhat different conclusion was reached by Harrar et al. (2008). They presented pairs of stimuli, one on either side of fixation. The two stimuli could both be visual, both tactile, or one visual and one tactile. The stimuli alternated repeatedly, and participants rated the strength of any apparent motion between them on a scale from 0 (‘no apparent motion’) to 6 (‘strong apparent motion’), across a range of interstimulus intervals (ISIs). The results revealed that the strength of apparent motion was modulated by the ISI. As one might have expected, visual apparent motion was rated as stronger than tactile apparent motion. The interesting result for present purposes, however, was that the mean ratings of the strength of cross-modal apparent motion, while much lower than those for intramodal motion, were significantly greater than 0 at many of the ISIs tested. One could imagine, though, that if Allen and Kolers (1981) were still writing, they might not be convinced by such effects, based, as they are, on self-report. It would seem plausible that task demands might have played some role in modulating how participants respond in this kind of task. Thus, more objective data, acquired using a more indirect task, would certainly be needed in order to convince the skeptic. On the other hand, Harrar et al. might want to argue that there is, in fact, nothing fundamentally wrong with using subjective ratings to assess the strength of apparent motion.

Researchers have also looked for evidence to support the existence of intersensory Gestalten in the area of intersensory rhythm perception. The idea here is that it might be possible to experience a cross-modal (or intermodal) rhythm that is not present in any one of the component unisensory stimulus streams. However, just as for the other studies already mentioned, a closer look at the literature reveals that while claims of intermodal rhythm perception certainly do exist (Guttman et al. 2005), there is actually surprisingly little reliable psychophysical evidence to back up such assertions, and many authors have explicitly argued against the possibility of intermodal rhythm perception (e.g., Fraisse 1963).

Perhaps the strongest evidence in support of intermodal rhythm perception comes from recent research on the perception of musical meter. Huang et al. (2012) have provided some intriguing evidence suggesting that people can efficiently extract the musical meter (defined as the abstract temporal structure corresponding to the periodic regularities of the music) from a temporal sequence of elements, some of which happen to be presented auditorily, and others via the sense of touch. Importantly, the meter information was not available in either modality stream when considered in isolation. Huang et al.’s results can therefore be taken as providing support for the claim that audiotactile musical meter perception constitutes one of the first genuinely intersensory Gestalten to have been documented to date.
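To see how a meter can live in the combined stream alone, consider the following toy construction (my own, not a stimulus from Huang et al. 2012): twelve isochronous slots with a rest on every third slot carry a clear period-3 (triple-meter) structure, yet when the events are split between audition and touch, neither subset repeats with period 3 on its own.

```python
import numpy as np

# Combined rhythm over 12 isochronous slots: a rest on every third
# slot (slots 1, 4, 7, 10) gives a clear period-3 structure.
combined = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1])

# Split the events between the two modalities such that neither
# subset is itself periodic with period 3.
auditory = np.array([1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1])
tactile  = combined - auditory

def lag3_autocorr(x):
    """Circular autocorrelation at lag 3; high values signal triple meter."""
    return int(np.dot(x, np.roll(x, 3)))

print("combined:", lag3_autocorr(combined))   # 8 -> strong period-3 structure
print("auditory:", lag3_autocorr(auditory))   # 0 -> no period-3 structure alone
print("tactile: ", lag3_autocorr(tactile))    # 0 -> no period-3 structure alone
```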


In conclusion, despite a number of attempts having been made over the decades, there is still surprisingly little scientific evidence to support the claim that intersensory (or cross-modal) Gestalten really do exist (see Guttman et al. 2005, p. 234; Huddleston et al. 2008).7 That said, both of the examples just described (e.g., Harrar et al. 2008; Huang et al. 2012) might be taken to challenge the conclusion forwarded recently by Spence and Chen (2012) that truly intersensory Gestalten do not exist (see also Spence & Bayne in press). One suggestion as to why they may be so elusive in laboratory studies (and presumably also in daily life) is that the nature of the experience that we have in each of the senses is so fundamentally different that it may make cross- or trans-modal Gestalten particularly difficult, if not impossible, to achieve/find (see Kubovy & Yu 2012; Spence & Bayne in press, on this point; though see Aksentijević, Elliott, & Barber 2001; Julesz & Hirsh 1972; Lakatos & Shepard 1997, for evidence that similar grouping principles may structure our experience in the different modalities).

Q4: Can cross-modal correspondences be considered as examples of intersensory Gestalten?

Cross-modal correspondences have been defined as compatibility effects between attributes, or dimensions, of stimuli (i.e., objects and events) in different sensory modalities (be they redundant or not; Spence 2011). Cross-modal correspondences have often been documented between polarized stimulus dimensions, such that a more-or-less extreme stimulus on a given dimension in one modality should be compatible with a more-or-less extreme value on the corresponding dimension in another modality. So, for example, increasing auditory pitch tends to be associated with higher elevations, smaller objects, and lighter visual stimuli (see Spence 2011). What is more, the presentation of cross-modally corresponding pairs of stimuli often gives rise to a certain feeling of ‘rightness’, despite the fact that there may be no objective truth about the matter (cf. Koriat 2008). Recently, cross-modally congruent combinations of stimuli have been shown to give rise to enhanced multisensory integration, as compared to incongruent pairings (see Guzman-Martinez et al. 2012; Parise & Spence 2009; see also Sweeny et al. 2012). And when it comes to the discussion of perceptual organization, it is worth noting that cross-modally corresponding stimuli have often been presented in previous studies (e.g., O’Leary & Rhodes 1984; see also Gebhard & Mowbray 1959).8

To give an example, research by Parise and Spence (2009) has highlighted the perceptual consequences of playing with the well-documented cross-modal correspondence that exists between auditory pitch and the size of (in this case visually-perceived) objects: People normally associate smaller objects with higher-pitched sounds and larger objects with lower-pitched sounds (e.g., Parise & Spence 2012). The participants in the first of Parise and Spence’s (2009) studies had to make unspeeded perceptual judgments regarding the temporal order in which a pair of auditory and visual stimuli had been presented. The stimulus onset asynchrony (SOA) in this cross-modal temporal order judgment task was varied on a trial-by-trial basis using the method of constant stimuli.
The pair of visual and auditory stimuli presented on each trial was either cross-modally congruent (i.e., a smaller circle presented together with a higher-pitched sound, or a larger circle with a lower-pitched sound) or else incongruent (i.e., a smaller circle paired with a lower-pitched sound, or a larger circle paired with a higher-pitched sound). The results revealed that participants found it significantly harder to report the temporal order in which the stimuli had been presented on the cross-modally congruent trials than on the cross-modally incongruent trials. The same pattern of results was also documented in a second experiment in which the cross-modal correspondence between visual shape (angularity) and auditory pitch/waveform was assessed. In a final study, Parise and Spence (2009) went on to demonstrate a larger spatial ventriloquism effect for pairs of spatially-misaligned auditory and visual stimuli when they were cross-modally congruent than when they were incongruent. Taken together, these results demonstrate enhanced spatiotemporal integration (as measured by the temporal and spatial ventriloquism effects, and hence poorer temporal and spatial resolution of the component unimodal stimuli) on cross-modally congruent as opposed to cross-modally incongruent trials. Such findings suggest that cross-modal correspondences, which can perhaps be thought of as a form of cross-modal Gestalt grouping by similarity, influence multisensory perception/integration.
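The standard analysis behind such temporal order judgment studies can be sketched as follows: fit a cumulative Gaussian to the proportion of ‘light first’ responses as a function of SOA, and read off the just noticeable difference (JND) from the fitted slope. The data points below are invented for illustration; the shallower (congruent) curve yields the larger JND, mirroring the direction of Parise and Spence’s (2009) result.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(soa, pse, sigma):
    """Probability of responding 'light first' at a given SOA (ms)."""
    return norm.cdf(soa, loc=pse, scale=sigma)

soas = np.array([-240, -120, -60, 0, 60, 120, 240])       # ms; vision-leading > 0
p_congruent   = np.array([0.08, 0.22, 0.40, 0.50, 0.60, 0.78, 0.92])  # invented
p_incongruent = np.array([0.02, 0.10, 0.30, 0.50, 0.70, 0.90, 0.98])  # invented

for label, p in [("congruent", p_congruent), ("incongruent", p_incongruent)]:
    (pse, sigma), _ = curve_fit(psychometric, soas, p, p0=[0.0, 100.0])
    # A larger sigma (JND) means poorer temporal resolution of the pair.
    print(f"{label:12s} PSE = {pse:6.1f} ms, JND (sigma) = {sigma:6.1f} ms")
```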

A growing number of studies published over the last few years have also demonstrated that the perception of a bistable or ambiguous stimulus in one modality (normally vision) can be biased by the information presented in another sensory modality, usually audition (e.g., Conrad et al. 2010; Guzman-Martinez et al. 2012; Kang & Blake 2005; Takahashi & Watanabe 2010, 2011; van Ee et al. 2009), but, on occasion, touch/haptics (see Binda et al. 2010; Bruno et al. 2007; Lunghi et al. 2010). Often, such studies have contrasted pairings of stimuli that do, or do not, correspond cross-modally. So, for example, in one study, the frequency of an amplitude-modulated auditory stimulus was shown to bias subjective reports (e.g., in the binocular rivalry situation) toward one of two competing visual stimuli (gratings) whose phase and contrast modulation frequency happened to match that of the sound (see Kang & Blake 2005). Similarly, exploring an oriented grooved surface haptically can bias a participant’s perception in the binocular rivalry situation toward a congruently (as opposed to an orthogonally) oriented visual image (grating) of the same spatial frequency (see Binda et al. 2010; Lunghi et al. 2010).

Thus, taken together, the latest evidence on the topic of cross-modal correspondences demonstrates that when the stimuli presented in different sensory modalities correspond, perceptual interactions may be observed that are absent otherwise (either because the stimuli are incongruent, or else because they are simply unrelated to the stimuli/task that a participant has been given to perform; Sweeny et al. 2012). What is more, there is also a feeling of rightness that accompanies the pairing of stimuli that correspond cross-modally (that isn’t there for pairs of stimuli that do not correspond; Koriat 2008). Such correspondences need not be based on a perceptual mapping, but they often are. Furthermore, they can often affect both perceptual organization and awareness. Such phenomena can be conceptualized in terms of Gestalt grouping based on similarity. Indeed, cross-modal correspondences have been described as cross-modal similarities by some researchers (e.g., see Marks 1987a, b).9

7 Those working in the field of flavour perception often suggest that flavours constitute a form of multisensory Gestalt (e.g., Delwiche 2004; Small & Green 2011; Spence et al. 2012; Verhagen & Engelen 2006). If such a claim were true, then this could constitute another example of (genuinely intermodal) perceptual grouping. However, it is difficult to determine whether many of the authors making such claims really mean anything more by the suggestion that flavour is a Gestalt than merely that the combination of gustatory, retronasal olfactory, and trigeminal inputs gives rise to an emergent property, or object – that is, the flavour of a food or beverage that happens to be localized to the mouth. There really isn’t space to do justice to these questions here, but the interested reader is directed to Kroeze (this volume) for further discussion of this issue.

8 It is perhaps worth noting that cross-modal causality also plays an important role in audiovisual integration (see Armontrout, Schutz, & Kubovy 2009; Kubovy & Schutz 2010; Schutz & Kubovy 2009).

9 Note here that there is likely also an interesting link to questions of perceptual organization in synaesthesia proper (with which crossmodal correspondences are often confused; though see Deroy & Spence, submitted) and their potential use within the burgeoning literature on sensory substitution (see Styles & Shimojo this volume).


3. Conclusions

The evidence from the psychophysical studies of cross-modal scene perception and perceptual organization reviewed in this chapter provides at least preliminary answers to the four questions that were outlined at the start of this piece. First, it would appear that the perceptual organization of the stimuli taking place in one sensory modality does not automatically influence the perceptual organization of stimuli presented in another sensory modality (Hupé et al. 2008; O’Leary & Rhodes 1984), except perhaps in the case of speech (Sato et al. 2007; see also Kubovy & Yu 2012). Second, intramodal perceptual grouping frequently modulates the strength of cross-modal perceptual grouping (or interactions; Soto-Faraco et al. 2002; see Spence & Chen 2012, for a review); the evidence suggests that unimodal auditory, visual, and tactile perceptual grouping can, and do, affect the cross-modal interactions taking place between auditory and visual stimuli. Third, there is currently little convincing evidence for the existence of intersensory Gestalten (see Allen & Kolers 1981; Huddleston et al. 2008), despite various largely anecdotal or introspective claims to the contrary (e.g., see Harrar et al. 2008; Zapparoli & Reatto 1969); we should nevertheless keep in mind that several of the latest findings might require us to revise this view (see Harrar et al. 2008; Huang et al. 2012; Yao et al. 2009, on this question). Finally, I have reviewed the latest evidence showing that cross-modal correspondences (Spence 2011), which sometimes modulate both perceptual organization and awareness, can be conceptualized in terms of cross-modal grouping by similarity.

Our understanding of the cross-modal constraints on perceptual organization will likely be furthered in the coming years by animal (neurophysiological) studies (see Rahne et al. 2008, for one such study). Furthermore, although beyond the scope of the present chapter, it should also be noted that attention is likely to play an important role in cross-modal perceptual organization (see Kimchi & Razpurker-Apfeld 2004; Sanabria et al. 2007; Talsma et al. 2010; and the chapters by Alais, Holcombe, Humphreys, and Rees this volume). What does seem clear already, though, is that cross-modal perceptual organization is modulated by Gestalt grouping principles such as grouping by spatial proximity, common fate, and similarity, just as in the case of intramodal perception.


4. References Aksentijevid, A., Elliott, M.A., and Barber, P.J. (2001). Dynamics of perceptual grouping: Similarities in the organization of visual and auditory groups. Visual Cognition, 8, 349-358. Allen, P.G., and Kolers, P.A. (1981). Sensory specificity of apparent motion. Journal of Experimental Psychology: Human Perception and Performance, 7, 1318-1326. Armontrout, J.A., Schutz, M., and Kubovy, M. (2009). Visual determinants of a cross-modal illusion. Attention, Perception, & Psychophysics, 71, 1618-1627. Beck, J. (Ed.). (1982). Organization and representation in vision. Hillsdale, NJ: Erlbaum. Benjamins, J.S., van der Smagt, M.J., and Verstraten, F.A.J. (2008). Matching auditory and visual signals: Is sensory modality just another feature? Perception, 37, 848-858. Binda, P., Lunghi, C., and Morrone, C. (2010). Touch disambiguates rivalrous perception at early stages of visual analysis. Journal of Vision, 10(7), 854. Bregman, A.S. (1990). Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: MIT Press. Bremner, A., Lewkowicz, D., and Spence, C. (Eds.). (2012). Multisensory development. Oxford: Oxford University Press. Bruno, N., Jacomuzzi, A., Bertamini, M., and Meyer, G. (2007). A visual-haptic Necker cube reveals temporal constraints on intersensory merging during perceptual exploration. Neuropsychologia, 45, 469-475. Carter, O., Konkle, T., Wang, Q., Hayward, V., and Moore, C. (2008). Tactile rivalry demonstrated with an ambiguous apparent-motion quartet. Current Biology, 18, 1050-1054. Conrad, V., Bartels, A., Kleiner, M., and Noppeney, U. (2010). Audiovisual interactions in binocular rivalry. Journal of Vision, 10(10):27, 1-15. Cook, L.A., and Van Valkenburg, D.L. (2009). Audio-visual organization and the temporal ventriloquism effect between grouped sequences: Evidence that unimodal grouping precedes cross-modal integration. Perception, 38, 1220-1233. Delwiche, J. (2004). The impact of perceptual interactions on perceived flavor. Food Quality and Preference, 15, 137-146. Deroy, O., and Spence, C. (submitted). Weakening the case for ‘weak synaesthesia’: Why crossmodal correspondences are not synaesthetic. Psychonomic Bulletin & Review. Fraisse, P. (1963). The psychology of time. London: Harper & Row. Fujisaki, W., and Nishida, S. (2010). A common perceptual temporal limit of binding synchronous inputs across different sensory attributes and modalities. Proceedings of the Royal Society B, 277, 2281-2290. Gallace, A., and Spence, C. (2011). To what extent do Gestalt grouping principles influence tactile perception? Psychological Bulletin, 137, 538-561. Gebhard, J.W., and Mowbray, G.H. (1959). On discriminating the rate of visual flicker and auditory flutter. American Journal of Psychology, 72, 521-528. Geldard, F.A., and Sherrick, C.E. (1972). The cutaneous "rabbit"; a perceptual illusion. Science, 178, 178-179. Gilbert, G.M. (1938). A study in inter-sensory Gestalten. Psychological Bulletin, 35, 698. Gilbert, G.M. (1941). Inter-sensory facilitation and inhibition. Journal of General Psychology, 24, 381-407. Guttman, S.E., Gilroy, L.A., and Blake, R. (2005). Hearing what the eyes see: Auditory encoding of visual temporal sequences. Psychological Science, 16, 228-235. Guzman-Martinez, E., Ortega, L., Grabowecky, M., Mossbridge, J., and Suzuki, S. (2012). Interactive coding of visual spatial frequency and auditory amplitude-modulation rate. Current Biology, 22, 383-388. Harrar, V., and Harris, L.R. (2007). 
Multimodal Ternus: Visual, tactile, and visuo-tactile grouping in apparent motion. Perception, 10, 1455-1464.

Page 14: Cross-modal perceptual organization Professor Charles Spence, …gestaltrevision.be/pdfs/oxford/Spence-Cross-modal... · 2013. 3. 15. · To date, the only studies that have attempted

14

Harrar, V., Winter, R., and Harris, L.R. (2008). Visuotactile apparent motion. Perception & Psychophysics, 70, 807-817. Huang, J., Gamble, D., Sarnlertsophon, K., Wang, X., and Hsiao, S. (2012). Feeling music: Integration of auditory and tactile inputs in musical meter perception. PLoS ONE, 7(10): e48496. Huddleston, W.E., Lewis, J.W., Phinney, R.E., and DeYoe, E.A. (2008). Auditory and visual attention-based apparent motion share functional parallels. Perception & Psychophysics, 70, 1207-1216. Hupé, J.M., Joffoa, L.M., and Pressnitzer, D. (2008). Bistability for audiovisual stimuli: Perceptual decision is modality specific. Journal of Vision, 8(7):1: 1-15. Julesz, B., and Hirsh, I.J. (1972). Visual and auditory perception - An essay of comparison. In E.E. David, Jr., and P.B. Denes (Eds.), Human communication: A unified view (pp. 283-340). New York: McGraw-Hill. Kang, M.-S., and Blake, R. (2005). Perceptual synergy between seeing and hearing revealed during binocular rivalry. Psichologija, 32, 7-15. Keetels, M., Stekelenburg, J., and Vroomen, J. (2007). Auditory grouping occurs prior to intersensory pairing: Evidence from temporal ventriloquism. Experimental Brain Research, 180, 449-456. Kimchi, R., Behrmann, M., and Olson, C.R. (Eds.). (2003). Perceptual organization in vision: Behavioral and neural perspectives. Mahwah, NJ: Erlbaum. Kimchi, R., and Razpurker-Apfeld, I. (2004). Perceptual grouping and attention: Not all groupings are equal. Psychonomic Bulletin & Review, 11, 687-696. Koriat, A. (2008). Subjective confidence in one’s answers: The consensuality principle. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 945-959. Kubovy, M., and Pomerantz, J.J. (Eds.). (1981). Perceptual organization. Hillsdale, NJ: Erlbaum. Kubovy, M., and Schutz, M. (2010). Audio-visual objects. Review of Philosophy & Psychology, 1, 41-61. Kubovy, M., and Yu, M. (2012). Multistability, cross-modal binding and the additivity of conjoint grouping principles. Philosophical Transactions of the Royal Society B, 367, 954-964. Lakatos, S., and Shepard, R.N. (1997). Constraints common to apparent motion in visual, tactile, and auditory space. Journal of Experimental Psychology: Human Perception & Performance, 23, 1050-1060. Lunghi, C., Binda, P., and Morrone, M.C. (2010). Touch disambiguates rivalrous perception at early stages of visual analysis. Current Biology, 20, R143-R144. Lyons, G., Sanabria, D., Vatakis, A., and Spence, C. (2006). The modulation of crossmodal integration by unimodal perceptual grouping: A visuotactile apparent motion study. Experimental Brain Research, 174, 510-516. Marks, L.E. (1987a). On cross-modal similarity: Auditory-visual interactions in speeded discrimination. Journal of Experimental Psychology: Human Perception and Performance, 13, 384-394. Marks, L.E. (1987b). On cross-modal similarity: Perceiving temporal patterns by hearing, touch, and vision. Perception & Psychophysics, 42, 250-256. Metzger, W. (1934). Beobachtungen über phänomenale Identität (Sudies of phenomenal identity). Psychologische Forschung, 19, 1-60. Michotte, A. (1946/1963). The perception of causality. London: Methuen. Morein-Zamir, S., Soto-Faraco, S., and Kingstone, A. (2003). Auditory capture of vision: Examining temporal ventriloquism. Cognitive Brain Research, 17, 154-163. Ngo, M., and Spence, C. (2010). Crossmodal facilitation of masked visual target identification. Attention, Perception, & Psychophysics, 72, 1938-1947. O’Leary, A., and Rhodes, G. (1984). 
Cross-modal effects on visual and auditory object perception. Perception & Psychophysics, 35, 565-569. Parise, C., and Spence, C. (2009). ‘When birds of a feather flock together’: Synesthetic correspondences modulate audiovisual integration in non-synesthetes. PLoS ONE 4(5): e5664. Parise, C.V., and Spence, C. (2012). Audiovisual crossmodal correspondences and sound symbolism:

Rahne, T., Deike, S., Selezneva, E., Brosch, M., König, R., Scheich, H., Böckmann, M., and Brechmann, A. (2008). A multilevel and cross-modal approach towards neuronal mechanisms of auditory streaming. Brain Research, 1220, 118-131.
Recanzone, G.H. (2003). Auditory influences on visual temporal rate perception. Journal of Neurophysiology, 89, 1078-1093.
Sanabria, D., Soto-Faraco, S., Chan, J.S., and Spence, C. (2004). When does visual perceptual grouping affect multisensory integration? Cognitive, Affective, & Behavioral Neuroscience, 4, 218-229.
Sanabria, D., Soto-Faraco, S., Chan, J.S., and Spence, C. (2005a). Intramodal perceptual grouping modulates multisensory integration: Evidence from the crossmodal congruency task. Neuroscience Letters, 377, 59-64.
Sanabria, D., Soto-Faraco, S., and Spence, C. (2005b). Assessing the effect of visual and tactile distractors on the perception of auditory apparent motion. Experimental Brain Research, 166, 548-558.
Sanabria, D., Soto-Faraco, S., and Spence, C. (2007). Spatial attention modulates audiovisual interactions in apparent motion. Journal of Experimental Psychology: Human Perception and Performance, 33, 927-937.
Sato, M., Basirat, A., and Schwartz, J.-L. (2007). Visual contribution to the multistable perception of speech. Perception & Psychophysics, 69, 1360-1372.
Schutz, M., and Kubovy, M. (2009). Causality and cross-modal integration. Journal of Experimental Psychology: Human Perception & Performance, 35, 1791-1810.
Sekuler, R., Sekuler, A.B., and Lau, R. (1997). Sound alters visual motion perception. Nature, 385, 308.
Shi, Z., Chen, L., and Müller, H. (2010). Auditory temporal modulation of the visual Ternus display: The influence of time interval. Experimental Brain Research, 203, 723-735.
Small, D.M., and Green, B.G. (2011). A proposed model of a flavour modality. In M.M. Murray and M. Wallace (Eds.), Frontiers in the neural bases of multisensory processes (pp. 705-726). Boca Raton, FL: CRC Press.
Soto-Faraco, S., Lyons, J., Gazzaniga, M., Spence, C., and Kingstone, A. (2002). The ventriloquist in motion: Illusory capture of dynamic information across sensory modalities. Cognitive Brain Research, 14, 139-146.
Soto-Faraco, S., Spence, C., and Kingstone, A. (2004a). Congruency effects between auditory and tactile motion: Extending the phenomenon of crossmodal dynamic capture. Cognitive, Affective, & Behavioral Neuroscience, 4, 208-217.
Soto-Faraco, S., Spence, C., Lloyd, D., and Kingstone, A. (2004b). Moving multisensory research along: Motion perception across sensory modalities. Current Directions in Psychological Science, 13, 29-32.
Spence, C. (2011). Crossmodal correspondences: A tutorial review. Attention, Perception, & Psychophysics, 73, 971-995.
Spence, C., and Bayne, T. (in press). Is consciousness multisensory? In M. Matthen and D. Stokes (Eds.), The senses. Oxford: Oxford University Press.
Spence, C., and Chen, Y.-C. (2012). Intramodal and crossmodal perceptual grouping. In B.E. Stein (Ed.), The new handbook of multisensory processing (pp. 265-282). Cambridge, MA: MIT Press.
Spence, C., Ngo, M., Percival, B., and Smith, B. (2012). Crossmodal correspondences: Assessing the shape symbolism of foods having a complex flavour profile. Food Quality & Preference. http://dx.doi.org/10.1016/j.foodqual.2012.08.002
Spence, C., Sanabria, D., and Soto-Faraco, S. (2007). Intersensory Gestalten and crossmodal scene perception. In K. Noguchi (Ed.), Psychology of beauty and Kansei: New horizons of Gestalt perception (pp. 519-579). Tokyo: Fuzanbo International.
Stein, B.E. (Ed.). (2012). The new handbook of multisensory processing. Cambridge, MA: MIT Press.

Stein, B.E., and Meredith, M.A. (1993). The merging of the senses. Cambridge, MA: MIT Press.
Stein, B.E., Burr, D., Constantinidis, C., Laurienti, P.J., Meredith, M.A., Perrault, T.J., et al. (2010). Semantic confusion regarding the development of multisensory integration: A practical solution. European Journal of Neuroscience, 31, 1713-1720.
Sweeny, T.D., Guzman-Martinez, E., Ortega, L., Grabowecky, M., and Suzuki, S. (2012). Sounds exaggerate visual shape. Cognition, 124, 194-200.
Takahashi, K., and Watanabe, K. (2010). Implicit auditory modulation on the temporal characteristics of perceptual alternation in visual competition. Journal of Vision, 10(4): 1-13.
Takahashi, K., and Watanabe, K. (2011). Visual and auditory influence on perceptual stability in visual competition. Seeing and Perceiving, 24, 545-564.
Talsma, D., Senkowski, D., Soto-Faraco, S., and Woldorff, M.G. (2010). The multifaceted interplay between attention and multisensory integration. Trends in Cognitive Sciences, 14, 400-410.
van Ee, R., van Boxtel, J.J.A., Parker, A.L., and Alais, D. (2009). Multimodal congruency as a mechanism for willful control over perceptual awareness. Journal of Neuroscience, 29, 11641-11649.
Verhagen, J.V., and Engelen, L. (2006). The neurocognitive bases of human multimodal food perception: Sensory integration. Neuroscience and Biobehavioral Reviews, 30, 613-650.
Vroomen, J., and de Gelder, B. (2000). Sound enhances visual perception: Cross-modal effects of auditory organization on vision. Journal of Experimental Psychology: Human Perception and Performance, 26, 1583-1590.
Vroomen, J., Keetels, M., de Gelder, B., and Bertelson, P. (2004). Recalibration of temporal order perception by exposure to audio-visual asynchrony. Cognitive Brain Research, 22, 32-35.
Wada, Y., Kitagawa, N., and Noguchi, K. (2003). Audio-visual integration in temporal perception. International Journal of Psychophysiology, 50, 117-124.
Wagemans, J., Elder, J.H., Kubovy, M., Palmer, S.E., Peterson, M.A., Singh, M., and von der Heydt, R. (2012). A century of Gestalt psychology in visual perception. I. Perceptual grouping and figure-ground organization. Psychological Bulletin, 138, 1218-1252.
Warren, R.M., and Gregory, R.L. (1958). An auditory analogue of the visual reversible figure. American Journal of Psychology, 71, 612-613.
Watanabe, K., and Shimojo, S. (2001). When sound affects vision: Effects of auditory grouping on visual motion perception. Psychological Science, 12, 109-116.
Welch, R.B., DuttonHurt, L.D., and Warren, D.H. (1986). Contributions of audition and vision to temporal rate perception. Perception & Psychophysics, 39, 294-300.
Wertheimer, M. (1923/1938). Laws of organization in perceptual forms. In W. Ellis (Ed.), A source book of Gestalt psychology (pp. 71-88). London: Routledge & Kegan Paul.
Yao, R., Simons, D., and Ro, T. (2009). Keep your eye on the rabbit: Cross-modal influences on the cutaneous rabbit illusion. Journal of Vision, 9, 705.
Yau, J.M., Olenczak, J.B., Dammann, J.F., and Bensmaia, S.J. (2009). Temporal frequency channels are linked across audition and touch. Current Biology, 19, 561-566.
Zapparoli, G.C., and Reatto, L.L. (1969). The apparent movement between visual and acoustic stimulus and the problem of intermodal relations. Acta Psychologica, 29, 256-267.