7
Sound can prolong the visible persistence of moving visual objects Souta Hidaka a, * , Wataru Teramoto a,b , Jiro Gyoba a , Yôiti Suzuki b a Department of Psychology, Graduate School of Arts and Letters, Tohoku University, Sendai, Miyagi 980-8576, Japan b Research Institute of Electrical Communication, Tohoku University, Sendai, Miyagi 980-8577, Japan article info Article history: Received 19 March 2010 Received in revised form 23 July 2010 Keywords: Visible persistence Frequency changes of sounds Audio–visual interaction Object representation Apparent motion abstract An abrupt change in a visual attribute (size) of apparently moving visual stimuli extends the time the changed stimuli is visible even after its physical termination (visible persistence). In this study, we show that elongation of visible persistence is enhanced by an abrupt change in an attribute (frequency) of the sounds presented along with the size-changed apparently moving visual stimuli. This auditory effect dis- appears when sounds are not associated with the visual stimuli. These results suggest that auditory attri- bute change can contribute to the establishment of a new object representation and that object-level audio–visual interactions can occur in motion perception. Ó 2010 Elsevier Ltd. All rights reserved. 1. Introduction For our perceptual systems, which continuously receive large amounts of input, it is tremendously important to be able to distin- guish a single object from the input and preserve single-object con- tinuity across time and space. In visual perception, object-based processing, in which recently sampled information is integrated with an existing representation of the scene, is assumed to be a solution for the brain to realize the distinction and preservation of information (e.g., ‘‘object file” theory; Kahneman, Treisman, & Gibbs, 1992). Although the spatiotemporal proximity or continuity of inputs is a predominant factor (Scholl, 2001), the consistency of each object’s features is also likely to be an important cue. Moore, Mordkoff, and Enns (2007) demonstrated that an abrupt change in the attribute (size) of an object in an apparent motion sequence ex- tended the time for which the changed object is visible (visible per- sistence) even after its physical termination (Fig. 1). They also showed that this elongation of visible persistence did not occur when the visual stimulus moved behind an occluder containing a small hole so that there was a reasonable change in size of the vi- sual stimuli. These results suggest that the object-level changes are likely to cause this phenomenon rather than the retinal-level changes. In a motion display, the visual system is considered to re- duce the duration of visible persistence to establish the perception of a single moving object from continuous physical inputs (Burr, 1980). Moore et al. (2007) suggested that this suppression mecha- nism does not work in cases where the object representations be- tween the two neighboring frames of a sequence of visual stimuli are different; the perceptual systems establish a new object repre- sentation so that the visible persistence of the size-changed stimuli increases in motion sequences (see also Moore & Enns, 2004). Our internal representations are established by combining or integrating multisensory inputs, depending on the accuracy and/ or reliability of each input (Welch & Warren, 1986), to make coher- ent and robust percepts (Ernst & Bülthoff, 2004). Thus far, from this perspective, many studies have investigated audio–visual cross- modal contributions to primates’ perception. Although the superi- ority of vision over audition has been reported frequently (e.g., ventriloquism effect, Howard & Templeton, 1966), recent studies have shown that auditory stimuli can modulate the perception of visual stimuli. For instance, transient auditory stimuli can capture the temporal position of visual stimuli (temporal ventriloquism ef- fect; Vroomen & de Gelder, 2004). It has also been shown that tran- sient auditory stimuli can enhance the perception of visual stimuli in regard to perceived intensity (Stein, London, Wilkinson, & Price, 1996), perceived vividness (Sheth & Shimojo, 2004), and the sal- iency of visual changes (Noesselt, Bergmann, Hake, Heinze, & Fend- rich, 2008). Furthermore, the presence of multiple auditory stimuli can cause a static visual flash to be perceived as multiple flashes (Shams, Kamitani, & Shimojo, 2000). The manner in which object representations are maintained or updated has been investigated mainly with regard to visual modality. Given that our internal representations are established by means of information from a variety of sensory modalities, object representations can also have multimodal characteristics. 0042-6989/$ - see front matter Ó 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.visres.2010.07.021 * Corresponding author. Present address: Department of Psychology, College of Contemporary Psychology, Rikkyo University, 1-2-26, Kitano, Niiza-shi, Saitama 352-8558, Japan. Fax: +81 48 471 7164. E-mail address: [email protected] (S. Hidaka). Vision Research 50 (2010) 2093–2099 Contents lists available at ScienceDirect Vision Research journal homepage: www.elsevier.com/locate/visres

Sound can prolong the visible persistence of moving visual objects

Embed Size (px)

Citation preview

Page 1: Sound can prolong the visible persistence of moving visual objects

Vision Research 50 (2010) 2093–2099

Contents lists available at ScienceDirect

Vision Research

journal homepage: www.elsevier .com/locate /v isres

Sound can prolong the visible persistence of moving visual objects

Souta Hidaka a,*, Wataru Teramoto a,b, Jiro Gyoba a, Yôiti Suzuki b

a Department of Psychology, Graduate School of Arts and Letters, Tohoku University, Sendai, Miyagi 980-8576, Japanb Research Institute of Electrical Communication, Tohoku University, Sendai, Miyagi 980-8577, Japan

a r t i c l e i n f o a b s t r a c t

Article history:Received 19 March 2010Received in revised form 23 July 2010

Keywords:Visible persistenceFrequency changes of soundsAudio–visual interactionObject representationApparent motion

0042-6989/$ - see front matter � 2010 Elsevier Ltd. Adoi:10.1016/j.visres.2010.07.021

* Corresponding author. Present address: DepartmContemporary Psychology, Rikkyo University, 1-2-2352-8558, Japan. Fax: +81 48 471 7164.

E-mail address: [email protected] (S. Hidaka).

An abrupt change in a visual attribute (size) of apparently moving visual stimuli extends the time thechanged stimuli is visible even after its physical termination (visible persistence). In this study, we showthat elongation of visible persistence is enhanced by an abrupt change in an attribute (frequency) of thesounds presented along with the size-changed apparently moving visual stimuli. This auditory effect dis-appears when sounds are not associated with the visual stimuli. These results suggest that auditory attri-bute change can contribute to the establishment of a new object representation and that object-levelaudio–visual interactions can occur in motion perception.

� 2010 Elsevier Ltd. All rights reserved.

1. Introduction

For our perceptual systems, which continuously receive largeamounts of input, it is tremendously important to be able to distin-guish a single object from the input and preserve single-object con-tinuity across time and space. In visual perception, object-basedprocessing, in which recently sampled information is integratedwith an existing representation of the scene, is assumed to be asolution for the brain to realize the distinction and preservationof information (e.g., ‘‘object file” theory; Kahneman, Treisman, &Gibbs, 1992). Although the spatiotemporal proximity or continuityof inputs is a predominant factor (Scholl, 2001), the consistency ofeach object’s features is also likely to be an important cue. Moore,Mordkoff, and Enns (2007) demonstrated that an abrupt change inthe attribute (size) of an object in an apparent motion sequence ex-tended the time for which the changed object is visible (visible per-sistence) even after its physical termination (Fig. 1). They alsoshowed that this elongation of visible persistence did not occurwhen the visual stimulus moved behind an occluder containing asmall hole so that there was a reasonable change in size of the vi-sual stimuli. These results suggest that the object-level changes arelikely to cause this phenomenon rather than the retinal-levelchanges. In a motion display, the visual system is considered to re-duce the duration of visible persistence to establish the perceptionof a single moving object from continuous physical inputs (Burr,

ll rights reserved.

ent of Psychology, College of6, Kitano, Niiza-shi, Saitama

1980). Moore et al. (2007) suggested that this suppression mecha-nism does not work in cases where the object representations be-tween the two neighboring frames of a sequence of visual stimuliare different; the perceptual systems establish a new object repre-sentation so that the visible persistence of the size-changed stimuliincreases in motion sequences (see also Moore & Enns, 2004).

Our internal representations are established by combining orintegrating multisensory inputs, depending on the accuracy and/or reliability of each input (Welch & Warren, 1986), to make coher-ent and robust percepts (Ernst & Bülthoff, 2004). Thus far, from thisperspective, many studies have investigated audio–visual cross-modal contributions to primates’ perception. Although the superi-ority of vision over audition has been reported frequently (e.g.,ventriloquism effect, Howard & Templeton, 1966), recent studieshave shown that auditory stimuli can modulate the perception ofvisual stimuli. For instance, transient auditory stimuli can capturethe temporal position of visual stimuli (temporal ventriloquism ef-fect; Vroomen & de Gelder, 2004). It has also been shown that tran-sient auditory stimuli can enhance the perception of visual stimuliin regard to perceived intensity (Stein, London, Wilkinson, & Price,1996), perceived vividness (Sheth & Shimojo, 2004), and the sal-iency of visual changes (Noesselt, Bergmann, Hake, Heinze, & Fend-rich, 2008). Furthermore, the presence of multiple auditory stimulican cause a static visual flash to be perceived as multiple flashes(Shams, Kamitani, & Shimojo, 2000).

The manner in which object representations are maintained orupdated has been investigated mainly with regard to visualmodality. Given that our internal representations are establishedby means of information from a variety of sensory modalities,object representations can also have multimodal characteristics.

Page 2: Sound can prolong the visible persistence of moving visual objects

Fig. 1. Schematic illustration of the stimuli and the percept demonstrated by Mooreet al. (2007). The size change in apparently moving stimuli induces the perceptionof multiple stimuli in the display where only one stimulus is physically presented.

2094 S. Hidaka et al. / Vision Research 50 (2010) 2093–2099

Several studies with findings consistent with this idea have shownthat object recognition reflects the semantic congruencies of in-puts from different sensory modalities (Alpert, Hein, Tsai, Naumer,& Knight, 2008; Hein et al., 2007; Laurienti, Kraft, Maldjian, Burd-ett, & Wallace, 2004). This finding indicates that multimodal objectrepresentations are established in cognitive or semantic process-ing stages. However, it is not clear whether the object representa-tions containing multimodal information can be formed in theperceptual processing stage, which would mediate the establish-ment of cognitive- or semantic-level object representations. In amotion display, object representations are being established andupdated in response to incoming physical inputs at the perceptuallevel (Moore et al., 2007). In the current study, we measured theduration of visible persistence of an apparently moving object toinvestigate whether a radical change in an attribute (frequency)of sounds presented along with moving visual stimuli could con-tribute to the establishment of a new object representation (Mooreet al., 2007).

In Experiment 1, we showed that frequency changes of auditorystimuli could alter the visible persistence of apparently moving vi-sual stimuli. In Experiments 2 and 3, we confirmed that the abruptonset or offset of auditory stimuli did not have any effect. In Exper-iment 4, we demonstrated that the effect of frequency changes inauditory stimuli on visible persistence disappeared when the audi-tory stimuli were difficult to be associated with apparently movingvisual stimuli. These findings suggest that object representationscan be formed by means of the multisensory integration of audi-tory and visual information in motion perception and that ob-ject-level audio–visual interactions can occur at a perceptual level.

2. Experiment 1

In Experiment 1, we investigated the effect of auditory attributechanges on the visible persistence of apparently moving visualstimuli. The frequency of auditory stimuli was abruptly changedin the penultimate frame, where the visual stimuli either did ordid not change its size in apparent motion sequences.

Fig. 2. Schematic illustration of time course and experimental design of visualstimuli used in Experiment 1. The fixation circle was blue, instead of black, in theexperiment.

2.1. Methods

2.1.1. Participants and apparatusWritten consent was obtained from each participant before all

experiments. The experiments were approved by the local ethicscommittee of Tohoku University. In Experiment 1, there were 11participants—two of the authors (S.H. and W.T.) and nine studentsof Tohoku University who were naive to the purpose of this exper-iment. All the participants had normal or corrected-to-normal vi-sion and normal hearing. The visual stimuli were presented on aCRT display (Sony Trinitron GDM-FW900, 24-in.) with a resolutionof 1600 � 1200 pixels and a refresh rate of 75 Hz. The auditorystimuli were presented via headphones (SENNHEISER HDA200). Acustomized PC (Dell-Dimension 8250) and MATLAB (The Math-works, Inc.) with the Psychophysics Toolbox (Brainard, 1997; Pelli,

1997) were used to control the experiment. The participants wereinstructed to place their heads on a chin rest.

2.1.2. StimuliA white disc (0.6� in diameter, 49.11 cd/m2) was sequentially

presented against a gray background (7.28 cd/m2) as an apparentmotion stimulus. The disc moved by 15� in every frame of 80 ms,following a circular trajectory whose center was located at a bluefixation point (5.31 cd/m2, the CIE coordinates were 0.15 and0.11; see Fig. 2). The radius of the imaginary circle of the motiontrajectory was 3�. Two types of pure tone—600 Hz (L tone;SPL = 77.5 dB) and 3000 Hz (H tone; SPL = 71 dB)—were used asauditory stimuli. The duration of each tone was 80 ms with 8 msof cosine ramp at the onset and offset and was consistent withthe duration of each visual frame.

2.1.3. ProcedureIn an apparent motion display, the disc was sequentially pre-

sented, in synchrony with the tone. In half of the trials, the sizeof the disc remained constant in all the frames (Without-Visual-Change condition); in the other half of the trials, the disc becamesmaller (0.45� � 0.45�) at the penultimate frame and returned tothe previous size in the last frame (Visual-Change condition)(Fig. 2). As with previous research (Moore et al., 2007), our preli-minary experiment confirmed that a change in size from smallerto larger did not lead to a sufficient effect. Moreover, in our exper-imental procedure, the effect of such a size change was found to behighly individual among the participants. We, therefore, intro-duced only larger-to-smaller changes in the Visual-Change condi-tion. A sequence of either L or H tones—a tone for each frame—was constantly presented in the Without-Auditory-Change condi-tion. In the Auditory-Change condition, a sequence of either L orH tones was presented up to the frame before the penultimateframe. Then, the tone was changed from L to H at the penultimateframe for the L tone sequence or vice versa for the H tone sequenceand was returned to the previous tone at the last frame (i.e., LHL orHLH in the last three frames). The fundamental tone was the Htone for half of the trials and the L tone for the other half for theWithout-Auditory-Change and Auditory-Change conditions. Wealso set a condition in which the auditory stimuli did not appearin all frames (Without-Sound condition). In addition, we manipu-lated the number of discs that were actually presented in the lastframe. One disc was presented at the final location of the motion

Page 3: Sound can prolong the visible persistence of moving visual objects

S. Hidaka et al. / Vision Research 50 (2010) 2093–2099 2095

trajectory in half of the trials (Single trials), and two discs werepresented at the penultimate and final locations of the motion tra-jectory in the other half (Double trials). In the Double trials, thesize of the two discs in the last frame was the same in the With-out-Visual-Change condition, but their sizes were different in theVisual-Change condition. We used three independent variables,namely, visual size changes of the disc (2; Visual-Change/With-out-Visual-Change), auditory conditions (3; Auditory-Change/Without-Auditory-Change/Without-Sound), and the number ofdiscs in the last frame (2; Single/Double). The disc started movingat one of four locations (0�, 90�, 180�, or 270�) and moved in eithera clockwise or counterclockwise direction for 13 (195�), 19 (285�),or 25 (375�) frames. These variables—initial position, direction ofmotion, and the duration of the presentation of the disc—were ran-domized across the trials and counterbalanced among the condi-tions to prevent the observers from predicting the final locationof the disc.

A trial began with the presentation of the fixation circle for1000 ms. Then, the apparent motion sequence was presented.The participants were asked to fixate their eyes on the fixation cir-cle during the trial and to report how many discs they observed(one or two) in the last frame by using the corresponding buttons.In a practice session, the participants completed 24 trials withouttones—the visual size changes of the disc (2) � the number of discsin the last frame (2) � the length of the circular motion trajectory(3) � repetitions (2). When the participants reported observing onedisc even though two discs were presented in the last frame, theyreceived error feedback in order for them to be able to set a deci-sional criterion (Moore et al., 2007). In the main session, the partic-ipants completed 288 trials without feedback—the visual sizechanges of the disc (2) � auditory conditions (3) � the number ofdiscs in the last frame (2) � the length of the circular motion tra-jectory (3) � repetitions (8). The order of each condition was ran-domized and counterbalanced among the trials in both thepractice and main sessions.

2.2. Results and discussion

First, the average proportions of the trials in which the partici-pants reported two visual stimuli were calculated (SupplementalFig. S1A). From this data, we computed the sensitivity (d-prime)to the perceived number of the stimuli presented in the last framebased on signal detection theory (Macmillan & Creelman, 2004).The responses of reporting two discs were regarded as ‘‘hit” inthe Double trials and as ‘‘false alarm” in the Single trials. Thismeant that the decrement of the d-prime indicated that two visualstimuli were perceived in the last frame when only one visualstimulus was physically present. In addition, changes of the d-prime could be separated from those of criterion, namely, responseor decisional biases (e.g., the frequency changes of the auditorystimuli induced the frequent ‘‘two stimuli” responses). We can,

Fig. 3. d-Primes in Experiment 1. The error bars denote 95% confidence interval.

therefore, assume that the experimental results cannot be ex-plained simply as a result of a response or decisional bias. The pro-portions of 0% or 100% were corrected as 1/n or (n � 1)/n,respectively, where n was the total number (24 times) of presenta-tions (Anscombe, 1956; Sorkin, 1999).

The d-primes obtained in Experiment 1 are shown in Fig. 3. Atwo-way repeated-measure analysis of variance (ANOVA) was con-ducted with visual size changes (2) � auditory conditions (3). Thisanalysis revealed a significant main effect of the visual sizechanges (F(1, 10) = 10.70, p < .01), whereas the main effect of theauditory conditions was not significant (F(2, 20) = 2.58, p = .10).An interaction effect between the visual size changes and auditoryconditions was also significant (F(2, 20) = 7.69, p < .005); it re-vealed a significant simple main effect of the auditory conditionsin the Visual-Change condition (F(2, 40) = 6.75, p < .005). A posthoc test (Tukey’s HSD) revealed that the d-prime in the Auditory-Change condition was smaller than in the other conditions(p < .05). In contrast, a simple main effect of the auditory condi-tions was not significant in the Without-Visual-Change condition(F(2, 40) = 0.60, p = .56).

The results showed that the size changes in the visual stimuliinduced a decrement in the sensitivity (d-prime) to the perceivednumber of the stimuli, indicating that two visual stimuli were per-ceived in the last frame when only one visual stimulus was phys-ically presented. This finding was in line with the previous researchconducted by Moore et al. (2007), where the size changes elon-gated the visible persistence of apparently moving visual stimuli.We also discovered that the frequency changes of the auditorystimuli further decreased the sensitivity in the Visual-Change con-ditions. Therefore, it was newly suggested that the auditory attri-bute change could also contribute to the elongation of visiblepersistence of the apparently moving visual stimuli.

The results of Experiment 1 would suggest that the auditoryattribute (frequency) change, as well as the visual attribute change(Moore et al., 2007), contributed to the establishment of a new ob-ject representation of the apparently moving visual stimuli, result-ing in elongation of visible persistence. On the other hand, theauditory effect on visible persistence was not observed in theWithout-Visual-Change condition. This result indicates anotherpossible interpretation of the current finding: The transient signalcontained in the frequency changes of the auditory stimuli mayhave enhanced perceptual intensity to the size-changed stimulior served as a cue for the visual size changes. We addressed this is-sue in the next experiment.

3. Experiment 2

Previous studies demonstrated that transient auditory signalsincreased the perceptual intensity or the vividness of visual stimuli(Sheth & Shimojo, 2004; Stein et al., 1996). One would have as-sumed that the transient signals induced by the frequency changesof the auditory stimuli might have strengthened the perceptualintensity of the concurrent size-changed visual stimulus so thatits visible persistence was elongated in the Visual-Change condi-tion of Experiment 1. It was also shown that auditory transient sig-nals increased the saliency of visual events (Noesselt et al., 2008).This finding might indicate that the transient signal induced by thefrequency changes of the auditory stimuli directed much moreattention to the visual size changes (cueing effect on visual sizechanges) in the Visual-Change condition.

In order to investigate the effect of auditory transient signals onvisible persistence, we examined whether the transient onset of anauditory stimulus could affect the visible persistence of the appar-ently moving visual stimuli in Experiment 2. The effect of auditorystimuli on the visible persistence should be replicated if the audi-tory transient signal was important for the current phenomenon.

Page 4: Sound can prolong the visible persistence of moving visual objects

2096 S. Hidaka et al. / Vision Research 50 (2010) 2093–2099

3.1. Methods

3.1.1. Participants and apparatusThis experiment included 10 participants—two of the authors

(S.H. and W.T.) and eight students of Tohoku University who werenaive to the purpose of this experiment, although seven of themhad participated in Experiment 1. All the participants had normalor corrected-to-normal vision and normal hearing. The apparatuswas identical to that used in the previous experiment.

3.1.2. Stimuli and procedureThe auditory conditions were modified in this experiment, as

compared with those in Experiment 1. In half of the trials, a brieftone (either 600 Hz or 3000 Hz) was abruptly presented at the pen-ultimate frame in synchrony with the appearance of the disc (Audi-tory-Onset condition); in the other half of the trials, no sound waspresented (Without-Sound condition). Apart from these differ-ences, the stimulus parameters and procedures were identical tothose used in Experiment 1. The participants completed the mainsession of 192 trials, which consisted of the visual size changesof the disc (2) � the number of discs in the last frame (2) � thelength of the circular motion trajectory (3) � auditory conditions(2) � repetitions (8).

3.2. Results and discussion

We calculated d-primes (Fig. 4A) based on the average propor-tions of reporting two visual stimuli (Supplemental Fig. S1B). Atwo-way repeated-measure ANOVA was conducted with visualsize changes (2) � auditory conditions (2). Just as in Experiment1, this analysis revealed a significant main effect of the visual sizechanges (F(1, 9) = 23.82, p < .001). In contrast, a main effect of theauditory conditions (F(1, 9) = 0.01, p = .95) and an interaction effectbetween the visual size changes and auditory conditions (F(1, 9) =4.59, p = .07) were not significant.

Fig. 4. d-Primes in Experiment 2 (A) and Experiment 3 (B). The error bars denote95% confidence interval.

The results of Experiment 2 revealed that the visual sizechanges decreased the sensitivity to the perceived number of stim-uli, indicating that visible persistence was elongated. In contrast toExperiment 1, however, the transient onset of auditory stimuli inthe penultimate frame did not affect sensitivity. This finding indi-cates that the auditory transient stimuli alone could not elongatethe visible persistence of the apparently moving visual stimuli.Thus, it could be considered that the changes in visible persistenceof apparently moving visual stimuli in Experiment 1 could not beattributed to the increment of perceived intensity of visual stimulior the cueing effect on visual changes.

One might assume that audio–visual interaction attenuated inExperiment 2 because the auditory transient signal suddenly ap-peared during the continuous presentations of the visual stimuli.We investigated this possibility in the next experiment.

4. Experiment 3

In Experiment 1, the auditory stimulus was accompanied by a vi-sual stimulus before the frequency changes of the auditory stimulioccurred. However, in Experiment 2, the auditory stimulus ap-peared suddenly. Such different temporal properties of the audio–visual stimuli should induce less audio–visual interactions (Wallaceet al., 2004). Thus, one could argue that the lack of effect of auditorytransient signal on visible persistence might result from feweropportunities for associations and interactions between the audi-tory and visual stimuli in Experiment 2. Therefore, in Experiment3, a situation was tested in which a sequence of tones was accom-panied by a sequence of discs in a certain period, and the tone atthe penultimate frame disappeared abruptly (i.e., transient offset).

4.1. Methods

4.1.1. Participants and apparatusThis experiment included 10 participants—two of the authors

(S.H. and W.T.) and eight students of Tohoku University who werenaive to the purpose of the experiment, although four of them hadparticipated in the previous experiments. All the participants hadnormal or corrected-to-normal vision and normal hearing. Theapparatus was identical to that used in the previous experiment.

4.1.2. Stimuli and procedureTwo auditory conditions were introduced. For the Without-Off-

set condition, a tone (600 Hz or 3000 Hz pure tone) was synchro-nously presented with a disc in all the frames. For the Auditory-Offset condition, a tone abruptly disappeared in the penultimateframe, whereas in the other frames, a tone was synchronously pre-sented with the disc. Apart from these differences, the stimulusparameters and procedures were identical to those used in Exper-iment 2.

4.2. Results and discussion

We calculated d-primes (Fig. 4B) from the average proportionsof reporting two visual stimuli (Supplemental Fig. S1C). A two-way repeated-measure ANOVA with visual size changes (2) � audi-tory conditions (2) revealed a significant main effect of the visualsize changes (F(1, 9) = 17.97, p < .005). As in Experiment 2, a maineffect of the auditory conditions (F(1, 9) = 0.13, p = .73) and aninteraction effect between the visual size changes and auditoryconditions (F(1, 9) = 0.11, p = .75) were not significant.

In the current experiment, the auditory stimuli were accompa-nied by the visual stimuli so that the audio–visual interactionwould be more likely to occur, as compared with Experiment 2.Nevertheless, the auditory offset signal in the penultimate frame

Page 5: Sound can prolong the visible persistence of moving visual objects

S. Hidaka et al. / Vision Research 50 (2010) 2093–2099 2097

did not affect sensitivity to the perceived number of visual stimuliin the last frame. This indicated that the auditory transient signaland its resulting effect, such as the cueing effect on visual changes,was not the main factor leading to the elongation of visible persis-tence observed in Experiment 1.

5. Experiment 4

In Experiment 1, it could be considered that the transient signalinduced by the frequency changes in auditory stimuli contributedto the elongation of visible persistence. However, the results ofExperiments 2 and 3 suggested that this idea was not accurate.Rather, the frequency changes in auditory stimuli were found tobe likely to contribute to the establishment of a new object repre-sentation for the apparently moving visual stimuli and result inelongation of the visible persistence (Moore et al., 2007). The re-sults of Experiment 1 also suggested that elongation of visible per-sistence was enhanced only when the frequency change of soundswas accompanied by a change in the attribute (size) of a visual ob-ject. The aim of Experiment 4 was to further investigate the object-selective aspect of the auditory effect; we examined whether theeffect of the frequency changes in auditory stimuli on the visiblepersistence of the moving visual stimuli was attenuated whenthe auditory information was difficult to associate with the visualstimuli. A situation was introduced in which the color of the fixa-tion changed synchronously with a sequence of several tones be-fore a moving disc appeared. This procedure was used so that thetones were more likely to be associated with the fixation than withthe moving disc (Fig. 5A). We hypothesized that if the frequencychanges of the auditory stimuli had a selective effect on the visualobject associated with the auditory stimuli, the visible persistenceof the moving discs would not be elongated.

5.1. Methods

5.1.1. Participants and apparatusThis experiment included 10 participants—two of the authors

(S.H. and W.T.) and eight students of Tohoku University who werenaive to the purpose of the experiment, although six of them hadparticipated in the previous experiments. All the participants hadnormal or corrected-to-normal vision and normal hearing. Theapparatus was identical to that used in the previous experiment.

Fig. 5. (A) Schematic illustration of the time course of the stimuli presentation inExperiment 4. The fixation circles were blue and red (instead of white and black) inthe experiment. (B) d-Primes in Experiment 4. The error bars denote 95% confidenceinterval.

5.1.2. Stimuli and procedureRed (13.29 cd/m2, the CIE coordinates were 0.64 and 0.0.37) and

blue (5.31 cd/m2, the CIE coordinates were 0.15 and 0.11) fixationcircles (0.5� in diameter) were alternately presented at the centerof the display. The presentation duration of each fixation circlegradually decreased by 13.33 ms in each frame from 280 ms to80 ms for the initial 16 frames and was 80 ms after the 17th frame.A white disc (0.6� in diameter, 49.11 cd/m2) was sequentially pre-sented from the 17th frame to the last frame along a circular trajec-tory around the alternating red/blue fixation circles (Fig. 5A). Theonset of the white discs was synchronized with the alternation ofthe fixation circles. The presentation order of the colors of the fix-ations (from red to blue or vice versa) was randomized across thetrials. Except for these stimulus parameters and procedures, theexperimental design was identical to those in Experiment 1—thevisual size changes of the disc (2; Visual-Change/Without-Visual-Change) and auditory conditions (3; Auditory-Change/Without-Auditory-Change/Without-Sound). In the Auditory-Change andWithout-Auditory-Change conditions, the onset of the tones(80 ms duration) was synchronized with that of the alternatingred/blue fixation circles.

5.2. Results and discussion

We calculated d-primes (Fig. 5B) based on the averaged propor-tions of reporting two visual stimuli (Supplemental Fig. S1D). Atwo-way repeated-measure ANOVA with visual size changes(2) � auditory conditions (3) revealed a significant main effect ofthe visual size changes (F(1, 9) = 20.21, p < .005); a main effect ofthe auditory conditions was not significant (F(1, 18) = 0.38,p = .69). An interaction effect between the visual size changesand auditory conditions was also significant (F(2, 18) = 4.80,p < .05). However, a simple main effect of the auditory conditionswas not significant for both the Visual-Change condition(F(2, 36) = 0.65, p = .53) and the Without-Visual-Change condition(F(2, 36) = 2.95, p = .07).

In the current experiment, an effect of the frequency changes ofthe auditory stimuli on the sensitivity to the perceived number ofstimuli was not observed, even in the Visual-Change condition. Inorder to directly compare the results of Experiments 1 and 4, weconducted a two-way measure ANOVA (with experiments(2) � auditory conditions (3)) with regard to the Visual-Changecondition. This was performed using a mixture design; the factorof the experiments was regarded as a between-subjects factor be-cause the number and identity of half of the participants were dif-ferent between the experiments, whereas the factor of the auditoryconditions was considered to be a within-subjects factor. In addi-tion to the main effect for the auditory conditions(F(2, 38) = 6.05, p < .01), an interaction effect between factors wasalso found to be significant (F(2, 38) = 3.82, p < .05). Further analy-sis of the interaction effect revealed a simple main effect for theauditory conditions in Experiment 1 (F(2, 38) = 9.38, p < .001),and a post hoc test (Tukey’s HSD) found that the d-prime in theAuditory-Change condition was smaller than in the other condi-tions (p < .05). In contrast, a simple main effect for the auditoryconditions in Experiment 4 was found not to be significant(F(2, 38) = 0.49, p = .62). These results indicated that the effect ofthe frequency changes in auditory stimuli on the visible persis-tence of the apparently moving visual stimuli disappeared whenthe auditory stimuli were difficult to associate with the moving vi-sual stimuli. This finding suggests that the frequency changes ofthe auditory stimuli could influence the visible persistence ofapparently moving stimuli in an object-selective manner.

One might assume that the blinking fixation diverted the partic-ipants’ attention from the apparently moving visual stimuli so thatthe auditory effect disappeared. If this were the case, the effect of

Page 6: Sound can prolong the visible persistence of moving visual objects

2098 S. Hidaka et al. / Vision Research 50 (2010) 2093–2099

the visual size changes would also attenuate or disappear. How-ever, the effect of the visual size changes was almost identical be-tween Experiments 1 and 4; in the direct comparison between theexperiments, the main effect of the experiments were not signifi-cantly different (F(1, 19) = 0.17, p = .69) (see also Figs. 3 and 5B,and Supplemental Fig. S1A and D). Therefore, the distracting effectof the blinking fixation did not fully explain the results of Experi-ment 4.

6. General discussion

It has been demonstrated previously that changes in a visualattribute (size) can extend the visible time of the changed object,even after its physical termination (visible persistence); this effectwas determined by measuring the perceived number of apparentlymoving visual stimuli at a given moment (Moore et al., 2007). Theaim of the present study was to investigate whether an abruptchange in an attribute (frequency) of auditory stimuli (pure tone)presented with moving visual stimuli could also alter the visiblepersistence of the stimuli, and how this effect might occur. Wefound that visible persistence was elongated when an abruptchange in tone was introduced into a sequence of constant tonespresented synchronously with apparently moving visual stimuli,only when the visual stimuli included a size change (Experiment1). We also found that the effects of the frequency changes in audi-tory stimuli occurred in an object-selective manner; the auditoryeffect disappeared when the auditory stimuli were difficult to attri-bute to the apparently moving visual stimuli (Experiment 4 andthe direct comparison of the results between Experiments 1 and4). These findings suggest that changes in an auditory attributeassociated with moving visual stimuli can contribute to the elonga-tion of the visible persistence of the stimuli.

It was suspected that transient signals induced by the frequencychanges of the auditory stimuli strengthened the perceptual inten-sity of apparently moving visual stimuli (Stein et al., 1996: Sheth &Shimojo, 2004) or directed attention to the visual size changes(Noesselt et al., 2008). However, the results of Experiments 2 and3 did not show an effect of the transient auditory signals (onsetand offset) on the visible persistence of apparently moving visualstimuli. It also has been reported that multiple auditory transientscan increase the perceived number of visual stimuli (Shams et al.,2000). If this mechanism had contributed to the current results, thechanges in sensitivity to the perceived number of stimuli shouldhave occurred irrespective of the frequency changes of the auditorystimuli; however, our result did not show this effect. Thus, we con-sider that the current findings could not be attributed to thechange in the perceived number of visual stimuli induced by multi-ple auditory transients.

It has been suggested that the visual systems encode some vi-sual attributes (color, shape, size, etc.) into an object representa-tion (Kahneman et al., 1992). Regarding an auditory object,frequency and time components are considered as prominent attri-butes (Kubovy & van Valkenburg, 2001). Moore et al. (2007) sug-gested that the elongation of visible persistence occurs in object-level processing in such a way that the changes in a visual attribute(size) induce the perceptual systems to establish a new object rep-resentation. The findings of the current research newly suggestthat the audio–visual interaction occurs with the perceptual pro-cessing stage where object representations are established. Itshould be noted that the effect of an auditory attribute (frequency)change on visible persistence was observed when it was accompa-nied by a change in the visual objects’ attribute (size) as shown inExperiment 1. This finding indicates that the auditory attributechange can become effective particularly when the establishmentof a new object representation is induced by the visual attributechanges. Our internal representations are established by combin-

ing or integrating multisensory inputs, depending on the accuracyand/or reliability of each input (Ernst & Bülthoff, 2004; Welch &Warren, 1986). In regard to the establishment of visual object rep-resentation, it would be reasonable to assume that the perceptualsystems receive information predominantly from the visual modal-ity, because the information is more reliable and accurate than thatfrom the other sensory modalities.

The results of Experiment 4 indicated that the change in anauditory attribute could not affect the visible persistence of theapparently moving stimuli when the auditory information wasnot associated with it. The temporal properties (duration in each,total duration presented together, onset timing, and so on) of theauditory and moving visual stimuli in Experiment 1 were identicalto those in Experiment 4, except for the presentation of the blink-ing fixation. Thus, this object-selective aspect could not be fully ex-plained by audio–visual interactions at the sensory input level(Wallace et al., 2004). Based on these findings, we would considerthat changes in an auditory attribute contribute to the establish-ment of a new object representation and that object representa-tions can be formed by the multimodal integration of auditoryand visual information in motion perception.

Similarly, Vroomen and de Gelder (2000) reported that the fre-quency change in an auditory stimulus elongated the visible per-sistence of a static visual stimulus. In their experiment, thesequence of visual dot patterns appeared on the display, synchro-nized with the sequence of auditory stimuli. They reported that thetransient change in the frequency of the auditory stimuli tempo-rarily ‘‘froze” the perception of the visual pattern, and thus thedetection performance for the positions of dot patterns was im-proved. The current study further revealed that the effect of thefrequency changes in auditory stimuli occurred in a dynamic situ-ation (i.e., motion), in which object representations were beingestablished and updated in response to incoming physical inputs.

Most importantly, our original finding is that the effect of thefrequency changes of the auditory stimuli occurred in an object-selective manner. A recent paper (Ngo & Spence, 2010) suggeststhat a transient auditory signal could also induce the freezing ef-fect, as was also shown by Vroomen and de Gelder (2000). In con-trast, transient auditory signals did not lead to an obvious effect inthe present research. In this respect, the auditory onset signal didnot produce elongation of visible persistence when sounds werenot associated with the moving visual stimuli (Experiment 2). Fur-thermore, the auditory offset signal did not show a clear effecteven when the sounds were accompanied by moving visual stimulibefore the offset was presented (Experiment 3). A previous study(Moore et al., 2007) suggested that it was not changes of object fea-ture (size) itself but the resulting object-level change that was fun-damental to elongation of visible persistence in the motion display,where the visual system was regarded as reducing the duration ofvisible persistence through establishing the perception of a singlemoving object from continuous physical inputs (Burr, 1980). Ourfindings similarly suggest that changes of auditory attributes couldaffect visible persistence when auditory information is encoded asan object attribute of moving visual objects. Moreover, it was alsosuggested that relative changes of an auditory attribute (i.e., fre-quency), as bound to a visual stimulus, is important for the elonga-tion of visible persistence or the establishment of a new objectrepresentation rather than the transient changes of sounds (i.e.,onset or offset). Whereas the underlying mechanism leading tothe freezing effect could be considered as a multimodal interactionat a sensory or attentional level (Ngo & Spence, 2010), the currentfindings would be mainly based on an object-level audio–visualinteraction.

One could assume that a congruency effect exists between thesounds and visual stimuli. For example, Gallace and Spence(2006) reported a synesthetic congruency effect, where high-fre-

Page 7: Sound can prolong the visible persistence of moving visual objects

S. Hidaka et al. / Vision Research 50 (2010) 2093–2099 2099

quency tones were found to be naturally associated with small cir-cles or vice versa. In our motion display, the visual stimuli becamesmaller in the Visual-Change condition so that the frequencychanges from the higher to the lower tone might have been moreeffective than the opposite changes. In order to check this possibil-ity, we compared the d-primes of H to L and L to H sequences in theAuditory-Change condition with regard to the With-Visual changecondition for Experiment 1, and found no difference (see Supple-mental Fig. S2). As indicated by previous research (Gallace &Spence, 2006), the synesthetic congruency effect could be basedon cognitive processing or associative learning. We speculate, how-ever, that elongation of visible persistence would occur at a per-ceptual processing stage. It might further be possible that thepair of audio–visual stimuli used in our experiment was incongru-ent with regard to associative learning in terms of triggering thesynesthetic congruency effect. A detailed investigation regardingthis issue is, however, beyond the scope of the current study andwould be better addressed through future research.

Some researchers have recently reported that audio–visualinteractions occurred in the object recognition process. Laurientiet al. (2004) demonstrated that the reaction time to detect a visualblue circle was decreased when the visual stimulus was accompa-nied by auditory stimuli that were semantically congruent withthe visual stimuli (blue, in this case). It was also demonstrated thatsemantically similar or congruent crossmodal stimuli enhancedthe activation of brain areas involved in multimodal informationprocessing (Amedi, von Kriegstein, van Atteveldt, Beauchamp, &Naumer, 2004). For example, the presentation of pictures of ani-mals (e.g., a cat) with their unique sounds (meow) was reportedto enhance the activation of the inferior frontal cortex and superiortemporal sulcus (Hein et al., 2007) and induce faster activation ofthe superior temporal gyrus (Alpert et al., 2008). These findingsindicate that the object-level interactions between the auditoryand visual stimuli occur at the cognitive level. However, it is alsoa fact that cognitive-level object representations are not congeni-tally established but are developmentally acquired by paired asso-ciate learning between the auditory and visual inputs(Morrongiello, Fenwick, & Chance, 1998). The present findings sug-gest that object-level audio–visual interactions occur at the per-ceptual level and that the perceptual systems integrate auditoryand visual information as attributes of object representation at thatlevel. In consideration of the previous findings (Amedi et al., 2004;Laurienti et al., 2004; Morrongiello et al., 1998), we could assumethat this mechanism would contribute to or mediate the establish-ment of cognitive- or semantic-level object representations in la-ter, relatively higher processing stages.

7. Conclusions

The current study demonstrated that an abrupt change in anattribute of contingent auditory stimuli could contribute to theelongation of visible persistence of an apparently moving visualobject in an object-selective manner. The results also confirmedthat transient auditory signals were not a decisive factor in the cur-rent findings. These results suggest that object representations canbe formed by means of the multisensory integration of auditoryand visual information in motion perception and that object-levelaudio–visual interaction can occur at the perceptual level.

Acknowledgments

We are grateful to two anonymous reviewers for their valuableand insightful comments and suggestions for early versions of the

manuscript. This research was supported by a Grant-in-Aid forSpecially Promoted Research (No. 19001004, Spatiotemporal inte-gration of multimodal sensory information).

Appendix A. Supplementary material

Supplementary data associated with this article can be found, inthe online version, at doi:10.1016/j.visres.2010.07.021.

References

Alpert, G. F., Hein, G., Tsai, N., Naumer, M. J., & Knight, R. T. (2008). Temporalcharacteristics of audiovisual information processing. The Journal ofNeuroscience, 28, 5344–5349.

Amedi, A., von Kriegstein, K., van Atteveldt, N. M., Beauchamp, M. S., & Naumer, M. J.(2004). Functional imaging of human crossmodal identification and objectrecognition. Experimental Brain Research, 166, 559–571.

Anscombe, F. J. (1956). On estimating binomial response relations. Biometrika, 43,461–464.

Brainard, D. H. (1997). The Psychophysics toolbox. Spatial Vision, 10, 433–436.Burr, D. (1980). Motion smear. Nature, 284, 164–165.Ernst, M. O., & Bülthoff, H. H. (2004). Merging the senses into a robust percept.

Trends in Cognitive Science, 8, 162–169.Gallace, A., & Spence, C. (2006). Multisensory synesthetic interactions in the

speeded classification of visual size. Perception & Psychophysics, 68,1191–1203.

Hein, G., Doehrmann, O., Müller, N. G., Kaiser, J., Muckli, L., & Naumer, M. J. (2007).Object familiarity and semantic congruency modulate responses in corticalaudiovisual integration areas. The Journal of Neuroscience, 27, 7881–7887.

Howard, I. P., & Templeton, W. B. (1966). Human spatial orientation. New York:Wiley.

Kahneman, D., Treisman, A., & Gibbs, B. J. (1992). The reviewing of object files:Object-specific integration of information. Cognitive Psychology, 24,175–219.

Kubovy, M., & van Valkenburg, D. (2001). Auditory and visual objects. Cognition, 80,97–126.

Laurienti, P. J., Kraft, R. A., Maldjian, J. A., Burdett, J. H., & Wallace, M. T. (2004).Semantic congruence is a critical factor in multisensory behavioralperformance. Experimental Brain Research, 158, 405–414.

Macmillan, N. A., & Creelman, C. D. (2004). Detection theory: A user’s guide (2nd ed.).New Jersey: Lawrence Erlbaum Associates Inc.

Moore, C. M., & Enns, J. T. (2004). Object updating and the flash-lag effect.Psychological Science, 15, 866–871.

Moore, C. M., Mordkoff, J. T., & Enns, J. T. (2007). The path of least persistence. Objectstatus mediates visual updating. Vision Research, 47, 1624–1630.

Morrongiello, B. A., Fenwick, K. D., & Chance, G. (1998). Crossmodal learning innewborn infants: Inferences about properties of auditory-visual events. InfantBehavior & Development, 21, 543–554.

Ngo, M. K., & Spence, C. (2010). Crossmodal facilitation of masked visual targetdiscrimination by informative auditory cuing. Neuroscience letters, 479,102–106.

Noesselt, T., Bergmann, D., Hake, M., Heinze, H. J., & Fendrich, R. (2008). Soundincreases the saliency of visual events. Brain Research, 1220, 157–163.

Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics:Transforming numbers into movies. Spatial Vision, 10, 437–442.

Scholl, B. J. (2001). Spatiotemporal priority and object identity. Cahiers dePsychologie Cognitive, 20, 359–371.

Shams, L., Kamitani, Y., & Shimojo, S. (2000). Illusions. What you see is what youhear. Nature, 408, 788.

Sheth, B. R., & Shimojo, S. (2004). Sound-induced recovery from and persistenceagainst visual filling-in. Vision Research, 44, 1907–1917.

Sorkin, R. D. (1999). Spreadsheet signal detection. Behavior Research Methods,Instruments, & Computers, 31, 46–54.

Stein, B. E., London, N., Wilkinson, L. K., & Price, D. D. (1996). Enhancement ofperceived visual intensity by auditory stimuli: A psychophysical analysis.Journal of Cognitive Neuroscience, 8, 497–506.

Vroomen, J., & de Gelder, B. (2000). Sound enhances visual perception: Cross-modaleffects of auditory organization on vision. Journal of Experimental Psychology:Human Perception and Performance, 26, 1583–1590.

Vroomen, J., & de Gelder, B. (2004). Temporal ventriloquism: Sound modulates theflash-lag effect. Experimental Psychology: Human Perception and Performance, 30,513–518.

Wallace, M. T., Roberson, G. E., Hairston, W. D., Stein, B. E., Vaughan, J. W., &Schirillo, J. A. (2004). Unifying multisensory signals across time and space.Experimental Brain Research, 158, 252–258.

Welch, R. B., & Warren, D. H. (1986). Intersensory interactions. In K. R. Boff, L.Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance(pp. 25.1–25.36). New York: Wiley.