10
http://pss.sagepub.com/ Psychological Science http://pss.sagepub.com/content/25/7/1448 The online version of this article can be found at: DOI: 10.1177/0956797614533571 2014 25: 1448 originally published online 2 June 2014 Psychological Science Linda Polka, Matthew Masapollo and Lucie Ménard Who's Talking Now? Infants' Perception of Vowels With Infant Vocal Properties Published by: http://www.sagepublications.com On behalf of: Association for Psychological Science can be found at: Psychological Science Additional services and information for http://pss.sagepub.com/cgi/alerts Email Alerts: http://pss.sagepub.com/subscriptions Subscriptions: http://www.sagepub.com/journalsReprints.nav Reprints: http://www.sagepub.com/journalsPermissions.nav Permissions: What is This? - Jun 2, 2014 OnlineFirst Version of Record - Jul 14, 2014 Version of Record >> at SEIR on November 19, 2014 pss.sagepub.com Downloaded from at SEIR on November 19, 2014 pss.sagepub.com Downloaded from

Who's Talking Now? Infants' Perception of Vowels With Infant Vocal Properties

  • Upload
    l

  • View
    212

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Who's Talking Now? Infants' Perception of Vowels With Infant Vocal Properties

http://pss.sagepub.com/Psychological Science

http://pss.sagepub.com/content/25/7/1448The online version of this article can be found at:

 DOI: 10.1177/0956797614533571 2014 25: 1448 originally published online 2 June 2014Psychological Science

Linda Polka, Matthew Masapollo and Lucie MénardWho's Talking Now? Infants' Perception of Vowels With Infant Vocal Properties

  

Published by:

http://www.sagepublications.com

On behalf of: 

  Association for Psychological Science

can be found at:Psychological ScienceAdditional services and information for    

  http://pss.sagepub.com/cgi/alertsEmail Alerts:

 

http://pss.sagepub.com/subscriptionsSubscriptions:  

http://www.sagepub.com/journalsReprints.navReprints:  

http://www.sagepub.com/journalsPermissions.navPermissions:  

What is This? 

- Jun 2, 2014OnlineFirst Version of Record  

- Jul 14, 2014Version of Record >>

at SEIR on November 19, 2014pss.sagepub.comDownloaded from at SEIR on November 19, 2014pss.sagepub.comDownloaded from

Page 2: Who's Talking Now? Infants' Perception of Vowels With Infant Vocal Properties

Psychological Science2014, Vol. 25(7) 1448 –1456© The Author(s) 2014Reprints and permissions: sagepub.com/journalsPermissions.navDOI: 10.1177/0956797614533571pss.sagepub.com

Research Article

Understanding spoken language involves categorization processes at many levels. The perception of phonemes, the smallest speech units that carry meaning, entails rec-ognizing phonetic categories in the presence of enor-mous acoustic variability. One of the largest sources of acoustic variability that the perceiver must contend with arises from the wide variation in physical attributes of different talkers related to age and gender (Kuhl, 2004). In particular, acoustic speech patterns are determined by length of the vocal folds and length of the vocal tract, which are highly correlated with an individual’s height (Fitch & Giedd, 1999). Developmental research has dem-onstrated that young infants can differentiate phonetic categories across different talkers (man, woman, child) in some test situations ( Jusczyk, Pisoni, & Mullennix, 1992; Kuhl, 1979, 1983). Other research, however, sug-gests that this process is initially difficult for infants, par-ticularly when the task is more complex or the talkers are more acoustically dissimilar (e.g., Houston & Jusczyk, 2000).

Currently, research is focused on understanding how the ability to process phonetic information in complex multitalker contexts develops and improves during infancy. This work is motivated by a desire to understand how speech comprehension skills are acquired. The abil-ity to track phonetic information across talker differences has not been addressed from a perspective that considers the perceptual resources that infants need to acquire speech production skills. Recent findings show that, like adults, 4-year-olds not only recognize the phonetic equivalence between self-generated speech and speech sounds produced by other talkers, but they can also use this skill to monitor and adjust their vocal output as they speak (MacDonald, Johnson, Forsythe, Plante, & Munhall,

533571 PSSXXX10.1177/0956797614533571Polka et al.Infants’ Perception of Infant Vowelsresearch-article2014

Corresponding Author:Linda Polka, School of Communication Sciences and Disorders, McGill University, 2001 McGill College, 8th floor, Montreal, Quebec, Canada H3A 1G1 E-mail: [email protected]

Who’s Talking Now? Infants’ Perception of Vowels With Infant Vocal Properties

Linda Polka1,2, Matthew Masapollo1,2, and Lucie Ménard2,3

1School of Communication Sciences and Disorders, McGill University; 2Centre for Research on Brain, Language and Music, McGill University; and 3Département de Linguistique, Université du Québec à Montréal

AbstractLittle is known about infants’ abilities to perceive and categorize their own speech sounds or vocalizations produced by other infants. In the present study, prebabbling infants were habituated to /i/ (“ee”) or /a/ (“ah”) vowels synthesized to simulate men, women, and children, and then were presented with new instances of the habituation vowel and a contrasting vowel on different trials, with all vowels simulating infant talkers. Infants showed greater recovery of interest to the contrasting vowel than to the habituation vowel, which demonstrates recognition of the habituation-vowel category when it was produced by an infant. A second experiment showed that encoding the vowel category and detecting the novel vowel required additional processing when infant vowels were included in the habituation set. Despite these added cognitive demands, infants demonstrated the ability to track vowel categories in a multitalker array that included infant talkers. These findings raise the possibility that young infants can categorize their own vocalizations, which has important implications for early vocal learning.

Keywordsspeech perception, infancy, vowel categorization, babbling, talker variability

Received 10/18/13; Revision accepted 4/1/14

at SEIR on November 19, 2014pss.sagepub.comDownloaded from

Page 3: Who's Talking Now? Infants' Perception of Vowels With Infant Vocal Properties

Infants’ Perception of Infant Vowels 1449

2012). However, it is unknown whether infants can accu-rately perceive their own speech or speech produced by another infant. Research has been silent on this issue because infant-generated speech has not been imple-mented in controlled perceptual experiments.

There are currently two perspectives regarding the relationship between speech perception and production capacities in early development. These views make dif-ferent assumptions about the perceptual resources that young infants need to engage in vocal learning. According to one view—which we call the high-resource/imitation view—infants’ perceptual skills develop well in advance of production skills and provide a critical infrastructure that supports emerging production skills via imitation (Kuhl & Meltzoff, 1996). In this view, young infants can access phonetic category information across different talkers (including infants), which they can use to learn to imitate phonetic categories in the ambient language. This account is bolstered by evidence that 3- to 5-month-olds modified their vocalizations in response to target vowels produced (noninteractively) by an adult on a television (Kuhl & Meltzoff, 1996).

According to another view—which we term the low-resource/interaction view—speech perception and pro-duction skills develop concurrently, guided by exchanges in an interactive context (Howard & Messum, 2011; Zlatin & Koenigsknecht, 1976). Accordingly, infants are not obliged to interpret their own utterances; they can rely on caregiver’s imitative and affective responses to indicate when their productions perceptually match “target” sounds in the ambient language. Thus, vocal learning can proceed even if infant perceptual resources are low or partially developed. Supporting this view, studies show that caregivers frequently imitate the vocalizations of their young infants (Pawlby, 1977) and provide social stimula-tion (e.g., smiling or touching), which facilitates more advanced vocal behavior (Goldstein, King, & West, 2003).

Crucially, the ability to recognize phonetic categories in infant speech is a prerequisite in the high-resource/

imitation view, but not in the low-resource/interaction view. Thus, findings pertaining to infant perception of infant speech speak to the conceptual merits of each view. Using technical advances in speech synthesis to generate infant speech, we investigated, for the first time, how infants perceive vowels produced by other infant talkers. This can be a challenging task for infants for sev-eral reasons.

First, vowels produced by an infant are acoustically distinct because an infant’s vocal folds and vocal tract are much shorter than those of an adult or a child (Kent & Murray, 1982; Kuhl & Meltzoff, 1996; Ménard, Schwartz, & Boe, 2004; Rvachew, Mattock, Polka, & Ménard, 2006; Rvachew, Slawinski, Williams, & Green, 1996; Vorperian & Kent, 2007). The fundamental frequency (correspond-ing to voice pitch) and the formant frequencies (corre-sponding to the vocal tract resonances) observed in infant vocalizations are well above the values in adult or child speech. This is illustrated in spectrograms of adult, child, and infant vowel sounds in Figure 1. Vowel sounds are characterized by acoustic energy concentrated in sev-eral narrow frequency bands known as formants. The first two formants (F1 and F2) provide critical information for vowel identity. For example, the vowel /i/ (“ee”) has a low F1 and high F2 frequency, whereas /a/ (“ah”) has a high F1 and an intermediate F2 frequency. Vowel for-mant frequencies are typically plotted with the F1 and F2 axes reversed, as in Figures 2 and 3. In such displays, the corner vowels /i/ “ee,” /a/ “ah,” and /u/ “oo” correspond to extreme articulatory postures (e.g., high front for “ee,” high back for “oo,” and fully open for “ah”), and the resulting space encompasses all possible vowel sounds for a given vocal tract length. As illustrated in Figures 2 and 3, the infant acoustic vowel space overlaps only par-tially with the adult and child acoustic space. Thus, intro-ducing infant vowels increases the range of acoustic variation that infants encounter in the speech they hear.

The second reason that it may be challenging for infants to perceive vowels produced by other infant

Time (s)0 0.5

0

104 104 104 104Fr

eque

ncy

(Hz)

Time (s)0 0.5

0

Time (s)0 0.5

0

Time (s)0 0.5

0

Fig. 1. Spectrograms of the vowel /i/ (“ee”) with the vocal properties of (from left to right) a 6-month-old infant, an 8-year-old child, an adult female, and an adult male.

at SEIR on November 19, 2014pss.sagepub.comDownloaded from

Page 4: Who's Talking Now? Infants' Perception of Vowels With Infant Vocal Properties

1450 Polka et al.

talkers is that until they begin to babble, most infants have limited experience listening to infant speech. Third, tracking phonetic information across acoustically dissimi-lar adult talkers is cognitively demanding for infants (Houston & Jusczyk, 2000). Together, these factors sug-gest that perceiving vowels in a multitalker context that includes infant talkers may be particularly difficult for young infants.

We addressed these issues in two experiments designed to determine whether young (4- to 6-month-old) infants, who are not yet producing canonical babble, can recog-nize the same vowel when produced by adult, child, and infant talkers. We synthesized isolated vowels, /i/ (“ee”) and /a/ (“ah”), to simulate productions by men, women, children, and infants. At 4 to 6 months, infants are just beginning to develop productive control of fully resonant vowels; their vowel productions typically fall within a nar-row range of the vowel space, with front, low, and central vowels preferred. Expansion of the infant vowel space to stabilize the corner vowels—/i/, /u/, and /a/—requires another year of practice (Rvachew et al., 2006; Rvachew et al., 1996). Thus, infants in the present study had only limited exposure (from their own speech) to the full range of vowel qualities that can be produced by an infant vocal tract (Kent & Murray, 1982).

We chose the vowels /i/ and /a/ because previous research has shown that 5- to 6-month-olds can perceive this phonetic contrast in a multitalker context. Using the conditioned head-turn procedure, Kuhl (1979) tested infants with vowels synthesized to simulate the produc-tions of men, women, and children. Similarly, we used vowels that were synthesized to control for acoustic dimensions that are not related to talker age and gender, but we tested infants with the visual-habituation proce-dure, which does not involve conditioning or training. In this procedure, infants were first habituated to a set of vowel exemplars (e.g., /i/) produced by different talkers. As soon as the infants habituated, we presented four test trials—two with new exemplars from the same (familiar) vowel category (e.g., /i/) and two with new exemplars from a novel vowel category (e.g., /a/). Crucially, all vow-els in test trials (familiar and novel) were produced by a new talker who was not encountered during habituation. If infants had formed a category representation for the habituation vowel, we expected them to recognize the novel vowel as belonging to a different phonetic cate-gory and to show this by recovering interest and listening longer to the novel vowel than to the familiar vowel. Infants may also recognize the change in talker and listen longer to the familiar vowel (produced by a new talker) compared with the final habituation trials. However, despite recognizing the talker change, infants should show a larger recovery of interest to the novel than to the familiar test vowel if they recognize that the familiar test vowel is another instance of the vowel presented during habituation.

Experiment 1

Method

Participants. Data from 56 infants, aged 4 to 6 months (36 male, 20 female; mean age = 161 days, range = 138–197 days), were analyzed; all infants were exposed to languages in which /i/ and /a/ are phonemic, with the majority being from English- or French-speaking families. Twenty-three additional infants were excluded because of fussiness (n = 15), caregiver interference (n = 3), or experimental error (n = 5). All were full term and had no known health problems.

Stimuli. We selected eight /i/ and eight /a/ isolated vowels from a large corpus of vowels synthesized using the variable linear articulatory model (VLAM), described in Ménard et  al. (2004). This corpus simulates talkers across a broad age range from infancy to adulthood. The selected vowels simulate eight different talkers: two 6-month-old infants; an 8-, 10-, and 12-year-old child; two adult females; and one adult male. Example

5101520

2

3

4

5

6

7

8

9

10

11

12

F2 (Bark units)

F1 (B

ark

units

)

i21 u21

a21

u0

a0

i0

Fig. 2. A simulation of the maximal acoustic vowel spaces of an adult male and an infant. As in a standard vowel plot, each vowel in the simulation set is depicted with the frequency of the first formant (F1) plotted on the y-axis and the frequency of the second formant (F2) on the x-axis. For the infant, the corner vowels /i/ (“ee”), /a/ (“ah”), and /u/ (“oo”), which encompass the full range of vowel articulations, are represented by circles and labeled “i0, a0, u0”; for the adult, the corner vowels are represented by stars and labeled “i21, a21, u21.” The axes are scaled in Bark units, which transform formant frequencies into units of equal perceptual distance. (To convert from Bark to Hz, use the following formula: F(Hz) = 650 × sinh(F(Bark)/7); to convert from Hz to Bark, use this formula: F(Bark) = 7 × asinh(F(Hz)/650); see also Schroeder, Atal, and Hall, 1979.)

at SEIR on November 19, 2014pss.sagepub.comDownloaded from

Page 5: Who's Talking Now? Infants' Perception of Vowels With Infant Vocal Properties

Infants’ Perception of Infant Vowels 1451

spectrograms are shown in Figure 1. Details of the VLAM synthesis and acoustic description of the vowels are pro-vided in the Supplemental Material available online.

All vowels were 500-ms long and matched in intensity and intonation contour. The stimuli were judged to be intelligible, natural-sounding exemplars of each vowel category by English- and French-speaking adults. Adults also accurately identified the age and gender differences simulated in the stimulus set. For testing, we created 16 stimulus files (one per vowel for each talker); each 30-s

file included 20 repetitions of the same vowel with 1-s interstimulus intervals.

Procedure. Infants were tested using the visual- habituation (look-to-listen) procedure (Polka, Jusczyk, & Rvachew, 1995). The infant sat on the caregiver’s lap at a distance of about 150 cm facing a 21-in. television moni-tor in a dimly lit, curtained, soundproof booth. Audiotrak ( Jooan-Dong, Nam-Gu, Incheon, Korea) BSI-90 loud-speakers and a Sony digital video camera were located

0

2

4

6

8

10

12

05101520

F1 (B

ark

units

)

F2 (Bark units)

Male (M)

Female (F1, F2)

8-Year-Old Child (C8)

10-Year-Old Child (C10)

12-Year-Old Child (C12)

Infant (IN1, IN2)

/i/

/a/

/u/

  Habituation Test  Pretest Number of Trials Varied Four Trials Posttest

Experiment 1 Music iM iC8 iF2 iM iC12 iF2 … Until Criterion iIN1 aIN2 aIN1 iIN2 or aIN2 iIN1 iIN2 aIN1 Music

Music aM aC8 aF1 aM aC12 aF2 … Until Criterion iIN1 aIN2 aIN1 iIN2 or aIN2 iIN1 iIN2 aIN1 Music

 

    Number of Trials Varied Four Trials  

Experiment 2 Music iIN1 iC8 iF1 iIN2 iC12 iF2 … Until Criterion iM aM aM iM or aM iM iM aM Music

Fig. 3. Stimuli and design of the two experiments. The graph shows the frequency of the first formant (F1) as a function of the frequency of the second formant (F2) for the /i/ (“ee”) and /a/ (“ah”) vowel stimuli used in the present study, separately for each speaker type. Frequencies for the /u/ (“oo”) vowel are also plotted. The axes are scaled in Bark units (see Fig. 2). The bottom panel shows an example of the vowel stimuli presented in each phase of Experiments 1 and 2. In the habituation phase of Experiment 1, participants heard either /i/ or /a/ vowels produced by an adult male (M), two adult females (F1, F2), and 8-, 10-, and 12-year-old children (C8, C10, and C12, respectively). In the test phase, there were four trials, two with the same vowel heard during habituation and two with a different vowel, with the order of the familiar and the novel vowels varying across the two possibilities shown. Each vowel in the test phase was produced by one of two 6-month-old infants (IN1, IN2). The design of Experiment 2 was the same as that of Experiment 1, except that during the habituation phase, only the /i/ vowel was spoken and the male voice was replaced by an infant voice, whereas during the test phase, an adult male voice was heard instead of an infant voice.

at SEIR on November 19, 2014pss.sagepub.comDownloaded from

Page 6: Who's Talking Now? Infants' Perception of Vowels With Infant Vocal Properties

1452 Polka et al.

behind the curtain just below the TV screen; the camera lens protruded through an opening in the curtain. An experimenter observed the infant outside of the testing room on a monitor linked to the video camera. The care-giver wore noise-canceling headphones and listened to masking music during the entire procedure to avoid influencing the infant’s behavior. Habit software (Cohen, Atkinson, & Chaput, 2000) was used to present stimuli and record looking (listening) times, which we refer to as listening times for simplicity.

At the start of each trial, a red flashing light was pre-sented to direct attention, followed by a black-and-white checkerboard. The experimenter (who could not hear the stimuli) pressed a key when the infant fixated on the

checkerboard; this activated the auditory stimulus and provided an index of the infant’s listening time. When the infant looked away for more than 2 s, the sound stopped and the screen went black. The minimum look time for a trial was 1 s. If the infant looked away for less than 2 s, the sound continued to play but the look-away time was not included in the looking time for that trial. The trial was terminated when the infant looked away for more than 2 s or when the complete stimulus file (30-s long) had played. After a brief pause, the red flashing light returned to start the next trial.

Design. The experiment consisted of four consecutive phases: pretest, habituation, test, and posttest (see Fig. 3), with no breaks or pauses between. Instrumental music was presented during pre- and posttest trials. On each habituation and test trial, infants heard a vowel produced repeatedly by the same talker; a different talker was pre-sented in each trial. The vowel presented during habitu-ation was counterbalanced across participants. During habituation, the order of talkers was randomized within blocks: Each block contained three trials in which a man, a woman, and a child talker spoke. The software tracked a running average of listening time across a three-trial window. The habituation criterion was met when the running average dropped below 65% of the longest three-trial average for that infant. Thus, the number of habituation trials varied across infants; however, all infants completed at least four habituation trials (most completed six or more) and were exposed to all three talker types (man, woman, child).

During the test phase, there were four trials containing only vowels produced by infants; two trials contained the same vowel as during habituation (F = familiar), and two contained the contrasting vowel not heard during habitu-ation (N = novel). Test trials were presented in one of two fixed orders: FNNF or NFFN. Infants were assigned to four conditions (two habituation conditions, two test orders) as shown in Figure 3.

Results

Data from the two habituation groups were combined because they showed no differences in total habituation time, number of habituation trials, or posttest listening times. For each infant, listening time was averaged across the last two trials in habituation and for each test-trial type (novel, familiar).

Group means (collapsed across habituation condition and test-trial order) are shown in Figure 4a. The scores were submitted to a mixed analysis of variance (ANOVA) with test-trial order (FNNF vs. NFFN) as a between-sub-jects factor and trial type (habituation vs. familiar vs.

0

2

4

6

8

10

12

14

16

Last Two Familiar

Mea

n Li

sten

ing

Tim

e (s

)

Experiment 1

0

2

4

6

8

10

12

14

16

Mea

n Li

sten

ing

Tim

e (s

)

Experiment 2b

a

Novel

Last Two Familiar Novel

**

*

Test TrialsHabituation Trials

Habituation Trials Test Trials

Fig. 4. Mean listening time to the last two habituation trials and to the novel and familiar test trials in (a) Experiment 1 and (b) Experiment 2. Error bars show standard errors of the mean. The asterisk indicates a significant difference between trial types (p < .05).

at SEIR on November 19, 2014pss.sagepub.comDownloaded from

Page 7: Who's Talking Now? Infants' Perception of Vowels With Infant Vocal Properties

Infants’ Perception of Infant Vowels 1453

novel) as a within-subjects factor. There was no main effect of test-trial order or interaction of test-trial order with trial type, F < 1. There was a main effect of trial type, F(2, 108) = 23.81, p < .001, ηp

2 = .306. Post hoc compari-sons showed that infants listened longer to the novel (M = 11.7 s, SD = 6.3) than to the familiar test vowel, (M = 9.8 s, SD = 6.1), t(55) = −2.46, p = .017, r2 = .314, and longer to the familiar test vowel than to the habituation vowel (M = 6.5 s, SD = 3.0), t(55) = 4.30, p < .001, r2 = .501.

Discussion

Experiment 1 shows that young infants can track a change in vowel category and a change in talker when they hear vowels produced by an infant talker. Thus, young infants have some ability to recognize vowel categories across acoustically and perceptually distinct talkers. Because infants encountered the infant vowels at the end of the task, Experiment 1 provided little insight into the process-ing demands involved in this task. Experiment 2 addressed this issue.

Experiment 2

Perceiving speech in a multitalker context entails some processing costs for infants as well as adults (e.g., Houston & Jusczyk, 2000; Jusczyk et al., 1992; Mullennix & Pisoni, 1990; Schmale & Seidl, 2009). Moreover, stimu-lus complexity affects infant processing and category for-mation in many domains (Hunter & Ames, 1988). Thus, we reasoned that because infant speech augments stimu-lus complexity, it might increase the cognitive demands associated with perceiving speech in a multitalker con-text. We tested this prediction in Experiment 2 using the same task and stimuli as in Experiment 1, but with one change—the infant vowels were included in the habitua-tion set (replacing the adult male vowels), and only adult male vowels were presented in the test phase (replacing the infant vowels; see Fig. 3). This manipulation increased the acoustic variability present in the habituation set, but not the number of talkers. We predicted that this change would raise the stimulus complexity encountered during habituation and increase cognitive demands in the cate-gorization task.

If this turned out to be the case, then infants would need more time to encode the phonetic category informa-tion in the habituation stage well enough to recognize the novel category in the test phase. This leads to two specific predictions. First, infants will listen longer during habitua-tion in Experiment 2 compared with Experiment 1 before reaching the habituation criterion. Second, infants’ catego-rization performance will be affected by their level of engagement during habituation, that is, total listening time during habituation will be positively correlated with the

magnitude of the novelty effect observed in test trials. It is important to note that prior categorization studies using this paradigm show that individual differences in catego-rization ability are typically negatively correlated with total habituation time. That is, in both auditory and visual tasks, infants who are better categorizers process stimuli more efficiently and thus require less time to habituate to a stimulus set (Arterberry & Bornstein, 2002; Polka, Rvachew, & Molnar, 2008). Thus, a positive correlation between total habituation time and novelty score reflects an increase in stimulus complexity and processing load rather than individual differences in categorization ability. (Total habituation time and magnitude of novelty scores were not correlated in Experiment 1; r2 = .209, p = .122.)

Method

Participants. Data from 27 infants, aged 4 to 6 months (10 males, 17 females; mean age = 158 days, range = 132–195 days), were analyzed; infants’ language back-ground was the same as in Experiment 1. Nine additional infants were excluded because of fussiness (n = 7), fail-ure to habituate (n = 1), or experimental error (n = 1). All were full term with no known health problems.

Stimuli and procedure. The stimuli and procedure were the same as in Experiment 1.

Design. The design was identical to that in Experiment 1, except that all infants were habituated to /i/, the habit-uation trials included simulations of vowels spoken by an infant (but not a man), and vowels presented on test trials simulated an adult male.

Results

Total listening time during habituation (summed across habituation trials) was computed for each infant. Group means for infants in Experiment 2 and the /i/ habituation condition of Experiment 1 are plotted in Figure 5, which shows that infants listened significantly longer during habituation in Experiment 2 compared with Experiment 1, t(55) = −3.41, p = .001, r2 = .417. Thus, as predicted, infants required more time to encode the variability asso-ciated with the different talkers when infant vowels were present in the habituation set (M = 118.1 s, SD = 12.9) compared with when only adult and child vowels were present (M = 71.6 s, SD = 5.7); the number of talkers was the same in both experiments. There was no difference between experiments in the number of habituation trials needed to reach criterion or in posttest listening times.

For each infant, listening time was averaged across the last two habituation trials and for each test-trial type

at SEIR on November 19, 2014pss.sagepub.comDownloaded from

Page 8: Who's Talking Now? Infants' Perception of Vowels With Infant Vocal Properties

1454 Polka et al.

(novel, familiar). Group means for these scores (col-lapsed across test-trial order) are shown in Figure 4b. As in Experiment 1, these scores were submitted to a mixed ANOVA with test-trial order (FNNF vs. NFFN) as a between-subjects factor and trial type (habituation vs. familiar vs. novel) as a within-subjects factor. There was no main effect of test-trial order or interaction of test-trial order with trial type, F < 1. There was a main effect of trial type, F(2, 50) = 3.57, p = .035, ηp

2 = .125. Post hoc comparisons showed that infants listened longer to the novel test vowel (M = 11.9 s, SD = 6.7) than to the familiar test vowel (M = 8.6 s, SD = 4.0), t(26) = −2.29, p = .031, r2 = .409. Listening times to the habituation vowel (M = 9.3 s, SD = 3.8) and to the familiar test vowel were not significantly different, t(26) = 0.633, p = .532.

To compare the novelty response across experiments, we conducted a mixed ANOVA with experiment (Experiment 1: /i/ habituation group vs. Experiment 2) as a between-subjects factor and test-trial type (familiar vs. novel) as a within-subjects factor. There was no effect of experiment or interaction of experiment with test-trial type, F < 1. There was only a main effect of test-trial type, F(1, 55) = 5.10, p = .028, ηp

2 = .085, which shows that the novelty response was comparable across experiments. As predicted, total habituation time was positively correlated with the size of the novelty score (listening time on novel test trials minus listening time on familiar test trials; r2 = .460, p = .014). A follow-up analysis showed that roughly half of the infants in Experiment 2 (n = 13) displayed total habituation listening times comparable with those in Experiment 1 (within 1 SD), and the other half (n = 14) listened much longer, with levels more than 1 standard

deviation above the mean of those in Experiment 1 (M = 164.19 s, SD = 68.95). As shown in Figure 6, a reliable nov-elty effect was observed only in the long-listener subgroup in Experiment 2, t(12) = −2.42, p = .032. Thus, increased listening time during habituation was clearly linked to suc-cessful recognition of the novel vowel in this task.

Discussion

As predicted, processing demands increased when the infant vowels were part of the stimulus set that infants needed to encode to form a habituation category. Despite the added demands, infants were able to track changes in vowel quality and displayed a novelty response compa-rable with the effect observed in Experiment 1. However, unlike in Experiment 1, the magnitude of the novelty score in Experiment 2 was directly related to the amount of listening time invested during habituation. Overall, Experiment 2 shows that including infant vowels in a mul-titalker context increases processing demands, but the added costs fall within the cognitive abilities of infants.

General Discussion

The perception of talker variability is a focal issue in research on infant speech perception. Until now, infant speech has been left out of the picture despite its rele-vance in infant speech development. The findings reveal that young infants’ ability to track vowel categories across talkers extends to infant vowel productions. We observed this ability in prebabbling infants who lack the motor skills required to produce the target vowels in a con-trolled way. Therefore, unless they engage in frequent interactions with older babies, they will have limited

0

20

40

60

80

100

120

140

160

Experiment 1 Experiment 2

Mea

n Li

sten

ing

Tim

e (s

)

*

Fig. 5. Mean listening time of the /i/ habituation groups in Experi-ment 1 and Experiment 2. Error bars show standard errors of the mean. The asterisk indicates a significant difference between experiments (p < .05).

0

5

10

15

20

25

Familiar

Mea

n Li

sten

ing

Tim

e (s

)

Trial TypeNovel Familiar Novel

Short Subgroup Long Subgroup

*

Fig. 6. Mean listening time during familiar and novel test trials for infants in the two habituation subgroups of Experiment 2. Error bars show standard errors of the mean. The asterisk indicates a significant difference between trial types (p < .05).

at SEIR on November 19, 2014pss.sagepub.comDownloaded from

Page 9: Who's Talking Now? Infants' Perception of Vowels With Infant Vocal Properties

Infants’ Perception of Infant Vowels 1455

exposure to infant speech. Moreover, infants displayed this skill in the visual-habituation procedure, which relies on infants’ spontaneous listening behavior with no con-ditioning or reinforcement to support or guide them. Thus, without explicit training or reinforcement, and with minimal exposure, infants were able to extrapolate beyond their immediate experience and adapt quickly to large acoustic shifts related to the size of the talker.

The precise mechanisms underlying this perceptual capacity are not fully understood. Previous research has suggested that tracking vowel quality across talker differ-ences related to physical attributes (size) may be a natural process involving innate auditory processes (Smith, Patterson, Turner, Kawahara, & Irino, 2005). Further work is also needed to determine the precise information infants used to recognize the same vowel category across talkers; this issue is not fully resolved even for adults ( Johnson, 2005). The vowel spaces plotted in Figure 2 show that it is not possible for infants to recognize vowel categories on the basis of raw formant values. Infants may be relying on complex relational or configural patterns, or more global spectral signal properties. Nevertheless, the present findings indicate that infants can process the “who” (talker) and the “what” (vowel quality) aspects of the speech signal across a diverse talker array.

Our results also show that prebabbling infants have some perceptual skills that may allow them to assess their own vocal output well before they are accomplished babblers. Thus, both the high-resource/imitation view and the low-resource/interaction view of early vocal learning remain viable. The perceptual skills documented here show that learning via auditory imitation alone is possible, at least when infants are targeting highly dis-tinct phonetic categories. Learning via interaction, which does not require highly refined perceptual skills, is also feasible and may well be critical when the target sounds involve more subtle phonetic differences. However, it is important to note that although the present findings uncover relevant perceptual skills, they do not show how infants perceive their own productions during vocal learning. This remains an important issue for future research.

Across both experiments, infants recognized the novel vowel. Yet listening time increased when the habituation set included infant vowels, and longer habituation times were clearly linked to recognition of the novel vowel. This suggests that cognitive demands increased when infant speech was part of the multitalker array during habituation. In other studies, infants have not adapted quickly and effectively to talker differences involving much smaller acoustic differences. The robust, rapid adjustments observed in this study may reflect a natural response to the sharp shift in acoustic variability or nov-elty tied to the infant signals, a perceptual bias favoring

infant speech, or some combination of these factors. It is noteworthy that high voice pitch and expanded vowel space characterize infant speech as well as infant-directed speech, which is known to attract infant attention and facilitate speech processing (Fernald & Kuhl, 1987). This raises the intriguing possibility that infant-directed speech may help prime the infant perceptual system to perceive infant vocalizations.

In summary, the present study provides the first evi-dence that prebabbling infants can recognize infant- produced vowels as phonetically similar to adult and child productions. The present study breaks new ground. Exploring how infants perceive speech with infant vocal properties brings researchers a step closer to understand-ing the interplay between speech perception and pro-duction in early language development.

Author Contributions

L. Polka developed the study concept. L. Ménard created the auditory stimuli. Infants were tested by M. Masapollo. All authors contributed to the data analysis and writing of the man-uscript. All authors approved the final version of the manuscript for submission.

Acknowledgments

We thank Leanne Ma and Sunah Jeon for assistance with infant recruitment and testing, and Athena Vouloumanos, Susan Rvachew, Kris Onishi, Michael Tyler, Terry Gottfried, and Janet Werker for feedback on this manuscript.

Declaration of Conflicting Interests

The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.

Funding

This work was supported by Discovery Grants from the Natural Sciences Engineering and Research Council awarded to L. Polka (DG 105397) and to L. Ménard (DG 312395).

Supplemental Material

Additional supporting information may be found at http://pss .sagepub.com/content/by/supplemental-data

References

Arterberry, M. E., & Bornstein, M. H. (2002). Variability and its sources in infant categorization. Infant Behavior & Development, 25, 515–528.

Cohen, L. B., Atkinson, D. J., & Chaput, H. H. (2000). Habit X: A new program for training and organizing data in infant perception and cognition studies (Version 1.0). Austin: University of Texas.

Fernald, A., & Kuhl, P. (1987). Acoustic determinants of infant preference for motherese speech. Infant Behavior & Development, 10, 279–293.

at SEIR on November 19, 2014pss.sagepub.comDownloaded from

Page 10: Who's Talking Now? Infants' Perception of Vowels With Infant Vocal Properties

1456 Polka et al.

Fitch, W. T., & Giedd, J. (1999). Morphology and development of the human vocal tract: A study using magnetic resonance imaging. Journal of the Acoustical Society of America, 106, 1511–1522.

Goldstein, M. H., King, A. P., & West, M. J. (2003). Social inter-action shapes babbling: Testing parallels between bird-song and speech. Proceedings of the National Academy of Sciences, USA, 100, 8030–8035.

Houston, D. M., & Jusczyk, P. W. (2000). The role of talker-specific information in word segmentation by infants. Journal of Experimental Psychology: Human Perception and Performance, 26, 1570–1582.

Howard, I. S., & Messum, P. (2011). Modeling the develop-ment of pronunciation in infant speech acquisition. Motor Control, 15, 85–117.

Hunter, M. A., & Ames, E. W. (1988). A multifactor model of infant preferences for novel and familiar stimuli. In L. P. Lipsitt (Ed.), Advances in child development and behavior (pp. 69–95). New York, NY: Academic Press.

Johnson, K. (2005). Speaker normalization in speech perception. In D. B. Pisoni & R. Remez (Eds.), The handbook of speech perception (pp. 363–389). Oxford, England: Blackwell.

Jusczyk, P. W., Pisoni, D. B., & Mullennix, J. (1992). Some con-sequences of stimulus variability on speech processing by 2-month-old infants. Cognition, 43, 252–291.

Kent, R. D., & Murray, A. D. (1982). Acoustic features of infant vocalic utterances at 3, 6, and 9 months. Journal of the Acoustical Society of America, 72, 353–365.

Kuhl, P. K. (1979). Speech perception in early infancy: Perceptual constancy for spectrally dissimilar vowel categories. Journal of the Acoustical Society of America, 66, 1668–1679.

Kuhl, P. K. (1983). Perception of auditory equivalence classes for speech in early infancy. Infant Behavior & Development, 6, 263–285.

Kuhl, P. K. (2004). Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience, 5, 831–843.

Kuhl, P. K., & Meltzoff, A. N. (1996). Infant vocalizations in response to speech: Vocal imitation and developmental change. Journal of the Acoustical Society of America, 100, 425–438.

MacDonald, E. N., Johnson, E. K., Forsythe, J., Plante, P., & Munhall, K. (2012). Children’s development of self- regulation in speech production. Current Biology, 22, 113–117.

Ménard, L., Schwartz, J.-L., & Boe, L.-J. (2004). Role of vocal tract morphology in speech development: Perceptual

targets and sensorimotor maps for synthesized French vow-els from birth to adulthood. Journal of Speech, Language, and Hearing Research, 47, 1059–1080.

Mullennix, J. W., & Pisoni, D. B. (1990). Stimulus variability and processing dependencies in speech perception. Attention, Perception, & Psychophysics, 47, 379–390.

Pawlby, S. J. (1977). Imitation interaction. In H. R. Schaffer (Ed.), Studies in mother-infant interaction (pp. 203–223). London, England: Academic Press.

Polka, L., Jusczyk, P. W., & Rvachew, S. (1995). Methods for studying speech perception in infants and children. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 49–89). Baltimore, MD: York Press.

Polka, L., Rvachew, S., & Molnar, M. (2008). Speech perception by 6- to 8-month-olds in the presence of distracting sounds. Infancy, 13, 421–439.

Rvachew, S., Mattock, K., Polka, L., & Ménard, L. (2006). Developmental and crosslinguistic variation in the infant vowel space: The case of Canadian English and Canadian French. Journal of the Acoustical Society of America, 120, 2250–2259.

Rvachew, S., Slawinski, E. B., Williams, M., & Green, C. (1996). Formant frequencies of vowels produced by infants with and without early onset otitis media. Canadian Acoustics, 24, 19–28.

Schmale, R., & Seidl, A. (2009). Accommodating variability in voice and foreign accent: Flexibility of early word represen-tations. Developmental Science, 12, 583–601.

Schroeder, M., Atal, B., & Hall, J. (1979). Optimizing digital speech coders by exploiting masking properties of the human ear. Journal of the Acoustical Society of America, 66, 1647–1652.

Smith, D. R. R., Patterson, R. D., Turner, R., Kawahara, H., & Irino, T. (2005). The processing and perception of size information in speech sounds. Journal of the Acoustical Society of America, 117, 305–318.

Vorperian, H. K., & Kent, R. D. (2007). Vowel acoustic space development in children: A synthesis of acoustic and ana-tomic data. Journal of Speech, Language, and Hearing Research, 50, 1510–1545.

Zlatin, M., & Koenigsknecht, R. (1976). Development of the voicing contrast: A comparison of voice onset time in stop perception and production. Journal of Speech and Hearing Research, 19, 93–111.

at SEIR on November 19, 2014pss.sagepub.comDownloaded from