48
Accepted Manuscript “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic prosody in L2 hyperarticulation Yuki Asano, Michele Gubian PII: S0167-6393(17)30201-7 DOI: 10.1016/j.specom.2017.12.011 Reference: SPECOM 2518 To appear in: Speech Communication Received date: 30 May 2017 Revised date: 6 November 2017 Accepted date: 18 December 2017 Please cite this article as: Yuki Asano, Michele Gubian, “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic prosody in L2 hyperarticulation, Speech Communication (2018), doi: 10.1016/j.specom.2017.12.011 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

“Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

Accepted Manuscript

“Excuse meeee!!”: (Mis)coordination of lexical and paralinguisticprosody in L2 hyperarticulation

Yuki Asano, Michele Gubian

PII: S0167-6393(17)30201-7DOI: 10.1016/j.specom.2017.12.011Reference: SPECOM 2518

To appear in: Speech Communication

Received date: 30 May 2017Revised date: 6 November 2017Accepted date: 18 December 2017

Please cite this article as: Yuki Asano, Michele Gubian, “Excuse meeee!!”: (Mis)coordination oflexical and paralinguistic prosody in L2 hyperarticulation, Speech Communication (2018), doi:10.1016/j.specom.2017.12.011

This is a PDF file of an unedited manuscript that has been accepted for publication. As a serviceto our customers we are providing this early version of the manuscript. The manuscript will undergocopyediting, typesetting, and review of the resulting proof before it is published in its final form. Pleasenote that during the production process errors may be discovered which could affect the content, andall legal disclaimers that apply to the journal pertain.

Page 2: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

“Excuse meeee!!”: (Mis)coordination of lexical and paralinguisticprosody in L2 hyperarticulation

Yuki Asano and Michele Gubian

Corresponding author:Yuki AsanoUniversity of TubingenWilhelmstraße 5072076 [email protected]

Michele GubianSchool of Experimental PsychologyUniversity of Bristol, UKOffice 3D3The Priory Road Complex,Priory Road, Clifton BS8 1TUemail: [email protected]

Page 3: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

Abstract

The present study investigates how first language (L1) and second language (L2)speakers coordinate lexical and paralinguistic prosody (lexically determined pitchaccent and utterance-level intonation) in hyperarticulated speech. Modificationsof F0 and segmental durations in repeated utterances were analyzed jointly us-ing a novel method, Functional Principal Component Analysis (FPCA). By testingJapanese and German participants as L1 and as L2 speakers of the respective lan-guages, the task elicited a situation in which L2 speakers had to handle F0 andsegmental durations when these prosodic properties mainly carry a paralinguisticfunction in their L1, but carry a lexical one in their L2 or vice versa. Results showlanguage-dependent and -independent modifications in hyperarticulated speech inboth speaker groups. The language-dependent modifications were negatively trans-ferred to the respective L2. The restriction of Japanese lexical pitch accent preventedJapanese participants from phonologically modifying F0 contours in L1 and L2, andGerman L2 speakers applied the German pitch accent rules in L2 Japanese utter-ances. Lexical rules outweighed paralinguistic rules in modifying prosody. At thesame time, both German and Japanese speakers hyperarticulated the utteranceswith utterance-final lengthening, larger pitch range and higher peak. Furthermore,we argue that there may be different dominance orders of prosodic properties inJapanese and German according to which prosodic property was more likely to bemodified in hyperarticulated speech. L2 speakers were more prone to change a moredominant prosodic property in their L1 to convey a paralinguistic meaning, i.e.,segmental durations for Japanese and F0 for German. The findings are particularlynoteworthy as the uttered words were highly frequent words of which L2 learnersshould have had sufficient L2 input before.

Key words

lexical vs. paralinguistic prosody, L1 & L2 hyperarticulation, Functional PrincipalComponent Analysis, Functional Data Analysis

1

Page 4: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

1 Introduction

Imagine being in a crowded and noisy bar where you are trying to call a waiter toorder a drink. The waiter does not hear you calling and you have to try to callhim again. In such a situation, a speaker often reinforces and hyperarticulates anutterance in order to pursue a response (Stivers and Rossano, 2010a,b), by changingthe syntactic structures or by changing prosodic properties1 (Stent et al., 2008).The current study focuses on the latter, the prosodic changes, especially focusingon intonation and segmental length which are measured by F0 and segmental du-rations, respectively. Now imagine being in a crowded and noisy bar in a foreigncountry. Should you try to catch the waiter’s attention and pursue response, likein the situation above, you would be willing to reinforce the utterance by changingL2 prosody. Difficulties may occur in such a situation when the uses of L2 andL1 prosody are different. In some languages, for instance in Japanese, prosodicproperties such as F0 and segmental length are primarily used to convey lexicalinformation and distinguish words (Sekiguchi and Nakajima, 1999); while in otherlanguages, for instance in German or English, these prosodic properties primarilyserve at the paralinguistic level to convey emotion or attitudinal states of a speaker(Baumann and Grice, 2006; Gibbon, 1998; Liscombe, 2007) or at the post-lexicallevel to signal syntactic or pragmatic information such as topic vs. focus, sentencemode (question vs. statement) (e.g., Braun, 2006; Fery, 1993). What matters isthat lexical and paralinguistic prosodic properties have to be coordinated withoutcompeting with each other. The question investigated in the present study is howa speaker reinforces an utterance in an L2 when the same prosodic features in theL1 and L2 are primarily used at different linguistic levels (lexical vs. paralinguisticlevels), such as in the case reported between Japanese and German.

The difficulties in acquiring L2 prosody are well known since almost every adultL2 learner retains an immediately identifiable foreign accent, and such a foreignaccent has also been observed in otherwise highly proficient L2 speakers who mas-tered the grammatical system very well (Altmann and Kabak, 2011; Els and Bot,1987; Gut, 2003; Jilka et al., 2007; Mennen, 2007; Oyama, 1976). One of the majorfactors causing a foreign accent is negative language transfer from one’s L1 to L2(Gass and Selinker, 1994), which has been an important issue in L2 acquisition re-search. Negative transfers are not limited to the acquisition of novel sounds at thesegmental domain (e.g., the distinction between /r/ and /l/ by Japanese learnersof German) but also extend to the prosodic domain (Trouvain and Gut, 2007, foran overview). However, the number of studies investigating the latter domain isstill smaller compared to the former (Asano, 2016). As a consequence, the theoriesof phonological acquisition consider phenomena mostly at the segmental level, butrarely at the prosodic level, and mostly focus on L2 perception (Best, 1995; Bestet al., 2001; Best and Tyler, 2007; Flege et al., 1995; Flege, 1999; Flege et al., 2002;Kuhl, 1991; Kuhl and Iverson, 1995) rather than on production (with some excep-

1The term prosody broadly refers to the phonological organization of individual sounds (i.e.,segments) into higher-level constituents, which is cued by variation of F0, duration, amplitude andsegment quality (Shattuck-Hufnagel and Turk, 1996; Ueyama, 2000). Since phonetic correlatesof stress and intonation often manifest at the segmental level, we will use the more general term“prosody” for these phenomena instead of using the division of segmentals vs. suprasegmentals.

2

Page 5: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

tions such as De Bot, 1992; Kormos, 2006, 2012). Therefore, the incorporation ofL2 prosody into models of speech processing and acquisition is still awaited.

In this work, we will investigate Japanese and German2 as an example of two lan-guages whose prosodic systems contrastively differ from each other. Regarding thedifferences in the use of F0, in Japanese, the meaning of a word can change depend-ing on the position of a lexically specified pitch fall, called a pitch accent (Kubozono,2011; Vance, 1987), e.g., hashi-ga = chopsticksNOM, hashı-ga = bridgeNOM, hashi-ga= edgeNOM, the grave accent indicates the position of the pitch fall, if specified. Inother words, the presence or absence of a pitch accent is an inherent property ofa word. In German, F0 is one of the phonetic constituents besides duration andintensity that build a metrical word stress(Wiese, 2000). In other words, a stressedsyllable receives a pitch movement. Crucially, however, the direction of a pitchmovement is not lexically fixed, i.e., both rising and falling pitch accents are alloweddepending on the post-lexical or paralinguistic information that a speaker intendsto convey (Baumann and Grice, 2006; Braun, 2006; Fery, 1993; Gibbon, 1998; Griceet al., 2005) (see Figure 1).

Figure 1: Example contours of the word Entschuldigung, one with a falling pitchaccent on the stressed syllable (schul) (left), the other with a rising pitch accent(right).

With respect to a phonological and phonetic form of a pitch accent, in Japanesethere is only one phonological type of a pitch accent that is phonetically realizedas a sharp fall from a high level occurring near the end of the accented mora to alow level in the following mora (Gussenhoven, 2004; Vance, 1987; Venditti, 2000).This phonetic form does not exist in German as a falling pitch accent. In German,the phonological inventory of pitch accents is richer (six basic pitch accent types)(Baumann et al., 2001; Grice et al., 1996) and each accent shape is associated witha specific pragmatic meaning that contributes to the overall meaning of the utter-ance, as in English (Pierrehumbert and Hirschberg, 1990). Finally, the phoneticcorrelates of a pitch accent also differ in the two languages. Japanese pitch accentsare characterized only in changing of F0, while German pitch accents relate to thephonetic changes in F0, duration and intensity (Wiese, 2000).

Japanese pitch accent does not trigger a longer duration or a higher intensity,because these cues contribute to the perception of acoustic length, which can changethe meaning of a word. Japanese is a mora-timed language (Cutler and Otake, 1999;

2In this paper, “Japanese” refers to standard Tokyo Japanese and “German” to standard Ger-man unless otherwise stated.

3

Page 6: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

Pike, 1946): i.e., a syllable consisting of a vowel or a vowel and consonant takesup one timing unit (mora) and all morae have approximately the same duration.Japanese distinguishes between long and short vowels as well as long and shortconsonants (e.g., kite = come , ki:te = listen, kit:e = cut, all in the imperativeform - the colon indicates a long segment) (Kubozono, 1993, 1999). On the otherhand, German is a stress-timed language in which stressed syllables occur at regularintervals. In German, only vowel length is contrastive for /a/ and /E/ (Wiese, 2000),while consonant length is not contrastive3.

The different uses of the same prosodic properties in Japanese and Germanmay become hurdles in L2 acquisition. If negative transfer occurs from Germanto Japanese, then German learners of Japanese may risk communication problemssince words may not be identifiable. In case of transfer from Japanese to German,Japanese learners of German may deliver undesirable attitudes without being awareof it. For instance, a falling pitch accent, described as H?+L L-% in the JapaneseToBI system (Venditti, 1997), is the lexical default of a Japanese pitch accent with-out conveying a certain emotional status, while this realization in German is oneof many phonological varieties and conveys a certain emotional meaning such asfrustration or seriousness (Baumann et al., 2001; Grice and Baumann, 2002). IfJapanese learners of German are able to produce only this default Japanese phono-logical form in German L2, regardless of their emotions or situations, they maysometimes risk conveying undesired paralinguistic information.

Our study aims at testing how L2 learners whose L1 and L2 exhibit prosodicdifferences, like those found between Japanese and German, manage to coordinatelexical and paralinguistic prosody to produce an L2 utterance in an appropriateway. To achieve this end, an experiment was conducted to elicit a situation inwhich lexical and paralinguistic properties of these languages compete against eachother. In a simulated attention-seeking situation, speakers were asked to repeat thesame utterance (Excuse me in Japanese and German, to call a waiter) and wereinstructed to imagine becoming frustrated. We analyzed how L1 and L2 speakers ofJapanese and German changed their F0 contours and segmental durations to conveythis paralinguistic (emotional) information. The crucial point is that the Japaneseutterance sumimasen, “excuse me”, was bound to a lexically fixed falling pitchaccent, while the German Entschuldigung, “excuse me”, was not, i.e., both fallingand rising pitch accent were possible depending on the paralinguistic informationthat a speaker intended to convey.

The investigation carried out in this study presents several elements of novelty.First, we investigated bi-directional language transfer as defined in Ueyama (2000),namely language A as L1 to language B as L2 and language B as L1 to language Aas L24. Most previous studies have examined only one direction of language transfer(i.e., either transfer from language A to language B or visa versa), while only a fewstudies of L2 speech production have examined the bi-directional prosodic transferwithin one single study (e.g., Trouvain et al., 2016; Ueyama, 2000; Wester et al.,

3Consonant length contrasts may be lexically used in Swiss German and in some Germandialects such as in Alemannic German (Kraehenmann, 2001; Kraehenmann and Lahiri, 2008).

4Note that the term “‘bi-directional language transfer” in this paper is not used to refer tolanguage transfer from L1 to L2 and L2 to L1, as Mennen (2004) has reported.

4

Page 7: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

2014). Bi-directional analysis enables us to examine cross-linguistic similarity andthe symmetry in transferring, thus is more informative in investigating prosodictransfer than the analysis of only one direction. Second, we examined L2 prosodicchange for a paralinguistic purpose in L2 production that has been rarely studiedto date (exceptions include Li et al., 2013; Maekawa, 2004); if any, there are onlya limited number of studies investigating this issue in L2 perception (Chen et al.,2004; Chen, 2005; Fujisaki and Hirose, 1993). Third, we exploited the fact thatL2 learners’ L1 and L2 exhibit different uses of the same prosodic property F0.Specifically, F0 is primarily used for a lexical purpose in one language, while in theother, it primarily exhibits a post-lexical function. Our experimental design set asituation in which participants had to coordinate the different uses of F0 in theirL1 and L2 to make their L2 prosodic production appropriate. Fourth, we carriedout a joint analysis of F0 contours and segmental durations using an advancedstatistical analysis method called Functional Principal Component Analysis (FPCA)(Gubian et al., 2011, 2015; Ramsay and Silverman, 2009). FPCA is a mathematicalframework that allowed us to compute descriptive and inferential statistical analyses(such as linear-mixed effect regression, henceforth LMER) on whole F0 contoursand segmental durations. The most popular method to analyze F0 contours inlinguistic research requires phonological labeling using the ToBI system (Beckmanet al., 2005; Grice and Baumann, 2002; Venditti, 1997) and measurement of localphonetic variables such as peak heights, pitch ranges, slopes and segmental durationsthat have to be defined ad hoc as relevant factors to characterize F0 contours. Thestatistical analysis that follows is then based on those local descriptors. By doingso, there is a risk of a potential loss or distortion of information caused by usinginadequate or even biased shape descriptors (e.g., by imposing a fixed number of“relevant” peaks where plateaus or more complex configurations may be found).The application of FPCA on contour analyses reduces this risk. The FPCA toolsare in charge of defining and computing data-driven shape and duration descriptorssolely based on the statistical properties of the input data, i.e., F0 contours andsegmental durations. The output of FPCA is in numerical form combined with agraphical counterpart, the latter showing the meaning of the former in terms ofdynamic traits of the input contours. The user (i.e., the linguist who analyzes thedata) can continue the analysis by using the numerical output from FPCA as inputto further ordinary statistical analysis, in this study LMER.

The rest of the paper is structured as follows: Section 2 provides a literaturereview and introduces our hypotheses. After that, Section 3 reports methodologicalfacts and steps made for this study. Background knowledge on FPCA is also pro-vided in this methodological section. In Section 4, results of FPCA are translatedin a way to show how participants modified F0 and durations. Section 5 providesdiscussions and summaries of the outcomes, followed by conclusions in Section 6.Mathematical details of FPCA are provided in Appendix A.

2 Hypotheses and related work

From now onwards, the four groups of participants in the present study are namedas follows: Japanese L1 speakers and German L1 speakers - speakers who produced

5

Page 8: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

utterances in their L1 - and German L2 speakers and Japanese L2 speakers - speakerswho produced utterances in their L2. Note that, in one task, the participants hadroles as L1 speakers, and, in a second task, as L2 speakers, i.e., all speakers of bothlanguages performed the same task both in their L1 and in their L2 (see participantdetails in Section 3.1).

When speakers believe that their utterances were not heard or understood, theyreinforce the utterance to try to get the utterances heard or understood . Prosodicadaptations in reinforcement have been widely studied in terms of hyperarticula-tion (Lombard, 1911; Oviatt et al., 1996, 1998; Pirker and Loderer, 1999; Stentet al., 2008). The term hyperarticulation covers a wide range of adaptations underdifferent environmental factors such as infant-directed speech (Fernald and Simon,1984; Fernald, 1989; Grieser and Kuhl, 1988; Kitamura et al., 2001), speech to L2speakers (Sikveland, 2006), speech to automatic speech recognizers (Oviatt et al.,1996; Pirker and Loderer, 1999), or speech in noise, the so-called Lombard speech(Lombard, 1911). On one hand, common phenomena have been clearly manifestedacross various types of hyperarticulated speech despite the different experimentalsettings and languages in focus. Previous studies show overlapped results such aslonger segmental durations (i.e., slower speech tempo) (Oviatt et al., 1998; Soltauand Waibel, 2000a,b; Stern et al., 1983); exaggeration of pitch contours; variationsof F0; a larger pitch range and higher mean F0 (e.g., Fernald, 1989, 1991; Katz et al.,1996; Oviatt et al., 1996; Papousek, 1992; Trehub et al., 1993); and more rhythmic,musical and stylized intonation (Fernald, 1989; Ladd, 1978; Trainor et al., 1997).Even Chinese-speaking mothers were also found to use higher F0 and wider pitchrange in infant-directed speech (Grieser and Kuhl, 1988), showing a tonal languageexhibiting the same patterns of acoustic modification as a non-tonal language suchas German (see Kitamura et al., 2001, for similar results in Thai). These findingssuggest that the exaggerated acoustic characteristics of hyperarticulated speech areuniversal, although a few studies have reported language-specific modifications aswell.

Languages that share similar or common prosodic systems showed similar be-havior (see Soltau, 2005, for English and German in hyperarticulation to automaticspeech recognizers), while those whose prosody are typologically very different shareless common phenomena (see Shochi et al., 2009, for English and Japanese in infant-directed speech). The languages focused on in our experiment, Japanese and Ger-man, belong to the latter case, i.e., their respective prosodic systems share fewcommon characters and different language-specific prosodic modifications are ex-pected in both L1 speaker groups and interferences from L1 to L2 are predicted.As with Japanese L1 specific modifications, little room seems to remain to modifyprosody for a paralinguistic purpose given the aforementioned lexical restriction inJapanese prosody. Theoretically, the phonological encoding in Levelt’s blueprint ofthe speaker (Levelt, 1989, 1999) supports the view that the lexical use of prosodyoutweighs the paralinguistic use of prosody in speech production. The model showsthat lexical prosody is assigned to the metrical and segmental spellout before a globalprosody is generated in the prosody generator. This means that lexical prosody isprocessed before the post-lexical and paralinguistic uses of prosody are considered,and that the post-lexical and paralinguistic uses of prosody are generated in a way

6

Page 9: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

that allows no interference to lexical prosody. Translating this assumption into thecurrent experiment, Japanese lexical use of F0 should be maintained even whileconveying paralinguistic information.

The question is which aspects of Japanese lexical prosody can still be modifiedto convey post-lexical or paralinguistic information. The modification of Japaneseprosody for a post-lexical purpose has been widely studied in terms of focal promi-nence and infant-directed speech (e.g., Ishihara, 2011; Kitamura and Burnham,2003; Nagahara, 1994), while production studies on the modification of prosody toconvey paralinguistic information still constitutes the minority of research (Maekawa,2004; Li et al., 2013; Ofuka et al., 2000; Shochi et al., 2008).

Focal prominence often requires us to examine the relationship among con-stituents in a whole sentence, whereas our experiment includes an one-word ut-terance. Still, some of the findings in the previous studies on focal prominenceseem to be relevant for our study. For example, Japanese speakers were reportedto use a variety of prosodic mechanisms to mark focal prominence, including: localpitch range expansion; dramatic rise at the beginning of the focus constituent to setoff the focal constituent; post-focal subordination; and prominence-lending bound-ary pitch movements, but, importantly, without the phonological modification of apitch accent (see the review in Venditti et al., 2008). Moreover, analyses on infant-directed speech (Shochi et al., 2009) indicate that Japanese speakers would lengthenan utterance to hyperarticulate speech either by lengthening all morae equally, sothat the durational proportions within the word would not change, or by length-ening the word-final morae, an occurrence which has been reported in a varietyof languages, which is called utterance-final lengthening (Lehiste, 1972) given thatthe utterance-final lengthening does not lead to a lexical competition by formingany contrastive pair in terms of length. The utterance-final lengthening has beenalso found in Japanese, although mainly in spontaneous speech data rather thanin laboratory speech (Ota et al., 2003; Port et al., 1987; Warner and Arai, 2001).Warner and Arai (2001) found that the mean duration of the final two morae of theword was generally longer than the mean duration of morae even in speech withouthyperarticulation. With respect to the modification of prosody to convey paralin-guistic information, Maekawa (2004) and Li et al. (2013) showed that Japanese L1speakers expressed different emotional statuses by varying maximum and minimumof F0 contours, as well as F0 mean values though notably without changing thephonological form of a pitch accent. Ofuka et al. (2000) investigated how Japanesespeakers prosodically convey politeness and showed that F0 of the final vowel of thesentences, i.e., of the question-particle ka, and overall speech rate were significantpredictors for the politeness. The sentence-final question particle ka does not carryany pitch accent, thus it does not contribute to a lexical distinction and F0 of theparticle can be freely modified. The same is true for the overall speech rate thatdoes not change length proportions of the morae of an utterance, thus it is the cuethat can be modified without competing with the rule of Japanese lexical lengthcontrast.

As with German, prosodic modifications for a paralinguistic purpose have beeninvestigated in a series of studies on prosodic features in emotional speech (Burkhardtet al., 2005; Kienast and Sendlmeier, 2000; Paeschke et al., 1999; Paeschke and

7

Page 10: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

Sendlmeier, 2000; Paeschke, 2004), which analyzed varieties of phonetic cues suchas F0 range, steepness of F0, duration, spectral tilt and formant in the perceptionof emotions. Emotional speech generally exhibits higher F0 mean values, larger F0

range, steep F0 pitch accent, longer duration and higher loudness (Paeschke et al.,1999; Paeschke and Sendlmeier, 2000; Paeschke, 2004). These modifications arein line with the aforementioned common observations in hyperarticulated speechacross languages. Also in focal prominence, similar modifications in F0 and speechrate have been observed such that if a sentence contains a focus, the reference topline of F0 is raised, provoking a sudden boosting of the pitch accent that correlateswith the focused word and longer duration (e.g., Baumann et al., 2007; Braun, 2006;Fery and Ishihara, 2009). Notably, different phonological types of pitch accent areused to mark broad, narrow and corrective foci, i.e., to mark different informationstructures in German (e.g., Braun, 2006; Fery and Ishihara, 2009), which has notbeen observed in Japanese due to the lexical restriction of the Japanese pitch ac-cent (Kubozono, 2007). Analyses of German infant-directed speech also showed agreater F0 range and higher mean F0 values and longer durations and more alter-nating F0 patterns compared to adult-directed speech with different phonologicaltypes of pitch accents (Zahner et al., 2015). Apart from the common modificationsin hyperarticulated speech across languages (see above), another German-specificprosodic modification in the current study may be the modification of phonologicaltypes of pitch accents. It is widely accepted that phonological types of pitch ac-cents in German convey paralinguistic information such as emotions and attitudesof a speaker (e.g., Banse and Scherer, 1996; Baumann et al., 2001, 2007; Grice andBaumann, 2002; Grice and Bauman, 2007; Kohler, 2005, 2008) and specific into-nation patterns reflect specific emotions both in speech production and perceptionstudies (e.g., Banse and Scherer, 1996; Baumann et al., 2001; Pierrehumbert andHirschberg, 1990; Raithel and Hielscher-Fastabend, 2004). For example, a fallingcontour in German is perceived as more impolite than a rising contour (Baumannet al., 2001). Although previous studies on hyperarticulated speech in German havenot reported this modification, we still expect phonological changes in German F0

contours in the repeated utterances produced by German L1 speakers. In partic-ular, participants will be expected to get frustrated in the repetitions because ofthe communication failure. Therefore, we expect to find more falling pitch contoursproduced by German L1 speakers in the repetitions.

Based on these previous findings, the following hypotheses for L1 hyperarticula-tion are stated:

Hypothesis 1 (Language-specific modifications) The way in which L1 speak-ers phonologically modify F0 contours to convey paralinguistic information dependson the use of F0 in their L1.

More specifically, Japanese L1 speakers will not change F0 contours phonologi-cally in the repetitions since a Japanese falling pitch accent is lexically determined.German L1 speakers will vary F0 contours in the repetitions as the F0 contours ofthe speakers signal one’s attitudinal and emotional states in German. They are morelikely to produce a rising contour in the first attempt, whereas in the repetitions this

8

Page 11: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

changes towards a falling contour due to the frustration caused by communicationfailure.

Secondly, both in Japanese and German L1 data, utterance lengthening andphonetic modifications of F0 (i.e., expansion of local pitch) are expected in therepeated utterances as general phonetic modifications in hyperarticulated speech:

Hypothesis 2 (Language-independent modifications) A general utterance length-ening (with a particularly greater utterance-final lengthening) and the expansion ofa local pitch range are expected in all speaker groups.

Finally, we will predict the following hypothesis for L2 hyperarticulation :

Hypothesis 3 (Transfer from L1 to L2) The difference predicted in Hypothe-sis 1 holds true in L2 production.

Recent studies on L2 prosody have shown phonetic and phonological interfer-ence from an L1 to an L2 in both speech perception and production (e.g., Chenand Mennen, 2008 for L2 English - L1 Italian; Garding, 1981 for L2 French - L1Swedish and Greek; Jilka et al., 2007 for L2 English - L1 German; Jun and Oh,2000 for L2 Korean - L1 English; Mennen et al., 2010 for L2 English - L1 Punjabior Italian; Ueyama and Jun, 1998 for L2 English - L1 Korean or Japanese). Studieson prosodic transfer from L1 German to L2 Japanese and/or from L1 Japanese toL2 German are still scarce and most of these are perception studies (Hayashi et al.,2000; Niikura et al., 2011). Only a few previous studies have focused on prosodicchanges for a paralinguistic purpose in L2 production like the current study does(e.g., Maekawa, 2004), and just a few studies investigated this issue in L2 perception(Chen et al., 2004; Chen, 2005; Fujisaki and Hirose, 1993). In absence of relatedprevious experimental work on L2 production, we limit ourselves to generally hy-pothesize that the phonetics and phonology of L1 prosody will interfere with theacquisition of L2 prosody, so that the use of prosody by L2 speakers differs from theone by L1 speakers. More specifically, German L2 speakers are expected to vary F0

contour in repetitions despite the Japanese lexical restriction. Japanese L2 speakersare predicted to remain faithful to one pattern of F0 contour and will not vary theirF0 contours. They will be more likely to produce a falling pitch accent (H?+L) thana rising accent (L?+H) as they are used to doing so in their L1. As general prosodicmodifications, both German and Japanese L2 speakers are expected to show localpitch range expansions and utterance-final lengthening in the repetitions.

3 Experiment

The experiment reported in this work tested how paralinguistic information affectsprosody at the lexical level and vice versa. In a simulated task, participants wereinstructed to get the attention of a waiter in a crowded and noisy bar by producingthe phrase “Excuse me” in Japanese and German repeatedly until they were toldthat they have succeeded in getting the waiter’s attention. Participants performedthe same task both in their L1 and in their L2 language (Japanese and German).The task was designed to elicit the situation in which L2 speakers would have to use a

9

Page 12: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

given prosodic properties (F0 and segmental durations) for a purpose that is differentfrom the purpose that same properties are used for in their L1. We exploited thefact that the phonological form of a Japanese pitch is lexically determined, whilethe German one is not and can be freely used for a paralinguistic purpose.

3.1 Participants

15 speakers of Tokyo-Japanese who were learners of German (f = 8, m = 7, agedbetween 19 and 36, mean age = 25.1) and 15 speakers of Standard German who werealso learners of Japanese (f = 6, m = 9, aged between 22 and 35, mean age = 28.9)(in total 30 participants) voluntarily took part in the experiment for a small fee.They were all unaware of the purpose of the experiment. None of the participantshad any self-reported speech or hearing deficits. Their L2 proficiency was ratedusing the Japanese Simple Performance-Oriented Test (SPOT-test) (Hatasa andTohsaku, 1997; Sakai, 2015) that measures Japanese usage ability (max. score =50, range between 0 – 48, M = 25.7, SD = 16.0) and the German C-Test (Colemanet al., 1994) that measures general language proficiency (max. score = 60, rangebetween 6 – 60, M = 41.2, SD = 13). All learners had being learning the L2 atleast 2 months (Japanese as L2: range between 2 – 216 months, M = 50.5 months,SD = 58.7 months; German as L2: range between 2 – 420 months, M = 111.3months, SD = 115.8 months). Two of the Japanese L2 speakers have never livedin a German-speaking country (length of stay ranged between 0 – 144 months,M = 22.4 months, SD = 39.0 months). Seven German L2 speakers had neverlived in Japan (length of stay ranged between 0 – 96 months, M = 27 months,SD = 31.7 months). All participants had learned English as an L2 and none ofthem had learned any another language with either lexical pitch accents or tones.This series of participant L2 related factors showed that the learners’ L2 learningbackground/history as well as their L2 proficiency ranged widely from beginnersto advanced learners. Based on the analysis of multicollinearity (Belsley et al.,1980), in which two or more predictor variables in a multiple regression model arehighly correlated and thus the coefficient estimates of the multiple regression maychange erratically and erroneously in response to small changes in the model or thedata, only the L2 test scores were used to analyze an effect of L2 proficiency onthe performance by L2 speakers in the experiment. Speaking ahead of results, L2proficiency did not predict their performance, so that the report on the analysiswith L2 proficiency is not included in this paper.

3.2 Materials

The target words used in the study were the very frequent Japanese and Germanwords sumimasen and Entschuldigung. Both words mean excuse me and can be usedin the same pragmatic context, for instance attention-seeking. Sumimasen containsa lexically specified pitch fall associated with the penultimate mora in the word, seand four-syllabic (and five-moraic). The word may be optionally realized with aninitial low (Gussenhoven, 2004; Vance, 1987), and an example of a typical F0 contourcan be seen in Figure 2. The German word Entschuldigung is also a four-syllabic

10

Page 13: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

word and the syllable schul is stressed.

Figure 2: A typical example contour for sumimasen produced by one of the Japaneseparticipants.

3.3 Design and Procedure

Following the procedure in Prieto and Roseano (2010), materials participants werepresented with descriptions of short scenes and their task was to produce the targetword in the given context, which was to attract a waiter’s attention in a crowdednoisy bar. The experiment was designed with Microsoft PowerPoint 2008 and waspresented on a Macintosh G3 laptop. Each context consists of four slides withthe following description written in the participants’ L1 with a picture of a typicalsituation:

Figure 3: Example of a slide shown to German participants

Slide 1 You are in a crowded noisy bar in Japan/Germany. Please call a waiter bysaying “Excuse me”.

Slide 2 He did not hear you. You are a little bit frustrated. Please try it again bysaying “Excuse me”.

Slide 3 He still did not notice it. You are very frustrated. Please try it as the lasttime by saying “Excuse me”.

11

Page 14: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

Slide 4 He finally heard you and is coming to you. Great success!

In this way, we recorded the three attempts (the first attempt and two repe-titions) of the same words sumimasen or Entschuldigung. This 2 (languages) ×3(attempts) design allowed us to test how the strengthening of one aspect influencesthe coordination of prosodic properties ceteris paribus.

In addition, 16 filler contexts were provided in which participants were asked toproduce Japanese or German short sentences in certain emotional situations suchas with anger or irony. The contexts in the participants’ L1 were presented inthe first part of the experiment and those in their L2 in the second part. Thewhole experiment took about 20 minutes. Japanese participants were recruited andtested individually in a quiet room at the Tokyo University of Foreign Studies inJapan and German participants at the Ruhr-University Bochum and Heinrich-Heine-University Dusseldorf , both in Germany. Their responses were digitally recordedonto a computer (44.1kHz, 16Bit) using a unidirectional short-range microphone.After the experiment, they filled out the questionnaire and took the L2 proficiencytest.

3.4 Data preparation: F0 and time extraction

The prosodic analysis of this work was based on sampled F0 contours and segmentalboundaries. We were aware of the fact that vocal expressions of emotion vary alongother continuous phonetic dimensions, such as intensity or spectral tilt and voicequality (Banse and Scherer, 1996; Kienast and Sendlmeier, 2000). In the currentstudy, however, we focused on F0 and segmental durations because it was difficultto reliably measure intensity or spectral tilt without the unfixed distance betweenthe speaker’s mouth and microphone and voice quality without perceptual analyses.Moreover, a joint analysis of two functional data would have become difficult forthree properties (including intensity or spectral tilt) and voice quality data werenot functional data, meaning that these properties were not suitable for a jointfunctional analysis.

As the first step of the data preparation, the first author annotated the recordedraw data using Praat (Boersma and Weenink, 2011), applying standard segmenta-tion criteria (Turk et al., 2005). Since FPCA requires a continuous input in time,without containing holes in the F0 signal, the first fricatives s in sumimasen and schin Entschuldigung were cut off because F0 was absent in these voiceless segments.Moreover, the entire first syllable of Entschuldigung, ent, was also cut off due to thefact that some participants did not produce it, because, in German, the reducedforms Tschuldigung, Schuldigung are acceptable forms of Entschuldigung. Finally,the missing F0 signal along the second fricatives s in sumimasen was artificiallyreconstructed by linear interpolation.

Japanese data were segmented into morae and German into syllables. In detail,internal boundaries were placed as follows: s | u | mi | ma | se | n | for the Japanesematerial, and ent.sch | ul | di | gung | for the German material (vertical lines indicateboundaries and strike lines the cut segments). For nasals, the measurement startedwhen the amplitude in the waveform dropped and the waveform showed less highfrequency components (drop in high frequency energy in spectrogram). L2 data

12

Page 15: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

were even harder to annotate as L2 speakers showed greater variability in phoneticrealizations of segments (for example some Japanese L2 speakers produced [r] inEntschuldigung instead of [l]). Such deviant phonetic forms found in L2 speakerdata were treated as if they were produced as normative segments. Utterancescontaining hesitations and wrong words were discarded from the analysis (N = 8).In total, 172 utterances were used for the analysis: 86 for sumimasen and 86 forEntschuldigung.

After the annotation, F0 contours were computed using the F0 tracker availablein the Praat toolkit (Boersma and Weenink, 2011). A default range of 70-350 Hz formales and 100-500 Hz for females was used. These ranges were adjusted for specificspeakers in order to minimize obvious errors, such as octave jumps.

3.5 Functional Principal Component Analysis (FPCA)

In this study, the procedure introduced in Gubian et al. (2015) and in Gubian et al.(2011) was applied. This section provides a brief introduction to FPCA and themain concepts of FPCA in order to understand the work flow of this study, whileAppendix A provides mathematical details. Moreover, the code and the data usedin this work are available for download at the website maintained by the secondauthor (http://lands.let.ru.nl/FDA).

FPCA is a data-driven technique that produces a parametrization of a set ofcurves. In this work, curves are F0 contours extracted from a single word, but thesame technique can be applied to a longer stretch of speech, like a whole sentence(Gubian et al., 2011; Turco et al., 2011), as well as to other dynamic signals likeformants (Gubian et al., 2015). FPCA turns the rich information contained inthe curve dynamics into a reduced set of parameters called Principal Component(PC) scores. These scores can then be treated using ordinary statistical modelingmethods.

To date, there have been several attempts to apply data-driven (semi)automaticanalysis methods to F0 (e.g., the Fujisaki model, Fujisaki, 1992; polynomial equa-tions, Grabe et al., 2007; MOMEL, Hirst and Espesser, 1993 or Tilt, Taylor, 2000).For example, polynomial equations are a way to describe curves including F0 con-tours in a mathematical expression using variables and constants (Grabe et al.,2007). A fixed “dictionary” of orthogonal shapes (horizontal line, sloped line,parabola and cubic or wave) is used to define the shapes of F0. This method solvesthe problem simply for short curves, while longer ones may need a more flexible setof basic shapes.

There are two main advantages in using FPCA rather than other methods. First,the parameters that describe the curves are extracted automatically from the datathemselves with minimal intervention from the user. Second, the specific procedureused here, joint analysis, allows to capture relevant variation in F0 contours and insegmental duration at the same time.

The input to FPCA is a set of F0 contours, each one paired in the word with theposition of the relevant segmental boundaries (e.g., | ul | di | gung | for Entschuldigungas shown before). The output of FPCA is a set of scores, with each one associatedto a function called PC curve (or PC ), which encodes a systematic variation in

13

Page 16: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

the contour dynamics found in the input data set. For instance, Figure 10 in Sec-tion 4 shows the dynamic variation modulated by the PC1 score s1 for the wordEntschuldigung. As s1 varies from −1 to 1, F0 contours become flatter and, at thesame time, the utterance gets shorter, especially in the last syllable gung. Thismeans that an F0 contour extracted from Entschuldigung with the PC1 score of thevalue 1 is flatter and shorter than one scoring s1 = −1.

In general, more than one score is needed to capture sufficient information from adata set to characterize the data. PC scores are ordered according to the percentageof explained variance, in the same way as in ordinary PCA. In this work, we extractedthe first three PC scores (s1, s2 and s3) both from the sumimasen and from theEntschuldigung data sets (see Appendix A.3 on the distribution of variance acrossPCs).

Crucially, PC scores are computed disregarding any prior information. In ourcase, this means that both language groups L1 and L2, as well as utterances from1st, 2nd and 3rd attempt (cf. Section 3) were pooled together and FPCA did notuse this prior information to compute the scores. This allowed the construction ofLMER models where language group and number of attempt were used to predictPC scores without introducing any circularity.

4 Results

The data were analyzed with the purpose of characterizing performance differencesbetween L1 and L2 speakers in the attention-seeking task and of examining theresults against the hypotheses proposed in Section 2. To this aim, a set of LMERmodels were built where speaker group (L1 vs. L2) and number of attempt (1st, 2ndand 3rd as an ordered factor) were used to predict numeric quantities related to theshape of F0 contours and to segmental durations of the target word: sumimasenor Entschuldigung. These numeric quantities were the PC scores acquired from theapplication of FPCA to F0 contours and segmental durations.

The analysis on sumimasen and Entschuldigung was carried out separately andindependently. In each case, FPCA was applied on F0 contours and segmentaldurations jointly (cf. Section 3.5). We considered the first three PCs, in both cases,that collectively explained 68% of the variance for the sumimasen data and 73% ofthe variance for the Entschuldigung data.

Then, LMER models were built with PC scores (1, 2 or 3) as dependent mea-sures, speaker group and number of attempt as fixed factors, and participant as arandom factor including random slopes for the fixed factors (Barr et al., 2013; Cun-nings, 2012). P-values were calculated using the Satterthwaite approximation inthe R-package lmerTest. Statistical results are reported on the most parsimoniousmodel obtained by eliminating the factors that were not significant, if this did notdeteriorate the fit of the model, and by proceeding with backward elimination basedon log likelihood ratio tests. Whenever a significant interaction or a main effect wasfound in the LMER models, plots are shown where the meaning of the predicted PCscore is illustrated in terms of F0 contour shape and segmental durational changes.Finally, wherever an interaction between number of attempt and language group wasfound, multiple pairwise comparisons of each factor were ran using the R-package

14

Page 17: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

lsmeans.In the following, results from each of the 2 (words) × 3 (PC scores) = 6 LMER

models are reported in sequence. To simplify notation, PC scores are always denotedas PC1, PC2 or PC3 scores, or s1, s2 and s3 in plots, irrespective of which of thetwo independent FPCA analysis they come from.

4.1 Sumimasen

PC1: PC1 explained 31.9% of the variance. Figure 4 translates PC1 scores intoeffects on the shape of time-normalised F0 contours (top) and more durations (bot-tom). Both plots show the effect of PC1 score as varying between −1 and 1 in 5regular steps, which matches the variability range of Figure 5.

Figure 4 shows that PC1 modulated a variation of F0 that mainly affected therightmost part of the curves, corresponding to the more se and n and it had almostno effect on segmental durations. As s1 varied from 1 to −1, an F0 contour showeda steeper falling pitch fall in the rightmost part of the curve.

The LMER analysis showed a significant interaction between language group andnumber of attempt (ß = .70, SE = .29, t = 2.4, p < .05). Figure 5 shows meanPC1 scores and ±1 standard error bars for each attempt in each speaker group andconfirms the interaction found in the LMER analysis. The pairwise comparison ofnumber of attempt in each language group showed a significant difference betweenthe first and the third attempt by German L2 speakers (ß = .88, SE = .31, t = 2.8,p < .05). The pairwise comparison of language group in each number of attemptshowed a significant difference between Japanese and German speakers in all threeattempts (ß = 1.81, SE = .24, t = 7.5, p < .001 in the first attempt, ß = .93, SE= .36, t = 2.6, p < .03 in the second attempt, ß = .82, SE = .33, t = 2.5, p <.03 in the third attempt). While PC1 scores of Japanese L1 speakers were around-1 in all attempts, German L2 speakers’ scores decreased from 1 to 0.5 along theincreasing number of attempts with a significant difference between the first andthe third attempt. The difference between the language groups lied in all threeattempts, with a larger difference in the first attempt.

Taken Figure 4 and Figure 5 together, Japanese L1 speakers, whose PC1 scoreswere around -1 in all attempts, constantly produced F0 contours with a drastic pitchfall (coded as H?+L L-% in the Japanese ToBI, Venditti, 1997), which was a typicalphonetic form of a Japanese pitch accent. German L2 speakers, whose PC1 scoresdecreased from 1 to 0.5 along the increasing number of attempts, produced flat F0

contours in the first attempt (corresponding to the value of 1) and more falling F0

contours in the third attempt (corresponding to the value of 0.5). However, even inthe third attempt, the pitch fall produced by German L2 speakers was not as steepas that produced by Japanese L1 speakers and the pitch range produced by GermanL2 speakers was not as large as that produced by Japanese L1 speakers.

PC2: PC2 explained 20.9% of the variance. Figure 6 shows that PC2 modulatedboth a variation of segmental durations and F0 contours at the same time. As s2varied from 1 to −1, an utterance became longer, especially in the last mora with anucleus se, and at the same time, its F0 contour exhibited larger pitch range with asteeper pitch fall at the end of the contour. The durational lengthening appeared in

15

Page 18: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

TFigure 4: F0 contours (top) and morae durations (bottom) corresponding to PC1scores −1, −0.5, 0, 0.5 and 1 for the word sumimasen.

all segments, though the extent of the lengthening was the largest for the last morawith a nucleus se. The variation found in F0 contours was a phonetic change thatrelated to this utterance-final lengthening.

The LMER analysis showed a significant interaction between language groupand number of attempt (ß = -.50, SE = .16, t = -3.1, p < .01). This interaction isvisualized in Figure 7, showing the mean PC2 scores and ±1 standard error bars foreach attempt in each speaker group. Further, the pairwise comparison of numberof attempt in each language group showed a significant difference between the firstand the second attempt and between the first and the third attempt by JapaneseL1 speakers (ß = .60, SE = .15, t = 3.8, p < .001 and ß = .72, SE = .16, t = 4.5,p < .001 respectively). The pairwise comparison of language group in each numberof attempt showed a significant difference between Japanese and German speakers

16

Page 19: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

L1 L2

−1.0

−0.5

0.0

0.5

1.0

1 2 3 1 2 3attempt

s 1

PC1: Sumimasen

Figure 5: Mean PC1 scores (as dots) and ±1 standard error bars for the wordsumimasen for each attempt (1st, 2nd and 3rd) in each speaker group (L1 and L2).

in the first attempt (ß = -.56, SE = .24, t = -2.3, p < .03). While PC2 scoresof Japanese L1 speakers decreased from 0.4 to -0.3 along the increasing number ofattempts, German L2 speakers’ scores were located around 0 and did not changesignificantly between the attempts.

Taken together, Japanese L1 speakers lengthened their utterances, especially thelast mora with a nucleus se, producing a steeper pitch fall with a larger pitch rangealong the number of attempt. German L2 speakers did not show these phoneticmodifications in the repetitions.

PC3: PC3 explained 15.6% of the variance. Figure 8 shows that PC3 modulateda variation of F0 contours and segmental durations at the same time. As s3 variedfrom 1 to −1, an F0 contour became flatter in the initial part of the utterance (corre-sponding to the more u, mi and ma) and an utterance became longer accompaniedwith longer durations of the same initial moras and of the very last mora n.

The LMER analysis showed a significant main effect of speaker group (ß = 1.13,SE = .19, t = 5.8, p < .001) (see Figure 9). While PC3 scores of Japanese L1were located above the value of 0 (0.4 on average), German L2 speakers’ scores werelocated under the value of 0.

By interpreting the visual and statistical analyses together, Japanese L1 speakersproduced a typical phonetic form of a contour begin, called initial low (Gussenhoven,2004; Vance, 1987), while German L2 speakers did not. Moreover, German L2speakers generally produced longer utterances than Japanese L1 speakers, causedby the longer durations of all morae except for the pitch accented mora se.

4.2 Entschuldigung

PC1: PC1 explained 27.5% of the variance. Figure 10 shows that PC1 modulated avariation of F0 contours and of segmental durations at the same time. As s1 variedfrom 1 to −1, an F0 contour showed a steeper pitch fall after the stressed syllable ultoward the final syllable gung and the durations of all syllables became longer witha particularly larger lengthening of the last syllable gung. Moreover, the lower thescore was, the more likely it was that the utterance contained a falling contour atthe end.

The LMER analysis showed a significant interaction (ß = .82, SE = .24, t =

17

Page 20: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

TFigure 6: F0 contours (top) and morae durations (bottom) corresponding to PC2scores −1, −0.5, 0, 0.5 and 1 for the word sumimasen.

3.4, p < .01). Mean PC1 scores and ±1 standard error bars for each attempt ineach speaker group are shown in Figure 11, which visualizes the interaction foundin the LMER analysis. In order to better understand the nature of the interaction,data were split by speaker group. In both speaker groups, there was a main effectof number of attempt (ß = -1.17, SE = .19, t = -6.1, p < .001 for German L1speakers and ß = -.37, SE = .12, t = -3.0, p < .01 for Japanese L2 speakers).The pairwise comparison of number of attempt in each language group showed asignificant difference between the first and the second attempt and between the firstand the third attempt by German L1 speakers (ß = 1.30, SE = .24, t = 5.4, p <.001 and ß = 1.82, SE = .26, t = 7.1, p < .001 respectively) and between the firstand the third attempt by Japanese L2 speakers (ß = .65, SE = .23, t = 2.8, p <.01). The pairwise comparison of language group in each number of attempt showed

18

Page 21: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

L1 L2

−0.25

0.00

0.25

0.50

1 2 3 1 2 3attempt

s 2

PC2: Sumimasen

Figure 7: Mean PC2 scores (as dots) and ±1 standard error bars for the wordsumimasen for each attempt (1st, 2nd and 3rd) in each speaker group (L1 and L2).

a significant difference between German and Japanese speakers in the first attempt(ß = .64, SE = .29, t = 2.2, p < .03). Both speaker groups decreased their PC1scores in the repetitions, but the extent of the decrease was larger for German L1speakers than for Japanese L2 speakers.

Taken together, the F0 variations shown by German L1 speakers were larger thanthose shown by Japanese L2 speakers. In the first attempt, German L1 speakersproduced contours with a smaller pitch range on the stressed syllable, followed by astronger rising end with generally shorter segmental durations, while, in the secondand the third attempts, they produced a steeper falling pitch accent followed by a flator falling pitch end with longer utterance durations. Japanese L2 speakers also variedthe pitch range and the end of the contour as well as segmental durations in thesame tendency that German L1 speakers showed. However, the L2 speakers’ extentof change was significantly smaller than that of L1 speakers. In the repetitions,utterances produced by L1 and L2 speakers became similar owed by the fact thatboth speaker groups produced F0 contours with a steeper pitch fall at the end.

PC2: PC2 explained 23.4% of the variance. The LMER analysis did not showany significant interaction nor main effect.

PC3: PC3 scores explained 21.9% of the variance. Figure 12 shows that PC3modulated a variation of F0 contours and segmental durations at the same time.As s3 varied from 1 to −1, it became more likely that an F0 contour contained asteeper pitch fall followed by a rising end by exhibiting a larger pitch range afterthe stressed syllable. At the same time, the durations of the stressed syllable ul andthe following syllable di became longer.

The LMER analysis showed a significant main effect of speaker group (ß = .39,SE = .12, t = 3.4, p < .05) (see Figure 13). While PC3 scores of German L1 werelocated under the value of 0, Japanese L2 speakers’ scores were above the value of0.

Taken together, German L1 speakers produced a steeper pitch fall on the stressedsyllable with a larger pitch range which was followed by a rising contour end, whileJapanese L2 speakers produced a falling pitch accent followed by a gradual fall to-ward the end. This phonetic form produced by Japanese L2 speakers made a contourgenerally flatter than that produced by German L1 speakers. Compared to GermanL1 speakers, Japanese L2 speakers produced the last syllable gung generally longer,

19

Page 22: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

TFigure 8: F0 contours (top) and segmental durations (bottom) corresponding toPC3 scores −1, −0.5, 0, 0.5 and 1 for the word sumimasen.

preceded with shorter syllable durations of ul and di. The durational proportionof all syllables produced by Japanese L2 speakers differed from those produced byGerman L1 speakers.

5 Discussion

The experiment presented in this work investigated the integration of lexical andparalinguistic prosody in Japanese and German L1 and L2 production. Numeri-cal data output from FPCA were analyzed using LMER to detect the differencesbetween the L1 and L2 groups with respect to F0 contours and segmental durations.

Results showed both L1-dependent and -independent ways to convey paralin-

20

Page 23: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

L1 L2

−0.50

−0.25

0.00

0.25

0.50

1~3 1~3attempt

s 3

PC3: Sumimasen

Figure 9: Mean PC3 scores (as dots) and ±1 standard error bars for the wordsumimasen in each speaker group (L1 and L2).

guistic information by phonologically and phonetically modifying F0 contours andsegmental durations. In the L2 production, clear interferences from speakers’ L1 toL2 were found. The results shown in Section 4 are summarized along the hypothesesproposed in Section 2.

Hypothesis 1 predicted language-dependent prosodic modifications to conveyparalinguistic information. Japanese L1 speakers were predicted not to change F0

contours phonologically to convey paralinguistic information due to the lexical re-striction of Japanese pitch accent. German L1 speakers were predicted to phono-logically vary F0 contours in the repetitions as the variation of F0 contours signalsspeakers’ attitudinal and emotional states in German. Hypothesis 1 was confirmedby the findings from PC1 scores for sumimasen and PC1 scores for Entschuldigung.For Japanese L1 speakers, PC1 score for sumimasen, which modulated the phono-logical changes of F0, did not change across repetitions, showing that they constantlyproduced the same phonological F0 contours. As with German L1 speakers, PC1score for Entschuldigung decreased in the repetitions with a significant differencebetween the first and the third attempt, which means a higher likelihood to producea rising contour in the first attempt, i.e., the F0 forms known to signal politenessor friendliness (Baumann et al., 2001), and falling contours in the repetitions thatsignal frustration. This finding indicates that participants tried to pursue a responsein an unsuccessful communication by producing more falling F0 contours. The wayGerman L1 speakers varied F0 contours in the repetitions is in line with the currentknowledge about the paralinguistic use of German F0 (e.g., Baumann et al., 2001;Wichmann, 2000).

In agreement with the previous studies (Kitamura and Burnham, 2003; Naga-hara, 1994; Li et al., 2013; Maekawa, 2004; Ishihara, 2011; Venditti et al., 2008),our finding shows a strong restriction of lexical pitch accent for a non-lexical use ofJapanese prosody and indicates that the lexical prosody dominates the non-lexicalone, as supported by the phonological encoding in Levelt’s blueprint of the speaker(Levelt, 1989, 1999).

Hypothesis 2 predicted more exaggerated pitch contours with a larger pitch rangeand longer utterance durations in the repetitions irrespective of the speaker groups,because these changes are known as general prosodic modifications in hyperarticu-lated speech (e.g., Fergzsibm, 1964; Fernald, 1991; Katz et al., 1996; Oviatt et al.,

21

Page 24: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

TFigure 10: F0 contours (top) and syllable durations (bottom) corresponding to PC1scores −1, −0.5, 0, 0.5 and 1 for the word Entschuldigung.

1998; Papousek, 1992; Soltau and Waibel, 2000a,b; Stern et al., 1983; Trehub et al.,1993). This hypothesis was also confirmed by PC2 scores for sumimasen and PC1scores for Entschuldigung. Regarding the Japanese L1 speaker data, their PC2 scoresdecreased in the repetitions, showing that they produced larger pitch range in therepetitions. This phonetic change occurred together with the utterance lengtheningin the repetitions. Japanese L1 speakers lengthened all morae , with a particu-larly large lengthening on the last mora, with a nucleus se, which is known as anutterance-final lengthening that has been reported in a variety of languages (Lehiste,1972) (though not in a context of repetitions like our study investigated), includingin Japanese (Ota et al., 2003; Port et al., 1987; Warner and Arai, 2001). With re-spect to German L1 speaker data, PC1 scores for Entschuldigung modulated bothF0 and segmental durations. Their PC1 scores decreased in the repetitions, show-

22

Page 25: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

L1 L2

−1.0

−0.5

0.0

0.5

1.0

1 2 3 1 2 3attempt

s 1

PC1: Entschuldigung

Figure 11: Mean PC1 scores (as dots) and ±1 standard error bars for the wordEntschuldigung for each attempt (1st, 2nd and 3rd) in each speaker group (L1 andL2).

ing that they produced rising contours in the first attempt, moving toward fallingcontours preceded by a larger pitch range on the stressed syllable in the second andthird attempts. This change in F0 was accompanied with the change in segmentaldurations. All segments became longer in the repetitions, with a particularly largelengthening in the final syllable gung.

In place of changing an F0 contour phonologically, Japanese L1 speakers raisedF0 peaks and produced a steeper pitch fall, which resulted in a local pitch rangeexpansion with a dramatic rise at the beginning of the utterance and a steep fall onthe pitch accented mora. This modifications are line with the findings in the previousstudies on hyperarticulated speech in Japanese (e.g., Ishihara, 2011; Kubozono,2007; Nagahara, 1994; Venditti, 2000). As Venditti (2000) argues, Japanese L1speakers manipulated pitch range to cue many of the same discourse properties whichare cued by changing a phonological type of pitch accent in English or German.

Note that, in Japanese, segmental length (proportions of segmental durations)is lexically contrastive, i.e., it exhibits a lexical restriction, as Japanese F0 does.Despite this lexical restriction of segmental length, Japanese L1 speakers modifiedsegmental durations. This was possible because segmental length is determined rel-atively as proportions within a word, meaning that morae themselves can be length-ened while keeping their relative proportions. As opposed to segmental length,a lexical pitch contrast is perceptually categorical or dichotomous. Therefore, inJapanese, F0 seems to be more restricted than durations for a paralinguistic mod-ification. In our data, the word-final morae were lengthened more than the othermorae, because the utterance-final lengthening accelerated further lengthening. Thewords analyzed in the current study did not form any contrastive pair in terms oflength. A further question is whether the durational property could be modified ina similar way when a word builds a minimal pair in terms of length. In such a case,durations would have to be changed in a more careful way than in the case where aword does not build any minimal pair.

As with L2 speakers’ outcomes, Hypothesis 3 predicted that modifications ofF0 and segmental durations would be influenced by their L1: German L2 speakerswere predicted to vary F0 contours phonologically, even though this change wasnot linguistically allowed in Japanese. Japanese L2 speakers were predicted not to

23

Page 26: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

TFigure 12: F0 contours (left) and syllable durations (right) corresponding to PC3scores −1, −0.5, 0, 0.5 and 1 for the word Entschuldigung.

vary F0 contours in the repetitions and they were expected to produce a fallingpitch accent rather than a rising accent as the speakers were restricted with thelexical falling pitch accent. Regarding their modifications in segmental durations,utterance lengthening was expected to be found as a general modification as hasbeing reported in the previous studies.

Regarding L2 German speakers, the interaction between speaker group and num-ber of attempt in PC1 scores for sumimasen showed that they varied phonologicalforms of the Japanese pitch accent despite its lexical restriction. With respect tosegmental durations, they did not lengthen the utterances in the repetitions and pro-duced generally shorter utterances than Japanese L1 speakers did. In L2 Japanesespeakers’ data, we found the interaction between speaker group and number of at-tempt in PC1 scores for Entschuldigung. In other words, L2 Japanese speakers

24

Page 27: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

L1 L2

−0.2

−0.1

0.0

0.1

0.2

0.3

1~3 1~3attempt

s 3

PC3: Entschuldigung

Figure 13: Mean PC3 scores (as dots) and ±1 standard error bars for the wordEntschuldigung in each speaker group (L1 and L2).

produced a larger pitch range, more falling pitch contours and longer utterances inthe repetitions, though not as much as German L1 speakers did. Since PC1 scoresdid not modulate a phonological change of a pitch accent on the stressed vowel u,it can be still claimed that Japanese speakers’ faithfulness to producing a fallingpitch accent was transferred to their L2 production, because Japanese L2 speakersdid not phonologically vary pitch accents, but phonetically. However, as opposedto their L1 performance, Japanese L2 speakers varied boundary tones in the rep-etitions. The rising final boundary tone preceded with H? L-, which was foundin our data, is indeed an existing phonological pattern to signal interrogatives inJapanese (Fujisaki and Hirose, 1993). Asano (2015), who analyzed the same datamanually using the ToBI system, reported that German L1 speakers produced L?

-H% (rising pitch accent followed by a rising boundary tone), while Japanese L2speakers never produced a rising pitch accent, only a rising boundary tone with asteep pitch rise at the very last part of the utterance. Since the phonological formfound in the final boundary tone was a typical Japanese final boundary tone for aninterrogative sentence, we argue that they may have applied a strategy to produceJapanese interrogatives for a request (as our task was to ask a waiter to order abeer) in their German L2 production. To support this argument, it would be in ourfurther interest to analyze a clear declarative sentence as a control baseline.

In the FPCA analysis, this smaller pitch range and no occurrence of an L?

pitch accent in the L2 speaker data were captured as a rather phonetically marginalvariation of rising boundary tones compared to the variation produced by German L1speakers, thus contributing to the significant interaction in PC1 scores. Interestingly,these two types of final pitch rising produced by L1 and L2 speakers were classifiedby the annotators in Asano (2015) with two distinctive phonological categories, whileFPCA captured these changes as gradual phonetic changes without considering theperceptual categories. This also indicates that FPCA captures characteristics of F0

contours and segmental duration globally but not locally. The fact that this finalpitch rising produced by L2 speakers did not emerge as a separate PC from FPCAdenotes the main limitation of a data-driven tool like FPCA, where phenomenaare ranked with respect to their quantitative variation without taking perceptualcategories into account.

While the utterances produced by L1 speakers were longer in the repetitions

25

Page 28: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

and the greatest durational change by L1 speakers also took place in the word-finalsyllable sen and gung, the comparable durational changes were not observed in theL2 utterances. For sumimasen, German L2 speakers did not change segmental du-rations in the repetitions. For Entschuldigung, Japanese L2 speakers did not changethem as much as L1 speakers did. We interpret this difference between L1 and L2data with respect to the utterance lengthening as follows: It might be possible thatit was too difficult for L2 speakers to concentrate simultaneously on the appropriatecoordination of both F0 and segmental durations in the L2 utterances, given thatL2 processing is more demanding due to the increased cognitive load (e.g., Anto-niou et al., 2015) and given that hyperarticulation may consume cognitive resourcestoo, while working memory capacity is limited (Cowan, 1995). Consequently, fewercognitive resources remained available for additional control.

The finding that German L2 speakers did not lengthen their utterance, especiallythe final morae sen, even though they did in their L1 utterances, invites the followinginterpretations. One possible interpretation is that German L2 speakers stressedthe pitch accented morae sen by employing the German word stress rule that afinal heavy syllable of a word is stressed (Dohmas et al., 2014). When the nucleusvowel with the final syllable sen is produced as being metrically stressed (withhigher spectral tilt), it may be articulatory difficult to lengthen this syllable. Thedifficulty in achieving Japanese mora-timing rhythm in an L2 when the L1 of alearner is a stress-timed language has been reported in previous studies (e.g., Kondo,2005; Masuko and Kiritani, 1992). For example, Kondo (2005) reported that itwas difficult for English L1 speakers to separate the acoustic features of F0 andvowel duration with the manifestation of lexical stress, and thus they often failto manipulate and control two acoustic features of stress in English, i.e., vowelduration and F0 increase independently in L2 Japanese production. Only a few veryadvanced English speakers showed a good mora timing control in Japanese, greatlyincreasing the F0 for accented morae without increasing mora durations. Anotherinterpretation is that duration, in German, is a less dominant prosodic propertythan F0 to convey a paralinguistic meaning. This may be contrary to Japanese,where duration is interpreted as the more dominant property than F0, because theperception of a segmental duration within a word is relative to the proportions ofother segmental durations of the word and thus may be less restricted than F0.Japanese L2 speakers lengthened their utterances in their German L2 utterancebecause the durational property was a more dominant prosodic property in theirL1 (Japanese) to convey paralinguistic information. Since German L1 speakers alsovaried durations (as a less dominant property to change in German), Japanese L2speakers’ attempts to vary durations still appeared to be successful. In the caseof German L2 speakers, they changed a more dominant prosodic property in theirL1 (German) and varied F0 in the same way as Japanese L2 speakers did in theirL2 utterances. Differently to the case for Entschuldigung, however, Japanese L1speakers did not change F0, so the German L2 speakers’ attempts turned out tobe not successful. Our proposed argument is that L2 speakers are more prone tochange a more dominant prosodic property in their L1 to convey a paralinguisticmeaning, i.e., duration for Japanese and F0 for German.

Similarly to durational changes that were expected to be found in all speaker

26

Page 29: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

groups, it was predicted that a larger pitch range in the repetitions would be foundin all groups. This phonetic change was only observed in both L1 speaker groups,whereas L2 speakers either did not modify it or their change was smaller than L1speakers’ one. This finding that L2 speakers did not produce higher F0 peaks andlarger pitch ranges in the repetitions is in line with Urbani (2012) and Wennerstrom(1994, 1998). The papers reported that the utterances produced by L2 speakers borea narrower pitch range than those produced by L1 speakers. One possible interpre-tation for this finding is that L2 speakers generally feel unsure when speaking in anL2 and that they thus do not boldly expand pitch ranges. Another interpretationrelates to the aforementioned argument that L2 speakers had difficulties in appro-priately coordinating F0 and segmental durations simultaneously. Since GermanL2 speakers produced the Japanese pitch accent with a metrical stress accompa-nied with a generally shorter segmental duration, they might have not had enoughtemporal space to expand the pitch range. This appears to be especially true forGerman L2 data, because durations and F0 are dependent on each other in German(Beckman, 1986; Fujisaki et al., 1986).

The results in this study were discussed reflecting that the participants tried toconvey paralinguistic information, i.e., increased frustration due to the communi-cation failure, although such attitudinal changes of the participants are difficult tomeasure quantitatively (Nolan, 2008). It may be argued that the participants mighthave not felt stressed even in the second or third attempt, so that there was actu-ally no “increased frustration” to convey in the repetitions. First of all, we believethat the participants did not necessarily have to actually feel stressed to accomplishthe task appropriately, but it was sufficient for them to follow the task instructionsand produce the words as if they had felt stressed. In the repetitions, participantswere explicitly instructed to imagine being more frustrated due to the unsuccess-ful communication. Findings from the data support the claim that they did so:The multiple pairwise comparison of number of attempt showed that a differencebetween the attempts in each language group mainly lied in the contrast betweenthe first and the second or the third attempt, but not between the second and thethird attempt. More specifically, the lengthening of the utterances were generallyobserved in the second and the third attempts, which is known as a general prosodicadaptation in hyperarticulated speech. This can be evidence for signaling the in-creased frustration in the repetitions. Further, rising pitch accents and boundarytones were mostly found in the first attempt both in the German participants’ L1and L2 utterances. Such rising contours are known to signal politeness and friendli-ness (Baumann et al., 2001). Our data therefore suggest that German participantssignaled politeness and friendliness in the first attempt by producing rising contours.In the repetitions, participants tried to pursue a response with falling F0 contourswhich may sound less polite (Baumann et al., 2001). Finally, Japanese L1 speakersshowed a significantly higher pitch followed by a steeper pitch fall exhibiting a largerpitch range in the second and the third attempt compared to the first attempt. Allthese findings were phonetic and phonological changes due to the hyperarticulationcaused by the pragmatic manipulation. One could still argue that even the firstattempt already included a certain speech act (e.g., calling) (e.g., Searle, 1969) andparticipants hyperarticulated the utterance in the first attempt, because the task

27

Page 30: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

description at the beginning of the task indicated that the bar was noisy. Taken thefinding that a main difference between the attempts lied in the contrast betweenthe first attempt and the repetitions, the degrees of the reinforcement between thefirst attempt and the repetitions were still significantly different, and the nature ofthe hyperarticulation in the first attempt and in the repetitions was different. Nev-ertheless, in order to clearly rule out this argument, it would be worth including acontrol baseline condition in future work in which participants produce an utterancewithout eliciting any intention to hyperarticulate an utterance.

Furthermore, the multiple pairwise comparison of speaker group further showedthat a difference between the two language groups mainly lied in the first attempt.This result lends itself to the interpretation that the L2 speakers had difficultiesin producing appropriate L2 prosody independently from conveying the attitudinalchange (= increased frustration). With respect to L1 and L2 speakers’ PC scoresin the repetitions that often did not significantly differ from each other, we still donot feel confident in drawing the conclusion that Japanese L1 speakers and GermanL2 speakers or German L1 speakers and Japanese L2 speakers applied the samestrategies to hyperarticulate utterances in the repetitions. The reason for this is thatboth groups showed different patterns of changes across the attempts. Therefore,their PC scores in the repetitions can happen to be so similar, still being related todifferent strategic reasons.

What may be further included as future cases in the line of research would bethe potential relation of acoustic strategy and oral communicative skills in L2. TheL2 proficiency tests used in our study were claimed to measure general languageproficiency including oral skills (Oller, 1976). The reason why we did not find anyeffect of L2 proficiency on the learners’ performance in this study may be due to thelarge range in the learners’ L2 proficiency level and their further variable learningbackgrounds. In our future research, more controlled learner groups with clearlydifferent L2 proficiency levels may be tested.

Finally, our findings are especially noteworthy as they were found in highlyfrequent words, in both Japanese and German, (“excuse me”) that L2 speakersshould have encountered very often before. Despite a rich amount of input in theirL2, they still failed to produce appropriate forms of L2 prosody, suggesting thedifficulties to acquire L2 prosody. These findings motivated us to conduct furtherperception and production experiments to determine the source of these deviant L2production forms in L2 perception, storage and production (Asano, 2014, 2016).

One of the novelties of this study was the use of the advanced statistical methodFPCA to analyze F0 data. This semi-automatic data analysis method was expectedto reduce the number of subjective decisions involved in assigning a ToBI categoryor in a local phonetic measurement. Moreover, a further advantage in the use ofFPCA was in enabling us to analyze whole F0 data by capturing the most distinctiveimportant characters to define contour differences. Finally, the joint analysis of F0

and segmental durations made it possible to observe interactive changes in F0 andsegmental durations, which is otherwise difficult to investigate in a single manualanalysis.

The outcomes obtained from FPCA were in line with our own previous investiga-tions with manual annotation using a ToBI and manual measurement of durational

28

Page 31: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

changes (Asano, 2016, Chapter 2). In addition, FPCA provided more insightfulfindings, showing relationships between the change in F0 and segmental durationsproviding numerical data to conduct statistical analyses on whole contours. FPCAcould identify the most distinctive and important phonetic and phonological char-acters for a typical Japanese utterance (e.g., an initial low and a falling pitch ac-cent)(Gussenhoven, 2004; Vance, 1987) without any prior information about theirimportance. A further advantage of FPCA was found in analyzing L2 data. L2 datahave been otherwise reported to be difficult to annotate using the ToBI (Asano, 2016;Asano et al., 2016) due to its more creative and dynamic characteristics of an inter-language containing deviant and unexpected phonetic and phonological realizations(Selinker, 1972). Indeed, several difficulties have been reported when using ToBIannotation for the analysis of L2 data (Asano, 2016; Asano et al., 2016). First,one selected ToBI system (either of an L1 or of an L2) can hardly cover all de-viant variabilities of L2 data that represent the interlanguage characteristics. Alanguage-specific ToBI system does not account for the deviant L2 realizations withrespect to L1, so unexpected contours might be hard or even impossible to describeusing a single ToBI system of either the speakers’ L1 or L2. Second, the annota-tors’ language backgrounds may influence the annotation. Should an annotator bean L1 listener of German when annotating German utterances produced JapaneseL2 speakers? Or is it better when the annotator is a Japanese L1 listener? Thesedifficulties and problems reported in the previous annotation analyses demonstratethe appeal of employing a different analysis method, as we did in the current studyby using FPCA.

However, FPCA has its limitations, like any method does. As briefly discussedbefore, FPCA does not take perceptual categorical thresholds into account. It istherefore important to compare the statistical results and the visual informationprovided in the FPCA plots with original data to correctly interpret results andto ensure that a statistical analysis corresponds to an interpretation based on ourlinguistic knowledge. Moreover, FPCA does not differentiate perceptually importantlocal parts of continuous F0 data. In future works, it may be worth exploringmodifications of FPCA where user-defined weights can be applied to single parts ofa contour to tell the system which parts are perceptually more decisive for comparingcontours. In this way, small but perceptually crucial differences in a portion of F0

contour may not emerge from automatic modeling.

6 Conclusions

The coordination of lexical and paralinguistic prosody in L1 and L2 hyperarticula-tion was examined using the joint analysis of FPCA on F0 and segmental durations.Lexical prosody outweighed paralinguistic prosody and L1 speakers conveyed par-alinguistic information without breaking the restriction of lexical prosody. Moreover,different dominance orders of prosodic properties may exist in Japanese and Germanaccording to which prosodic property was more likely to be modified in hyperartic-ulated speech. L2 speakers were more prone to change a more dominant prosodicproperty in their L1 to convey a paralinguistic meaning, i.e., segmental durationsfor Japanese and F0 for German. The interferences from L1 to L2 showed that it

29

Page 32: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

was difficult for L2 speakers to coordinate F0 and segmental durations in hyper-articulated speech. The results reported in this study are especially telling as theanalyzed words were very frequent words, confirming the difficulties encountered inthe acquisition of L2 prosody despite a rich amount of L2 input.

30

Page 33: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

References

Altmann, H. and Kabak, B. (2011). Second Language Phonology: L1 transfer,universals, and processing from segments to stress. In Bert, B., Kula, N., andNasukawa, K., editors, The Continuum Companion to Phonology, pages 298–319.Continuum, London & New York.

Antoniou, M., Wong, P. C. M., and Wang, S. (2015). The effect of intensifiedlanguage exposure on accommodating talker variability. Journal of Speech andHearing Research, 58(3):722–727.

Asano, Y. (2014). Stability in perceiving non-native segmental length contrasts. InProceedings of the 7th International Conference on Speech Prosody, pages 321–325,Dublin, Ireland.

Asano, Y. (2015). Coordination of lexical and paralinguistic F0 in L2 production.In Proceedings of the 18th International Congress of Phonetic Sciences, pages577–560, Glasgow, UK.

Asano, Y. (2016). Localising Foreign Accents in Speech Perception, Storage andProduction. PhD thesis, University of Konstanz, Konstanz, Germany.

Asano, Y., Gubian, M., and Sasha, D. (2016). Cutting down on manual pitchcontour annotation using data modeling. In Proceedings of the 8th InternationalConference on Speech Prosody, Boston, MA.

Banse, R. and Scherer, K. (1996). Acoustic profiles in vocal emotion expression.Journal of Personality and Social Psychology, 70(3):614–636.

Barr, D. J., Levy, R., Scheepers, C., and Tily, H. J. (2013). Random effects structurefor confirmatory hypothesis testing: Keep it maximal. Journal of Memory andLanguage, 68(3):255–278.

Baumann, S., Becker, J., Grice, M., and Mucke, D. (2007). Tonal and articulatorymarking of focus in German. In Proceedings of the 16th International Congress ofthe Phonetic Sciences, pages 1029–1032, Saarbrucken, Germany.

Baumann, S. and Grice, M. (2006). The intonation of accessibility. Journal ofPragmatics, 38:1636–1657.

Baumann, S., Grice, M., and Benzmuller, R. (2001). GToBI – a phonological systemfor the transcription of German intonation. In Puppel, S. and Demenko, G.,editors, Prosody 2000: Speech Recognition and Synthesis, 21–28. Adam MickiewiczUniversity.

Beckman, M. E. (1986). Stress and Non-Stress Accent. Foris Publications, Dor-drecht, the Netherlands.

Beckman, M. E., Hirschberg, J., and Shattuck-Hufnagel, S. (2005). The originalToBI system and the evolution of the ToBI framework. In Jun, S.-A., editor,

31

Page 34: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

Prosodic Typology – The Phonology of Intonation and Phrasing. Oxford UniversityPress, Oxford, UK.

Belsley, D. A., Kuh, E., and Welsch, R. E. (1980). Regression Diagnostics. Identify-ing Influential Data and scourses of Colinearity. Wiley Series in Probabiliby andMathematical Statistics. Wiley, New York, NY.

Best, C. T. (1995). A direct realist view of cross-language speech perception. InStrange, W., editor, Speech Perception and Linguistic Experience. York Press,Timonium MD.

Best, C. T., McRoberts, G. W., and Goodell, E. (2001). Discrimination of nonna-tive consonant contrasts varying in perceptual assimilation to the listeners nativephonological system. Journal of Acoustical Society of America, 109:775–794.

Best, C. T. and Tyler, M. D. (2007). Nonnative and second-language speech percep-tion: Commonalities and complementarities. In Munro, M. J. and Bohn, O.-S.,editors, Second Language Speech learning: The role of language experience inspeech perception and production, pages 13–34. John Benjamins, Amsterdam, theNetherlands.

Boersma, P. and Weenink, D. (2011). Praat: doing phonetics by computer [computerprogram] version 5.2.20.

Braun, B. (2006). Phonetics and phonology of thematic contrast in German. Lan-guage and Cognitive Processes, 26:224–235.

Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., and Weiss, B. (2005). Adatabase of German emotional speech. In Proceedings of Interspeech 2005, Lisbon,Portugal.

Chen, A. (2005). Universal and Language-Specific Perception of Paralinguistic In-tonational Meaning. LOT, Utrecht, the Netherlands.

Chen, A., Gussenhoven, C., and Rietveld, T. (2004). Language-specificity in the per-ception of paralinguistic intonational meaning. Language and Speech, 47(4):311–349.

Chen, A. and Mennen, I. (2008). Encoding interrogativity intonationally in a secondlanguage. In Proceedings of the 4th International Conferences on Speech Prosody,pages 513–516, Campinas, Brazil.

Coleman, J. A., Ruediger, G., and Raatz, U., editors (1994). University LanguageTesting and the C-Test. AKS-Verlag Bochum, Bochum, Germany.

Cowan, N. (1995). Attention and Memory: An Integrated Framework. Number 26in Oxford Psychology Series. Oxford University Press, New York [etc.].

Cunnings, I. (2012). An overview of mixed-effects statistical models for secondlanguage researchers. Second Language Research, 28(3):369–382.

32

Page 35: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

Cutler, A. and Otake, T. (1999). Pitch accent in spoken-word recognition inJapanese. Acoustical Society of America, 105(3):1877–1888.

De Boor, C. (2001). A Practical Guide to Splines. Springer, New York, NY.

De Bot, K. (1992). A bilingual processing model: Levelt’s speaking model adapted.Applied Linguistics, 13:1–24.

Dohmas, U., Plag, I., and Carroll, R. (2014). Word stress assignment in German,English and Dutch: Quantitysensitivity and extrametricality revisited. Journalof Computational German Linguistics, 17:59–96.

Els, T. and Bot, K. (1987). The role of intonation in foreign accent. ModernLanguage Journal, 71(2):147–155.

Fergzsibm, C. (1964). Baby talk in six languages. American Anthropologist, 66:103–114.

Fernald, A. (1989). Intonation and communicative intent in mothers’ speech toinfants: Is the melody the message? Child Development, 60:1497–1510.

Fernald, A. (1991). Prosody in speech to children: Prelinguistic and linguisticfunctions. Annals of Child Development, 8:43–80.

Fernald, A. and Simon, T. (1984). Expanded intonation contours in mothers’ speechto newborns. Developmental Psychology, .20:104–113.

Fery, C. (1993). German Intonational Patterns. Niemeyer, Tubingen, Germany.

Fery, C. and Ishihara, S. (2009). How focus and givenness shape prosody. In Zimmer-mann, M. and Fery, C., editors, Information Structure: Theoretical, Typological,and Experimental Perspectives, volume 1, pages 36–63. Oxford University Press,Oxford, UK.

Flege, J. E. (1999). Age of learning and second-language speech. In Birdsong, D.,editor, Second Language Acquisition and the Critical Period Hypothesis, pages101–132. Lawrence Erlbaum Associates, Hilsdale, NJ.

Flege, J. E., MacKay, I., and Piske, T. (2002). Assessing bilingual dominance.Applied Psycholinguistics, 23:567–598.

Flege, J. E., Munro, M. J., and Mackay, I. (1995). Effects of age of second-languagelearning on the production of English consonants. Speech Communication, 16:1–26.

Fujisaki, H. (1992). Modeling the process of fundamental frequency contour gener-ation. In Tohkura, Y., Vatikiotis-Bateson, E., and Sagisaka, Y., editors, SpeechPerception, Production and Linguistic Structure, pages 313–328. Ohmsha, Tokyo,Japan.

33

Page 36: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

Fujisaki, H. and Hirose, K. (1993). Analysis and perception of intonation expressingparalinguistic information in spoken Japanese. In ISCA Workshop on Prosody,pages 254–257, Lund, Sweden.

Fujisaki, H., Hirose, K., and Sugito, M. (1986). Comparison of acoustic featuresof word accent in English and Japanese. Journal of Acoustical Society of Japan,7(1):59–63.

Gass, S. and Selinker, L. (1994). Second Language Acquisition: An IntroductoryCourse. Lawrence Erlbaum Associates, Mahwah, NJ.

Gibbon, D. (1998). Intonation in German. In Hirst, D. and Cristo, A., editors, Into-nation systems: a survey of twenty languages, chapter 4, pages 78–95. CambridgeUniversity Press, Cambridge, MA.

Grabe, E., Kochanski, G., and Coleman, J. (2007). Connecting intonation labelsto mathematical descriptions of fundamental frequency. Language and Speech,50(3):281–310.

Garding, E. (1981). Contrastive prosody: A model and its application. StudiaLingusitica, 35:146–165.

Grice, M. and Bauman, S. (2007). An introduction to intonation - functions andmodels. In Trouvain, J. and Gut, U., editors, Non-Native Prosody: PhoneticDescription and Teaching Practice, chapter 1, pages 25–52. Mouton De Gruyter,Berlin & New York.

Grice, M. and Baumann, S. (2002). Deutsche intonation und GToBI. LinguistischeBerichte, 191:267–298.

Grice, M., Baumann, S., and Benzmuller, R. (2005). German intonation inautosegmental-metrical phonology. In Sun-Ah, J., editor, Prosodic Typology. ThePhonology of Intonation and Phrasing, pages 55–83. Oxford University Press, Ox-ford, UK.

Grice, M., Reyelt, M., Benzmuller, R., Mayer, J., and Batliner, A. (1996). Con-sistency in transcription and labelling of German intonation with GToBI. InProceedings of the 4th International Conference on Spoken Language Processing,Philadelphia, PA.

Grieser, D. L. and Kuhl, P. K. (1988). Maternal speech to infants in a tonal language:Support for universal prosodic features in motherese. Developmental Psychology,24:14–20.

Gubian, M., Boves, L., and Cangemi, F. (2011). Joint analysis of F0 and speechrate with Functional Data Analysis. In Proceedings of the 36th InternationalConference on Acoustics, Speech and Signal Processing, pages 4972–4975, Prague,Czech Republic.

34

Page 37: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

Gubian, M., Torreira, F., and Boves, L. (2015). Using functional data analysis forinvestigating multidimensional dynamic phonetic contrasts. Journal of Phonetics,49:16–40.

Gussenhoven, C. (2004). The phonology of tone and intonation. Research surveysin linguistics. Cambridge University Press, Cambridge; New York.

Gut, U. (2003). Prosody in second language speech production: the role of thenative language. Fremdsprachen Lehren und Lernen, 32:133–152.

Hatasa, Y. A. and Tohsaku, Y. (1997). Spot as a placement test. Technical report,University of Hawaii Press, Hawaii.

Hayashi, R., Imaizumi, S., Niimi, S., Ueno, S., and Kiritani, S. (2000). Role of pitch-accent in the word recognition using a cross-modal priming task. In Proceedingof the Acoustical Society of Japan, pages 1–8, Tokyo, Japan.

Hirst, D. and Espesser, R. (1993). Automatic modelling of fundamental frequencyusing a quadratic spline function. In Travaux de l’Institut de Phonetique d’Aix,volume 15, pages 75–85, Aix-en-Provence, France.

Ishihara, S. (2011). Japanese focus prosody revisited: Freeing focus from prosodicphrasing. Lingua, 121(13):1870–1889.

Jilka, M., Anufryk, V., Baumotte, H., Lewandowski, N., Rota, G., and Reiterer, S.(2007). Assessing individual talent in second language production and percep-tion. In Rauber, A. S., Watkins, M. A., and Baptista, B. O., editors, New Sounds2007: Proceedings of the 5th International Symposium on the Acquisition of Sec-ond Language Speech, pages 243–258, Florianopolis, Brazil. Federal University ofSanta Catarina.

Jun, S.-A. and Oh, M. (2000). Acquisition of second language intonation. In Proceed-ings of the 6th International Conference on Spoken Language Processing, Beijing,China.

Katz, G., Cohen, J., and Moore, C. (1996). A combination of vocal F0 dynamicand summary features discriminates between three pragmatic categories of infant-directed speech. Child Development, 67:205–217.

Kienast, M. and Sendlmeier, W. (2000). Acoustical analysis of spectral and temporalchanges in emotional speech. In Proceedings of the ISCA-Workshop on “On Speechand Emotion” in Belfast, pages 92–97, Newcastle, Northern Ireland, UK.

Kitamura, C. and Burnham, D. (2003). Pitch and communicative intent in mother’sspeech: Adjustment for age and sex in the first year. Infancy, 41(1):85–110.

Kitamura, C., Thanavisuth, C., Burnham, D., and Luksaneeyanawin, S. (2001).Universality and specificity in infant-directed speech: Pitch modifications as afunction of infant age and sex in a tonal and non-tonal language. Infant Behaviorand Development, 24:372–392.

35

Page 38: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

Kohler, K. (2005). Timing and communicative functions of pitch contours. Phonet-ica, 62(2–4):88–105.

Kohler, K. (2008). Prosody in speech interaction – expression of the speaker andappeal to the listener. In Proceeding of the 8th Conference of China and theInternational Symposium on Phonetic Frontiers. Beijing, China.

Kondo, M. (2005). Strategies for acquiring japanese prosody by english speakers. InProceedings of Phonetics Teaching and Learning Conference 2005, pages pdf23,1–4, London, UK.

Kormos, J. (2006). Speech Production and Second Language Acquisition. LawrenceErlbaum Associates, New York & London.

Kormos, J. (2012). Sentence production in a second language. In The Encyclopediaof Applied Linguistics. Blackwell Publishing Ltd, Oxford, UK.

Kraehenmann, A. (2001). Swiss German stops: Geminates all over the word. Phonol-ogy, 18(1):109–145.

Kraehenmann, A. and Lahiri, A. (2008). Duration differences in the articulation andacoustics of Swiss German word-initial geminate and singleton stops. AcousticalSociety of America, 123(6):4446–4455.

Kubozono, H. (1993). The Organization of Japanese Prosody. Kurosio Publishers,Tokyo, Japan.

Kubozono, H. (1999). Mora and syllable. In Tsujimura, N., editor, The handbookof Japanese Linguistics, pages 31–61. Blackwell Publishers, Oxford, UK.

Kubozono, H. (2007). Focus and intonation in Japanese: Does focus trigger pitchreset? In Ishihara, S., editor, Proceedings of the 2nd Workshop on Prosody, Syn-tax, and Information Structure (WPSI 2)in the series of Interdisciplinary Studieson Information Structure, volume 9, pages 1–27, Potsdam, Germany.

Kubozono, H. (2011). Japanese pitch accent. In van Oostendorp, M., Ewen, C. J.,Hume, E., and Rice, K., editors, The Blackwell Companion to Phonology, pages2879–2907. Blackwell Publishing Ltd, Oxford, UK.

Kuhl, P. (1991). Human adults and human infants show a ‘perceptual magneteffect’ for the prototypes of speech categories, monkeys do not. Perception &Psychophysics, 50(2):93–107.

Kuhl, P. and Iverson, P. (1995). Linguistic experience and ‘perceptual magnet effect’.In Strange, W., editor, Speech Perception and Linguistic Experience. York Press,Timonium MD.

Ladd, D. Robert, J. (1978). Stylized intonation. Language, 54(3):517–540.

Lehiste, I. (1972). The timing of utterances and linguistic boundaries. Journal ofAcoustical Society of America, 51:2018–2024.

36

Page 39: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

Levelt, W. (1989). Speaking. From Intention to Articulation. MIT Press, Cambridge,MA.

Levelt, W. (1999). Producing spoken language: A blueprint of the speaker. InBrown, C. and Hagoort, P., editors, The Neurocognition of Language, chapter 4,pages 83–122. Oxford University Press, Oxford, UK.

Li, A.-J., Jia, Y., Fang, Q., and Dang, J.-W. (2013). Emotional intonation modeling:A cross-language study on Chinese and Japanese. In Signal and InformationProcessing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific, pages 1–6, Kaohsiung, Taiwan.

Liscombe, J. (2007). Prosody and Speaker State: Paralinguistics, Pragmatics, andProficiency. PhD thesis, Columbia University, New York, NY.

Lombard, E. (1911). Le signe de l’elevation de la voix. Annales des Maladies del’Oreille et du Larynx, 37:101–119.

Maekawa, K. (2004). Production and perception ‘paralinguistic’ information. InProceedings of Speech Prosody 2004, volume 03, pages 367—374, Nara, Japan.

Masuko, S. and Kiritani, S. (1992). Nihongo gakushuusha ni okeru moora onso noshuutoku ni tsuite [acquisition of morae by L2 learners of Japanese. In NihongoKyouikugakkai Shunki Taikai Yokoushuu, pages 19–24, Tokyo, Japan.

Mennen, I. (2004). Bi-directional interference in the intonation of Dutch speakersof Greek. Journal of Phonetics, 32(4):543–563.

Mennen, I. (2007). Phonological and phonetic influences in non-native intonation.In Gut, U. and Trouvain, J., editors, Non-native Prosody: Phonetic Descriptionsand Teaching Practice, pages 53–76. Mouton De Gruyter, Berlin & New York.

Mennen, I., Chen, A., and Karlsson, F. (2010). Characterising the internal structureof learners intonation and its development over time. In New sounds : Interna-tional Symposium on the Acquisition of Second Language Speech, pages 319–324,Poznan, Poland.

Nagahara, H. (1994). Phonological Phrasing in Japanese. PhD thesis, University ofCalifornia, Los Angeles.

Niikura, M., Sugawara, T., and Hirschfeld, U. (2011). A problem of prosodic trans-fer in the perception of German learners of Japanese and Japanese learners ofGerman. In Proceedings of the 17th International Congress of Phonetic Sciences,pages 97–98, Hong-Kong, China.

Nolan, F. (2008). Intonation. In Aarts, B. and McMahon, A., editors, Handbook ofEnglish Linguistics, chapter 19. Blackwell Publishing Ltd, Oxford, UK.

Ofuka, E., McKeown, J., Waterman, M., and Roach, P. (2000). Prosodic cues forrated politeness in Japanese speech. Speech Communication, 32:199–217.

37

Page 40: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

Oller, J. W. J. (1976). Evidence for a general language proficiency factor: Anexpectancy grammar. Die Neueren Sprachen, 75(2):165–174.

Ota, M., Ladd, D., and Tsuchiya, M. (2003). Effects of foot structure on moraduration in Japanese? In Proceedings of the 15th International Conference onPhonetic Sciences, pages 459–462, Barcelona, Spain.

Oviatt, S., Levow, G., Moreton, E., and MacEachern, M. (1996). Modeling globaland focal hyperarticulation during human-computer error resolution. Journal ofAcoustical Society of America, 104(5):3080–98.

Oviatt, S., MacEachern, M., and Levow, G. (1998). Predicting hyperarticu-late speech during human-computer error resolution. Speech Communication,24(2):87–110.

Oyama, S. (1976). A sensitive period for the acquisition of a nonnative phonologicalsystem. Journal of Psycholinguistic Research, 5:261–285.

Paeschke, A. (2004). Global trend of fundamental frequency in emotional speech.In Proceedings of Speech Prosody 2004, pages 671–674, Nara, Japan.

Paeschke, A., Kienast, M., and Sendlmeier, W. (1999). F0-contours in emotionalspeech. In Proceedings of the 16th International Congress of Phonetic Sciences,pages 929–932, San Francisco, USA.

Paeschke, A. and Sendlmeier, W. (2000). Prosodic characteristics of emotionalspeech: Measurements of fundamental frequency movements. In Proceedings of theISCA-Workshop on “On Speech and Emotion” in Belfast, pages 75–78, Newcastle,Northern Ireland, UK.

Papousek, H. (1992). Early ontogeny of vocal communication in parent-infant in-teractions. In Papousek, H., Jurgens, U., and Papousek, editors, Nonverbal vocalcommunication, pages 230–282. Cambridge University Press, New York, NY.

Pierrehumbert, J. B. and Hirschberg, J. (1990). The meaning of intonational con-tours in the interpretation of discourse. In Cohen, P. R., Morgan, J., and Pollack,M. E., editors, Intentions in Communication, pages 271–311. MIT Press, Cam-bridge.

Pike, K. L. (1946). The Intonation of American English. University of MichiganPublications, Michigan.

Pirker, H. and Loderer, G. (1999). “I said two ti-ckets”: How to talk to a deafwizard. In ETRW on Dialogue and Prosody, Veldhoven, the Netherlands.

Port, R. F., Dalby, J., and O’Dell, M. (1987). Evidence for mora timing in Japanese.Journal of Acoustical Society of America, 81:1572–1585.

Prieto, P. and Roseano, P., editors (2010). Transcription of Intonation of the SpanishLanguage. Lincom Europa, Munich.

38

Page 41: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

Raithel, V. and Hielscher-Fastabend, M. (2004). Emotional and linguistic perceptionof prosody. Folia Phoniatrica et Logopaedica, 56:7–13.

Ramsay, J. O., Hookers, G., and Graves, S. (2009). Functional Data Analysis withR and MATLAB. Springer Verlag, New York, NY.

Ramsay, J. O. and Silverman, B. W. (2009). Functional Data Analysis. Springer,New York, NY.

Sakai, T. (2015). TTBJ. In Li, Z., editor, Guidebook for Japanese Language Testingin Japanese as L2 (Nihongokyouiku no tame no Gengo Testo Gaido Bukku), pages86–109. Kurosio Publishers, Tokyo, Japan.

Searle, J. R. (1969). Speech Acts: An Essa in the Philosophy of Language. CambridgeUniversity Press, Cambridge; NY.

Sekiguchi, T. and Nakajima, Y. (1999). The use of lexical prosody for lexical accessof the Japanese language. Journal of Psycholinguistic Research, 28(4):439–454.

Selinker, L. (1972). Interlanguage. International Review of Applied Linguistics,10:209–241.

Shattuck-Hufnagel, S. and Turk, A. (1996). A prosody tutorial for investigators ofauditory sentence processing. Journal of Psycholinguistic Research, 25(2):193–247.

Shochi, T., Erickson, D., Rilliard, A. Auberge, V., and Martin, J.-C. (2008). Recog-nition of Japanese attitudes in audio-visual speech. In Barbosa, P. A., Madureira,S., and Reis, C., editors, Proceedings of Speech Prosody 2008, pages 689–692,Campinas, Brazil.

Shochi, T., Sekiyama, K., Lees, N., Boyce, M., Gocke, R., and Burnham, D. (2009).Auditory-visual infant directed speech in Japanese and English. In Theobald,B.-J. and Harvey, R., editors, International Conference on Audio-Visual SpeechProcessing, pages 107–112, Norwich, UK.

Sikveland, R. O. (2006). How do we speak to foreigners?—phonetic analyses ofspeech communication between l1 and l2 speakers of norwegian. In ProceedingFonetik. Centre for Language and Literature Lund University, pages 109–112,Lund, Sweden.

Soltau, H. (2005). Compensating Hyperarticulation for Automatic Speech Recogni-tion. PhD thesis, University of Karlsruhe, Karlsruhe, Germany.

Soltau, H. and Waibel, A. (2000a). Phone dependent modeling of hyperarticulatedeffects. In Proceeding of the International Conference on Spoken Language Process,pages 105–108, Beijing, China.

Soltau, H. and Waibel, A. (2000b). Specialized acoustic models for hyperarticu-lated speech. In IEEE International Conference on Acoustics, Speech, and SignalProcessing, pages 1779–1782, Istanbul, Turkey.

39

Page 42: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

Stent, A. J., Huffman, M. K., and Brennan, S. E. (2008). Adapting speaking afterevidence of misrecognition: Local and global hyperarticulation. Speech Commu-nication, 50(3):163–178.

Stern, D., Spieker, S., Barnett, R., and MacKain, K. (1983). The prosody of mater-nal speech: Infant age and context related changes. Journal of Child Language,10:1–15.

Stivers, T. and Rossano, F. (2010a). Mobilising response. Research on Languageand Social Interaction, 43:3–31.

Stivers, T. and Rossano, F. (2010b). A scalar view of response relevance. Researchon Language and Social Interaction, 43:49–56.

Taylor, P. (2000). Analysis and synthesis of intonation using the tilt model. TheJournal of the Acoustical Society of America, 107(3):1697–1714.

Trainor, L., Clark, E., Huntley, A., and Adams, B. (1997). The acoustic basis ofinfant preferences for infant-directed singing. Infant Behavior and Development,20:383–396.

Trehub, S., Trainor, L., and Unyk, A. (1993). Music and speech processing in thefirst year of life. In Reese, H., editor, Advances in child development and behavior,volume 24, pages 1–35. Academic Press, New York, NY.

Trouvain, J., Bonneau, A., Colotte, V., Fauth, C., Fohr, D., Jouvet, D., Jugler, J.,Laprie, Y., Mella, O., Mobius, B., and Zimmerer, F. (2016). The IFCASL Corpusof French and German Non-native and Native Read Speech. In Proceedings of theTenth International Conference on Language Resources and Evaluation (LREC2016), Paris, France.

Trouvain, J. and Gut, U., editors (2007). Non-Native Prosody: Phonetic Descriptionand Teaching Practice. Mouton de Gruyter, Berlin & New York.

Turco, G. and Gubian, M. (2012). L1 prosodic transfer and priming effects: A quan-titative study on semi-spontaneous dialogues. In Proceedings of Speech Prosody2012, pages 386–389, Shanghai, China.

Turco, G., Gubian, M., and Schertz, J. (2011). A quantitative investigation of theprosody of Verum Focus in Italian. In Proceedings of Interspeech 2011, Florence,Italy.

Turk, O., Schoder, M., Bozkurt, B., and Arslan, L. (2005). Voice quality interpo-lation for emotional text-to-speech synthesis. In Proceedings of Interspeech 2005,pages 797–800, Lisbon, Portugal.

Ueyama, M. (2000). Prosodic Transfer: An Acoustic Study of L2 English vs. L2Japanese. PhD thesis, University of California, Oakland, CA.

40

Page 43: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

Ueyama, M. and Jun, S.-A. (1998). Focus realisation in Japanese English and KoreanEnglish intonation, volume 7. CSLI/Stanford University Press, Redwood City,CA.

Urbani, M. (2012). Pitch range in L1/L2 English. an analysis of F0 using LTDand linguistic measures. In Busa, M. G. and Stella, A., editors, MethodologicalPerspectives on Second Language Prosody, pages 79–83. CLEUP, Padova, Italy.

Vance, T. J. (1987). An Introduction to Japanese Phonology. State University ofNew York Press, Albany, NY.

Venditti, J. (1997). Japanese ToBI labelling guidelines. Ohio State University Work-ing Papers in Linguistics, 50:127–62.

Venditti, J. (2000). Discourse Structure and Attentional Salience Effects on JapaneseIntonation. PhD thesis, The Ohio State University, Columbus, OH.

Venditti, J., Maekawa, K., and Beckman, M. E. (2008). Prominence marking in theJapanese intonation system. In Miyagawa, S. and Saito, M., editors, Handbook ofJapanese Linguistics, pages 456–512. Oxford University Press, Oxford, UK.

Warner, N. and Arai, T. (2001). The role of the mora in the timing of spontaneousJapanese speech. Journal of Acoustical Society of America, 109:1144–1156.

Wennerstrom, A. (1994). Intonational meaning in English discourse: A study ofnon-native speakers. Applied Linguistics, 15(4):399–420.

Wennerstrom, A. (1998). Intonation as cohesion in academic discourse. Studies inSecond Language Acquisition, 20(01):1–25.

Wester, M., Garcıa Lecumberri, M., and Cooke, M. (2014). DIAPIX-FL: A sym-metric corpus of problem-solving dialogues in first and second languages. In Pro-ceedings of Interspeech 2014, pages 509–512, Singapore.

Wichmann, A. (2000). The attitudinal effects of prosody, and how they relate toemotion. In Proceedings of the ISCA-Workshop on “On Speech and Emotion” inBelfast, pages 143–148. Belfast, Northern Ireland.

Wiese, R. (2000). The Phonology of German. Oxford University Press, Oxford, UK.

Zahner, K., Pohl, M., and Braun, B. (2015). Pitch accent distribution in Germaninfant-directed speech. In Proceedings of Interspeech 2015, pages 46–49, Dresden,Germany.

41

Page 44: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

A FPCA for prosodic data

This appendix provides an overview of the data processing method used in this ar-ticle for the analysis of F0 contours. The method is based on FPCA, which is one ofthe data analysis algorithms available within the Functional Data Analysis (FDA)framework (Ramsay and Silverman, 2009; Ramsay et al., 2009). Since the methodis described in Gubian et al. (2015) and Gubian et al. (2011) in detail, this sectionfocuses on a concise exposition of the main concepts and an outline of the mathemat-ical backbone. Readers interested in applying FDA to their own data are invited tovisit the website maintained by the second author (http://lands.let.ru.nl/FDA),from which instructions and codes can be downloaded. The rest of the appendix isdivided in three sections that correspond to the main steps of the procedure, namelysmoothing, landmark registration and FPCA, respectively.

A.1 Smoothing

Smoothing transforms a contour composed of discrete samples into a continuouscurve represented by a mathematical function (this is what “functional” refers toin FPCA or FDA). The target function is chosen from a set of possible functionsspecified by the user. In the case of F0 contours, which can assume a wide rangeof shapes, it is customary to adopt B-splines as the general function set (De Boor,2001). A B-spline is a sequence of polynomial curves that, summed together, ap-proximate the desired contour. By varying the number of polynomial componentsof B-splines and a parameter influencing the curve smoothness, the user controlshow close to the original discrete samples the continuous curve will be, or, in otherwords, the user controls the temporal resolution of the continuous representation(Ramsay and Silverman, 2009). An example is shown in Figure 14, in which thesmoothing curve was chosen in such a way to eliminate some microprosodic detailpresent in the sampled contour. In general it is important to find a good compromisebetween “smoothing too much”, which would delete relevant information from theoriginal contours, and “not smoothing enough”, which would leave irrelevant detailthat makes the extraction of global trends harder in the following steps.

A.2 Landmark registration

The analysis of prosody is generally based on descriptions of the F0 movement inrelation to the underlying segments, e.g., syllables or morae. However, FPCA pro-duces a model based on physical time, where the position of the relevant segmentalboundaries is different from one contour to another. If we used the original contoursas input to FPCA, at the end of the analysis we would not be able to distinguish con-tour shape variations that are a consequence of a variation of segmental durations(e.g., a peak occurs always inside the same segment, but the preceding segmentsvary their duration) from variations of pitch alignment (e.g., a peak occurs beforeor after a given segmental boundary).

In order to interpret the results of FPCA in terms of the segmental structureunderlying the contour, a nonlinear time warping called landmark registration isapplied to the original (smoothed) F0 contours that synchronizes all corresponding

42

Page 45: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

Figure 14: Example of a smoothed F0 contour. Dots represent F0 samples obtainedfrom the pitch tracker available in Praat. The curve is a B-spline. This contouris extracted from a realization of the word Entschuldigung, where the first syllablewas discarded (cf. Section 3.4). The y-axis reports F0 values in semitones after theglobal mean value was subtracted (thus corresponding to the zero level).

segmental boundaries. Landmark registration alters the time axis of a curve in sucha way that landmarks, i.e., segmental boundaries, move from their original positionto a new position specified by the user; usually the average position computed onthe whole curve set is chosen as common location of landmarks. The operation iscarried out automatically and it is only based on the time position of the selectedlandmarks where the latter can obtained either from the manual annotation, e.g.,carried out in Praat, or automatically using a speech recognition software (e.g.,see Turco and Gubian, 2012). The time warping involved in landmark registrationis guaranteed to preserve the qualitative aspects of curves, i.e., all rise and fallmovements are preserved in the original order and level, no jumps or ruptures areproduced. Figure 15 shows some examples.

The output of landmark registration is a new set of curves where all correspond-ing segments have the same duration, which means that the information on thedifferences in segmental durations is discarded. Gubian et al. (2011) proposed away to represent the original durations by means of functions (curves) and to incor-porate this representation in FPCA. Let h(t) be the time warping curve producedby landmark registration that specifies the mapping between normalized time t andoriginal time h(t) for a specific normalized curve F0(t) (bottom left panel in Fig-ure 15). This curve contains all the information necessary to reconstruct the originalF0(t) contour (top left panel in Figure 15) from its registered version (top right panelin Figure 15). An equivalent representation of h(t) that is more convenient for thestatistical analysis tools that follow is obtained by

r(t) = − logdh(t)

dt. (1)

43

Page 46: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

TFigure 15: Three F0 contours randomly chosen from the 86 realizations of the wordEntschuldigung, where the first syllable was discarded (cf. Section 3.4). Top left:F0 contours before landmark registration. Top right: F0 contours after landmarkregistration. Bottom left: time warping functions h(t) describing landmark regis-tration for the three contours. Bottom right: time warping expressed as relativelog rate r(t) as in Equation (1). Three different line styles are used across the fourpanels in order to identify each contour. Dots on the curves mark the position of thelandmarks (here syllabic boundaries | ul | di | gung |, and vertical dashed lines showthe position of the registered landmarks, which correspond to the average positioncomputed on the entire set of 86 realizations of this word.

The curve r(t) represents the log of the relative rate of realization of the utterancewith respect to its normalized version. According to Equation (1), if a segment(i.e., anything between two consecutive landmarks) is realized two times faster thanin the normalized version, then r(t) will read values around log2 = 0.7 along thetime interval corresponding to that segment; vice versa, values around log 1

2= −0.7

will be reached for segments pronounced at half rate (i.e., double duration). Thebottom right panel in Figure 15 shows curves r(t) obtained from h(t) in the bottomleft panel. The advantage of using r(t) instead of h(t) is that the former representsproportional variations in duration linearly. Fortunately, since Equation (1) is com-

44

Page 47: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

pletely reversible, once we obtain results in terms of r(t) we can convert them to h(t)values, which correspond to landmark positions, or directly to differences betweenh(t) values, which correspond to segmental durations. This will be shown in thenext section.

A.3 FPCA

The actual statistical analysis on F0 contours is carried out by FPCA. The inputto FPCA consists of a set of (multidimensional) curves, which, in our case, are(F0(t), r(t)) pairs, where F0(t) is represented in normalized time and r(t) was definedin Equation (1). The output is a compact model composed of a few curves, calledPrincipal Components (PCs), which combined in appropriate amounts reproduceevery input curve, with some approximation. Formally, every curve (F0(t), r(t)) isdecomposed as:

F0(t) = mF0(t) + s1 · PC1F0(t) + s2 · PC2F0(t) + · · · (2a)

r(t) = mr(t) + s1 · PC1r(t) + s2 · PC2r(t) + · · · . (2b)

Functions mF0(t) and mr(t) are the mean curves, i.e., the curves obtained by com-puting the mean value of all input curves at each instant in (normalized) time in theF0 and in the r dimension, respectively. Functions PC1F0(t) and PC1r(t) are thefirst PC curves in the F0 and in the r dimension, respectively, and the same holds forthe second and further PCs. Finally, s1, s2, etc. are the so-called PC scores, whichare coefficients that differ for every input curve and that determine the amount ofcorrection to the mean (mF0(t),mr(t)) that each PC should contribute in order toapproximate a given curve (F0(t), r(t)). PCs are ordered by decreasing percentage ofexplained variance. Crucially, Equation (2a) and Equation (2b) share the same PCscores. This means that PCs act jointly on the mean (mF0(t),mr(t)), thus jointlycapturing variations in the shape of time-normalized contours and in their segmentaldurations (Gubian et al., 2011). All PCs are independent from one another, i.e.,the shape and duration variations described by PC1 are not correlated with thosedescribed by PC2, PC3, etc. This means that given an input curve and its s1 score,we cannot predict its s2 and further scores.

In order to mitigate the possibility that the variance of one of the two dimensionsdominates the other, a weight is applied to the curves based on the ratio betweenthe variance of the dimensions (here omitted to simplify notation). However, thedynamics of F0 curves remain rather different from that of log rate curves. This hasthe consequence of producing an unexpected imbalance of explained variance, e.g.,a modest variation in segmental length (coded as log rate) may dominate a local yet(linguistically) more relevant variation in F0 contour shape. Thus, it is advisable tolook at the first few PCs, since interesting variations are not necessarily those thatexplain more variance.

PC scores can be used as variables in ordinary statistical analyses. In this article,they were used as independent variables in linear mixed-effect models. Once aprediction is obtained in terms of PC scores, this has to be translated back intoits meaning in terms of F0 contours. Suppose we build a model based on s1 andwe obtain a predicted value s1 from it. By substituting s1 = s1 inEquation (2)

45

Page 48: “Excuse meeee!!”: (Mis)coordination of lexical and paralinguistic …lands.let.ru.nl/FDA/papers/Sumimasen_accepted2018.pdf · 2018. 5. 25. · \Excuse meeee!!": (Mis)coordination

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIP

T

we obtain two predicted curves F0(t) and r(t). While the former is of immediateinterpretation, the latter is not. If we are interested in the durations of segmentsdelimited by the landmarks, we can compute them by inverting Equation (1). Letdi be the normalized duration of the i-th segment, which spans the interval [ti−1, ti]on the normalized time axis t (i.e., di = ti − ti−1). The duration di in the originaltime corresponding to the warping induced by r(t) is given by:

di =

∫ ti

ti−1

exp(−r(t))dt. (3)

Results presented in this paper are based on Equation (3), which allows to representthe effect of PC scores in terms of plain segmental durations (cf. Figures 4, 6, 8, 10and 12 in the main text).

46