11
Intonation Trajectories within Tones in Unaccompanied Soprano, Alto, Tenor, Bass Quartet Singing Jiajie Dai 1, a) and Simon Dixon 1, b Centre for Digital Music, Queen Mary University of London (Dated: 9 September 2019) Unlike fixed-pitch instruments, the voice requires careful regulation during each note in order to maintain a steady pitch. Previous studies have investigated aspects of singing performance such as intonation accuracy and pitch drift, treating pitch as fixed within notes, while the pitch trajectory within notes has hardly been investigated. The aim of this paper is to study pitch variation within vocal notes and ascertain what factors influence the various parts of a note. We recorded five SATB quartets singing two pieces of music in three different listening conditions, according to whether the singers can hear the other participants or not. After analysing all of the individual notes and extracting pitch over time, we observed the following regularities: 1) There are transient parts of approximately 120 ms duration at both the beginning and end of a note, where the pitch varies rapidly; 2) The shapes of transient parts differ significantly according to the adjacent pitch, although all singers tend to have a descending transient at the end of a note; 3) The trajectory shapes of female singers different from those of male singers at the beginnings of notes; 4) Between vocal parts, there is a tendency to expand harmonic intervals (by about 8 cents between adjacent voices); 5) The listening condition had no significant effect on within-note pitch trajectories. c 2019 Acoustical Society of America. [http://dx.doi.org(DOI number)] [XYZ] Pages: 111 PACS numbers: 43.75.Rs, 43.75.Bc, 43.75.Xz, 43.75.St I. INTRODUCTION Singing is important because it is the most universal form of music-making (Brown, 1991), and it allows for personal and expressive communication. Unlike exter- nal instruments which are mastered by a small minority, almost everyone uses their voice on a daily basis, and can, to some extent, sing. Although singing is common to all human societies and we all have our own idea of what singing actually is (Potter, 2000b, p. 1), many as- pects of singing have not been explored in the research literature. For example, the shapes of, and factors that affect, vocal pitch trajectories within notes have yet to be explained. The motivation of this paper is to deter- mine whether pitch trajectories share common shapes, and what factors influence the transient parts of notes. Intonation, defined as the accuracy of pitch in play- ing or singing (Swannell, 1992), is regarded as an im- portant aspect of music performance (Sundberg et al., 2013). Such a definition assumes that a reference exists for pitch, and presumably that this reference is fixed at least for each note, enabling accuracy to be assessed, ei- ther continuously over the duration of a note, or once a) [email protected] b) [email protected] for the entirety of a note. For the latter case, previous studies calculated the mean or median of fundamental frequency (f o ) estimates on short audio frames (Howard, 2007; Mauch et al., 2014). For analysis of intonation within a note, the frame-level estimates describe the pitch trajectory as a time series. The complexity of the vocal apparatus makes it dif- ficult to sing accurately. Voice production requires the coordination of the lungs, vocal folds, larynx, pharynx and mouth (Sundberg, 1977). To produce a tone at a given pitch also requires muscle memory and tonal mem- ory (Alldahl, 2008). Most people, who do not have per- fect pitch (the ability to recognise the pitch of a note or produce any given note), rely on a recent reference for intonation (Takeuchi and Hulse, 1993). Therefore, the instrumental accompaniment or reference pitch is crucial for the tuning. Previous studies have explored vocal pitch trajec- tories for singing voice synthesis, especially for perfor- mance modelling (Umbert et al., 2015), and modelled the observed pitch in an imitation task, given a time- varying stimulus pitch (Dai and Dixon, 2016). This paper presents an exploratory study to find which factors have an effect on the pitch trajectory of vocal notes. There are many factors influencing overall intonation accuracy, such as score information (e.g. target pitch, duration, in- tervals between the target and simultaneous or recent J. Acoust. Soc. Am. / 9 September 2019 1

Intonation Trajectories within Tones in Unaccompanied

Embed Size (px)

Citation preview

Intonation Trajectories within Tones inUnaccompanied Soprano Alto Tenor Bass QuartetSinging

Jiajie Dai1 a) and Simon Dixon1 b

Centre for Digital Music Queen Mary University of London

(Dated 9 September 2019)

Unlike fixed-pitch instruments the voice requires careful regulation during each note in orderto maintain a steady pitch Previous studies have investigated aspects of singing performancesuch as intonation accuracy and pitch drift treating pitch as fixed within notes while thepitch trajectory within notes has hardly been investigated The aim of this paper is to studypitch variation within vocal notes and ascertain what factors influence the various partsof a note We recorded five SATB quartets singing two pieces of music in three differentlistening conditions according to whether the singers can hear the other participants or notAfter analysing all of the individual notes and extracting pitch over time we observed thefollowing regularities 1) There are transient parts of approximately 120 ms duration at boththe beginning and end of a note where the pitch varies rapidly 2) The shapes of transientparts differ significantly according to the adjacent pitch although all singers tend to have adescending transient at the end of a note 3) The trajectory shapes of female singers differentfrom those of male singers at the beginnings of notes 4) Between vocal parts there is atendency to expand harmonic intervals (by about 8 cents between adjacent voices) 5) Thelistening condition had no significant effect on within-note pitch trajectories

ccopy2019 Acoustical Society of America [httpdxdoiorg(DOI number)]

[XYZ] Pages 1ndash11

PACS numbers 4375Rs 4375Bc 4375Xz 4375St

I INTRODUCTION

Singing is important because it is the most universalform of music-making (Brown 1991) and it allows forpersonal and expressive communication Unlike exter-nal instruments which are mastered by a small minorityalmost everyone uses their voice on a daily basis andcan to some extent sing Although singing is commonto all human societies and we all have our own idea ofwhat singing actually is (Potter 2000b p 1) many as-pects of singing have not been explored in the researchliterature For example the shapes of and factors thataffect vocal pitch trajectories within notes have yet tobe explained The motivation of this paper is to deter-mine whether pitch trajectories share common shapesand what factors influence the transient parts of notes

Intonation defined as the accuracy of pitch in play-ing or singing (Swannell 1992) is regarded as an im-portant aspect of music performance (Sundberg et al2013) Such a definition assumes that a reference existsfor pitch and presumably that this reference is fixed atleast for each note enabling accuracy to be assessed ei-ther continuously over the duration of a note or once

a)jdaiqmulacukb)sedixonqmulacuk

for the entirety of a note For the latter case previousstudies calculated the mean or median of fundamentalfrequency (fo) estimates on short audio frames (Howard2007 Mauch et al 2014) For analysis of intonationwithin a note the frame-level estimates describe the pitchtrajectory as a time series

The complexity of the vocal apparatus makes it dif-ficult to sing accurately Voice production requires thecoordination of the lungs vocal folds larynx pharynxand mouth (Sundberg 1977) To produce a tone at agiven pitch also requires muscle memory and tonal mem-ory (Alldahl 2008) Most people who do not have per-fect pitch (the ability to recognise the pitch of a note orproduce any given note) rely on a recent reference forintonation (Takeuchi and Hulse 1993) Therefore theinstrumental accompaniment or reference pitch is crucialfor the tuning

Previous studies have explored vocal pitch trajec-tories for singing voice synthesis especially for perfor-mance modelling (Umbert et al 2015) and modelledthe observed pitch in an imitation task given a time-varying stimulus pitch (Dai and Dixon 2016) This paperpresents an exploratory study to find which factors havean effect on the pitch trajectory of vocal notes Thereare many factors influencing overall intonation accuracysuch as score information (eg target pitch duration in-tervals between the target and simultaneous or recent

J Acoust Soc Am 9 September 2019 1

pitches) individual differences (eg sex training back-ground) and the accompaniment

We created a public data set which involves 20 partic-ipants (five groups of four) singing two pieces of music inthree different listening conditions solo with one vocalpart missing and with all vocal parts Each participantsings their usual vocal part soprano alto tenor or bassSATB singing was chosen for this intonation study as it isa common configuration for singing ensembles in Westernmusic

The remainder of the paper is structured as followsSection II discusses existing work related to singing in-tonation and interaction Section III contains our re-search questions experimental design and methodologyIn Section IV we describe our data analysis includingannotation and calculation of intonation metrics Sec-tion V presents our results which are then discussed inSection VI Our conclusions are found in Section VIIfollowed by details of where the annotated data and soft-ware can be freely obtained in Section VIII

II PREVIOUS WORK

Intonation accuracy is one of the features used toevaluate a singerrsquos performance For calculating intona-tion accuracy from an audio recording pitch and funda-mental frequency (fo) are generally treated as exchange-able (see Section IV A) In the 1930s Seashore measuredfundamental frequency in recordings of renowned singersand revealed considerable departures from equally tem-pered tuning (Seashore 1914 Sundberg et al 2013)Since that time many studies on singing and intonationfocus on accuracy especially measuring the pitch errorwhich is the difference between the observed pitch anda predetermined target pitch Some studies investigatethe pitch drift of singing ensembles (eg Devaney andEllis 2008 Howard 2003 Kalin 2005 Terasawa 2004)or solo singers (Mauch et al 2014) Other studies in-vestigate factors that influence the pitch error (eg Pfor-dresher et al 2010 Welch et al 1997) In a previousstudy (Dai and Dixon 2017) we observed that pitch er-ror and melodic interval error increase when singers canhear each other and in particular that singing withoutthe bass part had less mean absolute pitch error thansinging with all vocal parts In addition we found thatpitch variation within notes was lower when participantssang solo than with their partners

Besides pitch error other studies have investigatedinterval error the extent to which pitch differences be-tween subsequent (melodic interval error) or simultane-ous (harmonic interval error) tones deviate from theirtarget values Some melodic intervals were reported asbeing harder to sing than other intervals such as tri-tones (Dai et al 2015) and perfect fifths (Vurma andRoss 2006) There is a phenomenon called compres-sion whereby sung melodic intervals tend to be smallerthan the target intervals (Pfordresher and Brown 2007)Harmonic intervals constitute another important factorwhich influences intonation Hagerman and Sundberg

(1980) studied the harmonic intervals sung by two bar-bershop quartets and found that intervals did not re-flect just or Pythagorean tuning as expected althoughthe singing was precise (low standard deviations) Theysuggested that deviations from pure intervals (ie wherethe frequencies of notes are related by ratios of smallwhole numbers (Lindley 2001)) could be due to aperiod-icity in the voice which broadens the spectral peaks andrenders beats inaudible They also observed a generalstretching of intervals in performance which they de-scribe as sounding ldquomore active and expressive than flatintervalsrdquo Nordmark and Ternstrom (1996) investigatedthe preferred tuning of major third intervals finding thatparticipants tuned intervals closer to equal temperamentthan pure intonation Howard (2007) observed the useof non-equal-tempered tuning in unaccompanied singingalthough his data did not fully confirm his predictionsbased on the use of pure intervals In both cases singersproduced intervals between the pure and equal-temperedversions of the intervals while Devaney et al (2012) ob-served that some intervals were close to either just orPythagorean tunings but most were within a standarddeviation of equal-temperament

For an individual singer singing is a complicated taskinvolving both perception and production The voice or-gan can be viewed as an instrument consisting of a powersupply (the lungs) an oscillator (the vocal folds) anda resonator (the larynx pharynx and mouth) (Sund-berg 1977) Factors related to production such as mus-cle strength and control can be improved by training andpractice while the perceptual factors involve many cogni-tive components with distinct brain substrates (Stewartet al 2006) External influences such as reference pitchesprovided by accompaniment also affect pitch accuracy

Interaction is an important factor for ensemblesinging which is a cooperative activity involving com-munication within the ensemble and with the audience(Potter 2000a p 158) Few people can produce a cor-rect pitch directly without the use of an external ref-erence pitch (Takeuchi and Hulse 1993) such as thatprovided by instrumental accompaniment Although ac-companiment has been shown to enhance the individuallearning of a piece (Brandler and Peynircioglu 2015) itcan also reduce pitch accuracy during singing even whenthe accompaniment is in unison with the singer (Dai andDixon 2016 2017 Pfordresher and Brown 2007) Mostsingers adjust their intonation using auditory feedbackto reach the intended note (Zarate and Zatorre 2008)and accompaniment might distract singers from hearingtheir own feedback

Much evidence shows that singers are influenced byother choral members in terms of pitch accuracy (egHoward 2003 Terasawa 2004) and various approacheshave been proposed to keep singers in tune by focusingon relative pitches tone memories and muscle memo-ries (eg Alldahl 2008 Bohrer 2002) Dai and Dixon(2017) observed that pitch error and melodic interval er-ror increase when the participants can hear other singersbut harmonic interval error is reduced when all singers

2 J Acoust Soc Am 9 September 2019

hear each other In unaccompanied multi-part singingHoward (2007) demonstrated how singing pure intervalscan cause drift and he found that singers do in facttend to non-equal-tempered tuning and drift in pitchwith modulation With different musical material De-vaney et al (2012) observed that only some intervals weresignificantly different from equal temperament in theirstudy the singers did not exhibit a large amount of drift

Individual differences such as age and sex also influ-ence pitch accuracy (Welch et al 1997) Likewise musi-cal training and experience have some influence Mauchet al (2014) found that self-rated singing ability andchoir experience but not general musical backgroundcorrelated significantly with intonation accuracy Singerswho exhibit much greater than average pitch errors areclassified as poor singers a phenomenon that has beenthe focus of several studies (Berkowska and Dalla Bella2009 Dalla Bella et al 2007 Pfordresher and Brown2007 Pfordresher et al 2010)

Observation of the pitch trajectory within individ-ual notes reveals transient parts at the beginning andend of each note At the beginning of a tone a pitchglide is often observed as the singer adjusts the vocalcords from their previous state (the previous pitch or arelaxed state) Then the pitch is adjusted as the singeruses perceptual feedback to correct for any discrepancybetween the auditory feedback and the intended note(Zarate and Zatorre 2008) Possibly at the same timevibrato may be applied which is an oscillation aroundthe central pitch which is close to sinusoidal for skilledsingers but asymmetric for unskilled singers (Gerhard2005 Seashore 1931 Sundberg 1995) Finally they maynot sustain the pitch at the end of the tone and thepitch often moves in the direction of the following noteor downward (toward a relaxed vocal cord state) if thereis no immediately following note (Xu and Sun 2000)Pitch variation within a note has been modelled for vo-cal synthesis as well as note level features (onset andoffset) intra- and internote features (changes within andbetween notes) and the relationship to timbre variations(Umbert et al 2015)

Vibrato is used to add expression to vocal and in-strumental music In singing it can occur spontaneouslythrough variations in the larynx Professional (particu-larly opera) singers tend to produce vibrato a periodicmodulation of fo which is not normally used in speech(Sundberg 1987) The frequency of the vibrato is usu-ally in the range 5ndash8 Hz according to the vibrato type(Fischer 1993) Although all human voices can producevibrato it has been shown that with training singers areable to elicit control over both vibrato rate and depth(Dromey et al 2003 King and Horii 1993)

Several software systems for pitch analysis have beendeveloped which support scientific measurement such asPraat (Boersma 2002) Sound Visualiser (Cannam et al2006) and Tony (Mauch et al 2015) de Cheveigne andKawahara (2002) introduced a pitch extraction methodYIN which has been applied extensively This algorithmimproves upon the autocorrelation method by means of

a difference function plus several modifications that im-prove system performance PYIN (Mauch and Dixon2014) is a probabilistic extension of YIN which enhancesrobustness against errors and is employed in the Tonysoftware used in this paper

III METHODOLOGY

In this section we describe our exploratory researchquestions the experimental design musical materialparticipants and experimental procedure Links to thedata and score information can be found in Section VIII

A Listening condition

For our experiment three listening conditions weredefined based on what the singer can hear as they sing Inthe closed condition the singer can only hear their ownvoice and metronome thus they are effectively singingsolo In the partial condition the singer can hear somebut not all of the other vocal parts This is achieved byphysically isolating one singer from the other three andallowing acoustic feedback (via microphones and loud-speakers) in one direction only either from the isolatedsinger to the other three singers (one-to-three condition)or from the three singers to the isolated one (three-to-onecondition) Finally in the open condition all singers canhear each other

For testing the partial condition there are four pairsof test conditions corresponding to the vocal part that isisolated and the direction of feedback For example onetest condition is called the soprano isolated one-to-threecondition where the soprano sings in a closed conditionbut all other parts hear each other (the sopranorsquos voicebeing provided to the others via a loudspeaker) In sucha case the isolated singer is called the independent singeras they are not able to react to the other vocal parts tochoose their tuning In other cases the singer can hear all(open condition) or some (partial condition) of the othervoices and thus is called a dependent singer Figure 1visualises the listening and test conditions

B Research questions

This study of interactive intonation in unaccompa-nied SATB singing is driven by a number of researchquestions Firstly we wish to know whether there arepatterns or regularities in the pitch trajectories of indi-vidual notes We expect to find common trends in thenote trajectories with differences due to context and ex-perimental conditions The second question is how tocharacterise the trajectories in terms of the time requiredfor the singer to reach the target pitch The third ques-tions is which factors influence the tendencies of the tran-sient part The note trajectories might show significantdifferences due to context such as when singing after ahigher pitch or a lower pitch We also wish to determinewhether pitch trajectories differ by vocal part or sex Wepreviously observed significant differences between vocal

J Acoust Soc Am 9 September 2019 3

Listening conditions

Closed condition

Partial condition

Open condition

singer

singer

singer

singer

X

(2 trials)

(42 trials)

Soprano isolatedcondition

A STBorAlto isolatedcondition

Tenor isolatedcondition

Bass isolatedcondition

S ATBor

T SABor

B SATor

One‐to‐three condition

Three‐to‐one condition

Independent singer

Dependent singer(Test conditions)

FIG 1 Listening and test conditions The arrows indicate

the direction of acoustic feedback

parts in terms of pitch error (Dai 2019 Dai and Dixon2017 2019b) Finally we would like to see whether thelistening condition affects note trajectories That is dothe shapes of vocal notes differ depending on whether theparticipants can hear other vocal parts or not

C Participants

20 adult amateur singers (10 male and 10 female)with choir experience volunteered to take part in thestudy They came from the music society and a capellasociety of the university and a local choir (There wasalso a pilot experiment involving four participants fromour research group this data is not used in this paper)The age range was from 20 to 55 years old (mean 280median 265 stddev 78) Participants were compen-sated pound10 for their participation The participants wereable to sing their parts comfortably and they were giventhe score and sample audio files at least 2 weeks beforethe experiment

Since training is a crucial factor for intonation ac-curacy all the participants were given a questionnairebased on the Goldsmiths Musical Sophistication Index(Mullensiefen et al 2014) to test the effect of trainingThe participants had an average of 33 years of musiclessons and 58 years of singing experience

DMaterials

Two contrasting musical pieces were selected for thisstudy a Bach chorale ldquoOh Thou of God the Fatherrdquo(BWV 1646) and Leo Mathisenrsquos jazz song ldquoTo be or notto berdquo Both pieces were chosen for their wide range ofharmonic intervals (see Section IV B) the first piece has34 unique harmonic intervals between parts and the sec-ond piece has 30 harmonic intervals To control the du-

ration of the experiment we shortened the original scoreby deleting the repeats We also reduced the tempo fromthat specified in the score in order to make the pieceseasier to sing and compensate for the limited time thatthe singers had to learn the pieces The resulting dura-tion of the first piece is 76 seconds and the second songis 100 seconds Links to the score and training materialscan be found in section VIII

The equipment included an SSL MADI-AX con-verter five cardioid microphones (Shure SM57) and fourloudspeakers (Yamaha HS5) All the tracks were con-trolled and recorded by the software Logic Pro 10 Themetronome and the four starting reference pitches werealso given by Logic Pro The total latency of the systemis 49 ms (33 ms due to hardware and 16 ms from thesoftware)

E Procedure

A pilot experiment with singers not involved in thestudy was performed to test the experimental setup andminimise potential problems such as bleed between mi-crophones Then the participants in the study were dis-tributed into 5 groups according to their self-identifiedvoice type time availability and collaborative experience(the singers from the same music society were placedin the same group) Each group contained two femalesingers (soprano and alto) and two male singers (tenorand bass) Each participant had at least two hours prac-tice before the recording sometimes on separate daysThey were informed about the goal of the study to inves-tigate interactive intonation in SATB singing and theywere asked to sing their best in all circumstances

For each trial the singers were played their startingnotes before commencing the trial and a metronome ac-companied the singing to ensure that the same tempowas used by all groups Each piece was sung 10 timesby each group The first and the last trial were recordedin the open condition The partial and closed conditiontrials consisting of 8 test conditions 4 (isolated voice) times2 (direction of feedback) were recorded in between Theorder of isolated conditions was randomly chosen to con-trol for any learning effect For each isolated conditionthe three-to-one condition always preceded the one-to-three condition We use the performance of the isolatedsinger in the one-to-three condition as the data for theclosed condition

The singers were recorded in two acoustically iso-lated rooms For the partial and closed conditions theisolated singers were recorded in a separate room fromthe other three singers Loudspeakers in each room pro-vided acoustic feedback according to the test conditionThere was no visual contact between singers in differentrooms With the exception of warm-up and rehearsalbut including all the trials and the questionnaire the to-tal duration of the experiment for each group was aboutone hour and a half

4 J Acoust Soc Am 9 September 2019

IV DATA ANALYSIS

This section describes the annotation procedure andthe measurement of pitch error and harmonic intervalerror The experimental data comprises 5 (groups) times 4(singers) times 2 (pieces) times 10 (trials) = 400 audio fileseach containing 65 to 116 notes Any missing noteswere excluded from the analysis The software Tony(Mauch et al 2015) was chosen as the annotation toolTony performs pitch detection using the pYIN algo-rithm which outperforms the YIN algorithm (Mauchand Dixon 2014) and then automatically segments pitchtrajectories into note objects and provides a convenientinterface for manual checking and correction of the re-sulting annotations The automatic segmentation basedon note energy and pitch changes provided the note on-set and offset times for our data and rarely needed anycorrection

For each audio file we exported two csv filesone containing the note-level information (for calculat-ing pitch and interval errors) and the other containingthe pitch trajectories It took about 67 hours to manuallycheck and correct the 400 files resulting in 37246 anno-tated pitch values which were stored with metadata onthe singer experimental condition and score The infor-mation in our database includes group number singernumber vocal part listening condition piece numbernote in trial score onset position score duration scorepitch score interval observed onset time observed dura-tion observed pitch pitch error melodic interval errorharmonic interval error anonymised participant detailsnormalised note trajectories real-time note trajectoriesage sex and questionnaire scores MATLAB 2015a wasused for statistics and modelling

A Conversion of fo

The Tony software segments the recording into notesand silences and outputs the median fundamental fre-quency fo for each note as well as the fo value for each58 ms frame The conversion of fundamental frequencyto musical pitch p is calculated as follows

p = 69 + 12 log2

fo

440 (1)

This scale is chosen such that its units are semitones(one semitone is equal to 100 cents) with integer valuesof p coinciding with MIDI pitch numbers and referencepitch A4 (p = 69) tuned to 440 Hz After automaticannotation every single note was checked manually tomake sure the tracking was consistent with the data andcorrected if it was not

B Intonation Metrics

Intonation accuracy is quantified in terms of pitcherror and harmonic interval error as defined below As-suming that a reference pitch has been given pitch errorcan be defined as the difference between observed pitchand score pitch (Mauch et al 2014) This is usually de-

fined on the level of notes but can also be measured foreach sampling point of the pitch trajectory

epi = poi minus ps

i (2)

where poi is the observed pitch in a single frame of note

i (or the median pi over the duration of the note) andpsi is the score pitch of note i

To evaluate the pitch accuracy of a sung part we usemean absolute pitch error (MAPE) as the measurementFor a group of M notes with pitch errors ep1 epM theMAPE is defined as

MAPE =1

M

Msumi=1

|epi | (3)

A musical interval is the difference between twopitches (Prout 2011) which is proportional to the log-arithm of the ratio of the fundamental frequencies ofthe two pitches We distinguish two types of intervalmelodic intervals where the two notes are sounded in suc-cession and harmonic intervals where both notes soundsimultaneously (although they might not start simulta-neously) In this paper we consider only the harmonicinterval error defined as the difference between the ob-served and score intervals

ehiAjB = (piA minus pjB) minus (psiA minus ps

jB) (4)

where psiA and ps

jB are the score pitches of two overlap-ping notes from singers A and B respectively and piA

and pjB are their observed median pitches Harmonicintervals were evaluated for all pairs of notes which over-lap in time If one singer sings two notes while the secondsinger holds one note in the same time period two har-monic intervals are observed Thus indices i and j arenot assumed to be equal

V RESULTS

This section presents observed patterns in the shapesof note trajectories and investigates differences due tovocal part sex adjacent pitch and listening conditionsmodelling the trajectories according to the shape of tran-sient parts and classifying them into four categories

Based on the metronome tempo the expected du-ration of notes ranges from 025 to 550 seconds (mean086 median 075) while the observed note duration isfrom 001 seconds to 510 seconds (mean 069 median062) We excluded from the results any notes which hada duration shorter than 015 seconds (41) or MAPElarger than one semitone (120)

A The shape of note trajectories

To observe regularities in note trajectories acrossdiffering note durations we compared two methods ofequalising the time-scale of trajectories normalisationand truncation Normalised pitch trajectories are ex-pressed as a function of the fraction of the note that

J Acoust Soc Am 9 September 2019 5

has elapsed (from 0 to 1) while for the truncated trajec-tories the beginning and end of the note are modelledseparately using respectively the first and last 04 sec-onds of the note (77 of notes are over 04 seconds and55 over 055 seconds in duration) For comparing tra-jectories of different score pitches we use the pitch errorthat is the deviation from the target (score) pitch

For the normalisation method the note trajectorieswere re-sampled to 100 sampling points with the MAT-LAB resample function Then any common shape ofvocal notes can be obtained by averaging across notesFigure 2 plots the resulting note trajectory generated bycalculating the mean of all the sampling points For com-parison we also show the absolute pitch error which ismuch larger in magnitude

In Figure 2 we observe transient parts at the begin-ning and end of the note Based on the slope of theMPE curve the initial and final transients each compriseabout 15-20 of the notersquos duration In the following wetake the first 15 and the final 15 of each note as thetransient parts The length of the two transient parts isapproximately the same and the shape is almost sym-metrical consisting of peaks at both ends of the notewith a relatively stable middle portion The mean pitcherror is negative reflecting a tendency to sing flat relativeto the score pitch

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

Pitc

h E

rror

(se

mito

ne)

-02

-01

0

01

02

03

04

05

MPEMAPE

FIG 2 Average pitch trajectory within a tone expressed as

mean pitch error (MPE) across time-normalised notes with

mean absolute pitch error (MAPE) shown for comparison

An alternative way to combine note trajectories ofvarying length is to truncate the time series and onlyconsider the initial and final segments of each note Tak-ing the first (respectively last) 04 seconds of each noteexcluding notes with a duration less than 055 seconds toavoid artefacts due to the transient at the other end of thenote results in the trajectories shown in Figures 3 and4 From these figures we observe that the first 012 andlast 012 seconds of each note have the most pitch vari-ance This corresponds to about 15-20 of the mean noteduration (069 seconds) This result is similar to thatfor normalised trajectories (Figure 2) where the initial

sharp fall and final rise in pitch are not as sharp due tothe normalisation of different length notes The averageresults hide differences in the proportion and direction oftransients which arise due to individual differences scorepitch and vocal part which will be investigated in thefollowing sections

Time (seconds)0 005 01 015 02 025 03 035 04

MP

E (

sem

itone

)

-02

-015

-01

-005

0

005

01

015

FIG 3 Mean pitch error (crosses) and range of one standard

deviation from the mean (shaded) for the initial 04 seconds

of each note

Time (seconds)0 005 01 015 02 025 03 035 04

MP

E (

sem

itone

)

-03

-025

-02

-015

-01

-005

0

005

01

015

FIG 4 Mean pitch error (crosses) and range of one standard

deviation from the mean (shaded) for the final 04 seconds of

each note

The appearance of note trajectories is significantlydifferent between singers who have different degrees ofmusical training For the trained singers the note tra-jectories are smoother and the two transient parts havea clear direction For singers with less training theirnote trajectories tend to be uneven and have less com-mon shape in their beginnings and endings

In Figure 3 the first turning point at 002 secondsmay be an artefact of the averaging of different pitch tra-jectory shapes There are several possible factors thatmight influence trajectory shapes such as the pitch ofthe surrounding notes vocal part sex and listening con-dition which we now examine

B Adjacent pitch

In the previous section we observed large pitch fluc-tuations at each end of the note To test whether these

6 J Acoust Soc Am 9 September 2019

fluctuations are influenced by adjacent pitches in thescore we separate the data for each end of a note intotwo situations based on whether the previous (respec-tively next) pitch is lower or higher than the currentpitch Repeated pitches are ignored An analysis of vari-ance (ANOVA) confirms that the pitch error in relativetime is significantly different based on whether the ad-jacent pitch is higher or lower In Figure 5 we observethat singers tend to overshoot the target pitch and thenadjust downward after singing a lower pitch while aftera high pitch they reach the target almost immediatelyJers and Ternstrom (2005 Fig 3-4) observed that singersalso overshoot the interval (undershoot the pitch) beforecorrecting when they transition from a higher pitch toa lower pitch The steady state pitch is 1 cent sharperwhen coming from a lower pitch than when the previ-ous pitch is higher (F(1 38) = 7797p lt 0001) Singersalso prepare for the pitch of the next note at the end ofeach note as evidenced by the significant difference ob-served between ascending and descending following inter-vals (F(1 38) = 798p lt 001 Figure 6) In both casesthere is an increase in pitch followed by a rapid decreaseas the note ends and the vocal cords are relaxed butthe increase in pitch is much more marked in the casethat the succeeding pitch is higher There are some in-dividual differences between singers in this respect butmost exhibit the average behaviour of being influencedby adjacent notes

Relative time (proportion of note elapsed)0 005 01 015 02 025 03 035 04 045 05

MP

E (

sem

itone

)

-01

-008

-006

-004

-002

0

002

004

006

008

01After a high pitchAfter a low pitch

FIG 5 The effect of singing after a lower or higher pitch

mean pitch error in relative time

C Vocal parts and sex

To explore the factor of vocal part the normalisednote trajectories were plotted for each of the four vocalparts (Figure 7) Firstly we observe about an 8-centpitch difference between each pair of adjacent parts inour data Although the pitch trajectories vary accord-ing to the participants for most participants sopranostend to sing sharp while tenors and basses tend to singflat These pitch differences lead to an expansion of har-monic intervals between vocal parts the opposite of thecompression that is often observed for melodic intervals(Pfordresher and Brown 2007)

Relative time (proportion of note elapsed)05 055 06 065 07 075 08 085 09 095 1

MP

E (

sem

itone

)

-01

-008

-006

-004

-002

0

002

004

006

008

01A high pitch nextA low pitch next

FIG 6 The effect of singing before a lower or higher pitch

mean pitch error in relative time

This phenomenon is also observed between sexes AnANOVA shows a significant difference between the notebeginnings of male singers and female singers (F(3 76) =5937p lt 001) In general male participants sing 11cents flatter while females sing 4 cents sharper than thescore pitch Male singers tend to begin the note at ahigher pitch and adjust downwards while female singersrsquoinitial trajectories have a convex shape beginning at alower pitch overshooting the target then decreasing to-ward the target All the singers tend to have similarnote ending a slight increase in pitch followed by a rapiddecrease

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

MP

E (

sem

itone

)

-03

-025

-02

-015

-01

-005

0

005

01

015

02

SopranoAltoTenorBass

FIG 7 Mean pitch error over note duration for each vocal

part

DModelling the note trajectories

For a better understanding of the tendencies of pitchtrajectories we modelled them as three separate compo-nents initial transient note middle and final transientAs discussed previously the transient parts were definedby the first 15 and last 15 of the duration The ten-dency of each component was approximated by linearregression Figure 8 shows an example of a single pitch

J Acoust Soc Am 9 September 2019 7

trajectory and the linear fits for each of the three com-ponents

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

MP

E (s

emito

ne)

-025

-02

-015

-01

-005

0

005

01

015

02

025

Observed dataFitting line

FIG 8 Example of the pitch trajectory of a single note and

the fitting lines for the initial middle and final components

of the note

Convex0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Upward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Downward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Concave0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

FIG 9 Mean pitch trajectories of the four trajectory classes

in relative time (proportion of note elapsed)

To describe the different types of trajectories weclassify them into four categories (Concave Convex Up-ward Downward) according to the slopes of their initialand final transients which are either positive or nega-tive Table I shows that the most popular shapes areConvex and Downward both of which have a negativenote release

Table II shows the mean median and standard devi-ation of the slopes of the three note parts Although theaverage trend for the initial transient is a negative slopeless than half of the notes exhibit this behaviour andthere is a large variance in the slope of the initial tran-sient The middle segment has a small positive trendwhile for the final transient most notes have a negativeslope although again this has a large variance Although

the initial and final slopes have high variance Figures 3and 4 show that their starting and ending points are lessspread (smaller standard deviation) than the rest of thetrajectory

E Listening condition

Finally the influence of listening condition on notetrajectory classification was considered The influence oflistening condition on mean pitch is discussed in previouswork (Dai and Dixon 2017 2019a) In this paper theANOVA test on the note trajectories did not show anysignificant difference between listening conditions (solopartial independent partial dependent and open) for ini-tial transient (F(3 49194) = 007p = 079 ) middle sec-tion (F(3 49194) = 162p = 020) and final transient(F(3 49194) = 005p = 083)

VI DISCUSSION

In this study we observed a general stretching of har-monic intervals between vocal parts so that the bass partsang flat and the soprano part sharp relative to the othervocal parts Unlike the piano where stretching of inter-vals is related to the inharmonicity of the partials sungtones are not inharmonic so we are unable to explainthis observation Hagerman and Sundberg (1980) alsoobserved stretched harmonic intervals in an experimentwith barbershop singers giving the explanation that theysound ldquomore active and expressiverdquo

If we compare to the given starting notes the over-all tendency was to sing flat a tendency which increasedover time Pitch drift has been observed in other experi-ments (Devaney and Ellis 2008 Howard 2003 Terasawa2004) and is typically downward in direction althoughupward drift has also been observed

While the averaged note trajectories particularlywhen sorted into categories (Figure 9) show quitesmooth curves the individual pitch trajectories exhibitmuch greater degrees of variation (eg Figure 8 which isnot an extreme example) There is a danger that the fea-tures observed in the average curves might be artefactsof the averaging process and may not occur often if atall in the individual instances For example in Figures2 and 3 we observe a concave shape (a small local min-imum) in the first 5 (respectively 004 seconds) of thenote trajectory If we compare with Figure 7 where thetwo female vocal parts have different initial trajectoriesto the two male vocal parts it is likely that the localminimum arises from averaging the categorically differ-ent shapes of the male and female parts The reason thatthe end of the note trajectory does not exhibit a similarpattern may be due to the greater frequency of Convexand Downward shapes (289 and 368 respectively)which both have a negative final slope across the vocalparts (Table I)

The differences observed in the averaged curves aresmall in magnitude of the same order as the just notice-able difference in pitch (about 5 cents Loeffler (2006))

8 J Acoust Soc Am 9 September 2019

Shape Attack Release Soprano Alto Tenor Bass Overall

Convex positive negative 347 378 229 211 289

Upward positive positive 171 133 173 161 160

Downward negative negative 329 379 424 339 368

Concave negative positive 153 109 174 289 184

TABLE I Definition of the four trajectory shapes according to the sign of the slope in the attack and release and their relative

frequencies in each vocal part and in total

Initial Middle Final

Mean -0649 0077 -2167

Median 0003 0038 -1766

Stddev 7109 0725 6400

TABLE II The mean median and standard deviation of the

slope (semitones per second) of the initial transient middle

section and final transient

Many of the sung examples have larger differences whichare reduced by the averaging process but are likely to beperceptible in the original examples A listening test us-ing synthetic stimuli would be required to identify theperceptual relevance of the features of pitch trajectoriesidentified in this paper

The general tendency of notes ending with a negativeslope is observed regardless of whether the next pitch ishigher or lower or which vocal part is considered Al-though there is a simple explanation ie the relaxationof the vocal muscles at the end of a note it is notewor-thy that singers show evidence of preparing for a higherfollowing pitch by commencing a rising inflection whichis then followed by a falling pitch at the end of the notewhich might be thought to negate the preparation Evenin the cases of the Upward and Concave trajectories theoverall increasing slope toward the end of a note finisheswith a few sampling points where the pitch decreases(during the final 3 of the note Figure 9)

A skilled singer is able to coordinate their muscles toachieve synchronised control over multiple vocal param-eters Alongside the pitch changes at the ends of eachnote there are also variations in amplitude associatedwith the start or end of the note which might make someparts of the transient imperceptible (alternatively someaudible parts may be omitted from analysis due to theirlow amplitude) The note segmentation (determinationof note onset and offset times) is based on the defaultsettings of the software Tony which segments the pitchtrack into notes according to changes in pitch and energy(Mauch et al 2015) Different settings and segmentationstrategies may influence the results The coarse segmen-tation was checked during annotation A random sam-ple was checked more closely after results were obtained

This revealed a small fraction of ambiguous cases wherethe final slope is dominated by vibrato and thus couldbe classified as positive or negative depending on theprecise offset time Compared to the thousands of noteswhich have a negative slope at the end the few ambigu-ous cases would not change our results significantly ifthey were to be segmented differently

Although vibrato is a feature of many singing pitchtrajectories we did not explicitly model it in this work(cf Dai and Dixon 2016 Mehrabi et al 2017) Theuse of vibrato is less marked in unaccompanied ensemblesinging where the voice does not need to be projected overinstrumental parts and the stylistic goal is for the voicesto blend rather than stand out For example choral stylefavours minimal vibrato and barbershop style generallyforbids vibrato Thus we did not observe strong vibratoin our data and in the cases where vibrato was presentit tended to be uneven which would make it difficult tomodel

VII CONCLUSIONS

In this paper we present a study of pitch trajectoriesof single notes in multi-part singing According to ouranalysis of over 35000 individual notes we find a generalshape of vocal notes which contains transient componentsat the beginning and end of each note

The analysis is based on both absolute and relativetiming of notes where the initial and final transients areabout 120 ms or 15-20 of note duration The resultssuggest that the adjustment of pitch at the ends of notesis governed by absolute timing ie due to physiologi-cal and psychological factors rather than relative timingwhich might imply a musical motivation The transientcomponents vary according to the individual performerprevious pitch next pitch vocal part and sex

Participants tend to overshoot the target pitch whentransitioning from a lower pitch and raise the pitch to-ward the end of the note if the next pitch is higher Wealso observe a general expansion of harmonic intervalsabout 8 cents pitch difference is observed between adja-cent vocal parts with sopranos singing sharper and malesingers flatter than the target pitch Female and malesingers also differ in their initial transients with femalescommencing with a upward glide that overshoots the tar-get followed by a correction while males begin notes

J Acoust Soc Am 9 September 2019 9

with a downward glide Participants with fine pitch ac-curacy tend to have smoother pitch trajectories whileless accurate singers have relatively unstable note trajec-tories

In conclusion the main contribution of this paper isthe observation measurement and analysis of the notetransient parts by characterising their shapes and influ-encing factors Although many further issues remain tobe investigated we hope that the current observationsprovide a better understanding of the singing voice

VIII DATA AVAILABILITY

The code and the data needed to repro-duce our results (note annotations questionnaireresults score information) are available fromhttpscodesoundsoftwareacukprojectsanalysis-of-interactive-intonation-in-unaccompanied-satb-ensemblesrepository

ACKNOWLEDGMENTS

The study was conducted with the approval of theQueen Mary Research Ethics Committee (approval num-ber QMREC1560)

Many thanks to all of the participants who con-tributed to this project We also thank Marcus PearceDaniel Stowell and Christophe Rhodes for their advice onexperimental design Jiajie Dai is supported by a ChinaScholarship Council and Queen Mary Joint PhD Schol-arship

Alldahl P-G (2008) Choral Intonation (Gehrmans StockholmSweden)

Berkowska M and Dalla Bella S (2009) ldquoAcquired and con-genital disorders of sung performance A reviewrdquo Adv CognPsychol 5(-1) 69ndash83 doi 102478v10053-008-0068-2

Boersma P (2002) ldquoPraat a system for doing phonetics by com-puterrdquo Glot Int 5(910) 341ndash345

Bohrer J C S (2002) ldquoIntonational strategies in ensemblesingingrdquo PhD thesis City University London

Brandler B J and Peynircioglu Z F (2015) ldquoA comparisonof the efficacy of individual and collaborative music learning inensemble rehearsalsrdquo J Res Music Educ 63(3) 281ndash297

Brown D (1991) Human Universals (Temple University PressPhiladelphia) pp 1ndash160

Cannam C Landone C Sandler M B and Bello J P (2006)ldquoThe Sonic Visualiser A visualisation platform for semanticdescriptors from musical signalsrdquo in 7th Int Conf Music InfRetr pp 324ndash327

Dai J (2019) ldquoModelling intonation and interaction in vocal en-semblesrdquo PhD thesis Queen Mary University of London pp104ndash115

Dai J and Dixon S (2016) ldquoAnalysis of vocal imitations of pitchtrajectoriesrdquo in 17th Int Soc Music Inf Retr Conf pp 87ndash93

Dai J and Dixon S (2017) ldquoAnalysis of interactive intonationin unaccompanied SATB ensemblesrdquo in 18th Int Soc Music InfRetr Conf pp 599ndash605

Dai J and Dixon S (2019a) ldquoSinging together Pitch accuracyand interaction in unaccompanied unison and duet singingrdquo JAcoust Soc Am 145(2) 663ndash675

Dai J and Dixon S (2019b) ldquoUnderstanding intonation tra-jectories and patterns of vocal notesrdquo in Int Conf Multimedia

Modell Springer pp 243ndash253Dai J Mauch M and Dixon S (2015) ldquoAnalysis of intonation

trajectories in solo singingrdquo in 16th Int Soc Music Inf RetrConf pp 420ndash426

Dalla Bella S Giguere J-F and Peretz I (2007) ldquoSinging pro-ficiency in the general populationrdquo J Acoust Soc Am 121(2)1182ndash1189 doi 10112112427111

de Cheveigne A and Kawahara H (2002) ldquoYIN a fundamentalfrequency estimator for speech and musicrdquo J Acoust Soc Am111(4) 1917ndash1930 doi 10112111458024

Devaney J and Ellis D P (2008) ldquoAn empirical approachto studying intonation tendencies in polyphonic vocal perfor-mancesrdquo J Interdiscipl Music Stud 2(1amp2) 141ndash156

Devaney J Mandel M I and Fujinaga I (2012) ldquoA studyof intonation in three-part singing using the automatic musicperformance analysis and comparison toolkit (AMPACT)rdquo in13th Int Soc Music Inf Retr Conf pp 511ndash516

Dromey C Carter N and Hopkin A (2003) ldquoVibrato rateadjustmentrdquo J Voice 17(2) 168ndash178

Fischer P-M (1993) Die Stimme des Sangers Analyse ihrerFunktion und Leistung-Geschichte und Methodik der Stimmbil-dung (The Voice of the Singer Analysis of Function and Perfor-mance - History and Methodology of Voice Training) (MetzlerStuttgart Germany)

Gerhard D (2005) ldquoPitch track target deviation in naturalsingingrdquo in 6th Int Conf Music Inf Retr pp 514ndash519

Hagerman B and Sundberg J (1980) ldquoFundamental frequencyadjustment in barbershop singingrdquo J Res Singing 4 1ndash17

Howard D M (2003) ldquoA capella SATB quartet in-tune singingEvidence of intonation shiftrdquo in Stockholm Music Acoust ConfVol 2 pp 462ndash466

Howard D M (2007) ldquoIntonation drift in A Capella sopranoalto tenor bass quartet singing with key modulationrdquo J Voice21(3) 300ndash315

Jers H and Ternstrom S (2005) ldquoIntonation analysis of a multi-channel choir recordingrdquo Speech Music Hearing Q Prog StatusRep 47(1) 1ndash6

Kalin G (2005) ldquoFormant frequency adjustment in barbershopquartet singingrdquo Masterrsquos thesis Royal Institute of TechnologyDept of Speech Music and Hearing Stockholm

King J B and Horii Y (1993) ldquoVocal matching of frequencymodulation in synthesized vowelsrdquo J Voice 7(2) 151ndash159

Lindley M (2001) ldquoJust intonationrdquo Grove Music Online editedby L Macy httpwwwgrovemusiccom (accessed 30 January2015)

Loeffler D B (2006) ldquoInstrument timbres and pitch estimation inpolyphonic musicrdquo PhD thesis Georgia Institute of Technology

Mauch M Cannam C Bittner R Fazekas G Salamon JDai J Bello J and Dixon S (2015) ldquoComputer-aided melodynote transcription using the Tony software Accuracy and effi-ciencyrdquo in Proc 1st Int Conf Technol Music Not Representpp 23ndash30

Mauch M and Dixon S (2014) ldquoPYIN A fundamental fre-quency estimator using probabilistic threshold distributionsrdquo inIEEE Int Conf Acoust Speech Signal Process pp 659ndash663

Mauch M Frieler K and Dixon S (2014) ldquoIntonation in un-accompanied singing Accuracy drift and a model of referencepitch memoryrdquo J Acoust Soc Am 136(1) 401ndash411

Mehrabi A Dixon S and Sandler M (2017) ldquoVocal imitationof synthesised sounds varying in pitch loudness and spectral cen-troidrdquo J Acoust Soc Am 143(2) 783ndash796

Mullensiefen D Gingras B Musil J and Stewart L (2014)ldquoThe musicality of non-musicians An index for assessing musi-cal sophistication in the general populationrdquo PLoS ONE 9(2)e89642

Nordmark J and Ternstrom S (1996) ldquoIntonation preferencesfor major thirds with non-beating ensemble soundsrdquo Speech Mu-sic Hearing Q Prog Status Rep 37(1) 57ndash62

Pfordresher P Q and Brown S (2007) ldquoPoor-pitch singing inthe absence of lsquotone deafnessrsquordquo Music Percept 25(2) 95ndash115

Pfordresher P Q Brown S Meier K M Belyk M and LiottiM (2010) ldquoImprecise singing is widespreadrdquo J Acoust SocAm 128(4) 2182ndash2190

Potter J (2000a) ldquoEnsemble singingrdquo in The Cambridge Com-panion to Singing edited by J Potter (Cambridge Univer-

10 J Acoust Soc Am 9 September 2019

sity Press Cambridge UK) pp 158ndash164 doi 101017CCOL9780521622257014

Potter J (2000b) ldquoIntroduction singing at the turn of the cen-turyrdquo in The Cambridge Companion to Singing edited by J Pot-ter (Cambridge University Press Cambridge UK) pp 1ndash5 doi101017CCOL9780521622257001

Prout E (2011) Harmony Its Theory and Practice (CambridgeUniversity Press)

Seashore C E (1914) ldquoThe tonoscoperdquo Psychol Monogr 16(3)1ndash12

Seashore C E (1931) ldquoThe natural history of the vibratordquo ProcNatl Acad Sci 17(12) 623ndash626

Stewart L von Kriegstein K Warren J D and Griffiths T D(2006) ldquoMusic and the brain Disorders of musical listeningrdquoBrain 129(10) 2533ndash2553

Sundberg J (1977) ldquoThe acoustics of the singing voicerdquo Sci Am236(3) 82ndash91

Sundberg J (1987) The Science of the Singing Voice (NorthernIllinois University Press DeKalb IL)

Sundberg J (1995) ldquoAcoustic and psychoacoustic aspects of vocalvibratordquo in Vibrato edited by P Dejonckere M Hirano andJ Sundberg (Singular Publishing Group San Diego CA) pp35ndash62

Sundberg J La F M and Himonides E (2013) ldquoIntona-tion and expressivity A single case study of classical Westernsingingrdquo J Voice 27(3) 391ndashe1

Swannell J (1992) The Oxford Modern English Dictionary (Ox-ford University Press USA) p 560

Takeuchi A H and Hulse S H (1993) ldquoAbsolute pitchrdquo Psy-chol Bull 113(2) 345

Terasawa H (2004) ldquoPitch drift in choral musicrdquo Music 221Afinal paper Center for Computer Research in Music and Acous-tics Stanford University URL httpsccrmastanfordedu

~hirokopitchdriftpaper221ApdfUmbert M Bonada J Goto M Nakano T and Sundberg J

(2015) ldquoExpression control in singing voice synthesis Featuresapproaches evaluation and challengesrdquo IEEE Signal ProcessMag 32(6) 55ndash73

Vurma A and Ross J (2006) ldquoProduction and perception ofmusical intervalsrdquo Music Percept 23(4) 331ndash344

Welch G F Sergeant D C and White P J (1997) ldquoAge sexand vocal task as factors in singing lsquoin tunersquo during the first yearsof schoolingrdquo Bull Counc Res Music Educ 133 153ndash160

Xu Y and Sun X (2000) ldquoHow Fast Can We Really ChangePitch Maximum Speed of Pitch Change Revisitedrdquo in 6th IntConf Spoken Lang Process pp 666ndash669

Zarate J M and Zatorre R J (2008) ldquoExperience-dependentneural substrates involved in vocal pitch regulation duringsingingrdquo Neuroimage 40(4) 1871ndash1887

J Acoust Soc Am 9 September 2019 11

pitches) individual differences (eg sex training back-ground) and the accompaniment

We created a public data set which involves 20 partic-ipants (five groups of four) singing two pieces of music inthree different listening conditions solo with one vocalpart missing and with all vocal parts Each participantsings their usual vocal part soprano alto tenor or bassSATB singing was chosen for this intonation study as it isa common configuration for singing ensembles in Westernmusic

The remainder of the paper is structured as followsSection II discusses existing work related to singing in-tonation and interaction Section III contains our re-search questions experimental design and methodologyIn Section IV we describe our data analysis includingannotation and calculation of intonation metrics Sec-tion V presents our results which are then discussed inSection VI Our conclusions are found in Section VIIfollowed by details of where the annotated data and soft-ware can be freely obtained in Section VIII

II PREVIOUS WORK

Intonation accuracy is one of the features used toevaluate a singerrsquos performance For calculating intona-tion accuracy from an audio recording pitch and funda-mental frequency (fo) are generally treated as exchange-able (see Section IV A) In the 1930s Seashore measuredfundamental frequency in recordings of renowned singersand revealed considerable departures from equally tem-pered tuning (Seashore 1914 Sundberg et al 2013)Since that time many studies on singing and intonationfocus on accuracy especially measuring the pitch errorwhich is the difference between the observed pitch anda predetermined target pitch Some studies investigatethe pitch drift of singing ensembles (eg Devaney andEllis 2008 Howard 2003 Kalin 2005 Terasawa 2004)or solo singers (Mauch et al 2014) Other studies in-vestigate factors that influence the pitch error (eg Pfor-dresher et al 2010 Welch et al 1997) In a previousstudy (Dai and Dixon 2017) we observed that pitch er-ror and melodic interval error increase when singers canhear each other and in particular that singing withoutthe bass part had less mean absolute pitch error thansinging with all vocal parts In addition we found thatpitch variation within notes was lower when participantssang solo than with their partners

Besides pitch error other studies have investigatedinterval error the extent to which pitch differences be-tween subsequent (melodic interval error) or simultane-ous (harmonic interval error) tones deviate from theirtarget values Some melodic intervals were reported asbeing harder to sing than other intervals such as tri-tones (Dai et al 2015) and perfect fifths (Vurma andRoss 2006) There is a phenomenon called compres-sion whereby sung melodic intervals tend to be smallerthan the target intervals (Pfordresher and Brown 2007)Harmonic intervals constitute another important factorwhich influences intonation Hagerman and Sundberg

(1980) studied the harmonic intervals sung by two bar-bershop quartets and found that intervals did not re-flect just or Pythagorean tuning as expected althoughthe singing was precise (low standard deviations) Theysuggested that deviations from pure intervals (ie wherethe frequencies of notes are related by ratios of smallwhole numbers (Lindley 2001)) could be due to aperiod-icity in the voice which broadens the spectral peaks andrenders beats inaudible They also observed a generalstretching of intervals in performance which they de-scribe as sounding ldquomore active and expressive than flatintervalsrdquo Nordmark and Ternstrom (1996) investigatedthe preferred tuning of major third intervals finding thatparticipants tuned intervals closer to equal temperamentthan pure intonation Howard (2007) observed the useof non-equal-tempered tuning in unaccompanied singingalthough his data did not fully confirm his predictionsbased on the use of pure intervals In both cases singersproduced intervals between the pure and equal-temperedversions of the intervals while Devaney et al (2012) ob-served that some intervals were close to either just orPythagorean tunings but most were within a standarddeviation of equal-temperament

For an individual singer singing is a complicated taskinvolving both perception and production The voice or-gan can be viewed as an instrument consisting of a powersupply (the lungs) an oscillator (the vocal folds) anda resonator (the larynx pharynx and mouth) (Sund-berg 1977) Factors related to production such as mus-cle strength and control can be improved by training andpractice while the perceptual factors involve many cogni-tive components with distinct brain substrates (Stewartet al 2006) External influences such as reference pitchesprovided by accompaniment also affect pitch accuracy

Interaction is an important factor for ensemblesinging which is a cooperative activity involving com-munication within the ensemble and with the audience(Potter 2000a p 158) Few people can produce a cor-rect pitch directly without the use of an external ref-erence pitch (Takeuchi and Hulse 1993) such as thatprovided by instrumental accompaniment Although ac-companiment has been shown to enhance the individuallearning of a piece (Brandler and Peynircioglu 2015) itcan also reduce pitch accuracy during singing even whenthe accompaniment is in unison with the singer (Dai andDixon 2016 2017 Pfordresher and Brown 2007) Mostsingers adjust their intonation using auditory feedbackto reach the intended note (Zarate and Zatorre 2008)and accompaniment might distract singers from hearingtheir own feedback

Much evidence shows that singers are influenced byother choral members in terms of pitch accuracy (egHoward 2003 Terasawa 2004) and various approacheshave been proposed to keep singers in tune by focusingon relative pitches tone memories and muscle memo-ries (eg Alldahl 2008 Bohrer 2002) Dai and Dixon(2017) observed that pitch error and melodic interval er-ror increase when the participants can hear other singersbut harmonic interval error is reduced when all singers

2 J Acoust Soc Am 9 September 2019

hear each other In unaccompanied multi-part singingHoward (2007) demonstrated how singing pure intervalscan cause drift and he found that singers do in facttend to non-equal-tempered tuning and drift in pitchwith modulation With different musical material De-vaney et al (2012) observed that only some intervals weresignificantly different from equal temperament in theirstudy the singers did not exhibit a large amount of drift

Individual differences such as age and sex also influ-ence pitch accuracy (Welch et al 1997) Likewise musi-cal training and experience have some influence Mauchet al (2014) found that self-rated singing ability andchoir experience but not general musical backgroundcorrelated significantly with intonation accuracy Singerswho exhibit much greater than average pitch errors areclassified as poor singers a phenomenon that has beenthe focus of several studies (Berkowska and Dalla Bella2009 Dalla Bella et al 2007 Pfordresher and Brown2007 Pfordresher et al 2010)

Observation of the pitch trajectory within individ-ual notes reveals transient parts at the beginning andend of each note At the beginning of a tone a pitchglide is often observed as the singer adjusts the vocalcords from their previous state (the previous pitch or arelaxed state) Then the pitch is adjusted as the singeruses perceptual feedback to correct for any discrepancybetween the auditory feedback and the intended note(Zarate and Zatorre 2008) Possibly at the same timevibrato may be applied which is an oscillation aroundthe central pitch which is close to sinusoidal for skilledsingers but asymmetric for unskilled singers (Gerhard2005 Seashore 1931 Sundberg 1995) Finally they maynot sustain the pitch at the end of the tone and thepitch often moves in the direction of the following noteor downward (toward a relaxed vocal cord state) if thereis no immediately following note (Xu and Sun 2000)Pitch variation within a note has been modelled for vo-cal synthesis as well as note level features (onset andoffset) intra- and internote features (changes within andbetween notes) and the relationship to timbre variations(Umbert et al 2015)

Vibrato is used to add expression to vocal and in-strumental music In singing it can occur spontaneouslythrough variations in the larynx Professional (particu-larly opera) singers tend to produce vibrato a periodicmodulation of fo which is not normally used in speech(Sundberg 1987) The frequency of the vibrato is usu-ally in the range 5ndash8 Hz according to the vibrato type(Fischer 1993) Although all human voices can producevibrato it has been shown that with training singers areable to elicit control over both vibrato rate and depth(Dromey et al 2003 King and Horii 1993)

Several software systems for pitch analysis have beendeveloped which support scientific measurement such asPraat (Boersma 2002) Sound Visualiser (Cannam et al2006) and Tony (Mauch et al 2015) de Cheveigne andKawahara (2002) introduced a pitch extraction methodYIN which has been applied extensively This algorithmimproves upon the autocorrelation method by means of

a difference function plus several modifications that im-prove system performance PYIN (Mauch and Dixon2014) is a probabilistic extension of YIN which enhancesrobustness against errors and is employed in the Tonysoftware used in this paper

III METHODOLOGY

In this section we describe our exploratory researchquestions the experimental design musical materialparticipants and experimental procedure Links to thedata and score information can be found in Section VIII

A Listening condition

For our experiment three listening conditions weredefined based on what the singer can hear as they sing Inthe closed condition the singer can only hear their ownvoice and metronome thus they are effectively singingsolo In the partial condition the singer can hear somebut not all of the other vocal parts This is achieved byphysically isolating one singer from the other three andallowing acoustic feedback (via microphones and loud-speakers) in one direction only either from the isolatedsinger to the other three singers (one-to-three condition)or from the three singers to the isolated one (three-to-onecondition) Finally in the open condition all singers canhear each other

For testing the partial condition there are four pairsof test conditions corresponding to the vocal part that isisolated and the direction of feedback For example onetest condition is called the soprano isolated one-to-threecondition where the soprano sings in a closed conditionbut all other parts hear each other (the sopranorsquos voicebeing provided to the others via a loudspeaker) In sucha case the isolated singer is called the independent singeras they are not able to react to the other vocal parts tochoose their tuning In other cases the singer can hear all(open condition) or some (partial condition) of the othervoices and thus is called a dependent singer Figure 1visualises the listening and test conditions

B Research questions

This study of interactive intonation in unaccompa-nied SATB singing is driven by a number of researchquestions Firstly we wish to know whether there arepatterns or regularities in the pitch trajectories of indi-vidual notes We expect to find common trends in thenote trajectories with differences due to context and ex-perimental conditions The second question is how tocharacterise the trajectories in terms of the time requiredfor the singer to reach the target pitch The third ques-tions is which factors influence the tendencies of the tran-sient part The note trajectories might show significantdifferences due to context such as when singing after ahigher pitch or a lower pitch We also wish to determinewhether pitch trajectories differ by vocal part or sex Wepreviously observed significant differences between vocal

J Acoust Soc Am 9 September 2019 3

Listening conditions

Closed condition

Partial condition

Open condition

singer

singer

singer

singer

X

(2 trials)

(42 trials)

Soprano isolatedcondition

A STBorAlto isolatedcondition

Tenor isolatedcondition

Bass isolatedcondition

S ATBor

T SABor

B SATor

One‐to‐three condition

Three‐to‐one condition

Independent singer

Dependent singer(Test conditions)

FIG 1 Listening and test conditions The arrows indicate

the direction of acoustic feedback

parts in terms of pitch error (Dai 2019 Dai and Dixon2017 2019b) Finally we would like to see whether thelistening condition affects note trajectories That is dothe shapes of vocal notes differ depending on whether theparticipants can hear other vocal parts or not

C Participants

20 adult amateur singers (10 male and 10 female)with choir experience volunteered to take part in thestudy They came from the music society and a capellasociety of the university and a local choir (There wasalso a pilot experiment involving four participants fromour research group this data is not used in this paper)The age range was from 20 to 55 years old (mean 280median 265 stddev 78) Participants were compen-sated pound10 for their participation The participants wereable to sing their parts comfortably and they were giventhe score and sample audio files at least 2 weeks beforethe experiment

Since training is a crucial factor for intonation ac-curacy all the participants were given a questionnairebased on the Goldsmiths Musical Sophistication Index(Mullensiefen et al 2014) to test the effect of trainingThe participants had an average of 33 years of musiclessons and 58 years of singing experience

DMaterials

Two contrasting musical pieces were selected for thisstudy a Bach chorale ldquoOh Thou of God the Fatherrdquo(BWV 1646) and Leo Mathisenrsquos jazz song ldquoTo be or notto berdquo Both pieces were chosen for their wide range ofharmonic intervals (see Section IV B) the first piece has34 unique harmonic intervals between parts and the sec-ond piece has 30 harmonic intervals To control the du-

ration of the experiment we shortened the original scoreby deleting the repeats We also reduced the tempo fromthat specified in the score in order to make the pieceseasier to sing and compensate for the limited time thatthe singers had to learn the pieces The resulting dura-tion of the first piece is 76 seconds and the second songis 100 seconds Links to the score and training materialscan be found in section VIII

The equipment included an SSL MADI-AX con-verter five cardioid microphones (Shure SM57) and fourloudspeakers (Yamaha HS5) All the tracks were con-trolled and recorded by the software Logic Pro 10 Themetronome and the four starting reference pitches werealso given by Logic Pro The total latency of the systemis 49 ms (33 ms due to hardware and 16 ms from thesoftware)

E Procedure

A pilot experiment with singers not involved in thestudy was performed to test the experimental setup andminimise potential problems such as bleed between mi-crophones Then the participants in the study were dis-tributed into 5 groups according to their self-identifiedvoice type time availability and collaborative experience(the singers from the same music society were placedin the same group) Each group contained two femalesingers (soprano and alto) and two male singers (tenorand bass) Each participant had at least two hours prac-tice before the recording sometimes on separate daysThey were informed about the goal of the study to inves-tigate interactive intonation in SATB singing and theywere asked to sing their best in all circumstances

For each trial the singers were played their startingnotes before commencing the trial and a metronome ac-companied the singing to ensure that the same tempowas used by all groups Each piece was sung 10 timesby each group The first and the last trial were recordedin the open condition The partial and closed conditiontrials consisting of 8 test conditions 4 (isolated voice) times2 (direction of feedback) were recorded in between Theorder of isolated conditions was randomly chosen to con-trol for any learning effect For each isolated conditionthe three-to-one condition always preceded the one-to-three condition We use the performance of the isolatedsinger in the one-to-three condition as the data for theclosed condition

The singers were recorded in two acoustically iso-lated rooms For the partial and closed conditions theisolated singers were recorded in a separate room fromthe other three singers Loudspeakers in each room pro-vided acoustic feedback according to the test conditionThere was no visual contact between singers in differentrooms With the exception of warm-up and rehearsalbut including all the trials and the questionnaire the to-tal duration of the experiment for each group was aboutone hour and a half

4 J Acoust Soc Am 9 September 2019

IV DATA ANALYSIS

This section describes the annotation procedure andthe measurement of pitch error and harmonic intervalerror The experimental data comprises 5 (groups) times 4(singers) times 2 (pieces) times 10 (trials) = 400 audio fileseach containing 65 to 116 notes Any missing noteswere excluded from the analysis The software Tony(Mauch et al 2015) was chosen as the annotation toolTony performs pitch detection using the pYIN algo-rithm which outperforms the YIN algorithm (Mauchand Dixon 2014) and then automatically segments pitchtrajectories into note objects and provides a convenientinterface for manual checking and correction of the re-sulting annotations The automatic segmentation basedon note energy and pitch changes provided the note on-set and offset times for our data and rarely needed anycorrection

For each audio file we exported two csv filesone containing the note-level information (for calculat-ing pitch and interval errors) and the other containingthe pitch trajectories It took about 67 hours to manuallycheck and correct the 400 files resulting in 37246 anno-tated pitch values which were stored with metadata onthe singer experimental condition and score The infor-mation in our database includes group number singernumber vocal part listening condition piece numbernote in trial score onset position score duration scorepitch score interval observed onset time observed dura-tion observed pitch pitch error melodic interval errorharmonic interval error anonymised participant detailsnormalised note trajectories real-time note trajectoriesage sex and questionnaire scores MATLAB 2015a wasused for statistics and modelling

A Conversion of fo

The Tony software segments the recording into notesand silences and outputs the median fundamental fre-quency fo for each note as well as the fo value for each58 ms frame The conversion of fundamental frequencyto musical pitch p is calculated as follows

p = 69 + 12 log2

fo

440 (1)

This scale is chosen such that its units are semitones(one semitone is equal to 100 cents) with integer valuesof p coinciding with MIDI pitch numbers and referencepitch A4 (p = 69) tuned to 440 Hz After automaticannotation every single note was checked manually tomake sure the tracking was consistent with the data andcorrected if it was not

B Intonation Metrics

Intonation accuracy is quantified in terms of pitcherror and harmonic interval error as defined below As-suming that a reference pitch has been given pitch errorcan be defined as the difference between observed pitchand score pitch (Mauch et al 2014) This is usually de-

fined on the level of notes but can also be measured foreach sampling point of the pitch trajectory

epi = poi minus ps

i (2)

where poi is the observed pitch in a single frame of note

i (or the median pi over the duration of the note) andpsi is the score pitch of note i

To evaluate the pitch accuracy of a sung part we usemean absolute pitch error (MAPE) as the measurementFor a group of M notes with pitch errors ep1 epM theMAPE is defined as

MAPE =1

M

Msumi=1

|epi | (3)

A musical interval is the difference between twopitches (Prout 2011) which is proportional to the log-arithm of the ratio of the fundamental frequencies ofthe two pitches We distinguish two types of intervalmelodic intervals where the two notes are sounded in suc-cession and harmonic intervals where both notes soundsimultaneously (although they might not start simulta-neously) In this paper we consider only the harmonicinterval error defined as the difference between the ob-served and score intervals

ehiAjB = (piA minus pjB) minus (psiA minus ps

jB) (4)

where psiA and ps

jB are the score pitches of two overlap-ping notes from singers A and B respectively and piA

and pjB are their observed median pitches Harmonicintervals were evaluated for all pairs of notes which over-lap in time If one singer sings two notes while the secondsinger holds one note in the same time period two har-monic intervals are observed Thus indices i and j arenot assumed to be equal

V RESULTS

This section presents observed patterns in the shapesof note trajectories and investigates differences due tovocal part sex adjacent pitch and listening conditionsmodelling the trajectories according to the shape of tran-sient parts and classifying them into four categories

Based on the metronome tempo the expected du-ration of notes ranges from 025 to 550 seconds (mean086 median 075) while the observed note duration isfrom 001 seconds to 510 seconds (mean 069 median062) We excluded from the results any notes which hada duration shorter than 015 seconds (41) or MAPElarger than one semitone (120)

A The shape of note trajectories

To observe regularities in note trajectories acrossdiffering note durations we compared two methods ofequalising the time-scale of trajectories normalisationand truncation Normalised pitch trajectories are ex-pressed as a function of the fraction of the note that

J Acoust Soc Am 9 September 2019 5

has elapsed (from 0 to 1) while for the truncated trajec-tories the beginning and end of the note are modelledseparately using respectively the first and last 04 sec-onds of the note (77 of notes are over 04 seconds and55 over 055 seconds in duration) For comparing tra-jectories of different score pitches we use the pitch errorthat is the deviation from the target (score) pitch

For the normalisation method the note trajectorieswere re-sampled to 100 sampling points with the MAT-LAB resample function Then any common shape ofvocal notes can be obtained by averaging across notesFigure 2 plots the resulting note trajectory generated bycalculating the mean of all the sampling points For com-parison we also show the absolute pitch error which ismuch larger in magnitude

In Figure 2 we observe transient parts at the begin-ning and end of the note Based on the slope of theMPE curve the initial and final transients each compriseabout 15-20 of the notersquos duration In the following wetake the first 15 and the final 15 of each note as thetransient parts The length of the two transient parts isapproximately the same and the shape is almost sym-metrical consisting of peaks at both ends of the notewith a relatively stable middle portion The mean pitcherror is negative reflecting a tendency to sing flat relativeto the score pitch

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

Pitc

h E

rror

(se

mito

ne)

-02

-01

0

01

02

03

04

05

MPEMAPE

FIG 2 Average pitch trajectory within a tone expressed as

mean pitch error (MPE) across time-normalised notes with

mean absolute pitch error (MAPE) shown for comparison

An alternative way to combine note trajectories ofvarying length is to truncate the time series and onlyconsider the initial and final segments of each note Tak-ing the first (respectively last) 04 seconds of each noteexcluding notes with a duration less than 055 seconds toavoid artefacts due to the transient at the other end of thenote results in the trajectories shown in Figures 3 and4 From these figures we observe that the first 012 andlast 012 seconds of each note have the most pitch vari-ance This corresponds to about 15-20 of the mean noteduration (069 seconds) This result is similar to thatfor normalised trajectories (Figure 2) where the initial

sharp fall and final rise in pitch are not as sharp due tothe normalisation of different length notes The averageresults hide differences in the proportion and direction oftransients which arise due to individual differences scorepitch and vocal part which will be investigated in thefollowing sections

Time (seconds)0 005 01 015 02 025 03 035 04

MP

E (

sem

itone

)

-02

-015

-01

-005

0

005

01

015

FIG 3 Mean pitch error (crosses) and range of one standard

deviation from the mean (shaded) for the initial 04 seconds

of each note

Time (seconds)0 005 01 015 02 025 03 035 04

MP

E (

sem

itone

)

-03

-025

-02

-015

-01

-005

0

005

01

015

FIG 4 Mean pitch error (crosses) and range of one standard

deviation from the mean (shaded) for the final 04 seconds of

each note

The appearance of note trajectories is significantlydifferent between singers who have different degrees ofmusical training For the trained singers the note tra-jectories are smoother and the two transient parts havea clear direction For singers with less training theirnote trajectories tend to be uneven and have less com-mon shape in their beginnings and endings

In Figure 3 the first turning point at 002 secondsmay be an artefact of the averaging of different pitch tra-jectory shapes There are several possible factors thatmight influence trajectory shapes such as the pitch ofthe surrounding notes vocal part sex and listening con-dition which we now examine

B Adjacent pitch

In the previous section we observed large pitch fluc-tuations at each end of the note To test whether these

6 J Acoust Soc Am 9 September 2019

fluctuations are influenced by adjacent pitches in thescore we separate the data for each end of a note intotwo situations based on whether the previous (respec-tively next) pitch is lower or higher than the currentpitch Repeated pitches are ignored An analysis of vari-ance (ANOVA) confirms that the pitch error in relativetime is significantly different based on whether the ad-jacent pitch is higher or lower In Figure 5 we observethat singers tend to overshoot the target pitch and thenadjust downward after singing a lower pitch while aftera high pitch they reach the target almost immediatelyJers and Ternstrom (2005 Fig 3-4) observed that singersalso overshoot the interval (undershoot the pitch) beforecorrecting when they transition from a higher pitch toa lower pitch The steady state pitch is 1 cent sharperwhen coming from a lower pitch than when the previ-ous pitch is higher (F(1 38) = 7797p lt 0001) Singersalso prepare for the pitch of the next note at the end ofeach note as evidenced by the significant difference ob-served between ascending and descending following inter-vals (F(1 38) = 798p lt 001 Figure 6) In both casesthere is an increase in pitch followed by a rapid decreaseas the note ends and the vocal cords are relaxed butthe increase in pitch is much more marked in the casethat the succeeding pitch is higher There are some in-dividual differences between singers in this respect butmost exhibit the average behaviour of being influencedby adjacent notes

Relative time (proportion of note elapsed)0 005 01 015 02 025 03 035 04 045 05

MP

E (

sem

itone

)

-01

-008

-006

-004

-002

0

002

004

006

008

01After a high pitchAfter a low pitch

FIG 5 The effect of singing after a lower or higher pitch

mean pitch error in relative time

C Vocal parts and sex

To explore the factor of vocal part the normalisednote trajectories were plotted for each of the four vocalparts (Figure 7) Firstly we observe about an 8-centpitch difference between each pair of adjacent parts inour data Although the pitch trajectories vary accord-ing to the participants for most participants sopranostend to sing sharp while tenors and basses tend to singflat These pitch differences lead to an expansion of har-monic intervals between vocal parts the opposite of thecompression that is often observed for melodic intervals(Pfordresher and Brown 2007)

Relative time (proportion of note elapsed)05 055 06 065 07 075 08 085 09 095 1

MP

E (

sem

itone

)

-01

-008

-006

-004

-002

0

002

004

006

008

01A high pitch nextA low pitch next

FIG 6 The effect of singing before a lower or higher pitch

mean pitch error in relative time

This phenomenon is also observed between sexes AnANOVA shows a significant difference between the notebeginnings of male singers and female singers (F(3 76) =5937p lt 001) In general male participants sing 11cents flatter while females sing 4 cents sharper than thescore pitch Male singers tend to begin the note at ahigher pitch and adjust downwards while female singersrsquoinitial trajectories have a convex shape beginning at alower pitch overshooting the target then decreasing to-ward the target All the singers tend to have similarnote ending a slight increase in pitch followed by a rapiddecrease

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

MP

E (

sem

itone

)

-03

-025

-02

-015

-01

-005

0

005

01

015

02

SopranoAltoTenorBass

FIG 7 Mean pitch error over note duration for each vocal

part

DModelling the note trajectories

For a better understanding of the tendencies of pitchtrajectories we modelled them as three separate compo-nents initial transient note middle and final transientAs discussed previously the transient parts were definedby the first 15 and last 15 of the duration The ten-dency of each component was approximated by linearregression Figure 8 shows an example of a single pitch

J Acoust Soc Am 9 September 2019 7

trajectory and the linear fits for each of the three com-ponents

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

MP

E (s

emito

ne)

-025

-02

-015

-01

-005

0

005

01

015

02

025

Observed dataFitting line

FIG 8 Example of the pitch trajectory of a single note and

the fitting lines for the initial middle and final components

of the note

Convex0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Upward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Downward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Concave0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

FIG 9 Mean pitch trajectories of the four trajectory classes

in relative time (proportion of note elapsed)

To describe the different types of trajectories weclassify them into four categories (Concave Convex Up-ward Downward) according to the slopes of their initialand final transients which are either positive or nega-tive Table I shows that the most popular shapes areConvex and Downward both of which have a negativenote release

Table II shows the mean median and standard devi-ation of the slopes of the three note parts Although theaverage trend for the initial transient is a negative slopeless than half of the notes exhibit this behaviour andthere is a large variance in the slope of the initial tran-sient The middle segment has a small positive trendwhile for the final transient most notes have a negativeslope although again this has a large variance Although

the initial and final slopes have high variance Figures 3and 4 show that their starting and ending points are lessspread (smaller standard deviation) than the rest of thetrajectory

E Listening condition

Finally the influence of listening condition on notetrajectory classification was considered The influence oflistening condition on mean pitch is discussed in previouswork (Dai and Dixon 2017 2019a) In this paper theANOVA test on the note trajectories did not show anysignificant difference between listening conditions (solopartial independent partial dependent and open) for ini-tial transient (F(3 49194) = 007p = 079 ) middle sec-tion (F(3 49194) = 162p = 020) and final transient(F(3 49194) = 005p = 083)

VI DISCUSSION

In this study we observed a general stretching of har-monic intervals between vocal parts so that the bass partsang flat and the soprano part sharp relative to the othervocal parts Unlike the piano where stretching of inter-vals is related to the inharmonicity of the partials sungtones are not inharmonic so we are unable to explainthis observation Hagerman and Sundberg (1980) alsoobserved stretched harmonic intervals in an experimentwith barbershop singers giving the explanation that theysound ldquomore active and expressiverdquo

If we compare to the given starting notes the over-all tendency was to sing flat a tendency which increasedover time Pitch drift has been observed in other experi-ments (Devaney and Ellis 2008 Howard 2003 Terasawa2004) and is typically downward in direction althoughupward drift has also been observed

While the averaged note trajectories particularlywhen sorted into categories (Figure 9) show quitesmooth curves the individual pitch trajectories exhibitmuch greater degrees of variation (eg Figure 8 which isnot an extreme example) There is a danger that the fea-tures observed in the average curves might be artefactsof the averaging process and may not occur often if atall in the individual instances For example in Figures2 and 3 we observe a concave shape (a small local min-imum) in the first 5 (respectively 004 seconds) of thenote trajectory If we compare with Figure 7 where thetwo female vocal parts have different initial trajectoriesto the two male vocal parts it is likely that the localminimum arises from averaging the categorically differ-ent shapes of the male and female parts The reason thatthe end of the note trajectory does not exhibit a similarpattern may be due to the greater frequency of Convexand Downward shapes (289 and 368 respectively)which both have a negative final slope across the vocalparts (Table I)

The differences observed in the averaged curves aresmall in magnitude of the same order as the just notice-able difference in pitch (about 5 cents Loeffler (2006))

8 J Acoust Soc Am 9 September 2019

Shape Attack Release Soprano Alto Tenor Bass Overall

Convex positive negative 347 378 229 211 289

Upward positive positive 171 133 173 161 160

Downward negative negative 329 379 424 339 368

Concave negative positive 153 109 174 289 184

TABLE I Definition of the four trajectory shapes according to the sign of the slope in the attack and release and their relative

frequencies in each vocal part and in total

Initial Middle Final

Mean -0649 0077 -2167

Median 0003 0038 -1766

Stddev 7109 0725 6400

TABLE II The mean median and standard deviation of the

slope (semitones per second) of the initial transient middle

section and final transient

Many of the sung examples have larger differences whichare reduced by the averaging process but are likely to beperceptible in the original examples A listening test us-ing synthetic stimuli would be required to identify theperceptual relevance of the features of pitch trajectoriesidentified in this paper

The general tendency of notes ending with a negativeslope is observed regardless of whether the next pitch ishigher or lower or which vocal part is considered Al-though there is a simple explanation ie the relaxationof the vocal muscles at the end of a note it is notewor-thy that singers show evidence of preparing for a higherfollowing pitch by commencing a rising inflection whichis then followed by a falling pitch at the end of the notewhich might be thought to negate the preparation Evenin the cases of the Upward and Concave trajectories theoverall increasing slope toward the end of a note finisheswith a few sampling points where the pitch decreases(during the final 3 of the note Figure 9)

A skilled singer is able to coordinate their muscles toachieve synchronised control over multiple vocal param-eters Alongside the pitch changes at the ends of eachnote there are also variations in amplitude associatedwith the start or end of the note which might make someparts of the transient imperceptible (alternatively someaudible parts may be omitted from analysis due to theirlow amplitude) The note segmentation (determinationof note onset and offset times) is based on the defaultsettings of the software Tony which segments the pitchtrack into notes according to changes in pitch and energy(Mauch et al 2015) Different settings and segmentationstrategies may influence the results The coarse segmen-tation was checked during annotation A random sam-ple was checked more closely after results were obtained

This revealed a small fraction of ambiguous cases wherethe final slope is dominated by vibrato and thus couldbe classified as positive or negative depending on theprecise offset time Compared to the thousands of noteswhich have a negative slope at the end the few ambigu-ous cases would not change our results significantly ifthey were to be segmented differently

Although vibrato is a feature of many singing pitchtrajectories we did not explicitly model it in this work(cf Dai and Dixon 2016 Mehrabi et al 2017) Theuse of vibrato is less marked in unaccompanied ensemblesinging where the voice does not need to be projected overinstrumental parts and the stylistic goal is for the voicesto blend rather than stand out For example choral stylefavours minimal vibrato and barbershop style generallyforbids vibrato Thus we did not observe strong vibratoin our data and in the cases where vibrato was presentit tended to be uneven which would make it difficult tomodel

VII CONCLUSIONS

In this paper we present a study of pitch trajectoriesof single notes in multi-part singing According to ouranalysis of over 35000 individual notes we find a generalshape of vocal notes which contains transient componentsat the beginning and end of each note

The analysis is based on both absolute and relativetiming of notes where the initial and final transients areabout 120 ms or 15-20 of note duration The resultssuggest that the adjustment of pitch at the ends of notesis governed by absolute timing ie due to physiologi-cal and psychological factors rather than relative timingwhich might imply a musical motivation The transientcomponents vary according to the individual performerprevious pitch next pitch vocal part and sex

Participants tend to overshoot the target pitch whentransitioning from a lower pitch and raise the pitch to-ward the end of the note if the next pitch is higher Wealso observe a general expansion of harmonic intervalsabout 8 cents pitch difference is observed between adja-cent vocal parts with sopranos singing sharper and malesingers flatter than the target pitch Female and malesingers also differ in their initial transients with femalescommencing with a upward glide that overshoots the tar-get followed by a correction while males begin notes

J Acoust Soc Am 9 September 2019 9

with a downward glide Participants with fine pitch ac-curacy tend to have smoother pitch trajectories whileless accurate singers have relatively unstable note trajec-tories

In conclusion the main contribution of this paper isthe observation measurement and analysis of the notetransient parts by characterising their shapes and influ-encing factors Although many further issues remain tobe investigated we hope that the current observationsprovide a better understanding of the singing voice

VIII DATA AVAILABILITY

The code and the data needed to repro-duce our results (note annotations questionnaireresults score information) are available fromhttpscodesoundsoftwareacukprojectsanalysis-of-interactive-intonation-in-unaccompanied-satb-ensemblesrepository

ACKNOWLEDGMENTS

The study was conducted with the approval of theQueen Mary Research Ethics Committee (approval num-ber QMREC1560)

Many thanks to all of the participants who con-tributed to this project We also thank Marcus PearceDaniel Stowell and Christophe Rhodes for their advice onexperimental design Jiajie Dai is supported by a ChinaScholarship Council and Queen Mary Joint PhD Schol-arship

Alldahl P-G (2008) Choral Intonation (Gehrmans StockholmSweden)

Berkowska M and Dalla Bella S (2009) ldquoAcquired and con-genital disorders of sung performance A reviewrdquo Adv CognPsychol 5(-1) 69ndash83 doi 102478v10053-008-0068-2

Boersma P (2002) ldquoPraat a system for doing phonetics by com-puterrdquo Glot Int 5(910) 341ndash345

Bohrer J C S (2002) ldquoIntonational strategies in ensemblesingingrdquo PhD thesis City University London

Brandler B J and Peynircioglu Z F (2015) ldquoA comparisonof the efficacy of individual and collaborative music learning inensemble rehearsalsrdquo J Res Music Educ 63(3) 281ndash297

Brown D (1991) Human Universals (Temple University PressPhiladelphia) pp 1ndash160

Cannam C Landone C Sandler M B and Bello J P (2006)ldquoThe Sonic Visualiser A visualisation platform for semanticdescriptors from musical signalsrdquo in 7th Int Conf Music InfRetr pp 324ndash327

Dai J (2019) ldquoModelling intonation and interaction in vocal en-semblesrdquo PhD thesis Queen Mary University of London pp104ndash115

Dai J and Dixon S (2016) ldquoAnalysis of vocal imitations of pitchtrajectoriesrdquo in 17th Int Soc Music Inf Retr Conf pp 87ndash93

Dai J and Dixon S (2017) ldquoAnalysis of interactive intonationin unaccompanied SATB ensemblesrdquo in 18th Int Soc Music InfRetr Conf pp 599ndash605

Dai J and Dixon S (2019a) ldquoSinging together Pitch accuracyand interaction in unaccompanied unison and duet singingrdquo JAcoust Soc Am 145(2) 663ndash675

Dai J and Dixon S (2019b) ldquoUnderstanding intonation tra-jectories and patterns of vocal notesrdquo in Int Conf Multimedia

Modell Springer pp 243ndash253Dai J Mauch M and Dixon S (2015) ldquoAnalysis of intonation

trajectories in solo singingrdquo in 16th Int Soc Music Inf RetrConf pp 420ndash426

Dalla Bella S Giguere J-F and Peretz I (2007) ldquoSinging pro-ficiency in the general populationrdquo J Acoust Soc Am 121(2)1182ndash1189 doi 10112112427111

de Cheveigne A and Kawahara H (2002) ldquoYIN a fundamentalfrequency estimator for speech and musicrdquo J Acoust Soc Am111(4) 1917ndash1930 doi 10112111458024

Devaney J and Ellis D P (2008) ldquoAn empirical approachto studying intonation tendencies in polyphonic vocal perfor-mancesrdquo J Interdiscipl Music Stud 2(1amp2) 141ndash156

Devaney J Mandel M I and Fujinaga I (2012) ldquoA studyof intonation in three-part singing using the automatic musicperformance analysis and comparison toolkit (AMPACT)rdquo in13th Int Soc Music Inf Retr Conf pp 511ndash516

Dromey C Carter N and Hopkin A (2003) ldquoVibrato rateadjustmentrdquo J Voice 17(2) 168ndash178

Fischer P-M (1993) Die Stimme des Sangers Analyse ihrerFunktion und Leistung-Geschichte und Methodik der Stimmbil-dung (The Voice of the Singer Analysis of Function and Perfor-mance - History and Methodology of Voice Training) (MetzlerStuttgart Germany)

Gerhard D (2005) ldquoPitch track target deviation in naturalsingingrdquo in 6th Int Conf Music Inf Retr pp 514ndash519

Hagerman B and Sundberg J (1980) ldquoFundamental frequencyadjustment in barbershop singingrdquo J Res Singing 4 1ndash17

Howard D M (2003) ldquoA capella SATB quartet in-tune singingEvidence of intonation shiftrdquo in Stockholm Music Acoust ConfVol 2 pp 462ndash466

Howard D M (2007) ldquoIntonation drift in A Capella sopranoalto tenor bass quartet singing with key modulationrdquo J Voice21(3) 300ndash315

Jers H and Ternstrom S (2005) ldquoIntonation analysis of a multi-channel choir recordingrdquo Speech Music Hearing Q Prog StatusRep 47(1) 1ndash6

Kalin G (2005) ldquoFormant frequency adjustment in barbershopquartet singingrdquo Masterrsquos thesis Royal Institute of TechnologyDept of Speech Music and Hearing Stockholm

King J B and Horii Y (1993) ldquoVocal matching of frequencymodulation in synthesized vowelsrdquo J Voice 7(2) 151ndash159

Lindley M (2001) ldquoJust intonationrdquo Grove Music Online editedby L Macy httpwwwgrovemusiccom (accessed 30 January2015)

Loeffler D B (2006) ldquoInstrument timbres and pitch estimation inpolyphonic musicrdquo PhD thesis Georgia Institute of Technology

Mauch M Cannam C Bittner R Fazekas G Salamon JDai J Bello J and Dixon S (2015) ldquoComputer-aided melodynote transcription using the Tony software Accuracy and effi-ciencyrdquo in Proc 1st Int Conf Technol Music Not Representpp 23ndash30

Mauch M and Dixon S (2014) ldquoPYIN A fundamental fre-quency estimator using probabilistic threshold distributionsrdquo inIEEE Int Conf Acoust Speech Signal Process pp 659ndash663

Mauch M Frieler K and Dixon S (2014) ldquoIntonation in un-accompanied singing Accuracy drift and a model of referencepitch memoryrdquo J Acoust Soc Am 136(1) 401ndash411

Mehrabi A Dixon S and Sandler M (2017) ldquoVocal imitationof synthesised sounds varying in pitch loudness and spectral cen-troidrdquo J Acoust Soc Am 143(2) 783ndash796

Mullensiefen D Gingras B Musil J and Stewart L (2014)ldquoThe musicality of non-musicians An index for assessing musi-cal sophistication in the general populationrdquo PLoS ONE 9(2)e89642

Nordmark J and Ternstrom S (1996) ldquoIntonation preferencesfor major thirds with non-beating ensemble soundsrdquo Speech Mu-sic Hearing Q Prog Status Rep 37(1) 57ndash62

Pfordresher P Q and Brown S (2007) ldquoPoor-pitch singing inthe absence of lsquotone deafnessrsquordquo Music Percept 25(2) 95ndash115

Pfordresher P Q Brown S Meier K M Belyk M and LiottiM (2010) ldquoImprecise singing is widespreadrdquo J Acoust SocAm 128(4) 2182ndash2190

Potter J (2000a) ldquoEnsemble singingrdquo in The Cambridge Com-panion to Singing edited by J Potter (Cambridge Univer-

10 J Acoust Soc Am 9 September 2019

sity Press Cambridge UK) pp 158ndash164 doi 101017CCOL9780521622257014

Potter J (2000b) ldquoIntroduction singing at the turn of the cen-turyrdquo in The Cambridge Companion to Singing edited by J Pot-ter (Cambridge University Press Cambridge UK) pp 1ndash5 doi101017CCOL9780521622257001

Prout E (2011) Harmony Its Theory and Practice (CambridgeUniversity Press)

Seashore C E (1914) ldquoThe tonoscoperdquo Psychol Monogr 16(3)1ndash12

Seashore C E (1931) ldquoThe natural history of the vibratordquo ProcNatl Acad Sci 17(12) 623ndash626

Stewart L von Kriegstein K Warren J D and Griffiths T D(2006) ldquoMusic and the brain Disorders of musical listeningrdquoBrain 129(10) 2533ndash2553

Sundberg J (1977) ldquoThe acoustics of the singing voicerdquo Sci Am236(3) 82ndash91

Sundberg J (1987) The Science of the Singing Voice (NorthernIllinois University Press DeKalb IL)

Sundberg J (1995) ldquoAcoustic and psychoacoustic aspects of vocalvibratordquo in Vibrato edited by P Dejonckere M Hirano andJ Sundberg (Singular Publishing Group San Diego CA) pp35ndash62

Sundberg J La F M and Himonides E (2013) ldquoIntona-tion and expressivity A single case study of classical Westernsingingrdquo J Voice 27(3) 391ndashe1

Swannell J (1992) The Oxford Modern English Dictionary (Ox-ford University Press USA) p 560

Takeuchi A H and Hulse S H (1993) ldquoAbsolute pitchrdquo Psy-chol Bull 113(2) 345

Terasawa H (2004) ldquoPitch drift in choral musicrdquo Music 221Afinal paper Center for Computer Research in Music and Acous-tics Stanford University URL httpsccrmastanfordedu

~hirokopitchdriftpaper221ApdfUmbert M Bonada J Goto M Nakano T and Sundberg J

(2015) ldquoExpression control in singing voice synthesis Featuresapproaches evaluation and challengesrdquo IEEE Signal ProcessMag 32(6) 55ndash73

Vurma A and Ross J (2006) ldquoProduction and perception ofmusical intervalsrdquo Music Percept 23(4) 331ndash344

Welch G F Sergeant D C and White P J (1997) ldquoAge sexand vocal task as factors in singing lsquoin tunersquo during the first yearsof schoolingrdquo Bull Counc Res Music Educ 133 153ndash160

Xu Y and Sun X (2000) ldquoHow Fast Can We Really ChangePitch Maximum Speed of Pitch Change Revisitedrdquo in 6th IntConf Spoken Lang Process pp 666ndash669

Zarate J M and Zatorre R J (2008) ldquoExperience-dependentneural substrates involved in vocal pitch regulation duringsingingrdquo Neuroimage 40(4) 1871ndash1887

J Acoust Soc Am 9 September 2019 11

hear each other In unaccompanied multi-part singingHoward (2007) demonstrated how singing pure intervalscan cause drift and he found that singers do in facttend to non-equal-tempered tuning and drift in pitchwith modulation With different musical material De-vaney et al (2012) observed that only some intervals weresignificantly different from equal temperament in theirstudy the singers did not exhibit a large amount of drift

Individual differences such as age and sex also influ-ence pitch accuracy (Welch et al 1997) Likewise musi-cal training and experience have some influence Mauchet al (2014) found that self-rated singing ability andchoir experience but not general musical backgroundcorrelated significantly with intonation accuracy Singerswho exhibit much greater than average pitch errors areclassified as poor singers a phenomenon that has beenthe focus of several studies (Berkowska and Dalla Bella2009 Dalla Bella et al 2007 Pfordresher and Brown2007 Pfordresher et al 2010)

Observation of the pitch trajectory within individ-ual notes reveals transient parts at the beginning andend of each note At the beginning of a tone a pitchglide is often observed as the singer adjusts the vocalcords from their previous state (the previous pitch or arelaxed state) Then the pitch is adjusted as the singeruses perceptual feedback to correct for any discrepancybetween the auditory feedback and the intended note(Zarate and Zatorre 2008) Possibly at the same timevibrato may be applied which is an oscillation aroundthe central pitch which is close to sinusoidal for skilledsingers but asymmetric for unskilled singers (Gerhard2005 Seashore 1931 Sundberg 1995) Finally they maynot sustain the pitch at the end of the tone and thepitch often moves in the direction of the following noteor downward (toward a relaxed vocal cord state) if thereis no immediately following note (Xu and Sun 2000)Pitch variation within a note has been modelled for vo-cal synthesis as well as note level features (onset andoffset) intra- and internote features (changes within andbetween notes) and the relationship to timbre variations(Umbert et al 2015)

Vibrato is used to add expression to vocal and in-strumental music In singing it can occur spontaneouslythrough variations in the larynx Professional (particu-larly opera) singers tend to produce vibrato a periodicmodulation of fo which is not normally used in speech(Sundberg 1987) The frequency of the vibrato is usu-ally in the range 5ndash8 Hz according to the vibrato type(Fischer 1993) Although all human voices can producevibrato it has been shown that with training singers areable to elicit control over both vibrato rate and depth(Dromey et al 2003 King and Horii 1993)

Several software systems for pitch analysis have beendeveloped which support scientific measurement such asPraat (Boersma 2002) Sound Visualiser (Cannam et al2006) and Tony (Mauch et al 2015) de Cheveigne andKawahara (2002) introduced a pitch extraction methodYIN which has been applied extensively This algorithmimproves upon the autocorrelation method by means of

a difference function plus several modifications that im-prove system performance PYIN (Mauch and Dixon2014) is a probabilistic extension of YIN which enhancesrobustness against errors and is employed in the Tonysoftware used in this paper

III METHODOLOGY

In this section we describe our exploratory researchquestions the experimental design musical materialparticipants and experimental procedure Links to thedata and score information can be found in Section VIII

A Listening condition

For our experiment three listening conditions weredefined based on what the singer can hear as they sing Inthe closed condition the singer can only hear their ownvoice and metronome thus they are effectively singingsolo In the partial condition the singer can hear somebut not all of the other vocal parts This is achieved byphysically isolating one singer from the other three andallowing acoustic feedback (via microphones and loud-speakers) in one direction only either from the isolatedsinger to the other three singers (one-to-three condition)or from the three singers to the isolated one (three-to-onecondition) Finally in the open condition all singers canhear each other

For testing the partial condition there are four pairsof test conditions corresponding to the vocal part that isisolated and the direction of feedback For example onetest condition is called the soprano isolated one-to-threecondition where the soprano sings in a closed conditionbut all other parts hear each other (the sopranorsquos voicebeing provided to the others via a loudspeaker) In sucha case the isolated singer is called the independent singeras they are not able to react to the other vocal parts tochoose their tuning In other cases the singer can hear all(open condition) or some (partial condition) of the othervoices and thus is called a dependent singer Figure 1visualises the listening and test conditions

B Research questions

This study of interactive intonation in unaccompa-nied SATB singing is driven by a number of researchquestions Firstly we wish to know whether there arepatterns or regularities in the pitch trajectories of indi-vidual notes We expect to find common trends in thenote trajectories with differences due to context and ex-perimental conditions The second question is how tocharacterise the trajectories in terms of the time requiredfor the singer to reach the target pitch The third ques-tions is which factors influence the tendencies of the tran-sient part The note trajectories might show significantdifferences due to context such as when singing after ahigher pitch or a lower pitch We also wish to determinewhether pitch trajectories differ by vocal part or sex Wepreviously observed significant differences between vocal

J Acoust Soc Am 9 September 2019 3

Listening conditions

Closed condition

Partial condition

Open condition

singer

singer

singer

singer

X

(2 trials)

(42 trials)

Soprano isolatedcondition

A STBorAlto isolatedcondition

Tenor isolatedcondition

Bass isolatedcondition

S ATBor

T SABor

B SATor

One‐to‐three condition

Three‐to‐one condition

Independent singer

Dependent singer(Test conditions)

FIG 1 Listening and test conditions The arrows indicate

the direction of acoustic feedback

parts in terms of pitch error (Dai 2019 Dai and Dixon2017 2019b) Finally we would like to see whether thelistening condition affects note trajectories That is dothe shapes of vocal notes differ depending on whether theparticipants can hear other vocal parts or not

C Participants

20 adult amateur singers (10 male and 10 female)with choir experience volunteered to take part in thestudy They came from the music society and a capellasociety of the university and a local choir (There wasalso a pilot experiment involving four participants fromour research group this data is not used in this paper)The age range was from 20 to 55 years old (mean 280median 265 stddev 78) Participants were compen-sated pound10 for their participation The participants wereable to sing their parts comfortably and they were giventhe score and sample audio files at least 2 weeks beforethe experiment

Since training is a crucial factor for intonation ac-curacy all the participants were given a questionnairebased on the Goldsmiths Musical Sophistication Index(Mullensiefen et al 2014) to test the effect of trainingThe participants had an average of 33 years of musiclessons and 58 years of singing experience

DMaterials

Two contrasting musical pieces were selected for thisstudy a Bach chorale ldquoOh Thou of God the Fatherrdquo(BWV 1646) and Leo Mathisenrsquos jazz song ldquoTo be or notto berdquo Both pieces were chosen for their wide range ofharmonic intervals (see Section IV B) the first piece has34 unique harmonic intervals between parts and the sec-ond piece has 30 harmonic intervals To control the du-

ration of the experiment we shortened the original scoreby deleting the repeats We also reduced the tempo fromthat specified in the score in order to make the pieceseasier to sing and compensate for the limited time thatthe singers had to learn the pieces The resulting dura-tion of the first piece is 76 seconds and the second songis 100 seconds Links to the score and training materialscan be found in section VIII

The equipment included an SSL MADI-AX con-verter five cardioid microphones (Shure SM57) and fourloudspeakers (Yamaha HS5) All the tracks were con-trolled and recorded by the software Logic Pro 10 Themetronome and the four starting reference pitches werealso given by Logic Pro The total latency of the systemis 49 ms (33 ms due to hardware and 16 ms from thesoftware)

E Procedure

A pilot experiment with singers not involved in thestudy was performed to test the experimental setup andminimise potential problems such as bleed between mi-crophones Then the participants in the study were dis-tributed into 5 groups according to their self-identifiedvoice type time availability and collaborative experience(the singers from the same music society were placedin the same group) Each group contained two femalesingers (soprano and alto) and two male singers (tenorand bass) Each participant had at least two hours prac-tice before the recording sometimes on separate daysThey were informed about the goal of the study to inves-tigate interactive intonation in SATB singing and theywere asked to sing their best in all circumstances

For each trial the singers were played their startingnotes before commencing the trial and a metronome ac-companied the singing to ensure that the same tempowas used by all groups Each piece was sung 10 timesby each group The first and the last trial were recordedin the open condition The partial and closed conditiontrials consisting of 8 test conditions 4 (isolated voice) times2 (direction of feedback) were recorded in between Theorder of isolated conditions was randomly chosen to con-trol for any learning effect For each isolated conditionthe three-to-one condition always preceded the one-to-three condition We use the performance of the isolatedsinger in the one-to-three condition as the data for theclosed condition

The singers were recorded in two acoustically iso-lated rooms For the partial and closed conditions theisolated singers were recorded in a separate room fromthe other three singers Loudspeakers in each room pro-vided acoustic feedback according to the test conditionThere was no visual contact between singers in differentrooms With the exception of warm-up and rehearsalbut including all the trials and the questionnaire the to-tal duration of the experiment for each group was aboutone hour and a half

4 J Acoust Soc Am 9 September 2019

IV DATA ANALYSIS

This section describes the annotation procedure andthe measurement of pitch error and harmonic intervalerror The experimental data comprises 5 (groups) times 4(singers) times 2 (pieces) times 10 (trials) = 400 audio fileseach containing 65 to 116 notes Any missing noteswere excluded from the analysis The software Tony(Mauch et al 2015) was chosen as the annotation toolTony performs pitch detection using the pYIN algo-rithm which outperforms the YIN algorithm (Mauchand Dixon 2014) and then automatically segments pitchtrajectories into note objects and provides a convenientinterface for manual checking and correction of the re-sulting annotations The automatic segmentation basedon note energy and pitch changes provided the note on-set and offset times for our data and rarely needed anycorrection

For each audio file we exported two csv filesone containing the note-level information (for calculat-ing pitch and interval errors) and the other containingthe pitch trajectories It took about 67 hours to manuallycheck and correct the 400 files resulting in 37246 anno-tated pitch values which were stored with metadata onthe singer experimental condition and score The infor-mation in our database includes group number singernumber vocal part listening condition piece numbernote in trial score onset position score duration scorepitch score interval observed onset time observed dura-tion observed pitch pitch error melodic interval errorharmonic interval error anonymised participant detailsnormalised note trajectories real-time note trajectoriesage sex and questionnaire scores MATLAB 2015a wasused for statistics and modelling

A Conversion of fo

The Tony software segments the recording into notesand silences and outputs the median fundamental fre-quency fo for each note as well as the fo value for each58 ms frame The conversion of fundamental frequencyto musical pitch p is calculated as follows

p = 69 + 12 log2

fo

440 (1)

This scale is chosen such that its units are semitones(one semitone is equal to 100 cents) with integer valuesof p coinciding with MIDI pitch numbers and referencepitch A4 (p = 69) tuned to 440 Hz After automaticannotation every single note was checked manually tomake sure the tracking was consistent with the data andcorrected if it was not

B Intonation Metrics

Intonation accuracy is quantified in terms of pitcherror and harmonic interval error as defined below As-suming that a reference pitch has been given pitch errorcan be defined as the difference between observed pitchand score pitch (Mauch et al 2014) This is usually de-

fined on the level of notes but can also be measured foreach sampling point of the pitch trajectory

epi = poi minus ps

i (2)

where poi is the observed pitch in a single frame of note

i (or the median pi over the duration of the note) andpsi is the score pitch of note i

To evaluate the pitch accuracy of a sung part we usemean absolute pitch error (MAPE) as the measurementFor a group of M notes with pitch errors ep1 epM theMAPE is defined as

MAPE =1

M

Msumi=1

|epi | (3)

A musical interval is the difference between twopitches (Prout 2011) which is proportional to the log-arithm of the ratio of the fundamental frequencies ofthe two pitches We distinguish two types of intervalmelodic intervals where the two notes are sounded in suc-cession and harmonic intervals where both notes soundsimultaneously (although they might not start simulta-neously) In this paper we consider only the harmonicinterval error defined as the difference between the ob-served and score intervals

ehiAjB = (piA minus pjB) minus (psiA minus ps

jB) (4)

where psiA and ps

jB are the score pitches of two overlap-ping notes from singers A and B respectively and piA

and pjB are their observed median pitches Harmonicintervals were evaluated for all pairs of notes which over-lap in time If one singer sings two notes while the secondsinger holds one note in the same time period two har-monic intervals are observed Thus indices i and j arenot assumed to be equal

V RESULTS

This section presents observed patterns in the shapesof note trajectories and investigates differences due tovocal part sex adjacent pitch and listening conditionsmodelling the trajectories according to the shape of tran-sient parts and classifying them into four categories

Based on the metronome tempo the expected du-ration of notes ranges from 025 to 550 seconds (mean086 median 075) while the observed note duration isfrom 001 seconds to 510 seconds (mean 069 median062) We excluded from the results any notes which hada duration shorter than 015 seconds (41) or MAPElarger than one semitone (120)

A The shape of note trajectories

To observe regularities in note trajectories acrossdiffering note durations we compared two methods ofequalising the time-scale of trajectories normalisationand truncation Normalised pitch trajectories are ex-pressed as a function of the fraction of the note that

J Acoust Soc Am 9 September 2019 5

has elapsed (from 0 to 1) while for the truncated trajec-tories the beginning and end of the note are modelledseparately using respectively the first and last 04 sec-onds of the note (77 of notes are over 04 seconds and55 over 055 seconds in duration) For comparing tra-jectories of different score pitches we use the pitch errorthat is the deviation from the target (score) pitch

For the normalisation method the note trajectorieswere re-sampled to 100 sampling points with the MAT-LAB resample function Then any common shape ofvocal notes can be obtained by averaging across notesFigure 2 plots the resulting note trajectory generated bycalculating the mean of all the sampling points For com-parison we also show the absolute pitch error which ismuch larger in magnitude

In Figure 2 we observe transient parts at the begin-ning and end of the note Based on the slope of theMPE curve the initial and final transients each compriseabout 15-20 of the notersquos duration In the following wetake the first 15 and the final 15 of each note as thetransient parts The length of the two transient parts isapproximately the same and the shape is almost sym-metrical consisting of peaks at both ends of the notewith a relatively stable middle portion The mean pitcherror is negative reflecting a tendency to sing flat relativeto the score pitch

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

Pitc

h E

rror

(se

mito

ne)

-02

-01

0

01

02

03

04

05

MPEMAPE

FIG 2 Average pitch trajectory within a tone expressed as

mean pitch error (MPE) across time-normalised notes with

mean absolute pitch error (MAPE) shown for comparison

An alternative way to combine note trajectories ofvarying length is to truncate the time series and onlyconsider the initial and final segments of each note Tak-ing the first (respectively last) 04 seconds of each noteexcluding notes with a duration less than 055 seconds toavoid artefacts due to the transient at the other end of thenote results in the trajectories shown in Figures 3 and4 From these figures we observe that the first 012 andlast 012 seconds of each note have the most pitch vari-ance This corresponds to about 15-20 of the mean noteduration (069 seconds) This result is similar to thatfor normalised trajectories (Figure 2) where the initial

sharp fall and final rise in pitch are not as sharp due tothe normalisation of different length notes The averageresults hide differences in the proportion and direction oftransients which arise due to individual differences scorepitch and vocal part which will be investigated in thefollowing sections

Time (seconds)0 005 01 015 02 025 03 035 04

MP

E (

sem

itone

)

-02

-015

-01

-005

0

005

01

015

FIG 3 Mean pitch error (crosses) and range of one standard

deviation from the mean (shaded) for the initial 04 seconds

of each note

Time (seconds)0 005 01 015 02 025 03 035 04

MP

E (

sem

itone

)

-03

-025

-02

-015

-01

-005

0

005

01

015

FIG 4 Mean pitch error (crosses) and range of one standard

deviation from the mean (shaded) for the final 04 seconds of

each note

The appearance of note trajectories is significantlydifferent between singers who have different degrees ofmusical training For the trained singers the note tra-jectories are smoother and the two transient parts havea clear direction For singers with less training theirnote trajectories tend to be uneven and have less com-mon shape in their beginnings and endings

In Figure 3 the first turning point at 002 secondsmay be an artefact of the averaging of different pitch tra-jectory shapes There are several possible factors thatmight influence trajectory shapes such as the pitch ofthe surrounding notes vocal part sex and listening con-dition which we now examine

B Adjacent pitch

In the previous section we observed large pitch fluc-tuations at each end of the note To test whether these

6 J Acoust Soc Am 9 September 2019

fluctuations are influenced by adjacent pitches in thescore we separate the data for each end of a note intotwo situations based on whether the previous (respec-tively next) pitch is lower or higher than the currentpitch Repeated pitches are ignored An analysis of vari-ance (ANOVA) confirms that the pitch error in relativetime is significantly different based on whether the ad-jacent pitch is higher or lower In Figure 5 we observethat singers tend to overshoot the target pitch and thenadjust downward after singing a lower pitch while aftera high pitch they reach the target almost immediatelyJers and Ternstrom (2005 Fig 3-4) observed that singersalso overshoot the interval (undershoot the pitch) beforecorrecting when they transition from a higher pitch toa lower pitch The steady state pitch is 1 cent sharperwhen coming from a lower pitch than when the previ-ous pitch is higher (F(1 38) = 7797p lt 0001) Singersalso prepare for the pitch of the next note at the end ofeach note as evidenced by the significant difference ob-served between ascending and descending following inter-vals (F(1 38) = 798p lt 001 Figure 6) In both casesthere is an increase in pitch followed by a rapid decreaseas the note ends and the vocal cords are relaxed butthe increase in pitch is much more marked in the casethat the succeeding pitch is higher There are some in-dividual differences between singers in this respect butmost exhibit the average behaviour of being influencedby adjacent notes

Relative time (proportion of note elapsed)0 005 01 015 02 025 03 035 04 045 05

MP

E (

sem

itone

)

-01

-008

-006

-004

-002

0

002

004

006

008

01After a high pitchAfter a low pitch

FIG 5 The effect of singing after a lower or higher pitch

mean pitch error in relative time

C Vocal parts and sex

To explore the factor of vocal part the normalisednote trajectories were plotted for each of the four vocalparts (Figure 7) Firstly we observe about an 8-centpitch difference between each pair of adjacent parts inour data Although the pitch trajectories vary accord-ing to the participants for most participants sopranostend to sing sharp while tenors and basses tend to singflat These pitch differences lead to an expansion of har-monic intervals between vocal parts the opposite of thecompression that is often observed for melodic intervals(Pfordresher and Brown 2007)

Relative time (proportion of note elapsed)05 055 06 065 07 075 08 085 09 095 1

MP

E (

sem

itone

)

-01

-008

-006

-004

-002

0

002

004

006

008

01A high pitch nextA low pitch next

FIG 6 The effect of singing before a lower or higher pitch

mean pitch error in relative time

This phenomenon is also observed between sexes AnANOVA shows a significant difference between the notebeginnings of male singers and female singers (F(3 76) =5937p lt 001) In general male participants sing 11cents flatter while females sing 4 cents sharper than thescore pitch Male singers tend to begin the note at ahigher pitch and adjust downwards while female singersrsquoinitial trajectories have a convex shape beginning at alower pitch overshooting the target then decreasing to-ward the target All the singers tend to have similarnote ending a slight increase in pitch followed by a rapiddecrease

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

MP

E (

sem

itone

)

-03

-025

-02

-015

-01

-005

0

005

01

015

02

SopranoAltoTenorBass

FIG 7 Mean pitch error over note duration for each vocal

part

DModelling the note trajectories

For a better understanding of the tendencies of pitchtrajectories we modelled them as three separate compo-nents initial transient note middle and final transientAs discussed previously the transient parts were definedby the first 15 and last 15 of the duration The ten-dency of each component was approximated by linearregression Figure 8 shows an example of a single pitch

J Acoust Soc Am 9 September 2019 7

trajectory and the linear fits for each of the three com-ponents

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

MP

E (s

emito

ne)

-025

-02

-015

-01

-005

0

005

01

015

02

025

Observed dataFitting line

FIG 8 Example of the pitch trajectory of a single note and

the fitting lines for the initial middle and final components

of the note

Convex0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Upward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Downward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Concave0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

FIG 9 Mean pitch trajectories of the four trajectory classes

in relative time (proportion of note elapsed)

To describe the different types of trajectories weclassify them into four categories (Concave Convex Up-ward Downward) according to the slopes of their initialand final transients which are either positive or nega-tive Table I shows that the most popular shapes areConvex and Downward both of which have a negativenote release

Table II shows the mean median and standard devi-ation of the slopes of the three note parts Although theaverage trend for the initial transient is a negative slopeless than half of the notes exhibit this behaviour andthere is a large variance in the slope of the initial tran-sient The middle segment has a small positive trendwhile for the final transient most notes have a negativeslope although again this has a large variance Although

the initial and final slopes have high variance Figures 3and 4 show that their starting and ending points are lessspread (smaller standard deviation) than the rest of thetrajectory

E Listening condition

Finally the influence of listening condition on notetrajectory classification was considered The influence oflistening condition on mean pitch is discussed in previouswork (Dai and Dixon 2017 2019a) In this paper theANOVA test on the note trajectories did not show anysignificant difference between listening conditions (solopartial independent partial dependent and open) for ini-tial transient (F(3 49194) = 007p = 079 ) middle sec-tion (F(3 49194) = 162p = 020) and final transient(F(3 49194) = 005p = 083)

VI DISCUSSION

In this study we observed a general stretching of har-monic intervals between vocal parts so that the bass partsang flat and the soprano part sharp relative to the othervocal parts Unlike the piano where stretching of inter-vals is related to the inharmonicity of the partials sungtones are not inharmonic so we are unable to explainthis observation Hagerman and Sundberg (1980) alsoobserved stretched harmonic intervals in an experimentwith barbershop singers giving the explanation that theysound ldquomore active and expressiverdquo

If we compare to the given starting notes the over-all tendency was to sing flat a tendency which increasedover time Pitch drift has been observed in other experi-ments (Devaney and Ellis 2008 Howard 2003 Terasawa2004) and is typically downward in direction althoughupward drift has also been observed

While the averaged note trajectories particularlywhen sorted into categories (Figure 9) show quitesmooth curves the individual pitch trajectories exhibitmuch greater degrees of variation (eg Figure 8 which isnot an extreme example) There is a danger that the fea-tures observed in the average curves might be artefactsof the averaging process and may not occur often if atall in the individual instances For example in Figures2 and 3 we observe a concave shape (a small local min-imum) in the first 5 (respectively 004 seconds) of thenote trajectory If we compare with Figure 7 where thetwo female vocal parts have different initial trajectoriesto the two male vocal parts it is likely that the localminimum arises from averaging the categorically differ-ent shapes of the male and female parts The reason thatthe end of the note trajectory does not exhibit a similarpattern may be due to the greater frequency of Convexand Downward shapes (289 and 368 respectively)which both have a negative final slope across the vocalparts (Table I)

The differences observed in the averaged curves aresmall in magnitude of the same order as the just notice-able difference in pitch (about 5 cents Loeffler (2006))

8 J Acoust Soc Am 9 September 2019

Shape Attack Release Soprano Alto Tenor Bass Overall

Convex positive negative 347 378 229 211 289

Upward positive positive 171 133 173 161 160

Downward negative negative 329 379 424 339 368

Concave negative positive 153 109 174 289 184

TABLE I Definition of the four trajectory shapes according to the sign of the slope in the attack and release and their relative

frequencies in each vocal part and in total

Initial Middle Final

Mean -0649 0077 -2167

Median 0003 0038 -1766

Stddev 7109 0725 6400

TABLE II The mean median and standard deviation of the

slope (semitones per second) of the initial transient middle

section and final transient

Many of the sung examples have larger differences whichare reduced by the averaging process but are likely to beperceptible in the original examples A listening test us-ing synthetic stimuli would be required to identify theperceptual relevance of the features of pitch trajectoriesidentified in this paper

The general tendency of notes ending with a negativeslope is observed regardless of whether the next pitch ishigher or lower or which vocal part is considered Al-though there is a simple explanation ie the relaxationof the vocal muscles at the end of a note it is notewor-thy that singers show evidence of preparing for a higherfollowing pitch by commencing a rising inflection whichis then followed by a falling pitch at the end of the notewhich might be thought to negate the preparation Evenin the cases of the Upward and Concave trajectories theoverall increasing slope toward the end of a note finisheswith a few sampling points where the pitch decreases(during the final 3 of the note Figure 9)

A skilled singer is able to coordinate their muscles toachieve synchronised control over multiple vocal param-eters Alongside the pitch changes at the ends of eachnote there are also variations in amplitude associatedwith the start or end of the note which might make someparts of the transient imperceptible (alternatively someaudible parts may be omitted from analysis due to theirlow amplitude) The note segmentation (determinationof note onset and offset times) is based on the defaultsettings of the software Tony which segments the pitchtrack into notes according to changes in pitch and energy(Mauch et al 2015) Different settings and segmentationstrategies may influence the results The coarse segmen-tation was checked during annotation A random sam-ple was checked more closely after results were obtained

This revealed a small fraction of ambiguous cases wherethe final slope is dominated by vibrato and thus couldbe classified as positive or negative depending on theprecise offset time Compared to the thousands of noteswhich have a negative slope at the end the few ambigu-ous cases would not change our results significantly ifthey were to be segmented differently

Although vibrato is a feature of many singing pitchtrajectories we did not explicitly model it in this work(cf Dai and Dixon 2016 Mehrabi et al 2017) Theuse of vibrato is less marked in unaccompanied ensemblesinging where the voice does not need to be projected overinstrumental parts and the stylistic goal is for the voicesto blend rather than stand out For example choral stylefavours minimal vibrato and barbershop style generallyforbids vibrato Thus we did not observe strong vibratoin our data and in the cases where vibrato was presentit tended to be uneven which would make it difficult tomodel

VII CONCLUSIONS

In this paper we present a study of pitch trajectoriesof single notes in multi-part singing According to ouranalysis of over 35000 individual notes we find a generalshape of vocal notes which contains transient componentsat the beginning and end of each note

The analysis is based on both absolute and relativetiming of notes where the initial and final transients areabout 120 ms or 15-20 of note duration The resultssuggest that the adjustment of pitch at the ends of notesis governed by absolute timing ie due to physiologi-cal and psychological factors rather than relative timingwhich might imply a musical motivation The transientcomponents vary according to the individual performerprevious pitch next pitch vocal part and sex

Participants tend to overshoot the target pitch whentransitioning from a lower pitch and raise the pitch to-ward the end of the note if the next pitch is higher Wealso observe a general expansion of harmonic intervalsabout 8 cents pitch difference is observed between adja-cent vocal parts with sopranos singing sharper and malesingers flatter than the target pitch Female and malesingers also differ in their initial transients with femalescommencing with a upward glide that overshoots the tar-get followed by a correction while males begin notes

J Acoust Soc Am 9 September 2019 9

with a downward glide Participants with fine pitch ac-curacy tend to have smoother pitch trajectories whileless accurate singers have relatively unstable note trajec-tories

In conclusion the main contribution of this paper isthe observation measurement and analysis of the notetransient parts by characterising their shapes and influ-encing factors Although many further issues remain tobe investigated we hope that the current observationsprovide a better understanding of the singing voice

VIII DATA AVAILABILITY

The code and the data needed to repro-duce our results (note annotations questionnaireresults score information) are available fromhttpscodesoundsoftwareacukprojectsanalysis-of-interactive-intonation-in-unaccompanied-satb-ensemblesrepository

ACKNOWLEDGMENTS

The study was conducted with the approval of theQueen Mary Research Ethics Committee (approval num-ber QMREC1560)

Many thanks to all of the participants who con-tributed to this project We also thank Marcus PearceDaniel Stowell and Christophe Rhodes for their advice onexperimental design Jiajie Dai is supported by a ChinaScholarship Council and Queen Mary Joint PhD Schol-arship

Alldahl P-G (2008) Choral Intonation (Gehrmans StockholmSweden)

Berkowska M and Dalla Bella S (2009) ldquoAcquired and con-genital disorders of sung performance A reviewrdquo Adv CognPsychol 5(-1) 69ndash83 doi 102478v10053-008-0068-2

Boersma P (2002) ldquoPraat a system for doing phonetics by com-puterrdquo Glot Int 5(910) 341ndash345

Bohrer J C S (2002) ldquoIntonational strategies in ensemblesingingrdquo PhD thesis City University London

Brandler B J and Peynircioglu Z F (2015) ldquoA comparisonof the efficacy of individual and collaborative music learning inensemble rehearsalsrdquo J Res Music Educ 63(3) 281ndash297

Brown D (1991) Human Universals (Temple University PressPhiladelphia) pp 1ndash160

Cannam C Landone C Sandler M B and Bello J P (2006)ldquoThe Sonic Visualiser A visualisation platform for semanticdescriptors from musical signalsrdquo in 7th Int Conf Music InfRetr pp 324ndash327

Dai J (2019) ldquoModelling intonation and interaction in vocal en-semblesrdquo PhD thesis Queen Mary University of London pp104ndash115

Dai J and Dixon S (2016) ldquoAnalysis of vocal imitations of pitchtrajectoriesrdquo in 17th Int Soc Music Inf Retr Conf pp 87ndash93

Dai J and Dixon S (2017) ldquoAnalysis of interactive intonationin unaccompanied SATB ensemblesrdquo in 18th Int Soc Music InfRetr Conf pp 599ndash605

Dai J and Dixon S (2019a) ldquoSinging together Pitch accuracyand interaction in unaccompanied unison and duet singingrdquo JAcoust Soc Am 145(2) 663ndash675

Dai J and Dixon S (2019b) ldquoUnderstanding intonation tra-jectories and patterns of vocal notesrdquo in Int Conf Multimedia

Modell Springer pp 243ndash253Dai J Mauch M and Dixon S (2015) ldquoAnalysis of intonation

trajectories in solo singingrdquo in 16th Int Soc Music Inf RetrConf pp 420ndash426

Dalla Bella S Giguere J-F and Peretz I (2007) ldquoSinging pro-ficiency in the general populationrdquo J Acoust Soc Am 121(2)1182ndash1189 doi 10112112427111

de Cheveigne A and Kawahara H (2002) ldquoYIN a fundamentalfrequency estimator for speech and musicrdquo J Acoust Soc Am111(4) 1917ndash1930 doi 10112111458024

Devaney J and Ellis D P (2008) ldquoAn empirical approachto studying intonation tendencies in polyphonic vocal perfor-mancesrdquo J Interdiscipl Music Stud 2(1amp2) 141ndash156

Devaney J Mandel M I and Fujinaga I (2012) ldquoA studyof intonation in three-part singing using the automatic musicperformance analysis and comparison toolkit (AMPACT)rdquo in13th Int Soc Music Inf Retr Conf pp 511ndash516

Dromey C Carter N and Hopkin A (2003) ldquoVibrato rateadjustmentrdquo J Voice 17(2) 168ndash178

Fischer P-M (1993) Die Stimme des Sangers Analyse ihrerFunktion und Leistung-Geschichte und Methodik der Stimmbil-dung (The Voice of the Singer Analysis of Function and Perfor-mance - History and Methodology of Voice Training) (MetzlerStuttgart Germany)

Gerhard D (2005) ldquoPitch track target deviation in naturalsingingrdquo in 6th Int Conf Music Inf Retr pp 514ndash519

Hagerman B and Sundberg J (1980) ldquoFundamental frequencyadjustment in barbershop singingrdquo J Res Singing 4 1ndash17

Howard D M (2003) ldquoA capella SATB quartet in-tune singingEvidence of intonation shiftrdquo in Stockholm Music Acoust ConfVol 2 pp 462ndash466

Howard D M (2007) ldquoIntonation drift in A Capella sopranoalto tenor bass quartet singing with key modulationrdquo J Voice21(3) 300ndash315

Jers H and Ternstrom S (2005) ldquoIntonation analysis of a multi-channel choir recordingrdquo Speech Music Hearing Q Prog StatusRep 47(1) 1ndash6

Kalin G (2005) ldquoFormant frequency adjustment in barbershopquartet singingrdquo Masterrsquos thesis Royal Institute of TechnologyDept of Speech Music and Hearing Stockholm

King J B and Horii Y (1993) ldquoVocal matching of frequencymodulation in synthesized vowelsrdquo J Voice 7(2) 151ndash159

Lindley M (2001) ldquoJust intonationrdquo Grove Music Online editedby L Macy httpwwwgrovemusiccom (accessed 30 January2015)

Loeffler D B (2006) ldquoInstrument timbres and pitch estimation inpolyphonic musicrdquo PhD thesis Georgia Institute of Technology

Mauch M Cannam C Bittner R Fazekas G Salamon JDai J Bello J and Dixon S (2015) ldquoComputer-aided melodynote transcription using the Tony software Accuracy and effi-ciencyrdquo in Proc 1st Int Conf Technol Music Not Representpp 23ndash30

Mauch M and Dixon S (2014) ldquoPYIN A fundamental fre-quency estimator using probabilistic threshold distributionsrdquo inIEEE Int Conf Acoust Speech Signal Process pp 659ndash663

Mauch M Frieler K and Dixon S (2014) ldquoIntonation in un-accompanied singing Accuracy drift and a model of referencepitch memoryrdquo J Acoust Soc Am 136(1) 401ndash411

Mehrabi A Dixon S and Sandler M (2017) ldquoVocal imitationof synthesised sounds varying in pitch loudness and spectral cen-troidrdquo J Acoust Soc Am 143(2) 783ndash796

Mullensiefen D Gingras B Musil J and Stewart L (2014)ldquoThe musicality of non-musicians An index for assessing musi-cal sophistication in the general populationrdquo PLoS ONE 9(2)e89642

Nordmark J and Ternstrom S (1996) ldquoIntonation preferencesfor major thirds with non-beating ensemble soundsrdquo Speech Mu-sic Hearing Q Prog Status Rep 37(1) 57ndash62

Pfordresher P Q and Brown S (2007) ldquoPoor-pitch singing inthe absence of lsquotone deafnessrsquordquo Music Percept 25(2) 95ndash115

Pfordresher P Q Brown S Meier K M Belyk M and LiottiM (2010) ldquoImprecise singing is widespreadrdquo J Acoust SocAm 128(4) 2182ndash2190

Potter J (2000a) ldquoEnsemble singingrdquo in The Cambridge Com-panion to Singing edited by J Potter (Cambridge Univer-

10 J Acoust Soc Am 9 September 2019

sity Press Cambridge UK) pp 158ndash164 doi 101017CCOL9780521622257014

Potter J (2000b) ldquoIntroduction singing at the turn of the cen-turyrdquo in The Cambridge Companion to Singing edited by J Pot-ter (Cambridge University Press Cambridge UK) pp 1ndash5 doi101017CCOL9780521622257001

Prout E (2011) Harmony Its Theory and Practice (CambridgeUniversity Press)

Seashore C E (1914) ldquoThe tonoscoperdquo Psychol Monogr 16(3)1ndash12

Seashore C E (1931) ldquoThe natural history of the vibratordquo ProcNatl Acad Sci 17(12) 623ndash626

Stewart L von Kriegstein K Warren J D and Griffiths T D(2006) ldquoMusic and the brain Disorders of musical listeningrdquoBrain 129(10) 2533ndash2553

Sundberg J (1977) ldquoThe acoustics of the singing voicerdquo Sci Am236(3) 82ndash91

Sundberg J (1987) The Science of the Singing Voice (NorthernIllinois University Press DeKalb IL)

Sundberg J (1995) ldquoAcoustic and psychoacoustic aspects of vocalvibratordquo in Vibrato edited by P Dejonckere M Hirano andJ Sundberg (Singular Publishing Group San Diego CA) pp35ndash62

Sundberg J La F M and Himonides E (2013) ldquoIntona-tion and expressivity A single case study of classical Westernsingingrdquo J Voice 27(3) 391ndashe1

Swannell J (1992) The Oxford Modern English Dictionary (Ox-ford University Press USA) p 560

Takeuchi A H and Hulse S H (1993) ldquoAbsolute pitchrdquo Psy-chol Bull 113(2) 345

Terasawa H (2004) ldquoPitch drift in choral musicrdquo Music 221Afinal paper Center for Computer Research in Music and Acous-tics Stanford University URL httpsccrmastanfordedu

~hirokopitchdriftpaper221ApdfUmbert M Bonada J Goto M Nakano T and Sundberg J

(2015) ldquoExpression control in singing voice synthesis Featuresapproaches evaluation and challengesrdquo IEEE Signal ProcessMag 32(6) 55ndash73

Vurma A and Ross J (2006) ldquoProduction and perception ofmusical intervalsrdquo Music Percept 23(4) 331ndash344

Welch G F Sergeant D C and White P J (1997) ldquoAge sexand vocal task as factors in singing lsquoin tunersquo during the first yearsof schoolingrdquo Bull Counc Res Music Educ 133 153ndash160

Xu Y and Sun X (2000) ldquoHow Fast Can We Really ChangePitch Maximum Speed of Pitch Change Revisitedrdquo in 6th IntConf Spoken Lang Process pp 666ndash669

Zarate J M and Zatorre R J (2008) ldquoExperience-dependentneural substrates involved in vocal pitch regulation duringsingingrdquo Neuroimage 40(4) 1871ndash1887

J Acoust Soc Am 9 September 2019 11

Listening conditions

Closed condition

Partial condition

Open condition

singer

singer

singer

singer

X

(2 trials)

(42 trials)

Soprano isolatedcondition

A STBorAlto isolatedcondition

Tenor isolatedcondition

Bass isolatedcondition

S ATBor

T SABor

B SATor

One‐to‐three condition

Three‐to‐one condition

Independent singer

Dependent singer(Test conditions)

FIG 1 Listening and test conditions The arrows indicate

the direction of acoustic feedback

parts in terms of pitch error (Dai 2019 Dai and Dixon2017 2019b) Finally we would like to see whether thelistening condition affects note trajectories That is dothe shapes of vocal notes differ depending on whether theparticipants can hear other vocal parts or not

C Participants

20 adult amateur singers (10 male and 10 female)with choir experience volunteered to take part in thestudy They came from the music society and a capellasociety of the university and a local choir (There wasalso a pilot experiment involving four participants fromour research group this data is not used in this paper)The age range was from 20 to 55 years old (mean 280median 265 stddev 78) Participants were compen-sated pound10 for their participation The participants wereable to sing their parts comfortably and they were giventhe score and sample audio files at least 2 weeks beforethe experiment

Since training is a crucial factor for intonation ac-curacy all the participants were given a questionnairebased on the Goldsmiths Musical Sophistication Index(Mullensiefen et al 2014) to test the effect of trainingThe participants had an average of 33 years of musiclessons and 58 years of singing experience

DMaterials

Two contrasting musical pieces were selected for thisstudy a Bach chorale ldquoOh Thou of God the Fatherrdquo(BWV 1646) and Leo Mathisenrsquos jazz song ldquoTo be or notto berdquo Both pieces were chosen for their wide range ofharmonic intervals (see Section IV B) the first piece has34 unique harmonic intervals between parts and the sec-ond piece has 30 harmonic intervals To control the du-

ration of the experiment we shortened the original scoreby deleting the repeats We also reduced the tempo fromthat specified in the score in order to make the pieceseasier to sing and compensate for the limited time thatthe singers had to learn the pieces The resulting dura-tion of the first piece is 76 seconds and the second songis 100 seconds Links to the score and training materialscan be found in section VIII

The equipment included an SSL MADI-AX con-verter five cardioid microphones (Shure SM57) and fourloudspeakers (Yamaha HS5) All the tracks were con-trolled and recorded by the software Logic Pro 10 Themetronome and the four starting reference pitches werealso given by Logic Pro The total latency of the systemis 49 ms (33 ms due to hardware and 16 ms from thesoftware)

E Procedure

A pilot experiment with singers not involved in thestudy was performed to test the experimental setup andminimise potential problems such as bleed between mi-crophones Then the participants in the study were dis-tributed into 5 groups according to their self-identifiedvoice type time availability and collaborative experience(the singers from the same music society were placedin the same group) Each group contained two femalesingers (soprano and alto) and two male singers (tenorand bass) Each participant had at least two hours prac-tice before the recording sometimes on separate daysThey were informed about the goal of the study to inves-tigate interactive intonation in SATB singing and theywere asked to sing their best in all circumstances

For each trial the singers were played their startingnotes before commencing the trial and a metronome ac-companied the singing to ensure that the same tempowas used by all groups Each piece was sung 10 timesby each group The first and the last trial were recordedin the open condition The partial and closed conditiontrials consisting of 8 test conditions 4 (isolated voice) times2 (direction of feedback) were recorded in between Theorder of isolated conditions was randomly chosen to con-trol for any learning effect For each isolated conditionthe three-to-one condition always preceded the one-to-three condition We use the performance of the isolatedsinger in the one-to-three condition as the data for theclosed condition

The singers were recorded in two acoustically iso-lated rooms For the partial and closed conditions theisolated singers were recorded in a separate room fromthe other three singers Loudspeakers in each room pro-vided acoustic feedback according to the test conditionThere was no visual contact between singers in differentrooms With the exception of warm-up and rehearsalbut including all the trials and the questionnaire the to-tal duration of the experiment for each group was aboutone hour and a half

4 J Acoust Soc Am 9 September 2019

IV DATA ANALYSIS

This section describes the annotation procedure andthe measurement of pitch error and harmonic intervalerror The experimental data comprises 5 (groups) times 4(singers) times 2 (pieces) times 10 (trials) = 400 audio fileseach containing 65 to 116 notes Any missing noteswere excluded from the analysis The software Tony(Mauch et al 2015) was chosen as the annotation toolTony performs pitch detection using the pYIN algo-rithm which outperforms the YIN algorithm (Mauchand Dixon 2014) and then automatically segments pitchtrajectories into note objects and provides a convenientinterface for manual checking and correction of the re-sulting annotations The automatic segmentation basedon note energy and pitch changes provided the note on-set and offset times for our data and rarely needed anycorrection

For each audio file we exported two csv filesone containing the note-level information (for calculat-ing pitch and interval errors) and the other containingthe pitch trajectories It took about 67 hours to manuallycheck and correct the 400 files resulting in 37246 anno-tated pitch values which were stored with metadata onthe singer experimental condition and score The infor-mation in our database includes group number singernumber vocal part listening condition piece numbernote in trial score onset position score duration scorepitch score interval observed onset time observed dura-tion observed pitch pitch error melodic interval errorharmonic interval error anonymised participant detailsnormalised note trajectories real-time note trajectoriesage sex and questionnaire scores MATLAB 2015a wasused for statistics and modelling

A Conversion of fo

The Tony software segments the recording into notesand silences and outputs the median fundamental fre-quency fo for each note as well as the fo value for each58 ms frame The conversion of fundamental frequencyto musical pitch p is calculated as follows

p = 69 + 12 log2

fo

440 (1)

This scale is chosen such that its units are semitones(one semitone is equal to 100 cents) with integer valuesof p coinciding with MIDI pitch numbers and referencepitch A4 (p = 69) tuned to 440 Hz After automaticannotation every single note was checked manually tomake sure the tracking was consistent with the data andcorrected if it was not

B Intonation Metrics

Intonation accuracy is quantified in terms of pitcherror and harmonic interval error as defined below As-suming that a reference pitch has been given pitch errorcan be defined as the difference between observed pitchand score pitch (Mauch et al 2014) This is usually de-

fined on the level of notes but can also be measured foreach sampling point of the pitch trajectory

epi = poi minus ps

i (2)

where poi is the observed pitch in a single frame of note

i (or the median pi over the duration of the note) andpsi is the score pitch of note i

To evaluate the pitch accuracy of a sung part we usemean absolute pitch error (MAPE) as the measurementFor a group of M notes with pitch errors ep1 epM theMAPE is defined as

MAPE =1

M

Msumi=1

|epi | (3)

A musical interval is the difference between twopitches (Prout 2011) which is proportional to the log-arithm of the ratio of the fundamental frequencies ofthe two pitches We distinguish two types of intervalmelodic intervals where the two notes are sounded in suc-cession and harmonic intervals where both notes soundsimultaneously (although they might not start simulta-neously) In this paper we consider only the harmonicinterval error defined as the difference between the ob-served and score intervals

ehiAjB = (piA minus pjB) minus (psiA minus ps

jB) (4)

where psiA and ps

jB are the score pitches of two overlap-ping notes from singers A and B respectively and piA

and pjB are their observed median pitches Harmonicintervals were evaluated for all pairs of notes which over-lap in time If one singer sings two notes while the secondsinger holds one note in the same time period two har-monic intervals are observed Thus indices i and j arenot assumed to be equal

V RESULTS

This section presents observed patterns in the shapesof note trajectories and investigates differences due tovocal part sex adjacent pitch and listening conditionsmodelling the trajectories according to the shape of tran-sient parts and classifying them into four categories

Based on the metronome tempo the expected du-ration of notes ranges from 025 to 550 seconds (mean086 median 075) while the observed note duration isfrom 001 seconds to 510 seconds (mean 069 median062) We excluded from the results any notes which hada duration shorter than 015 seconds (41) or MAPElarger than one semitone (120)

A The shape of note trajectories

To observe regularities in note trajectories acrossdiffering note durations we compared two methods ofequalising the time-scale of trajectories normalisationand truncation Normalised pitch trajectories are ex-pressed as a function of the fraction of the note that

J Acoust Soc Am 9 September 2019 5

has elapsed (from 0 to 1) while for the truncated trajec-tories the beginning and end of the note are modelledseparately using respectively the first and last 04 sec-onds of the note (77 of notes are over 04 seconds and55 over 055 seconds in duration) For comparing tra-jectories of different score pitches we use the pitch errorthat is the deviation from the target (score) pitch

For the normalisation method the note trajectorieswere re-sampled to 100 sampling points with the MAT-LAB resample function Then any common shape ofvocal notes can be obtained by averaging across notesFigure 2 plots the resulting note trajectory generated bycalculating the mean of all the sampling points For com-parison we also show the absolute pitch error which ismuch larger in magnitude

In Figure 2 we observe transient parts at the begin-ning and end of the note Based on the slope of theMPE curve the initial and final transients each compriseabout 15-20 of the notersquos duration In the following wetake the first 15 and the final 15 of each note as thetransient parts The length of the two transient parts isapproximately the same and the shape is almost sym-metrical consisting of peaks at both ends of the notewith a relatively stable middle portion The mean pitcherror is negative reflecting a tendency to sing flat relativeto the score pitch

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

Pitc

h E

rror

(se

mito

ne)

-02

-01

0

01

02

03

04

05

MPEMAPE

FIG 2 Average pitch trajectory within a tone expressed as

mean pitch error (MPE) across time-normalised notes with

mean absolute pitch error (MAPE) shown for comparison

An alternative way to combine note trajectories ofvarying length is to truncate the time series and onlyconsider the initial and final segments of each note Tak-ing the first (respectively last) 04 seconds of each noteexcluding notes with a duration less than 055 seconds toavoid artefacts due to the transient at the other end of thenote results in the trajectories shown in Figures 3 and4 From these figures we observe that the first 012 andlast 012 seconds of each note have the most pitch vari-ance This corresponds to about 15-20 of the mean noteduration (069 seconds) This result is similar to thatfor normalised trajectories (Figure 2) where the initial

sharp fall and final rise in pitch are not as sharp due tothe normalisation of different length notes The averageresults hide differences in the proportion and direction oftransients which arise due to individual differences scorepitch and vocal part which will be investigated in thefollowing sections

Time (seconds)0 005 01 015 02 025 03 035 04

MP

E (

sem

itone

)

-02

-015

-01

-005

0

005

01

015

FIG 3 Mean pitch error (crosses) and range of one standard

deviation from the mean (shaded) for the initial 04 seconds

of each note

Time (seconds)0 005 01 015 02 025 03 035 04

MP

E (

sem

itone

)

-03

-025

-02

-015

-01

-005

0

005

01

015

FIG 4 Mean pitch error (crosses) and range of one standard

deviation from the mean (shaded) for the final 04 seconds of

each note

The appearance of note trajectories is significantlydifferent between singers who have different degrees ofmusical training For the trained singers the note tra-jectories are smoother and the two transient parts havea clear direction For singers with less training theirnote trajectories tend to be uneven and have less com-mon shape in their beginnings and endings

In Figure 3 the first turning point at 002 secondsmay be an artefact of the averaging of different pitch tra-jectory shapes There are several possible factors thatmight influence trajectory shapes such as the pitch ofthe surrounding notes vocal part sex and listening con-dition which we now examine

B Adjacent pitch

In the previous section we observed large pitch fluc-tuations at each end of the note To test whether these

6 J Acoust Soc Am 9 September 2019

fluctuations are influenced by adjacent pitches in thescore we separate the data for each end of a note intotwo situations based on whether the previous (respec-tively next) pitch is lower or higher than the currentpitch Repeated pitches are ignored An analysis of vari-ance (ANOVA) confirms that the pitch error in relativetime is significantly different based on whether the ad-jacent pitch is higher or lower In Figure 5 we observethat singers tend to overshoot the target pitch and thenadjust downward after singing a lower pitch while aftera high pitch they reach the target almost immediatelyJers and Ternstrom (2005 Fig 3-4) observed that singersalso overshoot the interval (undershoot the pitch) beforecorrecting when they transition from a higher pitch toa lower pitch The steady state pitch is 1 cent sharperwhen coming from a lower pitch than when the previ-ous pitch is higher (F(1 38) = 7797p lt 0001) Singersalso prepare for the pitch of the next note at the end ofeach note as evidenced by the significant difference ob-served between ascending and descending following inter-vals (F(1 38) = 798p lt 001 Figure 6) In both casesthere is an increase in pitch followed by a rapid decreaseas the note ends and the vocal cords are relaxed butthe increase in pitch is much more marked in the casethat the succeeding pitch is higher There are some in-dividual differences between singers in this respect butmost exhibit the average behaviour of being influencedby adjacent notes

Relative time (proportion of note elapsed)0 005 01 015 02 025 03 035 04 045 05

MP

E (

sem

itone

)

-01

-008

-006

-004

-002

0

002

004

006

008

01After a high pitchAfter a low pitch

FIG 5 The effect of singing after a lower or higher pitch

mean pitch error in relative time

C Vocal parts and sex

To explore the factor of vocal part the normalisednote trajectories were plotted for each of the four vocalparts (Figure 7) Firstly we observe about an 8-centpitch difference between each pair of adjacent parts inour data Although the pitch trajectories vary accord-ing to the participants for most participants sopranostend to sing sharp while tenors and basses tend to singflat These pitch differences lead to an expansion of har-monic intervals between vocal parts the opposite of thecompression that is often observed for melodic intervals(Pfordresher and Brown 2007)

Relative time (proportion of note elapsed)05 055 06 065 07 075 08 085 09 095 1

MP

E (

sem

itone

)

-01

-008

-006

-004

-002

0

002

004

006

008

01A high pitch nextA low pitch next

FIG 6 The effect of singing before a lower or higher pitch

mean pitch error in relative time

This phenomenon is also observed between sexes AnANOVA shows a significant difference between the notebeginnings of male singers and female singers (F(3 76) =5937p lt 001) In general male participants sing 11cents flatter while females sing 4 cents sharper than thescore pitch Male singers tend to begin the note at ahigher pitch and adjust downwards while female singersrsquoinitial trajectories have a convex shape beginning at alower pitch overshooting the target then decreasing to-ward the target All the singers tend to have similarnote ending a slight increase in pitch followed by a rapiddecrease

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

MP

E (

sem

itone

)

-03

-025

-02

-015

-01

-005

0

005

01

015

02

SopranoAltoTenorBass

FIG 7 Mean pitch error over note duration for each vocal

part

DModelling the note trajectories

For a better understanding of the tendencies of pitchtrajectories we modelled them as three separate compo-nents initial transient note middle and final transientAs discussed previously the transient parts were definedby the first 15 and last 15 of the duration The ten-dency of each component was approximated by linearregression Figure 8 shows an example of a single pitch

J Acoust Soc Am 9 September 2019 7

trajectory and the linear fits for each of the three com-ponents

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

MP

E (s

emito

ne)

-025

-02

-015

-01

-005

0

005

01

015

02

025

Observed dataFitting line

FIG 8 Example of the pitch trajectory of a single note and

the fitting lines for the initial middle and final components

of the note

Convex0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Upward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Downward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Concave0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

FIG 9 Mean pitch trajectories of the four trajectory classes

in relative time (proportion of note elapsed)

To describe the different types of trajectories weclassify them into four categories (Concave Convex Up-ward Downward) according to the slopes of their initialand final transients which are either positive or nega-tive Table I shows that the most popular shapes areConvex and Downward both of which have a negativenote release

Table II shows the mean median and standard devi-ation of the slopes of the three note parts Although theaverage trend for the initial transient is a negative slopeless than half of the notes exhibit this behaviour andthere is a large variance in the slope of the initial tran-sient The middle segment has a small positive trendwhile for the final transient most notes have a negativeslope although again this has a large variance Although

the initial and final slopes have high variance Figures 3and 4 show that their starting and ending points are lessspread (smaller standard deviation) than the rest of thetrajectory

E Listening condition

Finally the influence of listening condition on notetrajectory classification was considered The influence oflistening condition on mean pitch is discussed in previouswork (Dai and Dixon 2017 2019a) In this paper theANOVA test on the note trajectories did not show anysignificant difference between listening conditions (solopartial independent partial dependent and open) for ini-tial transient (F(3 49194) = 007p = 079 ) middle sec-tion (F(3 49194) = 162p = 020) and final transient(F(3 49194) = 005p = 083)

VI DISCUSSION

In this study we observed a general stretching of har-monic intervals between vocal parts so that the bass partsang flat and the soprano part sharp relative to the othervocal parts Unlike the piano where stretching of inter-vals is related to the inharmonicity of the partials sungtones are not inharmonic so we are unable to explainthis observation Hagerman and Sundberg (1980) alsoobserved stretched harmonic intervals in an experimentwith barbershop singers giving the explanation that theysound ldquomore active and expressiverdquo

If we compare to the given starting notes the over-all tendency was to sing flat a tendency which increasedover time Pitch drift has been observed in other experi-ments (Devaney and Ellis 2008 Howard 2003 Terasawa2004) and is typically downward in direction althoughupward drift has also been observed

While the averaged note trajectories particularlywhen sorted into categories (Figure 9) show quitesmooth curves the individual pitch trajectories exhibitmuch greater degrees of variation (eg Figure 8 which isnot an extreme example) There is a danger that the fea-tures observed in the average curves might be artefactsof the averaging process and may not occur often if atall in the individual instances For example in Figures2 and 3 we observe a concave shape (a small local min-imum) in the first 5 (respectively 004 seconds) of thenote trajectory If we compare with Figure 7 where thetwo female vocal parts have different initial trajectoriesto the two male vocal parts it is likely that the localminimum arises from averaging the categorically differ-ent shapes of the male and female parts The reason thatthe end of the note trajectory does not exhibit a similarpattern may be due to the greater frequency of Convexand Downward shapes (289 and 368 respectively)which both have a negative final slope across the vocalparts (Table I)

The differences observed in the averaged curves aresmall in magnitude of the same order as the just notice-able difference in pitch (about 5 cents Loeffler (2006))

8 J Acoust Soc Am 9 September 2019

Shape Attack Release Soprano Alto Tenor Bass Overall

Convex positive negative 347 378 229 211 289

Upward positive positive 171 133 173 161 160

Downward negative negative 329 379 424 339 368

Concave negative positive 153 109 174 289 184

TABLE I Definition of the four trajectory shapes according to the sign of the slope in the attack and release and their relative

frequencies in each vocal part and in total

Initial Middle Final

Mean -0649 0077 -2167

Median 0003 0038 -1766

Stddev 7109 0725 6400

TABLE II The mean median and standard deviation of the

slope (semitones per second) of the initial transient middle

section and final transient

Many of the sung examples have larger differences whichare reduced by the averaging process but are likely to beperceptible in the original examples A listening test us-ing synthetic stimuli would be required to identify theperceptual relevance of the features of pitch trajectoriesidentified in this paper

The general tendency of notes ending with a negativeslope is observed regardless of whether the next pitch ishigher or lower or which vocal part is considered Al-though there is a simple explanation ie the relaxationof the vocal muscles at the end of a note it is notewor-thy that singers show evidence of preparing for a higherfollowing pitch by commencing a rising inflection whichis then followed by a falling pitch at the end of the notewhich might be thought to negate the preparation Evenin the cases of the Upward and Concave trajectories theoverall increasing slope toward the end of a note finisheswith a few sampling points where the pitch decreases(during the final 3 of the note Figure 9)

A skilled singer is able to coordinate their muscles toachieve synchronised control over multiple vocal param-eters Alongside the pitch changes at the ends of eachnote there are also variations in amplitude associatedwith the start or end of the note which might make someparts of the transient imperceptible (alternatively someaudible parts may be omitted from analysis due to theirlow amplitude) The note segmentation (determinationof note onset and offset times) is based on the defaultsettings of the software Tony which segments the pitchtrack into notes according to changes in pitch and energy(Mauch et al 2015) Different settings and segmentationstrategies may influence the results The coarse segmen-tation was checked during annotation A random sam-ple was checked more closely after results were obtained

This revealed a small fraction of ambiguous cases wherethe final slope is dominated by vibrato and thus couldbe classified as positive or negative depending on theprecise offset time Compared to the thousands of noteswhich have a negative slope at the end the few ambigu-ous cases would not change our results significantly ifthey were to be segmented differently

Although vibrato is a feature of many singing pitchtrajectories we did not explicitly model it in this work(cf Dai and Dixon 2016 Mehrabi et al 2017) Theuse of vibrato is less marked in unaccompanied ensemblesinging where the voice does not need to be projected overinstrumental parts and the stylistic goal is for the voicesto blend rather than stand out For example choral stylefavours minimal vibrato and barbershop style generallyforbids vibrato Thus we did not observe strong vibratoin our data and in the cases where vibrato was presentit tended to be uneven which would make it difficult tomodel

VII CONCLUSIONS

In this paper we present a study of pitch trajectoriesof single notes in multi-part singing According to ouranalysis of over 35000 individual notes we find a generalshape of vocal notes which contains transient componentsat the beginning and end of each note

The analysis is based on both absolute and relativetiming of notes where the initial and final transients areabout 120 ms or 15-20 of note duration The resultssuggest that the adjustment of pitch at the ends of notesis governed by absolute timing ie due to physiologi-cal and psychological factors rather than relative timingwhich might imply a musical motivation The transientcomponents vary according to the individual performerprevious pitch next pitch vocal part and sex

Participants tend to overshoot the target pitch whentransitioning from a lower pitch and raise the pitch to-ward the end of the note if the next pitch is higher Wealso observe a general expansion of harmonic intervalsabout 8 cents pitch difference is observed between adja-cent vocal parts with sopranos singing sharper and malesingers flatter than the target pitch Female and malesingers also differ in their initial transients with femalescommencing with a upward glide that overshoots the tar-get followed by a correction while males begin notes

J Acoust Soc Am 9 September 2019 9

with a downward glide Participants with fine pitch ac-curacy tend to have smoother pitch trajectories whileless accurate singers have relatively unstable note trajec-tories

In conclusion the main contribution of this paper isthe observation measurement and analysis of the notetransient parts by characterising their shapes and influ-encing factors Although many further issues remain tobe investigated we hope that the current observationsprovide a better understanding of the singing voice

VIII DATA AVAILABILITY

The code and the data needed to repro-duce our results (note annotations questionnaireresults score information) are available fromhttpscodesoundsoftwareacukprojectsanalysis-of-interactive-intonation-in-unaccompanied-satb-ensemblesrepository

ACKNOWLEDGMENTS

The study was conducted with the approval of theQueen Mary Research Ethics Committee (approval num-ber QMREC1560)

Many thanks to all of the participants who con-tributed to this project We also thank Marcus PearceDaniel Stowell and Christophe Rhodes for their advice onexperimental design Jiajie Dai is supported by a ChinaScholarship Council and Queen Mary Joint PhD Schol-arship

Alldahl P-G (2008) Choral Intonation (Gehrmans StockholmSweden)

Berkowska M and Dalla Bella S (2009) ldquoAcquired and con-genital disorders of sung performance A reviewrdquo Adv CognPsychol 5(-1) 69ndash83 doi 102478v10053-008-0068-2

Boersma P (2002) ldquoPraat a system for doing phonetics by com-puterrdquo Glot Int 5(910) 341ndash345

Bohrer J C S (2002) ldquoIntonational strategies in ensemblesingingrdquo PhD thesis City University London

Brandler B J and Peynircioglu Z F (2015) ldquoA comparisonof the efficacy of individual and collaborative music learning inensemble rehearsalsrdquo J Res Music Educ 63(3) 281ndash297

Brown D (1991) Human Universals (Temple University PressPhiladelphia) pp 1ndash160

Cannam C Landone C Sandler M B and Bello J P (2006)ldquoThe Sonic Visualiser A visualisation platform for semanticdescriptors from musical signalsrdquo in 7th Int Conf Music InfRetr pp 324ndash327

Dai J (2019) ldquoModelling intonation and interaction in vocal en-semblesrdquo PhD thesis Queen Mary University of London pp104ndash115

Dai J and Dixon S (2016) ldquoAnalysis of vocal imitations of pitchtrajectoriesrdquo in 17th Int Soc Music Inf Retr Conf pp 87ndash93

Dai J and Dixon S (2017) ldquoAnalysis of interactive intonationin unaccompanied SATB ensemblesrdquo in 18th Int Soc Music InfRetr Conf pp 599ndash605

Dai J and Dixon S (2019a) ldquoSinging together Pitch accuracyand interaction in unaccompanied unison and duet singingrdquo JAcoust Soc Am 145(2) 663ndash675

Dai J and Dixon S (2019b) ldquoUnderstanding intonation tra-jectories and patterns of vocal notesrdquo in Int Conf Multimedia

Modell Springer pp 243ndash253Dai J Mauch M and Dixon S (2015) ldquoAnalysis of intonation

trajectories in solo singingrdquo in 16th Int Soc Music Inf RetrConf pp 420ndash426

Dalla Bella S Giguere J-F and Peretz I (2007) ldquoSinging pro-ficiency in the general populationrdquo J Acoust Soc Am 121(2)1182ndash1189 doi 10112112427111

de Cheveigne A and Kawahara H (2002) ldquoYIN a fundamentalfrequency estimator for speech and musicrdquo J Acoust Soc Am111(4) 1917ndash1930 doi 10112111458024

Devaney J and Ellis D P (2008) ldquoAn empirical approachto studying intonation tendencies in polyphonic vocal perfor-mancesrdquo J Interdiscipl Music Stud 2(1amp2) 141ndash156

Devaney J Mandel M I and Fujinaga I (2012) ldquoA studyof intonation in three-part singing using the automatic musicperformance analysis and comparison toolkit (AMPACT)rdquo in13th Int Soc Music Inf Retr Conf pp 511ndash516

Dromey C Carter N and Hopkin A (2003) ldquoVibrato rateadjustmentrdquo J Voice 17(2) 168ndash178

Fischer P-M (1993) Die Stimme des Sangers Analyse ihrerFunktion und Leistung-Geschichte und Methodik der Stimmbil-dung (The Voice of the Singer Analysis of Function and Perfor-mance - History and Methodology of Voice Training) (MetzlerStuttgart Germany)

Gerhard D (2005) ldquoPitch track target deviation in naturalsingingrdquo in 6th Int Conf Music Inf Retr pp 514ndash519

Hagerman B and Sundberg J (1980) ldquoFundamental frequencyadjustment in barbershop singingrdquo J Res Singing 4 1ndash17

Howard D M (2003) ldquoA capella SATB quartet in-tune singingEvidence of intonation shiftrdquo in Stockholm Music Acoust ConfVol 2 pp 462ndash466

Howard D M (2007) ldquoIntonation drift in A Capella sopranoalto tenor bass quartet singing with key modulationrdquo J Voice21(3) 300ndash315

Jers H and Ternstrom S (2005) ldquoIntonation analysis of a multi-channel choir recordingrdquo Speech Music Hearing Q Prog StatusRep 47(1) 1ndash6

Kalin G (2005) ldquoFormant frequency adjustment in barbershopquartet singingrdquo Masterrsquos thesis Royal Institute of TechnologyDept of Speech Music and Hearing Stockholm

King J B and Horii Y (1993) ldquoVocal matching of frequencymodulation in synthesized vowelsrdquo J Voice 7(2) 151ndash159

Lindley M (2001) ldquoJust intonationrdquo Grove Music Online editedby L Macy httpwwwgrovemusiccom (accessed 30 January2015)

Loeffler D B (2006) ldquoInstrument timbres and pitch estimation inpolyphonic musicrdquo PhD thesis Georgia Institute of Technology

Mauch M Cannam C Bittner R Fazekas G Salamon JDai J Bello J and Dixon S (2015) ldquoComputer-aided melodynote transcription using the Tony software Accuracy and effi-ciencyrdquo in Proc 1st Int Conf Technol Music Not Representpp 23ndash30

Mauch M and Dixon S (2014) ldquoPYIN A fundamental fre-quency estimator using probabilistic threshold distributionsrdquo inIEEE Int Conf Acoust Speech Signal Process pp 659ndash663

Mauch M Frieler K and Dixon S (2014) ldquoIntonation in un-accompanied singing Accuracy drift and a model of referencepitch memoryrdquo J Acoust Soc Am 136(1) 401ndash411

Mehrabi A Dixon S and Sandler M (2017) ldquoVocal imitationof synthesised sounds varying in pitch loudness and spectral cen-troidrdquo J Acoust Soc Am 143(2) 783ndash796

Mullensiefen D Gingras B Musil J and Stewart L (2014)ldquoThe musicality of non-musicians An index for assessing musi-cal sophistication in the general populationrdquo PLoS ONE 9(2)e89642

Nordmark J and Ternstrom S (1996) ldquoIntonation preferencesfor major thirds with non-beating ensemble soundsrdquo Speech Mu-sic Hearing Q Prog Status Rep 37(1) 57ndash62

Pfordresher P Q and Brown S (2007) ldquoPoor-pitch singing inthe absence of lsquotone deafnessrsquordquo Music Percept 25(2) 95ndash115

Pfordresher P Q Brown S Meier K M Belyk M and LiottiM (2010) ldquoImprecise singing is widespreadrdquo J Acoust SocAm 128(4) 2182ndash2190

Potter J (2000a) ldquoEnsemble singingrdquo in The Cambridge Com-panion to Singing edited by J Potter (Cambridge Univer-

10 J Acoust Soc Am 9 September 2019

sity Press Cambridge UK) pp 158ndash164 doi 101017CCOL9780521622257014

Potter J (2000b) ldquoIntroduction singing at the turn of the cen-turyrdquo in The Cambridge Companion to Singing edited by J Pot-ter (Cambridge University Press Cambridge UK) pp 1ndash5 doi101017CCOL9780521622257001

Prout E (2011) Harmony Its Theory and Practice (CambridgeUniversity Press)

Seashore C E (1914) ldquoThe tonoscoperdquo Psychol Monogr 16(3)1ndash12

Seashore C E (1931) ldquoThe natural history of the vibratordquo ProcNatl Acad Sci 17(12) 623ndash626

Stewart L von Kriegstein K Warren J D and Griffiths T D(2006) ldquoMusic and the brain Disorders of musical listeningrdquoBrain 129(10) 2533ndash2553

Sundberg J (1977) ldquoThe acoustics of the singing voicerdquo Sci Am236(3) 82ndash91

Sundberg J (1987) The Science of the Singing Voice (NorthernIllinois University Press DeKalb IL)

Sundberg J (1995) ldquoAcoustic and psychoacoustic aspects of vocalvibratordquo in Vibrato edited by P Dejonckere M Hirano andJ Sundberg (Singular Publishing Group San Diego CA) pp35ndash62

Sundberg J La F M and Himonides E (2013) ldquoIntona-tion and expressivity A single case study of classical Westernsingingrdquo J Voice 27(3) 391ndashe1

Swannell J (1992) The Oxford Modern English Dictionary (Ox-ford University Press USA) p 560

Takeuchi A H and Hulse S H (1993) ldquoAbsolute pitchrdquo Psy-chol Bull 113(2) 345

Terasawa H (2004) ldquoPitch drift in choral musicrdquo Music 221Afinal paper Center for Computer Research in Music and Acous-tics Stanford University URL httpsccrmastanfordedu

~hirokopitchdriftpaper221ApdfUmbert M Bonada J Goto M Nakano T and Sundberg J

(2015) ldquoExpression control in singing voice synthesis Featuresapproaches evaluation and challengesrdquo IEEE Signal ProcessMag 32(6) 55ndash73

Vurma A and Ross J (2006) ldquoProduction and perception ofmusical intervalsrdquo Music Percept 23(4) 331ndash344

Welch G F Sergeant D C and White P J (1997) ldquoAge sexand vocal task as factors in singing lsquoin tunersquo during the first yearsof schoolingrdquo Bull Counc Res Music Educ 133 153ndash160

Xu Y and Sun X (2000) ldquoHow Fast Can We Really ChangePitch Maximum Speed of Pitch Change Revisitedrdquo in 6th IntConf Spoken Lang Process pp 666ndash669

Zarate J M and Zatorre R J (2008) ldquoExperience-dependentneural substrates involved in vocal pitch regulation duringsingingrdquo Neuroimage 40(4) 1871ndash1887

J Acoust Soc Am 9 September 2019 11

IV DATA ANALYSIS

This section describes the annotation procedure andthe measurement of pitch error and harmonic intervalerror The experimental data comprises 5 (groups) times 4(singers) times 2 (pieces) times 10 (trials) = 400 audio fileseach containing 65 to 116 notes Any missing noteswere excluded from the analysis The software Tony(Mauch et al 2015) was chosen as the annotation toolTony performs pitch detection using the pYIN algo-rithm which outperforms the YIN algorithm (Mauchand Dixon 2014) and then automatically segments pitchtrajectories into note objects and provides a convenientinterface for manual checking and correction of the re-sulting annotations The automatic segmentation basedon note energy and pitch changes provided the note on-set and offset times for our data and rarely needed anycorrection

For each audio file we exported two csv filesone containing the note-level information (for calculat-ing pitch and interval errors) and the other containingthe pitch trajectories It took about 67 hours to manuallycheck and correct the 400 files resulting in 37246 anno-tated pitch values which were stored with metadata onthe singer experimental condition and score The infor-mation in our database includes group number singernumber vocal part listening condition piece numbernote in trial score onset position score duration scorepitch score interval observed onset time observed dura-tion observed pitch pitch error melodic interval errorharmonic interval error anonymised participant detailsnormalised note trajectories real-time note trajectoriesage sex and questionnaire scores MATLAB 2015a wasused for statistics and modelling

A Conversion of fo

The Tony software segments the recording into notesand silences and outputs the median fundamental fre-quency fo for each note as well as the fo value for each58 ms frame The conversion of fundamental frequencyto musical pitch p is calculated as follows

p = 69 + 12 log2

fo

440 (1)

This scale is chosen such that its units are semitones(one semitone is equal to 100 cents) with integer valuesof p coinciding with MIDI pitch numbers and referencepitch A4 (p = 69) tuned to 440 Hz After automaticannotation every single note was checked manually tomake sure the tracking was consistent with the data andcorrected if it was not

B Intonation Metrics

Intonation accuracy is quantified in terms of pitcherror and harmonic interval error as defined below As-suming that a reference pitch has been given pitch errorcan be defined as the difference between observed pitchand score pitch (Mauch et al 2014) This is usually de-

fined on the level of notes but can also be measured foreach sampling point of the pitch trajectory

epi = poi minus ps

i (2)

where poi is the observed pitch in a single frame of note

i (or the median pi over the duration of the note) andpsi is the score pitch of note i

To evaluate the pitch accuracy of a sung part we usemean absolute pitch error (MAPE) as the measurementFor a group of M notes with pitch errors ep1 epM theMAPE is defined as

MAPE =1

M

Msumi=1

|epi | (3)

A musical interval is the difference between twopitches (Prout 2011) which is proportional to the log-arithm of the ratio of the fundamental frequencies ofthe two pitches We distinguish two types of intervalmelodic intervals where the two notes are sounded in suc-cession and harmonic intervals where both notes soundsimultaneously (although they might not start simulta-neously) In this paper we consider only the harmonicinterval error defined as the difference between the ob-served and score intervals

ehiAjB = (piA minus pjB) minus (psiA minus ps

jB) (4)

where psiA and ps

jB are the score pitches of two overlap-ping notes from singers A and B respectively and piA

and pjB are their observed median pitches Harmonicintervals were evaluated for all pairs of notes which over-lap in time If one singer sings two notes while the secondsinger holds one note in the same time period two har-monic intervals are observed Thus indices i and j arenot assumed to be equal

V RESULTS

This section presents observed patterns in the shapesof note trajectories and investigates differences due tovocal part sex adjacent pitch and listening conditionsmodelling the trajectories according to the shape of tran-sient parts and classifying them into four categories

Based on the metronome tempo the expected du-ration of notes ranges from 025 to 550 seconds (mean086 median 075) while the observed note duration isfrom 001 seconds to 510 seconds (mean 069 median062) We excluded from the results any notes which hada duration shorter than 015 seconds (41) or MAPElarger than one semitone (120)

A The shape of note trajectories

To observe regularities in note trajectories acrossdiffering note durations we compared two methods ofequalising the time-scale of trajectories normalisationand truncation Normalised pitch trajectories are ex-pressed as a function of the fraction of the note that

J Acoust Soc Am 9 September 2019 5

has elapsed (from 0 to 1) while for the truncated trajec-tories the beginning and end of the note are modelledseparately using respectively the first and last 04 sec-onds of the note (77 of notes are over 04 seconds and55 over 055 seconds in duration) For comparing tra-jectories of different score pitches we use the pitch errorthat is the deviation from the target (score) pitch

For the normalisation method the note trajectorieswere re-sampled to 100 sampling points with the MAT-LAB resample function Then any common shape ofvocal notes can be obtained by averaging across notesFigure 2 plots the resulting note trajectory generated bycalculating the mean of all the sampling points For com-parison we also show the absolute pitch error which ismuch larger in magnitude

In Figure 2 we observe transient parts at the begin-ning and end of the note Based on the slope of theMPE curve the initial and final transients each compriseabout 15-20 of the notersquos duration In the following wetake the first 15 and the final 15 of each note as thetransient parts The length of the two transient parts isapproximately the same and the shape is almost sym-metrical consisting of peaks at both ends of the notewith a relatively stable middle portion The mean pitcherror is negative reflecting a tendency to sing flat relativeto the score pitch

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

Pitc

h E

rror

(se

mito

ne)

-02

-01

0

01

02

03

04

05

MPEMAPE

FIG 2 Average pitch trajectory within a tone expressed as

mean pitch error (MPE) across time-normalised notes with

mean absolute pitch error (MAPE) shown for comparison

An alternative way to combine note trajectories ofvarying length is to truncate the time series and onlyconsider the initial and final segments of each note Tak-ing the first (respectively last) 04 seconds of each noteexcluding notes with a duration less than 055 seconds toavoid artefacts due to the transient at the other end of thenote results in the trajectories shown in Figures 3 and4 From these figures we observe that the first 012 andlast 012 seconds of each note have the most pitch vari-ance This corresponds to about 15-20 of the mean noteduration (069 seconds) This result is similar to thatfor normalised trajectories (Figure 2) where the initial

sharp fall and final rise in pitch are not as sharp due tothe normalisation of different length notes The averageresults hide differences in the proportion and direction oftransients which arise due to individual differences scorepitch and vocal part which will be investigated in thefollowing sections

Time (seconds)0 005 01 015 02 025 03 035 04

MP

E (

sem

itone

)

-02

-015

-01

-005

0

005

01

015

FIG 3 Mean pitch error (crosses) and range of one standard

deviation from the mean (shaded) for the initial 04 seconds

of each note

Time (seconds)0 005 01 015 02 025 03 035 04

MP

E (

sem

itone

)

-03

-025

-02

-015

-01

-005

0

005

01

015

FIG 4 Mean pitch error (crosses) and range of one standard

deviation from the mean (shaded) for the final 04 seconds of

each note

The appearance of note trajectories is significantlydifferent between singers who have different degrees ofmusical training For the trained singers the note tra-jectories are smoother and the two transient parts havea clear direction For singers with less training theirnote trajectories tend to be uneven and have less com-mon shape in their beginnings and endings

In Figure 3 the first turning point at 002 secondsmay be an artefact of the averaging of different pitch tra-jectory shapes There are several possible factors thatmight influence trajectory shapes such as the pitch ofthe surrounding notes vocal part sex and listening con-dition which we now examine

B Adjacent pitch

In the previous section we observed large pitch fluc-tuations at each end of the note To test whether these

6 J Acoust Soc Am 9 September 2019

fluctuations are influenced by adjacent pitches in thescore we separate the data for each end of a note intotwo situations based on whether the previous (respec-tively next) pitch is lower or higher than the currentpitch Repeated pitches are ignored An analysis of vari-ance (ANOVA) confirms that the pitch error in relativetime is significantly different based on whether the ad-jacent pitch is higher or lower In Figure 5 we observethat singers tend to overshoot the target pitch and thenadjust downward after singing a lower pitch while aftera high pitch they reach the target almost immediatelyJers and Ternstrom (2005 Fig 3-4) observed that singersalso overshoot the interval (undershoot the pitch) beforecorrecting when they transition from a higher pitch toa lower pitch The steady state pitch is 1 cent sharperwhen coming from a lower pitch than when the previ-ous pitch is higher (F(1 38) = 7797p lt 0001) Singersalso prepare for the pitch of the next note at the end ofeach note as evidenced by the significant difference ob-served between ascending and descending following inter-vals (F(1 38) = 798p lt 001 Figure 6) In both casesthere is an increase in pitch followed by a rapid decreaseas the note ends and the vocal cords are relaxed butthe increase in pitch is much more marked in the casethat the succeeding pitch is higher There are some in-dividual differences between singers in this respect butmost exhibit the average behaviour of being influencedby adjacent notes

Relative time (proportion of note elapsed)0 005 01 015 02 025 03 035 04 045 05

MP

E (

sem

itone

)

-01

-008

-006

-004

-002

0

002

004

006

008

01After a high pitchAfter a low pitch

FIG 5 The effect of singing after a lower or higher pitch

mean pitch error in relative time

C Vocal parts and sex

To explore the factor of vocal part the normalisednote trajectories were plotted for each of the four vocalparts (Figure 7) Firstly we observe about an 8-centpitch difference between each pair of adjacent parts inour data Although the pitch trajectories vary accord-ing to the participants for most participants sopranostend to sing sharp while tenors and basses tend to singflat These pitch differences lead to an expansion of har-monic intervals between vocal parts the opposite of thecompression that is often observed for melodic intervals(Pfordresher and Brown 2007)

Relative time (proportion of note elapsed)05 055 06 065 07 075 08 085 09 095 1

MP

E (

sem

itone

)

-01

-008

-006

-004

-002

0

002

004

006

008

01A high pitch nextA low pitch next

FIG 6 The effect of singing before a lower or higher pitch

mean pitch error in relative time

This phenomenon is also observed between sexes AnANOVA shows a significant difference between the notebeginnings of male singers and female singers (F(3 76) =5937p lt 001) In general male participants sing 11cents flatter while females sing 4 cents sharper than thescore pitch Male singers tend to begin the note at ahigher pitch and adjust downwards while female singersrsquoinitial trajectories have a convex shape beginning at alower pitch overshooting the target then decreasing to-ward the target All the singers tend to have similarnote ending a slight increase in pitch followed by a rapiddecrease

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

MP

E (

sem

itone

)

-03

-025

-02

-015

-01

-005

0

005

01

015

02

SopranoAltoTenorBass

FIG 7 Mean pitch error over note duration for each vocal

part

DModelling the note trajectories

For a better understanding of the tendencies of pitchtrajectories we modelled them as three separate compo-nents initial transient note middle and final transientAs discussed previously the transient parts were definedby the first 15 and last 15 of the duration The ten-dency of each component was approximated by linearregression Figure 8 shows an example of a single pitch

J Acoust Soc Am 9 September 2019 7

trajectory and the linear fits for each of the three com-ponents

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

MP

E (s

emito

ne)

-025

-02

-015

-01

-005

0

005

01

015

02

025

Observed dataFitting line

FIG 8 Example of the pitch trajectory of a single note and

the fitting lines for the initial middle and final components

of the note

Convex0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Upward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Downward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Concave0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

FIG 9 Mean pitch trajectories of the four trajectory classes

in relative time (proportion of note elapsed)

To describe the different types of trajectories weclassify them into four categories (Concave Convex Up-ward Downward) according to the slopes of their initialand final transients which are either positive or nega-tive Table I shows that the most popular shapes areConvex and Downward both of which have a negativenote release

Table II shows the mean median and standard devi-ation of the slopes of the three note parts Although theaverage trend for the initial transient is a negative slopeless than half of the notes exhibit this behaviour andthere is a large variance in the slope of the initial tran-sient The middle segment has a small positive trendwhile for the final transient most notes have a negativeslope although again this has a large variance Although

the initial and final slopes have high variance Figures 3and 4 show that their starting and ending points are lessspread (smaller standard deviation) than the rest of thetrajectory

E Listening condition

Finally the influence of listening condition on notetrajectory classification was considered The influence oflistening condition on mean pitch is discussed in previouswork (Dai and Dixon 2017 2019a) In this paper theANOVA test on the note trajectories did not show anysignificant difference between listening conditions (solopartial independent partial dependent and open) for ini-tial transient (F(3 49194) = 007p = 079 ) middle sec-tion (F(3 49194) = 162p = 020) and final transient(F(3 49194) = 005p = 083)

VI DISCUSSION

In this study we observed a general stretching of har-monic intervals between vocal parts so that the bass partsang flat and the soprano part sharp relative to the othervocal parts Unlike the piano where stretching of inter-vals is related to the inharmonicity of the partials sungtones are not inharmonic so we are unable to explainthis observation Hagerman and Sundberg (1980) alsoobserved stretched harmonic intervals in an experimentwith barbershop singers giving the explanation that theysound ldquomore active and expressiverdquo

If we compare to the given starting notes the over-all tendency was to sing flat a tendency which increasedover time Pitch drift has been observed in other experi-ments (Devaney and Ellis 2008 Howard 2003 Terasawa2004) and is typically downward in direction althoughupward drift has also been observed

While the averaged note trajectories particularlywhen sorted into categories (Figure 9) show quitesmooth curves the individual pitch trajectories exhibitmuch greater degrees of variation (eg Figure 8 which isnot an extreme example) There is a danger that the fea-tures observed in the average curves might be artefactsof the averaging process and may not occur often if atall in the individual instances For example in Figures2 and 3 we observe a concave shape (a small local min-imum) in the first 5 (respectively 004 seconds) of thenote trajectory If we compare with Figure 7 where thetwo female vocal parts have different initial trajectoriesto the two male vocal parts it is likely that the localminimum arises from averaging the categorically differ-ent shapes of the male and female parts The reason thatthe end of the note trajectory does not exhibit a similarpattern may be due to the greater frequency of Convexand Downward shapes (289 and 368 respectively)which both have a negative final slope across the vocalparts (Table I)

The differences observed in the averaged curves aresmall in magnitude of the same order as the just notice-able difference in pitch (about 5 cents Loeffler (2006))

8 J Acoust Soc Am 9 September 2019

Shape Attack Release Soprano Alto Tenor Bass Overall

Convex positive negative 347 378 229 211 289

Upward positive positive 171 133 173 161 160

Downward negative negative 329 379 424 339 368

Concave negative positive 153 109 174 289 184

TABLE I Definition of the four trajectory shapes according to the sign of the slope in the attack and release and their relative

frequencies in each vocal part and in total

Initial Middle Final

Mean -0649 0077 -2167

Median 0003 0038 -1766

Stddev 7109 0725 6400

TABLE II The mean median and standard deviation of the

slope (semitones per second) of the initial transient middle

section and final transient

Many of the sung examples have larger differences whichare reduced by the averaging process but are likely to beperceptible in the original examples A listening test us-ing synthetic stimuli would be required to identify theperceptual relevance of the features of pitch trajectoriesidentified in this paper

The general tendency of notes ending with a negativeslope is observed regardless of whether the next pitch ishigher or lower or which vocal part is considered Al-though there is a simple explanation ie the relaxationof the vocal muscles at the end of a note it is notewor-thy that singers show evidence of preparing for a higherfollowing pitch by commencing a rising inflection whichis then followed by a falling pitch at the end of the notewhich might be thought to negate the preparation Evenin the cases of the Upward and Concave trajectories theoverall increasing slope toward the end of a note finisheswith a few sampling points where the pitch decreases(during the final 3 of the note Figure 9)

A skilled singer is able to coordinate their muscles toachieve synchronised control over multiple vocal param-eters Alongside the pitch changes at the ends of eachnote there are also variations in amplitude associatedwith the start or end of the note which might make someparts of the transient imperceptible (alternatively someaudible parts may be omitted from analysis due to theirlow amplitude) The note segmentation (determinationof note onset and offset times) is based on the defaultsettings of the software Tony which segments the pitchtrack into notes according to changes in pitch and energy(Mauch et al 2015) Different settings and segmentationstrategies may influence the results The coarse segmen-tation was checked during annotation A random sam-ple was checked more closely after results were obtained

This revealed a small fraction of ambiguous cases wherethe final slope is dominated by vibrato and thus couldbe classified as positive or negative depending on theprecise offset time Compared to the thousands of noteswhich have a negative slope at the end the few ambigu-ous cases would not change our results significantly ifthey were to be segmented differently

Although vibrato is a feature of many singing pitchtrajectories we did not explicitly model it in this work(cf Dai and Dixon 2016 Mehrabi et al 2017) Theuse of vibrato is less marked in unaccompanied ensemblesinging where the voice does not need to be projected overinstrumental parts and the stylistic goal is for the voicesto blend rather than stand out For example choral stylefavours minimal vibrato and barbershop style generallyforbids vibrato Thus we did not observe strong vibratoin our data and in the cases where vibrato was presentit tended to be uneven which would make it difficult tomodel

VII CONCLUSIONS

In this paper we present a study of pitch trajectoriesof single notes in multi-part singing According to ouranalysis of over 35000 individual notes we find a generalshape of vocal notes which contains transient componentsat the beginning and end of each note

The analysis is based on both absolute and relativetiming of notes where the initial and final transients areabout 120 ms or 15-20 of note duration The resultssuggest that the adjustment of pitch at the ends of notesis governed by absolute timing ie due to physiologi-cal and psychological factors rather than relative timingwhich might imply a musical motivation The transientcomponents vary according to the individual performerprevious pitch next pitch vocal part and sex

Participants tend to overshoot the target pitch whentransitioning from a lower pitch and raise the pitch to-ward the end of the note if the next pitch is higher Wealso observe a general expansion of harmonic intervalsabout 8 cents pitch difference is observed between adja-cent vocal parts with sopranos singing sharper and malesingers flatter than the target pitch Female and malesingers also differ in their initial transients with femalescommencing with a upward glide that overshoots the tar-get followed by a correction while males begin notes

J Acoust Soc Am 9 September 2019 9

with a downward glide Participants with fine pitch ac-curacy tend to have smoother pitch trajectories whileless accurate singers have relatively unstable note trajec-tories

In conclusion the main contribution of this paper isthe observation measurement and analysis of the notetransient parts by characterising their shapes and influ-encing factors Although many further issues remain tobe investigated we hope that the current observationsprovide a better understanding of the singing voice

VIII DATA AVAILABILITY

The code and the data needed to repro-duce our results (note annotations questionnaireresults score information) are available fromhttpscodesoundsoftwareacukprojectsanalysis-of-interactive-intonation-in-unaccompanied-satb-ensemblesrepository

ACKNOWLEDGMENTS

The study was conducted with the approval of theQueen Mary Research Ethics Committee (approval num-ber QMREC1560)

Many thanks to all of the participants who con-tributed to this project We also thank Marcus PearceDaniel Stowell and Christophe Rhodes for their advice onexperimental design Jiajie Dai is supported by a ChinaScholarship Council and Queen Mary Joint PhD Schol-arship

Alldahl P-G (2008) Choral Intonation (Gehrmans StockholmSweden)

Berkowska M and Dalla Bella S (2009) ldquoAcquired and con-genital disorders of sung performance A reviewrdquo Adv CognPsychol 5(-1) 69ndash83 doi 102478v10053-008-0068-2

Boersma P (2002) ldquoPraat a system for doing phonetics by com-puterrdquo Glot Int 5(910) 341ndash345

Bohrer J C S (2002) ldquoIntonational strategies in ensemblesingingrdquo PhD thesis City University London

Brandler B J and Peynircioglu Z F (2015) ldquoA comparisonof the efficacy of individual and collaborative music learning inensemble rehearsalsrdquo J Res Music Educ 63(3) 281ndash297

Brown D (1991) Human Universals (Temple University PressPhiladelphia) pp 1ndash160

Cannam C Landone C Sandler M B and Bello J P (2006)ldquoThe Sonic Visualiser A visualisation platform for semanticdescriptors from musical signalsrdquo in 7th Int Conf Music InfRetr pp 324ndash327

Dai J (2019) ldquoModelling intonation and interaction in vocal en-semblesrdquo PhD thesis Queen Mary University of London pp104ndash115

Dai J and Dixon S (2016) ldquoAnalysis of vocal imitations of pitchtrajectoriesrdquo in 17th Int Soc Music Inf Retr Conf pp 87ndash93

Dai J and Dixon S (2017) ldquoAnalysis of interactive intonationin unaccompanied SATB ensemblesrdquo in 18th Int Soc Music InfRetr Conf pp 599ndash605

Dai J and Dixon S (2019a) ldquoSinging together Pitch accuracyand interaction in unaccompanied unison and duet singingrdquo JAcoust Soc Am 145(2) 663ndash675

Dai J and Dixon S (2019b) ldquoUnderstanding intonation tra-jectories and patterns of vocal notesrdquo in Int Conf Multimedia

Modell Springer pp 243ndash253Dai J Mauch M and Dixon S (2015) ldquoAnalysis of intonation

trajectories in solo singingrdquo in 16th Int Soc Music Inf RetrConf pp 420ndash426

Dalla Bella S Giguere J-F and Peretz I (2007) ldquoSinging pro-ficiency in the general populationrdquo J Acoust Soc Am 121(2)1182ndash1189 doi 10112112427111

de Cheveigne A and Kawahara H (2002) ldquoYIN a fundamentalfrequency estimator for speech and musicrdquo J Acoust Soc Am111(4) 1917ndash1930 doi 10112111458024

Devaney J and Ellis D P (2008) ldquoAn empirical approachto studying intonation tendencies in polyphonic vocal perfor-mancesrdquo J Interdiscipl Music Stud 2(1amp2) 141ndash156

Devaney J Mandel M I and Fujinaga I (2012) ldquoA studyof intonation in three-part singing using the automatic musicperformance analysis and comparison toolkit (AMPACT)rdquo in13th Int Soc Music Inf Retr Conf pp 511ndash516

Dromey C Carter N and Hopkin A (2003) ldquoVibrato rateadjustmentrdquo J Voice 17(2) 168ndash178

Fischer P-M (1993) Die Stimme des Sangers Analyse ihrerFunktion und Leistung-Geschichte und Methodik der Stimmbil-dung (The Voice of the Singer Analysis of Function and Perfor-mance - History and Methodology of Voice Training) (MetzlerStuttgart Germany)

Gerhard D (2005) ldquoPitch track target deviation in naturalsingingrdquo in 6th Int Conf Music Inf Retr pp 514ndash519

Hagerman B and Sundberg J (1980) ldquoFundamental frequencyadjustment in barbershop singingrdquo J Res Singing 4 1ndash17

Howard D M (2003) ldquoA capella SATB quartet in-tune singingEvidence of intonation shiftrdquo in Stockholm Music Acoust ConfVol 2 pp 462ndash466

Howard D M (2007) ldquoIntonation drift in A Capella sopranoalto tenor bass quartet singing with key modulationrdquo J Voice21(3) 300ndash315

Jers H and Ternstrom S (2005) ldquoIntonation analysis of a multi-channel choir recordingrdquo Speech Music Hearing Q Prog StatusRep 47(1) 1ndash6

Kalin G (2005) ldquoFormant frequency adjustment in barbershopquartet singingrdquo Masterrsquos thesis Royal Institute of TechnologyDept of Speech Music and Hearing Stockholm

King J B and Horii Y (1993) ldquoVocal matching of frequencymodulation in synthesized vowelsrdquo J Voice 7(2) 151ndash159

Lindley M (2001) ldquoJust intonationrdquo Grove Music Online editedby L Macy httpwwwgrovemusiccom (accessed 30 January2015)

Loeffler D B (2006) ldquoInstrument timbres and pitch estimation inpolyphonic musicrdquo PhD thesis Georgia Institute of Technology

Mauch M Cannam C Bittner R Fazekas G Salamon JDai J Bello J and Dixon S (2015) ldquoComputer-aided melodynote transcription using the Tony software Accuracy and effi-ciencyrdquo in Proc 1st Int Conf Technol Music Not Representpp 23ndash30

Mauch M and Dixon S (2014) ldquoPYIN A fundamental fre-quency estimator using probabilistic threshold distributionsrdquo inIEEE Int Conf Acoust Speech Signal Process pp 659ndash663

Mauch M Frieler K and Dixon S (2014) ldquoIntonation in un-accompanied singing Accuracy drift and a model of referencepitch memoryrdquo J Acoust Soc Am 136(1) 401ndash411

Mehrabi A Dixon S and Sandler M (2017) ldquoVocal imitationof synthesised sounds varying in pitch loudness and spectral cen-troidrdquo J Acoust Soc Am 143(2) 783ndash796

Mullensiefen D Gingras B Musil J and Stewart L (2014)ldquoThe musicality of non-musicians An index for assessing musi-cal sophistication in the general populationrdquo PLoS ONE 9(2)e89642

Nordmark J and Ternstrom S (1996) ldquoIntonation preferencesfor major thirds with non-beating ensemble soundsrdquo Speech Mu-sic Hearing Q Prog Status Rep 37(1) 57ndash62

Pfordresher P Q and Brown S (2007) ldquoPoor-pitch singing inthe absence of lsquotone deafnessrsquordquo Music Percept 25(2) 95ndash115

Pfordresher P Q Brown S Meier K M Belyk M and LiottiM (2010) ldquoImprecise singing is widespreadrdquo J Acoust SocAm 128(4) 2182ndash2190

Potter J (2000a) ldquoEnsemble singingrdquo in The Cambridge Com-panion to Singing edited by J Potter (Cambridge Univer-

10 J Acoust Soc Am 9 September 2019

sity Press Cambridge UK) pp 158ndash164 doi 101017CCOL9780521622257014

Potter J (2000b) ldquoIntroduction singing at the turn of the cen-turyrdquo in The Cambridge Companion to Singing edited by J Pot-ter (Cambridge University Press Cambridge UK) pp 1ndash5 doi101017CCOL9780521622257001

Prout E (2011) Harmony Its Theory and Practice (CambridgeUniversity Press)

Seashore C E (1914) ldquoThe tonoscoperdquo Psychol Monogr 16(3)1ndash12

Seashore C E (1931) ldquoThe natural history of the vibratordquo ProcNatl Acad Sci 17(12) 623ndash626

Stewart L von Kriegstein K Warren J D and Griffiths T D(2006) ldquoMusic and the brain Disorders of musical listeningrdquoBrain 129(10) 2533ndash2553

Sundberg J (1977) ldquoThe acoustics of the singing voicerdquo Sci Am236(3) 82ndash91

Sundberg J (1987) The Science of the Singing Voice (NorthernIllinois University Press DeKalb IL)

Sundberg J (1995) ldquoAcoustic and psychoacoustic aspects of vocalvibratordquo in Vibrato edited by P Dejonckere M Hirano andJ Sundberg (Singular Publishing Group San Diego CA) pp35ndash62

Sundberg J La F M and Himonides E (2013) ldquoIntona-tion and expressivity A single case study of classical Westernsingingrdquo J Voice 27(3) 391ndashe1

Swannell J (1992) The Oxford Modern English Dictionary (Ox-ford University Press USA) p 560

Takeuchi A H and Hulse S H (1993) ldquoAbsolute pitchrdquo Psy-chol Bull 113(2) 345

Terasawa H (2004) ldquoPitch drift in choral musicrdquo Music 221Afinal paper Center for Computer Research in Music and Acous-tics Stanford University URL httpsccrmastanfordedu

~hirokopitchdriftpaper221ApdfUmbert M Bonada J Goto M Nakano T and Sundberg J

(2015) ldquoExpression control in singing voice synthesis Featuresapproaches evaluation and challengesrdquo IEEE Signal ProcessMag 32(6) 55ndash73

Vurma A and Ross J (2006) ldquoProduction and perception ofmusical intervalsrdquo Music Percept 23(4) 331ndash344

Welch G F Sergeant D C and White P J (1997) ldquoAge sexand vocal task as factors in singing lsquoin tunersquo during the first yearsof schoolingrdquo Bull Counc Res Music Educ 133 153ndash160

Xu Y and Sun X (2000) ldquoHow Fast Can We Really ChangePitch Maximum Speed of Pitch Change Revisitedrdquo in 6th IntConf Spoken Lang Process pp 666ndash669

Zarate J M and Zatorre R J (2008) ldquoExperience-dependentneural substrates involved in vocal pitch regulation duringsingingrdquo Neuroimage 40(4) 1871ndash1887

J Acoust Soc Am 9 September 2019 11

has elapsed (from 0 to 1) while for the truncated trajec-tories the beginning and end of the note are modelledseparately using respectively the first and last 04 sec-onds of the note (77 of notes are over 04 seconds and55 over 055 seconds in duration) For comparing tra-jectories of different score pitches we use the pitch errorthat is the deviation from the target (score) pitch

For the normalisation method the note trajectorieswere re-sampled to 100 sampling points with the MAT-LAB resample function Then any common shape ofvocal notes can be obtained by averaging across notesFigure 2 plots the resulting note trajectory generated bycalculating the mean of all the sampling points For com-parison we also show the absolute pitch error which ismuch larger in magnitude

In Figure 2 we observe transient parts at the begin-ning and end of the note Based on the slope of theMPE curve the initial and final transients each compriseabout 15-20 of the notersquos duration In the following wetake the first 15 and the final 15 of each note as thetransient parts The length of the two transient parts isapproximately the same and the shape is almost sym-metrical consisting of peaks at both ends of the notewith a relatively stable middle portion The mean pitcherror is negative reflecting a tendency to sing flat relativeto the score pitch

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

Pitc

h E

rror

(se

mito

ne)

-02

-01

0

01

02

03

04

05

MPEMAPE

FIG 2 Average pitch trajectory within a tone expressed as

mean pitch error (MPE) across time-normalised notes with

mean absolute pitch error (MAPE) shown for comparison

An alternative way to combine note trajectories ofvarying length is to truncate the time series and onlyconsider the initial and final segments of each note Tak-ing the first (respectively last) 04 seconds of each noteexcluding notes with a duration less than 055 seconds toavoid artefacts due to the transient at the other end of thenote results in the trajectories shown in Figures 3 and4 From these figures we observe that the first 012 andlast 012 seconds of each note have the most pitch vari-ance This corresponds to about 15-20 of the mean noteduration (069 seconds) This result is similar to thatfor normalised trajectories (Figure 2) where the initial

sharp fall and final rise in pitch are not as sharp due tothe normalisation of different length notes The averageresults hide differences in the proportion and direction oftransients which arise due to individual differences scorepitch and vocal part which will be investigated in thefollowing sections

Time (seconds)0 005 01 015 02 025 03 035 04

MP

E (

sem

itone

)

-02

-015

-01

-005

0

005

01

015

FIG 3 Mean pitch error (crosses) and range of one standard

deviation from the mean (shaded) for the initial 04 seconds

of each note

Time (seconds)0 005 01 015 02 025 03 035 04

MP

E (

sem

itone

)

-03

-025

-02

-015

-01

-005

0

005

01

015

FIG 4 Mean pitch error (crosses) and range of one standard

deviation from the mean (shaded) for the final 04 seconds of

each note

The appearance of note trajectories is significantlydifferent between singers who have different degrees ofmusical training For the trained singers the note tra-jectories are smoother and the two transient parts havea clear direction For singers with less training theirnote trajectories tend to be uneven and have less com-mon shape in their beginnings and endings

In Figure 3 the first turning point at 002 secondsmay be an artefact of the averaging of different pitch tra-jectory shapes There are several possible factors thatmight influence trajectory shapes such as the pitch ofthe surrounding notes vocal part sex and listening con-dition which we now examine

B Adjacent pitch

In the previous section we observed large pitch fluc-tuations at each end of the note To test whether these

6 J Acoust Soc Am 9 September 2019

fluctuations are influenced by adjacent pitches in thescore we separate the data for each end of a note intotwo situations based on whether the previous (respec-tively next) pitch is lower or higher than the currentpitch Repeated pitches are ignored An analysis of vari-ance (ANOVA) confirms that the pitch error in relativetime is significantly different based on whether the ad-jacent pitch is higher or lower In Figure 5 we observethat singers tend to overshoot the target pitch and thenadjust downward after singing a lower pitch while aftera high pitch they reach the target almost immediatelyJers and Ternstrom (2005 Fig 3-4) observed that singersalso overshoot the interval (undershoot the pitch) beforecorrecting when they transition from a higher pitch toa lower pitch The steady state pitch is 1 cent sharperwhen coming from a lower pitch than when the previ-ous pitch is higher (F(1 38) = 7797p lt 0001) Singersalso prepare for the pitch of the next note at the end ofeach note as evidenced by the significant difference ob-served between ascending and descending following inter-vals (F(1 38) = 798p lt 001 Figure 6) In both casesthere is an increase in pitch followed by a rapid decreaseas the note ends and the vocal cords are relaxed butthe increase in pitch is much more marked in the casethat the succeeding pitch is higher There are some in-dividual differences between singers in this respect butmost exhibit the average behaviour of being influencedby adjacent notes

Relative time (proportion of note elapsed)0 005 01 015 02 025 03 035 04 045 05

MP

E (

sem

itone

)

-01

-008

-006

-004

-002

0

002

004

006

008

01After a high pitchAfter a low pitch

FIG 5 The effect of singing after a lower or higher pitch

mean pitch error in relative time

C Vocal parts and sex

To explore the factor of vocal part the normalisednote trajectories were plotted for each of the four vocalparts (Figure 7) Firstly we observe about an 8-centpitch difference between each pair of adjacent parts inour data Although the pitch trajectories vary accord-ing to the participants for most participants sopranostend to sing sharp while tenors and basses tend to singflat These pitch differences lead to an expansion of har-monic intervals between vocal parts the opposite of thecompression that is often observed for melodic intervals(Pfordresher and Brown 2007)

Relative time (proportion of note elapsed)05 055 06 065 07 075 08 085 09 095 1

MP

E (

sem

itone

)

-01

-008

-006

-004

-002

0

002

004

006

008

01A high pitch nextA low pitch next

FIG 6 The effect of singing before a lower or higher pitch

mean pitch error in relative time

This phenomenon is also observed between sexes AnANOVA shows a significant difference between the notebeginnings of male singers and female singers (F(3 76) =5937p lt 001) In general male participants sing 11cents flatter while females sing 4 cents sharper than thescore pitch Male singers tend to begin the note at ahigher pitch and adjust downwards while female singersrsquoinitial trajectories have a convex shape beginning at alower pitch overshooting the target then decreasing to-ward the target All the singers tend to have similarnote ending a slight increase in pitch followed by a rapiddecrease

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

MP

E (

sem

itone

)

-03

-025

-02

-015

-01

-005

0

005

01

015

02

SopranoAltoTenorBass

FIG 7 Mean pitch error over note duration for each vocal

part

DModelling the note trajectories

For a better understanding of the tendencies of pitchtrajectories we modelled them as three separate compo-nents initial transient note middle and final transientAs discussed previously the transient parts were definedby the first 15 and last 15 of the duration The ten-dency of each component was approximated by linearregression Figure 8 shows an example of a single pitch

J Acoust Soc Am 9 September 2019 7

trajectory and the linear fits for each of the three com-ponents

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

MP

E (s

emito

ne)

-025

-02

-015

-01

-005

0

005

01

015

02

025

Observed dataFitting line

FIG 8 Example of the pitch trajectory of a single note and

the fitting lines for the initial middle and final components

of the note

Convex0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Upward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Downward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Concave0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

FIG 9 Mean pitch trajectories of the four trajectory classes

in relative time (proportion of note elapsed)

To describe the different types of trajectories weclassify them into four categories (Concave Convex Up-ward Downward) according to the slopes of their initialand final transients which are either positive or nega-tive Table I shows that the most popular shapes areConvex and Downward both of which have a negativenote release

Table II shows the mean median and standard devi-ation of the slopes of the three note parts Although theaverage trend for the initial transient is a negative slopeless than half of the notes exhibit this behaviour andthere is a large variance in the slope of the initial tran-sient The middle segment has a small positive trendwhile for the final transient most notes have a negativeslope although again this has a large variance Although

the initial and final slopes have high variance Figures 3and 4 show that their starting and ending points are lessspread (smaller standard deviation) than the rest of thetrajectory

E Listening condition

Finally the influence of listening condition on notetrajectory classification was considered The influence oflistening condition on mean pitch is discussed in previouswork (Dai and Dixon 2017 2019a) In this paper theANOVA test on the note trajectories did not show anysignificant difference between listening conditions (solopartial independent partial dependent and open) for ini-tial transient (F(3 49194) = 007p = 079 ) middle sec-tion (F(3 49194) = 162p = 020) and final transient(F(3 49194) = 005p = 083)

VI DISCUSSION

In this study we observed a general stretching of har-monic intervals between vocal parts so that the bass partsang flat and the soprano part sharp relative to the othervocal parts Unlike the piano where stretching of inter-vals is related to the inharmonicity of the partials sungtones are not inharmonic so we are unable to explainthis observation Hagerman and Sundberg (1980) alsoobserved stretched harmonic intervals in an experimentwith barbershop singers giving the explanation that theysound ldquomore active and expressiverdquo

If we compare to the given starting notes the over-all tendency was to sing flat a tendency which increasedover time Pitch drift has been observed in other experi-ments (Devaney and Ellis 2008 Howard 2003 Terasawa2004) and is typically downward in direction althoughupward drift has also been observed

While the averaged note trajectories particularlywhen sorted into categories (Figure 9) show quitesmooth curves the individual pitch trajectories exhibitmuch greater degrees of variation (eg Figure 8 which isnot an extreme example) There is a danger that the fea-tures observed in the average curves might be artefactsof the averaging process and may not occur often if atall in the individual instances For example in Figures2 and 3 we observe a concave shape (a small local min-imum) in the first 5 (respectively 004 seconds) of thenote trajectory If we compare with Figure 7 where thetwo female vocal parts have different initial trajectoriesto the two male vocal parts it is likely that the localminimum arises from averaging the categorically differ-ent shapes of the male and female parts The reason thatthe end of the note trajectory does not exhibit a similarpattern may be due to the greater frequency of Convexand Downward shapes (289 and 368 respectively)which both have a negative final slope across the vocalparts (Table I)

The differences observed in the averaged curves aresmall in magnitude of the same order as the just notice-able difference in pitch (about 5 cents Loeffler (2006))

8 J Acoust Soc Am 9 September 2019

Shape Attack Release Soprano Alto Tenor Bass Overall

Convex positive negative 347 378 229 211 289

Upward positive positive 171 133 173 161 160

Downward negative negative 329 379 424 339 368

Concave negative positive 153 109 174 289 184

TABLE I Definition of the four trajectory shapes according to the sign of the slope in the attack and release and their relative

frequencies in each vocal part and in total

Initial Middle Final

Mean -0649 0077 -2167

Median 0003 0038 -1766

Stddev 7109 0725 6400

TABLE II The mean median and standard deviation of the

slope (semitones per second) of the initial transient middle

section and final transient

Many of the sung examples have larger differences whichare reduced by the averaging process but are likely to beperceptible in the original examples A listening test us-ing synthetic stimuli would be required to identify theperceptual relevance of the features of pitch trajectoriesidentified in this paper

The general tendency of notes ending with a negativeslope is observed regardless of whether the next pitch ishigher or lower or which vocal part is considered Al-though there is a simple explanation ie the relaxationof the vocal muscles at the end of a note it is notewor-thy that singers show evidence of preparing for a higherfollowing pitch by commencing a rising inflection whichis then followed by a falling pitch at the end of the notewhich might be thought to negate the preparation Evenin the cases of the Upward and Concave trajectories theoverall increasing slope toward the end of a note finisheswith a few sampling points where the pitch decreases(during the final 3 of the note Figure 9)

A skilled singer is able to coordinate their muscles toachieve synchronised control over multiple vocal param-eters Alongside the pitch changes at the ends of eachnote there are also variations in amplitude associatedwith the start or end of the note which might make someparts of the transient imperceptible (alternatively someaudible parts may be omitted from analysis due to theirlow amplitude) The note segmentation (determinationof note onset and offset times) is based on the defaultsettings of the software Tony which segments the pitchtrack into notes according to changes in pitch and energy(Mauch et al 2015) Different settings and segmentationstrategies may influence the results The coarse segmen-tation was checked during annotation A random sam-ple was checked more closely after results were obtained

This revealed a small fraction of ambiguous cases wherethe final slope is dominated by vibrato and thus couldbe classified as positive or negative depending on theprecise offset time Compared to the thousands of noteswhich have a negative slope at the end the few ambigu-ous cases would not change our results significantly ifthey were to be segmented differently

Although vibrato is a feature of many singing pitchtrajectories we did not explicitly model it in this work(cf Dai and Dixon 2016 Mehrabi et al 2017) Theuse of vibrato is less marked in unaccompanied ensemblesinging where the voice does not need to be projected overinstrumental parts and the stylistic goal is for the voicesto blend rather than stand out For example choral stylefavours minimal vibrato and barbershop style generallyforbids vibrato Thus we did not observe strong vibratoin our data and in the cases where vibrato was presentit tended to be uneven which would make it difficult tomodel

VII CONCLUSIONS

In this paper we present a study of pitch trajectoriesof single notes in multi-part singing According to ouranalysis of over 35000 individual notes we find a generalshape of vocal notes which contains transient componentsat the beginning and end of each note

The analysis is based on both absolute and relativetiming of notes where the initial and final transients areabout 120 ms or 15-20 of note duration The resultssuggest that the adjustment of pitch at the ends of notesis governed by absolute timing ie due to physiologi-cal and psychological factors rather than relative timingwhich might imply a musical motivation The transientcomponents vary according to the individual performerprevious pitch next pitch vocal part and sex

Participants tend to overshoot the target pitch whentransitioning from a lower pitch and raise the pitch to-ward the end of the note if the next pitch is higher Wealso observe a general expansion of harmonic intervalsabout 8 cents pitch difference is observed between adja-cent vocal parts with sopranos singing sharper and malesingers flatter than the target pitch Female and malesingers also differ in their initial transients with femalescommencing with a upward glide that overshoots the tar-get followed by a correction while males begin notes

J Acoust Soc Am 9 September 2019 9

with a downward glide Participants with fine pitch ac-curacy tend to have smoother pitch trajectories whileless accurate singers have relatively unstable note trajec-tories

In conclusion the main contribution of this paper isthe observation measurement and analysis of the notetransient parts by characterising their shapes and influ-encing factors Although many further issues remain tobe investigated we hope that the current observationsprovide a better understanding of the singing voice

VIII DATA AVAILABILITY

The code and the data needed to repro-duce our results (note annotations questionnaireresults score information) are available fromhttpscodesoundsoftwareacukprojectsanalysis-of-interactive-intonation-in-unaccompanied-satb-ensemblesrepository

ACKNOWLEDGMENTS

The study was conducted with the approval of theQueen Mary Research Ethics Committee (approval num-ber QMREC1560)

Many thanks to all of the participants who con-tributed to this project We also thank Marcus PearceDaniel Stowell and Christophe Rhodes for their advice onexperimental design Jiajie Dai is supported by a ChinaScholarship Council and Queen Mary Joint PhD Schol-arship

Alldahl P-G (2008) Choral Intonation (Gehrmans StockholmSweden)

Berkowska M and Dalla Bella S (2009) ldquoAcquired and con-genital disorders of sung performance A reviewrdquo Adv CognPsychol 5(-1) 69ndash83 doi 102478v10053-008-0068-2

Boersma P (2002) ldquoPraat a system for doing phonetics by com-puterrdquo Glot Int 5(910) 341ndash345

Bohrer J C S (2002) ldquoIntonational strategies in ensemblesingingrdquo PhD thesis City University London

Brandler B J and Peynircioglu Z F (2015) ldquoA comparisonof the efficacy of individual and collaborative music learning inensemble rehearsalsrdquo J Res Music Educ 63(3) 281ndash297

Brown D (1991) Human Universals (Temple University PressPhiladelphia) pp 1ndash160

Cannam C Landone C Sandler M B and Bello J P (2006)ldquoThe Sonic Visualiser A visualisation platform for semanticdescriptors from musical signalsrdquo in 7th Int Conf Music InfRetr pp 324ndash327

Dai J (2019) ldquoModelling intonation and interaction in vocal en-semblesrdquo PhD thesis Queen Mary University of London pp104ndash115

Dai J and Dixon S (2016) ldquoAnalysis of vocal imitations of pitchtrajectoriesrdquo in 17th Int Soc Music Inf Retr Conf pp 87ndash93

Dai J and Dixon S (2017) ldquoAnalysis of interactive intonationin unaccompanied SATB ensemblesrdquo in 18th Int Soc Music InfRetr Conf pp 599ndash605

Dai J and Dixon S (2019a) ldquoSinging together Pitch accuracyand interaction in unaccompanied unison and duet singingrdquo JAcoust Soc Am 145(2) 663ndash675

Dai J and Dixon S (2019b) ldquoUnderstanding intonation tra-jectories and patterns of vocal notesrdquo in Int Conf Multimedia

Modell Springer pp 243ndash253Dai J Mauch M and Dixon S (2015) ldquoAnalysis of intonation

trajectories in solo singingrdquo in 16th Int Soc Music Inf RetrConf pp 420ndash426

Dalla Bella S Giguere J-F and Peretz I (2007) ldquoSinging pro-ficiency in the general populationrdquo J Acoust Soc Am 121(2)1182ndash1189 doi 10112112427111

de Cheveigne A and Kawahara H (2002) ldquoYIN a fundamentalfrequency estimator for speech and musicrdquo J Acoust Soc Am111(4) 1917ndash1930 doi 10112111458024

Devaney J and Ellis D P (2008) ldquoAn empirical approachto studying intonation tendencies in polyphonic vocal perfor-mancesrdquo J Interdiscipl Music Stud 2(1amp2) 141ndash156

Devaney J Mandel M I and Fujinaga I (2012) ldquoA studyof intonation in three-part singing using the automatic musicperformance analysis and comparison toolkit (AMPACT)rdquo in13th Int Soc Music Inf Retr Conf pp 511ndash516

Dromey C Carter N and Hopkin A (2003) ldquoVibrato rateadjustmentrdquo J Voice 17(2) 168ndash178

Fischer P-M (1993) Die Stimme des Sangers Analyse ihrerFunktion und Leistung-Geschichte und Methodik der Stimmbil-dung (The Voice of the Singer Analysis of Function and Perfor-mance - History and Methodology of Voice Training) (MetzlerStuttgart Germany)

Gerhard D (2005) ldquoPitch track target deviation in naturalsingingrdquo in 6th Int Conf Music Inf Retr pp 514ndash519

Hagerman B and Sundberg J (1980) ldquoFundamental frequencyadjustment in barbershop singingrdquo J Res Singing 4 1ndash17

Howard D M (2003) ldquoA capella SATB quartet in-tune singingEvidence of intonation shiftrdquo in Stockholm Music Acoust ConfVol 2 pp 462ndash466

Howard D M (2007) ldquoIntonation drift in A Capella sopranoalto tenor bass quartet singing with key modulationrdquo J Voice21(3) 300ndash315

Jers H and Ternstrom S (2005) ldquoIntonation analysis of a multi-channel choir recordingrdquo Speech Music Hearing Q Prog StatusRep 47(1) 1ndash6

Kalin G (2005) ldquoFormant frequency adjustment in barbershopquartet singingrdquo Masterrsquos thesis Royal Institute of TechnologyDept of Speech Music and Hearing Stockholm

King J B and Horii Y (1993) ldquoVocal matching of frequencymodulation in synthesized vowelsrdquo J Voice 7(2) 151ndash159

Lindley M (2001) ldquoJust intonationrdquo Grove Music Online editedby L Macy httpwwwgrovemusiccom (accessed 30 January2015)

Loeffler D B (2006) ldquoInstrument timbres and pitch estimation inpolyphonic musicrdquo PhD thesis Georgia Institute of Technology

Mauch M Cannam C Bittner R Fazekas G Salamon JDai J Bello J and Dixon S (2015) ldquoComputer-aided melodynote transcription using the Tony software Accuracy and effi-ciencyrdquo in Proc 1st Int Conf Technol Music Not Representpp 23ndash30

Mauch M and Dixon S (2014) ldquoPYIN A fundamental fre-quency estimator using probabilistic threshold distributionsrdquo inIEEE Int Conf Acoust Speech Signal Process pp 659ndash663

Mauch M Frieler K and Dixon S (2014) ldquoIntonation in un-accompanied singing Accuracy drift and a model of referencepitch memoryrdquo J Acoust Soc Am 136(1) 401ndash411

Mehrabi A Dixon S and Sandler M (2017) ldquoVocal imitationof synthesised sounds varying in pitch loudness and spectral cen-troidrdquo J Acoust Soc Am 143(2) 783ndash796

Mullensiefen D Gingras B Musil J and Stewart L (2014)ldquoThe musicality of non-musicians An index for assessing musi-cal sophistication in the general populationrdquo PLoS ONE 9(2)e89642

Nordmark J and Ternstrom S (1996) ldquoIntonation preferencesfor major thirds with non-beating ensemble soundsrdquo Speech Mu-sic Hearing Q Prog Status Rep 37(1) 57ndash62

Pfordresher P Q and Brown S (2007) ldquoPoor-pitch singing inthe absence of lsquotone deafnessrsquordquo Music Percept 25(2) 95ndash115

Pfordresher P Q Brown S Meier K M Belyk M and LiottiM (2010) ldquoImprecise singing is widespreadrdquo J Acoust SocAm 128(4) 2182ndash2190

Potter J (2000a) ldquoEnsemble singingrdquo in The Cambridge Com-panion to Singing edited by J Potter (Cambridge Univer-

10 J Acoust Soc Am 9 September 2019

sity Press Cambridge UK) pp 158ndash164 doi 101017CCOL9780521622257014

Potter J (2000b) ldquoIntroduction singing at the turn of the cen-turyrdquo in The Cambridge Companion to Singing edited by J Pot-ter (Cambridge University Press Cambridge UK) pp 1ndash5 doi101017CCOL9780521622257001

Prout E (2011) Harmony Its Theory and Practice (CambridgeUniversity Press)

Seashore C E (1914) ldquoThe tonoscoperdquo Psychol Monogr 16(3)1ndash12

Seashore C E (1931) ldquoThe natural history of the vibratordquo ProcNatl Acad Sci 17(12) 623ndash626

Stewart L von Kriegstein K Warren J D and Griffiths T D(2006) ldquoMusic and the brain Disorders of musical listeningrdquoBrain 129(10) 2533ndash2553

Sundberg J (1977) ldquoThe acoustics of the singing voicerdquo Sci Am236(3) 82ndash91

Sundberg J (1987) The Science of the Singing Voice (NorthernIllinois University Press DeKalb IL)

Sundberg J (1995) ldquoAcoustic and psychoacoustic aspects of vocalvibratordquo in Vibrato edited by P Dejonckere M Hirano andJ Sundberg (Singular Publishing Group San Diego CA) pp35ndash62

Sundberg J La F M and Himonides E (2013) ldquoIntona-tion and expressivity A single case study of classical Westernsingingrdquo J Voice 27(3) 391ndashe1

Swannell J (1992) The Oxford Modern English Dictionary (Ox-ford University Press USA) p 560

Takeuchi A H and Hulse S H (1993) ldquoAbsolute pitchrdquo Psy-chol Bull 113(2) 345

Terasawa H (2004) ldquoPitch drift in choral musicrdquo Music 221Afinal paper Center for Computer Research in Music and Acous-tics Stanford University URL httpsccrmastanfordedu

~hirokopitchdriftpaper221ApdfUmbert M Bonada J Goto M Nakano T and Sundberg J

(2015) ldquoExpression control in singing voice synthesis Featuresapproaches evaluation and challengesrdquo IEEE Signal ProcessMag 32(6) 55ndash73

Vurma A and Ross J (2006) ldquoProduction and perception ofmusical intervalsrdquo Music Percept 23(4) 331ndash344

Welch G F Sergeant D C and White P J (1997) ldquoAge sexand vocal task as factors in singing lsquoin tunersquo during the first yearsof schoolingrdquo Bull Counc Res Music Educ 133 153ndash160

Xu Y and Sun X (2000) ldquoHow Fast Can We Really ChangePitch Maximum Speed of Pitch Change Revisitedrdquo in 6th IntConf Spoken Lang Process pp 666ndash669

Zarate J M and Zatorre R J (2008) ldquoExperience-dependentneural substrates involved in vocal pitch regulation duringsingingrdquo Neuroimage 40(4) 1871ndash1887

J Acoust Soc Am 9 September 2019 11

fluctuations are influenced by adjacent pitches in thescore we separate the data for each end of a note intotwo situations based on whether the previous (respec-tively next) pitch is lower or higher than the currentpitch Repeated pitches are ignored An analysis of vari-ance (ANOVA) confirms that the pitch error in relativetime is significantly different based on whether the ad-jacent pitch is higher or lower In Figure 5 we observethat singers tend to overshoot the target pitch and thenadjust downward after singing a lower pitch while aftera high pitch they reach the target almost immediatelyJers and Ternstrom (2005 Fig 3-4) observed that singersalso overshoot the interval (undershoot the pitch) beforecorrecting when they transition from a higher pitch toa lower pitch The steady state pitch is 1 cent sharperwhen coming from a lower pitch than when the previ-ous pitch is higher (F(1 38) = 7797p lt 0001) Singersalso prepare for the pitch of the next note at the end ofeach note as evidenced by the significant difference ob-served between ascending and descending following inter-vals (F(1 38) = 798p lt 001 Figure 6) In both casesthere is an increase in pitch followed by a rapid decreaseas the note ends and the vocal cords are relaxed butthe increase in pitch is much more marked in the casethat the succeeding pitch is higher There are some in-dividual differences between singers in this respect butmost exhibit the average behaviour of being influencedby adjacent notes

Relative time (proportion of note elapsed)0 005 01 015 02 025 03 035 04 045 05

MP

E (

sem

itone

)

-01

-008

-006

-004

-002

0

002

004

006

008

01After a high pitchAfter a low pitch

FIG 5 The effect of singing after a lower or higher pitch

mean pitch error in relative time

C Vocal parts and sex

To explore the factor of vocal part the normalisednote trajectories were plotted for each of the four vocalparts (Figure 7) Firstly we observe about an 8-centpitch difference between each pair of adjacent parts inour data Although the pitch trajectories vary accord-ing to the participants for most participants sopranostend to sing sharp while tenors and basses tend to singflat These pitch differences lead to an expansion of har-monic intervals between vocal parts the opposite of thecompression that is often observed for melodic intervals(Pfordresher and Brown 2007)

Relative time (proportion of note elapsed)05 055 06 065 07 075 08 085 09 095 1

MP

E (

sem

itone

)

-01

-008

-006

-004

-002

0

002

004

006

008

01A high pitch nextA low pitch next

FIG 6 The effect of singing before a lower or higher pitch

mean pitch error in relative time

This phenomenon is also observed between sexes AnANOVA shows a significant difference between the notebeginnings of male singers and female singers (F(3 76) =5937p lt 001) In general male participants sing 11cents flatter while females sing 4 cents sharper than thescore pitch Male singers tend to begin the note at ahigher pitch and adjust downwards while female singersrsquoinitial trajectories have a convex shape beginning at alower pitch overshooting the target then decreasing to-ward the target All the singers tend to have similarnote ending a slight increase in pitch followed by a rapiddecrease

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

MP

E (

sem

itone

)

-03

-025

-02

-015

-01

-005

0

005

01

015

02

SopranoAltoTenorBass

FIG 7 Mean pitch error over note duration for each vocal

part

DModelling the note trajectories

For a better understanding of the tendencies of pitchtrajectories we modelled them as three separate compo-nents initial transient note middle and final transientAs discussed previously the transient parts were definedby the first 15 and last 15 of the duration The ten-dency of each component was approximated by linearregression Figure 8 shows an example of a single pitch

J Acoust Soc Am 9 September 2019 7

trajectory and the linear fits for each of the three com-ponents

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

MP

E (s

emito

ne)

-025

-02

-015

-01

-005

0

005

01

015

02

025

Observed dataFitting line

FIG 8 Example of the pitch trajectory of a single note and

the fitting lines for the initial middle and final components

of the note

Convex0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Upward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Downward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Concave0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

FIG 9 Mean pitch trajectories of the four trajectory classes

in relative time (proportion of note elapsed)

To describe the different types of trajectories weclassify them into four categories (Concave Convex Up-ward Downward) according to the slopes of their initialand final transients which are either positive or nega-tive Table I shows that the most popular shapes areConvex and Downward both of which have a negativenote release

Table II shows the mean median and standard devi-ation of the slopes of the three note parts Although theaverage trend for the initial transient is a negative slopeless than half of the notes exhibit this behaviour andthere is a large variance in the slope of the initial tran-sient The middle segment has a small positive trendwhile for the final transient most notes have a negativeslope although again this has a large variance Although

the initial and final slopes have high variance Figures 3and 4 show that their starting and ending points are lessspread (smaller standard deviation) than the rest of thetrajectory

E Listening condition

Finally the influence of listening condition on notetrajectory classification was considered The influence oflistening condition on mean pitch is discussed in previouswork (Dai and Dixon 2017 2019a) In this paper theANOVA test on the note trajectories did not show anysignificant difference between listening conditions (solopartial independent partial dependent and open) for ini-tial transient (F(3 49194) = 007p = 079 ) middle sec-tion (F(3 49194) = 162p = 020) and final transient(F(3 49194) = 005p = 083)

VI DISCUSSION

In this study we observed a general stretching of har-monic intervals between vocal parts so that the bass partsang flat and the soprano part sharp relative to the othervocal parts Unlike the piano where stretching of inter-vals is related to the inharmonicity of the partials sungtones are not inharmonic so we are unable to explainthis observation Hagerman and Sundberg (1980) alsoobserved stretched harmonic intervals in an experimentwith barbershop singers giving the explanation that theysound ldquomore active and expressiverdquo

If we compare to the given starting notes the over-all tendency was to sing flat a tendency which increasedover time Pitch drift has been observed in other experi-ments (Devaney and Ellis 2008 Howard 2003 Terasawa2004) and is typically downward in direction althoughupward drift has also been observed

While the averaged note trajectories particularlywhen sorted into categories (Figure 9) show quitesmooth curves the individual pitch trajectories exhibitmuch greater degrees of variation (eg Figure 8 which isnot an extreme example) There is a danger that the fea-tures observed in the average curves might be artefactsof the averaging process and may not occur often if atall in the individual instances For example in Figures2 and 3 we observe a concave shape (a small local min-imum) in the first 5 (respectively 004 seconds) of thenote trajectory If we compare with Figure 7 where thetwo female vocal parts have different initial trajectoriesto the two male vocal parts it is likely that the localminimum arises from averaging the categorically differ-ent shapes of the male and female parts The reason thatthe end of the note trajectory does not exhibit a similarpattern may be due to the greater frequency of Convexand Downward shapes (289 and 368 respectively)which both have a negative final slope across the vocalparts (Table I)

The differences observed in the averaged curves aresmall in magnitude of the same order as the just notice-able difference in pitch (about 5 cents Loeffler (2006))

8 J Acoust Soc Am 9 September 2019

Shape Attack Release Soprano Alto Tenor Bass Overall

Convex positive negative 347 378 229 211 289

Upward positive positive 171 133 173 161 160

Downward negative negative 329 379 424 339 368

Concave negative positive 153 109 174 289 184

TABLE I Definition of the four trajectory shapes according to the sign of the slope in the attack and release and their relative

frequencies in each vocal part and in total

Initial Middle Final

Mean -0649 0077 -2167

Median 0003 0038 -1766

Stddev 7109 0725 6400

TABLE II The mean median and standard deviation of the

slope (semitones per second) of the initial transient middle

section and final transient

Many of the sung examples have larger differences whichare reduced by the averaging process but are likely to beperceptible in the original examples A listening test us-ing synthetic stimuli would be required to identify theperceptual relevance of the features of pitch trajectoriesidentified in this paper

The general tendency of notes ending with a negativeslope is observed regardless of whether the next pitch ishigher or lower or which vocal part is considered Al-though there is a simple explanation ie the relaxationof the vocal muscles at the end of a note it is notewor-thy that singers show evidence of preparing for a higherfollowing pitch by commencing a rising inflection whichis then followed by a falling pitch at the end of the notewhich might be thought to negate the preparation Evenin the cases of the Upward and Concave trajectories theoverall increasing slope toward the end of a note finisheswith a few sampling points where the pitch decreases(during the final 3 of the note Figure 9)

A skilled singer is able to coordinate their muscles toachieve synchronised control over multiple vocal param-eters Alongside the pitch changes at the ends of eachnote there are also variations in amplitude associatedwith the start or end of the note which might make someparts of the transient imperceptible (alternatively someaudible parts may be omitted from analysis due to theirlow amplitude) The note segmentation (determinationof note onset and offset times) is based on the defaultsettings of the software Tony which segments the pitchtrack into notes according to changes in pitch and energy(Mauch et al 2015) Different settings and segmentationstrategies may influence the results The coarse segmen-tation was checked during annotation A random sam-ple was checked more closely after results were obtained

This revealed a small fraction of ambiguous cases wherethe final slope is dominated by vibrato and thus couldbe classified as positive or negative depending on theprecise offset time Compared to the thousands of noteswhich have a negative slope at the end the few ambigu-ous cases would not change our results significantly ifthey were to be segmented differently

Although vibrato is a feature of many singing pitchtrajectories we did not explicitly model it in this work(cf Dai and Dixon 2016 Mehrabi et al 2017) Theuse of vibrato is less marked in unaccompanied ensemblesinging where the voice does not need to be projected overinstrumental parts and the stylistic goal is for the voicesto blend rather than stand out For example choral stylefavours minimal vibrato and barbershop style generallyforbids vibrato Thus we did not observe strong vibratoin our data and in the cases where vibrato was presentit tended to be uneven which would make it difficult tomodel

VII CONCLUSIONS

In this paper we present a study of pitch trajectoriesof single notes in multi-part singing According to ouranalysis of over 35000 individual notes we find a generalshape of vocal notes which contains transient componentsat the beginning and end of each note

The analysis is based on both absolute and relativetiming of notes where the initial and final transients areabout 120 ms or 15-20 of note duration The resultssuggest that the adjustment of pitch at the ends of notesis governed by absolute timing ie due to physiologi-cal and psychological factors rather than relative timingwhich might imply a musical motivation The transientcomponents vary according to the individual performerprevious pitch next pitch vocal part and sex

Participants tend to overshoot the target pitch whentransitioning from a lower pitch and raise the pitch to-ward the end of the note if the next pitch is higher Wealso observe a general expansion of harmonic intervalsabout 8 cents pitch difference is observed between adja-cent vocal parts with sopranos singing sharper and malesingers flatter than the target pitch Female and malesingers also differ in their initial transients with femalescommencing with a upward glide that overshoots the tar-get followed by a correction while males begin notes

J Acoust Soc Am 9 September 2019 9

with a downward glide Participants with fine pitch ac-curacy tend to have smoother pitch trajectories whileless accurate singers have relatively unstable note trajec-tories

In conclusion the main contribution of this paper isthe observation measurement and analysis of the notetransient parts by characterising their shapes and influ-encing factors Although many further issues remain tobe investigated we hope that the current observationsprovide a better understanding of the singing voice

VIII DATA AVAILABILITY

The code and the data needed to repro-duce our results (note annotations questionnaireresults score information) are available fromhttpscodesoundsoftwareacukprojectsanalysis-of-interactive-intonation-in-unaccompanied-satb-ensemblesrepository

ACKNOWLEDGMENTS

The study was conducted with the approval of theQueen Mary Research Ethics Committee (approval num-ber QMREC1560)

Many thanks to all of the participants who con-tributed to this project We also thank Marcus PearceDaniel Stowell and Christophe Rhodes for their advice onexperimental design Jiajie Dai is supported by a ChinaScholarship Council and Queen Mary Joint PhD Schol-arship

Alldahl P-G (2008) Choral Intonation (Gehrmans StockholmSweden)

Berkowska M and Dalla Bella S (2009) ldquoAcquired and con-genital disorders of sung performance A reviewrdquo Adv CognPsychol 5(-1) 69ndash83 doi 102478v10053-008-0068-2

Boersma P (2002) ldquoPraat a system for doing phonetics by com-puterrdquo Glot Int 5(910) 341ndash345

Bohrer J C S (2002) ldquoIntonational strategies in ensemblesingingrdquo PhD thesis City University London

Brandler B J and Peynircioglu Z F (2015) ldquoA comparisonof the efficacy of individual and collaborative music learning inensemble rehearsalsrdquo J Res Music Educ 63(3) 281ndash297

Brown D (1991) Human Universals (Temple University PressPhiladelphia) pp 1ndash160

Cannam C Landone C Sandler M B and Bello J P (2006)ldquoThe Sonic Visualiser A visualisation platform for semanticdescriptors from musical signalsrdquo in 7th Int Conf Music InfRetr pp 324ndash327

Dai J (2019) ldquoModelling intonation and interaction in vocal en-semblesrdquo PhD thesis Queen Mary University of London pp104ndash115

Dai J and Dixon S (2016) ldquoAnalysis of vocal imitations of pitchtrajectoriesrdquo in 17th Int Soc Music Inf Retr Conf pp 87ndash93

Dai J and Dixon S (2017) ldquoAnalysis of interactive intonationin unaccompanied SATB ensemblesrdquo in 18th Int Soc Music InfRetr Conf pp 599ndash605

Dai J and Dixon S (2019a) ldquoSinging together Pitch accuracyand interaction in unaccompanied unison and duet singingrdquo JAcoust Soc Am 145(2) 663ndash675

Dai J and Dixon S (2019b) ldquoUnderstanding intonation tra-jectories and patterns of vocal notesrdquo in Int Conf Multimedia

Modell Springer pp 243ndash253Dai J Mauch M and Dixon S (2015) ldquoAnalysis of intonation

trajectories in solo singingrdquo in 16th Int Soc Music Inf RetrConf pp 420ndash426

Dalla Bella S Giguere J-F and Peretz I (2007) ldquoSinging pro-ficiency in the general populationrdquo J Acoust Soc Am 121(2)1182ndash1189 doi 10112112427111

de Cheveigne A and Kawahara H (2002) ldquoYIN a fundamentalfrequency estimator for speech and musicrdquo J Acoust Soc Am111(4) 1917ndash1930 doi 10112111458024

Devaney J and Ellis D P (2008) ldquoAn empirical approachto studying intonation tendencies in polyphonic vocal perfor-mancesrdquo J Interdiscipl Music Stud 2(1amp2) 141ndash156

Devaney J Mandel M I and Fujinaga I (2012) ldquoA studyof intonation in three-part singing using the automatic musicperformance analysis and comparison toolkit (AMPACT)rdquo in13th Int Soc Music Inf Retr Conf pp 511ndash516

Dromey C Carter N and Hopkin A (2003) ldquoVibrato rateadjustmentrdquo J Voice 17(2) 168ndash178

Fischer P-M (1993) Die Stimme des Sangers Analyse ihrerFunktion und Leistung-Geschichte und Methodik der Stimmbil-dung (The Voice of the Singer Analysis of Function and Perfor-mance - History and Methodology of Voice Training) (MetzlerStuttgart Germany)

Gerhard D (2005) ldquoPitch track target deviation in naturalsingingrdquo in 6th Int Conf Music Inf Retr pp 514ndash519

Hagerman B and Sundberg J (1980) ldquoFundamental frequencyadjustment in barbershop singingrdquo J Res Singing 4 1ndash17

Howard D M (2003) ldquoA capella SATB quartet in-tune singingEvidence of intonation shiftrdquo in Stockholm Music Acoust ConfVol 2 pp 462ndash466

Howard D M (2007) ldquoIntonation drift in A Capella sopranoalto tenor bass quartet singing with key modulationrdquo J Voice21(3) 300ndash315

Jers H and Ternstrom S (2005) ldquoIntonation analysis of a multi-channel choir recordingrdquo Speech Music Hearing Q Prog StatusRep 47(1) 1ndash6

Kalin G (2005) ldquoFormant frequency adjustment in barbershopquartet singingrdquo Masterrsquos thesis Royal Institute of TechnologyDept of Speech Music and Hearing Stockholm

King J B and Horii Y (1993) ldquoVocal matching of frequencymodulation in synthesized vowelsrdquo J Voice 7(2) 151ndash159

Lindley M (2001) ldquoJust intonationrdquo Grove Music Online editedby L Macy httpwwwgrovemusiccom (accessed 30 January2015)

Loeffler D B (2006) ldquoInstrument timbres and pitch estimation inpolyphonic musicrdquo PhD thesis Georgia Institute of Technology

Mauch M Cannam C Bittner R Fazekas G Salamon JDai J Bello J and Dixon S (2015) ldquoComputer-aided melodynote transcription using the Tony software Accuracy and effi-ciencyrdquo in Proc 1st Int Conf Technol Music Not Representpp 23ndash30

Mauch M and Dixon S (2014) ldquoPYIN A fundamental fre-quency estimator using probabilistic threshold distributionsrdquo inIEEE Int Conf Acoust Speech Signal Process pp 659ndash663

Mauch M Frieler K and Dixon S (2014) ldquoIntonation in un-accompanied singing Accuracy drift and a model of referencepitch memoryrdquo J Acoust Soc Am 136(1) 401ndash411

Mehrabi A Dixon S and Sandler M (2017) ldquoVocal imitationof synthesised sounds varying in pitch loudness and spectral cen-troidrdquo J Acoust Soc Am 143(2) 783ndash796

Mullensiefen D Gingras B Musil J and Stewart L (2014)ldquoThe musicality of non-musicians An index for assessing musi-cal sophistication in the general populationrdquo PLoS ONE 9(2)e89642

Nordmark J and Ternstrom S (1996) ldquoIntonation preferencesfor major thirds with non-beating ensemble soundsrdquo Speech Mu-sic Hearing Q Prog Status Rep 37(1) 57ndash62

Pfordresher P Q and Brown S (2007) ldquoPoor-pitch singing inthe absence of lsquotone deafnessrsquordquo Music Percept 25(2) 95ndash115

Pfordresher P Q Brown S Meier K M Belyk M and LiottiM (2010) ldquoImprecise singing is widespreadrdquo J Acoust SocAm 128(4) 2182ndash2190

Potter J (2000a) ldquoEnsemble singingrdquo in The Cambridge Com-panion to Singing edited by J Potter (Cambridge Univer-

10 J Acoust Soc Am 9 September 2019

sity Press Cambridge UK) pp 158ndash164 doi 101017CCOL9780521622257014

Potter J (2000b) ldquoIntroduction singing at the turn of the cen-turyrdquo in The Cambridge Companion to Singing edited by J Pot-ter (Cambridge University Press Cambridge UK) pp 1ndash5 doi101017CCOL9780521622257001

Prout E (2011) Harmony Its Theory and Practice (CambridgeUniversity Press)

Seashore C E (1914) ldquoThe tonoscoperdquo Psychol Monogr 16(3)1ndash12

Seashore C E (1931) ldquoThe natural history of the vibratordquo ProcNatl Acad Sci 17(12) 623ndash626

Stewart L von Kriegstein K Warren J D and Griffiths T D(2006) ldquoMusic and the brain Disorders of musical listeningrdquoBrain 129(10) 2533ndash2553

Sundberg J (1977) ldquoThe acoustics of the singing voicerdquo Sci Am236(3) 82ndash91

Sundberg J (1987) The Science of the Singing Voice (NorthernIllinois University Press DeKalb IL)

Sundberg J (1995) ldquoAcoustic and psychoacoustic aspects of vocalvibratordquo in Vibrato edited by P Dejonckere M Hirano andJ Sundberg (Singular Publishing Group San Diego CA) pp35ndash62

Sundberg J La F M and Himonides E (2013) ldquoIntona-tion and expressivity A single case study of classical Westernsingingrdquo J Voice 27(3) 391ndashe1

Swannell J (1992) The Oxford Modern English Dictionary (Ox-ford University Press USA) p 560

Takeuchi A H and Hulse S H (1993) ldquoAbsolute pitchrdquo Psy-chol Bull 113(2) 345

Terasawa H (2004) ldquoPitch drift in choral musicrdquo Music 221Afinal paper Center for Computer Research in Music and Acous-tics Stanford University URL httpsccrmastanfordedu

~hirokopitchdriftpaper221ApdfUmbert M Bonada J Goto M Nakano T and Sundberg J

(2015) ldquoExpression control in singing voice synthesis Featuresapproaches evaluation and challengesrdquo IEEE Signal ProcessMag 32(6) 55ndash73

Vurma A and Ross J (2006) ldquoProduction and perception ofmusical intervalsrdquo Music Percept 23(4) 331ndash344

Welch G F Sergeant D C and White P J (1997) ldquoAge sexand vocal task as factors in singing lsquoin tunersquo during the first yearsof schoolingrdquo Bull Counc Res Music Educ 133 153ndash160

Xu Y and Sun X (2000) ldquoHow Fast Can We Really ChangePitch Maximum Speed of Pitch Change Revisitedrdquo in 6th IntConf Spoken Lang Process pp 666ndash669

Zarate J M and Zatorre R J (2008) ldquoExperience-dependentneural substrates involved in vocal pitch regulation duringsingingrdquo Neuroimage 40(4) 1871ndash1887

J Acoust Soc Am 9 September 2019 11

trajectory and the linear fits for each of the three com-ponents

Relative time (proportion of note elapsed)0 01 02 03 04 05 06 07 08 09 1

MP

E (s

emito

ne)

-025

-02

-015

-01

-005

0

005

01

015

02

025

Observed dataFitting line

FIG 8 Example of the pitch trajectory of a single note and

the fitting lines for the initial middle and final components

of the note

Convex0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Upward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Downward0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

Concave0 02 04 06 08 1

MP

E (

sem

itone

)

-03

-02

-01

0

01

02

FIG 9 Mean pitch trajectories of the four trajectory classes

in relative time (proportion of note elapsed)

To describe the different types of trajectories weclassify them into four categories (Concave Convex Up-ward Downward) according to the slopes of their initialand final transients which are either positive or nega-tive Table I shows that the most popular shapes areConvex and Downward both of which have a negativenote release

Table II shows the mean median and standard devi-ation of the slopes of the three note parts Although theaverage trend for the initial transient is a negative slopeless than half of the notes exhibit this behaviour andthere is a large variance in the slope of the initial tran-sient The middle segment has a small positive trendwhile for the final transient most notes have a negativeslope although again this has a large variance Although

the initial and final slopes have high variance Figures 3and 4 show that their starting and ending points are lessspread (smaller standard deviation) than the rest of thetrajectory

E Listening condition

Finally the influence of listening condition on notetrajectory classification was considered The influence oflistening condition on mean pitch is discussed in previouswork (Dai and Dixon 2017 2019a) In this paper theANOVA test on the note trajectories did not show anysignificant difference between listening conditions (solopartial independent partial dependent and open) for ini-tial transient (F(3 49194) = 007p = 079 ) middle sec-tion (F(3 49194) = 162p = 020) and final transient(F(3 49194) = 005p = 083)

VI DISCUSSION

In this study we observed a general stretching of har-monic intervals between vocal parts so that the bass partsang flat and the soprano part sharp relative to the othervocal parts Unlike the piano where stretching of inter-vals is related to the inharmonicity of the partials sungtones are not inharmonic so we are unable to explainthis observation Hagerman and Sundberg (1980) alsoobserved stretched harmonic intervals in an experimentwith barbershop singers giving the explanation that theysound ldquomore active and expressiverdquo

If we compare to the given starting notes the over-all tendency was to sing flat a tendency which increasedover time Pitch drift has been observed in other experi-ments (Devaney and Ellis 2008 Howard 2003 Terasawa2004) and is typically downward in direction althoughupward drift has also been observed

While the averaged note trajectories particularlywhen sorted into categories (Figure 9) show quitesmooth curves the individual pitch trajectories exhibitmuch greater degrees of variation (eg Figure 8 which isnot an extreme example) There is a danger that the fea-tures observed in the average curves might be artefactsof the averaging process and may not occur often if atall in the individual instances For example in Figures2 and 3 we observe a concave shape (a small local min-imum) in the first 5 (respectively 004 seconds) of thenote trajectory If we compare with Figure 7 where thetwo female vocal parts have different initial trajectoriesto the two male vocal parts it is likely that the localminimum arises from averaging the categorically differ-ent shapes of the male and female parts The reason thatthe end of the note trajectory does not exhibit a similarpattern may be due to the greater frequency of Convexand Downward shapes (289 and 368 respectively)which both have a negative final slope across the vocalparts (Table I)

The differences observed in the averaged curves aresmall in magnitude of the same order as the just notice-able difference in pitch (about 5 cents Loeffler (2006))

8 J Acoust Soc Am 9 September 2019

Shape Attack Release Soprano Alto Tenor Bass Overall

Convex positive negative 347 378 229 211 289

Upward positive positive 171 133 173 161 160

Downward negative negative 329 379 424 339 368

Concave negative positive 153 109 174 289 184

TABLE I Definition of the four trajectory shapes according to the sign of the slope in the attack and release and their relative

frequencies in each vocal part and in total

Initial Middle Final

Mean -0649 0077 -2167

Median 0003 0038 -1766

Stddev 7109 0725 6400

TABLE II The mean median and standard deviation of the

slope (semitones per second) of the initial transient middle

section and final transient

Many of the sung examples have larger differences whichare reduced by the averaging process but are likely to beperceptible in the original examples A listening test us-ing synthetic stimuli would be required to identify theperceptual relevance of the features of pitch trajectoriesidentified in this paper

The general tendency of notes ending with a negativeslope is observed regardless of whether the next pitch ishigher or lower or which vocal part is considered Al-though there is a simple explanation ie the relaxationof the vocal muscles at the end of a note it is notewor-thy that singers show evidence of preparing for a higherfollowing pitch by commencing a rising inflection whichis then followed by a falling pitch at the end of the notewhich might be thought to negate the preparation Evenin the cases of the Upward and Concave trajectories theoverall increasing slope toward the end of a note finisheswith a few sampling points where the pitch decreases(during the final 3 of the note Figure 9)

A skilled singer is able to coordinate their muscles toachieve synchronised control over multiple vocal param-eters Alongside the pitch changes at the ends of eachnote there are also variations in amplitude associatedwith the start or end of the note which might make someparts of the transient imperceptible (alternatively someaudible parts may be omitted from analysis due to theirlow amplitude) The note segmentation (determinationof note onset and offset times) is based on the defaultsettings of the software Tony which segments the pitchtrack into notes according to changes in pitch and energy(Mauch et al 2015) Different settings and segmentationstrategies may influence the results The coarse segmen-tation was checked during annotation A random sam-ple was checked more closely after results were obtained

This revealed a small fraction of ambiguous cases wherethe final slope is dominated by vibrato and thus couldbe classified as positive or negative depending on theprecise offset time Compared to the thousands of noteswhich have a negative slope at the end the few ambigu-ous cases would not change our results significantly ifthey were to be segmented differently

Although vibrato is a feature of many singing pitchtrajectories we did not explicitly model it in this work(cf Dai and Dixon 2016 Mehrabi et al 2017) Theuse of vibrato is less marked in unaccompanied ensemblesinging where the voice does not need to be projected overinstrumental parts and the stylistic goal is for the voicesto blend rather than stand out For example choral stylefavours minimal vibrato and barbershop style generallyforbids vibrato Thus we did not observe strong vibratoin our data and in the cases where vibrato was presentit tended to be uneven which would make it difficult tomodel

VII CONCLUSIONS

In this paper we present a study of pitch trajectoriesof single notes in multi-part singing According to ouranalysis of over 35000 individual notes we find a generalshape of vocal notes which contains transient componentsat the beginning and end of each note

The analysis is based on both absolute and relativetiming of notes where the initial and final transients areabout 120 ms or 15-20 of note duration The resultssuggest that the adjustment of pitch at the ends of notesis governed by absolute timing ie due to physiologi-cal and psychological factors rather than relative timingwhich might imply a musical motivation The transientcomponents vary according to the individual performerprevious pitch next pitch vocal part and sex

Participants tend to overshoot the target pitch whentransitioning from a lower pitch and raise the pitch to-ward the end of the note if the next pitch is higher Wealso observe a general expansion of harmonic intervalsabout 8 cents pitch difference is observed between adja-cent vocal parts with sopranos singing sharper and malesingers flatter than the target pitch Female and malesingers also differ in their initial transients with femalescommencing with a upward glide that overshoots the tar-get followed by a correction while males begin notes

J Acoust Soc Am 9 September 2019 9

with a downward glide Participants with fine pitch ac-curacy tend to have smoother pitch trajectories whileless accurate singers have relatively unstable note trajec-tories

In conclusion the main contribution of this paper isthe observation measurement and analysis of the notetransient parts by characterising their shapes and influ-encing factors Although many further issues remain tobe investigated we hope that the current observationsprovide a better understanding of the singing voice

VIII DATA AVAILABILITY

The code and the data needed to repro-duce our results (note annotations questionnaireresults score information) are available fromhttpscodesoundsoftwareacukprojectsanalysis-of-interactive-intonation-in-unaccompanied-satb-ensemblesrepository

ACKNOWLEDGMENTS

The study was conducted with the approval of theQueen Mary Research Ethics Committee (approval num-ber QMREC1560)

Many thanks to all of the participants who con-tributed to this project We also thank Marcus PearceDaniel Stowell and Christophe Rhodes for their advice onexperimental design Jiajie Dai is supported by a ChinaScholarship Council and Queen Mary Joint PhD Schol-arship

Alldahl P-G (2008) Choral Intonation (Gehrmans StockholmSweden)

Berkowska M and Dalla Bella S (2009) ldquoAcquired and con-genital disorders of sung performance A reviewrdquo Adv CognPsychol 5(-1) 69ndash83 doi 102478v10053-008-0068-2

Boersma P (2002) ldquoPraat a system for doing phonetics by com-puterrdquo Glot Int 5(910) 341ndash345

Bohrer J C S (2002) ldquoIntonational strategies in ensemblesingingrdquo PhD thesis City University London

Brandler B J and Peynircioglu Z F (2015) ldquoA comparisonof the efficacy of individual and collaborative music learning inensemble rehearsalsrdquo J Res Music Educ 63(3) 281ndash297

Brown D (1991) Human Universals (Temple University PressPhiladelphia) pp 1ndash160

Cannam C Landone C Sandler M B and Bello J P (2006)ldquoThe Sonic Visualiser A visualisation platform for semanticdescriptors from musical signalsrdquo in 7th Int Conf Music InfRetr pp 324ndash327

Dai J (2019) ldquoModelling intonation and interaction in vocal en-semblesrdquo PhD thesis Queen Mary University of London pp104ndash115

Dai J and Dixon S (2016) ldquoAnalysis of vocal imitations of pitchtrajectoriesrdquo in 17th Int Soc Music Inf Retr Conf pp 87ndash93

Dai J and Dixon S (2017) ldquoAnalysis of interactive intonationin unaccompanied SATB ensemblesrdquo in 18th Int Soc Music InfRetr Conf pp 599ndash605

Dai J and Dixon S (2019a) ldquoSinging together Pitch accuracyand interaction in unaccompanied unison and duet singingrdquo JAcoust Soc Am 145(2) 663ndash675

Dai J and Dixon S (2019b) ldquoUnderstanding intonation tra-jectories and patterns of vocal notesrdquo in Int Conf Multimedia

Modell Springer pp 243ndash253Dai J Mauch M and Dixon S (2015) ldquoAnalysis of intonation

trajectories in solo singingrdquo in 16th Int Soc Music Inf RetrConf pp 420ndash426

Dalla Bella S Giguere J-F and Peretz I (2007) ldquoSinging pro-ficiency in the general populationrdquo J Acoust Soc Am 121(2)1182ndash1189 doi 10112112427111

de Cheveigne A and Kawahara H (2002) ldquoYIN a fundamentalfrequency estimator for speech and musicrdquo J Acoust Soc Am111(4) 1917ndash1930 doi 10112111458024

Devaney J and Ellis D P (2008) ldquoAn empirical approachto studying intonation tendencies in polyphonic vocal perfor-mancesrdquo J Interdiscipl Music Stud 2(1amp2) 141ndash156

Devaney J Mandel M I and Fujinaga I (2012) ldquoA studyof intonation in three-part singing using the automatic musicperformance analysis and comparison toolkit (AMPACT)rdquo in13th Int Soc Music Inf Retr Conf pp 511ndash516

Dromey C Carter N and Hopkin A (2003) ldquoVibrato rateadjustmentrdquo J Voice 17(2) 168ndash178

Fischer P-M (1993) Die Stimme des Sangers Analyse ihrerFunktion und Leistung-Geschichte und Methodik der Stimmbil-dung (The Voice of the Singer Analysis of Function and Perfor-mance - History and Methodology of Voice Training) (MetzlerStuttgart Germany)

Gerhard D (2005) ldquoPitch track target deviation in naturalsingingrdquo in 6th Int Conf Music Inf Retr pp 514ndash519

Hagerman B and Sundberg J (1980) ldquoFundamental frequencyadjustment in barbershop singingrdquo J Res Singing 4 1ndash17

Howard D M (2003) ldquoA capella SATB quartet in-tune singingEvidence of intonation shiftrdquo in Stockholm Music Acoust ConfVol 2 pp 462ndash466

Howard D M (2007) ldquoIntonation drift in A Capella sopranoalto tenor bass quartet singing with key modulationrdquo J Voice21(3) 300ndash315

Jers H and Ternstrom S (2005) ldquoIntonation analysis of a multi-channel choir recordingrdquo Speech Music Hearing Q Prog StatusRep 47(1) 1ndash6

Kalin G (2005) ldquoFormant frequency adjustment in barbershopquartet singingrdquo Masterrsquos thesis Royal Institute of TechnologyDept of Speech Music and Hearing Stockholm

King J B and Horii Y (1993) ldquoVocal matching of frequencymodulation in synthesized vowelsrdquo J Voice 7(2) 151ndash159

Lindley M (2001) ldquoJust intonationrdquo Grove Music Online editedby L Macy httpwwwgrovemusiccom (accessed 30 January2015)

Loeffler D B (2006) ldquoInstrument timbres and pitch estimation inpolyphonic musicrdquo PhD thesis Georgia Institute of Technology

Mauch M Cannam C Bittner R Fazekas G Salamon JDai J Bello J and Dixon S (2015) ldquoComputer-aided melodynote transcription using the Tony software Accuracy and effi-ciencyrdquo in Proc 1st Int Conf Technol Music Not Representpp 23ndash30

Mauch M and Dixon S (2014) ldquoPYIN A fundamental fre-quency estimator using probabilistic threshold distributionsrdquo inIEEE Int Conf Acoust Speech Signal Process pp 659ndash663

Mauch M Frieler K and Dixon S (2014) ldquoIntonation in un-accompanied singing Accuracy drift and a model of referencepitch memoryrdquo J Acoust Soc Am 136(1) 401ndash411

Mehrabi A Dixon S and Sandler M (2017) ldquoVocal imitationof synthesised sounds varying in pitch loudness and spectral cen-troidrdquo J Acoust Soc Am 143(2) 783ndash796

Mullensiefen D Gingras B Musil J and Stewart L (2014)ldquoThe musicality of non-musicians An index for assessing musi-cal sophistication in the general populationrdquo PLoS ONE 9(2)e89642

Nordmark J and Ternstrom S (1996) ldquoIntonation preferencesfor major thirds with non-beating ensemble soundsrdquo Speech Mu-sic Hearing Q Prog Status Rep 37(1) 57ndash62

Pfordresher P Q and Brown S (2007) ldquoPoor-pitch singing inthe absence of lsquotone deafnessrsquordquo Music Percept 25(2) 95ndash115

Pfordresher P Q Brown S Meier K M Belyk M and LiottiM (2010) ldquoImprecise singing is widespreadrdquo J Acoust SocAm 128(4) 2182ndash2190

Potter J (2000a) ldquoEnsemble singingrdquo in The Cambridge Com-panion to Singing edited by J Potter (Cambridge Univer-

10 J Acoust Soc Am 9 September 2019

sity Press Cambridge UK) pp 158ndash164 doi 101017CCOL9780521622257014

Potter J (2000b) ldquoIntroduction singing at the turn of the cen-turyrdquo in The Cambridge Companion to Singing edited by J Pot-ter (Cambridge University Press Cambridge UK) pp 1ndash5 doi101017CCOL9780521622257001

Prout E (2011) Harmony Its Theory and Practice (CambridgeUniversity Press)

Seashore C E (1914) ldquoThe tonoscoperdquo Psychol Monogr 16(3)1ndash12

Seashore C E (1931) ldquoThe natural history of the vibratordquo ProcNatl Acad Sci 17(12) 623ndash626

Stewart L von Kriegstein K Warren J D and Griffiths T D(2006) ldquoMusic and the brain Disorders of musical listeningrdquoBrain 129(10) 2533ndash2553

Sundberg J (1977) ldquoThe acoustics of the singing voicerdquo Sci Am236(3) 82ndash91

Sundberg J (1987) The Science of the Singing Voice (NorthernIllinois University Press DeKalb IL)

Sundberg J (1995) ldquoAcoustic and psychoacoustic aspects of vocalvibratordquo in Vibrato edited by P Dejonckere M Hirano andJ Sundberg (Singular Publishing Group San Diego CA) pp35ndash62

Sundberg J La F M and Himonides E (2013) ldquoIntona-tion and expressivity A single case study of classical Westernsingingrdquo J Voice 27(3) 391ndashe1

Swannell J (1992) The Oxford Modern English Dictionary (Ox-ford University Press USA) p 560

Takeuchi A H and Hulse S H (1993) ldquoAbsolute pitchrdquo Psy-chol Bull 113(2) 345

Terasawa H (2004) ldquoPitch drift in choral musicrdquo Music 221Afinal paper Center for Computer Research in Music and Acous-tics Stanford University URL httpsccrmastanfordedu

~hirokopitchdriftpaper221ApdfUmbert M Bonada J Goto M Nakano T and Sundberg J

(2015) ldquoExpression control in singing voice synthesis Featuresapproaches evaluation and challengesrdquo IEEE Signal ProcessMag 32(6) 55ndash73

Vurma A and Ross J (2006) ldquoProduction and perception ofmusical intervalsrdquo Music Percept 23(4) 331ndash344

Welch G F Sergeant D C and White P J (1997) ldquoAge sexand vocal task as factors in singing lsquoin tunersquo during the first yearsof schoolingrdquo Bull Counc Res Music Educ 133 153ndash160

Xu Y and Sun X (2000) ldquoHow Fast Can We Really ChangePitch Maximum Speed of Pitch Change Revisitedrdquo in 6th IntConf Spoken Lang Process pp 666ndash669

Zarate J M and Zatorre R J (2008) ldquoExperience-dependentneural substrates involved in vocal pitch regulation duringsingingrdquo Neuroimage 40(4) 1871ndash1887

J Acoust Soc Am 9 September 2019 11

Shape Attack Release Soprano Alto Tenor Bass Overall

Convex positive negative 347 378 229 211 289

Upward positive positive 171 133 173 161 160

Downward negative negative 329 379 424 339 368

Concave negative positive 153 109 174 289 184

TABLE I Definition of the four trajectory shapes according to the sign of the slope in the attack and release and their relative

frequencies in each vocal part and in total

Initial Middle Final

Mean -0649 0077 -2167

Median 0003 0038 -1766

Stddev 7109 0725 6400

TABLE II The mean median and standard deviation of the

slope (semitones per second) of the initial transient middle

section and final transient

Many of the sung examples have larger differences whichare reduced by the averaging process but are likely to beperceptible in the original examples A listening test us-ing synthetic stimuli would be required to identify theperceptual relevance of the features of pitch trajectoriesidentified in this paper

The general tendency of notes ending with a negativeslope is observed regardless of whether the next pitch ishigher or lower or which vocal part is considered Al-though there is a simple explanation ie the relaxationof the vocal muscles at the end of a note it is notewor-thy that singers show evidence of preparing for a higherfollowing pitch by commencing a rising inflection whichis then followed by a falling pitch at the end of the notewhich might be thought to negate the preparation Evenin the cases of the Upward and Concave trajectories theoverall increasing slope toward the end of a note finisheswith a few sampling points where the pitch decreases(during the final 3 of the note Figure 9)

A skilled singer is able to coordinate their muscles toachieve synchronised control over multiple vocal param-eters Alongside the pitch changes at the ends of eachnote there are also variations in amplitude associatedwith the start or end of the note which might make someparts of the transient imperceptible (alternatively someaudible parts may be omitted from analysis due to theirlow amplitude) The note segmentation (determinationof note onset and offset times) is based on the defaultsettings of the software Tony which segments the pitchtrack into notes according to changes in pitch and energy(Mauch et al 2015) Different settings and segmentationstrategies may influence the results The coarse segmen-tation was checked during annotation A random sam-ple was checked more closely after results were obtained

This revealed a small fraction of ambiguous cases wherethe final slope is dominated by vibrato and thus couldbe classified as positive or negative depending on theprecise offset time Compared to the thousands of noteswhich have a negative slope at the end the few ambigu-ous cases would not change our results significantly ifthey were to be segmented differently

Although vibrato is a feature of many singing pitchtrajectories we did not explicitly model it in this work(cf Dai and Dixon 2016 Mehrabi et al 2017) Theuse of vibrato is less marked in unaccompanied ensemblesinging where the voice does not need to be projected overinstrumental parts and the stylistic goal is for the voicesto blend rather than stand out For example choral stylefavours minimal vibrato and barbershop style generallyforbids vibrato Thus we did not observe strong vibratoin our data and in the cases where vibrato was presentit tended to be uneven which would make it difficult tomodel

VII CONCLUSIONS

In this paper we present a study of pitch trajectoriesof single notes in multi-part singing According to ouranalysis of over 35000 individual notes we find a generalshape of vocal notes which contains transient componentsat the beginning and end of each note

The analysis is based on both absolute and relativetiming of notes where the initial and final transients areabout 120 ms or 15-20 of note duration The resultssuggest that the adjustment of pitch at the ends of notesis governed by absolute timing ie due to physiologi-cal and psychological factors rather than relative timingwhich might imply a musical motivation The transientcomponents vary according to the individual performerprevious pitch next pitch vocal part and sex

Participants tend to overshoot the target pitch whentransitioning from a lower pitch and raise the pitch to-ward the end of the note if the next pitch is higher Wealso observe a general expansion of harmonic intervalsabout 8 cents pitch difference is observed between adja-cent vocal parts with sopranos singing sharper and malesingers flatter than the target pitch Female and malesingers also differ in their initial transients with femalescommencing with a upward glide that overshoots the tar-get followed by a correction while males begin notes

J Acoust Soc Am 9 September 2019 9

with a downward glide Participants with fine pitch ac-curacy tend to have smoother pitch trajectories whileless accurate singers have relatively unstable note trajec-tories

In conclusion the main contribution of this paper isthe observation measurement and analysis of the notetransient parts by characterising their shapes and influ-encing factors Although many further issues remain tobe investigated we hope that the current observationsprovide a better understanding of the singing voice

VIII DATA AVAILABILITY

The code and the data needed to repro-duce our results (note annotations questionnaireresults score information) are available fromhttpscodesoundsoftwareacukprojectsanalysis-of-interactive-intonation-in-unaccompanied-satb-ensemblesrepository

ACKNOWLEDGMENTS

The study was conducted with the approval of theQueen Mary Research Ethics Committee (approval num-ber QMREC1560)

Many thanks to all of the participants who con-tributed to this project We also thank Marcus PearceDaniel Stowell and Christophe Rhodes for their advice onexperimental design Jiajie Dai is supported by a ChinaScholarship Council and Queen Mary Joint PhD Schol-arship

Alldahl P-G (2008) Choral Intonation (Gehrmans StockholmSweden)

Berkowska M and Dalla Bella S (2009) ldquoAcquired and con-genital disorders of sung performance A reviewrdquo Adv CognPsychol 5(-1) 69ndash83 doi 102478v10053-008-0068-2

Boersma P (2002) ldquoPraat a system for doing phonetics by com-puterrdquo Glot Int 5(910) 341ndash345

Bohrer J C S (2002) ldquoIntonational strategies in ensemblesingingrdquo PhD thesis City University London

Brandler B J and Peynircioglu Z F (2015) ldquoA comparisonof the efficacy of individual and collaborative music learning inensemble rehearsalsrdquo J Res Music Educ 63(3) 281ndash297

Brown D (1991) Human Universals (Temple University PressPhiladelphia) pp 1ndash160

Cannam C Landone C Sandler M B and Bello J P (2006)ldquoThe Sonic Visualiser A visualisation platform for semanticdescriptors from musical signalsrdquo in 7th Int Conf Music InfRetr pp 324ndash327

Dai J (2019) ldquoModelling intonation and interaction in vocal en-semblesrdquo PhD thesis Queen Mary University of London pp104ndash115

Dai J and Dixon S (2016) ldquoAnalysis of vocal imitations of pitchtrajectoriesrdquo in 17th Int Soc Music Inf Retr Conf pp 87ndash93

Dai J and Dixon S (2017) ldquoAnalysis of interactive intonationin unaccompanied SATB ensemblesrdquo in 18th Int Soc Music InfRetr Conf pp 599ndash605

Dai J and Dixon S (2019a) ldquoSinging together Pitch accuracyand interaction in unaccompanied unison and duet singingrdquo JAcoust Soc Am 145(2) 663ndash675

Dai J and Dixon S (2019b) ldquoUnderstanding intonation tra-jectories and patterns of vocal notesrdquo in Int Conf Multimedia

Modell Springer pp 243ndash253Dai J Mauch M and Dixon S (2015) ldquoAnalysis of intonation

trajectories in solo singingrdquo in 16th Int Soc Music Inf RetrConf pp 420ndash426

Dalla Bella S Giguere J-F and Peretz I (2007) ldquoSinging pro-ficiency in the general populationrdquo J Acoust Soc Am 121(2)1182ndash1189 doi 10112112427111

de Cheveigne A and Kawahara H (2002) ldquoYIN a fundamentalfrequency estimator for speech and musicrdquo J Acoust Soc Am111(4) 1917ndash1930 doi 10112111458024

Devaney J and Ellis D P (2008) ldquoAn empirical approachto studying intonation tendencies in polyphonic vocal perfor-mancesrdquo J Interdiscipl Music Stud 2(1amp2) 141ndash156

Devaney J Mandel M I and Fujinaga I (2012) ldquoA studyof intonation in three-part singing using the automatic musicperformance analysis and comparison toolkit (AMPACT)rdquo in13th Int Soc Music Inf Retr Conf pp 511ndash516

Dromey C Carter N and Hopkin A (2003) ldquoVibrato rateadjustmentrdquo J Voice 17(2) 168ndash178

Fischer P-M (1993) Die Stimme des Sangers Analyse ihrerFunktion und Leistung-Geschichte und Methodik der Stimmbil-dung (The Voice of the Singer Analysis of Function and Perfor-mance - History and Methodology of Voice Training) (MetzlerStuttgart Germany)

Gerhard D (2005) ldquoPitch track target deviation in naturalsingingrdquo in 6th Int Conf Music Inf Retr pp 514ndash519

Hagerman B and Sundberg J (1980) ldquoFundamental frequencyadjustment in barbershop singingrdquo J Res Singing 4 1ndash17

Howard D M (2003) ldquoA capella SATB quartet in-tune singingEvidence of intonation shiftrdquo in Stockholm Music Acoust ConfVol 2 pp 462ndash466

Howard D M (2007) ldquoIntonation drift in A Capella sopranoalto tenor bass quartet singing with key modulationrdquo J Voice21(3) 300ndash315

Jers H and Ternstrom S (2005) ldquoIntonation analysis of a multi-channel choir recordingrdquo Speech Music Hearing Q Prog StatusRep 47(1) 1ndash6

Kalin G (2005) ldquoFormant frequency adjustment in barbershopquartet singingrdquo Masterrsquos thesis Royal Institute of TechnologyDept of Speech Music and Hearing Stockholm

King J B and Horii Y (1993) ldquoVocal matching of frequencymodulation in synthesized vowelsrdquo J Voice 7(2) 151ndash159

Lindley M (2001) ldquoJust intonationrdquo Grove Music Online editedby L Macy httpwwwgrovemusiccom (accessed 30 January2015)

Loeffler D B (2006) ldquoInstrument timbres and pitch estimation inpolyphonic musicrdquo PhD thesis Georgia Institute of Technology

Mauch M Cannam C Bittner R Fazekas G Salamon JDai J Bello J and Dixon S (2015) ldquoComputer-aided melodynote transcription using the Tony software Accuracy and effi-ciencyrdquo in Proc 1st Int Conf Technol Music Not Representpp 23ndash30

Mauch M and Dixon S (2014) ldquoPYIN A fundamental fre-quency estimator using probabilistic threshold distributionsrdquo inIEEE Int Conf Acoust Speech Signal Process pp 659ndash663

Mauch M Frieler K and Dixon S (2014) ldquoIntonation in un-accompanied singing Accuracy drift and a model of referencepitch memoryrdquo J Acoust Soc Am 136(1) 401ndash411

Mehrabi A Dixon S and Sandler M (2017) ldquoVocal imitationof synthesised sounds varying in pitch loudness and spectral cen-troidrdquo J Acoust Soc Am 143(2) 783ndash796

Mullensiefen D Gingras B Musil J and Stewart L (2014)ldquoThe musicality of non-musicians An index for assessing musi-cal sophistication in the general populationrdquo PLoS ONE 9(2)e89642

Nordmark J and Ternstrom S (1996) ldquoIntonation preferencesfor major thirds with non-beating ensemble soundsrdquo Speech Mu-sic Hearing Q Prog Status Rep 37(1) 57ndash62

Pfordresher P Q and Brown S (2007) ldquoPoor-pitch singing inthe absence of lsquotone deafnessrsquordquo Music Percept 25(2) 95ndash115

Pfordresher P Q Brown S Meier K M Belyk M and LiottiM (2010) ldquoImprecise singing is widespreadrdquo J Acoust SocAm 128(4) 2182ndash2190

Potter J (2000a) ldquoEnsemble singingrdquo in The Cambridge Com-panion to Singing edited by J Potter (Cambridge Univer-

10 J Acoust Soc Am 9 September 2019

sity Press Cambridge UK) pp 158ndash164 doi 101017CCOL9780521622257014

Potter J (2000b) ldquoIntroduction singing at the turn of the cen-turyrdquo in The Cambridge Companion to Singing edited by J Pot-ter (Cambridge University Press Cambridge UK) pp 1ndash5 doi101017CCOL9780521622257001

Prout E (2011) Harmony Its Theory and Practice (CambridgeUniversity Press)

Seashore C E (1914) ldquoThe tonoscoperdquo Psychol Monogr 16(3)1ndash12

Seashore C E (1931) ldquoThe natural history of the vibratordquo ProcNatl Acad Sci 17(12) 623ndash626

Stewart L von Kriegstein K Warren J D and Griffiths T D(2006) ldquoMusic and the brain Disorders of musical listeningrdquoBrain 129(10) 2533ndash2553

Sundberg J (1977) ldquoThe acoustics of the singing voicerdquo Sci Am236(3) 82ndash91

Sundberg J (1987) The Science of the Singing Voice (NorthernIllinois University Press DeKalb IL)

Sundberg J (1995) ldquoAcoustic and psychoacoustic aspects of vocalvibratordquo in Vibrato edited by P Dejonckere M Hirano andJ Sundberg (Singular Publishing Group San Diego CA) pp35ndash62

Sundberg J La F M and Himonides E (2013) ldquoIntona-tion and expressivity A single case study of classical Westernsingingrdquo J Voice 27(3) 391ndashe1

Swannell J (1992) The Oxford Modern English Dictionary (Ox-ford University Press USA) p 560

Takeuchi A H and Hulse S H (1993) ldquoAbsolute pitchrdquo Psy-chol Bull 113(2) 345

Terasawa H (2004) ldquoPitch drift in choral musicrdquo Music 221Afinal paper Center for Computer Research in Music and Acous-tics Stanford University URL httpsccrmastanfordedu

~hirokopitchdriftpaper221ApdfUmbert M Bonada J Goto M Nakano T and Sundberg J

(2015) ldquoExpression control in singing voice synthesis Featuresapproaches evaluation and challengesrdquo IEEE Signal ProcessMag 32(6) 55ndash73

Vurma A and Ross J (2006) ldquoProduction and perception ofmusical intervalsrdquo Music Percept 23(4) 331ndash344

Welch G F Sergeant D C and White P J (1997) ldquoAge sexand vocal task as factors in singing lsquoin tunersquo during the first yearsof schoolingrdquo Bull Counc Res Music Educ 133 153ndash160

Xu Y and Sun X (2000) ldquoHow Fast Can We Really ChangePitch Maximum Speed of Pitch Change Revisitedrdquo in 6th IntConf Spoken Lang Process pp 666ndash669

Zarate J M and Zatorre R J (2008) ldquoExperience-dependentneural substrates involved in vocal pitch regulation duringsingingrdquo Neuroimage 40(4) 1871ndash1887

J Acoust Soc Am 9 September 2019 11

with a downward glide Participants with fine pitch ac-curacy tend to have smoother pitch trajectories whileless accurate singers have relatively unstable note trajec-tories

In conclusion the main contribution of this paper isthe observation measurement and analysis of the notetransient parts by characterising their shapes and influ-encing factors Although many further issues remain tobe investigated we hope that the current observationsprovide a better understanding of the singing voice

VIII DATA AVAILABILITY

The code and the data needed to repro-duce our results (note annotations questionnaireresults score information) are available fromhttpscodesoundsoftwareacukprojectsanalysis-of-interactive-intonation-in-unaccompanied-satb-ensemblesrepository

ACKNOWLEDGMENTS

The study was conducted with the approval of theQueen Mary Research Ethics Committee (approval num-ber QMREC1560)

Many thanks to all of the participants who con-tributed to this project We also thank Marcus PearceDaniel Stowell and Christophe Rhodes for their advice onexperimental design Jiajie Dai is supported by a ChinaScholarship Council and Queen Mary Joint PhD Schol-arship

Alldahl P-G (2008) Choral Intonation (Gehrmans StockholmSweden)

Berkowska M and Dalla Bella S (2009) ldquoAcquired and con-genital disorders of sung performance A reviewrdquo Adv CognPsychol 5(-1) 69ndash83 doi 102478v10053-008-0068-2

Boersma P (2002) ldquoPraat a system for doing phonetics by com-puterrdquo Glot Int 5(910) 341ndash345

Bohrer J C S (2002) ldquoIntonational strategies in ensemblesingingrdquo PhD thesis City University London

Brandler B J and Peynircioglu Z F (2015) ldquoA comparisonof the efficacy of individual and collaborative music learning inensemble rehearsalsrdquo J Res Music Educ 63(3) 281ndash297

Brown D (1991) Human Universals (Temple University PressPhiladelphia) pp 1ndash160

Cannam C Landone C Sandler M B and Bello J P (2006)ldquoThe Sonic Visualiser A visualisation platform for semanticdescriptors from musical signalsrdquo in 7th Int Conf Music InfRetr pp 324ndash327

Dai J (2019) ldquoModelling intonation and interaction in vocal en-semblesrdquo PhD thesis Queen Mary University of London pp104ndash115

Dai J and Dixon S (2016) ldquoAnalysis of vocal imitations of pitchtrajectoriesrdquo in 17th Int Soc Music Inf Retr Conf pp 87ndash93

Dai J and Dixon S (2017) ldquoAnalysis of interactive intonationin unaccompanied SATB ensemblesrdquo in 18th Int Soc Music InfRetr Conf pp 599ndash605

Dai J and Dixon S (2019a) ldquoSinging together Pitch accuracyand interaction in unaccompanied unison and duet singingrdquo JAcoust Soc Am 145(2) 663ndash675

Dai J and Dixon S (2019b) ldquoUnderstanding intonation tra-jectories and patterns of vocal notesrdquo in Int Conf Multimedia

Modell Springer pp 243ndash253Dai J Mauch M and Dixon S (2015) ldquoAnalysis of intonation

trajectories in solo singingrdquo in 16th Int Soc Music Inf RetrConf pp 420ndash426

Dalla Bella S Giguere J-F and Peretz I (2007) ldquoSinging pro-ficiency in the general populationrdquo J Acoust Soc Am 121(2)1182ndash1189 doi 10112112427111

de Cheveigne A and Kawahara H (2002) ldquoYIN a fundamentalfrequency estimator for speech and musicrdquo J Acoust Soc Am111(4) 1917ndash1930 doi 10112111458024

Devaney J and Ellis D P (2008) ldquoAn empirical approachto studying intonation tendencies in polyphonic vocal perfor-mancesrdquo J Interdiscipl Music Stud 2(1amp2) 141ndash156

Devaney J Mandel M I and Fujinaga I (2012) ldquoA studyof intonation in three-part singing using the automatic musicperformance analysis and comparison toolkit (AMPACT)rdquo in13th Int Soc Music Inf Retr Conf pp 511ndash516

Dromey C Carter N and Hopkin A (2003) ldquoVibrato rateadjustmentrdquo J Voice 17(2) 168ndash178

Fischer P-M (1993) Die Stimme des Sangers Analyse ihrerFunktion und Leistung-Geschichte und Methodik der Stimmbil-dung (The Voice of the Singer Analysis of Function and Perfor-mance - History and Methodology of Voice Training) (MetzlerStuttgart Germany)

Gerhard D (2005) ldquoPitch track target deviation in naturalsingingrdquo in 6th Int Conf Music Inf Retr pp 514ndash519

Hagerman B and Sundberg J (1980) ldquoFundamental frequencyadjustment in barbershop singingrdquo J Res Singing 4 1ndash17

Howard D M (2003) ldquoA capella SATB quartet in-tune singingEvidence of intonation shiftrdquo in Stockholm Music Acoust ConfVol 2 pp 462ndash466

Howard D M (2007) ldquoIntonation drift in A Capella sopranoalto tenor bass quartet singing with key modulationrdquo J Voice21(3) 300ndash315

Jers H and Ternstrom S (2005) ldquoIntonation analysis of a multi-channel choir recordingrdquo Speech Music Hearing Q Prog StatusRep 47(1) 1ndash6

Kalin G (2005) ldquoFormant frequency adjustment in barbershopquartet singingrdquo Masterrsquos thesis Royal Institute of TechnologyDept of Speech Music and Hearing Stockholm

King J B and Horii Y (1993) ldquoVocal matching of frequencymodulation in synthesized vowelsrdquo J Voice 7(2) 151ndash159

Lindley M (2001) ldquoJust intonationrdquo Grove Music Online editedby L Macy httpwwwgrovemusiccom (accessed 30 January2015)

Loeffler D B (2006) ldquoInstrument timbres and pitch estimation inpolyphonic musicrdquo PhD thesis Georgia Institute of Technology

Mauch M Cannam C Bittner R Fazekas G Salamon JDai J Bello J and Dixon S (2015) ldquoComputer-aided melodynote transcription using the Tony software Accuracy and effi-ciencyrdquo in Proc 1st Int Conf Technol Music Not Representpp 23ndash30

Mauch M and Dixon S (2014) ldquoPYIN A fundamental fre-quency estimator using probabilistic threshold distributionsrdquo inIEEE Int Conf Acoust Speech Signal Process pp 659ndash663

Mauch M Frieler K and Dixon S (2014) ldquoIntonation in un-accompanied singing Accuracy drift and a model of referencepitch memoryrdquo J Acoust Soc Am 136(1) 401ndash411

Mehrabi A Dixon S and Sandler M (2017) ldquoVocal imitationof synthesised sounds varying in pitch loudness and spectral cen-troidrdquo J Acoust Soc Am 143(2) 783ndash796

Mullensiefen D Gingras B Musil J and Stewart L (2014)ldquoThe musicality of non-musicians An index for assessing musi-cal sophistication in the general populationrdquo PLoS ONE 9(2)e89642

Nordmark J and Ternstrom S (1996) ldquoIntonation preferencesfor major thirds with non-beating ensemble soundsrdquo Speech Mu-sic Hearing Q Prog Status Rep 37(1) 57ndash62

Pfordresher P Q and Brown S (2007) ldquoPoor-pitch singing inthe absence of lsquotone deafnessrsquordquo Music Percept 25(2) 95ndash115

Pfordresher P Q Brown S Meier K M Belyk M and LiottiM (2010) ldquoImprecise singing is widespreadrdquo J Acoust SocAm 128(4) 2182ndash2190

Potter J (2000a) ldquoEnsemble singingrdquo in The Cambridge Com-panion to Singing edited by J Potter (Cambridge Univer-

10 J Acoust Soc Am 9 September 2019

sity Press Cambridge UK) pp 158ndash164 doi 101017CCOL9780521622257014

Potter J (2000b) ldquoIntroduction singing at the turn of the cen-turyrdquo in The Cambridge Companion to Singing edited by J Pot-ter (Cambridge University Press Cambridge UK) pp 1ndash5 doi101017CCOL9780521622257001

Prout E (2011) Harmony Its Theory and Practice (CambridgeUniversity Press)

Seashore C E (1914) ldquoThe tonoscoperdquo Psychol Monogr 16(3)1ndash12

Seashore C E (1931) ldquoThe natural history of the vibratordquo ProcNatl Acad Sci 17(12) 623ndash626

Stewart L von Kriegstein K Warren J D and Griffiths T D(2006) ldquoMusic and the brain Disorders of musical listeningrdquoBrain 129(10) 2533ndash2553

Sundberg J (1977) ldquoThe acoustics of the singing voicerdquo Sci Am236(3) 82ndash91

Sundberg J (1987) The Science of the Singing Voice (NorthernIllinois University Press DeKalb IL)

Sundberg J (1995) ldquoAcoustic and psychoacoustic aspects of vocalvibratordquo in Vibrato edited by P Dejonckere M Hirano andJ Sundberg (Singular Publishing Group San Diego CA) pp35ndash62

Sundberg J La F M and Himonides E (2013) ldquoIntona-tion and expressivity A single case study of classical Westernsingingrdquo J Voice 27(3) 391ndashe1

Swannell J (1992) The Oxford Modern English Dictionary (Ox-ford University Press USA) p 560

Takeuchi A H and Hulse S H (1993) ldquoAbsolute pitchrdquo Psy-chol Bull 113(2) 345

Terasawa H (2004) ldquoPitch drift in choral musicrdquo Music 221Afinal paper Center for Computer Research in Music and Acous-tics Stanford University URL httpsccrmastanfordedu

~hirokopitchdriftpaper221ApdfUmbert M Bonada J Goto M Nakano T and Sundberg J

(2015) ldquoExpression control in singing voice synthesis Featuresapproaches evaluation and challengesrdquo IEEE Signal ProcessMag 32(6) 55ndash73

Vurma A and Ross J (2006) ldquoProduction and perception ofmusical intervalsrdquo Music Percept 23(4) 331ndash344

Welch G F Sergeant D C and White P J (1997) ldquoAge sexand vocal task as factors in singing lsquoin tunersquo during the first yearsof schoolingrdquo Bull Counc Res Music Educ 133 153ndash160

Xu Y and Sun X (2000) ldquoHow Fast Can We Really ChangePitch Maximum Speed of Pitch Change Revisitedrdquo in 6th IntConf Spoken Lang Process pp 666ndash669

Zarate J M and Zatorre R J (2008) ldquoExperience-dependentneural substrates involved in vocal pitch regulation duringsingingrdquo Neuroimage 40(4) 1871ndash1887

J Acoust Soc Am 9 September 2019 11

sity Press Cambridge UK) pp 158ndash164 doi 101017CCOL9780521622257014

Potter J (2000b) ldquoIntroduction singing at the turn of the cen-turyrdquo in The Cambridge Companion to Singing edited by J Pot-ter (Cambridge University Press Cambridge UK) pp 1ndash5 doi101017CCOL9780521622257001

Prout E (2011) Harmony Its Theory and Practice (CambridgeUniversity Press)

Seashore C E (1914) ldquoThe tonoscoperdquo Psychol Monogr 16(3)1ndash12

Seashore C E (1931) ldquoThe natural history of the vibratordquo ProcNatl Acad Sci 17(12) 623ndash626

Stewart L von Kriegstein K Warren J D and Griffiths T D(2006) ldquoMusic and the brain Disorders of musical listeningrdquoBrain 129(10) 2533ndash2553

Sundberg J (1977) ldquoThe acoustics of the singing voicerdquo Sci Am236(3) 82ndash91

Sundberg J (1987) The Science of the Singing Voice (NorthernIllinois University Press DeKalb IL)

Sundberg J (1995) ldquoAcoustic and psychoacoustic aspects of vocalvibratordquo in Vibrato edited by P Dejonckere M Hirano andJ Sundberg (Singular Publishing Group San Diego CA) pp35ndash62

Sundberg J La F M and Himonides E (2013) ldquoIntona-tion and expressivity A single case study of classical Westernsingingrdquo J Voice 27(3) 391ndashe1

Swannell J (1992) The Oxford Modern English Dictionary (Ox-ford University Press USA) p 560

Takeuchi A H and Hulse S H (1993) ldquoAbsolute pitchrdquo Psy-chol Bull 113(2) 345

Terasawa H (2004) ldquoPitch drift in choral musicrdquo Music 221Afinal paper Center for Computer Research in Music and Acous-tics Stanford University URL httpsccrmastanfordedu

~hirokopitchdriftpaper221ApdfUmbert M Bonada J Goto M Nakano T and Sundberg J

(2015) ldquoExpression control in singing voice synthesis Featuresapproaches evaluation and challengesrdquo IEEE Signal ProcessMag 32(6) 55ndash73

Vurma A and Ross J (2006) ldquoProduction and perception ofmusical intervalsrdquo Music Percept 23(4) 331ndash344

Welch G F Sergeant D C and White P J (1997) ldquoAge sexand vocal task as factors in singing lsquoin tunersquo during the first yearsof schoolingrdquo Bull Counc Res Music Educ 133 153ndash160

Xu Y and Sun X (2000) ldquoHow Fast Can We Really ChangePitch Maximum Speed of Pitch Change Revisitedrdquo in 6th IntConf Spoken Lang Process pp 666ndash669

Zarate J M and Zatorre R J (2008) ldquoExperience-dependentneural substrates involved in vocal pitch regulation duringsingingrdquo Neuroimage 40(4) 1871ndash1887

J Acoust Soc Am 9 September 2019 11